Improving write performance in casatables #226

Open
d3v-null opened this issue Jan 12, 2023 · 3 comments

@d3v-null
Contributor

d3v-null commented Jan 12, 2023

Our write performance using rubbl casatables is not quite as good as a C++ application which makes direct casacore API calls.

We think we could counteract this by writing multiple rows into a column at once using BaseColumn::putSlice, instead of making many calls to BaseColumn::put.

I'm opening this ticket to ask if you have any thoughts on this before I have a crack at this myself. I'll wait until #220 is merged of course.

edit: maybe void putColumnRange (const Slicer& rowRange, const Array<T>& arr); is what we actually want.

Cheers.

@pkgw
Owner

pkgw commented Jan 12, 2023

My only thought is that this sounds like a good idea! My current priorities mean that for the time being I'm not actively working on this library (although I am more than happy to maintain it), so it's not like I have any relevant work-in-progress that might conflict.

Along those lines, I had plain forgotten about #220 :-( I will add it to my list and try to get that merged.

@d3v-null
Contributor Author

d3v-null commented Jan 19, 2023

Hey Peter, thanks for reviewing the other PR. I thought I'd update you on where I'm at with this.

I threw together a test implementation of put_cells and put_column in this branch, along with some benchmarks so that I could test whether this would actually improve performance. Unfortunately, it wasn't as dramatic as I had hoped.

To isolate my performance measurements, I implemented a minimal pure C++ benchmark. It showed that all of the alternative write patterns are significantly slower than putting cells row-wise, as we're already doing, which is pretty disappointing.

The benchmark used a single table with:

  • a scalar double TIME column
  • an array[3] float UVW column
  • an array[N_CHANS, N_POLS] complex DATA column
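For a sense of scale, here is a rough estimate of the payload those three columns produce. It is a sketch under assumptions the issue does not state: one row per (timestep, baseline), TIME stored as f64, UVW as three f32 values, and DATA as single-precision complex (two f32 values per element), with storage-manager overhead ignored.

```rust
// Rough payload estimate for the benchmark table.
// Assumptions (not stated in the issue): one row per (timestep, baseline),
// TIME = f64, UVW = 3 x f32, DATA = single-precision complex (2 x f32).
// Storage-manager overhead is ignored.

const F64_BYTES: u64 = 8;
const F32_BYTES: u64 = 4;
const COMPLEX_F32_BYTES: u64 = 2 * F32_BYTES;

/// Payload bytes of one row: TIME + UVW + DATA.
fn row_bytes(n_chans: u64, n_pols: u64) -> u64 {
    let time = F64_BYTES;
    let uvw = 3 * F32_BYTES;
    let data = n_chans * n_pols * COMPLEX_F32_BYTES;
    time + uvw + data
}

/// Total payload bytes for n_times timesteps of n_bls baselines.
fn total_bytes(n_times: u64, n_bls: u64, n_chans: u64, n_pols: u64) -> u64 {
    n_times * n_bls * row_bytes(n_chans, n_pols)
}
```

With the parameters in the edit further down (nTimes=12, nBls=8256, nChs=768, nPols=4), this works out to 99,072 rows of 24,596 bytes each, about 2.44 GB in total, nearly all of it in DATA.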

Some definitions:

  • COLUMNWISE means write all of the rows in one column completely before moving to the next column
  • ROWWISE means write one row completely before moving to the next
  • CELL means write individual cells with put, one at a time
  • CELLS means write all of the cells for a given timestep in groups using putColumnCells
  • COLUMNS means write an entire column in one go using putColumn
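To make these definitions concrete, the sketch below models each combination as a plan of (column, first row, number of rows) writes, so the number of put-style calls can be counted. It is a hypothetical model, not rubbl or casacore code, and it assumes one row per (timestep, baseline) and that ROWWISE combined with CELLS means writing every column's block for one timestep before moving to the next timestep.

```rust
/// Iteration order over rows and columns.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order {
    Rowwise,
    Columnwise,
}

/// How many rows each put-style call covers.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Cell,    // one cell per call
    Cells,   // one timestep's worth of rows per call (assumed grouping)
    Columns, // a whole column per call
}

/// One planned write: (column index, first row, number of rows).
type Write = (usize, usize, usize);

/// Hypothetical model of the write sequence issued by each benchmark mode.
fn write_plan(order: Order, mode: Mode, n_cols: usize, n_times: usize, n_bls: usize) -> Vec<Write> {
    let n_rows = n_times * n_bls;
    let mut plan = Vec::new();
    match (order, mode) {
        (Order::Rowwise, Mode::Cell) => {
            for row in 0..n_rows {
                for col in 0..n_cols {
                    plan.push((col, row, 1));
                }
            }
        }
        (Order::Columnwise, Mode::Cell) => {
            for col in 0..n_cols {
                for row in 0..n_rows {
                    plan.push((col, row, 1));
                }
            }
        }
        (Order::Rowwise, Mode::Cells) => {
            // Finish every column for timestep t before starting t + 1.
            for t in 0..n_times {
                for col in 0..n_cols {
                    plan.push((col, t * n_bls, n_bls));
                }
            }
        }
        (Order::Columnwise, Mode::Cells) => {
            for col in 0..n_cols {
                for t in 0..n_times {
                    plan.push((col, t * n_bls, n_bls));
                }
            }
        }
        // Whole-column writes; the benchmark only reports the columnwise case.
        (_, Mode::Columns) => {
            for col in 0..n_cols {
                plan.push((col, 0, n_rows));
            }
        }
    }
    plan
}

/// Total rows written per column; every entry should equal n_times * n_bls.
fn rows_covered(plan: &[Write], n_cols: usize) -> Vec<usize> {
    let mut covered = vec![0; n_cols];
    for &(col, _, n) in plan {
        covered[col] += n;
    }
    covered
}
```

For the benchmark's 3 columns and 12 x 8256 rows, CELL issues 297,216 calls in either order, CELLS issues 36 and COLUMNS issues 3; the two orders differ only in which column the next call touches.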

I also made an equivalent benchmark in rubbl.

Although the pure casacore benchmarks say that writing multiple cells at once is much slower, I guess that in rubbl the overhead of copying each cell (or block of cells) of the multi-dimensional columns into a new array is large enough that the multi-cell write methods balance things out and give a slight performance gain.
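The copy overhead is easiest to see from the layout a multi-row put expects. As far as I understand casacore, arrays are stored in column-major (Fortran) order, and the array passed to a multi-row call such as ArrayColumn::putColumnRange has the cell shape plus one extra, last axis for rows, so each row's cell is one contiguous chunk. The helpers below are hypothetical (pack_rows and row_block are not rubbl functions): the first shows the per-cell copy needed when rows come from separate buffers, the second shows that no copy is needed when the data already sits contiguously row by row.

```rust
/// Hypothetical helper: pack per-row cells (each already flattened in the
/// element order the cell expects) into one buffer whose last,
/// slowest-varying axis is the row. Returns None if any cell has the
/// wrong number of elements.
fn pack_rows<T: Copy>(cells: &[&[T]], cell_len: usize) -> Option<Vec<T>> {
    let mut out = Vec::with_capacity(cells.len() * cell_len);
    for cell in cells {
        if cell.len() != cell_len {
            return None;
        }
        // One extra copy per cell: this is the overhead being discussed.
        out.extend_from_slice(cell);
    }
    Some(out)
}

/// Hypothetical helper: if the source data is already one contiguous
/// buffer laid out row by row, the block for rows [first, first + n) is
/// just a sub-slice, so it can be handed over without copying.
/// Returns None if the requested rows run past the end of the buffer.
fn row_block<T>(data: &[T], cell_len: usize, first: usize, n: usize) -> Option<&[T]> {
    let start = first.checked_mul(cell_len)?;
    let end = start.checked_add(n.checked_mul(cell_len)?)?;
    data.get(start..end)
}
```

Whether a zero-copy path like row_block is actually available through the rubbl bindings is an open question; the sketch only shows which layout would make it possible.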

If there are any alternative ways of writing to a table that I've missed, or if you can spot any obvious ways that the benchmarks are lacking, please let me know.

edit: the original C++ benchmark was writing to disk while the rubbl benchmark was writing to tmp, and the two also used different option values in their column descriptions. Here are the new user times, along with a "noslice" / streamed version that writes the same cell, or chunk of cells, repeatedly instead of taking a slice. Parameters: nTimes=12, nBls=8256, nChs=768, nPols=4; times in ms:

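order       write mode   C++   C++ -s   rubbl   rubbl -s
rowwise     cell         324   128      896     681
rowwise     cells        184   187      851     647
columnwise  cell         303   125      888     672
columnwise  cells        186   192      860     643
columnwise  column       179   -        654     -

(All values are user time in ms. The "-s" columns are the streamed / "noslice" variant, which has no column-mode entry.)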
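To put a number on the gap, the sketch below takes the (C++, rubbl) user-time pairs from the table above and reports the smallest and largest rubbl/C++ ratio. The values are copied from the table; nothing else is assumed.

```rust
/// (C++ ms, rubbl ms) pairs from the non-streamed columns of the table.
const NON_STREAMED: [(f64, f64); 5] = [
    (324.0, 896.0), // rowwise cell
    (184.0, 851.0), // rowwise cells
    (303.0, 888.0), // columnwise cell
    (186.0, 860.0), // columnwise cells
    (179.0, 654.0), // columnwise column
];

/// (C++ -s ms, rubbl -s ms) pairs from the streamed columns of the table.
const STREAMED: [(f64, f64); 4] = [
    (128.0, 681.0), // rowwise cell
    (187.0, 647.0), // rowwise cells
    (125.0, 672.0), // columnwise cell
    (192.0, 643.0), // columnwise cells
];

/// Smallest and largest rubbl / C++ time ratio over the given pairs.
fn slowdown_range(pairs: &[(f64, f64)]) -> (f64, f64) {
    pairs
        .iter()
        .map(|&(cpp_ms, rubbl_ms)| rubbl_ms / cpp_ms)
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), r| (lo.min(r), hi.max(r)))
}
```

On these numbers rubbl takes roughly 2.8x to 4.6x the C++ user time for the non-streamed runs, and roughly 3.3x to 5.4x for the streamed runs.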

@pkgw
Owner

pkgw commented Jan 19, 2023

Thanks so much for the detailed analysis and report! This is really cool (even if it's not the situation I want to be in).

The Rubbl architecture should be a pretty thin layer on top of the casatables I/O, so I'm not sure why the performance difference is so substantial. I'm afraid that I'm not really in a position to dig into this much myself, but I'm happy to try to help as best I can — and having some nice benchmarks like this is totally the place to start.

Once one has the benchmarks, ideally the next thing to do would be to plug them into profiling tools and get quantitative information about where the code is actually spending its time. I have a little experience with that kind of thing and can try to help if needed. Overall I find that setting up profiling usually involves some pretty grievous build hacks to get everything working, so don't worry if you need to do that.

Some other thoughts:

  • Maybe Rubbl is simply compiling the casatables code with different C++ compiler flags that affect performance? I think that release builds should be using sensible flags, but maybe the build.rs script needs to do something extra. And then there are flags specific to CASA — namely, we do have USE_THREADS=1, and don't have other flags.
  • Also, keeping in mind that Rubbl is bundling the casatables C++ code, it's possible that it would benefit from updating, although I would be shocked if that was responsible for the performance difference.
  • It might be worth doing an strace of the two benchmarks and checking whether the patterns of I/O system calls look similar. If they do, I'd suspect that we've got some higher-level issue that's turning the CPU into a bottleneck.
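One way to check the compiler-flag hypothesis is to have the build script print the optimization settings it compiles the bundled C++ with, so they can be compared with the flags used for the standalone C++ benchmark (optimization level, and defines such as USE_THREADS). The sketch below is a hypothetical helper, not rubbl's actual build.rs: it reads the OPT_LEVEL and PROFILE variables that Cargo sets for build scripts and emits a cargo:warning line. If the build uses the cc crate, cc already derives its -O flag from OPT_LEVEL, so this only makes that choice visible.

```rust
use std::env;

/// Hypothetical mapping from Cargo's OPT_LEVEL ("0"-"3", "s", "z") to a
/// GCC/Clang-style -O flag. "z" maps to -Os because older GCC lacks -Oz.
fn cxx_opt_flag(opt_level: &str) -> Option<&'static str> {
    match opt_level {
        "0" => Some("-O0"),
        "1" => Some("-O1"),
        "2" => Some("-O2"),
        "3" => Some("-O3"),
        "s" | "z" => Some("-Os"),
        _ => None,
    }
}

/// Format a line that Cargo surfaces as a build warning.
fn build_settings_warning(opt_level: Option<&str>, profile: Option<&str>) -> String {
    let flag = opt_level.and_then(cxx_opt_flag).unwrap_or("unknown");
    format!(
        "cargo:warning=casatables C++ build: OPT_LEVEL={} ({}), PROFILE={}",
        opt_level.unwrap_or("unset"),
        flag,
        profile.unwrap_or("unset")
    )
}

/// Call this from build.rs's main() to print the settings during the build.
fn report_build_settings() {
    let opt = env::var("OPT_LEVEL").ok();
    let profile = env::var("PROFILE").ok();
    println!("{}", build_settings_warning(opt.as_deref(), profile.as_deref()));
}
```

Cargo shows build-script warnings for local packages, so this is most useful when building rubbl from a local checkout with cargo build --release.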

Also I'll CC @cjordan just as an FYI — this may be of interest.

d3v-null changed the title from "Support for putSlice in casatables" to "Improving write performance in casatables" on Oct 9, 2024.