Improving write performance in `casatables` #226

d3v-null · 2023-01-12T11:25:54Z

Our write performance using rubbl casatables is not quite as good as a C++ application which makes direct casacore API calls.

We think we could counteract this by writing multiple rows into a column simultaneously using `BaseColumn::putSlice`, instead of many calls to `BaseColumn::put`.

I'm opening this ticket to ask if you have any thoughts on this before I have a crack at this myself. I'll wait until #220 is merged of course.

edit: maybe `void putColumnRange (const Slicer& rowRange, const Array<T>& arr);` is what we actually want.

Cheers.


pkgw · 2023-01-12T14:35:36Z

My only thought is that this sounds like a good idea! My current priorities mean that, for the time being, I'm not actively working on this library (although I am more than happy to maintain it), so I don't have any relevant work-in-progress that might conflict.

Along those lines, I had plain forgotten about #220 :-( I will add it to my list and try to get that merged.

d3v-null · 2023-01-19T04:10:19Z

Hey Peter, thanks for reviewing the other PR. I thought I'd update you on where I'm at with this.

I threw together a test implementation of `put_cells` and `put_column` in this branch, along with some benchmarks so that I could test whether this would actually improve performance. Unfortunately, it wasn't as dramatic as I had hoped.

To isolate my performance measurements, I implemented a minimal pure C++ benchmark, which showed that all of the alternative write patterns are significantly slower than putting cells row-wise like we're already doing, which is pretty disappointing.

The benchmark used a single table with:

- a scalar double `TIME` column
- an array[3] float `UVW` column
- an array[N_CHANS, N_POLS] complex `DATA` column

Some definitions:

- COLUMNWISE means write all of the rows in one column completely before moving to the next column
- ROWWISE means write one row completely before moving to the next
- CELL means write individual cells with `put`, one at a time
- CELLS means write all of the cells for a given timestep in groups using `putColumnCells`
- COLUMNS means write an entire column in one go using `putColumn`
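
To make these definitions concrete, here is a minimal sketch in plain Rust (no casacore or rubbl calls) of the two visiting orders and of how many write calls each mode would issue. It assumes one row per (timestep, baseline) pair and the three columns listed above; `WriteMode`, `cell_order` and `write_calls` are hypothetical names used only for illustration, and the sizes are the benchmark parameters quoted later in this comment.

```rust
/// How cells are grouped into one write call (names follow the definitions above).
#[derive(Clone, Copy, Debug, PartialEq)]
enum WriteMode {
    Cell,   // one call per cell
    Cells,  // one call per column per timestep
    Column, // one call per column
}

/// Order in which (row, column) cells are visited.
/// `rowwise = true` finishes each row first; `false` finishes each column first.
fn cell_order(rowwise: bool, n_rows: usize, n_cols: usize) -> Vec<(usize, usize)> {
    let mut order = Vec::with_capacity(n_rows * n_cols);
    if rowwise {
        for row in 0..n_rows {
            for col in 0..n_cols {
                order.push((row, col));
            }
        }
    } else {
        for col in 0..n_cols {
            for row in 0..n_rows {
                order.push((row, col));
            }
        }
    }
    order
}

/// Number of write calls needed to fill `n_cols` columns of a table that has
/// `n_times * n_bls` rows (assumption: one row per timestep and baseline).
fn write_calls(mode: WriteMode, n_times: usize, n_bls: usize, n_cols: usize) -> usize {
    match mode {
        WriteMode::Cell => n_times * n_bls * n_cols,
        WriteMode::Cells => n_times * n_cols,
        WriteMode::Column => n_cols,
    }
}

fn main() {
    println!("rowwise:    {:?}", cell_order(true, 2, 3));
    println!("columnwise: {:?}", cell_order(false, 2, 3));
    let (n_times, n_bls, n_cols) = (12, 8256, 3);
    for &mode in [WriteMode::Cell, WriteMode::Cells, WriteMode::Column].iter() {
        println!("{:?}: {} write calls", mode, write_calls(mode, n_times, n_bls, n_cols));
    }
}
```

The call counts only describe the shape of each pattern; they say nothing about how long an individual call takes, which is what the benchmarks measure.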

I also made an equivalent benchmark in rubbl.

Although the casacore benchmarks suggest that writing multiple cells at once is much slower, in rubbl the alternative write methods give a slight performance gain. My guess is that, for the multi-dimensional columns, copying each cell or block of cells into a new array adds enough overhead that batching the writes balances things out.
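
One way to picture that copy overhead is the sketch below. It is not rubbl's actual internals: `MockColumn`, `write_per_cell` and `write_block` are hypothetical stand-ins that only count copies and calls, assuming each write first copies its data into a freshly allocated array.

```rust
/// Hypothetical stand-in for a table column: it only records how many
/// calls it received and how many elements it was handed in total.
#[derive(Default)]
struct MockColumn {
    calls: usize,
    elements: usize,
}

impl MockColumn {
    fn put(&mut self, data: &[f32]) {
        self.calls += 1;
        self.elements += data.len();
    }
}

/// Per-cell path: each cell is copied into its own new buffer before the write.
/// Returns the number of copies (and allocations) made.
fn write_per_cell(col: &mut MockColumn, block: &[f32], cell_len: usize) -> usize {
    let mut copies = 0;
    for cell in block.chunks(cell_len) {
        let owned: Vec<f32> = cell.to_vec(); // one allocation + copy per cell
        copies += 1;
        col.put(&owned);
    }
    copies
}

/// Batched path: the whole block is copied once and written in a single call.
fn write_block(col: &mut MockColumn, block: &[f32]) -> usize {
    let owned: Vec<f32> = block.to_vec(); // one allocation + copy per block
    col.put(&owned);
    1
}

fn main() {
    let block = vec![0.0_f32; 4 * 6]; // 4 cells of 6 elements each
    let mut per_cell = MockColumn::default();
    let mut batched = MockColumn::default();
    let c1 = write_per_cell(&mut per_cell, &block, 6);
    let c2 = write_block(&mut batched, &block);
    println!("per cell: {} copies, {} calls", c1, per_cell.calls);
    println!("batched:  {} copies, {} calls", c2, batched.calls);
}
```

Both paths move the same number of elements; the difference is how many separate allocations and calls that takes.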

If there are any alternative ways of writing to a table that I've missed, or if you can spot any obvious ways that the benchmarks are lacking, please let me know.

edit: the original C++ benchmark was writing to disk while rubbl was writing to tmp, and the two also used different option values in their column descriptions. Here are the new user times (ms), along with a "noslice" / streamed version (the `-s` columns) which writes the same cell, or chunk of cells, repeatedly instead of taking a slice. Parameters: nTimes=12, nBls=8256, nChs=768, nPols=4.

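| table type | write mode | C++ | C++ -s | rub | rub-s |
|------------|------------|-----|--------|-----|-------|
| rowwise    | cell       | 324 | 128    | 896 | 681   |
| rowwise    | cells      | 184 | 187    | 851 | 647   |
| columnwise | cell       | 303 | 125    | 888 | 672   |
| columnwise | cells      | 186 | 192    | 860 | 643   |
| columnwise | column     | 179 | -      | 654 | -     |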
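
As a quick worked example, the sketch below computes the rubbl-to-C++ ratio for each row, using only the non-streamed `C++` and `rub` columns copied from the table above. `TIMES_MS` and `slowdown` are hypothetical names for illustration.

```rust
/// User times in ms copied from the table: (table type, write mode, C++, rubbl).
const TIMES_MS: [(&str, &str, f64, f64); 5] = [
    ("rowwise", "cell", 324.0, 896.0),
    ("rowwise", "cells", 184.0, 851.0),
    ("columnwise", "cell", 303.0, 888.0),
    ("columnwise", "cells", 186.0, 860.0),
    ("columnwise", "column", 179.0, 654.0),
];

/// How many times longer the rubbl benchmark takes than the C++ one.
fn slowdown(cpp_ms: f64, rubbl_ms: f64) -> f64 {
    rubbl_ms / cpp_ms
}

fn main() {
    for &(table, mode, cpp, rub) in TIMES_MS.iter() {
        println!("{:<10} {:<6} rubbl/C++ = {:.2}", table, mode, slowdown(cpp, rub));
    }
}
```

With these numbers the ratio sits between roughly 2.8 and 4.6. It is largest for the `cells` mode, where the C++ times drop noticeably compared to `cell` but the rubbl times barely move.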

pkgw · 2023-01-19T14:33:38Z

Thanks so much for the detailed analysis and report! This is really cool (even if it's not the situation I want to be in).

The Rubbl architecture should be a pretty thin layer on top of the casatables I/O, so I'm not sure why the performance difference is so substantial. I'm afraid that I'm not really in a position to dig into this much myself, but I'm happy to try to help as best I can — and having some nice benchmarks like this is totally the place to start.

Once one has the benchmarks, ideally the next thing to do would be to plug them into profiling tools and get quantitative information about where the code is actually spending its time. I have a little experience with that kind of thing and can try to help if needed. Overall I find that setting up profiling usually involves some pretty grievous build hacks to get everything working, so don't worry if you need to do that.

Some other thoughts:

- Maybe Rubbl is simply compiling the casatables code with different C++ compiler flags that affect performance? I think that release builds should be using sensible flags, but maybe the `build.rs` script needs to do something extra. And then there are flags specific to CASA: namely, we do have `USE_THREADS=1`, and don't have other flags.
- Also, keeping in mind that Rubbl is bundling the casatables C++ code, it's possible that it would benefit from updating, although I would be shocked if that was responsible for the performance difference.
- It might be worth doing an `strace` of the two benchmarks and checking whether the patterns of I/O system calls look similar. If they do, I'd suspect that we've got some higher-level issue that's turning the CPU into a bottleneck.

Also I'll CC @cjordan just as an FYI — this may be of interest.

d3v-null changed the title ~~Support for putSlice in casatables~~ Improving write performance in casatables Oct 9, 2024
