Improving write performance in casatables
#226
Comments
My only thought is that this sounds like a good idea! My current priorities mean that, for the time being, I'm not actively working on this library (although I am more than happy to maintain it), so it's not like I have any relevant work-in-progress that might conflict. Along those lines, I had plain forgotten about #220 :-( I will add it to my list and try to get that merged.
Hey Peter, thanks for reviewing the other PR. I thought I'd update you on where I'm at with this. I threw together a test implementation of
To isolate my performance measurements, I implemented a minimal pure C++ benchmark, which showed that all of the alternative write patterns are significantly slower than
The benchmark used a single table with:
Some definitions:
I also made an equivalent benchmark in rubbl. Although the casacore benchmarks say that writing multiple cells simultaneously is much slower, I guess there's enough overhead required for the multi-dimensional columns to copy each cell or block of cells into a new array that the alternative write methods balance things out and give a slight performance gain.
If there are any alternative ways of writing to a table that I've missed, or if you can spot any obvious ways that the benchmarks are lacking, please let me know.
edit: the original C++ benchmark was writing to disk, while rubbl was writing to tmp, they also had different values for
Thanks so much for the detailed analysis and report! This is really cool (even if it's not the situation I want to be in). The Rubbl architecture should be a pretty thin layer on top of the casatables I/O, so I'm not sure why the performance difference is so substantial. I'm afraid that I'm not really in a position to dig into this much myself, but I'm happy to try to help as best I can — and having some nice benchmarks like this is totally the place to start. Once one has the benchmarks, ideally the next thing to do would be to plug them into profiling tools and get quantitative information about where the code is actually spending its time. I have a little experience with that kind of thing and can try to help if needed. Overall I find that setting up profiling usually involves some pretty grievous build hacks to get everything working, so don't worry if you need to do that. Some other thoughts:
Also I'll CC @cjordan just as an FYI, since this may be of interest.
Our write performance using rubbl casatables is not quite as good as a C++ application which makes direct casacore API calls.
We think we could counteract this by writing multiple rows into a column simultaneously using BaseColumn::putSlice, instead of many calls to BaseColumn::put.
I'm opening this ticket to ask if you have any thoughts on this before I have a crack at this myself. I'll wait until #220 is merged of course.
edit: maybe
void putColumnRange (const Slicer& rowRange, const Array<T>& arr);
is what we actually want.
Cheers.
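To make the two call patterns being compared concrete, here is a minimal sketch that does not use casacore or rubbl at all. MockColumn, put_cell, put_row_range, write_per_cell and write_batched are all hypothetical names invented for illustration; the mock only counts how many write calls are made and says nothing about real performance (the pure C++ benchmarks reported in this thread found the batched patterns slower in casacore itself).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a table column. This is NOT the casacore or
// rubbl API; it only records how many write calls were issued.
struct MockColumn {
    std::vector<double> storage;  // row-major: n_rows * cell_len values
    std::size_t cell_len;
    std::size_t write_calls = 0;

    MockColumn(std::size_t n_rows, std::size_t cell_len_)
        : storage(n_rows * cell_len_, 0.0), cell_len(cell_len_) {
        assert(cell_len > 0);
    }

    // Analogue of a per-cell put: one call writes exactly one row.
    void put_cell(std::size_t row, const std::vector<double>& cell) {
        assert(cell.size() == cell_len);
        assert((row + 1) * cell_len <= storage.size());
        std::copy(cell.begin(), cell.end(),
                  storage.begin() + static_cast<std::ptrdiff_t>(row * cell_len));
        ++write_calls;
    }

    // Analogue of a row-range put: one call writes several consecutive rows.
    void put_row_range(std::size_t start_row, const std::vector<double>& block) {
        assert(block.size() % cell_len == 0);
        assert(start_row * cell_len + block.size() <= storage.size());
        std::copy(block.begin(), block.end(),
                  storage.begin() + static_cast<std::ptrdiff_t>(start_row * cell_len));
        ++write_calls;
    }
};

// Pattern 1: one write call per row.
MockColumn write_per_cell(std::size_t n_rows, std::size_t cell_len) {
    MockColumn col(n_rows, cell_len);
    std::vector<double> cell(cell_len);
    for (std::size_t row = 0; row < n_rows; ++row) {
        for (std::size_t i = 0; i < cell_len; ++i) {
            cell[i] = static_cast<double>(row * cell_len + i);
        }
        col.put_cell(row, cell);
    }
    return col;
}

// Pattern 2: gather rows_per_batch rows into one buffer, then write once.
MockColumn write_batched(std::size_t n_rows, std::size_t cell_len,
                         std::size_t rows_per_batch) {
    assert(rows_per_batch > 0);
    MockColumn col(n_rows, cell_len);
    for (std::size_t start = 0; start < n_rows; start += rows_per_batch) {
        const std::size_t n = std::min(rows_per_batch, n_rows - start);
        std::vector<double> block(n * cell_len);
        for (std::size_t r = 0; r < n; ++r) {
            for (std::size_t i = 0; i < cell_len; ++i) {
                block[r * cell_len + i] =
                    static_cast<double>((start + r) * cell_len + i);
            }
        }
        col.put_row_range(start, block);
    }
    return col;
}
```

Both functions leave the same data in the column; the batched version trades fewer write calls for an extra copy into the gathering buffer, which is the overhead that has to be weighed against any per-call savings.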