-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When file indices are provided, only read models for the specified im… #210
Conversation
@dwpaley I suspect a test which exercised this would make it easier for reviewers to appreciate the differences here. |
I think I undersold the performance impact of this change with the 0.1s/0.8s stat. When loading serial data, a new imageset is created for every shot, so that number is multiplied by the number of images. I can give an example but unfortunately not using data in one of the regression data repos. I tested on dials.stills_process results from a 615-image .h5 data set collected on the MPCCD at SACLA. The run had 12 indexed images. I called Current master: 27.5, 23.9, 23.9 s I hope this helps illustrate why I consider this important. Yes it can also be fixed by supplying load_models=False, but this makes the default parameters usable. |
This was what the "load before heat death" patch set was for in #118 - every time you load an image you make an imageset of N images so you end up with a lovely N^2 process - when N ~ 20,000 this bites badly. This is a design failure in imageset creation 😕 |
@graeme-winter After discussing with @phyy-nx and @nksauter I refactored this so that the loop is over |
- Test get_imageset with single_file_indices not specified - General refactoring
FormatMultiImage.get_imageset assumes that single_file_indices=[] means we're not loading any images. However when the empty list is passed to ImageSet, it is treated the same as None, i.e. load all images in the container. To avoid inconsistency, this just crashes if passed an empty list.
Codecov Report
@@ Coverage Diff @@
## master #210 +/- ##
==========================================
- Coverage 45.38% 45.17% -0.21%
==========================================
Files 228 228
Lines 19197 19253 +56
Branches 2721 2727 +6
==========================================
- Hits 8712 8697 -15
- Misses 9971 10042 +71
Partials 514 514 |
7c8ff22
to
4b4c162
Compare
We were loading models (beam, detector, gonio, scan) for every image in a multiimage file, even when only 1 image was requested. This makes a big difference: FormatMultiImage.get_imageset now takes 0.1s instead of 0.8s when making a 1-image "set" from a 600-image .h5 file.
Leaving this as a draft while waiting to see tests.