When file indices are provided, only read models for the specified im… #210

dwpaley · 2020-08-13T19:18:32Z

We were loading models (beam, detector, goniometer, scan) for every image in a multi-image file, even when only one image was requested. Restricting this to the requested images makes a big difference: FormatMultiImage.get_imageset now takes 0.1 s instead of 0.8 s when making a one-image "set" from a 600-image .h5 file.

Leaving this as a draft while waiting to see tests.

…ages

graeme-winter · 2020-08-14T13:04:52Z

@dwpaley I suspect a test which exercised this would make it easier for reviewers to appreciate the differences here. single_file_indices is not something I for one am familiar with...

format/FormatMultiImage.py

dwpaley · 2020-09-23T22:23:33Z

I think I undersold the performance impact of this change with the 0.1s/0.8s stat. When loading serial data, a new imageset is created for every shot, so that number is multiplied by the number of images.

I can give an example but unfortunately not using data in one of the regression data repos. I tested on dials.stills_process results from a 615-image .h5 data set collected on the MPCCD at SACLA. The run had 12 indexed images. I called $ dials.image_viewer *_integrated.* and timed until the viewer window opened:

Current master: 27.5, 23.9, 23.9 s
This branch: 6.7, 8.6, 6.0 s

I hope this helps illustrate why I consider this important. Yes, it can also be fixed by supplying load_models=False, but this change makes the default parameters usable.

graeme-winter · 2020-09-24T06:02:08Z

> I think I undersold the performance impact of this change with the 0.1s/0.8s stat. When loading serial data, a new imageset is created for every shot, so that number is multiplied by the number of images.

This was what the "load before heat death" patch set was for in #118 - every time you load an image you make an imageset of N images so you end up with a lovely N^2 process - when N ~ 20,000 this bites badly.

This is a design failure in imageset creation 😕
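To make the quadratic cost concrete, here is a back-of-envelope sketch. It is not dxtbx code: `count_model_reads` is a hypothetical helper, and it assumes one imageset is created per shot and that each imageset needs the models for exactly one image.

```python
def count_model_reads(num_shots, images_in_file, only_requested):
    """Total model reads when one imageset is created per shot.

    Hypothetical helper for illustration; not part of dxtbx.
    """
    # Either every imageset reads models for the whole file,
    # or it reads them only for the single requested image.
    reads_per_imageset = 1 if only_requested else images_in_file
    return num_shots * reads_per_imageset


n = 20_000
reads_all = count_model_reads(n, n, only_requested=False)
reads_requested = count_model_reads(n, n, only_requested=True)
print(reads_all, reads_requested)  # 400000000 20000
```

Under these assumptions, reading models for every image costs N^2 reads in total, while reading only the requested image costs N.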

dwpaley · 2020-10-01T06:47:35Z

@graeme-winter After discussing with @phyy-nx and @nksauter, I refactored this so that the loop runs over single_file_indices instead of over range(num_images), so there is no need to test whether i is in single_file_indices. The arrays beam, detector, etc. still need to have length num_images, so we fill them with None and then use single_file_indices directly to index into them.
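A minimal sketch of the loop structure described above, not the actual dxtbx implementation. `FakeReader`, `get_beam` and `load_beams` are hypothetical names, and only the beam model is shown. The call counter mirrors the idea of checking the behaviour by counting model reads.

```python
class FakeReader:
    """Hypothetical stand-in for a multi-image format object."""

    def __init__(self, num_images):
        self.num_images = num_images
        self.beam_calls = 0  # how many times a beam model was read

    def get_num_images(self):
        return self.num_images

    def get_beam(self, index):
        self.beam_calls += 1
        return f"beam-{index}"


def load_beams(reader, single_file_indices=None):
    """Read beam models only for the requested images."""
    num_images = reader.get_num_images()
    if single_file_indices is None:
        # No selection given: fall back to every image in the file.
        single_file_indices = range(num_images)
    # The list keeps the full length so positions match image numbers.
    beams = [None] * num_images
    # Loop over the requested indices, not over range(num_images).
    for i in single_file_indices:
        beams[i] = reader.get_beam(i)
    return beams


reader = FakeReader(600)
beams = load_beams(reader, single_file_indices=[41])
```

Here `reader.beam_calls` ends up as 1 rather than 600, while `beams` still has 600 entries, all `None` except position 41.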

format/FormatMultiImage.py

tests/test_imageset.py

- Test get_imageset with single_file_indices not specified - General refactoring

FormatMultiImage.get_imageset assumes that single_file_indices=[] means we're not loading any images. However, when the empty list is passed to ImageSet, it is treated the same as None, i.e. load all images in the container. To avoid this inconsistency, get_imageset now just crashes if passed an empty list.
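A hedged sketch of that guard: `normalise_single_file_indices` is a hypothetical helper, and raising `ValueError` is an assumption for illustration, since the commit message only says the call crashes on an empty list.

```python
def normalise_single_file_indices(single_file_indices, num_images):
    """Return an explicit list of image indices to load.

    Hypothetical helper; not the dxtbx implementation.
    """
    if single_file_indices is None:
        # None means "all images in the container".
        return list(range(num_images))
    indices = list(single_file_indices)
    if not indices:
        # An empty list is ambiguous (nothing vs. everything), so refuse it.
        raise ValueError(
            "single_file_indices must not be empty; pass None to load all images"
        )
    return indices
```

Rejecting the empty list up front means the two layers can never disagree about whether it means "no images" or "all images".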

format/FormatMultiImage.py

codecov · 2020-10-13T21:19:57Z

Codecov Report

Merging #210 into master will decrease coverage by 0.20%.
The diff coverage is 74.19%.

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
- Coverage   45.38%   45.17%   -0.21%     
==========================================
  Files         228      228              
  Lines       19197    19253      +56     
  Branches     2721     2727       +6     
==========================================
- Hits         8712     8697      -15     
- Misses       9971    10042      +71     
  Partials      514      514

When file indices are provided, only read models for the specified im…

60ceb84

…ages

dwpaley marked this pull request as ready for review August 13, 2020 20:31

graeme-winter reviewed Aug 14, 2020

format/FormatMultiImage.py (outdated)

dwpaley added 3 commits August 18, 2020 14:04

short-circuit test 'i in single_file_indices' if it is superfluous

e442c35

test single_file_indices by counting calls to format._beam

fd9ee14

py2 compatibility

2316e75

refactor loop to avoid testing 'i in single_file_indices'

d0e578a

Anthchirp reviewed Oct 1, 2020

format/FormatMultiImage.py (outdated)

format/FormatMultiImage.py (outdated)

tests/test_imageset.py (outdated)

some cleanup items from @Anthchirp

dc476ce

rjgildea reviewed Oct 1, 2020

dwpaley added 2 commits October 1, 2020 17:24

Edits requested by @rjgildea

038b44a

- Test get_imageset with single_file_indices not specified - General refactoring

rjgildea reviewed Oct 2, 2020

format/FormatMultiImage.py (outdated)

dwpaley added 2 commits October 2, 2020 11:26

parametrize test_single_file_indices

2da42f9

Merge branch 'master' into multiimage_use_indices

99c0a35

add newsfragment

4b4c162

dwpaley force-pushed the multiimage_use_indices branch from 7c8ff22 to 4b4c162 Compare November 18, 2020 19:17

dwpaley merged commit cb6e61b into master Nov 18, 2020

dwpaley deleted the multiimage_use_indices branch November 18, 2020 19:26
