Lazy data in PP save pairs #2452
Conversation
@bjlittle can you comment on whether this now looks sensible, in light of ongoing Dask development?
Happy to review this when it's ready. We have external collaborators (CMIP6) whom this affects, and we have limited availability of hardware to work around this problem. Cheers
Is the description a little out of date? See #2433.
```python
# Attach an extra cube "fill_value" property, allowing the save rules
# to deduce MDI more easily without realising the data.
# NOTE: it is done this way because this property may exist in future.
slice2D.fill_value = _data_fill_value(slice2D)
```
Adding a `fill_value` attribute at run-time seems rather messy to me, especially as it presumes that the dask branch ends up merged onto master as it stands. Perhaps it would be better to point this PR at the dask branch instead?
That said, I'm not going to push this: if you're happy then I'm happy.
Perhaps this could be avoided by making the code in _data_fill_value() into a set of save rules to determine pp.bmdi.
This is a purely temporary measure, based on the idea that the forthcoming dask change will definitely be introducing the generic "cube.fill_value" property.
We will remove this hack when the dask branch is merged back.
Obviously, the method actually used here is 🤢, but it is only here so that you can check that it fixes the current memory-burn problem.
```python
if slice2D.has_lazy_data():
    slice_core_data = slice2D.lazy_data()
else:
    slice_core_data = slice2D.data
```
Is this really necessary? Can't you avoid the if/else and simply always set it to the lazy array? I doubt there would be much overhead in re-instantiating a lazy array when the data has already been realised, i.e. `pp_field._data = slice2D.lazy_data()`.
It's best like this because creating a simple wrapper around real data is substantially more costly in dask than it was with biggus: the earliest dask branch retained the existing cube property code, getting `cube.shape` and `cube.dtype` from `cube.lazy_data()`, and the tests then took nearly twice as long!
Having said that, this code should be removed when we transition to dask anyway, as mentioned above.
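To make the trade-off concrete, here is a standalone sketch of the branch under discussion, using a tiny stub object in place of a real Iris cube slice (the class and helper are purely illustrative, not the Iris API):

```python
import numpy as np

class FakeSlice:
    # Minimal stand-in for a 2d cube slice -- NOT the real Iris API.
    def __init__(self, data, lazy=None):
        self.data = data
        self._lazy = lazy

    def has_lazy_data(self):
        return self._lazy is not None

    def lazy_data(self):
        return self._lazy

def core_data(slice2d):
    # Prefer an existing lazy array, but pass realised data through
    # untouched: re-wrapping real data in a lazy container was
    # measurably slower under dask than it had been under biggus.
    if slice2d.has_lazy_data():
        return slice2d.lazy_data()
    return slice2d.data
```

The design choice is simply that the realised array is returned as-is, so no wrapper object is constructed on the already-real path.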
Thanks all, it's very much appreciated. I'll keep you informed once we've done a test with the data that revealed this issue (this may take a few days; it was an external collaborator that found it).
Just confirming that this does work for the use case that our collaborators found — good stuff. We're seeing a ~5x improvement in memory usage, which is more than sufficient for our needs. Thanks again!
Preserve lazy data in PP save-pairs generation, so that a list of save cubes and/or fields can be worked with without excessive memory requirements (as intended for saving).
The existing PP save code does not preserve lazy data when making a PPField from a 2d cube slice.
Previously there was no point in doing so, as the PP save rules fetched the data anyway, basically just to get the data fill-value (if there is one).
This change fixes that, so that both the save rules and the save-field generation preserve the lazy aspect.
Changes here:

- attach a `cube.fill_value` to generated saved cubes (only)
- this anticipates a generic `cube.fill_value` property — as currently under discussion for the dask migration work
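As a rough illustration of the save-pairs idea described above (a hypothetical generator over a plain array, not the actual Iris save-pairs API): each 2d field is yielded one at a time, so with lazy input arrays nothing would be realised until a field is actually written.

```python
import numpy as np

def iter_save_pairs(data_3d):
    # Hypothetical sketch of save-pairs generation: yield one
    # (index, 2d-slice) pair per field. Slicing a NumPy array gives
    # a view (no copy); slicing a lazy array would defer any compute
    # until the field's data is actually needed for writing.
    for i in range(data_3d.shape[0]):
        yield i, data_3d[i]
```

Because this is a generator, only one field's worth of data ever needs to be in flight at a time, which is the memory behaviour the PR aims to restore.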