Lazy data in PP save pairs #2452
```diff
@@ -1,4 +1,4 @@
-# (C) British Crown Copyright 2010 - 2016, Met Office
+# (C) British Crown Copyright 2010 - 2017, Met Office
 #
 # This file is part of Iris.
 #
```
```diff
@@ -2322,6 +2322,18 @@ def as_pairs(cube, field_coords=None, target=None):
                           target=target)
 
 
+def _data_fill_value(cube):
+    # Function to deduce a fill_value for a cube's data.
+    # This should eventually be superseded by a cube 'fill_value' property.
+    if cube.has_lazy_data():
+        fill_value = cube.lazy_data().fill_value
+    elif isinstance(cube.data, ma.MaskedArray):
+        fill_value = cube.data.fill_value
+    else:
+        fill_value = None
+    return fill_value
+
+
 def save_pairs_from_cube(cube, field_coords=None, target=None):
     """
     Use the PP saving rules (and any user rules) to convert a cube or
```
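The masked-array branch of `_data_fill_value` can be exercised standalone. A minimal sketch using plain numpy (no Iris cube is involved, so the `has_lazy_data`/`lazy_data` branch is omitted here):

```python
import numpy as np
import numpy.ma as ma

def data_fill_value(data):
    # Deduce a fill value for an array without copying or computing its
    # contents: only masked arrays carry an explicit fill_value.
    if isinstance(data, ma.MaskedArray):
        return data.fill_value
    return None

masked = ma.masked_array([1.0, 2.0], mask=[False, True], fill_value=-1e30)
plain = np.array([1.0, 2.0])
print(data_fill_value(masked))  # -1e+30
print(data_fill_value(plain))   # None
```

The key property is that inspecting `fill_value` never touches the array's contents, which is exactly what allows the save rules to deduce MDI without realising lazy data.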
```diff
@@ -2404,6 +2416,11 @@ def save_pairs_from_cube(cube, field_coords=None, target=None):
 
     # Save each named or latlon slice2D in the cube
     for slice2D in cube.slices(field_coords):
+        # Attach an extra cube "fill_value" property, allowing the save rules
+        # to deduce MDI more easily without realising the data.
+        # NOTE: it is done this way because this property may exist in future.
+        slice2D.fill_value = _data_fill_value(slice2D)
+
         # Start with a blank PPField
         pp_field = PPField3()
```
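The per-slice attachment relies on each 2D slice carrying enough information to recover a fill value. A small sketch with plain numpy masked arrays standing in for cube slices (`slices_2d` is a hypothetical analogue of `cube.slices`, not the Iris API), showing that slicing preserves `fill_value`:

```python
import numpy as np
import numpy.ma as ma

def slices_2d(data):
    # Hypothetical analogue of cube.slices(field_coords): yield 2D
    # (y, x) slices of a 3D (t, y, x) array along its leading dimension.
    for i in range(data.shape[0]):
        yield data[i]

data = ma.masked_array(np.zeros((3, 4, 5)), fill_value=-1e30)
fill_values = [s.fill_value for s in slices_2d(data)]
# Each 2D slice inherits the parent array's fill_value unchanged.
```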
```diff
@@ -2426,8 +2443,12 @@ def save_pairs_from_cube(cube, field_coords=None, target=None):
         # From UM doc F3: "Set to -99 if LBEGIN not known"
         pp_field.lbuser[1] = -99
 
-        # Set the data
-        pp_field.data = slice2D.data
+        # Set the data, keeping it lazy where possible.
+        if slice2D.has_lazy_data():
+            slice_core_data = slice2D.lazy_data()
+        else:
+            slice_core_data = slice2D.data
+        pp_field._data = slice_core_data
 
         # Run the PP save rules on the slice2D, to fill the PPField,
         # recording the rules that were used
```

Reviewer (Member): Is this really necessary? Can't you avoid the if-else and always set it to the lazy array? I doubt there would be much overhead in re-instantiating a lazy array when the data has already been realised.

Author: It's best like this because creating a simple wrapper around real data is substantially more costly in dask than it was with biggus. The earliest dask branch retained the existing cube property code, which fetched cube.shape and cube.dtype from cube.lazy_data(), and the tests then took nearly twice as long!
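The point of the if/else can be illustrated with a toy model (no dask, biggus, or Iris involved; `LazyArray` and `Slice2D` are invented stand-ins mirroring the `has_lazy_data`/`lazy_data`/`data` API in the diff): fetching the lazy container directly means saving never triggers a compute, and real data is never wrapped.

```python
import numpy as np

class LazyArray:
    # Toy stand-in for a lazy (biggus/dask) array: computation is
    # deferred until compute() is called, and the calls are counted.
    def __init__(self, func):
        self._func = func
        self.compute_count = 0

    def compute(self):
        self.compute_count += 1
        return self._func()

class Slice2D:
    # Minimal stand-in for a cube slice.
    def __init__(self, data):
        self._data = data

    def has_lazy_data(self):
        return isinstance(self._data, LazyArray)

    def lazy_data(self):
        return self._data

    @property
    def data(self):
        # Realises lazy data on access, like cube.data.
        if self.has_lazy_data():
            return self._data.compute()
        return self._data

def core_data(slice2d):
    # The pattern from the diff: take the lazy array when there is one,
    # and only touch .data when the data is already real.
    if slice2d.has_lazy_data():
        return slice2d.lazy_data()
    return slice2d.data

lazy = Slice2D(LazyArray(lambda: np.zeros((2, 3))))
real = Slice2D(np.ones((2, 3)))
core = core_data(lazy)
# At this point the lazy slice's data has still not been computed.
```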
Reviewer (Member): Adding a fill_value attribute at run time seems rather messy to me, especially as it presumes that the dask branch ends up merged onto master as it is. Perhaps it would be better to point this PR at the dask branch instead? That said, I'm not going to push this: if you're happy then I'm happy.

Reviewer (Member): Perhaps this could be avoided by turning the code in `_data_fill_value()` into a set of save rules that determine `pp.bmdi`.

Author: This is a purely temporary measure, based on the idea that the forthcoming dask change will definitely introduce the generic `cube.fill_value` property. We will remove this hack when the dask branch is merged back. Obviously, the method actually used here is 🤢, but it is only here so you can check that it fixes the current memory-burn problem.
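The save-rule alternative suggested above could look roughly like this sketch. Everything here is hypothetical: `bmdi_save_rule`, the dict-based field, and `DEFAULT_BMDI` are invented for illustration, and real Iris save rules have a different shape.

```python
import numpy.ma as ma

# Assumed default missing-data indicator; the real PP default may differ.
DEFAULT_BMDI = -1e30

def bmdi_save_rule(field, data):
    # Hypothetical save rule: derive the field's BMDI from the data's
    # fill value when it has one, else fall back to the default.
    if isinstance(data, ma.MaskedArray):
        field['bmdi'] = float(data.fill_value)
    else:
        field['bmdi'] = DEFAULT_BMDI
    return field

field = bmdi_save_rule({}, ma.masked_array([1.0], fill_value=-999.0))
print(field)  # {'bmdi': -999.0}
```

A rule like this would remove the need to attach a run-time `fill_value` attribute to each slice, at the cost of moving the lazy/masked inspection into the rules layer.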