Lazy masked fill_value retrieval for NC save #2723

DPeterK · 2017-08-03T13:41:23Z

Enables lazy masked fill_value retrieval for NetCDF save.

Dask exposes no mechanism for retrieving the fill value of a lazy masked array (indeed, arguably this is currently not a soluble problem). This PR proposes a workaround for this problem, exposes the workaround as a function in iris._lazy_data and integrates the function into the NetCDF saver.

Specifically, the workaround realises the smallest possible slice of the lazy masked array that still carries the required fill value. It turns out that the smallest such slice is an empty one:

>>> import numpy as np
>>> import dask.array as da
>>> m = np.ma.masked_array([0, 1, 2], mask=[1, 0, 1], fill_value=9999)
>>> dm = da.from_array(m, chunks=(3,), asarray=False)
>>> dm[:0].compute().fill_value
9999
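
The same trick can be wrapped in a small helper. The following is only a sketch, not the function added by this PR: the name `fill_value_via_empty_slice` is hypothetical, it assumes the array has at least one dimension, and it assumes that any lazy array exposes a `compute()` method (as dask arrays do). It is exercised here on plain NumPy arrays so that it runs without dask.

```python
import numpy as np


def fill_value_via_empty_slice(data):
    """Return the fill value of a masked array by realising an empty slice.

    Hypothetical helper for illustration only; not the iris implementation.
    Assumes data.ndim >= 1.
    """
    # Take index 0 along every dimension except the last, then an empty
    # slice of what remains, so the piece that is realised has shape (0,).
    inds = (0,) * (data.ndim - 1)
    smallest_slice = data[inds][:0]
    # Assumption: lazy (dask-like) arrays expose compute(); NumPy arrays do not.
    if hasattr(smallest_slice, "compute"):
        smallest_slice = smallest_slice.compute()
    # Unmasked arrays carry no fill value, so fall back to None.
    return getattr(smallest_slice, "fill_value", None)


mask = np.zeros((2, 3, 4), dtype=bool)
mask[0, 0, 0] = True
masked = np.ma.masked_array(
    np.arange(24).reshape(2, 3, 4), mask=mask, fill_value=9999
)
print(fill_value_via_empty_slice(masked))
print(fill_value_via_empty_slice(np.arange(3)))
```

Returning `None` for unmasked input is a design choice made for this sketch; the thread does not say how the real function treats that case.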

Fixes #2703.

DPeterK · 2017-08-03T13:42:22Z

Note: no tests yet! I thought I'd push the changes so that people could get eyes on the proposed functionality.

DPeterK · 2017-08-03T14:41:07Z

Now with unit tests! Still got a bunch of existing test failures to deal with, though.

bjlittle · 2017-08-03T14:55:06Z

lib/iris/_lazy_data.py

+    if is_lazy_data(data):
+        inds = tuple([0] * (data.ndim-1))
+        smallest_slice = data[inds][:0]
+        data = as_concrete_data(smallest_slice)


@dkillick You might want coverage for 0-dim arrays and MaskedConstants ... I've seen those being passed around on my travels

Should work; see these tests... although I'm not convinced that they cover the MaskedConstant case. Thoughts?

How would one produce a MaskedConstant to test on? They seem to be things you can only make when you don't want to (i.e. I now want to and I can't manage to make one!) 😒

The following works:

>>> import numpy.ma as ma
>>> a = ma.masked_array([4], mask=True)
>>> masked_constant = a[0]
>>> print(type(masked_constant))
<class 'numpy.ma.core.MaskedConstant'>
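
For test purposes, a masked constant can also be obtained without building an array first, since NumPy exposes the singleton as `np.ma.masked`. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np

# Indexing a masked element of a masked array yields the MaskedConstant
# singleton rather than an ordinary scalar.
a = np.ma.masked_array([4], mask=True)
from_indexing = a[0]

# The same object is available directly as np.ma.masked.
direct = np.ma.masked

print(type(from_indexing).__name__)
print(from_indexing is direct)
```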

lbdreyer · 2017-08-03T15:05:48Z

lib/iris/tests/unit/lazy_data/test_lazy_masked_fill_value.py

+from iris._lazy_data import lazy_masked_fill_value, _MAX_CHUNK_SIZE
+
+
+class Test_as_lazy_data(tests.IrisTest):


This needs renaming

Good spot! Thought I'd got away with copying an existing file...

bjlittle · 2017-08-03T15:07:22Z

lib/iris/_lazy_data.py

    return data


+def lazy_masked_fill_value(data):


Would you buy into a rebrand, say "lazy_fill_value" instead?

You're going to have to try harder than that to convince me 😉

I guess that I only care about fill values if I have masked data, so 'masked' could be omitted in that sense, but clarity is a good thing – it's in the masked data case that I want to retrieve the fill value.

So I'm not currently buying into a rebrand! What do you see as the benefit of doing so?

lbdreyer · 2017-08-03T15:07:27Z

lib/iris/tests/unit/lazy_data/test_lazy_masked_fill_value.py

+        self.fill_value = 9999
+        self.m = ma.masked_array(data, mask=mask, fill_value=self.fill_value)
+        self.dm = da.from_array(self.m, asarray=False,
+                                chunks=_MAX_CHUNK_SIZE)


Could you not just call iris._lazy_data.as_lazy_data?

I could, but I don't see how that benefits me / improves the existing approach... So if you have a good benefit / improvement for changing this I'd love to hear it!

Well, that's just the argument we initially had about whether to include as_lazy_data.
We chose to put all our dealings with dask in a single module (iris._lazy_data) rather than having them dotted throughout the iris codebase.

That's fine... apart from the fact that it doesn't hold in the tests, which make widespread use of da.from_array (including the test module I duplicated to make this one).

Either way, this intermediate step has now been banked, so do feel free to change this in a follow-up PR 😄

lbdreyer · 2017-08-03T15:11:44Z

lib/iris/_lazy_data.py

+    if is_lazy_data(data):
+        inds = tuple([0] * (data.ndim-1))
+        smallest_slice = data[inds][:0]
+        data = as_concrete_data(smallest_slice)


Why calculate inds? Why not just use [:0]?

i.e.

if is_lazy_data(data):
    smallest_slice = data[:0]
    data = as_concrete_data(smallest_slice)

An interesting idea! I was going to say "Because then it won't be the smallest slice!", which turns out to not quite be accurate (look at the shape of the resultant array 😱):

>>> z = np.arange(24).reshape(2,3,4)
>>> z[:0]
array([], shape=(0, 3, 4), dtype=int64)

Apparently (according to an offline conversation) the suggested change also causes the 0D case to not work properly. So, while I'm 👍 for reducing SLOC and code complexity, I think the extra line in this case adds some worthwhile resilience.
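
The shape difference between the two approaches can be checked directly on a NumPy array. This is a small sketch using the same 3D array as in the comment above; the variable names `shortcut` and `smallest` are illustrative only:

```python
import numpy as np

z = np.arange(24).reshape(2, 3, 4)

# Suggested shortcut: an empty slice along the first dimension only.
# The other dimensions keep their full extent in the result's shape.
shortcut = z[:0]

# Approach in the PR: index 0 along all but the last dimension first,
# then take the empty slice, which leaves a single zero-length dimension.
inds = tuple([0] * (z.ndim - 1))
smallest = z[inds][:0]

print(shortcut.shape)  # (0, 3, 4)
print(smallest.shape)  # (0,)
```

Both results hold zero elements; only the reported shape differs.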

bjlittle · 2017-08-04T10:19:38Z

@dkillick Awesome thanks 👍

I'm going to bank this PR and follow up with any/all rework in another PR, given your unavailability.

Lazy masked array fill_value retrieval

3968ab7

DPeterK added the dask-mask label Aug 3, 2017

DPeterK requested review from bjlittle and lbdreyer August 3, 2017 13:41

Add unit tests for new iris._lazy_data function

68e2b66

bjlittle reviewed Aug 3, 2017

lbdreyer reviewed Aug 3, 2017

bjlittle reviewed Aug 3, 2017

lbdreyer reviewed Aug 3, 2017

QuLogic added this to the dask-mask milestone Aug 3, 2017

bjlittle merged commit 9ed20b7 into SciTools:dask_mask_array Aug 4, 2017

bjlittle self-assigned this Aug 4, 2017

DPeterK deleted the nc_lazy_fillvalue branch August 4, 2017 11:59

DPeterK mentioned this pull request Aug 4, 2017

Fix fill_value when saving to netCDF #2703

Closed

bjlittle mentioned this pull request Aug 4, 2017

Rework lazy_masked_fill_value. #2729

Merged
