fill_value and dtype #2433

bjlittle · 2017-03-13T14:16:45Z

This PR addresses the issue of dealing with the dtype and fill_value of the cube.

We need to deal with the fact that dask does not support masked arrays, so we must take care to correctly handle the intended dtype and fill_value of the cube's data payload.

The dtype and fill_value have now been promoted to cube metadata attributes, because we care about them and they help define what a cube is. For a lazy, masked dask data payload, we need to preserve the case where the intended dtype is integral.

We also have a shift in how fill_value is handled, in that previously we handed that off to biggus and numpy.ma in the hope that it would do the right thing. Now we deal with this directly within the cube through cube.fill_value, which is used as the intended fill_value of any masked data payload.
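To illustrate why the intended dtype and fill_value have to be recorded separately, here is a minimal sketch using only numpy. It assumes that masked points in a lazy payload are stood in for by NaN in a float array, which is an assumption made for illustration and is not stated in this PR; the `realise` helper and the `-999` fill value are hypothetical.

```python
import numpy as np

# A masked integer payload, as a cube might hold once its data is real.
masked_ints = np.ma.masked_array(
    [1, 2, 3, 4], mask=[False, True, False, False], dtype=np.int32)

# What we want to remember about the payload.
intended_dtype = masked_ints.dtype
intended_fill_value = -999  # hypothetical value, for illustration only

# ASSUMPTION: the lazy array cannot carry a mask, so masked points are
# represented by NaN. NaN only exists for floats, so the integer dtype is lost.
lazy_standin = masked_ints.astype(np.float64).filled(np.nan)


def realise(array, dtype, fill_value):
    """Hypothetical helper: rebuild a masked array from a NaN-marked array."""
    mask = np.isnan(array)
    # Replace NaN with a placeholder before casting back to the integer dtype.
    data = np.where(mask, 0, array).astype(dtype)
    return np.ma.masked_array(data, mask=mask, fill_value=fill_value)


restored = realise(lazy_standin, intended_dtype, intended_fill_value)
```

Without the recorded dtype and fill_value, the float stand-in alone could not tell us that the payload was meant to be `int32` with a particular fill value.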

pp-mo · 2017-03-13T15:54:43Z

lib/iris/cube.py

    @dtype.setter
    def dtype(self, dtype):
-        self._dtype = dtype
+        if dtype != self.dtype:


Should we not always allow cube.dtype = None, to reset the dtype to that of the underlying data?

I can see the case that you might want to unset the dtype for lazy data to None, which you can't at the moment given the current implementation ... I've no argument against that.
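A rough sketch of the behaviour being asked for, where assigning None always clears the override so that the dtype falls back to the underlying data. `ToyCube` is a hypothetical stand-in and not the iris implementation; the real setter in this PR carries extra checks that are not reproduced here.

```python
import numpy as np


class ToyCube:
    """Hypothetical stand-in for a cube, for illustration only."""

    def __init__(self, data, dtype=None):
        self._data = np.asarray(data)
        # Initialise the private attribute first, then go through the setter.
        self._dtype = None
        self.dtype = dtype

    @property
    def dtype(self):
        # Fall back to the dtype of the underlying data when no override is set.
        return self._data.dtype if self._dtype is None else self._dtype

    @dtype.setter
    def dtype(self, dtype):
        # None is always accepted and simply clears the override.
        self._dtype = None if dtype is None else np.dtype(dtype)


cube = ToyCube(np.array([1.0, np.nan, 3.0]), dtype=np.int16)
```

Here `cube.dtype` reports `int16` until `cube.dtype = None` is assigned, after which it reports the `float64` dtype of the stored array.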

pp-mo · 2017-03-14T08:53:58Z

Hi @bjlittle highly impressed you have this passing now !!
Will review later...

In meantime, here's an idea to ponder on...
I analysed usecases as I was suggesting, especially for the "minimal" requirements.
The only important outcome was this : I suspect we don't really need dtype assignment, we can allow dtype (that is, the 'alternative real dtype') to be specified only when setting lazy data.

I think we will need to retain real-data reassignment "cube.data =", but we can probably do without lazy-data reassignment (I need to check out existing code). Even if not, we can define a particular function for this.
This would mean that 'real dtype' can only be supplied in association with new lazy data.
I think removing the dtype assignment makes the whole API cleaner and simpler (including simpler to explain).

Think on, will discuss later.

bjlittle · 2017-03-14T12:32:04Z

Ran 4006 tests in 104.584s

FAILED (SKIP=310, failures=38)

Test case failures by category - tick the box to own the failures, investigate and fix, push changes as a PR on this branch and strike out the category when done ...

~~iris.tests.unit.analysis.regrid.test_RectilinearRegridder.Test___call____circular~~ (x1) @lbdreyer bjlittle@2b77106 ☑️
iris.tests.test_basic_maths (x23) @bjlittle Fix basic maths tests. bjlittle/iris#20 ☑️
~~iris.tests.test_analysis~~ (x10) @lbdreyer Keep fill values during stats operations bjlittle/iris#19 ☑️
~~iris.tests.test_concatenate.Test2D~~ (x3) @bjlittle Fix concatenate tests. bjlittle/iris#18 ☑️
~~iris.tests.test_interpolation.TestNearestLinearInterpolRealData~~ (x1) @lbdreyer Keep cube's fill_value when resetting the cube's data in interpolation test bjlittle/iris#17 ☑️

lbdreyer · 2017-03-14T18:32:01Z

Sorry @bjlittle! I accidentally pushed the change bjlittle@2b77106 directly to this branch (still getting the hang of PyCharm).

You can always revert it if you disagree with it.

bjlittle · 2017-03-15T10:52:03Z

lib/iris/cube.py

+        # as the fill_value is checked against self.dtype.
+        self._dtype = None
+        if dtype is not None:
+            self.dtype = dtype


I don't think we need to have this check here, we just need to initialise _dtype = None and then set self.dtype = dtype

lbdreyer · 2017-03-15T11:07:24Z

lib/iris/cube.py

+                dtype = np.dtype(dtype)
+                if dtype.kind != 'i':
+                    emsg = ('Can only cast lazy data to integral dtype, '
+                            'got {!r}.')


We need to be strict here. We're only allowing the dtype to be of integral type to support the masked case. If we didn't do this, then users would abuse this to perform casting of their data.

If we do want to support casting of data, for the sake of it, then we should support that properly, say with cube.astype(...) or whatever. Hence, that's why we nail this dtype change down to only being applied to the lazy, integral (masked) data case. HTH.
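The strict check described above could be sketched as a standalone helper. The name `check_lazy_dtype` and the choice of `ValueError` are assumptions for illustration; the diff fragment only shows the `kind` test and the error message.

```python
import numpy as np


def check_lazy_dtype(dtype):
    """Hypothetical helper: accept only signed-integer dtypes, as in the diff."""
    dtype = np.dtype(dtype)
    if dtype.kind != 'i':
        emsg = 'Can only cast lazy data to integral dtype, got {!r}.'
        # ASSUMPTION: the exception type is not visible in the diff fragment.
        raise ValueError(emsg.format(dtype))
    return dtype


accepted = check_lazy_dtype('int32')
```

Note that a test on `kind != 'i'` also rejects unsigned integer dtypes such as `uint8`.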

Just a note: if you're visiting this bit, I've found that this test should also allow 'u' type data, i.e. unsigned ints, e.g. if dtype.kind not in ('i', 'u'): ?
If not it can wait -- I think it could do with a more general code search anyway, as I think we may have missed this possibility elsewhere in the code too.

Hmmm good point @pp-mo ... See this concatenate change and here also ... is there a generic numpy kind to catch all integral kinds instead of i and u ?

This numpy scalars link was handy ... also see Array-protocol type strings section of numpy docs ...

And this pattern seems to work:

    >>> integral = [np.arange(10, dtype=np.int8),
    ...             np.arange(10, dtype=np.int16),
    ...             np.arange(10, dtype=np.int32),
    ...             np.arange(10, dtype=np.int64),
    ...             np.arange(10, dtype=np.uint8),
    ...             np.arange(10, dtype=np.uint16),
    ...             np.arange(10, dtype=np.uint32),
    ...             np.arange(10, dtype=np.uint64)]
    >>> for i in integral:
    ...     print(isinstance(i[0], np.integer))
    ...
    True
    True
    True
    True
    True
    True
    True
    True
    >>> f = np.arange(10, dtype=np.float)
    >>> isinstance(f[0], np.integer)
    False

Don't know if it's preferable over value.dtype.kind in ('i', 'u') though ...
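For comparison, the two candidate tests can also be applied directly to dtypes rather than to array elements. This is a small sketch; the helper names are made up for illustration, and `np.issubdtype` is used here as the dtype-level counterpart of the `isinstance(..., np.integer)` check.

```python
import numpy as np

dtypes = [np.dtype(name) for name in
          ('int8', 'int64', 'uint8', 'uint64', 'float32', 'float64')]


def is_integral_by_kind(dtype):
    # Signed ('i') and unsigned ('u') kinds have to be listed explicitly.
    return np.dtype(dtype).kind in ('i', 'u')


def is_integral_by_type(dtype):
    # np.integer is the abstract parent of all signed and unsigned integers.
    return bool(np.issubdtype(np.dtype(dtype), np.integer))


results = {dt.name: (is_integral_by_kind(dt), is_integral_by_type(dt))
           for dt in dtypes}
```

Both tests agree on the dtypes listed here; the `np.integer` form avoids having to remember that there are two integer kinds.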

We're only allowing the dtype to be of integral type to support the masked case

It might be worth putting a comment in the code about this

I only mentioned the kind=u/i thing for completeness..
Let's not get hung up on this, please.
It will be much simpler to fix it later as I'm sure it needs doing in unrelated bits of the code.

I do prefer np.integer.
It will also need to be fixed in iris/_concatenate.py e.g. https://github.com/SciTools/iris/pull/2433/files#diff-5fa4649482766af96ae11aead017e334R340

@pp-mo can you make an issue for this please

bjlittle · 2017-03-15T12:07:27Z

lib/iris/_concatenate.py

+        if kwargs['dtype'] is None and self.data_type.kind == 'i':
+            kwargs['dtype'] = self.data_type
+            defn = iris.cube.CubeMetadata(**kwargs)
+        return defn


I should have some unit tests for this really ...

Actually, I'm not going to do that here as this might all change given conversations with @pp-mo

Created the following issue #2439 for this

lbdreyer · 2017-03-15T12:28:29Z

lib/iris/cube.py

+    def fill_value(self, fill_value):
+        if fill_value is not None:
+            # Convert the given value to the dtype of the cube.
+            fill_value = np.asarray([fill_value])[0]


How about

fill_value, = np.asarray([fill_value])

I'm tending to steer away from the fill_value, = np.asarray([fill_value]) pattern to unpack a scalar value, as it's quite subtle i.e. it's pretty easy for the eye to miss.

However, we could opt for [fill_value] = np.asarray([fill_value]) instead ... ?

I'll make this change ...
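The three spellings discussed in this thread side by side, as a small sketch with a made-up fill value:

```python
import numpy as np

fill_value = -999  # hypothetical value, for illustration only

# Indexing, as in the original diff.
by_index = np.asarray([fill_value])[0]

# Tuple unpacking: the trailing comma is easy to miss when reading.
by_tuple, = np.asarray([fill_value])

# List-style unpacking: same effect, but the brackets make it more visible.
[by_list] = np.asarray([fill_value])
```

All three produce the same numpy scalar. The unpacking forms additionally fail with a `ValueError` if the array does not hold exactly one element, whereas indexing with `[0]` would silently take the first.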

lbdreyer · 2017-03-15T13:30:08Z

lib/iris/tests/test_concatenate.py

        result = concatenate(cubes)
        self.assertEqual(len(result), 2)

+    def test_masked_fill_value(self):


I do wonder whether "test_masked_diff_fill_value" would be better

bjlittle · 2017-03-15T13:43:07Z

@pp-mo @lbdreyer Let's get this merged asap ....

pp-mo

Can we just make this obscure stuff a bit clearer ?

I'm otherwise happy with the code.
I must just whizz through the test changes, too, but I'm not expecting to find anything much to complain of..

pp-mo · 2017-03-15T13:37:22Z

lib/iris/_concatenate.py

        return result

+    def promote_defn(self):
+        defn = self.defn


I think this is worthy of explanation !
Suggest:

call it "lazy_version_defn"

docstring like ....

Produce the defn (i.e. cube.metadata) which a lazy version of the cube would have. A cube with lazy data representing masked ints must have its `metadata.dtype` set appropriately.

Even if we think this may all change again later ...

pp-mo · 2017-03-15T13:42:30Z

lib/iris/_concatenate.py

-            # in :meth:`iris.cube.CubeList.concatenate_cube()`.
-            msg = 'Cube metadata differs for phenomenon: {}'
-            msgs.append(msg.format(self.defn.name()))
+            promoted = self.promote_defn()


Explain here too. Something like "Check whether lazy versions of the defns would match."
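The promotion step discussed in these two comments could be sketched as follows. `ToyMetadata` is a hypothetical namedtuple standing in for `iris.cube.CubeMetadata`, with only a few illustrative fields, and `lazy_version_defn` uses the name suggested in the review rather than the one in the diff.

```python
from collections import namedtuple

import numpy as np

# Hypothetical, cut-down stand-in for the cube metadata tuple.
ToyMetadata = namedtuple('ToyMetadata', ['standard_name', 'units', 'dtype'])


def lazy_version_defn(defn, data_dtype):
    """Return the defn that a lazy version of the same cube would have."""
    data_dtype = np.dtype(data_dtype)
    # Only signed-integer payloads need the dtype recorded in the metadata,
    # mirroring the kind == 'i' test in the diff.
    if defn.dtype is None and data_dtype.kind == 'i':
        defn = defn._replace(dtype=data_dtype)
    return defn


# A cube with real int32 data carries no dtype in its metadata ...
real_defn = ToyMetadata('air_temperature', 'K', None)
# ... whereas a lazy cube representing masked int32 data does.
lazy_defn = ToyMetadata('air_temperature', 'K', np.dtype('int32'))

# Check whether lazy versions of the defns would match.
promoted = lazy_version_defn(real_defn, np.int32)
```

The raw defns differ only in `dtype`, so comparing `promoted` with `lazy_defn` lets the two cubes be treated as compatible.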

lbdreyer · 2017-03-15T14:20:13Z

The tests are failing due to pep8 @bjlittle

lbdreyer · 2017-03-15T14:22:56Z

Once these tests are passing, I'll merge this in.

lbdreyer · 2017-03-15T14:27:28Z

Are you happy for me to squash and merge (once the tests have passed)?

bjlittle · 2017-03-15T14:39:01Z

Ping @lbdreyer 👍

bjlittle · 2017-03-15T14:56:02Z

Ping @pp-mo @dkillick

lbdreyer · 2017-03-15T14:58:37Z

Sorry about the wait. @pp-mo wanted to do some final checks

* Testing ideas for cube data/dtype/fill_value interaction (not passing -- need new API).
* Add checks for realisation; fix dtype tests; use 'lazy' keyword instead of 'real'.
* With test failures to investigate.
* Fixed tests for new code from SciTools#2433.
* Fix tests
* Pep8 fix.
* Fixed data_dtype_fillvalue test fails (other things now failing).
* Fix bug in clearing _dask_array when assigning real data.
* Keep fill value of cube when resetting the cube's data
* Fix concatenate tests.
* Keep fill values during stats operations; cast cml fill values to float64.
* Keep fill values during regrid test.
* Fix basic maths tests.
* Tidy cube.__init__ for dtype setting.
* Review actions.
* Rename concatenate unit tests.
* Review actions.
* pep8

pp-mo added 2 commits March 13, 2017 11:17

Testing ideas for cube data/dtype/fill_value interaction (not passing -- need new API).

42074db

Add checks for realisation; fix dtype tests; use 'lazy' keyword instead of 'real'.

bb972e5

bjlittle added the Status: Work in Progress and dask labels Mar 13, 2017

bjlittle added this to the dask milestone Mar 13, 2017

With test failures to investigate.

338f8d9

bjlittle force-pushed the fill-value-and-dtype branch from a1e5743 to 338f8d9 Compare March 13, 2017 14:27

pp-mo reviewed Mar 13, 2017

pp-mo and others added 2 commits March 13, 2017 16:29

Fixed tests for new code from SciTools#2433.

0b80591

Fix tests

db1f7e1

bjlittle and others added 3 commits March 14, 2017 09:12

Merge pull request #15 from pp-mo/data_dtype_fillvalue_tests

60344ca

Data dtype fillvalue tests

Pep8 fix.

8686da5

Fixed data_dtype_fillvalue test fails (other things now failing).

de1f83e

pp-mo mentioned this pull request Mar 14, 2017

Pp fix datadtypefillvalue bjlittle/iris#16

Merged

pp-mo and others added 2 commits March 14, 2017 12:15

Fix bug in clearing _dask_array when assigning real data.

6bf80a7

Merge pull request #16 from pp-mo/pp_fix__datadtypefillvalue

e0b5609

Pp fix datadtypefillvalue

lbdreyer and others added 4 commits March 14, 2017 14:57

Keep fill value of cube when resetting the cube's data

9b6636d

Merge pull request #17 from lbdreyer/fix_fill_value

385cdb9

Keep cube's fill_value when resetting the cube's data in interpolation test

Fix concatenate tests.

cf2af09

Keep fill values during stats operations; cast cml fill values to float64.

445b4e6

lbdreyer mentioned this pull request Mar 14, 2017

Keep fill values during stats operations bjlittle/iris#19

Merged

Keep fill values during regrid test.

2b77106

bjlittle and others added 3 commits March 15, 2017 09:04

Merge pull request #19 from lbdreyer/fix_fill_val_analysis

479dad9

Keep fill values during stats operations

Fix basic maths tests.

cc14ab7

Merge pull request #18 from bjlittle/fix-test-concatenate

b27917c

Fix concatenate tests.

bjlittle mentioned this pull request Mar 15, 2017

Fix basic maths tests. bjlittle/iris#20

Merged

Merge pull request #20 from bjlittle/fix-basic-maths

0a80c37

Fix basic maths tests.

bjlittle commented Mar 15, 2017

lbdreyer reviewed Mar 15, 2017

Tidy cube.__init__ for dtype setting.

f4fd7a0

bjlittle changed the title ~~With test failures to investigate.~~ fill_value and dtype Mar 15, 2017

bjlittle commented Mar 15, 2017

lbdreyer reviewed Mar 15, 2017

pp-mo mentioned this pull request Mar 15, 2017

Fix dtype.kind usage #2438

Closed

Review actions.

2306a86

lbdreyer reviewed Mar 15, 2017

Rename concatenate unit tests.

d3b4212

pp-mo reviewed Mar 15, 2017

Review actions.

c74dc10

pep8

149a3be

bjlittle assigned lbdreyer Mar 15, 2017

lbdreyer merged commit 62cdd00 into SciTools:dask Mar 15, 2017

QuLogic removed the Status: Work in Progress label Mar 15, 2017

bjlittle deleted the fill-value-and-dtype branch March 15, 2017 15:04

cpelley mentioned this pull request Apr 20, 2017

Lazy data in PP save pairs #2452

Merged

QuLogic modified the milestones: dask, v2.0 Aug 2, 2017
