Skip to content

Conversation

@corinnebosley
Copy link
Member

iris.util.new_axis cannot handle masked constants, so this is a check to turn masked constants into scalar masked arrays, at which point they can be handled correctly.

This fix is checked by a test in PR#2130: #2130

@corinnebosley
Copy link
Member Author

I'm not happy about these tests failing. They pass for me, as long as I remove the function 'tests.get_data_path(...)' and use the absolute path instead.

However, now that I know that there are tests for util.new_axis that live here, I will add my tests to these files in this PR and close PR#2130.

@corinnebosley
Copy link
Member Author

@marqh

lib/iris/util.py Outdated
if scalar_coord is not None:
scalar_coord = src_cube.coord(scalar_coord)

if isinstance(src_cube.data, np.ma.core.MaskedConstant):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using src_cube.data makes this function load deferred data, which is why you saw the deferred data test fail. Can you formulate this check without loading the data?

self.assertTrue(new_cube.has_lazy_data())
self.assertEqual(new_cube.shape, (1,) + cube.shape)

def test_masked_unit_array(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the same test twice in two places, why is it repeated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I don't know where it is supposed to live. Thanks for your comments @ajdawson, I'll fix these tomorrow.

@corinnebosley
Copy link
Member Author

@ajdawson I don't think I can do this check without touching the data at all ever, but I can touch the data only if the cube is scalar, which is what I have tried to do here.

@ajdawson
Copy link
Member

Can I just rewind this a little. I'd like to understand the actual problem here, and the use case this change is catering for. Would you mind providing a bit more description of this?

@corinnebosley
Copy link
Member Author

@marqh @ajdawson I'm a bit stuck now. Can't find the problem (tests pass on my own machine) and I don't know how to fix it.

@ajdawson
Copy link
Member

I think your current approach is incorrect, producing a masked array before the reshape won't fix this. I have an alternative implementation that passes existing tests + your new test, but I'd really like to get some more info about the actual problem first (see previous comment).

@ajdawson
Copy link
Member

I'll get the ball rolling with more detail about my concerns:

What you are trying to do is convert a MaskedConstant to a MaskedArray to avoid having the mask a different shape from the data. However, I'm concerned that this is the result of confusion between MaskedArray and MaskedConstant and not the real issue.

A MaskedConstant can have arbitrary dimension, but is always masked, and therefore does not need more than a scalar mask. When you add singleton dimensions to a MaskedConstant via sub-scripting with None (or np.newaxis) this changes the number of dimensions of the array, but not the mask, which makes sense since it is still a MaskedConstant and not a MaskedArray.

I'd like to know more about the problems caused down the line by having a non-scalar MaskedConstant inside a Cube (i.e. what motivated this PR?), so I can understand why (or why not) changing the behaviour of new_axis is the right thing to do.

@corinnebosley
Copy link
Member Author

corinnebosley commented Sep 15, 2016

@ajdawson I received a support request from someone highlighting that new_axis doesn't add a new axis to the mask along with the single-point array. In the examples he sent me, he was trying to add dimensions to the array and then transpose it. He pointed out that if he wanted to take a single point out of an existing cube, then add dimensions and transpose it, it didn't work because it was a Masked Constant.

I am in hospital at the moment, so I'm afraid I can't access the example file that he sent me or even get any more information from him about his purposes, but I can do that on Tuesday if you like.

@ajdawson
Copy link
Member

OK, thanks for the extra information, I can now reproduce the case you are talking about. I'd like to think about it some more, but it seems like your approach is the right option in this case. I think all we need to do is tidy up a few details. We can revisit next week or whenever you are ready.

@corinnebosley
Copy link
Member Author

@ajdawson Hi AJ, I am back at work now and eager to move forward with this issue. What are your thoughts?

self.assertTrue(new_cube.has_lazy_data())
self.assertEqual(new_cube.shape, (1,) + cube.shape)

def test_masked_unit_array(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be a unit test (in the location you had it previously) not an integration test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajdawson Why is this a unit test and not an integration test? I thought that because it was testing the behaviour of an iris function with a specific numpy data type, this would make the test not isolated to a single function. (I'm not being pedantic, I'm just trying to understand testing).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, this is testing a simple invocation of one function (a code unit) that deosn't rely on a whole stack or anything external, so it is a unit test. Note that the test above it for lazy data is loading some real data from file and using new_axis on it, which is relying on biggus, the iris load mechanism etc. and is integrating several components together and looks a bit like end-user code. You'll notice there is a corresponding unit test for the lazy data case, perhaps compare and contrast for an idea of what is a unit test and what is an integration test, although ultimately the line is fuzzy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

lib/iris/util.py Outdated
if scalar_coord is not None:
scalar_coord = src_cube.coord(scalar_coord)

if src_cube.shape is ():
Copy link
Member

@ajdawson ajdawson Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to alter the actual slicing if we have a MaskedConstant rather than trying to correct for it beforehand. I tentatively suggest something like:

if src_cube.has_lazy_data():
    new_cube = iris.cube.Cube(src_cube.lazy_data()[None])
else:
    if isinstance(src_cube.data, ma.core.MaskedConstant):
        new_data = ma.array([np.nan], mask=[True])
    else:
        new_data = src_cube.data[None]
    new_cube = iris.cube.Cube(new_data)

with the following caveat: Do we know if this does the right thing when we have lazy data? I haven't checked. I think whatever implementation we choose we should have a test that is the equivalent of your new test but for a lazy array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajdawson This all makes sense, I am happy to implement these changes.

I am not sure if I've tried this with lazy data. I assume if I write a test using a lazy array this would also just be a unit test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so yes


def test_masked_unit_array(self):
cube = tests.stock.simple_3d_mask()
test_cube = cube[0][0][0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accepted style for this is test_cube = cube[0, 0, 0]

lib/iris/util.py Outdated

if src_cube.shape is ():
if isinstance(src_cube.data, np.ma.core.MaskedConstant):
src_cube.data = np.ma.array(data=[0], mask=[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use ma.array(..., we have an import numpy.ma as ma in this file so we don't need the leading np.. This should be true everywhere in iris.

lib/iris/util.py Outdated

if src_cube.shape is ():
if isinstance(src_cube.data, np.ma.core.MaskedConstant):
src_cube.data = np.ma.array(data=[0], mask=[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use np.nan as the data value here? We don;t know what it should be since a MaskedConstant has no value really, so not-a-number seems reasonable. I don't really like this part but I'm not sure if we have another choice...

Copy link
Member

@ajdawson ajdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added my current thoughts on how to progress. Feel free to discuss these before committing to anything.

lib/iris/util.py Outdated
scalar_coord = src_cube.coord(scalar_coord)

if src_cube.shape is ():
if isinstance(src_cube.data, np.ma.core.MaskedConstant):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for np.ma.core.MaskedConstant -> ma.core.MaskedConstant.

@corinnebosley
Copy link
Member Author

@ajdawson Does the lazy data test look OK to you?

@ajdawson
Copy link
Member

I think the test is OK, but it fails: https://travis-ci.org/SciTools/iris/jobs/161862951#L5040
We need to determine what to do for lazy arrays.

@corinnebosley
Copy link
Member Author

Actually I think that might be my fault. I didn't check the dimensions of the lazy array before indexing into it to get a single point. Let me try again.

@corinnebosley
Copy link
Member Author

corinnebosley commented Sep 22, 2016

@ajdawson OK, this has made me a bit confused. The test is failing because the lazy array doesn't have a mask on it, so when I try and check the dimensions of the mask it doesn't know what to check. I can add a mask to the array, but will it then still be lazy?

Also, what happens if the single-point array we are passing to new_axis is both lazy and a Masked Constant? Is that a thing? If it is, then it will bypass the fix, right?

@djkirkham
Copy link
Contributor

what happens if the single-point array we are passing to new_axis is both lazy and a Masked Constant? Is that a thing?

Yes, I think that can happen.

I feel like this should really be fixed in biggus, and it might not be possible to make a fix in Iris anyway. It seems to me that there's no way to know whether realising the data would result in a MaskedConstant without actually realising it, so there's no way to know when to apply the fix.

I think biggus should be fixed so that src_cube.lazy_data()[None] doesn't cause problems for masked data.

@corinnebosley
Copy link
Member Author

@djkirkham You might be right.

@ajdawson
Copy link
Member

OK, if this is out of our direct control for deferred data I am happy for this fix to only apply to loaded data (i.e. as it currently is), with the condition that an issue is opened on biggus, and is picked up again in iris when possible. You'll need to remove the test for lazy data since it fails, and then I think this can be accepted.

@corinnebosley
Copy link
Member Author

@ajdawson @djkirkham Either of you know why these tests are failing now?

@djkirkham
Copy link
Contributor

djkirkham commented Sep 22, 2016

It's because there is a new version of cf_units which is being picked up. See #2144 which will fix these errors

@djkirkham
Copy link
Contributor

Actually, most of the errors are because the current version of cf_units has a bug. We're releasing a new version now which will fix all but one of the errors, and #2144 will fix the remaining one

@corinnebosley
Copy link
Member Author

Thanks Dan

@corinnebosley
Copy link
Member Author

@marqh Dan informs me that the tests for this PR should pass now that his own changes have been merged. Could you please re-run the Travis CI tests here? Thanks.

@ajdawson
Copy link
Member

Did you forget to rebase (and thus pull in required changes)? Tests are still failing.

@corinnebosley
Copy link
Member Author

@ajdawson Yes I did.

@corinnebosley
Copy link
Member Author

@ajdawson OK, so I rebased onto upstream master, but actually Dan's fix was merged into v1.10.x so the tests are still failing. Have spoken to Pete about it, who is waiting to speak to Mark about it, and then maybe somebody might speak to me about it again...

@ajdawson
Copy link
Member

Err, OK! What usually needs to happen is a merge of the v1.10.x branch into master to pickup bug-fix changes from there. Once that is done you can rebase this onto upstream/master. IWhen you can eventually do this, you may as well squash these commits into a single commit while you are at it.

@corinnebosley
Copy link
Member Author

@marqh

@corinnebosley
Copy link
Member Author

@ajdawson Are you happy to merge?

@ajdawson ajdawson merged commit 098ed88 into SciTools:master Oct 7, 2016
@ajdawson
Copy link
Member

ajdawson commented Oct 7, 2016

Yep! Thanks @corinnebosley

@corinnebosley
Copy link
Member Author

@ajdawson Thank you!

@corinnebosley corinnebosley deleted the util_fix branch October 7, 2016 08:54
@QuLogic QuLogic added this to the v1.11 milestone Oct 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants