Accept new copy behaviour from dask/dask#9555. #5041
I think my view of this is that it is basically the Iris test that is wrong -- it asks too much. I find that in older dasks, if ... then So, that is still true with latest dask. So in my view, where in '_check_copy', Iris is currently checking that And if that makes sense, why really should Iris expect or require So, as you hinted above, I think these distinctions are a prerogative of Dask, and Iris shouldn't be insisting on the detail. Makes sense to me. Any more takers?
I've been racking my brains to think how the failed test could be a problem for the user. In general, if we modify the data in a copy of a cube, the original cube's data is unaffected. I think this was a deliberate design choice.

    import numpy as np
    import iris.cube
    cube1 = iris.cube.Cube(np.arange(5))
    cube2 = cube1.copy()
    cube2.data[3] = 42
    print(cube1.data)

gives

However, with latest dask:

    import dask.array as da
    cube1 = iris.cube.Cube(da.from_array(np.arange(5)))
    cube2 = cube1.copy()
    cube2.data[3] = 42
    print(cube1.data)

I'm not at all sure that this second example represents a realistic user workflow though. The vast majority of the time, if a user has lazy data it's because they loaded from a file.
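The "modify the copy, then look at the original" check used in the two examples above can be sketched without Iris or Dask, using plain NumPy. This is only an illustration of the idea: `mutation_leaks` is a hypothetical helper name, not part of Iris, Dask or NumPy, and a NumPy view stands in for "a copy that still shares its underlying data". It makes no claim about what any particular Dask version does.

```python
import numpy as np


def mutation_leaks(original, duplicate, index=3, value=42):
    """Return True if writing to `duplicate` also changes `original`.

    Hypothetical helper for illustration only.
    """
    # Snapshot of the original values, independent of both arrays.
    before = original.copy()
    # Write to the duplicate, as in `cube2.data[3] = 42`.
    duplicate[index] = value
    # If the original no longer matches its snapshot, the write leaked through.
    return not np.array_equal(original, before)


# A real copy owns its own memory, so the original is unaffected.
base = np.arange(5)
print(mutation_leaks(base, base.copy()))  # False

# A view shares memory with the original, so the write shows up there too.
base = np.arange(5)
print(mutation_leaks(base, base.view()))  # True
```

The same helper could in principle be pointed at `cube1.data` and `cube2.data`, but whether the lazy case reports a leak depends on the installed Dask version, which is exactly the distinction under discussion here.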
Also, ping @DPeterK as he has clearly given this some thought recently.
That is definitely (historically) correct! Though it has also caused some trouble. So, I think there is considerable room for a "pragmatic" approach.
Thanks @rcomer for the ping - and excellent spot on the original cause of this issue! I tried to ping the scitools/devs group on the dask PR when it was brought to my attention so that it would reach everyone's collective attention... but couldn't, apparently due to a GitHub foible?

That aside, the reason that I'm implicated here is that I implemented the change to dask that has now been backed out again in

I'm in agreement with both @pp-mo and @rcomer with their thoughts to date, in that (a) we should let dask decide how to copy arrays, (b) we should be pragmatic in Iris about how we handle this, but nevertheless (c) this particular behaviour was clearly implemented for a reason (from a vague memory, either to support some cube math operations, or perhaps to preserve metadata across copy operations?)

I think we have a few options from which we could choose:

Like @pp-mo, I think we need to be pragmatic about this. If Iris really doesn't need to care about making a new data array on cube copy, then I think we just go with (2) and let dask fully decide how to manage lazy data. If we do care about the old behaviour, we should perhaps promote (4) via a note in the documentation about the changed behaviour. My preference, though, is to just go for (2) and be done with it.
Thanks @pp-mo! That's a full set of ✔ on
🚀 Pull Request
Description
Replaces #5027.
Updated since the original text below. Following helpful discussion (in the comments below) with @pp-mo, @rcomer and @DPeterK, we have decided to adopt Dask's new copy behaviour by adjusting our testing and announcing the change in a What's New entry. We expect minimal user disruption because the change is fairly niche.
Original text:
Consult Iris pull request check list