-
Notifications
You must be signed in to change notification settings - Fork 300
biggus #900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
biggus #900
Conversation
|
My favorite part about this PR is that it removes twice as many lines as it adds! 👍 Of course, biggus is a major dependency, but I think this removes significant complexity from Iris -- which is a major plus for people like me, who are excited about figuring out how to add new features. |
lib/iris/cube.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to switch from np.asarray to np.asanyarray? The advantage of asarray is that it only lets through direct instances of ndarray, which guarantees that there aren't any funny ndarray subclasses in use (like matrix), which might overload operators in unexpected ways. In my opinion, asarray is a safer way to go, because it means that people writing math operations on cubes know exactly what they're getting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used np.asanyarray because I wanted to let masked arrays through. But that could be handled with its own explicit clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay. We usually use standard ndarrays using np.nan for masked values (because it's faster and more memory efficient), but I know Iris uses numpy's masked array module in many cases.
|
I've added several comments on top of b0034a6. Some are memory joggers or questions, others are review actions. Given the number of changes, please only append commits at this point. In terms of getting this in, I wonder if we need more documentation of biggus (or at least an uptodate pip install). What are you're thoughts on that front. Hope you have a good Christmas break. |
lib/iris/cube.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you haven't changed this, but you've changed everything else so here goes. The beginning of this sentence now appears somewhat contradictory to the first paragraph. When would the object not contain phenomenon values? I'd also add that it can be a biggus array in here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
I'm all done. This is very impressive work 👏 which should make the manipulation and aggregation of large data sets become so much simpler. @pelson raises a number of points that need addressing and I think I've spotted a few small ones too (in particular the fill_value and the logic in merge()), but this is close to being merged in my opinion. |
At a minimum I'd love to cut a new biggus version. |
|
OK chaps. I've folded in the API experiments and a whole bunch of review feedback and cleaned up the net-null-change copyright nonsense. Sorry it's a fresh commit but it was all getting so complex I just wanted to get back to a comprehensible state. |
|
I've pushed a new commit which avoids the need to define NB. It depends on SciTools/biggus#54, so it'll need updating if/when that makes it on to master. |
|
(If anyone gets tempted to merge, perhaps it would be wise to tag biggus 0.3 first and update this PR to use that.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhattersley are you trying to avoid using self._data due to the old deferred data implementation? How's about self.__data instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I wanted to use a new attribute name to highlight any cases where code was being naughty and accessing the old private attribute. I'm certainly not fixated on the name _my_data but I don't want to use the double-underscore name-mangling.
lib/iris/cube.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhattersley okay I'm splitting hairs here ...
There's been a couple of times now where I've come back to this method to remind myself what exactly has_data means. I guess I'm being all rather 👴, but for me a cube always has data. The real question is whether that data is lazy or concrete.
In test_cdm.py L923:928 you defined the convenience methods is_lazy and is_concrete which makes the testing crystal clear ... would you consider extending the cube API to include such methods on the Cube and remove has_data ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a single Cube.has_lazy_data() method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, works for me, and fits in quite neatly with the lazy_data method.
Nice. Do it! 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do! 🤘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. 😄
|
@rhattersley you can now update the |
Done. |
|
👍 Top PR @rhattersley! Here we go 😲 ... merge! (waiting on Travis) |
Not any more. 😉 |
|
gulp ... what have I done 😉 |
|
🍻 😀 |
|
nice one |
This PR switches the handling of deferred data to biggus and gets rid of the data manager concept.
NB. To minimise the change, LazyArray is still used within AuxFactory and for deferred coordinate values.