Skip to content

Conversation

@rhattersley
Copy link
Member

This PR switches the handling of deferred data to biggus and gets rid of the data manager concept.

NB. To minimise the change, LazyArray is still used within AuxFactory and for deferred coordinate values.

@shoyer
Copy link
Contributor

shoyer commented Dec 22, 2013

My favorite part about this PR is that it removes twice as many lines as it adds! 👍

Of course, biggus is a major dependency, but I think this removes significant complexity from Iris -- which is a major plus for people like me, who are excited about figuring out how to add new features.

lib/iris/cube.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to switch from np.asarray to np.asanyarray? The advantage of asarray is that it only lets through direct instances of ndarray, which guarantees that there aren't any funny ndarray subclasses in use (like matrix), which might overload operators in unexpected ways. In my opinion, asarray is a safer way to go, because it means that people writing math operations on cubes know exactly what they're getting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used np.asanyarray because I wanted to let masked arrays through. But that could be handled with its own explicit clause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay. We usually use standard ndarrays using np.nan for masked values (because it's faster and more memory efficient), but I know Iris uses numpy's masked array module in many cases.

@pelson
Copy link
Member

pelson commented Dec 24, 2013

I've added several comments on top of b0034a6. Some are memory joggers or questions, others are review actions. Given the number of changes, please only append commits at this point.

In terms of getting this in, I wonder if we need more documentation of biggus (or at least an uptodate pip install). What are you're thoughts on that front.

Hope you have a good Christmas break.

lib/iris/cube.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you haven't changed this, but you've changed everything else so here goes. The beginning of this sentence now appears somewhat contradictory to the first paragraph. When would the object not contain phenomenon values? I'd also add that it can be a biggus array in here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to update this whilst keeping the benefits from #862. I hope @ajdawson and @jkettleb are happy with the result!

@esc24
Copy link
Member

esc24 commented Jan 2, 2014

I'm all done. This is very impressive work 👏 which should make the manipulation and aggregation of large data sets become so much simpler. @pelson raises a number of points that need addressing and I think I've spotted a few small ones too (in particular the fill_value and the logic in merge()), but this is close to being merged in my opinion.

@rhattersley
Copy link
Member Author

I wonder if we need more documentation of biggus (or at least an uptodate pip install). What are you're thoughts on that front.

At a minimum I'd love to cut a new biggus version.

@rhattersley
Copy link
Member Author

OK chaps. I've folded in the API experiments and a whole bunch of review feedback and cleaned up the net-null-change copyright nonsense. Sorry it's a fresh commit but it was all getting so complex I just wanted to get back to a comprehensible state.

@rhattersley
Copy link
Member Author

I've pushed a new commit which avoids the need to define == on a biggus.NumpyArrayAdapter.

NB. It depends on SciTools/biggus#54, so it'll need updating if/when that makes it on to master.

@rhattersley
Copy link
Member Author

(If anyone gets tempted to merge, perhaps it would be wise to tag biggus 0.3 first and update this PR to use that.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhattersley are you trying to avoid using self._data due to the old deferred data implementation? How's about self.__data instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I wanted to use a new attribute name to highlight any cases where code was being naughty and accessing the old private attribute. I'm certainly not fixated on the name _my_data but I don't want to use the double-underscore name-mangling.

lib/iris/cube.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhattersley okay I'm splitting hairs here ...

There's been a couple of times now where I've come back to this method to remind myself what exactly has_data means. I guess I'm being all rather 👴, but for me a cube always has data. The real question is whether that data is lazy or concrete.

In test_cdm.py L923:928 you defined the convenience methods is_lazy and is_concrete which makes the testing crystal clear ... would you consider extending the cube API to include such methods on the Cube and remove has_data ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a single Cube.has_lazy_data() method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, works for me, and fits in quite neatly with the lazy_data method.

Nice. Do it! 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do! 🤘

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 😄

@bjlittle
Copy link
Member

@rhattersley you can now update the .travis.yml to use biggus v0.3, which I've now tagged and pushed to PyPI.

@rhattersley
Copy link
Member Author

update the .travis.yml to use biggus v0.3

Done.

@bjlittle
Copy link
Member

👍 Top PR @rhattersley!

Here we go 😲 ... merge! (waiting on Travis)

@rhattersley
Copy link
Member Author

(waiting on Travis)

Not any more. 😉

bjlittle added a commit that referenced this pull request Jan 28, 2014
@bjlittle bjlittle merged commit 5c46808 into SciTools:master Jan 28, 2014
@rhattersley rhattersley deleted the biggus branch January 28, 2014 15:41
@bjlittle
Copy link
Member

gulp ... what have I done 😉

@rhattersley
Copy link
Member Author

🍻 😀

@bblay
Copy link
Contributor

bblay commented Jan 29, 2014

nice one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants