Skip to content

Conversation

@cpelley
Copy link

@cpelley cpelley commented Apr 25, 2016

  1. Simply utilises the capabilities of biggus to provide a transposed version of the data rather than making a copy (necessary with the sizes of data we are working with).
  2. Using 1, also provides in-place dimension inversion (intentionally constrained implementation). - now moving to a new PR after the merge of this one.

@cpelley
Copy link
Author

cpelley commented Apr 25, 2016

iris.tests.experimental.regrid.test_regrid_area_weighted_rectilinear_src_and_grid.TestAreaWeightedRegrid are failling on travis but I'm not seeing any issue locally :(

Any ideas anyone?

Chers

@cpelley cpelley changed the title ENH: Lazy cube data transpose ENH: Cube in-place transform and dimension inversion Apr 25, 2016
@cpelley cpelley changed the title ENH: Cube in-place transform and dimension inversion ENH: Cube in-place transpose and dimension inversion Apr 25, 2016
@ajdawson
Copy link
Member

failling on travis but I'm not seeing any issue locally :( Any ideas anyone?

We should increase the timeout for nose multiprocessing...

@ajdawson
Copy link
Member

So, can you describe how the inversion would proceed if I invert a 1d dimension coordinate that has a 1d Auxcoord describing it also?

@rhattersley
Copy link
Member

We should increase the timeout for nose multiprocessing...

👍 ... or speed up/remove the offending re-projection tests.

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

So, can you describe how the inversion would proceed if I invert a 1d dimension coordinate that has a 1d Auxcoord describing it also?

Certainly, it will fall over as I have called self.coord not self.coords (exception here).
I can add a test if you really want me to but I see no benefit to adding duplicate testing for Cube.coord.

Cheers

Thanks both for your interest on this.

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

This implementation is intentionally constrained. In future it will support multi-dimensional coordinates and multiple coordinates mapped to the same dimension (as/when driven by requirement).

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

We should increase the timeout for nose multiprocessing...

👍 ... or speed up/remove the offending re-projection tests.

Thanks both, I hadn't noticed that it was a TimedOutException raised. I'll look at how how easy they might be to migrate convert to true unittests.

@ajdawson
Copy link
Member

ajdawson commented Apr 26, 2016

This implementation is intentionally constrained. In future it will support multi-dimensional coordinates and multiple coordinates mapped to the same dimension (as/when driven by requirement).

I wanted to know what happens in the case I specified. Is an exception raised? Where is it raised from? (comments appeared out-of order, I saw the quoted one before the other arrived)

Certainly, it will fall over as I have called self.coord not self.coords

You may want to consider catching this exception and re-raising with a more appropriate error message?

lib/iris/cube.py Outdated
if len(coord_dims) > 1:
msg = ('Currently multidimensional coordinate inversion is '
'not supported')
raise RuntimeError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is RuntimeError the best class of exception to be used here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ajdawson, sure.

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

I wanted to know what happens in the case I specified. Is an exception raised? Where is it raised from?

@ajdawson I thought I already did?

@cpelley: "Certainly, it will fall over as I have called self.coord not self.coords (exception here)..."

Perhaps an example might help?

>>> print cube
thingness / (1)                     (bar: 3; foo: 4)
     Dimension coordinates:
          bar                           x       -
          foo                           -       x
     Auxiliary coordinates:
          bing                          x       -
>>> cube.invert_dims((0,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/h05/cpelley/git/iris/lib/iris/cube.py", line 2801, in invert_dims
    coord = self.coord(contains_dimension=dim)
  File "/home/h05/cpelley/git/iris/lib/iris/cube.py", line 1439, in coord
    raise iris.exceptions.CoordinateNotFoundError(msg)
iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found 2. They were: bar, bing.'

@ajdawson
Copy link
Member

See above modified comment.

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

You may want to consider catching this exception and re-raising with a more appropriate error message?

I thought the exception was clear enough along with the traceback that says where it was raised (see above example).

@ajdawson
Copy link
Member

I thought the exception was clear enough

It doesn't tell you what actually went wrong though. The real problem is that in-place inversion doesn't support multiple coordinates spanning a single dimension. The end-user shouldn't have to know that you called the coord method to determine this (or know anything about the functionality and exceptions raised by the coord method). You should raise an exception that describes the immediate problem, in this case something like:

Inversion of dimensions that are spanned by multiple coordinates is not currently supported.
Dimension {} is spanned by ...

lib/iris/cube.py Outdated
coord = self.coord(contains_dimension=dim)
coord_dims = self.coord_dims(coord)
if len(coord_dims) > 1:
msg = ('Currently multidimensional coordinate inversion is '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't start the sentence with "Currently", try something along the lines of:

Inversion of multidimensional coordinates is not supported.

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

What's stopping you from indexing your cube with a[::-1] slice?

If the data is not lazy, that would create a copy (i.e. consume twice as much as necessary).

...but invert_dims seems too niche to be on the Cube

OK. I followed the following structural layout for util within our own project (with the location describing the context of what objects the utility works on):
util.coords, util.ndarray, util.cube etc.

I bring this up only as I would guess that my adding yet another function to util will bring up the debate of the complexity and length of utils.

Would you be happy for me to put follow my proposed structure change (making util a sub-package and putting invert_dims within a cube module?
(i.e. iris/util/cube/invert_dims)

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

I have re-written some of the TestAreaWeightedRegrid tests over in the unit testing area, however, those causing a TimedOutException probably need removal and completely new true unit tests written. I don't think this is trivial enough for this PR as it would require evaluation of new numbers for the area weighted regrid results by hand.

I'll await a response to my last comment before moving invert_dims out of the Cube class.

Cheers both.

@rhattersley
Copy link
Member

What's stopping you from indexing your cube with a[::-1] slice?

If the data is not lazy, that would create a copy (i.e. consume twice as much as necessary).

What if that was no longer the case?

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

@rhattersley: What if that was no longer the case?

Are you suggesting that indexing doesn't copy or that you are open to removing the need to copying data by indexing the cube?

@cpelley
Copy link
Author

cpelley commented Apr 26, 2016

I think I'm tempted to limit this PR to the initial commit (ENH: Lazy cube data transpose - 00dae8) unless there are any objections?

@ajdawson
Copy link
Member

Not from me, we can revisit the inversion issue in a separate PR if required.

@cpelley
Copy link
Author

cpelley commented Apr 27, 2016

I have reverted the cube_inverse idea so we are left with the simple transpose change (I'll put a separate PR up after this PR is merged dealing with the invert_dims).

Cheers

@ajdawson
Copy link
Member

I don't see why you are modifying tests about regridding, this change has nothing to do with the regrid code as far as I can see.

@cpelley
Copy link
Author

cpelley commented Apr 27, 2016

I don't see why you are modifying tests about regridding, this change has nothing to do with the regrid code as far as I can see.

See comment. I had started the work already but complete moving of the offending tests would be too much work to justify for this PR. Shall I update the nosetests timeout and raise an issue on this test class for performance? (for when someone is doing work on the regridding)

@ajdawson
Copy link
Member

I don't know if you need to do anything about it in this PR, it has nothing to do with the changes you have introduced. Changes to these tests should probably be in a separate PR where we can review carefully and have some record of the aim and scope of the changes.

@cpelley
Copy link
Author

cpelley commented Apr 27, 2016

Changes to these tests should probably be in a separate PR...

I only did so because possible solutions were discussed on this ticket.
Removed these changes now and created a new PR (#1986 to continue the issue of test failure of the regrid on travis).
I'll squash when your happy.

Cheers

def test_inplace_data_transpose(self):
target_id = id(self.cube.lazy_data())
self.cube.transpose((0, 1))
self.assertEqual(id(self.cube.lazy_data().array), target_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using .array you're relying on a particular implementation of the transpose algorithm. I'd suggest checking the cube is still lazy with has_lazy_data and then resolving the numbers and checking they actually match.

Also, I'd do an actual transpose rather than deliberately doing a "null" transpose.

Putting those together:

data = np.arange(12).reshape(3, 4)
cube = Cube(biggus.NumpyArrayAdapter(data))
cube.transpose()
self.assertTrue(cube.has_lazy_data())
self.assertArrayEqual(data.T, cube.data)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhattersley.

@rhattersley
Copy link
Member

Squash-away my friend! Squash-away!

@cpelley cpelley force-pushed the LAZY_CUBE_TRANSPOSE branch from 08df29a to fd04aa8 Compare May 9, 2016 14:29
@cpelley
Copy link
Author

cpelley commented May 9, 2016

squashed :)

Cheers

@rhattersley
Copy link
Member

ooh just remembered, don't we need to update what's new for this?

My turn to remember the "what's new" entry! 😉

@rhattersley
Copy link
Member

ooh just remembered, don't we need to update what's new for this?

My turn to remember the "what's new" entry! 😉

@cpelley In other words... the only thing this needs is a what's new entry and it's ready to merge.

@cpelley
Copy link
Author

cpelley commented May 12, 2016

Thanks @rhattersley yes sorry I'm on it.

@cpelley
Copy link
Author

cpelley commented May 20, 2016

whatsnew done.

Cheers

@@ -0,0 +1 @@
* The transpose method of a Cube now results in a transposed view of the original data rather than a transposed copy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this really captures the lazy-loading aspect...?

@cpelley
Copy link
Author

cpelley commented Jun 10, 2016

I have updated the wording of the whatsnew.

Cheers

@cpelley
Copy link
Author

cpelley commented Jun 24, 2016

Can we get this into the v1.10 milestone?

@DPeterK
Copy link
Member

DPeterK commented Jun 27, 2016

@cpelley realistically no - v1.10 is big enough already and we just need to get it out the door now rather than adding new functionality to the milestone that might hold it up.

@cpelley
Copy link
Author

cpelley commented Jun 27, 2016

@cpelley realistically no - v1.10 is big enough already and we just need to get it out the door now rather than adding new functionality to the milestone that might hold it up.

Can you please reconsider? This ticket has gone through a review with the only remaining thing flagged by @rhattersley being the whatsnew entry. This was completed over two weeks ago.

@djkirkham
Copy link
Contributor

I know I'm late to the party, but this change will mean that when transpose() is called on a cube without lazy data, it will then appear to have lazy data, i.e. cube.has_lazy_data() will return true.

@cpelley
Copy link
Author

cpelley commented Aug 9, 2016

Thanks for your interest @djkirkham. This behaviour was expected: iris.tests.unit.cube.test_Cube.Test_transpose.test_lazy_data

To modify the behaviour of has_lazy_data (i.e. change the implied meaning of this method) I think is outside the scope of this ticket.
Take for example the following simple example that based on your comment you might also have a problem with?

>>> arr = biggus.NumpyArrayAdapter(np.arange(4).reshape(2, 2))
>>> cube = iris.cube.Cube(arr)
>>> cube.has_lazy_data()
True

I should say that I don't have a problem with the above (but then I have taken the meaning of iris lazy_data to be the behaviour of has_lazy_data, which I think is perfectly reasonable especially given that there is no docstring. Initialising a cube with a biggus array is a very use case in case.

Is there a preference for doing some type checking in transpose?
This would be contrary to current iris behaviour in similar situations like shown in my example above.

Cheers

@djkirkham
Copy link
Contributor

I'm admittedly a novice when it comes to Iris, but my understanding of the meaning of has_lazy_data() is "has the data been loaded from its initial data source?". Thus, I don't have a problem with your example, since the data hasn't been "loaded" from the data source, which happens to be a numpy array.

For me, the usefulness of has_lazy_data() comes from the ability to use it to check whether cube operations load in the data. With the current change that ability is lost with respect to transpose.

Is there a preference for doing some type checking in transpose?

My preference would be to use self.lazy_data() if self.has_lazy_data() returns True, and use self.data otherwise, although maybe that defeats the object of this PR.

@cpelley
Copy link
Author

cpelley commented Aug 9, 2016

Thanks @djkirkham I'll have a closer look and see what I can do.

Cheers

@cpelley
Copy link
Author

cpelley commented Aug 9, 2016

...Thus, I don't have a problem with your example, since the data hasn't been "loaded" from the data source, which happens to be a numpy array.

I don't see how the term "load" comes into it in the context of my example. Right now, has_lazy_data simply means "is biggus array". I suggest we add documentation to has_lazy_data to clarify that (new ticket).

I think I agree in principle with what you expect as the return value though. I have pushed a new commit which now does this.

Thanks @djkirkham

@djkirkham
Copy link
Contributor

Thanks @cpelley, the changes look good.

Right now, has_lazy_data simply means "is biggus array".

I would say that's just the way it's implemented, rather than the intended meaning of the method. Correct me if I'm wrong, but currently the only way cube._my_data ever gets assigned a biggus array is if it's passed in by the user, and if the data in that array is ever loaded, cube._my_data gets assigned a NumPy array.

I agree that it would be a good idea to have the intended meaning documented somewhere.

@cpelley
Copy link
Author

cpelley commented Aug 16, 2016

Ping, the test failures are timeouts.

Cheers

@cpelley
Copy link
Author

cpelley commented Aug 23, 2016

ping

@cpelley
Copy link
Author

cpelley commented Nov 29, 2016

Please let me know if there's anything I can do to help to get this in.
This would provide very useful capability for CMIP6 due to the sheer size of data involved.

Cheers

@marqh
Copy link
Member

marqh commented Dec 7, 2016

Please let me know if there's anything I can do to help to get this in.
This would provide very useful capability for CMIP6 due to the sheer size of data involved.

this looks like a done deal to me, thanks @cpelley

@marqh marqh merged commit 0f432c8 into SciTools:master Dec 7, 2016
@QuLogic QuLogic added this to the v1.12.x milestone Dec 8, 2016
@cpelley
Copy link
Author

cpelley commented Dec 8, 2016

Thanks @marqh, this will make a significant difference to us. Appreciated.

@cpelley cpelley deleted the LAZY_CUBE_TRANSPOSE branch December 8, 2016 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants