Skip to content

Conversation

@corinnebosley
Copy link
Member

The changes in this module are to improve the performance of the trajectory interpolation routines for both linear and nearest neighbour schemes. The linear scheme now calls a cached linear interpolator (which improves performance by ~25%), and the nearest neighbour scheme calls a faster method (improving performance by ~62%).

There are a few points to make about these changes, firstly that the routines that are now employed by this module calculate a cube for every sample point before putting the data from that into a new trajectory cube, which I am sure takes up a lot of time. Secondly, one test (test_trajectory.py) fails after implementation of the scheme changes. There is a statement within this test to check whether the original cube data remains lazy, and it evidently does not when using extract_nearest_neighbour rather than _nearest_neighbour_indices_ndcoords.

I am open to advice or ideas about how to proceed with this problem. I am not fond of the idea of changing the test to fit the failure, so I could use a copy of the cube in the interpolation routine, but this seems a little convoluted.

cache = {}

# Cache the linear interpolator
scheme = iris.analysis.Linear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you measure the impact of using iris.analysis.Nearest for nearest neighbour interpolation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I didn't. That's probably worth a go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine, but it's not as quick (more like a 45% improvement), and the test still fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with a 45% improvement for the cost of consistency.

@pelson
Copy link
Member

pelson commented Feb 22, 2016

There is a statement within this test to check whether the original cube data remains lazy, and it evidently does not when using extract_nearest_neighbour rather than _nearest_neighbour_indices_ndcoords.

I don't think it is reasonable to expect an interpolation to keep the data lazy (at this moment in time). We've talked about adding such a capability (particularly for regrid), but it is beyond anything we've actually needed. In that sense, I'm ok with trajectory interpolate not being lazy (even though before, it was for nearest).

@pelson
Copy link
Member

pelson commented Feb 22, 2016

To be clear. It does constitute a major change, and I'd want that to be documented clearly.

@corinnebosley
Copy link
Member Author

@pelson
I have updated the pull request to include the commit in which the nearest neighbour scheme is consistent with the format of the linear scheme.

I am still a little unclear about what to do regarding the failed test, though. Are you suggesting that I do change the test, or just ignore the failure? And what exactly did you want me to document?

@pelson
Copy link
Member

pelson commented Feb 23, 2016

Are you suggesting that I do change the test, or just ignore the failure?

Certainly the former. Update the test as part of this PR. Whoever merges it (me maybe) is accepting the loss of functionality (lazy interpolation) in exchange for improved performance.

And what exactly did you want me to document?

@marqh
Copy link
Member

marqh commented Feb 23, 2016

Are you suggesting that I do change the test, or just ignore the failure?

Certainly the former. Update the test as part of this PR. Whoever merges it (me maybe) is accepting the loss of functionality (lazy interpolation) in exchange for improved performance.

I agree

@corinnebosley
Copy link
Member Author

After running the tests again, I have noticed a problem that occurs when interpolation is performed on hybrid-height cubes, so before this goes any further I will be doing some more investigations into where this originates from and what we can do about it.

@corinnebosley
Copy link
Member Author

Also in the hybrid height function of the trajectory test is this:

    self.assertCML([cube, xsec], ('trajectory', 'hybrid_height.cml'))

which also fails. Naively, I expected the results of two different nearest neighbour routines to be the same. The closest point is the closest point, whatever, right?

No.

I have now discovered that the nearest neighbour interpolation routine which I have removed uses a dramatically different calculation sequence than the standard scheme. The original routine calculates the closest point spherically using loads of coordinate transformations and trigonometry and so on and so forth, whereas the routine I have changed it to uses a super-quick, 4-line function to numerically deduce the nearest neighbour. Hence, the 'nearest neighbours' in the two tests are different.

It occurs to me that I may have sacrificed a small degree of accuracy for a fairly reasonable increase in performance. I don't want to take this too lightly because it looks like somebody put really quite a lot of work into the original nearest neighbour routine, but it does sadly seem somewhat extravagant considering its purpose.

I am, again, open to advice and discussion regarding this issue.

traj = (('grid_latitude', [20.499, 21.501, 22.501, 23.501]),
('grid_longitude', [31, 32, 33, 34]))
xsec = iris.analysis.trajectory.interpolate(cube, traj, method='nearest')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have altered the numbers here slightly so that the test case is not so unstable. In the reference, the nearest neighbour scheme rounded the grid latitudes to [20, 22, 23, 24]. This should ensure that the standard nearest neighbour scheme yields the same result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I've just noticed that it doesn't and I need to work out why.

@rhattersley
Copy link
Member

rhattersley commented Apr 26, 2016

Suggestions for how to progress:

  1. Rescue the linear interpolation performance enhancement (in this PR?)
  2. Update the docs for iris.analysis.trajectory.interpolate to make it clear that method='nearest' switches to a "spherical" mode and is loads slower. Also add a signpost to the faster (but more limited - it doesn't handle 2D coords which are useful for tri-polar grids) Nearest() scheme way of doing things.
  3. Figure out the API for providing fast (i.e. non-spherical) nearest-neighbour interpolation for a trajectory. (In the context of rationalising all our interpolation, regridding, and indexing APIs.) How can a user choose between cartesian and spherical?
  4. Do we need a transition API before (3) is ready? e.g. Add method='nearest_cartesian' to iris.analysis.trajectory.interpolate. (Possibly deprecated from the start if (3) suggests a preferred API.)

cc: @pp-mo

@corinnebosley corinnebosley deleted the trajectory_optimisation branch October 24, 2017 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants