-
Notifications
You must be signed in to change notification settings - Fork 300
Trajectory optimisation #1946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trajectory optimisation #1946
Conversation
lib/iris/analysis/trajectory.py
Outdated
| cache = {} | ||
|
|
||
| # Cache the linear interpolator | ||
| scheme = iris.analysis.Linear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you measure the impact of using iris.analysis.Nearest for nearest neighbour interpolation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I didn't. That's probably worth a go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine, but it's not as quick (more like a 45% improvement), and the test still fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with a 45% improvement for the cost of consistency.
I don't think it is reasonable to expect an interpolation to keep the data lazy (at this moment in time). We've talked about adding such a capability (particularly for regrid), but it is beyond anything we've actually needed. In that sense, I'm ok with trajectory interpolate not being lazy (even though before, it was for nearest). |
|
To be clear. It does constitute a major change, and I'd want that to be documented clearly. |
|
@pelson I am still a little unclear about what to do regarding the failed test, though. Are you suggesting that I do change the test, or just ignore the failure? And what exactly did you want me to document? |
Certainly the former. Update the test as part of this PR. Whoever merges it (me maybe) is accepting the loss of functionality (lazy interpolation) in exchange for improved performance.
|
I agree |
|
After running the tests again, I have noticed a problem that occurs when interpolation is performed on hybrid-height cubes, so before this goes any further I will be doing some more investigations into where this originates from and what we can do about it. |
|
Also in the hybrid height function of the trajectory test is this: which also fails. Naively, I expected the results of two different nearest neighbour routines to be the same. The closest point is the closest point, whatever, right? No. I have now discovered that the nearest neighbour interpolation routine which I have removed uses a dramatically different calculation sequence than the standard scheme. The original routine calculates the closest point spherically using loads of coordinate transformations and trigonometry and so on and so forth, whereas the routine I have changed it to uses a super-quick, 4-line function to numerically deduce the nearest neighbour. Hence, the 'nearest neighbours' in the two tests are different. It occurs to me that I may have sacrificed a small degree of accuracy for a fairly reasonable increase in performance. I don't want to take this too lightly because it looks like somebody put really quite a lot of work into the original nearest neighbour routine, but it does sadly seem somewhat extravagant considering its purpose. I am, again, open to advice and discussion regarding this issue. |
| traj = (('grid_latitude', [20.499, 21.501, 22.501, 23.501]), | ||
| ('grid_longitude', [31, 32, 33, 34])) | ||
| xsec = iris.analysis.trajectory.interpolate(cube, traj, method='nearest') | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have altered the numbers here slightly so that the test case is not so unstable. In the reference, the nearest neighbour scheme rounded the grid latitudes to [20, 22, 23, 24]. This should ensure that the standard nearest neighbour scheme yields the same result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I've just noticed that it doesn't and I need to work out why.
|
Suggestions for how to progress:
cc: @pp-mo |
The changes in this module are to improve the performance of the trajectory interpolation routines for both linear and nearest neighbour schemes. The linear scheme now calls a cached linear interpolator (which improves performance by ~25%), and the nearest neighbour scheme calls a faster method (improving performance by ~62%).
There are a few points to make about these changes, firstly that the routines that are now employed by this module calculate a cube for every sample point before putting the data from that into a new trajectory cube, which I am sure takes up a lot of time. Secondly, one test (test_trajectory.py) fails after implementation of the scheme changes. There is a statement within this test to check whether the original cube data remains lazy, and it evidently does not when using extract_nearest_neighbour rather than _nearest_neighbour_indices_ndcoords.
I am open to advice or ideas about how to proceed with this problem. I am not fond of the idea of changing the test to fit the failure, so I could use a copy of the cube in the interpolation routine, but this seems a little convoluted.