Skip to content

Conversation

@rcomer
Copy link
Member

@rcomer rcomer commented Jan 15, 2016

As discussed at #1887, this PR removes checks which are applied to coordinates before adding and subtracting cubes. It makes these operations consistent with multiplication and division in terms of whether two cubes are compatible.

Having bypassed _add_subtract_common, scalar coords which exist on cube but not on other ("ignorable" coords) are no longer removed as part of the operation. I think this is consistent with allowing the vector coords to differ (if I've understood correctly, _binary_op_common ignores all of the coords on other). There are no unit tests to check whether the "ignorable" coords are removed, but I am getting a failure in the doctests for userguide/cube_maths.rst because a print out of a cube shows more metadata than previously.

I have also removed the depreciated (since 0.8.0) keyword ignore. It's not 100% clear what it did, but I guess it triggered the above mentioned removal of "ignorable" coords.

@rcomer
Copy link
Member Author

rcomer commented Jan 15, 2016

I seem to be getting PEP8 failures for files I haven't touched 😕 .

@rhattersley
Copy link
Member

I can't see any PEP 8 failures, but there were time-outs on the main tests so I've restarted those and they've cleared up.

Also, the "Calculating the difference between two cubes" example in the user guide is showing a genuine mismatch. The result now has forecast_period and time coordinates, even though they aren't common to both t_last and t_first ... that doesn't seem right.

@rcomer
Copy link
Member Author

rcomer commented Feb 8, 2016

Thanks @rhattersley,

I can think of examples where it would make sense to keep the scalar coord from cube even though it doesn't match other. E.g. if you're calculating this year's temperature anomaly by this year's temperature minus climatological temperature, you might still want it labelled with this year's time. There are also examples where it wouldn't make sense, as you say.

Allowing the mismatch on the scalar cube is consistent with the fact that we'd now be allowing mismatches between the dimension coords on cube and other. It's also consistent with what happens when multiplying and dividing with cubes.

I'm not sure what the ideal answer is: perhaps creating a new coord by a function of the coords on cube and other. Though I fear this may get messy when the examples become more complicated...

One major issue with my branch as it stands is that it could break existing user code: if you assumed the (scalar) time coord has been deleted in the operation and try to add a new time coord. A compromise could be to keep _add_subtract_common but simply remove the lines to do with bad_group_coords. This would satisfy my main aim, which is to be able to add/subtract cubes where the dimension coords differ (often they differ only by a minor aspect of the metadata).

Thoughts?

@rhattersley
Copy link
Member

I can think of examples where it would make sense to keep the scalar coord ... also examples where it wouldn't make sense, as you say.

I'm not sure which would be better, but what does seem clear is:

  • 👍 👍 for consistency between plus/minus and multiply/divide.
  • Whatever we do should be simple. As you suggest, we don't want a "messy" combination of coordinates from cube and other. For what it's worth, the xarray package drops any mismatched coordinates.
  • We need to be careful about changing behaviour and the API. If it's backwards incompatible then it should go into the Iris 2.0 release.
    • e.g. I wouldn't delete the ignore keyword argument yet. It's already deprecated and does nothing, so can be safely removed in Iris 2.0.
    • We can ease the migration by providing an iris.FUTURE flag to control the behaviour. (I can help with that.)

If it were me I'd get all of +-*/ to drop mismatched coordinates (instead of just +-). In other words, I'd modify _binary_op_common to limit the result metadata to things which match in both cubes. But like I say, it's not clear to me which option is better so I don't want to push my opinion too hard!

Either way, I'm happy to help with code changes, review comments, throwing rocks from my ivory tower, etc. 😉

@rcomer
Copy link
Member Author

rcomer commented Mar 1, 2016

Just to add further to the murkiness, it occured to be that we have 2 type of "ignorable" coords:

  • The coord exists on both cube and other, but is different.
  • The coord exists on cube but not on other.

I think in the second case it makes more sense to keep the coord, as there may be a good reason why the coord doesn't exist on other. For example, other may contain some threshold values that don't vary with time (I've recently been trying to do a difference between soil moisture and the wilting point parameter, which only varies geographically).

@ajdawson
Copy link
Member

ajdawson commented Mar 1, 2016

I think in the second case it makes more sense to keep the coord, as there may be a good reason why the coord doesn't exist on other.

I'm inclined to agree with you there @rcomer.

@ajdawson ajdawson closed this Mar 1, 2016
@ajdawson ajdawson reopened this Mar 1, 2016
@ajdawson
Copy link
Member

ajdawson commented Mar 1, 2016

Sorry, wrong button!

@rhattersley
Copy link
Member

Sorry, wrong button!

We've all been there. 😁

@rcomer
Copy link
Member Author

rcomer commented Mar 1, 2016

OK, so how about we have _binary_op_common delete scalar coords that differ between cube and other, but keep scalar coords that only exist on cube?

Is there a beginners' guide to working with iris.FUTURE flags (or a fairly straight-forward example I can look at)?

@rcomer
Copy link
Member Author

rcomer commented Mar 1, 2016

In the meantime I have added the coord removal to _binary_op_common just to see if anything breaks!

@rcomer
Copy link
Member Author

rcomer commented Mar 2, 2016

Here is a better example from my recent work, where I definitely wouldn't want the division to remove metadata:

if cube.name() == 'geopotential':
    # Convert ERA-I geopotential to geopotential_height.
    gravity = iris.cube.Cube(9.80665, units='m s-2')
    cube = cube / gravity

I don't understand the latest Travis failure...

@marqh
Copy link
Member

marqh commented Mar 9, 2016

I don't understand the latest Travis failure...

they're not linked to your PR, just a transient test run artefact; all working now

@rcomer
Copy link
Member Author

rcomer commented Apr 12, 2016

Is there anything I need to do to take this forward? I've just seen the announcement that Iris 2.0 should be released this summer, which seems soon enough to me. So I guess extra work for backwards-compatibility in Iris 1.x would be more trouble than it's worth. Or have I misunderstood?

Thanks.

@rhattersley
Copy link
Member

rhattersley commented Apr 15, 2016

The removal of the deprecated ignore argument will have to wait for 2.0. But the main change in this PR is a change to existing functionality, so our rules say we need to provide a migration route - hence my iris.FUTURE comment previously. I'm happy to give that a go and see what the impact is.

@rhattersley
Copy link
Member

Does anyone have any suggestions for a good name for the FUTURE flag? I'm currently using flexible_cube_ops but would prefer something better.

@ajdawson
Copy link
Member

ops_metadata?

@rhattersley
Copy link
Member

For some reason I'd previously focussed on just the behaviour with respect to scalar coordinates and missed the differences in the behaviour of vector coordinates with add/subtract vs. multiply/divide. Some of these differences are hard to understand, let alone argue the case for changing so I've tried to capture what's going on. If I've missed an aspect or got it wrong please shout!

The current (i.e. prior to this PR) behaviour of add/subtract is:

  • Non-scalar coordinate metadata & values must always match ... right down to the level of the var_name.
  • Mismatched scalar coordinates are dropped, e.g. if a coordinate only exists on one of the two cubes, or if the coordinate values differ.

The handling of mismatched scalar coordinates might not be ideal, but the overall behaviour is pretty simple to get my head around. 👍

The new behaviour of add/subtract (which is mostly just the existing behaviour of multiply/divide, so not changed by this PR) is ... um... obscure?! 😕 But it includes:

  • If the shapes match, the coordinates are completely ignored.
  • If the shapes don't match, the non-scalar coordinates must have matching metadata (except if the coordinate only exists on the second cube?)
  • It never checks vector coordinate values! (So I can add a Cube with longitudes [0, 1, ..., 9] to a one with longitudes [12, 33, 34, 900, ...]. Is it just me, or is that a little odd?!)
  • Scalar coordinates are taken from the first cube, unless there is a scalar coordinate with the same metadata but a different value on the second cube ... in which case it is dropped.

Based on the discussion in this PR, the handling of output scalar coordinates seems like an improvement. But the handling of the non-scalar coordinates is all over the place. 😱

I'm all for consistency, but I'm wondering if this is the wrong approach to solving the var_name issue that was mentioned in #1887? What if instead we:

  • modify add/subtract to take the handling of output scalar coordinates from this PR
  • modify add/subtract to remove var_name from the non-scalar coordinate comparison,
  • and then make multiply/divide behave like add/subtract?

@rcomer
Copy link
Member Author

rcomer commented Apr 19, 2016

Can open; worms everywhere.... 😉

Having different var_names is one example of when the add/subtract checks cause issues.

Another is when you have vector coords which don't exist at all on other. So I think my example above (converting geopotential to geopotential_height) would break if you made multiply/divide like add/subtract. To get it working, I think the user would have to add a bunch of scalar coords to gravity which match all the vector coords on cube. In practice the user will probably give up on this and just do the operations on cube.data (with unfortunate implications for deferred loading and keeping track of units).

I also think it's reasonable for a user to want to do something like cube[n:] - cube[:-n], so I'm not really in favour of insisting that the vector coord points match up. A warning if they don't match might be handy, for example if you've failed to notice that your 2 global maps are 180 degrees offset from each other in longitude.

@rcomer
Copy link
Member Author

rcomer commented Apr 19, 2016

This might be a daft idea but...

Could the non-matching coords from other be renamed and added as aux_coords to the resulting cube? That way you don't lose any metadata in the operation, but it remains flexible as to which cubes can be broadcast to each other.

@rhattersley
Copy link
Member

Another is when you have vector coords which don't exist at all on other.

That seems like a simple to understand extension of the scalar coordinate handling. 👍

I also think it's reasonable for a user to want to do something like cube[n:] - cube[:-n], so I'm not really in favour of insisting that the vector coord points match up.

I can see how that would make some sense for a regular time series. Interestingly, that's not the way that something like pandas or xarray would work. (They would "line up" the cubes using the coordinate values of the two series and apply the subtraction where they overlap.) For that particular operation pandas supplies: time_series.diff().

If we're going to be more flexible, I still think we need to make the behaviour more predictable/easier to understand. For example, the var_name issue is only fixed by multiply/divide behaviour if the two cubes have exactly the same shape. If the two cubes have a different dimension order (e.g. one is a transpose of the other) then the var_name issue rears its head again.

Could the non-matching coords from other be renamed and added as aux_coords to the resulting cube?

Do we really need to do that? It doesn't seem very appealing.

@rcomer
Copy link
Member Author

rcomer commented Apr 19, 2016

I still think we need to make the behaviour more predictable/easier to understand. For example, the var_name issue is only fixed by multiply/divide behaviour if the two cubes have exactly the same shape. If the two cubes have a different dimension order (e.g. one is a transpose of the other) then the var_name issue rears its head again.

Agreed - I hadn't actually realised it was possible to broadcast cubes with different dimension orders!

Do we really need to do that? It doesn't seem very appealing.

No. Like I said, daft idea 😁.

@rcomer rcomer force-pushed the add-subtract-broadcast branch from fadc8ec to 9ccd0c4 Compare April 20, 2016 16:23
@rcomer
Copy link
Member Author

rcomer commented Apr 20, 2016

I've modified iris.util.as_compatible_shape so that it only matches coord names instead of coord definitions. This fixes the var_name issue when the cubes have different dimension order.

It also now adds a length 1 anonymous dimension if the coord isn't found on src_cube. Previously if I had
cube1 ['time', 'latitude', 'longitude']
and
cube2 ['time'], (with no scalar latitude/longitude coords)
cube2 did not broadcast to cube1 unless I transposed cube1 first (putting time last so that BA.broadcast_arrays passes). The transpose is no longer necessary with this change.

@rcomer
Copy link
Member Author

rcomer commented Apr 20, 2016

I just realised that in my as_compatible_shape change I haven't handled the case where src_cube has a dimension coord name that isn't on target_cube. I guess it would fail at the reshape step, but I should probably add a more helpful exception message.

@rcomer rcomer force-pushed the add-subtract-broadcast branch from 5134529 to b8a1cee Compare April 21, 2016 10:15
@rhattersley
Copy link
Member

I've modified iris.util.as_compatible_shape so that it only matches coord names instead of coord definitions. This fixes the var_name issue when the cubes have different dimension order.

So the coordinate comparison still doesn't pay attention to values (something which I don't have a particularly strong opinion on either way), and now doesn't pay attention to units, attributes, or the coordinate system. If we're going to ignore values then I guess I'm OK with that.

The transpose is no longer necessary with this change.

Very nice 👍

Before we get into the nit-picky business of finalising tests, and providing (and documenting) backwards compatibility... are we all happy with the proposed behaviour?

@rcomer
Copy link
Member Author

rcomer commented Apr 21, 2016

Before we get into the nit-picky business of finalising tests, and providing (and documenting) backwards compatibility... are we all happy with the proposed behaviour?

In the case where the shapes match, the coord names are still not checked (as far as I can see). Do we want to change that?

(I added the test_match_auxcoord at this point because yesterday's commit broke existing behaviour, so I wanted to nail that down).

Can I get a restart on my Travis TimedOutExceptions?

@rhattersley
Copy link
Member

Can I get a restart on my Travis TimedOutExceptions?

Done.

@rcomer
Copy link
Member Author

rcomer commented Apr 27, 2016

I just noticed that the user guide is still encouraging use of the ignore keyword (see note at the bottom of 10.1). We should probably correct that while we're here.

@rcomer
Copy link
Member Author

rcomer commented Jun 30, 2016

I've noticed recent PRs to make cube.transpose and iris.util.new_axis lazy. Could we do the same for iris.util.as_compatible_shape?

@mo-g
Copy link

mo-g commented Mar 21, 2019

@corinnebosley Does any of this relate to the POC you were doing today with xarray?

@bjlittle
Copy link
Member

@rcomer Remember this PR? I wouln't blame you if you didn't.

I just made the Status: Stalled for the moment, as it's in the area of cube arithmetic, which is scheduled to be refactored (as you know).

So I'm keen to get that work done, and we can see how it impacts / relates to this PR. Does that sound reasonable?

@rcomer
Copy link
Member Author

rcomer commented Jul 10, 2019

@bjlittle how could I forget? 😆

I'm fully expecting this PR to be superseded by whatever comes out of #2599. I guess there may be some ideas in here that we can use, but I suspect trying to resurrect the branch after all this time would be a hideous task.

@bjlittle
Copy link
Member

@rcomer Oh my gosh...

Righty, I'm going to put this PR to bed on the promise that we're finally going to address all these issues for iris version 3.0.0 😄

Maybe we should have some kinda party to celebrate if and when we finally get it all right?!

Cheers! I'll be in touch. I'd appreciate you test driving the forthcoming cube arithmetic feature branch.

@bjlittle bjlittle closed this Oct 22, 2019
@rcomer rcomer deleted the add-subtract-broadcast branch September 16, 2020 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants