@rcomer rcomer commented Aug 25, 2021

🚀 Pull Request

Description

Previously, @duncanwp proposed changing coord.collapsed to use the mean of the points, rather than the current behaviour of taking the mid-point between the newly calculated bounds (#3029). That would have been a breaking change and, I think, there may be cases where the mean isn't the best choice.

What if we made the treatment of points a user choice?

Make a cube

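import iris
import iris.analysis
import iris.coords
import iris.cube

# 1-D cube with a dimension coordinate 'foo' and an auxiliary coordinate 'bar'
cube = iris.cube.Cube(range(5))
coord1 = iris.coords.DimCoord(range(5), long_name='foo')
coord2 = iris.coords.AuxCoord([5, 2, 4, 42, 0], long_name='bar')
cube.add_dim_coord(coord1, 0)
cube.add_aux_coord(coord2, 0)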

Default behaviour is unchanged

print(cube.collapsed('foo', iris.analysis.MEAN))
unknown / (unknown)                 (scalar cube)
    Scalar coordinates:
        bar                         21, bound=(0, 42)
        foo                         2, bound=(0, 4)
    Cell methods:
        mean                        foo

But maybe we decide that the mean would be a better choice for 'bar':

import numpy as np
print(cube.collapsed('foo', iris.analysis.MEAN, points_funcs=dict(bar=np.mean)))
    Scalar coordinates:
        bar                         10.6, bound=(0, 42)
        foo                         2, bound=(0, 4)
    Cell methods:
        mean                        foo
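
The two values reported for 'bar' can be checked by hand. Below is a minimal sketch in plain Python, assuming only what the output above shows: that the collapsed bounds span the minimum and maximum of the original points, and that the default point is the mid-point of those bounds. The helper `collapse_points` is hypothetical and is not part of Iris.

```python
from statistics import fmean


def collapse_points(points, points_func=None):
    """Hypothetical helper: return (point, bounds) for a collapsed 1-D coordinate."""
    lower, upper = min(points), max(points)
    if points_func is None:
        # Default: mid-point between the newly calculated bounds.
        point = (lower + upper) / 2
    else:
        # User choice: apply the supplied function to the original points.
        point = points_func(points)
    return point, (lower, upper)


bar = [5, 2, 4, 42, 0]
default_point, bounds = collapse_points(bar)
mean_point, _ = collapse_points(bar, points_func=fmean)
print(default_point, bounds)
print(mean_point)
```

The mid-point depends only on the two extreme values, so the outlier 42 pulls it to 21, while the mean of all five points is 10.6.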

In applications that calculate climatologies (#4098), neither the mid-point nor the mean really works, because averaging Januarys can give us July:

import datetime

import cf_units

times = [datetime.datetime(year, 1, 1) for year in range(2000, 2004)]
tunit = cf_units.Unit('days since 1970-01-01')
tcoord = iris.coords.DimCoord(tunit.date2num(times), units=tunit, standard_name='time')

print(tcoord.collapsed())
print(tcoord.collapsed(points_func=np.mean))
DimCoord([2001-07-02 00:00:00], bounds=[[2000-01-01 00:00:00, 2003-01-01 00:00:00]], standard_name='time', calendar='gregorian')
DimCoord([2001-07-02 06:00:00], bounds=[[2000-01-01 00:00:00, 2003-01-01 00:00:00]], standard_name='time', calendar='gregorian')
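
The same arithmetic can be reproduced without Iris or cf_units. This is only a sketch using the standard library, assuming the time values are held as days since 1970-01-01 as in the example above; the variable names are illustrative.

```python
import datetime

times = [datetime.datetime(year, 1, 1) for year in range(2000, 2004)]
epoch = datetime.datetime(1970, 1, 1)

# Convert each date to a number of days since the epoch.
days = [(t - epoch) / datetime.timedelta(days=1) for t in times]

# Mid-point between the first and last value, and the mean of all values.
midpoint_days = (min(days) + max(days)) / 2
mean_days = sum(days) / len(days)

midpoint = epoch + datetime.timedelta(days=midpoint_days)
mean_time = epoch + datetime.timedelta(days=mean_days)
print(midpoint)
print(mean_time)
```

Both results land in early July 2001, even though every input is 1 January. They differ by six hours because 2000 is a leap year, so the four dates are not evenly spaced.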

But we could reasonably use a max or min to get a sensible time of year:

print(tcoord.collapsed(points_func=np.max))
print(tcoord.collapsed(points_func=np.min))
DimCoord([2003-01-01 00:00:00], bounds=[[2000-01-01 00:00:00, 2003-01-01 00:00:00]], standard_name='time', calendar='gregorian')
DimCoord([2000-01-01 00:00:00], bounds=[[2000-01-01 00:00:00, 2003-01-01 00:00:00]], standard_name='time', calendar='gregorian')
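
A quick standard-library check of why max and min keep a sensible time of year: each simply selects one of the original dates, so the month is preserved. This is an illustrative sketch only and does not use the proposed points_func argument.

```python
import datetime

times = [datetime.datetime(year, 1, 1) for year in range(2000, 2004)]

# min and max each return one of the input dates unchanged.
earliest, latest = min(times), max(times)
print(earliest, latest)
```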

Obviously it would need a lot more work to do this properly, including addressing aggregated_by.


Consult Iris pull request check list

@rcomer rcomer marked this pull request as draft August 25, 2021 17:57
@rcomer rcomer closed this Oct 20, 2021
@rcomer rcomer deleted the collapse-configurable-points branch December 20, 2023 11:56