@mspelman07
Contributor

Addresses: https://github.com/metoppv/mo-blue-team/issues/902

This PR builds on the DistanceTo PR by adding the generation of a selection of additional ancillary files. Both PRs should be merged into the Samos feature branch when it is ready.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

gavinevans and others added 6 commits June 9, 2025 09:28
* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.
* Updates to cube_combiner for new environment

* Update checksums
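As an illustration only (not code taken from this PR), the replacement spellings named in the commit messages above look like this. The version numbers in the comments indicate where the old aliases were removed from NumPy and Python.

```python
import collections.abc
import unittest

import numpy as np

values = np.array([2, 3, 4])

# np.product was removed in NumPy 2.0; np.prod computes the same product.
total = np.prod(values)

# np.int (an alias of the builtin int) was removed in NumPy 1.24.
as_int = values.astype(int)

# np.NAN and np.NaN were removed in NumPy 2.0; np.nan remains.
missing = np.nan

# collections.Callable was removed in Python 3.10; use collections.abc.Callable.
is_callable = isinstance(len, collections.abc.Callable)


class ExampleTest(unittest.TestCase):
    def test_regex(self):
        # assertRaisesRegexp was removed in Python 3.12; assertRaisesRegex replaces it.
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not a number")
```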
@brhooper brhooper self-assigned this Jun 23, 2025
Comment on lines 83 to 99
def generate_roughness_length_at_sites(
    roughness_length: Cube, neighbour_cube: Cube
) -> Cube:
    """Generate a roughness length ancillary cube at the site locations. This performs a
    spot extraction of the roughness length data at the site locations.
    Args:
        roughness_length:
            A cube containing the roughness length data.
        neighbour_cube:
            A cube containing information about the spot data sites and
            their grid point neighbours.
    Returns:
        A cube containing the roughness length at the site locations.
    """
    return SpotExtraction(neighbour_selection_method="nearest")(
        neighbour_cube, roughness_length
    )
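With `neighbour_selection_method="nearest"`, each site takes the value of its nearest grid point. A minimal numpy sketch of that idea, assuming the neighbour information has already been reduced to integer (y, x) grid indices per site. `extract_nearest` is a hypothetical helper, not the IMPROVER `SpotExtraction` implementation, which also carries over the site metadata.

```python
import numpy as np


def extract_nearest(grid: np.ndarray, y_index: np.ndarray, x_index: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return the grid value at each site's nearest grid point.

    grid has shape (ny, nx); y_index and x_index hold one integer index per site.
    """
    # Fancy indexing picks one grid value per (y, x) pair, giving one value per site.
    return grid[y_index, x_index]


roughness = np.array([[0.01, 0.02, 0.05],
                      [0.10, 0.20, 0.50]])
sites_y = np.array([0, 1, 1])
sites_x = np.array([2, 0, 2])
site_values = extract_nearest(roughness, sites_y, sites_x)
```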
Contributor

I've not reviewed this properly yet, but I have noticed that the produced vegetative_roughness_length cube contains time, forecast_reference_time and forecast_period coordinates:

vegetative_roughness_length / (m)   (spot_index: 14883)
    Dimension coordinates:
        spot_index                             x
    Auxiliary coordinates:
        altitude                               x
        latitude                               x
        longitude                              x
        met_office_site_id                     x
        wmo_id                                 x
    Scalar coordinates:
        forecast_period             0 seconds
        forecast_reference_time     2017-06-13 06:00:00
        time                        2017-06-13 06:00:00
    Attributes:
        Conventions                 'CF-1.7'
        STASH                       m01s03i026
        model_grid_hash             '9766b1eb83e154c2cf1687d7d05f5583a227f68b89e44b74fefc7028c6d995cb'
        source                      'Data from Met Office Unified Model'
        title                       'unknown'
        um_version                  '10.4'

This means that the EMOS and SAMOS CLIs fail to identify this cube as an additional predictor. Can time-related coordinates be removed please?
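One way to do this, sketched under the assumption that the result is an iris `Cube`, which provides `coords(name)` and `remove_coord(name)`. `strip_time_coords` is a hypothetical helper name, and the small stand-in class exists only so the example runs without iris installed.

```python
TIME_COORDS = ("time", "forecast_reference_time", "forecast_period")


def strip_time_coords(cube):
    """Hypothetical helper: drop time-related coordinates from a cube in place.

    Intended for iris.cube.Cube, which provides coords() and remove_coord().
    """
    for name in TIME_COORDS:
        # cube.coords(name) returns a (possibly empty) list of matching coords,
        # so only coordinates that are actually present are removed.
        if cube.coords(name):
            cube.remove_coord(name)
    return cube


class _StandInCube:
    """Minimal stand-in mimicking the two iris Cube methods used above."""

    def __init__(self, coord_names):
        self.coord_names = list(coord_names)

    def coords(self, name):
        return [c for c in self.coord_names if c == name]

    def remove_coord(self, name):
        self.coord_names.remove(name)


cube = _StandInCube(
    ["spot_index", "altitude", "forecast_period", "time", "forecast_reference_time"]
)
strip_time_coords(cube)
```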

Contributor Author

I've updated this metadata so it now looks like:
[Image: updated cube metadata]

The ancillary file has also been updated so should now meet your needs.

@brhooper brhooper assigned maxwhitemet and unassigned brhooper Jul 8, 2025
Contributor

@maxwhitemet maxwhitemet left a comment

Thanks @mspelman07.
I think the PR is ready to move to second review after these very minor adjustments.

@maxwhitemet
Contributor

Thank you @mspelman07. I am happy with the changes. I'm moving this ticket to second review and assigning it to @brhooper so that they can merge the associated PRs when the Samos feature branch is ready, as you indicated was needed.

@maxwhitemet maxwhitemet assigned brhooper and unassigned maxwhitemet Jul 28, 2025
@bayliffe
Contributor

@mspelman07 and @brhooper I provide a link below to the Makefile in the ancils repository.

https://github.com/MetOffice/improver_ancil/blob/master/Makefile

The Makefile can currently be run to reconstruct all of our derived ancillaries. This means that if we change a site list, get an updated orography, etc., all the ancillaries derived from these basic ancils can be reproduced in their updated form. We need to ensure that any new ancillaries being produced for SAMOS are added to this Makefile so that we don't find ourselves unable to reproduce or update them without wading through notebooks.

We could consider doing this after the release, but it would be nice to have it in place for when we start panicking and making changes close to the freeze date.

Contributor

@bayliffe bayliffe left a comment

The various ancil generation codes need CLI interfaces, ideally with our preferred settings as default values. These can then be integrated into our ancil Makefile allowing these to be remade trivially when we change the sitelists or other basic inputs.

If there are inputs that are essentially static (shapefiles etc.) that need to be kept, these should be added to the improver_ancil repo unless their size is prohibitive (more than a few MB), in which case we should find an alternative location.

Contributor

@brhooper brhooper left a comment

Thanks @mspelman07, I've added one comment for you to look at.

Comment on lines 214 to 215
"""Test a ValueError is raised if the chosen smoothing coefficient limits
are outside the range 0 to 0.5 inclusive."""
Contributor

I don't think that this doc-string is describing this test.

Contributor Author

True, I've corrected this

@brhooper brhooper assigned mspelman07 and unassigned brhooper Aug 19, 2025
…d to work. Still requires additional unit tests for new functionality.
* samos_ancil_distance_to:
  Format fix.
  Add a test for using a projection that is unsuitable for the sites being processed.
  Remove hardcoded projection for distance to calculation. Tests updated to work. Still requires additional unit tests for new functionality.
  Update dictionary call
@bayliffe
Contributor

Assessment

I was a little nervous about the approach taken here for calculating the site-specific land fraction. The high-resolution (100 m) CORINE data is regridded to the 2 km UK domain, and neighbourhood processing with a radius of 2.5 km is then applied to determine the land fraction around each grid cell. The regridding does not preserve the information on the high-resolution grid in any way, meaning we are throwing away potentially useful information. This won't matter in many areas, but it will matter around coastlines, where this diagnostic is most likely to be important.

To test the importance of this, I have recreated the ancillary using the underlying high-resolution data. To do this, I created a neighbour file for the site list on the CORINE data's native grid, which allows me to choose the nearest 100x100 m grid cell to each site. I can apply the same data groupings Marcus has applied to get a binary (0 or 1) land mask. For each site I can then extract a 51x51 grid box (about 5x5 km) centred on the site's nearest grid cell, giving the same 2.5 km radius. Summing the land points and dividing by the total number of points in the box gives a land fraction.
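The high-resolution calculation described above can be sketched in numpy, assuming a binary land mask (1 = land, 0 = sea) on the 100 m grid and the nearest grid-cell indices for each site. `site_land_fraction` is a hypothetical name, and boxes near the grid edge are simply clipped here; that is one possible choice, not necessarily what the implementation does.

```python
import numpy as np


def site_land_fraction(land_mask: np.ndarray, y: int, x: int, half_width: int = 25) -> float:
    """Hypothetical helper: fraction of land cells in a square box centred on (y, x).

    With 100 m cells and half_width=25 the box is 51 x 51 cells, i.e. about 5 km
    across, matching the 2.5 km radius used by the coarse method.
    """
    # Clip the box at the grid edges so the slices stay in range.
    y0, y1 = max(y - half_width, 0), min(y + half_width + 1, land_mask.shape[0])
    x0, x1 = max(x - half_width, 0), min(x + half_width + 1, land_mask.shape[1])
    box = land_mask[y0:y1, x0:x1]
    # Sum the land points and divide by the number of cells in the box.
    return float(box.sum() / box.size)


# Toy mask: western half land, eastern half sea.
mask = np.zeros((101, 101), dtype=np.int8)
mask[:, :50] = 1
fraction = site_land_fraction(mask, 50, 50)  # 25 land columns out of 51
```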

Having done this I can then compare the site-specific land fractions generated by the two approaches. The histogram below indicates the distribution of differences:

[Image: histogram of site land fraction differences between the two approaches]

As we might expect, most differences are essentially zero. What about the cases where there are differences?
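Comparing the two sets of site values amounts to a difference, a histogram and an argmax/argmin. A sketch with made-up numbers (not the real site data), assuming both arrays are ordered by `spot_index`:

```python
import numpy as np

# Illustrative values only: land fractions for five sites from each method.
coarse = np.array([0.44, 1.00, 0.44, 0.00, 0.67])
high_res = np.array([0.80, 1.00, 0.09, 0.00, 0.66])

# Positive: the high-resolution method finds more land; negative: it finds less.
difference = high_res - coarse
counts, bin_edges = np.histogram(difference, bins=np.linspace(-1, 1, 21))

largest_positive = int(np.argmax(difference))  # index of the largest positive difference
largest_negative = int(np.argmin(difference))  # index of the largest negative difference
```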

Largest positive difference

The largest positive difference is for a site at Fratton; the high resolution method returns more land than the coarse method.

altitude	5.2 m
latitude	50.7969 degrees
longitude	-1.072 degrees
met_office_site_id	00379850
spot_index	12264

Coarse approach land fraction = 0.44
High res approach land fraction = 0.8

Which one seems right? We can look at a 5x5km box centred on the site:
[Image: 5x5 km box centred on the Fratton site]

Counting the sea-coloured pixels and dividing by the total number of pixels in the image, we get a land fraction of:
1 - (83018 / (593 * 591)) = 0.76

This is very close to the land fraction returned using the high-resolution data without coarsening.
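The pixel-counting check is one minus the sea fraction. Reproducing the two calculations quoted in this comment (the pixel counts and image sizes are taken from the text; the helper name is hypothetical):

```python
def land_fraction_from_pixels(sea_pixels: int, width: int, height: int) -> float:
    """Land fraction = 1 - (sea-coloured pixels / total pixels in the image)."""
    return 1 - sea_pixels / (width * height)


fratton = land_fraction_from_pixels(83018, 593, 591)
turbine = land_fraction_from_pixels(185914, 466, 453)
print(round(fratton, 2), round(turbine, 2))  # 0.76 0.12
```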

Largest negative difference

The largest negative difference is for a site off the coast of Middlesbrough; the high-resolution method returns less land than the coarse method. This site appears to be in the sea. It is labelled "W0023" in the bestdata listings, suggesting it may be a wind turbine, and a street view image looking towards the site confirms that this is indeed the case.

[Image: street view looking towards site W0023]
altitude	0.0 m
latitude	54.6453 degrees
longitude	-1.094 degrees
met_office_site_id	00382310
spot_index	14533

Coarse approach land fraction = 0.44 (again)
High res approach land fraction = 0.09

We can repeat our pixel counting exercise to determine a land fraction:

[Image: 5x5 km box centred on site W0023]

1 - (185914 / (466 * 453)) = 0.12

Again, this is much closer to the value returned by the high-resolution approach. As such, it seems worthwhile modifying this approach to use all of the information available in the CORINE dataset.

…nd is needed here specifically for the corinne surface type ancillary that has no spatial coordinate bounds. Adds a simple test to demonstrate that this works.
…lution of the input data, rather than coarsening this first, which yields more accurate land fraction estimates.
…seful as the neighbour cube has missing altitudes filled in from an orography, ensuring all of the definitions are complete.
…tion so we can be sure that the metadata of the resulting cube, specifically the altitude coordinate, is complete.
@bayliffe
Contributor

I've now implemented the alternative approach to calculating the land fractions using the high resolution data. The plot below over the Portsmouth area gives some examples of the changes, with the outer annulus showing the land fraction derived from the coarser data and the inner circle the new land fraction.

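[Image: land fractions around Portsmouth; outer annulus from the coarse data, inner circle from the new method]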

We can see that points further inland have significantly lower land fractions when calculated with the new method than with the old. Those on the coastal fringes also show differences, but they still have land fractions of around 50%, as we might expect for relatively straight coastlines.

The coarse approach used a 2500 m radius on a 2 km grid, meaning nearly every site calculated its land fraction from a 3x3 grid. As the plot of counts of unique land fraction values below shows, this results in significant spikes at 3/9, 4/9, 5/9, 6/9, 7/9 and 8/9.

[Image: counts of unique land fraction values, coarse method]

In the first implementation of the ancillary using the new method I have retained the 2.5 km radius (though we might want to reduce that). This is calculated on the 100 m resolution CORINE grid, meaning 51x51 grid cell regions are used in the calculation. This yields a much wider range of distinct land fractions, without the peaks seen for the coarse method.
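The arithmetic behind the spikes: a square neighbourhood spans a fixed number of cells, and a box of n cells can only produce the fractions 0/n, 1/n, ..., n/n. A sketch of that count, assuming a square neighbourhood whose half-width is the radius in whole grid cells (rounded down); `box_side` is a hypothetical helper.

```python
def box_side(radius_m: float, spacing_m: float) -> int:
    """Cells per side of a square neighbourhood: 2 * floor(radius / spacing) + 1."""
    return 2 * int(radius_m // spacing_m) + 1


coarse_side = box_side(2500, 2000)  # 2 km UK grid
fine_side = box_side(2500, 100)     # 100 m CORINE grid

# A box of n cells can only produce the land fractions 0/n, 1/n, ..., n/n.
coarse_levels = coarse_side**2 + 1
fine_levels = fine_side**2 + 1
print(coarse_side, coarse_levels, fine_side, fine_levels)  # 3 10 51 2602
```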

[Image: counts of unique land fraction values, new method]

@bayliffe bayliffe assigned brhooper and maxwhitemet and unassigned bayliffe Aug 29, 2025
@maxwhitemet
Contributor

I'm happy that the changes better reflect reality. However, the ancillary's utility for SAMOS wind gust calibration is less straightforward:

Here we can see that the CRPS for the configuration with the new land fraction ancillary marginally outperforms that with the old ancillary, however both degrade the forecast skill relative to a simple SAMOS configuration with no ancillaries. The figures in the legend are the CRPS for the first time point.
[Image: CRPS line graph for each SAMOS configuration]
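For readers less familiar with the score: CRPS is negatively oriented, so lower is better, which is why a reduction counts as a benefit in the site-by-site plots. If a calibrated forecast is a normal distribution N(mu, sigma^2), the CRPS against an observation y has the closed form sigma * [z(2Φ(z) - 1) + 2φ(z) - 1/√π] with z = (y - mu) / sigma. This is a generic sketch assuming a Gaussian predictive distribution; the distribution used for wind gusts in the trials above may differ.

```python
import math


def crps_normal(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of a normal forecast N(mu, sigma**2) against observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal CDF
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


# For the same observation, a sharper but still accurate forecast scores lower (better).
sharp = crps_normal(mu=10.0, sigma=1.0, y=10.5)
broad = crps_normal(mu=10.0, sigma=3.0, y=10.5)
```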

On a site-by-site basis we can see that the difference is often nominal between configurations with the old and new ancillary, although some site-specific CRPS benefits (green; a reduction) and degradations (red; an increase) are apparent.

[Images: site-by-site CRPS differences between configurations]

Combining the mixed bag of results in the 'Samos Simple - Land old' plot with the CRPS line graph, it seems the land fraction ancil shouldn't be used for wind gust calibration.

@maxwhitemet maxwhitemet assigned brhooper and unassigned brhooper and maxwhitemet Sep 2, 2025
brhooper previously approved these changes Sep 10, 2025
Contributor

@brhooper brhooper left a comment

Thanks @bayliffe, this all looks good to me.

@brhooper brhooper assigned bayliffe and unassigned brhooper Sep 10, 2025
Contributor

@maxwhitemet maxwhitemet left a comment

Happy with the changes made. Approved 👍

@bayliffe bayliffe merged commit 83eeff6 into metoppv:master Sep 10, 2025
7 checks passed
5 participants