Samos misc ancillaries #2133

mspelman07 · 2025-06-20T20:56:25Z

Addresses: https://github.com/metoppv/mo-blue-team/issues/902

This PR builds on the DistanceTo PR by adding the generation of a selection of additional ancillary files. Both should be merged into the Samos feature branch when it is ready.

Testing:

Ran tests and they passed OK
Added new tests for the new feature(s)

* Avoid cubelists of cubelists.
* Replace np.product with np.prod.
* Replace np.int.
* Replace np.NAN with np.nan.
* Replace np.NAN with np.nan.
* Replace np.NaN with np.nan.
* Replace np.product with np.prod.
* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.
* Simplify test in test_flatten.py to avoid cubelist within a cubelist.
* Replace Cube(None) with an alternative.
* Unpin environments for testing.
* Remove pygam version.
* Update improver_a.yml
* Remove non-essential dependencies from yml files and add pins.
* Minor edits following review comments.
* Modify docstring to better reflect the inputs provided.

brhooper · 2025-07-02T06:22:12Z

improver/generate_ancillaries/generate_miscellaneous_ancillaries.py

+def generate_roughness_length_at_sites(
+    roughness_length: Cube, neighbour_cube: Cube
+) -> Cube:
+    """Generate a roughness length ancillary cube at the site locations. This performs a
+    spot extraction of the roughness length data at the site locations.
+    Args:
+        roughness_length:
+            A cube containing the roughness length data.
+        neighbour_cube:
+            A cube containing information about the spot data sites and
+            their grid point neighbours.
+    Returns:
+        A cube containing the roughness length at the site locations.
+    """
+    return SpotExtraction(neighbour_selection_method="nearest")(
+        neighbour_cube, roughness_length
+    )


I've not reviewed this properly yet, but I have noticed that the produced vegetative_roughness_length cube contains time, forecast_reference_time and forecast_period coordinates:

vegetative_roughness_length / (m)   (spot_index: 14883)
    Dimension coordinates:
        spot_index                      x
    Auxiliary coordinates:
        altitude                        x
        latitude                        x
        longitude                       x
        met_office_site_id              x
        wmo_id                          x
    Scalar coordinates:
        forecast_period                 0 seconds
        forecast_reference_time         2017-06-13 06:00:00
        time                            2017-06-13 06:00:00
    Attributes:
        Conventions                     'CF-1.7'
        STASH                           m01s03i026
        model_grid_hash                 '9766b1eb83e154c2cf1687d7d05f5583a227f68b89e44b74fefc7028c6d995cb'
        source                          'Data from Met Office Unified Model'
        title                           'unknown'
        um_version                      '10.4'

This means that the EMOS and SAMOS CLIs fail to identify this cube as an additional predictor. Can time-related coordinates be removed please?

mspelman07 · 2025-07-07

I've updated this metadata so it now looks like:

The ancillary file has also been updated so should now meet your needs.

maxwhitemet

Thanks @mspelman07.
I think the PR is ready to move to second review after these very minor adjustments.

improver/generate_ancillaries/generate_miscellaneous_ancillaries.py

improver_tests/generate_ancillaries/test_miscellaneous_ancillaries.py

improver/generate_ancillaries/generate_miscellaneous_ancillaries.py

maxwhitemet · 2025-07-28T08:48:33Z

Thank you @mspelman07. I am happy with the changes. I'm moving this ticket to second review for @brhooper to have under their profile so they can merge the associated PRs when the Samos feature branch is ready, as you have indicated the need for.

bayliffe · 2025-08-11T14:33:05Z

@mspelman07 and @brhooper I provide a link below to the Makefile in the ancils repository.

https://github.com/MetOffice/improver_ancil/blob/master/Makefile

The Makefile can currently be run to reconstruct all of our derived ancillaries, meaning that if we change a site list, get an updated orography, etc., all the ancillaries derived from these basic ancils can be reproduced in their updated form. We need to ensure that any new ancillaries being produced for SAMOS are added to this Makefile so we don't find ourselves unable to reproduce or update them without wading through notebooks.

We could consider doing this after the release, but it would be nice to have it for when we start panicking and making changes close to the freeze date.

bayliffe

The various ancil generation codes need CLI interfaces, ideally with our preferred settings as default values. These can then be integrated into our ancil Makefile allowing these to be remade trivially when we change the sitelists or other basic inputs.

If there are inputs that are essentially static (shapefiles etc.) that need to be kept, these should be added to the improver_ancil repo unless their size is prohibitive (more than a few MB), in which case we should find an alternative location.

brhooper

Thanks @mspelman07, I've added one comment for you to look at.

brhooper · 2025-08-19T13:20:41Z

improver_tests/generate_ancillaries/test_miscellaneous_ancillaries.py

+    """Test a ValueError is raised if the chosen smoothing coefficient limits
+    are outside the range 0 to 0.5 inclusive."""


I don't think that this doc-string is describing this test.

mspelman07 · 2025-08-20

True, I've corrected this.

* samos_ancil_distance_to: Format fix. Add a test for using a projection that is unsuitable for the sites being processed. Remove hardcoded projection for distance to calculation. Tests updated to work. Still requires additional unit tests for new functionality. Update dictionary call

bayliffe · 2025-08-28T10:35:23Z

Assessment

I was a little nervous about the approach taken here for calculating the site-specific land fraction. The high-resolution (100m) CORINE data is regridded to the 2km UK domain and then neighbourhooding is applied to determine the land fraction around each grid cell, with a radius of 2.5km. The regridding does not preserve the information on the high resolution grid in any way, meaning we are throwing away potentially useful information. This won't matter in lots of areas, but it will matter around coastlines where this diagnostic is most likely to be important.

To test the importance of this I have recreated the ancillary using the underlying high resolution data. To do this I have created a neighbour file for the site list on the CORINE data's native grid. This allows me to choose the nearest 100x100m grid cell to each site. I can apply the same kind of data groupings Marcus has applied to get a 0 and 1 land mask. For each site I can then extract a 51x51 grid box (roughly 5x5km) centred on the site's nearest grid cell, so the same 2.5km radius. I can sum the land points and divide by the total box size to get a land fraction.
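The box-counting step described above can be sketched roughly as follows. This is only an illustration on a synthetic mask, not the code used for the ancillary: the function name, the `half_width` parameter and the clipping of boxes at the domain edge are all assumptions made for the example.

```python
import numpy as np


def land_fraction_at_site(
    land_mask: np.ndarray, row: int, col: int, half_width: int = 25
) -> float:
    """Fraction of land cells in a square box centred on (row, col).

    Hypothetical helper: half_width=25 gives a 51x51 box, i.e. about
    2.5 km either side of the site on a 100 m grid. Clipping the box at
    the domain edge is an assumption of this sketch.
    """
    r0 = max(row - half_width, 0)
    r1 = min(row + half_width + 1, land_mask.shape[0])
    c0 = max(col - half_width, 0)
    c1 = min(col + half_width + 1, land_mask.shape[1])
    box = land_mask[r0:r1, c0:c1]
    # Sum the land points (value 1) and divide by the total box size.
    return float(box.sum()) / box.size


# Synthetic 0/1 mask with a straight coastline: land in columns 0-99.
mask = np.zeros((200, 200), dtype=np.int8)
mask[:, :100] = 1

coastal = land_fraction_at_site(mask, 100, 100)  # site on the coastline
inland = land_fraction_at_site(mask, 100, 50)    # site well inland
corner = land_fraction_at_site(mask, 0, 0)       # box clipped at the edge
```

On this synthetic coastline the coastal site sees 25 land columns out of 51, so its land fraction is just under one half, while the inland site is entirely land.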

Having done this I can then compare the site-specific land fractions generated by the two approaches. The histogram below indicates the distribution of differences:

As we might expect, most differences are basically zero. What about the cases where there are differences?

Largest positive difference

The largest positive difference is for a site at Fratton; the high resolution method returns more land than the coarse method.

altitude	5.2 m
latitude	50.7969 degrees
longitude	-1.072 degrees
met_office_site_id	00379850
spot_index	12264

Coarse approach land fraction = 0.44
High res approach land fraction = 0.8

Which one seems right? We can look at a 5x5km box centred on the site:

Counting the sea-coloured pixels, dividing by the total image size and subtracting the result from 1, we get a land fraction of:
1 - (83018 / (593 * 591)) = 0.76

This is very close to the land fraction returned using the high-resolution data without coarsening.

Largest negative difference

The largest negative difference is for a site off the coast of Middlesbrough; the high resolution method returns less land than the coarse method. This site appears to be in the sea. It is labelled as "W0023" in the bestdata listings, suggesting it may be a wind turbine. If we grab a street view image looking towards the site we can see that this is indeed the case.

altitude	0.0 m
latitude	54.6453 degrees
longitude	-1.094 degrees
met_office_site_id	00382310
spot_index	14533

Coarse approach land fraction = 0.44 (again)
High res approach land fraction = 0.09

We can repeat our pixel counting exercise to determine a land fraction:

1 - (185914 / (466 * 453)) = 0.12

Again, this is much closer to the value returned by the high-resolution approach. As such, it seems worthwhile modifying this approach to use all of the information available in the CORINE dataset.
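The two pixel-counting estimates can be reproduced, and compared against both approaches, with a few lines of Python. The pixel counts, image sizes and land fractions are the figures quoted above; the function and variable names are illustrative only.

```python
def land_fraction_from_pixels(sea_pixels: int, width: int, height: int) -> float:
    """Land fraction of an image, given its count of sea-coloured pixels."""
    return 1.0 - sea_pixels / (width * height)


# Pixel counts and image dimensions as quoted in the assessment.
fratton = land_fraction_from_pixels(83018, 593, 591)
turbine = land_fraction_from_pixels(185914, 466, 453)

# Land fractions reported for each site: (coarse approach, high res approach).
reported = {"fratton": (0.44, 0.8), "turbine": (0.44, 0.09)}

for name, estimate in (("fratton", fratton), ("turbine", turbine)):
    coarse, high_res = reported[name]
    print(
        f"{name}: pixel estimate {estimate:.2f}, "
        f"coarse error {abs(estimate - coarse):.2f}, "
        f"high res error {abs(estimate - high_res):.2f}"
    )
```

In both cases the pixel estimate sits within about 0.04 of the high-resolution value and roughly 0.32 away from the coarse value.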

bayliffe · 2025-08-29T11:58:42Z

I've now implemented the alternative approach to calculating the land fractions using the high resolution data. The plot below over the Portsmouth area gives some examples of the changes, with the outer annulus showing the land fraction derived from the coarser data and the inner circle the new land fraction.

We can see that those points further inland have significantly lower land fractions when calculated by the new method compared with the old. Those on the coastal fringes also show differences, but they still have land fractions around 50% as we might expect for relatively straight coastlines.

The coarse approach used a 2500m radius on a 2km grid, meaning nearly every site was calculating the land fraction from a 3x3 grid. As we can see below in a plot of counts of unique land fraction values, this results in significant spikes at 3/9, 4/9, 5/9, 6/9, 7/9 and 8/9ths.

In the first implementation of the ancillary using the new method I have retained the 2.5km radius (though we might want to reduce that). This is calculated on the CORINE 100m resolution grid, meaning 51x51 grid cell regions are used in the calculation, which leads to more distinct land fractions, without the peaks seen for the coarse method.
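The difference in granularity between the two grids can be illustrated by counting the land fractions that are possible in each case. This sketch assumes a binary land mask and a square box in which every cell is counted equally, which simplifies both methods.

```python
from fractions import Fraction


def possible_land_fractions(n_cells: int) -> list:
    """All land fractions obtainable from n_cells binary (0/1) grid cells."""
    return [Fraction(k, n_cells) for k in range(n_cells + 1)]


coarse = possible_land_fractions(3 * 3)    # 3x3 box on the 2km grid
fine = possible_land_fractions(51 * 51)    # 51x51 box on the 100m grid

# Smallest possible step between two land fraction values on each grid.
coarse_step = float(coarse[1] - coarse[0])
fine_step = float(fine[1] - fine[0])

print(len(coarse), len(fine))
print(round(coarse_step, 4), round(fine_step, 6))
```

With only ten possible values on the 3x3 box, sites inevitably pile up on multiples of 1/9, whereas the 51x51 box allows 2602 values in steps of about 0.0004.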

maxwhitemet · 2025-09-02T15:07:15Z

I'm happy that the changes made better reflect reality. However, the ancillary's utility for SAMOS wind gust calibration is less straightforward:

Here we can see that the CRPS for the configuration with the new land fraction ancillary marginally outperforms that with the old ancillary; however, both degrade the forecast skill relative to a simple SAMOS configuration with no ancillaries. The figures in the legend are the CRPS for the first time point.

On a site-by-site basis we can see that the difference is often nominal between configurations with the old and new ancillary, although some site-specific CRPS benefits (green; a reduction) and degradations (red; an increase) are apparent.

Combining the mixed bag of results in the plot of 'Samos Simple - Land old' with the CRPS line graph, it seems the land fraction ancil shouldn't be used for wind gust calibration.

brhooper

Thanks @bayliffe, this all looks good to me.

maxwhitemet

Happy with the changes made. Approved 👍

gavinevans and others added 6 commits June 9, 2025 09:28

Updates to cube_combiner for new environment (metoppv#2128)

17c07fe

* Updates to cube_combiner for new environment * Update checksums

Add distance_to ancil file

0ab2b93

update docstring in file and add in unit tests for DistanceTo

440608b

Corrects the file heading

7bd75e9

Add in files and tests for generating the miscenllaneous files

f5a9e75

brhooper self-assigned this Jun 23, 2025

brhooper reviewed Jul 2, 2025

Add metadata updates to roughness length

924b2c6

brhooper assigned maxwhitemet and unassigned brhooper Jul 8, 2025

maxwhitemet requested changes Jul 15, 2025

maxwhitemet assigned mspelman07 and unassigned maxwhitemet Jul 15, 2025

mspelman07 added 3 commits July 25, 2025 10:02

Docstring updates from review

1baeec8

further review changes

c5d3586

Updates following review

242d003

mspelman07 assigned maxwhitemet and unassigned mspelman07 Jul 25, 2025

mspelman07 added the don't merge yet label Jul 25, 2025

maxwhitemet assigned brhooper and unassigned maxwhitemet Jul 28, 2025

bayliffe reviewed Aug 19, 2025

brhooper requested changes Aug 19, 2025

brhooper assigned mspelman07 and unassigned brhooper Aug 19, 2025

mspelman07 added 2 commits August 20, 2025 15:54

review response

3011512

add geopandas to environment

96fd862

bayliffe added 4 commits August 27, 2025 14:08

Remove hardcoded projection for distance to calculation. Tests update…

4de6621

…d to work. Still requires additional unit tests for new functionality.

Add a test for using a projection that is unsuitable for the sites be…

c9e8c41

…ing processed.

Format fix.

a5094aa

bayliffe dismissed brhooper’s stale review via acd792c August 27, 2025 13:54

Update misc ancils code that uses DistanceTo plugin to accept and use…

4d62d02

… an EPSG projection code.

bayliffe added 5 commits August 28, 2025 12:43

Add bounds guessing to neighbour finding. This has long been needed a…

6117963

…nd is needed here specifically for the corinne surface type ancillary that has no spatial coordinate bounds. Adds a simple test to demonstrate that this works.

Improver neighbour finding doc-string to indicate when the radius val…

26356a1

…ue is used and when not.

Rewrite the land fraction ancillary plugin to work on the native reso…

787d983

…lution of the input data, rather than coarsening this first, which yields more accurate land fraction estimates.

Add utility for extracting site list from a neighbour cube. This is u…

9f67239

…seful as the neighbour cube has missing altitudes filled in from an orography, ensuring all of the definitions are complete.

Re-implement the use of a neighbour cube for the land fraction genera…

7b44d37

…tion so we can be sure that the metadata of the resulting cube, specifically the altitude coordinate, is complete.

bayliffe assigned brhooper and maxwhitemet and unassigned bayliffe Aug 29, 2025

maxwhitemet assigned brhooper and unassigned brhooper and maxwhitemet Sep 2, 2025

brhooper previously approved these changes Sep 10, 2025

brhooper assigned bayliffe and unassigned brhooper Sep 10, 2025

Merge branch 'master' into samos_misc_ancillaries

b936dd1

bayliffe dismissed brhooper’s stale review via b936dd1 September 10, 2025 13:21

Formatting fix after conflict resolution.

7e371e6

brhooper approved these changes Sep 10, 2025

maxwhitemet approved these changes Sep 10, 2025

bayliffe merged commit 83eeff6 into metoppv:master Sep 10, 2025
7 checks passed

gavinevans removed the don't merge yet label Oct 1, 2025

5 participants