@gavinevans
Contributor

Part of https://github.com/metoppv/mo-strategic-issues/issues/150

Description
This PR provides an initial set of changes to reduce the errors and failures when upgrading the package versions used by the improver repo. Please note that this is targeting a feature branch, rather than master. I've also unpinned the environments used by GitHub Actions.

The issues addressed in this PR:

  • Iris now only allows Cube instances within a CubeList, whereas previously a CubeList could contain another CubeList. I've corrected some instances of this behaviour.
  • np.product => np.prod
  • np.int => np.int32
  • np.NAN => np.nan
  • np.NaN => np.nan
  • assertRaisesRegexp => assertRaisesRegex
  • Cube(None) => Cube(shape=(0,))
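
As a rough illustration of the NumPy and unittest renames listed above, here is a minimal sketch. The variable names and the test case are hypothetical and are not taken from the improver code base; the Iris-specific changes are not shown because they need an Iris installation.

```python
import unittest

import numpy as np

# Hypothetical values, used only to illustrate the renames.
shape = (2, 3, 4)

n_points = np.prod(shape)                  # previously np.product(shape)
indices = np.zeros(shape, dtype=np.int32)  # previously dtype=np.int
missing = np.nan                           # previously np.NAN or np.NaN


class ExampleTest(unittest.TestCase):
    """Hypothetical test case showing the renamed assertion method."""

    def test_error_message(self):
        # Previously spelled assertRaisesRegexp.
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not a number")
```

Note that np.int was an alias for the built-in int, so choosing np.int32 fixes the width explicitly; whether that is appropriate depends on the values being stored.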

There are a large number of other errors that will be addressed by forthcoming PRs.

Feature branch tests:

922 failed, 5633 passed, 140 skipped, 1 xpassed, 3827 warnings, 493 errors in 131.53s (0:02:11)

i.e. a ~20% failure rate.

Following this PR:

805 failed, 6120 passed, 228 skipped, 1 xpassed, 3900 warnings, 11 errors in 124.60s (0:02:04)

i.e. a ~12% failure rate.
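
The description does not say how the failure rates were calculated. One way to get figures close to those quoted, assuming that failures and errors both count as unsuccessful and that skipped and xpassed tests are left out, is sketched below; the function name is hypothetical.

```python
def failure_rate(failed: int, errors: int, passed: int) -> float:
    """Fraction of executed tests that failed or errored.

    Assumption: skipped and xpassed tests are excluded from the total.
    """
    unsuccessful = failed + errors
    return unsuccessful / (unsuccessful + passed)


# Counts taken from the two pytest summary lines quoted above.
before = failure_rate(failed=922, errors=493, passed=5633)
after = failure_rate(failed=805, errors=11, passed=6120)

print(f"Feature branch: {before:.1%}; following this PR: {after:.1%}")
```

Under these assumptions the two rates round to the quoted ~20% and ~12%. Including skipped tests in the denominator gives slightly lower values.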

Testing:

  • Ran tests and they passed OK

Contributor

@mspelman07 mspelman07 left a comment

Mostly I think this looks good; I just have a couple of minor comments.

brhooper added a commit to brhooper/improver that referenced this pull request May 28, 2025
Contributor

@bayliffe bayliffe left a comment

As discussed, pre-commit seems to be missing by default. Worth talking to Carwyn about whether this should actually be included in the environment or if he wants it kept out and for this to be a site-specific concern. We do rely on it in the local tests, suggesting maybe it should be in there.

I can't quite get the same test success / fail ratio as you. I have run the tests using bin/improver-tests unit with the conda environment you've provided selected for loading. In this way I get the following:

collected 7161 items / 497 deselected / 2 skipped / 6664 selected
where the 497 tests deselected are the acceptance tests.
377 failed, 6052 passed, 225 skipped, 497 deselected, 1 xpassed, 2819 warnings, 11 errors in 257.82s (0:04:17)

Contributor

@bayliffe bayliffe left a comment

Happy with this as our starting point.

@bayliffe bayliffe merged commit 9e809ce into metoppv:mobt_800_env_upgrade_feature_branch Jun 9, 2025
5 of 8 checks passed
@cpelley
Contributor

cpelley commented Jun 10, 2025

The pre-commit environment is the developer environment and should ideally remain separate.

You install pre-commit once and use that for all your repositories (DAGrunner, improver, paraflow, improver_suite, Joe, Bob etc.):

pip install --user pre-commit

It will respect your currently active Conda or virtualenv environment.

I just sourced the eppenv environment, then ran pre-commit on improver, and all was fine.

I see that the improver_production conda environment currently contains pre-commit. I don't remember putting that in there. This is a problem as pre-commit shouldn't be part of the operational improver environment. Also, this means that it isn't so easily upgraded.

bayliffe added a commit that referenced this pull request Jul 30, 2025
* First cut changes in preparation for environment upgrade (#2124)

* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.

* Updates to cube_combiner for new environment (#2128)

* Updates to cube_combiner for new environment

* Update checksums

* Mobt913: Environment upgrade - Metadata (#2137)

* Fix failing tests following environment change.

* Handle bug where num2date is returning cftime datetime instead of python datetime where it shouldn't

* Mobt915: environment upgrade - Orographic enhancement (#2138)

* Cast variable to float32 to resolve environment upgrade error

* Fix pre-commit reformatting

* Environment upgrade: expected value (#2139)

* Modify expected scalar values in unit tests to have specific float32 type to pass more rigorous numpy assert_allclose test.

* Update checksums

* Mobt 914 env update nowcasting (#2142)

* Ensures precision of output data are float32

* Reduces precision requirement of unit tests from 7 decimal places to 6.

* Fixes unit tests that were passing the wrong arguments to the methods they were testing

* Developer_tools: Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects (#2134)

* Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects

* Improves imports

* Simplifies conversion of cube_attrs to dict

* Environment upgrade: spot data (#2141)

* Change expected results in neighbour selection tied cases due to slight variation in return from coordinate transform.

* Change from using numpy float type to native float type when converting type of user provided percentiles.

* Fix up acceptance test which collects warnings. Update checksums.

* Mobt 916 environment upgrade regrid (#2144)

* Resolve failing tests following environment upgrade

* Fix pre_commit requirements

* Remove print statement and remove lots of unnecessarily added trailing .0s.

---------

Co-authored-by: benjamin.ayliffe <[email protected]>

* Upgrade utilities unit tests (#2140)

* Remove test for non-Cube inputs to a CubeList, as this is no longer possible.

* Unpack list for use when slicing in cube_extraction.

* Edits to test_temporal.py.

* Correction to datatypes within temporal_interpolation.py.

* Corrections to gradient_between_vertical_levels

* GAM corrections copied from #2126 for completeness.

* Edits, so that load unit tests pass, although we're no longer testing the lazy loading successfully, so this may need reconsideration.

* Alter ordering of bounds in mathematical_operations.py due to underlying change in iris.

* Retain intended indexing behaviour within neighbourhood_tools.py by converting list to tuple.

* Changes to allow more tolerance for solar interpolation tests, where data type differences can impact precision.

* Update load unit tests to override iris setting that prevents lazy loading for small files.

* Revert changes to solar.py, which are not required, following allowing a greater tolerance in the test_solar_interpolation unit test.

* Minor edit to test_load.py

* Minor edit to return a python datetime from the iris_time_to_datetime function, rather than a cftime datetime object.

* Simplification to use to_real_datetime method available in iris.

* Mobt906 Calibration unit tests upgrades for new environment (#2127)

* Fix tests in dz_rescaling. Added re.escape() call to regex pattern match.

* Update IMPROVER choose() function as numpy.lib.index_tricks.ndindex() moved to numpy.ndindex().

* Fix faulty plugin call in improver_tests/calibration/test_init.py.

* Fix tests in reliability_calibration. Added re.escape() call to regex pattern match.

* Fix tests in calibration/utilities. Added re.escape() call to regex pattern match.

* Modify rainforests_calibration/conftest.py so that both treelite and treelite_runtime are checked for before attempting to run relevant tests, rather than only treelite.

* Change expected results for failing ensemble_calibration/test_EstimateCoefficientsForEnsembleCalibration.py tests.

* Add treelite_runtime module dependency to environments containing treelite as a dependency.

* Revert changes related to rainforest calibration unit tests and treelite_runtime. More work is required to properly handle upgrading from treelite 3.x to 4.y.

* Change expected results for 2 tests in test_CalibratedForecastDistributionParameters.py.

* Revert changes to tests which were failing due to Regex pattern matching issues with numpy types being displayed. Instead, map these objects to base Python types, e.g str instead of np.str.

* Environment upgrade: generate ancillaries (#2143)

* Make constant float64 type where it is used to preserve original results.

* Changes to orographic smoothing coefficient generation to avoid type escalation to float64.

* Formatting update.

* enforce data type for cloud condensation level plugins (#2146)

* Acceptance test batch 2 (#2148)

* Enforce float32.

* Enforce input to cube.collapsed is the same type as the output.

* Get dtype without using .data.

* Modification to enforce the datatype within the mode_aggregator method and add an assertion to the unit tests to show that the output dtype matches the input dtype.

* Acceptance test batch 5 (#2149)

* Minor edit to textural.py

* Move setting of data type on the threshold values into the threshold plugin.

* Enforce data type (#2147)

* Update environment for rainforests calibration (#2136)

This patch provides changes to the apply_rainforests_calibration
plugin in order to support the IMPROVER repo environment update (see
Github issue #150).

Specifically, this patch sets lower bound versions
for the treelite and lightgbm packages to support an up-to-date
Python environment. Corresponding code changes for these version
upgrades are included. Furthermore, I have added a few minor quality
changes to the rainforests code - see below list.

This changeset includes:

* Set lower bound for treelite and lightgbm package versions
* Add tl2cgen package (needed for treelite v4.0.0 and up)
* Changes to rainforests calibration code to support updated package
versions
* Update docs to reflect new package versions
* Quality of life changes to rainforests calibration code:
     * Ensure correct type annotations are used
     * Use custom exceptions where feasible
     * Use StrEnum for model name to constrain possible argument values
     * Improved error handling
     * Additional clarity in docstrings
* Changes to rainforests calibration unit tests:
     * Update unit tests to support the above changes
     * Add additional rainforests calibration unit tests
     * Move some code that supports unit tests, to avoid duplication
* Update rainforests calibration acceptance tests and checksums 

Relates to Github issue #150

* Resolve calendar issues in kgos, checksums, and a deprecated warning (#2151)

* Acceptance test batch 11 (#2150)

* Resolve calendar and float dtype issues following environment upgrade

* Repeat for weighted_blending

* Mobt 943 acceptance tests batch 9 (#2152)

* fixes mobt_943_acceptance_tests_batch_9 following environment changes
- Update calendars on KGOs
- Fix returning of float32 instead of int8 for categorical inputs
- Remove surprising use of iris.cube.CubeList() in acceptance test, causing test failures

* Pre-commit formatting fixes

* Simplify typecasting from previous commit

* Remove change of checksums so left until end of all acceptance testing

* Update retrieval of package version. (#2161)

Co-authored-by: Gavin Evans <[email protected]>

* MOBT-290 Environment upgrade: modal categories (#2157)

* Accommodate type overflow in modal aggregator whilst resetting the data type at the end of the process.

* Style fix

* Move dtype setting out of numpy.clip call as this causes a casting error. (#2158)

* Enforce float32 type for accumulated precipitation in nowcast accumulate code. (#2159)

* Remove timezone delocalisation from estimate-emos-from-table CLI to allow correct filtering of pyarrow parquet files. (#2156)

* Update checksums for acceptance test data following all of the environment upgrade work. (#2160)

* Cloud condensation level updates for the new environment (#2168)

* update checksums for cloud condensation level

* Limits cloud base pressure and temperature to surface values if super-saturated

---------

Co-authored-by: Stephen Moseley <[email protected]>

---------

Co-authored-by: gavinevans <[email protected]>
Co-authored-by: mspelman07 <[email protected]>
Co-authored-by: Max <[email protected]>
Co-authored-by: Stephen Moseley <[email protected]>
Co-authored-by: Ben Hooper <[email protected]>
Co-authored-by: Rachael Esler <[email protected]>
mo-philrelton added a commit that referenced this pull request Sep 8, 2025
* Fill value (#2129)

* update save function to include fill value

* update docstring

* update docstring

* Update tests for fill value

* 2154 BUG: Convective cloud base and top can be unphysical (#2155)

* Limits cloud base pressure and temperature to surface values if super-saturated

* Updates checksums for revised KGO

* Fix recalibration docstring (#2166)

* Use np.interp instead of scipy interp1d

* Revert "Use np.interp instead of scipy interp1d"

This reverts commit 3ea6cd6.

* fix docstring

---------

Co-authored-by: Belinda Trotta <[email protected]>

* Environment upgrade feature branch (#2167)

* First cut changes in preparation for environment upgrade (#2124)

* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.

* Updates to cube_combiner for new environment (#2128)

* Updates to cube_combiner for new environment

* Update checksums

* Mobt913: Environment upgrade - Metadata (#2137)

* Fix failing tests following environment change.

* Handle bug where num2date is returning cftime datetime instead of python datetime where it shouldn't

* Mobt915: environment upgrade - Orographic enhancement (#2138)

* Cast variable to float32 to resolve environment upgrade error

* Fix pre-commit reformatting

* Environment upgrade: expected value (#2139)

* Modify expected scalar values in unit tests to have specific float32 type to pass more rigorous numpy assert_allclose test.

* Update checksums

* Mobt 914 env update nowcasting (#2142)

* Ensures precision of output data are float32

* Reduces precision requirement of unit tests from 7 decimal places to 6.

* Fixes unit tests that were passing the wrong arguments to the methods they were testing

* Developer_tools: Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects (#2134)

* Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects

* Improves imports

* Simplifies conversion of cube_attrs to dict

* Environment upgrade: spot data (#2141)

* Change expected results in neighbour selection tied cases due to slight variation in return from coordinate transform.

* Change from using numpy float type to native float type when converting type of user provided percentiles.

* Fix up acceptance test which collects warnings. Update checksums.

* Mobt 916 environment upgrade regrid (#2144)

* Resolve failing tests following environment upgrade

* Fix pre_commit requirements

* Remove print statement and remove lots of unnecessarily added trailing .0s.

---------

Co-authored-by: benjamin.ayliffe <[email protected]>

* Upgrade utilities unit tests (#2140)

* Remove test for non-Cube inputs to a CubeList, as this is no longer possible.

* Unpack list for use when slicing in cube_extraction.

* Edits to test_temporal.py.

* Correction to datatypes within temporal_interpolation.py.

* Corrections to gradient_between_vertical_levels

* GAM corrections copied from #2126 for completeness.

* Edits, so that load unit tests pass, although we're no longer testing the lazy loading successfully, so this may need reconsideration.

* Alter ordering of bounds in mathematical_operations.py due to underlying change in iris.

* Retain intended indexing behaviour within neighbourhood_tools.py by converting list to tuple.

* Changes to allow more tolerance for solar interpolation tests, where data type differences can impact precision.

* Update load unit tests to override iris setting that prevents lazy loading for small files.

* Revert changes to solar.py, which are not required, following allowing a greater tolerance in the test_solar_interpolation unit test.

* Minor edit to test_load.py

* Minor edit to return a python datetime from the iris_time_to_datetime function, rather than a cftime datetime object.

* Simplification to use to_real_datetime method available in iris.

* Mobt906 Calibration unit tests upgrades for new environment (#2127)

* Fix tests in dz_rescaling. Added re.escape() call to regex pattern match.

* Update IMPROVER choose() function as numpy.lib.index_tricks.ndindex() moved to numpy.ndindex().

* Fix faulty plugin call in improver_tests/calibration/test_init.py.

* Fix tests in reliability_calibration. Added re.escape() call to regex pattern match.

* Fix tests in calibration/utilities. Added re.escape() call to regex pattern match.

* Modify rainforests_calibration/conftest.py so that both treelite and treelite_runtime are checked for before attempting to run relevant tests, rather than only treelite.

* Change expected results for failing ensemble_calibration/test_EstimateCoefficientsForEnsembleCalibration.py tests.

* Add treelite_runtime module dependency to environments containing treelite as a dependency.

* Revert changes related to rainforest calibration unit tests and treelite_runtime. More work is required to properly handle upgrading from treelite 3.x to 4.y.

* Change expected results for 2 tests in test_CalibratedForecastDistributionParameters.py.

* Revert changes to tests which were failing due to Regex pattern matching issues with numpy types being displayed. Instead, map these objects to base Python types, e.g str instead of np.str.

* Environment upgrade: generate ancillaries (#2143)

* Make constant float64 type where it is used to preserve original results.

* Changes to orographic smoothing coefficient generation to avoid type escalation to float64.

* Formatting update.

* enforce data type for cloud condensation level plugins (#2146)

* Acceptance test batch 2 (#2148)

* Enforce float32.

* Enforce input to cube.collapsed is the same type as the output.

* Get dtype without using .data.

* Modification to enforce the datatype within the mode_aggregator method and add an assertion to the unit tests to show that the output dtype matches the input dtype.

* Acceptance test batch 5 (#2149)

* Minor edit to textural.py

* Move setting of data type on the threshold values into the threshold plugin.

* Enforce data type (#2147)

* Update environment for rainforests calibration (#2136)

This patch provides changes to the apply_rainforests_calibration
plugin in order to support the IMPROVER repo environment update (see
Github issue #150).

Specifically, this patch sets lower bound versions
for the treelite and lightgbm packages to support an up-to-date
Python environment. Corresponding code changes for these version
upgrades are included. Furthermore, I have added a few minor quality
changes to the rainforests code - see below list.

This changeset includes:

* Set lower bound for treelite and lightgbm package versions
* Add tl2cgen package (needed for treelite v4.0.0 and up)
* Changes to rainforests calibration code to support updated package
versions
* Update docs to reflect new package versions
* Quality of life changes to rainforests calibration code:
     * Ensure correct type annotations are used
     * Use custom exceptions where feasible
     * Use StrEnum for model name to constrain possible argument values
     * Improved error handling
     * Additional clarity in docstrings
* Changes to rainforests calibration unit tests:
     * Update unit tests to support the above changes
     * Add additional rainforests calibration unit tests
     * Move some code that supports unit tests, to avoid duplication
* Update rainforests calibration acceptance tests and checksums 

Relates to Github issue #150

* Resolve calendar issues in kgos, checksums, and a deprecated warning (#2151)

* Acceptance test batch 11 (#2150)

* Resolve calendar and float dtype issues following environment upgrade

* Repeat for weighted_blending

* Mobt 943 acceptance tests batch 9 (#2152)

* fixes mobt_943_acceptance_tests_batch_9 following environment changes
- Update calendars on KGOs
- Fix returning of float32 instead of int8 for categorical inputs
- Remove surprising use of iris.cube.CubeList() in acceptance test, causing test failures

* Pre-commit formatting fixes

* Simplify typecasting from previous commit

* Remove change of checksums so left until end of all acceptance testing

* Update retrieval of package version. (#2161)

Co-authored-by: Gavin Evans <[email protected]>

* MOBT-290 Environment upgrade: modal categories (#2157)

* Accommodate type overflow in modal aggregator whilst resetting the data type at the end of the process.

* Style fix

* Move dtype setting out of numpy.clip call as this causes a casting error. (#2158)

* Enforce float32 type for accumulated precipitation in nowcast accumulate code. (#2159)

* Remove timezone delocalisation from estimate-emos-from-table CLI to allow correct filtering of pyarrow parquet files. (#2156)

* Update checksums for acceptance test data following all of the environment upgrade work. (#2160)

* Cloud condensation level updates for the new environment (#2168)

* update checksums for cloud condensation level

* Limits cloud base pressure and temperature to surface values if super-saturated

---------

Co-authored-by: Stephen Moseley <[email protected]>

---------

Co-authored-by: gavinevans <[email protected]>
Co-authored-by: mspelman07 <[email protected]>
Co-authored-by: Max <[email protected]>
Co-authored-by: Stephen Moseley <[email protected]>
Co-authored-by: Ben Hooper <[email protected]>
Co-authored-by: Rachael Esler <[email protected]>

* Environment upgrade tidy-up (#2171)

* Remove references to python 3.6 and 3.7. Upgrade pre-commit CI environment to use python 3.12. Update readme environment building advice, though this may be at odds with the currently available version of improver in the conda library. Update the python version badge shown in the readme.

* Test removal of non-commit-contributing author.

* Fix readthedocs (#2172)

* Fix to the readthedocs environment.

* Use pip for cf_units as no versions beyond 2.0.1 available from conda-forge.

* Use latest FRT option for cube combiner (#2174)

* Add use_latest_frt option to the cube combiner.

* Doc-string changes requested.

* Added ruff flake8-bandit to pre-commit (#2145)

* Removed bespoking of the wind gust metadata, e.g. typical gusts and extreme gusts (#2176)

* Removed bespoking of the metadata, e.g. typical gusts and extreme gusts

* Update to checksums following changes

* Remove Max's erroneously added site-init file from this PR.

---------

Co-authored-by: benjamin.ayliffe <[email protected]>

* rebadge_forecasts_as_latest_cycle modified for use in the mix suite (#2177)

* Modify rebadge_forecasts_as_latest_cycle function to look for and update a blend_time if it is present on the cubes being rebadged. This ensures that the blend time and forecast reference time cannot get out of step.

* Add raises section to doc-string.

* Bump actions/checkout from 4 to 5 (#2180)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add in ability to return midpoint of bounds (#2173)

* Add in ability to return midpoint of bounds

* small updates to comments

* Fix empty array check (#2179)

* HailFraction handles masked CTT data as "no convection" (#2181)

* Extends HailFraction unit test to show that masked CTT data is not treated as "no convection"

* Fixes code to explicitly handle masked or invalid convective cloud top temperature values as non-convective.

* Updates unit test to show that the test succeeds if the convective cloud top data are a masked array or not.

* Renames cct variable for better consistency

* Mobt775: Add SAMOS training plugins and unit tests (#2126)

* Create statistical.py. Add functionality to fit GAMs using pyGAM.

* Create GAMPredict class. Rename FitGAM to GAMFit. Add doc-strings and type-hinting for both classes. Add unit tests for both classes.

* Fix doc-string typos. Run black to check code formatting.

* Move pygam imports in to Classes/tests to reduce dependency on this package.

* Rename ensemble_calibration to emos_calibration and update all references.

* Create samos_calibration.py and create TrainGAMsForSAMOS class within it.

* Add tests for TrainGAMsForSAMOS plugin. Modify TrainGAMsForSAMOS plugin and add calculate_cube_statistics method.

* Extended calculate_cube_statistics method to handle rolling window calculation over time coordinate. Add tests for this method.

* Create functions for converting between cube and dataframe representations for SAMOS. Move generic helper functions for SAMOS unit tests into their own file. Create TrainEMOSForSAMOS class. Create additional unit tests for TrainGAMsForSAMOS. Add scipy monkey patch to pygam imports to work around a known bug.

* Improve samos_calibration tests helper functions. Refactor TrainEMOSForSAMOS to use 2 methods which include all functionality. Add unit tests for TrainEMOSForSAMOS. Minor update to CalculateClimateAnomalies.

* Modify CalculateClimateAnomalies plugin to correctly calculate the reference_epoch coordinate bounds when the inputs have multiple time points.

* Make tests for TrainGAMsForSAMOS and TrainEMOSForSAMOS simpler to understand. Improve test of TrainGAMsForSAMOS by predicting from the fitted GAM and comparing the predictions to expected output.

* Improve doc-strings and type hints. Move test helper functions to relevant file where there are only used in one file. Improve argument names for some test helper functions. Run pre-commit to fix formatting errors.

* Correct filepath for EMOS in documentation.

* Formatting changes.

* Improvements to doc-strings and other changes following first review.

* Changes following review. Largest change is addition of calculate_statistic_by_rolling_window method to TrainGAMsForSAMOS class.

* Changes following review. Make calculate_statistic_by_rolling_window more robust to missing time points and allow it to handle period diagnostics. Add new tests.

* Start using collapse_realizations methods in improver.utilities.cube_manipulation.py instead of iris method. Remove use of cell methods in TrainGAMsForSAMOS as they are unnecessary.

* Minor changes following review.

* Mark tests requiring pygam as skippable if pygam is not available in the environment.

* Add type hints for apply_aggregator method of TrainGAMsForSAMOS class.

* Fix gh action Sphinx-Pytest-Coverage: updated intersphinx inventory URLs (#2186)

* new url

* Update conf.py

* Update doc/source/conf.py

Co-authored-by: bayliffe <[email protected]>

* Update conf.py

---------

Co-authored-by: bayliffe <[email protected]>

* Bump actions/stale from 9 to 10 (#2184)

Bumps [actions/stale](https://github.com/actions/stale) from 9 to 10.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@v9...v10)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: '10'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/setup-python from 5 to 6 (#2185)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* EPPT-2305 contrails engine mixing ratio (#2125)

* Adding initial class setup for CondensationTrailFormation, with mixing ratio calculation method.

* Update to CondensationTrailFormation and adding tests

* Changes to my temporary return to keep tests passing

* Adding missing test check for mixing ratio values, and adding actual engine factors

* Allowing passing of bespoke engine_contrail_factors

* Docstring change to stop Sphinx CI error

* Documentation changes from review

* Removing height cube requirement and adjusting test cube creation to fully utilise set_up_variable_cube

* Changes from second review

* EPPT 2534 Refactors CondensationTrailFormation so that the same resul… (#2130)

* EPPT 2534 Refactors CondensationTrailFormation so that the same results can be achieved with Numpy arrays and Iris Cubes.

* Ruff

* Removes reference to Cubes in plugin doc-string

* Renames new method at reviewer suggestion

* Adding first derivative of saturation vapour pressure table (#2132)

* First working version of the new SVP derivative code

* Removing print statements and final changes following testing

* Adding unit tests for the new SaturatedVapourPressureTableDerivative class

* Adding extended documentation

* Correcting path to the extended documentation

* Adding Derivative to the call so the correct function is used

* removing blank line in the main doc string

* Refactors the two svp generator plugins to reduce duplication

* Updating cube name

* Adding three more unit tests to check that an error message is returned if the input temperatures to the saturated vapour pressure calculation are out of bounds.

* Ruff changes

* Changes following Marcus's code review

* Changing the svp to svp_derivative to make it more meaningful. Also, correcting the values in one of the unit tests now that derivative values above the triple point have changed very slightly following the removal of a pair of brackets in the n2 calculation.

* forcing a pre-commit hook

* Pre-commit changes

---------

Co-authored-by: Stephen Moseley <[email protected]>

* EPPT-2389: Calculation of localised svp (#2164)

* Adding almost working method with confusing difference between test calculated versions

* Forcing return shape of local vapour pressure

* Fixing weird precision difference due to np.float32 and np.array(,dtype=np.float32)

* Fixing half of problem with pressure levels test

* Fixing axis problem for pressure levels by reshaping array when calculating local vapour pressure

* Generalising pressure levels shape in local_vapour_pressure calculation to allow for any shape of temperature etc data to be parsed

* Revert "EPPT-2389: Calculation of localised svp (#2164)" (#2169)

This reverts commit 873c009.

* EPPT-2510: Adding calculation of saturation vapour pressure derivative in air (#2165)

* Code changes to add the SVP derivative corrected for in air

* Adding a missing temperature term to the correction calculation

* Updating expected values in unit tests and changing one of the imports so it correctly finds the function

* Updating one of the expected values within the unit tests

* EPPT-2389: Calculation of localised SVP (version 2) (#2170)

* Adding almost working method with confusing difference between test calculated versions

* Forcing return shape of local vapour pressure

* Fixing weird precision difference due to np.float32 and np.array(,dtype=np.float32)

* Fixing half of problem with pressure levels test

* Fixing axis problem for pressure levels by reshaping array when calculating local vapour pressure

* Generalising pressure levels shape in local_vapour_pressure calculation to allow for any shape of temperature etc data to be parsed

* Fixing test that is broken by environment update

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: mspelman07 <[email protected]>
Co-authored-by: Stephen Moseley <[email protected]>
Co-authored-by: Belinda Trotta <[email protected]>
Co-authored-by: Belinda Trotta <[email protected]>
Co-authored-by: bayliffe <[email protected]>
Co-authored-by: gavinevans <[email protected]>
Co-authored-by: Max <[email protected]>
Co-authored-by: Ben Hooper <[email protected]>
Co-authored-by: Rachael Esler <[email protected]>
Co-authored-by: Carwyn Pelley <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Canvin <[email protected]>
Co-authored-by: Robert Neal <[email protected]>
bayliffe added a commit that referenced this pull request Sep 10, 2025
* First cut changes in preparation for environment upgrade (#2124)

* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.

* Updates to cube_combiner for new environment (#2128)

* Updates to cube_combiner for new environment

* Update checksums

* Add distance_to ancil file

* update docstring in file and add in unit tests for DistanceTo

* Corrects the file heading

* Docstring updates from review

* further review changes

* review response

* add geopandas to environment

* updates for failing github actions

* Update dictionary call

* Remove hardcoded projection for distance to calculation. Tests updated to work. Still requires additional unit tests for new functionality.

* Add a test for using a projection that is unsuitable for the sites being processed.

* Format fix.

* Accept all proposed doc-string changes

Co-authored-by: Max <[email protected]>

* Correct doc-string formatting.

* Add alternative projections into tests for distanceto plugin.

---------

Co-authored-by: gavinevans <[email protected]>
Co-authored-by: benjamin.ayliffe <[email protected]>
Co-authored-by: Max <[email protected]>
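
The NumPy renames listed under #2124 in the commit message above can be sketched as follows. This is a minimal illustration rather than code from the improver repository: the shape, variable names and values are invented for the example, and `np.int32` is used as the replacement for `np.int` because that is the mapping given in the PR description.

```python
import numpy as np

# Illustrative shape only; not taken from the improver code base.
shape = (2, 3, 4)

# np.product was deprecated and later removed; np.prod is the supported spelling.
n_points = np.prod(shape)

# The np.int alias was deprecated and later removed; use an explicit width instead.
indices = np.arange(n_points, dtype=np.int32)

# np.NAN and np.NaN were removed as aliases; np.nan is the remaining spelling.
data = np.full(shape, np.nan, dtype=np.float32)

print(n_points, indices.dtype, np.isnan(data).all())
```

All three replacements also work on older NumPy releases, so they can be applied before the environment itself is upgraded.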
bayliffe added a commit that referenced this pull request Sep 10, 2025
* First cut changes in preparation for environment upgrade (#2124)

* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.

* Updates to cube_combiner for new environment (#2128)

* Updates to cube_combiner for new environment

* Update checksums

* Add distance_to ancil file

* update docstring in file and add in unit tests for DistanceTo

* Corrects the file heading

* Add in files and tests for generating the miscellaneous files

* Add metadata updates to roughness length

* Docstring updates from review

* further review changes

* Updates following review

* review response

* add geopandas to environment

* updates for failing github actions

* fixing doc string

* fix tests

* Update dictionary call

* Remove hardcoded projection for distance to calculation. Tests updated to work. Still requires additional unit tests for new functionality.

* Add a test for using a projection that is unsuitable for the sites being processed.

* Format fix.

* Update misc ancils code that uses DistanceTo plugin to accept and use an EPSG projection code.

* Add bounds guessing to neighbour finding. This has long been needed and is needed here specifically for the corinne surface type ancillary that has no spatial coordinate bounds. Adds a simple test to demonstrate that this works.

* Improve neighbour finding doc-string to indicate when the radius value is used and when not.

* Rewrite the land fraction ancillary plugin to work on the native resolution of the input data, rather than coarsening this first, which yields more accurate land fraction estimates.

* Add utility for extracting site list from a neighbour cube. This is useful as the neighbour cube has missing altitudes filled in from an orography, ensuring all of the definitions are complete.

* Re-implement the use of a neighbour cube for the land fraction generation so we can be sure that the metadata of the resulting cube, specifically the altitude coordinate, is complete.

* Formatting fix after conflict resolution.

---------

Co-authored-by: gavinevans <[email protected]>
Co-authored-by: benjamin.ayliffe <[email protected]>
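
The standard-library renames mentioned in the #2124 bullets above (`assertRaisesRegexp` to `assertRaisesRegex`, and `collections.Callable` to `collections.abc.Callable`) can be sketched as below. The `apply_twice` helper and the test class are hypothetical and exist only to give the two names something to act on; they are not part of improver.

```python
import unittest
from collections.abc import Callable  # the collections.Callable alias no longer exists


def apply_twice(func, value):
    """Hypothetical helper: apply func to value twice, rejecting non-callables."""
    if not isinstance(func, Callable):
        raise TypeError("func must be callable")
    return func(func(value))


class TestApplyTwice(unittest.TestCase):
    """Hypothetical tests showing the renamed assertion method."""

    def test_rejects_non_callable(self):
        # assertRaisesRegex is the current name; assertRaisesRegexp was removed.
        with self.assertRaisesRegex(TypeError, "must be callable"):
            apply_twice(None, 1)

    def test_applies_function(self):
        self.assertEqual(apply_twice(lambda x: x + 1, 1), 3)
```

Running the class through a test runner (for example `python -m unittest`) exercises both renamed names without any third-party dependency.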
gavinevans pushed a commit that referenced this pull request Oct 8, 2025
* Create statistical.py. Add functionality to fit GAMs using pyGAM.

* Create GAMPredict class. Rename FitGAM to GAMFit. Add doc-strings and type-hinting for both classes. Add unit tests for both classes.

* Fix doc-string typos. Run black to check code formatting.

* Move pygam imports into classes/tests to reduce dependency on this package.

* Rename ensemble_calibration to emos_calibration and update all references.

* Create samos_calibration.py and create TrainGAMsForSAMOS class within it.

* Add tests for TrainGAMsForSAMOS plugin. Modify TrainGAMsForSAMOS plugin and add calculate_cube_statistics method.

* Extended calculate_cube_statistics method to handle rolling window calculation over time coordinate. Add tests for this method.

* Create functions for converting between cube and dataframe representations for SAMOS. Move generic helper functions for SAMOS unit tests into their own file. Create TrainEMOSForSAMOS class. Create additional unit tests for TrainGAMsForSAMOS. Add scipy monkey patch to pygam imports to work around a known bug.

* Improve samos_calibration tests helper functions. Refactor TrainEMOSForSAMOS to use 2 methods which include all functionality. Add unit tests for TrainEMOSForSAMOS. Minor update to CalculateClimateAnomalies.

* Modify CalculateClimateAnomalies plugin to correctly calculate the reference_epoch coordinate bounds when the inputs have multiple time points.

* Make tests for TrainGAMsForSAMOS and TrainEMOSForSAMOS simpler to understand. Improve test of TrainGAMsForSAMOS by predicting from the fitted GAM and comparing the predictions to expected output.

* Improve doc-strings and type hints. Move test helper functions to relevant file where they are only used in one file. Improve argument names for some test helper functions. Run pre-commit to fix formatting errors.

* Correct filepath for EMOS in documentation.

* Update environment .yml files to match those intended for the new PS47 production environment, as described in #2124.

* Formatting changes.

* Move get_climatological_stats method of TrainGAMsForSAMOS class to be its own function. Move unit tests accordingly. Respecify expected results of unit tests which use this function, as the random number generation has changed. Create ApplySAMOS class. Formatting changes.

* Improvements to doc-strings and other changes following first review.

* Create ApplySAMOS plugin. Start adding tests for this plugin. Create CLIs for 3 SAMOS classes and add helper function for splitting input cubes to these CLIs. Modify apply EMOS so that it can return location and scale parameters. Move functionality for converting location and scale parameters to forecast values out of apply EMOS plugin. Various small fixes.

* Add explicit handling of spot data when converting cubes to dataframes. Modify calculation of means and standard deviations of cubes in GAMs to collapse over time coordinates.

* Acceptance test and cli for gams -test

* add cli for estimate_samos_coefficients

* add CLI and tests for apply_samos_coefficients

* Adding additional comments and fixing saving pickles

* Add CLI and tests for estimate-samos-gams-from-table

* Add CLI and test changes for estimate-samos-coefficients-from-table

* Modify CLI to allow it to accept multiple additional predictor cubes. Add an acceptance test that demonstrates this is working. Current master branch merged in to pull in environment upgrade changes.

* Remove EMOS predictor option from estimate samos coefficients CLI. Rejig arguments so multiple samos predictors can be provided.

* Reorder argument list in apply samos CLI so an indeterminate number of cubes can be provided after the GAMs.

* Create statistical.py. Add functionality to fit GAMs using pyGAM.

* Create GAMPredict class. Rename FitGAM to GAMFit. Add doc-strings and type-hinting for both classes. Add unit tests for both classes.

* Fix doc-string typos. Run black to check code formatting.

* Move pygam imports into classes/tests to reduce dependency on this package.

* Create samos_calibration.py and create TrainGAMsForSAMOS class within it.

* Add tests for TrainGAMsForSAMOS plugin. Modify TrainGAMsForSAMOS plugin and add calculate_cube_statistics method.

* Extended calculate_cube_statistics method to handle rolling window calculation over time coordinate. Add tests for this method.

* Create functions for converting between cube and dataframe representations for SAMOS. Move generic helper functions for SAMOS unit tests into their own file. Create TrainEMOSForSAMOS class. Create additional unit tests for TrainGAMsForSAMOS. Add scipy monkey patch to pygam imports to work around a known bug.

* Improve samos_calibration tests helper functions. Refactor TrainEMOSForSAMOS to use 2 methods which include all functionality. Add unit tests for TrainEMOSForSAMOS. Minor update to CalculateClimateAnomalies.

* Make tests for TrainGAMsForSAMOS and TrainEMOSForSAMOS simpler to understand. Improve test of TrainGAMsForSAMOS by predicting from the fitted GAM and comparing the predictions to expected output.

* Improve doc-strings and type hints. Move test helper functions to relevant file where they are only used in one file. Improve argument names for some test helper functions. Run pre-commit to fix formatting errors.

* Formatting changes.

* Improvements to doc-strings and other changes following first review.

* Changes following review. Largest change is addition of calculate_statistic_by_rolling_window method to TrainGAMsForSAMOS class.

* Changes following review. Make calculate_statistic_by_rolling_window more robust to missing time points and allow it to handle period diagnostics. Add new tests.

* Start using collapse_realizations methods in improver.utilities.cube_manipulation.py instead of iris method. Remove use of cell methods in TrainGAMsForSAMOS as they are unnecessary.

* Minor changes following review.

* Move get_climatological_stats method of TrainGAMsForSAMOS class to be its own function. Move unit tests accordingly. Respecify expected results of unit tests which use this function, as the random number generation has changed. Create ApplySAMOS class. Formatting changes.

* Create ApplySAMOS plugin. Start adding tests for this plugin. Create CLIs for 3 SAMOS classes and add helper function for splitting input cubes to these CLIs. Modify apply EMOS so that it can return location and scale parameters. Move functionality for converting location and scale parameters to forecast values out of apply EMOS plugin. Various small fixes.

* Add explicit handling of spot data when converting cubes to dataframes. Modify calculation of means and standard deviations of cubes in GAMs to collapse over time coordinates.

* Fix errors introduced when rebasing.

* Add sd_clip to get_climatological_stats to enforce a lower bound on sd predictions. Fix Apply_SAMOS to use truth gams when converting calibrated anomalies to forecast values. Add enforcement of ECC bounds to Apply_SAMOS result.

* Fix test_TrainEMOSForSAMOS.py tests that were failing due to a change in how standard deviation predictions from a GAM are handled (a lower bound is clipped to now). Remove SAMOS CLIs as these are being created under another ticket.

* Remove SAMOS CLIs as these are being created under another ticket.

* Ensure ApplySAMOS.process() only looks for ECC bounds when handling realization or percentile outputs. Add truth_gam input to test_ApplySAMOS ApplySAMOS.process() calls.

* Merge changes

* Fixing up CLI related tests after the introduction of pickle related writing. Fixing up the doc-string for estimate_samos_coefficients_from_table which was causing sphinx issues and the improver help test to fail.

* Removed unused import.

* Fix up application of SAMOS to match sites appropriately by providing a unique site identifier option with a fall back to lat/lon/alt matching (might need some approximation as it is float comparison, but leaving this for now). Removed duplicate get_climatological_stats method and test which clones that in the utilities method.

* Add unique_site_id_key argument to the SAMOS related CLIs.

* Add tools for splitting gams, cubes, and parquet files from an input filepath list.

* Remove inputpickle type. Adopt joblib method of pickle writing.

* Add empty return to estimate_samos_gams in cases where there is no training data.

* Modify SAMOS from table CLIs to use filepath lists as input so that missing inputs can be handled and the CLIs are input order agnostic.

* Modify remaining SAMOS CLIs that need to use a filepath list as input rather than cubes and pickle files.

* Set wmo_id as default unique identifier for training of gams.

* Ensure result is not None before writing a file.

* Remove unnecessary print statements.

* Enforce consistent use of joblib for handling pickle files in CLIs and acceptance tests. Small modification to generation of data for SAMOS unit tests so that the standard deviation of forecasts at a site cannot be zero. Fix SAMOS unit and acceptance tests.

* Fix doc-strings

* Another attempt to fix doc-strings. Ensure joblib is used consistently for all pickle file handling.

* Fix doc-strings again. Move a pyarrow import within the function in which it is used to avoid full improver dependency on the module.

* Fix doc-strings.

* Modify test_ApplySAMOS.py so that all tests within the file only run if pygam is available within the environment.

* Fix test_with_output_pickle now that we use joblib for all pickle file handling.

* Ensure test_get_climatological_stats is skipped if pygam is not available in the environment.

* Correct probability template cube handling in split_cubes_for_samos. Add unit tests for split_cubes_for_samos.

* Remove unnecessary print statements.

* Add unit tests for prepare_cube_no_calibration

* Fix doc-string.

* Another doc-string change.

* Changes following review.

* Combine compare() and compare_pickled_objects() in improver_tests/acceptance/acceptance.py. Rename split_pickle_parquet_and_netcdf() to split_netcdf_parquet_pickle() in improver/calibration/__init__.py. Add unit tests for split_pickle_parquet_and_netcdf() and identify_parquet_type() in improver/calibration/__init__.py.

* Modify instances where tuples of cubes are returned so that cubelists are returned instead. Ensure that tuples of cubes are saved to netcdf rather than pickle files. Fix typo in test doc-string.

* Add unit tests for convert_cube_to_parquet() function.

* Recreate checksums

* Changes following review.

* Change how we check whether pyarrow is available in environment.

* Add handling to check whether pyarrow is available to these unit tests.

* Minor changes following review.

---------

Co-authored-by: Marcus Spelman <[email protected]>
Co-authored-by: benjamin.ayliffe <[email protected]>
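
Several bullets in the commit message above describe skipping tests when pygam or pyarrow is missing from the environment, and moving those imports inside the functions that use them. One way this pattern could look is sketched below, using only the standard library. The `module_available` helper and the test class are hypothetical names for illustration; the improver tests may well use a different mechanism, such as `pytest.importorskip`.

```python
import importlib.util
import unittest


def module_available(name: str) -> bool:
    """Hypothetical helper: report whether a top-level module can be found."""
    return importlib.util.find_spec(name) is not None


@unittest.skipUnless(module_available("pygam"), "pygam is not installed")
class TestNeedsPygam(unittest.TestCase):
    """Hypothetical test class that is skipped entirely without pygam."""

    def test_import(self):
        # Import inside the test so that collecting this module never needs pygam.
        import pygam

        self.assertIsNotNone(pygam)
```

Checking availability with `find_spec` avoids importing the optional package at module level, so the rest of the test suite stays importable in an environment that lacks it.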