Skip to content

Conversation

@rachaelesler
Copy link
Contributor

@rachaelesler rachaelesler commented Jun 24, 2025

This patch provides changes to the apply_rainforests_calibration plugin in order to support the IMPROVER repo environment update (see Github issue #150).

Specifically, this patch sets lower bound versions for the treelite and lightgbm packages to support an up-to-date Python environment. Corresponding code changes for these version upgrades are included. Furthermore, I have added a few minor quality changes to the rainforests code - see below list.

This changeset includes:

  • Set lower bound for treelite and lightgbm package versions
  • Add tl2cgen package (needed for treelite v4.0.0 and up)
  • Changes to rainforests calibration code to support updated package versions
  • Update docs to reflect new package versions
  • Quality of life changes to rainforests calibration code:
    • Ensure correct type annotations are used
    • Use custom exceptions where feasible
    • Use StrEnum for model name to constrain possible argument values
    • Improved error handling
    • Additional clarity in docstrings
  • Changes to rainforests calibration unit tests:
    • Update unit tests to support the above changes
    • Add additional rainforests calibration unit tests
    • Move some code that supports unit tests, to avoid duplication

Relates to Github issue #150

Testing:

  • Ran tests and they passed OK (rainforests calibration unit tests only - acceptance tests TODO)
  • Added new tests for the new feature(s)

CLA

  • If a new developer, signed up to CLA

@rachaelesler rachaelesler marked this pull request as draft June 24, 2025 00:40
@rachaelesler rachaelesler added don't merge yet BoM review required PRs opened by non-BoM developers that require a BoM review labels Jun 24, 2025
CONTRIBUTING.md Outdated
- Mark Worsfold (Met Office, UK)
- Bruce Wright (Met Office, UK)
- Ying Zhao (Bureau of Meteorology, Australia)
- Rachael Esler (Bureau of Meteorology, Australia)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick comment, this list should be alphabetical by surname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting me know Gavin - have updated this in 6cf4c2e.

@rachaelesler rachaelesler marked this pull request as ready for review June 24, 2025 23:05
@metoppv metoppv deleted a comment from rachaelesler Jun 26, 2025
@rachaelesler rachaelesler marked this pull request as draft June 26, 2025 05:38
Copy link
Contributor

@btrotta-bom btrotta-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates to use the new API look fine, and the PR also makes some good code quality updates. I had a few minor suggestions about the wording in the docstrings, and also I'm not sure about the refactor of the parametrised test.


Treelite
~~~~~~~~~~~~~~~~~~
Lightweight decision tree forest model specification format, used for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to specify thatt it's a binary format (i.e. not a text file that you can read as a human like lightgbm). Also I suggest removing "forest" - this implies that lightgbm models are random forests, but they are actually gradient-boosted decision trees, which is a different thing (lightgbm can also do random forests, but this feature is seldom used because GBDTs are generally better).

Suggested change: "Lightweight binary format for specifying decision tree models"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great clarification, I have updated to use your suggestion in the latest changes.

TL2cgen (TreeLite 2 C GENerator)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Model compiler for decision tree ensembles, used for more efficient computation
of GBDT ensembles required for RainForests calibration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest changing both instances of "ensembles" to "models"; in ML, "ensemble" would usually mean a collection of multiple models (each of which could be a GBDT consisting of many trees).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, have updated this now

Args:
key_name: 'treelite_model' or 'lightgbm_model' are the expected names.
key_name: One of "treelite_model" or "lightgbm_model".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the key_name argument is a string, but in the function type hint it's a ModelKeyName object. Is this intentional?

Copy link
Contributor Author

@rachaelesler rachaelesler Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the key_name argument to use the typing.Literal Model so hopefully it makes a bit more sense now.

pass


class ModelKeyName(StrEnum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, I did not know about the StrEnum class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lightweight alternative is to use typing.Literal:

from typing import Literal

Model = Literal["lightgbm_model", "treelite_model"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to use typing.Literal now.

monkeypatch.setattr(lightgbm, "Booster", MockBooster)
monkeypatch.setitem(sys.modules, "tl2cgen", None)
result = ApplyRainForestsCalibration(model_config)
assert type(result).__name__ == "ApplyRainForestsCalibrationLightGBM"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to check the type directly (rather than the name)? E.g.

assert type(result) == ApplyRainForestsCalibrationLightGBM

(and similarly for tests below)

Copy link
Contributor Author

@rachaelesler rachaelesler Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I've changed it so the __name__ property is not checked now. I had initially kept it as it was to keep the tests as unchanged from the original as possible, but checking the type rather than the name of the type is the way to go.

assert type(result).__name__ == "ApplyRainForestsCalibrationLightGBM"

def test_correct_class_when_treelite_available(self, monkeypatch, model_config):
""" "Test that the ApplyRainForestsCalibration constructor creates an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good pickup, have removed

# missing across any thresholds
model_config["24"]["0.0000"].pop("treelite_model", None)
if not lightgbm_keys:
def test_treelite_model_and_module_available(self, monkeypatch, model_config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you've refactored this, since the current setup with the multiple parametrizations is a bit hard to follow. However, I think the new tests miss some cases that were covered by the old approach. The parametrization gives 2 x 2 x 2 = 8 cases, whereas the new tests test only 4 cases. For example, the new tests do not test the case where the treelite module is available but the model config does not specify treelite files. I think the easiest solution here may be to restore the old test with the parametrizations, but maybe rewrite it slightly (e.g. change variable names and/or update docstrings) to make it clearer what each parametrization does.

Also, do these tests work if treelite is not installed in the environment? If lightgbm is not available, I think the importerskip statement will just cause it to skip all tests. But if only treelite is missing, this is currently handled separately by the TREELITE_ENABLED variable. I'm not sure that case is handled by your proposed changes. I actually don't think we need to allow for the case where the env has lightgbm but not treelite, so I suggest we just change it to also just skip everything if treelite is not installed (with importerskip), and do away with the TREELITE_ENABLED variable (changing it to True in the parametrization).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're absolutely right. I have done as you suggested and restored the old test with parametrizations, updating the variable names and docstring. I've added a table with a summary of the expected output for each parameter but let me know if you think that's overkill. With the restored tests, they are all passing.

I'll test what happens when I remove the TREELITE_ENABLED parameter and see how it goes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed the TREELITE_ENABLED option now and the tests will now just skipped if treelite+tl2cgen are not installed as suggested. Great suggestion 😄

Tests that treelite models are loaded correctly when:
- all thresholds contain treelite model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is incorrect - it doesn't match the function name or the code below where the treelite model config is deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, have updated the code to reflect the docstring

@rachaelesler rachaelesler marked this pull request as ready for review July 2, 2025 04:19
@rachaelesler rachaelesler requested a review from btrotta-bom July 2, 2025 04:19
@codecov
Copy link

codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 90.00000% with 5 lines in your changes missing coverage. Please review.

Please upload report for BASE (mobt_800_env_upgrade_feature_branch@2c9f313). Learn more about missing BASE report.

Files with missing lines Patch % Lines
improver/calibration/rainforest_calibration.py 90.00% 5 Missing ⚠️
Additional details and impacted files
@@                          Coverage Diff                           @@
##             mobt_800_env_upgrade_feature_branch    #2136   +/-   ##
======================================================================
  Coverage                                       ?   98.42%           
======================================================================
  Files                                          ?      139           
  Lines                                          ?    13651           
  Branches                                       ?        0           
======================================================================
  Hits                                           ?    13436           
  Misses                                         ?      215           
  Partials                                       ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This patch provides changes to the apply_rainforests_calibration
plugin in order to support the IMPROVER repo environment update (see
Github issue metoppv#150).

Specifically, this patch sets lower bound versions
for the treelite and lightgbm packages to support an up-to-date
Python environment. Corresponding code changes for these version
upgrades are included. Furthermore, I have added a few minor quality
changes to the rainforests code - see below list.

This changeset includes:

* Set lower bound for treelite and lightgbm package versions
* Add tl2cgen package (needed for treelite v4.0.0 and up)
* Changes to rainforests calibration code to support updated package
versions
* Update docs to reflect new package versions
* Quality of life changes to rainforests calibration code:
     * Ensure correct type annotations are used
     * Use custom exceptions where feasible
     * Use StrEnum for model name to constrain possible argument values
     * Improved error handling
     * Additional clarity in docstrings
* Changes to rainforests calibration unit tests:
     * Update unit tests to support the above changes
     * Add additional rainforests calibration unit tests
     * Move some code that supports unit tests, to avoid duplication

Relates to Github issue metoppv#150
Copy link
Contributor

@btrotta-bom btrotta-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the approach to work around the segfault issue. A couple of remaining issues as noted.

@pytest.mark.parametrize(
"lightgbm_keys",
(True, False),
ids=["lightgbm_keys_present", "lightgbm_keys_absent"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea to add the ids here, this will help a lot when interpreting test output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

| Available | Yes | No | ApplyRainForestsCalibrationTreelite |
| Unavailable | Yes | No | ModelFileNotFoundError |
| Available | No | No | ModelFileNotFoundError |
| Unavailable | No | No | ModelFileNotFoundError |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the table, makes things really clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

If all thresholds contain treelite model AND the treelite module is
available, tl2cgen Predictor is returned, otherwise return lightgbm
Boosters. Checks outputs are ordered when inputs can be unordered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks outputs are ordered when inputs can be unordered.

This doesn't seem to relate to anything in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, have removed now.


# Default number of threads set by lightgbm
# See https://lightgbm.readthedocs.io/en/stable/Parameters-Tuning.html#add-more-computational-resources
DEFAULT_NUM_THREADS = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs linked say 1 thread per CPU core, so in general the total number of threads would be greater than 1. You can get the number of available cores in python as described here https://stackoverflow.com/a/1006337

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed this now per your other suggestion.


@pytest.mark.parametrize("ordered_inputs", (True, False))
@pytest.mark.parametrize("default_threads", (True, False))
def test__init__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI tests (environment b) are failing here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this was from a previous review, please disregard.

selects which class to use based on availability of modules.
"""

def test_correct_class_when_lightgbm_available(self, monkeypatch, model_config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test and the following one are both covered by test__new__, so they can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good pickup, have removed these duplicate tests now.

model = result.tree_models[lead_time, threshold]
assert model.model_class == "lightgbm-Booster"
assert model.threads == expected_threads
assert model.threads == DEFAULT_NUM_THREADS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this line is actually testing anything - it works because the MockBooster is set up to have DEFAULT_NUM_THREADS, but (as noted in comment below), for a real instance of Booster the number of threads would be equal to the number of cpu cores. I don't think there's an easy way to test this - I tried to get it working but it seems that the threads property doesn't exist by default (maybe it doesn't get set if it's not specified when creating the object). So I suggest just removing this line (and then you also don't need to define DEFAULT_NUM_THREADS in utils.py)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, have removed this and the DEFAULT_NUM_THREADS variable now. In future we could use a "magic mock" or something similar but there is very little value in something like that right now.

@rachaelesler rachaelesler force-pushed the new-uprev-rainforests branch from 8f01fc5 to 4dc9d7d Compare July 3, 2025 05:38
Copy link
Contributor

@btrotta-bom btrotta-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good now. Thanks for adding the treelite acceptance tests also.

@rachaelesler rachaelesler merged commit 68e54c8 into metoppv:mobt_800_env_upgrade_feature_branch Jul 9, 2025
5 of 8 checks passed
@rachaelesler rachaelesler deleted the new-uprev-rainforests branch July 9, 2025 03:49
rachaelesler added a commit to metoppv/improver_test_data that referenced this pull request Jul 9, 2025
This patch updates the data for the apply-rainforests-calibration CLI acceptance tests. This is part of the IMPROVER environment upgrade.

This changeset includes:

    Update rainforests tests KGO
    Add Treelite model files to facilitate testing with Treelite models
    Update rainforests model config to include Treelite models
    Rename LightGBM model files

See related PR in the improver repo: metoppv/improver#2136

Relates to mobt-800
bayliffe added a commit that referenced this pull request Jul 30, 2025
* First cut changes in preparation for environment upgrade (#2124)

* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.

* Updates to cube_combiner for new environment (#2128)

* Updates to cube_combiner for new environment

* Update checksums

* Mobt913: Environment upgrade - Metadata (#2137)

* Fix failing tests following environment change.

* Handle bug where num2date is returning cftime datetime instead of python datetime where it shouldn't

* Mobt915: environment upgrade - Orographic enhancement (#2138)

* Cast variable to float32 to resolve environment upgrade error

* Fix pre-commit reformatting

* Environment upgrade: expected value (#2139)

* Modify expected scalar values in unit tests to have specific float32 type to pass more rigorous numpy assert_allclose test.

* Update checksums

* Mobt 914 env update nowcasting (#2142)

* Ensures precision of output data are float32

* Reduces precision requirement of unit tests from 7 decimal places to 6.

* Fixes unit tests that were passing the wrong arguments to the methods they were testing

* Developer_tools: Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects (#2134)

* Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects

* Improves imports

* Simplifies conversion of cube_attrs to dict

* Environment upgrade: spot data (#2141)

* Change expected results in neighbour selection tied cases due to slight variation in return from coordinate transform.

* Change from using numpy float type to native float type when converting type of user provided percentiles.

* Fix up acceptance test which collects warnings. Update checksums.

* Mobt 916 environment upgrade regrid (#2144)

* Resolve failings tests following environment upgrade

* Fix pre_commit requirements

* Remove print statment and remove lots of unnecessarily added trailing .0s.

---------

Co-authored-by: benjamin.ayliffe <benjamin.ayliffe@metoffice.gov.uk>

* Upgrade utilities unit tests (#2140)

* Remove test for non-Cube inputs to a CubeList, as this is no longer possible.

* Unpack list for use when slicing in cube_extraction.

* Edits to test_temporal.py.

* Correction to datatypes within temporal_interpolation.py.

* Corrections to gradient_between_vertical_levels

* GAM corrections copied from #2126 for completeness.

* Edits, so that load unit tests pass, although we're no longer testing the lazy loading successfully, so this may need reconsideration.

* Alter ordering of bounds in mathematical_operations.py due to underlying change in iris.

* Retain intended indexing behaviour within neighbourhood_tools.py by converting list to tuple.

* Changes to allow more tolerance for solar interpolation tests, where data type differences can impact precision.

* Update load unit tests to override iris setting that prevents lazy loading for small files.

* Revert changes to solar.py, which are not required, following allowing a greater tolerance in the test_solar_interpolation unit test.

* Minor edit to test_load.py

* Minor edit to return a python datetime from the iris_time_to_datetime function, rather than a cftime datetime object.

* Simplification to use to_real_datetime method available in iris.

* Mobt906 Calibration unit tests upgrades for new environment (#2127)

* Fix tests in dz_rescaling. Added re.escape() call to regex pattern match.

* Update IMPROVER choose() function as numpy.lib.index_tricks.ndindex() moved to numpy.ndindex().

* Fix faulty plugin call in improver_tests/calibration/test_init.py.

* Fix tests in reliability_calibration. Added re.escape() call to regex pattern match.

* Fix tests in calibration/utilities. Added re.escape() call to regex pattern match.

* Modify rainforests_calibration/conftest.py so that both treelite and treelite_runtime are checked for before attempting to run relevant tests, rather than only treelite.

* Change expected results for failing ensemble_calibration/test_EstimateCoefficientsForEnsembleCalibration.py tests.

* Add treelite_runtime module dependency to environments containing treelite as a dependency.

* Revert changes related to rainforest calibration unit tests and treelite_runtime. More work is required to properly handle upgrading from treelite 3.x to 4.y.

* Change expected results for 2 tests in test_CalibratedForecastDistributionParameters.py.

* Revert changes to tests which were failing due to Regex pattern matching issues with numpy types being displayed. Instead, map these objects to base Python types, e.g str instead of np.str.

* Environment upgrade: generate ancillaries (#2143)

* Make constant float64 type where it is used to preserve original results.

* Changes to orographic smoothing coefficient generation to avoid type escalation to float64.

* Formatting update.

* enforce data type for cloud condensation level plugins (#2146)

* Acceptance test batch 2 (#2148)

* Enforce float32.

* Enforce input to cube.collapsed is the same type as the output.

* Get dtype without using .data.

* Modification to enforce the datatype within the mode_aggregator method and add an assertion to the unit tests to show that the output dtype matches the input dtype.

* Acceptance test batch 5 (#2149)

* Minor edit to textural.py

* Move setting of data type on the threshold values into the threshold plugin.

* Enforce data type (#2147)

* Update environment for rainforests calibration (#2136)

This patch provides changes to the apply_rainforests_calibration
plugin in order to support the IMPROVER repo environment update (see
Github issue #150).

Specifically, this patch sets lower bound versions
for the treelite and lightgbm packages to support an up-to-date
Python environment. Corresponding code changes for these version
upgrades are included. Furthermore, I have added a few minor quality
changes to the rainforests code - see below list.

This changeset includes:

* Set lower bound for treelite and lightgbm package versions
* Add tl2cgen package (needed for treelite v4.0.0 and up)
* Changes to rainforests calibration code to support updated package
versions
* Update docs to reflect new package versions
* Quality of life changes to rainforests calibration code:
     * Ensure correct type annotations are used
     * Use custom exceptions where feasible
     * Use StrEnum for model name to constrain possible argument values
     * Improved error handling
     * Additional clarity in docstrings
* Changes to rainforests calibration unit tests:
     * Update unit tests to support the above changes
     * Add additional rainforests calibration unit tests
     * Move some code that supports unit tests, to avoid duplication
* Update rainforests calibration acceptance tests and checksums 

Relates to Github issue #150

* Resolve calendar issues in kgos, checksums, and a depracated warning (#2151)

* Acceptance test batch 11 (#2150)

* Resolve calendar and float dtype issues following environment upgrade

* Repeat for weighted_blending

* Mobt 943 acceptance tests batch 9 (#2152)

* fixes mobt_943_acceptance_tests_batch_9 following environment changes
- Update calendars on KGOs
- Fix returning of float32 instead of in8 for categorical inputs
- Remove surprising use of iris.cube.CubeList() in acceptance test, causing test failures

* Pre-commit formatting fixes

* Simplify typecasting from previous commit

* Remove change of checksums so left until end of all acceptance testing

* Update retrieval of package version. (#2161)

Co-authored-by: Gavin Evans <gavin.evans@metoffice.gov.uk>

* MOBT-290 Environment upgrade: modal categories (#2157)

* Accommodate type overflow in modal aggregator whilst resetting the data type at the end of the process.

* Style fix

* Move dtype setting out of numpy.clip call as this causes a casting error. (#2158)

* Enforce float32 type for accumulated precipitation in nowcast accumulate code. (#2159)

* Remove timezone delocalisation from estimate-emos-from-table CLI to allow correct filtering of pyarrow parquet files. (#2156)

* Update checksums for acceptance test data following all of the environment upgrade work. (#2160)

* Cloud condensation level updates for the new environment (#2168)

* update checksums for cloud condensation level

* Limits cloud base pressure and temperature to surface values if super-saturated

---------

Co-authored-by: Stephen Moseley <stephen.moseley@metoffice.gov.uk>

---------

Co-authored-by: gavinevans <gavin.evans@metoffice.gov.uk>
Co-authored-by: mspelman07 <99179165+mspelman07@users.noreply.github.com>
Co-authored-by: Max <max.white@metoffice.gov.uk>
Co-authored-by: Stephen Moseley <stephen.moseley@metoffice.gov.uk>
Co-authored-by: Ben Hooper <114418734+brhooper@users.noreply.github.com>
Co-authored-by: Rachael Esler <44791931+rachaelesler@users.noreply.github.com>
mo-philrelton added a commit that referenced this pull request Sep 8, 2025
* Fill value (#2129)

* update save function to include fill value

* update docstring

* update docstring

* Update tests for fill value

* 2154 BUG: Convective cloud base and top can be unphysical (#2155)

* Limits cloud base pressure and temperature to surface values if super-saturated

* Updates checksums for revised KGO

* Fix recalibration docstring (#2166)

* Use np.interp instead of scipy interp1d

* Revert "Use np.interp instead of scipy interp1d"

This reverts commit 3ea6cd6.

* fix docstring

---------

Co-authored-by: Belinda Trotta <btrotta-bom@users.noreply.github.com>

* Environment upgrade feature branch (#2167)

* First cut changes in preparation for environment upgrade (#2124)

* Avoid cubelists of cubelists.

* Replace np.product with np.prod.

* Replace np.int.

* Replace np.NAN with np.nan.

* Replace np.NAN with np.nan.

* Replace np.NaN with np.nan.

* Replace np.product with np.prod.

* Replace assertRaisesRegexp with assertRaisesRegex and use collections.abc.Callable instead of collections.Callable.

* Simplify test in test_flatten.py to avoid cubelist within a cubelist.

* Replace Cube(None) with an alternative.

* Unpin environments for testing.

* Remove pygam version.

* Update improver_a.yml

* Remove non-essential dependencies from yml files and add pins.

* Minor edits following review comments.

* Modify docstring to better reflect the inputs provided.

* Updates to cube_combiner for new environment (#2128)

* Updates to cube_combiner for new environment

* Update checksums

* Mobt913: Environment upgrade - Metadata (#2137)

* Fix failing tests following environment change.

* Handle bug where num2date is returning cftime datetime instead of python datetime where it shouldn't

* Mobt915: environment upgrade - Orographic enhancement (#2138)

* Cast variable to float32 to resolve environment upgrade error

* Fix pre-commit reformatting

* Environment upgrade: expected value (#2139)

* Modify expected scalar values in unit tests to have specific float32 type to pass more rigorous numpy assert_allclose test.

* Update checksums

* Mobt 914 env update nowcasting (#2142)

* Ensures precision of output data are float32

* Reduces precision requirement of unit tests from 7 decimal places to 6.

* Fixes unit tests that were passing the wrong arguments to the methods they were testing

* Developer_tools: Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects (#2134)

* Updates metadata interpreter to print dict-like strings from CubeAttrsDict objects

* Improves imports

* Simplifies conversion of cube_attrs to dict

* Environment upgrade: spot data (#2141)

* Change expected results in neighbour selection tied cases due to slight variation in return from coordinate transform.

* Change from using numpy float type to native float type when converting type of user provided percentiles.

* Fix up acceptance test which collects warnings. Update checksums.

* Mobt 916 environment upgrade regrid (#2144)

* Resolve failings tests following environment upgrade

* Fix pre_commit requirements

* Remove print statment and remove lots of unnecessarily added trailing .0s.

---------

Co-authored-by: benjamin.ayliffe <benjamin.ayliffe@metoffice.gov.uk>

* Upgrade utilities unit tests (#2140)

* Remove test for non-Cube inputs to a CubeList, as this is no longer possible.

* Unpack list for use when slicing in cube_extraction.

* Edits to test_temporal.py.

* Correction to datatypes within temporal_interpolation.py.

* Corrections to gradient_between_vertical_levels

* GAM corrections copied from #2126 for completeness.

* Edits, so that load unit tests pass, although we're no longer testing the lazy loading successfully, so this may need reconsideration.

* Alter ordering of bounds in mathematical_operations.py due to underlying change in iris.

* Retain intended indexing behaviour within neighbourhood_tools.py by converting list to tuple.

* Changes to allow more tolerance for solar interpolation tests, where data type differences can impact precision.

* Update load unit tests to override iris setting that prevents lazy loading for small files.

* Revert changes to solar.py, which are not required, following allowing a greater tolerance in the test_solar_interpolation unit test.

* Minor edit to test_load.py

* Minor edit to return a python datetime from the iris_time_to_datetime function, rather than a cftime datetime object.

* Simplification to use to_real_datetime method available in iris.

* Mobt906 Calibration unit tests upgrades for new environment (#2127)

* Fix tests in dz_rescaling. Added re.escape() call to regex pattern match.

* Update IMPROVER choose() function as numpy.lib.index_tricks.ndindex() moved to numpy.ndindex().

* Fix faulty plugin call in improver_tests/calibration/test_init.py.

* Fix tests in reliability_calibration. Added re.escape() call to regex pattern match.

* Fix tests in calibration/utilities. Added re.escape() call to regex pattern match.

* Modify rainforests_calibration/conftest.py so that both treelite and treelite_runtime are checked for before attempting to run relevant tests, rather than only treelite.

* Change expected results for failing ensemble_calibration/test_EstimateCoefficientsForEnsembleCalibration.py tests.

* Add treelite_runtime module dependency to environments containing treelite as a dependency.

* Revert changes related to rainforest calibration unit tests and treelite_runtime. More work is required to properly handle upgrading from treelite 3.x to 4.y.

* Change expected results for 2 tests in test_CalibratedForecastDistributionParameters.py.

* Revert changes to tests which were failing due to Regex pattern matching issues with numpy types being displayed. Instead, map these objects to base Python types, e.g str instead of np.str.

* Environment upgrade: generate ancillaries (#2143)

* Make constant float64 type where it is used to preserve original results.

* Changes to orographic smoothing coefficient generation to avoid type escalation to float64.

* Formatting update.

* enforce data type for cloud condensation level plugins (#2146)

* Acceptance test batch 2 (#2148)

* Enforce float32.

* Enforce input to cube.collapsed is the same type as the output.

* Get dtype without using .data.

* Modification to enforce the datatype within the mode_aggregator method and add an assertion to the unit tests to show that the output dtype matches the input dtype.

* Acceptance test batch 5 (#2149)

* Minor edit to textural.py

* Move setting of data type on the threshold values into the threshold plugin.

* Enforce data type (#2147)

* Update environment for rainforests calibration (#2136)

This patch provides changes to the apply_rainforests_calibration
plugin in order to support the IMPROVER repo environment update (see
Github issue #150).

Specifically, this patch sets lower bound versions
for the treelite and lightgbm packages to support an up-to-date
Python environment. Corresponding code changes for these version
upgrades are included. Furthermore, I have added a few minor quality
changes to the rainforests code - see below list.

This changeset includes:

* Set lower bound for treelite and lightgbm package versions
* Add tl2cgen package (needed for treelite v4.0.0 and up)
* Changes to rainforests calibration code to support updated package
versions
* Update docs to reflect new package versions
* Quality of life changes to rainforests calibration code:
     * Ensure correct type annotations are used
     * Use custom exceptions where feasible
     * Use StrEnum for model name to constrain possible argument values
     * Improved error handling
     * Additional clarity in docstrings
* Changes to rainforests calibration unit tests:
     * Update unit tests to support the above changes
     * Add additional rainforests calibration unit tests
     * Move some code that supports unit tests, to avoid duplication
* Update rainforests calibration acceptance tests and checksums 

Relates to Github issue #150

* Resolve calendar issues in kgos, checksums, and a depracated warning (#2151)

* Acceptance test batch 11 (#2150)

* Resolve calendar and float dtype issues following environment upgrade

* Repeat for weighted_blending

* Mobt 943 acceptance tests batch 9 (#2152)

* fixes mobt_943_acceptance_tests_batch_9 following environment changes
- Update calendars on KGOs
- Fix returning of float32 instead of in8 for categorical inputs
- Remove surprising use of iris.cube.CubeList() in acceptance test, causing test failures

* Pre-commit formatting fixes

* Simplify typecasting from previous commit

* Remove change of checksums so left until end of all acceptance testing

* Update retrieval of package version. (#2161)

Co-authored-by: Gavin Evans <gavin.evans@metoffice.gov.uk>

* MOBT-290 Environment upgrade: modal categories (#2157)

* Accommodate type overflow in modal aggregator whilst resetting the data type at the end of the process.

* Style fix

* Move dtype setting out of numpy.clip call as this causes a casting error. (#2158)

* Enforce float32 type for accumulated precipitation in nowcast accumulate code. (#2159)

* Remove timezone delocalisation from estimate-emos-from-table CLI to allow correct filtering of pyarrow parquet files. (#2156)

* Update checksums for acceptance test data following all of the environment upgrade work. (#2160)

* Cloud condensation level updates for the new environment (#2168)

* update checksums for cloud condensation level

* Limits cloud base pressure and temperature to surface values if super-saturated

---------

Co-authored-by: Stephen Moseley <stephen.moseley@metoffice.gov.uk>

---------

Co-authored-by: gavinevans <gavin.evans@metoffice.gov.uk>
Co-authored-by: mspelman07 <99179165+mspelman07@users.noreply.github.com>
Co-authored-by: Max <max.white@metoffice.gov.uk>
Co-authored-by: Stephen Moseley <stephen.moseley@metoffice.gov.uk>
Co-authored-by: Ben Hooper <114418734+brhooper@users.noreply.github.com>
Co-authored-by: Rachael Esler <44791931+rachaelesler@users.noreply.github.com>

* Environment upgrade tidy-up (#2171)

* Remove references to python 3.6 and 3.7. Upgrade pre-commit CI environment to use python 3.12. Update readme environment building advice, though this may be at odds with the currently available version of improver in the conda library. Update the python version badge shown in the readme.

* Test removal of non-commit-contributing author.

* Fix readthedocs (#2172)

* Fix to the readthedocs environment.

* Use pip for cf_units as no versions beyong 2.0.1 available from conda-forge.

* Use latest FRT option for cube combiner (#2174)

* Add use_latest_frt option to the cube combiner.

* Doc-string changes requested.

* Added ruff flake8-bandit to pre-commit (#2145)

* Removed bespoking of the wind gust metadata, e.g. typical gusts and extreme gusts (#2176)

* Removed bespoking of the metadata, e.g. typical gusts and extreme gusts

* Update to checksums following changes

* Remove Max's erroneously added site-init file from this PR.

---------

Co-authored-by: benjamin.ayliffe <benjamin.ayliffe@metoffice.gov.uk>

* rebadge_forecasts_as_latest_cycle modified for use in the mix suite (#2177)

* Modify rebadge_forecasts_as_latest_cycle function to look for and update a blend_time if it is present on the cubes being rebadged. This ensures that the blend time and forecast reference time cannot get out of step.

* Add raises section to doc-string.

* Bump actions/checkout from 4 to 5 (#2180)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add in ability to return midpoint of bounds (#2173)

* Add in ability to return midpoint of bounds

* small updates to comments

* Fix empty array check (#2179)

* HailFraction handles masked CTT data as "no convection" (#2181)

* Extends HailFraction unit test to show that masked CTT data is not treated as "no convection"

* Fixes code to explicitly handle masked or invalid convective cloud top temperature values as non-convective.

* Updates unit test to show that the test succeeds if the convective cloud top data are a masked array or not.

* Renames cct variable for better consistency

* Mobt775: Add SAMOS training plugins and unit tests (#2126)

* Create statistical.py. Add functionality to fit GAMs using pyGAM.

* Create GAMPredict class. Rename FitGAM to GAMFit. Add doc-strings and type-hinting for both classes. Add unit tests for both classes.

* Fix doc-string typos. Run black to check code formatting.

* Move pygam imports in to Classes/tests to reduce depenendency on this package.

* Rename ensemble_calibration to emos_calibration and update all references.

* Create samos_calibration.py and create TrainGAMsForSAMOS class within it.

* Add tests for TrainGAMsForSAMOS plugin. Modify TrainGAMsForSAMOS plugin and add calculate_cube_statistics method.

* Extended calculate_cube_statistics method to handle rolling window calculation over time coordinate. Add tests for this method.

* Create functions for converting between cube and dataframe representations for SAMOS. Move generic helper functions for SAMOS unit tests into their own file. Create TrainEMOSForSAMOS class. Create additional unit tests for TrainGAMsForSAMOS. Add scipy monkey patch to pygam imports to work around a known bug.

* Improve samos_calibration tests helper functions. Refactor TrainEMOSForSAMOS to use 2 methods which include all functionality. Add unit tests for TrainEMOSForSAMOS. Minor update to CalculateClimateAnomalies.

* Modify CalculateClimateAnomalies plugin to correctly calculate the reference_epoch coordinate bounds when the inputs have multiple time points.

* Make tests for TrainGAMsForSAMOS and TrainEMOSForSAMOS simpler to understand. Improve test of TrainGAMsForSAMOS by predicting from the fitted GAM and comparing the predictions to expected output.

* Improve doc-strings and type hints. Move test helper functions to relevant file where there are only used in one file. Improve argument names for some test helper functions. Run pre-commit to fix formatting errors.

* Correct filepath for EMOS in documentation.

* Formatting changes.

* Improvements to doc-strings and other changes following first review.

* Changes following review. Largest change is addition of calculate_statistic_by_rolling_window method to TrainGAMsForSAMOS class.

* Changes following review. Make calculate_statistic_by_rolling_window more robust to missing time points and allow it to handle period diagnostics. Add new tests.

* Start using collapse_realizations methods in improver.utilities.cube_manipulation.py instead of iris method. Remove use of cell methods in TrainGAMsForSAMOS as they are unnecessary.

* Minor changes following review.

* Mark tests requiring pygam as skippable if pygam is not available in the environment.

* Add type hints for apply_aggregator method of TrainGAMsForSAMOS class.

* Fix gh action Sphinx-Pytest-Coverage: updated intersphinx inventory URLs (#2186)

* new url

* Update conf.py

* Update doc/source/conf.py

Co-authored-by: bayliffe <benjamin.ayliffe@metoffice.gov.uk>

* Update conf.py

---------

Co-authored-by: bayliffe <benjamin.ayliffe@metoffice.gov.uk>

* Bump actions/stale from 9 to 10 (#2184)

Bumps [actions/stale](https://github.com/actions/stale) from 9 to 10.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@v9...v10)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: '10'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/setup-python from 5 to 6 (#2185)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* EPPT-2305 contrails engine mixing ratio (#2125)

* Adding initial class setup for CondensationTrailFormation, with mixing ratio calculation method.

* Update to CondensationTrailFormation and adding tests

* Changes to my temporary return to keep tests passing

* Adding missing test check for mixing ratio values, and adding actual engine factors

* Allowing passing of bespoke engine_contrail_factors

* Docstring change to stop Sphinx CI error

* Documentation changes from review

* Removing height cube requirement and adjusting test cube creation to fully utilise set_up_variable_cube

* Changes from second review

* EPPT 2534 Refactors CondensationTrailFormation so that the same resul… (#2130)

* EPPT 2534 Refactors CondensationTrailFormation so that the same results can be achieved with Numpy arrays and Iris Cubes.

* Ruff

* Removes reference to Cubes in plugin doc-string

* Renames new method at reviewer suggestion

* Adding first derivative of saturation vapour pressure table (#2132)

* First working version of the new SVP derivative code

* Removing print statements and final changes following testing

* Adding unit tests for the new SaturatedVapourPressureTableDerivative class

* Adding extended documentation

* Correcting path to the extended documentation

* Adding Derivative to the call so the correct function is used

* removing blank line in the main doc string

* Refactors the two svp generator plugins to reduce duplication

* Updating cube name

* Adding three more unit tests to check that an error message is returned if the input temperatures to the saturated vapour pressure calculation are out of bounds.

* Ruff changes

* Changes following Marcus's code review

* Changing the svp to svp_derivative to make it more meaningful. Also, correcting the values in one of the unit tests now that derivative values above the triple point have changes very slightly following the removal of a pair of brackets in the n2 calculation.

* forcing a pre-commit hook

* Pre-commit changes

---------

Co-authored-by: Stephen Moseley <stephen.moseley@metoffice.gov.uk>

* EPPT-2389: Calculation of localised svp (#2164)

* Adding almost working method with confusing difference between test calculated versions

* Forcing return shape of local vapour pressure

* Fixing weird precision difference due to np.float32 and np.array(,dtype=np.float32)

* Fixing half of problem with pressure levels test

* Fixing axis problem for pressure levels by reshaping array when calculating local vapour pressure

* Generalising pressure levels shape in local_vapour_pressure calculation to allow for any shape of temperature etc data to be parsed

* Revert "EPPT-2389: Calculation of localised svp (#2164)" (#2169)

This reverts commit 873c009.

* EPPT-2510: Adding calculation of saturation vapour pressure derivative in air (#2165)

* Code changes to add the SVP derivative corrected for in air

* Adding a missing temperature term to the correction calculation

* Updating expected values in unit tests and changing one of the imports so it correctly finds the function

* Updating one of the ecpected values within the unit tests

* EPPT-2389: Calculation of localised SVP (version 2) (#2170)

* Adding almost working method with confusing difference between test calculated versions

* Forcing return shape of local vapour pressure

* Fixing weird precision difference due to np.float32 and np.array(,dtype=np.float32)

* Fixing half of problem with pressure levels test

* Fixing axis problem for pressure levels by reshaping array when calculating local vapour pressure

* Generalising pressure levels shape in local_vapour_pressure calculation to allow for any shape of temperature etc data to be parsed

* Fixing test that is broken by environment update

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: mspelman07 <99179165+mspelman07@users.noreply.github.com>
Co-authored-by: Stephen Moseley <stephen.moseley@metoffice.gov.uk>
Co-authored-by: Belinda Trotta <belinda.trotta@bom.gov.au>
Co-authored-by: Belinda Trotta <btrotta-bom@users.noreply.github.com>
Co-authored-by: bayliffe <benjamin.ayliffe@metoffice.gov.uk>
Co-authored-by: gavinevans <gavin.evans@metoffice.gov.uk>
Co-authored-by: Max <max.white@metoffice.gov.uk>
Co-authored-by: Ben Hooper <114418734+brhooper@users.noreply.github.com>
Co-authored-by: Rachael Esler <44791931+rachaelesler@users.noreply.github.com>
Co-authored-by: Carwyn Pelley <carwyn.pelley@metoffice.gov.uk>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Canvin <41655739+nivnac@users.noreply.github.com>
Co-authored-by: Robert Neal <robert.neal@metoffice.gov.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BoM review required PRs opened by non-BoM developers that require a BoM review test Type:TechnicalDebt

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants