Merged
95 commits
00bc9d1
Create statistical.py. Add functionality to fit GAMs using pyGAM.
brhooper May 1, 2025
b976a2c
Create GAMPredict class. Rename FitGAM to GAMFit. Add doc-strings and…
brhooper May 2, 2025
0f23a30
Fix doc-string typos. Run black to check code formatting.
brhooper May 2, 2025
f479f51
Move pygam imports in to Classes/tests to reduce dependency on this…
brhooper May 2, 2025
502a302
Rename ensemble_calibration to emos_calibration and update all refere…
brhooper May 6, 2025
4771f86
Create samos_calibration.py and create TrainGAMsForSAMOS class within…
brhooper May 8, 2025
87c63a4
Add tests for TrainGAMsForSAMOS plugin. Modify TrainGAMsForSAMOS plug…
brhooper May 12, 2025
16dee57
Extended calculate_cube_statistics method to handle rolling window ca…
brhooper May 14, 2025
336f372
Create functions for converting between cube and dataframe representa…
brhooper May 16, 2025
d994fbd
Improve samos_calibration tests helper functions. Refactor TrainEMOSF…
brhooper May 23, 2025
8e9d4de
Modify CalculateClimateAnomalies plugin to correctly calculate the re…
brhooper May 23, 2025
7649605
Make tests for TrainGAMsForSAMOS and TrainEMOSForSAMOS simpler to und…
brhooper May 23, 2025
3c6de9e
Improve doc-strings and type hints. Move test helper functions to rel…
brhooper May 28, 2025
c444cae
Correct filepath for EMOS in documentation.
brhooper May 28, 2025
9376a27
Update environment .yml files to match those intended for the new PS4…
brhooper May 28, 2025
7c1a02a
Formatting changes.
brhooper May 28, 2025
139cec8
Move get_climatological_stats method of TrainGAMsForSAMOS class to be…
brhooper May 29, 2025
0956a6c
Improvements to doc-strings and other changes following first review.
brhooper Jun 10, 2025
b91aae0
Create ApplySAMOS plugin. Start adding tests for this plugin. Create …
brhooper Jun 30, 2025
c71b04c
Merge branch 'mobt775_samos_2' of https://github.com/brhooper/improve…
mspelman07 Jul 10, 2025
eb5adb5
Add explicit handling of spot data when converting cubes to dataframe…
brhooper Jul 10, 2025
b5b6b00
Merge branch 'mobt775_samos_3' of https://github.com/brhooper/improve…
mspelman07 Jul 10, 2025
2293832
Acceptance test and cli for gams -test
mspelman07 Jul 14, 2025
12e9d7e
add cli for estimate_samos_coefficients
mspelman07 Jul 15, 2025
e7a4833
add CLI and tests for apply_samos_coefficients
mspelman07 Jul 23, 2025
4a0db0e
Adding additional comments and fixing saving pickles
mspelman07 Jul 23, 2025
1212008
Add CLI and tests for estimate-samos-gams-from-table
mspelman07 Jul 24, 2025
4b2b585
Add CLI and test changes for estimate-samos-coefficients-from-table
mspelman07 Jul 24, 2025
01fd36c
Merge remote-tracking branch 'upstream/master' into samos_cli_parquet
bayliffe Aug 20, 2025
cca24fd
Modify CLI to allow it to accept multiple additional predictor cubes.…
bayliffe Aug 21, 2025
3c551d4
Remove EMOS predictor option from estimate samos coefficients CLI. Re…
bayliffe Aug 21, 2025
f894596
Reorder argument list in apply samos CLI so an indeterminate number o…
bayliffe Aug 21, 2025
9808910
Create statistical.py. Add functionality to fit GAMs using pyGAM.
brhooper May 1, 2025
fceff87
Create GAMPredict class. Rename FitGAM to GAMFit. Add doc-strings and…
brhooper May 2, 2025
0939adf
Fix doc-string typos. Run black to check code formatting.
brhooper May 2, 2025
a0fc6df
Move pygam imports in to Classes/tests to reduce dependency on this…
brhooper May 2, 2025
9f2789a
Create samos_calibration.py and create TrainGAMsForSAMOS class within…
brhooper May 8, 2025
ce4638f
Add tests for TrainGAMsForSAMOS plugin. Modify TrainGAMsForSAMOS plug…
brhooper May 12, 2025
fd709de
Extended calculate_cube_statistics method to handle rolling window ca…
brhooper May 14, 2025
f6d6afd
Create functions for converting between cube and dataframe representa…
brhooper May 16, 2025
d4b0e56
Improve samos_calibration tests helper functions. Refactor TrainEMOSF…
brhooper May 23, 2025
d699819
Make tests for TrainGAMsForSAMOS and TrainEMOSForSAMOS simpler to und…
brhooper May 23, 2025
1b8c3be
Improve doc-strings and type hints. Move test helper functions to rel…
brhooper May 28, 2025
537d0df
Formatting changes.
brhooper May 28, 2025
f443615
Improvements to doc-strings and other changes following first review.
brhooper Jun 10, 2025
bf61330
Changes following review. Largest change is addition of calculate_sta…
brhooper Aug 19, 2025
d16a594
Changes following review. Make calculate_statistic_by_rolling_window …
brhooper Aug 27, 2025
46bfb6c
Start using collapse_realizations methods in improver.utilities.cube_…
brhooper Aug 27, 2025
4afe241
Minor changes following review.
brhooper Aug 29, 2025
787b825
Move get_climatological_stats method of TrainGAMsForSAMOS class to be…
brhooper May 29, 2025
e0173f5
Create ApplySAMOS plugin. Start adding tests for this plugin. Create …
brhooper Jun 30, 2025
1079c92
Add explicit handling of spot data when converting cubes to dataframe…
brhooper Jul 10, 2025
6ea0739
Fix errors introduced when rebasing.
brhooper Aug 27, 2025
3d5b5ae
Add sd_clip to get_climatological_stats to enforce a lower bound on s…
brhooper Aug 29, 2025
f3f3c28
Fix test_TrainEMOSForSAMOS.py tests that were failing due to a change…
brhooper Aug 29, 2025
ca62a14
Remove SAMOS CLIs as these are being created under another ticket.
brhooper Aug 29, 2025
d3cd219
Ensure ApplySAMOS.process() only looks for ECC bounds when handling r…
brhooper Aug 29, 2025
df32e39
Merge branch 'mobt775_samos_3' into samos_cli_parquet
bayliffe Sep 1, 2025
e9913e8
Merge changes
bayliffe Sep 1, 2025
8f89ac9
Fixing up CLI related tests after the introduction of pickle related …
bayliffe Sep 1, 2025
e2d2f34
Removed unused import.
bayliffe Sep 1, 2025
9bf01f5
Fix up application of SAMOS to match sites appropriately by providing…
bayliffe Sep 2, 2025
38f8995
Add unique_site_id_key argument to the SAMOS related CLIs.
bayliffe Sep 2, 2025
f571a95
Add tools for splitting gams, cubes, and parquet files from an input …
bayliffe Sep 4, 2025
cf63b92
Remove inputpickle type. Adopt joblib method of pickle writing.
bayliffe Sep 4, 2025
0955749
Add empty return to estimate_samos_gams in cases where there is no tr…
bayliffe Sep 4, 2025
28c3ceb
Modify SAMOS from table CLIs to use filepath lists as input so that m…
bayliffe Sep 4, 2025
3474fd5
Modify remaining SAMOS CLIs that need to use a filepath list as input…
bayliffe Sep 4, 2025
dea15d7
Set wmo_id as default unique identifier for training of gams.
bayliffe Sep 4, 2025
2c3c673
Ensure result is not None before writing a file.
bayliffe Sep 4, 2025
9f174af
Merge pull request #1 from bayliffe/samos_cli_parquet
brhooper Sep 11, 2025
2982e9d
Remove unnecessary print statements.
brhooper Aug 29, 2025
303a67c
Enforce consistent use of joblib for handling pickle files in CLIs an…
brhooper Sep 18, 2025
6088ce0
Fix doc-strings
brhooper Sep 19, 2025
d63307a
Another attempt to fix doc-strings. Ensure joblib is used consistentl…
brhooper Sep 19, 2025
87aed63
Fix doc-strings again. Move a pyarrow import within the function in w…
brhooper Sep 19, 2025
45e0be8
Fix doc-strings.
brhooper Sep 19, 2025
55210d4
Modify test_ApplySAMOS.py so that all tests within the file only run …
brhooper Sep 19, 2025
94399cb
Fix test_with_output_pickle now that we use joblib for all pickle fil…
brhooper Sep 19, 2025
91a1c17
Ensure test_get_climatological_stats is skipped if pygam is not avail…
brhooper Sep 19, 2025
24cf2a2
Correct probability template cube handling in split_cubes_for_samos. …
brhooper Sep 19, 2025
186530a
Remove unnecessary print statements.
brhooper Sep 22, 2025
37b89eb
Add unit tests for prepare_cube_no_calibration
brhooper Sep 22, 2025
43f0f8c
Fix doc-string.
brhooper Sep 22, 2025
5f0ae37
Another doc-string change.
brhooper Sep 22, 2025
ac32cca
Changes following review.
brhooper Sep 25, 2025
c90b391
Combine compare() and compare_pickled_objects() in improver_tests/acc…
brhooper Sep 26, 2025
550fc0b
Modify instances where tuples of cubes are returned so that cubelists…
brhooper Sep 29, 2025
308f31a
Add unit tests for convert_cube_to_parquet() function.
brhooper Sep 30, 2025
ff144ab
Recreate checksums
brhooper Sep 30, 2025
c014404
Resolve conflicts with master
brhooper Sep 30, 2025
e941c64
Changes following review.
brhooper Oct 1, 2025
187473e
Change how we check whether pyarrow is available in environment.
brhooper Oct 1, 2025
ec93dea
Add handling to check whether pyarrow is available to these unit tests.
brhooper Oct 1, 2025
eae8c34
Minor changes following review.
brhooper Oct 7, 2025
208 changes: 172 additions & 36 deletions improver/calibration/__init__.py
@@ -6,6 +6,7 @@
 and coefficient inputs.
 """
 
+import glob
 from collections import OrderedDict
 from pathlib import Path
 from typing import Dict, List, Optional, Tuple, Union
@@ -19,6 +20,8 @@
     get_diagnostic_cube_name_from_probability_name,
 )
 from improver.utilities.cube_manipulation import MergeCubes
+from improver.utilities.flatten import flatten
+from improver.utilities.load import load_cubelist
 
 
 class CalibrationSchemas:
@@ -268,71 +271,76 @@ def split_forecasts_and_bias_files(cubes: CubeList) -> Tuple[Cube, Optional[Cube
     return forecast_cube, bias_cubes
 
 
-def split_pickle_parquet_and_netcdf(
-    files: List[Path],
-) -> Tuple[Optional[CubeList], Optional[List[Path]], Optional[object]]:
-    """Split the input files into pickle, parquet, and netcdf files.
-    Any or all of NetCDF, Parquet, and pickle files can be loaded. Only a single
-    pickle file is expected, but multiple netCDF and parquet files can be provided.
+def split_netcdf_parquet_pickle(files):
+    """Split the input files into netcdf, parquet, and pickle files.
+    Only a single pickle file is expected.
 
     Args:
         files:
-            A list of input file paths.
+            A list of input file paths which will be split into pickle,
+            parquet, and netcdf files.
 
     Returns:
-        - A flattened cube list containing all the cubes loaded from NetCDF files, or None.
-        - A list of paths to Parquet files, or None.
-        - A loaded pickle file, or None.
+        - A flattened cube list containing all the cubes contained within the
+          provided paths to NetCDF files.
+        - A list of paths to Parquet files.
+        - A loaded pickle file.
 
     Raises:
-        ValueError: If the path provided is not loadable as a pickle file, parquet file
-            or netcdf file.
+        ValueError: If multiple pickle files provided, as only one is ever expected.
     """
-    cubes = iris.cube.CubeList()
-    loaded_pickle = None
+    cubes = CubeList([])
+    loaded_pickles = []
     parquets = []
 
-    for file_path in files:
-        if not file_path.exists():
-            continue
+    for file in files:
+        file_paths = glob.glob(str(file))
+        for file_path_str in file_paths:
+            file_path = Path(file_path_str)
+            if not file_path.exists():
+                continue
 
-        # Directories indicate we are working with parquet files.
-        if file_path.is_dir():
-            parquets.append(file_path)
-            continue
+            # Directories indicate we are working with parquet files.
+            if file_path.is_dir():
+                parquets.append(file_path)
+                continue
 
-        try:
-            cube = iris.load(file_path)
-            cubes.extend(cube)
-        except ValueError:
-            if loaded_pickle is not None:
-                msg = "Multiple pickle inputs have been provided. Only one is expected."
-                raise ValueError(msg)
             try:
-                loaded_pickle = joblib.load(file_path)
-            except Exception as e:
-                msg = f"Failed to load {file_path}: {e}"
-                raise ValueError(msg)
+                cube = load_cubelist(str(file_path))
+                cubes.extend(cube)
+            except ValueError:
+                try:
+                    loaded_pickles.append(joblib.load(file_path))
+                except Exception as e:
+                    msg = f"Failed to load {file_path}: {e}"
+                    raise ValueError(msg)
+
+    if len(loaded_pickles) > 1:
+        msg = "Multiple pickle inputs have been provided. Only one is expected."
+        raise ValueError(msg)
 
     return (
         cubes if cubes else None,
         parquets if parquets else None,
-        loaded_pickle if loaded_pickle else None,
+        loaded_pickles[0] if loaded_pickles else None,
     )
 
 
-def identify_parquet_type(
-    parquet_paths: List[Path],
-) -> Tuple[Optional[Path], Optional[Path]]:
+def identify_parquet_type(parquet_paths: List[Path]):
"""Determine whether the provided parquet paths contain forecast or truth data.
This is done by checking the columns within the parquet files for the presence
of a forecast_period column which is only present for forecast data.

Args:
parquet_paths:
A list of paths to Parquet files.


Returns:
- The path to the Parquet file containing the historical forecasts.
- The path to the Parquet file containing the truths.
"""
# import here to avoid dependency on pyarrow for all of improver
import pyarrow.parquet as pq

forecast_table_path = None
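
Note on the hunk above: the new loading loop expands each input as a glob pattern, treats directories as parquet datasets, tries the NetCDF loader first, and falls back to unpickling only when that raises `ValueError`. The following is a minimal standard-library sketch of that control flow, not the code in this PR. `split_inputs`, `fake_load_cubes` and `demo` are hypothetical names, `pickle` stands in for `joblib`, and the fake loader stands in for `load_cubelist` by rejecting anything without a `.nc` suffix.

```python
import glob
import pickle
import tempfile
from pathlib import Path


def fake_load_cubes(path):
    """Hypothetical stand-in for a NetCDF loader: rejects non-.nc files."""
    if path.suffix != ".nc":
        raise ValueError(f"{path} is not a NetCDF file")
    return [f"cube from {path.name}"]


def split_inputs(patterns, load_cubes=fake_load_cubes):
    """Expand glob patterns and sort matches into cubes, parquet dirs, one pickle."""
    cubes, parquets, pickles = [], [], []
    for pattern in patterns:
        for match in sorted(glob.glob(str(pattern))):
            path = Path(match)
            if path.is_dir():
                # Directories are treated as parquet datasets.
                parquets.append(path)
                continue
            try:
                cubes.extend(load_cubes(path))
            except ValueError:
                # Not loadable as NetCDF, so fall back to unpickling.
                with path.open("rb") as handle:
                    pickles.append(pickle.load(handle))
    if len(pickles) > 1:
        raise ValueError("Multiple pickle inputs provided. Only one is expected.")
    return cubes or None, parquets or None, pickles[0] if pickles else None


def demo():
    """Build a throwaway directory and split its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.nc").write_text("")
        (root / "b.nc").write_text("")
        (root / "forecast_table").mkdir()
        with (root / "gams.pkl").open("wb") as handle:
            pickle.dump({"gam": "placeholder"}, handle)
        inputs = [root / "*.nc", root / "forecast_table", root / "gams.pkl"]
        cubes, parquets, pickled = split_inputs(inputs)
        return cubes, [p.name for p in parquets], pickled
```

Collecting every pickle and checking the count once after the loop means the "multiple pickles" error is reported only after all inputs have been visited, whereas the removed version raised as soon as a second candidate pickle was encountered.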
@@ -351,6 +359,134 @@ def identify_parquet_type(
     return forecast_table_path, truth_table_path
 
 
+def split_cubes_for_samos(
+    cubes: CubeList,
+    gam_features: List[str],
+    truth_attribute: Optional[str] = None,
+    expect_emos_coeffs: bool = False,
+    expect_emos_fields: bool = False,
+):
+    """Function to split the forecast, truth, gam additional predictors and emos
+    additional predictor cubes.
+
+    Args:
+        cubes:
+            A list of input cubes which will be split into relevant groups.
+        gam_features:
+            A list of strings containing the names of the additional fields
+            required for the SAMOS GAMs.
+        truth_attribute:
+            An attribute and its value in the format of "attribute=value",
+            which must be present on truth cubes. If None, no truth cubes are
+            expected or returned.
+        expect_emos_coeffs:
+            If True, EMOS coefficient cubes are expected to be found in the input
+            cubes. If False, an error will be raised if any such cubes are found.
+        expect_emos_fields:
+            If True, additional EMOS fields are expected to be found in the input
+            cubes. If False, an error will be raised if any such cubes are found.
+
+    Raises:
+        IOError:
+            If EMOS coefficients cubes are found when they are not expected.
+        IOError:
+            If additional fields cubes are found which do not match the features in
+            gam_features.
+        IOError:
+            If probability cubes are provided with more than one name.
+
+    Returns:
+        - A cube containing all the historic forecasts, or None if no such cubes
+          were found.
+        - A cube containing all the truth data, or None if no such cubes were found
+          or no truth_attribute was provided.
+        - A cubelist containing all the additional fields required for the GAMs,
+          or None if no such cubes were found.
+        - A cubelist containing all the EMOS coefficient cubes, or None if no such
+          cubes were found.
+        - A cubelist containing all the additional fields required for EMOS,
+          or None if no such cubes were found.
+        - A cube containing a probability template, or None if no such cube is found.
+    """
+    forecast = iris.cube.CubeList([])
+    truth = iris.cube.CubeList([])
+    gam_additional_fields = iris.cube.CubeList([])
+    emos_coefficients = iris.cube.CubeList([])
+    emos_additional_fields = iris.cube.CubeList([])
+    prob_template = None
+
+    # Prepare variables used to split forecast and truth.
+    truth_key, truth_value = None, None
+    if truth_attribute:
+        truth_key, truth_value = truth_attribute.split("=")
+
+    for cube in flatten(cubes):
+        if "time" in [c.name() for c in cube.coords()]:
+            if truth_key and cube.attributes.get(truth_key) == truth_value:
+                truth.append(cube.copy())
+            else:
+                forecast.append(cube.copy())
+        elif "emos_coefficient" in cube.name():
+            emos_coefficients.append(cube.copy())
+        elif cube.name() in gam_features:
+            gam_additional_fields.append(cube.copy())
+        else:
+            emos_additional_fields.append(cube.copy())
+
+    # Check that all required inputs are present and no unexpected cubes have been
+    # found.
+    missing_inputs = []
+    if len(forecast) == 0:
+        missing_inputs.append("forecast")
+    if truth_key and len(truth) == 0:
+        missing_inputs.append("truth")
+    if missing_inputs:
+        raise IOError(f"Missing {' and '.join(missing_inputs)} input.")
+
+    if not expect_emos_coeffs and len(emos_coefficients) > 0:
+        msg = (
+            f"Found EMOS coefficients cubes when they were not expected. The following "
+            f"such cubes were found: {[c.name() for c in emos_coefficients]}."
+        )
+        raise IOError(msg)
+
+    if not expect_emos_fields and len(emos_additional_fields) > 0:
+        msg = (
+            f"Found additional fields cubes which do not match the features in "
+            f"gam_features. The following cubes were found: "
+            f"{[c.name() for c in emos_additional_fields]}."
+        )
+        raise IOError(msg)
+
+    # Split out prob_template cube if required.
+    forecast_names = [c.name() for c in forecast]
+    prob_forecast_names = [name for name in forecast_names if "probability" in name]
+    if len(set(prob_forecast_names)) > 1:
+        msg = (
+            "Providing multiple probability cubes is not supported. A probability cube "
+            "can either be provided as the forecast or the probability template, but "
+            f"not both. Cubes provided: {prob_forecast_names}."
+        )
+        raise IOError(msg)
+    else:
+        if len(set(forecast_names)) > 1:
+            prob_template = forecast.extract(prob_forecast_names[0])[0]
+            forecast.remove(prob_template)
+
+    forecast = MergeCubes()(forecast)
+    if truth_key:
+        truth = MergeCubes()(truth)
+
+    return (
+        None if not forecast else forecast,
+        None if not truth else truth,
+        None if not gam_additional_fields else gam_additional_fields,
+        None if not emos_coefficients else emos_coefficients,
+        None if not emos_additional_fields else emos_additional_fields,
+        prob_template,
+    )
+
+
def validity_time_check(forecast: Cube, validity_times: List[str]) -> bool:
"""Check the validity time of the forecast matches the accepted validity times
within the validity times list.
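
Note on `split_cubes_for_samos`: the routing rule in its loop can be illustrated without iris. The sketch below is an assumption-laden illustration rather than the plugin code. `FakeCube` and `route_cubes` are hypothetical names, cubes are reduced to a name, a tuple of coordinate names and an attribute dictionary, and the attribute and cube names in the example are illustrative only. Cubes with a `time` coordinate are forecasts unless they carry the truth attribute, and every other cube is classified by name.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCube:
    """Hypothetical stand-in for an iris cube."""

    name: str
    coord_names: tuple = ()
    attributes: dict = field(default_factory=dict)


def route_cubes(cubes, gam_features, truth_attribute=None):
    """Group cube names using the same precedence as the loop in the diff."""
    truth_key, truth_value = None, None
    if truth_attribute:
        # Expects exactly one "=", for example "key=value".
        truth_key, truth_value = truth_attribute.split("=")

    groups = {
        "forecast": [],
        "truth": [],
        "gam_fields": [],
        "emos_coefficients": [],
        "emos_fields": [],
    }
    for cube in cubes:
        if "time" in cube.coord_names:
            is_truth = bool(truth_key) and cube.attributes.get(truth_key) == truth_value
            groups["truth" if is_truth else "forecast"].append(cube.name)
        elif "emos_coefficient" in cube.name:
            groups["emos_coefficients"].append(cube.name)
        elif cube.name in gam_features:
            groups["gam_fields"].append(cube.name)
        else:
            # Anything left over is assumed to be an additional EMOS field.
            groups["emos_fields"].append(cube.name)
    return groups


example_cubes = [
    FakeCube("air_temperature", ("time",)),
    FakeCube("air_temperature", ("time",), {"mosg__model_configuration": "uk_det"}),
    FakeCube("emos_coefficient_alpha"),
    FakeCube("surface_altitude"),
    FakeCube("land_fraction"),
]
groups = route_cubes(
    example_cubes,
    gam_features=["surface_altitude"],
    truth_attribute="mosg__model_configuration=uk_det",
)
```

Because the tuple unpacking expects exactly two parts, a `truth_attribute` whose value itself contains `=` would fail to unpack in this sketch, and the same appears to hold for the `split("=")` call in the diff.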