ESMValGroup · schlunma · Feb 28, 2025 · Jan 15, 2024 · Jan 16, 2024 · Jan 17, 2024
diff --git a/doc/sphinx/source/recipes/figures/benchmarking/annual_cycle.png b/doc/sphinx/source/recipes/figures/benchmarking/annual_cycle.png
diff --git a/doc/sphinx/source/recipes/figures/benchmarking/boxplots.png b/doc/sphinx/source/recipes/figures/benchmarking/boxplots.png
diff --git a/doc/sphinx/source/recipes/figures/benchmarking/diurnal_cycle.png b/doc/sphinx/source/recipes/figures/benchmarking/diurnal_cycle.png
diff --git a/doc/sphinx/source/recipes/figures/benchmarking/map.png b/doc/sphinx/source/recipes/figures/benchmarking/map.png
diff --git a/doc/sphinx/source/recipes/figures/benchmarking/timeseries.png b/doc/sphinx/source/recipes/figures/benchmarking/timeseries.png
diff --git a/doc/sphinx/source/recipes/figures/benchmarking/zonal.png b/doc/sphinx/source/recipes/figures/benchmarking/zonal.png
diff --git a/...inx/source/recipes/figures/model_evaluation/diurnal_cycle_clt_sepacific_3hr.png b/...inx/source/recipes/figures/model_evaluation/diurnal_cycle_clt_sepacific_3hr.png
diff --git a/doc/sphinx/source/recipes/figures/monitor/diurnal_cycle_clt_tropics_3hr.png b/doc/sphinx/source/recipes/figures/monitor/diurnal_cycle_clt_tropics_3hr.png
diff --git a/...s/figures/monitor/diurnalcycle_pr_tropics_EC-Earth3_3hr_historical_r1i1p1f1.png b/...s/figures/monitor/diurnalcycle_pr_tropics_EC-Earth3_3hr_historical_r1i1p1f1.png
diff --git a/doc/sphinx/source/recipes/index.rst b/doc/sphinx/source/recipes/index.rst
@@ -21,6 +21,7 @@ large variety of input data.
 .. toctree::
    :maxdepth: 1
 
+   recipe_benchmarking
    recipe_model_evaluation
    recipe_monitor
    recipe_portrait

diff --git a/doc/sphinx/source/recipes/recipe_benchmarking.rst b/doc/sphinx/source/recipes/recipe_benchmarking.rst
@@ -0,0 +1,175 @@
+.. _recipe_benchmarking:
+
+Model Benchmarking
+==================
+
+Overview
+--------
+
+These recipes and diagnostics are based on :ref:`recipe_monitor <recipe_monitor>`, which allows plotting arbitrary preprocessor output, i.e., arbitrary variables from arbitrary datasets. An extension of these diagnostics is used to benchmark a model simulation against other datasets (e.g. CMIP6). The benchmarking features are described in `Lauer et al.`_.
+
+.. _`Lauer et al.`: https://doi.org/10.5194/egusphere-2024-1518
+
+Available recipes and diagnostics
+---------------------------------
+
+Recipes are stored in `recipes/model_evaluation/`
+
+* recipe_model_benchmarking_annual_cycle.yml
+* recipe_model_benchmarking_boxplots.yml
+* recipe_model_benchmarking_diurnal_cycle.yml
+* recipe_model_benchmarking_maps.yml
+* recipe_model_benchmarking_timeseries.yml
+* recipe_model_benchmarking_zonal.yml
+
+Diagnostics are stored in `diag_scripts/monitor/`
+
+* :ref:`multi_datasets.py
+  <api.esmvaltool.diag_scripts.monitor.multi_datasets>`:
+  Monitoring diagnostic to show multiple datasets in one plot (incl. biases).
+
+
+Recipe settings
+~~~~~~~~~~~~~~~
+
+See :ref:`multi_datasets.py<api.esmvaltool.diag_scripts.monitor.multi_datasets>` for a list of all possible configuration options that can be specified in the recipe.
+
+.. note::
+   Exactly one dataset (the dataset to be benchmarked) needs to specify the facet ``benchmark_dataset: true`` in the dataset entry of the recipe. For line plots (i.e. annual cycle, diurnal cycle, time series), it is recommended to specify a particular line color and line style in the ``scripts`` section of the recipe for the dataset to be benchmarked (``benchmark_dataset: true``) so that this dataset is easy to identify in the plot. In the example below, MIROC6 is the dataset to be benchmarked and ERA5 is used as a reference dataset.
+
+.. code-block:: yaml
+
+   scripts:
+     allplots:
+       script: monitor/multi_datasets.py
+       plot_folder: '{plot_dir}'
+       plot_filename: '{plot_type}_{real_name}_{mip}'
+       group_variables_by: variable_group
+       facet_used_for_labels: alias
+       plots:
+         diurnal_cycle:
+           annual_mean_kwargs: false
+           legend_kwargs:
+             loc: upper right
+           plot_kwargs:
+             'MIROC6':
+               color: red
+               label: '{alias}'
+               linestyle: '-'
+               linewidth: 2
+               zorder: 4
+             ERA5:
+               color: black
+               label: '{dataset}'
+               linestyle: '-'
+               linewidth: 2
+               zorder: 3
+             MultiModelPercentile10:
+               color: gray
+               label: '{dataset}'
+               linestyle: '--'
+               linewidth: 1
+               zorder: 2
+             MultiModelPercentile90:
+               color: gray
+               label: '{dataset}'
+               linestyle: '--'
+               linewidth: 1
+               zorder: 2
+             default:
+               color: lightgray
+               label: null
+               linestyle: '-'
+               linewidth: 1
+               zorder: 1
+
+Variables
+---------
+
+Any, but the number of dimensions of each variable should match the one expected by the respective plot type.
+
+References
+----------
+
+* Lauer, A., L. Bock, B. Hassler, P. Jöckel, L. Ruhe, and M. Schlund: Monitoring and benchmarking Earth
+  System Model simulations with ESMValTool v2.12.0, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2024-1518, 2024.
+
+Example plots
+-------------
+
+.. _fig_benchmarking_annual_cycle:
+.. figure::  /recipes/figures/benchmarking/annual_cycle.png
+   :align:   center
+   :width:   16cm
+
+   (Left) Multi-year global mean (2000-2004) of the seasonal cycle of near-surface temperature
+   in K from a simulation of MIROC6 and the reference dataset HadCRUT5 (black). The thin gray
+   lines show individual CMIP6 models used for comparison; the dashed gray lines show the 10%
+   and 90% percentiles of these CMIP6 models. (Right) Same as (left) but for area-weighted RMSE
+   of near-surface temperature. The light blue shading shows the range of the 10% to 90%
+   percentiles of RMSE values from the ensemble of CMIP6 models used for comparison. Created
+   with recipe_model_benchmarking_annual_cycle.yml.
+
+.. _fig_benchmarking_boxplots:
+.. figure::  /recipes/figures/benchmarking/boxplots.png
+   :align:   center
+   :width:   16cm
+
+   (Left) Global area-weighted RMSE (smaller=better), (middle) weighted Pearson’s correlation
+   coefficient (higher=better) and (right) weighted Earth mover’s distance (smaller=better) of
+   the geographical pattern of 5-year means of different variables from a simulation of MIROC6
+   (red cross) in comparison to the CMIP6 ensemble (boxplot). Reference datasets for calculating
+   the three metrics are: near-surface temperature (tas): HadCRUT5, surface temperature (ts):
+   HadISST, precipitation (pr): GPCP-SG, air pressure at sea level (psl): ERA5, shortwave (rsut)
+   and longwave (rlut) radiative fluxes at TOA and shortwave (swcre) and longwave (lwcre) cloud
+   radiative effects: CERES-EBAF. Each box indicates the range from the first quartile to the
+   third quartile, the vertical lines show the median, and the whiskers the minimum and maximum
+   values, excluding the outliers. Outliers are defined as being outside 1.5 times the
+   interquartile range. Created with recipe_model_benchmarking_boxplots.yml.
+
+.. _fig_benchmarking_diurn_cycle:
+.. figure::  /recipes/figures/benchmarking/diurnal_cycle.png
+   :align:   center
+   :width:   10cm
+
+   Area-weighted RMSE of the annual mean diurnal cycle (year 2000) of precipitation averaged over
+   the tropical ocean (ocean grid cells in the latitude belt 30°S to 30°N) from a simulation of
+   MIROC6 compared with ERA5 data (black). The light blue shading shows the range of the
+   10% to 90% percentiles of RMSE values from the ensemble of CMIP6 models used for comparison.
+   Created with recipe_model_benchmarking_diurnal_cycle.yml.
+
+.. _fig_benchmarking_map:
+.. figure::  /recipes/figures/benchmarking/map.png
+   :align:   center
+   :width:   10cm
+
+   5-year annual mean (2000-2004) area-weighted RMSE of the precipitation rate in mm day-1 from a
+   simulation of MIROC6 compared with GPCP-SG data. The stippled areas mask grid cells where the
+   RMSE is smaller than the 90% percentile of RMSE values from an ensemble of CMIP6 models.
+   Created with recipe_model_benchmarking_maps.yml.
+
+.. _fig_benchmarking_timeseries:
+.. figure::  /recipes/figures/benchmarking/timeseries.png
+   :align:   center
+   :width:   16cm
+
+   (Left) Time series from 2000 through 2014 of global average monthly mean temperature anomalies
+   (reference period 2000-2009) of the near-surface temperature in K from a simulation of MIROC6
+   (red) and the reference dataset HadCRUT5 (black). The thin gray lines show individual CMIP6
+   models used for comparison; the dashed gray lines show the 10% and 90% percentiles of these
+   CMIP6 models. (Right) Same as (left) but for area-weighted RMSE of the near-surface air
+   temperature. The light blue shading shows the range of the 10% to 90% percentiles of RMSE
+   values from the ensemble of CMIP6 models used for comparison. Created with
+   recipe_model_benchmarking_timeseries.yml.
+
+.. _fig_benchmarking_zonal:
+.. figure::  /recipes/figures/benchmarking/zonal.png
+   :align:   center
+   :width:   10cm
+
+   5-year annual mean bias (2000-2004) of the zonally averaged temperature in K from a historical
+   simulation of MIROC6 compared with ERA5 reanalysis data. The stippled areas mask grid cells
+   where the absolute bias (:math:`|BIAS|`) is smaller than the maximum of the absolute 10%
+   (:math:`|p10|`) and the absolute 90% (:math:`|p90|`) percentiles from an ensemble of CMIP6
+   models, i.e. only grid cells with :math:`|BIAS| \geq \max(|p10|, |p90|)` remain unmasked.
+   Created with recipe_model_benchmarking_zonal.yml.
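
Note on the zonal-mean caption above: the stippling rule it describes can be sketched in a few lines. This is a minimal illustration only, assuming plain Python lists of per-cell values rather than the gridded data the diagnostic actually handles; `stipple_mask` is a hypothetical helper and not a function of `multi_datasets.py`.

```python
def stipple_mask(bias, p10, p90):
    """Flag grid cells that would be stippled (masked) in the bias plot.

    A cell is flagged when |bias| is smaller than max(|p10|, |p90|), i.e.
    when the benchmarked model stays within the ensemble's percentile range.
    All three arguments are equally long sequences of floats (a hypothetical
    flattened grid, for illustration only).
    """
    if not (len(bias) == len(p10) == len(p90)):
        raise ValueError("bias, p10 and p90 must have the same length")
    return [
        abs(b) < max(abs(lo), abs(hi))
        for b, lo, hi in zip(bias, p10, p90)
    ]


# Three example cells: only the first one lies within the ensemble range.
mask = stipple_mask(
    bias=[0.5, -3.0, 2.0],
    p10=[-1.0, -1.0, -2.0],
    p90=[1.0, 2.0, 1.0],
)
print(mask)  # [True, False, False]
```

Cells flagged `True` correspond to the stippled areas; cells flagged `False` satisfy `|BIAS| >= max(|p10|, |p90|)` and are the ones where the benchmarked model falls outside the ensemble range.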
diff --git a/doc/sphinx/source/recipes/recipe_model_evaluation.rst b/doc/sphinx/source/recipes/recipe_model_evaluation.rst
@@ -62,37 +62,38 @@ section).
 Example plots
 -------------
 
-.. _fig_1:
 .. figure::  /recipes/figures/model_evaluation/map_tas_MPI-ESM1-2-HR_Amon.jpg
    :align:   center
    :width:   14cm
 
-Global climatology of 2m near-surface air temperature.
+   Global climatology of 2m near-surface air temperature.
 
-.. _fig_2:
 .. figure::  /recipes/figures/model_evaluation/map_swcre_MPI-ESM1-2-HR_Amon.jpg
    :align:   center
    :width:   14cm
 
-Global climatology of the shortwave cloud radiative effect (SWCRE).
+   Global climatology of the shortwave cloud radiative effect (SWCRE).
 
-.. _fig_3:
 .. figure::  /recipes/figures/model_evaluation/timeseries_rtnt_ambiguous_dataset_Amon.jpg
    :align:   center
    :width:   14cm
 
-Time series of the global mean top-of-the-atmosphere net radiative flux.
+   Time series of the global mean top-of-the-atmosphere net radiative flux.
 
-.. _fig_4:
 .. figure::  /recipes/figures/model_evaluation/variable_vs_lat_pr_Amon.jpg
    :align:   center
    :width:   14cm
 
-Zonal mean precipitation.
+   Zonal mean precipitation.
 
-.. _fig_5:
 .. figure::  /recipes/figures/model_evaluation/annual_cycle_clt_southerocean_Amon.jpg
    :align:   center
    :width:   14cm
 
-Annual cycle of Southern Ocean total cloud cover.
+   Annual cycle of Southern Ocean total cloud cover.
+
+.. figure::  /recipes/figures/model_evaluation/diurnal_cycle_clt_sepacific_3hr.png
+   :align:   center
+   :width:   14cm
+
+   Diurnal cycle of Southeast Pacific total cloud cover.
diff --git a/doc/sphinx/source/recipes/recipe_monitor.rst b/doc/sphinx/source/recipes/recipe_monitor.rst
@@ -145,88 +145,102 @@ Example plots
    :align:   center
    :width:   14cm
 
-Global climatology of tas.
+   Global climatology of tas.
 
 .. _fig_seasonclimglobal:
 .. figure::  /recipes/figures/monitor/seasonclim.png
    :align:   center
    :width:   14cm
 
-Seasonal climatology of pr, with a custom colorbar.
+   Seasonal climatology of pr, with a custom colorbar.
 
 .. _fig_monthlyclimglobal:
 .. figure::  /recipes/figures/monitor/monclim.png
    :align:   center
    :width:   14cm
 
-Monthly climatology of sivol, only for March and September.
+   Monthly climatology of sivol, only for March and September.
 
 .. _fig_timeseries:
 .. figure::  /recipes/figures/monitor/timeseries.png
    :align:   center
    :width:   14cm
 
-Timeseries of Niño 3.4 index, computed directly with the preprocessor.
+   Timeseries of Niño 3.4 index, computed directly with the preprocessor.
 
 .. _fig_annual_cycle:
 .. figure::  /recipes/figures/monitor/annualcycle.png
    :align:   center
    :width:   14cm
 
-Annual cycle of tas.
+   Annual cycle of tas.
 
 .. _fig_timeseries_with_ref:
 .. figure::  /recipes/figures/monitor/timeseries_with_ref.png
    :align:   center
    :width:   14cm
 
-Timeseries of tas including a reference dataset.
+   Timeseries of tas including a reference dataset.
 
 .. _fig_annual_cycle_with_ref:
 .. figure::  /recipes/figures/monitor/annualcycle_with_ref.png
    :align:   center
    :width:   14cm
 
-Annual cycle of tas including a reference dataset.
+   Annual cycle of tas including a reference dataset.
+
+.. _fig_diurnal_cycle:
+.. figure::  /recipes/figures/monitor/diurnalcycle_pr_tropics_EC-Earth3_3hr_historical_r1i1p1f1.png
+   :align:   center
+   :width:   14cm
+
+   Diurnal cycle of precipitation in the Tropics from EC-Earth3.
+
+.. _fig_diurnal_cycle_with_ref:
+.. figure::  /recipes/figures/monitor/diurnal_cycle_clt_tropics_3hr.png
+   :align:   center
+   :width:   14cm
+
+   Diurnal cycle of clt including a reference dataset.
 
 .. _fig_map_with_ref:
 .. figure::  /recipes/figures/monitor/map_with_ref.png
    :align:   center
    :width:   14cm
 
-Global climatology of tas including a reference dataset.
+   Global climatology of tas including a reference dataset.
 
 .. _fig_zonal_mean_profile_with_ref:
 .. figure::  /recipes/figures/monitor/zonalmean_profile_with_ref.png
    :align:   center
    :width:   14cm
 
-Zonal mean profile of ta including a reference dataset.
+   Zonal mean profile of ta including a reference dataset.
 
 .. _fig_1d_profile_with_ref:
 .. figure::  /recipes/figures/monitor/1d_profile_with_ref.png
    :align:   center
    :width:   14cm
 
-1D profile of ta including a reference dataset.
+   1D profile of ta including a reference dataset.
 
 .. _fig_variable_vs_lat_with_ref:
 .. figure::  /recipes/figures/monitor/variable_vs_lat_with_ref.png
    :align:   center
    :width:   14cm
 
-Zonal mean pr including a reference dataset.
+   Zonal mean pr including a reference dataset.
 
 .. _fig_hovmoeller_z_vs_time_with_ref:
 .. figure::  /recipes/figures/monitor/hovmoeller_z_vs_time_with_ref.png
    :align:   center
    :width:   14cm
 
-Hovmoeller plot (pressure vs. time) of ta including a reference dataset.
+   Hovmoeller plot (pressure vs. time) of ta including a reference dataset.
 
 .. _fig_hovmoeller_time_vs_lat_with_ref:
 .. figure:: /recipes/figures/monitor/hovmoeller_time_vs_lat_with_ref.png
    :align:   center
    :width:   14cm
 
-Hovmoeller plot (time vs. latitude) of tas including a reference dataset
+   Hovmoeller plot (time vs. latitude) of tas including a reference dataset.
diff --git a/doc/sphinx/source/recipes/recipe_thermodyn_diagtool.rst b/doc/sphinx/source/recipes/recipe_thermodyn_diagtool.rst
@@ -27,7 +27,7 @@ in pressure levels, the daily fields of 2D near-surface temperature and horizont
 required to perform a vertical interpolation, substituting data in pressure levels where surface pressure is
 lower than the respective level and fields are not stored as an output of the analysed model.
 
-The material entropy production is computed by using the indirect or the direct method (or both). The former 
+The material entropy production is computed by using the indirect or the direct method (or both). The former
 method relies on the convergence of radiative heat in the atmosphere (cfr. Lucarini et al., 2011; Pascale et al., 2011),
 the latter on all viscous and non-viscous dissipative processes occurring in the atmosphere
 (namely the sensible heat fluxes, the hydrological cycle with its components and the kinetic energy dissipation).
@@ -139,12 +139,10 @@ References
 Example plots
 -------------
 
-.. _fig_1:
 .. figure:: /recipes/figures/thermodyn_diagtool/meridional_transp.png
    :align:   left
    :width:   14cm
 
-.. _fig_2:
 .. figure:: /recipes/figures/thermodyn_diagtool/CanESM2_wmb_transp.png
    :align:   right
    :width:   14cm
diff --git a/esmvaltool/config-references.yml b/esmvaltool/config-references.yml
@@ -810,6 +810,7 @@ projects:
   crescendo: EU H2020 project CRESCENDO
   dlrveu2: DLR project VEU2
   dlrveu: DLR project VEU
+  dlrmabak: DLR project MABAK
   embrace: EU FP7 project EMBRACE
   esm2025: EU H2020 project ESM2025 - Earth system models for the future
   esmval: DLR project ESMVal

diff --git a/esmvaltool/diag_scripts/clouds/clouds.ncl b/esmvaltool/diag_scripts/clouds/clouds.ncl
@@ -114,7 +114,8 @@ begin
 
   variables = metadata_att_as_array(variable_info, "short_name")
   if (.not. any(variables .eq. var0)) then
-    errstr = "diagnostic " + diag + " requires the following variable: " + var0
+    errstr = "diagnostic " + DIAG_SCRIPT \
+             + " requires the following variable: " + var0
     error_msg("f", DIAG_SCRIPT, "", errstr)
   end if
 
@@ -539,6 +540,10 @@ begin
       res@cnLevels            = ispan(0, 60, 5)
     end if
 
+    if (var0.eq."ts") then
+      res@cnLevels            = ispan(274, 304, 2)
+    end if
+
 ;    res@lbLabelBarOn         = False
     res@gsnRightString       = ""