ESMValGroup · valeriupredoi · Oct 3, 2022 · Jul 25, 2022 · Sep 2, 2022 · Sep 2, 2022
diff --git a/doc/develop/fixing_data.rst b/doc/develop/fixing_data.rst
@@ -415,27 +415,34 @@ To allow ESMValCore to locate the data files, use the following steps:
         ICON:
           ...
           input_dir:
-            default: '{version}_{component}_{exp}_{grid}_{ensemble}'
+            default:
+              - '{exp}'
+              - '{exp}/outdata'
           input_file:
-            default: '{version}_{component}_{exp}_{grid}_{ensemble}_{var_type}*.nc'
+            default: '{exp}_{var_type}*.nc'
           ...
 
-     To find your ICON data that is for example located in
-     ``{rootpath}/42-0_atm_amip_R2B5_r1i1/42-0_atm_amip_R2B5_r1i1_2d_1979.nc``
-     (``{rootpath}`` is ESMValTool ``rootpath`` for the project ``ICON``
-     defined in your :ref:`user configuration file`), use the following dataset
-     entry in your recipe:
+     To find your ICON data that is for example located in files like
+     ``{rootpath}/amip/amip_atm_2d_ml_20000101T000000Z.nc`` (``{rootpath}`` is
+     ESMValTool ``rootpath`` for the project ``ICON`` defined in your
+     :ref:`user configuration file`), use the following dataset entry in your
+     recipe:
 
      .. code-block:: yaml
 
         datasets:
-          - {project: ICON, dataset: ICON, version: 42-0, component: atm, exp: amip, grid: R2B5, ensemble: r1i1, var_type: 2d}
+          - {project: ICON, dataset: ICON, exp: amip}
 
      Please note the duplication of the name ``ICON`` in ``project`` and
      ``dataset``, which is necessary to comply with ESMValTool's data finding
      and CMORizing functionalities.
      For other native models, ``dataset`` could also refer to a subversion of
      the model.
+     Note that it is possible to predefine facets in an :ref:`extra facets file
+     <add_new_fix_native_datasets_extra_facets>`.
+     In this ICON example, the facet ``var_type`` is :download:`predefined
+     </../esmvalcore/_config/extra_facets/icon-mappings.yml>` for many
+     variables.
 
 .. _add_new_fix_native_datasets_fix_data:
 

diff --git a/doc/quickstart/configure.rst b/doc/quickstart/configure.rst
@@ -639,10 +639,12 @@ Example:
    ICON:
      cmor_strict: false
      input_dir:
-       default: '{version}_{component}_{exp}_{grid}_{ensemble}'
+       default:
+         - '{exp}'
+         - '{exp}/outdata'
      input_file:
-       default: '{version}_{component}_{exp}_{grid}_{ensemble}_{var_type}*.nc'
-     output_file: '{dataset}_{version}_{component}_{grid}_{mip}_{exp}_{ensemble}_{short_name}_{var_type}'
+       default: '{exp}_{var_type}*.nc'
+     output_file: '{project}_{dataset}_{exp}_{var_type}_{mip}_{short_name}'
      cmor_type: 'CMIP6'
      cmor_default_table_prefix: 'CMIP6_'
 

diff --git a/doc/quickstart/find_data.rst b/doc/quickstart/find_data.rst
@@ -148,6 +148,91 @@ The following models are natively supported by ESMValCore.
 In contrast to the native observational datasets listed above, they use
 dedicated projects instead of the project ``native6``.
 
+.. _read_cesm:
+
+CESM
+^^^^
+
+ESMValTool is able to read native `CESM <https://www.cesm.ucar.edu/>`__ model
+output.
+
+.. warning::
+
+   The support for native CESM output is still experimental.
+   Currently, only one variable (``tas``) is fully supported. Other 2D variables
+   might be supported by specifying appropriate facets in the recipe or extra
+   facets files (see text below).
+   3D variables (data that uses a vertical dimension) are not yet supported.
+
+The default naming conventions for input directories and files for CESM are
+
+* input directories: 3 different types supported:
+   * ``/`` (run directory)
+   * ``[case]/[gcomp]/hist`` (short-term archiving)
+   * ``[case]/[gcomp]/proc/[tdir]/[tperiod]`` (post-processed data)
+* input files: ``[case].[scomp].[type].[string]*nc``
+
+as configured in the :ref:`config-developer file <config-developer>` (using the
+default DRS ``drs: default`` in the :ref:`user configuration file`).
+More information about CESM naming conventions is given `here
+<https://www.cesm.ucar.edu/models/cesm2/naming_conventions.html>`__.
+
+.. note::
+
+   The ``[string]`` entry in the input file names above does not only
+   correspond to the (optional) ``$string`` entry for `CESM model output files
+   <https://www.cesm.ucar.edu/models/cesm2/naming_conventions.html#modelOutputFilenames>`__,
+   but can also be used to read `post-processed files
+   <https://www.cesm.ucar.edu/models/cesm2/naming_conventions.html#ppDataFilenames>`__.
+   In the latter case, ``[string]`` corresponds to the combination
+   ``$SSTRING.$TSTRING``.
+
+Thus, example dataset entries could look like this:
+
+.. code-block:: yaml
+
+  datasets:
+    - {project: CESM, dataset: CESM2, case: f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001, type: h0, mip: Amon, short_name: tas, start_year: 2000, end_year: 2014}
+    - {project: CESM, dataset: CESM2, case: f.e21.F1850_BGC.f09_f09_mg17.CFMIP-hadsst-piForcing.001, type: h0, gcomp: atm, scomp: cam, mip: Amon, short_name: tas, start_year: 2000, end_year: 2014}
+
+Variable-specific defaults for the facets ``gcomp`` and ``scomp`` are given in
+the extra facets (see next paragraph) for some variables, but these can be
+overridden in the recipe.
+
+Similar to any other fix, the CESM fix allows the use of :ref:`extra
+facets<extra_facets>`.
+By default, the file :download:`cesm-mappings.yml
+</../esmvalcore/_config/extra_facets/cesm-mappings.yml>` is used for that
+purpose.
+Currently, this file only contains default facets for a single variable
+(``tas``); for other variables, these entries need to be defined in the recipe.
+Supported keys for extra facets are:
+
+==================== ====================================== =================================
+Key                  Description                            Default value if not specified
+==================== ====================================== =================================
+``gcomp``            Generic component-model name           No default (needs to be specified
+                                                            in extra facets or recipe if
+                                                            default DRS is used)
+``raw_name``         Variable name of the variable in the   CMOR variable name of the
+                     raw input file                         corresponding variable
+``scomp``            Specific component-model name          No default (needs to be specified
+                                                            in extra facets or recipe if
+                                                            default DRS is used)
+``string``           Short string which is used to further  ``''`` (empty string)
+                     identify the history file type
+                     (corresponds to ``$string`` or
+                     ``$SSTRING.$TSTRING`` in the CESM file
+                     name conventions; see note above)
+``tdir``             Entry to distinguish time averages     ``''`` (empty string)
+                     from time series from diagnostic plot
+                     sets (only used for post-processed
+                     data)
+``tperiod``          Time period over which the data was    ``''`` (empty string)
+                     processed (only used for
+                     post-processed data)
+==================== ====================================== =================================
+
 .. _read_emac:
 
 EMAC

diff --git a/esmvalcore/_config/extra_facets/cesm-mappings.yml b/esmvalcore/_config/extra_facets/cesm-mappings.yml
@@ -0,0 +1,30 @@
+# Extra facets for native CESM model output
+
+# Notes:
+# - All facets can also be specified in the recipes. The values given here are
+#   only defaults.
+# - The facets ``gcomp``, ``scomp``, ``string``, ``tdir``, and ``tperiod`` have
+#   to be specified in the recipe if they are not given here and default DRS is
+#   used.
+
+# A complete list of supported keys is given in the documentation (see
+# ESMValCore/doc/quickstart/find_data.rst).
+---
+
+CESM2:
+
+  '*':
+    # Optional facets for every variable
+    # It is necessary to define them here to allow multiple file/dir name
+    # conventions, see
+    # https://www.cesm.ucar.edu/models/cesm2/naming_conventions.html
+    '*':
+      string: ''
+      tdir: ''
+      tperiod: ''
+
+    # Default facets for variables
+    tas:
+      raw_name: TREFHT
+      gcomp: atm
+      scomp: cam
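The mapping file above is nested by dataset, MIP table, and variable, with `'*'` acting as a wildcard. The sketch below shows one way such defaults could be combined with facets from a recipe, so that recipe values take precedence. It is an illustration only: `resolve_facets` and `CESM_MAPPINGS` are hypothetical names, the dictionary is a hand-written copy of the YAML, and ESMValCore's real lookup may differ in detail.

```python
from fnmatch import fnmatchcase

# Python literal mirroring cesm-mappings.yml (dataset -> mip -> short_name).
CESM_MAPPINGS = {
    "CESM2": {
        "*": {
            "*": {"string": "", "tdir": "", "tperiod": ""},
            "tas": {"raw_name": "TREFHT", "gcomp": "atm", "scomp": "cam"},
        },
    },
}


def resolve_facets(mappings, dataset, mip, short_name, recipe_facets):
    """Merge matching defaults, then let recipe facets win (hypothetical)."""
    merged = {}
    for dataset_pattern, mips in mappings.items():
        if not fnmatchcase(dataset, dataset_pattern):
            continue
        for mip_pattern, variables in mips.items():
            if not fnmatchcase(mip, mip_pattern):
                continue
            for var_pattern, defaults in variables.items():
                if fnmatchcase(short_name, var_pattern):
                    merged.update(defaults)
    merged.update(recipe_facets)  # values from the recipe override defaults
    return merged


tas_facets = resolve_facets(
    CESM_MAPPINGS, "CESM2", "Amon", "tas",
    {"case": "f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001", "type": "h0"},
)
```

For a variable without its own entry, only the wildcard defaults (`string`, `tdir`, `tperiod`) are returned in this sketch, which matches the note that `gcomp` and `scomp` then have to come from the recipe.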
diff --git a/esmvalcore/_config/extra_facets/emac-mappings.yml b/esmvalcore/_config/extra_facets/emac-mappings.yml
@@ -1,13 +1,10 @@
 # Extra facets for native EMAC model output
 
-# All extra facets for EMAC are optional but might be necessary for some
-# variables.
-
 # Notes:
 # - All facets can also be specified in the recipes. The values given here are
 #   only defaults.
 # - The facets ``channel`` and ``postproc_flag`` have to be specified in the
-#   recipe if they are not given here.
+#   recipe if they are not given here and default DRS is used.
 # - If ``raw_name`` is omitted and no derivation in the EMAC fix is given, the
 #   CMOR short_name is used by default. To support single and multiple raw
 #   names for a variable, ``raw_name`` can be given as str and list. In the
@@ -26,6 +23,8 @@
 # A complete list of supported keys is given in the documentation (see
 # ESMValCore/doc/quickstart/find_data.rst).
 ---
+
+# Optional facets for every variable
 '*':
   '*':
     '*':

diff --git a/esmvalcore/_config/extra_facets/icon-mappings.yml b/esmvalcore/_config/extra_facets/icon-mappings.yml
@@ -1,11 +1,17 @@
 # Extra facets for native ICON model output
 
-# All extra facets for ICON are optional but might be necessary for some
-# variables. A complete list of supported keys is given in the documentation
-# (see ESMValCore/doc/quickstart/find_data.rst).
+# Notes:
+# - All facets can also be specified in the recipes. The values given here are
+#   only defaults.
+# - The facet ``var_type`` has to be specified in the recipe if it is not given
+#   here and default DRS is used.
+
+# A complete list of supported keys is given in the documentation (see
+# ESMValCore/doc/quickstart/find_data.rst).
 ---
 
 ICON:
+
   '*':
     # Cell measures
     areacella:

diff --git a/esmvalcore/_config/extra_facets/ipslcm-mappings.yml b/esmvalcore/_config/extra_facets/ipslcm-mappings.yml
@@ -6,12 +6,12 @@
 # ESMValTool to use key 'ipsl_varname' for building the filename,
 # while for format 'Output' it specifies to use key 'group'
 #
-# Specifying 'igcm_dir' here allows to avoid having to specifiy it in
+# Specifying 'igcm_dir' here allows to avoid having to specify it in
 # datasets definitions
 #
 # Key 'use_cdo' allows to choose whether CDO will be invoked for
 # selecting a variable in a multi-variable file. This generally allows
-# for smaller overal load time. But because CDO has a licence which is
+# for smaller overall load time. But because CDO has a licence which is
 # not compliant with ESMValtool licence policy, the default
 # configuration is to avoid using it. You may use customized settings
 # by installing a modified version of this file as

diff --git a/esmvalcore/cmor/_fixes/cesm/__init__.py b/esmvalcore/cmor/_fixes/cesm/__init__.py
diff --git a/esmvalcore/cmor/_fixes/cesm/cesm2.py b/esmvalcore/cmor/_fixes/cesm/cesm2.py
@@ -0,0 +1,51 @@
+"""On-the-fly CMORizer for CESM2.
+
+Warning
+-------
+The support for native CESM output is still experimental. Currently, only one
+variable (`tas`) is fully supported. Other 2D variables might be supported by
+specifying appropriate facets in the recipe or extra facets files (see
+doc/quickstart/find_data.rst for details). 3D variables are currently not
+supported.
+
+To add support for more variables, expand the extra facets file
+(esmvalcore/_config/extra_facets/cesm-mappings.yml) and/or add classes to this
+file for variables that need more complex fixes (see
+esmvalcore/cmor/_fixes/emac/emac.py for examples).
+
+"""
+
+import logging
+
+from iris.cube import CubeList
+
+from ..native_datasets import NativeDatasetFix
+
+logger = logging.getLogger(__name__)
+
+
+class AllVars(NativeDatasetFix):
+    """Fixes for all variables."""
+
+    # Dictionary to map invalid units in the data to valid entries
+    INVALID_UNITS = {
+        'fraction': '1',
+    }
+
+    def fix_metadata(self, cubes):
+        """Fix metadata."""
+        cube = self.get_cube(cubes)
+
+        # Fix time, latitude, and longitude coordinates
+        # Note: 3D variables are currently not supported
+        self.fix_regular_time(cube)
+        self.fix_regular_lat(cube)
+        self.fix_regular_lon(cube)
+
+        # Fix scalar coordinates
+        self.fix_scalar_coords(cube)
+
+        # Fix metadata of variable
+        self.fix_var_metadata(cube)
+
+        return CubeList([cube])
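The diff does not show where `INVALID_UNITS` is consumed; presumably the base class consults it while fixing variable metadata. As a rough sketch of the intent, assuming a plain dictionary lookup on the raw units string, the mapping could be applied like this. `normalize_units` is a hypothetical helper and not part of ESMValCore.

```python
# Copied from the AllVars class above.
INVALID_UNITS = {"fraction": "1"}


def normalize_units(raw_units, invalid_units=INVALID_UNITS):
    """Return a valid units string, leaving unknown strings untouched.

    Hypothetical helper for illustration; the real fix may work differently.
    """
    return invalid_units.get(raw_units, raw_units)
```

Units strings that are not listed in the dictionary pass through unchanged, so only known-bad values such as `fraction` are rewritten.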
diff --git a/esmvalcore/config-developer.yml b/esmvalcore/config-developer.yml
@@ -176,3 +176,16 @@ ICON:
   output_file: '{project}_{dataset}_{exp}_{var_type}_{mip}_{short_name}'
   cmor_type: 'CMIP6'
   cmor_default_table_prefix: 'CMIP6_'
+
+CESM:
+  cmor_strict: false
+  input_dir:
+    default:
+      - '/'  # run directory
+      - '{case}/{gcomp}/hist'  # short-term archiving
+      - '{case}/{gcomp}/proc/{tdir}/{tperiod}'  # postprocessed data
+  input_file:
+    default: '{case}.{scomp}.{type}.{string}*nc'
+  output_file: '{project}_{dataset}_{case}_{gcomp}_{scomp}_{type}_{mip}_{short_name}'
+  cmor_type: 'CMIP6'
+  cmor_default_table_prefix: 'CMIP6_'
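To make the three CESM directory templates concrete, the following sketch expands them together with the file template into glob patterns using `str.format`. It also suggests why `string`, `tdir`, and `tperiod` need empty-string defaults: every placeholder in a template must have a value. The helper `candidate_patterns`, the root path `/data/cesm`, and the case name `demo_case` are made up for this example, and this is not ESMValCore's actual data finder.

```python
from pathlib import PurePosixPath

# Templates copied from the CESM entry in config-developer.yml above.
INPUT_DIRS = ["/", "{case}/{gcomp}/hist", "{case}/{gcomp}/proc/{tdir}/{tperiod}"]
INPUT_FILE = "{case}.{scomp}.{type}.{string}*nc"


def candidate_patterns(facets, rootpath="/data/cesm"):
    """Expand the CESM templates into glob patterns (hypothetical helper)."""
    root = PurePosixPath(rootpath)
    file_pattern = INPUT_FILE.format(**facets)
    return [
        # Stripping slashes keeps the result relative to ``root``; empty
        # ``tdir``/``tperiod`` values simply drop out of the path.
        str(root / dir_template.format(**facets).strip("/") / file_pattern)
        for dir_template in INPUT_DIRS
    ]


facets = {
    "case": "demo_case",  # hypothetical case name
    "gcomp": "atm",
    "scomp": "cam",
    "type": "h0",
    "string": "",
    "tdir": "",
    "tperiod": "",
}
patterns = candidate_patterns(facets)
```

In this sketch, leaving out any facet that appears in a template (for example `string`) raises a `KeyError`, which is one reason to define such facets once in the extra facets file rather than in every recipe.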
diff --git a/tests/integration/cmor/_fixes/cesm/__init__.py b/tests/integration/cmor/_fixes/cesm/__init__.py