From 512e337a317a516b791745b4ee425b17d23f6531 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Thu, 5 Mar 2026 19:15:35 +0100
Subject: [PATCH 01/28] ci: restore default ci (#392)

This PR restores the default ci hooks for pyFV3, pySHiELD, and pace. It
can be merged again once

1. NDSL `2026.02.00` is released
2. and the repos have been updated accordingly.

The reason for unhooking the CI in the first place was to allow NDSL
`2026.02.00` to make a hard breaking change with the Backend PR (Backend
object with properties instead of a string with the GT4Py backend name).
---
 .github/workflows/fv3_translate_tests.yaml | 5 +----
 .github/workflows/pace_tests.yaml          | 5 +----
 .github/workflows/shield_tests.yaml        | 5 +----
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/.github/workflows/fv3_translate_tests.yaml b/.github/workflows/fv3_translate_tests.yaml
index 7d863e59..f28e5ad1 100644
--- a/.github/workflows/fv3_translate_tests.yaml
+++ b/.github/workflows/fv3_translate_tests.yaml
@@ -10,10 +10,7 @@ on:
 
 jobs:
   fv3_translate_tests:
-    # TODO
-    # restore once NDSL 2026.02.00 is released and pyFV3 is updated.
-    # uses: NOAA-GFDL/pyFV3/.github/workflows/translate.yaml@develop
-    uses: romanc/pyFV3/.github/workflows/translate.yaml@noop
+    uses: NOAA-GFDL/pyFV3/.github/workflows/translate.yaml@develop
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/.github/workflows/pace_tests.yaml b/.github/workflows/pace_tests.yaml
index 2e9ae1f3..ea3d40b3 100644
--- a/.github/workflows/pace_tests.yaml
+++ b/.github/workflows/pace_tests.yaml
@@ -10,10 +10,7 @@ on:
 
 jobs:
   pace_main_tests:
-    # TODO
-    # restore once NDSL 2026.02.00 is released and pace is updated.
-    # uses: NOAA-GFDL/pace/.github/workflows/main_unit_tests.yaml@develop
-    uses: romanc/pace/.github/workflows/main_unit_tests.yaml@noop
+    uses: NOAA-GFDL/pace/.github/workflows/main_unit_tests.yaml@develop
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/.github/workflows/shield_tests.yaml b/.github/workflows/shield_tests.yaml
index c5fa7c13..53ba510b 100644
--- a/.github/workflows/shield_tests.yaml
+++ b/.github/workflows/shield_tests.yaml
@@ -10,10 +10,7 @@ on:
 
 jobs:
   shield_translate_tests:
-    # TODO
-    # restore once NDSL 2026.02.00 is released and pySHiELD is updated.
-    # uses: NOAA-GFDL/pySHiELD/.github/workflows/translate.yaml@develop
-    uses: romanc/pySHiELD/.github/workflows/translate.yaml@noop
+    uses: NOAA-GFDL/pySHiELD/.github/workflows/translate.yaml@develop
     with:
       component_trigger: true
       component_name: NDSL

From 60f2c434d96b0ae211dbc6bfd995c0c9ae21c5b8 Mon Sep 17 00:00:00 2001
From: Florian Deconinck <deconinck.florian@gmail.com>
Date: Fri, 6 Mar 2026 10:20:55 -0500
Subject: [PATCH 02/28] Add `to_xarray` API to State (#395)

---
 ndsl/quantity/state.py | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/ndsl/quantity/state.py b/ndsl/quantity/state.py
index 4e1a7dd0..9447cccd 100644
--- a/ndsl/quantity/state.py
+++ b/ndsl/quantity/state.py
@@ -500,20 +500,10 @@ def _netcdf_name(self, directory_path: Path, postfix: str = "") -> Path:
             rank_postfix = f"_rank{MPI.COMM_WORLD.Get_rank()}"
         return directory_path / f"{type(self).__name__}{rank_postfix}{postfix}.nc4"
 
-    def to_netcdf(self, directory_path: Path | None = None, postfix: str = "") -> None:
+    def to_xarray(self) -> xr.DataTree:
         """
-        Save state to NetCDF. Can be reloaded with `update_from_netcdf`.
-
-        If applicable, will save separate NetCDF files for each running rank.
-
-        The file names are deduced from the class name, and post fix with rank number
-        in the case of a multi-process use.
-
-        Args:
-            directory_path: directory to save the netcdf in
+        Format the State into a xr.DataTree.
         """
-        if directory_path is None:
-            directory_path = Path("./")
 
         def _save_recursive(state: State) -> dict:
             local_data = {}
@@ -547,9 +537,24 @@ def _save_recursive(state: State) -> dict:
             datatree.pop(key)
         datatree["/"] = xr.Dataset(data_vars=top_level)
 
-        xr.DataTree.from_dict(datatree).to_netcdf(
-            self._netcdf_name(directory_path, postfix)
-        )
+        return xr.DataTree.from_dict(datatree)
+
+    def to_netcdf(self, directory_path: Path | None = None, postfix: str = "") -> None:
+        """
+        Save state to NetCDF. Can be reloaded with `update_from_netcdf`.
+
+        If applicable, will save separate NetCDF files for each running rank.
+
+        The file names are deduced from the class name, and post fix with rank number
+        in the case of a multi-process use.
+
+        Args:
+            directory_path: directory to save the netcdf in
+        """
+        if directory_path is None:
+            directory_path = Path("./")
+
+        self.to_xarray().to_netcdf(self._netcdf_name(directory_path, postfix))
 
     def update_from_netcdf(self, directory_path: Path, postfix: str = "") -> None:
         """This is a mirror of the `to_netcdf` method NOT a generic

From 7eb52db6291c4ed861ea90ac9b6619479037d1f0 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Fri, 6 Mar 2026 18:51:56 +0100
Subject: [PATCH 03/28] BREAKING CHANGE: drop support for `X_DIM` and friends,
 remove deprecated backend functions (#393)

* refactor: remove deprecated functions

* refactor: remove support for `X_DIM` and friends
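
For downstream code that still imports the removed aliases, the rename can
be sketched as a plain text substitution. The mapping below mirrors the
aliases dropped from `ndsl/constants.py` in this change; the helper
`migrate_dim_names` is hypothetical and not part of NDSL.

```python
import re

# Old alias -> current name, as removed from ndsl/constants.py in this change.
LEGACY_DIM_NAMES = {
    "X_DIM": "I_DIM",
    "X_INTERFACE_DIM": "I_INTERFACE_DIM",
    "Y_DIM": "J_DIM",
    "Y_INTERFACE_DIM": "J_INTERFACE_DIM",
    "Z_DIM": "K_DIM",
    "Z_INTERFACE_DIM": "K_INTERFACE_DIM",
    "Z_SOIL_DIM": "K_SOIL_DIM",
}

# Word boundaries keep unrelated identifiers such as MAX_DIM untouched.
_LEGACY_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, LEGACY_DIM_NAMES)) + r")\b"
)


def migrate_dim_names(source: str) -> str:
    """Hypothetical helper: replace removed dimension aliases in source text."""
    return _LEGACY_PATTERN.sub(lambda m: LEGACY_DIM_NAMES[m.group(1)], source)
```

String arguments such as the corner direction (`"x"` / `"y"`) are not covered
by this sketch and need to be changed to `"i"` / `"j"` by hand.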
---
 examples/NDSL/03_orchestration_basics.ipynb |  8 +++---
 ndsl/constants.py                           | 14 +++++------
 ndsl/dsl/gt4py_utils.py                     | 20 ---------------
 ndsl/stencils/corners.py                    | 27 +++------------------
 tests/test_zarr_monitor.py                  |  3 +--
 5 files changed, 16 insertions(+), 56 deletions(-)

diff --git a/examples/NDSL/03_orchestration_basics.ipynb b/examples/NDSL/03_orchestration_basics.ipynb
index 93f818d5..39320f56 100644
--- a/examples/NDSL/03_orchestration_basics.ipynb
+++ b/examples/NDSL/03_orchestration_basics.ipynb
@@ -35,7 +35,7 @@
     "    orchestrate,\n",
     "    QuantityFactory,\n",
     ")\n",
-    "from ndsl.constants import X_DIM, Y_DIM, Z_DIM\n",
+    "from ndsl.constants import I_DIM, J_DIM, K_DIM\n",
     "from ndsl.dsl.typing import FloatField, Float\n",
     "from ndsl.boilerplate import get_factories_single_tile_orchestrated"
    ]
@@ -93,7 +93,7 @@
     "            domain=grid_indexing.domain_compute(),\n",
     "        )\n",
     "        self._tmp_field = quantity_factory.zeros(\n",
-    "            [X_DIM, Y_DIM, Z_DIM], \"n/a\", dtype=dtype\n",
+    "            [I_DIM, J_DIM, K_DIM], \"n/a\", dtype=dtype\n",
     "        )\n",
     "        self._n_halo = quantity_factory.sizer.n_halo\n",
     "\n",
@@ -134,9 +134,9 @@
     "    )\n",
     "    local_sum = LocalSum(stencil_factory, qty_factory)\n",
     "\n",
-    "    in_field = qty_factory.zeros([X_DIM, Y_DIM, Z_DIM], \"n/a\", dtype=dtype)\n",
+    "    in_field = qty_factory.zeros([I_DIM, J_DIM, K_DIM], \"n/a\", dtype=dtype)\n",
     "    in_field.view[:] = 2.0\n",
-    "    out_field = qty_factory.zeros([X_DIM, Y_DIM, Z_DIM], \"n/a\", dtype=dtype)\n",
+    "    out_field = qty_factory.zeros([I_DIM, J_DIM, K_DIM], \"n/a\", dtype=dtype)\n",
     "\n",
     "    # Run\n",
     "    local_sum(in_field, out_field)"
diff --git a/ndsl/constants.py b/ndsl/constants.py
index d1685f1e..840df528 100644
--- a/ndsl/constants.py
+++ b/ndsl/constants.py
@@ -38,13 +38,13 @@ def _get_constant_version(
 # Common constants
 #####################
 
-I_DIM = X_DIM = "i"
-I_INTERFACE_DIM = X_INTERFACE_DIM = "i_interface"
-J_DIM = Y_DIM = "j"
-J_INTERFACE_DIM = Y_INTERFACE_DIM = "j_interface"
-K_DIM = Z_DIM = "k"
-K_INTERFACE_DIM = Z_INTERFACE_DIM = "k_interface"
-K_SOIL_DIM = Z_SOIL_DIM = "k_soil"
+I_DIM = "i"
+I_INTERFACE_DIM = "i_interface"
+J_DIM = "j"
+J_INTERFACE_DIM = "j_interface"
+K_DIM = "k"
+K_INTERFACE_DIM = "k_interface"
+K_SOIL_DIM = "k_soil"
 
 I_DIMS = (I_DIM, I_INTERFACE_DIM)
 J_DIMS = (J_DIM, J_INTERFACE_DIM)
diff --git a/ndsl/dsl/gt4py_utils.py b/ndsl/dsl/gt4py_utils.py
index 39afa30d..e4dcb532 100644
--- a/ndsl/dsl/gt4py_utils.py
+++ b/ndsl/dsl/gt4py_utils.py
@@ -1,4 +1,3 @@
-import warnings
 from collections.abc import Callable, Sequence
 from functools import wraps
 from typing import Any
@@ -447,25 +446,6 @@ def asarray(array, to_type=np.ndarray, dtype=None, order=None):
             return cp.asarray(array, dtype, order)
 
 
-def is_gpu_backend(backend: Backend) -> bool:
-    warnings.warn(
-        "Function `gt4py_utils.is_gpu_backend` is deprecated, please use `Backend.is_gpu_backend()`",
-        category=DeprecationWarning,
-        stacklevel=2,
-    )
-    return backend.is_gpu_backend()
-
-
-def backend_is_fortran_aligned(backend: Backend) -> bool:
-    warnings.warn(
-        "Function `gt4py_utils.backend_is_fortran_aligned` is deprecated "
-        "please use `Backend.backend_is_fortran_aligned()`",
-        category=DeprecationWarning,
-        stacklevel=2,
-    )
-    return backend.is_fortran_aligned()
-
-
 def zeros(shape, dtype=Float, *, backend: Backend):
     storage_type = cp.ndarray if backend.is_gpu_backend() else np.ndarray
     xp = cp if cp and storage_type is cp.ndarray else np
diff --git a/ndsl/stencils/corners.py b/ndsl/stencils/corners.py
index cdce6ae0..23873f5c 100644
--- a/ndsl/stencils/corners.py
+++ b/ndsl/stencils/corners.py
@@ -1,4 +1,3 @@
-import warnings
 from typing import Literal, TypeAlias, no_type_check
 
 from gt4py.cartesian import gtscript
@@ -10,19 +9,10 @@
 from ndsl.dsl.typing import FloatField
 
 
-FillCornersDirection: TypeAlias = Literal["i", "x", "j", "y"]
+FillCornersDirection: TypeAlias = Literal["i", "j"]
 GridType: TypeAlias = Literal["A", "B"]  # Arakawa grid type
 
 
-def _check_for_deprecation(axis: str) -> None:
-    if axis in ["x", "y"]:
-        warnings.warn(
-            f"Corners direction {axis} is deprecated use 'i' or 'j'",
-            category=DeprecationWarning,
-            stacklevel=2,
-        )
-
-
 def kslice_from_inputs(
     kstart: int, nk: int | None, grid_indexer: GridIndexing
 ) -> tuple[slice, int]:
@@ -365,13 +355,12 @@ def __init__(
             domain = default_domain
         """The full domain required to do corner computation everywhere"""
 
-        _check_for_deprecation(direction)
-        if direction in ["x", "i"]:
+        if direction in ["i"]:
             defn = fill_corners_bgrid_x_defn
-        elif direction in ["y", "j"]:
+        elif direction in ["j"]:
             defn = fill_corners_bgrid_y_defn
         else:
-            raise ValueError("Direction must be either 'x' or 'y'")
+            raise ValueError("Direction must be either 'i' or 'j'")
         externals = stencil_factory.grid_indexing.axis_offsets(
             origin=origin, domain=domain
         )
@@ -519,7 +508,6 @@ def fill_sw_corner_2d_bgrid(
     direction: FillCornersDirection,
     grid_indexer: GridIndexing,
 ) -> None:
-    _check_for_deprecation(direction)
     if direction in ["x", "i"]:
         q[grid_indexer.isc - i, grid_indexer.jsc - j, :] = q[
             grid_indexer.isc - j, grid_indexer.jsc + i, :
@@ -537,7 +525,6 @@ def fill_nw_corner_2d_bgrid(
     direction: FillCornersDirection,
     grid_indexer: GridIndexing,
 ) -> None:
-    _check_for_deprecation(direction)
     if direction in ["x", "i"]:
         q[grid_indexer.isc - i, grid_indexer.jec + 1 + j, :] = q[
             grid_indexer.isc - j, grid_indexer.jec + 1 - i, :
@@ -555,7 +542,6 @@ def fill_se_corner_2d_bgrid(
     direction: FillCornersDirection,
     grid_indexer: GridIndexing,
 ) -> None:
-    _check_for_deprecation(direction)
     if direction in ["x", "i"]:
         q[grid_indexer.iec + 1 + i, grid_indexer.jsc - j, :] = q[
             grid_indexer.iec + 1 + j, grid_indexer.jsc + i, :
@@ -573,7 +559,6 @@ def fill_ne_corner_2d_bgrid(
     direction: FillCornersDirection,
     grid_indexer: GridIndexing,
 ) -> None:
-    _check_for_deprecation(direction)
     if direction in ["x", "i"]:
         q[grid_indexer.iec + 1 + i, grid_indexer.jec + 1 + j :] = q[
             grid_indexer.iec + 1 + j, grid_indexer.jec + 1 - i, :
@@ -593,7 +578,6 @@ def fill_sw_corner_2d_agrid(
     kstart: int = 0,
     nk: int | None = None,
 ) -> None:
-    _check_for_deprecation(direction)
     kslice, nk = kslice_from_inputs(kstart, nk, grid_indexer)
     if direction in ["x", "i"]:
         q[grid_indexer.isc - i, grid_indexer.jsc - j, kslice] = q[
@@ -614,7 +598,6 @@ def fill_nw_corner_2d_agrid(
     kstart: int = 0,
     nk: int | None = None,
 ) -> None:
-    _check_for_deprecation(direction)
     kslice, nk = kslice_from_inputs(kstart, nk, grid_indexer)
     if direction in ["x", "i"]:
         q[grid_indexer.isc - i, grid_indexer.jec + j, kslice] = q[
@@ -635,7 +618,6 @@ def fill_se_corner_2d_agrid(
     kstart: int = 0,
     nk: int | None = None,
 ) -> None:
-    _check_for_deprecation(direction)
     kslice, nk = kslice_from_inputs(kstart, nk, grid_indexer)
     if direction in ["x", "i"]:
         q[grid_indexer.iec + i, grid_indexer.jsc - j, kslice] = q[
@@ -656,7 +638,6 @@ def fill_ne_corner_2d_agrid(
     kstart: int = 0,
     nk: int | None = None,
 ) -> None:
-    _check_for_deprecation(direction)
     kslice, nk = kslice_from_inputs(kstart, nk, grid_indexer)
     if direction in ["x", "i"]:
         q[grid_indexer.iec + i, grid_indexer.jec + j, kslice] = q[
diff --git a/tests/test_zarr_monitor.py b/tests/test_zarr_monitor.py
index 68b5f07e..7c0a6757 100644
--- a/tests/test_zarr_monitor.py
+++ b/tests/test_zarr_monitor.py
@@ -18,7 +18,6 @@
     J_INTERFACE_DIM,
     K_DIM,
     K_SOIL_DIM,
-    X_DIM,
 )
 from ndsl.monitor.zarr_monitor import ZarrMonitor, array_chunks, get_calendar
 from ndsl.optional_imports import zarr
@@ -95,7 +94,7 @@ def base_state(request, nz, ny, nx, numpy) -> dict:
         return {
             "var1": Quantity(
                 numpy.ones([ny, nx]),
-                dims=(J_DIM, X_DIM),
+                dims=(J_DIM, I_DIM),
                 units="m",
                 backend=Backend.python(),
             )

From ada2d5379540aba26681b2a1ea2588dd6895fc48 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Wed, 11 Mar 2026 14:39:46 +0100
Subject: [PATCH 04/28] perf: improve orchestration transpile/compile times
 (#396)

* perf: improve compile/transpile times in orchestration workflow

* keep validation right before compile
---
 external/dace                  |  2 +-
 external/gt4py                 |  2 +-
 ndsl/dsl/dace/orchestration.py | 12 ++----------
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/external/dace b/external/dace
index 8f7601b4..1fb39786 160000
--- a/external/dace
+++ b/external/dace
@@ -1 +1 @@
-Subproject commit 8f7601b4e4953a1b785ac676701fc1d4e6540b37
+Subproject commit 1fb397865e89c6b8907c4de0cded046e153b48ac
diff --git a/external/gt4py b/external/gt4py
index ba53a691..24b2dab3 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit ba53a691b6e5cf893cbba07cdbd0e88f2444cf9a
+Subproject commit 24b2dab321a25a87d3d4c36ed20e0c3fc6c525d5
diff --git a/ndsl/dsl/dace/orchestration.py b/ndsl/dsl/dace/orchestration.py
index 4037ca7f..38426b8d 100644
--- a/ndsl/dsl/dace/orchestration.py
+++ b/ndsl/dsl/dace/orchestration.py
@@ -122,7 +122,7 @@ def _to_gpu(sdfg: SDFG) -> None:
 def _simplify(
     sdfg: SDFG,
     *,
-    validate: bool = True,
+    validate: bool = False,
     validate_all: bool = False,
     verbose: bool = False,
 ) -> dict | None:
@@ -146,9 +146,6 @@ def _build_sdfg(
     backend_name = config.get_backend()
 
     if is_compiling:
-        with DaCeProgress(config, "Validate original SDFG"):
-            sdfg.validate()
-
         # Fully specialize all known symbols and then propagate these changes in the simplify
         # pass that follows. This is not only a smart idea in general, but also simplifies (haha)
         # the schedule tree (optimization) roundtrip.
@@ -271,9 +268,6 @@ def _build_sdfg(
                 negative_delp_checker(sdfg)
                 negative_qtracers_checker(sdfg)
 
-        with DaCeProgress(config, "Validate before compile"):
-            sdfg.validate()
-
         # Compile
         with DaCeProgress(config, "Codegen & compile"):
             sdfg.compile()
@@ -646,9 +640,7 @@ def __call__(self, *arg, **kwarg):  # type: ignore[no-untyped-def]
                 return wrapped(*arg, **kwarg)
 
             def __sdfg__(self, *args, **kwargs):  # type: ignore[no-untyped-def]
-                sdfg = wrapped.__sdfg__(*args, **kwargs)
-                sdfg.validate()
-                return sdfg
+                return wrapped.__sdfg__(*args, **kwargs)
 
             def __sdfg_closure__(self, reevaluate=None):  # type: ignore[no-untyped-def]
                 return wrapped.__sdfg_closure__(reevaluate)

From b6a25a386c8461309a63d8d88391c33a07bf7bcf Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Thu, 12 Mar 2026 15:09:30 +0100
Subject: [PATCH 05/28] build: update gt4py to get compiler support (#399)

* build: update gt4py to get compiler support

The PR adds extended compiler support by updating GT4Py, which now
auto-detects compilers (gnu, intel, clang, and apple-clang) and sets
defaults for the compiler flags accordingly. For example, not all
compilers have the same OpenMP flags and `apple-clang` doesn't support
it out of the box anyway. All of this is now captured at the GT4Py
level.

In addition, GT4Py now automatically disables FMA operations in case
of `-O0` (optimization level 0) to help with stability in porting when
comparing to Fortran generated reference data.

* Apply gt4py.cartesian compiler default to orchestration pipeline via DaceConfig
Fix optimization level (potentially) ignored on GPU
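
The way the argument strings are assembled in `DaceConfig` can be sketched
in isolation as follows. The function names and the `extra_flags` parameter
are hypothetical; `extra_flags` stands in for the compiler defaults that
GT4Py provides, and the base flags are copied from the diff. Unlike the real
code, the sketch strips trailing whitespace.

```python
def cpu_compiler_args(
    march_cpu: str, optimization_level: int, extra_flags: str = ""
) -> str:
    """Hypothetical sketch: base CPU flags followed by compiler defaults."""
    base = (
        f"-march={march_cpu} -std=c++17 -fPIC -Wall -Wextra "
        f"-O{optimization_level}"
    )
    return f"{base} {extra_flags}".strip()


def cuda_compiler_args(
    march_option: str, optimization_level: int, extra_flags: str = ""
) -> str:
    """Hypothetical sketch: the GPU path now follows the requested
    optimization level instead of a hard-coded -O3."""
    base = (
        f"-std=c++14 -Xcompiler -fPIC -O{optimization_level} "
        f"-Xcompiler {march_option}"
    )
    return f"{base} {extra_flags}".strip()
```

With `optimization_level=0`, both strings carry `-O0`, which is the case the
GPU fix addresses.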

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
---
 external/gt4py               |  2 +-
 ndsl/dsl/__init__.py         | 10 ----------
 ndsl/dsl/dace/dace_config.py | 21 +++++++++++++--------
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/external/gt4py b/external/gt4py
index 24b2dab3..2d2511ad 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 24b2dab321a25a87d3d4c36ed20e0c3fc6c525d5
+Subproject commit 2d2511ad0652ca92f6f36ce48f484d61d8939c50
diff --git a/ndsl/dsl/__init__.py b/ndsl/dsl/__init__.py
index 202b1569..f562b982 100644
--- a/ndsl/dsl/__init__.py
+++ b/ndsl/dsl/__init__.py
@@ -1,6 +1,5 @@
 # Literal precision for both GT4Py & NDSL
 import os
-import platform
 import sys
 from typing import Literal
 
@@ -36,15 +35,6 @@ def _get_literal_precision(default: Literal["32", "64"] = "64") -> Literal["32",
 os.environ["GT4PY_LITERAL_INT_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
 os.environ["GT4PY_LITERAL_FLOAT_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
 
-# OpenMP handling
-
-detected_macos = platform.system() == "Darwin"
-if detected_macos:
-    ndsl_log.warning(
-        "Multithreading is deactivated under MacOS due to apple-clang not handling OpenMP by default."
-    )
-os.environ["GT4PY_CARTESIAN_ENABLE_OPENMP"] = "False" if detected_macos else "True"
-
 
 # Set cache names for default gt backends workflow
 import gt4py.cartesian.config  # noqa: E402
diff --git a/ndsl/dsl/dace/dace_config.py b/ndsl/dsl/dace/dace_config.py
index 32dfdb9f..7f1a4004 100644
--- a/ndsl/dsl/dace/dace_config.py
+++ b/ndsl/dsl/dace/dace_config.py
@@ -7,6 +7,7 @@
 
 import dace.config
 from gt4py.cartesian.config import GT4PY_COMPILE_OPT_LEVEL
+from gt4py.cartesian.utils.compiler import cxx_compiler_defaults, gpu_configuration
 
 from ndsl import LocalComm
 from ndsl.comm.communicator import Communicator
@@ -226,23 +227,18 @@ def __init__(
             else:
                 dace.config.Config.set("compiler", "build_type", value="Release")
 
-            # Required to True for gt4py storage/memory
-            dace.config.Config.set(
-                "compiler",
-                "allow_view_arguments",
-                value=True,
-            )
             # Resolve "march/mtune" option for GPU
             # - turn on numeric-centric SSE by default
             # - Neoverse-V2 Grace CPU is too new for GCC 14 and -march=native will fail
             # - use alternative march=armv8-a instead
             march_cpu = "armv8-a" if is_arm_neoverse else "native"
             # Removed --fmath
+            cxx_defaults = cxx_compiler_defaults(GT4PY_COMPILE_OPT_LEVEL)
             dace.config.Config.set(
                 "compiler",
                 "cpu",
                 "args",
-                value=f"-march={march_cpu} -std=c++17 -fPIC -Wall -Wextra -O{optimization_level}",
+                value=f"-march={march_cpu} -std=c++17 -fPIC -Wall -Wextra -O{optimization_level} {cxx_defaults.cxx_compile_flags}",
             )
             # Potentially buggy - deactivate
             dace.config.Config.set(
@@ -257,11 +253,12 @@ def __init__(
             # - use alternative mcpu=native instead
             march_option = "-mcpu=native" if is_arm_neoverse else "-march=native"
             # Removed --fast-math
+            gpu_config = gpu_configuration(GT4PY_COMPILE_OPT_LEVEL)
             dace.config.Config.set(
                 "compiler",
                 "cuda",
                 "args",
-                value=f"-std=c++14 -Xcompiler -fPIC -O3 -Xcompiler {march_option}",
+                value=f"-std=c++14 -Xcompiler -fPIC -O{optimization_level} -Xcompiler {march_option} {gpu_config.gpu_compile_flags}",
             )
 
             cuda_sm = cp.cuda.Device(0).compute_capability if cp else 60
@@ -280,6 +277,14 @@ def __init__(
                 "max_concurrent_streams",
                 value=-1,  # no concurrent streams, every kernel on defaultStream
             )
+
+            # Required to True for gt4py storage/memory
+            dace.config.Config.set(
+                "compiler",
+                "allow_view_arguments",
+                value=True,
+            )
+
             # Speed up built time
             dace.config.Config.set(
                 "compiler",

From 38197e1c971433e8c4ef3ba67e74c759f788a944 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Mon, 16 Mar 2026 13:28:01 +0100
Subject: [PATCH 06/28] gt4py update: fix GCC 12/13 compiler flags (#400)

Propagate Cxx flags that work with both GCC 12 and 13.
---
 external/gt4py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/external/gt4py b/external/gt4py
index 2d2511ad..47f7a9a1 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 2d2511ad0652ca92f6f36ce48f484d61d8939c50
+Subproject commit 47f7a9a13d6ac9c8bfb5dab38f18456f51847cdf

From eb8c61819816a22c7b791533512d5ac89693d007 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Mon, 16 Mar 2026 18:26:32 +0100
Subject: [PATCH 07/28] docs|ci: versioning and release (#401)

* For release `2025.03.00` (#127)

* updating 4d handling

* debug 4d test data

* more iter

* moving ser_to_nc here

* updating datatype in translate test

* typing works

* fix dict, lint

* remove empty line

* change from 4d to Nd

* Expose `k_start` and `k_end` automatically for any FrozenStencil

* Fix k_start + utest

* lint

* Fix for 2d stencils

* Add threshold overrides to the multimodal metric

* Always report results, add summary with one liners

* Remove "mmr" from the keys

* README in testing

* Better Latex (?)

* Better Latex (?)

* fixing a typo that breaks bools in translate tests (#80)

* Fix summary filename

* Fix report, filename

* Fix choosing right absolute difference for F32

* Make robust for NaN value

* Detect when arrays have different dimensions, if only one dimension, collapse
Clean up type infer and log work

* Lint

* Add rank 0 to the data

* Check data exists for rank, skip & print if not

* Fix bad logic on skip test for parallel

* Verbose exported names

* Make boilerplate calls more nimble

* New option: `which_savepoint`
Better error on bad output data
Fix missing integer type check

* QOL for mypy/flake8 type hints

* Add SECONDS_PER_DAY as a constants following mixed precision standards

* Lint

* Cleanups in dace orchestration

Readability improvements in dace orchestration including

- early returns
- spelling out variable names
- fixing typos

* Rename program -> dace_program

* Make sure all constants adhere to the floating point precision set by the system

* Move `is_float` to `dsl.typing`

* Move Quantity to sub-directory + breakout the subcomponent

* Fix tests

* Lint

* Remove `cp.ndarray` since cupy is optional

* Restore workaround for optional cupy

* "GFS" -> "UFS"

* Cupy trick for metadata

* Add comments for constant explanation

* Describe 64/32-bit FloatFields

* Make sure the `make_storage_data` respects the array dtype.

* Fix logic for MultiModal metric and verbose it

* Added an MPI all_reduce for quantities based on SUM operation to communicator.py

* linted

* Add initial skeleton of pytest test for all reduce

* Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum

* Linted

* Added pytest.mark to skip test if mpi4py isn't available

* lint changes

* Addressed PR comments and added additional CPU backends to unit test

* Added setters for various Quantity properties to enable setting of Quantity metadata and data properties.

* Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another.  Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed

* Expose all SG metric terms in grid_data

* Add `Allreduce` and all MPI OP

* Update utest

* Fix `local_comm`

* Fix utest

* Enforce `comm_abc.Comm` into Communicator

* Fix `comm` object in serial utest

* Lint + `MPIComm` on testing architecture

* Make sure the correct allocator backend is used for Quantities

* Add in_place option for Allreduce

* Cleanup ndsl/dsl/dace/utils.py (#96)

* Fix typos
* DaCeProgress: avoid double assignment of prefix
* Add type hints/simplify kernel_theoretical_timing

Adding type hints made it possible to simplify `kernel_theoretical_timing`.

* Fix merge

* Hotfix for grid generation use of mpi operators

* Merge examples/mpi/.gitignore into top-level .gitignore

* Remove hard-coded __version__ numbers

Removes hard-coded version numbers from `__init__` files.

* Fixing a bunch of typos

* hotfix netcdf version for dockerfiles

* Updated version number in setup.py to reflect new release, 2025.01.00

* Adding in exception for compute domains with sizes less than or equal to halo size (#103)

* Adding in exception for compute domains with less than 4 points to vector_halo_update method

* Updated exception in communicator to compare halo size to compute domain size

* linting

* Moved domain size checker to SubtileGridSizer class method from_tile_params

* Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107)

* [QOL] Logging, Type Hints and Quantity helpers (#108)

* Log on rank 0
Docstrings & type hints on logger
Stencil Config has a `verbose` option
On verbose: FrozenStencil log when run (in GT backends)

* Update `config` in orchestrate call to solve type hint inconsistencies

* Quantity helper `to_netcdf` with multi rank support

* Automatic Int precision and stencil regeneration change (#104)

* Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type

* change default rebuild state to false for get_factories

* Merged Float and Int precision detection functions into one common path

* Re-added old function to fulfil a PACE dependency

* updated docstring

* Added ability to declare 32 or 64 bit IntFields, overriding the system precision

* Added one dimensional bool fields

* Fix error message in typing.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* output type for global_set_precision

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Bump DaCe to v1.0.1 (#109)

Our current DaCe version is some commit from September 2024. Meanwhile DaCe matured to v1 and recently released v1.0.1. This brings the DaCe submodule to the latest stable release version.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Streamline linting workflow (#110)

Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds.

Other changes

- update checkout action to v4
- update python setup action to v5
- change python version from 3.11.7 to 3.11 (any patch number will do)

This is a follow-up of PR https://github.com/NOAA-GFDL/PyFV3/pull/40 in PyFV3.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [FIX] Type hint for precision dependent Float, Int (#111)

* Fix the type hint of Float, Int

* Attempt using TypeAlias

* Feature: Adding documentation (#97)

* Added doc files

* Adding image files to docs

* Linting

* Updated docs to reflect changes requested in PR 97

* Linting

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Translate test] Save better reports & netCDF for multiple ranks on failure (#106)

* Save reports & netCDF for multiple ranks on failure
Fix multi modal threshold for parallel tests

* Order field by name in NetCDF

* Print all indices in logs. Sort by descending ULP

* Allow sorting by metrics and index with `--sort_report` option

* Remove the `rank` from SavepointCase. Access is done via `grid`

* Some docstrings

* Adds some quick capacities used in the post-radiation phase of the physics, including the  Stefan-Boltzmann constant (#116)

* add namelist option

* add Stefan-Boltzmann constant

* lint

* Apply suggestions from code review

Change comments to docstring style

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Adding temperature of h2o triple point (#115)

* add ttp

* Update ndsl/constants.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* switch comments to docstrings for autodocs

* lint

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Feature] Porting workflow: enhancing errors readability (#114)

* Save all fields (pass and fail) and organize them by field

* Option `--no_report` to bypass logging & netcdf save
Move logs per variable into a `details` subfolder

* Order variable name in serialbox-to-netcdf

* `extra_data_load` function to load savepoint data saved outside the canonical savepoint

* Docs / Type Hint

* Fixed typo in error statement

---------

Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>

* Feature: NetCDF output precision configurable (#117)

* Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type

* Added multiple options for NetCDF precision

* Added checking for use of 32 precision and float64 output

* Using NumPy type instead of string in NetCDFMonitor precision variable

* Added warning to netcdf_monitor.py for mismatch in precision settings

* Forgot f-string in warn message of netcdf_monitor

* Mixed Precision fixes and QOL (#118)

* Ignore `.next` caches

* CNST_OP20 is a true 64-bit

* Translate: Fix reading parameters with the right precision

* Multimodal metric: Skip reporting on expected values

* Bad commit

* Add license (Apache 2.0) (#105)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Change deprecated `np.product()` to `np.prod()` (#120)

Starting with numpy v1.25.0, `np.product()` is deprecated and
`np.prod()` should be used instead.
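
A minimal sketch of the replacement, assuming a plain tuple `shape` (a name chosen here for illustration, not taken from the NDSL sources):

```python
import numpy as np

# Illustrative shape; any sequence of integers works the same way.
shape = (3, 4, 5)

# np.prod() multiplies all entries and is the supported spelling;
# np.product() was only an alias for it and is deprecated since numpy v1.25.0.
n_points = int(np.prod(shape))
print(n_points)
```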

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119)

* Update DaCe to v1.0.2

DaCe v1.0.2 brings two fixes for DaCe transformations: one for
DeadDataflowElimination and one for StateFusion.

* Bump gt4py to include refactored gt4py/dace bridge

* Test with modified pace pipeline

- added this to re-trigger the new pace pipeline after limiting zarr to
  not install v3 (for now) because of breaking API changes.
- added this note to re-trigger after fixing the pace pipeline to not
  pull requirements from `develop`.
- added this note to re-trigger after fixing the repo name

* Revert "Test with modified pace pipeline"

This reverts commit cd6560ea6129663d3445fafb36d02f03cb661b4d.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Grid Mixed Precision and Coriolis force load (+ QOL) (#121)

* Pass `dtype` down in allocator utils (gt4py_utils)

* Allow coriolis forces to be read in

* Edge factors are always 64-bit

* Quantity QOL

* Make sure to pass `dtype` to load the grid cleanly

* Translate grid: load coriolis forces, area 64 is 64-bit

* Bad merge

* Typo

* GEOS version of dz_min (#122)

* Doc enhancement (#123)

**Description**
Port and adaptation of the initial commit of the documentation.

Fixes issue https://github.com/NOAA-GFDL/NDSL/issues/113


**Checklist:**
- [X] I have performed a self-review of my own code
- [X] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings

* Fix saving NetCDF for parallel translate test (#125)

* Release candidate 2025.03.00 (#124)

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Fix for bad merge of 7fdfa5 (#129)

---------

Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Roman Cattaneo <>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* NDSL 2025.11.00 (#333)

* Check gt4py-backend options in config (#291)

* For release `2025.03.00` (#127)

* updating 4d handling

* debug 4d test data

* more iter

* moving ser_to_nc here

* updating datatype in translate test

* typing works

* fix dict, lint

* remove empty line

* change from 4d to Nd

* Expose `k_start` and `k_end` automatically for any FrozenStencil

* Fix k_start + utest

* lint

* Fix for 2d stencils

* Add threshold overrides to the multimodal metric

* Always report results, add summary with one liners

* Remove "mmr" from the keys

* README in testing

* Better Latex (?)

* Better Latex (?)

* fixing a typo that breaks bools in translate tests (#80)

* Fix summary filename

* Fix report, filename

* Fix choosing right absolute difference for F32

* Make robust for NaN value

* Detect when array have different dimensions, if only one dimension, collapse
Clean up type infer and log work

* Lint

* Add rank 0 to the data

* Check data exists for rank, skip & print if not

* Fix bad logic on skip test for parallel

* Verbose exported names

* Make boilerplate calls more nimble

* New option: `which_savepoint`
Better error on bad output data
Fix missing integer type check

* QOL for mypy/flake8 type hints

* Add SECONDS_PER_DAY as a constants following mixed precision standards

* Lint

* Cleanups in dace orchestration

Readability improvements in dace orchestration including

- early returns
- spelling out variable names
- fixing typos

* Rename program -> dace_program

* Make sure all constants adhere to the floating point precision set by the system

* Move `is_float` to `dsl.typing`

* Move Quantity to sub-directory + breakout the subcomponent

* Fix tests

* Lint

* Remove `cp.ndarray` since cupy is optional

* Restore workaround for optional cupy

* "GFS" -> "UFS"

* Cupy trick for metadata

* Add comments for constant explanation

* Describe 64/32-bit FloatFields

* Make sure the `make_storage_data` respects the array dtype.

* Fix logic for MultiModal metric and verbose it

* Added an MPI all_reduce for quantities based on SUM operation to communicator.py

* linted

* Add initial skeleton of pytest test for all reduce

* Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum

* Linted

* Added pytest.mark to skip test if mpi4py isn't available

* lint changes

* Addressed PR comments and added additional CPU backends to unit test

* Added setters for various Quantity properties to enable setting of Quantity metadata and data properties.

* Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another.  Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed

* Expose all SG metric terms in grid_data

* Add `Allreduce` and all MPI OP

* Update utest

* Fix `local_comm`

* Fix utest

* Enforce `comm_abc.Comm` into Communicator

* Fix `comm` object in serial utest

* Lint + `MPIComm` on testing architecture

* Make sure the correct allocator backend is used for Quantities

* Add in_place option for Allreduce

* Cleanup ndsl/dsl/dace/utils.py (#96)

* Fix typos
* DaCeProgress: avoid double assignment of prefix
* Add type hints/simplify kernel_theoretical_timing

Adding type hints allowed to simplify `kernel_theoretical_timing`.

* Fix merge

* Hotfix for grid generation use of mpi operators

* Merge examples/mpi/.gitignore into top-level .gitignore

* Remove hard-coded __version__ numbers

Removes hard-coded version numbers from `__init__` files.

* Fixing a bunch of typos

* hotfix netcdf version for dockerfiles

* Updated version number in setup.py to reflect new release, 2025.01.00

* Adding in exception for compute domains with sizes less than or equal to halo size (#103)

* Adding in exception for compute domains with less than 4 points to vector_halo_update method

* Updated exception in communicator to compare halo size to compute domain size

* linting

* Moved domain size checker to SubtileGridSizer class method from_tile_params
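
The check described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the NDSL implementation: the function name `check_domain_size` and its parameters are hypothetical.

```python
def check_domain_size(nx: int, ny: int, n_halo: int) -> None:
    """Hypothetical helper: reject compute domains not larger than the halo.

    A halo update reads `n_halo` points from the neighbor's compute domain,
    so each horizontal extent is required to exceed the halo size.
    """
    if nx <= n_halo or ny <= n_halo:
        raise ValueError(
            f"Compute domain ({nx} x {ny}) must be larger than "
            f"the halo size ({n_halo}) in both directions."
        )


# A 12 x 12 compute domain with a halo of 3 passes silently.
check_domain_size(nx=12, ny=12, n_halo=3)
```

Running such a check once, where the grid sizes are first computed, avoids repeating it on every halo update.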

* Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107)

* [QOL] Logging, Type Hints and Quantity helpers (#108)

* Log on rank 0
Docstrings & type hints on logger
Stencil Config has a `verbose` option
On verbose: FrozenStencil log when run (in GT backends)

* Update `config` in orchestrate call to solve type hint inconsistencies

* Quantity helper `to_netcdf` with multi rank support

* Automatic Int precision and stencil regeneration change (#104)

* Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type

* change default rebuild state to false for get_factories

* Merged Float and Int precision detection functions into one common path

* Re-added old function to fulfil a PACE dependency

* updated docstring

* Added ability to declare 32 or 64 bit IntFields, overruling the system precision

* Added one dimensional bool fields

* Fix error message in typing.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* output type for global_set_precision

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Bump DaCe to v1.0.1 (#109)

Our current DaCe version is some commit from September 2024. Meanwhile, DaCe matured to v1 and recently released v1.0.1. This brings the DaCe submodule to the latest stable release version.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Streamline linting workflow (#110)

Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds.

Other changes

- update checkout action to v4
- update python setup action to v5
- change python version from 3.11.7 to 3.11 (any patch number will do)

This is a follow-up of PR https://github.com/NOAA-GFDL/PyFV3/pull/40 in PyFV3.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [FIX] Type hint for precision dependent Float, Int (#111)

* Fix the type hint of Float, Int

* Attempt using TypeAlias

* Feature: Adding documentation (#97)

* Added doc files

* Adding image files to docs

* Linting

* Updated docs to reflect changes requested in PR 97

* Linting

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Translate test] Save better reports & netCDF for multiple ranks on failure (#106)

* Save reports & netCDF for multiple ranks on failure
Fix multi modal threshold for parallel tests

* Order field by name in NetCDF

* Print all indices in logs. Sort by descending ULP

* Allow sorting by metrics and index with `--sort_report` option

* Remove the `rank` from SavepointCase. Access is done via `grid`

* Some docstrings

* Adds some quick capacities used in the post-radiation phase of the physics, including the Stefan-Boltzmann constant (#116)

* add namelist option

* add Stefan-Boltzmann constant

* lint

* Apply suggestions from code review

Change comments to docstring style

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Adding temperature of h2o triple point (#115)

* add ttp

* Update ndsl/constants.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* switch comments to docstrings for autodocs

* lint

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Feature] Porting workflow: enhancing errors readability (#114)

* Save all fields (pass and fail) and organize them by field

* Option `--no_report` to bypass logging & netcdf save
Move logs per variable into a `details` subfolder

* Order variable name in serialbox-to-netcdf

* `extra_data_load` function to load savepoint data saved outside the canonical savepoint

* Docs / Type Hint

* Fixed typo in error statement

---------

Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>

* Feature: NetCDF output precision configurable (#117)

* Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type

* Added multiple options for NetCDF precision

* Added checking for use of 32 precision and float64 output

* Using NumPy type instead of string in NetCDFMonitor precision variable

* Added warning to netcdf_monitor.py for mismatch in precision settings

* Forgot f-string in warn message of netcdf_monitor

* Mixed Precision fixes and QOL (#118)

* Ignore `.next` caches

* CNST_OP20 is a true 64-bit

* Translate: Fix reading parameters with the right precision

* Multimodal metric: Skip reporting on expected values

* Bad commit

* Add license (Apache 2.0) (#105)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Change deprecated `np.product()` to `np.prod()` (#120)

Starting with numpy v1.25.0, `np.product()` is deprecated and
`np.prod()` should be used instead.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119)

* Update DaCe to v1.0.2

DaCe v1.0.2 brings two fixes for DaCe transformations: one for
DeadDataflowElimination and one for StateFusion.

* Bump gt4py to include refactored gt4py/dace bridge

* Test with modified pace pipeline

- added this to re-trigger the new pace pipeline after limiting zarr to
  not install v3 (for now) because of breaking API changes.
- added this note to re-trigger after fixing the pace pipeline to not
  pull requirements from `develop`.
- added this note to re-trigger after fixing the repo name

* Revert "Test with modified pace pipeline"

This reverts commit cd6560ea6129663d3445fafb36d02f03cb661b4d.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Grid Mixed Precision and Coriolis force load (+ QOL) (#121)

* Pass `dtype` down in allocator utils (gt4py_utils)

* Allow coriolis forces to be read in

* Edge factors are always 64-bit

* Quantity QOL

* Make sure to pass `dtype` to load the grid cleanly

* Translate grid: load coriolis forces, area 64 is 64-bit

* Bad merge

* Typo

* GEOS version of dz_min (#122)

* Doc enhancement (#123)

**Description**
Port and adaptation of the initial commit of the documentation.

Fixes issue https://github.com/NOAA-GFDL/NDSL/issues/113


**Checklist:**
- [X] I have performed a self-review of my own code
- [X] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings

* Fix saving NetCDF for parallel translate test (#125)

* Release candidate 2025.03.00 (#124)

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Fix for bad merge of 7fdfa5 (#129)

---------

Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Roman Cattaneo <>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* check for backend existence in config

* pc

* update stale backend name

---------

Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>

* fix: allow any Comm object in ZarrMonitor (#292)

This PR is fallout from adding types in PR #257 and #258. The
`ZarrMonitor` provides a `DummyComm` which is instantiated in case no
`Comm` object is given. The type of the `Comm` object in `ZarrMonitor`
was wrongly limited to that `DummyComm`, which only broke when we
attempted to update the submodule in `pace`.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Patch domain checks to only happen once (#293)

* For release `2025.03.00` (#127)

* updating 4d handling

* debug 4d test data

* more iter

* moving ser_to_nc here

* updating datatype in translate test

* typing works

* fix dict, lint

* remove empty line

* change from 4d to Nd

* Expose `k_start` and `k_end` automatically for any FrozenStencil

* Fix k_start + utest

* lint

* Fix for 2d stencils

* Add threshold overrides to the multimodal metric

* Always report results, add summary with one liners

* Remove "mmr" from the keys

* README in testing

* Better Latex (?)

* Better Latex (?)

* fixing a typo that breaks bools in translate tests (#80)

* Fix summary filename

* Fix report, filename

* Fix choosing right absolute difference for F32

* Make robust for NaN value

* Detect when array have different dimensions, if only one dimension, collapse
Clean up type infer and log work

* Lint

* Add rank 0 to the data

* Check data exists for rank, skip & print if not

* Fix bad logic on skip test for parallel

* Verbose exported names

* Make boilerplate calls more nimble

* New option: `which_savepoint`
Better error on bad output data
Fix missing integer type check

* QOL for mypy/flake8 type hints

* Add SECONDS_PER_DAY as a constants following mixed precision standards

* Lint

* Cleanups in dace orchestration

Readability improvements in dace orchestration including

- early returns
- spelling out variable names
- fixing typos

* Rename program -> dace_program

* Make sure all constants adhere to the floating point precision set by the system

* Move `is_float` to `dsl.typing`

* Move Quantity to sub-directory + breakout the subcomponent

* Fix tests

* Lint

* Remove `cp.ndarray` since cupy is optional

* Restore workaround for optional cupy

* "GFS" -> "UFS"

* Cupy trick for metadata

* Add comments for constant explanation

* Describe 64/32-bit FloatFields

* Make sure the `make_storage_data` respects the array dtype.

* Fix logic for MultiModal metric and verbose it

* Added an MPI all_reduce for quantities based on SUM operation to communicator.py

* linted

* Add initial skeleton of pytest test for all reduce

* Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum

* Linted

* Added pytest.mark to skip test if mpi4py isn't available

* lint changes

* Addressed PR comments and added additional CPU backends to unit test

* Added setters for various Quantity properties to enable setting of Quantity metadata and data properties.

* Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another.  Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed

* Expose all SG metric terms in grid_data

* Add `Allreduce` and all MPI OP

* Update utest

* Fix `local_comm`

* Fix utest

* Enforce `comm_abc.Comm` into Communicator

* Fix `comm` object in serial utest

* Lint + `MPIComm` on testing architecture

* Make sure the correct allocator backend is used for Quantities

* Add in_place option for Allreduce

* Cleanup ndsl/dsl/dace/utils.py (#96)

* Fix typos
* DaCeProgress: avoid double assignment of prefix
* Add type hints/simplify kernel_theoretical_timing

Adding type hints allowed to simplify `kernel_theoretical_timing`.

* Fix merge

* Hotfix for grid generation use of mpi operators

* Merge examples/mpi/.gitignore into top-level .gitignore

* Remove hard-coded __version__ numbers

Removes hard-coded version numbers from `__init__` files.

* Fixing a bunch of typos

* hotfix netcdf version for dockerfiles

* Updated version number in setup.py to reflect new release, 2025.01.00

* Adding in exception for compute domains with sizes less than or equal to halo size (#103)

* Adding in exception for compute domains with less than 4 points to vector_halo_update method

* Updated exception in communicator to compare halo size to compute domain size

* linting

* Moved domain size checker to SubtileGridSizer class method from_tile_params

* Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107)

* [QOL] Logging, Type Hints and Quantity helpers (#108)

* Log on rank 0
Docstrings & type hints on logger
Stencil Config has a `verbose` option
On verbose: FrozenStencil log when run (in GT backends)

* Update `config` in orchestrate call to solve type hint inconsistencies

* Quantity helper `to_netcdf` with multi rank support

* Automatic Int precision and stencil regeneration change (#104)

* Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type

* change default rebuild state to false for get_factories

* Merged Float and Int precision detection functions into one common path

* Re-added old function to fulfil a PACE dependency

* updated docstring

* Added ability to declare 32 or 64 bit IntFields, overruling the system precision

* Added one dimensional bool fields

* Fix error message in typing.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* output type for global_set_precision

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Bump DaCe to v1.0.1 (#109)

Our current DaCe version is some commit from September 2024. Meanwhile, DaCe matured to v1 and recently released v1.0.1. This brings the DaCe submodule to the latest stable release version.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Streamline linting workflow (#110)

Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds.

Other changes

- update checkout action to v4
- update python setup action to v5
- change python version from 3.11.7 to 3.11 (any patch number will do)

This is a follow-up of PR https://github.com/NOAA-GFDL/PyFV3/pull/40 in PyFV3.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [FIX] Type hint for precision dependent Float, Int (#111)

* Fix the type hint of Float, Int

* Attempt using TypeAlias

* Feature: Adding documentation (#97)

* Added doc files

* Adding image files to docs

* Linting

* Updated docs to reflect changes requested in PR 97

* Linting

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Translate test] Save better reports & netCDF for multiple ranks on failure (#106)

* Save reports & netCDF for multiple ranks on failure
Fix multi modal threshold for parallel tests

* Order field by name in NetCDF

* Print all indices in logs. Sort by descending ULP

* Allow sorting by metrics and index with `--sort_report` option

* Remove the `rank` from SavepointCase. Access is done via `grid`

* Some docstrings

* Adds some quick capacities used in the post-radiation phase of the physics, including the Stefan-Boltzmann constant (#116)

* add namelist option

* add Stefan-Boltzmann constant

* lint

* Apply suggestions from code review

Change comments to docstring style

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Adding temperature of h2o triple point (#115)

* add ttp

* Update ndsl/constants.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* switch comments to docstrings for autodocs

* lint

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Feature] Porting workflow: enhancing errors readability (#114)

* Save all fields (pass and fail) and organize them by field

* Option `--no_report` to bypass logging & netcdf save
Move logs per variable into a `details` subfolder

* Order variable name in serialbox-to-netcdf

* `extra_data_load` function to load savepoint data saved outside the canonical savepoint

* Docs / Type Hint

* Fixed typo in error statement

---------

Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>

* Feature: NetCDF output precision configurable (#117)

* Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type

* Added multiple options for NetCDF precision

* Added checking for use of 32 precision and float64 output

* Using NumPy type instead of string in NetCDFMonitor precision variable

* Added warning to netcdf_monitor.py for mismatch in precision settings

* Forgot f-string in warn message of netcdf_monitor

* Mixed Precision fixes and QOL (#118)

* Ignore `.next` caches

* CNST_OP20 is a true 64-bit

* Translate: Fix reading parameters with the right precision

* Multimodal metric: Skip reporting on expected values

* Bad commit

* Add license (Apache 2.0) (#105)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Change deprecated `np.product()` to `np.prod()` (#120)

Starting with numpy v1.25.0, `np.product()` is deprecated and
`np.prod()` should be used instead.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119)

* Update DaCe to v1.0.2

DaCe v1.0.2 brings two fixes for DaCe transformations: one for
DeadDataflowElimination and one for StateFusion.

* Bump gt4py to include refactored gt4py/dace bridge

* Test with modified pace pipeline

- added this to re-trigger the new pace pipeline after limiting zarr to
  not install v3 (for now) because of breaking API changes.
- added this note to re-trigger after fixing the pace pipeline to not
  pull requirements from `develop`.
- added this note to re-trigger after fixing the repo name

* Revert "Test with modified pace pipeline"

This reverts commit cd6560ea6129663d3445fafb36d02f03cb661b4d.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Grid Mixed Precision and Coriolis force load (+ QOL) (#121)

* Pass `dtype` down in allocator utils (gt4py_utils)

* Allow coriolis forces to be read in

* Edge factors are always 64-bit

* Quantity QOL

* Make sure to pass `dtype` to load the grid cleanly

* Translate grid: load coriolis forces, area 64 is 64-bit

* Bad merge

* Typo

* GEOS version of dz_min (#122)

* Doc enhancement (#123)

**Description**
Port and adaptation of the initial commit of the documentation.

Fixes issue https://github.com/NOAA-GFDL/NDSL/issues/113


**Checklist:**
- [X] I have performed a self-review of my own code
- [X] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings

* Fix saving NetCDF for parallel translate test (#125)

* Release candidate 2025.03.00 (#124)

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Fix for bad merge of 7fdfa5 (#129)

---------

Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Roman Cattaneo <>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* check domain size args only once

* review & test

---------

Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>

* BREAKING CHANGE: change constructor of `QuantityFactory` (#228)

* Breaking change: QuantityFactory from GridSizer and backend name

Change `QuantityFactory` to initialize from a `GridSizer` (as
previously) and a backend name (new). This effectively hides the
previous `numpy` argument, which is effectively an internal allocator
that users shouldn't need to know about. It's basically what
`from_backend()` was doing before (which is now obsolete and was thus
removed).

This is a BREAKING CHANGE and users will need to update their code if
they instantiated QuantityFactories themselves. For users relying on the
`boilerplate` module, no changes need to happen.

* Keep QuantityFactory.from_backend() with a deprecation warning
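
The deprecation pattern can be illustrated with a small stand-in class. This is a sketch under stated assumptions: `ExampleFactory` and its argument names are hypothetical and do not reproduce the actual `QuantityFactory` signature.

```python
import warnings


class ExampleFactory:
    """Hypothetical stand-in for a factory built from a sizer and a backend name."""

    def __init__(self, sizer, *, backend: str) -> None:
        self.sizer = sizer
        self.backend = backend

    @classmethod
    def from_backend(cls, sizer, backend: str) -> "ExampleFactory":
        # Kept for backwards compatibility: warn, then forward to the constructor.
        warnings.warn(
            "from_backend() is deprecated; call the constructor directly.",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls(sizer, backend=backend)


# New style: construct directly from a sizer and a backend name.
factory = ExampleFactory(sizer=object(), backend="numpy")
```

Keeping the old entry point as a thin forwarding wrapper lets downstream code migrate at its own pace while both paths produce the same object.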

* Extended docstrings

This is mainly to force a new run of the pyshield workflow now that
pyshield tests are exclusively using `QuantityFactory.from_backend()`
which is compatible with changes proposed in this PR.

* More updates to docstrings

* fixup after rebase

* Unrelated: tests are supposed to return `None`

* fixup: move method back to current place

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: remove ndsl/exceptions (#281)

* BREAKING CHANGE: remove ndsl/exceptions

The module was deprecated in the last release and is removed with
this release.

* fixup: documentation

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: remove deprecated environment variables (#282)

Those environment variables were deprecated in the last release and are
removed with this release.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* ci: specialize concurrency group per repo (#287)

* ci: per repo concurrency group

Note: using `${{ github.repository }}` sounds like a good idea. In
practice, that doesn't play nice when the workflow is called from
another repository because in that case, `github.repository` resolves
to the calling repository.

* fix file ending of called workflows

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Remove ndsl.Namelist (#297)

* Removing ndsl.Namelist

* Removing use_legacy_namelist flag functionality
while keeping the flag itself.

* - Removing ndsl.Namelist
- Removing use_legacy_namelist flag functionality
(while keeping the flag itself for now)

* linting

* Removing namelist.md and test_namelist.py

* [feature] Common data types for orchestration via `compiletime` (#296)

* `Quantity`, `Local` & `State` default to `dace.compiletime` auto-magically in orchestration

* Fix type check, remove `Local`

* Unit tests

* Fix for type annotations that aren't type

* BREAKING CHANGE: remove deprecated ndsl/units.py (#283)

The module has been deprecated in the last release and is now removed
in this release cycle.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: removal of extra_dim_lengths (#295)

`extra_dim_lengths` on the `GridSizer` was replaced by `data_dimensions`
in the `2025.10.00` release. Now that the release is out, let's clean up
and remove the deprecated API. This also includes
`set_extra_dim_lengths()` in the `QuantityFactory`.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: remove deprecated ndsl/filesystem.py (#284)

The module was deprecated in the last release and is now removed in
this release cycle.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* docs: release checklist and documentation (#299)

* release checklist and documentation

* Add template for patch release

* review

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* gt4py update: fix absolute indexing in debug backend (#302)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* column min/max stencil - value and index (#301)

* column min/max and a unit test

* working unit test, pre-commit changes

* alternative type ignore method

* reverted previous change

* using boilerplate code

* reverting previous change

* build: gt4py update (fix upcasting, abs k test coverage) (#303)

This PR updates GT4Py to bring the following changes into NDSL:

- fix upcasting such that users can have variable k-offsets with
  expressions consisting of different types.
- increase test coverage for absolute k indexing

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* restore default PR template (#305)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: last 2025.10.00 deprecations (`CopyCorners`, `Quantity.values()`, `extra_dim_lengths` on `SubtileGridSizer`) (#300)

* Remove deprecated extra_dim_lengths of SubtileGridSizer

This is a follow-up from https://github.com/NOAA-GFDL/NDSL/pull/295.

* Remove deprecated CopyCorners

* Remove deprecated `Quantity.values()`

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: remove leftover debug print statements (#308)

This PR just removes a bunch of leftover debug print statements from
`ndsl/` and `tests/`.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: make GridSizer an abstract base class (#306)

`GridSizer` is de-facto already a base class with abstract methods
`get_origin()`, `get_extent()`, and `get_shape()`. This PR just
formalizes that intent.
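
A minimal sketch of what formalizing that intent can look like, using
Python's `abc` module. The method signatures (a `dims` sequence in, a
tuple of ints out) and the `FixedSizer` subclass are assumptions made
for illustration; they are not copied from NDSL.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class GridSizer(ABC):
    """Illustrative stand-in; signatures are assumed, not NDSL's."""

    @abstractmethod
    def get_origin(self, dims: Sequence[str]) -> tuple[int, ...]: ...

    @abstractmethod
    def get_extent(self, dims: Sequence[str]) -> tuple[int, ...]: ...

    @abstractmethod
    def get_shape(self, dims: Sequence[str]) -> tuple[int, ...]: ...


class FixedSizer(GridSizer):
    """Hypothetical subclass: fixed sizes per dimension plus a halo."""

    def __init__(self, sizes: dict[str, int], n_halo: int) -> None:
        self._sizes = sizes
        self._n_halo = n_halo

    def get_origin(self, dims: Sequence[str]) -> tuple[int, ...]:
        # The compute domain starts right after the halo.
        return tuple(self._n_halo for _ in dims)

    def get_extent(self, dims: Sequence[str]) -> tuple[int, ...]:
        # Extent of the compute domain, without halo points.
        return tuple(self._sizes[dim] for dim in dims)

    def get_shape(self, dims: Sequence[str]) -> tuple[int, ...]:
        # Full allocation: compute domain plus a halo on both sides.
        return tuple(self._sizes[dim] + 2 * self._n_halo for dim in dims)
```

With this in place, `GridSizer()` raises a `TypeError` at instantiation
instead of failing later on a missing method, while subclasses that
implement all three methods behave as before.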

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: directly use gt_storage in QuantityFactory (#307)

In the past, `QuantityFactory` would allow not only allocating with
gt4py storage objects, but also directly from `numpy` or `cupy`. This
ability was removed in PR https://github.com/NOAA-GFDL/NDSL/pull/228.
With that removal comes the opportunity to streamline allocation in
`QuantityFactory`, removing the need for a `Allocator` class in the
middle.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [Feature] Schedule Tree:  refine transient (#304)

* Fix axis merge

* Remove debug print

* Refine transients + utests

* Lint

* Revert to deactivating the experimental stree work

* Use context manager for  `_INTERNAL__SCHEDULE_TREE_OPTIMIZATION`

* Typo

* Clean refine transients code

* Derive common strides layout from backend
Refactor code to make re-sizing more compact in main algorithm
Fix bad recursion
Add todo list and verbose state of optimization

* Lint

* Remove `transient` to `State` lifetime - keep PR on target

* Lint

* build: gt4py update (upcasting in cast operations) (#310)

This PR updates GT4Py to bring the fix for upcasting inside cast
operations from GT4py to NDSL.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* build: gt4py update (precision of global constants) (#313)

This PR updates GT4Py in NDSL to bring up a PR that fixes the precision
of global constants. So far, we'd discard any type annotation on global
constants and just use the default literal precision instead. With this
change, we respect potential type annotations on global constants.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: Quantity constructor: `gt4py_backend`  -> `backend` (#312)

* refactor: force kwargs in ctor of  Quantity/Local

Force keyword arguments for optional arguments to those constructors.
This will facilitate the `gt4py_backend` -> `backend` transition.
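
As a rough illustration of why keyword-only arguments help with such a
rename, here is a sketch with a hypothetical `QuantitySketch` class.
The parameter list and the warning text are assumptions for this
example and do not mirror the actual `Quantity` constructor.

```python
import warnings


class QuantitySketch:
    """Hypothetical stand-in for a class with a renamed optional argument."""

    def __init__(self, data, dims, units, *, backend=None, gt4py_backend=None):
        # Everything after `*` must be passed by keyword, so call sites
        # name the argument explicitly and the rename can't silently
        # shift positional arguments.
        if gt4py_backend is not None:
            warnings.warn(
                "`gt4py_backend` is deprecated, use `backend` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            if backend is None:
                backend = gt4py_backend
        self.data = data
        self.dims = dims
        self.units = units
        self.backend = backend
```

A call like `QuantitySketch(data, dims, units, "numpy")` now fails with
a `TypeError`, which makes every call site that needs updating easy to
find.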

* refactor: prefer `backend` over `gt4py_backend` in Quantity

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: prepare `ZarrMonitor` for upcoming `Comm` changes (#315)

* refactor: ZarrMonitor: you'll have to bring your own comm objects

* ci: run unit tests with optional zarr dependency

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Introduce a `single_code_path` flag in the DaCeConfig that forces a single cache to be built. (#311)

* refactor: Deprecate optional backend argument to Quantity/Local (#314)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: remove DummyComm as alias to LocalComm (#319)

There's no need for this alias. We thus replace all occurrences of the
alias with the underlying `LocalComm` directly.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Deprecate `CopyCornersXY` (#317)

`CopyCornersXY` is replaced with `CopyCornersX` and `CopyCornersY` in
PyFV3. The class is currently unused and will be removed after the next
release of NDSL.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: Deprecate `NullComm` in favor of `MPIComm` and `LocalComm` (#318)

* unrelated: fix typo in warning message

* refactor: change NullComm -> MPIComm in boilerplate

This adds a test that the MPI communicator only has one rank if a
single-tile setup is requested.

* refactor: deprecate NullComm

`NullComm` can be replaced with either `LocalComm` or `MPIComm`.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* build: gt4py update (self-assignment in serial vertical loops) (#316)

This updates the gt4py dependency to bring up the fix that allows
self-assignment with offset reads in K for serial (e.g.
FORWARD/BACKWARD) vertical loops.

See https://github.com/GridTools/gt4py/pull/2388 (in particular the test
cases for details on what is allowed and what not).

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: specify backend when allocating a Quantity (#320)

This PR is a follow-up from https://github.com/NOAA-GFDL/NDSL/pull/314
and adds the soon to be required `backend` parameter to constructor
calls of `Quantity`. I missed a couple ones because PRs were merged in
parallel, e.g. re-enabling the `ZarrMonitor` tests.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [Translate test] Compute the percentage of changing grid points that error (#322)

* Add `inputs` to MultiModalFloat metric
Compute the percentage of changing grid points that errored

* Lint

* Removing --no_legacy_namelist flag (#323)

* Added new functions: column_min_ddim & column_max_ddim and corresponding test (#324)

Functionality is the same as column_min/max, but separate functions are needed to handle cases with off grid data dimensions

* [Optimization/Experimental] Better `AxisMerge` for column physics (#325)

* Add `CleanUpScheduleTree` pass to prep for merge

* Declutter axis merge logs, expose new pass

* Verbose Pipeline passes (with temporary stree saves)

* Deactivate IF_SCOPE push, remove attempt to keep merging if next nodes not a MapScope

* Docs of TODO

* Draft of more extended testing

* Fix `CartesianRefineTransients` for non-array

* Some lint

* Clean up the Tree of ForScope.loop_range

* Utest: group test under a single orchestrated class, add missing feature and expected failures

* [Feature/Experimental] Stree Refine Transient optimization pass:  data dimensions and proper unit tests (#327)

* Rename test for axis merge

* Properly refine fields with data dimensions
Fix indexing in memlets properly

* utest: coverage of all implemented tests

* Clean up timing print of orchestration

* Lint

* Fix bad reference to in/out memlets, remove dead code, better code

* Share test infrastructure, rename stencils

* Lint

* Better naming in utest stencils

* [Update] GT4Py & DaCe updated to 2025.11.25 state of `main` (#330)

* DaCe update: fix networkx dependency breaking with 3.6

* GT4Py: Runtime interval bounds in `debug`

* [Tool] Best Guess Netcdfs diff (#177)

* Best guess netcdfs compare

* Add FieldBundle to debugger

* lint

* Move executable to `pyproject`

* Lint

* Update `gt4py` to capture improvement to user error (#331)

* [Rework/Experimental] Refine Transient v2: `Ranges` for all! (#328)

* Rework the `RefineTransient` to use `Range` - simpler, cleaner and more robust. Also preps us for a better refine

* Remove unused code

---------

Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* [Fix] [Translate] Update API for parallel test when using `MultiModalMetric` (#332)

* Remove old options for `MultiModalFloatMetric`

* Defensive programming: bail out if we can't measure the ref vs input diff

---------

Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>
Co-authored-by: Janice Kim <Janice.Kim@noaa.gov>

* build|ci: add versioning needs to release checklist

Our auto-versioning system (`setuptools_scm`) expects to be able to find
the latest tag from the current HEAD of the `develop` branch. Since we
tag on `main`, we need to make sure to merge `main` back down into
`develop`.

The last time we merged `main` into `develop` was after `2025.01.00`.
This is why, before this PR, every version number that isn't exactly on
a release shows `2025.1.0` as its base version. This PR merges `main`
into `develop` and adds an item to the release checklist so that we
don't forget for future releases.

The PR also changes the configuration such that unreleased versions
contain the version hash, e.g. for this PR it looks something like

```none
2026.2.0.post1.dev8+g7d06ef689
```

where `2026.2.0` denotes the last release tag, `post1` means we are
ahead, `dev8` means the distance is 8 commits, `g` stands for `git` as
source control management system, and `7d06ef689` is the commit hash.
Compared to the previous version that looked something like

```none
2025.1.1.dev353
```

I think this is far more informative.

---------

Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>
Co-authored-by: Janice Kim <Janice.Kim@noaa.gov>
---
 .github/PULL_REQUEST_TEMPLATE/release-patch.md | 2 +-
 .github/PULL_REQUEST_TEMPLATE/release.md       | 1 +
 pyproject.toml                                 | 3 +--
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/PULL_REQUEST_TEMPLATE/release-patch.md b/.github/PULL_REQUEST_TEMPLATE/release-patch.md
index c588dd8f..5fd0e3f6 100644
--- a/.github/PULL_REQUEST_TEMPLATE/release-patch.md
+++ b/.github/PULL_REQUEST_TEMPLATE/release-patch.md
@@ -29,4 +29,4 @@ What to do to actually release:
 What to do after a release:
 
 - [ ] update the pace PR from the pre-commit checklist to include the released version of NDSL and merge it.
-- [ ] in NDSL, merge `main` back into `develop` (potentially adding a commit to fix the issue "properly")
+- [ ] in NDSL, merge `main` back into `develop` (potentially adding a commit to fix the issue "properly") to have all changes in develop and ensure `setuptools_scm` finds the latest release tag
diff --git a/.github/PULL_REQUEST_TEMPLATE/release.md b/.github/PULL_REQUEST_TEMPLATE/release.md
index 00aa3403..ceca5067 100644
--- a/.github/PULL_REQUEST_TEMPLATE/release.md
+++ b/.github/PULL_REQUEST_TEMPLATE/release.md
@@ -22,5 +22,6 @@ What to do to actually release:
 
 What to do after a release:
 
+- [ ] merge `main` down into `develop` to ensure `setuptools_scm` finds the latest release tag
 - [ ] update the pace PR from the pre-commit checklist to include the released version of NDSL and merge it.
 - [ ] merge breaking changes in NDSL (e.g. search for deprecation warnings)
diff --git a/pyproject.toml b/pyproject.toml
index 3583f889..07e3d5c3 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -127,5 +127,4 @@ include-package-data = true
 include = ["ndsl", "ndsl.*"]
 
 [tool.setuptools_scm]
-local_scheme = "dirty-tag"
-version_scheme = "guess-next-dev"
+version_scheme = "no-guess-dev"

From af09f734468465e25611ba6e6e8aa0f3be42f44d Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Mon, 23 Mar 2026 13:56:12 +0100
Subject: [PATCH 08/28] build: update gt4py (numpy 2 compatibility and `ipcx`
 support) (#403)

* build: update gt4py to bring numpy 2 changes

* unrelated: simple cleanup in conftest
---
 external/gt4py                    | 2 +-
 ndsl/stencils/testing/conftest.py | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/external/gt4py b/external/gt4py
index 47f7a9a1..6f5a9e5c 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 47f7a9a13d6ac9c8bfb5dab38f18456f51847cdf
+Subproject commit 6f5a9e5cc699a4cce7197849a78dfe83bf48379d
diff --git a/ndsl/stencils/testing/conftest.py b/ndsl/stencils/testing/conftest.py
index 213c025e..3aa2b907 100644
--- a/ndsl/stencils/testing/conftest.py
+++ b/ndsl/stencils/testing/conftest.py
@@ -152,9 +152,8 @@ def get_test_class(test_name: str) -> type | None:
         return_class = getattr(translate, translate_class_name)  # type: ignore[name-defined] # noqa: F821
     except AttributeError as err:
         if translate_class_name in err.args[0]:
-            return_class = None
-        else:
-            raise err
+            return None
+        raise err
     return return_class
 
 
From 469a58aa2005bfbe5e19a28204db13ec20aeebd2 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Mon, 23 Mar 2026 14:55:11 +0100
Subject: [PATCH 09/28] typing: add types to `ndsl.stencils.testing.grid`
 (#404)

* tests: remove dead code

These functions of the `Grid` class must be dead code because the
`Quantity` constructor now requires a backend argument (which isn't
provided). If anyone were to use that code, we'd see a type error.

* tests: typing Grid class

Add type hints to the `Grid` class used in translate tests. This is not
perfect, but at least we now have most types. There is a lot of magic
happening here, and without minimal type hints and type checking we'll
miss things like the `Quantity` update in the previous commit.

* remove now stale ignore directive
---
 ndsl/stencils/testing/__init__.py  |   2 +-
 ndsl/stencils/testing/conftest.py  |   2 +-
 ndsl/stencils/testing/grid.py      | 363 +++++++++++++++--------------
 ndsl/stencils/testing/savepoint.py |   2 +-
 ndsl/stencils/testing/translate.py |   2 +-
 5 files changed, 190 insertions(+), 181 deletions(-)

diff --git a/ndsl/stencils/testing/__init__.py b/ndsl/stencils/testing/__init__.py
index a55d8647..b1c0aff9 100644
--- a/ndsl/stencils/testing/__init__.py
+++ b/ndsl/stencils/testing/__init__.py
@@ -1,4 +1,4 @@
-from .grid import Grid  # type: ignore
+from .grid import Grid
 from .parallel_translate import (
     ParallelTranslate,
     ParallelTranslate2Py,
diff --git a/ndsl/stencils/testing/conftest.py b/ndsl/stencils/testing/conftest.py
index 3aa2b907..b2ceaacf 100644
--- a/ndsl/stencils/testing/conftest.py
+++ b/ndsl/stencils/testing/conftest.py
@@ -19,7 +19,7 @@
 from ndsl.comm.partitioner import CubedSpherePartitioner, TilePartitioner
 from ndsl.config import Backend
 from ndsl.dsl.dace.dace_config import DaceConfig
-from ndsl.stencils.testing.grid import Grid  # type: ignore
+from ndsl.stencils.testing.grid import Grid
 from ndsl.stencils.testing.parallel_translate import ParallelTranslate
 from ndsl.stencils.testing.savepoint import SavepointCase, Translate, dataset_to_dict
 from ndsl.stencils.testing.translate import TranslateGrid
diff --git a/ndsl/stencils/testing/grid.py b/ndsl/stencils/testing/grid.py
index 26de91c5..ef208b71 100644
--- a/ndsl/stencils/testing/grid.py
+++ b/ndsl/stencils/testing/grid.py
@@ -1,7 +1,10 @@
-# type: ignore
+from typing import Sequence
 
 import numpy as np
+from f90nml import Namelist
 
+from ndsl import GridSizer
+from ndsl.comm.communicator import Communicator
 from ndsl.comm.partitioner import TilePartitioner
 from ndsl.config import Backend
 from ndsl.constants import I_DIM, J_DIM, K_DIM, N_HALO_DEFAULT
@@ -27,7 +30,6 @@
 
 
 class Grid:
-    # indices = ["is_", "ie", "isd", "ied", "js", "je", "jsd", "jed"]
     index_pairs = [("is_", "js"), ("ie", "je"), ("isd", "jsd"), ("ied", "jed")]
     shape_params = ["npz", "npx", "npy"]
     # npx -- number of grid corners on one tile of the domain
@@ -35,8 +37,31 @@ class Grid:
     # But we need to add the halo - 1 to change this check to 0 based python arrays
     # grid.ie == npx + halo - 2
 
+    # shape params (initialized in __init__ with `setattr`)
+    npx: int
+    npy: int
+    npz: int
+
+    # index params (initialized in __int___ with `setattr`)
+    is_: int  # `is` is a reserved keyword in python
+    ie: int
+    isd: int
+    ied: int
+    js: int
+    je: int
+    jsd: int
+    jed: int
+
     @classmethod
-    def _make(cls, npx, npy, npz, layout, rank, backend: Backend):
+    def _make(
+        cls,
+        npx: int,
+        npy: int,
+        npz: int,
+        layout: tuple[int, int],
+        rank: int,
+        backend: Backend,
+    ) -> "Grid":
         shape_params = {
             "npx": npx,
             "npy": npy,
@@ -60,13 +85,15 @@ def _make(cls, npx, npy, npz, layout, rank, backend: Backend):
         return cls(indices, shape_params, rank, layout, backend, local_indices=True)
 
     @classmethod
-    def from_namelist(cls, namelist, rank, backend: Backend):
+    def from_namelist(cls, namelist: Namelist, rank: int, backend: Backend) -> "Grid":
         return cls._make(
             namelist.npx, namelist.npy, namelist.npz, namelist.layout, rank, backend
         )
 
     @classmethod
-    def with_data_from_namelist(cls, namelist, communicator, backend: Backend):
+    def with_data_from_namelist(
+        cls, namelist: Namelist, communicator: Communicator, backend: Backend
+    ) -> "Grid":
         grid = cls.from_namelist(namelist, communicator.rank, backend)
         grid.make_grid_data(
             npx=namelist.npx,
@@ -79,14 +106,14 @@ def with_data_from_namelist(cls, namelist, communicator, backend: Backend):
 
     def __init__(
         self,
-        indices,
-        shape_params,
-        rank,
-        layout,
+        indices: dict[str, int],
+        shape_params: dict[str, int],
+        rank: int,
+        layout: tuple[int, int],
         backend: Backend,
         data_fields: dict | None = None,
-        local_indices=False,
-    ):
+        local_indices: bool = False,
+    ) -> None:
         if data_fields is None:
             data_fields = {}
 
@@ -129,16 +156,16 @@ def __init__(
         self.se_corner = self.east_edge and self.south_edge
         self.nw_corner = self.west_edge and self.north_edge
         self.ne_corner = self.east_edge and self.north_edge
-        self.data_fields = {}
+        self.data_fields: dict = {}
         self.add_data(data_fields)
-        self._sizer = None
-        self._quantity_factory = None
-        self._grid_data = None
-        self._driver_grid_data = None
-        self._damping_coefficients = None
+        self._sizer: GridSizer | None = None
+        self._quantity_factory: QuantityFactory | None = None
+        self._grid_data: GridData | None = None
+        self._driver_grid_data: DriverGridData | None = None
+        self._damping_coefficients: DampingCoefficients | None = None
 
     @property
-    def sizer(self):
+    def sizer(self) -> GridSizer:
         if self._sizer is None:
             # in the future this should use from_namelist, when we have a non-flattened
             # namelist
@@ -164,63 +191,32 @@ def quantity_factory(self) -> QuantityFactory:
             self._quantity_factory = QuantityFactory(self.sizer, backend=self.backend)
         return self._quantity_factory
 
-    def make_quantity(
-        self,
-        array,
-        dims=(I_DIM, J_DIM, K_DIM),
-        units="Unknown",
-        origin=None,
-        extent=None,
-    ):
-        if origin is None:
-            origin = self.compute_origin()
-        if extent is None:
-            extent = self.domain_shape_compute()
-        return Quantity(array, dims=dims, units=units, origin=origin, extent=extent)
-
-    def quantity_dict_update(
-        self,
-        data_dict,
-        varname,
-        dims=(I_DIM, J_DIM, K_DIM),
-        units="Unknown",
-    ):
-        data_dict[varname + "_quantity"] = self.quantity_wrap(
-            data_dict[varname], dims=dims, units=units
-        )
-
-    def quantity_wrap(
-        self,
-        data,
-        dims=(I_DIM, J_DIM, K_DIM),
-        units="unknown",
-    ):
-        origin = self.sizer.get_origin(dims)
-        extent = self.sizer.get_extent(dims)
-        return Quantity(data, dims=dims, units=units, origin=origin, extent=extent)
-
-    def global_to_local_1d(self, global_value, subtile_index, subtile_length):
-        return int(global_value - subtile_index * subtile_length)
+    def global_to_local_1d(
+        self, global_value: int, subtile_index: int, subtile_length: int
+    ) -> int:
+        return global_value - subtile_index * subtile_length
 
-    def global_to_local_x(self, i_global):
+    def global_to_local_x(self, i_global: int) -> int:
         return self.global_to_local_1d(
             i_global, self.subtile_index[1], self.subtile_width_x
         )
 
-    def global_to_local_y(self, j_global):
+    def global_to_local_y(self, j_global: int) -> int:
         return self.global_to_local_1d(
             j_global, self.subtile_index[0], self.subtile_width_y
         )
 
-    def global_to_local_indices(self, i_global, j_global):
+    def global_to_local_indices(self, i_global: int, j_global: int) -> tuple[int, int]:
         i_local = self.global_to_local_x(i_global)
         j_local = self.global_to_local_y(j_global)
         return i_local, j_local
 
-    def local_to_global_1d(self, local_value, subtile_index, subtile_length):
-        return int(local_value + subtile_index * subtile_length)
+    def local_to_global_1d(
+        self, local_value: int, subtile_index: int, subtile_length: int
+    ) -> int:
+        return local_value + subtile_index * subtile_length
 
-    def local_to_global_indices(self, i_local, j_local):
+    def local_to_global_indices(self, i_local: int, j_local: int) -> tuple[int, int]:
         i_global = self.local_to_global_1d(
             i_local, self.subtile_index[1], self.subtile_width_x
         )
@@ -229,53 +225,53 @@ def local_to_global_indices(self, i_local, j_local):
         )
         return i_global, j_global
 
-    def add_data(self, data_dict):
+    def add_data(self, data_dict: dict) -> None:
         self.data_fields.update(data_dict)
         for k, v in self.data_fields.items():
             setattr(self, k, v)
 
-    def irange_compute(self):
+    def irange_compute(self) -> range:
         return range(self.is_, self.ie + 1)
 
-    def irange_compute_x(self):
+    def irange_compute_x(self) -> range:
         return range(self.is_, self.ie + 2)
 
-    def jrange_compute(self):
+    def jrange_compute(self) -> range:
         return range(self.js, self.je + 1)
 
-    def jrange_compute_y(self):
+    def jrange_compute_y(self) -> range:
         return range(self.js, self.je + 2)
 
-    def irange_domain(self):
+    def irange_domain(self) -> range:
         return range(self.isd, self.ied + 1)
 
-    def jrange_domain(self):
+    def jrange_domain(self) -> range:
         return range(self.jsd, self.jed + 1)
 
-    def krange(self):
+    def krange(self) -> range:
         return range(0, self.npz)
 
-    def compute_interface(self):
+    def compute_interface(self) -> tuple[slice, ...]:
         return self.slice_dict(self.compute_dict())
 
-    def x3d_interface(self):
+    def x3d_interface(self) -> tuple[slice, ...]:
         return self.slice_dict(self.x3d_compute_dict())
 
-    def y3d_interface(self):
+    def y3d_interface(self) -> tuple[slice, ...]:
         return self.slice_dict(self.y3d_compute_dict())
 
-    def x3d_domain_interface(self):
+    def x3d_domain_interface(self) -> tuple[slice, ...]:
         return self.slice_dict(self.x3d_domain_dict())
 
-    def y3d_domain_interface(self):
+    def y3d_domain_interface(self) -> tuple[slice, ...]:
         return self.slice_dict(self.y3d_domain_dict())
 
-    def add_one(self, num):
+    def add_one(self, num: int | None) -> int:
         if num is None:
-            return None
+            raise ValueError("Can't add one to `None`.")
         return num + 1
 
-    def slice_dict(self, d, ndim: int = 3):
+    def slice_dict(self, d: dict, ndim: int = 3) -> tuple[slice, ...]:
         iters: str = "ijk" if ndim > 1 else "k"
         return tuple(
             [
@@ -286,7 +282,7 @@ def slice_dict(self, d, ndim: int = 3):
             ]
         )
 
-    def default_domain_dict(self):
+    def default_domain_dict(self) -> dict:
         return {
             "istart": self.isd,
             "iend": self.ied,
@@ -296,13 +292,13 @@ def default_domain_dict(self):
             "kend": self.npz - 1,
         }
 
-    def default_dict_buffer_2d(self):
+    def default_dict_buffer_2d(self) -> dict:
         mydict = self.default_domain_dict()
         mydict["iend"] += 1
         mydict["jend"] += 1
         return mydict
 
-    def compute_dict(self):
+    def compute_dict(self) -> dict:
         return {
             "istart": self.is_,
             "iend": self.ie,
@@ -312,23 +308,23 @@ def compute_dict(self):
             "kend": self.npz - 1,
         }
 
-    def compute_dict_buffer_2d(self):
+    def compute_dict_buffer_2d(self) -> dict:
         mydict = self.compute_dict()
         mydict["iend"] += 1
         mydict["jend"] += 1
         return mydict
 
-    def default_buffer_k_dict(self):
+    def default_buffer_k_dict(self) -> dict:
         mydict = self.default_domain_dict()
         mydict["kend"] = self.npz
         return mydict
 
-    def compute_buffer_k_dict(self):
+    def compute_buffer_k_dict(self) -> dict:
         mydict = self.compute_dict()
         mydict["kend"] = self.npz
         return mydict
 
-    def x3d_domain_dict(self):
+    def x3d_domain_dict(self) -> dict:
         horizontal_dict = {
             "istart": self.isd,
             "iend": self.ied + 1,
@@ -337,7 +333,7 @@ def x3d_domain_dict(self):
         }
         return {**self.default_domain_dict(), **horizontal_dict}
 
-    def y3d_domain_dict(self):
+    def y3d_domain_dict(self) -> dict:
         horizontal_dict = {
             "istart": self.isd,
             "iend": self.ied,
@@ -346,7 +342,7 @@ def y3d_domain_dict(self):
         }
         return {**self.default_domain_dict(), **horizontal_dict}
 
-    def x3d_compute_dict(self):
+    def x3d_compute_dict(self) -> dict:
         horizontal_dict = {
             "istart": self.is_,
             "iend": self.ie + 1,
@@ -355,7 +351,7 @@ def x3d_compute_dict(self):
         }
         return {**self.default_domain_dict(), **horizontal_dict}
 
-    def y3d_compute_dict(self):
+    def y3d_compute_dict(self) -> dict:
         horizontal_dict = {
             "istart": self.is_,
             "iend": self.ie,
@@ -364,7 +360,7 @@ def y3d_compute_dict(self):
         }
         return {**self.default_domain_dict(), **horizontal_dict}
 
-    def x3d_compute_domain_y_dict(self):
+    def x3d_compute_domain_y_dict(self) -> dict:
         horizontal_dict = {
             "istart": self.is_,
             "iend": self.ie + 1,
@@ -373,7 +369,7 @@ def x3d_compute_domain_y_dict(self):
         }
         return {**self.default_domain_dict(), **horizontal_dict}
 
-    def y3d_compute_domain_x_dict(self):
+    def y3d_compute_domain_x_dict(self) -> dict:
         horizontal_dict = {
             "istart": self.isd,
             "iend": self.ied,
@@ -382,18 +378,22 @@ def y3d_compute_domain_x_dict(self):
         }
         return {**self.default_domain_dict(), **horizontal_dict}
 
-    def domain_shape_full(self, *, add: tuple[int, int, int] = (0, 0, 0)):
+    def domain_shape_full(
+        self, *, add: tuple[int, int, int] = (0, 0, 0)
+    ) -> tuple[int, int, int]:
         """Domain shape for the full array including halo points."""
         return (self.nid + add[0], self.njd + add[1], self.npz + add[2])
 
-    def domain_shape_compute(self, *, add: tuple[int, int, int] = (0, 0, 0)):
+    def domain_shape_compute(
+        self, *, add: tuple[int, int, int] = (0, 0, 0)
+    ) -> tuple[int, int, int]:
         """Compute domain shape excluding halo points."""
         return (self.nic + add[0], self.njc + add[1], self.npz + add[2])
 
-    def copy_right_edge(self, var, i_index, j_index):
+    def copy_right_edge(self, var, i_index, j_index):  # type: ignore
         return np.copy(var[i_index:, :, :]), np.copy(var[:, j_index:, :])
 
-    def insert_left_edge(self, var, edge_data_i, i_index, edge_data_j, j_index):
+    def insert_left_edge(self, var, edge_data_i, i_index, edge_data_j, j_index):  # type: ignore
         if len(var.shape) < 3:
             var[:i_index, :] = edge_data_i
             var[:, :j_index] = edge_data_j
@@ -401,7 +401,7 @@ def insert_left_edge(self, var, edge_data_i, i_index, edge_data_j, j_index):
             var[:i_index, :, :] = edge_data_i
             var[:, :j_index, :] = edge_data_j
 
-    def insert_right_edge(self, var, edge_data_i, i_index, edge_data_j, j_index):
+    def insert_right_edge(self, var, edge_data_i, i_index, edge_data_j, j_index):  # type: ignore
         if len(var.shape) < 3:
             var[i_index:, :] = edge_data_i
             var[:, j_index:] = edge_data_j
@@ -409,21 +409,25 @@ def insert_right_edge(self, var, edge_data_i, i_index, edge_data_j, j_index):
             var[i_index:, :, :] = edge_data_i
             var[:, j_index:, :] = edge_data_j
 
-    def uvar_edge_halo(self, var):
+    def uvar_edge_halo(self, var):  # type: ignore
         return self.copy_right_edge(var, self.ie + 2, self.je + 1)
 
-    def vvar_edge_halo(self, var):
+    def vvar_edge_halo(self, var):  # type: ignore
         return self.copy_right_edge(var, self.ie + 1, self.je + 2)
 
-    def compute_origin(self, add: tuple[int, int, int] = (0, 0, 0)):
+    def compute_origin(
+        self, add: tuple[int, int, int] = (0, 0, 0)
+    ) -> tuple[int, int, int]:
         """Start of the compute domain (e.g. (halo, halo, 0))"""
         return (self.is_ + add[0], self.js + add[1], add[2])
 
-    def full_origin(self, add: tuple[int, int, int] = (0, 0, 0)):
+    def full_origin(
+        self, add: tuple[int, int, int] = (0, 0, 0)
+    ) -> tuple[int, int, int]:
         """Start of the full array including halo points (e.g. (0, 0, 0))"""
         return (self.isd + add[0], self.jsd + add[1], add[2])
 
-    def horizontal_starts_from_shape(self, shape):
+    def horizontal_starts_from_shape(self, shape: Sequence[int]) -> tuple[int, int]:
         if shape[0:2] in [
             self.domain_shape_compute()[0:2],
             self.domain_shape_compute(add=(1, 0, 0))[0:2],
@@ -431,12 +435,13 @@ def horizontal_starts_from_shape(self, shape):
             self.domain_shape_compute(add=(1, 1, 0))[0:2],
         ]:
             return self.is_, self.js
-        elif shape[0:2] == (self.nic + 2, self.njc + 2):
+
+        if shape[0:2] == (self.nic + 2, self.njc + 2):
             return self.is_ - 1, self.js - 1
-        else:
-            return 0, 0
 
-    def get_halo_update_spec(
+        return 0, 0
+
+    def get_halo_update_spec(  # type: ignore
         self,
         shape,
         origin,
@@ -452,7 +457,7 @@ def get_halo_update_spec(
     @property
     def grid_indexing(self) -> GridIndexing:
         return GridIndexing(
-            domain=tuple(int(item) for item in self.domain_shape_compute()),
+            domain=self.domain_shape_compute(),
             n_halo=self.halo,
             south_edge=self.south_edge,
             north_edge=self.north_edge,
@@ -465,16 +470,18 @@ def damping_coefficients(self) -> DampingCoefficients:
         if self._damping_coefficients is not None:
             return self._damping_coefficients
         self._damping_coefficients = DampingCoefficients(
-            divg_u=self.divg_u,
-            divg_v=self.divg_v,
-            del6_u=self.del6_u,
-            del6_v=self.del6_v,
-            da_min=self.da_min,
-            da_min_c=self.da_min_c,
+            divg_u=self.divg_u,  # type: ignore
+            divg_v=self.divg_v,  # type: ignore
+            del6_u=self.del6_u,  # type: ignore
+            del6_v=self.del6_v,  # type: ignore
+            da_min=self.da_min,  # type: ignore
+            da_min_c=self.da_min_c,  # type: ignore
         )
         return self._damping_coefficients
 
-    def set_damping_coefficients(self, damping_coefficients: DampingCoefficients):
+    def set_damping_coefficients(
+        self, damping_coefficients: DampingCoefficients
+    ) -> None:
         self._damping_coefficients = damping_coefficients
 
     @property
@@ -526,103 +533,103 @@ def grid_data(self) -> GridData:
 
         horizontal = HorizontalGridData(
             lon=self.quantity_factory.from_array(
-                data=self.bgrid1,
+                data=self.bgrid1,  # type: ignore
                 dims=GridDefinitions.lon.dims,
                 units=GridDefinitions.lon.units,
             ),
             lat=self.quantity_factory.from_array(
-                data=self.bgrid2,
+                data=self.bgrid2,  # type: ignore
                 dims=GridDefinitions.lat.dims,
                 units=GridDefinitions.lat.units,
             ),
             lon_agrid=self.quantity_factory.from_array(
-                data=self.agrid1,
+                data=self.agrid1,  # type: ignore
                 dims=GridDefinitions.lon_agrid.dims,
                 units=GridDefinitions.lon_agrid.units,
             ),
             lat_agrid=self.quantity_factory.from_array(
-                data=self.agrid2,
+                data=self.agrid2,  # type: ignore
                 dims=GridDefinitions.lat_agrid.dims,
                 units=GridDefinitions.lat_agrid.units,
             ),
             area=self.quantity_factory.from_array(
-                data=self.area,
+                data=self.area,  # type: ignore
                 dims=GridDefinitions.area.dims,
                 units=GridDefinitions.area.units,
             ),
             area_64=self.quantity_factory.from_array(
-                data=self.area_64,
+                data=self.area_64,  # type: ignore
                 dims=GridDefinitions.area.dims,
                 units=GridDefinitions.area.units,
                 allow_mismatch_float_precision=True,
             ),
             rarea=self.quantity_factory.from_array(
-                data=self.rarea,
+                data=self.rarea,  # type: ignore
                 dims=GridDefinitions.rarea.dims,
                 units=GridDefinitions.rarea.units,
             ),
             rarea_c=self.quantity_factory.from_array(
-                data=self.rarea_c,
+                data=self.rarea_c,  # type: ignore
                 dims=GridDefinitions.rarea_c.dims,
                 units=GridDefinitions.rarea_c.units,
             ),
             dx=self.quantity_factory.from_array(
-                data=self.dx,
+                data=self.dx,  # type: ignore
                 dims=GridDefinitions.dx.dims,
                 units=GridDefinitions.dx.units,
             ),
             dy=self.quantity_factory.from_array(
-                data=self.dy,
+                data=self.dy,  # type: ignore
                 dims=GridDefinitions.dy.dims,
                 units=GridDefinitions.dy.units,
             ),
             dxc=self.quantity_factory.from_array(
-                data=self.dxc,
+                data=self.dxc,  # type: ignore
                 dims=GridDefinitions.dxc.dims,
                 units=GridDefinitions.dxc.units,
             ),
             dyc=self.quantity_factory.from_array(
-                data=self.dyc,
+                data=self.dyc,  # type: ignore
                 dims=GridDefinitions.dyc.dims,
                 units=GridDefinitions.dyc.units,
             ),
             dxa=self.quantity_factory.from_array(
-                data=self.dxa,
+                data=self.dxa,  # type: ignore
                 dims=GridDefinitions.dxa.dims,
                 units=GridDefinitions.dxa.units,
             ),
             dya=self.quantity_factory.from_array(
-                data=self.dya,
+                data=self.dya,  # type: ignore
                 dims=GridDefinitions.dya.dims,
                 units=GridDefinitions.dya.units,
             ),
             rdx=self.quantity_factory.from_array(
-                data=self.rdx,
+                data=self.rdx,  # type: ignore
                 dims=GridDefinitions.rdx.dims,
                 units=GridDefinitions.rdx.units,
             ),
             rdy=self.quantity_factory.from_array(
-                data=self.rdy,
+                data=self.rdy,  # type: ignore
                 dims=GridDefinitions.rdy.dims,
                 units=GridDefinitions.rdy.units,
             ),
             rdxc=self.quantity_factory.from_array(
-                data=self.rdxc,
+                data=self.rdxc,  # type: ignore
                 dims=GridDefinitions.rdxc.dims,
                 units=GridDefinitions.rdxc.units,
             ),
             rdyc=self.quantity_factory.from_array(
-                data=self.rdyc,
+                data=self.rdyc,  # type: ignore
                 dims=GridDefinitions.rdyc.dims,
                 units=GridDefinitions.rdyc.units,
             ),
             rdxa=self.quantity_factory.from_array(
-                data=self.rdxa,
+                data=self.rdxa,  # type: ignore
                 dims=GridDefinitions.rdxa.dims,
                 units=GridDefinitions.rdxa.units,
             ),
             rdya=self.quantity_factory.from_array(
-                data=self.rdya,
+                data=self.rdya,  # type: ignore
                 dims=GridDefinitions.rdya.dims,
                 units=GridDefinitions.rdya.units,
             ),
@@ -631,22 +638,22 @@ def grid_data(self) -> GridData:
             es1=clipped_data["es1"],
             ew2=clipped_data["ew2"],
             a11=self.quantity_factory.from_array(
-                data=self.a11,
+                data=self.a11,  # type: ignore
                 dims=GridDefinitions.a11.dims,
                 units=GridDefinitions.a11.units,
             ),
             a12=self.quantity_factory.from_array(
-                data=self.a12,
+                data=self.a12,  # type: ignore
                 dims=GridDefinitions.a12.dims,
                 units=GridDefinitions.a12.units,
             ),
             a21=self.quantity_factory.from_array(
-                data=self.a21,
+                data=self.a21,  # type: ignore
                 dims=GridDefinitions.a21.dims,
                 units=GridDefinitions.a21.units,
             ),
             a22=self.quantity_factory.from_array(
-                data=self.a22,
+                data=self.a22,  # type: ignore
                 dims=GridDefinitions.a22.dims,
                 units=GridDefinitions.a22.units,
             ),
@@ -657,156 +664,156 @@ def grid_data(self) -> GridData:
         )
         vertical = VerticalGridData(
             ak=self.quantity_factory.from_array(
-                data=self.ak,
+                data=self.ak,  # type: ignore
                 dims=GridDefinitions.ak.dims,
                 units=GridDefinitions.ak.units,
             ),
             bk=self.quantity_factory.from_array(
-                data=self.bk,
+                data=self.bk,  # type: ignore
                 dims=GridDefinitions.bk.dims,
                 units=GridDefinitions.bk.units,
             ),
         )
         contravariant = ContravariantGridData(
             cosa=self.quantity_factory.from_array(
-                data=self.cosa,
+                data=self.cosa,  # type: ignore
                 dims=GridDefinitions.cosa.dims,
                 units=GridDefinitions.cosa.units,
             ),
             cosa_u=self.quantity_factory.from_array(
-                data=self.cosa_u,
+                data=self.cosa_u,  # type: ignore
                 dims=GridDefinitions.cosa_u.dims,
                 units=GridDefinitions.cosa_u.units,
             ),
             cosa_v=self.quantity_factory.from_array(
-                data=self.cosa_v,
+                data=self.cosa_v,  # type: ignore
                 dims=GridDefinitions.cosa_v.dims,
                 units=GridDefinitions.cosa_v.units,
             ),
             cosa_s=self.quantity_factory.from_array(
-                data=self.cosa_s,
+                data=self.cosa_s,  # type: ignore
                 dims=GridDefinitions.cosa_s.dims,
                 units=GridDefinitions.cosa_s.units,
             ),
             sina_u=self.quantity_factory.from_array(
-                data=self.sina_u,
+                data=self.sina_u,  # type: ignore
                 dims=GridDefinitions.sina_u.dims,
                 units=GridDefinitions.sina_u.units,
             ),
             sina_v=self.quantity_factory.from_array(
-                data=self.sina_v,
+                data=self.sina_v,  # type: ignore
                 dims=GridDefinitions.sina_v.dims,
                 units=GridDefinitions.sina_v.units,
             ),
             rsina=self.quantity_factory.from_array(
-                data=self.rsina,
+                data=self.rsina,  # type: ignore
                 dims=GridDefinitions.rsina.dims,
                 units=GridDefinitions.rsina.units,
             ),
             rsin_u=self.quantity_factory.from_array(
-                data=self.rsin_u,
+                data=self.rsin_u,  # type: ignore
                 dims=GridDefinitions.rsin_u.dims,
                 units=GridDefinitions.rsin_u.units,
             ),
             rsin_v=self.quantity_factory.from_array(
-                data=self.rsin_v,
+                data=self.rsin_v,  # type: ignore
                 dims=GridDefinitions.rsin_v.dims,
                 units=GridDefinitions.rsin_v.units,
             ),
             rsin2=self.quantity_factory.from_array(
-                data=self.rsin2,
+                data=self.rsin2,  # type: ignore
                 dims=GridDefinitions.rsin2.dims,
                 units=GridDefinitions.rsin2.units,
             ),
         )
         angle = AngleGridData(
             sin_sg1=self.quantity_factory.from_array(
-                data=self.sin_sg1,
+                data=self.sin_sg1,  # type: ignore
                 dims=GridDefinitions.sin_sg1.dims,
                 units=GridDefinitions.sin_sg1.units,
             ),
             sin_sg2=self.quantity_factory.from_array(
-                data=self.sin_sg2,
+                data=self.sin_sg2,  # type: ignore
                 dims=GridDefinitions.sin_sg2.dims,
                 units=GridDefinitions.sin_sg2.units,
             ),
             sin_sg3=self.quantity_factory.from_array(
-                data=self.sin_sg3,
+                data=self.sin_sg3,  # type: ignore
                 dims=GridDefinitions.sin_sg3.dims,
                 units=GridDefinitions.sin_sg3.units,
             ),
             sin_sg4=self.quantity_factory.from_array(
-                data=self.sin_sg4,
+                data=self.sin_sg4,  # type: ignore
                 dims=GridDefinitions.sin_sg4.dims,
                 units=GridDefinitions.sin_sg4.units,
             ),
             sin_sg5=self.quantity_factory.from_array(
-                data=self.sin_sg5,
+                data=self.sin_sg5,  # type: ignore
                 dims=GridDefinitions.sin_sg5.dims,
                 units=GridDefinitions.sin_sg5.units,
             ),
             sin_sg6=self.quantity_factory.from_array(
-                data=self.sin_sg6,
+                data=self.sin_sg6,  # type: ignore
                 dims=GridDefinitions.sin_sg6.dims,
                 units=GridDefinitions.sin_sg6.units,
             ),
             sin_sg7=self.quantity_factory.from_array(
-                data=self.sin_sg7,
+                data=self.sin_sg7,  # type: ignore
                 dims=GridDefinitions.sin_sg7.dims,
                 units=GridDefinitions.sin_sg7.units,
             ),
             sin_sg8=self.quantity_factory.from_array(
-                data=self.sin_sg8,
+                data=self.sin_sg8,  # type: ignore
                 dims=GridDefinitions.sin_sg8.dims,
                 units=GridDefinitions.sin_sg8.units,
             ),
             sin_sg9=self.quantity_factory.from_array(
-                data=self.sin_sg9,
+                data=self.sin_sg9,  # type: ignore
                 dims=GridDefinitions.sin_sg9.dims,
                 units=GridDefinitions.sin_sg9.units,
             ),
             cos_sg1=self.quantity_factory.from_array(
-                data=self.cos_sg1,
+                data=self.cos_sg1,  # type: ignore
                 dims=GridDefinitions.cos_sg1.dims,
                 units=GridDefinitions.cos_sg1.units,
             ),
             cos_sg2=self.quantity_factory.from_array(
-                data=self.cos_sg2,
+                data=self.cos_sg2,  # type: ignore
                 dims=GridDefinitions.cos_sg2.dims,
                 units=GridDefinitions.cos_sg2.units,
             ),
             cos_sg3=self.quantity_factory.from_array(
-                data=self.cos_sg3,
+                data=self.cos_sg3,  # type: ignore
                 dims=GridDefinitions.cos_sg3.dims,
                 units=GridDefinitions.cos_sg3.units,
             ),
             cos_sg4=self.quantity_factory.from_array(
-                data=self.cos_sg4,
+                data=self.cos_sg4,  # type: ignore
                 dims=GridDefinitions.cos_sg4.dims,
                 units=GridDefinitions.cos_sg4.units,
             ),
             cos_sg5=self.quantity_factory.from_array(
-                data=self.cos_sg5,
+                data=self.cos_sg5,  # type: ignore
                 dims=GridDefinitions.cos_sg5.dims,
                 units=GridDefinitions.cos_sg5.units,
             ),
             cos_sg6=self.quantity_factory.from_array(
-                data=self.cos_sg6,
+                data=self.cos_sg6,  # type: ignore
                 dims=GridDefinitions.cos_sg6.dims,
                 units=GridDefinitions.cos_sg6.units,
             ),
             cos_sg7=self.quantity_factory.from_array(
-                data=self.cos_sg7,
+                data=self.cos_sg7,  # type: ignore
                 dims=GridDefinitions.cos_sg7.dims,
                 units=GridDefinitions.cos_sg7.units,
             ),
             cos_sg8=self.quantity_factory.from_array(
-                data=self.cos_sg8,
+                data=self.cos_sg8,  # type: ignore
                 dims=GridDefinitions.cos_sg8.dims,
                 units=GridDefinitions.cos_sg8.units,
             ),
             cos_sg9=self.quantity_factory.from_array(
-                data=self.cos_sg9,
+                data=self.cos_sg9,  # type: ignore
                 dims=GridDefinitions.cos_sg9.dims,
                 units=GridDefinitions.cos_sg9.units,
             ),
@@ -816,8 +823,8 @@ def grid_data(self) -> GridData:
             vertical_data=vertical,
             contravariant_data=contravariant,
             angle_data=angle,
-            fc=self.fC,
-            fc_agrid=self.f0,
+            fc=self.fC,  # type: ignore
+            fc_agrid=self.f0,  # type: ignore
         )
         return self._grid_data
 
@@ -825,21 +832,23 @@ def grid_data(self) -> GridData:
     def driver_grid_data(self) -> DriverGridData:
         if self._driver_grid_data is None:
             self._driver_grid_data = DriverGridData.new_from_grid_variables(
-                vlon=self.vlon,
-                vlat=self.vlat,
-                edge_vect_w=self.edge_vect_w,
-                edge_vect_e=self.edge_vect_e,
-                edge_vect_s=self.edge_vect_s,
-                edge_vect_n=self.edge_vect_n,
-                es1=self.es1,
-                ew2=self.ew2,
+                vlon=self.vlon,  # type: ignore
+                vlat=self.vlat,  # type: ignore
+                edge_vect_w=self.edge_vect_w,  # type: ignore
+                edge_vect_e=self.edge_vect_e,  # type: ignore
+                edge_vect_s=self.edge_vect_s,  # type: ignore
+                edge_vect_n=self.edge_vect_n,  # type: ignore
+                es1=self.es1,  # type: ignore
+                ew2=self.ew2,  # type: ignore
             )
         return self._driver_grid_data
 
-    def set_grid_data(self, grid_data: GridData):
+    def set_grid_data(self, grid_data: GridData) -> None:
         self._grid_data = grid_data
 
-    def make_grid_data(self, npx, npy, npz, communicator, backend: Backend):
+    def make_grid_data(
+        self, npx: int, npy: int, npz: int, communicator: Communicator, backend: Backend
+    ) -> None:
         metric_terms = MetricTerms.from_tile_sizing(
             npx=npx, npy=npy, npz=npz, communicator=communicator, backend=backend
         )
diff --git a/ndsl/stencils/testing/savepoint.py b/ndsl/stencils/testing/savepoint.py
index 22263b40..ff4e1be3 100644
--- a/ndsl/stencils/testing/savepoint.py
+++ b/ndsl/stencils/testing/savepoint.py
@@ -5,7 +5,7 @@
 import numpy as np
 import xarray as xr
 
-from ndsl.stencils.testing.grid import Grid  # type: ignore
+from ndsl.stencils.testing.grid import Grid
 
 
 def dataset_to_dict(ds: xr.Dataset) -> dict[str, np.ndarray | float | int]:
diff --git a/ndsl/stencils/testing/translate.py b/ndsl/stencils/testing/translate.py
index 45d4a8c2..f45b1f47 100644
--- a/ndsl/stencils/testing/translate.py
+++ b/ndsl/stencils/testing/translate.py
@@ -9,7 +9,7 @@
 from ndsl.dsl.stencil import StencilFactory
 from ndsl.optional_imports import cupy as cp
 from ndsl.quantity import Quantity
-from ndsl.stencils.testing.grid import Grid  # type: ignore
+from ndsl.stencils.testing.grid import Grid
 from ndsl.stencils.testing.savepoint import DataLoader
 
 
From 86d7d54d38d639b5be6309f0b84e8deb4d8a3c03 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Tue, 24 Mar 2026 18:23:47 +0100
Subject: [PATCH 10/28] build: update gt4py (data dimensions size one) (#406)

---
 external/gt4py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/external/gt4py b/external/gt4py
index 6f5a9e5c..2e96fc08 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 6f5a9e5cc699a4cce7197849a78dfe83bf48379d
+Subproject commit 2e96fc08b55f7fa596a7a3d506f2cf18c3fcd349

From b6e94f12043ac7979536b10d471b239b35e5298c Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Wed, 25 Mar 2026 14:51:28 +0100
Subject: [PATCH 11/28] fix: cleanups in translate test discovery (#405)

* cleanups in translate test discovery

* only take file part
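
The discovery cleanup can be illustrated in isolation. The snippet below is a minimal sketch, not the code in `conftest.py`: the function name `savepoint_names`, the file names, and the `_halo` predicate are hypothetical, and sorting the result is a choice made here for deterministic output. It shows globbing for `*-In.nc`, slicing the suffix off the file name only (not the full path), and optional filtering with a predicate.

```python
from __future__ import annotations

import tempfile
from collections.abc import Callable
from pathlib import Path


def savepoint_names(
    data_path: Path, predicate: Callable[[str], bool] | None = None
) -> list[str]:
    # "-In.nc" is six characters long, so [:-6] on the file name (not the
    # full path) leaves just the savepoint name.
    names = {fname.name[:-6] for fname in data_path.glob("*-In.nc")}
    if predicate is None:
        return sorted(names)
    return sorted(name for name in names if predicate(name))


# Hypothetical data directory, created only for this illustration.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("A-In.nc", "A-Out.nc", "B_halo-In.nc", "notes.txt"):
        (Path(tmp) / fname).touch()
    all_names = savepoint_names(Path(tmp))
    halo_names = savepoint_names(Path(tmp), lambda name: name.endswith("_halo"))
```

Only files ending in `-In.nc` contribute a name, so `A-Out.nc` and `notes.txt` are ignored.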
---
 ndsl/stencils/testing/conftest.py | 130 ++++++++++++++----------------
 1 file changed, 62 insertions(+), 68 deletions(-)

diff --git a/ndsl/stencils/testing/conftest.py b/ndsl/stencils/testing/conftest.py
index b2ceaacf..bd7d7b42 100644
--- a/ndsl/stencils/testing/conftest.py
+++ b/ndsl/stencils/testing/conftest.py
@@ -1,5 +1,4 @@
-import os
-import re
+from collections.abc import Callable
 from pathlib import Path
 from typing import Any
 
@@ -124,10 +123,10 @@ def pytest_configure(config: pytest.Config) -> None:
 
 @pytest.fixture()
 def data_path(pytestconfig: pytest.Config) -> tuple[Path, Path]:
-    return data_path_and_namelist_filename_from_config(pytestconfig)
+    return _data_path_and_namelist_filename_from_config(pytestconfig)
 
 
-def data_path_and_namelist_filename_from_config(
+def _data_path_and_namelist_filename_from_config(
     config: pytest.Config,
 ) -> tuple[Path, Path]:
     data_path = Path(config.getoption("data_path"))
@@ -136,81 +135,78 @@ def data_path_and_namelist_filename_from_config(
 
 @pytest.fixture
 def threshold_overrides(pytestconfig: pytest.Config) -> dict | None:
-    return thresholds_from_file(pytestconfig)
+    return _thresholds_from_file(pytestconfig)
 
 
-def thresholds_from_file(config: pytest.Config) -> dict | None:
+def _thresholds_from_file(config: pytest.Config) -> dict | None:
     thresholds_file = config.getoption("threshold_overrides_file")
     if thresholds_file is None:
         return None
     return yaml.safe_load(open(thresholds_file, "r"))
 
 
-def get_test_class(test_name: str) -> type | None:
+def _test_class_from_name(test_name: str) -> type:
     translate_class_name = f"Translate{test_name.replace('-', '_')}"
     try:
         return_class = getattr(translate, translate_class_name)  # type: ignore[name-defined] # noqa: F821
     except AttributeError as err:
         if translate_class_name in err.args[0]:
-            return None
+            # raise with custom error message if translate test wasn't found
+            raise ValueError(
+                f"Could not find translate test class for test name '{test_name}'."
+            )
         raise err
     return return_class
 
 
-def is_parallel_test(test_name: str) -> bool:
-    test_class = get_test_class(test_name)
-    if test_class is None:
-        return False
+def _is_parallel(test_name: str) -> bool:
+    test_class = _test_class_from_name(test_name)
     return issubclass(test_class, ParallelTranslate)
 
 
-def get_test_class_instance(
+def _is_sequential(test_name: str) -> bool:
+    return not _is_parallel(test_name)
+
+
+def _test_class_instance(
     test_name: str, grid: Grid, namelist: Namelist, stencil_factory: StencilFactory
 ) -> Translate:
-    translate_class = get_test_class(test_name)
-    if translate_class is None:
-        raise ValueError(
-            f"Could not find translate test class for test name '{test_name}'."
-        )
-
+    translate_class = _test_class_from_name(test_name)
     return translate_class(grid, namelist, stencil_factory)
 
 
-def get_all_savepoint_names(metafunc: Any, data_path: Path) -> set[str]:
+def _all_savepoint_names(
+    metafunc: Any, data_path: Path, predicate: Callable[[str], bool] | None
+) -> list[str]:
     only_names = metafunc.config.getoption("which_modules")
     if only_names is None:
-        names = [
-            fname[:-3] for fname in os.listdir(data_path) if re.match(r".*\.nc", fname)
-        ]
-        savepoint_names = set([s[:-3] for s in names if s.endswith("-In")])
+        savepoint_names = set(
+            str(fname.name)[:-6] for fname in data_path.glob("*-In.nc")
+        )
     else:
         savepoint_names = set(only_names.split(","))
         savepoint_names.discard("")
+
+    # Handle skipped translate tests
     skip_names = metafunc.config.getoption("skip_modules")
     if skip_names is not None:
         savepoint_names.difference_update(skip_names.split(","))
-    return savepoint_names
 
+    if predicate is None:
+        return list(savepoint_names)
+
+    return [name for name in savepoint_names if predicate(name)]
 
-def get_sequential_savepoint_names(metafunc: Any, data_path: Path) -> list[str]:
-    all_names = get_all_savepoint_names(metafunc, data_path)
-    sequential_names = []
-    for name in all_names:
-        if not is_parallel_test(name):
-            sequential_names.append(name)
-    return sequential_names
 
+def _sequential_savepoint_names(metafunc: Any, data_path: Path) -> list[str]:
+    return _all_savepoint_names(metafunc, data_path, _is_sequential)
 
-def get_parallel_savepoint_names(metafunc: Any, data_path: Path) -> list[str]:
-    all_names = get_all_savepoint_names(metafunc, data_path)
-    parallel_names = []
-    for name in all_names:
-        if is_parallel_test(name):
-            parallel_names.append(name)
-    return parallel_names
 
+def _parallel_savepoint_names(metafunc: Any, data_path: Path) -> list[str]:
+    return _all_savepoint_names(metafunc, data_path, _is_parallel)
 
-def get_ranks(metafunc: Any, layout: tuple[int, int]) -> list[int] | range:
+
+def _get_ranks(metafunc: Any, layout: tuple[int, int]) -> list[int] | range:
     only_rank = metafunc.config.getoption("which_rank")
     if only_rank is not None:
         return [int(only_rank)]
@@ -222,17 +218,17 @@ def get_ranks(metafunc: Any, layout: tuple[int, int]) -> list[int] | range:
     elif topology == "cubed-sphere":
         total_ranks = 6 * layout[0] * layout[1]
     else:
-        raise NotImplementedError(f"Topology {topology} is unknown.")
+        raise NotImplementedError(f"Topology '{topology}' is unknown.")
 
     return range(total_ranks)
 
 
-def get_savepoint_restriction(metafunc: Any) -> int | None:
+def _get_savepoint_restriction(metafunc: Any) -> int | None:
     svpt = metafunc.config.getoption("which_savepoint")
     return int(svpt) if svpt else None
 
 
-def get_config(backend: Backend, communicator: Communicator | None) -> StencilConfig:
+def _get_config(backend: Backend, communicator: Communicator | None) -> StencilConfig:
     stencil_config = StencilConfig(
         compilation_config=CompilationConfig(
             backend=backend, rebuild=False, validate_args=True
@@ -245,16 +241,16 @@ def get_config(backend: Backend, communicator: Communicator | None) -> StencilCo
     return stencil_config
 
 
-def sequential_savepoint_cases(
+def _sequential_savepoint_cases(
     metafunc: Any, data_path: Path, namelist_filename: Path, *, backend: str
 ) -> list[SavepointCase]:
     ndsl_backend = Backend(backend)
-    savepoint_names = get_sequential_savepoint_names(metafunc, data_path)
+    savepoint_names = _sequential_savepoint_names(metafunc, data_path)
     namelist = load_f90nml(namelist_filename)
     grid_params = grid_params_from_f90nml(namelist)
-    stencil_config = get_config(ndsl_backend, None)
-    ranks = get_ranks(metafunc, grid_params["layout"])
-    savepoint_to_replay = get_savepoint_restriction(metafunc)
+    stencil_config = _get_config(ndsl_backend, None)
+    ranks = _get_ranks(metafunc, grid_params["layout"])
+    savepoint_to_replay = _get_savepoint_restriction(metafunc)
     grid_mode = metafunc.config.getoption("grid")
     topology_mode = metafunc.config.getoption("topology")
     sort_report = metafunc.config.getoption("sort_report")
@@ -311,7 +307,7 @@ def _savepoint_cases(
                 backend=backend,
             ).python_grid()
             if grid_mode == "compute":
-                compute_grid_data(
+                _compute_grid_data(
                     grid, grid_params, backend, grid_params["layout"], topology_mode
                 )
         else:
@@ -322,9 +318,7 @@ def _savepoint_cases(
             grid_indexing=grid.grid_indexing,
         )
         for test_name in sorted(list(savepoint_names)):
-            testobj = get_test_class_instance(
-                test_name, grid, namelist, stencil_factory
-            )
+            testobj = _test_class_instance(test_name, grid, namelist, stencil_factory)
             n_calls = xr.open_dataset(data_path / f"{test_name}-In.nc").sizes[
                 "savepoint"
             ]
@@ -347,7 +341,7 @@ def _savepoint_cases(
     return return_list
 
 
-def compute_grid_data(
+def _compute_grid_data(
     grid: Grid,
     grid_params: dict,
     backend: Backend,
@@ -358,12 +352,12 @@ def compute_grid_data(
         npx=grid_params["npx"],
         npy=grid_params["npy"],
         npz=grid_params["npz"],
-        communicator=get_communicator(MPIComm(), layout, topology_mode),
+        communicator=_get_communicator(MPIComm(), layout, topology_mode),
         backend=backend,
     )
 
 
-def parallel_savepoint_cases(
+def _parallel_savepoint_cases(
     metafunc: Any,
     data_path: Path,
     namelist_filename: Path,
@@ -378,11 +372,11 @@ def parallel_savepoint_cases(
     topology_mode = metafunc.config.getoption("topology")
     sort_report = metafunc.config.getoption("sort_report")
     no_report = metafunc.config.getoption("no_report")
-    communicator = get_communicator(comm, grid_params["layout"], topology_mode)
-    stencil_config = get_config(ndsl_backend, communicator)
-    savepoint_names = get_parallel_savepoint_names(metafunc, data_path)
+    communicator = _get_communicator(comm, grid_params["layout"], topology_mode)
+    stencil_config = _get_config(ndsl_backend, communicator)
+    savepoint_names = _parallel_savepoint_names(metafunc, data_path)
     grid_mode = metafunc.config.getoption("grid")
-    savepoint_to_replay = get_savepoint_restriction(metafunc)
+    savepoint_to_replay = _get_savepoint_restriction(metafunc)
 
     return _savepoint_cases(
         savepoint_names,
@@ -403,16 +397,16 @@ def pytest_generate_tests(metafunc: Any) -> None:
     backend = metafunc.config.getoption("backend")
     if MPI.COMM_WORLD.Get_size() > 1:
         if metafunc.function.__name__ == "test_parallel_savepoint":
-            generate_parallel_stencil_tests(metafunc, backend=backend)
+            _generate_parallel_stencil_tests(metafunc, backend=backend)
     elif metafunc.function.__name__ == "test_sequential_savepoint":
-        generate_sequential_stencil_tests(metafunc, backend=backend)
+        _generate_sequential_stencil_tests(metafunc, backend=backend)
 
 
-def generate_sequential_stencil_tests(metafunc: Any, *, backend: str) -> None:
-    data_path, namelist_filename = data_path_and_namelist_filename_from_config(
+def _generate_sequential_stencil_tests(metafunc: Any, *, backend: str) -> None:
+    data_path, namelist_filename = _data_path_and_namelist_filename_from_config(
         metafunc.config
     )
-    savepoint_cases = sequential_savepoint_cases(
+    savepoint_cases = _sequential_savepoint_cases(
         metafunc,
         data_path,
         namelist_filename,
@@ -423,13 +417,13 @@ def generate_sequential_stencil_tests(metafunc: Any, *, backend: str) -> None:
     )
 
 
-def generate_parallel_stencil_tests(metafunc: Any, *, backend: str) -> None:
-    data_path, namelist_filename = data_path_and_namelist_filename_from_config(
+def _generate_parallel_stencil_tests(metafunc: Any, *, backend: str) -> None:
+    data_path, namelist_filename = _data_path_and_namelist_filename_from_config(
         metafunc.config
     )
     # get MPI environment
     comm = MPIComm()
-    savepoint_cases = parallel_savepoint_cases(
+    savepoint_cases = _parallel_savepoint_cases(
         metafunc,
         data_path,
         namelist_filename,
@@ -442,7 +436,7 @@ def generate_parallel_stencil_tests(metafunc: Any, *, backend: str) -> None:
     )
 
 
-def get_communicator(
+def _get_communicator(
     comm: Comm, layout: tuple[int, int], topology_mode: str
 ) -> Communicator:
     tile_partitioner = TilePartitioner(layout)

From 0fff1142897fc428a795f5438bcc655e37eac1ef Mon Sep 17 00:00:00 2001
From: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Date: Wed, 25 Mar 2026 12:54:42 -0400
Subject: [PATCH 12/28] Improved translate test logging (#398)

* fixed the changing-point calculation, added a changing-column calculation, and updated the log within the comparison calculations of the translate test

* linting

* better error handling in cases where bad columns cannot be computed

* Better messaging when no input data is available

* new print statement and other minor changes based on PR feedback

* checking if column count and changing point count worked before printing one line result

* added a comment + linting
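
The counting fixed in this patch can be sketched with plain NumPy. This is a minimal illustration and not the code in `ndsl/testing/comparison.py`: the helper `count_changes` and the sample arrays are hypothetical, and the vertical axis is assumed to be the last one.

```python
from __future__ import annotations

import numpy as np


def count_changes(inputs: np.ndarray, reference: np.ndarray) -> tuple[int, int | None]:
    """Return (changing points, changing columns); columns is None unless 3D."""
    changed = inputs != reference
    # Summing the boolean mask counts the points that differ. Flattening the
    # mask and reading shape[0] would only return the total number of points.
    n_points = int(changed.sum())
    # A column counts as changing if any point along the last axis differs.
    n_columns = int(changed.any(axis=2).sum()) if changed.ndim == 3 else None
    return n_points, n_columns


inputs = np.zeros((2, 2, 3))
reference = inputs.copy()
reference[0, 0, :] = 1.0  # one full column changes
reference[1, 1, 0] = 1.0  # a single point changes in another column
points, columns = count_changes(inputs, reference)
```

For data that is not three-dimensional the column count is reported as `None`, mirroring the fallback to the all-grid-points summary described above.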
---
 ndsl/testing/comparison.py | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/ndsl/testing/comparison.py b/ndsl/testing/comparison.py
index c1a5b1ef..b5c1ac0d 100644
--- a/ndsl/testing/comparison.py
+++ b/ndsl/testing/comparison.py
@@ -260,11 +260,17 @@ def __init__(
         # We might have sliced outputs in the translate test. Rather than funnel the slicing
         # all the way down, we bail out if we can measure input vs reference
         if input_values is not None and input_values.shape == reference_values.shape:
-            self.number_changing_values = (
-                (input_values != reference_values).flatten().shape[0]
-            )
+            self.number_changing_values = (input_values != reference_values).sum()
+            # column information is only relevant if data is three-dimensional
+            if len(input_values.shape) == 3:
+                self.changing_column_map = (input_values != reference_values).any(
+                    axis=2
+                )
+            else:
+                self.changing_column_map = None
         else:
             self.number_changing_values = None
+            self.changing_column_map = None
 
     def _compute_all_metrics(
         self,
@@ -338,20 +344,38 @@ def report(self, file_path: str | None = None) -> list[str]:
         failed_indices = np.logical_not(self.success).nonzero()
         # List all errors to terminal and file
         bad_indices_count = len(failed_indices[0])
+        if self.changing_column_map is not None:
+            if self.success.ndim == 3:
+                bad_column_count = (
+                    np.logical_not(self.success).any(axis=2) & self.changing_column_map
+                ).sum()
+                total_column_count = self.changing_column_map.sum()
+                bad_column_pct = round(bad_column_count / total_column_count * 100, 2)
+            else:
+                bad_column_count = None
+                total_column_count = None
+                bad_column_pct = None
+        else:
+            bad_column_count = None
+            total_column_count = None
+            bad_column_pct = None
         full_count = len(self.references.flatten())
         failures_of_all_grid_points_pct = round(
             100.0 * (bad_indices_count / full_count), 2
         )
-        if self.number_changing_values is not None:
+        if (
+            self.number_changing_values is not None
+            and bad_indices_count is not None
+            and bad_column_count is not None
+        ):
             failures_of_changing_gridpoint_pct = round(
                 100.0 * (bad_indices_count / self.number_changing_values), 2
             )
-            report_local_failures = f"Failures (changing grid points) ({bad_indices_count}/{self.number_changing_values}) ({failures_of_changing_gridpoint_pct}%)\n"
+            report_local_failures = f"Failures: (changing columns, chainging points, all points) | {bad_column_count}/{total_column_count} - {bad_column_pct}%, {bad_indices_count}/{self.number_changing_values} - {failures_of_changing_gridpoint_pct}%, {bad_indices_count}/{full_count} - {failures_of_all_grid_points_pct}%\n"
         else:
-            report_local_failures = ""
+            report_local_failures = f"all grid points: {bad_indices_count}/{full_count} - {failures_of_all_grid_points_pct}%\n"
         report = [
             f"{report_local_failures}"
-            f"Failures (all grid points) ({bad_indices_count}/{full_count}) ({failures_of_all_grid_points_pct}%)\n",
             f"Index   Computed   Reference   "
             f"{'🔶 ' if not self.absolute_eps.is_default else ''}Absolute E(<{self.absolute_eps.value:.2e})  "
             f"{'🔶 ' if not self.relative_fraction.is_default else ''}Relative E(<{self.relative_fraction.value * 100:.2e}%)   "

From ccf2ed36420c6377901bf52c86396e208e81e686 Mon Sep 17 00:00:00 2001
From: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Date: Wed, 25 Mar 2026 12:56:40 -0400
Subject: [PATCH 13/28] Improved data_loader (#402)

* added ability for data_loader to auto-detect savepoint index in netcdf file

* added _ before internal field in DataLoader

* linting
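
The index selection can be sketched without NetCDF files. This is only an illustration under stated assumptions: `SketchLoader` is a hypothetical stand-in for `DataLoader`, and the in-memory arrays are assumed to be laid out as `[rank, savepoint, ...]` instead of being read through xarray.

```python
import numpy as np


class SketchLoader:
    """Hypothetical stand-in for DataLoader, backed by in-memory arrays."""

    def __init__(self, rank: int, data: dict[str, np.ndarray], i_call: int) -> None:
        self._rank = rank
        self._data = data
        self._i_call = i_call

    def load(self, name: str, use_dynamic_i_call: bool = False) -> np.ndarray:
        # The default keeps the first savepoint; opting in selects the
        # savepoint index of the test case that created the loader.
        call_index = self._i_call if use_dynamic_i_call else 0
        return self._data[name][self._rank, call_index]


# 2 ranks, 3 savepoints, 4 values per savepoint
data = {"constants": np.arange(24).reshape(2, 3, 4)}
loader = SketchLoader(rank=1, data=data, i_call=2)
first = loader.load("constants")
current = loader.load("constants", use_dynamic_i_call=True)
```

Keeping `False` as the default means existing `extra_data_load` implementations continue to read savepoint 0 unless they opt in.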
---
 ndsl/stencils/testing/savepoint.py      | 8 +++++---
 ndsl/stencils/testing/test_translate.py | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/ndsl/stencils/testing/savepoint.py b/ndsl/stencils/testing/savepoint.py
index ff4e1be3..d57b71d2 100644
--- a/ndsl/stencils/testing/savepoint.py
+++ b/ndsl/stencils/testing/savepoint.py
@@ -22,20 +22,22 @@ def _process_if_scalar(value: np.ndarray) -> np.ndarray | float | int:
 
 
 class DataLoader:
-    def __init__(self, rank: int, data_path: Path) -> None:
+    def __init__(self, rank: int, data_path: Path, i_call: int) -> None:
         self._data_path = data_path
         self._rank = rank
+        self._i_call = i_call
 
     def load(
         self,
         name: str,
         postfix: str = "",
-        i_call: int = 0,
+        use_dynamic_i_call: bool = False,
     ) -> dict[str, np.ndarray | float | int]:
+        call_index = self._i_call if use_dynamic_i_call else 0
         return dataset_to_dict(
             xr.open_dataset(self._data_path / f"{name}{postfix}.nc")
             .isel(rank=self._rank)
-            .isel(savepoint=i_call)
+            .isel(savepoint=call_index)
         )
 
 
diff --git a/ndsl/stencils/testing/test_translate.py b/ndsl/stencils/testing/test_translate.py
index c7f7fa22..a4068269 100644
--- a/ndsl/stencils/testing/test_translate.py
+++ b/ndsl/stencils/testing/test_translate.py
@@ -214,7 +214,7 @@ def test_sequential_savepoint(
     original_input_data = copy.deepcopy(input_data)
     # give the user a chance to load data from other savepoints to allow
     # for gathering required data from multiple sources (constants, etc.)
-    case.testobj.extra_data_load(DataLoader(case.grid.rank, case.data_dir))
+    case.testobj.extra_data_load(DataLoader(case.grid.rank, case.data_dir, case.i_call))
 
     # run python version of functionality
     output = case.testobj.compute(input_data)

From c47852d879379825ebff5d4c8476493149e7898f Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Fri, 27 Mar 2026 18:40:10 +0100
Subject: [PATCH 14/28] fix backend default in translate test conftest (#407)

---
 ndsl/stencils/testing/conftest.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ndsl/stencils/testing/conftest.py b/ndsl/stencils/testing/conftest.py
index bd7d7b42..6e5b17af 100644
--- a/ndsl/stencils/testing/conftest.py
+++ b/ndsl/stencils/testing/conftest.py
@@ -33,7 +33,7 @@ def pytest_addoption(parser: pytest.Parser) -> None:
     parser.addoption(
         "--backend",
         action="store",
-        default="st:python:cpu:numpy",
+        default="st:numpy:cpu:IJK",
         help="Backend to execute the test with, can only be one.",
     )
     parser.addoption(

From 30b068c3dd3c3a792cd09780302332c1c1fe8f00 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Mon, 30 Mar 2026 19:36:31 +0200
Subject: [PATCH 15/28] build: update GT4Py (loop layout fixes) (#410)

---
 external/gt4py                          | 2 +-
 ndsl/stencils/testing/test_translate.py | 6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/external/gt4py b/external/gt4py
index 2e96fc08..7c74e715 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 2e96fc08b55f7fa596a7a3d506f2cf18c3fcd349
+Subproject commit 7c74e71542993354216df6254b4d02ed7500c732
diff --git a/ndsl/stencils/testing/test_translate.py b/ndsl/stencils/testing/test_translate.py
index a4068269..091db5f6 100644
--- a/ndsl/stencils/testing/test_translate.py
+++ b/ndsl/stencils/testing/test_translate.py
@@ -315,14 +315,12 @@ def test_sequential_savepoint(
 
 def get_communicator(comm, layout):
     partitioner = CubedSpherePartitioner(TilePartitioner(layout))
-    communicator = CubedSphereCommunicator(comm, partitioner)
-    return communicator
+    return CubedSphereCommunicator(comm, partitioner)
 
 
 def get_tile_communicator(comm, layout):
     partitioner = TilePartitioner(layout)
-    communicator = TileCommunicator(comm, partitioner)
-    return communicator
+    return TileCommunicator(comm, partitioner)
 
 
 @pytest.mark.parallel

From 4f5315efe920a6e2a1773809cfdc2628ffa79495 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Tue, 31 Mar 2026 18:47:46 +0200
Subject: [PATCH 16/28] fix: xumpy.random honors dtype now (#412)

---
 ndsl/xumpy/alloc.py                | 16 ++++++++++---
 tests/test_xumpy.py                | 38 ------------------------------
 tests/xumpy/__init__.py            |  0
 tests/xumpy/test_alloc.py          | 31 ++++++++++++++++++++++++
 tests/xumpy/test_count_nonzeros.py | 12 ++++++++++
 tests/xumpy/test_minmax.py         | 17 +++++++++++++
 6 files changed, 73 insertions(+), 41 deletions(-)
 delete mode 100644 tests/test_xumpy.py
 create mode 100644 tests/xumpy/__init__.py
 create mode 100644 tests/xumpy/test_alloc.py
 create mode 100644 tests/xumpy/test_count_nonzeros.py
 create mode 100644 tests/xumpy/test_minmax.py

diff --git a/ndsl/xumpy/alloc.py b/ndsl/xumpy/alloc.py
index dc3fd0b2..77be4cac 100644
--- a/ndsl/xumpy/alloc.py
+++ b/ndsl/xumpy/alloc.py
@@ -2,6 +2,7 @@
 
 import numpy as np
 import numpy.typing as npt
+from numpy._typing import _SupportsDType
 
 from ndsl.config import Backend
 from ndsl.dsl.typing import Float
@@ -13,6 +14,12 @@
 
 # Taking a page from cupy's playbook to have tuple & ndarray
 _ShapeLike = SupportsIndex | Sequence[SupportsIndex]
+_DTypeLikeFloat32 = (
+    np.dtype[np.float32] | _SupportsDType[np.dtype[np.float32]] | type[np.float32]
+)
+_DTypeLikeFloat64 = (
+    np.dtype[np.float64] | _SupportsDType[np.dtype[np.float64]] | type[np.float64]
+)
 
 
 def zeros(
@@ -59,8 +66,11 @@ def full(
 def random(
     shape: _ShapeLike,
     backend: Backend,
-    dtype: npt.DTypeLike = Float,
+    dtype: _DTypeLikeFloat32 | _DTypeLikeFloat64 = Float,  # type: ignore [valid-type]
 ) -> np.ndarray | cp.ndarray:
     if backend.is_gpu_backend():
-        cp.random.rand(*shape)
-    return np.random.rand(*shape)
+        gen = cp.random.default_rng()
+        return gen.random(shape, dtype, None)
+
+    gen = np.random.default_rng()
+    return gen.random(shape, dtype, None)
diff --git a/tests/test_xumpy.py b/tests/test_xumpy.py
deleted file mode 100644
index cc198bb7..00000000
--- a/tests/test_xumpy.py
+++ /dev/null
@@ -1,38 +0,0 @@
-import numpy as np
-
-import ndsl.xumpy as xp
-from ndsl.config import Backend
-
-
-shape = (2, 2, 5)
-
-
-def test_xumpy_alloc():
-    rand_array = xp.random(shape, Backend.python())
-    assert rand_array.shape == shape
-    assert (rand_array != xp.random(shape, Backend.python())).all()
-
-    assert (np.ones(shape) == xp.ones(shape, Backend.python())).all()
-    assert (np.zeros(shape) == xp.zeros(shape, Backend.python())).all()
-    assert (
-        np.full(shape, 42.42) == xp.full(shape, value=42.42, backend=Backend.python())
-    ).all()
-
-
-def test_xumpy_minmax():
-    rand_array = xp.random(shape, Backend.python())
-
-    assert (np.max(rand_array, axis=1) == xp.max(rand_array, axis=1)).all()
-    assert (np.min(rand_array, axis=1) == xp.min(rand_array, axis=1)).all()
-
-    out_buffer = xp.empty(shape, Backend.python())
-    xp.max_on_horizontal_plane(rand_array, out_buffer)
-
-    assert (np.max(rand_array, axis=(0, 1)) == out_buffer).all()
-
-
-def test_xumpy_counts():
-    rand_array = xp.random(shape, Backend.python())
-    rand_array[1, 1, :] = 0
-
-    assert np.count_nonzero(rand_array) == xp.count_nonzero(rand_array)
diff --git a/tests/xumpy/__init__.py b/tests/xumpy/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/xumpy/test_alloc.py b/tests/xumpy/test_alloc.py
new file mode 100644
index 00000000..d348d0a7
--- /dev/null
+++ b/tests/xumpy/test_alloc.py
@@ -0,0 +1,31 @@
+import numpy as np
+import pytest
+
+from ndsl import xumpy as xp
+from ndsl.config import Backend
+
+
+@pytest.mark.parametrize("dtype", [None, np.float32, np.float64])
+def test_random(dtype) -> None:
+    shape = (2, 3, 5)
+    rand_array = xp.random(shape, Backend.python())
+    assert rand_array.shape == shape
+    assert (rand_array != xp.random(shape, Backend.python())).all()
+
+
+def test_ones() -> None:
+    shape = (2, 3, 5)
+    assert (np.ones(shape) == xp.ones(shape, Backend.python())).all()
+
+
+def test_zeros() -> None:
+    shape = (2, 3, 5)
+    assert (np.zeros(shape) == xp.zeros(shape, Backend.python())).all()
+
+
+def test_full() -> None:
+    shape = (2, 3, 5)
+    value = 42.42
+    assert (
+        np.full(shape, value) == xp.full(shape, value=value, backend=Backend.python())
+    ).all()
diff --git a/tests/xumpy/test_count_nonzeros.py b/tests/xumpy/test_count_nonzeros.py
new file mode 100644
index 00000000..4f0c2837
--- /dev/null
+++ b/tests/xumpy/test_count_nonzeros.py
@@ -0,0 +1,12 @@
+import numpy as np
+
+from ndsl import xumpy as xp
+from ndsl.config import Backend
+
+
+def test_count_nonzero():
+    shape = (2, 3, 5)
+    rand_array = xp.random(shape, Backend.python())
+    rand_array[1, 1, :] = 0
+
+    assert np.count_nonzero(rand_array) == xp.count_nonzero(rand_array)
diff --git a/tests/xumpy/test_minmax.py b/tests/xumpy/test_minmax.py
new file mode 100644
index 00000000..ccdf2b3b
--- /dev/null
+++ b/tests/xumpy/test_minmax.py
@@ -0,0 +1,17 @@
+import numpy as np
+
+from ndsl import xumpy as xp
+from ndsl.config import Backend
+
+
+def test_minmax():
+    shape = (2, 3, 5)
+    rand_array = xp.random(shape, Backend.python())
+
+    assert (np.max(rand_array, axis=1) == xp.max(rand_array, axis=1)).all()
+    assert (np.min(rand_array, axis=1) == xp.min(rand_array, axis=1)).all()
+
+    out_buffer = xp.empty(shape, Backend.python())
+    xp.max_on_horizontal_plane(rand_array, out_buffer)
+
+    assert (np.max(rand_array, axis=(0, 1)) == out_buffer).all()

From da721f11f6259a8e03a8737b0f6fa7909f81be01 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Tue, 31 Mar 2026 18:51:20 +0200
Subject: [PATCH 17/28] refactor: use xumpy for allocation in gt4py_utils
 (#388)

---
 ndsl/dsl/gt4py_utils.py | 54 +++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/ndsl/dsl/gt4py_utils.py b/ndsl/dsl/gt4py_utils.py
index e4dcb532..8f944d0f 100644
--- a/ndsl/dsl/gt4py_utils.py
+++ b/ndsl/dsl/gt4py_utils.py
@@ -1,3 +1,4 @@
+import warnings
 from collections.abc import Callable, Sequence
 from functools import wraps
 from typing import Any
@@ -6,9 +7,10 @@
 import numpy.typing as npt
 from gt4py import storage as gt_storage
 
+from ndsl import xumpy
 from ndsl.config.backend import Backend
 from ndsl.constants import N_HALO_DEFAULT
-from ndsl.dsl.typing import DTypes, Float
+from ndsl.dsl.typing import Float
 from ndsl.logging import ndsl_log
 from ndsl.optional_imports import cupy as cp
 
@@ -49,19 +51,16 @@ def wrapper(*args, **kwargs) -> Any:
     return inner
 
 
-def _mask_to_dimensions(
-    mask: tuple[bool, ...], shape: Sequence[int]
-) -> list[str | int]:
+def _mask_to_dimensions(mask: tuple[bool, ...], shape: Sequence[int]) -> list[str]:
     assert len(mask) >= 3
-    dimensions: list[str | int] = []
+    dimensions: list[str] = []
     for i, axis in enumerate(("I", "J", "K")):
         if mask[i]:
             dimensions.append(axis)
     if len(mask) > 3:
         for i in range(3, len(mask)):
-            dimensions.append(str(shape[i]))
-    offset = int(sum(mask))
-    dimensions.extend(shape[offset:])
+            if mask[i]:
+                dimensions.append(str(shape[i]))
     return dimensions
 
 
@@ -86,7 +85,7 @@ def make_storage_data(
     origin: tuple[int, ...] = origin,
     *,
     backend: Backend,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
     mask: tuple[bool, ...] | None = None,
     start: tuple[int, ...] = (0, 0, 0),
     dummy: tuple[int, ...] | None = None,
@@ -205,12 +204,12 @@ def _make_storage_data_1d(
     axis: int = 2,
     read_only: bool = True,
     *,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
     backend: Backend,
 ) -> npt.NDArray:
     # axis refers to a repeated axis, dummy refers to a singleton axis
     axis = min(axis, len(shape) - 1)
-    buffer = zeros(shape[axis], dtype=dtype, backend=backend)
+    buffer = xumpy.zeros(shape[axis], backend, dtype)
     if dummy:
         axis = list(set((0, 1, 2)).difference(dummy))[0]
 
@@ -242,7 +241,7 @@ def _make_storage_data_2d(
     axis: int = 2,
     read_only: bool = True,
     *,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
     backend: Backend,
 ) -> npt.NDArray:
     # axis refers to which axis should be repeated (when making a full 3d data),
@@ -256,7 +255,7 @@ def _make_storage_data_2d(
 
     start1, start2 = start[0:2]
     size1, size2 = data.shape
-    buffer = zeros(shape2d, dtype=dtype, backend=backend)
+    buffer = xumpy.zeros(shape2d, backend, dtype)
     buffer[start1 : start1 + size1, start2 : start2 + size2] = asarray(
         data, type(buffer)
     )
@@ -276,12 +275,12 @@ def _make_storage_data_3d(
     shape: tuple[int, ...],
     start: tuple[int, ...] = (0, 0, 0),
     *,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
     backend: Backend,
 ) -> npt.NDArray:
     istart, jstart, kstart = start
     isize, jsize, ksize = data.shape
-    buffer = zeros(shape, dtype=dtype, backend=backend)
+    buffer = xumpy.zeros(shape, backend, dtype)
     buffer[
         istart : istart + isize,
         jstart : jstart + jsize,
@@ -295,12 +294,12 @@ def _make_storage_data_Nd(
     shape: tuple[int, ...],
     start: tuple[int, ...] | None = None,
     *,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
     backend: Backend,
 ) -> npt.NDArray:
     if start is None:
         start = tuple([0] * data.ndim)
-    buffer = zeros(shape, dtype=dtype, backend=backend)
+    buffer = xumpy.zeros(shape, backend, dtype)
     idx = tuple([slice(start[i], start[i] + data.shape[i]) for i in range(len(start))])
     buffer[idx] = asarray(data, type(buffer))
     return buffer
@@ -311,7 +310,7 @@ def make_storage_from_shape(
     origin: tuple[int, ...] = origin,
     *,
     backend: Backend,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
     mask: tuple[bool, ...] | None = None,
 ) -> npt.NDArray:
     """Create a new gt4py storage of a given shape filled with zeros.
@@ -333,12 +332,16 @@ def make_storage_from_shape(
            )
         3) q_out = utils.make_storage_from_shape(q_in.shape, origin,)
     """
-    if not mask:
+    if mask is None:
         n_dims = len(shape)
         if n_dims == 1:
             mask = (False, False, True)  # Assume 1D is a k-field
+        elif n_dims == 2:
+            mask = (True, True, False)  # Assume 2D is an ij-field
+        elif n_dims < 3:
+            raise NotImplementedError(f"Unexpected number of dimensions {n_dims}.")
         else:
-            mask = (n_dims * (True,)) + ((3 - n_dims) * (False,))
+            mask = n_dims * (True,)
     storage = gt_storage.zeros(
         shape,
         dtype,
@@ -359,7 +362,7 @@ def make_storage_dict(
     axis: int = 2,
     *,
     backend: Backend,
-    dtype: DTypes = Float,
+    dtype: npt.DTypeLike = Float,
 ) -> dict[str, npt.NDArray]:
     assert names is not None, "for 4d variable storages, specify a list of names"
     if shape is None:
@@ -447,9 +450,12 @@ def asarray(array, to_type=np.ndarray, dtype=None, order=None):
 
 
 def zeros(shape, dtype=Float, *, backend: Backend):
-    storage_type = cp.ndarray if backend.is_gpu_backend() else np.ndarray
-    xp = cp if cp and storage_type is cp.ndarray else np
-    return xp.zeros(shape, dtype=dtype)
+    warnings.warn(
+        "gt4py_utils.zeros() is deprecated. Use `zeros()` from `ndsl.xumpy` instead.",
+        category=DeprecationWarning,
+        stacklevel=2,
+    )
+    return xumpy.zeros(shape, backend, dtype)
 
 
 def sum(array, axis=None, dtype=Float, out=None, keepdims=False):

From db026fc79016cfef92f9ed64cfc47da17a5dc195 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Tue, 31 Mar 2026 23:45:15 +0200
Subject: [PATCH 18/28] build: update gt4py (fix scalarization issue with
 temporaries) (#413)

This PR updates the GT4Py submodule to fix a scalarization issue with
temporaries, which happened in GT4Py's optimization IR (oir).

The issue was found as part of stabilizing GF2020.
---
 external/gt4py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/external/gt4py b/external/gt4py
index 7c74e715..43df726b 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 7c74e71542993354216df6254b4d02ed7500c732
+Subproject commit 43df726bc959e4d71bc4d24bb68498760d1ff61a

From a6228940aac6188fc9b3c18afe06b2fb1b8b86b5 Mon Sep 17 00:00:00 2001
From: Florian Deconinck <deconinck.florian@gmail.com>
Date: Wed, 1 Apr 2026 11:06:05 -0400
Subject: [PATCH 19/28] Default to `BuildAndRun` (#408)

---
 ndsl/dsl/stencil_config.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ndsl/dsl/stencil_config.py b/ndsl/dsl/stencil_config.py
index 43eb08e0..71aa7425 100644
--- a/ndsl/dsl/stencil_config.py
+++ b/ndsl/dsl/stencil_config.py
@@ -195,7 +195,7 @@ def __init__(
             else DaceConfig(
                 communicator=None,
                 backend=self.compilation_config.backend,
-                orchestration=DaCeOrchestration.Run,
+                orchestration=DaCeOrchestration.BuildAndRun,
             )
         )
         self.backend_opts = {

From 9dfdaf54aec421745d8e0927807d2ec67db5d05d Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Wed, 1 Apr 2026 22:23:11 +0200
Subject: [PATCH 20/28] build: update submodules (#417)

update gt4py to bring ddim rhs checker

update dace to bring it up-to-date with what's configured in gt4py.
---
 external/dace  | 2 +-
 external/gt4py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/external/dace b/external/dace
index 1fb39786..0d9f3b4e 160000
--- a/external/dace
+++ b/external/dace
@@ -1 +1 @@
-Subproject commit 1fb397865e89c6b8907c4de0cded046e153b48ac
+Subproject commit 0d9f3b4ede7a87aa3c86913481740390431e2b21
diff --git a/external/gt4py b/external/gt4py
index 43df726b..47b95820 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 43df726bc959e4d71bc4d24bb68498760d1ff61a
+Subproject commit 47b95820dc4eda34397790d11b22dc2d1cb77f99

From 4294976fbfb60d3355df03af3787d09d77e5045d Mon Sep 17 00:00:00 2001
From: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Date: Wed, 1 Apr 2026 22:48:04 +0200
Subject: [PATCH 21/28] [Update] Upgrade to the `numpy` 2x series (#415)

* Update to `numpy` 2.x series

* Remove `index_tricks` import (now deeper into impl)
Move to `np.prod`
Lint

* Clean up lint around `type` in Buffer.py

* Update to `safe_mpi_allocate` type hint

* Update `dtype` in halo update transformer

* Use `npt.DTypeLike` for better support

* Remove `NumpyModule` type wrapper and replace `types.Module`

* Restore `Allocator` in types

* Move `npt.DTypeLike` to cover more ground

* Missing commit

* type ignore a mypy mistake + restore MPI GPU test

* Go back to list of str for `op_flags`

* Narrow type ignore

* [TMP] CI using branches on the downstream repositories

* Remove `type: ignore`

* `no_type_check` the entire thing

* Add `to_xarray` API to State

* update pyfv3

* Clean up leftover bad `gt4py_backend` fixture (HOW DID THIS EVER RUN?!)
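
Two of the points above can be sketched in a few lines. This is a minimal example, assuming the move to `np.prod` replaces the `np.product` alias (removed in NumPy 2.0) and that index expressions are built via the public `np.index_exp` helper; the array and variable names are illustrative only.

```python
import numpy as np

shape = (2, 3, 4)
array = np.arange(np.prod(shape)).reshape(shape)

# np.index_exp builds an index tuple without importing numpy.lib.index_tricks,
# which is no longer a public module in the NumPy 2.x series.
interior = np.index_exp[1:, :, ::2]
view = array[interior]

# np.prod is available in both the 1.x and 2.x series.
n_points = int(np.prod(view.shape))
```

Since the index expression is an ordinary tuple of slices, annotating it as `Any` (as done in `ndsl/buffer.py`) avoids depending on a private NumPy type.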

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
---
 .github/workflows/fv3_translate_tests.yaml |  2 +-
 .github/workflows/pace_tests.yaml          |  2 +-
 .github/workflows/shield_tests.yaml        |  2 +-
 ndsl/buffer.py                             | 19 +++++++--------
 ndsl/comm/communicator.py                  | 12 +++++-----
 ndsl/dsl/typing.py                         |  3 ++-
 ndsl/halo/data_transformer.py              | 21 ++++++++++-------
 ndsl/halo/updater.py                       |  9 ++++----
 ndsl/quantity/metadata.py                  |  9 ++++----
 ndsl/quantity/quantity.py                  |  6 ++---
 ndsl/types.py                              | 27 ----------------------
 ndsl/typing.py                             |  2 +-
 ndsl/utils.py                              |  3 ++-
 ndsl/viz/fv3/_plot_cube.py                 |  2 +-
 ndsl/viz/fv3/_timestep_histograms.py       |  4 ++--
 setup.py                                   |  2 +-
 tests/mpi/test_mpi_halo_update.py          |  4 ++--
 17 files changed, 56 insertions(+), 73 deletions(-)

diff --git a/.github/workflows/fv3_translate_tests.yaml b/.github/workflows/fv3_translate_tests.yaml
index f28e5ad1..3adee5f9 100644
--- a/.github/workflows/fv3_translate_tests.yaml
+++ b/.github/workflows/fv3_translate_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   fv3_translate_tests:
-    uses: NOAA-GFDL/pyFV3/.github/workflows/translate.yaml@develop
+    uses: twicki/pyFV3/.github/workflows/translate.yaml@update/numpy_2x
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/.github/workflows/pace_tests.yaml b/.github/workflows/pace_tests.yaml
index ea3d40b3..b40482a9 100644
--- a/.github/workflows/pace_tests.yaml
+++ b/.github/workflows/pace_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   pace_main_tests:
-    uses: NOAA-GFDL/pace/.github/workflows/main_unit_tests.yaml@develop
+    uses: floriandeconinck/pace/.github/workflows/main_unit_tests.yaml@update/numpy_2x
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/.github/workflows/shield_tests.yaml b/.github/workflows/shield_tests.yaml
index 53ba510b..5c4eb812 100644
--- a/.github/workflows/shield_tests.yaml
+++ b/.github/workflows/shield_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   shield_translate_tests:
-    uses: NOAA-GFDL/pySHiELD/.github/workflows/translate.yaml@develop
+    uses: floriandeconinck/pySHiELD/.github/workflows/translate.yaml@update/numpy_2x
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/ndsl/buffer.py b/ndsl/buffer.py
index d19a5122..1430cbc3 100644
--- a/ndsl/buffer.py
+++ b/ndsl/buffer.py
@@ -2,9 +2,10 @@
 
 import contextlib
 from collections.abc import Callable, Generator, Iterable
+from typing import Any
 
 import numpy as np
-from numpy.lib.index_tricks import IndexExpression
+import numpy.typing as npt
 
 from ndsl.performance.timer import NullTimer, Timer
 from ndsl.types import Allocator
@@ -16,7 +17,7 @@
 )
 
 
-BufferKey = tuple[Callable, Iterable[int], type]
+BufferKey = tuple[Callable, Iterable[int], npt.DTypeLike]
 BUFFER_CACHE: dict[BufferKey, list["Buffer"]] = {}
 
 
@@ -41,7 +42,7 @@ def __init__(self, key: BufferKey, array: np.ndarray):
 
     @classmethod
     def pop_from_cache(
-        cls, allocator: Allocator, shape: Iterable[int], dtype: type
+        cls, allocator: Allocator, shape: Iterable[int], dtype: npt.DTypeLike
     ) -> Buffer:
         """Retrieve or insert then retrieve of buffer from cache.
 
@@ -78,8 +79,8 @@ def finalize_memory_transfer(self) -> None:
     def assign_to(
         self,
         destination_array: np.ndarray,
-        buffer_slice: IndexExpression = np.index_exp[:],
-        buffer_reshape: IndexExpression = None,
+        buffer_slice: Any = np.index_exp[:],
+        buffer_reshape: Any | None = None,
     ) -> None:
         """Assign internal array to destination_array.
 
@@ -95,7 +96,7 @@ def assign_to(
             )
 
     def assign_from(
-        self, source_array: np.ndarray, buffer_slice: IndexExpression = np.index_exp[:]
+        self, source_array: np.ndarray, buffer_slice: Any = np.index_exp[:]
     ) -> None:
         """Assign source_array to internal array.
 
@@ -107,7 +108,7 @@ def assign_from(
 
 @contextlib.contextmanager
 def array_buffer(
-    allocator: Allocator, shape: Iterable[int], dtype: type
+    allocator: Allocator, shape: Iterable[int], dtype: npt.DTypeLike
 ) -> Generator[Buffer, Buffer, None]:
     """
     A context manager providing a contiguous array, which may be re-used between calls.
@@ -132,7 +133,7 @@ def send_buffer(
     allocator: Callable,
     array: np.ndarray,
     timer: Timer | None = None,
-) -> np.ndarray:
+) -> Generator[np.ndarray]:
     """A context manager ensuring that `array` is contiguous in a context where it is
     being sent as data, copying into a recycled buffer array if necessary.
 
@@ -166,7 +167,7 @@ def recv_buffer(
     allocator: Callable,
     array: np.ndarray,
     timer: Timer | None = None,
-) -> np.ndarray:
+) -> Generator[np.ndarray]:
     """A context manager ensuring that array is contiguous in a context where it is
     being used to receive data, using a recycled buffer array and then copying the
     result into array if necessary.
diff --git a/ndsl/comm/communicator.py b/ndsl/comm/communicator.py
index 983a35b2..1304bae0 100644
--- a/ndsl/comm/communicator.py
+++ b/ndsl/comm/communicator.py
@@ -2,6 +2,7 @@
 
 import abc
 from collections.abc import Mapping, Sequence
+from types import ModuleType
 from typing import Any, Self, cast
 
 import numpy as np
@@ -16,7 +17,6 @@
 from ndsl.optional_imports import cupy
 from ndsl.performance.timer import NullTimer, Timer
 from ndsl.quantity import Quantity, QuantityHaloSpec, QuantityMetadata
-from ndsl.types import NumpyModule
 
 
 def to_numpy(array, dtype=None) -> np.ndarray:  # type: ignore[no-untyped-def]
@@ -83,7 +83,7 @@ def size(self) -> int:
         """Total number of ranks in this communicator"""
         return self.comm.Get_size()
 
-    def _maybe_force_cpu(self, module: NumpyModule) -> NumpyModule:
+    def _maybe_force_cpu(self, module: ModuleType) -> ModuleType:
         """
         Get a numpy-like module depending on configuration and
         Quantity original allocator.
@@ -223,7 +223,7 @@ def _get_gather_recv_quantity(
     ) -> Quantity:
         """Initialize a Quantity for use when receiving global data during gather"""
         recv_quantity = Quantity(
-            send_metadata.np.zeros(global_extent, dtype=send_metadata.dtype),  # type: ignore
+            send_metadata.np.zeros(global_extent, dtype=send_metadata.dtype),
             dims=send_metadata.dims,
             units=send_metadata.units,
             origin=tuple([0 for dim in send_metadata.dims]),
@@ -238,7 +238,7 @@ def _get_scatter_recv_quantity(
     ) -> Quantity:
         """Initialize a Quantity for use when receiving subtile data during scatter"""
         recv_quantity = Quantity(
-            send_metadata.np.zeros(shape, dtype=send_metadata.dtype),  # type: ignore
+            send_metadata.np.zeros(shape, dtype=send_metadata.dtype),
             dims=send_metadata.dims,
             units=send_metadata.units,
             backend=send_metadata.backend,
@@ -837,7 +837,7 @@ def _get_gather_recv_quantity(
         # needs to change the quantity dimensions since we add a "tile" dimension,
         # unlike for tile scatter/gather which retains the same dimensions
         recv_quantity = Quantity(
-            metadata.np.zeros(global_extent, dtype=metadata.dtype),  # type: ignore
+            metadata.np.zeros(global_extent, dtype=metadata.dtype),
             dims=(constants.TILE_DIM,) + metadata.dims,
             units=metadata.units,
             origin=(0,) + tuple([0 for dim in metadata.dims]),
@@ -859,7 +859,7 @@ def _get_scatter_recv_quantity(
         # needs to change the quantity dimensions since we remove a "tile" dimension,
         # unlike for tile scatter/gather which retains the same dimensions
         recv_quantity = Quantity(
-            metadata.np.zeros(shape, dtype=metadata.dtype),  # type: ignore
+            metadata.np.zeros(shape, dtype=metadata.dtype),
             dims=metadata.dims[1:],
             units=metadata.units,
             backend=metadata.backend,
diff --git a/ndsl/dsl/typing.py b/ndsl/dsl/typing.py
index 7229cc9b..01ae54d4 100644
--- a/ndsl/dsl/typing.py
+++ b/ndsl/dsl/typing.py
@@ -1,6 +1,7 @@
 from typing import TypeAlias
 
 import numpy as np
+import numpy.typing as npt
 from gt4py.cartesian import gtscript
 
 from ndsl.dsl import NDSL_GLOBAL_PRECISION
@@ -110,7 +111,7 @@ def cast_to_index3d(val: tuple[int, ...]) -> Index3D:
     return val
 
 
-def is_float(dtype: type) -> bool:
+def is_float(dtype: npt.DTypeLike) -> bool:
     """Expected floating point type"""
     return dtype in [
         Float,
diff --git a/ndsl/halo/data_transformer.py b/ndsl/halo/data_transformer.py
index db2b4edd..606de803 100644
--- a/ndsl/halo/data_transformer.py
+++ b/ndsl/halo/data_transformer.py
@@ -4,6 +4,8 @@
 from collections.abc import Sequence
 from dataclasses import dataclass
 from enum import Enum
+from types import ModuleType
+from typing import no_type_check
 from uuid import UUID, uuid1
 
 import numpy as np
@@ -22,7 +24,6 @@
 from ndsl.halo.rotate import rotate_scalar_data, rotate_vector_data
 from ndsl.optional_imports import cupy as cp
 from ndsl.quantity import Quantity, QuantityHaloSpec
-from ndsl.types import NumpyModule
 from ndsl.utils import device_synchronize
 
 
@@ -53,7 +54,11 @@ def _push_stream(stream: "cp.cuda.Stream") -> None:
 INDICES_CACHE: dict[str, "cp.ndarray"] = {}
 
 
-def _build_flatten_indices(  # type: ignore[no-untyped-def]
+# `array_value[...] = xxx` fails mypy because of bad type inference.
+# We can't use `# type: ignore` because mypy also reports it as
+# unneeded (but if removed, the check fails...)
+@no_type_check
+def _build_flatten_indices(
     key,
     shape,
     slices: tuple[slice, ...],
@@ -186,7 +191,7 @@ class HaloDataTransformer(abc.ABC):
 
     def __init__(
         self,
-        np_module: NumpyModule,
+        np_module: ModuleType,
         exchange_descriptors_x: Sequence[HaloExchangeSpec],
         exchange_descriptors_y: Sequence[HaloExchangeSpec] | None = None,
     ) -> None:
@@ -237,7 +242,7 @@ def finalize(self) -> None:
 
     @staticmethod
     def get(
-        np_module: NumpyModule,
+        np_module: ModuleType,
         exchange_descriptors_x: Sequence[HaloExchangeSpec],
         exchange_descriptors_y: Sequence[HaloExchangeSpec] | None = None,
     ) -> HaloDataTransformer:
@@ -308,7 +313,7 @@ def _compile(self) -> None:
 
         # Compute required size
         buffer_size = 0
-        dtype = None
+        dtype = np.float32  # default that will be overridden or not used
         for edge_x in self._infos_x:
             buffer_size += edge_x.pack_buffer_size
             dtype = edge_x.specification.dtype
@@ -320,12 +325,12 @@ def _compile(self) -> None:
         self._pack_buffer = Buffer.pop_from_cache(
             self._np_module.zeros,
             (buffer_size,),
-            dtype,  # type: ignore[arg-type]
+            dtype,
         )
         self._unpack_buffer = Buffer.pop_from_cache(
             self._np_module.zeros,
             (buffer_size,),
-            dtype,  # type: ignore[arg-type]
+            dtype,
         )
 
     def ready(self) -> bool:
@@ -589,7 +594,7 @@ class _CuKernelArgs:
 
     def __init__(
         self,
-        np_module: NumpyModule,
+        np_module: ModuleType,
         exchange_descriptors_x: Sequence[HaloExchangeSpec],
         exchange_descriptors_y: Sequence[HaloExchangeSpec] | None = None,
     ) -> None:
diff --git a/ndsl/halo/updater.py b/ndsl/halo/updater.py
index 1cd19499..1dd65ef7 100644
--- a/ndsl/halo/updater.py
+++ b/ndsl/halo/updater.py
@@ -2,6 +2,7 @@
 
 from collections import defaultdict
 from collections.abc import Iterable, Mapping
+from types import ModuleType
 from typing import TYPE_CHECKING
 
 import numpy as np
@@ -14,7 +15,7 @@
 from ndsl.halo.rotate import rotate_scalar_data
 from ndsl.performance.timer import NullTimer, Timer
 from ndsl.quantity import Quantity, QuantityHaloSpec
-from ndsl.types import AsyncRequest, NumpyModule
+from ndsl.types import AsyncRequest
 from ndsl.utils import device_synchronize
 
 
@@ -95,7 +96,7 @@ def __del__(self) -> None:
     def from_scalar_specifications(
         cls,
         comm: Communicator,
-        numpy_like_module: NumpyModule,
+        numpy_like_module: ModuleType,
         specifications: Iterable[QuantityHaloSpec],
         boundaries: Iterable[Boundary],
         tag: int,
@@ -147,7 +148,7 @@ def from_scalar_specifications(
     def from_vector_specifications(
         cls,
         comm: Communicator,
-        numpy_like_module: NumpyModule,
+        numpy_like_module: ModuleType,
         specifications_x: Iterable[QuantityHaloSpec],
         specifications_y: Iterable[QuantityHaloSpec],
         boundaries: Iterable[Boundary],
@@ -475,7 +476,7 @@ def _Isend_vector_shared_boundary(
         ]
         return send_requests
 
-    def _maybe_force_cpu(self, module: NumpyModule) -> NumpyModule:
+    def _maybe_force_cpu(self, module: ModuleType) -> ModuleType:
         """
         Get a numpy-like module depending on configuration and
         Quantity original allocator.
diff --git a/ndsl/quantity/metadata.py b/ndsl/quantity/metadata.py
index 409c0ca0..08d13162 100644
--- a/ndsl/quantity/metadata.py
+++ b/ndsl/quantity/metadata.py
@@ -1,13 +1,14 @@
 from __future__ import annotations
 
 import dataclasses
+from types import ModuleType
 from typing import Any
 
 import numpy as np
+import numpy.typing as npt
 
 from ndsl.config.backend import Backend
 from ndsl.optional_imports import cupy
-from ndsl.types import NumpyModule
 
 
 if cupy is None:
@@ -28,7 +29,7 @@ class QuantityMetadata:
     "Units of the quantity."
     data_type: type
     "ndarray-like type used to store the data."
-    dtype: type
+    dtype: npt.DTypeLike
     "dtype of the data in the ndarray-like object."
     backend: Backend
     "NDSL backend. Used for performance optimal data allocation."
@@ -39,7 +40,7 @@ def dim_lengths(self) -> dict[str, int]:
         return dict(zip(self.dims, self.extent))
 
     @property
-    def np(self) -> NumpyModule:
+    def np(self) -> ModuleType:
         """numpy-like module used to interact with the data."""
         if issubclass(self.data_type, cupy.ndarray):
             return cupy
@@ -72,5 +73,5 @@ class QuantityHaloSpec:
     origin: tuple[int, ...]
     extent: tuple[int, ...]
     dims: tuple[str, ...]
-    numpy_module: NumpyModule
+    numpy_module: ModuleType
     dtype: Any
diff --git a/ndsl/quantity/quantity.py b/ndsl/quantity/quantity.py
index 71fa89d3..416cc847 100644
--- a/ndsl/quantity/quantity.py
+++ b/ndsl/quantity/quantity.py
@@ -2,6 +2,7 @@
 
 import warnings
 from collections.abc import Iterable, Sequence
+from types import ModuleType
 from typing import Any, cast
 
 import dace
@@ -18,7 +19,6 @@
 from ndsl.optional_imports import cupy
 from ndsl.quantity.bounds import BoundedArrayView
 from ndsl.quantity.metadata import QuantityHaloSpec, QuantityMetadata
-from ndsl.types import NumpyModule
 
 
 if cupy is None:
@@ -326,7 +326,7 @@ def data_as_xarray(self) -> xr.DataArray:
         return xr.DataArray(data, dims=self.dims, attrs=self.attrs)
 
     @property
-    def np(self) -> NumpyModule:
+    def np(self) -> ModuleType:
         return self.metadata.np
 
     @property
@@ -408,7 +408,7 @@ def transpose(
         target_dims = _collapse_dims(target_dims, self.dims)
         transpose_order = [self.dims.index(dim) for dim in target_dims]
         transposed = Quantity(
-            self.np.transpose(self.data, transpose_order),  # type: ignore[attr-defined]
+            self.np.transpose(self.data, transpose_order),
             dims=_transpose_sequence(self.dims, transpose_order),
             units=self.units,
             origin=_transpose_sequence(self.origin, transpose_order),
diff --git a/ndsl/types.py b/ndsl/types.py
index e51eb666..b88a25b3 100644
--- a/ndsl/types.py
+++ b/ndsl/types.py
@@ -1,4 +1,3 @@
-import functools
 from collections.abc import Iterable
 from typing import TypeAlias
 
@@ -14,32 +13,6 @@ def __call__(self, shape: Iterable[int], dtype: type) -> None:
         pass
 
 
-class NumpyModule(Protocol):
-    empty: Allocator
-    zeros: Allocator
-    ones: Allocator
-
-    @functools.wraps(np.rot90)
-    def rot90(self, *args, **kwargs):  # type: ignore[no-untyped-def]
-        pass
-
-    @functools.wraps(np.sum)
-    def sum(self, *args, **kwargs):  # type: ignore[no-untyped-def]
-        pass
-
-    @functools.wraps(np.log)
-    def log(self, *args, **kwargs):  # type: ignore[no-untyped-def]
-        pass
-
-    @functools.wraps(np.sin)
-    def sin(self, *args, **kwargs):  # type: ignore[no-untyped-def]
-        pass
-
-    @functools.wraps(np.asarray)
-    def asarray(self, *args, **kwargs):  # type: ignore[no-untyped-def]
-        pass
-
-
 class AsyncRequest(Protocol):
     """Define the result of an over-the-network capable communication API"""
 
diff --git a/ndsl/typing.py b/ndsl/typing.py
index ddbf1681..c01eecf5 100644
--- a/ndsl/typing.py
+++ b/ndsl/typing.py
@@ -3,4 +3,4 @@
 from ndsl.comm.communicator import Communicator
 from ndsl.comm.partitioner import Partitioner
 from ndsl.performance.collector import AbstractPerformanceCollector
-from ndsl.types import AsyncRequest, NumpyModule
+from ndsl.types import AsyncRequest
diff --git a/ndsl/utils.py b/ndsl/utils.py
index 6fcbd3e0..684ad5fd 100644
--- a/ndsl/utils.py
+++ b/ndsl/utils.py
@@ -5,6 +5,7 @@
 
 import f90nml
 import numpy as np
+import numpy.typing as npt
 
 import ndsl.constants as constants
 from ndsl.optional_imports import cupy as cp
@@ -88,7 +89,7 @@ def device_synchronize() -> None:
 
 
 def safe_mpi_allocate(
-    allocator: Allocator, shape: Iterable[int], dtype: type
+    allocator: Allocator, shape: Iterable[int], dtype: npt.DTypeLike
 ) -> np.ndarray:
     """Make sure the allocation use an allocator that works with MPI
 
diff --git a/ndsl/viz/fv3/_plot_cube.py b/ndsl/viz/fv3/_plot_cube.py
index 35e658fb..97e0deea 100644
--- a/ndsl/viz/fv3/_plot_cube.py
+++ b/ndsl/viz/fv3/_plot_cube.py
@@ -379,7 +379,7 @@ def _segment_plot_inputs(x, y, masked_array):
     """
     is_nan = np.isnan(masked_array)
     if np.sum(is_nan) == 0:  # contiguous section, just plot it
-        if np.product(masked_array.shape) > 0:
+        if np.prod(masked_array.shape) > 0:
             yield (x, y, masked_array)
     else:
         x_nans = np.sum(is_nan, axis=1) / is_nan.shape[1]
diff --git a/ndsl/viz/fv3/_timestep_histograms.py b/ndsl/viz/fv3/_timestep_histograms.py
index 7743fa82..2f0ecb87 100644
--- a/ndsl/viz/fv3/_timestep_histograms.py
+++ b/ndsl/viz/fv3/_timestep_histograms.py
@@ -20,7 +20,7 @@ def plot_daily_and_hourly_hist(
     return fig
 
 
-def plot_daily_hist(ax: Axes, time_list: Sequence[datetime.datetime]):
+def plot_daily_hist(ax: Axes, time_list: Sequence[datetime.datetime | np.datetime64]):
     """Given list of datetimes, plot histogram of count per calendar day on ax"""
     ser = pd.Series(time_list)
     groupby_list = [ser.dt.year, ser.dt.month, ser.dt.day]
@@ -28,7 +28,7 @@ def plot_daily_hist(ax: Axes, time_list: Sequence[datetime.datetime]):
     ax.set_ylabel("Count")
 
 
-def plot_hourly_hist(ax: Axes, time_list: Sequence[datetime.datetime]):
+def plot_hourly_hist(ax: Axes, time_list: Sequence[datetime.datetime | np.datetime64]):
     """Given list of datetimes, plot histogram of count per UTC hour on ax"""
     ser = pd.Series(time_list)
     ser.groupby(ser.dt.hour).count().plot(ax=ax, kind="bar", title="Hourly count")
diff --git a/setup.py b/setup.py
index 5ca50f55..8b9f9922 100644
--- a/setup.py
+++ b/setup.py
@@ -20,7 +20,7 @@ def local_pkg(name: str, relative_path: str) -> str:
     "h5netcdf",  # for xarray
     "h5py",  # for h5netcdf >= 1.8
     "dask",  # for xarray
-    "numpy==1.26.4",
+    "numpy>=2",
     "matplotlib",  # for plotting in boilerplate
     "cartopy",  # for plotting in ndsl.viz
     "pytest-subtests",  # for translate tests
diff --git a/tests/mpi/test_mpi_halo_update.py b/tests/mpi/test_mpi_halo_update.py
index 11b91a75..32ba6f43 100644
--- a/tests/mpi/test_mpi_halo_update.py
+++ b/tests/mpi/test_mpi_halo_update.py
@@ -246,7 +246,7 @@ def boundary_dict(ranks_per_tile):
 
 @pytest.fixture
 def depth_quantity(
-    dims, units, origin, extent, shape, numpy, dtype, n_points, n_buffer
+    dims, units, origin, extent, shape, numpy, dtype, n_points, n_buffer, ndsl_backend
 ):
     """A quantity whose value indicates the distance from the computational
     domain boundary."""
@@ -274,7 +274,7 @@ def depth_quantity(
         units=units,
         origin=origin,
         extent=extent,
-        backend=Backend.python(),
+        backend=ndsl_backend,
     )
 
 
From f4ba5d66f3957d08db92cca5fa01508728f3e129 Mon Sep 17 00:00:00 2001
From: Roman Cattaneo <romanc@users.noreply.github.com>
Date: Thu, 2 Apr 2026 09:42:00 +0200
Subject: [PATCH 22/28] build: Add support for python 3.13 (#394)

---
 .github/workflows/unit_tests.yaml | 2 +-
 pyproject.toml                    | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/unit_tests.yaml b/.github/workflows/unit_tests.yaml
index c70b728d..0e7aa421 100644
--- a/.github/workflows/unit_tests.yaml
+++ b/.github/workflows/unit_tests.yaml
@@ -18,7 +18,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.11', '3.12']
+        python-version: ['3.11', '3.12', '3.13']
     name: Python ${{ matrix.python-version }}
     steps:
       - name: Checkout repository
diff --git a/pyproject.toml b/pyproject.toml
index 07e3d5c3..68045a58 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -11,14 +11,16 @@ classifiers = [
   "Private :: Do Not Upload",
   "Natural Language :: English",
   "Programming Language :: Python :: 3",
-  "Programming Language :: Python :: 3.11"
+  "Programming Language :: Python :: 3.11",
+  "Programming Language :: Python :: 3.12",
+  "Programming Language :: Python :: 3.13"
 ]
 dynamic = ["dependencies", "version"]
 license = "Apache-2.0"
 license-files = ["LICENSE.txt", "ndsl/viz/fv3/README.md"]
 name = "ndsl"
 readme = "README.md"
-requires-python = ">=3.11,<3.13"
+requires-python = ">=3.11,<3.14"
 
 [project.optional-dependencies]
 demos = ["ipython", "ipykernel"]

From bfeac8e707453f1bfa3dfa16b33213f02a77a5e3 Mon Sep 17 00:00:00 2001
From: Florian Deconinck <deconinck.florian@gmail.com>
Date: Thu, 2 Apr 2026 12:04:41 -0400
Subject: [PATCH 23/28] [Translate] Fix default shape for KJI / Fortran-aligned
 backend (#409)

* Fix default shape for KJI / Fortran-aligned backend

* Test to trip lint on CI but not locally (again)

* Default to `Any` because we can't list all `dtypes` and we can't filter enough for mypy to be happy

* Missing files

* Restore `self.maxshape` since it's used outside (sic) the inner translate system

* Fix type hint in `xumpy`

* Use np.floating generic type hint
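
The default-shape selection can be sketched as below. This is only an illustration: `StubBackend`, `StubGrid`, `default_maxshape`, and the base shape are hypothetical stand-ins, not the NDSL classes, and `domain_shape_full` is assumed here to simply add the `add` tuple to the full domain shape.

```python
from dataclasses import dataclass


@dataclass
class StubBackend:
    """Hypothetical stand-in for the NDSL Backend object."""

    fortran_aligned: bool

    def is_fortran_aligned(self) -> bool:
        return self.fortran_aligned


@dataclass
class StubGrid:
    """Hypothetical stand-in for the grid; the base shape is made up."""

    full_shape: tuple[int, int, int] = (18, 18, 79)

    def domain_shape_full(
        self, add: tuple[int, int, int] = (0, 0, 0)
    ) -> tuple[int, ...]:
        # Assumption: `add` pads each axis of the full domain shape.
        return tuple(n + extra for n, extra in zip(self.full_shape, add))


def default_maxshape(grid: StubGrid, backend: StubBackend) -> tuple[int, ...]:
    # Same branch as the one added to the translate test constructor.
    if backend.is_fortran_aligned():
        return grid.domain_shape_full()
    return grid.domain_shape_full(add=(1, 1, 1))
```

With these stubs, a Fortran-aligned backend gets the unpadded shape, while any other backend keeps the extra point on each axis.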
---
 ndsl/stencils/testing/translate.py |  7 +++++--
 ndsl/xumpy/alloc.py                | 11 ++---------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/ndsl/stencils/testing/translate.py b/ndsl/stencils/testing/translate.py
index f45b1f47..8be579dc 100644
--- a/ndsl/stencils/testing/translate.py
+++ b/ndsl/stencils/testing/translate.py
@@ -32,7 +32,7 @@ def pad_field_in_j(field, nj: int, backend: Backend):
 def as_numpy(
     value: dict[str, Any] | Quantity | np.ndarray,
 ) -> np.ndarray | dict[str, np.ndarray]:
-    def _convert(value: Quantity | np.ndarray) -> np.ndarray:
+    def _convert(value: Any) -> np.ndarray:
         if isinstance(value, Quantity):
             return value.data
         elif isinstance(value, np.ndarray):
@@ -74,10 +74,13 @@ def __init__(
         self.out_vars: dict[str, Any] = {}
         self.write_vars: list = []
         self.grid = grid
-        self.maxshape: tuple[int, ...] = grid.domain_shape_full(add=(1, 1, 1))
         self.ordered_input_vars = None
         self.ignore_near_zero_errors: dict[str, Any] = {}
         self.skip_test = skip_test
+        if self.stencil_factory.backend.is_fortran_aligned():
+            self.maxshape = self.grid.domain_shape_full()
+        else:
+            self.maxshape = self.grid.domain_shape_full(add=(1, 1, 1))
 
     def extra_data_load(self, data_loader: DataLoader):
         pass
diff --git a/ndsl/xumpy/alloc.py b/ndsl/xumpy/alloc.py
index 77be4cac..ca5d2092 100644
--- a/ndsl/xumpy/alloc.py
+++ b/ndsl/xumpy/alloc.py
@@ -2,7 +2,6 @@
 
 import numpy as np
 import numpy.typing as npt
-from numpy._typing import _SupportsDType
 
 from ndsl.config import Backend
 from ndsl.dsl.typing import Float
@@ -14,12 +13,6 @@
 
 # Taking a page from cupy's playbook to have tuple & ndarray
 _ShapeLike = SupportsIndex | Sequence[SupportsIndex]
-_DTypeLikeFloat32 = (
-    np.dtype[np.float32] | _SupportsDType[np.dtype[np.float32]] | type[np.float32]
-)
-_DTypeLikeFloat64 = (
-    np.dtype[np.float64] | _SupportsDType[np.dtype[np.float64]] | type[np.float64]
-)
 
 
 def zeros(
@@ -55,7 +48,7 @@ def empty(
 def full(
     shape: _ShapeLike,
     backend: Backend,
-    value: np.ScalarType,
+    value: npt.DTypeLike,
     dtype: npt.DTypeLike = Float,
 ) -> np.ndarray | cp.ndarray:
     if backend.is_gpu_backend():
@@ -66,7 +59,7 @@ def full(
 def random(
     shape: _ShapeLike,
     backend: Backend,
-    dtype: _DTypeLikeFloat32 | _DTypeLikeFloat64 = Float,  # type: ignore [valid-type]
+    dtype: np.floating = Float,
 ) -> np.ndarray | cp.ndarray:
     if backend.is_gpu_backend():
         gen = cp.random.default_rng()

From 1815e4f8cebccab7c6be31f31a6b1e0d96b5a016 Mon Sep 17 00:00:00 2001
From: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Date: Fri, 3 Apr 2026 12:16:17 -0400
Subject: [PATCH 24/28] Translate test dimension fix during comparison (#419)

* Added `np.squeeze` for all non-scalar fields at the top of the `BaseMetric` class. This fixes an error where a lingering size-one dimension in some fields caused a dimension mismatch with the reference data.

* linting
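
A minimal sketch of the behaviour this change relies on, using plain NumPy only. The array shapes are made up for illustration and are not taken from any test data.

```python
import numpy as np

# Hypothetical shapes: the computed field carries a lingering size-one axis.
reference = np.zeros((4, 5))
computed = np.zeros((4, 5, 1))

# With atleast_1d alone, the two shapes still differ.
shapes_differ = np.atleast_1d(computed).shape != np.atleast_1d(reference).shape

# Adding squeeze drops every size-one axis, so the shapes agree.
squeezed_reference = np.squeeze(np.atleast_1d(reference))
squeezed_computed = np.squeeze(np.atleast_1d(computed))

# A scalar becomes shape (1,) under atleast_1d and is squeezed back to 0-d.
squeezed_scalar = np.squeeze(np.atleast_1d(3.0))
```

Note that `np.squeeze` removes all size-one axes, so it also affects scalars wrapped by `np.atleast_1d`.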
---
 ndsl/testing/comparison.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ndsl/testing/comparison.py b/ndsl/testing/comparison.py
index b5c1ac0d..b04f09c0 100644
--- a/ndsl/testing/comparison.py
+++ b/ndsl/testing/comparison.py
@@ -27,8 +27,8 @@ def __init__(
         reference_values: np.ndarray,
         computed_values: np.ndarray,
     ):
-        self.references = np.atleast_1d(reference_values)
-        self.computed = np.atleast_1d(computed_values)
+        self.references = np.squeeze(np.atleast_1d(reference_values))
+        self.computed = np.squeeze(np.atleast_1d(computed_values))
         self.check = False
 
     @abstractmethod

From 80bd4370bec5d70d4e9a24c4519434d82355466f Mon Sep 17 00:00:00 2001
From: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Date: Wed, 8 Apr 2026 14:24:30 +0200
Subject: [PATCH 25/28] restore the ci hooks for shield and pace (#420)

* restore the ci hooks for shield and pace

* add missing test
---
 .github/workflows/fv3_translate_tests.yaml | 2 +-
 .github/workflows/pace_tests.yaml          | 2 +-
 .github/workflows/shield_tests.yaml        | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/fv3_translate_tests.yaml b/.github/workflows/fv3_translate_tests.yaml
index 3adee5f9..f28e5ad1 100644
--- a/.github/workflows/fv3_translate_tests.yaml
+++ b/.github/workflows/fv3_translate_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   fv3_translate_tests:
-    uses: twicki/pyFV3/.github/workflows/translate.yaml@update/numpy_2x
+    uses: NOAA-GFDL/pyFV3/.github/workflows/translate.yaml@develop
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/.github/workflows/pace_tests.yaml b/.github/workflows/pace_tests.yaml
index b40482a9..ea3d40b3 100644
--- a/.github/workflows/pace_tests.yaml
+++ b/.github/workflows/pace_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   pace_main_tests:
-    uses: floriandeconinck/pace/.github/workflows/main_unit_tests.yaml@update/numpy_2x
+    uses: NOAA-GFDL/pace/.github/workflows/main_unit_tests.yaml@develop
     with:
       component_trigger: true
       component_name: NDSL
diff --git a/.github/workflows/shield_tests.yaml b/.github/workflows/shield_tests.yaml
index 5c4eb812..53ba510b 100644
--- a/.github/workflows/shield_tests.yaml
+++ b/.github/workflows/shield_tests.yaml
@@ -10,7 +10,7 @@ on:
 
 jobs:
   shield_translate_tests:
-    uses: floriandeconinck/pySHiELD/.github/workflows/translate.yaml@update/numpy_2x
+    uses: NOAA-GFDL/pySHiELD/.github/workflows/translate.yaml@develop
     with:
       component_trigger: true
       component_name: NDSL

From 220f3a58991822e7c379d4302b8973f2885f92a2 Mon Sep 17 00:00:00 2001
From: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Date: Wed, 8 Apr 2026 16:23:55 +0200
Subject: [PATCH 26/28] update gt4py (#421)

---
 external/gt4py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/external/gt4py b/external/gt4py
index 47b95820..0a0ec7f5 160000
--- a/external/gt4py
+++ b/external/gt4py
@@ -1 +1 @@
-Subproject commit 47b95820dc4eda34397790d11b22dc2d1cb77f99
+Subproject commit 0a0ec7f5ffdbca331aeccf1acf8fc604d8f070a1

From fa6503a9f63bf3df88147a886af90c342aed672c Mon Sep 17 00:00:00 2001
From: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Date: Thu, 9 Apr 2026 15:32:37 +0200
Subject: [PATCH 27/28] Release: NDSL `2026.03.00` (#422)

* For release `2025.03.00` (#127)

* updating 4d handling

* debug 4d test data

* more iter

* moving ser_to_nc here

* updating datatype in translate test

* typing works

* fix dict, lint

* remove empty line

* change from 4d to Nd

* Expose `k_start` and `k_end` automatically for any FrozenStencil

* Fix k_start + utest

* lint

* Fix for 2d stencils

* Add threshold overrides to the multimodal metric

* Always report results, add summary with one liners

* Remove "mmr" from the keys

* README in testing

* Better Latex (?)

* Better Latex (?)

* fixing a typo that breaks bools in translate tests (#80)

* Fix summary filename

* Fix report, filename

* Fix choosing right absolute difference for F32

* Make robust for NaN value

* Detect when arrays have different dimensions, if only one dimension, collapse
Clean up type infer and log work

* Lint

* Add rank 0 to the data

* Check data exists for rank, skip & print if not

* Fix bad logic on skip test for parallel

* Verbose exported names

* Make boilerplate calls more nimble

* New option: `which_savepoint`
Better error on bad output data
Fix missing integer type check

* QOL for mypy/flake8 type hints

* Add SECONDS_PER_DAY as a constant following mixed precision standards

* Lint

* Cleanups in dace orchestration

Readability improvements in dace orchestration including

- early returns
- spelling out variable names
- fixing typos

* Rename program -> dace_program

* Make sure all constants adhere to the floating point precision set by the system

* Move `is_float` to `dsl.typing`

* Move Quantity to sub-directory + breakout the subcomponent

* Fix tests

* Lint

* Remove `cp.ndarray` since cupy is optional

* Restore workaround for optional cupy

* "GFS" -> "UFS"

* Cupy trick for metadata

* Add comments for constant explanation

* Describe 64/32-bit FloatFields

* Make sure the `make_storage_data` respects the array dtype.

* Fix logic for MultiModal metric and verbose it

* Added an MPI all_reduce for quantities based on SUM operation to communicator.py

* linted

* Add initial skeleton of pytest test for all reduce

* Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum

* Linted

* Added pytest.mark to skip test if mpi4py isn't available

* lint changes

* Addressed PR comments and added additional CPU backends to unit test

* Added setters for various Quantity properties to enable setting of Quantity metadata and data properties.

* Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another.  Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed

* Expose all SG metric terms in grid_data

* Add `Allreduce` and all MPI OP

* Update utest

* Fix `local_comm`

* Fix utest

* Enforce `comm_abc.Comm` into Communicator

* Fix `comm` object in serial utest

* Lint + `MPIComm` on testing architecture

* Make sure the correct allocator backend is used for Quantities

* Add in_place option for Allreduce

* Cleanup ndsl/dsl/dace/utils.py (#96)

* Fix typos
* DaCeProgress: avoid double assignment of prefix
* Add type hints/simplify kernel_theoretical_timing

Adding type hints allowed us to simplify `kernel_theoretical_timing`.

* Fix merge

* Hotfix for grid generation use of mpi operators

* Merge examples/mpi/.gitignore into top-level .gitignore

* Remove hard-coded __version__ numbers

Removes hard-coded version numbers from `__init__` files.

* Fixing a bunch of typos

* hotfix netcdf version for dockerfiles

* Updated version number in setup.py to reflect new release, 2025.01.00

* Adding in exception for compute domains with sizes less than or equal to halo size (#103)

* Adding in exception for compute domains with less than 4 points to vector_halo_update method

* Updated exception in communicator to compare halo size to compute domain size

* linting

* Moved domain size checker to SubtileGridSizer class method from_tile_params

* Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107)

* [QOL] Logging, Type Hints and Quantity helpers (#108)

* Log on rank 0
Docstrings & type hints on logger
Stencil Config has a `verbose` option
On verbose: FrozenStencil log when run (in GT backends)

* Update `config` in orchestrate call to solve type hint inconsistencies

* Quantity helper `to_netcdf` with multi rank support

* Automatic Int precision and stencil regeneration change (#104)

* Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type

* change default rebuild state to false for get_factories

* Merged Float and Int precision detection functions into one common path

* Re-added old function to fulfil a PACE dependency

* updated docstring

* Added ability to declare 32 or 64 bit IntFields, overruling the system precision

* Added one dimensional bool fields

* Fix error message in typing.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* output type for global_set_precision

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Bump DaCe to v1.0.1 (#109)

Our current DaCe version is some commit from September 2024. Meanwhile DaCe matured to v1 and recently released v1.0.1. This brings the DaCe submodule to the latest stable release version.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Streamline linting workflow (#110)

Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds.

Other changes

- update checkout action to v4
- update python setup action to v5
- change python version from 3.11.7 to 3.11 (any patch number will do)

This is a follow-up of PR https://github.com/NOAA-GFDL/PyFV3/pull/40 in PyFV3.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [FIX] Type hint for precision dependent Float, Int (#111)

* Fix the type hint of Float, Int

* Attempt using TypeAlias

* Feature: Adding documentation (#97)

* Added doc files

* Adding image files to docs

* Linting

* Updated docs to reflect changes requested in PR 97

* Linting

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Translate test] Save better reports & netCDF for multiple ranks on failure (#106)

* Save reports & netCDF for multiple ranks on failure
Fix multi modal threshold for parallel tests

* Order field by name in NetCDF

* Print all indices in logs. Sort by descending ULP

* Allow sorting by metrics and index with `--sort_report` option

* Remove the `rank` from SavepointCase. Access is done via `grid`

* Some docstrings

* Adds some quick capacities used in the post-radiation phase of the physics, including the  Stefan-Boltzmann constant (#116)

* add namelist option

* add Stefan-Boltzmann constant

* lint

* Apply suggestions from code review

Change comments to docstring style

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Adding temperature of h2o triple point (#115)

* add ttp

* Update ndsl/constants.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* switch comments to docstrings for autodocs

* lint

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Feature] Porting workflow: enhancing errors readability (#114)

* Save all fields (pass and fail) and organize them by field

* Option `--no_report` to bypass logging & netcdf save
Move logs per variable into a `details` subfolder

* Order variable name in serialbox-to-netcdf

* `extra_data_load` function to load savepoint data saved outside the canonical savepoint

* Docs / Type Hint

* Fixed typo in error statement

---------

Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>

* Feature: NetCDF output precision configurable (#117)

* Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type

* Added multiple options for NetCDF precision

* Added checking for use of 32-bit precision and float64 output

* Using NumPy type instead of string in NetCDFMonitor precision variable

* Added warning to netcdf_monitor.py for mismatch in precision settings

* Forgot f-string in warn message of netcdf_monitor

* Mixed Precision fixes and QOL (#118)

* Ignore `.next` caches

* CNST_OP20 is a true 64-bit

* Translate: Fix reading parameters with the right precision

* Multimodal metric: Skip reporting on expected values

* Bad commit

* Add license (Apache 2.0) (#105)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Change deprecated `np.product()` to `np.prod()` (#120)

Starting with numpy v1.25.0, `np.product()` is deprecated and
`np.prod()` should be used instead.
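
For illustration, the replacement call behaves as follows on a shape tuple. This is a sketch with made-up shapes, not code from the repository.

```python
import numpy as np

empty_section = np.empty((0, 3))
filled_section = np.ones((2, 3))

# np.prod multiplies the entries of the shape tuple, giving the element count.
n_empty = int(np.prod(empty_section.shape))
n_filled = int(np.prod(filled_section.shape))

# Only the non-empty section would pass a `> 0` check.
print(n_empty, n_filled)
```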

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119)

* Update DaCe to v1.0.2

DaCe v1.0.2 brings two fixes for DaCe transformations: one for
DeadDataflowElimination and one for StateFusion.

* Bump gt4py to include refactored gt4py/dace bridge

* Test with modified pace pipeline

- added this to re-trigger the new pace pipeline after limiting zarr to
  not install v3 (for now) because of breaking API changes.
- added this note to re-trigger after fixing the pace pipeline to not
  pull requirements from `develop`.
- added this note to re-trigger after fixing the repo name

* Revert "Test with modified pace pipeline"

This reverts commit cd6560ea6129663d3445fafb36d02f03cb661b4d.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Grid Mixed Precision and Coriolis force load (+ QOL) (#121)

* Pass `dtype` down in allocator utils (gt4py_utils)

* Allow coriolis forces to be read in

* Edge factors are always 64-bit

* Quantity QOL

* Make sure to pass `dtype` to load the grid cleanly

* Translate grid: load coriolis forces, area 64 is 64-bit

* Bad merge

* Typo

* GEOS version of dz_min (#122)

* Doc enhancement (#123)

**Description**
Port and adaptation of the initial commit of the documentation.

Fixes issue https://github.com/NOAA-GFDL/NDSL/issues/113


**Checklist:**
- [X] I have performed a self-review of my own code
- [X] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings

* Fix saving NetCDF for parallel translate test (#125)

* Release candidate 2025.03.00 (#124)

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Fix for bad merge of 7fdfa5 (#129)

---------

Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Roman Cattaneo <>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* NDSL 2025.11.00 (#333)

* Check gt4py-backend options in config (#291)

* For release `2025.03.00` (#127)

* updating 4d handling

* debug 4d test data

* more iter

* moving ser_to_nc here

* updating datatype in translate test

* typing works

* fix dict, lint

* remove empty line

* change from 4d to Nd

* Expose `k_start` and `k_end` automatically for any FrozenStencil

* Fix k_start + utest

* lint

* Fix for 2d stencils

* Add threshold overrides to the multimodal metric

* Always report results, add summary with one liners

* Remove "mmr" from the keys

* README in testing

* Better Latex (?)

* Better Latex (?)

* fixing a typo that breaks bools in translate tests (#80)

* Fix summary filename

* Fix report, filename

* Fix choosing right absolute difference for F32

* Make robust for NaN value

* Detect when arrays have different dimensions, if only one dimension, collapse
Clean up type infer and log work

* Lint

* Add rank 0 to the data

* Check data exists for rank, skip & print if not

* Fix bad logic on skip test for parallel

* Verbose exported names

* Make boilerplate calls more nimble

* New option: `which_savepoint`
Better error on bad output data
Fix missing integer type check

* QOL for mypy/flake8 type hints

* Add SECONDS_PER_DAY as a constant following mixed precision standards

* Lint

* Cleanups in dace orchestration

Readability improvements in dace orchestration including

- early returns
- spelling out variable names
- fixing typos

* Rename program -> dace_program

* Make sure all constants adhere to the floating point precision set by the system

* Move `is_float` to `dsl.typing`

* Move Quantity to sub-directory + breakout the subcomponent

* Fix tests

* Lint

* Remove `cp.ndarray` since cupy is optional

* Restore workaround for optional cupy

* "GFS" -> "UFS"

* Cupy trick for metadata

* Add comments for constant explanation

* Describe 64/32-bit FloatFields

* Make sure the `make_storage_data` respects the array dtype.

* Fix logic for MultiModal metric and verbose it

* Added an MPI all_reduce for quantities based on SUM operation to communicator.py

* linted

* Add initial skeleton of pytest test for all reduce

* Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum

* Linted

* Added pytest.mark to skip test if mpi4py isn't available

* lint changes

* Addressed PR comments and added additional CPU backends to unit test

* Added setters for various Quantity properties to enable setting of Quantity metadata and data properties.

* Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another.  Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed

* Expose all SG metric terms in grid_data

* Add `Allreduce` and all MPI OP

* Update utest

* Fix `local_comm`

* Fix utest

* Enforce `comm_abc.Comm` into Communicator

* Fix `comm` object in serial utest

* Lint + `MPIComm` on testing architecture

* Make sure the correct allocator backend is used for Quantities

* Add in_place option for Allreduce

* Cleanup ndsl/dsl/dace/utils.py (#96)

* Fix typos
* DaCeProgress: avoid double assignment of prefix
* Add type hints/simplify kernel_theoretical_timing

Adding type hints made it possible to simplify `kernel_theoretical_timing`.

* Fix merge

* Hotfix for grid generation use of mpi operators

* Merge examples/mpi/.gitignore into top-level .gitignore

* Remove hard-coded __version__ numbers

Removes hard-coded version numbers from `__init__` files.

* Fixing a bunch of typos

* hotfix netcdf version for dockerfiles

* Updated version number in setup.py to reflect new release, 2025.01.00

* Adding in exception for compute domains with sizes less than or equal to halo size (#103)

* Adding in exception for compute domains with less than 4 points to vector_halo_update method

* Updated exception in communicator to compare halo size to compute domain size

* linting

* Moved domain size checker to SubtileGridSizer class method from_tile_params

* Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107)

* [QOL] Logging, Type Hints and Quantity helpers (#108)

* Log on rank 0
Docstrings & type hints on logger
Stencil Config has a `verbose` option
On verbose: FrozenStencil log when run (in GT backends)

* Update `config` in orchestrate call to solve type hint inconsistencies

* Quantity helper `to_netcdf` with multi rank support

* Automatic Int precision and stencil regeneration change (#104)

* Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type

* change default rebuild state to false for get_factories

* Merged Float and Int precision detection functions into one common path

* Re-added old function to fulfil a PACE dependency

* updated docstring

* Added ability to declare 32 or 64 bit IntFields, overruling the system precision

* Added one dimensional bool fields

* Fix error message in typing.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* output type for global_set_precision

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Bump DaCe to v1.0.1 (#109)

Our current DaCe version is some commit from September 2024. Meanwhile, DaCe matured to v1 and recently released v1.0.1. This brings the DaCe submodule to the latest stable release version.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Streamline linting workflow (#110)

Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds.

Other changes

- update checkout action to v4
- update python setup action to v5
- change python version from 3.11.7 to 3.11 (any patch number will do)

This is a follow-up of PR https://github.com/NOAA-GFDL/PyFV3/pull/40 in PyFV3.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [FIX] Type hint for precision dependent Float, Int (#111)

* Fix the type hint of Float, Int

* Attempt using TypeAlias

* Feature: Adding documentation (#97)

* Added doc files

* Adding image files to docs

* Linting

* Updated docs to reflect changes requested in PR 97

* Linting

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Translate test] Save better reports & netCDF for multiple ranks on failure (#106)

* Save reports & netCDF for multiple ranks on failure
Fix multi modal threshold for parallel tests

* Order field by name in NetCDF

* Print all indices in logs. Sort by descending ULP

* Allow sorting by metrics and index with `--sort_report` option

* Remove the `rank` from SavepointCase. Access is done via `grid`

* Some docstrings

* Adds some quick capacities used in the post-radiation phase of the physics, including the Stefan-Boltzmann constant (#116)

* add namelist option

* add Stefan-Boltzmann constant

* lint

* Apply suggestions from code review

Change comments to docstring style

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Adding temperature of h2o triple point (#115)

* add ttp

* Update ndsl/constants.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* switch comments to docstrings for autodocs

* lint

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Feature] Porting workflow: enhancing errors readability (#114)

* Save all fields (pass and fail) and organize them by field

* Option `--no_report` to bypass logging & netcdf save
Move logs per variable into a `details` subfolder

* Order variable name in serialbox-to-netcdf

* `extra_data_load` function to load savepoint data saved outside the canonical savepoint

* Docs / Type Hint

* Fixed typo in error statement

---------

Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>

* Feature: NetCDF output precision configurable (#117)

* Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type

* Added multiple options for NetCDF precision

* Added checking for use of 32 precision and float64 output

* Using NumPy type instead of string in NetCDFMonitor precision variable

* Added warning to netcdf_monitor.py for mismatch in precision settings

* Forgot f-string in warn message of netcdf_monitor

* Mixed Precision fixes and QOL (#118)

* Ignore `.next` caches

* CNST_OP20 is a true 64-bit

* Translate: Fix reading parameters with the right precision

* Multimodal metric: Skip reporting on expected values

* Bad commit

* Add license (Apache 2.0) (#105)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Change deprecated `np.product()` to `np.prod()` (#120)

Starting with numpy v1.25.0, `np.product()` is deprecated and
`np.prod()` should be used instead.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119)

* Update DaCe to v1.0.2

DaCe v1.0.2 brings two fixes for DaCe transformations: one for
DeadDataflowElimination and one for StateFusion.

* Bump gt4py to include refactored gt4py/dace bridge

* Test with modified pace pipeline

- added this to re-trigger the new pace pipeline after limiting zarr to
  not install v3 (for now) because of breaking API changes.
- added this note to re-trigger after fixing the pace pipeline to not
  pull requirements from `develop`.
- added this note to re-trigger after fixing the repo name

* Revert "Test with modified pace pipeline"

This reverts commit cd6560ea6129663d3445fafb36d02f03cb661b4d.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Grid Mixed Precision and Coriolis force load (+ QOL) (#121)

* Pass `dtype` down in allocator utils (gt4py_utils)

* Allow coriolis forces to be read in

* Edge factors are always 64-bit

* Quantity QOL

* Make sure to pass `dtype` to load the grid cleanly

* Translate grid: load coriolis forces, area 64 is 64-bit

* Bad merge

* Typo

* GEOS version of dz_min (#122)

* Doc enhancement (#123)

**Description**
Port and adaptation of the initial commit of the documentation.

Fixes issue https://github.com/NOAA-GFDL/NDSL/issues/113


**Checklist:**
- [X] I have performed a self-review of my own code
- [X] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings

* Fix saving NetCDF for parallel translate test (#125)

* Release candidate 2025.03.00 (#124)

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Fix for bad merge of 7fdfa5 (#129)

---------

Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Roman Cattaneo <>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* check for backend existence in config

* pc

* update stale backend name

---------

Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>

* fix: allow any Comm object in ZarrMonitor (#292)

This PR is fallout from adding types in PR #257 and #258. The
`ZarrMonitor` provides a `DummyComm` which is instantiated in case no
`Comm` object is given. The type of the `Comm` object in `ZarrMonitor`
was wrongly limited to that `DummyComm`, which only broke when we
attempted to update the submodule in `pace`.
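
The typing issue described above can be illustrated with a minimal sketch.
All classes below are hypothetical stand-ins, not the NDSL implementation:
`Comm` is reduced to a single method, `FakeParallelComm` is invented for
illustration, and `Monitor` only mimics how `ZarrMonitor` falls back to a
`DummyComm`.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Comm(ABC):
    """Hypothetical stand-in for the communicator interface."""

    @abstractmethod
    def Get_rank(self) -> int:
        ...


class DummyComm(Comm):
    """Serial fallback, instantiated when no communicator is given."""

    def Get_rank(self) -> int:
        return 0


class FakeParallelComm(Comm):
    """Hypothetical second implementation, standing in for an MPI wrapper."""

    def Get_rank(self) -> int:
        return 2


class Monitor:
    """Hypothetical, heavily reduced stand-in for `ZarrMonitor`."""

    # Annotating `comm` as Optional[DummyComm] would make a type checker
    # reject FakeParallelComm; the base type accepts any implementation.
    def __init__(self, comm: Optional[Comm] = None) -> None:
        self.comm: Comm = DummyComm() if comm is None else comm
        self.rank: int = self.comm.Get_rank()
```

Annotating the argument with the base type keeps the fallback behavior
unchanged while letting callers pass their own communicator.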

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Patch domain checks to only happen once (#293)

* For release `2025.03.00` (#127)

* updating 4d handling

* debug 4d test data

* more iter

* moving ser_to_nc here

* updating datatype in translate test

* typing works

* fix dict, lint

* remove empty line

* change from 4d to Nd

* Expose `k_start` and `k_end` automatically for any FrozenStencil

* Fix k_start + utest

* lint

* Fix for 2d stencils

* Add threshold overrides to the multimodal metric

* Always report results, add summary with one liners

* Remove "mmr" from the keys

* README in testing

* Better Latex (?)

* Better Latex (?)

* fixing a typo that breaks bools in translate tests (#80)

* Fix summary filename

* Fix report, filename

* Fix choosing right absolute difference for F32

* Make robust for NaN value

* Detect when arrays have different dimensions, if only one dimension, collapse
Clean up type infer and log work

* Lint

* Add rank 0 to the data

* Check data exists for rank, skip & print if not

* Fix bad logic on skip test for parallel

* Verbose exported names

* Make boilerplate calls more nimble

* New option: `which_savepoint`
Better error on bad output data
Fix missing integer type check

* QOL for mypy/flake8 type hints

* Add SECONDS_PER_DAY as a constant following mixed precision standards

* Lint

* Cleanups in dace orchestration

Readability improvements in dace orchestration including

- early returns
- spelling out variable names
- fixing typos

* Rename program -> dace_program

* Make sure all constants adhere to the floating point precision set by the system

* Move `is_float` to `dsl.typing`

* Move Quantity to sub-directory + breakout the subcomponent

* Fix tests

* Lint

* Remove `cp.ndarray` since cupy is optional

* Restore workaround for optional cupy

* "GFS" -> "UFS"

* Cupy trick for metadata

* Add comments for constant explanation

* Describe 64/32-bit FloatFields

* Make sure the `make_storage_data` respects the array dtype.

* Fix logic for MultiModal metric and verbose it

* Added an MPI all_reduce for quantities based on SUM operation to communicator.py

* linted

* Add initial skeleton of pytest test for all reduce

* Added assertion tests for 1, 2 and 3D quantities passed through mpi_allreduce_sum

* Linted

* Added pytest.mark to skip test if mpi4py isn't available

* lint changes

* Addressed PR comments and added additional CPU backends to unit test

* Added setters for various Quantity properties to enable setting of Quantity metadata and data properties.

* Added function in QuantityMetadata class that allows copying of Metadata properties from one class to another.  Subsequent Quantity setters that performed the copying of QuantityMetadata properties were removed

* Expose all SG metric terms in grid_data

* Add `Allreduce` and all MPI OP

* Update utest

* Fix `local_comm`

* Fix utest

* Enforce `comm_abc.Comm` into Communicator

* Fix `comm` object in serial utest

* Lint + `MPIComm` on testing architecture

* Make sure the correct allocator backend is used for Quantities

* Add in_place option for Allreduce

* Cleanup ndsl/dsl/dace/utils.py (#96)

* Fix typos
* DaCeProgress: avoid double assignment of prefix
* Add type hints/simplify kernel_theoretical_timing

Adding type hints made it possible to simplify `kernel_theoretical_timing`.

* Fix merge

* Hotfix for grid generation use of mpi operators

* Merge examples/mpi/.gitignore into top-level .gitignore

* Remove hard-coded __version__ numbers

Removes hard-coded version numbers from `__init__` files.

* Fixing a bunch of typos

* hotfix netcdf version for dockerfiles

* Updated version number in setup.py to reflect new release, 2025.01.00

* Adding in exception for compute domains with sizes less than or equal to halo size (#103)

* Adding in exception for compute domains with less than 4 points to vector_halo_update method

* Updated exception in communicator to compare halo size to compute domain size

* linting

* Moved domain size checker to SubtileGridSizer class method from_tile_params

* Fix passing down ak/bk for pressure coefficients when they are available from an outside source (online model case) (#107)

* [QOL] Logging, Type Hints and Quantity helpers (#108)

* Log on rank 0
Docstrings & type hints on logger
Stencil Config has a `verbose` option
On verbose: FrozenStencil log when run (in GT backends)

* Update `config` in orchestrate call to solve type hint inconsistencies

* Quantity helper `to_netcdf` with multi rank support

* Automatic Int precision and stencil regeneration change (#104)

* Added feature to enable automatic detection of integer precision. Should remove the need for i32/i64 declaration (although their functionality is still retained) and replace both with the regular Int type

* change default rebuild state to false for get_factories

* Merged Float and Int precision detection functions into one common path

* Re-added old function to fulfil a PACE dependency

* updated docstring

* Added ability to declare 32 or 64 bit IntFields, overruling the system precision

* Added one dimensional bool fields

* Fix error message in typing.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* output type for global_set_precision

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Bump DaCe to v1.0.1 (#109)

Our current DaCe version is some commit from September 2024. Meanwhile, DaCe matured to v1 and recently released v1.0.1. This brings the DaCe submodule to the latest stable release version.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Streamline linting workflow (#110)

Linting should give fast feedback. The current workflow takes ~3mins where most of the time is spent installing (unnecessary) python packages. To run `pre-commit`, we only need the source files and `pre-commit` itself, which can be installed standalone. This brings runtime of the linting stage down to ~30 seconds.

Other changes

- update checkout action to v4
- update python setup action to v5
- change python version from 3.11.7 to 3.11 (any patch number will do)

This is a follow-up of PR https://github.com/NOAA-GFDL/PyFV3/pull/40 in PyFV3.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [FIX] Type hint for precision dependent Float, Int (#111)

* Fix the type hint of Float, Int

* Attempt using TypeAlias

* Feature: Adding documentation (#97)

* Added doc files

* Adding image files to docs

* Linting

* Updated docs to reflect changes requested in PR 97

* Linting

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Translate test] Save better reports & netCDF for multiple ranks on failure (#106)

* Save reports & netCDF for multiple ranks on failure
Fix multi modal threshold for parallel tests

* Order field by name in NetCDF

* Print all indices in logs. Sort by descending ULP

* Allow sorting by metrics and index with `--sort_report` option

* Remove the `rank` from SavepointCase. Access is done via `grid`

* Some docstrings

* Adds some quick capacities used in the post-radiation phase of the physics, including the Stefan-Boltzmann constant (#116)

* add namelist option

* add Stefan-Boltzmann constant

* lint

* Apply suggestions from code review

Change comments to docstring style

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Adding temperature of h2o triple point (#115)

* add ttp

* Update ndsl/constants.py

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* switch comments to docstrings for autodocs

* lint

---------

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* [Feature] Porting workflow: enhancing errors readability (#114)

* Save all fields (pass and fail) and organize them by field

* Option `--no_report` to bypass logging & netcdf save
Move logs per variable into a `details` subfolder

* Order variable name in serialbox-to-netcdf

* `extra_data_load` function to load savepoint data saved outside the canonical savepoint

* Docs / Type Hint

* Fixed typo in error statement

---------

Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>

* Feature: NetCDF output precision configurable (#117)

* Removed hard-code of np.float32 from NetCDFMonitor transfer_type, replaced with Float type

* Added multiple options for NetCDF precision

* Added checking for use of 32 precision and float64 output

* Using NumPy type instead of string in NetCDFMonitor precision variable

* Added warning to netcdf_monitor.py for mismatch in precision settings

* Forgot f-string in warn message of netcdf_monitor

* Mixed Precision fixes and QOL (#118)

* Ignore `.next` caches

* CNST_OP20 is a true 64-bit

* Translate: Fix reading parameters with the right precision

* Multimodal metric: Skip reporting on expected values

* Bad commit

* Add license (Apache 2.0) (#105)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Change deprecated `np.product()` to `np.prod()` (#120)

Starting with numpy v1.25.0, `np.product()` is deprecated and
`np.prod()` should be used instead.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow (#119)

* Update DaCe to v1.0.2

DaCe v1.0.2 brings two fixes for DaCe transformations: one for
DeadDataflowElimination and one for StateFusion.

* Bump gt4py to include refactored gt4py/dace bridge

* Test with modified pace pipeline

- added this to re-trigger the new pace pipeline after limiting zarr to
  not install v3 (for now) because of breaking API changes.
- added this note to re-trigger after fixing the pace pipeline to not
  pull requirements from `develop`.
- added this note to re-trigger after fixing the repo name

* Revert "Test with modified pace pipeline"

This reverts commit cd6560ea6129663d3445fafb36d02f03cb661b4d.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Grid Mixed Precision and Coriolis force load (+ QOL) (#121)

* Pass `dtype` down in allocator utils (gt4py_utils)

* Allow coriolis forces to be read in

* Edge factors are always 64-bit

* Quantity QOL

* Make sure to pass `dtype` to load the grid cleanly

* Translate grid: load coriolis forces, area 64 is 64-bit

* Bad merge

* Typo

* GEOS version of dz_min (#122)

* Doc enhancement (#123)

**Description**
Port and adaptation of the initial commit of the documentation.

Fixes issue https://github.com/NOAA-GFDL/NDSL/issues/113


**Checklist:**
- [X] I have performed a self-review of my own code
- [X] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings

* Fix saving NetCDF for parallel translate test (#125)

* Release candidate 2025.03.00 (#124)

Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

* Fix for bad merge of 7fdfa5 (#129)

---------

Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Roman Cattaneo <>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* check domain size args only once

* review & test

---------

Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>

* BREAKING CHANGE: change constructor of `QuantityFactory` (#228)

* Breaking change: QuantityFactory from GridSizer and backend name

Change `QuantityFactory` to initialize from a `GridSizer` (as
previously) and a backend name (new). This hides the previous `numpy`
argument, which is an internal allocator that users shouldn't need to
know about. It's basically what `from_backend()` was doing before
(which is now obsolete and was thus removed).

This is a BREAKING CHANGE and users will need to update their code if
they instantiated a `QuantityFactory` themselves. For users relying on
the `boilerplate` module, no changes need to happen.

* Keep QuantityFactory.from_backend() with a deprecation warning

* Extended docstrings

This is mainly to force a new run of the pyshield workflow now that
pyshield tests are exclusively using `QuantityFactory.from_backend()`
which is compatible with changes proposed in this PR.

* More updates to docstrings

* fixup after rebase

* Unrelated: tests are supposed to return `None`

* fixup: move method back to current place
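
As a rough illustration of the constructor change in this PR, the sketch
below shows a factory built from a sizer plus a backend name, with
`from_backend()` kept as a thin wrapper that emits a deprecation warning.
Everything here is a hypothetical stand-in: the real `QuantityFactory` and
`GridSizer` have more to them, and the parameter names and the warning text
are assumptions.

```python
import warnings


class GridSizer:
    """Hypothetical placeholder; the real sizer computes shapes and origins."""


class QuantityFactory:
    """Hypothetical stand-in showing only the two construction paths."""

    def __init__(self, sizer: GridSizer, backend: str) -> None:
        self.sizer = sizer
        # Assumption: a real factory would select its internal allocator
        # from the backend name here, instead of taking it as an argument.
        self.backend = backend

    @classmethod
    def from_backend(cls, sizer: GridSizer, backend: str) -> "QuantityFactory":
        # Kept for compatibility; forwards to the new constructor.
        warnings.warn(
            "from_backend() is deprecated; call QuantityFactory(sizer, backend).",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls(sizer, backend)


# New-style construction: sizer plus backend name, no allocator argument.
factory = QuantityFactory(GridSizer(), "numpy")
```

Code that already goes through `from_backend()` keeps working during the
deprecation period, while direct constructor calls need the new arguments.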

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: remove ndsl/exceptions (#281)

* BREAKING CHANGE: remove ndsl/exceptions

The module was deprecated in the last release and is removed with
this release.

* fixup: documentation

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: remove deprecated environment variables (#282)

These environment variables were deprecated in the last release and are
removed with this release.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* ci: specialize concurrency group per repo (#287)

* ci: per repo concurrency group

Note: using `${{ github.repository }}` sounds like a good idea. In
practice, that doesn't play nice when the workflow is called from
another repository because in that case, `github.repository` resolves
to the calling repository.

* fix file ending of called workflows

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Remove ndsl.Namelist (#297)

* Removing ndsl.Namelist

* Removing use_legacy_namelist flag functionality
while keeping the flag itself.

* - Removing ndsl.Namelist
- Removing use_legacy_namelist flag functionality
(while keeping the flag itself for now)

* linting

* Removing namelist.md and test_namelist.py

* [feature] Common data types for orchestration via `compiletime` (#296)

* `Quantity`, `Local` & `State` default to `dace.compiletime` auto-magically in orchestration

* Fix type check, remove `Local`

* Unit tests

* Fix for type annotations that aren't type

* BREAKING CHANGE: remove deprecated ndsl/units.py (#283)

The module has been deprecated in the last release and is now removed
in this release cycle.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: removal of extra_dim_lengths (#295)

`extra_dim_lengths` on the `GridSizer` was replaced by `data_dimensions`
in the `2025.10.00` release. Now that the release is out, let's clean up
and remove the deprecated API. This also includes
`set_extra_dim_lengths()` in the `QuantityFactory`.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: remove deprecated ndsl/filesystem.py (#284)

The module was deprecated in the last release and is now removed in
this release cycle.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* docs: release checklist and documentation (#299)

* release checklist and documentation

* Add template for patch release

* review

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* gt4py update: fix absolute indexing in debug backend (#302)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* column min/max stencil - value and index (#301)

* column min/max and a unit test

* working unit test, pre-commit changes

* alternative type ignore method

* reverted previous change

* using boilerplate code

* reverting previous change

* build: gt4py update (fix upcasting, abs k test coverage) (#303)

This PR updates GT4Py to bring in the following upstream changes:

- fix upcasting such that users can have variable k-offsets with
  expressions consisting of different types.
- increase test coverage for absolute k indexing

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* restore default PR template (#305)

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* BREAKING CHANGE: last 2025.10.00 deprecations (`CopyCorners`, `Quantity.values()`, `extra_dim_lengths` on `SubtileGridSizer`) (#300)

* Remove deprecated extra_dim_lengths of SubtileGridSizer

This is a follow-up from https://github.com/NOAA-GFDL/NDSL/pull/295.

* Remove deprecated CopyCorners

* Remove deprecated `Quantity.values()`

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: remove leftover debug print statements (#308)

This PR just removes a bunch of leftover debug print statements from
`ndsl/` and `tests/`.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: make GridSizer an abstract base class (#306)

`GridSizer` is de-facto already a base class with abstract methods
`get_origin()`, `get_extent()`, and `get_shape()`. This PR just
formalizes that intent.
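
As a rough sketch of what formalizing an abstract base class looks like in Python (this is not NDSL's actual `GridSizer`; the class names, the `dims` parameter, and the return types are assumptions for illustration):

```python
from abc import ABC, abstractmethod
from typing import Sequence, Tuple


class GridSizerSketch(ABC):
    """Hypothetical stand-in for a sizer base class with three abstract methods."""

    @abstractmethod
    def get_origin(self, dims: Sequence[str]) -> Tuple[int, ...]:
        ...

    @abstractmethod
    def get_extent(self, dims: Sequence[str]) -> Tuple[int, ...]:
        ...

    @abstractmethod
    def get_shape(self, dims: Sequence[str]) -> Tuple[int, ...]:
        ...


class FixedSizer(GridSizerSketch):
    """Toy subclass: every dimension has the same extent and halo width."""

    def __init__(self, extent: int, halo: int) -> None:
        self.extent = extent
        self.halo = halo

    def get_origin(self, dims: Sequence[str]) -> Tuple[int, ...]:
        return tuple(self.halo for _ in dims)

    def get_extent(self, dims: Sequence[str]) -> Tuple[int, ...]:
        return tuple(self.extent for _ in dims)

    def get_shape(self, dims: Sequence[str]) -> Tuple[int, ...]:
        return tuple(self.extent + 2 * self.halo for _ in dims)
```

With `ABC` and `@abstractmethod`, instantiating the base class directly raises a `TypeError`, so a missing override is caught at construction time rather than at the first call.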

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: directly use gt_storage in QuantityFactory (#307)

In the past, `QuantityFactory` would allow not only allocating with
gt4py storage objects, but also directly from `numpy` or `cupy`. This
ability was removed in PR https://github.com/NOAA-GFDL/NDSL/pull/228.
With that removal comes the opportunity to streamline allocation in
`QuantityFactory`, removing the need for an `Allocator` class in the
middle.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [Feature] Schedule Tree:  refine transient (#304)

* Fix axis merge

* Remove debug print

* Refine transients + utests

* Lint

* Revert to deactivating the experimental stree work

* Use context manager for `_INTERNAL__SCHEDULE_TREE_OPTIMIZATION`

* Typo

* Clean refine transients code

* Derive common strides layout from backend
Refactor code to make re-sizing more compact in main algorithm
Fix bad recursion
Add todo list and verbose state of optimization

* Lint

* Remove `transient` to `State` lifetime - keep PR on target

* Lint

* build: gt4py update (upcasting in cast operations) (#310)

This PR updates GT4Py to bring the fix for upcasting inside cast
operations from GT4Py to NDSL.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* build: gt4py update (precision of global constants) (#313)

This PR updates GT4Py in NDSL to bring up a PR that fixes the precision
of global constants. So far, we'd discard any type annotation on global
constants and just use the default literal precision instead. With this
change, we respect potential type annotations on global constants.
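
To see why the annotation matters, the following standard-library sketch rounds a 64-bit value to 32-bit precision. It does not use GT4Py; the helper name and the choice of pi as the constant are assumptions for illustration only.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


PI_64 = 3.141592653589793   # value kept at 64-bit precision
PI_32 = as_float32(PI_64)   # value kept at 32-bit precision

# The two representations agree to about seven significant digits only.
difference = abs(PI_64 - PI_32)
```

A constant stored at the wrong precision therefore introduces a small, systematic difference in every expression that uses it.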

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: Quantity constructor: `gt4py_backend` -> `backend` (#312)

* refactor: force kwargs in ctor of Quantity/Local

Force keyword arguments for optional arguments to those constructors.
This will facilitate the `gt4py_backend` -> `backend` transition.

* refactor: prefer `backend` over `gt4py_backend` in Quantity
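
A minimal sketch of how keyword-only arguments ease such a rename. The class below is a hypothetical stand-in: the real `Quantity` constructor takes different arguments, and the fallback logic shown here is an assumption.

```python
from typing import Optional, Sequence


class QuantitySketch:
    """Hypothetical stand-in showing keyword-only optional arguments."""

    def __init__(
        self,
        data: Sequence[float],
        dims: Sequence[str],
        *,  # everything after this marker must be passed by keyword
        backend: Optional[str] = None,
        gt4py_backend: Optional[str] = None,
    ) -> None:
        self.data = list(data)
        self.dims = tuple(dims)
        # Prefer the new name; fall back to the old one during the transition.
        self.backend = backend if backend is not None else gt4py_backend
```

Because callers can no longer pass the backend positionally, renaming or reordering the optional parameters cannot silently change the meaning of existing call sites.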

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: prepare `ZarrMonitor` for upcoming `Comm` changes (#315)

* refactor: ZarrMonitor: you'll have to bring your own comm objects

* ci: run unit tests with optional zarr dependency

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Introduce a `single_code_path` flag in the DaCeConfig that forces a single cache to be built. (#311)

* refactor: Deprecate optional backend argument to Quantity/Local (#314)
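
A common way to deprecate an optional argument is to warn whenever the caller omits it. The sketch below shows that pattern only; the function name, the warning text, and the `"numpy"` fallback are assumptions and not NDSL's actual behavior.

```python
import warnings
from typing import Optional


def resolve_backend(backend: Optional[str] = None, default: str = "numpy") -> str:
    """Return the backend to use, warning if the caller did not specify one."""
    if backend is None:
        warnings.warn(
            "Omitting `backend` is deprecated; pass it explicitly.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return default
    return backend
```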

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: remove DummyComm as alias to LocalComm (#319)

There's no need for this alias. We thus replace all occurrences of the
alias with the underlying `LocalComm` directly.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* Deprecate `CopyCornersXY` (#317)

`CopyCornersXY` is replaced with `CopyCornersX` and `CopyCornersY` in
pyFV3. The class is currently unused and will be removed after the next
release of NDSL.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: Deprecate `NullComm` in favor of `MPIComm` and `LocalComm` (#318)

* unrelated: fix typo in warning message

* refactor: change NullComm -> MPIComm in boilerplate

This adds a test that the MPI communicator only has one rank if a
single-tile setup is requested.

* refactor: deprecate NullComm

`NullComm` can be replaced with either `LocalComm` or `MPIComm`.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* build: gt4py update (self-assignment in serial vertical loops) (#316)

This updates the gt4py dependency to bring up the fix that allows
self-assignment with offset reads in K for serial (e.g.
FORWARD/BACKWARD) vertical loops.

See https://github.com/GridTools/gt4py/pull/2388 (in particular the test
cases for details on what is allowed and what is not).
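
The reason self-assignment with a K offset is well defined in a serial loop can be shown in plain Python. This is an analogue only, not GT4Py syntax, and it says nothing about which exact patterns GT4Py accepts (the linked PR is the reference for that); the function names are made up for illustration.

```python
from typing import List


def forward_accumulate(field: List[float]) -> List[float]:
    """Analogue of a FORWARD loop where each level reads the level below it."""
    out = list(field)
    for k in range(1, len(out)):
        # Level k-1 was already updated, so the read sees the new value.
        out[k] = out[k - 1] + out[k]
    return out


def backward_accumulate(field: List[float]) -> List[float]:
    """Analogue of a BACKWARD loop where each level reads the level above it."""
    out = list(field)
    for k in range(len(out) - 2, -1, -1):
        out[k] = out[k + 1] + out[k]
    return out
```

For the input `[1, 2, 3, 4]`, the forward version yields `[1, 3, 6, 10]` and the backward version `[10, 9, 7, 4]`. The result depends on the iteration order, which is fixed in a serial loop and unspecified in a parallel one.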

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* refactor: specify backend when allocating a Quantity (#320)

This PR is a follow-up from https://github.com/NOAA-GFDL/NDSL/pull/314
and adds the soon-to-be-required `backend` parameter to constructor
calls of `Quantity`. I missed a couple of them because PRs were merged in
parallel, e.g. re-enabling the `ZarrMonitor` tests.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

* [Translate test] Compute the percentage of changing grid points that error (#322)

* Add `inputs` to MultiModalFloat metric
Compute the percentage of changing grid points that errored

* Lint
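
A rough sketch of the idea behind the new metric, under stated assumptions: a grid point "changes" when its reference output differs from its input, and it "errors" when the computed value is farther than a tolerance from the reference. The function name, the flat-list layout, and the absolute tolerance are hypothetical; the actual `MultiModalFloat` metric may differ.

```python
from typing import Optional, Sequence


def percent_changed_points_in_error(
    inputs: Sequence[float],
    reference: Sequence[float],
    computed: Sequence[float],
    tolerance: float = 1e-12,
) -> Optional[float]:
    """Percentage of changing grid points whose computed value is off."""
    # Points where the reference output differs from the input.
    changed = [i for i, (a, b) in enumerate(zip(inputs, reference)) if a != b]
    if not changed:
        # Nothing changed, so the ratio is undefined.
        return None
    failing = sum(
        1 for i in changed if abs(computed[i] - reference[i]) > tolerance
    )
    return 100.0 * failing / len(changed)
```

Restricting the denominator to changing points keeps untouched regions of the grid from diluting the error rate.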

* Removing --no_legacy_namelist flag (#323)

* Added new functions: column_min_ddim & column_max_ddim and corresponding test (#324)

Functionality is the same as column_min/max, but separate functions are needed to handle cases with off-grid data dimensions

* [Optimization/Experimental] Better `AxisMerge` for column physics (#325)

* Add `CleanUpScheduleTree` pass to prep for merge

* Declutter axis merge logs, expose new pass

* Verbose Pipeline passes (with temporary stree saves)

* Deactivate IF_SCOPE push, remove attempt to keep merging if the next node is not a MapScope

* Docs of TODO

* Draft of more extended testing

* Fix `CartesianRefineTransients` for non-array

* Some lint

* Clean up the Tree of ForScope.loop_range

* Utest: group test under a single orchestrated class, add missing feature and expected failures

* [Feature/Experimental] Stree Refine Transient optimization pass:  data dimensions and proper unit tests (#327)

* Rename test for axis merge

* Properly refine fields with data dimensions
Fix indexing in memlets properly

* utest: coverage of all implemented tests

* Clean up timing print of orchestration

* Lint

* Fix bad reference to in/out memlets, remove dead code, better code

* Share test infrastructure, rename stencils

* Lint

* Better naming in utest stencils

* [Update] GT4Py & DaCe updated to 2025.11.25 state of `main` (#330)

* DaCe update: fix networkx dependency breaking with 3.6

* GT4Py: Runtime interval bounds in `debug`

* [Tool] Best Guess Netcdfs diff (#177)

* Best guess netcdfs compare

* Add FieldBundle to debugger

* lint

* Move executable to `pyproject`

* Lint

* Update `gt4py` to capture improvement to user error (#331)

* [Rework/Experimental] Refine Transient v2: `Ranges` for all! (#328)

* Rework the `RefineTransient` to use `Range` - simpler, cleaner and more robust. Also preps us for a better refine

* Remove unused code

---------

Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>

* [Fix] [Translate] Update API for parallel test when using `MultiModalMetric` (#332)

* Remove old options for `MultiModalFloatMetric`

* Defensive programming: bail out if we can't measure the ref vs input diff

---------

Co-authored-by: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>
Co-authored-by: Janice Kim <Janice.Kim@noaa.gov>

---------

Co-authored-by: Frank Malatino <142349306+fmalatino@users.noreply.github.com>
Co-authored-by: Oliver Elbert <oliver.elbert36@gmail.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
Co-authored-by: Florian Deconinck <florian.deconinck@gmail.com>
Co-authored-by: Oliver Elbert <Oliver.Elbert@noaa.gov>
Co-authored-by: Christopher Kung <christopher.w.kung@nasa.gov>
Co-authored-by: Roman Cattaneo <romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <79879064+CharlesKrop@users.noreply.github.com>
Co-authored-by: Charles Kropiewnicki <charles.j.krop@gmail.com>
Co-authored-by: Frank Malatino <frank.malatino@noaa.gov>
Co-authored-by: Janice Kim <Janice.Kim@noaa.gov>

From 6f1ca86fb659ace34b8ecdf8b3ea5ddfcde910ca Mon Sep 17 00:00:00 2001
From: Tobias Wicky-Pfund <tobias.wicky@meteoswiss.ch>
Date: Thu, 9 Apr 2026 17:40:39 +0200
Subject: [PATCH 28/28] Revert "Release: NDSL `2026.03.00` (#422)" (#424)

This reverts commit fa6503a9f63bf3df88147a886af90c342aed672c.