Skip to content

Add ANN-based mesoscale streamfunction parameterization#1109

Merged
Hallberg-NOAA merged 14 commits into
NOAA-GFDL:dev/gfdlfrom
dhruvbalwada:rho_flux_ANN_gfdl_ready
May 19, 2026
Merged

Add ANN-based mesoscale streamfunction parameterization#1109
Hallberg-NOAA merged 14 commits into
NOAA-GFDL:dev/gfdlfrom
dhruvbalwada:rho_flux_ANN_gfdl_ready

Conversation

@dhruvbalwada
Copy link
Copy Markdown

@dhruvbalwada dhruvbalwada commented May 6, 2026

This PR adds a neural-network parameterization of the mesoscale eddy streamfunction as an alternative to (or addition on top of) the standard Gent–McWilliams thickness diffusion. It uses the MOM_ANN.F90 inference framework. This is an extension of the Balwada et al. 2-layer paper to any arbitrary vertical grid where computation of density gradients is possible.

This PR was collaboratively authored by Dhruv Balwada and Pavel Perezhogin (@Pperezhogin).

Scientific motivation

  • The standard GM closure parameterizes the eddy streamfunction as -K · slope, with K a chosen diffusivity (or K based on MEKE or some other method).
  • For the GM closure, one interpretation of the eddy streamfunction is in terms of TEM as <u'ρ'> / |<ρ_z>| (e.g. Ferrari et al. 2010 eqn 6). In our implementation the ANN is trained to predict this density flux directly, and the predicted flux is converted to a streamfunction by dividing by the local vertical density gradient (with a floor to avoid division by zero).
  • The training target is the coarse-grained density flux from a high-resolution simulation, which at the moment is from CM2.6.

What is added

A new module:

  • src/parameterizations/lateral/MOM_meso_sfn_ANN.F90 — main code for predicting density fluxes using ANN, and converting it to a streamfunction.

Modifications:

  • src/parameterizations/lateral/MOM_thickness_diffuse.F90 — adds the ANN streamfunction (Sfn_unlim_u_3D, Sfn_unlim_v_3D) on top of the GM streamfunction. The ANN coexists correctly with USE_STORED_SLOPES and the SKEBs path.
  • src/core/MOM_isopycnal_slopes.F90calc_isoneutral_slopes extended to optionally return density gradients (drdx_u, drdy_v, drdz_u, drdz_v) consumed by the ANN.
  • src/core/MOM.F90 — init wiring.

The trained ANN weights file (netCDF) is hosted in the m2lines/ANN-momentum-buoyancy-mesoscale repository.

How to enable

#override THICKNESSDIFFUSE = True
#override KHTH = 0.0
#override USE_THICKNESS_DIFFUSE_ANN = True
#override MESO_SFN_ANN_FILE = relative/or/absolute/path/to/ann_instance_20Dec.nc
#override MESO_SFN_ANN_WINDOW = 3

#override KHTH_USE_FGNV_STREAMFUNCTION = True
#override FGNV_FILTER_SCALE = 1.0
#override FGNV_C_MIN = 0.01

Other knobs (MESO_SFN_ANN_COEFF, MESO_SFN_MAG_GRAD_FLOOR, MESO_SFN_FLUX_CLAMP, MESO_UPSILON_CLAMP) have sensible defaults set from CESM-grid diagnosis.

We have only ever tested this with FGNV on.

Some runtimes from CESM

Tabulating mpp_clock statistics across    896 PEs...
                                      hits          tmin          tmax          tavg          tstd  tfrac grain pemin pemax
Total runtime                            1    483.492331    483.492427    483.492364      0.000026  1.000     0     0   895
(Ocean thickness diffusion *)          744     25.140072     38.153809     34.799101      2.384975  0.072    31     0   895

AI disclosure

AI tools (including Claude Code) were used in making this contribution — primarily for code-style auditing, doxygen drafting, dimensional-rescaling review, and merge-conflict analysis. All physics, numerical decisions, parameter defaults, and test choices were made by the human authors. We have read docs/Consortium-policy-on-AI.md.

Copilot AI review requested due to automatic review settings May 6, 2026 17:25
Comment thread src/core/MOM.F90 Outdated
Comment thread src/core/MOM_isopycnal_slopes.F90 Outdated
Comment thread src/parameterizations/lateral/MOM_thickness_diffuse.F90 Outdated
Comment thread src/parameterizations/lateral/MOM_meso_sfn_ANN.F90 Outdated
Comment thread src/parameterizations/lateral/MOM_meso_sfn_ANN.F90 Outdated
@Hallberg-NOAA Hallberg-NOAA added enhancement New feature or request Parameter change Input parameter changes (addition, removal, or description) labels May 6, 2026
dhruvbalwada added a commit to dhruvbalwada/MOM6 that referenced this pull request May 11, 2026
Address Hallberg's review comments about opaque-type encapsulation
(NOAA-GFDL#1109).

- MOM.F90: Drop the if/else around the two thickness_diffuse calls
  and pass u, v unconditionally. The previous gating on
  CS%thickness_diffuse_CSp%use_meso_sfn_ANN peeked inside the
  opaque thickness_diffuse_CS type. The runtime check inside
  thickness_diffuse at line 503 (CS%use_meso_sfn_ANN .and.
  present_vel) still gates the ANN call correctly; present_vel
  is now always true but the conjunction is harmless.

- MOM_thickness_diffuse.F90: Make use_meso_sfn_ANN private
  (was incorrectly declared public). No external readers remain
  after the MOM.F90 refactor.

Expected to be bitwise identical based on inspection of the runtime
gate at MOM_thickness_diffuse.F90:503 (the conjunction
CS%use_meso_sfn_ANN .and. present_vel still routes the same way).
Not locally verified by test.repro / test.regression; CI will check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dhruvbalwada added a commit to dhruvbalwada/MOM6 that referenced this pull request May 11, 2026
Per Hallberg's review feedback on NOAA-GFDL#1109: emphasize
in the Doxygen comments that the new drdx_u and drdy_v output
arguments of calc_isoneutral_slopes are evaluated along surfaces
of constant z (not along isopycnals or model interfaces).

All answers are bitwise identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dhruvbalwada added a commit to dhruvbalwada/MOM6 that referenced this pull request May 11, 2026
Address Hallberg's style review on NOAA-GFDL#1109:

- Loop variables: rename lowercase k -> K on interface-range
  loops (k=2,nz and k=1,nz+1) in meso_sfn_ANN_compute,
  center_grad_rho, and center2uv. Rename lowercase j -> J on
  v-point loops (j=js-1,je with v-point array access in the
  body). Update one inner u-loop (do i=is-1,ie -> do I=is-1,ie)
  to match the body's I usage.

- Array indexing: flip interface-staggered array accesses
  (drdx_c, drdy_c, drdx_u, drdy_v, drdz_u, drdz_v, Fx_c, Fy_c,
  Fx_u, Fy_v, sfn_u, sfn_v, var1_c, var2_c, var1_u, var2_v) to
  use uppercase K on their third index. Layer-staggered velocity
  arrays (dudx, dudy, dvdx, dvdy) keep lowercase k to signal
  "the layer below interface K".

- mag_grad expression: rewrite from the FMA-paranoid
  (US%Z_to_L*drd_*)*(US%Z_to_L*drd_*) form to Hallberg's
  preferred US%Z_to_L**2*drd_***2 form (he explicitly noted
  the extra parentheses are not needed for FMA/symmetry).

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. The
case-only changes are bit-identical by language definition (Fortran
is case-insensitive). The simplified mag_grad expression is
bit-different under FMA-aware compilers. Since the feature defaults
to .false. and has not been released, no _BUG flag is required.
Not locally verified; CI will check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/parameterizations/lateral/MOM_thickness_diffuse.F90 Outdated
Comment thread src/parameterizations/lateral/MOM_thickness_diffuse.F90 Outdated
dhruvbalwada added a commit to dhruvbalwada/MOM6 that referenced this pull request May 17, 2026
Address Hallberg's follow-up comments on NOAA-GFDL#1109:

- Drop `, optional` from the u and v dummy arguments of
  thickness_diffuse. Both callers in MOM.F90 unconditionally pass
  u,v after the previous opaque-type refactor (98038be), so the
  optional attribute is no longer needed.

- Remove the now-redundant `present_vel` variable (declaration
  and assignment), and simplify the ANN call gate from
  `if (CS%use_meso_sfn_ANN .and. present_vel)` to
  `if (CS%use_meso_sfn_ANN)`.

- Restore the truncated `v` Doxygen description and tighten
  whitespace on both u and v declaration lines.

All answers are bitwise identical: `present_vel` was always true
after the prior refactor, so the conjunction always took the same
branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dhruvbalwada added a commit to dhruvbalwada/MOM6 that referenced this pull request May 17, 2026
Per Hallberg's review on NOAA-GFDL#1109: avoid the cost of
the wider halo in find_eta for default users.

The ANN streamfunction needs the interface elevation `e` with a
wider halo to support its stencil. Previously the halo bump was
unconditional. Now it is gated on `CS%use_meso_sfn_ANN` so that
default users (USE_THICKNESS_DIFFUSE_ANN=False) retain the
original `halo_size=1` call.

Default-off users now match upstream dev/gfdl exactly at this
call site. ANN-on users are unchanged relative to the previous
branch state.

Hallberg also recommends verifying halo updates by comparing runs
at different MPI process counts (e.g., mpirun -n 63 vs -n 64);
that test has not yet been run. CI may catch some issues; the
cross-process-count check is a separate validation step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the recent updates everything here is looking good to me. We will merge this as soon as we have access to the Gaea computer again and the latest version has passed the pipeline testing. Thank you for this valuable contribution to MOM6, @dhruvbalwada!

@Hallberg-NOAA
Copy link
Copy Markdown
Member

@dhruvbalwada, would you object if we were to squash the 15 commits in this PR into a single commit?

We retain all the commits in a PR if we think that they might be useful later in case we need to come back and do a git bisect, but in this case a number of the intermediate commits are correcting newly introduced bugs that we would rather not appear in the history. The other option is to go back and reimplement this PR with just the bug-free commits that we deliberately choose to retain. In a case like this the more common choice is just to squash the PR into a single commit.

dhruvbalwada and others added 12 commits May 19, 2026 10:00
Extend MOM_isopycnal_slopes to output density gradients (drdx_u,
drdy_v, drdz_u, drdz_v) as optional arguments from
calc_isoneutral_slopes. Extend MOM_thickness_diffuse to accept
externally computed streamfunctions (Sfn_unlim_u_3D, Sfn_unlim_v_3D)
in thickness_diffuse_full. Wire up the new ANN streamfunction module
in MOM.F90 with initialization, computation, and restart calls.
Introduces MOM_meso_sfn_ANN, which predicts density-flux-based
mesoscale streamfunctions using a neural network. Features include
configurable stencil windows for local density and velocity gradient
inputs, runtime-tunable clamping thresholds, boundary proximity
masking, and a fallback GM_simple mode for testing. The ANN outputs
buoyancy fluxes that are converted to streamfunctions via the
local density gradient magnitude.

This is answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True.
The parameter defaults to False, so no existing experiments are
affected.
Address Pavel Perezhogin's review and failing CI checks.

Review:
- Swap ANN_apply_vector_orig -> ANN_apply_vector_oi (faster path)
- Gate MOM_thickness_diffuse limiters on use_meso_sfn_ANN so the
  dev/gfdl baseline is unchanged when the ANN is off
- Refactor the streamfunction clamp to act on the velocity-scale
  Upsilon (Ferrari et al. 2010) before multiplication by dy_Cu/dx_Cv,
  giving a grid-independent cap. Rename the runtime parameter
  MESO_SFN_SFN_CLAMP to MESO_UPSILON_CLAMP and shift the default
  from 1.0e6 volume-transport units to 1.0e4 m2 s-1 velocity-scale.

Unit cleanup:
- Density flux variables (Fx_u, Fy_v, Fx_c, Fy_c) relabeled from
  [R L-1 ~> kg m-4] to [R L T-1 ~> kg m-2 s-1]
- Volume streamfunction dummy args sfn_u/sfn_v relabeled
  [L2 T-1] -> [Z L2 T-1 ~> m3 s-1] to match Sfn_unlim_u_3D
- mag_grad / mag_grad_floor switched from [R L-1] to [R Z-1]
  (the computation already produces [R Z-1] via the Z_to_L rescale
  of drdx_u; the floor comparison only works correctly under
  internal rescaling when both sides share the same internal unit)
- Fix conversion= factors on all 11 meso_sfn_* diagnostic fields;
  previous values were copy-pasted from sfn_unlim_x and did not
  match the registered quantities
- Add scale= argument to mag_grad_floor and MESO_UPSILON_CLAMP
  get_param calls for rescale-safe behavior
- Replace MESO_SFN_GRAD_MAG_EPSILON parameter with derived
  h_neglect-style locals rho_grad_neglect [R L-1] and
  vel_grad_neglect [T-1] (1e-30 scaled per dimension)

CI fixes:
- build-openmp: add present_drdx_u/drdx_u/drdz_u and
  present_drdy_v/drdy_v/drdz_v to the two !$OMP parallel do
  shared(...) clauses in calc_isoneutral_slopes
- check-style-and-docstrings: split multi-variable id_* declarations
  in MESO_SFN_ANN_CS so each has its own doxygen !< comment

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Act on the broader audit of MOM_meso_sfn_ANN and fix the two
remaining CI failures.

Audit-driven changes:
- Remove the GM_simple and GM_ann model-type branches entirely; they
  were testing scaffolding. The only production path is nondim_rhoF_ann.
  This also removes the MESO_SFN_ANN_TYPE runtime parameter, the
  meso_sfn_ann_model_type CS field, and the now-unused local Kappa.
- Add a startup bounds check for MESO_SFN_ANN_WINDOW: the stencil
  shift must be <= 3 to stay inside the halo=3 provision in
  meso_sfn_ANN_compute, and the window must be an odd integer.
- Set and check the CS%initialized flag so using the module without
  calling _init produces a clear FATAL instead of later undefined
  behaviour.
- Rename lowercase runtime params to the MOM6 uppercase convention:
  meso_sfn_ann_coeff/type/window/file -> MESO_SFN_ANN_COEFF,
  MESO_SFN_ANN_WINDOW, MESO_SFN_ANN_FILE (type removed).
- Expand the module docstring with a Ford-style summary of the
  pipeline and an explicit warning that the dimensionalization in
  _compute is an implicit contract with the training procedure.
- Document the extended stencil bounds in the corner-point velocity
  gradient loop.

CI fixes:
- build-openmp: add Sfn_unlim_u_3D / Sfn_unlim_v_3D to the shared(...)
  clauses of the two !$OMP parallel do blocks in thickness_diffuse_full.
- check-style-and-docstrings: wrap the Fy_c declaration comment onto
  a second line so the line stays under the 120-char limit.

Breaking parameter changes (branch not yet released): MESO_SFN_ANN_TYPE
removed; the four lowercase params listed above renamed to uppercase.

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use faster ANN inference interface (ANN_apply_array_sio). 18% acceleration, regression is the same
The tests passed:
test.grid
test.layout
test.restart
test.openmp
test.nan
test.dim

1) Trailing space is removed

2) Physical dimensionality is added to all float variables

3) Loop bounds are adjusted to get rid of out of bound memory access and pass test.grid

4) Scaling with physical units (i.e., structure US) is adjusted in multiple places to pass test.dim

5) Maximum allowed window size is limited to 3, which was tested now. Larger windows will likely needed calling pass_var (not yet implemented)
Default values of limiting parameters are adjusted based on the diagnosis of CESM ocean model output.

MESO_UPSILON_CLAMP reduced from 1e4 m2 s-1 to 15 m2 s-1. This choice guarantees that the maximum overturning per one grid point and water column is approximately 1 Sv on CESM grid (before limiting and smoothing performed in MOM_thickness_diffuse.F90). The previous default value could allow up to ~600 Sv per grid point, which is an enormously large number.

MESO_SFN_FLUX_CLAMP, which is ~100 kg m-2 s-1, is likely never activated. Still, this number is reasonable as maximum predicted fluxes in CESM are ~20 kg m-2 s-1.

MESO_SFN_MAG_GRAD_FLOOR, which is 1.0e-10 kg m-4, is a very realistic number for CESM. The density gradients are approximately in a range from 1e-10 up to 1e-1, with a couple of grid points having 1e-12.
Adding an exchange of halo points for the stream function predicted by the neural network. 

This halo exchange has no effect in simple geometries, but gaurantees Temperature/Salinity conservation on a tripolar grid, where rotational invariance is required to achieve conservation through the pole. Exact rotational invariance is not supported by the neural network. Rotational invariance is learned from data approximately, and is preserved less strictly than the numerical precision. Exchange of a streamfunction guarantees that there is no jump in the thickness flux at the grid point adjacent to the pole.
Address Hallberg's review comments about opaque-type encapsulation
(NOAA-GFDL#1109).

- MOM.F90: Drop the if/else around the two thickness_diffuse calls
  and pass u, v unconditionally. The previous gating on
  CS%thickness_diffuse_CSp%use_meso_sfn_ANN peeked inside the
  opaque thickness_diffuse_CS type. The runtime check inside
  thickness_diffuse at line 503 (CS%use_meso_sfn_ANN .and.
  present_vel) still gates the ANN call correctly; present_vel
  is now always true but the conjunction is harmless.

- MOM_thickness_diffuse.F90: Make use_meso_sfn_ANN private
  (was incorrectly declared public). No external readers remain
  after the MOM.F90 refactor.

Expected to be bitwise identical based on inspection of the runtime
gate at MOM_thickness_diffuse.F90:503 (the conjunction
CS%use_meso_sfn_ANN .and. present_vel still routes the same way).
Not locally verified by test.repro / test.regression; CI will check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Hallberg's review feedback on NOAA-GFDL#1109: emphasize
in the Doxygen comments that the new drdx_u and drdy_v output
arguments of calc_isoneutral_slopes are evaluated along surfaces
of constant z (not along isopycnals or model interfaces).

All answers are bitwise identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Hallberg's style review on NOAA-GFDL#1109:

- Loop variables: rename lowercase k -> K on interface-range
  loops (k=2,nz and k=1,nz+1) in meso_sfn_ANN_compute,
  center_grad_rho, and center2uv. Rename lowercase j -> J on
  v-point loops (j=js-1,je with v-point array access in the
  body). Update one inner u-loop (do i=is-1,ie -> do I=is-1,ie)
  to match the body's I usage.

- Array indexing: flip interface-staggered array accesses
  (drdx_c, drdy_c, drdx_u, drdy_v, drdz_u, drdz_v, Fx_c, Fy_c,
  Fx_u, Fy_v, sfn_u, sfn_v, var1_c, var2_c, var1_u, var2_v) to
  use uppercase K on their third index. Layer-staggered velocity
  arrays (dudx, dudy, dvdx, dvdy) keep lowercase k to signal
  "the layer below interface K".

- mag_grad expression: rewrite from the FMA-paranoid
  (US%Z_to_L*drd_*)*(US%Z_to_L*drd_*) form to Hallberg's
  preferred US%Z_to_L**2*drd_***2 form (he explicitly noted
  the extra parentheses are not needed for FMA/symmetry).

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. The
case-only changes are bit-identical by language definition (Fortran
is case-insensitive). The simplified mag_grad expression is
bit-different under FMA-aware compilers. Since the feature defaults
to .false. and has not been released, no _BUG flag is required.
Not locally verified; CI will check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dhruvbalwada and others added 2 commits May 19, 2026 10:00
Address Hallberg's follow-up comments on NOAA-GFDL#1109:

- Drop `, optional` from the u and v dummy arguments of
  thickness_diffuse. Both callers in MOM.F90 unconditionally pass
  u,v after the previous opaque-type refactor (98038be), so the
  optional attribute is no longer needed.

- Remove the now-redundant `present_vel` variable (declaration
  and assignment), and simplify the ANN call gate from
  `if (CS%use_meso_sfn_ANN .and. present_vel)` to
  `if (CS%use_meso_sfn_ANN)`.

- Restore the truncated `v` Doxygen description and tighten
  whitespace on both u and v declaration lines.

All answers are bitwise identical: `present_vel` was always true
after the prior refactor, so the conjunction always took the same
branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Hallberg's review on NOAA-GFDL#1109: avoid the cost of
the wider halo in find_eta for default users.

The ANN streamfunction needs the interface elevation `e` with a
wider halo to support its stencil. Previously the halo bump was
unconditional. Now it is gated on `CS%use_meso_sfn_ANN` so that
default users (USE_THICKNESS_DIFFUSE_ANN=False) retain the
original `halo_size=1` call.

Default-off users now match upstream dev/gfdl exactly at this
call site. ANN-on users are unchanged relative to the previous
branch state.

Hallberg also recommends verifying halo updates by comparing runs
at different MPI process counts (e.g., mpirun -n 63 vs -n 64);
that test has not yet been run. CI may catch some issues; the
cross-process-count check is a separate validation step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Hallberg-NOAA Hallberg-NOAA force-pushed the rho_flux_ANN_gfdl_ready branch from 83fdbf6 to 2295ee2 Compare May 19, 2026 14:00
@dhruvbalwada
Copy link
Copy Markdown
Author

dhruvbalwada commented May 19, 2026

Thank you for checking everything and all the feedback.
I don't mind squashing all commits - whatever is the right approach for MOM6 I am happy with it.

Also important to mention that none of this work would have been possible without @Pperezhogin, he helped a lot in both the development path and getting the initial PR ready.

@marshallward
Copy link
Copy Markdown
Member

@Hallberg-NOAA Could you make sure this line appears at end of the squash commit, to ensure that Pavel is attributed to the commit?

Co-authored-by: Pavel Perezhogin <pperezhogin@gmail.com>

@Hallberg-NOAA
Copy link
Copy Markdown
Member

Yes, we will absolutely make sure that Pavel gets the credit he deserves!

@Hallberg-NOAA
Copy link
Copy Markdown
Member

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/30906 with the expected warnings about a new runtime parameter.

@Hallberg-NOAA Hallberg-NOAA merged commit bec6193 into NOAA-GFDL:dev/gfdl May 19, 2026
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Parameter change Input parameter changes (addition, removal, or description)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants