Add ANN-based mesoscale streamfunction parameterization#1109
Conversation
Address Hallberg's review comments about opaque-type encapsulation (NOAA-GFDL#1109). - MOM.F90: Drop the if/else around the two thickness_diffuse calls and pass u, v unconditionally. The previous gating on CS%thickness_diffuse_CSp%use_meso_sfn_ANN peeked inside the opaque thickness_diffuse_CS type. The runtime check inside thickness_diffuse at line 503 (CS%use_meso_sfn_ANN .and. present_vel) still gates the ANN call correctly; present_vel is now always true but the conjunction is harmless. - MOM_thickness_diffuse.F90: Make use_meso_sfn_ANN private (was incorrectly declared public). No external readers remain after the MOM.F90 refactor. Expected to be bitwise identical based on inspection of the runtime gate at MOM_thickness_diffuse.F90:503 (the conjunction CS%use_meso_sfn_ANN .and. present_vel still routes the same way). Not locally verified by test.repro / test.regression; CI will check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Hallberg's review feedback on NOAA-GFDL#1109: emphasize in the Doxygen comments that the new drdx_u and drdy_v output arguments of calc_isoneutral_slopes are evaluated along surfaces of constant z (not along isopycnals or model interfaces). All answers are bitwise identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Hallberg's style review on NOAA-GFDL#1109: - Loop variables: rename lowercase k -> K on interface-range loops (k=2,nz and k=1,nz+1) in meso_sfn_ANN_compute, center_grad_rho, and center2uv. Rename lowercase j -> J on v-point loops (j=js-1,je with v-point array access in the body). Update one inner u-loop (do i=is-1,ie -> do I=is-1,ie) to match the body's I usage. - Array indexing: flip interface-staggered array accesses (drdx_c, drdy_c, drdx_u, drdy_v, drdz_u, drdz_v, Fx_c, Fy_c, Fx_u, Fy_v, sfn_u, sfn_v, var1_c, var2_c, var1_u, var2_v) to use uppercase K on their third index. Layer-staggered velocity arrays (dudx, dudy, dvdx, dvdy) keep lowercase k to signal "the layer below interface K". - mag_grad expression: rewrite from the FMA-paranoid (US%Z_to_L*drd_*)*(US%Z_to_L*drd_*) form to Hallberg's preferred US%Z_to_L**2*drd_***2 form (he explicitly noted the extra parentheses are not needed for FMA/symmetry). Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. The case-only changes are bit-identical by language definition (Fortran is case-insensitive). The simplified mag_grad expression is bit-different under FMA-aware compilers. Since the feature defaults to .false. and has not been released, no _BUG flag is required. Not locally verified; CI will check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Hallberg's follow-up comments on NOAA-GFDL#1109: - Drop `, optional` from the u and v dummy arguments of thickness_diffuse. Both callers in MOM.F90 unconditionally pass u,v after the previous opaque-type refactor (98038be), so the optional attribute is no longer needed. - Remove the now-redundant `present_vel` variable (declaration and assignment), and simplify the ANN call gate from `if (CS%use_meso_sfn_ANN .and. present_vel)` to `if (CS%use_meso_sfn_ANN)`. - Restore the truncated `v` Doxygen description and tighten whitespace on both u and v declaration lines. All answers are bitwise identical: `present_vel` was always true after the prior refactor, so the conjunction always took the same branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Hallberg's review on NOAA-GFDL#1109: avoid the cost of the wider halo in find_eta for default users. The ANN streamfunction needs the interface elevation `e` with a wider halo to support its stencil. Previously the halo bump was unconditional. Now it is gated on `CS%use_meso_sfn_ANN` so that default users (USE_THICKNESS_DIFFUSE_ANN=False) retain the original `halo_size=1` call. Default-off users now match upstream dev/gfdl exactly at this call site. ANN-on users are unchanged relative to the previous branch state. Hallberg also recommends verifying halo updates by comparing runs at different MPI process counts (e.g., mpirun -n 63 vs -n 64); that test has not yet been run. CI may catch some issues; the cross-process-count check is a separate validation step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hallberg-NOAA
left a comment
There was a problem hiding this comment.
With the recent updates everything here is looking good to me. We will merge this as soon as we have access to the Gaea computer again and the latest version has passed the pipeline testing. Thank you for this valuable contribution to MOM6, @dhruvbalwada!
|
@dhruvbalwada, would you object if we were to squash the 15 commits in this PR into a single commit? We retain all the commits in a PR if we think that they might be useful later in case we need to come back and do a git bisect, but in this case a number of the intermediate commits are correcting newly introduced bugs that we would rather not appear in the history. The other option is to go back and reimplement this PR with just the bug-free commits that we deliberately choose to retain. In a case like this the more common choice is just to squash the PR into a single commit. |
Extend MOM_isopycnal_slopes to output density gradients (drdx_u, drdy_v, drdz_u, drdz_v) as optional arguments from calc_isoneutral_slopes. Extend MOM_thickness_diffuse to accept externally computed streamfunctions (Sfn_unlim_u_3D, Sfn_unlim_v_3D) in thickness_diffuse_full. Wire up the new ANN streamfunction module in MOM.F90 with initialization, computation, and restart calls.
Introduces MOM_meso_sfn_ANN, which predicts density-flux-based mesoscale streamfunctions using a neural network. Features include configurable stencil windows for local density and velocity gradient inputs, runtime-tunable clamping thresholds, boundary proximity masking, and a fallback GM_simple mode for testing. The ANN outputs buoyancy fluxes that are converted to streamfunctions via the local density gradient magnitude. This is answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. The parameter defaults to False, so no existing experiments are affected.
Address Pavel Perezhogin's review and failing CI checks. Review: - Swap ANN_apply_vector_orig -> ANN_apply_vector_oi (faster path) - Gate MOM_thickness_diffuse limiters on use_meso_sfn_ANN so the dev/gfdl baseline is unchanged when the ANN is off - Refactor the streamfunction clamp to act on the velocity-scale Upsilon (Ferrari et al. 2010) before multiplication by dy_Cu/dx_Cv, giving a grid-independent cap. Rename the runtime parameter MESO_SFN_SFN_CLAMP to MESO_UPSILON_CLAMP and shift the default from 1.0e6 volume-transport units to 1.0e4 m2 s-1 velocity-scale. Unit cleanup: - Density flux variables (Fx_u, Fy_v, Fx_c, Fy_c) relabeled from [R L-1 ~> kg m-4] to [R L T-1 ~> kg m-2 s-1] - Volume streamfunction dummy args sfn_u/sfn_v relabeled [L2 T-1] -> [Z L2 T-1 ~> m3 s-1] to match Sfn_unlim_u_3D - mag_grad / mag_grad_floor switched from [R L-1] to [R Z-1] (the computation already produces [R Z-1] via the Z_to_L rescale of drdx_u; the floor comparison only works correctly under internal rescaling when both sides share the same internal unit) - Fix conversion= factors on all 11 meso_sfn_* diagnostic fields; previous values were copy-pasted from sfn_unlim_x and did not match the registered quantities - Add scale= argument to mag_grad_floor and MESO_UPSILON_CLAMP get_param calls for rescale-safe behavior - Replace MESO_SFN_GRAD_MAG_EPSILON parameter with derived h_neglect-style locals rho_grad_neglect [R L-1] and vel_grad_neglect [T-1] (1e-30 scaled per dimension) CI fixes: - build-openmp: add present_drdx_u/drdx_u/drdz_u and present_drdy_v/drdy_v/drdz_v to the two !$OMP parallel do shared(...) clauses in calc_isoneutral_slopes - check-style-and-docstrings: split multi-variable id_* declarations in MESO_SFN_ANN_CS so each has its own doxygen !< comment Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Act on the broader audit of MOM_meso_sfn_ANN and fix the two remaining CI failures. Audit-driven changes: - Remove the GM_simple and GM_ann model-type branches entirely; they were testing scaffolding. The only production path is nondim_rhoF_ann. This also removes the MESO_SFN_ANN_TYPE runtime parameter, the meso_sfn_ann_model_type CS field, and the now-unused local Kappa. - Add a startup bounds check for MESO_SFN_ANN_WINDOW: the stencil shift must be <= 3 to stay inside the halo=3 provision in meso_sfn_ANN_compute, and the window must be an odd integer. - Set and check the CS%initialized flag so using the module without calling _init produces a clear FATAL instead of later undefined behaviour. - Rename lowercase runtime params to the MOM6 uppercase convention: meso_sfn_ann_coeff/type/window/file -> MESO_SFN_ANN_COEFF, MESO_SFN_ANN_WINDOW, MESO_SFN_ANN_FILE (type removed). - Expand the module docstring with a Ford-style summary of the pipeline and an explicit warning that the dimensionalization in _compute is an implicit contract with the training procedure. - Document the extended stencil bounds in the corner-point velocity gradient loop. CI fixes: - build-openmp: add Sfn_unlim_u_3D / Sfn_unlim_v_3D to the shared(...) clauses of the two !$OMP parallel do blocks in thickness_diffuse_full. - check-style-and-docstrings: wrap the Fy_c declaration comment onto a second line so the line stays under the 120-char limit. Breaking parameter changes (branch not yet released): MESO_SFN_ANN_TYPE removed; the four lowercase params listed above renamed to uppercase. Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use faster ANN inference interface (ANN_apply_array_sio). 18% acceleration, regression is the same
The tests passed: test.grid test.layout test.restart test.openmp test.nan test.dim 1) Trailing space is removed 2) Physical dimensionality is added to all float variables 3) Loop bounds are adjusted to get rid of out of bound memory access and pass test.grid 4) Scaling with physical units (i.e., structure US) is adjusted in multiple places to pass test.dim 5) Maximum allowed window size is limited to 3, which was tested now. Larger windows will likely needed calling pass_var (not yet implemented)
…Bs parameterizations
Default values of limiting parameters are adjusted based on the diagnosis of CESM ocean model output. MESO_UPSILON_CLAMP reduced from 1e4 m2 s-1 to 15 m2 s-1. This choice guarantees that the maximum overturning per one grid point and water column is approximately 1 Sv on CESM grid (before limiting and smoothing performed in MOM_thickness_diffuse.F90). The previous default value could allow up to ~600 Sv per grid point, which is an enormously large number. MESO_SFN_FLUX_CLAMP, which is ~100 kg m-2 s-1, is likely never activated. Still, this number is reasonable as maximum predicted fluxes in CESM are ~20 kg m-2 s-1. MESO_SFN_MAG_GRAD_FLOOR, which is 1.0e-10 kg m-4, is a very realistic number for CESM. The density gradients are approximately in a range from 1e-10 up to 1e-1, with a couple of grid points having 1e-12.
Adding an exchange of halo points for the stream function predicted by the neural network. This halo exchange has no effect in simple geometries, but gaurantees Temperature/Salinity conservation on a tripolar grid, where rotational invariance is required to achieve conservation through the pole. Exact rotational invariance is not supported by the neural network. Rotational invariance is learned from data approximately, and is preserved less strictly than the numerical precision. Exchange of a streamfunction guarantees that there is no jump in the thickness flux at the grid point adjacent to the pole.
Address Hallberg's review comments about opaque-type encapsulation (NOAA-GFDL#1109). - MOM.F90: Drop the if/else around the two thickness_diffuse calls and pass u, v unconditionally. The previous gating on CS%thickness_diffuse_CSp%use_meso_sfn_ANN peeked inside the opaque thickness_diffuse_CS type. The runtime check inside thickness_diffuse at line 503 (CS%use_meso_sfn_ANN .and. present_vel) still gates the ANN call correctly; present_vel is now always true but the conjunction is harmless. - MOM_thickness_diffuse.F90: Make use_meso_sfn_ANN private (was incorrectly declared public). No external readers remain after the MOM.F90 refactor. Expected to be bitwise identical based on inspection of the runtime gate at MOM_thickness_diffuse.F90:503 (the conjunction CS%use_meso_sfn_ANN .and. present_vel still routes the same way). Not locally verified by test.repro / test.regression; CI will check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Hallberg's review feedback on NOAA-GFDL#1109: emphasize in the Doxygen comments that the new drdx_u and drdy_v output arguments of calc_isoneutral_slopes are evaluated along surfaces of constant z (not along isopycnals or model interfaces). All answers are bitwise identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Hallberg's style review on NOAA-GFDL#1109: - Loop variables: rename lowercase k -> K on interface-range loops (k=2,nz and k=1,nz+1) in meso_sfn_ANN_compute, center_grad_rho, and center2uv. Rename lowercase j -> J on v-point loops (j=js-1,je with v-point array access in the body). Update one inner u-loop (do i=is-1,ie -> do I=is-1,ie) to match the body's I usage. - Array indexing: flip interface-staggered array accesses (drdx_c, drdy_c, drdx_u, drdy_v, drdz_u, drdz_v, Fx_c, Fy_c, Fx_u, Fy_v, sfn_u, sfn_v, var1_c, var2_c, var1_u, var2_v) to use uppercase K on their third index. Layer-staggered velocity arrays (dudx, dudy, dvdx, dvdy) keep lowercase k to signal "the layer below interface K". - mag_grad expression: rewrite from the FMA-paranoid (US%Z_to_L*drd_*)*(US%Z_to_L*drd_*) form to Hallberg's preferred US%Z_to_L**2*drd_***2 form (he explicitly noted the extra parentheses are not needed for FMA/symmetry). Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. The case-only changes are bit-identical by language definition (Fortran is case-insensitive). The simplified mag_grad expression is bit-different under FMA-aware compilers. Since the feature defaults to .false. and has not been released, no _BUG flag is required. Not locally verified; CI will check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Hallberg's follow-up comments on NOAA-GFDL#1109: - Drop `, optional` from the u and v dummy arguments of thickness_diffuse. Both callers in MOM.F90 unconditionally pass u,v after the previous opaque-type refactor (98038be), so the optional attribute is no longer needed. - Remove the now-redundant `present_vel` variable (declaration and assignment), and simplify the ANN call gate from `if (CS%use_meso_sfn_ANN .and. present_vel)` to `if (CS%use_meso_sfn_ANN)`. - Restore the truncated `v` Doxygen description and tighten whitespace on both u and v declaration lines. All answers are bitwise identical: `present_vel` was always true after the prior refactor, so the conjunction always took the same branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Hallberg's review on NOAA-GFDL#1109: avoid the cost of the wider halo in find_eta for default users. The ANN streamfunction needs the interface elevation `e` with a wider halo to support its stencil. Previously the halo bump was unconditional. Now it is gated on `CS%use_meso_sfn_ANN` so that default users (USE_THICKNESS_DIFFUSE_ANN=False) retain the original `halo_size=1` call. Default-off users now match upstream dev/gfdl exactly at this call site. ANN-on users are unchanged relative to the previous branch state. Hallberg also recommends verifying halo updates by comparing runs at different MPI process counts (e.g., mpirun -n 63 vs -n 64); that test has not yet been run. CI may catch some issues; the cross-process-count check is a separate validation step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
83fdbf6 to
2295ee2
Compare
|
Thank you for checking everything and all the feedback. Also important to mention that none of this work would have been possible without @Pperezhogin, he helped a lot in both the development path and getting the initial PR ready. |
|
@Hallberg-NOAA Could you make sure this line appears at end of the squash commit, to ensure that Pavel is attributed to the commit? |
|
Yes, we will absolutely make sure that Pavel gets the credit he deserves! |
|
This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/30906 with the expected warnings about a new runtime parameter. |
This PR adds a neural-network parameterization of the mesoscale eddy streamfunction as an alternative to (or addition on top of) the standard Gent–McWilliams thickness diffusion. It uses the
MOM_ANN.F90inference framework. This is an extension of the Balwada et al. 2-layer paper to any arbitrary vertical grid where computation of density gradients is possible.This PR was collaboratively authored by Dhruv Balwada and Pavel Perezhogin (@Pperezhogin).
Scientific motivation
-K · slope, with K a chosen diffusivity (or K based on MEKE or some other method).<u'ρ'> / |<ρ_z>|(e.g. Ferrari et al. 2010 eqn 6). In our implementation the ANN is trained to predict this density flux directly, and the predicted flux is converted to a streamfunction by dividing by the local vertical density gradient (with a floor to avoid division by zero).What is added
A new module:
src/parameterizations/lateral/MOM_meso_sfn_ANN.F90— main code for predicting density fluxes using ANN, and converting it to a streamfunction.Modifications:
src/parameterizations/lateral/MOM_thickness_diffuse.F90— adds the ANN streamfunction (Sfn_unlim_u_3D,Sfn_unlim_v_3D) on top of the GM streamfunction. The ANN coexists correctly withUSE_STORED_SLOPESand the SKEBs path.src/core/MOM_isopycnal_slopes.F90—calc_isoneutral_slopesextended to optionally return density gradients (drdx_u,drdy_v,drdz_u,drdz_v) consumed by the ANN.src/core/MOM.F90— init wiring.The trained ANN weights file (netCDF) is hosted in the m2lines/ANN-momentum-buoyancy-mesoscale repository.
How to enable
Other knobs (
MESO_SFN_ANN_COEFF,MESO_SFN_MAG_GRAD_FLOOR,MESO_SFN_FLUX_CLAMP,MESO_UPSILON_CLAMP) have sensible defaults set from CESM-grid diagnosis.We have only ever tested this with FGNV on.
Some runtimes from CESM
AI disclosure
AI tools (including Claude Code) were used in making this contribution — primarily for code-style auditing, doxygen drafting, dimensional-rescaling review, and merge-conflict analysis. All physics, numerical decisions, parameter defaults, and test choices were made by the human authors. We have read
docs/Consortium-policy-on-AI.md.