Skip to content

Prepare for PR to dev/gfdl#6

Open
nikizadehgfdl wants to merge 2043 commits into
nikizadehgfdl:merge-bgc-obc-3_fix_restartfrom
NOAA-GFDL:dev/gfdl
Open

Prepare for PR to dev/gfdl#6
nikizadehgfdl wants to merge 2043 commits into
nikizadehgfdl:merge-bgc-obc-3_fix_restartfrom
NOAA-GFDL:dev/gfdl

Conversation

@nikizadehgfdl
Copy link
Copy Markdown
Owner

No description provided.

nikizadehgfdl pushed a commit that referenced this pull request Aug 12, 2022
Add initialization tests using CS%initialized
@marshallward marshallward force-pushed the dev/gfdl branch 3 times, most recently from 8a66adc to 9de6ce7 Compare September 11, 2023 18:27
marshallward and others added 25 commits July 16, 2025 03:47
This patch rewrites the vertvisc_coef, find_coupling_coef, and
find_coupling_coef_gl90 functions as loops in k-j-i form.  Many
operations previously applied over i-k slices are now applied
concurrently over i-j slices.

This changes (and in some cases reduces) the cache usage, but it greatly
increases the concurrency of the solvers, which should be more favorable
to GPU migration.

In the "profile benchmark" p0 test, the runtime reduced by 8%, so this
new method should be suitable for CPU and GPU.

Several arrays have been promoted to 3D in order to facilitate the 3d
structure:

* hvel, dz_vel, z_i, z_i_gl90
* hvel_shelf, dz_vel_shelf
* a_cpl, a_cpl_gl90
* dz_harm

There is a solution in here which reduces find_coupling_coef to an i-j
operation, which would further reduce the number of 3d arrays and
perhaps improve performance, but at the moment I can't quite get there.

There is also a possibility that arrays like a_cpl, h_ml, could be
replaced with their equivalent arrays in CS such as CS%a_[uv], but this
would require some method to handle the different shapes of a_u and a_v.

Because of the extensive penetration of the loop inversion, there isn't
a practical means of breaking this into multiple commits, and the
simplest approach is to accept it as a single commit.

Additional changes:

* CS is no longer defined as a pointer, and is passed on stack.
  Association checks are also removed

  This enables vectorization of loops which contain CS

* A new function, `touch_ij`, which makes a trivial modification to loop
  indices i and j, is called before the u and v stages of vertvisc_coef.

  The presence of this function seems to force the compiler to inspect
  the dependency of i and j during interprocedural optimization (IPO)
  and find additional optimizations.

  It appears to be responsible for a 4% speedup.

* Most operations on a_cpl_gl90 are now conditional, rather than
  assuming that the array is zero when GL90 is disabled.

This is not the final word on vertvisc_coef optimization, but I think
it's good enough to move forward.
  Avoid a possible segmentation fault and the possibility of flipping the signs
of the normal velocities multiple times in rotate_OBC_segment_data(), depending
on the kinds of input OBC data that are being read in from a file.  The problem
with duplicate velocity sign changes would occur if both tidal and non-tidal
velocities are being read from a file.  These bugs only impact cases with
certain types of OBCs being specified from a file and with grid rotation turned
on.  It seems that such cases never worked with grid rotation before, so these
bugs are being fixed directly rather than via a runtime parameter.  This commit
will change answers with grid rotation in some cases with OBCs, but answers are
bitwise identical in all cases without OBCs or without grid rotation enabled.
  Moved the OBC initialization code that had previously been in
MOM_initialize_state() into the new routine MOM_initialize_OBCs().  This is a
first step toward further refactoring that eventually do all of the
initialization of the OBCs no the final rotated grid to avoid a confusing set of
duplicated pointers and extra function calls.  All answers are bitwise identical
but there is a new publicly visible interface and new calls to it from within
initialize_MOM.
  Moved various calls out of MOM_initialize_OBCs to simplify the OBC
initialization code and the amount of rotation-specific OBC code that is needed.
The call to initialize_segment_data is moved into initialize_MOM adjacent to
other OBC configuration code that depends on the vertical grid.  The call to
update_OBC_segment_data now occurs later in initialize_MOM and is now shared
between the cases where there is and is not grid rotation in use.  There is no
longer any need to retain set_tracer_data, so it has been removed.  In addition,
changes were made to supercritical_set_OBC_data to support rotated grids.  All
answers are bitwise identical.
  Call MOM_initialize_OBCs() only with the rotated OBC type after the
initialization and potentially rotation of the overall model state in a single
call that is used regardless of whether index rotation is being used, rather
than having two separate calls to MOM_initialize_OBCs() depending on whether
grid rotation is in use.  All answers are bitwise identical.
  Split rotate_OBC_init() into rotate_OBC_segment_fields() and
rotate_OBC_segment_tracer_registry(), and also split out
rotate_OBC_segment_tracer_data() from rotate_OBC_segment_data(), so that the
segment tracer registry can be handled separately and at different points in the
code from the other information on the OBC segment types.  In addition, the
calls to these two new interfaces have been moved much earlier in initialize_MOM
to sit next to other related OBC code.  This is being done in anticipation that
the tracer registries need not be dealt with on the unrotated OBC type when
there is grid rotation and that this will lead to a simplification of the OBC
code and the calls to it.  All answers are bitwise identical, but there are
changes to the public interfaces to the OBC code.
  Eliminate the rotated-index calls to register_temp_salt_segments() and
rotate_OBC_segment_tracer_registry() from iniitalize_MOM() because there is no
longer any use of the segment tracer registries from the unrotated OBC type.
Rotate_OBC_segment_tracer_registry() and rotate_OBC_segment_tracer_data() are no
longer called at all, so they have been eliminated.  This commit also moves
around some OBC-related calls in initialize_MOM to consolidate OBC-related code
blocks, and eliminated the module use statements for some unused routines.  All
answers are bitwise identical, but a publicly visible interface has been
eliminated.
  Added calls to open_boundary_end for the potentially rotated OBC type in
MOM_end() and for the unrotated OBC type after the last time it is used in
initialize_MOM().  The former is standard clean-up at the end of a run, while
the latter will avoid a one-time memory leak when the model is run with
grid-rotation.  All answers are bitwise identical.
  Merged the functionality provided by rotate_OBC_segment_fields() into
rotate_OBC_config() and removed rotate_OBC_segment_fields().  All answers are
bitwise identical but there is one fewer public OBC interface.
  Merged the functionality provided by open_boundary_setup_vert() into
initialize_segment_data() and removed open_boundary_setup_vert().  All answers
are bitwise identical but there is one fewer public OBC interface.
…ded (#936)

* OBC configuration consistency check included

In the subroutine parse_segment_str, a check is now executed to ensure the number of OBC types for each segment is not above 8. The model now stops if more than 8 segments are configured in MOM_input.

* modification of OBC string consistency check

the maximum number of OBC types is now derived from the length of action_str
* Fix issues in MOM_diagnose_KdWork logic

- Fix missing allocation logic for diffusivities needed by idz variables.
- Fix a logic mistake cross-up of ePBL diagnostic id that should have been bkgnd diagnostic id

* Adds code to compute ePBL BBL mstar and bottom boundary layer depth

- Adds output of buoyancy flux from geothermal routine, needed for computing bottom mstar.
- Adds capability into ePBL to use mstar coefficient in BBL, including options to specify the same or different as the surface scheme
- Adds capability to compute bottom mixed layer depth into surface energy based mixed layer depth diagnostic.
- Fixes a discrepency with surface mixed layer depth calculation while preserving old calculation via bug flag.

* Update formatting and rescale ustar_bbl_z_t for non Boussinesq mode in ePBL

* Fix sign error introduced in previous commit

* Wrong ustar BBL was being rescaled in ePBL BBL in prev commit

* ePBL Mstar and bottom MLD commit updates
- ePBL MStar in non Boussinesq scaling updated for converting to ustar in Z_T
- Added more robust PE calculation in MLD calculation with OM4 flag that preserves old answer
- Referenced bottom MLD claculation potential density to bottom pressure instead of surface pressure.

* Fixed sign error in PE anomaly bottom MLD calculation

---------

Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>
  Add the new runtime option MASK_COASTAL_PRESSURE_FORCE to use land masks to
zero out the diagnosed barotropic pressure gradient accelerations at coastal or
land points.  If it is enabled, this changes diagnostics and it improves the
reproducibility of certain debugging checksums, but it does not alter the
solutions themselves.  By default, all answers and diagnostics are bitwise
identical, but there is a new runtime parameter in some MOM_parameter_doc files.
* Update to allow reproducibility for NMME initialization runs.

Uses DEFAULT_ANSWER_DATE <20190101 to reproduce forecast results from older executable.

* Correcting trailing spaces.

* (*) Update to allow for reproducibility of NMME results.

Will change logfiles due to new parameter (REPRODUCE_2018_NMME_ANSWERS) being added.

* Broke two lines into continuation lines due to length exceeding 132 characters previously.

* Fixed trailing spaces
Flags `do_advection` and `do_diabatic` in subroutine step_MOM are
simplified for slightly better readability and efficiency.
If FRAZIL=False tv%frazil is not allocated but its halo might still
be updated in post_diabatic_halo_updates, which will fail.

If FRAZIL=True, no answers are changed.

If FRAZIL=False, no answers from before this bug was introduced are changed.
update MOM6 to its main repo. 20250801 commit
  Updated the default values of 15 runtime parameters, as agreed upon in a MOM6
consortium wide conversations on July 29, 2024 and December 16, 2024.  The most
prominent of these is that the default equation of state is now EQN_OF_STATE =
"WRIGHT_FULL", in place of the buggy previous default of "WRIGHT".

  The 8 answer date parameters REGRIDDING_ANSWER_DATE, TIDES_ANSWER_DATE,
WAVE_INTERFACE_ANSWER_DATE, MEKE_GM_SRC_ANSWER_DATE, NDIFF_ANSWER_DATE,
KPP%ANSWER_DATE, HOR_DIFF_ANSWER_DATE and LOTW_BBL_ANSWER_DATE all now take
their default values from DEFAULT_ANSWER_DATE.

  The bug-retention parameters MEKE_GM_SRC_ALT_SLOPE_BUG, HOR_DIFF_LIMIT_BUG,
BACKSCATTER_UNDERBOUND, DETERMINE_TEMP_CONVERGENCE_BUG, LA_MISALIGNMENT_BUG and
IDL_HURR_SCM_EDGE_TAPER_BUG are now false by default.

  The MOM_input files for the test cases in the `.testing/tc[01234]` directories
were updated to explicitly set all of these parameters to their previous default
values if they are used in the relevant cases, as well as parameters that will
be change following discussions from June 2, 2205.

  Bitwise identical answers are recovered if all 15 of these parameters are set
explicitly, but answers can change if any of these parameters take default
values.  All of these default changes are the consensus decision of
consortium-wise MOM6 dev calls.
  Updated the default values of 6 runtime parameters, as agreed upon in a MOM6
consortium wide conversations on June 2, 2025. VISC_REM_BUG and FRICTWORK_BUG
are now false by default. MASS_WEIGHT_IN_PRESSURE_GRADIENT_TOP now takes its
default value from the setting for MASS_WEIGHT_IN_PRESSURE_GRADIENT.  Similarly,
the default for DRAG_DIFFUSIVITY_ANSWER_DATE is now set to follow
SET_DIFF_ANSWER_DATE. KELVIN_WAVE_VEL_NUDGING_TIMESCALE is now set to a negative
value by default, forcing the user to provide a valid value at runtime.
SHELFWAVE_CORRECT_AMPLITUDE now defaults to True.
  Obsoleted 7 runtime parameters that always take the same value, as agreed upon
in a MOM6 consortium wide conversations on June 2, 2025, and added tests to
catch these parameters in MOM_obsolete_params.  The parameters that were
obsoleted include BETTER_BOUND_KH, BETTER_BOUND_AH, USE_DIABATIC_TIME_BUG,
FIX_UNSPLIT_DT_VISC_BUG, CFL_BASED_TRUNCATIONS and KD_BACKGROUND_VIA_KDML_BUG.
The runtime parameter MAXVEL still exists, but it is logged in a different order
than before, changing the contents of the MOM_parameter_doc files. All answers
are bitwise identical in cases that run, but cases that use these parameters
will experience a fatal error.
* (*) Add option for minmod limiter for RayTracing

This PR adds a minmod limiter option for the advection
of the internal tides energy, which becomes the new default.
The current positive definite scheme is kept as an option
but has shown to create ripples at the grid scale.

* change adv_limiter options from string to integer

---------

Co-authored-by: Raphael Dussin <Raphael.Dussin@noaa.gov>
* (*) Multiple fixes for the ray tracing

- Solve the issue of rays not propagating through the northfold: the
use of pass_vector for speed_[x/y] is not appropriate since these arrays
are meant to be scalar and the direction is contained in the angle dimension
of energy. changing to regular pass_var at the appropriate cell location fixes it

- The energy gets trapped at critical latitude: this PR introduces 2 options for
energy propagation, either propagate along the critical latitude or reflect on it.
These are controlled by TURN_CRITICAL_LAT (False: get trapped, True: do something)
and REFLECT_CRITICAL_LAT (True: reflect like a solid wall, False: propagate along)

- Several divisions by constant number were eliminated

* (*) Multiple fixes for the ray tracing

- Solve the issue of rays not propagating through the northfold: the
use of pass_vector for speed_[x/y] is not appropriate since these arrays
are meant to be scalar and the direction is contained in the angle dimension
of energy. changing to regular pass_var at the appropriate cell location fixes it

- The energy gets trapped at critical latitude: this PR introduces 2 options for
energy propagation, either propagate along the critical latitude or reflect on it.
These are controlled by TURN_CRITICAL_LAT (False: get trapped, True: do something)
and REFLECT_CRITICAL_LAT (True: reflect like a solid wall, False: propagate along)

- Several divisions by constant number were eliminated

* add call to turning latitude in propagate_x

This should satisfy the rotational symmetry.

As expected, this has no impact on global case
since reflected energy from propagate_x then
does not need to be reflected in propagate_y.

* update units order

---------

Co-authored-by: Raphael Dussin <Raphael.Dussin@noaa.gov>
When debugging layout non-reproducible problems, this hchksum showed
up as having changes in the halo BUT it is immediately followed by
a pass_var(). Either we need to only check the C-domain or do hchksum
with halos after the pass_var().
chrisb13 and others added 30 commits April 20, 2026 10:08
Add three new documentation files and supporting pointer files to
guide both human contributors and AI coding assistants:

- docs/Consortium-policy-on-AI.md : Project policy on AI-assisted
contributions, authored by Hallberg and Ward. Establishes that a human
contributor is accountable for all submitted code, requires disclosure
of AI assistance in PRs, and catalogs specific risks (physical
correctness, comprehension debt, vacuous tests, code churn, unsafe
code). Permits AI for development assistance, debugging, documentation,
CI triage, and PR review; prohibits autonomous contribution without
human oversight.

- docs/AGENTS.md : Technical instructions for AI coding assistants
covering the parameter system (get_param, _BUG flags, ANSWER_DATE),
diagnostics (registration, posting, masking conventions), the test
suite (.testing/ with tc0–tc4, test categories, verification via
bitwise-identical checksums), git workflow (fork-based branching, commit
message conventions with */+ prefixes, PR description style), physics
domain knowledge, common development tasks (new parameterizations,
diagnostics, runtime parameters, bug fixes), architecture constraints
(infra/framework layering), and defensive programming practices.

- docs/Code-style.md : Fortran code style guide migrated from the wiki
and extended with sections on module structure, naming conventions,
memory macros (SZI_, SZIB_, etc.), the dimensional unit annotation
system ([rescaled ~> MKS]), and Doxygen documentation requirements.
The line-length guideline is set to 120 characters to match what
trailer.py and the GitHub Action enforce. Code-style.md is excluded
from doxygen INPUT because its example code blocks contain \namespace
and #include which doxygen parses as commands even inside fenced code
blocks, producing spurious warnings that fail CI.

Pointer files so AI tools auto-discover the conventions:
- CLAUDE.md (Claude Code) — uses @docs/AGENTS.md directive
- AGENTS.md (OpenAI Codex / OpenCode) — links to docs/AGENTS.md
- .github/copilot-instructions.md (GitHub Copilot) — links to
  docs/AGENTS.md

Also updates existing documentation:
- README.md: expanded project description, added resource links
  (developer wiki, forum, CVMix, TEOS-10, GOTM)
- docs/code_organization.rst: updated directory tree to reflect current
  layout (added FMS2, nuopc_cap, timing_tests, unit_tests; removed
  mct_cap; added driver descriptions)
- docs/about.rst: fixed typos
- docs/front_page.md: link to Code-style.md now points to GitHub
  (mom-ocean/MOM6) since the file is no longer in the doxygen output
- docs/README.md: pointed Doxygen style references to docs/Code-style.md
  instead of the retired wiki pages
- docs/Doxyfile_*: added Consortium-policy-on-AI.md to doxygen INPUT

And, for full disclosure, copilot/Claude Opus 4.6 was used to reconcile
the redundant or conflicting information between our first draft of the
docs (on google drive) and files in the repository.
#1079)

This PR corrects an implementation of a finite-element solver for ice-sheet/shelf velocity. It computes a Jacobian weight J_q in shape functions for quadrature points and applies it to integrals over cells. This correction is significant when the shape of the cell is not rectangular. The same corrections are applied for subgrid parameterizations.

Additionally, it fixes missing deallocations of several arrays.

Changes made:
 - nitialize_ice_shelf_dyn: allocates a new variable `Jac`
 - bilinear_shape_fn_grid: computes a Jacobian determinant (`Jac`) at each quadrature point
 - CG_action: uses `jac_wt= Jac * IareaT` for computing the ice deformation and basal traction terms
 - matrix_diagonal: uses `jac_wt= Jac * IareaT` for computing diagonal terms
 - CG_action_subgrid_basal: computes `jac_sub_wt` using new arguments `dxCv_S`, `dxCv_N`,`dyCv_E` and `dyCv_W` that determine cell-edge spacing
 - ice_shelf_dyn_end: deallocates `Jac`, `CS%sx_shelf`, `CS%sy_shelf`, `CS%u_flux_bndry_val`, `CS%v_flux_bndry_val`,`CS%calv_mask`, `CS%Phi`, `CS%Phisub`, `CS%PhiC`.

Copilot (model Claude Sonet 4.6) assisted with this PR.
…1089)

* Vendor sphinxcontrib-autodoc_doxygen into docs/_ext/

This is piece 1 of the sphinx toolchain upgrade.  The docs build
previously depended on four forks which haven't been updated since 2020.

The chain pinned the entire build to Sphinx 3.2.1, requiring a growing
list of transitive version ceilings (jinja2<3.1, sphinxcontrib_*<1.0.x,
alabaster<0.7.14, setuptools<82).

This commit handles the sphinxcontrib-autodoc_doxygen dependency. The
upstream (rmcgibbo) has been dormant since June 2021; the fork was
effectively a rewrite (~90% of xmlutils.py changed, both documenters
replaced, the whole domain switched from cpp to f). Monkey-patching
the upstream package was rejected because there was nothing stable to
patch against. Instead the fork's code is vendored in-tree at
docs/_ext/autodoc_doxygen/ so it can be debugged, blamed, and edited
like any other project source.

Changes vs the fork at tag 0.7.13:

 - Renamed doxynamespace.rst template to doxymodule.rst and dropped
   the unused C++ doxyclass.rst (DoxygenClassDocumenter was already
   unregistered in the fork).
 - Removed ~80 lines of dead code: commented-out pdb.set_trace lines,
   the unregistered DoxygenClassDocumenter class, the dead
   visit_ref_angus alternative, _import_by_name_original, and the
   unused try-import of flint.
 - Dropped Python 2 compatibility cruft: from __future__ imports,
   from six import itervalues (replaced with dict.values()).
 - Ported four Sphinx 3 -> 8 API changes:
     * DoxygenAutosummary.get_items now passes self.bridge (the
       DocumenterBridge) to documenter constructors instead of self,
       so Documenter.__init__ finds directive.genopt.
     * self.warn / self.directive.warn calls (removed in Sphinx 8)
       replaced with sphinx.util.logging.getLogger(...).warning.
     * Wrapped env.doc2path in os.fspath to silence the
       RemovedInSphinx90Warning about str paths.

docs/conf.py: prepend _ext to sys.path and rename the extension in
the extensions list from sphinxcontrib.autodoc_doxygen to
autodoc_doxygen.

docs/conf.py also carries a monkey-patch for a parallel-build bug in
upstream VACUMM/sphinx-fortran's FortranDomain.merge_domaindata. The
upstream method references an undefined `outNames` (typo for
`ourNames`) and also unpacks the wrong tuple shape from modules and
objects dicts. With -j > 1 the merge silently fails and the f domain
ends up empty, losing every Fortran cross-reference. The patch is
heavily commented with a TODO to submit an upstream PR and drop the
workaround during piece 2 of the upgrade.

docs/requirements.txt: drop the sphinxcontrib-autodoc_doxygen git
dependency. The other three forks remain pending pieces 2, 3, 4.

Validation: built against a new venv.sph8 (Python 3.11 + Sphinx 8.2.3 +
sphinx-rtd-theme 3.1.0 + sphinxcontrib-bibtex 2.6.5 + upstream
VACUMM/sphinx-fortran master). `make html` produces 220 warnings vs
224 on the Sphinx 3 baseline. The f domain indexes 48 modules and
538 objects. Equation pages are effectively byte-identical (off only
by Sphinx 8's section-id hoist optimization and the pilcrow-as-CSS
theme change). Module pages are slightly larger because more
cross-references now resolve than in the baseline build.

* Swap sphinx-fortran fork for upstream VACUMM pinned commit

Piece 2 of the sphinx toolchain upgrade.

The jr3cermak/sphinx-fortran@1.2.2 fork existed to add Sphinx 3 API
compatibility in 2020. Upstream VACUMM/sphinx-fortran has continued
evolving since then (Sphinx logging API migration, parallel read
support, modern setuptools, removal of the future dependency) and now
works with Sphinx 8 directly. Upstream has not cut a PyPI release past
1.1.1 so we pin to a specific master commit for reproducibility.

The pinned commit ships a broken FortranDomain.merge_domaindata that
loses every f-domain object in parallel builds; the monkey-patch in
conf.py from the previous commit handles that. We will revisit that
patch and the pin together once an upstream PR is merged (see the
TODO(piece-2) marker in conf.py setup).

Validation: rebuilt docs with the pinned commit installed into
venv.sph8. `make html` produces 220 warnings (identical to the
pre-swap sph8 build), the f domain indexes 48 modules and 538 objects,
mom.html has 36 internal cross-reference links, and f-modindex.html is
generated. No behavior change from the sph8 build validated in the
previous commit.

* Drop sphinx fork for stock Sphinx 8

Piece 3 of the docs toolchain upgrade plan.
This commit removes the last need for the patched sphinx fork and
collapses the long list of transitive version ceilings it forced.

Changes:

 - requirements.txt: drop git+https://github.com/jr3cermak/sphinx.git
   @v3.2.1mom6.4. Pin sphinx>=8,<9 from PyPI. Drop the eight
   ceiling lines (jinja2<3.1, sphinxcontrib_*<1.0.x, alabaster<0.7.14,
   setuptools<82.0.0) and the workaround comment around sphinx-fortran's
   broken requirements.txt; those existed only because Sphinx 3.2.1 dragged
   them in. Add lxml as an explicit dep (it is required by the vendored
   _ext/autodoc_doxygen extension to parse Doxygen XML). Keep six because
   the pinned sphinx-fortran commit imports it at module load time.

 - conf.py: add a monkey-patch for sphinx.util.math.wrap_displaymath that
   replicates the functional changes in the jr3cermak fork. The patch
   detects parts containing `\begin{equation}`, `\begin{eqnarray}`, or
   `\begin{align}` and emits them verbatim (no outer
   `\begin{equation}\begin{split}...\end{split} \end{equation}` wrapping).
   Without this, MOM6's math-heavy sources produce nested LaTeX
   environments that pdflatex chokes on. The patch is heavily commented
   with the rationale and a TODO marker for a possible upstream
   contribution. Verified by unit-testing the function directly (plain
   math still wrapped, explicit-environment math passed through verbatim)
   and by inspecting the generated _build/latex/ MOM6.tex: 0 double-wrapped
   equation/eqnarray/align blocks, 100 + 18 + 17 explicit environments
   preserved as top-level blocks.

 - conf.py: tighten exclude_patterns. Sphinx walks the entire source
   tree by default, which previously caused it to descend into local
   virtualenvs (`venv.sph3/`, `venv.sph8/`, `venv-3.11/`, etc.) and
   pick up LICENSE.rst, README.rst, and stray autosummary template
   files from inside site-packages. Each of those generated a "document
   isn't included in any toctree" warning and a corresponding stray
   .html file in the build output. The new exclusions cover venv*,
   venv-*, _build.*, and _ext/*/templates. Final build: 90 HTML files
   (down from 148, the lost 58 were all venv junk and template
   scaffolding) and 122 warnings (down from 220, with the same project-
   internal warnings as before plus the venv noise removed).

 - Makefile: add the equation post-processing hook to the html target,
   gated on UPDATEHTMLEQS=Y. The jr3cermak/sphinx fork carried this
   as a patch to sphinx/cmd/build.py so it ran after every sphinx-build
   invocation. Now invoked from the Makefile so it works against
   stock upstream Sphinx. The existing nortd target already had the
   same hook with different arguments (-p APIs -b doxygen vs -p html
   -b sphinx).

The forked sphinx-fortran fix and the upstream PR for the math wrapping
change are deferred to a follow-up.

Validation: built from a fresh venv (venv.sph8.fresh) created by
`python3.11 -m venv` and `pip install -r requirements.txt`. The
install resolves cleanly with no manual intervention. `make html`
succeeds with 122 warnings, indexes 48 modules and 538 f-domain
objects, generates f-modindex.html, and produces 36 internal cross-
reference links on mom.html. `make html UPDATEHTMLEQS=Y` runs the
post-processing script successfully. `make latex` produces 0
double-wrapped math environments.

* Drop flint dependency from docs build

Piece 4 of the docs toolchain upgrade plan.
With this commit the dependencies in docs/requirements.txt are only on
maintained upstream sources.

flint (marshallward/flint, vendored as jr3cermak/flint at 0.0.1) was
historically pulled into the docs build with the intent of patching
up Doxygen's incomplete parsing of Fortran functions with `result()`
clauses. The fork's sphinxcontrib-autodoc_doxygen imported it at
module load time but never actually called it; the import was a
no-op behind a `try: import flint; except: pass`. We removed that
import as part of piece 1's dead-code cleanup, so the vendored
extension no longer touches flint at all.

Verified empirically: uninstalled flint from venv.sph8.fresh and
rebuilt. The build is byte-identical to the previous one with flint
installed (122 warnings, 48 f-modules, 538 f-objects, 36 internal
cross-reference links on mom.html, f-modindex.html present, 90 html
files). Nothing in the current pipeline depends on flint.

The two surviving "flint" mentions in docs/_ext/autodoc_doxygen/ are
both in comments describing the original intent, not live code; they
are left in place as historical context. If a future enhancement
wants to actually fix Doxygen's result() clause parsing, that should
be a separate piece of work and would not necessarily resurrect
flint specifically.

docs/README.md is updated to drop the flint bullet from the
requirements list and the flint row from the credits table. Other
stale entries in that section (sphinxcontrib_autodox-doxygen, the
sphinx and sphinx-fortran fork rows) remain and will need a broader
README sweep in a follow-up.

* Sphinx upgrade follow-up fixes

Consolidates four small fixes that followed the Sphinx 8 upgrade:

- Switch all Makefile targets from `sphinx-build -b` to `-M` for
  consistent parallel build support.
- Update docs/README.md to match the modernized toolchain (remove
  references to the four sphinx* forks, update requirements and
  credits sections).
- Fix crash in autodoc_doxygen visit_image on empty <image>
  elements (node.text is None when doxygen produces <image/>
  without caption text).
- Add __pycache__ and *.pyc to docs/.gitignore so the vendored
  extension's bytecode does not appear in git status.

* RTD build infrastructure

Consolidates four fixes for the Read the Docs build environment:

- Override RTD's default sphinx runner with a build.jobs.build.html
  entry in .readthedocs.yml that runs sphinx-build -M html -j auto,
  since RTD's high-level sphinx: key has no parallelism option.
- Make doxygen_xml path resolution robust to cwd changes: resolve
  relative paths against app.confdir rather than the ambient cwd,
  which differs between local builds and RTD.
- Switch Makefile html target from -j 4 to -j auto to match RTD.
- Enable html_static_path = ['_static'] in conf.py so custom CSS
  files (autodoxysource.css) are copied into the build output.
  Previously commented out, which meant app.add_css_file() added
  the <link> tag but the file was never deployed.

* Fix O(N^2) XPath scans in autodoc_doxygen

Two instances of the same class of bug in the vendored
autodoc_doxygen extension, both scanning the entire merged doxygen
XML tree (1230 files, 109 MB with programlisting) on every call:

1. scanNode used `node.xpath('//latexonly')` etc. In XPath, `//`
   starts from the document root, not from `node`. Since the
   extension merges all XML into one tree, every call scanned
   the whole tree. Fix: `//` -> `.//` (descendant-of-self).
   Recovery: 6m33s -> 48.8s wall clock at full input scale.

2. visit_ref used `get_doxygen_root().findall('.//*[@id=X]')` to
   resolve each prose <ref> by linearly scanning every element in
   the merged tree. With XML_PROGRAMLISTING=YES the tree tripled
   in size, making this the dominant cost (250s of 911s serial).
   Fix: lazy {id: element} dict built once on first use, O(1)
   lookup thereafter. Recovery: 911s -> 676s serial.

* Add source code browser to Sphinx docs

Generate one HTML page per Fortran source file with syntax
highlighting and clickable cross-references. Clicking an
identifier in the source jumps to its API documentation page;
each API entry gets a [source] link back to the source listing.

Implementation:
- Enable XML_PROGRAMLISTING in Doxyfile_rtd (standalone commit
  originally, now folded in).
- New autodoxysource directive in _ext/autodoc_doxygen/ that walks
  doxygen's <programlisting> XML, emitting per-line anchors,
  CSS-classed highlight spans, and pending_xref nodes for
  identifiers.
- Stub generator produces 329 :orphan: source pages under
  api/generated/source/.
- [source] links added to DoxygenMethodDocumenter,
  DoxygenTypeDocumenter, and DoxygenModuleDocumenter.
- CSS styled to match the sphinx_rtd_theme's code blocks (font
  stack, size, colors, line gutter).
- Node count optimization: coalesce text runs in _walk_highlight
  (10-30 nodes/line -> 3-5), set support_smartquotes=False on the
  source container to skip the smartquotes transform. Combined
  with the visit_ref fix, total build time with the source browser
  is 115s parallel (vs 360s before optimization).

* Improve API documentation

Consolidates five improvements to the rendered API pages:

- Show Fortran type declarations for function/subroutine
  parameters and derived-type members. Types are rendered as
  inline code (e.g. ``real, intent(in), optional``) by looking
  up the <param> elements on the parent <memberdef>.
- Move [source] links from the top of each entry to the bottom,
  so the description and parameters appear first.
- Fix functions (as opposed to subroutines) not being registered
  in the sphinx-fortran domain. format_name() was prepending the
  return type (e.g. "real") which sphinx-fortran's f_sig_re regex
  cannot parse, leaving function entries without anchors.
- Add Functions and Source Files index pages to the API Reference,
  with cross-reference links to module pages and source browser
  pages respectively.
- Exclude upstream CVMix sources from doxygen input to avoid
  warnings from undocumented third-party code.

* Fix multi-line math blocks losing indentation in LaTeX

Multi-line math content (e.g. \begin{align}...\end{align}) from
doxygen XML was emitted as:

    .. math:: \begin{align}
    \mathbf{v}
      = \mathbf{u} + ...
     \end{align}

RST requires directive body content to be indented. The
continuation lines at column 0 were parsed as regular text by
docutils, producing garbled LaTeX output with "Runaway argument"
and "Missing $" errors — 164 LaTeX errors on the Notation page
alone.

Fix: in visit_formula, detect multi-line math text and emit it
as a properly indented block:

    .. math::

       \begin{align}
       \mathbf{v}
         = \mathbf{u} + ...
       \end{align}

Single-line math is unchanged (stays on the .. math:: line).

After fix: `make latexpdf` LaTeX errors drop from 164 to 10.
The remaining 10 are unrelated (9 Unicode Greek characters in
source comments that pdflatex cannot render, 1 duplicate label).
Under optimization -O2 with ifort we had some floating-point divergences
in the new Recon1d schemes. Two schemes diverged from their counterparts
that use the old "select" pathways. One scheme failed self-consistency
tests due to inlining allowing some aggressive optimization to occur.

Recon1d_PPM_H4_2019:
- Add explicit parentheses in end_value_h4() to enforce evaluation order
  and ensure bit-reproducibility across compilers.

Recon1d_PPM_hybgen:
- Refactor reconstruct() into a cleaner two-pass approach using new
  private helpers (hybgen_ppm_coefs, bound_edge_values,
  check_discontinuous_edge_values) copied from phased-out modules to
  avoid external dependencies. Bit-for-bit with the old PPM_HYBGEN scheme.
- Add average() override that clamps sub-cell averages to
  [min(ul,ur), max(ul,ur)], reproducing force_bounds_in_subcell behaviour.

Recon1d_PLM_WLS:
- Fix LS_error(): remove 0.5/0.25 factors and reformulate as squared
  deviation from the optimal slope.
- Also updated Doxygen math which had some errors.
- In check_reconstruction(), copy state via assignment (perturbed = this)
  since an inline call to %reconstruct() on same data was yielding LSB
  differences.
* More fixes to achieve consistent rotation with FMAs

* New Krylov methods for ice-shelf velocity

Added MINRES and CR methods to solve the shallow shelf approximation.
Further optimized CG. The new CG is answer changing, but only because
IDIAGU/IDIAGv reciprocals are now procomputed to replace per-iteration
divisions with multiplications. MINRES and CR can be faster than CG.
Also added CG_HALO_SHRINK parameter to disable the halo-shrinking
scheme for CG; combined with the other changes, this reduces the
number of pass_vector calls to once per iteration, rather than 4
pass_vector calls every nihalo iterations. Disabling the
halo-shrinking can be faster for certain combinations of grid
properties (halo size, symmetric/nonsymmetric, grid size).

* Remove unneeded line

Removed a line that caused an error when compiling in debug mode,
where a variable was being set to the value of another variable that
was not set yet. This line was not needed.

* Add wrapper for ice-shelf Krylov methods

Added a wrapper subroutine that performs the shared setup for each of
the Krylov methods, calls the selected Krylov method, and assigns
BCs. Also: made sure the number of iters is returned from each method
in the unlikely case of an early exit; slight change to where CG
convergence is tested to avoid evaluating extra unneeded code.

* Fix ice-shelf inner solver units. Change exit point for MINRES for efficiency.

* Fix units for residual scaling within inner solvers

* Fix formatting
* Add register restart for taux_bot and tauy_bot with SPLIT_BOTTOM_STRESS
- taux_bot and tauy_bot need to be in the restart file when using SPLIT_BOTTOM_STRESS, but they presently are not.
- This commit adds them to the restart files in MOM_dynamics_split_RK2.

* Adds restart taux(y)_bot capability from RK2 to RK2b

* Fix formatting issues in MOM_dynamics_split_RK2b.F90

* Fixes for allocating bottom stress in static memory
- As suggest by R. Hallberg and copilot, the ALLOC_ macros should not have been used on taux/y_bot, so it has been replaced with explicit allocate statements.
- Fixed some minor formatting issues caught by copilot.

* Fix issue with deallocating taux/y_bot

---------

Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>
Co-authored-by: Alistair Adcroft <adcroft@users.noreply.github.com>
  Refactored downsample_field_3d() and downsample_field_2d() and added the
option to use a rotationally symmetric sum when downsampling diagnostic output.
The new runtime parameter SYMMETRIC_DOWNSAMPLE_SUMS, triggers the use of a
symmetric order of the sums, either directly or for downsampling by a factor of
4 or more via a call to symmetric_sum().  Internally the refactoring stores the
values that are to be summed in a 2-d or 1-d array before calling the new
internal routine square_sum() or sum_1d().  There are also two comments added
noting where thicknesses are used at velocity points without averaging to
correct for the difference in their staggered locations.  By default, all
solutions and diagnostics are identical, although there is annoying change in
the locations of negative zeros for some diagnostics.  However, when
SYMMETRIC_DOWNSAMPLE_SUMS is set to true, answers are mathematically equivalent
but there are changes at roundoff in diagnostics that are written at 64-bit
precision.
Add AI-assistance policy, agent instructions, and code style docs
This commit addresses an unintentional answer change introduced in
commit 1e9ac39. Subroutine initialize_OBC_segment_reservoirs misses the
flag OBC%update_OBC_seg_data, which would be false during
initialization (maybe a bug itself). This flag defers the BGC tracer
initialization to subroutine fill_obgc_segments, which is called later.
The get_param call for DT_OBC_SEG_UPDATE_OBGC was placed before
MOM_initialize_fixed(), so OBC_in was always NULL() at that point. The
guard do_not_log=.not.associated(OBC_in) therefore suppresses the
parameter from MOM_parameter_doc even in runs with active open boundary
conditions.

Fix by moving the call to after OBC_in is set, right after
MOM_initialize_fixed.
Reworked the SSA FEM shape functions to ensure bitwise identical answers under rotation
  The tracer advection diagnostics had been accumulated using values of the do_i
masks that were not initialized, but this had no logical impact because the
tracer fluxes themselves were zero at any points where the value of the masks
might matter.  To address this, the unnecessary masks were eliminated, instead
relying on the fact that the advective tracer fluxes are already 0 at any points
that would ever be masked out.  Once merged in, this commit will have addressed
the issue noted at github.com//issues/952, which can then be
closed.  All solutions and diagnostics should be bitwise identical (apart from
the annoying possibility of changes in negative zeros).
  Added the optional runtime parameter may_reset to MOM_set_verbosity() to help
regulate when the verbosity can be changed in repeated calls on the same PEs.
Also added a new module variable, verbosity_set, to the MOM_error_handler module
to record whether the verbosity has already been set by a call to
MOM_set_verbosity().  The valid range for verbosity in MOM_set_verbosity was
increase from [1 to 9] up to [0 to 9] to match the description of verbosity in
various comments. In addition, the primary call MOM_set_verbosity() in
initialize_MOM() now includes the new argument with the value that will cause
the MOM6 setting for VERBOSITY to take precedence over any values coming from
other components (like SIS2 in some cases) that also use MOM_error_handler code
and run on the same PEs.  Also extended the description of the VERBOSITY
parameter to note the setting that causes the call tree messages to be printed.
All answers are bitwise identical, but there is a new optional argument to a
publicly visible routine.
* Diag buffer: Set fill_value_3d ks/ke size

The `fill_value_3d` unit test of `diag_buffer_unit_tests_3d` was not
setting its vertical extent, causing inaccurate allocation sizes and
potential segmentation faults.  (Or, in my case, complete memory
consumption and OS meltdowns.)

This patch adds `set_vertical_extent` and uses the same 10-element
width.  This appears to fix the issue for me.

* Diag buffers: Streamline testing

Some unused variables and incorrect comments have been removed from the
unit testing.

Assisted-by: copilot
`cuberoot()` uses the `modulo()` intrinsic to compute the exponent of
the rescaled value of its input `a` in the range [0.125,1).  However,
there are platforms (e.g. NVIDIA GPUs) which do not implement
`modulo()`.

Review of this function also revealed some unnecessary complexity which
was simplified by using the `sign()` intrinsic.

This patch updates the `rescale_cbrt()` implementation, and also
documents the underlying mathematics more carefully.
  Converted the internal subroutine add_xyz_method() in MOM_diag_mediator into
the function xyz_method() for greater generality and code-reuse.  It no longer
takes a diag_type argument, instead working directly with the contents of its
axes_grp type argument.  All answers are bitwise identical, and all publicly
visible interfaces remain the same.
  Set downsampled masks using the same stencils as is used for the masked
downsampling of the underlying data.  This required extensive revisions to the
downsample_mask_2d() and downsample_mask_3d() routines, including a new method
argument to indicate how the data is to be downsampled analogous to the argument
that was already being provided to the downsample_field routines.  All solutions
and primary diagnostics are bitwise identical, but this commit corrects a
glaring inconsistency in the masking of downsampled diagnostics.
  Eliminated the MSK downsampling method and the MSK option code in the
downsample_field() routines because this option did not take in to account the
points that are actually used in downsampling fields that are not at tracer
points.  Instead the downsampled mask is determined by calling
downsample_field() on an array of ones to determine where there are unmasked
downsampled points.  This change only applies in cases where an optional mask
argument is being supplied to a post_data() call (as is the case for some static
fields).  As post_data() is currently used in MOM6, a mask argument only is
passed in for diagnostics at tracer points so it is possible that there are not
actually any instances where this will change any answers.
  At one point, fill_miss_2d had an optional `prev` array that could be used to
fill in values that were not horizontally connected to other points at the same
level, and it issued a warning when that array was not provided.  However,
`prev` was always being provided, so in 2022 as a part of some more extensive
refactoring and clean-up of MOM_horizontal_regridding.F90 this optional argument
was made mandatory, and the extra logical test around the warning block became
identical to the first.  As noted in github.com//issues/929, this
lead to a block of code that would clearly never be reached.  This extra block
of code has now been removed.  All answers are bitwise identical.
  Added an else branch in thickness_diffuse_init to set CS%Use_KH_in_MEKE to
false when USE_MEKE is false, thereby avoiding the possible problems with
segmentation faults due to an uninitialized variable, as documented in
github.com/mom-ocean/issues/1696. It will remain a fatal error if
`USE_KE_IN_MEKE` is in an input file but `USE_MEKE = False` and
`FATAL_UNUSED_PARAMS = True`.  All answers that worked previously will be
bitwise identical, but this could fix problems with a segmentation fault arising
from an uninitialized logical variable.
* Change loop order from m-[ij]-k to m-k-[ij].
* Rename u_L/v_L to L_in/L_out for clarity.
* Add flux_to_res and face_area temporary variables for efficiency
* Hoist dir and OBC%tres_[xy] writes outside the tracer loop.
* Remove unused local variable ntr.
Copy resrv_lfac_in/out from segment%field (looked up by fd_id at
runtime in update_segment_tracer_reservoirs) into the segment tracer
registry. The copy can be populated via new optional arguments added to
register_segment_tracer, which is only used by BGC tracers.

resrv_lfac_in/out is used in subroutine register_segment_tracer, which
updates tracer reservoir. Logically, segment%field is the "raw" data,
internal to OBC module, while segment%tr_Reg is what is seen by the rest
of the model. Storing resrv_lfac_in/out in the registry frees the need
of segment%field data in update_segment_tracer_reservoirs.

The change eliminates the need to use fd_id to identify whether
resrv_lfac_in/out should be the default (1.0). Correspondingly,
fd_index optional arguments is removed from register_segment_tracer,
as its no longer needed.

The change also paves the way to use resrv_lfac_in/out to fold the
special uniform concentration case in the upcoming commit.
  Modified the offline tracer code to always allocate the offline mixed layer
depth when offline tracers are being used, and set it to HMIX_FIXED when
READ_MLD is false.  In addition, MLD_VAR_NAME is now passed to
update_offline_from_files(), whereas it was previously just being set and
ignored.  As a follow up, it might be useful to add the option of setting the
mixed layer in offline tracer mode dynamically using diagnoseMLDbyEnergy() or
diagnoseMLDbyDensityDifference(), but that is beyond the scope of this commit.
This commit addresses the bugs identified at
github.com//issues/1100.  All answers are bitwise identical in all
cases that worked previously, but some offline tracer cases that had been
failing previously with a segmentation fault should work properly now.
…nservation, and adding time integration (#1088)

* Add ability to use different ML depths with brine plume

Codes in 3 options, MLD_003, MLD_EN1, and H_EPBL, all of which can be passed to brine plume param to set the plume depth

* Update brine plume scheme to conserve salt and time integrate

- Brine plume scheme failed dimensional consistency because it was not multiplied by the time step.  The units of plume_flux in the description are updated, which exposes why it failed the time dimension consistency test.
- The brine plume scheme was moved after the application of other fluxes that can add salt/heat/mass to the ocean in applyboundaryfluxesinout.  This change made it easier to track that it only moves salt in the vertical and doesn't create/destroy salt.
- The brine plume salt flux is now multipled by dt to convert from a rate to an amount.  This has a large impact on the behavior, but is approximately compensated by an unrealated bug in SIS2.
- The brine plume scheme did not conserve salt because it could neglect salt if the mixed layer depth wasn't deeper than the center of the bottom grid cell, possibly due to round off.  A new condition is added to insert salt into the bottom layer if it is thick enough to accept salt.  This significantly improves conservation, though round-off level errors are still possible.
- Complete a stub for a brine plume diagnostic, which can be used to verify how the brine plume scheme moves salt in the vertical.
- Adds an option to increase the vertical scale for brine plumes proportionally to the chosen mixed layer depth.
- Updated some parameter names and formatting related to the MLD used in the brine scheme.

* Fix OMP directives for brine plume helpers in MOM_diabatic_aux

* Updates for conservation in brine plume parameterization

- The brine plume salt flux contribution is no longer added to the top level and subtracted later in the brine plume part. Rather, the salt that is added in applyboundaryfluxes now has the brine plume salt flux removed.  The salt flux is then simply applied by adding it back in during the brine plume part.
- The code to handle flux deposition within the mixed layer was rewritten to ensure the salt flux is entirely used by the bottom-most finite thickness layer.
- The flux is now prescribed by analytical integral over the layer instead of midpoint quadrature.
- Debugging options are enhanced to better track and report on salt conservation.

* Make sure salt added/removed in brine plume is always computed

* Fix formatting and OMP directive

* Clean up brine plume commit, standard out, and debug options

* Fix OMP associated with brine plume

* Fix OMP associated with brine plume (again)

---------

Authored-by: brandon.reichl <brandon.reichl@noaa.gov>
  This commit provides the MOM6 diag_mediator with information about the
locations and orientations of any open boundary condition segments at velocity
points, and then uses this to select the correct thicknesses to use for vertical
remapping of diagnostics onto alternative diagnostic grids (e.g., z or rho2).
It also makes a change to properly interpolate the thicknesses that are used as
weights for downsampling velocities.  Previously a one-sided estimate of the
thicknesses was being used, which did not satisfy rotational consistency.

  The specific changes here include:

 . The addition of the new publicly visible routine diag_mediator_set_OBC_info()
   and a call to this routine from initialize_MOM() when OBCs are being used.
   This routine stores information about the location and orientation of OBC
   segments.  It does not need to be called when OBCs are not being used.

 . The addition of two new integer arrays to the diag_ctrl type with values of
   0, -1 or 1, depending on the presence and orientation of OBCs at velocity
   faces.

 . The addition of OBC_u and OBC_v arguments to 3 publicly visible routines in
   MOM_diag_remap (diag_remap_do_remap(), vertically_reintegrate_diag_field()
   and vertically_interpolate_diag_field()) and to their private counterparts
   (do_remap vertically_reintegrate_field() and vertically_interpolate_field()).
   These arrays are used to select the appropriately interpolated thicknesses
   at velocity points to use as weights for remapping.

  In tests of these changes in the loop current test case, the changes to the
remapped velocities and transports at OBC points are subtle but real, while in
the interior (away from OBCs) the diagnostics are unaltered.

  All solutions are bitwise identical and no primary diagnostics are changed,
but there are changes to remapped diagnostics in cases with open boundary
conditions.  Within this commit this a new publicly visible subroutine, several
changes to the interfaces of 3 publicly visible routines and the addition of two
new 2-d arrays to the diagnostics control structure.
* *Add framework support for ANN mesoscale streamfunction

Extend MOM_isopycnal_slopes to output density gradients (drdx_u,
drdy_v, drdz_u, drdz_v) as optional arguments from
calc_isoneutral_slopes. Extend MOM_thickness_diffuse to accept
externally computed streamfunctions (Sfn_unlim_u_3D, Sfn_unlim_v_3D)
in thickness_diffuse_full. Wire up the new ANN streamfunction module
in MOM.F90 with initialization, computation, and restart calls.

* *Add ANN-based mesoscale streamfunction module

Introduces MOM_meso_sfn_ANN, which predicts density-flux-based
mesoscale streamfunctions using a neural network. Features include
configurable stencil windows for local density and velocity gradient
inputs, runtime-tunable clamping thresholds, boundary proximity
masking, and a fallback GM_simple mode for testing. The ANN outputs
buoyancy fluxes that are converted to streamfunctions via the
local density gradient magnitude.

This is answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True.
The parameter defaults to False, so no existing experiments are
affected.

* *Clean up ANN mesoscale streamfunction review feedback

Address Pavel Perezhogin's review and failing CI checks.

Review:
- Swap ANN_apply_vector_orig -> ANN_apply_vector_oi (faster path)
- Gate MOM_thickness_diffuse limiters on use_meso_sfn_ANN so the
  dev/gfdl baseline is unchanged when the ANN is off
- Refactor the streamfunction clamp to act on the velocity-scale
  Upsilon (Ferrari et al. 2010) before multiplication by dy_Cu/dx_Cv,
  giving a grid-independent cap. Rename the runtime parameter
  MESO_SFN_SFN_CLAMP to MESO_UPSILON_CLAMP and shift the default
  from 1.0e6 volume-transport units to 1.0e4 m2 s-1 velocity-scale.

Unit cleanup:
- Density flux variables (Fx_u, Fy_v, Fx_c, Fy_c) relabeled from
  [R L-1 ~> kg m-4] to [R L T-1 ~> kg m-2 s-1]
- Volume streamfunction dummy args sfn_u/sfn_v relabeled
  [L2 T-1] -> [Z L2 T-1 ~> m3 s-1] to match Sfn_unlim_u_3D
- mag_grad / mag_grad_floor switched from [R L-1] to [R Z-1]
  (the computation already produces [R Z-1] via the Z_to_L rescale
  of drdx_u; the floor comparison only works correctly under
  internal rescaling when both sides share the same internal unit)
- Fix conversion= factors on all 11 meso_sfn_* diagnostic fields;
  previous values were copy-pasted from sfn_unlim_x and did not
  match the registered quantities
- Add scale= argument to mag_grad_floor and MESO_UPSILON_CLAMP
  get_param calls for rescale-safe behavior
- Replace MESO_SFN_GRAD_MAG_EPSILON parameter with derived
  h_neglect-style locals rho_grad_neglect [R L-1] and
  vel_grad_neglect [T-1] (1e-30 scaled per dimension)

CI fixes:
- build-openmp: add present_drdx_u/drdx_u/drdz_u and
  present_drdy_v/drdy_v/drdz_v to the two !$OMP parallel do
  shared(...) clauses in calc_isoneutral_slopes
- check-style-and-docstrings: split multi-variable id_* declarations
  in MESO_SFN_ANN_CS so each has its own doxygen !< comment

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* *Audit cleanup and CI fixes on meso_sfn_ANN

Act on the broader audit of MOM_meso_sfn_ANN and fix the two
remaining CI failures.

Audit-driven changes:
- Remove the GM_simple and GM_ann model-type branches entirely; they
  were testing scaffolding. The only production path is nondim_rhoF_ann.
  This also removes the MESO_SFN_ANN_TYPE runtime parameter, the
  meso_sfn_ann_model_type CS field, and the now-unused local Kappa.
- Add a startup bounds check for MESO_SFN_ANN_WINDOW: the stencil
  shift must be <= 3 to stay inside the halo=3 provision in
  meso_sfn_ANN_compute, and the window must be an odd integer.
- Set and check the CS%initialized flag so using the module without
  calling _init produces a clear FATAL instead of later undefined
  behaviour.
- Rename lowercase runtime params to the MOM6 uppercase convention:
  meso_sfn_ann_coeff/type/window/file -> MESO_SFN_ANN_COEFF,
  MESO_SFN_ANN_WINDOW, MESO_SFN_ANN_FILE (type removed).
- Expand the module docstring with a Ford-style summary of the
  pipeline and an explicit warning that the dimensionalization in
  _compute is an implicit contract with the training procedure.
- Document the extended stencil bounds in the corner-point velocity
  gradient loop.

CI fixes:
- build-openmp: add Sfn_unlim_u_3D / Sfn_unlim_v_3D to the shared(...)
  clauses of the two !$OMP parallel do blocks in thickness_diffuse_full.
- check-style-and-docstrings: wrap the Fy_c declaration comment onto
  a second line so the line stays under the 120-char limit.

Breaking parameter changes (branch not yet released): MESO_SFN_ANN_TYPE
removed; the four lowercase params listed above renamed to uppercase.

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Resolve memory leak (deallocate)

Use faster ANN inference interface (ANN_apply_array_sio). 18% acceleration, regression is the same

* Changes to pass MOM6 testing system.

The tests passed:
test.grid
test.layout
test.restart
test.openmp
test.nan
test.dim

1) Trailing space is removed

2) Physical dimensionality is added to all float variables

3) Loop bounds are adjusted to get rid of out of bound memory access and pass test.grid

4) Scaling with physical units (i.e., structure US) is adjusted in multiple places to pass test.dim

5) Maximum allowed window size is limited to 3, which was tested now. Larger windows will likely needed calling pass_var (not yet implemented)

* Add proper interaction of buoyancy ANN with use_stored_slopes and SKEBs parameterizations

* Change default value of MESO_UPSILON_CLAMP parameter

Default values of limiting parameters are adjusted based on the diagnosis of CESM ocean model output.

MESO_UPSILON_CLAMP reduced from 1e4 m2 s-1 to 15 m2 s-1. This choice guarantees that the maximum overturning per one grid point and water column is approximately 1 Sv on CESM grid (before limiting and smoothing performed in MOM_thickness_diffuse.F90). The previous default value could allow up to ~600 Sv per grid point, which is an enormously large number.

MESO_SFN_FLUX_CLAMP, which is ~100 kg m-2 s-1, is likely never activated. Still, this number is reasonable as maximum predicted fluxes in CESM are ~20 kg m-2 s-1.

MESO_SFN_MAG_GRAD_FLOOR, which is 1.0e-10 kg m-4, is a very realistic number for CESM. The density gradients are approximately in a range from 1e-10 up to 1e-1, with a couple of grid points having 1e-12.

* Integrate pass_vector for sfn_u and sfn_v

Adding an exchange of halo points for the stream function predicted by the neural network. 

This halo exchange has no effect in simple geometries, but gaurantees Temperature/Salinity conservation on a tripolar grid, where rotational invariance is required to achieve conservation through the pole. Exact rotational invariance is not supported by the neural network. Rotational invariance is learned from data approximately, and is preserved less strictly than the numerical precision. Exchange of a streamfunction guarantees that there is no jump in the thickness flux at the grid point adjacent to the pole.

* Encapsulate use_meso_sfn_ANN in thickness_diffuse_CS

Address Hallberg's review comments about opaque-type encapsulation
(#1109).

- MOM.F90: Drop the if/else around the two thickness_diffuse calls
  and pass u, v unconditionally. The previous gating on
  CS%thickness_diffuse_CSp%use_meso_sfn_ANN peeked inside the
  opaque thickness_diffuse_CS type. The runtime check inside
  thickness_diffuse at line 503 (CS%use_meso_sfn_ANN .and.
  present_vel) still gates the ANN call correctly; present_vel
  is now always true but the conjunction is harmless.

- MOM_thickness_diffuse.F90: Make use_meso_sfn_ANN private
  (was incorrectly declared public). No external readers remain
  after the MOM.F90 refactor.

Expected to be bitwise identical based on inspection of the runtime
gate at MOM_thickness_diffuse.F90:503 (the conjunction
CS%use_meso_sfn_ANN .and. present_vel still routes the same way).
Not locally verified by test.repro / test.regression; CI will check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Clarify drdx_u/drdy_v docstrings as gradients along z surfaces

Per Hallberg's review feedback on #1109: emphasize
in the Doxygen comments that the new drdx_u and drdy_v output
arguments of calc_isoneutral_slopes are evaluated along surfaces
of constant z (not along isopycnals or model interfaces).

All answers are bitwise identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* *Apply MOM6 soft-case convention in MOM_meso_sfn_ANN

Address Hallberg's style review on #1109:

- Loop variables: rename lowercase k -> K on interface-range
  loops (k=2,nz and k=1,nz+1) in meso_sfn_ANN_compute,
  center_grad_rho, and center2uv. Rename lowercase j -> J on
  v-point loops (j=js-1,je with v-point array access in the
  body). Update one inner u-loop (do i=is-1,ie -> do I=is-1,ie)
  to match the body's I usage.

- Array indexing: flip interface-staggered array accesses
  (drdx_c, drdy_c, drdx_u, drdy_v, drdz_u, drdz_v, Fx_c, Fy_c,
  Fx_u, Fy_v, sfn_u, sfn_v, var1_c, var2_c, var1_u, var2_v) to
  use uppercase K on their third index. Layer-staggered velocity
  arrays (dudx, dudy, dvdx, dvdy) keep lowercase k to signal
  "the layer below interface K".

- mag_grad expression: rewrite from the FMA-paranoid
  (US%Z_to_L*drd_*)*(US%Z_to_L*drd_*) form to Hallberg's
  preferred US%Z_to_L**2*drd_***2 form (he explicitly noted
  the extra parentheses are not needed for FMA/symmetry).

Answer-changing only when USE_THICKNESS_DIFFUSE_ANN=True. The
case-only changes are bit-identical by language definition (Fortran
is case-insensitive). The simplified mag_grad expression is
bit-different under FMA-aware compilers. Since the feature defaults
to .false. and has not been released, no _BUG flag is required.
Not locally verified; CI will check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop optional from u,v in thickness_diffuse; fix v docstring

Address Hallberg's follow-up comments on #1109:

- Drop `, optional` from the u and v dummy arguments of
  thickness_diffuse. Both callers in MOM.F90 unconditionally pass
  u,v after the previous opaque-type refactor (98038be), so the
  optional attribute is no longer needed.

- Remove the now-redundant `present_vel` variable (declaration
  and assignment), and simplify the ANN call gate from
  `if (CS%use_meso_sfn_ANN .and. present_vel)` to
  `if (CS%use_meso_sfn_ANN)`.

- Restore the truncated `v` Doxygen description and tighten
  whitespace on both u and v declaration lines.

All answers are bitwise identical: `present_vel` was always true
after the prior refactor, so the conjunction always took the same
branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make find_eta halo size conditional on use_meso_sfn_ANN

Per Hallberg's review on #1109: avoid the cost of
the wider halo in find_eta for default users.

The ANN streamfunction needs the interface elevation `e` with a
wider halo to support its stencil. Previously the halo bump was
unconditional. Now it is gated on `CS%use_meso_sfn_ANN` so that
default users (USE_THICKNESS_DIFFUSE_ANN=False) retain the
original `halo_size=1` call.

Default-off users now match upstream dev/gfdl exactly at this
call site. ANN-on users are unchanged relative to the previous
branch state.

Hallberg also recommends verifying halo updates by comparing runs
at different MPI process counts (e.g., mpirun -n 63 vs -n 64);
that test has not yet been run. CI may catch some issues; the
cross-process-count check is a separate validation step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Pavel <pperezhogin@gmail.com>
Co-authored-by: Pavel Perezhogin <35234405+Pperezhogin@users.noreply.github.com>
Building MOM_diag_mediator.F90 with GCC and -O2 became much slower after
diagnostic buffers were added.
```
    commit 6f6975a
    Author: Andrew Shao <andrew.shao@hpe.com>
    Date:   Wed Mar 18 12:17:43 2026 -0700

        +Extend diag_mediator to allow the piecemeal posting of diagnostics (#809)
```
Before this commit, MOM_diag_mediator.F90 built in about 15 seconds.
After this commit, build time increased to more than 170 seconds.

The slowdown was isolated to derived types with multiple allocatable
array components, not to abstract types, inheritance, or type-bound
methods.  A minimal form of the triggering pattern is shown below.
```
  type :: diag_buffer_2d
    real, allocatable :: buffer(:,:,:)
    integer, allocatable :: ids(:)
  end type
```
The component did not need to be referenced in MOM_diag_mediator.F90.
It was enough for diag_buffer_2d to appear as a component of axes_grp.

Somehow, this was resolved by defining an explicit dummy finalizer in
`diag_buffer_2d`.
```
  type :: diag_buffer_2d
    ! ...
  contains
    final :: finalize_diag_buffer_2d
  end type diag_buffer_2d

  subroutine finalize_diag_buffer_2d
    type(diag_buffer_2d), intent(inout) :: this
  end subroutine finalize_diag_buffer_2d
```
With this change, build time returns to about 15 seconds.

* The finalizers are intentionally empty.  Allocatable components are
  automatically deallocated by the language, so no explicit deallocate()
  calls are needed here.

  Finalizers should only be used for custom cleanup, such as external
  resources or manually managed pointer targets.

* `diag_buffer_[23]d` contains an array of `buffer_[23]d` objects.  The
  language standard notes that components are finalized before the
  parent.

* No similar compile-time benefit was seen from adding dummy finalizers
  to buffer_[23]d, so those types are left unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.