MOM6: Corrected line lengths and Travis testing #944

Merged
adcroft merged 4 commits into mom-ocean:dev/gfdl from Hallberg-NOAA:fix_line_lengths
Jul 8, 2019

@Hallberg-NOAA
Collaborator

Corrected the Travis tests to include testing for lines exceeding 120
characters in length, and fixed several places where excessive line lengths had
been allowed to be merged into dev/gfdl. All answers are bitwise identical and
there are no changes to the documentation files generated by MOM6. MOM6
commits with this PR include:

  RGC_tracer.F90 previously had some very long comments at the end of some
lines.  These have now been split onto multiple lines to respect the MOM6
standards for line-length.  All answers are bitwise identical.
  Split excessively long lines and corrected the syntax for unit documentation
in MOM_lateral_mixing_coeffs.F90 and MOM_thickness_diffuse.F90.  All answers are
bitwise identical.
  Added a dimensional scaling factor for fmax in MOM_hor_visc.F90 that was
dropped at some point in the merging of the dev/ncar code into dev/gfdl.  All
answers are bitwise identical and now pass the dimensional scaling test.
  Added the 120 character line limit into the travis testing script.
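
For illustration only, a line-length check of this kind could be sketched as below. This is not the actual Travis script; the function name `find_long_lines` and the sample text are hypothetical, and only the 120-character limit comes from this PR.

```python
MAX_LEN = 120  # line-length limit stated in this PR


def find_long_lines(text, max_len=MAX_LEN):
    """Return (line_number, length) for every line longer than max_len."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_len:
            offenders.append((number, len(line)))
    return offenders


# Hypothetical sample: one short line and one deliberately over-long line.
sample = "\n".join([
    "real :: fmax  ! short line",
    "x = 1 " + "!" * 130,
])
print(find_long_lines(sample))  # [(2, 136)]
```

A real check would presumably read each source file and fail the build when the returned list is non-empty.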
@Hallberg-NOAA
Collaborator Author

This PR is being tested with https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/8443

@adcroft adcroft merged commit 4cb18ac into mom-ocean:dev/gfdl Jul 8, 2019
@Hallberg-NOAA Hallberg-NOAA deleted the fix_line_lengths branch July 30, 2021 18:58
Hallberg-NOAA pushed a commit to Hallberg-NOAA/MOM6 that referenced this pull request Sep 19, 2025
* Add MOM_ANN module

* Mesoscale momentum parameterization with ANN

- Computes subgrid stress using ANN in MOM_Zanna_Bolton
- Uses MOM_ANN module for ANN inference

Equivalent MOM_override for defaults
```
USE_ZB2020 = True
ZB2020_USE_ANN = True
USE_CIRCULATION_IN_HORVISC = True
ZB2020_ANN_FILE_TALL = /path/to/ocean3d/subfilter/FGR3/EXP1/model/Tall.nc
```

* Mesoscale momentum parameterization with ANN (#2)

Blank commit after squash/rebase was handled on command line

* Moved MOM_ANN.F90 to src/framework/

* Minor refactor of MOM_ANN

- Removed unused modules
- Removed unused MOM_memory.h
- Added input and output means which default to 0 and
  do not need to be present in the weights file
- Gave defaults to means, norms, tests so that they do
  not need to be present in file
- Added missing array notation "(:)"
- Minor formatting

* Adds unit tests and timing test to MOM_ANN

- Added ANN_allocate, set_layer, set_input_normalization, and
  set_output_normalization methods to allow reconfiguration during
  unit tests
- Added ANN_unit_tests with some simple constructed-by-code
  networks with known solutions
- Added config_src/drivers/unit_tests/test_MOM_ANN.F90 to drive
  unit tests
- Added config_src/drivers/timing_tests/time_MOM_ANN.F90 as a
  rudimentary timing test for inference

* Adding multiple forms of inference

- Adds inference operating on array (instead of single vector of
  features)
- Implements several different versions of inference with various
  loop orders
  - Involves storing the transpose of A in the type
  - Tested by checking inference on same inputs is identical between
    variants
    - Added randomizers to assist in unit testing
- Adds timing of variants to config_src/drivers/timing/time_MOM_ANN.F90
- Adds an interface (MOM_apply) to select preferred version of
  inference subroutine
- Added command line args to time_MOM_ANN.F90 to allow more rapid
  evaluation of performance

Variants explored, timed with gfortran (13.2) -O3 on Xeon:
- vector_v1:
  - original inference from Pavel
- vector_v2:
  - allocate work arrays just once, using widest layer
  - loop over layers in 2's to avoid pointer calculations and copies
  - speed up, x0.8 relative to v1
- vector_v3:
  - transpose loops
  - slow down, x1.54 relative to v1
- vector_v4:
  - transpose weights with same loop order as v1
  - slow down, x1.03 relative to v1
- array_v1:
  - same structure as v2, working on x(space,feature) input/outputs
  - speed up, x0.41 relative to v1
- array_v2:
  - as for array_v1 but with transposed loop order
  - apply activation function on vector of first index while in cache
  - speed up, x0.35 relative to v1
- array_v3:
  - same structure as v2, working on x(feature,space) input/outputs
  - speed up, x0.58 relative to v1
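
The idea of checking that the vector and array forms of inference agree can be sketched in Python as follows. This is not the MOM_ANN code: the tiny network, the ReLU activation on hidden layers, and the normalization convention (subtract mean, divide by norm on input; multiply by norm, add mean on output) are all assumptions made for the example.

```python
def relu(v):
    return v if v > 0.0 else 0.0


# Hypothetical 2-3-1 network: each layer is (weights[out][in], bias[out]).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.1, -0.2]),
    ([[1.0, 2.0, -1.0]], [0.3]),
]
in_mean, in_norm = [0.0, 0.0], [1.0, 2.0]
out_mean, out_norm = [0.0], [1.0]


def apply_vector(x):
    """Inference for a single feature vector x[feature]."""
    y = [(xi - m) / n for xi, m, n in zip(x, in_mean, in_norm)]
    for k, (w, b) in enumerate(layers):
        out = []
        for row, bi in zip(w, b):
            acc = 0.0
            for wij, yj in zip(row, y):
                acc += wij * yj
            acc += bi
            # Activation on hidden layers only (assumed).
            out.append(relu(acc) if k < len(layers) - 1 else acc)
        y = out
    return [v * n + m for v, m, n in zip(y, out_mean, out_norm)]


def apply_array(xs):
    """Inference for many points at once; xs[f][s] is feature f at point s."""
    npts = len(xs[0])
    y = [[(v - m) / n for v in col] for col, m, n in zip(xs, in_mean, in_norm)]
    for k, (w, b) in enumerate(layers):
        out = []
        for row, bi in zip(w, b):
            acc = [0.0] * npts
            for wij, col in zip(row, y):
                for s in range(npts):  # innermost loop runs over space
                    acc[s] += wij * col[s]
            acc = [a + bi for a in acc]
            if k < len(layers) - 1:
                acc = [relu(a) for a in acc]
            out.append(acc)
        y = out
    return [[v * n + m for v in col] for col, m, n in zip(y, out_mean, out_norm)]


points = [[1.0, 2.0], [-0.5, 3.0], [2.0, -4.0]]
xs = [[p[f] for p in points] for f in range(2)]  # transpose to xs[feature][space]
ys = apply_array(xs)
same = all(apply_vector(p)[0] == ys[0][s] for s, p in enumerate(points))
print(same)
```

Both functions accumulate in the same order, so the comparison can be exact rather than within a tolerance, which mirrors the "identical between variants" test described above.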

* Renamed ANN variants and added some module documentation

- Added module dox
- Renamed _v1, _v2 etc to labels
- Added ANN_apply_array_sio to ANN_apply interface
- Replaced "flops" with "MBps" in timing output

* Removed alternative variants of ANN in favor of optimized

- Deleted variants of ANN that did not perform as well as the two
  versions that remain.

* Apply array_sio function in ANN inference for momentum fluxes (#5)

* Apply array_sio ANN inference for computation of momentum fluxes

* remove trailing space

* Initial commit

* address Robert Hallberg code review

* Restore default value of ZB_SCALING coefficient

---------

Co-authored-by: Alistair Adcroft <Alistair.Adcroft@noaa.gov>
Co-authored-by: Alistair Adcroft <adcroft@users.noreply.github.com>
mnlevy1981 added a commit to TURBO-ESM/MOM6 that referenced this pull request May 19, 2026
* Add SHALLOW_ALE_RESOLUTION

SHALLOW_ALE_RESOLUTION implements a HYBGEN-style Z-sigma-Z near surface
fixed coordinate for HYCOM1.  For example the US Navy's GOFS 3.1 HYCOM
setup has 41 layers, with the top 14 layers in a Z-sigma-Z configuration.
For MOM6 HYCOM1 this is: SHALLOW_ALE_RESOLUTION = 14*1.0,27*0.0 for 14
1m "shallow" layers.

Let N_SIGMA be the number of consecutive non-zero entries, typically < NK.
When rest depth is shallower than SUM(SHALLOW_ALE_RESOLUTION(1:N_SIGMA))
use SHALLOW_ALE_RESOLUTION.  When rest depth is deeper than
SUM(SHALLOW_ALE_RESOLUTION(1:N_SIGMA)) use ALE_RESOLUTION.  Otherwise
use a linear sum of the two weighted by rest depth.

The default of all zeros turns this option off, and when off answers are
unchanged.  The new parameter SHALLOW_ALE_RESOLUTION is only present when
using HYCOM1.

* Non-integer HYBRID_MAP values

The 2-d REAL map array in HYBRID_MAP usually contains integer values
each referencing one profile.  It can instead contain non-integer
values of the form I+frac, which indicate a weighted sum of profiles:
(1-frac) p(I) + (frac) p(I+1).  The same profile can be used multiple
times, e.g. if the 1st profile is also the 4th, blends between profiles
1 and 2 and between profiles 1 and 3 are both available.

HYBRID_3D is more general, but HYBRID_MAP covers most practical uses.
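
A minimal sketch of the I+frac weighting, assuming 1-based profile numbers as in the description above. The function name `blended_profile` and the profile values are hypothetical and are not taken from MOM6.

```python
import math


def blended_profile(map_value, profiles):
    """Blend profiles for a map value of the form I + frac (I is 1-based)."""
    i = math.floor(map_value)
    frac = map_value - i
    p_lo = profiles[i - 1]
    if frac == 0.0:
        return list(p_lo)  # integer value: a single profile
    p_hi = profiles[i]
    return [(1.0 - frac) * a + frac * b for a, b in zip(p_lo, p_hi)]


# Hypothetical profiles; the 4th repeats the 1st so that 3 + frac blends 3 and 1.
profiles = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [10.0, 20.0]]
print(blended_profile(2.0, profiles))   # [30.0, 40.0]
print(blended_profile(1.25, profiles))  # [15.0, 25.0]
print(blended_profile(3.5, profiles))   # [30.0, 40.0]
```

Repeating a profile at the end of the list is what makes a blend between non-adjacent profiles reachable with a single map value.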

* indent continuations, source code <= 100 chars

* +Add RESCALE_STRONG_DRAG

  Added the new runtime option RESCALE_STRONG_DRAG, that can be set to true to
reduce the barotropic contribution to the layer accelerations to account for the
difference between the forces that can be counteracted by the stronger drag with
BT_STRONG_DRAG and the average of the layer viscous remnants after a baroclinic
timestep.  In testing, this new capability eliminates some of the growing
instabilities that can occur with an ice shelf and BT_STRONG_DRAG set to true.
This commit also adds new diagnostics of the barotropic step viscous
remnants and the eta anomalies contributing to barotropic pressure forces,
either averaged over the barotropic step or at each barotropic step.  By
default all answers are bitwise identical, but there is a new runtime parameter
and 4 new diagnostics.

* Add option to horizontally homogenize the Stokes drift when used via … (#967)

* Add option to horizontally homogenize the Stokes drift when used via the dataoverride surfbands procedure.

* Add variable description in new method for horizontally averaging Stokes drift.

---------

Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>

* Fix calculation of CAv_Stokes diagnostic

  Corrected a horizontal indexing bug in the calculation of the CAv_Stokes
diagnostic, making it rotationally consistent and consistent with the
calculation of CAu_Stokes.  This bug has been there since the CAv_Stokes
diagnostic was originally added.  The loop range over which qS is calculated was
also reduced to the range over which it is used.  All solutions are bitwise
identical, but this commit does change the values of a (perhaps infrequently
used) diagnostic.

* makedep: Update interpreter directive to python3

The interpreter directive ("shebang") of makedep is updated to
`python3`, rather than the version-agnostic `python`.

Although we never invoke the shebang of the script, there are OS
environments out there which will object to any presence of a
versionless python.  PEP 394 also strongly recommends the adoption of
python3 as the executable name, regardless of Py2 support.

* Continuity ppm port to gpu (#29)

* present_vhbt_or_set_bt_cont: merge couple of loops

* meridional_flux_thickness: cpu optimize a bit

* meridional_flux_adjust: back to jki

* set_merid_BT_cont: pull out meridional_flux_adjst

* set_meridional_BT_cont: optimize cpu

* rm remaining merid/zonal_flux_layer

* zonal_flux_layere: improve naming a bit

* zonal_flux_layere_OBC: improve naming a bit

* improve flux_elem line wraps

* optimize data transfers a bit

* pass elem not arr

* add some missing documentation

* fix trailing spaces and missing var docs

* add last param docs

* meridional_flux_adjust: fix fpe err

* fix another fpe

* cleanup args for new helper subroutines

* clean up enter/exit data

* fix passing h twice

* meridional_mass_flux: do concurrent

* zonal_flux_adjust: use a few 3d tmp arrays to mirror meridional_flux_adjust

* move target update out of continuity

* initialize pbv%por_face_area[U/V] on GPU

* cleanup some transfers from continuity_PPM

* clean up a few minor things

* zonal/meridional_flux_adjust: use scalar u/v_new

* declare vp,up,h_tmp on gpu

* remove h update

* continuity_PPM: minimize mapping stmts

* zonal_flux_adjust: minimize mapping stmts

* set_zonal_bt_cont: minimize mapping stmts

* merional_flux_adjust: minimize mapping stmts

* set_merid_bt_cont: minimize mapping stmts

* zonal/meridional_flux_adjust: tmp vars duhdu/dvhdv 3d -> scalar

* separate alloc of private variables for gcc

* u/vh_aux: 3d->2d

* target teams loop recognizable by amdflang

* Continuity CS outside of init

This moves the Continuity CS to the dycore init function.  For some
reason, this avoids an answer change with CPU.  (Possibly because alloc
inside of a function doesn't quite match the CS outside of it?)

A few minor data transfers are also added to fix up differences in the
chksum log output.

* zonal_mass_flux: isolate zonal_flux_layer

* zonal_mass_flux: separate local_specified_BC block loop

* zonal_mass_flux: add j dim to tmp vars

* zonal_mass_flux: separate visc_rem_max init loops

* zonal_mass_flux: separate du_min/max_CFL init loops

* zonal_mass_flux: separate duhdu/uh_tot_0 init loop

* zonal_mass_flux: separate du_min/max_CFL aggress_adjust update case (untested)

* zonal_mass_flux: separate du_min/max_CFL non-aggress_adjust update case

* zonal_mass_flux: separate du_min/max_CFL non-use_visc_rem update case (untested)

* zonal_mass_flux: separate du_min/max_CFL 0-clamp loop

* zonal_mass_flux: separate do_I local_specified_BC init loop (untested)

* copy zonal_flux_adjust that accepts 2d args

* zonal_mass_flux move j loop into zonal_flux_adjust copy

* zonal_flux_adjust_fused: use 2d internal arr

* zonal_flux_adjusted_fused: separate all loops

* zonal_mass_flux: separate u/du_cor update (untested)

* zonal_mass_flux: replicate former present(uhbt) control flow

* copy set_zonal_BT_cont with 2d args

* zonal_mass_flux: move j-loop into set_zonal_BT_cont_fused

* set_zonal_BT_cont_fused: use zonal_flux_adjust_fused

* set_zonal_BT_cont_fused: separate init loop

* set_zonal_BT_cont_fused: separate duR/L init loop

* set_zonal_BT_cont_fused: separate u_0/L/R init loop

* set_zonal_BT_cont_fused: use zonal_flux_layer_fused

* set_zonal_BT_cont_fused: separate last 2 loops

* copy merid_flux_layer that accepts entire arrs

* merid_flux_layer_fused: separate loops

* merididional_mass_flux: separate local_specified_BC loop (untested)

* merididional_mass_flux: separate tmp variable init loops

* merididional_mass_flux: separate dv_min/max_CFL calc loops

* merididional_mass_flux: separate dv_min/max_CFL 0 clamp loop

* merididional_mass_flux: separate simple_OBC_pt init loop (untested)

* copy meridional_flux_adjust that accepts entire arrs

* meridional_mass_flux: separate meridional_flux_adjust_fused

* meridional_mass_flux: move j-loop into meridional_flux_adjust_fused

* meridional_flux_adjust_fused: separate vh_aux,dvhdv init loops

* meridional_flux_adjust_fused: 1d->2d tmp arrs

* meridional_flux_adjust_fused: separate all arrs

* meridional_mass_flux: separate (d)v_cor assignment loops

* copy set_merid_BT_cont that accepts entire arrs

* meridional_mass_flux: move j-loop into set_merid_BT_cont_fused

* set_merid_BT_cont_fused: use meridional_flux_adjust_fused

* set_merid_BT_cont_fused: separate tmp var init

* set_merid_BT_cont_fused: separate short circuit loop

* set_merid_BT_cont_fused: rm redundant k loop in short circuit

* set_merid_BT_cont_fused: separate dvL/R init loop

* set_merid_BT_cont_fused: make remaining tmp arrs 3d

* set_merid_BT_cont_fused: use merid_flux_layer_fused

* set_merid_BT_cont_fused: separate last loop

* meridional_mass_flux: separate any_simple_OBC loop

* zonal_edge_thickness: move k loop->PPM_reconstruction_x

* meridional_edge_thickness: move k loop->PPM_reconstruction_y

* PPM_reconstruction_x/y: move k loop->PPM_limit_pos

* PPM_reconstruction_x/y: move k loop->PPM_limit_cw84 (untested)

* zonal_BT_mass_flux: separate all loops (untested)

* meridional_BT_mass_flux: separate all loops (untested)

* set_zonal_BT_cont_fused: clean up var defs

* use SZJB_(G) in do_I dclrns in merid* subroutines

* set_merid_BT_cont_fused: clean up var defs

* set_merid/zonal_BT_cont_fused -> set_merid/zonal_BT_cont

* zonal_flux_adjust_fused: clean up var defs

* zonal_flux_adjust_fused -> zonal_flux_adjust

* zonal_flux_layer_fused -> zonal_flux_layer

* meridional_flux_adjust_fused: clean up var defs

* meridional_flux_adjust_fused -> meridional_flux_adjust

* merid_flux_layer_fused: clean up var defs

* merid_flux_layer_fused -> merid_flux_layer

* fix call merid/zonal_flux_layer line conts

* zonal_mass_flux: use visc_rem_u_tmp

* zonal_mass_flux: separate du_min/max_CFL non-use_visc_rem update case (untested)

* copy set_zonal_BT_cont with 2d args

* meridional_mass_flux: use visc_rem_v_tmp

* copy meridional_flux_adjust that accepts entire arrs

* copy set_merid_BT_cont that accepts entire arrs

* set_merid_BT_cont_fused: rm redundant k loop in short circuit

* set_merid_BT_cont_fused: rm redundant k loop in short circuit

* zonal_flux_adjust_fused -> zonal_flux_adjust

* meridional_flux_adjust_fused -> meridional_flux_adjust

* zonal_flux_layer: move GV in var dclrn

* zonal/meridional_mass_flux: remove old visc_rem vars

* remove redundant kloop

* remove problematic omp directives

* port PPM_limit_pos/CW84, PPM_reconstruction_x, zonal_edge_thickness

* PPM_reconstruction_x: add enter/exit data stmts

* continuity_zonal_convergence: save loop range for porting convenience

* port continuity_zonal_convergence

* zonal_mass_flux: array init -> loop init

* zonal_mass_flux: port init loops

* port zonal_flux_layer

* zonal_flux_layer: add enter/exit data stmts

* port zonal_flux_adjust

* port set_zonal_bt_cont

* zonal_flux_adjust: add more arrs in enter/exit data

* set_zonal_BT_cont: add enter/exit data stmts

* port zonal_flux_thickness loops

* zonal_flux_thickness add enter/exit data stmts

* zonal_mass_flux prepare obc for porting

* zonal_mass_flux: merge any_somple_OBC loops

* zonal_mass_flux port main loops

* zonal_mass_flux port OBC loops

* zonal_mass_flux: add enter/exit data stmts

* continuity_ppm: add initial enter/exit data stmts

* port PPM_reconstruction_y, meridional_edge_thickness

* port continuity_merdional_convergence

* ppm_reconstruction_y: add enter/exit data stmts

* port merid_flux_layer

* meridional_flux_adjust: port loops

* meridional_flux_adjust: add enter/exit data stmts

* meridional_mass_flux: port core loops

* meridional_mass_flux: port OBC loops (untested)

* meridional_flux_thickness: port core loops

* meridional_flux_thickness: attempt port OBC loops (untested)

* meridional_flux_thickness: add enter/exit data stmts

* port set_merid_BT_cont loops

* set_merid_bt_cont: add enter/exit data stmts

* meridional_mass_flux: add enter/exit data stmts

* zonal/meridional_mass_flux: add missing vars in enter/exit stmts

* continuity_PPM: complete enter/exit data stmts

* *_edge_thickness: do concurrent

* zonal_mass_flux: do concurrent-ify

* set_zonal_BT_cont: do concurrent-ify

* zonal_flux_adjust: do concurrent-ify

* zonal_flux_layer: do concurrent

* zonal_flux_thickness: do concurrent

* continuity_zonal_convergence: do concurrent

* meridional_mass_flux: do concurrent

* set_merid_bt_cont: do concurrent

* meridional_flux_adjust: do concurrent

* merid_flux_layer: do concurrent

* meridional_flux_thickness: do concurrent

* continuity_merdional_convergence: do concurrent

* formatting

* continuity_PPM: update LB

* meridional/zonal_flux_adjust: separate ij-reduction

* zonal_flux_adjust: a couple jki loops

* set_zonal_bt_cont: some jki loops

* zonal_mass_flux: some jki loops

* optimise loops by using scalar zonal_flux_layer

* zonal_flux_adjust: duhdu -> scalar

* elemental zonal_flux_layere and separated OBC

* zonal_flux_adjust: back to jki

* zonal_flux_thickness: improve cpu perf a bit

* zonal_flux_adjust: add some comments and guard early exit

* zonal_flux_adjust: use omp target loop for private arrs

* set_zonal_BT_cont: use omp target loop for private arrs

* set_zonal_BT_cont: do conc jki loop

* zonal_mass_flux: rm useless do_i init

* zonal_flux_layere: precalc g_dy_Cu*por_face_areaU

* zonal_flux_layere: precalc dh

* zonal_flux_thickness: precalc dh

* zonal_flux_adjust: make uhbt optional

* zonal_flux_thickness: assign to outputs directly

* mv zonal_flux_adjust from set_zonal_bt_cont->zonal_mass_flux

* zonal_mass_flux: reuse visc_rem_u_tmp more

* zonal_mass_flux: remove redundant if stmt

* zonal_mass_flux: force inline zonal_flux_layere

* zonal_flux_layere: inlinable gfortran -O3 but slower for ifort

* zonal_flux_layere: make a bit "smaller"

* zonal_mass_flux: move present(uhbt) or set_BT_cont to new subroutine

* Revert "zonal_flux_layere: make a bit "smaller""

This reverts commit 316152eb3cf515c4179f8ceb738f9d259233915c.

* Revert "zonal_flux_layere: inlinable gfortran -O3 but slower for ifort"

This reverts commit 51ee896c54b4a349acb2ffe70489dc76558864b1.

* fix gcc line truncation

* pass doxygen tests

* remove forceinline dirs

* attempt document vars

* a couple long lines

* last trailing space

* meridional_mass_flux: use zonal_flux_layere

* zonal_flux_layere_OBC: make elemental

* meridional_mass_flux: move dv_min/max_CFL calc into j-loop

* meridional_mass_flux: move big chunk into separate subroutine

* present_vhbt_or_set_bt_cont: merge couple of loops

* meridional_flux_thickness: cpu optimize a bit

* meridional_flux_adjust: back to jki

* set_merid_BT_cont: pull out meridional_flux_adjst

* set_meridional_BT_cont: optimize cpu

* rm remaining merid/zonal_flux_layer

* zonal_flux_layere: improve naming a bit

* zonal_flux_layere_OBC: improve naming a bit

* improve flux_elem line wraps

* optimize data transfers a bit

* pass elem not arr

* add some missing documentation

* fix trailing spaces and missing var docs

* add last param docs

* meridional_flux_adjust: fix fpe err

* fix another fpe

* cleanup args for new helper subroutines

* clean up enter/exit data

* fix passing h twice

* meridional_mass_flux: do concurrent

* zonal_flux_adjust: use a few 3d tmp arrays to mirror meridional_flux_adjust

* move target update out of continuity

* initialize pbv%por_face_area[U/V] on GPU

* cleanup some transfers from continuity_PPM

* clean up a few minor things

* zonal/meridional_flux_adjust: use scalar u/v_new

* declare vp,up,h_tmp on gpu

* remove h update

* continuity_PPM: minimize mapping stmts

* zonal_flux_adjust: minimize mapping stmts

* set_zonal_bt_cont: minimize mapping stmts

* merional_flux_adjust: minimize mapping stmts

* set_merid_bt_cont: minimize mapping stmts

* zonal/meridional_flux_adjust: tmp vars duhdu/dvhdv 3d -> scalar

* separate alloc of private variables for gcc

* u/vh_aux: 3d->2d

* target teams loop recognizable by amdflang

* Continuity CS outside of init

This moves the Continuity CS to the dycore init function.  For some
reason, this avoids an answer change with CPU.  (Possibly because alloc
inside of a function doesn't quite match the CS outside of it?)

A few minor data transfers are also added to fix up differences in the
chksum log output.

* Continuity: Add locality to do concurrent

Do concurrent inside of !$omp target teams loop seems to fail standard
openmp tests if locality is not correctly set.

This patch adds the correct locality to the four `!$omp target teams
loop` directives.

The domore argument has also been removed, and replaced with a
`.not.any(do_i(:))` test.

---------

Co-authored-by: Marshall Ward <marshall.ward@noaa.gov>

* +Correct halo update sizes and reduce halo updates (#969)

Added the new argument dyn_h_stencil to initialize_dyn_split_RK2 and the
other 3 dynamic core initialization routines to return the size of the stencil
for thicknesses as used by the dynamic core, depending on the options that are
being used for the Coriolis and continuity schemes, and then used this in a set
of halo updates in step_MOM_dynamics.  With this change some additional halo
updates that have recently been added inside of step_MOM_dyn_split_RK2 and the
other 3 dynamic core time stepping routines could be eliminated.  All answers
are bitwise identical, but there is a new argument to 4 public interfaces.

Co-authored-by: Marshall Ward <marshall.ward@gmail.com>

* Correct multi-PE velocity truncation counts (#955)

Modified the conditions that determine when to increment the count of velocity
truncations to use the thickness as interpolated to velocity points to determine
whether layers are thick enough to be counted, rather than the arithmetic mean
thickness, and only count truncations that occur in the non-symmetric
computational domain to avoid double counting.  The filtering thicknesses
should be very similar in the ocean interior, but they will differ at open
boundary condition points.  The corrected counting was verified by running the
sloshing/layer test case with a maximum CFL set to 0.01 to create lots of
truncations, and then verifying that the truncation count is now the same on 1
and 10 PEs, whereas before it was not.  All solutions are bitwise identical, but
the reported truncation counts in the ocean.stats files can change in
multiple-PE cases with velocity truncations.

* ANN parameterization of horizontal momentum eddy fluxes (#944)

* Add MOM_ANN module

* Mesoscale momentum parameterization with ANN

- Computes subgrid stress using ANN in MOM_Zanna_Bolton
- Uses MOM_ANN module for ANN inference

Equivalent MOM_override for defaults
```
USE_ZB2020 = True
ZB2020_USE_ANN = True
USE_CIRCULATION_IN_HORVISC = True
ZB2020_ANN_FILE_TALL = /path/to/ocean3d/subfilter/FGR3/EXP1/model/Tall.nc
```

* Mesoscale momentum parameterization with ANN (#2)

Blank commit after squash/rebase was handled on command line

* Moved MOM_ANN.F90 to src/framework/

* Minor refactor of MOM_ANN

- Removed unused modules
- Removed unused MOM_memory.h
- Added input and output means which default to 0 and
  do not need to be present in the weights file
- Gave defaults to means, norms, tests so that they do
  not need to be present in file
- Added missing array notation "(:)"
- Minor formatting

* Adds unit tests and timing test to MOM_ANN

- Added ANN_allocate, set_layer, set_input_normalization, and
  set_output_normalization methods to allow reconfiguration during
  unit tests
- Added ANN_unit_tests with some simple constructed-by-code
  networks with known solutions
- Added config_src/drivers/unit_tests/test_MOM_ANN.F90 to drive
  unit tests
- Added config_src/drivers/timing_tests/time_MOM_ANN.F90 as a
  rudimentary timing test for inference

* Adding multiple forms of inference

- Adds inference operating on array (instead of single vector of
  features)
- Implements several different versions of inference with various
  loop orders
  - Involves storing the transpose of A in the type
  - Tested by checking inference on same inputs is identical between
    variants
    - Added randomizers to assist in unit testing
- Adds timing of variants to config_src/drivers/timing/time_MOM_ANN.F90
- Adds an interface (MOM_apply) to select preferred version of
  inference subroutine
- Added command line args to time_MOM_ANN.F90 to allow more rapid
  evaluation of performance

Variants explored, timed with gfortran (13.2) -O3 on Xeon:
- vector_v1:
  - original inference from Pavel
- vector_v2:
  - allocate work arrays just once, using widest layer
  - loop over layers in 2's to avoid pointer calculations and copies
  - speed up, x0.8 relative to v1
- vector_v3:
  - transpose loops
  - slow down, x1.54 relative to v1
- vector_v4:
  - transpose weights with same loop order as v1
  - slow down, x1.03 relative to v1
- array_v1:
  - same structure as v2, working on x(space,feature) input/outputs
  - speed up, x0.41 relative to v1
- array_v2:
  - as for array_v1 but with transposed loop order
  - apply activation function on vector of first index while in cache
  - speed up, x0.35 relative to v1
- array_v3:
  - same structure as v2, working on x(feature,space) input/outputs
  - speed up, x0.58 relative to v1

* Renamed ANN variants and added some module documentation

- Added module dox
- Renamed _v1, _v2 etc to labels
- Added ANN_apply_array_sio to ANN_apply interface
- Replaced "flops" with "MBps" in timing output

* Removed alternative variants of ANN in favor of optimized

- Deleted variants of ANN that did not perform as well as the two
  versions that remain.

* Apply array_sio function in ANN inference for momentum fluxes (#5)

* Apply array_sio ANN inference for computation of momentum fluxes

* remove trailing space

* Initial commit

* address Robert Hallberg code review

* Restore default value of ZB_SCALING coefficient

---------

Co-authored-by: Alistair Adcroft <Alistair.Adcroft@noaa.gov>
Co-authored-by: Alistair Adcroft <adcroft@users.noreply.github.com>

* Part one of thickness reservoirs

- fixed the restart trouble

* Better Kelvin wave results in layer mode

* Response to comments on h_reservoirs PR.

* Inline Harmonic Analysis

Update 1: The accumulator of FtF is now associated with each FtSSH,
instead of shared between all FtSSH, such that both are updated at
the same model time steps, eliminating the potential inconsistency
when different FtSSH are called at different time steps.
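
The pairing of accumulators can be illustrated with a small least-squares sketch. It assumes that FtF denotes the accumulated products of the harmonic basis functions and FtSSH the accumulated basis-times-field products, which is the usual normal-equations form; the class name, the 12-hour constituent, and the synthetic signal are hypothetical and not taken from MOM_harmonic_analysis.

```python
import math


class HarmonicAccumulator:
    """Least-squares accumulator for one field; it owns both FtF and FtSSH."""

    def __init__(self, omegas):
        self.omegas = omegas
        n = 1 + 2 * len(omegas)  # mean term plus cos/sin per constituent
        self.FtF = [[0.0] * n for _ in range(n)]
        self.FtSSH = [0.0] * n

    def basis(self, t):
        f = [1.0]
        for w in self.omegas:
            f += [math.cos(w * t), math.sin(w * t)]
        return f

    def accumulate(self, t, ssh):
        # FtF and FtSSH are always updated together, at the same model time.
        f = self.basis(t)
        for i, fi in enumerate(f):
            self.FtSSH[i] += fi * ssh
            for j, fj in enumerate(f):
                self.FtF[i][j] += fi * fj

    def solve(self):
        """Solve FtF x = FtSSH by Gaussian elimination with partial pivoting."""
        n = len(self.FtSSH)
        a = [row[:] + [rhs] for row, rhs in zip(self.FtF, self.FtSSH)]
        for c in range(n):
            p = max(range(c, n), key=lambda r: abs(a[r][c]))
            a[c], a[p] = a[p], a[c]
            for r in range(c + 1, n):
                m = a[r][c] / a[c][c]
                for k in range(c, n + 1):
                    a[r][k] -= m * a[c][k]
        x = [0.0] * n
        for r in reversed(range(n)):
            s = a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))
            x[r] = s / a[r][r]
        return x


omega = 2.0 * math.pi / 12.0  # hypothetical constituent: 12-hour period, t in hours
ha = HarmonicAccumulator([omega])
for step in range(48):  # two full periods, sampled every half hour
    t = 0.5 * step
    ssh = 0.1 + 0.8 * math.cos(omega * t) + 0.3 * math.sin(omega * t)
    ha.accumulate(t, ssh)
mean, a_cos, a_sin = ha.solve()
print(round(mean, 6), round(a_cos, 6), round(a_sin, 6))
```

Because each field carries its own FtF, a field that is sampled on a different set of time steps still solves a consistent system.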

* Inline Harmonic Analysis

Update 2: HA_register are called inside HA_init. This should be done
for all variables/fields.

* Inline Harmonic Analysis

Update 3: A logical flag is set to control whether harmonic analysis
is to be enabled.

* Inline Harmonic Analysis

Update 4: time_ref and const_name are defined in HA_init, instead
of being copied from MOM_tidal_forcing. This commit prepares for
the separation of MOM_harmonic_analysis and MOM_tidal_forcing.

* Inline Harmonic Analysis

Update 5: MOM_harmonic_analysis is now independent of MOM_tidal_forcing,
providing more flexibility for performing harmonic analyses on tidal
constituents not available in MOM_tidal_forcing (e.g., MK3, M4).

* Inline Harmonic Analysis

Update 6: The frequencies of 8 overtides/compound tides (MK3, MN4,
M4, MS4, S4, M6, S6, M8) have been added for the harmonic analysis.

* Bug fix in MOM_open_boundary

Fixed the inconsistency for defining the reference time of tides in
MOM_tidal_forcing and MOM_open_boundary.

* +Use I0 format to simplify integer output

  Use the I0 format that was introduced with Fortran 95 in 155 lines scattered
across 40 files to simplify or shorten some error messages.  In 21 cases,
adjustl() calls that are no longer necessary for intended formatting were also
eliminated.  These changes have the effect of ensuring that there are still
appropriate messages if there are, for example, more than 99 vertical layers or
9999 points (total) in a horizontal direction or more than 9999 PEs.  In 15
cases, this change allowed for the elimination or reduction of if tests that
formatted output based on the size of an integer.  All answers are bitwise
identical but there may be some minor formatting changes in some error
messages.

* Vert friction: Fix index errors

This patch fixes two classes of index errors in multiple functions of
`MOM_vert_friction.F90`:

* `j=G%isc,G%jec` had been incorrectly applied to multiple loops.  This
  went undetected because we almost exclusively use local indexing where
  `G%isc == G%jsc`, but is nonetheless a serious error.  Thanks to Jorge
  Luis Gálvez Vallejo for reporting.

* One errant loop in the shelf code had `i=is,je`.  This was undetected
  due to poor ice shelf coverage testing.  Thanks to Claire Yung for
  reporting.
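
Why the first class of error went unnoticed can be shown with a short sketch. The index values below are hypothetical; the helper only mimics the inclusive bounds of a Fortran `do` loop.

```python
def loop_indices(start, end):
    """Indices visited by a Fortran-style inclusive loop `do j=start,end`."""
    return list(range(start, end + 1))


def same_range(isc, jsc, jec):
    wrong = loop_indices(isc, jec)  # buggy form: j = isc, jec
    right = loop_indices(jsc, jec)  # intended form: j = jsc, jec
    return wrong == right


print(same_range(isc=5, jsc=5, jec=12))  # True: invisible when isc == jsc
print(same_range(isc=9, jsc=5, jec=12))  # False: rows 5..8 would be skipped
```

With local indexing the two start indices coincide, so regression tests that never use differing starts cannot expose the bug.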

* Vert friction: Column loops moved in layers

This patch moves the k-column loops inside of ji-layer loops, rather
than outer-k loops of layers.

The primary motivation is to restore performance at high-bandwidth runs,
which were insufficiently tested during development of the k-j-i form.

The inner-column loops show improved performance for both low and
high-bandwidth runs.

The high-bandwidth benchmark case: (128-core, 256x128 x 75 layer)
```
                                   Profile   Reference
      (Ocean vertical viscosity):   7.158s,  15.047s (-52.4%)
```
The low bandwidth case: (1-core, 32x32 x 75 layer)
```
                                   Profile   Reference
      (Ocean vertical viscosity):   3.911s,   4.788s (-18.3%)
```
For the GFDL OM5 production configuration at 503, runtimes of the slowest ranks
were reduced in proportion to the high-bandwidth case above.

For the reference dev/gfdl,
```
                                      hits          tmin          tmax          tavg
(Ocean vertical viscosity)             288      4.303819     21.483670     14.452196
```
After applying this patch, times are reduced by ~40%:
```
                                      hits          tmin          tmax          tavg
(Ocean vertical viscosity)             288      0.976130     13.398768      8.689331
```
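
The quoted percentages can be reproduced from the timings listed above. The helper name `percent_change` is just for this example.

```python
def percent_change(new, ref):
    """Relative change of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


print(f"{percent_change(7.158, 15.047):.1f}%")        # high-bandwidth benchmark: -52.4%
print(f"{percent_change(3.911, 4.788):.1f}%")         # low-bandwidth benchmark:  -18.3%
print(f"{percent_change(8.689331, 14.452196):.1f}%")  # OM5 tavg: -39.9%
print(f"{percent_change(13.398768, 21.483670):.1f}%") # OM5 tmax: -37.6%
```

The "~40%" figure matches the change in tavg; the slowest rank (tmax) improves by slightly less.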

* Moving to columns allowed for removal of many `do_i` tests, since the test is
  applied before starting the loop.

* The `touch_ij` dummy function was removed, since we're no longer trying to
  force an IPO optimization.

* The shelf requires a re-calculation of the various thickness averages
  (h_arith, etc).  These could be saved as 1D if it becomes a problem.

* In addition to the usual regression testing, I also found no regressions in
  selected ice shelf configurations.

* Bug fix for linear wave drag in MOM_barotropic

* Linear wave drag is limited to be only applied to land points, using
velocity point masks mask2dC[uv].

* Rayleigh_[uv] calculation and bt_rem_[uv] update from linear wave drag
 is limited for Htot>0 only.

This patch eliminates a potential NaN in Rayleigh_[uv] in the unusual
scenario where Htot==0.0 and lin_drag_[uv]/=0. The changes do not change
answers: bt_rem_[uv] is zero at land points regardless, and Rayleigh_[uv]
is added to [uv]_accel_bt, which is masked before updating velocity.
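
The guard can be sketched as follows, under the assumption that the drag rate is formed by dividing the linear drag coefficient by the total thickness. That form is a guess made for illustration rather than a quotation of MOM_barotropic, and the function name is hypothetical.

```python
def rayleigh_rate(lin_drag, htot):
    """Drag rate lin_drag / htot, evaluated only where htot > 0 (assumed form)."""
    if htot > 0.0:
        return lin_drag / htot
    return 0.0  # zero-thickness column: skip the division entirely


print(rayleigh_rate(2.0e-3, 100.0))  # ordinary wet column
print(rayleigh_rate(2.0e-3, 0.0))    # guarded case, returns 0.0
```

Returning zero for the guarded case leaves answers unchanged as long as the result is later masked, which is the argument made in the description above.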

* Suppress warning message of negative eta in land

In MOM_barotropic in non-Boussinesq mode, the warning message on negative
eta is now only issued at wet points, consistent with Boussinesq mode.

* Flip the order of acceleration and velocity chksum

In MOM_dynamics_split_RK2, the acceleration chksum is now printed before the
velocity chksum with debug on, so that we can tell which acceleration term is
responsible for a NaN in velocity.

* vv gpu good (#44)

* Vertvisc: First part of loop

Moving gradually

* add nvtx ranges and module wrapper

* port loop to openmp for now -> will move to do concurrent. Focused on porting to GPU now

* more gpu

* port the vertical velocity to GPUs

* port the following vertical velocity loops to GPUs

* more loops using omp target teams, validated

* fix compute sanitizer death

* meridional velocity component ported

* finish porting vertvisc

* offload another loop

* revert the last thing

* add collapse(2) to the expensive loops

* fix

* fix validation

* updates to porting: find coupling coef

* almost done, just missing the killer loop

* label the block of death

* duct taped version of all funcs ported to GPUs

* back to format and delete block constructs

* remove the maps in remnant yay!

* remove need for local vertvisc_u/v variables

* death loop of death has been ported

* vertvisc coef does not have explicit mappings anymore

* the maps are gone!

* limit vel using single data transfer

* do concurrent

* some memory cleanup and code cleanup - profiling time

* update and lots of nvtx markers for opt

* limit vel betterments and notes for optimization

* make limit vel good

* h_ml transfer

* Vertfrict: Move CS allocations to init

Allocations of the vert frictions and visc control structures were moved
from the dycore subroutine loops to the model initialization.

Redundant grid (G) transfers were removed (although they were not
triggering any transfers).

Most (but not all) visc arrays have also been moved into the
initialization.  Kv_shear and Ray_[uv] need to be added.

Trailing whitespace also (inadvertently) got removed, hopefully not a
distraction.

* coupling coeff fixes

This appears to improve the CPU/GPU repro of the latest merge of dev/gpu
into dev_gpu_vertvisc.

* Resolves some of the u/v/h/dz field states between the barotropic and
  vertvisc functions.  (Development of each assumed that the other was
  not complete).

* This seems to fix some issues with a_[uv], Kv_tot, and dz inside of
  find_coupling_coef and vertvisc coef

  * chksum transfers were added to ensure consistent

* The diff has moved to frhat[uv] in btcalc.  The fields are all zero on
  GPU, so they are probably not transferred.

Work in progress...

* Remove redundant BT_cont transfer

The BT_cont fields are now on GPU, so they don't need to be transferred
before the btcalc call.

Checksum transfers for h_ml were also added.  Still assuming that it's
computed on the CPU.

* delete nvtx

* test to see if guarding and reintroducing the check helps

* an ugly fix for a simple problem: just want to be sure before I dive

* spaces

* Vertvisc: formatting/style cleanup

This patch reduces the formatting changes to this branch and brings it
closer to both dev/gfdl and the MOM6 style guide.

One minor non-cosmetic change is the reversion of the ADp%dv_dt_str(i,J,1)
calculation, which had been merged into one of the main Thomas loops.  This
returns it to a separate loop.

* Vert friction: Explicit CS alloc in dycore init

This patch removes the NVIDIA compiler preprocessing and explicitly
allocates the `CS` in the dynamic core initialization.

`vertvisc_init` is no longer responsible for allocation.  The allocation
test has also been removed.  We just have to trust each other now (or
segfault when it refs an unallocated field).

The previous error in the CI was the absence of `CS` allocation in the
other timestep methods (unsplit, unsplit RK, split RK2b).

* Vert friction: Data transfer reduction

Several modifications to the data transfer statements

* Some redundant transfers were removed

* Input/output transfers were moved up to the dycore loop

* Vert friction: Cosmetic cleanup

Remove some redundant comments, and an unused array loop.

* Vert friction: nk_in_ml bugfix

---------

Co-authored-by: Marshall Ward <marshall.ward@noaa.gov>

* +Add the new parameter RESOLN_FUNCTION_OBC_BUG

  Added the new runtime parameter RESOLN_FUNCTION_OBC_BUG that can be set to
false to take open boundary conditions into account when calculating the
resolution functions at u-, v- or q-points.  By default the wave speeds used to
calculate resolution functions do not take OBCs into account and all answers are
bitwise identical.

* port main rk loop in a single commit  (#47)

* port main rk loop

* typo and formatting

* delete nan inducing copy

* Dycore: NaN bugfix + minor diffs

* hp upload after zero-initialization was causing random errors, likely
  in halo values.  Now that zero-initialization is GPU-side, no need for
  upload.

* Some nested do-concurrents were consolidated

* Lots of whitespace fixes

* Dycore: Mem reduction, first pass

* Dycore: Mem cleanup second pass

* Dycore: memcheck 3

* dycore: memcheck next

* Dycore: memcheck, visc_rem and halos

* dycore: Keep dz on GPU

* Dycore: Remove [uv]_inst transfers

* dycore: cs%eta fully on gpu

* Dycore: Remove pbv copies

A small change, but seems correct.
pbv is input-only in step RK2, and is set outside in the main dycore
loop (on CPU), followed by an upload.

* dycore: memfix accel trim

* Dycore: move up, vp down

(and apparently an h download was redundant.  Looks OK, but I haven't 100%
verified...)

* dycore: Shift accels later in loop

* dycore: tau[xy]_bot on GPU only

* Dycore: [uv]_inst moved outside loop

* Dycore: [uv]p, visc_rem_[uv] downshift

* Dycore: [uvh]_av, uh, vh downshift

Also some halo padding

* Dycore: Remove multiple forcing/hp updates

* Dycore: memfix [uv]_bc_accel

* Dycore: Remove u_accel_bt and eta_pred transfers

* Dycore: mem cleanup visc_rem_[uv]

---------

Co-authored-by: Marshall Ward <marshall.ward@noaa.gov>

* +*Fix 3-equation ice-ocean flux iteration (#972)

Fix the 3-equation iteration for the buoyancy flux between the ocean and an
overlying ice-shelf when ICE_SHELF_BUOYANCY_FLUX_ITT_BUGFIX is true and
SHELF_3EQ_GAMMA is false.  This code now uses proper bounding of the
self-consistent solution, avoiding further amplifying the fluxes in the cases
when the differences between the diffusivities of heat and salt make the
buoyancy flux destabilizing for finite turbulent mixing.  Both the
false-position iterations and the (appropriately chosen) Newton's method
iterations have been extensively examined and determined to be working correctly
via print statements that have subsequently been removed for efficiency.
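
For readers unfamiliar with bounded iteration, here is a generic false-position
sketch in Python. It is not the ice-shelf code: the function being solved, the
tolerance and the names are all assumptions for illustration. The point is only
that each new estimate stays inside a bracket known to contain the root, which
is what an un-bounded Newton iteration cannot guarantee.

```python
def false_position(f, lo, hi, tol=1e-12, max_iter=200):
    """Bracketed root find: the estimate never leaves [lo, hi]."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed")
    x = lo
    for _ in range(max_iter):
        # Intersection of the secant through the two bracketing points.
        x = hi - f_hi * (hi - lo) / (f_hi - f_lo)
        f_x = f(x)
        if abs(f_x) < tol:
            break
        # Keep the sub-interval that still contains the sign change.
        if f_lo * f_x < 0.0:
            hi, f_hi = x, f_x
        else:
            lo, f_lo = x, f_x
    return x

# Stand-in problem: the real root of x**3 - 2 on [1, 2].
root = false_position(lambda x: x**3 - 2.0, 1.0, 2.0)
print(root)
```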

  Previously, the code to determine the 3-equation solution for the buoyancy
flux between the ocean and an ice shelf had been skipping iteration altogether
or doing un-bounded Newton's method iterations with a sign error in part of the
derivative, including taking the square root of negative numbers, leading to the
issue described at https://github.com/NOAA-GFDL/MOM6/issues/945.  That issue has
now been corrected and can be closed once this commit has been merged into
the dev/gfdl branch of MOM6.

  This commit also changes the names of the runtime parameters to correct the
ice shelf flux iteration bugs from ICE_SHELF_BUOYANCY_FLUX_ITT_BUG and
ICE_SHELF_SALT_FLUX_ITT_BUG to ICE_SHELF_BUOYANCY_FLUX_ITT_BUGFIX and
ICE_SHELF_SALT_FLUX_ITT_BUGFIX to avoid confusion with other ..._BUG parameters
where `true` is to turn the bugs on, whereas here `true` fixes them.  The old
names are retained via `old_name` arguments to the `get_param()` calls, so no
existing configurations will be disrupted by these changes.

  Additionally, an expression to determine a scaling factor to limit ice-shelf
bottom slopes in `calc_shelf_driving_stress()` was refactored to avoid the
possibility of division by zero.

  This commit will change (and correct) answers for cases with
ICE_SHELF_BUOYANCY_FLUX_ITT_BUGFIX set to true, but as these would often fail
with a NaN from taking the square root of a negative value, it is very unlikely
that any such configurations are actively being used, and there seems little
point in retaining the previous answers.  No answers are changed in cases that
do not use an active ice shelf.

Co-authored-by: Alistair Adcroft <adcroft@users.noreply.github.com>

* Fix divide by zero fpe in apply_oda_incupd (#164)

Move computation of tmp_h inside the if G%mask2dT test to avoid divide by zero error

* Dycore: h/h_av mem cleanup

* Dycore: Shift h, [uv]_av, [uv]h updates to halo

* Dycore: static [uv]htr allocations

NVIDIA was correcting for our missing uhtr allocations on the GPU in the
slowest way possible.  This patch adds an explicit allocation, which
reduced the number of transfers dramatically.

* add flag for gpu2gpu do_group_update mpi transfers for latest fms

* Dycore: Move halo updates to GPU

Also:
* Declare [uv]int_cor in MOM_barotropic as loop-locals
* Add h to data transfer after RK2 step

* DO_LOCALITY() bugfix; use GPU FMS in .testing

The default FMS is not compatible with the new GPU-based MPI methods, so
we just change it in .testing/Makefile for now.  Down the road, we
should probably move this into the .github config.

DO_LOCALITY() now returns a ; if do concurrent locality modifiers are
unsupported.  This prevents errors associated with line continuations.

* btstep: pre-calculate problematic eta checks on GPU

* problem_eta->submerged,any_problem_eta->eta_is_submerged

* Fixes shelfwave failure in debug mode

 - rotated OBC%segment%num_fields needs to be set.

* Make sure reversed segments get rotated.

* Bugfix for default TIDES_ANSWER_DATE in SAL

Fix a bug in which the recently changed default answer date for
TIDES_ANSWER_DATE is not properly applied to MOM_self_attr_load.
TIDES_ANSWER_DATE is used in MOM_self_attr_load to check if SAL_USE_BPA
is used after a timestamp, so its default should be consistent with
MOM_PressureForce_FV.

* Added local versions of density_elem and density_derivs without "this" argument

* Fixes a typo in Recon1d PPM limiter

Thanks to both @alperaltuntas and @marshallward who noted that a PPM limiter
has the expression `( u2 - u1 ) * ( u1 - u0 ) <- 0.0` which is interpreted
as `( u2 - u1 ) * ( u1 - u0 ) < -0.0`. Needless to say, the intended code
was `( u2 - u1 ) * ( u1 - u0 ) <= 0.0`.

The same typo was copied to three files.

The high-order estimate of edge value was previously bounded by (u2,u1)
or (u1,u0). The missed conditions of either `( u2 - u1) == 0.` or
`( u1 - u0 ) == 0.` would then have been caught by the subsequent test
for an interior extremum. Thus, I think the cell was still limited to PCM
appropriately. However, the typo obscured the intention of the limiter
and I was lucky it still worked.
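
The parse can be reproduced outside Fortran. Python happens to tokenize `<-`
the same way, so the short sketch below (function names are made up for
illustration) shows that the two forms only disagree when the product is
exactly zero.

```python
def limited_as_written(u0, u1, u2):
    # "<- 0.0" is parsed as "< (-0.0)", i.e. a strict "< 0.0" test.
    return (u2 - u1) * (u1 - u0) <- 0.0

def limited_as_intended(u0, u1, u2):
    return (u2 - u1) * (u1 - u0) <= 0.0

# Cells: a local extremum, a monotonic profile, and a flat right-hand side.
for cell in [(0.0, 1.0, 0.5), (0.0, 1.0, 2.0), (0.0, 1.0, 1.0)]:
    print(cell, limited_as_written(*cell), limited_as_intended(*cell))
```

Only the last cell, where `u2 - u1` is zero, is treated differently by the two
tests.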

* Fixes shelfwave failure in debug mode

 - rotated OBC%segment%num_fields needs to be set.

* Make sure reversed segments get rotated.

* Frequency-dependent drag in tensor form

This commit allows the frequency-dependent drag to be implemented in
tensor form, by incorporating the off-diagonal components of the wave
drag tensor into the MOM_wave_drag module.

* Adds a PLM reconstruction scheme using least squares for the slope

Recon1d_PLM_WLS provides a piecewise linear reconstruction where the
slope is the "best" fit as determined by volume-weighted least squares.

The reconstruction is NOT limited by neighboring cells.
Therefore, this reconstruction is NOT useful for vertical remapping or grid generation.
It is instead intended for the pressure gradient calculation;
the idea is to disconnect the PLM slope from the values in vanish(ing)
layers which appear to be the source of pressure-gradient errors over
topographic slopes in z*-coordinate tests.

Because the normal limiters do not apply, the only test I could think of
was to check that the least squares fit was actually correct. The
documentation explains how this was checked (which took a while due to
round-off challenges with the loss function).
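
As a rough illustration of the idea (not the Recon1d_PLM_WLS code itself), a
weighted least-squares slope can be sketched in Python as below. The stencil,
the use of cell thickness as the weight and all names are assumptions.

```python
def wls_slope(x, u, w):
    """Slope of the weighted least-squares line through the points (x, u)."""
    w_sum = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    u_bar = sum(wi * ui for wi, ui in zip(w, u)) / w_sum
    num = sum(wi * (xi - x_bar) * (ui - u_bar) for wi, xi, ui in zip(w, x, u))
    den = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return num / den

x = [0.5, 1.5, 2.5]        # cell-centre positions
h = [1.0, 1.0, 1.0e-9]     # weights; the last cell is nearly vanished
u_clean = [1.0, 3.0, 5.0]  # exactly linear, slope 2
u_noisy = [1.0, 3.0, 100.0]  # garbage value in the vanished cell

print(wls_slope(x, u_clean, h))
print(wls_slope(x, u_noisy, h))
```

With these weights the spurious value in the vanished cell changes the slope
by far less than it would in an unweighted fit.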

* Vert friction: Force FMAs in tridiag solvers

Switching the vertical friction loops from k/j/i to j/i/k replaced the
evaluation of `b1` by FMA with a simpler version, causing an answer
change when FMAs are enabled.

Although less efficient, this patch adds an always-false loop to trick
the compiler and force it to always execute `b1` by FMA.

Specifically, loops of the following form execute `b1` by FMA.
```
    do k=2,nz
      if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k)

      c1(k) = dt * CS%a_v(i,J,K) * b1
      b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1)
--->  b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))
      d1 = b_denom_1 * b1
      visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1
    enddo
```
Switching to j/i/k ordering allows the Intel compiler to cache `a_[uv](K)` for
use in the next iteration of `k` and evaluate `b1` by a single multiplication.

If we insert an impossible branch, such as the following:
```
    do k=2,nz
      if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k)

      c1(k) = dt * CS%a_v(i,J,K) * b1
      b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1)
      b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))
      d1 = b_denom_1 * b1
      visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1

--->  if (dt < 0) exit
    enddo
```
then it blocks the lookahead logic of the compiler and forces the FMA execution
as in the k/j/i version.

There is a moderate impact on performance.
```
Before:
                                     hits          tmin          tmax          tavg          tstd  tfrac grain pemin pemax
(Ocean vertical viscosity)             300      2.717543      3.805039      3.523935      0.174203  0.064    31     0   511
```
```
After:
                                     hits          tmin          tmax          tavg          tstd  tfrac grain pemin pemax
(Ocean vertical viscosity)             300      2.780148      3.999669      3.761651      0.210061  0.069    31     0   511
```
so this should only be considered a temporary fix until FMA answer changes are
permitted.
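
For context on why an FMA can change answers at all, here is a small Python
sketch. It emulates a fused multiply-add with exact rational arithmetic (an
assumption made so the example does not depend on hardware or compiler flags)
and compares it with the usual two-rounding evaluation.

```python
from fractions import Fraction

def fma_exact(a, b, c):
    """Emulate a fused multiply-add: a*b + c with a single final rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = a * b + c          # the product is rounded to 1.0 before the add
fused = fma_exact(a, b, c)   # the exact product 1 - 2**-54 survives the add
print(unfused, fused)
```

The two results differ in the last bits, which is the kind of difference that
shows up as an answer change when a compiler switches between the two forms.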

* fix segfault

* send wt_* to gpu for btstep_timeloop

* move uv_old download into if block

* port find_ustar_mech_forcing interface of find_ustar

* init I_Hbbl on GPU instead of CPU

* clean up some loops

* kji -> jki some loops

* remove redundant CS%CA[u,v]_pref downloads

* fuse some loops in vertical viscosity

* move visc%nkml_visc_[u,v] to out of j loop

* Vert frict: nk_in_ml reduce locality in v

This patch adds a missing reduce(max: max_nk) locality modifier to
nk_in_ml in the v-direction.

This fixes an openmp regression that was detected in ifx 2025.2.

* port find_uv_at_h

* Corrected unit descriptions in 64 comments

  Corrected the descriptions of variable units in 64 comments spread across 16
files, including a dozen instances where "arbitrary" was misspelled.  All
answers are bitwise identical and only comments were changed.

* *Update TC testing parameters for late 2025

  Updated the values of about 21 parameters (many of which are repeated across
TC test cases) used in the TC testing to test the most recent versions of code
that is selected with ANSWER_DATE flags and to avoid testing the buggy versions
of code that is regulated by _BUG flags.  This includes some changes to broaden
the range of equations of state that are being tested and to test some newer
versions.  This does change the details of the TC tests, but they should (and
do) still pass TC regression tests across code versions.

* Added frazil to ice shelf (#985)

* Added frazil to ice shelf

The frazil mass flux to the ice-shelf base is calculated by
multiplying frazil energy [J m-2] by the inverse of the timestep times
the latent heat of fusion [kg J-1 s-1].

This frazil mass flux is incorporated as a negative water flux from
the ice shelf. This negative water flux then acts to add the frazil
mass to the ice shelf base
(MOM_ice_shelf.F90/change_thickness_using_melt) and remove it from
the ocean surface as evaporation (MOM_ice_shelf.F90/add_shelf_flux).
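
The unit conversion described above can be sketched as follows, in Python for
brevity. The latent heat value and both function names are illustrative
assumptions, not taken from MOM_ice_shelf.F90.

```python
LATENT_HEAT_FUSION = 3.34e5  # [J kg-1], assumed illustrative value

def frazil_mass_flux(frazil_energy, dt, latent_heat=LATENT_HEAT_FUSION):
    """Convert frazil energy [J m-2] over a timestep [s] to [kg m-2 s-1]."""
    return frazil_energy / (dt * latent_heat)

def shelf_water_flux(melt_flux, frazil_energy, dt):
    """Net water flux from the shelf; frazil enters with a negative sign."""
    return melt_flux - frazil_mass_flux(frazil_energy, dt)

print(frazil_mass_flux(3.34e5, 1.0))
print(shelf_water_flux(0.0, 3.34e5, 1.0))
```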

Note frazil is reset to zero at the start of each therm timestep in
MOM.F90/step_MOM.

Some additional changes were also made to how the ice-shelf flux
factor is implemented, so that it only scales ice-shelf melt without
affecting the frazil mass flux.

* Fixed a commented line where fluxes%water_flux should be ISS%water_flux

* Extend the PGF reconstruction to allow PLM-WLS

The PLM reconstruction used within the pressure gradient force
now supports the weighted least squares approach for slope
estimation.

In a catastrophic version of seamount/z where vanished layers
slightly inflate, the regular finite volume PLM method is sensitive
to values in the vanished layers and leads to a feedback that causes
error growth (spontaneous motion). The PLM-WLS method is insensitive
to the vanished layers and in the same test leads only to round-off
level noise in the flow.

* Spatially varying bottom drag coefficient (#983)

* Spatially varying bottom drag coefficient

The spatially varying bottom drag coefficient can be specified by
providing a map of the spatially varying scaling factor.

* Spatially varying bottom drag coefficient

Fixed the inconsistency at open boundaries when CDRAG_MAP is true.

* Correction on total column thickness for wetting

In a number of cases, total resting column thickness is calculated as
G%bathyT + G%Z_ref, which is largely correct but for wetting, i.e.
G%bathyT < 0. This commit makes a correction for seven cases with this
potential bug.

There are no answer changes if no wetting points are used and G%Z_ref is
zero.

List of modules/processes affected:
* MOM_barotropic
    * affects only surface stress when BT_NONLIN_STRESS is False.
* MOM_wave_speed
* h2 calculations in
    * subroutine internal_tides_init
    * subroutine int_tide_input_int
    * subroutine tidal_mixing_init
* MOM_lateral_mixing_coeffs
* MOM_MEKE

* fix repeated symbol error that occurs in benchmark

* Barotropic: frhat[uv] HYBRID repro fix

This patch fixes a minor bit reproducibility regression in the
calculation of `hat[uv]tot` and, consequently, `frhat[uv]` in the ifort
compiler.

Currently, the `hat[uv]tot` loops were moved to separate loops.  This
patch moves `hat[uv]tot` back into the principal `frhat[uv]` (or in this
case, `hat[uv]`) loop.

There may be performance consequences to this, which I have not yet
investigated.

This was notably subtle, since many tests only call `btcalc()` once
without the BTCONT `h_[uv]` inputs, was only observed in `frhatu`, and
did not actually change solution answers.  Nonetheless, this is a
genuine answer change in an actively used solver, so we want to preserve
bit repro until discussed and approved by the consortium.

Although I can't see exactly the reason for the diff, it seems possible
that Intel has added a reduction-like optimization, even at `-O0`.  It
also may not have occurred in production compilations with `-fp-model`
flags.

* Allow overshoot for grounding test

In commit b8c807be327c0, we made the test for SSH penetrating the sea floor
fatal when using BT_LIMIT_INTEGRAL_TRANSPORT because we thought it could never
happen. Unfortunately, floating-point round off allows violations and we
were hitting the now fatal error. This commit calculates the precision we
can expect for the current SSH and then if the ocean thickness has become
negative within this precision, we reset to zero thickness.

This should not change answers in that BT_LIMIT_INTEGRAL_TRANSPORT is a
new option, and if anyone was using it they would have encountered a FATAL,
and this fix does not alter any positive thicknesses.
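
A rough sketch of the tolerance logic, in Python. The factor of 4, the use of
`math.ulp` to estimate the expected precision, and the function name are all
assumptions; the actual precision estimate in the barotropic solver is not
shown here.

```python
import math

def reset_overshoot(ssh, bathy_depth):
    """Return ocean thickness, forgiving negative values at round-off level."""
    thickness = ssh + bathy_depth
    # Assumed tolerance: a few units in the last place of the larger operand.
    tol = 4.0 * math.ulp(max(abs(ssh), abs(bathy_depth)))
    if thickness < 0.0:
        if thickness >= -tol:
            return 0.0  # overshoot within the expected precision
        raise ValueError("SSH is below the sea floor")
    # Positive thicknesses are returned unchanged.
    return thickness

# SSH one representable step below the sea floor at 100 m depth.
print(reset_overshoot(math.nextafter(-100.0, -math.inf), 100.0))
print(reset_overshoot(1.0, 100.0))
```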

* Initialize an integer only set on root_PE()

When debugging with all run-time tests turned on, the integer `num_lines`
was flagged as used but uninitialized when being passed to `broadcast()`.
I don't think the code was wrong, just that the checks expected the "inout"
argument to be set on all processors when the purpose of `broadcast()` is
to take the value from root_PE and send to everyone else. I don't know why
this hadn't been detected before - maybe compiler version. The fix is trivial
and has no impact on production codes or answers.

* Fix an uninitialized float in set_viscous_ML()

`oldfn` was not initialized when used in a logical test. This did not
matter for numerical results; the logical expression always correctly
evaluated to False due to other parts of the expression. Nevertheless,
this variable was technically used uninitialized and a debugging executable
doesn't get past this. Hence the fix.

* Add floor to "h_marg" in continuity_PPM

When debugging the ice sheet configuration, a non-zero barotropic transport
could not be reconciled with the layer transports because the derivative of
net layer transports was zero (d/dv hu). This arose due to all layer flows
pointed from vanished to thick so that their marginal thicknesses were
individually zero. Adding a floor to the marginal thickness allows the
solver to find the adjustment that reconciles the two estimates.

I've made this optional via parameter CONT_USE_H_MARG_MIN, and with default
of False. If this situation had occurred before, we surely would have had
a crash so it's likely that always applying this floor would not change
answers. However, there's the weak possibility that a teeny-tiny transport,
smaller than H_subroundoff, has existed in a run and then this answer would
change. With the default of False we can be sure there are no answer
changes, but it is recommended to use this option for safety.

* *+Fix CHANNEL_DRAG with bathymetry above sea level

  The CHANNEL_DRAG option was using a harmonic mean to interpolate adjacent
bottom depths at velocity points to vorticity points.  However, this is not well
behaved when the bottom depth is negative (i.e., above sea level), as was noted
as a part of PR #975. This commit adds the new runtime parameter
CHANNEL_DRAG_SHELFBREAK_DEPTH to set a depth below which a harmonic mean bottom
depth is still used to mimic a continental shelfbreak profile, but above which a
simple arithmetic mean is used to interpolate bathymetry to vorticity points for
use with CHANNEL_DRAG. The expressions vary continuously with depth and avoid
the previous problems with division by zero or a badly formed harmonic mean.  By
default, all answers are bitwise identical in any cases that worked previously,
but cases with oceans (or Great Lakes) in basins with bottoms that are above
sea-level should now work sensibly when CHANNEL_DRAG is enabled.  There is a new
runtime parameter in some cases.
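
To see why the harmonic mean misbehaves for bottoms above sea level, consider
this small Python sketch. It only illustrates the problem; it does not
reproduce the new depth-dependent blend, and the function names are made up.

```python
def harmonic_mean(d1, d2):
    return 2.0 * d1 * d2 / (d1 + d2)

def arithmetic_mean(d1, d2):
    return 0.5 * (d1 + d2)

# Depths are positive below sea level and negative above it.
for d1, d2 in [(100.0, 300.0), (5.0, -4.0), (2.0, -6.0)]:
    print(d1, d2, harmonic_mean(d1, d2), arithmetic_mean(d1, d2))
```

For two positive depths the harmonic mean lies between them, but with mixed
signs it can fall far outside the range of the inputs, and it is undefined
when the two depths sum to zero.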

* Corrected 66 unit descriptions in comments

  Corrected the incorrect or inconsistent unit descriptions of 28 variables,
added descriptions of the units of 4 others, and corrected the non-standard
syntax (e.g. backwards or in the wrong order) in the description of 35
variables, scattered across 27 files.  Only comments are changed and all answers
are bitwise identical.

* Fix for ice-shelf friction velocity bugs (#995)

* Fix for ice-shelf friction velocity bugs

Fixed an incorrect area used to calculate cell-centered ocean surface velocity under the ice_shelf, which can impact the calculation of ice-shelf friction velocity. Added missing flags to some allocate_surface_state calls so that sfc_state%taux_shelf and sfc_state%tauy_shelf are allocated. This is required for the surface-stress-based (rather than surface-velocity-based) calculation of ice-shelf friction velocity. Also added taux_shelf and tauy_shelf as diagnostics for the surface stress under the ice shelf.

* Removed unneeded taux_shelf and tauy_shelf diagnostics

* Added ustar_from_vel_bugfix flag, which if true, fixes the ustar from ocean velocity bug

* offload pass_uta_uhbta

* (+) Decouple FMS infra from framework

This patch undoes a coupling of the FMS infra layer to the MOM6
framework code.

In the current FMS infra layers, the `get_extern_field_info()` and
`init_extern_field()` functions require content defined in
`src/framework`.   This prevents the development of new
independent infra layers, which must also depend on infra-agnostic
content.

In particular, the FMS2 implementation of `get_extern_field_axes()`
relies exclusively on the framework function, `get_var_axes_info()`.

Both infras also return the `axis_info` type, a MOM-specific
framework-level descriptor, rather than the infra `axistype`.

This patch resolves these inconsistencies.

* `axis_info` no longer appears at infra-level.  All relevant functions
  now reference `axistype`.

* `src/framework/MOM_io.F90` now provides functions for translating
  `axistype` to `axis_info`.

Some specific changes are summarized below.

* `get_external_field_info` is now a framework-level function of
  `MOM_interpolate.F90` , using infra-level implementations of
  `get_extern_field_(size|axes|missing)`.  Each is now explicitly
  defined at the infra-level.

* The FMS2 `get_external_field_axes` is now an entirely new function,
  and is largely a duplicate of `get_var_axes_info()`.  The major
  difference is that it returns a list of `axistype`.  It also replaces
  the fixed x-y-z fetch with a slightly more generic list of axes.

  (It still requires at least three dimensions, however.)

* `set_axis_data` is only used internally by the FMS2 infra.  It is
  included in FMS1 but raises a not-implemented error.

There is one minor API change.

* The `name` argument was added to `get_axis_data`.  It is now the
  second argument, to match the style of existing functions, and size
  was moved to the third argument.

Other minor framework references have been removed.

* `MOM_error` and `FATAL` now reference their `MOM_error_infra`
  equivalents.

* `lowercase`, which was previously only defined in FMS1, has been added
  to the FMS2 infra.  Note that this is a duplication of the function in
  `src/framework/MOM_string_functions.F90`.

* Add mom_cap_outputlog.F90 that enables output logging diagnostics at a given hourly output frequency

Author: Denise Worthen
This feature is required for UFS operational configurations and is used to determine when
MOM6 output (diagnostics and restart) has been completed. The log files created by
this feature can be queried by the Global Workflow to either trigger downstream jobs
or to ensure that if a run fails and a restart is required, model output is available
consistent with a given restart file.

* Revert vertical viscosity subroutines to JIK loop order used in dev/gfdl (#101)

* Rewrote vertvisc_remnant to follow dev/gfdl jik structure. Uses omp directives and private vars to offload to gpu

* Rewrote vertvisc to follow dev/gfdl jik structure and use omp directives + private vars in tridiagonal solver

* Vertvisc with jik loops based on Marshall's vv_coef branch, no transfers except ustar and hml

* Switch to distribute parallel do and omp declare target to allow threading with find_coupling call inside of vertvisc_coef

* Cleanup commented out code in vertvisc_coef

* Revert non-k find_coupling functions back to their dev/gfdl version, reintroduce conditionals to vertvisc_coef

* Explicitly make coupling routines gpu routines to avoid crash when compiling with O2. Make transfers explicit

* Harmonization with dev/gfdl

* Several minor changes to harmonize this patch (and dev/gpu) with
  dev/gfdl.

  * touch_ij is removed since it is no longer used by the CPU.
  * Various whitespace changes to OMP directives and loop indices.
  * Unused OMP directives have been removed.
  * An `associated(ADp%dv_dt_visc)` typo has been fixed.

* Added latent heat flux from ice shelf to ocean fluxes

* Fixes wrong number of levels in z-coord diags

When a z-coordinate diagnostic grid is specified via the "PARAM"
method of coordinate definition, then the number of levels was always
the same as the main model. This commit fixes this by first allowing
for up to 1000 levels in the new grid, checking for the actual
requested size, and then allocating to that size.

It appears we have no examples using this mode, which is probably
how this bug has persisted so long. This "PARAM" method of specifying
grids is being used in a range of new CMIP7 diagnostics in both
MOM6 and COBALT.

* Fix bug in registration of ALE sponge diagnostics for generic tracers (#1003)

* Init all sponge tendency diag IDs to -1 immediately

* No need to reset to -1 since initialized when declared

* Move init_ALE_sponge_diags to after all tracers have been set up

* Fix reference of (rarely) unassociated pointer

These two references to members of a pointer don't seem to be hit except
under special circumstances but nevertheless I ran in to them when debugging
an unrelated problem. There are two references to members of `diag%axes` that
assume `diag%axes` are associated, but in the specific case I was debugging
this was not the case.

* Adds 5 CMIP7 diagnostics for vertically integrated heat/salt content

Five vertically integrated diagnostics are requested in CMIP7. These
ultimately are to be for four vertical intervals (0-300m, 300-700m, etc.)
but we will handle that through addition of a 4-level diagnostic grid,
configured at run-time. This commit handles the conversion from temperature
or salt to heat content or salt content (by mass) and registers a
"vertically extensive" quantity so that the diagnostics know to re-integrate
rather than remap.

Changes:
- Added diagnostics absscint, pfscint, scint, chcint and phcint
- Moved registration of temp_int and salt_int to within an existing
  `if (use_temperature)` block
- Made public 2 GSW conversion functions in MOM_EOS

* Optimized the ice-shelf CG scheme by reducing the number of times reproducing_sum (and therefore, mpp_sum) is called. Previously, several 2-D arrays were each being passed within their own reproducing_sum calls, which is now avoided by consolidating the 2-D arrays into one 3-D array that is passed to a single reproducing_sum call.
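
The consolidation can be pictured with the Python sketch below. `global_sum`
is a stand-in written for this example; it is not the `reproducing_sum`
interface, whose signature is not shown here.

```python
calls = 0

def global_sum(field3d):
    """Stand-in for a collective sum: one call sums every 2-D slice."""
    global calls
    calls += 1
    return [sum(sum(row) for row in slab) for slab in field3d]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, 0.5], [0.5, 0.5]]
c = [[2.0, 0.0], [0.0, 2.0]]

# Before: one collective per 2-D array.
separate = [global_sum([arr])[0] for arr in (a, b, c)]
calls_separate = calls

# After: stack the 2-D arrays into one 3-D array and sum once.
stacked = global_sum([a, b, c])
calls_stacked = calls - calls_separate

print(separate, calls_separate)
print(stacked, calls_stacked)
```

The sums are the same either way; only the number of collective calls changes.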

* Check that frazil is allocated before adding it to ice-shelf water flux calculation. Needed for runs without frazil.

* Added melt_mask for ice shelves

* Added melt_mask to ice-shelf restart

* comments and units

* subroutine ice_shelf_solve_inner: Completed variable descriptions and units; converted cg_halo and max_cg_halo from real to integer

* +Add trim_trailing_commas and ints_to_string

  Copied the function i2s from MOM_diag_mediator into the function
ints_to_string in MOM_string_functions, and moved the code removing trailing
commas from two places in MOM_diag_mediator into the new function
trim_trailing_commas in MOM_string_functions.  Because of the duplication of
code between MOM6, SIS2 and the MOM6 ice shelf code, these functions would need
to be replicated 3 or 6 times without these changes.  Also added unit tests of
both new functions to string_functions_unit_tests.  All answers are bitwise
identical but there are two new public functions in MOM_string_functions.

* Call trim_trailing_commas from register_diag_field

  Call trim_trailing_commas from register_diag_field and register_static_field
and ints_to_string from trim_trailing_commas and eliminated the now redundant
routine i2s.  All code functions exactly as before but there is less duplicative
code.

* Refactor nsten_halo in routine advect_tracer

Move nsten_halo out of iteration loop

* Fix OBC indexing bug in MOM_tracer_advect

Fix a bug in which tracers in the domain outside of the
OBC are falsely updated when the OBC is in the
interior. The bug was due to an indexing error in
routine advect_x.

* MOM_interpolate: use get_axis_size()

The prior version of `get_external_field_info` incorrectly relied on the
`size` output of `get_external_field_info_infra` to determine the size
of an external field's axes, since all external fields are assumed to be
domain-decomposed.

Since axis metadata is generally opaque, we have introduced a new infra
function, `get_axis_data`, which returns the size of an axis.

* Adds the ability to read a CDEPS configuration file to provide in-line forcing. 

* Adds the ability to read a CDEPS configuration file to provide in-line forcing. 
Currently this is set up to read a non-climatological lrunoff data stream only.

* ice-ocean-nolib: Fix SIS2 paths

Patch to fix the SIS2 paths in the pipeline CI script.  Explicitly
excludes the icebergs stub, since we are using the actual icebergs
model.

* Correct the path to the Icepack interfaces

  The previous attempt to fix the automated no-library build of the ice-ocean
model incorrectly specified the path to the Icepack_interfaces.  This has now
been corrected from `src/SIS2/config_src/external/Icepack_interfaces` to
`src/SIS2/config_src/external/Icepack_interfaces` in pipeline-ci-tool.sh.  The
real mystery here is why the testing on the previous PR actually worked.

* Delete unneeded masks args from 25 post_data calls

  Removed redundant mask arguments from 25 post_data() calls for 2-d arrays that
were using masks that would have been set anyway based on the axes of these
diagnostics.  Explicit masks are only required for arrays that use unusual
masks, pass atypically sized arrays (e.g., just the computational domain), or
are static diagnostics that do not evolve in time.  All answers and diagnostic
output are bitwise identical.

* vertvisc: add missing b_denom_1 map delete

* vertvisc: remove scalar allocs

* Add 2D meanSL field

The spatially varying time mean sea level meanSL is used as a reference
height to calculate, e.g., time mean ocean column thickness
max(meanSL + bathyT, 0.0). This field allows the model to run in a domain
with spatially varying mean height, e.g. the Great Lakes system.
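
The thickness expression can be worked through with a few numbers. This is a Python sketch with made-up values, assuming bathyT is a depth (positive downward) so that the bottom sits at height -bathyT:

```python
def mean_column_thickness(mean_sl, bathy_t):
    """Time-mean column thickness, max(meanSL + bathyT, 0.0)."""
    return max(mean_sl + bathy_t, 0.0)


# Illustrative values in meters, not taken from any model configuration.
open_ocean = mean_column_thickness(0.0, 4000.0)   # sea level at the reference height
high_lake = mean_column_thickness(176.0, -100.0)  # lake bottom 100 m above the reference
dry_land = mean_column_thickness(176.0, -300.0)   # bottom above the mean surface
print(open_ocean, high_lake, dry_land)
```

The second case is the one a single global reference height cannot represent: the bottom lies above the reference, yet the column is wet because the local mean surface is higher still.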

This first commit insulates the changes from the rest of the model. It
only adds the field to ocean_grid_type and dyn_horgrid_type, the
transcription between the two types, and a routine to read it from a
file. The field is not yet used by the rest of the code.

* Use meanSL to calculate mean column thickness

This commit uses G%meanSL in 13 modules. The change is essentially
replacing G%bathyT + G%Z_ref with G%meanSL + G%bathyT. Note that this
does NOT mean parameter G%Z_ref is replaced by G%meanSL. G%Z_ref is
factored in both G%meanSL and G%bathyT and it is kept as a useful
consistency testing tool.

Another cosmetic change is made by using G%meanSL + G%bathyT, instead of
G%bathyT + G%meanSL, which (hopefully) can be easily interpreted as
G%meanSL - (-G%bathyT).

* Modify max_depth calculation using meanSL

max_depth is really used as a maximum static thickness throughout the
model, so meanSL needs to be considered.

* +Fix how missing values are handled in post_data

  At no point does MOM6 code actually set arrays passed to post_data() to
have a missing value.  Instead a missing value is set in output files entirely
by masking.  This commit eliminates the logic that would (inaccurately) try to
reset fields that seem to match rescaled missing values to the output missing
value.  The previous code was inaccurate, in that a rescaled field could have
taken on the unscaled missing value as a valid data point and still have been
incorrectly marked as missing, although the odds of this happening are
exceptionally small and it would only be cases with dimensional rescaling where
this could have applied.  For 2-d diagnostics, this commit eliminates a
duplicative array syntax math expression that did exactly what the code now
does. All solutions are identical, and because the missing value was not being
explicitly set it is unlikely that any diagnostics will change.
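
A small Python sketch of the masking-only approach, under the assumption that a mask value of 0 marks points to be written as missing. The function name, the `MISSING` constant, and the scaling argument are hypothetical and only illustrate that data values are never compared against the missing value.

```python
MISSING = -1.0e34  # illustrative output missing value


def masked_output(field, mask, scale=1.0, missing=MISSING):
    """Rescale valid points and write the missing value only where the mask is 0."""
    return [missing if m == 0 else scale * v for v, m in zip(field, mask)]


field = [1.0, 2.0, 3.0]
mask = [1, 0, 1]
out = masked_output(field, mask, scale=2.0)
print(out)
```

Because the decision depends only on the mask, a valid data point that happens to equal some rescaled missing value is left alone.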

* Remove ice-sheet melting/freezing contribution to fluxes%latent because it is already accounted for in fluxes%sens

* Add tracing instrumentation to nuopc driver  (#162)

* Adds calls to UFS tracing routines that create a trace file, which can then be visualized
  and has proven useful in identifying various performance issues.

* +Add G%IdxCu_OBCmask and G%IdyCv_OBCmask

  Added the new elements `IdxCu_OBCmask` and `IdyCv_OBCmask` to the
`ocean_grid_type` and `dyn_horgrid_type` to facilitate the application of
no-gradient open boundary conditions at faces with essentially no added
overhead.  These new arrays are set initially in `set_derived_metrics()` and
`set_derived_dyn_horgrid()`, but may be reset in `initialize_masks()` and
`open_boundary_impose_land_mask()`.  All answers are bitwise identical but there
are a pair of new 2-d arrays in two transparent grid types.

* Use G%IdxCu_OBCmask in 7 places

  Modified the code to use `G%IdxCu_OBCmask` and `G%IdyCv_OBCmask` in 7 places
each in 6 modules.  They are used instead of `G%OBCmaskCu*G%IdxCu` and
`G%OBCmaskCv*G%IdyCv`, to which they are equivalent.  This change should
slightly speed up the model, and as expected all answers are bitwise identical.
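
The equivalence can be sketched in Python with plain lists standing in for the 2-d grid arrays. The values and the `diff` field are made up, and the way the product is used in a gradient is an assumption for illustration only.

```python
# Illustrative 1-D stand-ins for the 2-d grid arrays.
obc_mask_cu = [1.0, 0.0, 1.0]   # 0 at open-boundary faces, 1 elsewhere
idx_cu = [1.0e-4, 2.0e-4, 5.0e-5]  # inverse grid spacing at u-points

# Precompute the product once, as is done when the grid is set up.
idx_cu_obcmask = [m * d for m, d in zip(obc_mask_cu, idx_cu)]

diff = [3.0, 7.0, -2.0]  # some difference across each face

# Old form: two multiplications per point inside the loop.
grad_old = [m * d * x for m, d, x in zip(obc_mask_cu, idx_cu, diff)]
# New form: one multiplication per point using the precomputed array.
grad_new = [p * x for p, x in zip(idx_cu_obcmask, diff)]
print(grad_old == grad_new)
```

Both forms multiply the mask by the inverse spacing first, so the rounding is the same and the results match exactly.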

* Add option to scale tidal amplitude for bottom ustar. (#1016)

* Add option to scale tidal amplitude for bottom ustar.

- previously we used the tidal amplitude to compute ustar.
- The additional factor translates between amplitude and time mean tidal current.
- Setting the factor TIDEAMP_FACTOR<0 preserves old answers.

* Update tideamp factor implementation for efficiency
- factor out the negative "default" value to automatically set to multiply by 1.0 instead of using an if-block.
- factor in the c-grid averaging 0.5 to further reduce extra operations, but clearly label the parameter to reflect this.
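
One way the two efficiency points could fit together, as a hedged Python sketch. The function names and the exact formula are assumptions, not the MOM6 implementation: a negative factor is resolved once to 1.0, and the 0.5 from averaging two neighboring values is folded into the same constant so the loop needs no branch.

```python
def resolve_tideamp_scale(tideamp_factor):
    """Resolve the factor once at setup (hypothetical helper).

    A negative value is treated as "use the old behavior", i.e. multiply by 1.0.
    The 0.5 from averaging two neighboring values is folded in here.
    """
    effective = tideamp_factor if tideamp_factor >= 0.0 else 1.0
    return 0.5 * effective


def scaled_average(a, b, scale):
    # No if-block in the inner loop: just one multiply by the prepared constant.
    return scale * (a + b)


old_answer = 0.5 * (0.02 + 0.04)
new_default = scaled_average(0.02, 0.04, resolve_tideamp_scale(-1.0))
print(old_answer == new_default)
```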

---------

Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>

* Add vertical tracer flux diagnostic for dye tracers (#1022)

* Add vertical tracer flux diagnostic for dye tracers

- Register vertical flux diagnostic in initialize_dye_tracer
- Calculate net vertical flux from entrainment (positive upward)
- Post flux diagnostic in dye_tracer_column_physics

* changed diagnostic registration to be at interface, made sure boundary fluxes are zero

* changed lines 338 and 354 as needed.

Fixed accidental space on Line 1.

* Regroup MOM_initialize_fixed params in param_doc

This commit is meant to fix the issue that all parameters in
MOM_initialize_fixed after OBC are logged under module MOM_open_boundary
in MOM_parameter_doc.

By moving log_version call after OBC, parameters from
MOM_initialize_fixed are now logged under three "modules" in
MOM_parameter_doc:
1. Parameters before OBC are under module MOM_grid_init, which also
(incorrectly) includes topography-related parameters.
2. module MOM_open_boundary
3. Parameters after OBC are under module MOM_initialize_fixed.

The change makes sure OBC parameters are well separated from the other
parameters. This is a hack rather than a fix.

* Minor open_boundary_config refactor

* Make OBC related calls in MOM_initialize_fixed explicitly
conditional for readability.
* Early return in open_boundary_config if there is no se…