GFDL to main (2025-09-25)#1680
Conversation
* (*) Add option for minmod limiter for RayTracing This PR adds a minmod limiter option for the advection of the internal tides energy, which becomes the new default. The current positive definite scheme is kept as an option but has shown to create ripples at the grid scale. * change adv_limiter options from string to integer --------- Co-authored-by: Raphael Dussin <Raphael.Dussin@noaa.gov>
* (*) Multiple fixes for the ray tracing - Solve the issue of rays not propagating through the northfold: the use of pass_vector for speed_[x/y] is not appropriate since these arrays are meant to be scalar and the direction is contained in the angle dimension of energy. changing to regular pass_var at the appropriate cell location fixes it - The energy gets trapped at critical latitude: this PR introduces 2 options for energy propagation, either propagate along the critical latitude or reflect on it. These are controlled by TURN_CRITICAL_LAT (False: get trapped, True: do something) and REFLECT_CRITICAL_LAT (True: reflect like a solid wall, False: propagate along) - Several divisions by constant number were eliminated * (*) Multiple fixes for the ray tracing - Solve the issue of rays not propagating through the northfold: the use of pass_vector for speed_[x/y] is not appropriate since these arrays are meant to be scalar and the direction is contained in the angle dimension of energy. changing to regular pass_var at the appropriate cell location fixes it - The energy gets trapped at critical latitude: this PR introduces 2 options for energy propagation, either propagate along the critical latitude or reflect on it. These are controlled by TURN_CRITICAL_LAT (False: get trapped, True: do something) and REFLECT_CRITICAL_LAT (True: reflect like a solid wall, False: propagate along) - Several divisions by constant number were eliminated * add call to turning latitude in propagate_x This should satisfy the rotational symmetry. As expected, this has no impact on global case since reflected energy from propagate_x then does not need to be reflected in propagate_y. * update units order --------- Co-authored-by: Raphael Dussin <Raphael.Dussin@noaa.gov>
When debugging layout non-reproducible problems, this hchksum showed up as having changes in the halo BUT it is immediately followed by a pass_var(). Either we need to only check the C-domain or do hchksum with halos after the pass_var().
The rate at which eta_src was added to eta within the barotropic solver allowed the instantaneous eta to ground-out and drop below the bottom during the extend steps over which the filtered state is calculated. eta_src was previously adjusted so that it could at worst ground out over the course of the baroclinic step, but did not allow for the extra time needed when filtering. This change was found to be helpful when debugging problems in vanished layers under a grounded thermodynamic ice shelf.
An adjustment to the barotropic mass source used in the correction step of RK2 was making matters worse for layers under a grounded ice shelf. This adjustment essentially does not do anything for the deep ocean (i.e. was tiny tiny) so we suspect this was a hold over from the earlier BT algorithms.
The BT solver regularly spits out warnings about the SSH dropping below
the ocean bottom, particularly frequently for extreme scenarios such as
the vicinity of the ground line in an ice shelf cavity. These changes
apply a limiter on the time-integrated transport within the BT solver
that based on the initial volume adjusted for any sinks.
- Added run-time flag BT_LIMIT_INTEGRAL_TRANSPORT that turns on
calculation of the initial and projected available volume.
By default, this is off and the default available volume is huge.
- Implemented a limiter without `if`s for efficiency.
- Added a FATAL test to check that the initial conditions do not already
have negative total thickness.
- Flipped the WARNING about negative total thickness to be FATAL when
using the limiter, since if the limiter is not robust then we've done
something wrong again.
- Extended check on number of steps for BT solver, wrt filtering steps
- Changed an ==0 to <=0, since we encountered a weird situation where
it happened (unknown cause)
Adds analogous code to MOM_dynamics_split_RK2b.F90 that is identical to the code that was added to MOM_dynamics_split_RK2.F90 to avoid readjusting the barotropic mass source to try to reconcile the baroclinic and barotropic sea surface heights. By default all answers are bitwise identical, but the runtime parameter BT_ADJ_CORR_MASS_SRC will not appear in the MOM_parameter_doc files for cases with SPLIT_RK2B set to True.
The function query_initialized() expects the argument to be a target array, but when passed an allocatable it leads to a uninitialized jump condition when "restarting" MEKE%Le. Rearranging the logic to only query if the array is allocated avoids the error.
In two functions wihtin MOM_diag_mediator.F90, a "msg" string is declared and passed to error check (assertion), but was never filled with characters. I've removed the errant string and changed the error message to use already available information, namely the name of the field being configured.
Added the new debugging runtime parameter OBC_REVERSE_SEGMENT_ORDER that can be set to true to internally store and apply the OBC segments in the reverse of the order with which they are specified by the used. When the OBC segments are order independent, changing this value should give bitwise identical results. There is a new integer argument indicating the segment to use for setting parameters in setup_u_point_obc and setup_v_point_obc, but these are internal routines that are not publicly visible. This commit also changes some of the segment index variables from l (which can be hard to distinguish from 1) to n, and it standardizes the spacing in some loops over segments or in variable assignments to better align with the MOM6 style guide. By default the new runtime parameter is false and all answers are bitwise identical.
Added the ability to take open boundary conditions into account when setting up the masks, by providing the new optional arguments OBC_dir_u and OBC_dir_v in calls to initialize_masks(). The new optional argument open_corner_OBCs to initialize_masks() can be used to specify that convex corners between OBC segments should be treated as open. These had to be handled via optional arguments because SIS2 also calls initialize_masks and it is under separate version control. By default all answers are bitwise identical, but there are new optional arguments to a publicly visible routine.
Adds the option to avoid using exterior data at open boundary condition points, and to avoid projecting information outside across open boundary condition faces. Such projections are problematic because there can be pairs of OBCs that would project inconsistent values into the same point. This option is enabled by setting the new runtime parameter OBC_PROJECTION_BUG = False. To implement these changes, the weights for the barotropic solver to use when interpolating total thickness onto vorticity points when calculating the barotropic potential vorticity for use in the Coriolis scheme are now being stored in the new q_wt element in the barotropic control structure. These weights are set up during initialization with information about open boundary conditions optionally taken into account. In addition the total thicknesses at open boundary velocity points that are combined with the potential vorticities to give the barotropic Coriolis terms are always taken from the interior of an OBC face. Some inconsistencies in the units of a negligible but positive cell volume in the denominator of the barotropic PVs were also corrected in a way that will not change answers when dimensional rescaling is not being applied. This commit also includes the run-time configurable option to set the corner point land-mask at the bay-like intersection of open boundary conditions to be unmasked so that open boundary conditions can be applied there. By default all answers are bitwise identical, but answers in cases with open boundary conditions do change when these new capabilities are enabled.
Returned the functionality of open_boundary_setup_vert() to a separate routine, and no longer have initialize_segment_data() carry out these initialization steps. These routines had previously been combined in the expectation that they would always occur at about the same point in the code but this turns out not to be the case for generic tracers with open boundary conditions. This renewed routine is called directly form initialize_MOM. All answers are bitwise identical but there is one added public OBC interface.
Add separate OBC_for_remap and OBC_for_bug arguments to MOM_initialize_state() to clarify the distinct roles of the two arguments, to make it clear that one can not be used without rotation, and to prepare for the eventual obsoleting of OBC_RESERVOIR_INIT_BUG. This commit also simplifies the logic setting ntstep when do_thermo is false. All answers are bitwise identical, but there is a new optional argument to MOM_initialize_state and another argument has been renamed.
Added the new subroutine chksum_OBC_segment_data() from the contents of the loop over the segments in chksum_OBC_segments, as this can be useful when called separately for debugging some of the constructs that will be coming in subsequent commits. Also added rotate_OBC_segment_values_needed() to copy over the values_needed and _index variables between segments, and to avoid the need for duplicate code blocks in rotate_OBC_segment_data() and rotate_OBC_config(). There is also some minor revisions to rename segnam to segname for greater clarity. This commit also makes greater use of the I0 format that was introduced with Fortran 95 to simplify or shorten some error messages. All answers are bitwise identical but there are two new internal subroutines in the MOM_open_boundary module.
Call initialize_segment_data() with the rotated OBC type. This includes the addition of a new turns arguments to initialize_segment_data() and the elimination of the ocean_grid_type argument to this routine. This includes the addition of the new function rotated_field_name() to swap the names of u- and v- velocity fields when the grid is rotated, avoiding the duplication of code. Internally in initialize_segment_data(), num_fields was renamed to reflect the fact that it is actually the number of tracer or other fields that are initialized via the OBC manifest string (which does not include generic OBGC tracers), and not the total number of fields on a segment. Code was also added to allow for rotate_OBC_config() to be called either before or after initialize_segment_data(). With these changes, the call to initialize_segment_data() can be moved much later in the order of the initialization calls to address issues with the initialization of OBGC tracers with open boundary conditions. All answers are bitwise identical, but there are changes to the arguments of the publicly visible routine initialize_segment_data().
Move call to initialize_segment_data() after the call to call_tracer_register_obc_segments() to correct an answer changing bug in models with OBGC tracers and open boundaries that was causing the dev/CEFI branch answers not to reproduce with dev/gfdl. This commit will change answers (and restore previous answers) in cases that use open boundary conditions with biogeochemical tracers, such as those in COBALT.
The temporary blocks of code allowing for initialize_segment_data either before or after rotate_OBC_config() were eliminated, as was the now unused subroutine rotate_OBC_segment_data(). All of the functionality of rotate_OBC_segment_data had already been merged into initialize_segment_data. All answers are bitwise identical, but there is one less internal subroutine, and the order of calls to initialize the open boundary conditions is now less flexible.
* Added a 7th-, 5th- and 3rd-order WENO schemes to reconstruct the PV and added the Koren scheme to reconstruct the kinetic energy in MOM_CoriolisAdv: - Set CORIOLIS_SCHEME = "WENOVI7TH_PV_ENSTRO", "WENOVI5TH_PV_ENSTRO", :WENOVI3RD_PV_ENSTRO" to use a 7th-, 5th-, and 3rd-order WENO scheme, respectively, to reconstruct PV. - Set KE_SCHEME = "KE_UP3" to use a third-order upwind scheme with the Koren flux limiter to reconstruct the kinetic energy. Set KE_USE_LIMITER = False (default is True) to not use the flux limiter. The KE_UP3 scheme is recommended to be used along with the WENO Coriolis schemes. - Set WENO_VELOCITY_SMOOTH = True to use the velocity field to estimate the smoothness indicator for WENO. This scheme tends to yield higher-order of accuracy and thus more energetic eddy fields. - The WENO schemes conduct a thickness-weighted reconstruction for PV to avoid PV singularities. - The order of WENO is reduced to 5th, 3rd, and 1st subsequently when it approaches lateral boundaries - The halo size is 4 and 5 for the 7th- and 5th-order WENO schemes. When they are used, parameters USE_WENO and WENO_stencil are passed to MOM_dynamics_split_RK2 to increase the halo update size. Only MOM_dynamics_split_RK2 is changed now. The other RK2 modules will need to be updated later. * Replaced the WENO_stencil by the CorAdv_stencil function that is passed to MOM_dynamics_split_RK2 * Debug * Replaced a few ratios to variables. Corrected the unit of some variables * Changed adding area_tiny and h_tiny to taking the max * Removed the dimensional constant in WENO weighting * Added stencil in $OMP(shared). Added additional comments * Removed trailing space * Changed the zero check if statement for fac_fn. Changed Jsq, Jeq, Isq, Ieq to private variables * Addressed couple of rotational symmetry issues * Removed commented lines in RK2 and added WENO stencil information to RK2B * Added Is_q, Ie_q, Js_q, Je_q to set bounds for Coriolis schemes * Enlarged halo update size for WENO schemes in dynamic_unsplit modules * Removed trailing space * Moved the update of h to a group pass in RK2 module * Removed unnecessary lines --------- Co-authored-by: Alistair Adcroft <Alistair.Adcroft@noaa.gov> and Robert Hallberg <Robert.Hallberg@noaa.gov>
Add two options supporting 3D HYCOM1 TARGET_DENSITIES and ALE_RESOLUTION. The fully general interface is ALE_COORDINATE_CONFIG=HYBRID_3D which is like HYBRID except that the netCDF variables (e.g. sigma2 and dz) are 3D fields (x-y-z). Like HYBRID, the 2nd variable (e.g. dz) can be replaced by a FNC1 string that sets a dz profile that is used everywhere. The typical use of 3D HYCOM1 fields is to have different TARGET_DENSITIES in semi-enclosed seas for layers that are always below its sill depth. For example, in the OM4 75-layer setup layers 71 amd 72 are only active in the Arctic and layers 74 and 75 are only active in the Mediterranean. So the number of layers can be reduced to 71 by specifying different deep targets in these semi-emclosed seas. In this case ALE_RESOLUTION would be the same everywhere, but the resulting 71 layer configuration can be further improved by using shallow ALE_RESOLUTION in the northern hemisphere and the standard, deep, dz's in the southern hemisphere. This is safe, because deep layers are never in fixed coordinates near the equator. To simplify this typical use case, ALE_COORDINATE_CONFIG=HYBRID_MAP requires three netCDF variables (e.g. map,sigma2,dz) with the last two being 2-D z-index fields containing a small number of profiles and the first a 2-D x-y field of index values indicating which profile to use at each location. Like HYBRID and HYBRID_3D, the 3rd variable (e.g. dz) can be replaced by a FNC1 string that sets a dz profile that is used everywhere. Answers are not changes unless ALE_COORDINATE_CONFIG is set to HYBRID_3D or HYBRID_MAP.
SHALLOW_ALE_RESOLUTION implements a HYBGEN-style Z-sigma-Z near surface fixed coordinate for HYCOM1. For example the US Navy's GOFS 3.1 HYCOM setup has 41 layers, with the top 14 layers in a Z-sigma-Z configuration. For MOM6 HYCOM1 this is: SHALLOW_ALE_RESOLUTION = 14*1.0,27*0.0 for 14 1m "shallow" layers. Let N_SIGMA be the number of consecutive non-zero entries, typically < NK. When rest depth is shallower than SUM(SHALLOW_ALE_RESOLUTION(1:N_SIGMA)) use SHALLOW_ALE_RESOLUTION. When rest depth is deeper than SUM(SHALLOW_ALE_RESOLUTION(1:N_SIGMA)) use ALE_RESOLUTION. Otherwise use a linear sum of the two weighted by rest depth. The default of all zeros turns this option off, and when off answers are unchanged. The new parmeter SHALLOW_ALE_RESOLUTION is only present when using HYCOM1.
The 2-d REAL map array in HYBRID_MAP usually contains integer values each referencing one profile. It can instead contain non-integer values of the form I+frac, which indicate a weighted sum of profiles: (1-frac) p(I) + (frac) p(I+1). The same profile can be used multiple times, e.g. if 1st profile is also 4th can get profiles between 1 and 2 and between 1 and 3. HYBRID_3D is more general, but HYBRID_MAP covers most practical uses.
Added the new runtime option RESCALE_STRONG_DRAG, that can be set to true to reduce the barotropic contribution to the layer accelerations to account for the difference between the forces that can be counteracted by the stronger drag with BT_STRONG_DRAG and the average of the layer viscous remnants after a baroclinic timestep. In testing, this new capability eliminates some of the growing instabilities that can occur with an ice shelf and BT_STRONG_DRAG set to true. This commit also adds new diagnostics of the barotropic step viscous remnants and the eta anomalies contributing to barotropic pressure forces, either averaged over the barotropic step or at each barotropic step. By default all answers are bitwise identical, but there is a new runtime parameter and 4 new diagnostics.
#967) * Add option to horizontally homogenize the Stokes drift when used via the dataoverride surfbands procedure. * Add variable description in new method for horizontally averaging Stokes drift. --------- Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>
Corrected a horizontal indexing bug in the calculation of the CAv_Stokes diagnostic, making it rotationally consistent and consistent with the calculation of CAu_Stokes. This bug has been there since the CAv_Stokes diagnostic was originally added. The loop range over which qS is calculated was also reduced to the range over which it is used. All solutions are bitwise identical, but this commit does change the values of a (perhaps infrequently used) diagnostic.
The interpreter directive ("shebang") of makedep is updated to
`python3`, rather than the version-agnostic `python`.
Although we never invoke the shebang of the script, there are OS
environments out there which will object to any presence of a
versionless python. PEP 394 also strongly recommends the adoption of
python3 as the executable name, regardless of Py2 support.
Added the new argument dyn_h_stencil to initialize_dyn_split_RK2 and the other 3 dynamic core initialization routines to return the size of the stencil for thicknesses as used by the dynamic core, depending on the options that are being used for the Coriolis and continuity schemes, and then used this in a set of halo updates in step_MOM_dynamics. With this change some additional halo updates that have recently been added inside of step_MOM_dyn_split_RK2 and the other 3 dynamic core time stepping routines could be eliminated. All answers are bitwise identical, but there is a new argument to 4 public interfaces. Co-authored-by: Marshall Ward <marshall.ward@gmail.com>
@jiandewang Thanks, that's great news. Could I ask you to test one more branch? This keeps the FMA ops but removes the https://github.com/marshallward/MOM6/tree/pr-defma-no-visc_rem-init |
|
@marshallward sure will test it and get back to you |
|
@marshallward pr-defma-no-visc_rem-init branch works fine. |
Switching the vertical friction loops from k/j/i to j/i/k replaced the
evaluation of `b1` by FMA with a simpler version, causing an answer
change when FMAs are enabled.
Although less efficient, this patch adds an always-false loop to trick
the compiler and force it to always execute `b1` by FMA.
Specifically, loops of the following form execute `b1` by FMA.
```
do k=2,nz
if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k)
c1(k) = dt * CS%a_v(i,J,K) * b1
b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1)
---> b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))
d1 = b_denom_1 * b1
visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1
enddo
```
Switching to j/i/k ordering allows the Intel compiler to cache `a_[uv](K)` for
use in the next iteration of `k` and evaluate `b1` by a single multiplication.
If we insert an impossible branch, such as the following:
```
do k=2,nz
if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k)
c1(k) = dt * CS%a_v(i,J,K) * b1
b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1)
b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))
d1 = b_denom_1 * b1
visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1
---> if (dt < 0) exit
enddo
```
then it blocks the lookahead logic of the compiler and forces the FMA execution
as in the k/j/i version.
There is a moderate impact on performance.
```
Before:
hits tmin tmax tavg tstd tfrac grain pemin pemax
(Ocean vertical viscosity) 300 2.717543 3.805039 3.523935 0.174203 0.064 31 0 511
```
```
After:
hits tmin tmax tavg tstd tfrac grain pemin pemax
(Ocean vertical viscosity) 300 2.780148 3.999669 3.761651 0.210061 0.069 31 0 511
```
so this should only be considered a temporary fix until FMA answer changes are
permitted.
|
Thanks for your help @jiandewang the PR has been updated with the FMA repro fix, as well as the OBC fixes from @kshedstrom. |
|
@marshallward since there are two new commits being added by Bob and kate, let me do a clean test before click approval |
dougiesquire
left a comment
There was a problem hiding this comment.
ACCESS-NRI approves. Thanks all
|
@alperaltuntas Do you know if NCAR reviewed the PR yet? |
We are in the process. We'll get back in a day or two. |
|
We are encountering answer changes in one of our configurations (wave coupling enabled) with intel. With gnu, however, the regression test passes. I am in the process of looking further to determine the root cause. |
|
I re-ran it using latest UFS hash, works fine. |
|
We've discussed this with NCAR and agreed to move ahead with this PR. As always, thanks to everyone for working through any answer changes. |
A plentiful end-of-FY25 contribution from GFDL:
Contributors: