MOM6: Corrected line lengths and Travis testing by Hallberg-NOAA · Pull Request #944 · mom-ocean/MOM6

Hallberg-NOAA · 2019-07-08T18:04:52Z

Corrected the Travis tests to include testing for lines exceeding 120
characters in length, and fixed several places where excessive line lengths had
been allowed to be merged into dev/gfdl. All answers are bitwise identical and
there are no changes to the documentation files generated by MOM6. MOM6
commits with this PR include:

NOAA-GFDL/MOM6@85939c3 Travis tests for lines exceeding 120 characters
NOAA-GFDL/MOM6@00d99ea (*)Multiply fmax by US%s_to_T in MOM_hor_visc.F90
NOAA-GFDL/MOM6@7a9cf32 Split excessively long lines in 2 files
NOAA-GFDL/MOM6@71693b5 Split long comments in RGC_tracer.F90

RGC_tracer.F90 previously had some very long comments at the end of some lines. These have now been split onto multiple lines to respect the MOM6 standards for line-length. All answers are bitwise identical.

Split excessively long lines and corrected the syntax for unit documentation in MOM_lateral_mixing_coeffs.F90 and MOM_thickness_diffuse.F90. All answers are bitwise identical.

Added a dimensional scaling factor for fmax in MOM_hor_visc.F90 that was dropped at some point in the merging of the dev/ncar code into dev/gfdl. All answers are bitwise identical and now pass the dimensional scaling test.

Added the 120-character line limit to the Travis testing script.
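As an illustration of the kind of check described here, the following is a minimal sketch in Python, not the actual Travis script (which is not shown in this PR text). The function name `find_long_lines` and the sample text are hypothetical; only the 120-character limit comes from the description above.

```python
def find_long_lines(text, limit=120):
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Hypothetical sample: line 2 is one character over the limit, line 3 is exactly at it.
sample = "x = 1\n" + "!" * 121 + "\n" + "y" * 120 + "\n"

for number, length in find_long_lines(sample):
    print(f"line {number}: {length} characters")
```

A CI job could run a check like this over each source file and fail when the returned list is non-empty.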

Hallberg-NOAA · 2019-07-08T18:06:28Z

This PR is being tested with https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines/8443

* Add SHALLOW_ALE_RESOLUTION

  SHALLOW_ALE_RESOLUTION implements a HYBGEN-style Z-sigma-Z near-surface fixed coordinate for HYCOM1. For example, the US Navy's GOFS 3.1 HYCOM setup has 41 layers, with the top 14 layers in a Z-sigma-Z configuration. For MOM6 HYCOM1 this is: SHALLOW_ALE_RESOLUTION = 14*1.0,27*0.0 for 14 1m "shallow" layers.

  Let N_SIGMA be the number of consecutive non-zero entries, typically < NK. When rest depth is shallower than SUM(SHALLOW_ALE_RESOLUTION(1:N_SIGMA)) use SHALLOW_ALE_RESOLUTION. When rest depth is deeper than SUM(SHALLOW_ALE_RESOLUTION(1:N_SIGMA)) use ALE_RESOLUTION. Otherwise use a linear sum of the two weighted by rest depth.

  The default of all zeros turns this option off, and when off answers are unchanged. The new parameter SHALLOW_ALE_RESOLUTION is only present when using HYCOM1.

* Non-integer HYBRID_MAP values

  The 2-d REAL map array in HYBRID_MAP usually contains integer values, each referencing one profile. It can instead contain non-integer values of the form I+frac, which indicate a weighted sum of profiles: (1-frac) p(I) + (frac) p(I+1). The same profile can be used multiple times, e.g. if the 1st profile is also the 4th, the map can produce profiles between 1 and 2 and between 1 and 3. HYBRID_3D is more general, but HYBRID_MAP covers most practical uses.

* indent continuations, source code <= 100 chars

* +Add RESCALE_STRONG_DRAG

  Added the new runtime option RESCALE_STRONG_DRAG, which can be set to true to reduce the barotropic contribution to the layer accelerations to account for the difference between the forces that can be counteracted by the stronger drag with BT_STRONG_DRAG and the average of the layer viscous remnants after a baroclinic timestep. In testing, this new capability eliminates some of the growing instabilities that can occur with an ice shelf and BT_STRONG_DRAG set to true.

  This commit also adds new diagnostics of the barotropic step viscous remnants and the eta anomalies contributing to barotropic pressure forces, either averaged over the barotropic step or at each barotropic step. By default all answers are bitwise identical, but there is a new runtime parameter and 4 new diagnostics.

* Add option to horizontally homogenize the Stokes drift when used via … (#967)
* Add option to horizontally homogenize the Stokes drift when used via the dataoverride surfbands procedure.
* Add variable description in new method for horizontally averaging Stokes drift.

  --------- Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>

* Fix calculation of CAv_Stokes diagnostic

  Corrected a horizontal indexing bug in the calculation of the CAv_Stokes diagnostic, making it rotationally consistent and consistent with the calculation of CAu_Stokes. This bug has been there since the CAv_Stokes diagnostic was originally added. The loop range over which qS is calculated was also reduced to the range over which it is used. All solutions are bitwise identical, but this commit does change the values of a (perhaps infrequently used) diagnostic.

* makedep: Update interpreter directive to python3

  The interpreter directive ("shebang") of makedep is updated to `python3`, rather than the version-agnostic `python`. Although we never invoke the shebang of the script, there are OS environments out there which will object to any presence of a versionless python. PEP 394 also strongly recommends the adoption of python3 as the executable name, regardless of Py2 support.
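The weighted sum used for non-integer HYBRID_MAP values, (1-frac) p(I) + (frac) p(I+1), can be illustrated with a short sketch. This is plain Python rather than the MOM6 Fortran, and the function name `blend_profiles` and the sample profiles are hypothetical, chosen only to show the arithmetic.

```python
def blend_profiles(profiles, map_value):
    """Return the profile selected by a map value of the form I + frac.

    profiles  : list of equal-length lists; profiles[0] is "profile 1" in the
                1-based numbering used by the commit message.
    map_value : I + frac, with I a 1-based profile index and 0 <= frac < 1.
    """
    index = int(map_value)        # integer part I
    frac = map_value - index      # fractional part
    lower = profiles[index - 1]   # p(I)
    if frac == 0.0:
        # An integer map value references a single profile directly.
        return list(lower)
    upper = profiles[index]       # p(I+1)
    return [(1.0 - frac) * a + frac * b for a, b in zip(lower, upper)]


# Two hypothetical 3-layer profiles.
profiles = [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0]]

print(blend_profiles(profiles, 1.0))    # profile 1 unchanged
print(blend_profiles(profiles, 1.25))   # 75% of profile 1 plus 25% of profile 2
```

Repeating a profile in the list (for example appending profile 1 again as a 4th entry) lets a map value between 3 and 4 blend profile 3 with profile 1, which is the reuse the commit message describes.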
* Continuity ppm port to gpu (#29)
* zonal_mass_flux: isolate zonal_flux_layer * zonal_mass_flux: seperate local_specified_BC block loop * zonal_mass_flux: add j dim to tmp vars * zonal_mass_flux: seperate visc_rem_max init loops * zonal_mass_flux: seperate du_min/max_CFL init loops * zonal_mass_flux: separate duhdu/uh_tot_0 init loop * zonal_mass_flux: separate du_min/max_CFL aggress_adjust update case (untested) * zonal_mass_flux: separate du_min/max_CFL non-aggress_adjust update case * zonal_mass_flux: separate du_min/max_CFL non-use_visc_rem update case (untested) * zonal_mass_flux: separate du_min/max_CFL 0-clamp loop * zonal_mass_flux: separate do_I local_specified_BC init loop (untested) * copy zonal_flux_adjust that accepts 2d args * zonal_mass_flux move j loop into zonal_flux_adjust copy * zonal_flux_adjust_fused: use 2d internal arr * zonal_flux_adjusted_fused: separate all loops * zonal_mass_flux: separate u/du_cor update (untested) * zonal_mass_flux: replicate former present(uhbt) control flow * copy set_zonal_BT_cont with 2d args * zonal_mass_flux: move j-loop into set_zonal_BT_cont_fused * set_zonal_BT_cont_fused: use zonal_flux_adjust_fused * set_zonal_BT_cont_fused: separate init loop * set_zonal_BT_cont_fused: separate duR/L init loop * set_zonal_BT_cont_fused: separate u_0/L/R init loop * set_zonal_BT_cont_fused: use zonal_flux_layer_fused * set_zonal_BT_cont_fused: separate last 2 loops * copy merid_flux_layer that accepts entire arrs * merid_flux_layer_fused: separate loops * merididional_mass_flux: separate local_specified_BC loop (untested) * merididional_mass_flux: separate tmp variable init loops * merididional_mass_flux: separate dv_min/max_CFL calc loops * merididional_mass_flux: separate dv_min/max_CFL 0 clamp loop * merididional_mass_flux: separate simple_OBC_pt init loop (untested) * copy meridional_flux_adjust that accepts entire arrs * meridional_mass_flux: separate meridional_flux_adjust_fused * meridional_mass_flux: move j-loop into meridional_flux_adjust_fused * 
meridional_flux_adjust_fused: separate vh_aux,dvhdv init loops * meridional_flux_adjust_fused: 1d->2d tmp arrs * meridional_flux_adjust_fused: separate all arrs * meridional_mass_flux: separate (d)v_cor assignment loops * copy set_merid_BT_cont that accepts entire arrs * meridional_mass_flux: move j-loop into set_merid_BT_cont_fused * set_merid_BT_cont_fused: use meridional_flux_adjust_fused * set_merid_BT_cont_fused: separate tmp var init * set_merid_BT_cont_fused: separate short circuit loop * set_merid_BT_cont_fused: rm redundant k loop in short circuit * set_merid_BT_cont_fused: separate dvL/R init loop * set_merid_BT_cont_fused: make remaining tmp arrs 3d * set_merid_BT_cont_fused: use merid_flux_layer_fused * set_merid_BT_cont_fused: separate last loop * meridional_mass_flux: separate any_simple_OBC loop * zonal_edge_thickness: move k loop->PPM_reconstruction_x * meridional_edge_thickness: move k loop->PPM_reconstruction_y * PPM_reconstruction_x/y: move k loop->PPM_limit_pos * PPM_reconstruction_x/y: move k loop->PPM_limit_cw84 (untested) * zonal_BT_mass_flux: separate all loops (untested) * meridional_BT_mass_flux: separate all loops (untested) * set_zonal_BT_cont_fused: clean up var defs * use SZJB_(G) in do_I dclrns in merid* subroutines * set_merid_BT_cont_fused: clean up var defs * set_merid/zonal_BT_cont_fused -> set_merid/zonal_BT_cont * zonal_flux_adjust_fused: clean up var defs * zonal_flux_adjust_fused -> zonal_flux_adjust * zonal_flux_layer_fused -> zonal_flux_layer * meridional_flux_adjust_fused: clean up var defs * meridional_flux_adjust_fused -> meridional_flux_adjust * merid_flux_layer_fused: clean up var defs * merid_flux_layer_fused -> merid_flux_layer * fix call merid/zonal_flux_layer line conts * zonal_mass_flux: use visc_rem_u_tmp * zonal_mass_flux: separate du_min/max_CFL non-use_visc_rem update case (untested) * copy set_zonal_BT_cont with 2d args * meridional_mass_flux: use visc_rem_v_tmp * copy meridional_flux_adjust that accepts 
entire arrs * copy set_merid_BT_cont that accepts entire arrs * set_merid_BT_cont_fused: rm redundant k loop in short circuit * set_merid_BT_cont_fused: rm redundant k loop in short circuit * zonal_flux_adjust_fused -> zonal_flux_adjust * meridional_flux_adjust_fused -> meridional_flux_adjust * zonal_flux_layer: move GV in var dclrn * zonal/meridional_mass_flux: remove old visc_rem vars * remove redundant kloop * remove problematic omp directives * port PPM_limit_pos/CW84, PPM_reconstruction_x, zonal_edge_thickness * PPM_reconstruction_x: add enter/exit data stmts * continuity_zonal_convergence: save loop range for porting convenience * port continuity_zonal_convergence * zonal_mass_flux: array init -> loop init * zonal_mass_flux: port init loops * port zonal_flux_layer * zonal_flux_layer: add enter/exit data stmts * port zonal_flux_adjust * port set_zonal_bt_cont * zonal_flux_adjust: add more arrs in enter/exit data * set_zonal_BT_cont: add enter/exit data stmts * port zonal_flux_thickness loops * zonal_flux_thickness add enter/exit data stmts * zonal_mass_flux prepare obc for porting * zonal_mass_flux: merge any_somple_OBC loops * zonal_mass_flux port main loops * zonal_mass_flux port OBC loops * zonal_mass_flux: add enter/exit data stmts * continuity_ppm: add initial enter/exit data stmts * port PPM_reconstruction_y, meridional_edge_thickness * port continuity_merdional_convergence * ppm_reconstruction_y: add enter/exit data stmts * port merid_flux_layer * meridional_flux_adjust: port loops * meridional_flux_adjust: add enter/exit data stmts * meridional_mass_flux: port core loops * meridional_mass_flux: port OBC loops (untested) * meridional_flux_thickness: port core loops * meridional_flux_thickness: attempt port OBC loops (untested) * meridional_flux_thickness: add enter/exit data stmts * port set_merid_BT_cont loops * set_merid_bt_cont: add enter/exit data stmts * meridional_mass_flux: add enter/exit data stmts * zonal/meridional_mass_flux: add missing vars 
in enter/exit stmts * continuity_PPM: complete enter/exit data stmts * *_edge_thickness: do concurrent * zonal_mass_flux: do concurrent-ify * set_zonal_BT_cont: do concurrent-ify * zonal_flux_adjust: do concurrent-ify * zonal_flux_layer: do concurrent * zonal_flux_thickness: do concurrent * continuity_zonal_convergence: do concurrent * meridional_mass_flux: do concurrent * set_merid_bt_cont: do concurrent * meridional_flux_adjust: do concurrent * merid_flux_layer: do concurrent * meridional_flux_thickness: do concurrent * continuity_merdional_convergence: do concurrent * formatting * continuity_PPM: update LB * meridional/zonal_flux_adjust: separate ij-reduction * zonal_flux_adjust: a couple jki loops * set_zonal_bt_cont: some jki loops * zonal_mass_flux: some jki loops * optimise loops by using scalar zonal_flux_layer * zonal_flux_adjust: duhdu -> scalar * elemental zonal_flux_layere and separated OBC * zonal_flux_adjust: back to jki * zonal_flux_thickness: improve cpu perf a bit * zonal_flux_adjust: add some comments and guard early exit * zonal_flux_adjust: use omp target loop for private arrs * set_zonal_BT_cont: use omp target loop for private arrs * set_zonal_BT_cont: do conc jki loop * zonal_mass_flux: rm useless do_i init * zonal_flux_layere: precalc g_dy_Cu*por_face_areaU * zonal_flux_layere: precalc dh * zonal_flux_thickness: precalc dh * zonal_flux_adjust: make uhbt optional * zonal_flux_thickness: assign to outputs directly * mv zonal_flux_adjust from set_zonal_bt_cont->zonal_mass_flux * zonal_mass_flux: reuse visc_rem_u_tmp more * zonal_mass_flux: remove redundant if stmt * zonal_mass_flux: force inline zonal_flux_layere * zonal_flux_layere: inlinable gfortran -O3 but slower for ifort * zonal_flux_layere: make a bit "smaller" * zonal_mass_flux: move present(uhbt) or set_BT_cont to new subroutine * Revert "zonal_flux_layere: make a bit "smaller"" This reverts commit 316152eb3cf515c4179f8ceb738f9d259233915c. 
* Revert "zonal_flux_layere: inlinable gfortran -O3 but slower for ifort" This reverts commit 51ee896c54b4a349acb2ffe70489dc76558864b1. * fix gcc line truncation * pass doxygen tests * remove forceinline dirs * attempt document vars * a couple long lines * last trailing space * meridional_mass_flux: use zonal_flux_layere * zonal_flux_layere_OBC: make elemental * meridional_mass_flux: move dv_min/max_CFL calc into j-loop * meridional_mass_flux: move big chunk into separate subroutine * present_vhbt_or_set_bt_cont: merge couple of loops * meridional_flux_thickness: cpu optimize a bit * meridional_flux_adjust: back to jki * set_merid_BT_cont: pull out meridional_flux_adjst * set_meridional_BT_cont: optimize cpu * rm remaining merid/zonal_flux_layer * zonal_flux_layere: improve naming a bit * zonal_flux_layere_OBC: improve naming a bit * improve flux_elem line wraps * optimize data transfers a bit * pass elem not arr * add some missing documentation * fix trailing spaces and missing var docs * add last param docs * meridional_flux_adjust: fix fpe err * fix another fpe * cleanup args for new helper subroutines * clean up enter/exit data * fix passing h twice * meridional_mass_flux: do concurrent * zonal_flux_adjust: use a few 3d tmp arrays to mirror meridional_flux_adjust * move target update out of continuity * initialize pbv%por_face_area[U/V] on GPU * cleanup some transfers from continuity_PPM * clean up a few minor things * zonal/meridional_flux_adjust: use scalar u/v_new * declare vp,up,h_tmp on gpu * remove h update * continuity_PPM: minimize mapping stmts * zonal_flux_adjust: minimize mapping stmts * set_zonal_bt_cont: minimize mapping stmts * merional_flux_adjust: minimize mapping stmts * set_merid_bt_cont: minimize mapping stmts * zonal/meridional_flux_adjust: tmp vars duhdu/dvhdv 3d -> scalar * separate alloc of private variables for gcc * u/vh_aux: 3d->2d * target teams loop recognizable by amdflang * Continuity CS outside of init This moves the Continuity CS 
to the dycore init function. For some reason, this avoids an answer change with CPU. (Possibly because alloc inside of a function doesn't quite match the CS outside of it?) A few minor data transfers are also added to fix up differences in the chksum log output. * Continuity: Add locality to do concurrent Do concurrent inside of !$omp target teams loop seems to fail standard openmp tests if locality is not correctly set. This patch adds the correct locality to the four `!$omp target teams loop` directives. The domore argument has also been removed, and replaced with a `.not.any(do_i(:))` test. --------- Co-authored-by: Marshall Ward <marshall.ward@noaa.gov> * +Correct halo update sizes and reduce halo updates (#969) Added the new argument dyn_h_stencil to initialize_dyn_split_RK2 and the other 3 dynamic core initialization routines to return the size of the stencil for thicknesses as used by the dynamic core, depending on the options that are being used for the Coriolis and continuity schemes, and then used this in a set of halo updates in step_MOM_dynamics. With this change some additional halo updates that have recently been added inside of step_MOM_dyn_split_RK2 and the other 3 dynamic core time stepping routines could be eliminated. All answers are bitwise identical, but there is a new argument to 4 public interfaces. Co-authored-by: Marshall Ward <marshall.ward@gmail.com> * Correct multi-PE velocity truncation counts (#955) Modified the conditions that determine when to increment the count of velocity truncations to use the thickness as interpolated to velocity points to determine whether layers are thick enough to be counted, rather than the arithmetic mean thickness, and only count truncations that occur in the non-symmetric computational domain to avoid double counting. The filtering thicknesses should be very similar in the ocean interior, but they will differ at open boundary condition points. 
The corrected counting was verified by running the sloshing/layer test case with a maximum CFL set to 0.01 to create lots of truncations, and then verifying that the truncation count is now the same on 1 and 10 PEs, whereas before it was not. All solutions are bitwise identical, but the reported truncation counts in the ocean.stats files can change in multiple-PE cases with velocity truncations. * ANN parameterization of horizontal momentum eddy fluxes (#944) * Add MOM_ANN module * Mesoscale momentum parameterization with ANN - Computes subgrid stress using ANN in MOM_Zanna_Bolton - Uses MOM_ANN module for ANN inference Equivalent MOM_override for defaults ``` USE_ZB2020 = True ZB2020_USE_ANN = True USE_CIRCULATION_IN_HORVISC = True ZB2020_ANN_FILE_TALL = /path/to/ocean3d/subfilter/FGR3/EXP1/model/Tall.nc ``` * Mesoscale momentum parameterization with ANN (#2) Blank commit after squash/rebase was handled on command line * Moved MOM_ANN.F90 to src/framework/ * Minor refactor of MOM_ANN - Removed unused modules - Removed unused MOM_memory.h - Added input and output means which default to 0 and do not need to be present in the weights file - Gave defaults to means, norms, tests so that they do no need to be present in file - Added missing array notation "(:)" - Minor formatting * Adds unit tests and timing test to MOM_ANN - Added ANN_allocate, set_layer, set_input_normalization, and set_output_normalization methods to allow reconfiguration during unit tests - Added ANN_unit_tests with some simple constructed-by-code networks with known solutions - Added config_src/drivers/unit_tests/test_MOM_ANN.F90 to drive unit tests - Added config_src/drivers/timing_tests/time_MOM_ANN.F90 as rudimentary for timing inference * Adding multiple forms of inference - Adds inference operating on array (instead of single vector of features) - Implements several different versions of inference with various loop orders - Involves storing the transpose of A in the type - Tested by checking 
inference on same inputs is identical between variants - Added randomizers to assist in unit testing - Adds timing of variants to config_src/drivers/timing/time_MOM_ANN.F90 - Adds an interface (MOM_apply) to select preferred version of inference subroutine - Added command line args to time_MOM_ANN.F90 to allow more rapid evaluation of performance Variants explored, timed with gfortran (13.2) -O3 on Xeon: - vector_v1: - original inference from Pavel - vector_v2: - allocate work arrays just once, using widest layer - loop over layers in 2's to avoid pointer calculations and copies - speed up, x0.8 relative to v1 - vector_v3: - transpose loops - slow down, x1.54 relative to v1 - vector_v4: - transpose weights with same loop order as v1 - slow down, x1.03 relative to v1 - array_v1: - same structure as v2, working on x(space,feature) input/outputs - speed up, x0.41 relative to v1 - array_v2: - as for array_v1 but with transposed loop order - apply activation function on vector of first index while in cache - speed up, x0.35 relative to v1 - array_v3: - same structure as v2, working on x(feature,space) input/outputs - speed up, x0.58 relative to v1 * Renamed ANN variants and added some module documentation - Added module dox - Renamed _v1, _v2 etc to labels - Added ANN_apply_array_sio to ANN_apply interface - Replaced "flops" with "MBps" in timing output * Removed alternative variants of ANN in favor of optimized - Deleted variants of ANN that did not perform as well as the two versions that remain. 
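To make the array-based inference on x(space,feature) inputs more concrete, here is a minimal sketch of a dense network applied to every spatial point at once. It is plain Python, not the MOM_ANN Fortran. The function name `ann_apply_array`, the ReLU activation, and the tiny hand-built network are all assumptions made for illustration, and input/output normalization is omitted. The network is constructed in code with a known solution, in the spirit of the unit tests mentioned above.

```python
def relu(value):
    """Rectified linear unit (assumed here; the activation is not named in the log)."""
    return value if value > 0.0 else 0.0


def ann_apply_array(x, layers):
    """Apply a dense network to each row of x, where x[s][f] is feature f at point s.

    layers : list of (weights, bias) pairs, with weights[o][i] mapping input i
             to output o. The activation is applied on all but the last layer.
    """
    out = [list(row) for row in x]
    last = len(layers) - 1
    for n, (weights, bias) in enumerate(layers):
        nxt = []
        for row in out:
            y = [sum(w * v for w, v in zip(w_row, row)) + b
                 for w_row, b in zip(weights, bias)]
            if n < last:
                y = [relu(v) for v in y]
            nxt.append(y)
        out = nxt
    return out


# Hand-built network computing |x1 - x2| + 0.5, so the answers are known exactly.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer: 2 inputs -> 2 units
    ([[1.0, 1.0]], [0.5]),                     # output layer: 2 units -> 1 output
]
x = [[3.0, 1.0], [1.0, 3.0], [2.0, 2.0]]       # three spatial points, two features

print(ann_apply_array(x, layers))
```

Processing all points layer by layer is what allows the loop order over space and features to be varied, which is the design choice the timing variants in the log compare.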
* Apply array_sio function in ANN inference for momentum fluxes (#5)
* Apply array_sio ANN inference for computation of momentum fluxes
* remove trailing space
* Initial commit
* address Robert Hallberg code review
* Restore default value of ZB_SCALING coefficient

  --------- Co-authored-by: Alistair Adcroft <Alistair.Adcroft@noaa.gov> Co-authored-by: Alistair Adcroft <adcroft@users.noreply.github.com>

* Part one of thickness reservoirs - fixed the restart trouble
* Better Kelvin wave results in layer mode
* Response to comments on h_reservoirs PR.
* Inline Harmonic Analysis Update 1: The accumulator of FtF is now associated with each FtSSH, instead of shared between all FtSSH, such that both are updated at the same model time steps, eliminating the potential inconsistency when different FtSSH are called at different time steps.
* Inline Harmonic Analysis Update 2: HA_register is called inside HA_init. This should be done for all variables/fields.
* Inline Harmonic Analysis Update 3: A logical flag is set to control whether harmonic analysis is to be enabled.
* Inline Harmonic Analysis Update 4: time_ref and const_name are defined in HA_init, instead of being copied from MOM_tidal_forcing. This commit prepares for the separation of MOM_harmonic_analysis and MOM_tidal_forcing.
* Inline Harmonic Analysis Update 5: MOM_harmonic_analysis is now independent of MOM_tidal_forcing, providing more flexibility for performing harmonic analyses on tidal constituents not available in MOM_tidal_forcing (e.g., MK3, M4).
* Inline Harmonic Analysis Update 6: The frequencies of 8 overtides/compound tides (MK3, MN4, M4, MS4, S4, M6, S6, M8) have been added for the harmonic analysis.
* Bug fix in MOM_open_boundary

  Fixed the inconsistency for defining the reference time of tides in MOM_tidal_forcing and MOM_open_boundary.
* +Use I0 format to simplify integer output

  Use the I0 format that was introduced with Fortran 95 in 155 lines scattered across 40 files to simplify or shorten some error messages. In 21 cases, adjustl() calls that are no longer necessary for intended formatting were also eliminated. These changes have the effect of ensuring that there are still appropriate messages if there are, for example, more than 99 vertical layers or 9999 points (total) in a horizontal direction or more than 9999 PEs. In 15 cases, this change allowed for the elimination or reduction of if tests that formatted output based on the size of an integer. All answers are bitwise identical but there may be some minor formatting changes in some error messages.

* Vert friction: Fix index errors

  This patch fixes two classes of index errors in multiple functions of `MOM_vert_friction.F90`:

  * `j=G%isc,G%jec` had been incorrectly applied to multiple loops. This went undetected because we almost exclusively use local indexing where `G%isc == G%jsc`, but is nonetheless a serious error. Thanks to Jorge Luis Gálvez Vallejo for reporting.
  * One errant loop in the shelf code had `i=is,je`. This was undetected due to poor ice shelf coverage testing. Thanks to Claire Yung for reporting.

* Vert friction: Column loops moved in layers

  This patch moves the k-column loops inside of ji-layer loops, rather than outer-k loops of layers. The primary motivation is to restore performance at high-bandwidth runs, which were insufficiently tested during development of the k-j-i form. The inner-column loops show improved performance for both low and high-bandwidth runs.

  The high-bandwidth benchmark case: (128-core, 256x128 x 75 layer)

  ```
  Profile Reference (Ocean vertical viscosity): 7.158s, 15.047s (-52.4%)
  ```

  The low bandwidth case: (1-core, 32x32 x 75 layer)

  ```
  Profile Reference (Ocean vertical viscosity): 3.911s, 4.788s (-18.3%)
  ```

  For the GFDL OM5 production configuration at 503, runtimes of the slowest ranks were reduced in proportion to the high-bandwidth case above. For the reference dev/gfdl,

  ```
                              hits      tmin       tmax       tavg
  (Ocean vertical viscosity)   288  4.303819  21.483670  14.452196
  ```

  After applying this patch, times reduce ~40%

  ```
                              hits      tmin       tmax       tavg
  (Ocean vertical viscosity)   288  0.976130  13.398768   8.689331
  ```

  * Moving to columns allowed for removal of many `do_i` tests, since the test is applied before starting the loop.
  * The `touch_ij` dummy function was removed, since we're no longer trying to force an IPO optimization.
  * The shelf requires a re-calculation of the various thickness averages (h_arith, etc). These could be saved as 1D if it becomes a problem.
  * In addition to the usual regression testing, I also found no regressions in selected ice shelf configurations.

* Bug fix for linear wave drag in MOM_barotropic

  * Linear wave drag is limited to be only applied to land points, using velocity point masks mask2dC[uv].
  * Rayleigh_[uv] calculation and bt_rem_[uv] update from linear wave drag is limited for Htot>0 only.

  This patch eliminates a potential NaN in Rayleigh_[uv] in the unusual scenario that Htot==0.0 and lin_drag_[uv]/=0. The changes do not change answers: bt_rem_[uv] is zero at land points regardless. Rayleigh_[uv] is added to [uv]_accel_bt which is masked before updating velocity.

* Suppress warning message of negative eta in land

  In MOM_barotropic and non-Boussinesq mode, the warning message on negative eta is now only issued at wet points, consistent with Boussinesq.
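The percentage changes quoted for the vertical viscosity timings above can be re-derived from the raw numbers. The short Python sketch below does only that arithmetic; the helper name `percent_change` is hypothetical, and reading the "~40%" figure as the change in `tavg` (with `tmax` shown for comparison) is an interpretation rather than something stated in the commit message.

```python
def percent_change(new, reference):
    """Relative change of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference


# (patched time, reference time) in seconds, copied from the commit message.
timings = {
    "high-bandwidth benchmark": (7.158, 15.047),
    "low-bandwidth benchmark": (3.911, 4.788),
    "OM5 production tavg": (8.689331, 14.452196),
    "OM5 production tmax": (13.398768, 21.483670),
}

for label, (patched, reference) in timings.items():
    print(f"{label}: {percent_change(patched, reference):.1f}%")
```

The two benchmark results reproduce the -52.4% and -18.3% figures in the profile output, and the production averages come out close to the quoted ~40% reduction.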
* Flip the order of acceleration and velocity chksum In MOM_dynamics_split_RK2, now accleration chksum is printed before velocity with debug on, so that we could know which accleration term is responsible for a NaN in velocity. * vv gpu good (#44) * Vertvisc: First part of loop Moving gradually * add nvtx ranges and module warpper * port loop to openmp for now -> will move to do concurrent. Focused on porting to GPU now * more gpu * port the vertical velocity to GPUs * port the following vertical velocity loops to GPUs * more loops using omp target teams, validated * fix compute sanitizer death * meridional velcoity component ported * finish porting vertvisc * offload another loop * revert the last thing * add collapse(2) to the expensive loops * fix * fix validation * updates to porting: find coupling coef * almost done, just missingt he killer loop * lable the block of death * duct taped version of all funcs ported to GPUs * back to format and delete block constructs * remove the maps ins remnant yay! * remove need for local vertvisc_u/v variables * death loop of death has been ported * vertvisc coef does not have explicit mappings anymore * the maps are gone! * limit vel using single data transfer * do concurrent * some memory cleanup and code cleanup - profiling time * update and lots of nvtx markers for opt * limit vel betterments and notes for optimization * make limit vel good * h_ml transfer * Vertfrict: Move CS allocations to init Allocations of the vert frictions and visc control structures were moved from the dycore subroutine loops to the model initialization. Redundant grid (G) transfers were removed (although they were not triggering any transfers). Most (but not all) visc arrays have also been moved into the initialization. Kv_shear and Ray_[uv] need to be added. Trailing whitespace also (inadvertently) got removed, hopefully not a distraction. 
* coupling coeff fixes This appears to improve the CPU/GPU repro of the latest merge of dev/gpu into dev_gpu_vertvisc. * Resolves some of the u/v/h/dz field states between the barotropic and vertvisc functions. (Development of each assumed that the other was not complete). * This seems to fix some issues with a_[uv], Kv_tot, and dz inside of find_coupling_coef and vertvisc coef * chksum transfers were added to ensure consistency * The diff has moved to frhat[uv] in btcalc. The fields are all zero on GPU, so they are probably not transferred. Work in progress... * Remove redundant BT_cont transfer The BT_cont fields are now on GPU, so they don't need to be transferred before the btcalc call. Checksum transfers for h_ml were also added. Still assuming that it's computed on the CPU. * delete nvtcx * test to see if guarding and reintroducing the check helps * an ugly fix for a simple problem: just want to be sure before I dive * spaces * Vertvisc: formatting/style cleanup This patch reduces the formatting changes to this branch and brings it closer to both dev/gfdl and the MOM6 style guide. One minor non-cosmetic change is the reversion of the ADp%dv_dt_str(i,J,1) calculation into one of the main Thomas loops. This returns it to a separate loop. * Vert friction: Explicit CS alloc in dycore init This patch removes the NVIDIA compiler preprocessing and explicitly allocates the `CS` in the dynamic core initialization. `vertvisc_init` is no longer responsible for allocation. The allocation test has also been removed. We just have to trust each other now (or segfault when it refs an unallocated field). The previous error in the CI was the absence of `CS` allocation in the other timestep methods (unsplit, unsplit RK, split RK2b). 
* Vert friction: Data transfer reduction Several modifications to the data transfer statements * Some redundant transfers were removed * Input/output transfers were moved up to the dycore loop * Vert friction: Cosmetic cleanup Remove some redundant comments, and an unused array loop. * Vert friction: nk_in_ml bugfix --------- Co-authored-by: Marshall Ward <marshall.ward@noaa.gov> * +Add the new parameter RESOLN_FUNCTION_OBC_BUG Added the new runtime parameter RESOLN_FUNCTION_OBC_BUG that can be set to false to take open boundary conditions into account when calculating the resolution functions at u-, v- or q-points. By default the wave speeds used to calculate resolution functions do not take OBCs into account and all answers are bitwise identical. * port main rk loop in a single commit (#47) * port main rk loop * typo and formatting * delete nan inducing copy * Dycore: NaN bugfix + minor diffs * hp upload after zero-initialization was causing random errors, likely in halo values. Now that zero-initialization is GPU-side, no need for upload. * Some nested do-concurrents were consolidated * Lots of whitespace fixes * Dycore: Mem reduction, first pass * Dycore: Mem cleanup second pass * Dycore: memcheck 3 * dycore: memcheck next * Dycore: memcheck, visc_rem and halos * dycore: Keep dz on GPU * Dycore: Remove [uv]_inst transfers * dycore: cs%eta fully on gpu * Dycore: Remove pbv copies A small change, but seems correct. pbv is input-only in step RK2, and is set outside in the main dycore loop (on CPU), followed by an upload. * dycore: memfix accel trim * Dycore: move up, vp down (and apparently an h download was redundant. looks ok, but havent 100% verified...) 
* dycore: Shift accels later in loop * dycore: tau[xy]_bot on GPU only * Dycore: [uv]_inst moved outside loop * Dycore: [uv]p, visc_rem_[uv] downshift * Dycore: [uvh]_av, uh, vh downshift Also some halo padding * Dycore: Remove multiple forcing/hp updates * Dycore: memfix [uv]_bc_accel * Dycore: Remove u_accel_bt and eta_pred transfers * Dycore: mem cleanup visc_rem_[uv] --------- Co-authored-by: Marshall Ward <marshall.ward@noaa.gov> * +*Fix 3-equation ice-ocean flux iteration (#972) Fix the 3-equation iteration for the buoyancy flux between the ocean and an overlying ice-shelf when ICE_SHELF_BUOYANCY_FLUX_ITT_BUGFIX is true and SHELF_3EQ_GAMMA is false. This code now uses proper bounding of the self-consistent solution, avoiding further amplifying the fluxes in cases where the differences between the diffusivities of heat and salt make the buoyancy flux destabilizing for finite turbulent mixing. Both the false-position iterations and the (appropriately chosen) Newton's method iterations have been extensively examined and determined to be working correctly via print statements that have subsequently been removed for efficiency. Previously, the code to determine the 3-equation solution for the buoyancy flux between the ocean and an ice shelf had been skipping iteration altogether or doing unbounded Newton's method iterations with a sign error in part of the derivative, including taking the square root of negative numbers, leading to the issue described at https://github.com/NOAA-GFDL/MOM6/issues/945. That issue has now been corrected and can be closed once this commit has been merged into the dev/gfdl branch of MOM6. 
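The difference between a bounded iteration and an unbounded Newton step can be shown with a generic false-position root finder. This is a Python sketch of the textbook method on an arbitrary scalar function, not the ice-shelf flux code; `false_position` and its parameters are hypothetical names, and the test function is chosen only for illustration.

```python
def false_position(f, lo, hi, tol=1e-12, max_iter=100):
    """Find a root of f in [lo, hi] by false position (regula falsi).

    The root must be bracketed: f(lo) and f(hi) have opposite signs.
    Every iterate stays inside the bracket, unlike an unbounded Newton step.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    x = lo
    for _ in range(max_iter):
        # Secant through the two bracket ends; x always lies between them.
        x = hi - f_hi * (hi - lo) / (f_hi - f_lo)
        f_x = f(x)
        if abs(f_x) < tol:
            break
        # Keep the sub-interval that still brackets the root.
        if f_lo * f_x < 0.0:
            hi, f_hi = x, f_x
        else:
            lo, f_lo = x, f_x
    return x


root = false_position(lambda x: x * x - 2.0, 0.0, 2.0)
```

Keeping the bracket is what prevents the iterate from wandering into a region where, for example, a square root of a negative number would be taken.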
This commit also changes the names of the runtime parameters to correct the ice shelf flux iteration bugs from ICE_SHELF_BUOYANCY_FLUX_ITT_BUG and ICE_SHELF_SALT_FLUX_ITT_BUG to ICE_SHELF_BUOYANCY_FLUX_ITT_BUGFIX and ICE_SHELF_SALT_FLUX_ITT_BUGFIX to avoid confusion with other ..._BUG parameters where `true` is to turn the bugs on, whereas here `true` fixes them. The old names are retained via `old_name` arguments to the `get_param()` calls, so no existing configurations will be disrupted by these changes. Additionally, an expression to determine a scaling factor to limit ice-shelf bottom slopes in `calc_shelf_driving_stress()` was refactored to avoid the possibility of division by zero. This commit will change (and correct) answers for cases with ICE_SHELF_BUOYANCY_FLUX_ITT_BUGFIX set to true, but as these would often fail with a NaN from taking the square root of a negative value, it is very unlikely that any such configurations are actively being used, and there seems little point in retaining the previous answers. No answers are changed in cases that do not use an active ice shelf. Co-authored-by: Alistair Adcroft <adcroft@users.noreply.github.com> * Fix divide by zero fpe in apply_oda_incupd (#164) Move computation of tmp_h inside the if G%mask2dT test to avoid divide by zero error * Dycore: h/h_av mem cleanup * Dycore: Shift h, [uv]_av, [uv]h updates to halo * Dycore: static [uv]htr allocations NVIDIA was correcting for our missing uhtr allocations on the GPU in the slowest way possible. This patch adds an explicit allocation, which reduced the number of transfers dramatically. * add flag for gpu2gpu do_group_update mpi transfers for latest fms * Dycore: Move halo updates to GPU Also: * Declare [uv]int_cor in MOM_barotropic as loop-locals * Add h to data transfer after RK2 step * DO_LOCALITY() bugfix; use GPU FMS in .testing The default FMS is not compatible with the new GPU-based MPI methods, so we just change it in .testing/Makefile for now. 
Down the road, we should probably move this into the .github config. DO_LOCALITY() now returns a ; if do concurrent locality modifiers are unsupported. This prevents errors associated with line continuations. * btstep: pre-calculate problematic eta checks on GPU * problem_eta->submerged,any_problem_eta->eta_is_submerged * Fixes shelfwave failure in debug mode - rotated OBC%segment%num_fields needs to be set. * Make sure reversed segments get rotated. * Bugfix for default TIDES_ANSWER_DATE in SAL Fix a bug in which the recently changed default for TIDES_ANSWER_DATE was not properly applied to MOM_self_attr_load. TIDES_ANSWER_DATE is used in MOM_self_attr_load to check if SAL_USE_BPA is used after a timestamp, so its default should be consistent with MOM_PressureForce_FV. * Added local versions of density_elem and density_derivs without "this" argument * Fixes a typo in Recon1d PPM limiter Thanks to both @alperaltuntas and @marshallward who noted that a PPM limiter has the expression `( u2 - u1 ) * ( u1 - u0 ) <- 0.0` which is interpreted as `( u2 - u1 ) * ( u1 - u0 ) < -0.0`. Needless to say, the intended code was `( u2 - u1 ) * ( u1 - u0 ) <= 0.0`. The same typo was copied to three files. The high-order estimate of edge value was previously bounded by (u2,u1) or (u1,u0). The missed conditions of either `( u2 - u1) == 0.` or `( u1 - u0 ) == 0.` would then have been caught by the subsequent test for an interior extremum. Thus, I think the cell was still limited to PCM appropriately. However, the typo obscured the intention of the limiter and I was lucky it still worked. * Fixes shelfwave failure in debug mode - rotated OBC%segment%num_fields needs to be set. * Make sure reversed segments get rotated. * Frequency-dependent drag in tensor form This commit allows the frequency-dependent drag to be implemented in tensor form, by incorporating the off-diagonal components of the wave drag tensor into the MOM_wave_drag module. 
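The Recon1d PPM limiter typo described above is easy to reproduce outside Fortran, because Python happens to tokenize `<- 0.0` the same way, as `<` followed by `-0.0`. The sketch below is illustrative only; the function names are hypothetical and the functions are not the limiter itself, just the one comparison.

```python
def is_extremum_typo(u0, u1, u2):
    """The mistyped test: parsed as product < (-0.0), so strictly negative."""
    return (u2 - u1) * (u1 - u0) <- 0.0


def is_extremum_intended(u0, u1, u2):
    """The intended test: also true when either difference is exactly zero."""
    return (u2 - u1) * (u1 - u0) <= 0.0


# A flat step (u0 == u1) is where the two versions disagree.
flat_typo = is_extremum_typo(1.0, 1.0, 2.0)
flat_intended = is_extremum_intended(1.0, 1.0, 2.0)
```

The two tests agree on strict extrema and on strictly monotone data, which is consistent with the commit's remark that the zero-difference cases had to be caught by a later check.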
* Adds a PLM reconstruction scheme using least squares for the slope Recon1d_PLM_WLS provides a piecewise linear reconstruction where the slope is the "best" fit as determined by volume-weighted least squares. The reconstruction is NOT limited by neighboring cells. Therefore, this reconstruction is NOT useful for vertical remapping or grid generation. It is instead intended for the pressure gradient calculation; the idea is to disconnect the PLM slope from the values in vanish(ing) layers which appear to be the source of pressure-gradient errors over topographic slopes in z*-coordinate tests. Because the normal limiters do not apply, the only test I could think of was to check that the least squares fit was actually correct. The documentation explains how this was checked (which took a while due to round-off challenges with the loss function). * Vert friction: Force FMAs in tridiag solvers Switching the vertical friction loops from k/j/i to j/i/k replaced the evaluation of `b1` by FMA with a simpler version, causing an answer change when FMAs are enabled. Although less efficient, this patch adds an always-false loop to trick the compiler and force it to always execute `b1` by FMA. Specifically, loops of the following form execute `b1` by FMA. ``` do k=2,nz if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k) c1(k) = dt * CS%a_v(i,J,K) * b1 b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1) ---> b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1)) d1 = b_denom_1 * b1 visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1 enddo ``` Switching to j/i/k ordering allows the Intel compiler to cache `a_[uv](K)` for use in the next iteration of `k` and evaluate `b1` by a single multiplication. 
If we insert an impossible branch, such as the following: ``` do k=2,nz if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k) c1(k) = dt * CS%a_v(i,J,K) * b1 b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1) b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1)) d1 = b_denom_1 * b1 visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1 ---> if (dt < 0) exit enddo ``` then it blocks the lookahead logic of the compiler and forces the FMA execution as in the k/j/i version. There is a moderate impact on performance. ``` Before: hits tmin tmax tavg tstd tfrac grain pemin pemax (Ocean vertical viscosity) 300 2.717543 3.805039 3.523935 0.174203 0.064 31 0 511 ``` ``` After: hits tmin tmax tavg tstd tfrac grain pemin pemax (Ocean vertical viscosity) 300 2.780148 3.999669 3.761651 0.210061 0.069 31 0 511 ``` so this should only be considered a temporary fix until FMA answer changes are permitted. * fix segfault * send wt_* to gpu for btstep_timeloop * move uv_old download into if block * port find_ustar_mech_forcing interface of find_ustar * init I_Hbbl on GPU instead of CPU * clean up some loops * kji -> jki some loops * remove redundant CS%CA[u,v]_pref downloads * fuse some loops in vertical viscosity * move visc%nkml_visc_[u,v] to out of j loop * Vert frict: nk_in_ml reduce locality in v This patch adds a missing reduce(max: max_nk) locality modifier to nk_in_ml in the v-direction. This fixes an openmp regression that was detected in ifx 2025.2. * port find_uv_at_h * Corrected unit descriptions in 64 comments Corrected the descriptions of variable units in 64 comments spread across 16 files, including a dozen instances where "arbitrary" was misspelled. All answers are bitwise identical and only comments were changed. 
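The sensitivity to FMA discussed in the tridiagonal-solver commit above comes from rounding once instead of twice. The following Python sketch emulates a fused multiply-add with exact rational arithmetic (an emulation chosen for portability, not how a compiler does it); `fused` and `unfused` are hypothetical names and the operands are picked only to expose the difference.

```python
from fractions import Fraction


def unfused(a, b, c):
    """a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c


def fused(a, b, c):
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# Exact product is 1 - 2**-60, which rounds to 1.0 as a double.
two_roundings = unfused(a, b, c)
one_rounding = fused(a, b, c)
```

Whether a compiler emits the fused or the unfused form for a given expression can therefore change the last bits of an answer, which is why the commit goes out of its way to preserve the earlier evaluation.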
* *Update TC testing parameters for late 2025 Updated the values of about 21 parameters (many of which are repeated across TC test cases) used in the TC testing to test the most recent versions of code that are selected with ANSWER_DATE flags and to avoid testing the buggy versions of code that are regulated by _BUG flags. This includes some changes to broaden the range of equations of state that are being tested and to test some newer versions. This does change the details of the TC tests, but they should (and do) still pass TC regression tests across code versions. * Added frazil to ice shelf (#985) * Added frazil to ice shelf The frazil mass flux to the ice-shelf base is calculated by multiplying frazil energy [J m-2] by the inverse of the timestep times the latent heat of fusion [kg J-1 s-1]. This frazil mass flux is incorporated as a negative water flux from the ice shelf. This negative water flux then acts to add the frazil mass to the ice shelf base (MOM_ice_shelf.F90/change_thickness_using_melt) and remove it from the ocean surface as evaporation (MOM_ice_shelf.F90/add_shelf_flux). Note frazil is reset to zero at the start of each therm timestep in MOM.F90/step_MOM. Some additional changes were also made to how the ice-shelf flux factor is implemented, so that it only scales ice-shelf melt without affecting the frazil mass flux. * Fixed a commented line where fluxes%water_flux should be ISS%water_flux * Extend the PGF reconstruction to allow PLM-WLS The PLM reconstruction used within the pressure gradient force now supports the weighted least squares approach for slope estimation. In a catastrophic version of seamount/z where vanished layers slightly inflate, the regular finite volume PLM method is sensitive to values in the vanished layers and leads to a feedback that causes error growth (spontaneous motion). The PLM-WLS method is insensitive to the vanished layers and in the same test leads only to round-off level noise in the flow. 
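Why a weighted least-squares slope is insensitive to vanished layers can be seen from the standard weighted regression formula. The Python sketch below assumes thickness is used as the weight and that cell means are fitted against cell-centre positions; the actual stencil and weighting in Recon1d_PLM_WLS are not given in the commit message, so `wls_slope` and its arguments are hypothetical.

```python
def wls_slope(z, u, h):
    """Slope of the thickness-weighted least-squares line through (z, u).

    z: cell-centre positions, u: cell mean values, h: cell thicknesses
    (used as weights). Returns 0.0 if the weighted spread in z is zero.
    """
    w_sum = sum(h)
    z_bar = sum(hk * zk for hk, zk in zip(h, z)) / w_sum
    u_bar = sum(hk * uk for hk, uk in zip(h, u)) / w_sum
    num = sum(hk * (zk - z_bar) * (uk - u_bar) for hk, zk, uk in zip(h, z, u))
    den = sum(hk * (zk - z_bar) ** 2 for hk, zk in zip(h, z))
    return num / den if den > 0.0 else 0.0


# Two thick cells on the line u = z + 0.5, plus a nearly vanished outlier.
slope = wls_slope([0.5, 1.5, 2.0], [1.0, 2.0, 100.0], [1.0, 1.0, 1.0e-9])
```

A cell with thickness near zero carries almost no weight, so its value barely moves the fitted slope, in contrast to a limiter that compares neighbouring values directly.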
* Spatially varying bottom drag coefficient (#983) * Spatially varying bottom drag coefficient The spatially varying bottom drag coefficient can be specified by providing a map of the spatially varying scaling factor. * Spatially varying bottom drag coefficient Fixed the inconsistency at open boundaries when CDRAG_MAP is true. * Correction on total column thickness for wetting In a number of cases, total resting column thickness is calculated as G%bathyT + G%Z_ref, which is largely correct but not for wetting, i.e. G%bathyT < 0. This commit makes a correction for seven cases with this potential bug. There are no answer changes if no wetting points are used and G%Z_ref is zero. List of modules/processes affected: * MOM_barotropic * affects only surface stress when BT_NONLIN_STRESS is False. * MOM_wave_speed * h2 calculations in * subroutine internal_tides_init * subroutine int_tide_input_int * subroutine tidal_mixing_init * MOM_lateral_mixing_coeffs * MOM_MEKE * fix repeated symbol error that occurs in benchmark * Barotropic: frhat[uv] HYBRID repro fix This patch fixes a minor bit reproducibility regression in the calculation of `hat[uv]tot` and, consequently, `frhat[uv]` in the ifort compiler. Currently, the `hat[uv]tot` loops were moved to separate loops. This patch moves `hat[uv]tot` back into the principal `frhat[uv]` (or in this case, `hat[uv]`) loop. There may be performance consequences to this, which I have not yet investigated. This was notably subtle, since many tests only call `btcalc()` once without the BTCONT `h_[uv]` inputs, was only observed in `frhatu`, and did not actually change solution answers. Nonetheless, this is a genuine answer change in an actively used solver, so we want to preserve bit repro until discussed and approved by the consortium. Although I can't see exactly the reason for the diff, it seems possible that Intel has added a reduction-like optimization, even at `-O0`. 
It also may not have occurred in production compilations with `-fp-model` flags. * Allow overshoot for grounding test In commit b8c807be327c0, we made the test for SSH penetrating the sea floor fatal when using BT_LIMIT_INTEGRAL_TRANSPORT because we thought it could never happen. Unfortunately, floating-point round off allows violations and we were hitting the now fatal error. This commit calculates the precision we can expect for the current SSH and then if the ocean thickness has become negative within this precision, we reset to zero thickness. This should not change answers in that BT_LIMIT_INTEGRAL_TRANSPORT is a new option, and if anyone was using it they would have encountered a FATAL, and this fix does not alter any positive thicknesses. * Initialize an integer only set on root_PE() When debugging with all run-time tests turned on, the integer `num_lines` was flagged as used but uninitialized when being passed to `broadcast()`. I don't think the code was wrong, just that the checks expected the "inout" argument to be set on all processors when the purpose of `broadcast()` is to take the value from root_PE and send to everyone else. I don't know why this hadn't been detected before - maybe compiler version. The fix is trivial and has no impact on production codes or answers. * Fix an uninitialized float in set_viscous_ML() `oldfn` was not initialized when used in a logical test. This did not matter for numerical results; the logical expression always correctly evaluated to False due to other parts of the expression. Nevertheless, this variable was technically used uninitialized and a debugging executable doesn't get past this. Hence the fix. * Add floor to "h_marg" in continuity_PPM When debugging the ice sheet configuration, a non-zero barotropic transport could not be reconciled with the layer transports because the derivative of net layer transports was zero (d/dv hu). 
This arose due to all layer flows pointed from vanished to thick so that their marginal thicknesses were individually zero. Adding a floor to the marginal thickness allows the solver to find the adjustment that reconciles the two estimates. I've made this optional via parameter CONT_USE_H_MARG_MIN, with a default of False. If this situation had occurred before, we surely would have had a crash so it's likely that always applying this floor would not change answers. However, there's the weak possibility that a teeny-tiny transport, smaller than H_subroundoff, has existed in a run and then this answer would change. With the default of False we can be sure there are no answer changes, but it is recommended to use this option for safety. * *+Fix CHANNEL_DRAG with bathymetry above sea level The CHANNEL_DRAG option was using a harmonic mean to interpolate adjacent bottom depths at velocity points to vorticity points. However, this is not well behaved when the bottom depth is negative (i.e., above sea level), as was noted as a part of PR #975. This commit adds the new runtime parameter CHANNEL_DRAG_SHELFBREAK_DEPTH to set a depth below which a harmonic mean bottom depth is still used to mimic a continental shelfbreak profile, but above which a simple arithmetic mean is used to interpolate bathymetry to vorticity points for use with CHANNEL_DRAG. The expressions vary continuously with depth and avoid the previous problems with division by zero or a badly formed harmonic mean. By default, all answers are bitwise identical in any cases that worked previously, but cases with oceans (or Great Lakes) in basins with bottoms that are above sea-level should now work sensibly when CHANNEL_DRAG is enabled. There is a new runtime parameter in some cases. * Corrected 66 unit descriptions in comments Corrected the incorrect or inconsistent unit descriptions of 28 variables, added descriptions of the units of 4 others, and corrected the non-standard syntax (e.g. 
backwards or in the wrong order) in the description of 35 variables, scattered across 27 files. Only comments are changed and all answers are bitwise identical. * Fix for ice-shelf friction velocity bugs (#995) * Fix for ice-shelf friction velocity bugs Fixed an incorrect area used to calculate cell-centered ocean surface velocity under the ice_shelf, which can impact the calculation of ice-shelf friction velocity. Added missing flags to some allocate_surface_state calls so that sfc_state%taux_shelf and sfc_state%tauy_shelf are allocated. This is required for the surface-stress-based (rather than surface-velocity-based) calculation of ice-shelf friction velocity. Also added taux_shelf and tauy_shelf as diagnostics for the surface stress under the ice shelf. * Removed unneeded taux_shelf and tauy_shelf diagnostics * Added ustar_from_vel_bugfix flag, which if true, fixes the ustar from ocean velocity bug * offload pass_uta_uhbta * (+) Decouple FMS infra from framework This patch undoes a coupling of the FMS infra layer to the MOM6 framework code. In the current FMS infra layers, the `get_extern_field_info()` and `init_extern_field()` functions require content defined in `src/framework`. This prevents the development of new independent infra layers, which must also depend on infra-agnostic content. In particular, the FMS2 implementation of `get_extern_field_axes()` relies exclusively on the framework function, `get_var_axes_info()`. Both infras also return the `axis_info` type, a MOM-specific framework-level descriptor, rather than the infra `axistype`. This patch resolves these inconsistencies. * `axis_info` no longer appears at infra-level. All relevant functions now reference `axistype`. * `src/framework/MOM_io.F90` now provides functions for translating `axistype` to `axis_info`. Some specific changes are summarized below. 
* `get_external_field_info` is now a framework-level function of `MOM_interpolate.F90`, using infra-level implementations of `get_extern_field_(size|axes|missing)`. Each is now explicitly defined at the infra-level. * The FMS2 `get_external_field_axes` is now an entirely new function, and is largely a duplicate of `get_var_axes_info()`. The major difference is that it returns a list of `axistype`. It also replaces the fixed x-y-z fetch with a slightly more generic list of axes. (It still requires at least three dimensions, however.) * `set_axis_data` is only used internally by the FMS2 infra. It is included in FMS1 but raises a not-implemented error. There is one minor API change. * The `name` argument was added to `get_axis_data`. It is now the second argument, to match the style of existing functions, and size was moved to the third argument. Other minor framework references have been removed. * `MOM_error` and `FATAL` now reference their `MOM_error_infra` equivalents. * `lowercase`, which was previously only defined in FMS1, has been added to the FMS2 infra. Note that this is a duplication of the function in `src/framework/MOM_string_functions.F90`. * Add mom_cap_outputlog.F90 that enables output logging diagnostics at a given hourly output frequency Author: Denise Worthen This feature is required for UFS operational configurations and is used to determine when MOM6 output (diagnostics and restart) have been completed. The log files created by this feature can be queried by the Global Workflow to either trigger downstream jobs or to ensure that if a run fails and a restart is required, model output is available consistent with a given restart file. * Revert vertical viscosity subroutines to JIK loop order used in dev/gfdl (#101) * Rewrote vertvisc_remnant to follow dev/gfdl jik structure. 
Uses omp directives and private vars to offload to gpu * Rewrote vertvisc to follow dev/gfdl jik structure and use omp directives + private vars in tridiagonal solver * Vertvisc with jik loops based on Marshall's vv_coef branch, no transfers except ustar and hml * Switch to distribute parallel do and omp declare target to allow threading with find_coupling call inside of vertvisc_coef * Cleanup commented out code in vertvisc_coef * Revert non-k find_coupling functions back to their dev/gfdl version, reintroduce conditionals to vertvisc_coef * Explicitly make coupling routines gpu routines to avoid crash when compiling with O2. Make transfers explicit * Harmonization with dev/gfdl * Several minor changes to harmonize this patch (and dev/gpu) with dev/gfdl. * touch_ij is removed since it is no longer used by the CPU. * Various whitespace changes to OMP directives and loop indices. * Unused OMP directives have been removed. * An `associated(ADp%dv_dt_visc)` typo has been fixed. * Added latent heat flux from ice shelf to ocean fluxes * Fixes wrong number of levels in z-coord diags When a z-coordinate diagnostic grid is specified via the "PARAM" method of coordinate definition, the number of levels was always the same as in the main model. This commit fixes this by first allowing for up to 1000 levels in the new grid, checking for the actual requested size, and then allocating to that size. It appears we have no examples using this mode, which is probably how this bug has persisted so long. This "PARAM" method of specifying grids is being used in a range of new CMIP7 diagnostics in both MOM6 and COBALT. 
* Fix bug in registration of ALE sponge diagnostics for generic tracers (#1003) * Init all sponge tendency diag IDs to -1 immediately * No need to reset to -1 since initialized when declared * Move init_ALE_sponge_diags to after all tracers have been set up * Fix reference of (rarely) unassociated pointer These two references to members of a pointer don't seem to be hit except under special circumstances but nevertheless I ran in to them when debugging an unrelated problem. There are two references to members of `diag%axes` that assume `diag%axes` are associated, but in the specific case I was debugging this was not the case. * Adds 5 CMIP7 diagnostics for vertically integrated heat/salt content Five vertically integrated diagnostics are requested in CMIP7. These ultimately are to be for four vertical intervals (0-300m, 300-700m, etc.) but we will handle that through addition of a 4-level diagnostic grid, configured at run-time. This commit handles the conversion from temperature or salt to heat content or salt content (by mass) and registers a "vertically extensive" quantity so that the diagnostics know to re-integrate rather than remap. Changes: - Added diagnostics absscint, pfscint, scint, chcint and phcint - Moved registration of temp_int and salt_int to within an existing `if (use_temperature)` block - Made public 2 GSW conversion functions in MOM_EOS * Optimized the ice-shelf CG scheme by reducing the number of times reproducing_sum (and therefore, mpp_sum) is called. Previously, several 2-D arrays were each being passed within their own reproducing_sum calls, which is now avoided by consolidating the 2-D arrays into one 3-D array that is passed to a single reproducing_sum call. * Check that frazil is allocated before adding it to ice-shelf water flux calculation. Needed for runs without frazil. 
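The ice-shelf CG optimization above reduces the number of global reductions by stacking several 2-D arrays and summing them in one call. The Python sketch below only counts calls to a stand-in reduction; `CountingReducer` and its methods are hypothetical, and a plain sum is used in place of a reproducing (order-independent) sum.

```python
class CountingReducer:
    """Stand-in for a global reduction that records how often it is called."""

    def __init__(self):
        self.calls = 0

    def global_sum(self, fields):
        """Sum each 2-D field in a stack of fields; one call is one reduction."""
        self.calls += 1
        return [sum(sum(row) for row in field) for field in fields]


def sums_separately(reducer, fields):
    """One reduction per 2-D array, as in the earlier scheme."""
    return [reducer.global_sum([field])[0] for field in fields]


def sums_batched(reducer, fields):
    """All 2-D arrays stacked and reduced in a single call."""
    return reducer.global_sum(fields)


fields = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
separate, batched = CountingReducer(), CountingReducer()
result_separate = sums_separately(separate, fields)
result_batched = sums_batched(batched, fields)
```

In a parallel run each reduction implies communication between ranks, so the call count, not the arithmetic, is the cost being reduced.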
* Added melt_mask for ice shelves * Added melt_mask to ice-shelf restart * comments and units * subroutine ice_shelf_solve_inner: Completed variable descriptions and units; converted cg_halo and max_cg_halo from real to integer * +Add trim_trailing_commas and ints_to_string Copied the function i2s from MOM_diag_mediator into the function ints_to_string in MOM_string_functions, and moved the code removing trailing commas from two places in MOM_diag_mediator into the new function trim_trailing_commas in MOM_string_functions. Because of the duplication of code between MOM6, SIS2 and the MOM6 ice shelf code, these functions would need to be replicated 3 or 6 times without these changes. Also added unit tests of both new functions to string_functions_unit_tests. All answers are bitwise identical but there are two new public functions in MOM_string_functions. * Call trim_trailing_commas from register_diag_field Call trim_trailing_commas from register_diag_field and register_static_field and ints_to_string from trim_trailing_commas and eliminated the now redundant routine i2s. All code functions exactly as before but there is less duplicative code. * Refactor nsten_halo in routine advect_tracer Move nsten_halo out of iteration loop * Fix OBC indexing bug in MOM_tracer_advect Fix a bug in which tracers in the domain outside of the OBC are falsely updated when the OBC is in the interior. The bug was due to an indexing error in routine advect_x. * MOM_interpolate: use get_axis_size() The prior version of `get_external_field_info` incorrectly relied on the `size` output of `get_external_field_info_infra` to determine the size of an external field's axes, since all external fields are assumed to be domain-decomposed. Since axis metadata is generally opaque, we have introduced a new infra function, `get_axis_data`, which returns the size of an axis. * Adds the ability to read a CDEPS configuration file to provide in-line forcing. 
* Adds the ability to read a CDEPS configuration file to provide in-line forcing. Currently this is set up to read a non-climatological lrunoff data stream only. * ice-ocean-nolib: Fix SIS2 paths Patch to fix the SIS2 paths in the pipeline CI script. Explicitly excludes the icebergs stub, since we are using the actual icebergs model. * Correct the path to the Icepack interfaces The previous attempt to fix the automated no-library build of the ice-ocean model incorrectly specified the path to the Icepack_interfaces. This has now been corrected from `src/SIS2/config_src/external/Icepack_interfaces` to `src/SIS2/config_src/external/Icepack_interfaces` in pipeline-ci-tool.sh. The real mystery here is why the testing on the previous PR actually worked. * Delete unneeded masks args from 25 post_data calls Removed redundant mask arguments from 25 post_data() calls for 2-d arrays that were using masks that would have been set anyway based on the axes of these diagnostics. Explicit masks are only required for arrays that use unusual masks, pass atypically sized arrays (e.g., just the computational domain), or are static diagnostics that do not evolve in time. All answers and diagnostic output are bitwise identical. * vertvisc: add missing b_denom_1 map delete * vertvisc: remove scalar allocs * Add 2D meanSL field The spatially varying time mean sea level meanSL is used as a reference height to calculate, e.g., time mean ocean column thickness max(meanSL + bathyT, 0.0). This field allows the model to run in a domain with spatially varying mean height, e.g. the Great Lakes system. This first commit insulates the changes from the rest of the model. It only adds the field to ocean_grid_type and dyn_horgrid_type, the transcription between the two types, and a routine to read it from a file. The field is not yet used by the rest of the code. * Use meanSL to calculate mean column thickness This commit uses G%meanSL in 13 modules. 
The change is essentially replacing G%bathyT + G%Z_ref with G%meanSL + G%bathyT. Note that this does NOT mean parameter G%Z_ref is replaced by G%meanSL. G%Z_ref is factored in both G%meanSL and G%bathyT and it is kept as a useful consistency testing tool. Another cosmetic change is made by using G%meanSL + G%bathyT, instead of G%bathyT + G%meanSL, which (hopefully) can be easily interpreted as G%meanSL - (-G%bathyT). * Modify max_depth calculation using meanSL max_depth is really used as a maximum static thickness throughout the model, so meanSL needs to be considered. * +Fix how missing values are handled in post_data At no point does MOM6 code actually set arrays passed to post_data() to have a missing value. Instead a missing value is set in output files entirely by masking. This commit eliminates the logic that would (inaccurately) try to reset fields that seem to match rescaled missing values to the output missing value. The previous code was inaccurate, in that a rescaled field could have taken on the unscaled missing value as a valid data point and still have been incorrectly marked as missing, although the odds of this happening are exceptionally small and it would only be cases with dimensional rescaling where this could have applied. For 2-d diagnostics, this commit eliminates a duplicative array syntax math expression that did exactly what the code now does. All solutions are identical, and because the missing value was not being explicitly set, it is unlikely that any diagnostics will change. * Remove ice-sheet melting/freezing contribution to fluxes%latent because it is already accounted for in fluxes%sens * Add tracing instrumentation to nuopc driver (#162) * adds calls to ufs tracing routines that will create a trace file which can then be visualized, which is found to be useful in identifying various performance issues. 
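The mask-only treatment of missing values described in the post_data commit above can be sketched in a few lines of Python. The sentinel value, the function name `apply_output_mask` and the `conversion` argument are all hypothetical; the point is only that the mask, never the data value, decides which points are written as missing.

```python
MISSING = -1.0e34  # hypothetical output sentinel, for illustration only


def apply_output_mask(field, mask, conversion=1.0, missing=MISSING):
    """Rescale a field for output and mark masked points as missing.

    A point is written as missing only where mask <= 0; data values are
    never compared against the sentinel.
    """
    return [
        value * conversion if m > 0.0 else missing
        for value, m in zip(field, mask)
    ]


output = apply_output_mask([1.0, 2.0, 3.0], [1.0, 0.0, 1.0], conversion=2.0)
```

Because no data value is ever tested against the sentinel, a rescaled field cannot be marked missing by coincidence, which was the inaccuracy the commit removes.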
* +Add G%IdxCu_OBCmask and G%IdyCv_OBCmask Added the new elements `IdxCu_OBCmask` and `IdyCv_OBCmask` to the `ocean_grid_type` and `dyn_horgrid_type` to facilitate the application of no-gradient open boundary conditions at faces with essentially no added overhead. These new arrays are set initially in `set_derived_metrics()` and `set_derived_dyn_horgrid()`, but may be reset in `initialize_masks()` and `open_boundary_impose_land_mask()`. All answers are bitwise identical but there are a pair of new 2-d arrays in two transparent grid types. * Use G%IdxCu_OBCmask in 7 places Modified the code to use `G%IdxCu_OBCmask` and `G%IdyCv_OBCmask` in 7 places each in 6 modules. They are used instead of `G%OBCmaskCu*G%IdxCu` and `G%OBCmaskCv*G%IdyCv`, to which they are equivalent. This change should slightly speed up the model, and as expected all answers are bitwise identical. * Add option to scale tidal amplitude for bottom ustar. (#1016) * Add option to scale tidal amplitude for bottom ustar. - previously we used the tidal amplitude to compute ustar. - The additional factor translates between amplitude and time mean tidal current. - Setting the factor TIDEAMP_FACTOR<0 preserves old answers. * Update tideamp factor implementation for efficiency - factor out the negative "default" value to automatically set to multiply by 1.0 instead of using an if-block. - factor in the c-grid averaging 0.5 to further reduce extra operations, but clearly label the parameter to reflect this. --------- Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov> * Add vertical tracer flux diagnostic for dye tracers (#1022) * Add vertical tracer flux diagnostic for dye tracers - Register vertical flux diagnostic in initialize_dye_tracer - Calculate net vertical flux from entrainment (positive upward) - Post flux diagnostic in dye_tracer_column_physics * changed diagnostic registration to be at interface, made sure boundary fluxes are zero * changed lines 338 and 354 as needed. 
Fixed accidental space on Line 1. * Regroup MOM_initialize_fixed params in param_doc This commit is meant to fix the issue that all parameters in MOM_initialize_fixed after OBC are logged under module MOM_open_boundary in MOM_parameter_doc. By moving log_version call after OBC, parameters from MOM_initialize_fixed are now logged under three "modules" in MOM_parameter_doc: 1. Parameters before OBC are under module MOM_grid_init, which also (incorrectly) includes topography relatd parameters. 2. module MOM_open_boundary 3. Parameters after OBC are under module MOM_initialize_fixed. The change makes sure OBC parameters are well separated from the other parameters. This is a hack rather than a fix. * Minor open_boundary_config refactor * Make OBC related calls in MOM_initialize_fixed explicitly conditional for readibility. * Early return in open_boundary_config if there is no se…

Hallberg-NOAA added 4 commits July 8, 2019 13:43

Split long comments in RGC_tracer.F90

71693b5

RGC_tracer.F90 previously had some very long comments at the end of some lines. These have now been split onto multiple lines to respect the MOM6 standards for line-length. All answers are bitwise identical.

Split excessively long lines in 2 files

7a9cf32

Split excessively long lines and corrected the syntax for unit documentation in MOM_lateral_mixing_coeffs.F90 and MOM_thickness_diffuse.F90. All answers are bitwise identical.

(*)Multiply fmax by US%s_to_T in MOM_hor_visc.F90

00d99ea

Added a dimensional scaling factor for fmax in MOM_hor_visc.F90 that was dropped at some point in the merging of the dev/ncar code into dev/gfdl. All answers are bitwise identical and now pass the dimensional scaling test.
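
To illustrate why a dropped scaling factor leaves ordinary answers unchanged yet fails a dimensional scaling test, here is a minimal Python sketch. It assumes the test rescales time units by a power of two, so that a dimensionally consistent expression gives bitwise identical results. The function `damping_product` and its arguments are hypothetical and are not taken from MOM_hor_visc.F90.

```python
def damping_product(rate_per_s, tau_s, rescale_power, include_factor=True):
    """Return the nondimensional product rate * tau using rescaled time units.

    Hypothetical illustration only: `s_to_T` plays the role of a seconds-to-
    internal-time conversion factor, chosen here as an exact power of two.
    """
    s_to_T = 2.0 ** rescale_power      # seconds -> rescaled time units [T s-1]
    T_to_s = 1.0 / s_to_T              # rescaled time units -> seconds [s T-1]
    rate_T = rate_per_s * T_to_s       # convert a rate from [s-1] to [T-1]
    # Omitting the factor below mimics a dropped dimensional scaling factor.
    tau_T = tau_s * s_to_T if include_factor else tau_s
    return rate_T * tau_T


reference = damping_product(1.0e-4, 3600.0, rescale_power=0)
rescaled = damping_product(1.0e-4, 3600.0, rescale_power=10)
broken = damping_product(1.0e-4, 3600.0, rescale_power=10, include_factor=False)

print(reference == rescaled)  # consistent expression: unchanged by rescaling
print(reference == broken)    # missing factor: answer changes once rescaled
```

With `rescale_power=0` the factor is exactly 1.0, so an expression missing the factor still reproduces the reference answer. The omission only shows up once the units are actually rescaled.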

Travis tests for lines exceeding 120 characters

85939c3

Added the 120-character line limit to the Travis testing script.
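
A check of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual Travis script from the commit: the helper `find_long_lines` is hypothetical, and only the 120-character limit comes from the commit message.

```python
from typing import Iterable, List, Tuple

MAX_LINE_LENGTH = 120  # limit named in the commit message


def find_long_lines(lines: Iterable[str],
                    limit: int = MAX_LINE_LENGTH) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit`."""
    offenders = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))  # do not count the trailing newline
        if length > limit:
            offenders.append((number, length))
    return offenders


# A line of exactly 120 characters passes; 121 characters is flagged.
sample = ["x" * 120 + "\n", "y" * 121 + "\n", "short\n"]
print(find_long_lines(sample))
```

In a CI setting, a non-empty result would typically be turned into a non-zero exit status so that the build fails.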

adcroft merged commit 4cb18ac into mom-ocean:dev/gfdl Jul 8, 2019

Hallberg-NOAA deleted the fix_line_lengths branch July 30, 2021 18:58

adcroft merged 4 commits into mom-ocean:dev/gfdl from Hallberg-NOAA:fix_line_lengths
