+Fix OBC reproducibility across restarts#907
Conversation
3a2fba7 to
0d9dd94
Compare
|
We manually tested this PR with the CEFI-NWA12 regional physics-only config. Surprisingly, even the first commit fb7b25b changes the baseline answers and breaks restart reproducibility. You can find the stdout on C6 at: |
|
On the Kelvin_wave/rotated45_BT issue, running on 1 vs 3 cores gives differing values in CS%h in the first step at point (6,23) which is point (6,4) on the middle of the three-core job. This point is just outside of OBC segments 3 and 23. We shouldn't be setting or using h outside, right? Could this be due to changing the order of operations when using different core counts? |
|
In rk2, hp comes out of the second call to continuity differing at that point. On three cores inside continuity_PPM, vh differs between cores 0 and 1 at the face (6,23) aka (6,4), i.e., right on the tile boundary. This is at line L#176. hp then differs between the cores as well at all the i=6 points with values near 100. |
0d9dd94 to
29887b3
Compare
|
CS%v_accel_BT comes out of btstep with differiing values at that point, feeding into vp at line 687 of rk2 solver. |
|
btloop_update_v gets called 4 times in the first call to btstep. The first time, all is 0, times 2 and 3, the OBC values are 0 while the values to either side are non-zero but matching between the cores, the final time, the OBC values are 0 and the values to either side do not match - for PFv and Cor_v, hence v_accel_bt. |
|
Something happens to make vbt and vhbt not match across cores on the fourth call to apply_v_velocities_OBC. I have to quit for the day now. |
|
Could it be that the time-to-exchange code needs to be more conservative? The first time through apply_v_velocities_OBC, the vbt values all start as zero, then the boundary values are set based on external Flather forcing. The second time into the routine, both processes have three matching vbt values at i=6. The third time, two values match, one does not. The fourth time, only the boundary values match, while the halo values have kept their prior value. However, the vbt OBC calculation uses one of these halo points (vbt_old(i,J+1)). |
29887b3 to
1a1fb72
Compare
|
Thank you for the comments on the reproducibility across PE count and layout for the Kelvin wave test case, @kshedstrom. Those issues remain, but are separate from the ones that this PR addresses. I do have some fixes to the rotational symmetry problem with Kelvin waves (they work with 1 PE but not multiple PEs, due to these issues). I will put in that PR and we can use that PR as a platform to continue this conversation. In the mean-time, I would appreciate both your opinion and @yichengt900 's on whether the revised version of the current PR works with the OBC-based cases you use for testing. |
|
We tested this PR using the CEFI For the However, for the NEP configuration, we observed changes in the baseline results after the Add OBC_RESERVOIR_RESTART_BUG commit. A preliminary check suggests that the differences originate at the |
|
After further investigation, we found that the baseline answers for the NEP case can be recovered by reverting the changes in the |
1a1fb72 to
634502d
Compare
|
This PR has been updated to correct the single-character problem that was causing the NE Pacific test to fail. In addition, the runtime parameter associated with this bug has been updated to OBC_RESERVOIR_INIT_BUG to better characterize what it does. |
This commit makes a set of changes to refactor code related to open boundary conditions to correct issues with restarts. Renamed open_boundary_init to open_boundary_halo_update to reflect what it actually does and eliminated the 4 unused arguments to this routine. Moved halo updates for temperature and salinity and the call to setup_OBC_tracer_reservoirs out of fill_temp_salt_segments and made the tv argument to fill_temp_salt_segments intent in. Moved the call to setup_OBC_tracer_reservoirs and the debugging calls for OBCs into the same code block as the rest of the OBC code in MOM_initialize_state. Moved the call to open_boundary_halo_update after MOM_state_initialization for both rotated and unrotated branches. Moved the calls to fill_temp_salt_segments and setup_OBC_tracer_reservoirs and the debugging calls for OBCs into the same code block as the rest of the OBC code in MOM_initialize_state. All answers are bitwise identical, but there are changes to public interfaces and what does and does not occur in several routines.
Added the new runtime option VERTEX_SHEAR_OBC_BUG that can be set to false to use masks to set the thicknesses interpolated to velocity points when open boundary conditions might be used. This commit includes the addition of checksums of the input stratification and shear squared when DEBUG is true. It also refactors the code setting the output diffusivity in Calc_kappa_shear_vertex to move some logical branches outside of the do loops for efficiency. A number of spelling errors were corrected, and some vertex-point indices were converted to uppercase to follow the MOM6 soft convention in MOM_kappa_shear.F90. By default, all answers are bitwise identical, but there is a new runtime parameter.
Added the new optional argument OBC_N2 to calc_isoneutral_slopes that can be use to specify that only interior points are used to calculate the buoyancy frequencies at velocity points when open boundary conditions are in use. When the new OBC_N2 argument is absent or false, the solutions are identical to what they had been previously, but this gives a dependence on the arbitrary outer thicknesses when open boundary conditions are used. By default, all answers are bitwise identical, but there is a new optional argument to a publicly visible interface.
Added the new runtime option MIXING_COEFS_OBC_BUG that can be set to false to specify that only interior data should be used for thickness weighting in the lateral mixing coefficient calculations when open boundary conditions are in use, and that only interior temperatures, salinities, pressures and thicknesses should be used when calculating the buoyancy frequency at open boundary velocity faces. These changes are implemented by introducing two new 3-d arrays that set the thicknesses at velocity points and there are a few new logical tests inside of some loops, which may slow down the code slightly in routines that had never previously been notable for the amount of time spent there. They also make use of the recently added OBC_N2 argument to calc_isoneutral_slopes(). There are new ocean_OBC_type arguments to calc_resoln_function() and calc_sqg_struct() as a part of these changes. The default setting for MIXING_COEFS_OBC_BUG retains the previous (incorrect) answers, and all answers are bitwise identical in cases without open boundary conditions regardless of how it is set. There is a new entry in some MOM_parameter_doc files for some cases with active open boundary conditions and new arguments to publicly visible interfaces.
This commit revises setup_OBC_tracer_reservoirs and other OBC routines to allow some cases using open boundary conditions and tracer reservoirs to reproduce across restarts and added the new runtime parameter OBC_RESERVOIR_INIT_BUG that can be set to true to recover the previous answers in cases that are not initialized from a restart file. There is a new optional MOM_restart_CS argument to setup_OBC_tracer_reservoirs that can be used to query whether the reservoirs have been read from a restart file before resetting them to values that may have been set during initialization. If the reservoirs have not been set from a restart file, there is now a preference to set them from the tracer reservoir (tres) element in the OBC tracer registry on the segments (if it has been initialized) rather than the interior tracer values that have been recorded on the segment. Fill_temp_salt_segments and fill_obgc_segments have been modified to only set the tres segment data if it has not been initialized already, as indicated by the is_initialized field. This commit adds the new public routine set_initialized_OBC_tracer_reservoirs that can be used to record in the restart control structure that the OBC%tres_x and OBC%tres_y fields have been initialized and should not be reset to new values during initialization. The calls to setup_OBC_tracer_reservoirs and open_boundary_halo update have been moved after the call to tracer_flow_control_init so that the full arrays of tracer reservoirs do not need to be initialized in fill_obgc_segments. Fill_obgc_segments was refactored to separate out the steps that should occur there from those that should not and are wrapped in a bug flag. A debugging checksum call to chksum_OBC_segments was added to the top of radiation_open_bdry_conds to assist in any future debugging of the OBC code. Added doxygen comments describing several routines that were previously missing such a description. By default, all answers are bitwise identical, but there is a new publicly visible routine, a new optional argument to a public interface and a new runtime parameter that can be used to correct the previous behavior that broke the reproducibility across restarts of some cases using open boundary conditions with tracer reservoirs.
Copy segment tracer registry data from the input segment to the target segment in rotate_OBC_segment_data, including the tracer reservoirs on segments. With this change, configurations using tracers with open boundary conditions are now passing the rotational consistency tests. This will change answers with some rotated cases with active OBCs, but all answers in non-rotated cases or cases without tracers in OBCs are bitwise identical.
634502d to
97934a9
Compare
|
@Hallberg-NOAA, thanks for the updates! This PR has now passed both NWA and NEP regression tests. The answers remain unchanged and are reproducible across restarts. |
|
I approve of this PR. My tracer reservoir rotational tests are now passing. That said, my restart test of the Arctic is not reproducing, even with the bug flags set to False. I will continue to investigate, but it shouldn't hold this PR up. |
|
This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/27682 with the expected warnings about new runtime parameters. |

This pull request consists of a series of 6 commits that adds new runtime parameters (
VERTEX_SHEAR_OBC_BUG,MIXING_COEFS_OBC_BUGandOBC_RESERVOIR_INIT_BUG) that when set properly (toFalse) will allow some cases (and in particular the loop_current test case) with open boundary conditions to reproduce across restarts and to pass rotational consistency tests. There is also some targeted refactoring along with the code changes to 6 files and some new arguments to public interfaces to enable these changes. By default, all answers are bitwise identical.The detailed description of the various commits are as follows:
This commit makes a set of changes to refactor code related to open boundary conditions to correct issues with restarts.
Renamed
open_boundary_init()toopen_boundary_halo_update()to reflect what it actually does and eliminated the 4 unused arguments to this routine.Moved calls to
fill_temp_salt_segments(),setup_OBC_tracer_reservoirs()andopen_boundary_halo_update()outside ofrotate_OBC_init(), and eliminated 5 unused parameters torotate_OBC_init().Moved halo updates for temperature and salinity and the call to
setup_OBC_tracer_reservoirs()out offill_temp_salt_segments()and made thetvargument tofill_temp_salt_segments()intent in.Moved the call to
open_boundary_halo_update()afterMOM_state_initialization()for both rotated and unrotated branches.Moved the calls to
fill_temp_salt_segments()andsetup_OBC_tracer_reservoirs()and the debugging calls for OBCs into the same code block as the rest of the OBC code inMOM_initialize_state().All answers are bitwise identical, but there are changes to public interfaces and what does and does not occur in several routines.
Added the new runtime option
VERTEX_SHEAR_OBC_BUGthat can be set to false to use masks to set the thicknesses interpolated to velocity points when open boundary conditions might be used. This commit includes the addition of checksums of the input stratification and shear squared whenDEBUGis true. It also refactors the code setting the output diffusivity inCalc_kappa_shear_vertex()to move some logical branches outside of the do loops for efficiency. A number of spelling errors were corrected, and some vertex-point indices were converted to uppercase to follow the MOM6 soft convention in MOM_kappa_shear.F90. By default, all answers are bitwise identical, but there is a new runtime parameter.Added the new optional argument
OBC_N2tocalc_isoneutral_slopes()that can be used to specify that only interior points are used to calculate the buoyancy frequencies at velocity points when open boundary conditions are in use. When the newOBC_N2argument is absent or false, the solutions are identical to what they had been previously, but this gives a dependence on the arbitrary outer thicknesses when open boundary conditions are used. By default, all answers are bitwise identical, but there is a new optional argument to a publicly visible interface.Added the new runtime option
MIXING_COEFS_OBC_BUGthat can be set to false to specify that only interior data should be used for thickness weighting in the lateral mixing coefficient calculations when open boundary conditions are in use, and that only interior temperatures, salinities, pressures and thicknesses should be used when calculating the buoyancy frequency at open boundary velocity faces. These changes are implemented by introducing two new 3-d arrays that set the thicknesses at velocity points and there are a few new logical tests inside of some loops, which may slow down the code slightly in routines that had never previously been notable for the amount of time spent there. They also make use of the recently addedOBC_N2argument tocalc_isoneutral_slopes(). The default setting forMIXING_COEFS_OBC_BUGretains the previous (incorrect) answers, and all answers are bitwise identical in cases without open boundary conditions regardless of how it is set. There is a new entry in some MOM_parameter_docfiles for some cases with active open boundary conditions.
This commit revises
setup_OBC_tracer_reservoirs()and other OBC routines to allow some cases using open boundary conditions and tracer reservoirs to reproduce across restarts and adds the new runtime parameterOBC_RESERVOIR_INIT_BUGthat can be set to true to recover the previous answers in cases that are not initialized from a restart file. There is a new optionalMOM_restart_CSargument tosetup_OBC_tracer_reservoirs()that can be used to query whether the reservoirs have been read from a restart file before resetting them to values that may have been set during initialization. If the reservoirs have not been set from a restart file, there is now a preference to set them from the tracer reservoir (tres) element in the OBC tracer registry on the segments (if it has been initialized) rather than the interior tracer values that have been recorded on the segment.Fill_temp_salt_segments()andfill_obgc_segments()have been modified to only set thetressegment data if it has not been initialized already, as indicated by theis_initializedfield.This commit adds the new public routine
set_initialized_OBC_tracer_reservoirs()that can be used to record in the restart control structure that theOBC%tres_xandOBC%tres_yfields have been initialized and should not be reset to new values during initialization.The calls to
setup_OBC_tracer_reservoirs()andopen_boundary_halo update()have been moved after the call totracer_flow_control_init()so that the full arrays of tracer reservoirs do not need to be initialized infill_obgc_segments().Fill_obgc_segments()was refactored to separate out the steps that should occurthere from those that should not and are wrapped in a bug flag.
A debugging checksum call to
chksum_OBC_segments()was added to the top ofradiation_open_bdry_conds()to assist in any future debugging of the OBC code.Doxygen comments were added describing several routines that were previously missing such a description.
By default, all answers are bitwise identical, but there is a new publicly visible routine, a new optional argument to a public interface and a new runtime parameter that can be used to correct the previous behavior that broke the
reproducibility across restarts of some cases using open boundary conditions with tracer reservoirs.
(*)Correct rotation with OBC tracer reservoirs
Copy segment tracer registry data from the input segment to the target segment in
rotate_OBC_segment_data(), including the tracer reservoirs on segments. With this change, configurations using tracers with open boundary conditions are nowpassing the rotational consistency tests. This will change answers with some rotated cases with active OBCs, but all answers in non-rotated cases or cases without tracers in OBCs are bitwise identical.