Add p7.2 tests and updates #969
Changes from all commits
ee1c1f9
d7f00d3
29bac6d
ef7ab9a
cfb9db7
3bbc69e
608f46e
0089e7a
69d87bc
dc34f62
f616952
e5f9469
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| ############################################# | ||
| #### NEMS Run-Time Configuration File ##### | ||
| ############################################# | ||
|
|
||
| # EARTH # | ||
| EARTH_component_list: MED ATM CHM OCN ICE WAV | ||
| EARTH_attributes:: | ||
| Verbosity = 0 | ||
| :: | ||
|
|
||
| # MED # | ||
| MED_model: @[med_model] | ||
| MED_petlist_bounds: @[med_petlist_bounds] | ||
| :: | ||
|
|
||
| # ATM # | ||
| ATM_model: @[atm_model] | ||
| ATM_petlist_bounds: @[atm_petlist_bounds] | ||
| ATM_attributes:: | ||
| Verbosity = 0 | ||
| DumpFields = false | ||
| ProfileMemory = false | ||
| OverwriteSlice = true | ||
| :: | ||
|
|
||
| # CHM # | ||
| CHM_model: @[chm_model] | ||
| CHM_petlist_bounds: @[chm_petlist_bounds] | ||
| CHM_attributes:: | ||
| Verbosity = 0 | ||
| :: | ||
|
|
||
| # OCN # | ||
| OCN_model: @[ocn_model] | ||
| OCN_petlist_bounds: @[ocn_petlist_bounds] | ||
| OCN_attributes:: | ||
| Verbosity = 0 | ||
| DumpFields = false | ||
| ProfileMemory = false | ||
| OverwriteSlice = true | ||
| mesh_ocn = @[MESHOCN_ICE] | ||
| :: | ||
|
|
||
| # ICE # | ||
| ICE_model: @[ice_model] | ||
| ICE_petlist_bounds: @[ice_petlist_bounds] | ||
| ICE_attributes:: | ||
| Verbosity = 0 | ||
| DumpFields = false | ||
| ProfileMemory = false | ||
| OverwriteSlice = true | ||
| mesh_ice = @[MESHOCN_ICE] | ||
| stop_n = @[RESTART_N] | ||
| stop_option = nhours | ||
| stop_ymd = -999 | ||
| :: | ||
|
|
||
| # WAV # | ||
| WAV_model: @[wav_model] | ||
| WAV_petlist_bounds: @[wav_petlist_bounds] | ||
| WAV_attributes:: | ||
| Verbosity = 0 | ||
| OverwriteSlice = false | ||
| :: | ||
|
|
||
| # CMEPS warm run sequence | ||
| runSeq:: | ||
| @@[coupling_interval_slow_sec] | ||
| MED med_phases_prep_ocn_avg | ||
| MED -> OCN :remapMethod=redist | ||
| OCN -> WAV | ||
| WAV -> OCN :srcMaskValues=1 | ||
| OCN | ||
| @@[coupling_interval_fast_sec] | ||
| MED med_phases_prep_atm | ||
| MED med_phases_prep_ice | ||
| MED -> ATM :remapMethod=redist | ||
| MED -> ICE :remapMethod=redist | ||
| WAV -> ATM :srcMaskValues=1 | ||
| ATM -> WAV | ||
| ICE -> WAV | ||
| ATM phase1 | ||
| ATM -> CHM | ||
| CHM | ||
| CHM -> ATM | ||
| ATM phase2 | ||
| ICE | ||
| WAV | ||
| ATM -> MED :remapMethod=redist | ||
| MED med_phases_post_atm | ||
| ICE -> MED :remapMethod=redist | ||
| MED med_phases_post_ice | ||
| MED med_phases_prep_ocn_accum | ||
| @ | ||
| OCN -> MED :remapMethod=redist | ||
| MED med_phases_post_ocn | ||
| MED med_phases_restart_write | ||
| @ | ||
| :: | ||
|
|
||
| # CMEPS variables | ||
|
|
||
| DRIVER_attributes:: | ||
| :: | ||
|
|
||
| MED_attributes:: | ||
| ATM_model = @[atm_model] | ||
| ICE_model = @[ice_model] | ||
| OCN_model = @[ocn_model] | ||
| history_n = 1 | ||
| history_option = nhours | ||
| history_ymd = -999 | ||
| coupling_mode = @[CPLMODE] | ||
| :: | ||
| ALLCOMP_attributes:: | ||
| ScalarFieldCount = 2 | ||
| ScalarFieldIdxGridNX = 1 | ||
| ScalarFieldIdxGridNY = 2 | ||
| ScalarFieldName = cpl_scalars | ||
| start_type = @[RUNTYPE] | ||
| restart_dir = RESTART/ | ||
| case_name = ufs.cpld | ||
| restart_n = @[RESTART_N] | ||
| restart_option = nhours | ||
| restart_ymd = -999 | ||
| dbug_flag = @[cap_dbug_flag] | ||
| use_coldstart = @[use_coldstart] | ||
| use_mommesh = @[use_mommesh] | ||
| eps_imesh = @[eps_imesh] | ||
| stop_n = @[FHMAX] | ||
| stop_option = nhours | ||
| stop_ymd = -999 | ||
| :: |
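
The configuration above is a template: values such as `@[med_model]` and `@[RESTART_N]` are placeholders that get filled in before the run. As a rough illustration of that substitution step, here is a minimal Python sketch. The function name `render_template` and the sample values are hypothetical and are not the project's actual tooling; the sketch only assumes that each `@[name]` is replaced by a value, so that a line like `@@[coupling_interval_slow_sec]` becomes a run-sequence loop header such as `@1800`.

```python
import re

# Matches placeholders of the form @[name]. In "@@[name]" only the second
# "@" belongs to the match, so the leading "@" is left in place.
PLACEHOLDER = re.compile(r"@\[(\w+)\]")


def render_template(text, values):
    """Replace each @[name] with values[name]; raise if a name is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder @[{name}]")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, text)


# A shortened excerpt in the style of the template above.
template = """\
MED_model:                      @[med_model]
runSeq::
@@[coupling_interval_slow_sec]
  MED med_phases_prep_ocn_avg
@
::
"""

# Illustrative values only (assumptions, not taken from the PR).
rendered = render_template(
    template,
    {"med_model": "cmeps", "coupling_interval_slow_sec": 1800},
)
print(rendered)
```

Raising on a missing name, rather than leaving the placeholder untouched, is a design choice in this sketch: an unfilled `@[...]` in a run-time configuration file is easier to diagnose at render time than at model start-up.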
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,6 +26,10 @@ RUN | cpld_restart_c384_p8 | |
| COMPILE | -DAPP=S2S -DDEBUG=ON -DCCPP_SUITES=FV3_GFS_v16_coupled_p8 | - wcoss_cray | fv3 | | ||
| RUN | cpld_debug_p8 | - wcoss_cray | fv3 | | ||
|
|
||
| # Add aerosols (temporary) | ||
| COMPILE | -DAPP=S2SW -DUFS_GOCART=ON -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | + hera.intel | fv3 | | ||
| RUN | cpld_bmark_p7_aero | + hera.intel | fv3 | | ||
|
Collaborator
I suggest changing the test name to cpld_bmark_p7.2.
Collaborator
Author
Since we are moving forward to P8, should this be cpld_bmark_p8_aero instead, or do you want this to strictly follow p7.2?
Collaborator
Unless the entire feature test suite for P8 gets updated in this PR (meaning all resolutions), I consider this a feature test for s2sw+aerosols, outside of the Prototype feature set. So my suggestion is that this test be named cpld_bmark_aero or something similar. |
||
|
|
||
| ################################################################################################################################################################################### | ||
| # PROD tests # | ||
| ################################################################################################################################################################################### | ||
|
|
||
Is this variable available on other platforms too? Or are changes/updates required to set this option on other platforms?
@rmontuoro made this change. I'm unsure whether it is needed elsewhere, but I'll see what I can figure out.
Please remove this line too, as it has been deprecated since Intel 17. The Hera admins suggested that we not use it.
@rmontuoro you okay with this?
Setting `I_MPI_DAPL_UD` to 1 is required when running coupled GOCART with Intel MPI 2018.0.4. The model will crash otherwise.
@rmontuoro @JessicaMeixner-NOAA I saw that 25 tracers are added in the cpld_bmark_aero test (compared to the bmark test) while the number of nodes does not increase, and the test takes 35 minutes to finish. All RT tests are required to finish within 30 minutes, so I ran the test with 20 tasks/node (`TPN_cpl_bmrk=20`) and without setting `I_MPI_DAPL_UD`; the test finished in 24 minutes. Since the ufs-weather-model supported platforms have different MPI versions, I think this might help port the aerosol code to other platforms without the MPI-specific I_MPI setting.
@junwang-noaa - Thank you for the suggestion. I've rerun the regression test using 20 tasks/node, and it finished within 27 minutes of wall-clock time on Hera, so I agree to setting `TPN_cpl_bmrk=20`. My test also did not set `I_MPI_DAPL_UD`.

Note that, as mentioned in previous discussions, setting `I_MPI_DAPL_UD` to 1 enables the connectionless DAPL UD transport, which is crucial when running the coupled model (w/ aerosols) on thousands of cores, since the default transport would not scale sufficiently, causing timeout errors in the GOCART History component. This regression test uses only 560 MPI tasks, and the default DAPL transport may still be adequate. Prototype runs, however, use well above 1k MPI tasks. In such cases, using the connectionless DAPL transport is highly recommended, if not required, to prevent communication failures.

Note also that the Intel MPI library "switched from the Open Fabrics Alliance* (OFA) framework to the Open Fabrics Interfaces* (OFI) framework" with release 2019. Therefore, the DAPL fabric is deprecated starting with the 2019 release (see Intel documentation). The UFS weather model is using Intel MPI 2018 on Hera and Orion.
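
For reference, the node count implied by the figures quoted above (560 MPI tasks at 20 tasks/node) can be worked out with a small Python sketch. The helper name `nodes_needed` is hypothetical, and the calculation assumes tasks are packed evenly onto nodes with no extra nodes reserved for other purposes.

```python
import math


def nodes_needed(total_tasks, tasks_per_node):
    """Smallest number of nodes that can hold total_tasks at the given density."""
    if total_tasks <= 0 or tasks_per_node <= 0:
        raise ValueError("task counts must be positive")
    return math.ceil(total_tasks / tasks_per_node)


# Figures taken from the discussion: 560 MPI tasks, TPN_cpl_bmrk=20.
print(nodes_needed(560, 20))  # 28
```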
I think it's OK to use DAPL, but the variable `I_MPI_DAPL_UD` has been a deprecated option since Intel 17. Because of this, its impact on srun/slurm is unknown. Some of the opn runs use a couple of thousand tasks without this setting. Do you have a test case with a thousand tasks where we can look at the performance without this setting? I am not sure what the problem is with those runs with a thousand tasks.
@junwang-noaa - DAPL and `I_MPI_DAPL_UD` are deprecated since release 2019 of Intel MPI. We are still using Intel MPI 2018 on some platforms.

Setting `I_MPI_DAPL_UD` to 1 (as recommended by NASA) is necessary when running GOCART on a large number of MPI tasks, since the MAPL I/O implementation is based on MPI one-sided communication using Remote Memory Access (RMA) for higher performance. This requires an RDMA-capable communication fabric such as DAPL. Enabling DAPL connectionless transport (`I_MPI_DAPL_UD=1`) addresses scalability issues/failures at higher core counts and reduces the overall memory footprint. This setting is only needed when running the UFS weather model with GOCART.
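
One way to read the conditions discussed in this thread is as a guard around the setting: apply `I_MPI_DAPL_UD=1` only for GOCART runs, only with Intel MPI releases before 2019, and only at large task counts. The Python sketch below encodes that reading. The function `intel_mpi_env`, its parameters, and the 1000-task threshold are all assumptions made for illustration (the thread mentions "well above 1k MPI tasks" but does not fix a cutoff); this is not code from the PR.

```python
def intel_mpi_env(mpi_major_version, with_gocart, total_tasks,
                  large_run_threshold=1000):
    """Return extra environment variables for a run, as a dict of strings.

    large_run_threshold is an illustrative assumption, not a value
    established in the discussion.
    """
    env = {}
    dapl_available = mpi_major_version < 2019  # DAPL deprecated from 2019 on
    large_run = total_tasks > large_run_threshold
    if with_gocart and dapl_available and large_run:
        # Enable the connectionless DAPL UD transport.
        env["I_MPI_DAPL_UD"] = "1"
    return env


# The 560-task regression test ran without the setting:
print(intel_mpi_env(2018, with_gocart=True, total_tasks=560))
# A hypothetical larger prototype-style run with Intel MPI 2018:
print(intel_mpi_env(2018, with_gocart=True, total_tasks=2000))
```

Keeping the decision in one place like this would make it straightforward to drop the variable entirely once a platform moves to an Intel MPI release where DAPL is no longer supported.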