Update for latest version of weather model (hash 2f1c8e1), add RRFS_NA_3km pre-defined domain, update timestep and MPI settings #492
Merged: mkavulich merged 6 commits into ufs-community:develop from mkavulich:update_GFS_v16_suite on May 25, 2021
1337d9b Rename FV3_GFS_v16beta to FV3_GFS_v16 (mkavulich)
597c40a Add Jeffs changes for new NA domain, plus an RRFS v1alpha suite test (mkavulich)
008bf3d Make grid_RRFS_NA_3km test use RRFS_v1alpha suite, add to test list, … (mkavulich)
e3ef798 Forgot to add new test to baselines_list.txt (mkavulich)
63783c9 Remove old runtime notes (mkavulich)
c3dfbb9 Fix incorrect settings; k_split and n_split should be switched (mkavulich)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| RUN_ENVIR="community" | ||
| PREEXISTING_DIR_METHOD="rename" | ||
| | ||
| PREDEF_GRID_NAME="RRFS_NA_3km" | ||
| QUILTING="TRUE" | ||
| | ||
| CCPP_PHYS_SUITE="FV3_RRFS_v1alpha" | ||
| | ||
| FCST_LEN_HRS="06" | ||
| LBC_SPEC_INTVL_HRS="6" | ||
| | ||
| DATE_FIRST_CYCL="20190701" | ||
| DATE_LAST_CYCL="20190701" | ||
| CYCL_HRS=( "00" ) | ||
| | ||
| EXTRN_MDL_NAME_ICS="FV3GFS" | ||
| EXTRN_MDL_NAME_LBCS="FV3GFS" | ||
| USE_USER_STAGED_EXTRN_FILES="TRUE" | ||
| | ||
| ######################################################################### | ||
| # The following code/namelist/workflow setting changes are necessary to # | ||
| # run/optimize end-to-end experiments using the 3-km NA grid # | ||
| ######################################################################### | ||
| | ||
| # The model should be built in 32-bit mode (64-bit will result in much | ||
| # longer run times. | ||
| | ||
| # Use k_split=2 and n_split=5, the previous namelist values (k_split=4 | ||
| # and n_split=5) will result in significantly longer run times. | ||
| | ||
| NNODES_MAKE_ICS="12" | ||
| NNODES_MAKE_LBCS="12" | ||
| PPN_MAKE_ICS="4" | ||
| PPN_MAKE_LBCS="4" | ||
| WTIME_MAKE_LBCS="01:00:00" | ||
| | ||
| PPN_RUN_FCST="24" | ||
| | ||
| NNODES_RUN_POST="6" | ||
| PPN_RUN_POST="12" | ||
| | ||
| OMP_STACKSIZE_RUN_FCST="2048m" | ||
| | ||
| ############################################################################### | ||
File renamed without changes.
Are these settings specific to Hera? If we need to change these according to platform, we can put that kind of code in run_experiments.sh. That's where I've been doing platform-specific differentiation of settings.
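As a rough illustration of the kind of platform-specific differentiation described in this comment, a `case` on the machine name is one common shell pattern. This is a sketch only: the function name `set_platform_ppn` and the per-platform numbers are hypothetical placeholders and are not taken from run_experiments.sh. Only the value 24 mirrors the `PPN_RUN_FCST` setting in the test file.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_experiments.sh): pick PPN_RUN_FCST
# from a machine name. All per-platform numbers are placeholders.
set_platform_ppn() {
  # Normalize to lower case so "HERA" and "hera" match the same branch.
  _machine="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "${_machine}" in
    hera)     PPN_RUN_FCST="24" ;;   # value used in the WE2E test file
    cheyenne) PPN_RUN_FCST="18" ;;   # placeholder, not a tuned value
    *)        PPN_RUN_FCST="24" ;;   # fall back to the test's setting
  esac
}

set_platform_ppn "HERA"
echo "PPN_RUN_FCST=${PPN_RUN_FCST}"
```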
@JeffBeck-NOAA can better answer these questions since these changes came from him.
I think the problem is specific to the domain, which is very large and high-resolution, requiring more nodes for make_ics and make_lbcs for memory reasons. I assume that would mean this is required on all platforms.
I do think that these settings are less than ideal. Making PPN_MAKE_ICS and PPN_MAKE_LBCS lower means we will be under-utilizing the nodes.
We do need to have a separate conversation about making PPN_RUN_FCST (and maybe every PPN setting) platform-specific in the defaults. Right now we are under-utilizing nodes on most platforms using default settings. Maybe this can be rolled into issue #452, since the OMP settings will have to be taken into account as well?
These settings are required for all platforms due to the domain size and resolution. The PPN_MAKE_ICS and PPN_MAKE_LBCS values are a specific chgres_cube requirement for large domains (need fewer processes but massive amounts of memory) due to an ESMF limitation/memory bug.
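The arithmetic behind this trade-off can be sketched as follows. It assumes the total MPI task count is nodes times processes per node, and it uses a made-up node memory of 96 GB purely for illustration. `NODE_MEM_GB`, `total_tasks`, and `mem_per_task_gb` are hypothetical names, not workflow variables.

```shell
#!/bin/sh
# Total MPI tasks, assuming tasks = nodes * processes per node.
total_tasks() {
  echo $(( $1 * $2 ))
}

# Approximate memory available to each task on one node (integer GB).
mem_per_task_gb() {
  echo $(( $1 / $2 ))
}

NNODES_MAKE_ICS="12"   # from the WE2E test file
PPN_MAKE_ICS="4"       # from the WE2E test file
NODE_MEM_GB="96"       # assumed value, for illustration only

echo "make_ics tasks: $(total_tasks "${NNODES_MAKE_ICS}" "${PPN_MAKE_ICS}")"
echo "GB per task at PPN=4:  $(mem_per_task_gb "${NODE_MEM_GB}" 4)"
echo "GB per task at PPN=24: $(mem_per_task_gb "${NODE_MEM_GB}" 24)"
```

Under these assumptions, dropping from 24 to 4 processes per node gives each process six times as much memory, at the cost of leaving cores idle.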
@JeffBeck-NOAA Thanks for the clarification. In that case should these be handled somewhere in the generate workflow calling tree in order to have these settings as a default for this domain specifically? Maybe that's something best handled in an issue and follow-up PR in the future?
@mkavulich, yeah, this is where things can get really thorny. Do we want to fill the generate script with nested if/case statements for different domains, resolutions, and platforms to change PPN for individual tasks? I wasn't sure if we wanted to go that route or just have users source the specific WE2E config.sh file when they want to run this domain. I think this definitely deserves an issue and potential follow-up PR.
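One alternative to deeply nested conditionals is a single `case` on the grid name that fills in values only when the user has not already set them. The sketch below is hypothetical: `set_grid_resource_defaults` is an invented function, not something that exists in the generate script, and the values are simply the ones from the WE2E test file.

```shell
#!/bin/sh
# Hypothetical: apply per-grid resource defaults without overriding
# anything the user already set in config.sh.
set_grid_resource_defaults() {
  case "${PREDEF_GRID_NAME:-}" in
    RRFS_NA_3km)
      # ${VAR:-default} keeps an existing non-empty value.
      NNODES_MAKE_ICS="${NNODES_MAKE_ICS:-12}"
      PPN_MAKE_ICS="${PPN_MAKE_ICS:-4}"
      NNODES_MAKE_LBCS="${NNODES_MAKE_LBCS:-12}"
      PPN_MAKE_LBCS="${PPN_MAKE_LBCS:-4}"
      ;;
    *)
      # Other grids: leave whatever defaults are already in place.
      ;;
  esac
}

PREDEF_GRID_NAME="RRFS_NA_3km"
set_grid_resource_defaults
echo "make_ics: ${NNODES_MAKE_ICS} nodes x ${PPN_MAKE_ICS} per node"
```

A user who sets `PPN_MAKE_ICS` before this function runs keeps their own value, which avoids a separate branch for every platform and domain combination.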
There's likely some variability between platforms, but as a first approximation, chgres_cube needs more nodes and fewer cores for memory management for this domain across all platforms. Since there are no NNODES_* or PPN_* settings currently defined in the set_predef_grid_params.sh script, I left the changes in the WE2E test. The test is designed for Hera, but could be run on any platform.
@gsketefian I was able to run this test on Cheyenne as well (but the forecast step did not complete due to wallclock time, so I can't be sure if it would have fully run successfully).
I agree that k_split and n_split should likely be handled by domain; however, they are currently also set according to the physics scheme, which complicates things. I believe they should also be configurable via config.sh for maximum flexibility. But again, that complicates things.
I can take this change out of the current PR. More discussion definitely needs to happen; the question is whether we keep this change for now or not.
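To make the "configurable via config.sh" idea concrete, here is a minimal sketch in which suite- or domain-level defaults are passed in and an optional user setting wins. `K_SPLIT`, `N_SPLIT`, and `render_fv_core_splits` are hypothetical names that do not exist in the workflow, and the numbers are just the ones quoted in the test file's comment. The sketch assumes the two parameters live in the `fv_core_nml` namelist group.

```shell
#!/bin/sh
# Hypothetical: print a namelist fragment for k_split/n_split.
# $1 and $2 are the defaults chosen elsewhere (e.g. by suite or domain);
# K_SPLIT / N_SPLIT, if set by the user, take precedence.
render_fv_core_splits() {
  _k="${K_SPLIT:-$1}"
  _n="${N_SPLIT:-$2}"
  printf '&fv_core_nml\n  k_split = %s\n  n_split = %s\n/\n' "${_k}" "${_n}"
}

# No user override: the defaults passed as arguments are used.
render_fv_core_splits 2 5

# User override of k_split only, scoped to a subshell.
( K_SPLIT="4"; render_fv_core_splits 2 5 )
```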
@mkavulich, if the NA 3-km WE2E test runs with the old k/n_split settings, then let's stick with them and deal with domain-, physics-, and platform-dependent settings in a later PR.
Sounds good to me.
On AWS for the RRFS work, we run chgres on a single node that has 768 GB of memory. I wonder if there is a way to specify large-memory nodes if available, but I will leave that to you.