Skip to content

Bug fixes in machine files#692

Merged
gsketefian merged 8 commits into
ufs-community:developfrom
gsketefian:bugfix/cleanup_machine_files
Mar 8, 2022
Merged

Bug fixes in machine files#692
gsketefian merged 8 commits into
ufs-community:developfrom
gsketefian:bugfix/cleanup_machine_files

Conversation

@gsketefian
Copy link
Copy Markdown
Collaborator

@gsketefian gsketefian commented Mar 4, 2022

DESCRIPTION OF CHANGES:

Cleaning up bugs in the machine files. The first bug prompted this PR, and the rest were found subsequently. The bugs (and their fixes) are as follows:

  1. A space is missing after the print_info_msg and print_err_msg_exit function calls in the file_location functions. Inserting a space gets passed this bug, but subsequent issues were found as described below.

For machine files that call the print_info_msg function in file_location (cheyenne.sh, hera.sh, jet.sh, and orion.sh):
Fixing this bug leads to other failures because when the "*" stanza is encountered in the file_location function,
the EXTRN_MDL_SYSBASEDIR_ICS|LBCS variable gets set to the message that file_location returns. Since that message contains spaces, it leads to other failures in downstream scripts (the ex-scripts). Simply removing the printing out of the message (thus causing EXTRN_MDL_SYSBASEDIR_ICS|LBCS to be set to a null string) fixes the failures, so this was the fix implemented. If desired, a message for an empty value for EXTRN_MDL_SYSBASEDIR_ICS|LBCS can be placed in another script (where those variables are used).

For machine files that use print_err_msg_exit in file_location (stampede.sh and wcoss_dell_p3.sh):
These should not exit if the file location is not available since the experiment can still complete successfully. So just removing the print_err_msg_exit call should work (and make the behavior of these machine files consistent with the set above).

  1. In all the machine files, the variable FV3GFS_FILE_FMT_ICS should be changed to FV3GFS_FILE_FMT_LBCS in the definition of EXTRN_MDL_SYSBASEDIR_LBCS. This was fixed in all the files.

  2. In stampede.sh, a variable named SYSBASEDIR_ICS is defined. This is a typo. Modify to EXTRN_MDL_SYSBASEDIR_ICS.

TESTS CONDUCTED:

Ran the WE2E test grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR on:

  • Hera -- successful
  • Jet -- successful except for UPP tasks
  • Cheyenne -- successful except for UPP tasks

The UPP task failures are new and being experienced by other PRs as well (e.g. #689). The original issue with machine files seems resolved.

CONTRIBUTORS (optional):

@JeffBeck-NOAA encountered and reported the original error.

1) Need space after "print_info_msg" and "print_err_msg_exit" and next character.
2) Initialize "location" variable to a null string.
3) Change variable FV3GFS_FILE_FMT_ICS to FV3GFS_FILE_FMT_LBCS at appropriate locations.
4) In stampede.sh, change variable name "SYSBASEDIR_ICS" to "EXTRN_MDL_SYSBASEDIR_ICS".
1) Remove printing out of informational messages from the "file_location" function in the machine files because this causes the EXTRN_MDL_SYSBASEDIR_ICS|LBCS variables to wrongly get set to these messages, and that causes scripts downstream to crash.
2) Remove calls to print_err_msg_exit in two of the machine files because we do not want the calling scripts to exit; they can continue by trying HPSS, for example.
Comment thread ush/machine/cheyenne.sh
@@ -71,6 +68,6 @@ NDAS_OBS_DIR="/glade/p/ral/jntp/UFS_SRW_app/develop/obs_data/ndas/proc"
MET_BIN_EXEC="bin"
Copy link
Copy Markdown
Collaborator

@JeffBeck-NOAA JeffBeck-NOAA Mar 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter that RUN_CMD_SERIAL and MET_BIN_EXEC use double quotes, and the others use single quotes? Are the single quotes used for commands that are "expandable"?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use templated variables (as we do for RUN_CMD_UTILS since they contain references to variables like nprocs that are not yet defined), then use single quotes. For variables whose definitions don't include references to other variables, either single or double quotes work.

Comment thread ush/machine/cheyenne.sh
# Test Data Locations
TEST_PREGEN_BASEDIR=/glade/p/ral/jntp/UFS_CAM/FV3LAM_pregen
TEST_COMINgfs=/glade/scratch/ketefian/NCO_dirs/COMGFS
TEST_EXTRN_MDL_SOURCE_BASEDIR=/glade/p/ral/jntp/UFS_SRW_app/staged_extrn_mdl_files
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These paths are in double quotes, but the "location" path above is in single quotes. Is that OK?

Comment thread ush/machine/singularity.sh Outdated
location=""
case ${external_model} in

"*")
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdwardSnyder-NOAA I'm wondering how you were able to run a successful WE2E test with this print_info_msg call since this causes EXTRN_MDL_SYSBASEDIR_ICS|LBCS to be set to the message itself, and that leads to failures downstream in the exregional_... scripts -- at least on the on-prem machines. I wanted to remove this case statement since it doesn't have any stanzas for which location is set to something (just like orion.sh above). But wanted to ask you first.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine to remove that case statement. We haven't ran the WE2E on the singularity container yet.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks.

…e print statement, and we don't want the print statement.
@gsketefian
Copy link
Copy Markdown
Collaborator Author

@JeffBeck-NOAA @christinaholtNOAA Thanks for the reviews. Merging now.

@gsketefian gsketefian merged commit a97763a into ufs-community:develop Mar 8, 2022
@gsketefian gsketefian mentioned this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants