[develop:] Add the RRFS_SaS physics suite to SRW options #1201
Thank you, Michael.
My earlier tests with the cold-start RRFSv1 files (on native
model levels) apparently crept into the code. The file listing the data
sources has to be changed to use the *prslev*.conus.* format for RRFS data.
Will do so ASAP.
…On Tue, Mar 4, 2025 at 10:16 AM Michael Lueken ***@***.***> wrote:
@natalie-perlin <https://github.com/natalie-perlin> -
All three rrfs_sas WE2E tests are failing on Hera:
1. grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas -
failing in run_fcst with the following seg fault [h5c05:1495455:0:1495455]
Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
2. grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_sas -
failing in get_extrn_lbcs/ics due to looking for
/scratch1/NCEPDEV/nems/role.epic/UFS_SRW_data/develop/input_model_data/RRFS/2024060517/rrfs.t17z.natlev.f000.grib2
while only prslev files are available. I suspect that the test would
ultimately fail in run_fcst with a seg fault as well.
3. grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_RRFS_sas_plot -
failing in run_fcst with the same seg fault as
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas above.
The experiment directories on Hera can be found -
/scratch1/NCEPDEV/stmp2/Michael.Lueken/ufs-srweather-app/hera/expt_dirs.
We need to make sure that these tests run correctly before moving forward
with merging this PR and creating the release branch for SRW v3.0.0.
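The second failure above comes down to a file-name mismatch: the workflow asked for a `natlev` file while only `prslev` files were staged. A minimal Python sketch of a pre-flight check for that situation follows. The helper names and the sample file names are hypothetical; only the `natlev` and `prslev` tokens come from the messages above.

```python
from collections import Counter

LEVEL_TYPES = ("natlev", "prslev")

def classify_rrfs_files(filenames):
    """Count files per level type, based on the token in the file name."""
    counts = Counter()
    for name in filenames:
        for level_type in LEVEL_TYPES:
            if f".{level_type}." in name:
                counts[level_type] += 1
    return counts

def missing_level_type(filenames, wanted):
    """Return True when no staged file carries the wanted level-type token."""
    return classify_rrfs_files(filenames)[wanted] == 0

# Hypothetical directory listing, for illustration only.
staged = [
    "rrfs.t17z.prslev.f000.conus.grib2",
    "rrfs.t17z.prslev.f001.conus.grib2",
]

if missing_level_type(staged, "natlev"):
    print("no natlev files staged; get_extrn_ics/lbcs would fail")
```

Running a check like this before submitting the experiment would surface the mismatch without waiting for the data-retrieval task to fail.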
|
mkavulich
left a comment
Putting in an official "request changes" review because these changes seem problematic. Not just the inclusion of a potentially bad partition name, but changing the default PPN value to not use multiple cores on a node.
Resolved and reverted the changes. |
|
All RRFS_sas WE2E tests are still seg faulting in run_fcst. |
Are there any logs I could take a look at? |
|
Sure, my logs are available on Hera - /scratch1/NCEPDEV/stmp2/Michael.Lueken/ufs-srweather-app/hera/expt_dirs |
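For anyone going through those logs, a small Python sketch for locating the seg fault signature quoted earlier in this thread is shown here. The function name and the first line of the sample log are hypothetical; the second sample line is the message reported for run_fcst.

```python
import re

# Signature taken from the run_fcst failure message quoted in this thread.
SEGFAULT_PATTERN = re.compile(r"Caught signal 11 \(Segmentation fault")

def find_segfaults(log_text):
    """Return (line_number, line) pairs for lines that report a seg fault."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if SEGFAULT_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits

sample_log = "\n".join([
    "starting forecast",  # hypothetical log line
    "[h5c05:1495455:0:1495455] Caught signal 11 "
    "(Segmentation fault: Sent by the kernel at address (nil))",
])

hits = find_segfaults(sample_log)
for number, line in hits:
    print(f"line {number}: {line}")
```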
@MichaelLueken @mkavulich - For the test submitted below, I simply replaced the field_table in the working directory with the field_table from the previous identical experiment that was successful, which contained three additional tracers for smoke and dust. Doing so before the forecast started allowed it to finish successfully: The RRFS_sas definition in FV3.input.yaml contains several parameters for smoke/dust options under |
@natalie-perlin Thanks for the update. For the record, these field_table entries should not be necessary when smoke and dust are turned off, but I won't hold up this PR further as I hope that this issue can be resolved properly through this discussion between me and @benkozi before the release.
Thank you for pointing to the related discussion. A few things could indeed be hard-coded, so it would be helpful to consult the developers (?) who put together all the options for RRFS_sas and get their perspective. |
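The field_table workaround discussed above can be checked before a forecast starts. The Python sketch below lists the tracers declared in a field_table and reports which expected ones are absent. The tracer names `smoke`, `dust`, and `coarsepm` are assumptions for illustration (the thread only says three additional smoke and dust tracers), and the sample text is a hypothetical fragment, not the actual file.

```python
import re

# Matches tracer declarations such as: "TRACER", "atmos_mod", "sphum"
TRACER_RE = re.compile(r'"TRACER"\s*,\s*"atmos_mod"\s*,\s*"([^"]+)"')

def tracer_names(field_table_text):
    """Return tracer names in the order they are declared."""
    return TRACER_RE.findall(field_table_text)

def missing_tracers(field_table_text, required):
    """Return the required tracer names that are not declared."""
    present = set(tracer_names(field_table_text))
    return [name for name in required if name not in present]

# Hypothetical field_table fragment, for illustration only.
sample = '''
 "TRACER", "atmos_mod", "sphum"
           "longname",     "specific humidity"
           "units",        "kg/kg" /
 "TRACER", "atmos_mod", "smoke"
           "longname",     "smoke"
           "units",        "ug/kg" /
'''

ASSUMED_SMOKE_DUST = ["smoke", "dust", "coarsepm"]  # assumed names
missing = missing_tracers(sample, ASSUMED_SMOKE_DUST)
print("missing tracers:", missing)
```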
|
The additional tests using RRFS_sas were successful after the recent change to the fields_table.RRFS_sas: and |
|
(closed by accident, reopened immediately after) |
|
The grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas WE2E test failed on Hera GNU in the
The directory for the logs can be found - The
The directory for the logs can be found - When the PR was closed, Jenkins dropped the run, leading to the tests not being run on Gaea-C5. I have connected this PR to the sandbox Jenkins pipeline for running the SRW App, and am rerunning them now so that we can have a successful run on Gaea-C5. |
|
If the failures in I'll let you know once the Gaea-C5 tests complete and then we can decide on a path forward at that time. |
Looked at the failed run and compared it with the successful run done using the Intel compiler. [UPDATING/CORRECTING THE COMMENT after more debugging of the printouts: the large values that looked wrong are not individual values but sums across the domain, so they are naturally very large or very small numbers.] Let me show the comparison between the two cases (in the following comment)... |
|
Log files from the test grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas compiled with the two compilers: Snippets from these log files showing the OK diagnostics before getting to print out the restart files: For INTEL: For GNU: What was different for the two compilers, domain-wide diagnostics (sums of variables across the domain), which have troubles getting from a log for Intel compiler: from a log for GNU compiler: |
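To illustrate why domain-wide sums look alarming at first glance, here is a small Python sketch. It sums an ordinary per-cell value over a grid and compares two such sums by relative difference, which is a fairer measure than absolute difference when comparing output from two compilers. The grid size, the per-cell value, and the function names are hypothetical and are not taken from the logs.

```python
import math

def domain_sum(value_per_cell, nx, ny):
    """Sum a uniform per-cell value over an nx-by-ny grid."""
    return math.fsum([value_per_cell] * (nx * ny))

def relative_difference(a, b):
    """Difference between two sums, scaled by the larger magnitude."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale

# Hypothetical example: 400 x 250 cells, each holding 101325.0 (e.g. Pa).
total = domain_sum(101325.0, nx=400, ny=250)
print(f"domain sum: {total:.5e}")

# A second sum that differs by 1000 in absolute terms is still very close.
other = total + 1000.0
print(f"relative difference: {relative_difference(total, other):.2e}")
```

An everyday per-cell value turns into a sum on the order of 1e10, so a large magnitude by itself does not indicate a bad field.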
|
The Jenkins automated tests have successfully passed on both Hera GNU: and Hera Intel: Moving forward with merging this work now. |
…ty#1201)
* RRFS_sas suite is added to the SRW physics suite options.
* New WE2E tests prepared:
  * grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas
  * grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_RRFS_sas_plot
  * grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_sas
---------
Co-authored-by: Natalie Perlin <Natalie.Perlin@noaa.gov>
Co-authored-by: Brandon Selbig <156852197+selbigmtnwx23@users.noreply.github.com>
Co-authored-by: Michael Lueken <63728921+MichaelLueken@users.noreply.github.com>
Co-authored-by: Gillian Petro <96886803+gspetro-NOAA@users.noreply.github.com>
DESCRIPTION OF CHANGES:
RRFS_sas suite is added to the SRW physics suite options.
This scheme was used in RRFSv1 production runs from approximately August to December 2024.
The fractional approach in RUC_LSM is not yet used (mosaic_lu=0, mosaic_soil=0) and will be turned on when data generation for the fractional vegetation and soil, as part of UFS_UTILS, is integrated into the SRW workflow. All other schemes that use RUC_LSM, such as FV3_HRRR and FV3_HRRR_gf, could also be configured to use fractional vegetation and soil data once the data generation becomes functional.
This code still uses the older (v1) sfc_data.nc format.
Files modified or added (*):
NB: With the currently used hashes of ufs-srweather-app and FV3atm in this draft PR, an additional namelist file suite_RRFS_sas.xml is to be placed under ./sorc/ufs-weather-model/FV3/./ccpp/suites/suite_RRFS_sas.xml
@MichaelLueken brought changes to the UFS_UTILS branch in the NOAA-EPIC repository that allowed preparing input data correctly for the FV3atm model.
Type of change
TESTS CONDUCTED:
Tested on Hera
New WE2E tests prepared:
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_RRFS_sas_plot
grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_sas
Checking out the PR and testing it, example for Hera:
To run other tests, replace the last line by:
or
TESTS on PLATFORMS:
DEPENDENCIES:
DOCUMENTATION:
ISSUE:
This closes issue #1192.
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@MichaelLueken
@ulmononian
Log files:
test_grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_sas.txt
test_grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_RRFS_sas_plot.txt
test_grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_sas.txt