[develop] Changes for Derecho, a new platform #894
RatkoVasic-NOAA
left a comment
Looks good to me.
MichaelLueken
left a comment
@natalie-perlin - Thanks for opening this PR to allow the SRW App to build and run on Derecho!
Since Cheyenne will be decommissioned at the end of the year and given that the NRAL0032 account is out of resources on Cheyenne, should we keep Cheyenne in the various files still, or would it be best to fully transition to Derecho?
If we fully cut support for Cheyenne and fully transition to Derecho, then the modification made in ush/get_crontab_contents.py can be changed so that line 61 would read:
if MACHINE == "DERECHO":
which should allow the Python unittests to pass (currently, the Python unittests are failing in test_get_crontab_contents because the crontab_cmd is being set as usr/bin/crontab rather than crontab).
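The branch discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual contents of ush/get_crontab_contents.py: the helper name `select_crontab_cmd` is hypothetical, and the absolute path (written here with a leading slash) is assumed to apply only to Derecho, with every other platform falling back to the plain `crontab` command that the unit test expects.

```python
def select_crontab_cmd(machine: str) -> str:
    """Return the crontab executable to call for a given platform name.

    Hypothetical helper for illustration only; the real script may
    structure this decision differently.
    """
    # Assumption: only Derecho needs an absolute path. All other
    # platforms use whichever `crontab` is found first on PATH.
    if machine.upper() == "DERECHO":
        return "/usr/bin/crontab"
    return "crontab"


if __name__ == "__main__":
    for name in ("DERECHO", "HERA"):
        print(name, "->", select_crontab_cmd(name))
```

With a single-platform condition like this, a test that passes any machine name other than Derecho gets `crontab` back, which is the behaviour the failing unit test is described as expecting.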
Has an EPIC Platform ticket been created to create a new Derecho pipeline so that we can add Derecho to the .cicd/Jenkinsfile to run the automated tests on the new platform? If not, please let me know and I can open a ticket for this work.
There was a problem hiding this comment.
In addition to my other suggested change, we should remove all the nodesize: lines from all the files in parm/wflow/. It turns out this <nodesize> tag in the Rocoto XML actually does nothing without a corresponding <cores> tag, which we do not have. And the newer Rocoto build on Derecho gives a bunch of deprecation warnings for this tag each time you run rocotorun, so we should just get rid of it.
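One way to apply the nodesize cleanup in bulk is sketched below. It is an illustrative script, not part of the PR: the function names are hypothetical, it assumes the workflow files use the .yaml extension and that each `nodesize:` entry sits on its own line, and it edits text directly rather than parsing YAML so that comments and formatting in the files are left untouched.

```python
from pathlib import Path


def strip_nodesize(text: str) -> str:
    """Drop every line whose first non-blank token is the `nodesize:` key."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith("nodesize:")
    ]
    return "".join(kept)


def clean_wflow_dir(wflow_dir: str) -> list:
    """Rewrite the YAML files under wflow_dir in place.

    Returns the names of the files that actually changed.
    """
    changed = []
    for path in sorted(Path(wflow_dir).glob("*.yaml")):
        original = path.read_text()
        cleaned = strip_nodesize(original)
        if cleaned != original:
            path.write_text(cleaned)
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Assumed to be run from the top of the repository.
    print(clean_wflow_dir("parm/wflow"))
```

Running it from the repository root prints the files that were modified, which makes the resulting diff easy to review before committing.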
Negative news aside, I did confirm I was able to run tests successfully on Derecho! So hopefully once these changes are addressed and the latest development merged in this will be good to go.
Thank you, @mkavulich! Co-authored-by: Michael Kavulich <kavulich@ucar.edu>
@mkavulich - addressed your comments on the yaml files in the wflow/ directory.
mkavulich
left a comment
Sorry about those late comments, thanks for addressing them!
Merged changes from develop and tested without an additional cmake options file for the UFS WM. After fixing the default EXTRN_MDL_DATA_STORES: aws in ./ush/machine/derecho.yaml, all the fundamental tests passed. (before correcting derecho.yaml): after correcting derecho.yaml:
Running comprehensive tests now on Derecho.
@MichaelLueken - are there any additional tests needed for Derecho? As for CI/CD, we may not have the account yet.
Comprehensive tests:
@natalie-perlin - With the decommissioning of Cheyenne, using the Are there plans to add GNU to Derecho at a later time? If there are plans, then we can bring in the I'm wrapping up my testing of the Jenkins build and run scripts to ensure that the SRW will build and run using these on Derecho. Additionally, this will also test the coverage suite for the machine. Once they pass, I will give my approval and test the rest of the systems using Jenkins.
MichaelLueken
left a comment
@natalie-perlin - The SRW App builds successfully on Derecho using the Jenkins .cicd/scripts/srw_build.sh script. Additionally, the coverage.derecho tests were run using .cicd/scripts/srw_test.sh and all tests passed:
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status   | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km                                     COMPLETE       21.29
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE       35.41
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE       42.15
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR           COMPLETE       26.62
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE       16.56
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR                COMPLETE       38.69
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE       23.09
pregen_grid_orog_sfc_climo                                         COMPLETE       12.96
specify_template_filenames                                         COMPLETE       14.32
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE      231.09
Approving this PR now and running the Jenkins tests for the rest of the platforms (since there is no Jenkins runner for Derecho at this time).
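The total in the summary table above can be cross-checked with a short script. This is only a sketch: the rows are copied from the table in this comment, the function name `total_core_hours` is hypothetical, and the parser assumes each data row ends with a status word followed by a core-hours figure.

```python
from decimal import Decimal

# Rows copied from the WE2E summary table in this comment.
SUMMARY_ROWS = """\
custom_ESGgrid_IndianOcean_6km COMPLETE 21.29
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 35.41
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 42.15
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 26.62
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 16.56
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 38.69
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 23.09
pregen_grid_orog_sfc_climo COMPLETE 12.96
specify_template_filenames COMPLETE 14.32
"""


def total_core_hours(rows: str) -> Decimal:
    """Sum the last column of every 'name status hours' row."""
    total = Decimal("0")
    for line in rows.splitlines():
        # Split from the right so the row parses whether the columns are
        # separated by one space or by aligned padding.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _name, _status, hours = parts
        total += Decimal(hours)
    return total


if __name__ == "__main__":
    print(total_core_hours(SUMMARY_ROWS))
```

Decimal arithmetic is used instead of floats so the sum can be compared exactly against the two-decimal total reported in the table.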
The Jenkins Hera Intel WE2E coverage tests failed for It failed with a strange NetCDF failure:
A rerun of the test was successful: The Orion and Gaea Jenkins tests have passed. Awaiting completion of the Hera GNU and Jet tests now.
Both the Hera GNU and Jet WE2E coverage tests passed on Jenkins. Now moving forward with merging this work.
Modulefiles and other configuration files to adapt the SRW App to the Derecho system.
Software stacks used for testing are based on hdf5/1.14.0 and netcdf/4.9.2, similar to those used in #889.
DESCRIPTION OF CHANGES:
Adding the Derecho system at UCAR/NCAR as a Tier-1 machine.
Type of change
TESTS CONDUCTED:
All fundamental tests pass.
DEPENDENCIES:
This PR will resolve the issue 884:
#884
This PR depends on #889 - MERGED
DOCUMENTATION:
ISSUE:
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@mark-a-potts
Fundamental tests are successful.
#894 (comment)
WE2E_summary_20230823001411.txt
WE2E_summary_20230823013603.txt