Changes towards making SRW app NCO compliant #348
|
@danielabdi-noaa, thank you so much for this work!!! In this PR, you newly define some directories such as HOMErrfs, PARMrrfs, and USHrrfs. This kind of naming only works for RRFS. For example, Online-CMAQ (AQMv7) is an extension of the UFS SRW App. For this implementation, we should name them as HOMEaqm, PARMaqm, and USHaqm. The model name is defined with the "NET" variable in the app. For example, NET=rrfs for RRFS, and NET=aqm for Online-CMAQ. Could you change the way to define these directory names like "HOME{NET}", "PARM{NET}", and "USH{NET}"? Is this possible? |
|
@chan-hoo Thanks! We can probably re-name HOMErrfs, USHrrfs etc to be HOME{NET}, USH{NET} in the ex-scripts and j-jobs, since all we have to do is modify what gets written to the |
|
@danielabdi-noaa, I agree with you! This requirement is only for the "J-job" (and its ex-script). |
|
@danielabdi-noaa, what about naming it with "model" such as "HOMEmodel", "USHmodel", and "PARMmodel" in the python script? |
|
There probably isn't a great solution for this issue (with the SRW App intended for multiple operational systems), but NCO will really want a $HOMErrfs for RRFS and a $HOMEaqm for AQM. Maybe a generic placeholder like $HOMEmodel could be modified ahead of final code delivery to what it needs to be for the specific application. |
|
@danielabdi-noaa @MatthewPyle-NOAA, If so, the "rrfs" suffix would be fine for the python script. I'll modify it in my forked repo for Online-CMAQ when AQMv7 is delivered to NCO. |
|
@danielabdi-noaa I noticed a small issue with the build for WCOSS2. The version defining variable for wrf_io is inconsistently named between versions/build.ver.wcoss2 (export wrf_io_ver=1.2.0) and modulefiles/build_wcoss2_intel ( module load wrf_io/$::env(wrfio_ver) ) Could you modify build_wcoss2_intel to pull in wrf_io_ver? It built for me on WCOSS2 after making this change. |
|
@MatthewPyle-NOAA Thanks, just fixed it in the repo. |
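A mismatch like `wrf_io_ver` versus `wrfio_ver` can be caught before building with a small check. The following is only a minimal sketch, not part of the PR: the function name `undefined_version_vars` is hypothetical, and it assumes the version file uses `export NAME=value` lines and the modulefile reads variables as `$::env(NAME)`, as in the two snippets quoted above.

```python
import re


def undefined_version_vars(ver_text, modulefile_text):
    """Return env var names the modulefile reads that the version file never exports."""
    # Names defined by lines such as: export wrf_io_ver=1.2.0
    exported = set(re.findall(r"^\s*export\s+(\w+)=", ver_text, flags=re.MULTILINE))
    # Names read by Tcl expressions such as: $::env(wrf_io_ver)
    referenced = set(re.findall(r"\$::env\((\w+)\)", modulefile_text))
    return sorted(referenced - exported)


# Inline stand-ins for versions/build.ver.wcoss2 and modulefiles/build_wcoss2_intel
ver_text = "export wrf_io_ver=1.2.0\n"
before_fix = "module load wrf_io/$::env(wrfio_ver)\n"
after_fix = "module load wrf_io/$::env(wrf_io_ver)\n"

print(undefined_version_vars(ver_text, before_fix))  # ['wrfio_ver']
print(undefined_version_vars(ver_text, after_fix))   # []
```

In practice the two strings would be read from the actual files; they are inlined here to keep the sketch self-contained.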
|
@chan-hoo After starting to work on the change, I am realizing it may not be the best idea.
+ f"HOME{NET}": HOMErrfs,
+ f"USH{NET}": USHrrfs,
+ f"SCRIPTS{NET}": SCRIPTSrrfs,
+ f"JOBS{NET}": JOBSrrfs,
+ f"SORC{NET}": SORCrrfs,
+ f"PARM{NET}": PARMrrfs,
+ f"MODULES{NET}": MODULESrrfs,
+ f"EXEC{NET}": EXECrrfs,
+ f"FIX{NET}": FIXrrfs,
In the ex-scripts/j-job scripts we would have to parametrize the variable name itself. If we had an m4 macro processor run over the scripts, maybe it would have been easier, but for now replacing HOMErrfs etc. with the appropriate name before delivery sounds better. |
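To illustrate why the Python side is the easy half of this change, here is a minimal sketch of generating the `{NET}`-suffixed names. It is not the actual `setup.py` code: the function `nco_dir_vars`, the `SUBDIRS` mapping, and the subdirectory names are assumptions made for illustration.

```python
# Hypothetical mapping from variable prefix to subdirectory under the home directory.
# The subdirectory names are assumed for illustration only.
SUBDIRS = {
    "USH": "ush",
    "SCRIPTS": "scripts",
    "JOBS": "jobs",
    "SORC": "sorc",
    "PARM": "parm",
    "MODULES": "modulefiles",
    "EXEC": "exec",
    "FIX": "fix",
}


def nco_dir_vars(net, home):
    """Build the NET-suffixed directory variables, e.g. HOMEaqm, USHaqm, PARMaqm."""
    dir_vars = {f"HOME{net}": home}
    for prefix, subdir in SUBDIRS.items():
        dir_vars[f"{prefix}{net}"] = f"{home}/{subdir}"
    return dir_vars


rrfs_vars = nco_dir_vars("rrfs", "/path/to/srw")
aqm_vars = nco_dir_vars("aqm", "/path/to/srw")
print(aqm_vars["PARMaqm"])  # /path/to/srw/parm
```

The names are computed only at run time here, whereas the ex-scripts and J-jobs spell out `$HOMErrfs` literally. Consuming a computed name in bash would need something like indirect expansion (`${!name}`) at every use, which is the parametrization cost described above.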
|
@danielabdi-noaa, I got it. I'll replace them manually for AQM. Thank you! |
|
@danielabdi-noaa One more thing jumped out at me when doing some testing. Would it be possible to make DATAROOT a distinctly specifiable variable, rather than having it specified as OPSROOT/tmp/? While production does run that way, development work on WCOSS generally has DATAROOT and COMROOT on different temporary disks that have different retention periods. |
|
@MatthewPyle-NOAA It should be possible but I've had doubts on how NCO variables are specified. Currently the variables are in the
# [nco]
envir='para'
NET='rrfs'
RUN='nco_ensemble'
model_ver='we2e'
OPSROOT='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT'
COMIN_BASEDIR='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT/com/rrfs/we2e'
COMOUT_BASEDIR='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT/com/rrfs/we2e'
COMROOT='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT/com'
PACKAGEROOT='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT/packages'
DATAROOT='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT/tmp'
DCOMROOT='/scratch1/BMC/zrtrr/Daniel.Abdi/rrfs/OPSROOT/dcom'
DBNROOT=''
SENDECF='FALSE'
SENDDBN='FALSE'
SENDDBN_NTC='FALSE'
SENDCOM='FALSE'
SENDWEB='FALSE'
KEEPDATA='TRUE'
MAILTO=''
MAILCC=''
If some of these variables are specified by an export from the environment, currently it won't take effect because both j-jobs and ex-scripts source the |
|
Machine: hera |
|
@danielabdi-noaa Sorry about that - I saw something in setup.py, not thinking that I could fix it in var_defns.sh. The highest level items like OPSROOT and DATAROOT (items listed as "job card" in Table 1 of the implementation standards) are specified by the ecflow jobs in production. For rocoto-driven jobs, they would ideally be defined somewhere at the rocoto or rocoto-driving script level. At the J-job level and below, the expectation is that they would be available as environment variables. |
|
@MatthewPyle-NOAA I forgot to mention that you can actually export OPSROOT, DATAROOT etc before workflow generation and it will take those values, instead of the defaults in |
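The override behaviour described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow's actual implementation: `resolve_nco_roots` is a hypothetical name, and the fallbacks under OPSROOT (`com`, `packages`, `tmp`, `dcom`) are taken from the `[nco]` listing quoted earlier in the thread.

```python
import os


def resolve_nco_roots(default_opsroot, env=None):
    """Prefer values exported in the environment; otherwise derive them from OPSROOT."""
    env = os.environ if env is None else env
    opsroot = env.get("OPSROOT", default_opsroot)
    return {
        "OPSROOT": opsroot,
        "COMROOT": env.get("COMROOT", f"{opsroot}/com"),
        "PACKAGEROOT": env.get("PACKAGEROOT", f"{opsroot}/packages"),
        # DATAROOT can live on a different disk than COMROOT when exported separately
        "DATAROOT": env.get("DATAROOT", f"{opsroot}/tmp"),
        "DCOMROOT": env.get("DCOMROOT", f"{opsroot}/dcom"),
    }


# Simulate a developer who exported only DATAROOT before generating the workflow
roots = resolve_nco_roots("/ops", env={"DATAROOT": "/scratch/short_term/tmp"})
print(roots["DATAROOT"])  # /scratch/short_term/tmp
print(roots["COMROOT"])   # /ops/com
```

Resolving each root independently is what lets DATAROOT and COMROOT sit on disks with different retention periods, as requested above.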
|
Machine: jet |
gets reported as finished.
|
I've tested the fundamental tests on Gaea through Jenkins since last time. I am fairly confident it works as intended so going to merge now. |
|
@danielabdi-noaa I missed that you'd merged already. Thank you for being so quick and thorough on your responses to reviews! Great job! |
DESCRIPTION OF CHANGES:
Changes to make the SRW App NCO compliant, following the standard and review.
Here are some of the changes in this PR:
community or nco mode
set -eux
config.community.yaml and config.nco.yaml
TESTS CONDUCTED:
Run fundamental tests (9 test cases) on Hera, Jet and Orion manually using both community and nco mode.
Run fundamental tests on Gaea through Jenkins.
Run the comprehensive test (82 test cases) on Hera using NCO mode. All of the test cases are successful.
Some unexpected failures.
After a fix for conflicting ics/lbcs temporary working directories in NCO mode, all test cases run to completion in one attempt. I suspect that I was very lucky to get this result since the develop branch can sometimes fail all of them.
get_from_HPSS* tasks that fail when run in tandem with other tests, but not if run individually. Cause: Probably HPSS access issue with multiple processes?
Ensemble runs going into second day forecast not being run. Cause: make_grid_complete.txt is stored in the first day's directory. Not relevant for operations since make_grid is not run, but necessary to run every test case in NCO mode. Find a better way to communicate grid generation completion than using a dummy file? Solved by moving grid/orog completion files to a shared space for all cycle dates in EXPTDIR/grid(orog)(sfc_climo).
community_ensemble_2mems_stoch fails. Cause: This test case generates a new input.nml per ensemble member. Solved by outputting input.nml to whatever is the current working directory.
The test directory and rocoto stats files:
The NCO operations directory:
The subdirectories com, output and tmp are the ones with content.
Run the comprehensive test (82 test cases) on Hera using COMMUNITY mode. All of the test cases are successful.
The test directory and rocoto stats files:
DEPENDENCIES:
None
DOCUMENTATION:
None but would need documentation updates
ISSUE (optional):
#316
CONTRIBUTORS (optional):
@MatthewPyle-NOAA @christinaholtNOAA