RRFS_cloud: Add Linux Target #44
…ity#402) This PR adds generic platforms to the regional_workflow, not specific to any one machine, which should allow users to run the ufs-srweather-app on any UNIX-based machine without a workflow manager, as long as NCEPLIBS and the other prerequisites have been properly installed. This can be done using the scripts described in regional_workflow/ush/wrappers/README.md; additional documentation is currently being written. Users can select these options by setting the MACHINE variable in config.sh to either "LINUX" or "MACOS". The LINUX option should allow most users to run the ufs-srweather-app on a generic Linux machine. The MACOS option is for MacOS/Darwin operating systems; it needs to be kept separate because the MacOS version of bash is very old and missing some functionality, and several GNU/Linux utilities have different behavior and/or names on MacOS.

The "generic Linux" test was run on Cheyenne (GNU 9.1.0 compilers) as a fresh install, including a stand-alone install of NCEPLIBS, with no reference to staged or pre-built input files. It was run without Rocoto and without submitting jobs directly via PBS; instead, the entire workflow was run interactively on a compute node (using the `qinteractive` command), emulating a machine with no job scheduler. On MacOS (Catalina, 10.15.7) with GNU 10.1.0 compilers, the workflow was generated and run end-to-end successfully. Currently a bug in UFS_UTILS makes the make_orog test fail; UFS_UTILS PR245 must be merged to fix this. Resolves ufs-community#369
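To make the "MacOS bash is very old" point concrete: the /bin/bash shipped with MacOS is version 3.2, which lacks bash 4 features such as associative arrays (`declare -A`). A script meant to run under both the LINUX and MACOS options could guard bash-4-only code with a version check along these lines. This is a minimal sketch; the helper `require_bash4` and the `machine_os` array are hypothetical and not part of regional_workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from regional_workflow): succeed only if the
# running bash supports bash-4 features such as associative arrays.
require_bash4() {
  # BASH_VERSINFO[0] holds the major version number of the running bash.
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "bash ${BASH_VERSION} is too old; bash >= 4 is required" >&2
    return 1
  fi
  return 0
}

if require_bash4; then
  # Associative arrays (declare -A) exist only in bash >= 4, so this line
  # would fail under the stock MacOS /bin/bash 3.2.
  declare -A machine_os=( [LINUX]="generic Linux" [MACOS]="MacOS/Darwin" )
  echo "MACHINE=MACOS targets ${machine_os[MACOS]}"
fi
```

On a stock MacOS system the same script would print the "too old" message and skip the guarded block, which is one reason bash-4-only constructs have to be avoided (or guarded) in code shared with the MACOS option.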
…manager as Rocoto. (ufs-community#426)

## DESCRIPTION OF CHANGES:
Added sourcing of the bash utilities to avoid a `$SED` undefined-variable error when using the workflow launch script. Added Rocoto as the workflow manager on Gaea.

## TESTS CONDUCTED:
Tested on Gaea. Release-branch end-to-end tests (aside from the 3 km runs) were run on Hera and all passed.

## CONTRIBUTORS (optional):
@climbfuji, @mkavulich, @gsketefian
Just a note: I am running into some issues with the runtime environment, so I will likely push some other considerable changes once the test is complete.
christopherwharrop-noaa left a comment:
I do not think I have enough familiarity with this level of detail to give an informed review. I had a few questions about configuration values for the proposed AWS config, but I do not see anything that concerns me. I have NOT yet tried to actually use this branch to configure/run, so I need to do that before my review can be considered complete.
The vast majority of the changes here seem to deal with sed and readlink compatibility issues. If those were made in isolation first, they would be easier to evaluate, since they appear to be a straightforward change that should not affect existing functionality. A subsequent set of changes dealing only with what is functionally required to add the Linux and MacOS platforms for workflow generation and execution would then be easier to review, without the noise of the sed and readlink fixes that are necessary but do not change workflow generation or execution results.
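For readers unfamiliar with the sed/readlink issue: GNU `sed -i` accepts in-place editing with no suffix argument while BSD sed on MacOS requires one, and the BSD readlink shipped with older MacOS releases lacks `-f`. One common shim is to pick the command names once per OS and call them through variables. The sketch below assumes GNU sed and coreutils are installed on MacOS under the Homebrew names `gsed` and `greadlink`; the `$SED` variable does appear in the workflow (see the undefined-variable fix above), but `READLINK` and the exact selection logic here are assumptions, not the workflow's actual code.

```shell
#!/usr/bin/env bash
# Minimal sketch. Assumption: on MacOS, GNU sed and GNU readlink are installed
# as gsed and greadlink (Homebrew's gnu-sed and coreutils packages).
case "$(uname -s)" in
  Darwin) SED=gsed; READLINK=greadlink ;;
  *)      SED=sed;  READLINK=readlink ;;
esac
export SED READLINK

# Scripts call "$SED"/"$READLINK" instead of sed/readlink, so GNU-style
# in-place editing (sed -i with no suffix argument) and path canonicalization
# (readlink -f) behave the same way on both platforms.
tmp=$(mktemp)
printf 'MACHINE="HERA"\n' > "$tmp"
canonical=$("$READLINK" -f "$tmp")   # absolute path with symlinks resolved
"$SED" -i 's/^MACHINE=.*/MACHINE="LINUX"/' "$tmp"
result=$(cat "$tmp")
echo "$canonical: $result"
rm -f "$tmp"
```

Centralizing the choice in one place is what makes such fixes mechanical: every call site changes from `sed` to `"$SED"`, which matches the reviewer's observation that these edits are numerous but should not alter workflow results.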
PARTITION_DEFAULT="hera"
QUEUE_DEFAULT=batch
PARTITION_HPSS=service
QUEUE_HPSS=batch
PARTITION_FCST=hera
QUEUE_FCST=batch
It's not clear to me how these will work on ParallelCluster, since those queues/partitions won't be defined there.
Perhaps these are just meant to be placeholders? Same for the paths below?
Yes. The goal of this PR is to get the Linux target working on Hera. Isolating these changes should also help @mkavulich and others at DTC with implementing Rocoto support for generic Linux environments. The next step for us will be to make the changes needed to run this on AWS instead of Hera.
@christinaholtNOAA - Two of the PRs you cherry-picked from (#426 and #402) have already been merged to develop. Can you please remind me why those should be cherry-picked rather than merging?
…potential undefined variable issues (ufs-community#433)

## DESCRIPTION OF CHANGES:
It was found that if `set -u` is in the user's default bash environment, the launch script or individual run scripts fail because a variable is referenced before it is defined; this is likely to occur if any of these scripts is submitted from a crontab. The cause was the way the default run command was set up for the MacOS and generic LINUX platforms, a bit of a hack that resulted in RUN_CMD_FCST being defined twice in var_defns.sh. The fix deletes the first instance of RUN_CMD_FCST in var_defns.sh so that it no longer references an undefined variable early on. This potential bug does not affect Tier 1 supported platforms, only MacOS and generic Linux.

## TESTS CONDUCTED:
Tested on the affected MacOS platform, and the fix worked. Also ran end-to-end tests on Hera and Cheyenne (still running) as a sanity check.
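The failure mode and the fix for the duplicated RUN_CMD_FCST can be reproduced with a toy file. This is a minimal sketch: the var_defns.sh contents (including the `PE_MEMBER01` value) are invented for illustration, and the awk one-liner is a stand-in for deleting the first definition, not necessarily the command the PR uses.

```shell
#!/usr/bin/env bash
set -u  # emulate a user environment that has "set -u" enabled

# Toy var_defns.sh with hypothetical contents: the first RUN_CMD_FCST
# references PE_MEMBER01 before it is defined.
var_defns=$(mktemp)
cat > "$var_defns" <<'EOF'
RUN_CMD_FCST="mpirun -np ${PE_MEMBER01}"
PE_MEMBER01=24
RUN_CMD_FCST="mpirun -np ${PE_MEMBER01}"
EOF

# Source in a subshell: under set -u the unbound variable aborts the subshell.
if ( source "$var_defns" ) 2>/dev/null; then before=ok; else before=fails; fi

# Delete only the first RUN_CMD_FCST line. awk is used for portability,
# since GNU sed's 0,/regex/ address form is not available in BSD sed.
awk '/^RUN_CMD_FCST=/ && !done { done = 1; next } { print }' \
  "$var_defns" > "$var_defns.tmp" && mv "$var_defns.tmp" "$var_defns"

n_defs=$(grep -c '^RUN_CMD_FCST=' "$var_defns")
source "$var_defns"   # now succeeds: PE_MEMBER01 is set before it is used
echo "before fix: $before; definitions left: $n_defs; RUN_CMD_FCST=$RUN_CMD_FCST"
rm -f "$var_defns"
```

Because the second definition is the one that expands after `PE_MEMBER01` is set, removing the first leaves the final value of RUN_CMD_FCST unchanged while making the file safe to source under `set -u`.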
@christopherwharrop-noaa Maybe I missed the merge to develop, but I thought they had only been merged to the release branch (release/public-v1), and there is no definite timeline for when the new release-branch features will be included in develop. We shouldn't be limited to pulling in the latest from develop, which now seems to also include support for SPP (although, from what Jeff has mentioned, that modification only covers a limited number of packages).
I pushed my latest commits, including another PR from @mkavulich, cherry-picked from the release branch, that fixes a bug related to running with Rocoto on a generic Linux platform.
@christinaholtNOAA - Sorry! My bad. You are correct, of course. They were merged to the release branch. I'm not used to that development workflow (no pun intended), so I saw "merged" and drew the wrong conclusion without looking carefully. Sorry about that.
I was having trouble getting jobs through the Hera queue last week but had more success today. These changes appear to work: the workflow runs without failure through the forecast and post-processing tasks on Hera.
* add fix_crtm for community mode
* remove run_envir from err_exit for fix
* remove duplicated def of omp_num_threads in run_fcst
* additional clean-up of if-statement in set_rrfs_config_general
* add missing tpp_run_fcst to set_rrfs_config_general
DESCRIPTION OF CHANGES:
I have cherry-picked merges from the community release branch into our develop-based working branch. Those include:
I manually implemented the same changes that are present in NOAA-EMC/regional_workflow PR #349.
I also added two config files: one for a standard Hera configuration, and an otherwise identical configuration that runs on Hera but is set up as a generic Linux target.
TESTS CONDUCTED:
I have generated workflows that are identical for the Hera and Linux targets. I am currently running the experiments to ensure that the runtime environments work as expected.
CONTRIBUTORS (optional):
@mkavulich (please review!)