Skip to content

Add GSI and rrfs_utl optional RRFS components#269

Merged
christopherwharrop-noaa merged 13 commits into
ufs-community:developfrom
christopherwharrop-noaa:feature/add-gsi-rrfs_utl
Jun 21, 2022
Merged

Add GSI and rrfs_utl optional RRFS components#269
christopherwharrop-noaa merged 13 commits into
ufs-community:developfrom
christopherwharrop-noaa:feature/add-gsi-rrfs_utl

Conversation

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator

@christopherwharrop-noaa christopherwharrop-noaa commented May 23, 2022

DESCRIPTION OF CHANGES:

This PR closes #239. The GSI and rrfs_utl authoritative repositories are added as optional components that can be built by adding ENABLE_RRFS=on to the cmake build command. Alternatively, they can be compiled using the --rrfs option with the devbuild.sh convenience script.

TESTS CONDUCTED:

Manual build tests have been conducted on:

  • Hera
  • Jet
  • Orion
  • Cheyenne Intel
  • Cheyenne Gnu
  • WCOSS 2 (NOTE: I need a volunteer who has access to test on WCOSS)
  • Parallel Works

Automated WE2E and build tests passed on Jet and Hera.

DEPENDENCIES:

None

DOCUMENTATION:

Usage documentation in devbuild.sh has been added. The rst files for the User's Guide have been updated to include use of RRFS build option as well as a table that lists descriptions of the RRFS executables.

ISSUE:

Closes #239

CONTRIBUTORS:

Rahul Mahajan's refactor of the GSI cmake build was instrumental for enabling this work.

The GSI and rrfs_utl are added as optional components
that can be built by adding "ENABLE_RRFS=on" to the cmake
build command.  Alternatively, the "--rrfs" option may be
used with the devbuild.sh convenience script.
@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

christopherwharrop-noaa commented May 24, 2022

@JeffBeck-NOAA, @jwolff-ncar, @gsketefian, @christinaholtNOAA, @gspetro-NOAA: This PR is in draft status, but I'd like to request feedback about where/how to add documentation for the proposed option that turns on building of GSI and rrfs_utl. I have added the necessary verbiage to the usage() for the devbuild.sh convenience script, but that is all so far. If you have guidance about where the additional information should be added, I'd like to get that pushed before I upgrade this to full PR status. There may be some debate about the approach taken here, so any documentation would, of course, be subject to change as needed during the formal review process. Thank you.

@christinaholtNOAA
Copy link
Copy Markdown
Collaborator

@christopherwharrop-noaa I think this implementation looks very clean and straightforward. Thanks for the contribution.

I'd definitely like to hear from others you listed above.

@chan-hoo
Copy link
Copy Markdown
Collaborator

chan-hoo commented May 24, 2022

@christopherwharrop-noaa, I think we don't have an option excluding GSI and rrfs_utl from External.cfg in case that we don't want to check them out. For RRFS-CMAQ, I introduced a build script on a higher level as shown in https://github.com/chan-hoo/ufs-srweather-app/blob/rrfs-cmaq_uwm/build_srw_app.sh . This script is devbuild.sh (w/o unnecessary options for end users) + test/build.sh (for checking the compiled executables). As shown in Lines 164-179, I put additional external components for AQM in a separate Externals_AQM.cfg file and it only runs when --app=AQM. (this script was removed and merged into devbuild.sh: https://github.com/chan-hoo/ufs-srweather-app/blob/rrfs-cmaq_uwm/devbuild.sh : See Lines 195-210). What do you think about this kind of approach?

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

christopherwharrop-noaa commented May 24, 2022

@chan-hoo - I agree it would be better to not check out components that are not going to be built. Because of the way manage_externals works the only way to accomplish that is by using multiple Externals.cfg files using some mechanism such as the one you've come up with. While I don't really like that GSI and rrfs_utl are checked out even when they aren't used, I am not a fan of adding a lot of additional complexity and another few hundred lines of bash to work around it. My opinion is that the tradeoff in complexity just isn't worth it. The GSI is being checked out here without its git submodules, so goes very quickly. So does rrfs_utl. There is also the possibility that a user will change their mind about what they want to do, so having them checked out allows for them to be built later. I do very much relate to your comment, though. I wish there was a better way than over-engineering around the problem by building yet more "convenience" scripts that make the system more complex and difficult to understand and maintain. This situation is one reason I am not a fan of using repository externals.

@chan-hoo
Copy link
Copy Markdown
Collaborator

chan-hoo commented May 25, 2022

@christopherwharrop-noaa, I am sorry if my comment was not clear enough. My point was not the build script but the '-e' flag option in manage_externals. I agree with you. We don't have to develop a new script only for checking out two external components. However, we should have an option to avoid checking out the extra components if users don't need them.

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

@chan-hoo - Thank you for the clarification. I guess I would be ok with having another Externals.cfg file for the GSI and rrfs_utl repositories and including documentation that explains how to check them out with an additional call to manage_externals that points to that file. Honestly, I don't think there is a good solution to the checkout problem (this is caused by the use of externals in the first place, so is not something that is fixable without a complete rethinking of the entire UFS repository structures/philosophy). Either we accept checking out components that may or may not be used (there is already a precedent for doing this in the ufs-weather-model, right? Aren't there TONS of ufs-weather-model git submodules that always get checked out which may or may not be used?), or we accept an increase in complexity and break our simple instructions for checking out and building the code. Neither of those is desirable, but I don't think we have a choice; we have to pick one and go with it. My preference is the first one because it's the simplest for both the user and developers, with the least change to existing behavior, and costs almost nothing (a minute or two during the one-time checkout). But, if there is a consensus for the latter, I'm fine with it. When we start adding RRFS workflow capabilities, those will be going into the existing regional_workflow, not a new regional_workflow. So, there should not need to be any other new externals for future RRFS capabilities.

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

One approach I didn't mention because it's very disruptive to the established way UFS Apps currently manage their externals is to do away with manage_externals and its Externals.cfg file(s) and instead take advantage of CMake features that allow you to define checkout options for all external projects. This has the advantage of being able to turn checkouts on/off with CMake options to do exactly what we want. The disadvantage is that it buries the external references (repos, branches, hashes, etc) into the src/CMakeLists.txt file with CMake syntax that may be more difficult for some developers/users to understand and find. I'm pretty sure it also requires the external projects to always be CMake projects, which may or may not be a problem.

@gspetro-NOAA
Copy link
Copy Markdown
Collaborator

@JeffBeck-NOAA, @jwolff-ncar, @gsketefian, @christinaholtNOAA, @gspetro-NOAA: This PR is in draft status, but I'd like to request feedback about where/how to add documentation for the proposed option that turns on building of GSI and rrfs_utl. I have added the necessary verbiage to the usage() for the devbuild.sh convenience script, but that is all so far. If you have guidance about where the additional information should be added, I'd like to get that pushed before I upgrade this to full PR status. There may be some debate about the approach taken here, so any documentation would, of course, be subject to change as needed during the formal review process. Thank you.

@christopherwharrop-noaa I think line 162 of the BuildRunSRW.rst file could be a good place to add a note about the --rrfs option.
For example, after line 161, which says "...where valid values are intel or gnu." add:

If users want to build the optional GSI and rrfs_utl optional components, they can add the --rrfs argument. For example:
.. code-block:: console

./devbuild.sh --platform=hera --rrfs

Then you can add info on the cmake build command option to line 284. After "...4 parallel threads to prevent overly long installation times," you could add:

If users want to build the optional GSI and rrfs_utl optional components, they can add ENABLE_RRFS=on to the original cmake command. For example:
.. code-block:: console

cmake .. -DCMAKE_INSTALL_PREFIX=.. ENABLE_RRFS=on
make -j 4 >& build.out &

(Not sure that's how where you'd add ENABLE_RRFS=on/--rrfs, so definitely change the commands I suggested so that they work, but I hope this gives you an idea of where to add the info in the docs!)

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

@gspetro-NOAA - Thank you! That is very helpful. I am not familiar with the documentation part of the repo, so I appreciate you providing those hints for me.

@gspetro-NOAA
Copy link
Copy Markdown
Collaborator

@christopherwharrop-noaa One thing that didn't render properly in my examples above is that any code appearing under a ".. code-block:: console" directive needs to be indented. (Also, fyi, an empty line between the directive and the code snippet is required and an empty line after the code snippet as well.)

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

@gspetro-NOAA - Is there a good way to see the rendered view before pushing to Github?

@gspetro-NOAA
Copy link
Copy Markdown
Collaborator

gspetro-NOAA commented May 26, 2022

@gspetro-NOAA - Is there a good way to see the rendered view before pushing to Github?

@christopherwharrop-noaa You can build the locally or create a readthedocs.org account, which you can connect to your GitHub fork (in My Projects, click on Import a Project. Then, in the Versions tab, you can activate docs for a specific branch of your fork). There are some instructions for building locally in the User's Guide README file, which allows you to view the docs before pushing. However, it's one of those things where if it goes smoothly, it's great, but if it doesn't go smoothly, you could be troubleshooting for a while, and it's easier just to view on RTD after your push and then make any required changes.

Comment thread devbuild.sh Outdated
Comment thread devbuild.sh
Comment thread ufs_srweather_app.settings.in
Comment thread Externals.cfg
#branch = develop
hash = a90b632
local_path = src/gsi
externals = None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the externals = None flag causes manage_externals to not check out any submodules. But at some point when we want users to actually run gsi, I'm assuming this will have to be removed so that the submodules get recursively checked out, correct? Will it take a long time to clone then?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will never turn on the checking out of GSI submodules. The only submodule that is relevant is the one for the fix files, and those are currently handled in other ways by the RRFS team. We can definitely discuss strategies for that. But, in terms of submodules for GSI, they will never be checked out by manage_externals unless and until they are moved from VLab to some where that is accessible to the public at large. It's not just a question of how long the checkout takes, it's also a question of access to VLab. Currently, checking out the external submodules for GSI is a non-starter.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems I'm missing something basic. Why clone the top-level GSI repo if you can't clone its submodules? Is it because those submodules (except possibly the one for fix files) are not necessary for running GSI?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The submodules are not needed. One of them libsrc was removed some time ago because it has been superseded by use of hpc-stack libs. The only one that remains is fix, and there are other ways of managing that. The RRFS team's local GSL fork of GSI also does not include any git submodules and they are managing use of any required fix files in other ways. We may need to get more information about GSI fix file requirements from the RRFS team and discuss how to handle any potential issues with that. However, as far as the GSI repository is concerned, we absolutely cannot include checking out of VLab git submodules for the SRW App. That is an absolute show stopper. So, regarding this issue there are two options:

  1. Check out GSI without the fix submodule
  2. Don't add GSI to the SRW App and do not add RRFS capabilities to the SRW App.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to ask Ming Hu about how they deal with GSI fix files. We could also ask Mike Lueken about plans to move the GSI fix files off of VLab to something that is publicly accessible. Given what I've heard about transitions of GSI off of VLab, I am not remotely hopeful about the latter.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Are you comfortable merging this PR without the fix file issue being resolved? I.e. are you hopeful it will be resolved say by staging the fix files on system disk? Or do you think it's better to wait until that's resolved so users don't have to clone gsi and rrfs_utils without being able to use gsi?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question. I think I'll need to talk with the RRFS GSL folks to understand what they're doing before I can have a good answer to that. I wanted to tag Ming here, but he doesn't appear to be on this repo, so I'll have to ping him offline.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsketefian - What do you think about addressing the fix file issue in conjunction with adding the first RRFS workflow capability? I think we've established that resolving handling of the fix files may take some time and involve some creative solutions. My current favorite is putting the fix files in a bucket on the cloud, and then using a workflow task to retrieve them. Or alternatively, copying the fix files from the cloud could become part of the RRFS-related documentation as an additional step. These fix files DO change (quarterly?) so they need to be refreshed from time to time.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherwharrop-noaa That is fine with me. As long as cloning of the new repos is not a significant burden for users (which you said isn't). I'll approve.

@christopherwharrop-noaa christopherwharrop-noaa added ci-hera-intel-build Kicks off automated build test on hera with intel ci-jet-intel-build Kicks off automated build test on jet with gnu labels Jun 16, 2022
@venitahagerty venitahagerty removed ci-jet-intel-build Kicks off automated build test on jet with gnu ci-hera-intel-build Kicks off automated build test on hera with intel labels Jun 16, 2022
@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

@gsketefian - I wanted to alert you that I did push a couple updates since your approval. One was an update to the version of ncio being used (only used for extra RRFS components) to get ncio bug fixes that were recently released. Another was an update to the test/build.sh script that controls automated CI builds on Jet/Hera so that the new RRFS build would be tested. Let me know if you have any issues there.

@venitahagerty
Copy link
Copy Markdown
Collaborator

Machine: hera
Compiler: intel
Job: build
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/944949396/20220616183517/ufs-srweather-app
Build was Successful
If test failed, please make changes and add the following label back:
ci-hera-intel-build

@gsketefian
Copy link
Copy Markdown
Collaborator

@gsketefian - I wanted to alert you that I did push a couple updates since your approval. One was an update to the version of ncio being used (only used for extra RRFS components) to get ncio bug fixes that were recently released. Another was an update to the test/build.sh script that controls automated CI builds on Jet/Hera so that the new RRFS build would be tested. Let me know if you have any issues there.

@christopherwharrop-noaa Those new changes all look good to me. The only thing I wonder is if there should be a disclaimer somewhere (in the docs?) clearly stating that although users can build gsi and rrfs_utl, they can't actually use them because of the missing fix files (as well as the fact that there are no workflow tasks to run them, etc).

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

Good idea. I can make a note in the User's Guide rst file about that.

@venitahagerty
Copy link
Copy Markdown
Collaborator

Machine: jet
Compiler: intel
Job: build
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/944949396/20220616183510/ufs-srweather-app
Build was Successful
If test failed, please make changes and add the following label back:
ci-jet-intel-build

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

@gsketefian - I just added a little NOTE in the docs indicating that the RRFS components are not yet available at runtime. Let me know if this accomplishes what you were thinking.

@christopherwharrop-noaa christopherwharrop-noaa added ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Jun 17, 2022
@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

I am currently waiting for ncio-1.1.2 to be installed on Cheyenne as per my request in this hpc-stack issue. Once that is installed, this PR is ready to be merged.

@jkbk2004
Copy link
Copy Markdown
Collaborator

@christopherwharrop-noaa I will take a look on this on cheyenne.

@venitahagerty venitahagerty removed ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Jun 17, 2022
@venitahagerty
Copy link
Copy Markdown
Collaborator

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/944949396/20220617165015/ufs-srweather-app
Build was Successful
If test failed, please make changes and add the following label back:
ci-hera-intel-WE

@venitahagerty
Copy link
Copy Markdown
Collaborator

venitahagerty commented Jun 17, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/944949396/20220617165019/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Failed on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
2022-06-17 17:42:10 +0000 :: fe6 :: Task get_extrn_lbcs, jobid=5238751, in state DEAD (OUT_OF_MEMORY), ran for 151.0 seconds, exit status=253, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
2022-06-17 17:42:07 +0000 :: fe6 :: Task get_extrn_lbcs, jobid=5238766, in state DEAD (OUT_OF_MEMORY), ran for 180.0 seconds, exit status=253, try=1 (of 1)
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha

@christopherwharrop-noaa christopherwharrop-noaa added the ci-jet-intel-WE Kicks off automated workflow test on jet with intel label Jun 17, 2022
@venitahagerty venitahagerty removed the ci-jet-intel-WE Kicks off automated workflow test on jet with intel label Jun 17, 2022
@venitahagerty
Copy link
Copy Markdown
Collaborator

venitahagerty commented Jun 17, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/944949396/20220617193516/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta

@christopherwharrop-noaa
Copy link
Copy Markdown
Collaborator Author

ncio-1.1.2 is installed on Cheyenne now. Ready to merge.

@christopherwharrop-noaa christopherwharrop-noaa merged commit 648fc79 into ufs-community:develop Jun 21, 2022
@christopherwharrop-noaa christopherwharrop-noaa deleted the feature/add-gsi-rrfs_utl branch June 21, 2022 18:04
christinaholtNOAA added a commit to christinaholtNOAA/ufs-srweather-app that referenced this pull request Nov 14, 2024
)

Includes:

* A change in the Rocoto YAMLS to use && to link commands together.
* Added documentation to the default_config.yaml.
* Updates to machine YAMLs so we might expect the same behavior on all machines.
* An update for the latest uwtools UPP config (using control_file task).
* Linted and applied black to the whole scripts directory (ignoring a couple of out-of-scope files with pylint disable tags)
* Turned on GH actions for this branch to keep our files linted.
* Removed files and default variables that are no longer used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add GSI and RRFS Utilities to enable RRFS DA workflows

7 participants