Skip to content

[develop] Replace TCL modulefiles with Lua counterparts#413

Merged
MichaelLueken merged 8 commits into
ufs-community:developfrom
danielabdi-noaa:feature/lua_modules
Oct 21, 2022
Merged

[develop] Replace TCL modulefiles with Lua counterparts#413
MichaelLueken merged 8 commits into
ufs-community:developfrom
danielabdi-noaa:feature/lua_modules

Conversation

@danielabdi-noaa
Copy link
Copy Markdown
Collaborator

@danielabdi-noaa danielabdi-noaa commented Oct 13, 2022

DESCRIPTION OF CHANGES:

This PR mainly addresses issue #410

  • All TCL modulefiles are replaced with Lua counterparts
  • Setting version numbers of libraries through environment is now possible for Tier-1 platforms
  • Simplifies noaacloud task modulefiles

Moreover,

  • Bug fix for Jet data locations on disk
  • Bug fix NCO mode non-ensemble run post dependency on fcst

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

Run fundamental tests on Jet, Hera and Orion successfully.
For the rest, I did only sanity check to see if there are any syntax errors -- so actual testing with Jenkins is crucial here.

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@MichaelLueken

@danielabdi-noaa danielabdi-noaa changed the title Replace TCL modulefiles with Lua counterparts [develop] Replace TCL modulefiles with Lua counterparts Oct 13, 2022
@danielabdi-noaa danielabdi-noaa added ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Oct 13, 2022
@venitahagerty venitahagerty removed ci-jet-intel-WE Kicks off automated workflow test on jet with intel ci-hera-intel-WE Kicks off automated workflow test on hera with intel labels Oct 13, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 13, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 13, 2022
@danielabdi-noaa danielabdi-noaa added ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Oct 13, 2022
@venitahagerty venitahagerty removed ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Oct 13, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 13, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 13, 2022
@danielabdi-noaa danielabdi-noaa force-pushed the feature/lua_modules branch 4 times, most recently from 9da7c4e to f52f5ea Compare October 13, 2022 14:21
@danielabdi-noaa danielabdi-noaa added ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Oct 13, 2022
@venitahagerty venitahagerty removed ci-jet-intel-WE Kicks off automated workflow test on jet with intel ci-hera-intel-WE Kicks off automated workflow test on hera with intel labels Oct 13, 2022
@venitahagerty
Copy link
Copy Markdown
Collaborator

venitahagerty commented Oct 13, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1085525562/20221013143511/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 9 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR

@danielabdi-noaa danielabdi-noaa force-pushed the feature/lua_modules branch 4 times, most recently from 4ece465 to c04a310 Compare October 13, 2022 18:14
@MichaelLueken
Copy link
Copy Markdown
Collaborator

@danielabdi-noaa It looks like there has been another update on Hera, this time to the nccmp library. This update requires that netcdf be loaded before the nccmp library can be loaded. This update appears to have gone in about an hour and a half ago. This will cause the SRW to fail to build on Hera. Since you are still testing on Hera, I just wanted to bring this to your attention.

@danielabdi-noaa
Copy link
Copy Markdown
Collaborator Author

danielabdi-noaa commented Oct 13, 2022

@MichaelLueken Thank!. Yes, I can confirm that I am not able to load build_hera_intel anymore manually. Although my run started and completed a couple of steps, the update broke make_ics/lbcs steps.

@ufs-community ufs-community deleted a comment from venitahagerty Oct 13, 2022
@danielabdi-noaa
Copy link
Copy Markdown
Collaborator Author

@chan-hoo Are you able to test this on wcoss2? CIs don't cover wcoss2, and I've already made a couple of mistakes with the wcoss2 modulefiles. So it would be great if you can test this on wcoss2 by running one test case.

Copy link
Copy Markdown
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental tests were run on Hera and all passed successfully. I will go ahead and approve these changes, but I would hold off on merging this until @chan-hoo has had an opportunity to run a test on WCOSS2 (since the Jenkins CI don't run on WCOSS2).

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Oct 14, 2022
@danielabdi-noaa
Copy link
Copy Markdown
Collaborator Author

danielabdi-noaa commented Oct 15, 2022

This PR failed on Orion and Cheyenne. The former was tested with pre-built binaries so I did not notice the issue with the incorrect cmake version in the build modulefile. I've no made the build manually on Orion and made sure it builds SRW app properly, so the fix should fully resolve the Orion issue.
Cheyenne had other unrelated issues with the CC and CXX variables, which I have fixed, but the real issue according to the log file seems to be something related to the load_any statement in srw_common. The fact that it is in srw_common is suspicious because that is used successfully on 4 other systems now. I would need help from someone with access to Cheyenne to help figure out the problem, if I can not figure out the issue.

@danielabdi-noaa
Copy link
Copy Markdown
Collaborator Author

danielabdi-noaa commented Oct 15, 2022

I can not tell what version of Lmod is present on Cheyenne, but the load_any functionality of Lmod for Lua modulefiles is added in version 8.3.7 according to this issue. It is odd that TCL modulefile did not complain but maybe it had support for TCL's modulefile's load-any earlier than it added the equivalent load_any for Lua modulefiles. In any case, we need to make sure Lmod version is above 8.3.7 on Cheyenne.

@danielabdi-noaa
Copy link
Copy Markdown
Collaborator Author

danielabdi-noaa commented Oct 16, 2022

Cheyenee has Lmod version 8.1.7 so load_any does not work for sure. I made a temporary solution with try_load + isloaded combo as suggested in the above link. So Cheyenne and Orion issues should be solved now.

@MichaelLueken
Copy link
Copy Markdown
Collaborator

@danielabdi-noaa Unfortunately, Cheyenne is down for maintenance this week. Once the machine comes back, I will be able to kick off the Jenkins fundamental tests for Cheyenne. The Jenkins tests for Orion were resubmitted and successfully passed. Once Cheyenne and WCOSS2 have been tested, this work should be ready.

@chan-hoo
Copy link
Copy Markdown
Collaborator

@danielabdi-noaa @MichaelLueken, Sorry! I missed your request. I'll test it on wcoss2 today.

Comment thread modulefiles/wflow_wcoss2.lua Outdated

if mode() == "load" then
LmodMsgRaw([===[Please do the following to activate conda:
> conda activate regional_workflow
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielabdi-noaa, can you remove this if-statement? On wcoss2, conda is not available.

@chan-hoo
Copy link
Copy Markdown
Collaborator

@danielabdi-noaa, it works well on wcoss2. Once you remove the if-statement in wflow_wcoss2 I pointed out, I'll approve this pr.

@danielabdi-noaa
Copy link
Copy Markdown
Collaborator Author

@chan-hoo Thanks for testing! I've made the changes you requested.

@MichaelLueken
Copy link
Copy Markdown
Collaborator

The Jenkins CI tests have successfully passed for Cheyenne. This work is ready to be merged.

@MichaelLueken MichaelLueken merged commit 61f8253 into ufs-community:develop Oct 21, 2022

if mode() == "load" then
LmodMsgRaw([===[Please do the following to activate conda:
> conda activate regional_workflow
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line does not work on Cheyenne. The full path must be provided.

conda activate /glade/p/ral/jntp/UFS_SRW_app/conda/regional_workflow

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened PR #424 to fix this issue; not urgent since it is only a display problem but it should get in before the minor release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants