[develop]: Update for Gaea software stack location and Lmod initialization script #627
setenv LMOD_SYSTEM_DEFAULT_MODULES "modules/3.2.11.4"
module --initial_load --no_redirect restore
source /lustre/f2/dev/role.epic/contrib/Lmod_init.sh
This will not work since you would need a "csh" script.
Thank you, @danielabdi-noaa , I will prepare a separate Lmod_init.csh script!
Thank you, @natalie-perlin, for preparing a separate Lmod_init.csh script! Just to reiterate what @danielabdi-noaa noted, while attempting to source the etc/lmod-setup.csh file, I encountered the following:
gaea15 Michael.Lueken/ufs-srweather-app> source etc/lmod-setup.csh gaea
Illegal variable name.
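A csh-family shell cannot source a bash script such as Lmod_init.sh; bash-only constructs like `$(...)` command substitution are a likely source of the message above. One way to keep the two shell families apart is sketched below. The function name `pick_lmod_init` is hypothetical, and the location of Lmod_init.csh is an assumption (the thread names the file but not its path).

```bash
#!/bin/bash
# Hypothetical helper, not part of the PR: choose an init script by shell family.
# Assumption: Lmod_init.csh sits next to Lmod_init.sh (its path is not given in the thread).
pick_lmod_init() {
    local contrib=/lustre/f2/dev/role.epic/contrib
    local name=${1##*/}   # strip any directory, e.g. /bin/tcsh -> tcsh
    name=${name#-}        # login shells may report themselves as -tcsh
    case "$name" in
        csh|tcsh) echo "$contrib/Lmod_init.csh" ;;
        *)        echo "$contrib/Lmod_init.sh" ;;
    esac
}

# Print the script that matches the current login shell.
pick_lmod_init "${SHELL:-/bin/bash}"
```

The printed path would then be sourced by the matching shell, so that csh users never read bash syntax.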
elif [ "$L_MACHINE" = gaea ]; then
export BASH_ENV="/lustre/f2/dev/role.epic/contrib/apps/lmod/lmod/init/bash"
source $BASH_ENV
source /lustre/f2/dev/role.epic/contrib/Lmod_init.sh
I think it is best to call the "bash" script directly instead of what looks like a custom "Lmod_init.sh" script.
Could you please print the contents of the script so that we can see what it does differently?
The initialization script for Lmod 8.7.12 stores the list of modules that are loaded by default into the user environment, purges the modules, sources /lustre/f2/dev/role.epic/contrib/apps/lmod/lmod/init/profile, and then reloads the default modules in a preferred order because of some interdependencies. After the default module manager on Gaea, modules/3.2.11.4, is loaded, $MODULESHOME is changed to point to a path set by modules/3.2.11.4. At the end of the updated Lmod initialization script, $MODULESHOME is reset to correspond to the Lmod 8.7.12 module management package.
The issue is that modules/3.2.11.4 is always a default module, but the rest of the module list differs depending on whether you are on a login node or on a compute node during the model run.
The Lmod_init.sh has the following:
#!/bin/bash
loaded_modules=$(echo ${LOADEDMODULES:-} | tr ":" "\n")
module purge 2>/dev/null
echo "Initializing lua module environment Lmod 8.7.12, loading modules (wait...)"
export LMOD_SYSTEM_DEFAULT_MODULES=modules/3.2.11.4
export BASH_ENV=/lustre/f2/dev/role.epic/contrib/apps/lmod/lmod/init/profile
source $BASH_ENV
export PATH=$MODULESHOME/libexec:$MODULESHOME/init/ksh_funcs:$PATH
module --initial_load --no_redirect restore
#
if [[ -d /opt/cray/ari/modulefiles ]] ; then
module use -a /opt/cray/ari/modulefiles
fi
if [[ -d /opt/cray/pe/ari/modulefiles ]] ; then
module use -a /opt/cray/pe/ari/modulefiles
fi
if [[ -d /opt/cray/pe/craype/default/modulefiles ]] ; then
module use -a /opt/cray/pe/craype/default/modulefiles
fi
# Load craype module first, then DefApps, then all others
for module in $loaded_modules
do
[[ $module == craype/* ]] && module try-load $module
done
for module in $loaded_modules
do
[[ $module == DefApps ]] && module try-load $module
done
for module in $loaded_modules
do
[[ $module == craype/* || $module == DefApps ]] || module is-loaded $module || module try-load $module
done
# Set environment variables
export MODULESHOME=$LMOD_ROOT/lmod
#
# Report when done loading
#echo "... done loading "
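The load order produced by the three loops above can be illustrated in isolation. The sketch below is not part of Lmod_init.sh: `order_modules` is a hypothetical function that only prints names instead of calling `module try-load`, and the module versions in the sample call are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: print module names in the order the loops would load them.
# Input is a colon-separated list in the style of $LOADEDMODULES.
order_modules() {
    local list="$1" m
    local -a mods
    IFS=':' read -r -a mods <<< "$list"
    # Pass 1: craype/* modules first.
    for m in "${mods[@]}"; do [[ $m == craype/* ]] && echo "$m"; done
    # Pass 2: DefApps next.
    for m in "${mods[@]}"; do [[ $m == DefApps ]] && echo "$m"; done
    # Pass 3: everything else, in its original order.
    for m in "${mods[@]}"; do
        [[ $m == craype/* || $m == DefApps ]] || echo "$m"
    done
    return 0
}

# Sample list with made-up versions.
order_modules "modules/3.2.11.4:DefApps:intel/19.0:craype/2.7.0"
```

For the sample list, the craype entry is printed first, then DefApps, then the remaining modules in their original order.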
@natalie-perlin Thanks for the details. I was not aware that the logic for replacing the Cray modules had grown; it makes sense to put it in its own script.
@MichaelLueken @danielabdi-noaa - Note that there are some differences from the *.sh version, because the modules are not purged in Lmod_init.csh. They simply do not appear in the user environment when the Lmod init script is sourced. I avoided to make a
The Lmod_init.csh has the following:
danielabdi-noaa left a comment
@natalie-perlin I was able to load srw modules from my default tcsh login now, so approving.
MichaelLueken left a comment
@natalie-perlin Thank you for updating the etc/lmod-setup.csh script. I was ultimately able to build the SRW and run the fundamental tests on Gaea. Having said that, I did note some weird behavior after using source etc/lmod-setup.csh gaea. It would be nice if the warning messages noted in my review could be addressed, but I'm not even sure why they are showing up with your changes (a test using the current develop shows no messages like I see with your fork's branch).
MichaelLueken left a comment
@natalie-perlin The SRW App builds without issue on Gaea. Loading the Lmod environment using csh works without issue, as do subsequent builds and WE2E test generation. Approving these changes now!
@natalie-perlin The failure of the Jenkins tests on Orion is due to the problem noted in issue #635. The Jenkins tests built and ran successfully on Gaea.
@natalie-perlin I was able to check out externals on Orion manually (the issue with the default Git version used on Orion). Your branch was built without issue and the WE2E fundamental tests ran through to completion without issue. It sounds like the ufs-weather-model is hoping to merge your changes in tomorrow. Once merged, I will move forward with this PR. Thanks!
Since I will be away due to jury duty tomorrow, I will be unable to merge this work. Once PR #1645 has been merged, please go ahead and merge this PR into develop. The tests have successfully passed for Gaea and the Orion tests were manually run and passed as well. If you are fine with waiting until Friday (March 10) morning, I can merge this work at that time. If this work is merged before Friday, please make sure to replace the four commit messages with the following: Thanks!
@MichaelLueken
DESCRIPTION OF CHANGES:
The Gaea hpc-stack location has been changed to match the one used by the UFS-WM, with which all regression tests pass successfully.
Lmod initialization has been updated as well, following the approach that allowed the UFS-WM to pass its regression tests (RTs). It is now done by sourcing a single initialization script.
Files changed:
./modulefiles/build_gaea_intel.lua
./etc/lmod-setup.sh
UPDATE: ./etc/lmod-setup.csh is not changed
Type of change
TESTS CONDUCTED:
UFS-WM has passed all the regression tests using this updated hpc-stack location.
The SRW code has successfully compiled.
DEPENDENCIES:
LABELS (optional):