Skip to content

Explicit module loading#336

Closed
marshallward wants to merge 1 commit into
NOAA-GFDL:dev/gfdlfrom
marshallward:mrs_module_list
Closed

Explicit module loading#336
marshallward wants to merge 1 commit into
NOAA-GFDL:dev/gfdlfrom
marshallward:mrs_module_list

Conversation

@marshallward
Copy link
Copy Markdown
Member

A change to the Gaea module environment was corrupting the user's
environment in the MOM gitlab testing, causing all module commands to
fail and making it impossible to build and run the models.

This may be caused by the introduction of readonly variables in the
/etc/bash.bashrc.local file, which prevented critical modules from
being loaded (presumably to prevent double module load statements).
It is possible these were incorrectly inherited to noninteractive Gaea
sessions.

This patch introduces a more explicit module setup. The process is
described below in more detail, in case further changes are needed.

  1. module purge

    Clear out any existing modules (but not the readonly flags)

  2. source /opt/cray/pe/modules/default/etc/modules.sh

    Bootstrap the module system and add and following modulepaths:

    • /opt/modulefiles: Compilers
    • /opt/cray/modulefiles: Lower level filesystem
    • /opt/cray/pe/modulefiles: PrgEnv-* and libraries
  3. Load the following modules:

    • craype-network-aries
    • eproxy
    • PrgEnv-${compiler}
    • craype-broadwell
    • cray-mpich
  4. Select compiler version and load the version-dependent modules:

    • cray-hdf5
    • cray-netcdf

Much of this process replicates the steps in /etc/bash.bashrc.local.

Advantages to this approach:

  • It works.
  • It creates a more minimal and predictable environment.

Disadvantages:

  • Paths to modules.sh and modulefiles are hard-coded and need updates.
  • The list of required modules may change over time.

A change to the Gaea module environment was corrupting the user's
environment in the MOM gitlab testing, causing all module commands to
fail and making it impossible to build and run the models.

This may be caused by the introduction of `readonly` variables in the
`/etc/bash.bashrc.local` file, which prevented critical modules from
being loaded (presumably to prevent double `module load` statements).
It is possible these were incorrectly inherited to noninteractive Gaea
sessions.

This patch introduces a more explicit module setup.  The process is
described below in more detail, in case further changes are needed.

1. module purge

   Clear out any existing modules (but not the readonly flags)

2. source /opt/cray/pe/modules/default/etc/modules.sh

   Bootstrap the module system and add and following modulepaths:

     - /opt/modulefiles: Compilers
     - /opt/cray/modulefiles: Lower level filesystem
     - /opt/cray/pe/modulefiles: PrgEnv-* and libraries

3. Load the following modules:

     - craype-network-aries
     - eproxy
     - PrgEnv-${compiler}
     - craype-broadwell
     - cray-mpich

4. Select compiler version and load the version-dependent modules:

     - cray-hdf5
     - cray-netcdf

Much of this process replicates the steps in `/etc/bash.bashrc.local`.

Advantages to this approach:
- It works.
- It creates a more minimal and predictable environment.

Disadvantages:
- Paths to modules.sh and modulefiles are hard-coded and need updates.
- The list of required modules may change over time.
@marshallward
Copy link
Copy Markdown
Member Author

I just noticed that /etc/bash.bashrc.local was updated again on July 14, moving those crucial modules outside of the readonly flag's if-block. So this PR is probably no longer needed.

But if we want to take charge of our modules, then we can still consider it.

@adcroft
Copy link
Copy Markdown
Member

adcroft commented Jul 20, 2021

On the one hand, the testing was originally defined to only work on our production machine and replicate our production environment. On the other, controlling the modules might be amenable to porting our testing to other sites. However, given that we might not (probably won't) have modules at other sites, I'm not sure if this change helps us or not. I've said that the mrs merge was just a pre-cursor to a re-design but waiting until we get around to that might re-encounter the trigger for this PR. I'm happy to moved forward with this as is but if you think it's unnecessary then I'll leave it up to you.

@marshallward
Copy link
Copy Markdown
Member Author

marshallward commented Jul 20, 2021

It's no more localized than anything else in tools/MRS, so I don't think we need to consider the portability. This is a GFDL fix to a GFDL problem.

The underlying problem was later fixed on Gaea's end, but I still think it's better to be explicit, and this explicitly sets our modules, so it is probably the better solution. It also inoculates us from future undocumented changes to /etc/bash.bashrc.local.

If there are future changes, I'd rather be made aware of them than have them implicitly changed again.

A downside here is that our test environment is now more controlled, and no longer reflects the usual managed environment of a user.

If we want a porcelain test environment, then we should do this. If we want a typical user environment, then we should probably not do it and should pester Ops if this happens again.

@marshallward
Copy link
Copy Markdown
Member Author

We were unable to come to a consensus on this issue, mainly due to indifference. The original problem no longer exists, and in all likelihood won't happen again.

I'll keep this branch around in case we want to revisit explicit module control, but for now it's probably OK to close this PR.

marshallward added a commit that referenced this pull request Apr 23, 2025
- NOAA-GFDL/MOM6@484b91756 Merge branch 'main' into merge_main_ncar_02_2025
- NOAA-GFDL/MOM6@6539f6955 Refactor btcalc to reduce duplicated code
- NOAA-GFDL/MOM6@c267ed614 Add btloop_add_dyn_PF and refactor OBCs in btcalc
- NOAA-GFDL/MOM6@812d58541 Barotropic OBC memory and halo update cleanup
- NOAA-GFDL/MOM6@fe7131892 Split western and eastern OBC loops
- NOAA-GFDL/MOM6@3f1a0d0b8 Refactor barotropic OBC code
- NOAA-GFDL/MOM6@8ae091bf7 Eliminate eta_PF_BT pointer in btstep_timeloop
- NOAA-GFDL/MOM6@8951de711 Simpler btloop_eta_predictor
- NOAA-GFDL/MOM6@657b17959 Add new subroutine btloop_find_PF
- NOAA-GFDL/MOM6@488058cd8 Eliminate OBC loops in btloop_update_u
- NOAA-GFDL/MOM6@24e20dc99 Set use_ice_shelf dynamically
- NOAA-GFDL/MOM6@7bae43fe3 Initialize use_ice_shelf in ALE_regridding_and_remapping
- NOAA-GFDL/MOM6@85bd913b6 Merge pull request #1653 from NCAR/dev/ncar_main_candidate_25Feb
- NOAA-GFDL/MOM6@982b5776a Only allocate MARBL coupled fields if MARBL is in use (#345)
- NOAA-GFDL/MOM6@d46123493 Remove year_to_sec() function
- NOAA-GFDL/MOM6@8d8ea0295 Merge branch 'main' into dev/ncar_main_candidate_25Feb
- NOAA-GFDL/MOM6@7ef0a3acb Add log_out default value
- NOAA-GFDL/MOM6@69220e9ce Specify order of operations via parantheses (#346)
- NOAA-GFDL/MOM6@1a6968370 Address comments to NCAR to main (2025-02-10) (#344)
- NOAA-GFDL/MOM6@18a48ef0a Code cleanup (#339)
- NOAA-GFDL/MOM6@623e49d15 Make appending the date to rpointer optional (#343)
- NOAA-GFDL/MOM6@78ca691ef Delete unused "use" statements and variables
- NOAA-GFDL/MOM6@a2193cf94 Make contents of marbl_forcing_CS private
- NOAA-GFDL/MOM6@9359ab745 More comment clean-up
- NOAA-GFDL/MOM6@b268de467 Clean up comments from previous commit
- NOAA-GFDL/MOM6@5402ce0ec Address some comments from code review
- NOAA-GFDL/MOM6@a867971af Remove broken symlink to MARBL (#338)
- NOAA-GFDL/MOM6@d678d3a9c Do not nullify non-present MOM_dom_unmasked pointer
- NOAA-GFDL/MOM6@9f3de4884 OMP directive fixes for build and runtime failures in CESM C cases (#337)
- NOAA-GFDL/MOM6@1221ef3f7 Merge pull request #336 from mom-ocean/main
- NOAA-GFDL/MOM6@8feb16278 Merge pull request #335 from mnlevy1981/update_cvmix
- NOAA-GFDL/MOM6@59925b5f2 Merge pull request #331 from gustavo-marques/mle_lf_via_file
- NOAA-GFDL/MOM6@ea85449b2 Update CVMix to latest commit
- NOAA-GFDL/MOM6@138a66e13 Leith+E bugfix (#334)
- NOAA-GFDL/MOM6@fc0d03135 Fix dimensional scaling and use Adcroft reciprocal
- NOAA-GFDL/MOM6@8b6767f24 Merge pull request #333 from NCAR/merge_pr1647
- NOAA-GFDL/MOM6@b25193536 fix OMP directive bug introduced during conflict reslove
- NOAA-GFDL/MOM6@a92eb6ef5 Merge remote-tracking branch 'upstream/main' into merge_pr1647
- NOAA-GFDL/MOM6@755bd877c Merge branch 'dev_ncar_main' into mle_lf_via_file
- NOAA-GFDL/MOM6@1fc274ddd merge latest dev/ncar and resolve new conflicts in MOM_mixed_layer_restrat
- NOAA-GFDL/MOM6@ea1b4f026 Merge pull request #332 from mnlevy1981/update_cvmix
- NOAA-GFDL/MOM6@788ff4208 Update to latest CVMix
- NOAA-GFDL/MOM6@ebbfa160f Avoid division by zero in I_LFront calculation
- NOAA-GFDL/MOM6@71b6ba35b Initialize mle_fl_2d
- NOAA-GFDL/MOM6@9a191aeac Fixed typos
- NOAA-GFDL/MOM6@217816cd1 Add option to read MLE Frontal-Length Scale via file
- NOAA-GFDL/MOM6@544377e2e Merge pull request #330 from mnlevy1981/update_cvmix
- NOAA-GFDL/MOM6@bd335489b make MLD_003 computation available to mixedlayer_restrat_Bodner (#327)
- NOAA-GFDL/MOM6@289a54bbb Use latest commit from CVMix
- NOAA-GFDL/MOM6@0d828a2b5 Merge branch 'pr1647e' into merge_pr1647
- NOAA-GFDL/MOM6@c17243b92 turn on MEKE thickness diffusivity flags only when USE_MEKE is on
- NOAA-GFDL/MOM6@dc057943e merge pr1647 with dev/ncar and resolve conflicts
- NOAA-GFDL/MOM6@acadc6eca Replace atan with atan2 in NUOPC cap to addresss missing nvhpc support.
- NOAA-GFDL/MOM6@a533e20d8 Merge pull request #322 from alperaltuntas/merge_main
- NOAA-GFDL/MOM6@a775c4d3b merge latest main and resolve conflicts
- NOAA-GFDL/MOM6@b880ce8ff Two small fixes for FPMix (#321)
- NOAA-GFDL/MOM6@c9dab5740 Fix incorrect limiter in FULL_DEPTH_KHTR_MIN implementation (#318)
- NOAA-GFDL/MOM6@a73cd4944 Merge branch 'initialize_chl' into dev/ncar
- NOAA-GFDL/MOM6@c6d6dcc4c Merge branch 'fpmix_debug' into dev/ncar
- NOAA-GFDL/MOM6@2cf0fac2a Merge branch 'fpmix_restart' into dev/ncar
- NOAA-GFDL/MOM6@b3f55b28b Update to run with CESM gnu debug
- NOAA-GFDL/MOM6@5cd90c633 Two changes to fpmix indexing
- NOAA-GFDL/MOM6@67e7e2408 Add KPP OBLdepth to restart file
- NOAA-GFDL/MOM6@bff4d2aac Clean-up following review
- NOAA-GFDL/MOM6@a8cb1d8c9 Velocity remap dimensional consistency (#308)
- NOAA-GFDL/MOM6@2b07520ec Fix comment that exceeded line-length
- NOAA-GFDL/MOM6@d9aa6b49b Add short_name to marbl_single_output_type
- NOAA-GFDL/MOM6@5a2973044 Use new MARBL interface to initialize Chl
- NOAA-GFDL/MOM6@4f1da6bdd Merge pull request #311 from mnlevy1981/update_cvmix
- NOAA-GFDL/MOM6@d38372ae8 Update CVMix external
- NOAA-GFDL/MOM6@6184273fd Stokes MOST modifications to unresolved shear (#306)
- NOAA-GFDL/MOM6@ad7cf38e6 Merge branch 'avoid_zero_chl' into dev/ncar
- NOAA-GFDL/MOM6@8fb8aaf83 Avoid chl=0 in lookup_ohlmann_opacity()
- NOAA-GFDL/MOM6@00beb269f add timestamp to rpointer files (#304)
- NOAA-GFDL/MOM6@5726d941d Don't compute log10(chl) for small chl
- NOAA-GFDL/MOM6@a3e2f1485 fix memory leak in NUOPC State_getImport method (#303)
- NOAA-GFDL/MOM6@37411fb6d Check if fluxes%salt_flux is associated (#301)
- NOAA-GFDL/MOM6@15deea43e *Updates in FPMix and Stokes Most (#283)
- NOAA-GFDL/MOM6@a077a61f9 Use longString in MAX_LAYER_THICKNESS (#299)
- NOAA-GFDL/MOM6@b5641afbe Merge pull request #298 from klindsay28/marbl_stf_salt_fluxes
- NOAA-GFDL/MOM6@f8f76f294 remove doxygen formatting on local variables
- NOAA-GFDL/MOM6@621107b99 Modify NUOPC cap to accept separate glc runoff fluxes (#288)
- NOAA-GFDL/MOM6@4acea916f MARBL: convert salt_flux to tracer flux and add to STF
- NOAA-GFDL/MOM6@5904666ef Options to enforce KHTR_MIN and KHTH_MIN in the whole water column and fix naming inconsistency (#294)
- NOAA-GFDL/MOM6@e96a46e40 Merge pull request #291 from gustavo-marques/merge_main_073024
- NOAA-GFDL/MOM6@225c0d829 remove incorrect comma in write statements (#293)
- NOAA-GFDL/MOM6@7afbb6df3 Add missing missing_scale argument to read_Z_edges calls
- NOAA-GFDL/MOM6@69b0454b9 Merge branch 'dev/ncar' into merge_main_073024
- NOAA-GFDL/MOM6@8b9ba9767 Add MARBL to MOM6 (#157)
- NOAA-GFDL/MOM6@e413c299d Adding Ohlmann solar penetration scheme to MOM_opacity (#289)
- NOAA-GFDL/MOM6@df7b5803b Merge branch 'main' into merge_main_073024
- NOAA-GFDL/MOM6@9ff6ca419 Scale checksums in hor_bnd_diffusion
- NOAA-GFDL/MOM6@99edf23b0 *+Make Leith viscosity runs layout-invariant
- NOAA-GFDL/MOM6@8a03b63c2 append ensemble num to geom filename in ensemble runs
- NOAA-GFDL/MOM6@51a98c7c4 +Reproducing KPP_smooth_BLD when KPP%N_SMOOTH > 1
- NOAA-GFDL/MOM6@6ad1530c3 Stochastic GM+E (#280)
- NOAA-GFDL/MOM6@2b1201a87 KE-conserving correction to velocity remap (#277)
- NOAA-GFDL/MOM6@95259e42f Revert a comment that was changed unintentionally.
- NOAA-GFDL/MOM6@0d5584158 Add MEKE_positive to the control structure
- NOAA-GFDL/MOM6@a7725dcad Improve description
- NOAA-GFDL/MOM6@4584e5ebb Add option to avoid negative MEKE
- NOAA-GFDL/MOM6@d16c330a0 Introduce GEOM_FILE runtime parameter to set ocean_geometry file name.
- NOAA-GFDL/MOM6@a30e7c8d9 Disable codecov upload requirement
- NOAA-GFDL/MOM6@f4121ca39 Write unmasked ocean geometry files
- NOAA-GFDL/MOM6@db64408fd Enable relative path specification for IC files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants