Fix modulefiles for Hera/Rocky8 OS. #2194
|
This should be tested on the Rocky login nodes (hfe09-hfe12)! |
|
@RatkoVasic-NOAA With this module update, we can only run UFS weather model code on hfe09-hfe12. Can we run the model on other Hera nodes once your PR is committed? Thanks |
@junwang-noaa , no. You cannot use both. But, you can use old modulefiles ( |
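A minimal sketch of how a script could tell the two kinds of node apart before choosing modulefiles. It assumes the Rocky nodes expose the standard `/etc/os-release` file with `ID` and `VERSION_ID` fields; the function name `is_rocky8` is hypothetical and not part of ufs-weather-model.

```shell
#!/bin/bash
# Sketch: report whether a node runs Rocky Linux 8 by reading an os-release file.
# is_rocky8 is a hypothetical helper name, not something from the PR.
is_rocky8() {
    local os_release="${1:-/etc/os-release}"
    [ -r "$os_release" ] || return 1
    # Source in a subshell so ID/VERSION_ID do not leak into the caller's environment.
    (
        . "$os_release"
        [ "${ID:-}" = "rocky" ] && [ "${VERSION_ID%%.*}" = "8" ]
    )
}

if is_rocky8; then
    echo "Rocky 8 node: use the updated modulefiles"
else
    echo "Not a Rocky 8 node: use the old modulefiles"
fi
```

The optional argument lets the check run against a copy of an os-release file, which makes it easy to try out away from Hera.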
|
@RatkoVasic-NOAA can you continue to sync up the branch? We may need to schedule this PR tomorrow. |
Done. |
|
The It is here: HERA: Big long backtrace |
|
The cpld_control_p8_gnu and cpld_debug_p8_gnu both fail with this message: |
@SamuelTrahanNOAA all tests pass on my side. @zach1221 @FernandoAndrade-NOAA can you test gnu cases on hera/rocky8? |
|
Did your tests pass on the first try or did you have to rerun them? |
It passed on the first try. A few other people are running gnu cases now. We can confirm. |
I am also receiving this error on Hera, for cpld_control_p8_gnu and cpld_debug_p8_gnu. |
|
@RatkoVasic-NOAA it sounds like results differ case by case. Are some nodes still heterogeneous? Could it be an openmpi or gcc version issue? |
|
@zach1221 - Can you reproduce the error I saw with control_wam_debug_gnu? It may have been caused by the job being sent to the wrong service (login) node. |
|
I am not sure if we are triggering -mcmodel=medium on hera/gnu. |
|
For ecflow, if used: I have a feeling the ECF_HOST env var on hera isn't set properly with this transition. I logged into a rocky8 node, ran 'module load ecflow' and 'printenv | grep ECF', and only the _ROOT env var showed up. Try manually setting the hera ecflow ECF_HOST var to (I think) hfe12 and see if that helps (if needed). |
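The check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed fix: `ensure_ecf_host` is a hypothetical helper, and `hfe12` is only the commenter's guess at the right host. It is meant to run after `module load ecflow` in the same shell.

```shell
#!/bin/bash
# Sketch: warn if ECF_HOST is unset after loading ecflow, and apply a caller-supplied fallback.
# ensure_ecf_host is a hypothetical helper, not part of ecflow or ufs-weather-model.
ensure_ecf_host() {
    local fallback="${1:-}"
    if [ -n "${ECF_HOST:-}" ]; then
        echo "ECF_HOST already set to ${ECF_HOST}"
        return 0
    fi
    if [ -z "$fallback" ]; then
        echo "ECF_HOST is unset and no fallback was given" >&2
        return 1
    fi
    export ECF_HOST="$fallback"
    echo "ECF_HOST was unset; set to ${ECF_HOST}"
}

# List ECF-related variables, as in the comment (prints nothing if none are set).
printenv | grep '^ECF' || true

# hfe12 is the commenter's guess, not a confirmed value.
ensure_ecf_host "hfe12"
```

An existing ECF_HOST is left untouched, so the helper is harmless on nodes where the module already sets it.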
With my first test, where cpld_control_p8_gnu and cpld_debug_p8_gnu failed, control_wam_debug_gnu actually passed. I'm retesting now with some changes to cmake.gnu . |
|
I used Rocoto and saw those bugs. That means the problem is not specific to ecFlow. |
|
@climbfuji I am not sure about OSC pt2pt issue. I vaguely remember a similar issue was seen with openmpi on Hercules. Do you remember? @RatkoVasic-NOAA @ulmononian Any comment? |
|
@jkbk2004 I'm looking into this right now. I haven't seen this error message before. |
|
@jkbk2004 forcing the run onto nodes 5-12 didn't work; it fails with the same OSC pt2pt error. The GNU.cmake update test timed out, so I am running it again with a manually extended time limit. Update: the cpld_control_p8_gnu test failed with the same error after adding -mcmodel=large and -mcmodel=medium to gnu.cmake. |
As @climbfuji explained, I'm not going to start working on GNU 13 until all packages are working with this version. Though, I will try on my personal space in the meantime. |
|
Do you have to use OpenMPI for this? Can't you use an MPICH derivative instead? |
If you want to use mapl@2.40.3 then you can't use mpich@4 - don't remember when the bug fix in mapl was merged that allows using mpich@4 |
Gnu 12.2 is fine with me. I wasn't sure how complicated it would make things on hera. Important to get this started ASAP. |
mapl@2.42.0 works with mpich@4 - https://github.com/GEOS-ESM/MAPL/releases/tag/v2.42.0 |
We don't have 12.2 on Hera/Rocky. Only 9.2.0 and 13.2.0 (for now). |
How long would it take their SA to install 12.2? Would it be easier to wait for that over trying to get 13.2 working with spack stack? |
That is a good question, meaning: I don't know the answer ;-) I will first try with 13.2.0 and see what we need only for WM. |
@RatkoVasic-NOAA It won't work since mapl doesn't work with 13.2 and you need that for the UFSWM. The last change for gnu@13 for mapl was apparently merged last week, there isn't even a release yet - GEOS-ESM/MAPL#2640. - EDIT this was for mapl@3. I don't know which tag if any of mapl@2 works with gnu@13 - @mathomp4 probably knows. |
At the moment no official release of MAPL 2 works with GCC 13. But, MAPL That said, if needed we could release MAPL 2.45 with those fixes...but note that at the moment MAPL 2.44+ doesn't build in spack. That is due to the Footnotes
|
|
I didn't think we needed to run all systems. No code touches any other system. I thought Hercules was a special case because of the changes to the cpld tests. |
Ok, I was just being safe. We don't have to finish jet and wcoss2/acorn if you don't think it's necessary. But yes, you're correct, only hercules/hera had changes. |
|
@BrianCurtis-NOAA @DeniseWorthen @jkbk2004 testing is complete. Feel free to provide final review. |
|
@aerorahul We moved to rocky8. FYI: we will revisit about the gnu/openmpi issue on rocky8. |
Commit Queue Requirements:
Description:
Hera is switching to a new OS. This update enables ufs-weather-model to run on the Rocky8 OS.
Necessary changes are made to spack-stack libraries.
NOTE! Since a different version of openmpi is used, results change when using the GNU compiler.
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Library changes are included in this PR (spack-stack).
Testing Log: