Upgrade to spack-stack 1.9.2 #1076

Merged
BrianCurtis-NOAA merged 60 commits into ufs-community:develop from DavidHuber-NOAA:feature/ss_192 on Aug 4, 2025

Conversation

@DavidHuber-NOAA
Collaborator

@DavidHuber-NOAA DavidHuber-NOAA commented Jul 8, 2025

DESCRIPTION OF CHANGES:

This upgrades the libraries on all remaining RDHPCS systems to spack-stack 1.9.2 and updates the ip library on WCOSS2. It also restores minimal support for Hera (module files only): just enough to run the global workflow, with no testing capabilities reinstated.

TESTS CONDUCTED:

If there are changes to the build or source code, the tests below must be conducted. Contact a repository manager if you need assistance.

  • Compile branch on all Tier 1 machines using Intel (Orion, Jet, Ursa, Hera, Hercules and WCOSS2). Done using df16365.
  • Compile branch on Ursa using GNU. Done using df16365.
  • Compile branch in 'Debug' mode on WCOSS2. Done using df16365.
  • Compile with Doxygen on any machine with no errors. Done on WCOSS2 using df16365.
  • Run unit tests locally on any Tier 1 machine. Done on WCOSS2 using df16365. All tests passed.
  • Run relevant consistency tests locally on all Tier 1 machines. See below for details.

Details on consistency test results:

  • WCOSS2 - tested hash df16365. chgres_cube and snow2mdl failed. Other tests passed. The failures are explainable. For details, see: here.
  • Jet and Ursa - tested hash df16365. All tests passed as expected. For details, see: here.
  • Hercules and Orion - tested hash df16365. chgres_cube, cpld_gridgen, grid_gen, ocnice_prep and regrid_sfc failed. Other tests passed. The chgres_cube, grid_gen and regrid_sfc tests showed insignificant differences from the baseline. See here. The cpld_gridgen and ocnice_prep differences from the baseline were also insignificant. See here.

Describe any additional tests performed.

  • Build tests on all supported platforms

ISSUE:

Update source code to access required modules for ip v5.

Fixes ufs-community#1064.
@GeorgeGayno-NOAA
Collaborator

Started on orion-login-3 Commit hash: df16365

regrid_sfc consistency tests FAILED
weight_gen consistency tests PASSED
ocnice_prep consistency tests FAILED
cpld_gridgen consistency tests FAILED
chgres_cube consistency tests FAILED
grid_gen consistency tests FAILED
global_cycle consistency tests PASSED
ice_blend consistency tests FAILED
snow2mdl consistency tests PASSED

@DeniseWorthen - can you please check the Orion tests. @BrianCurtis-NOAA - where is your test directory?

@BrianCurtis-NOAA
Collaborator

> Started on orion-login-3 Commit hash: df16365
>
> regrid_sfc consistency tests FAILED
> weight_gen consistency tests PASSED
> ocnice_prep consistency tests FAILED
> cpld_gridgen consistency tests FAILED
> chgres_cube consistency tests FAILED
> grid_gen consistency tests FAILED
> global_cycle consistency tests PASSED
> ice_blend consistency tests FAILED
> snow2mdl consistency tests PASSED
>
> @DeniseWorthen - can you please check the Orion tests. @BrianCurtis-NOAA - where is your test directory?

/work2/noaa/stmp/bcurtis/UFS_UTILS_DH/reg-tests/

@GeorgeGayno-NOAA
Collaborator

GeorgeGayno-NOAA commented Jul 28, 2025

Tested on Hercules.

Commit hash df16365

chgres_cube - FAILED
cpld_gridgen - FAILED
global_cycle - PASSED
grid_gen - FAILED
ice_blend - PASSED
ocnice_prep - FAILED
regrid_sfc - FAILED
snow2mdl - PASSED
weight_gen - PASSED

Output from the chgres_cube, grid_gen and regrid_sfc tests was examined. Differences from the baseline files were insignificant. Examples:

chgres_cube - test 1:

+ nccmp -dmfqS out.atm.tile1.nc /work/noaa/nems/role-nems/ufs_utils.hercules/reg_tests/chgres_cube/baseline_data/c96_fv3_restart/out.atm.tile1.nc
Variable Group Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
liq_wat  /       117 -7.65162e-16 4.47023e-15 -8.88178e-16 4.44089e-16 1.33227e-15 -6.53985e-18 1.15786e-16
ice_wat  /       218 -4.54044e-19 2.05591e-18 -8.67362e-19 4.33681e-19 1.30104e-18 -2.08277e-21 6.88833e-20
rainwat  /         4  3.72966e-17 3.72966e-17  8.67362e-19 2.77556e-17 2.68882e-17  9.32414e-18 1.25767e-17
snowwat  /        34  2.16893e-19 2.68035e-19 -6.77626e-21  2.1684e-19 2.23617e-19  6.37921e-21  3.7344e-20
graupel  /        21  3.05224e-19 3.60652e-19 -1.35525e-20  2.1684e-19 2.30393e-19  1.45345e-20 5.22815e-20
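
Several of the Min/Max values in the table above coincide, to the printed precision, with exact powers of two (for example 8.88178e-16 is about 2^-50 and 4.44089e-16 is about 2^-51), which is the pattern that last-bit rounding differences in double precision tend to produce. The following is a small sketch that checks this for values copied from the table; the helper names `nearest_power_of_two_exponent` and `matches_power_of_two` are made up for illustration and are not part of nccmp or UFS_UTILS.

```python
import math

def nearest_power_of_two_exponent(value: float) -> int:
    """Return the integer k for which 2**k is closest (in log space) to |value|."""
    return round(math.log2(abs(value)))

def matches_power_of_two(value: float, rel_tol: float = 1e-5) -> bool:
    """True if |value| equals a power of two to within the printed precision."""
    k = nearest_power_of_two_exponent(value)
    return math.isclose(abs(value), 2.0 ** k, rel_tol=rel_tol)

# Min/Max values copied from the chgres_cube nccmp table (about 6 significant digits).
samples = {
    "liq_wat min": -8.88178e-16,
    "liq_wat max": 4.44089e-16,
    "ice_wat min": -8.67362e-19,
    "ice_wat max": 4.33681e-19,
    "rainwat max": 2.77556e-17,
    "snowwat max": 2.1684e-19,
}

for name, value in samples.items():
    k = nearest_power_of_two_exponent(value)
    print(f"{name}: {value:.5e} ~ 2**{k}? {matches_power_of_two(value)}")
```

A value that matches a power of two this closely is consistent with a one-unit change in the last bit of a stored number, though this check alone does not prove where the difference came from.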

grid_gen - test 3:

+ /apps/contrib/spack-stack/spack-stack-1.9.2/envs/ue-oneapi-2024.1.0/install/oneapi/2024.2.1/nccmp-1.9.0.1-eifnlmm/bin/nccmp -dmfqS C424_grid.tile7.halo0.nc /work/noaa/nems/role-nems/ufs_utils.hercules/reg_tests/grid_gen/baseline_data/gfdl.regional/C424_grid.tile7.halo0.nc
Variable Group Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
x        /      3999  1.87867e-11 2.85439e-10 -6.25278e-13 7.38964e-13 1.36424e-12  4.69786e-15 9.07429e-14
y        /      9695  3.18519e-11 1.69804e-10 -8.88178e-14 5.25802e-13 6.14619e-13  3.28539e-15  3.5676e-14
dx       /     11643 -9.38307e-08 1.74067e-05 -1.36879e-08  1.3677e-08 2.73649e-08 -8.05898e-12 2.67064e-09
dy       /     11335  3.49901e-08 1.19346e-05  -1.1396e-08 1.13978e-08 2.27938e-08  3.08691e-12 1.46449e-09
area     /      7165     -1.94687     267.947   -0.0721062   0.0721062    0.144212 -0.000271719   0.0380167

+ /apps/contrib/spack-stack/spack-stack-1.9.2/envs/ue-oneapi-2024.1.0/install/oneapi/2024.2.1/nccmp-1.9.0.1-eifnlmm/bin/nccmp -dmfqS C424_oro_data.tile7.halo4.nc /work/noaa/nems/role-nems/ufs_utils.hercules/reg_tests/grid_gen/baseline_data/gfdl.regional/C424_oro_data.tile7.halo4.nc
Variable  Group Count         Sum      AbsSum          Min         Max       Range         Mean      StdDev
land_frac /         4 -0.00111919  0.00132401  -0.00108922 6.68764e-05  0.00115609 -0.000279796 0.000546658
orog_raw  /         5  -0.0135193   0.0890198   -0.0348816   0.0365601   0.0714417  -0.00270386   0.0261376
orog_filt /       171  -0.0132104    0.128978   -0.0157776   0.0138245   0.0296021 -7.72541e-05  0.00214225
stddev    /         8   0.0438357     0.13022   -0.0370731   0.0539932   0.0910664   0.00547947   0.0268153
convexity /         6 -0.00707316   0.0230324    -0.011373   0.0064106   0.0177836  -0.00117886  0.00583655
theta     /         6    0.058197     1.86507    -0.748337    0.947361      1.6957    0.0096995    0.543275
gamma     /         6  -0.0034568  0.00493565  -0.00350338 0.000517428  0.00402081 -0.000576134  0.00147408
sigma     /         6 2.17813e-05 2.44719e-05 -6.82659e-07  1.4198e-05 1.48807e-05  3.63022e-06 5.63749e-06
elvmax    /         6   0.0135239   0.0890015   -0.0365524   0.0348701   0.0714226   0.00225398   0.0234001

regrid_sfc:

+ /apps/contrib/spack-stack/spack-stack-1.9.2/envs/ue-oneapi-2024.1.0/install/oneapi/2024.2.1/nccmp-1.9.0.1-eifnlmm/bin/nccmp -dmfqS sfci.tile6.nc /work/noaa/nems/role-nems/ufs_utils.hercules/reg_tests/regrid_sfc/baseline_data/gauss2fv3incr/sfci.tile6.nc
Variable   Group Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
soilt1_inc /       506 -1.90607e-13 1.60201e-12 -2.44249e-14 2.35367e-14 4.79616e-14 -3.76693e-16 5.11768e-15
soilt2_inc /       509  6.92088e-14 6.34964e-13 -1.44051e-14 1.55986e-14 3.00038e-14   1.3597e-16 2.39103e-15
slc1_inc   /       504 -2.00555e-15 2.51301e-14 -1.17267e-15 8.56086e-16 2.02876e-15 -3.97928e-18 1.13137e-16
slc2_inc   /       509 -4.05034e-16 8.04492e-15  -2.5739e-16 2.70292e-16 5.27681e-16 -7.95745e-19  3.1911e-17

Differences from the baseline on Orion were virtually identical.
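
One way to make the "insignificant" judgment repeatable is to parse the statistics table and flag any variable whose largest absolute difference exceeds a chosen threshold. The sketch below assumes the whitespace-separated column layout shown in the output above; the function names and the threshold values are illustrative assumptions, not part of nccmp or the UFS_UTILS test scripts.

```python
from typing import Dict, List

def parse_nccmp_stats(text: str) -> List[Dict[str, object]]:
    """Parse the whitespace-separated statistics table shown above.

    Assumes the first non-empty line is the header row
    (Variable Group Count Sum AbsSum Min Max Range Mean StdDev).
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        row: Dict[str, object] = {"Variable": fields[0], "Group": fields[1]}
        for name, raw in zip(header[2:], fields[2:]):
            row[name] = int(raw) if name == "Count" else float(raw)
        rows.append(row)
    return rows

def exceeds_tolerance(rows: List[Dict[str, object]], tolerance: float) -> List[str]:
    """Return the variables whose largest absolute difference is above `tolerance`."""
    flagged = []
    for row in rows:
        worst = max(abs(row["Min"]), abs(row["Max"]))
        if worst > tolerance:
            flagged.append(row["Variable"])
    return flagged

# Two rows copied from the regrid_sfc comparison above.
REGRID_SFC = """\
Variable   Group Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
soilt1_inc /       506 -1.90607e-13 1.60201e-12 -2.44249e-14 2.35367e-14 4.79616e-14 -3.76693e-16 5.11768e-15
slc1_inc   /       504 -2.00555e-15 2.51301e-14 -1.17267e-15 8.56086e-16 2.02876e-15 -3.97928e-18 1.13137e-16
"""

rows = parse_nccmp_stats(REGRID_SFC)
# 1e-10 is an arbitrary illustrative threshold, not a project standard.
print(exceeds_tolerance(rows, tolerance=1e-10))
```

An absolute threshold is only meaningful relative to each field's typical magnitude, so a per-variable or relative tolerance would probably be needed before applying something like this to the grid_gen fields.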

@GeorgeGayno-NOAA
Collaborator

Started on orion-login-3 Commit hash: df16365

regrid_sfc consistency tests FAILED
weight_gen consistency tests PASSED
ocnice_prep consistency tests FAILED
cpld_gridgen consistency tests FAILED
chgres_cube consistency tests FAILED
grid_gen consistency tests FAILED
global_cycle consistency tests PASSED
ice_blend consistency tests FAILED
snow2mdl consistency tests PASSED

The ice_blend test failed because the script is not loading the wgrib2 and grib-utils modules correctly. @DavidHuber-NOAA - can you make the same update you did to the snow2mdl script at e376089.

@DavidHuber-NOAA
Collaborator Author

> The ice_blend test failed because the script is not loading the wgrib2 and grib-utils modules correctly. @DavidHuber-NOAA - can you make the same update you did to the snow2mdl script at e376089.

Yes, done at 985f8bd.

@DavidHuber-NOAA
Collaborator Author

@GeorgeGayno-NOAA @BrianCurtis-NOAA is there anything I can do to help move this PR along?

@GeorgeGayno-NOAA
Collaborator

> @GeorgeGayno-NOAA @BrianCurtis-NOAA is there anything I can do to help move this PR along?

Yes, you can prod @DeniseWorthen to take a look at the cpld_gridgen and ocnice_prep results on Orion and Hera :)

@DeniseWorthen
Collaborator

Sorry, I don't know where the notification went for this...into the ether somewhere. I'll take a look, but I suspect they're fine.

@DavidHuber-NOAA
Collaborator Author

Thanks @DeniseWorthen!

@DeniseWorthen
Collaborator

@BrianCurtis-NOAA I see your RT run directory /work2/noaa/stmp/bcurtis/UFS_UTILS_DH/reg-tests/ but where are the log files so I know which files failed comparison?

@GeorgeGayno-NOAA
Collaborator

> @BrianCurtis-NOAA I see your RT run directory /work2/noaa/stmp/bcurtis/UFS_UTILS_DH/reg-tests/ but where are the log files so I know which files failed comparison?

On hercules, you will find them here: /work2/noaa/da/ggayno/save/UFS_UTILS.huber

@DeniseWorthen
Collaborator

> /work2/noaa/da/ggayno/save/UFS_UTILS.huber

I can't see that directory (permission denied).

@BrianCurtis-NOAA
Collaborator

> @BrianCurtis-NOAA I see your RT run directory /work2/noaa/stmp/bcurtis/UFS_UTILS_DH/reg-tests/ but where are the log files so I know which files failed comparison?

/work2/noaa/stmp/bcurtis/UFS_UTILS_DH/UFS_UTILS/reg_tests

@GeorgeGayno-NOAA
Collaborator

> /work2/noaa/da/ggayno/save/UFS_UTILS.huber
>
> I can't see that directory (permission denied).

I copied the log files here: /work2/noaa/stmp/ggayno/denise

@DeniseWorthen
Collaborator

@BrianCurtis-NOAA That was the regression test for which platform? I see only orion llvm logs. For both utilities, the orion RTs should run on /work and the hercules RTs should run on /work2, but your orion llvm log shows you're running on /work2, so I'm a bit confused.

@DavidHuber-NOAA I cloned your branch 985f8bd and tried just running the RTs for cpld_gridgen and ocnice_prep against the current baselines myself. The compile failed (-- Configuring incomplete, errors occurred!). I was able to run the ocnice_prep and I'll check the logs.

My clone is /work/noaa/nems/dworthen/utils_dh/

@DavidHuber-NOAA
Collaborator Author

@DeniseWorthen it looks like you may have launched the RTs at about the same time. They both will try to build in the same directory, which is likely to cause problems. In the cpld_gridgen compile.log, I see the error:

 CMake Error: Failed to change directory: No such file or directory

I suspect this is because the build for the ocnice_prep RT just clobbered it.

Could you try running cpld_gridgen again?
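
The clobbering described above is a general hazard when two jobs build in one shared directory. As a rough sketch of one way to guard against it (not how the UFS_UTILS regression scripts actually work), each launcher could take an exclusive lock before touching the build directory. Everything here is hypothetical: `build_lock`, `run_build`, and the lock-file name are made up for illustration, and `fcntl.flock` is Unix-only and may not be dependable on every parallel or networked filesystem.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def build_lock(lock_path: str):
    """Hold an exclusive advisory lock on `lock_path` for the duration of the block."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until any other holder releases
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)

def run_build(build_dir: str, label: str) -> None:
    """Stand-in for a real compile step: creates and writes into a shared build dir."""
    os.makedirs(build_dir, exist_ok=True)
    with open(os.path.join(build_dir, "built_by.txt"), "w") as marker:
        marker.write(label)

workdir = tempfile.mkdtemp()
shared_build = os.path.join(workdir, "build")
lock_file = os.path.join(workdir, "build.lock")

# In practice these would be two separately launched processes; here they run
# one after the other just to show the calling pattern.
for test_name in ("cpld_gridgen", "ocnice_prep"):
    with build_lock(lock_file):
        run_build(shared_build, test_name)
```

A simpler alternative, if disk space allows, would be to give each test its own build directory so that no coordination is needed at all.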

@DeniseWorthen
Collaborator

@DavidHuber-NOAA You're right, I was trying to run them at the same time. But I was able to see what George got for logs after he copied them, so I'll use those tests to see whether any of the changes are of concern.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jul 31, 2025

For the cpld_gridgen, I checked the runs that George copied for me on hercules (I expect Orion to be similar).

I don't see anything of real concern, given that this PR includes a compiler update. All the field differences I see are quite small; the only thing that really surprises me is that even the SCRIP file created via NCO has slight differences in the area arrays.

For the ocnice_prep, I'm seeing the same issue as when I checked the new Ursa platform. That is, the mapping is producing slightly different fields; for example with CICE where I am basically just doing 'nearest' mapping, I can see that the largest differences must have come from choosing a different src/dst pair. The result is still valid, it just represents a different src point mapped to the destination. I'm not quite sure why the RH creation is proving this "variable". I will try to look into whether I'm missing an optional flag. The resulting warmstarts appear to be just as valid though.

I think the answer changes are fine. Thanks.
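
The nearest-neighbor behavior described above can be reproduced in a toy setting: when a destination point is almost equidistant from two source points, a round-off-level change in its coordinates can switch which source is selected, so the mapped value changes by a large amount even though both results are valid. The sketch below uses flat 2-D coordinates and made-up field values; it is not the mapping code used by ocnice_prep, and `nearest_source_index` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def nearest_source_index(dst: Point, sources: List[Point]) -> int:
    """Return the index of the source point closest to `dst` (Euclidean distance)."""
    distances = [math.dist(dst, src) for src in sources]
    return distances.index(min(distances))

# Two source points almost equidistant from the destination, carrying different values.
sources = [(0.0, 0.0), (2.0, 0.0)]
values = [271.35, 273.15]  # made-up field values at the two source points

dst_a = (1.0 - 1e-12, 0.0)  # destination coordinate as computed by one build
dst_b = (1.0 + 1e-12, 0.0)  # same destination, perturbed at round-off level

idx_a = nearest_source_index(dst_a, sources)
idx_b = nearest_source_index(dst_b, sources)
print(idx_a, values[idx_a])
print(idx_b, values[idx_b])
```

The two destinations differ by only 2e-12, yet they pick different source points, which is consistent with seeing a few comparatively large differences in an otherwise unchanged nearest-neighbor mapping.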

@GeorgeGayno-NOAA
Collaborator

I approve for merging.

@BrianCurtis-NOAA
Collaborator

Looks OK to me.

@GeorgeGayno-NOAA
Collaborator

GeorgeGayno-NOAA commented Aug 1, 2025

Post merge duties:

  • Ensure the role account on Hercules uses the latest version of develop and that the rt.sh script is correct. Brian
  • Ensure the role account on all machines (except Hercules) uses the latest version of develop and that the rt.sh script is correct. George
  • Update the chgres_cube and snow2mdl baseline data on WCOSS2. Stage the updates on HPSS so the baseline can be updated after a 'prod' switch (since I just lost 'devonprod' privileges). George. On HPSS: /NCEPDEV/emc-global/1year/George.Gayno/pr1076
  • Update chgres_cube, cpld_gridgen, grid_gen, ocnice_prep and regrid_sfc baselines on Orion. George
  • Update chgres_cube, cpld_gridgen, grid_gen, ocnice_prep and regrid_sfc baselines on Hercules. Brian

@BrianCurtis-NOAA BrianCurtis-NOAA merged commit 4a982af into ufs-community:develop Aug 4, 2025
4 checks passed
@DavidHuber-NOAA DavidHuber-NOAA deleted the feature/ss_192 branch August 15, 2025 17:10

Development

Successfully merging this pull request may close these issues.

  • Remove direct link to Intel MPI libraries from WCOSS2 modulefile
  • Update to spack-stack v1.9.2

5 participants