
Updates for Gaea C6 following OS upgrade #4110

Merged
DavidHuber-NOAA merged 9 commits into NOAA-EMC:develop from DavidHuber-NOAA:update/c6 on Oct 8, 2025


@DavidHuber-NOAA
Contributor

Description

This PR will hold all updates needed for Gaea C6 following the OS upgrade last week. So far, only one module file needs to be updated. Full testing will start shortly to determine if any other issues need to be addressed.

Resolves #4102

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? YES/NO
  • Does this change require a documentation update? YES/NO
  • Does this change require an update to any of the following submodules? YES/NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

A full test suite will be run on C6.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidHuber-NOAA
Contributor Author

Local testing on C6 is underway.

@DavidHuber-NOAA
Contributor Author

C6 may not be fully stable yet. When running generate_workflows.sh -GESC -b -u, some of the builds failed, but when I launched the builds individually, they ran fine. Perhaps this is a new CPU limit? I will continue testing and open an issue with GFDL.

@JessicaMeixner-NOAA
Contributor

The following change allowed me to run a C1152 forecast:

diff --git a/env/GAEAC6.env b/env/GAEAC6.env
index 4920f156..9634e6d9 100755
--- a/env/GAEAC6.env
+++ b/env/GAEAC6.env
@@ -258,11 +258,10 @@ case ${step} in
 
     export MPICH_COLL_SYNC=MPI_Bcast
     export FI_VERBS_PREFER_XRC=0
-    export FI_CXI_RX_MATCH_MODE=hybrid
+    export FI_CXI_RX_MATCH_MODE=software
     export COMEX_EAGER_THRESHOLD=65536
     export FI_CXI_RDZV_THRESHOLD=65536
     export FI_CXI_DEFAULT_CQ_SIZE=1048576

@DavidHuber-NOAA
Contributor Author

DavidHuber-NOAA commented Oct 6, 2025

Launching the C96C48_ufs_hybatmDA and C96C48mx500_S2SW_cyc_gfs cases on C6. If they are successful, I will launch a full automated suite.

@DavidHuber-NOAA
Contributor Author

@RussTreadon-NOAA It looks like some updates may be required to the workflow for the updated GDASApp hash. The C96C48mx500_S2SW_cyc_gfs case failed at the gdas_marinebmat task with the following error:

2025-10-06 11:39:11,846 - DEBUG    - marine_bmat : ( <pygfs.task.marine_bmat.MarineBMat object at 0x1495811db710> )
Traceback (most recent call last):
  File "/gpfs/f6/drsa-hurr1/scratch/David.Huber/GW/gw_upd_ss/scripts/exglobal_marinebmat.py", line 23, in <module>
    marineBMat.finalize()
  File "/gpfs/f6/drsa-hurr1/scratch/David.Huber/GW/gw_upd_ss/sorc/wxflow/src/wxflow/logger.py", line 252, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/f6/drsa-hurr1/scratch/David.Huber/GW/gw_upd_ss/ush/python/pygfs/task/marine_bmat.py", line 217, in finalize
    os.rename(os.path.join(self.task_config.DATAstaticb, f"ocn.bkgerr_stddev.incr.{self.task_config.MARINE_WINDOW_END_ISO}.nc"),
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/f6/drsa-precip4/world-shared/David.Huber/RUNDIRS/C96C48mx500_S2SW_cyc_gfs_4110/gdas.2021122018/marineanalysis.2021122018/staticb/ocn.bkgerr_stddev.incr.2021-12-20T21:00:00Z.nc' -> '/gpfs/f6/drsa-precip4/world-shared/David.Huber/RUNDIRS/C96C48mx500_S2SW_cyc_gfs_4110/gdas.2021122018/marineanalysis.2021122018/staticb/ocn.bkgerr_stddev.nc'

Looking in the /gpfs/f6/drsa-precip4/world-shared/David.Huber/RUNDIRS/C96C48mx500_S2SW_cyc_gfs_4110/gdas.2021122018/marineanalysis.2021122018/staticb directory, I see these files:

-rw-r--r-- 1 David.Huber drsa-precip4   80968 Oct  6 11:39 hz_ocean.nc
-rw-r--r-- 1 David.Huber drsa-precip4   87165 Oct  6 11:39 ice.bkgerr_ens_stddev.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   87172 Oct  6 11:39 ice.bkgerr_parametric_stddev.incr.2021-12-20T21:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   87156 Oct  6 11:39 ice.ens_mean.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   87169 Oct  6 11:39 ice.ssh_recentering_error.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4 2585100 Oct  6 11:39 ocn.bkgerr_ens_stddev.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4 1565063 Oct  6 11:39 ocn.bkgerr_parametric_stddev.incr.2021-12-20T21:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4 2585091 Oct  6 11:39 ocn.ens_mean.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4 2585104 Oct  6 11:39 ocn.ssh_recentering_error.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   38841 Oct  6 11:39 ocn.ssh_steric_stddev.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   38840 Oct  6 11:39 ocn.ssh_total_stddev.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   38840 Oct  6 11:39 ocn.ssh_unbal_stddev.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4   38849 Oct  6 11:39 ocn.steric_explained_variance.incr.2021-12-20T15:00:00Z.nc
-rw-r--r-- 1 David.Huber drsa-precip4 1048648 Oct  6 11:39 vt_ocean.nc

Is there perhaps a parm file that needs to be updated?
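The traceback shows finalize() in marine_bmat.py trying to rename ocn.bkgerr_stddev.incr.{MARINE_WINDOW_END_ISO}.nc, while the staticb listing only contains the ocn.bkgerr_parametric_stddev.* and ocn.bkgerr_ens_stddev.* variants, so the expected name never exists. As a rough sketch of how such a mismatch could be surfaced more clearly, the hypothetical helpers find_bkgerr_stddev and rename_bkgerr_stddev below (not the actual pygfs code) check for the expected file first and, if it is missing, report which ocean stddev increment files are present.

```python
import os
from pathlib import Path


def find_bkgerr_stddev(staticb_dir: str, window_end_iso: str) -> Path:
    """Hypothetical helper: return the ocean stddev increment file that
    finalize() expects, or fail with a message listing what is present."""
    staticb = Path(staticb_dir)
    expected = staticb / f"ocn.bkgerr_stddev.incr.{window_end_iso}.nc"
    if expected.exists():
        return expected
    # Look for any ocean stddev increment, e.g. the *_parametric_* or
    # *_ens_* variants seen in the staticb listing.
    found = sorted(p.name for p in staticb.glob("ocn.bkgerr_*stddev.incr.*.nc"))
    raise FileNotFoundError(
        f"{expected.name} not found in {staticb}; "
        f"ocean stddev files present: {', '.join(found) or 'none'}"
    )


def rename_bkgerr_stddev(staticb_dir: str, window_end_iso: str) -> Path:
    """Hypothetical helper: move the located file to ocn.bkgerr_stddev.nc."""
    src = find_bkgerr_stddev(staticb_dir, window_end_iso)
    dst = Path(staticb_dir) / "ocn.bkgerr_stddev.nc"
    os.replace(src, dst)  # overwrites dst if it already exists
    return dst
```

Run against a directory like the one listed above, find_bkgerr_stddev would raise FileNotFoundError naming both the ens and parametric files, which points directly at a filename convention mismatch rather than a missing input.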

@RussTreadon-NOAA
Contributor

@DavidHuber-NOAA : Yes, GDASApp develop is ahead of g-w develop. As noted in GDASApp PR #1927, GDASApp requires the changes in g-w PR #4120.

@DavidHuber-NOAA
Contributor Author

Ah, thanks @RussTreadon-NOAA. I'll review that PR and help it move through the process.

@JessicaMeixner-NOAA
Contributor

@DavidHuber-NOAA - Have you run the CI on this?

I'm still having trouble with the marineanlvar task at C1152mx025. I've tried adding more nodes and more MPI tasks. @guillaumevernieres is looking into it too. Curious if you have any suggestions.

@DavidHuber-NOAA
Contributor Author

@JessicaMeixner-NOAA I am running this locally on C6 now.

@DavidHuber-NOAA
Contributor Author

All tests passed on C6 except C96C48mx500_S2SW_cyc_gfs, which is still running. However, all GDAS and EnKF* cycles are complete for that test, and only a few GFS post jobs remain to be run on the last cycle. I believe this PR is ready for review. Marking as passed CI on C6 (cm).

@DavidHuber-NOAA DavidHuber-NOAA added the CI-Gaeac6-Passed (cm) Manual CI passed on Gaea C6 label Oct 8, 2025
@JessicaMeixner-NOAA
Contributor

Awesome! Thank you @DavidHuber-NOAA

I think @guillaumevernieres traced the error I was seeing back to an in situ obs issue.

@DavidHuber-NOAA
Contributor Author

@aerorahul @RussTreadon-NOAA Since this PR bumps the GDASApp hash, it should probably be tested on additional platforms. However, because it restores the use of C6, I would like to expedite it. Can we test develop on Ursa/WCOSS2 after the merge?

@JessicaMeixner-NOAA
Contributor

JessicaMeixner-NOAA left a comment


Thank you @DavidHuber-NOAA

@RussTreadon-NOAA
Contributor

I'm fine with updating the sorc/gdas.cd hash in this PR to gdas.cd @ 811d86b. This hash is two commits behind the current head of GDASApp develop. g-w PR #4080 moves sorc/gdas.cd to the current head of GDASApp develop.

I am currently running g-w CI on Cactus. So far, so good.

@DavidHuber-NOAA
Contributor Author

Thanks @RussTreadon-NOAA. Based on that knowledge, I am going to go ahead and merge.

@DavidHuber-NOAA DavidHuber-NOAA merged commit 0485aa0 into NOAA-EMC:develop Oct 8, 2025
5 checks passed
weihuang-jedi added a commit to NOAA-EPIC/global-workflow-cloud that referenced this pull request Oct 9, 2025
…into feature/adjust_tasks_per_node_layout

* 'develop' of github.com:NOAA-EPIC/global-workflow-cloud:
  Ctest case updates (NOAA-EMC#4118)
  Consolidate load_*_modules scripts into a generic load_modules.sh script (NOAA-EMC#4126)
  Updates for Gaea C6 following OS upgrade (NOAA-EMC#4110)
weihuang-jedi added a commit to NOAA-EPIC/global-workflow-cloud that referenced this pull request Nov 5, 2025
…NOAA-EPIC/global-workflow-cloud into feature/use_container_spack-stack-1.9.2

* 'feature/use_container_spack-stack-1.9.2' of github.com:NOAA-EPIC/global-workflow-cloud:
  remove env/*.container
  testing on AWS
  no need to save to repo, as it is a link
  add PYCMD
  merge develop change in
  Consolidate JEDI-based atmospheric analysis task configuration YAMLs and create new Analysis class (NOAA-EMC#4080)
  Ctest case updates (NOAA-EMC#4118)
  using PYCMD
  fix archive script
  Consolidate load_*_modules scripts into a generic load_modules.sh script (NOAA-EMC#4126)
  Updates for Gaea C6 following OS upgrade (NOAA-EMC#4110)
  combine few scripts to decrease numbers
  reverse to GW repo code, and new way to handle jobs scripts
  Correct parametric and ensemble background error statistics filenames in marine DA (NOAA-EMC#4120)
@DavidHuber-NOAA DavidHuber-NOAA deleted the update/c6 branch November 18, 2025 17:10

Labels

CI-Gaeac6-Passed (cm) Manual CI passed on Gaea C6

Development

Successfully merging this pull request may close these issues.

Missing libs on gaeac6

3 participants