Call err_chk/err_exit for fatal errors in post JJobs/ex-scripts #3571

Merged
KateFriedman-NOAA merged 14 commits into NOAA-EMC:develop from DavidHuber-NOAA:feature/post_err
Apr 16, 2025

@DavidHuber-NOAA DavidHuber-NOAA commented Apr 10, 2025

Description

This PR replaces exit calls with err_chk calls in the post-processing J-Jobs.

Resolves #3511
Resolves #3453
Refs #294
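
As a rough illustration of the change described above, the sketch below shows the general pattern of recording a command's return code in `err` and calling `err_chk` instead of calling `exit` directly. The `err_chk` and `err_exit` functions here are simplified, hypothetical stand-ins written only so the example runs on its own; they are not the operational utilities, and `run_step` is an invented helper name that does not appear in this PR.

```bash
#!/usr/bin/env bash
# Simplified stand-ins (assumptions, not the real utilities) that mimic
# the behaviour a J-Job or ex-script relies on.

err_exit() {
    # Stand-in: print a fatal message and stop with a non-zero status.
    local code="${err:-1}"
    if [[ "${code}" -eq 0 ]]; then
        code=1
    fi
    echo "FATAL ERROR: ${1:-job failed}" >&2
    exit "${code}"
}

err_chk() {
    # Stand-in: return quietly when err is 0, otherwise hand off to err_exit.
    if [[ "${err:-0}" -ne 0 ]]; then
        err_exit "previous step returned err=${err}"
    fi
}

run_step() {
    # Hypothetical helper: run a command, store its status in err, check it.
    "$@"
    export err=$?
    err_chk
}
```

With these definitions, a line such as `some_command || exit 1` would become `some_command; export err=$?; err_chk` (or `run_step some_command` with the helper above), so a failure is reported through one common path rather than through scattered `exit` calls.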

Type of change

  • NCO Bug fix (fixes something broken)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

C48mx500_hybAOWCDA test on Hercules

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added

@DavidHuber-NOAA
Contributor Author

This PR should stay in draft mode until #3570 is merged and it is fully tested on WCOSS2.

@DavidHuber-NOAA DavidHuber-NOAA requested a review from Copilot April 11, 2025 12:09

This comment was marked as resolved.

@DavidHuber-NOAA
Contributor Author

All tests passed on WCOSS2.

@DavidHuber-NOAA
Contributor Author

I spoke too soon. Extending the extended test by another couple of cycles produced a failure in Fit2Obs. I'll look into that.

@DavidHuber-NOAA DavidHuber-NOAA requested a review from Copilot April 11, 2025 15:15

This comment was marked as resolved.

@DavidHuber-NOAA
Contributor Author

The fit2obs issue has been fixed. All tests now pass on WCOSS2. This PR is now ready for review. Marking as such.

@emcbot emcbot removed the CI-Gaeac6-Ready **CM use only** PR is ready for CI testing on Gaea C6 label Apr 11, 2025
@DavidHuber-NOAA DavidHuber-NOAA added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 and removed CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. labels Apr 11, 2025
@emcbot emcbot added CI-Gaeac6-Running CI-Gaeac6-Passed **Bot use only** CI testing on Gaea C6 for this PR has completed successfully and removed CI-Gaeac6-Building **Bot use only** CI testing is cloning/building on Gaea C6 CI-Gaeac6-Running labels Apr 11, 2025

emcbot commented Apr 11, 2025

CI Passed on Gaeac6 in Build# 1
Built and ran in directory /gpfs/f6/drsa-precip3/world-shared/global/CI/3571


Experiment C48_ATM_cf43c327 Completed 1 Cycles: *SUCCESS* at Fri 11 Apr 2025 04:36:10 PM EDT
Experiment C48_S2SW_cf43c327 Completed 1 Cycles: *SUCCESS* at Fri 11 Apr 2025 04:41:49 PM EDT
Experiment C48mx500_hybAOWCDA_cf43c327 Completed 2 Cycles: *SUCCESS* at Fri 11 Apr 2025 04:54:28 PM EDT
Experiment C48mx500_3DVarAOWCDA_cf43c327 Completed 2 Cycles: *SUCCESS* at Fri 11 Apr 2025 05:37:20 PM EDT
Experiment C48_S2SWA_gefs_cf43c327 Completed 1 Cycles: *SUCCESS* at Fri 11 Apr 2025 05:37:22 PM EDT
Experiment C96_atm3DVar_cf43c327 Completed 3 Cycles: *SUCCESS* at Fri 11 Apr 2025 05:42:42 PM EDT
Experiment C96C48_hybatmDA_cf43c327 Completed 3 Cycles: *SUCCESS* at Fri 11 Apr 2025 05:55:35 PM EDT
Experiment C96C48_hybatmaerosnowDA_cf43c327 Completed 3 Cycles: *SUCCESS* at Fri 11 Apr 2025 06:13:55 PM EDT

@aerorahul
Contributor

An inspection of the runs on WCOSS shows that all tests except C96_atm3DVar_extended_post_err have passed.
The C96_atm3DVar_extended_post_err test is still working its way through the fit2obs jobs.
The 2021122212z cycle reports gdas_fit2obs as dead, but the associated log file appears to show success. A rewind and resubmit might be necessary.

@KateFriedman-NOAA KateFriedman-NOAA added CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. and removed CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Apr 15, 2025
@emcbot emcbot added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 CI-Wcoss2-Failed CI testing on WCOSS for this PR has failed and removed CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Apr 15, 2025
@KateFriedman-NOAA KateFriedman-NOAA added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 and removed CI-Wcoss2-Failed CI testing on WCOSS for this PR has failed labels Apr 15, 2025
@emcbot emcbot added CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Apr 15, 2025

emcbot commented Apr 15, 2025

CI Tests set up to run in /lfs/h2/emc/ptmp/emc.global/PR/PR_3571/RUNTESTS on WCOSS

@KateFriedman-NOAA
Contributor

CI testing on WCOSS2 completed successfully:

Wed Apr 16 16:57:00 UTC 2025
******** C48_ATM_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Apr 15 2025 19:47:08    Apr 15 2025 20:55:30

******** C48mx500_3DVarAOWCDA_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241800        Done    Apr 15 2025 19:47:10    Apr 15 2025 20:05:38
202103250000        Done    Apr 15 2025 19:47:10    Apr 15 2025 21:40:56

******** C48mx500_hybAOWCDA_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241800        Done    Apr 15 2025 19:47:12    Apr 15 2025 20:05:43
202103250000        Done    Apr 15 2025 19:47:12    Apr 15 2025 21:11:03

******** C48_S2SW_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Apr 15 2025 19:47:14    Apr 15 2025 21:00:57

******** C48_S2SWA_gefs_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Apr 15 2025 19:47:16    Apr 15 2025 21:30:47

******** C96_atm3DVar_extended_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Apr 15 2025 19:47:19    Apr 15 2025 20:06:02
202112210000        Done    Apr 15 2025 19:47:19    Apr 16 2025 00:16:04
202112210600        Done    Apr 15 2025 19:47:19    Apr 16 2025 01:05:40
202112211200        Done    Apr 15 2025 20:10:48    Apr 16 2025 01:55:51
202112211800        Done    Apr 16 2025 00:20:36    Apr 16 2025 05:05:45

******** C96C48_hybatmaerosnowDA_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Apr 15 2025 19:47:21    Apr 15 2025 20:10:53
202112201800        Done    Apr 15 2025 19:47:21    Apr 15 2025 22:16:07
202112210000        Done    Apr 15 2025 19:47:21    Apr 15 2025 22:06:00

******** C96C48_hybatmDA_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Apr 15 2025 19:47:24    Apr 15 2025 20:06:11
202112210000        Done    Apr 15 2025 19:47:24    Apr 15 2025 21:51:06
202112210600        Done    Apr 15 2025 19:47:24    Apr 15 2025 21:55:49

******** C96mx100_S2S_3571 ********
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
199405010000        Done    Apr 15 2025 19:47:26    Apr 15 2025 21:21:04

@KateFriedman-NOAA KateFriedman-NOAA added CI-Wcoss2-Passed CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress labels Apr 16, 2025
@KateFriedman-NOAA KateFriedman-NOAA merged commit bad16cd into NOAA-EMC:develop Apr 16, 2025
tsga added a commit to tsga/global-workflow that referenced this pull request May 1, 2025
* develop:
  Update GSI hash and GSI fix version to resolve bugs (NOAA-EMC#3626)
  Add missing marine DA files to archiving  (NOAA-EMC#3596)
  Add a low resolution test to mimic GFSv17 cycling as much as possible (NOAA-EMC#3617)
  Add the setting to use the reject list for station t/q observations in GSI based soil DA (NOAA-EMC#3599)
  GitLab CI Framework for schedule PR cases and ctests on multi hosts (NOAA-EMC#3603)
  Avoid parallel restart I/O on WCOSS2 (NOAA-EMC#3615)
  Enables user toggling of GDASApp g-w ctests (NOAA-EMC#3587)
  COM variable updates for prep and some external downstream jobs (NOAA-EMC#3608)
  Remove MOS from system (NOAA-EMC#3612)
  Updates to enable soil DA  (NOAA-EMC#3452)
  Unexport SHELLOPTS when running htar (NOAA-EMC#3601)
  Fix check for netcdf wave restart (NOAA-EMC#3594)
  Call err_chk/err_exit for fatal errors in post JJobs/ex-scripts (NOAA-EMC#3571)
  Remove support for Jet and S4 (NOAA-EMC#3572)
  Hotfix in GitLab pipline for Nightly (env MACHINE breaks build on head node) (NOAA-EMC#3578)
  [hotfix] Missed a path during merging develop (NOAA-EMC#3577)
  Prepare for ops readiness - part 1 (NOAA-EMC#3557)
  Update UFS weather-model to 20250328 hash (NOAA-EMC#3528)
  Fix SFS fcst config (NOAA-EMC#3574)
  Use err_chk in GDAS j-jobs (NOAA-EMC#3570)
  Perform compute builds on Gaea head nodes (NOAA-EMC#3560)
  Add initial capability to produce JEDI-based observation space summary stat files (NOAA-EMC#3471)
  Spread epos over more nodes on Hera to increase allocated memory (NOAA-EMC#3567)
  Create separate gists when multiple files are published on GitHub (NOAA-EMC#3551)
  Use err_chk in GSI J-Jobs and scripts (NOAA-EMC#3549)
  Add unified jinja obs list to marine DA (NOAA-EMC#3530)
  Save snow and aerosol analysis increments (and logs and YAMLs) every cycle (NOAA-EMC#3537)
  Add Dependencies to SFS Cleanup Job (NOAA-EMC#3559)
  Updates archiving to reflect current naming of marine anl output (NOAA-EMC#3541)
  Temporarily disable compute builds on C6 (NOAA-EMC#3558)
  Update gdas.cd hash to resolve msu prod_util failure (NOAA-EMC#3556)
  COMIN/COMOUT updates for enkf chgres and downstream product jobs (NOAA-EMC#3518)
  Call err_chk in forecast scripts for fatal errors (NOAA-EMC#3515)
  Add Rocoto Jobs for the Missing Products of GEFS (NOAA-EMC#3466)
  Download subset fix data with python script (NOAA-EMC#3400)
  Check that partition should be set (NOAA-EMC#3543)
  Rename wave output and refactor some wave scripts to use MPMD, and fix some bugzillas along the way (NOAA-EMC#3517)
  Add support for dual batch partitions on AWS NOAA-EMC#3483
  Update CI build and run directories for GitLab Nightlies on C6 and added GitLab support on Hera (NOAA-EMC#3536)
  Hotfix path for CI in Jenkins on Gaea C6 to it's world-share path (NOAA-EMC#3532)
  Create single ocean grib2 product file (NOAA-EMC#3529)
  Scheduled Nightly CI/CD Pipeline Script in GitLab on Gaea C6 (NOAA-EMC#3493)
  make sure cold starts are handled correctly when DOIAU=YES (issue NOAA-EMC#3516) (NOAA-EMC#3520)
  Add check for DO_AERO_FCST before copying fv_tracer files (NOAA-EMC#3485)
  Use jinja templates instead of `@VARNAME@` in config files (NOAA-EMC#3411)
  Replace "status" (or comparable) with "err" in preparation for moving to err_chk/err_exit (NOAA-EMC#3507)
  Error in Java launch script for CI (NOAA-EMC#3465)
  Delete DATAROOT when running generate_workflows.sh (NOAA-EMC#3504)
  Fix 3244 garbled change (NOAA-EMC#3492)
  Enable ensemble archiving via Globus (NOAA-EMC#3479)
  Update MSU FIX_DIR paths (NOAA-EMC#3488)
  Updates for AOWCDA and hybatmaerosnowDA cases on Gaea C6 (NOAA-EMC#3487)
  Update GOCART path for GDAS/GFS/GCAFS implementations  (NOAA-EMC#3455)
  Make RUN Variables Explicit in `config.resources` (NOAA-EMC#3478)
  Remove unused key from enkfgdas_earc_vrfy (NOAA-EMC#3473)
  Bug fix to the failing early cycle marine DA ensemble re-centering (NOAA-EMC#3454)
  Make marine LETKF optional (NOAA-EMC#3462)
  When sourcing for RUN=enkf*, use CASE_ENS (NOAA-EMC#3475)
  Updates for Gaea: verif-global tag, tracker tag, Fit2Obs tag, and C768 analysis resources (NOAA-EMC#3463)
  Update gefswave glo_025 mesh file with new mask (NOAA-EMC#3457)
  Update MSU glopara paths to new role-global space (NOAA-EMC#3443)
  Enable CI testing on AWS (NOAA-EMC#3459)
  Enable Gaea C5 Jenkins CI (NOAA-EMC#3447)
  Job reference removal from WMO product names (NOAA-EMC#3460)
  Turn off aerosol prognostic radiative feedback for GDAS NOAA-EMC#2926 (NOAA-EMC#3445)
  Add DO_GEMPAK check to postsnd subtask (NOAA-EMC#3451)
  Add a force option to setup_xml to ignore unwritable directories (NOAA-EMC#3448)
  Remove the eomg job (NOAA-EMC#3331)
  Migration to role account for Jenkins on Orion (NOAA-EMC#3440)
  Eliminate `_gfs`, `_gdas`, etc, variables and add necessary if blocks (NOAA-EMC#3420)
  Update workflow staging for sfcanl tiles and waveinit (NOAA-EMC#3429)
  Improve messaging to display clear warning when missing snogrb file (NOAA-EMC#3317)
  JEDI-based ensemble recentering and analysis calculation (NOAA-EMC#3312)
  Enable HPSS archiving on C5/6 (NOAA-EMC#3437)
  Check if HOMEDIR STMP and PTMP are writable (NOAA-EMC#3430)
  Update UFS_Utils and GFS-utils hashes to update Gaea support and ocean/ice post products (NOAA-EMC#3433)
  Enable C1152 forecasts on gaea C6 (NOAA-EMC#3438)
  Migration to role account for Jenkins on Hercules (NOAA-EMC#3423)
  Remove Direct Linking to COM from DATA for `extractvars` Job (NOAA-EMC#3379)
  Enable HPSS via Globus on Hercules and Orion
  Remove job name from product files & update GEMPAK module. (NOAA-EMC#3415)
  `link` instead of `copy` in staging jobs (NOAA-EMC#3410)
  Migrate CI Jenkins to role account on Hera (NOAA-EMC#3414)
  Add rocotorc documentation when using scrontab (NOAA-EMC#3417)
  Update jgdas atmos verfozn and verfrad with COMIN/COMOUT prefix instead of COM (NOAA-EMC#3342)
  Add configuration for empirically-corrected ozone parameters (NOAA-EMC#3386)
  Enable global-workflow to run C768C384 GSI on Gaea-C6 (NOAA-EMC#3412)
  Move logical checks into if blocks (NOAA-EMC#3339)
  Adding Jenkins CI to GaeaC6 using role account (NOAA-EMC#3389)
  Enable GDASApp g-w CI cases to run on wcoss2 (NOAA-EMC#3399)
  CI/CD Test on Gaea C5- And update config.gaea under ci/platform (NOAA-EMC#3280)
  Enable cycling support for Gaea C6 (NOAA-EMC#3323)
  Update enkf archive jobs to use COMIN/COMOUT (NOAA-EMC#3393)
  Copy marine ensemble output observation diags and spread (NOAA-EMC#3407)
  Ci testing on aws 2 (NOAA-EMC#3408)
  Disable METplus jobs on Hera (NOAA-EMC#3403)
  Add the mean EnKF soil increment to the deterministic member (NOAA-EMC#3295)
  Add mpich/8.1.19 to the WCOSS2 LD_LIBRARY_PATH for GDASApp jobs (NOAA-EMC#3396)
  Change order of RUNs (NOAA-EMC#3335)
  CI testing on aws (NOAA-EMC#3391)
  Rename Gulf of Mexico in bufr station list in GFSv17 (NOAA-EMC#3384)
  Enabling AWS CI/testing (NOAA-EMC#3383)
  Update issue templates to use new issue type field (NOAA-EMC#3369)
  Replace WAVECUR_DID variable with "rtofs" (NOAA-EMC#3337)
  Allow for C1152 ATM-Aero cycled DA to run on WCOSS2 (NOAA-EMC#3309)
  Remove Direct Linking to COM from DATA for `wavepostsbs` Job (NOAA-EMC#3303)
  Update jgdas enkf update job with COMIN or COMOUT prefix instead of COM (NOAA-EMC#3333)
  Add capability to run diff resolutions for marine anl and background (NOAA-EMC#3238)
  Update high resolution tests and fix minor wave issues  (NOAA-EMC#3289)
  Add sfs as valid system (NOAA-EMC#3243)
  Add missing arch_tars dependencies (NOAA-EMC#3319)
  Fix the empty aerosol DA aerostat tar file issue (NOAA-EMC#3332)
  Add missing file safeguard for IMS prep in snow analysis tasks (NOAA-EMC#3329)
  Fix memory unsetting on Gaea (NOAA-EMC#3325)
  Fix error log parsing in compute build CI (NOAA-EMC#3301)
  Remove marineanlvrfy task from global-workflow (NOAA-EMC#3314)
  Add `gfs_wavepostpnt` dependencies to gfs_cleanup (NOAA-EMC#3313)
  Increase the GDASApp build wallclock (NOAA-EMC#3298)
  Capture build fail in Jenkins pipeline when no error logs are produced (NOAA-EMC#3297)
  Add/update config files for Gaea and check existence before sourcing config files in generate_workflows.sh (NOAA-EMC#3286)
  Fix ocean restarts when cold starting with DOIAU=YES (NOAA-EMC#3278)
  Splitting up the archive task (NOAA-EMC#3242)
  CTests extended validation for C48_ATM and staged C48_S2SW for gfs_fcst and gfs_atmos (NOAA-EMC#3256)
  Add esnowanl to enkfgfs cycle (NOAA-EMC#3283)
  Add gfs cycles to C48mx500_3DVarAOWCDA (NOAA-EMC#3249)
  Add fetch job and update stage_ic to work with fetched ICs (NOAA-EMC#3141)
  Remove WAFS files and references from `develop` (NOAA-EMC#3263)
  fix intel stack version number on c5 (NOAA-EMC#3258)
  Update gsi_monitor and ufs_utils hashes to recent hashes for C5/C6 build and run (NOAA-EMC#3252)
  Enable DA cycling on gaea C5/C6 (NOAA-EMC#3255)
  Copy post-processed sea ice increment for diagnostics (NOAA-EMC#3235)
@DavidHuber-NOAA DavidHuber-NOAA deleted the feature/post_err branch May 21, 2025 19:32

Labels

CI-Gaeac6-Passed **Bot use only** CI testing on Gaea C6 for this PR has completed successfully CI-Wcoss2-Passed CI testing on WCOSS for this PR has completed successfully

Development

Successfully merging this pull request may close these issues.

  • Invoke err_exit/err_chk in all post-processing jobs
  • Update issue templates to include C5/6

5 participants