Skip to content

GitLab CI Framework for schedule PR cases and ctests on multi hosts#3603

Merged
aerorahul merged 85 commits into
NOAA-EMC:developfrom
TerrenceMcGuinness-NOAA:ci_gitlab_pipelines_dev
Apr 28, 2025
Merged

GitLab CI Framework for schedule PR cases and ctests on multi hosts#3603
aerorahul merged 85 commits into
NOAA-EMC:developfrom
TerrenceMcGuinness-NOAA:ci_gitlab_pipelines_dev

Conversation

@TerrenceMcGuinness-NOAA
Copy link
Copy Markdown
Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA commented Apr 21, 2025

Description

This PR implements a GitLab CI pipeline system to support two testing modes:

  1. Standard full test cases found in $HOMEgfs/dev/ci/cases/pr on a scheduled basis
  2. CTest-based functional tests triggered via GitHub actions found in $HOMEgfs/dev/ctests

While simultaneously supporting Hera and Gaea C6 in an extensible framework.

More detailed information can be found in GitLab README.md file. The primary design principle is to easily add addition modes and cases on a per host basis in a singular and systematic way.

Resolves #3597
Resolves #3499

Some refactoring additions included in this PR inspired by new vertical structure :

  • Replaced all evaluations of HOMEgfs_ to a CI support util (find_homegfs.py)
  • Consolidated ci_utils_wrapper.sh directly into ci_utils.sh to reduce redundancy.
  • Removed unused scripts supporting old BASH CI system
  • Use local variable HOMEgfs_ everywhere instead of HOMEgfs which is defined to be used directly within the runtime infrastructure intrinsically.

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

Runthrough Scheduled launch of pipeline on Hera and Gaea C6 simultaneously:

image

Runthrough a launch on Hera and Gaea C6 by setting $GITHUB_API_TRIGGER == "true"
image

Terry McGUinness and others added 30 commits April 8, 2025 06:32
…ne with a common build in the main caller, now all the per-host-case/ctest stuff in one single but seperate file
…for schecheduled CI runs to work with new virtical structure
…an be minipuplated withing the GitLab controller
… out a public svg image to reflect the status of the scheduled runs as badges on GitHub
Comment thread dev/ci/platforms/config.hera Fixed
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added the CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules label Apr 24, 2025
@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Apr 24, 2025
aerorahul
aerorahul previously approved these changes Apr 24, 2025
@emcbot emcbot added CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Apr 25, 2025
Copy link
Copy Markdown
Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. Thanks Terry!

…and added unit tests for it and for parse_yaml, added a couple of featchers to support bash
@TerrenceMcGuinness-NOAA
Copy link
Copy Markdown
Collaborator Author

TerrenceMcGuinness-NOAA commented Apr 25, 2025

@aerorahul I found a bug in the yaml parser for BASH that inspired me to add a few Unit Tests for that and the util for getting HOMEgfs, so maybe you can take one more look at that before we merge.

rootdir: /home/runner/work/global-workflow/global-workflow/global-workflow/dev/ci/scripts/unittests
plugins: cov-6.1.1
collecting ... collected 18 items
test_create_experiment.py::test_create_experiment PASSED                 [  5%]
test_find_homegfs.py::TestFindHOMEgfs::test_find_homegfs_current_dir PASSED [ 11%]
test_find_homegfs.py::TestFindHOMEgfs::test_find_homegfs_nested_dir PASSED [ 16%]
test_find_homegfs.py::TestFindHOMEgfs::test_find_homegfs_none_start_path PASSED [ 22%]
test_find_homegfs.py::TestFindHOMEgfs::test_find_homegfs_not_found PASSED [ 27%]
test_find_homegfs.py::TestFindHOMEgfs::test_with_path_object PASSED      [ 33%]
test_find_homegfs.py::TestFindHOMEgfs::test_with_string_path PASSED      [ 38%]
test_parse_yaml.py::TestParseYAML::test_cli_basic PASSED                 [ 44%]
test_parse_yaml.py::TestParseYAML::test_cli_default_value PASSED         [ 50%]
test_parse_yaml.py::TestParseYAML::test_cli_fail_on_missing PASSED       [ 55%]
test_parse_yaml.py::TestParseYAML::test_cli_string_option PASSED         [ 61%]
test_rocotostat.py::test_rocoto_statcount PASSED                         [ 66%]
test_rocotostat.py::test_rocoto_summary PASSED                           [ 72%]
test_rocotostat.py::test_rocoto_done PASSED                              [ 77%]
test_rocotostat.py::test_rocoto_stalled PASSED                           [ 83%]
test_setup.py::test_setup_expt PASSED                                    [ 88%]
test_setup.py::test_setup_xml PASSED                                     [ 94%]
test_setup.py::test_setup_xml_fail_config_env_cornercase PASSED          [100%]
- generated xml file: /home/runner/work/global-workflow/global-workflow/global-workflow/dev/ci/scripts/unittests/test-results.xml -
============================= 18 passed in 20.[58](https://github.com/NOAA-EMC/global-workflow/actions/runs/14707763201/job/41272078601?pr=3603#step:8:59)s =============================

Copy link
Copy Markdown
Contributor

@aerorahul aerorahul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@aerorahul aerorahul merged commit 41c65c9 into NOAA-EMC:develop Apr 28, 2025
5 checks passed
@aerorahul aerorahul deleted the ci_gitlab_pipelines_dev branch April 28, 2025 18:58
tsga added a commit to tsga/global-workflow that referenced this pull request May 1, 2025
* develop:
  Update GSI hash and GSI fix version to resolve bugs (NOAA-EMC#3626)
  Add missing marine DA files to archiving  (NOAA-EMC#3596)
  Add a low resolution test to mimic GFSv17 cycling as much as possible (NOAA-EMC#3617)
  Add the setting to use the reject list for station t/q observations in GSI based soil DA (NOAA-EMC#3599)
  GitLab CI Framework for schedule PR cases and ctests on multi hosts (NOAA-EMC#3603)
  Avoid parallel restart I/O on WCOSS2 (NOAA-EMC#3615)
  Enables user toggling of GDASApp g-w ctests (NOAA-EMC#3587)
  COM variable updates for prep and some external downstream jobs (NOAA-EMC#3608)
  Remove MOS from system (NOAA-EMC#3612)
  Updates to enable soil DA  (NOAA-EMC#3452)
  Unexport SHELLOPTS when running htar (NOAA-EMC#3601)
  Fix check for netcdf wave restart (NOAA-EMC#3594)
  Call err_chk/err_exit for fatal errors in post JJobs/ex-scripts (NOAA-EMC#3571)
  Remove support for Jet and S4 (NOAA-EMC#3572)
  Hotfix in GitLab pipline for Nightly (env MACHINE breaks build on head node) (NOAA-EMC#3578)
  [hotfix] Missed a path during merging develop (NOAA-EMC#3577)
  Prepare for ops readiness - part 1 (NOAA-EMC#3557)
  Update UFS weather-model to 20250328 hash (NOAA-EMC#3528)
  Fix SFS fcst config (NOAA-EMC#3574)
  Use err_chk in GDAS j-jobs (NOAA-EMC#3570)
  Perform compute builds on Gaea head nodes (NOAA-EMC#3560)
  Add initial capability to produce JEDI-based observation space summary stat files (NOAA-EMC#3471)
  Spread epos over more nodes on Hera to increase allocated memory (NOAA-EMC#3567)
  Create separate gists when multiple files are published on GitHub (NOAA-EMC#3551)
  Use err_chk in GSI J-Jobs and scripts (NOAA-EMC#3549)
  Add unified jinja obs list to marine DA (NOAA-EMC#3530)
  Save snow and aerosol analysis increments (and logs and YAMLs) every cycle (NOAA-EMC#3537)
  Add Dependencies to SFS Cleanup Job (NOAA-EMC#3559)
  Updates archiving to reflect current naming of marine anl output (NOAA-EMC#3541)
  Temporarily disable compute builds on C6 (NOAA-EMC#3558)
  Update gdas.cd hash to resolve msu prod_util failure (NOAA-EMC#3556)
  COMIN/COMOUT updates for enkf chgres and downstream product jobs (NOAA-EMC#3518)
  Call err_chk in forecast scripts for fatal errors (NOAA-EMC#3515)
  Add Rocoto Jobs for the Missing Products of GEFS (NOAA-EMC#3466)
  Download subset fix data with python script (NOAA-EMC#3400)
  Check that partition should be set (NOAA-EMC#3543)
  Rename wave output and refactor some wave scripts to use MPMD, and fix some bugzillas along the way (NOAA-EMC#3517)
  Add support for dual batch partitions on AWS NOAA-EMC#3483
  Update CI build and run directories for GitLab Nightlies on C6 and added GitLab support on Hera (NOAA-EMC#3536)
  Hotfix path for CI in Jenkins on Gaea C6 to it's world-share path (NOAA-EMC#3532)
  Create single ocean grib2 product file (NOAA-EMC#3529)
  Scheduled Nightly CI/CD Pipeline Script in GitLab on Gaea C6 (NOAA-EMC#3493)
  make sure cold starts are handled correctly when DOIAU=YES (issue NOAA-EMC#3516) (NOAA-EMC#3520)
  Add check for DO_AERO_FCST before copying fv_tracer files (NOAA-EMC#3485)
  Use jinja templates instead of `@VARNAME@` in config files (NOAA-EMC#3411)
  Replace "status" (or comparable) with "err" in preparation for moving to err_chk/err_exit (NOAA-EMC#3507)
  Error in Java launch script for CI (NOAA-EMC#3465)
  Delete DATAROOT when running generate_workflows.sh (NOAA-EMC#3504)
  Fix 3244 garbled change (NOAA-EMC#3492)
  Enable ensemble archiving via Globus (NOAA-EMC#3479)
  Update MSU FIX_DIR paths (NOAA-EMC#3488)
  Updates for AOWCDA and hybatmaerosnowDA cases on Gaea C6 (NOAA-EMC#3487)
  Update GOCART path for GDAS/GFS/GCAFS implementations  (NOAA-EMC#3455)
  Make RUN Variables Explicit in `config.resources` (NOAA-EMC#3478)
  Remove unused key from enkfgdas_earc_vrfy (NOAA-EMC#3473)
  Bug fix to the failing early cycle marine DA ensemble re-centering (NOAA-EMC#3454)
  Make marine LETKF optional (NOAA-EMC#3462)
  When sourcing for RUN=enkf*, use CASE_ENS (NOAA-EMC#3475)
  Updates for Gaea: verif-global tag, tracker tag, Fit2Obs tag, and C768 analysis resources (NOAA-EMC#3463)
  Update gefswave glo_025 mesh file with new mask (NOAA-EMC#3457)
  Update MSU glopara paths to new role-global space (NOAA-EMC#3443)
  Enable CI testing on AWS (NOAA-EMC#3459)
  Enable Gaea C5 Jenkins CI (NOAA-EMC#3447)
  Job reference removal from WMO product names (NOAA-EMC#3460)
  Turn off aerosol prognostic radiative feedback for GDAS NOAA-EMC#2926 (NOAA-EMC#3445)
  Add DO_GEMPAK check to postsnd subtask (NOAA-EMC#3451)
  Add a force option to setup_xml to ignore unwritable directories (NOAA-EMC#3448)
  Remove the eomg job (NOAA-EMC#3331)
  Migration to role account for Jenkins on Orion (NOAA-EMC#3440)
  Eliminate `_gfs`, `_gdas`, etc, variables and add necessary if blocks (NOAA-EMC#3420)
  Update workflow staging for sfcanl tiles and waveinit (NOAA-EMC#3429)
  Improve messaging to display clear warning when missing snogrb file (NOAA-EMC#3317)
  JEDI-based ensemble recentering and analysis calculation (NOAA-EMC#3312)
  Enable HPSS archiving on C5/6 (NOAA-EMC#3437)
  Check if HOMEDIR STMP and PTMP are writable (NOAA-EMC#3430)
  Update UFS_Utils and GFS-utils hashes to update Gaea support and ocean/ice post products (NOAA-EMC#3433)
  Enable C1152 forecasts on gaea C6 (NOAA-EMC#3438)
  Migration to role account for Jenkins on Hercules (NOAA-EMC#3423)
  Remove Direct Linking to COM from DATA for `extractvars` Job (NOAA-EMC#3379)
  Enable HPSS via Globus on Hercules and Orion
  Remove job name from product files & update GEMPAK module. (NOAA-EMC#3415)
  `link` instead of `copy` in staging jobs (NOAA-EMC#3410)
  Migrate CI Jenkins to role account on Hera (NOAA-EMC#3414)
  Add rocotorc documentation when using scrontab (NOAA-EMC#3417)
  Update jgdas atmos verfozn and verfrad with COMIN/COMOUT prefix instead of COM (NOAA-EMC#3342)
  Add configuration for empirically-corrected ozone parameters (NOAA-EMC#3386)
  Enable global-workflow to run C768C384 GSI on Gaea-C6 (NOAA-EMC#3412)
  Move logical checks into if blocks (NOAA-EMC#3339)
  Adding Jenkins CI to GaeaC6 using role account (NOAA-EMC#3389)
  Enable GDASApp g-w CI cases to run on wcoss2 (NOAA-EMC#3399)
  CI/CD Test on Gaea C5- And update config.gaea under ci/platform (NOAA-EMC#3280)
  Enable cycling support for Gaea C6 (NOAA-EMC#3323)
  Update enkf archive jobs to use COMIN/COMOUT (NOAA-EMC#3393)
  Copy marine ensemble output observation diags and spread (NOAA-EMC#3407)
  Ci testing on aws 2 (NOAA-EMC#3408)
  Disable METplus jobs on Hera (NOAA-EMC#3403)
  Add the mean EnKF soil increment to the deterministic member (NOAA-EMC#3295)
  Add mpich/8.1.19 to the WCOSS2 LD_LIBRARY_PATH for GDASApp jobs (NOAA-EMC#3396)
  Change order of RUNs (NOAA-EMC#3335)
  CI testing on aws (NOAA-EMC#3391)
  Rename Gulf of Mexico in bufr station list in GFSv17 (NOAA-EMC#3384)
  Enabling AWS CI/testing (NOAA-EMC#3383)
  Update issue templates to use new issue type field (NOAA-EMC#3369)
  Replace WAVECUR_DID variable with "rtofs" (NOAA-EMC#3337)
  Allow for C1152 ATM-Aero cycled DA to run on WCOSS2 (NOAA-EMC#3309)
  Remove Direct Linking to COM from DATA for `wavepostsbs` Job (NOAA-EMC#3303)
  Update jgdas enkf update job with COMIN or COMOUT prefix instead of COM (NOAA-EMC#3333)
  Add capability to run diff resolutions for marine anl and background (NOAA-EMC#3238)
  Update high resolution tests and fix minor wave issues  (NOAA-EMC#3289)
  Add sfs as valid system (NOAA-EMC#3243)
  Add missing arch_tars dependencies (NOAA-EMC#3319)
  Fix the empty aerosol DA aerostat tar file issue (NOAA-EMC#3332)
  Add missing file safeguard for IMS prep in snow analysis tasks (NOAA-EMC#3329)
  Fix memory unsetting on Gaea (NOAA-EMC#3325)
  Fix error log parsing in compute build CI (NOAA-EMC#3301)
  Remove marineanlvrfy task from global-workflow (NOAA-EMC#3314)
  Add `gfs_wavepostpnt` dependencies to gfs_cleanup (NOAA-EMC#3313)
  Increase the GDASApp build wallclock (NOAA-EMC#3298)
  Capture build fail in Jenkins pipeline when no error logs are produced (NOAA-EMC#3297)
  Add/update config files for Gaea and check existence before sourcing config files in generate_workflows.sh (NOAA-EMC#3286)
  Fix ocean restarts when cold starting with DOIAU=YES (NOAA-EMC#3278)
  Splitting up the archive task (NOAA-EMC#3242)
  CTests extended validation for C48_ATM and staged C48_S2SW for gfs_fcst and gfs_atmos (NOAA-EMC#3256)
  Add esnowanl to enkfgfs cycle (NOAA-EMC#3283)
  Add gfs cycles to C48mx500_3DVarAOWCDA (NOAA-EMC#3249)
  Add fetch job and update stage_ic to work with fetched ICs (NOAA-EMC#3141)
  Remove WAFS files and references from `develop` (NOAA-EMC#3263)
  fix intel stack version number on c5 (NOAA-EMC#3258)
  Update gsi_monitor and ufs_utils hashes to recent hashes for C5/C6 build and run (NOAA-EMC#3252)
  Enable DA cycling on gaea C5/C6 (NOAA-EMC#3255)
  Copy post-processed sea ice increment for diagnostics (NOAA-EMC#3235)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD Issue related to CI/CD CI-Gaeac6-Passed **Bot use only** CI testing on Gaea C6 for this PR has completed successfully CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend scheduled GitLab nightly runs to work on multiple hosts Enable ctests cases on PR bases as a CI/CD process

5 participants