CI Refactoring and STALLED case detection #2488

Merged
WalterKolczynski-NOAA merged 220 commits into NOAA-EMC:develop from TerrenceMcGuinness-NOAA:feature/check_stalled_cases on Apr 20, 2024

Conversation

Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA commented Apr 15, 2024

Description

These updates to the CI framework refactor some of the bash code and add Python tools in order to detect when a CI case has an experiment that cannot advance, for example because a required dependency is missing:

  • Added a separate Python script for checking the status of Rocoto-driven cases and integrated it into the bash CI drivers and Jenkins, so that the state logic lives in one place.

  • Added Python log-publishing utilities to the bash CI drivers as part of refactoring and consolidating functionality.

  • Updated Jenkins behavior while incorporating the above Python code for Rocoto state checking:

    • Polling on the PR works, and is one update away from supporting multiple labels as well.
    • The label updates to FAIL as soon as the first case fails; the remaining cases continue until they complete or are killed by the user.

    Resolves Feature BASH CI detects when a dependacy isn't being met #2008
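
As a rough illustration of the stalled-case idea described above, here is a minimal Python sketch. It is not the script added in this PR: the function name `classify_case`, the state-name sets, and the returned status strings are all hypothetical, chosen only to show how a case with no active tasks and unfinished cycles could be flagged as STALLED.

```python
from collections import Counter

# Assumed task-state names, for illustration only (not taken from the PR's code).
ACTIVE_STATES = {"SUBMITTING", "QUEUED", "RUNNING"}
FAILED_STATES = {"DEAD", "FAILED"}


def classify_case(task_states, all_cycles_done):
    """Return one status string for a CI case (hypothetical helper).

    task_states: iterable of per-task state strings for the experiment.
    all_cycles_done: True when the workflow reports every cycle as complete.
    """
    counts = Counter(task_states)

    # Any failed task fails the whole case.
    if any(counts[state] for state in FAILED_STATES):
        return "FAIL"

    # Every cycle finished and nothing failed.
    if all_cycles_done:
        return "DONE"

    # Work is still queued or running, so the case can still advance.
    if any(counts[state] for state in ACTIVE_STATES):
        return "RUNNING"

    # Nothing failed, nothing active, yet cycles remain: the case cannot
    # advance on its own (e.g. a dependency is never satisfied).
    return "STALLED"
```

In this sketch, a caller such as a bash driver or a Jenkins stage would collect the task states once per polling interval and act on the single returned string, which keeps the state logic in one place.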

Type of change

  • New feature (adds functionality to detect stalled CI cases)
  • Maintenance (code refactor, clean-up, new CI functionalities and behavior)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

BASH cases tested for stall detection and failure-log reporting in the dev bash cron.
Jenkins tested in a development multi-branch project with failing tests and the full end-to-end success path.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

Comment thread ci/scripts/utils/ci_utils.sh Fixed
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Apr 19, 2024
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Apr 19, 2024

emcbot commented Apr 19, 2024

CI Passed Orion at
Built and ran in directory /work2/noaa/stmp/CI/ORION/2488

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. label Apr 19, 2024
@emcbot emcbot added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 and removed CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. labels Apr 19, 2024

emcbot commented Apr 19, 2024

CI Update on Wcoss2 at 04/19/24 09:21:13 PM
============================================
Cloning and Building global-workflow PR: 2488
with PID: 70662 on host: dlogin08

@emcbot emcbot added CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Apr 19, 2024

emcbot commented Apr 19, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Fri Apr 19 21:25:06 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 04/19/24 09:36:26 PM
Case setup: Completed for experiment C48_ATM_07617928
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_07617928
Case setup: Skipped for experiment C48_S2SWA_gefs_07617928
Case setup: Completed for experiment C48_S2SW_07617928
Case setup: Completed for experiment C96_atm3DVar_07617928
Case setup: Skipped for experiment C96_atmaerosnowDA_07617928
Case setup: Completed for experiment C96C48_hybatmDA_07617928
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_07617928


emcbot commented Apr 19, 2024

Experiment C48_ATM_07617928 SUCCESS on Wcoss2 at 04/19/24 10:52:10 PM


emcbot commented Apr 19, 2024

Experiment C96C48_hybatmDA_07617928 SUCCESS on Wcoss2 at 04/19/24 11:52:24 PM


emcbot commented Apr 19, 2024

Experiment C96_atm3DVar_07617928 SUCCESS on Wcoss2 at 04/19/24 11:56:11 PM


emcbot commented Apr 20, 2024

Experiment C48_S2SW_07617928 SUCCESS on Wcoss2 at 04/20/24 12:08:14 AM

@emcbot emcbot added CI-Wcoss2-Passed CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress labels Apr 20, 2024

emcbot commented Apr 20, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_07617928 *** SUCCESS *** at 04/19/24 10:52:10 PM
Experiment C96C48_hybatmDA_07617928 *** SUCCESS *** at 04/19/24 11:52:24 PM
Experiment C96_atm3DVar_07617928 *** SUCCESS *** at 04/19/24 11:56:11 PM
Experiment C48_S2SW_07617928 *** SUCCESS *** at 04/20/24 12:08:14 AM

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 1cfc8e5 into NOAA-EMC:develop Apr 20, 2024
danholdaway added a commit to danholdaway/global-workflow that referenced this pull request Apr 23, 2024
* upstream/develop:
  Add CCPP suite and FASTER option to UFS build (NOAA-EMC#2521)
  New "atmanlfv3inc" Rocoto job (NOAA-EMC#2420)
  Hotfix to disable STALLED in CI as an error (NOAA-EMC#2523)
  Add restart on failure capability for the forecast executable (NOAA-EMC#2510)
  Update parm/transfer list files to match vetted GFSv16 set (NOAA-EMC#2517)
  Update gdas_gsibec_ver to 20240416 (NOAA-EMC#2497)
  Adding more cycles to gempak script gfs_meta_sa2.sh (NOAA-EMC#2518)
  Update gsi_enkf.sh hash to 457510c (NOAA-EMC#2514)
  Enable using the FV3_global_nest_v1 CCPP suite (NOAA-EMC#2512)
  CI Refactoring and STALLED case detection (NOAA-EMC#2488)
  Add C768 and C1152 S2SW test cases (NOAA-EMC#2509)
  Fix paths for refactored prepocnobs task (NOAA-EMC#2504)
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA deleted the feature/check_stalled_cases branch April 30, 2024 16:32

Labels

CI/CD Issue related to CI/CD CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully CI-Wcoss2-Passed CI testing on WCOSS for this PR has completed successfully

Development

Successfully merging this pull request may close these issues.

Feature BASH CI detects when a dependacy isn't being met

4 participants