CI maintenance updates and adding CI Unit Tests #2740

Merged
aerorahul merged 226 commits into NOAA-EMC:develop from TerrenceMcGuinness-NOAA:ci_unit-tests_wxflow_develop
Jul 11, 2024

@TerrenceMcGuinness-NOAA
Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA commented Jun 28, 2024

Description

This PR makes a few maintenance updates to the CI pipeline and adds a test directory with unit tests.

Major Maintenance updates:

  • Added try blocks that report failures to the GitHub PR for:
    • scm checkout
    • build failures (with error logs sent as gists)
    • create-experiment failures (with stderr sent to the GitHub PR)
  • Pre-stage failures from the above are now captured, which allows FINALIZE to update the label to FAIL (i.e. no more "hanging" CI state labels in GitHub; see image below)
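
The control flow described above can be illustrated with a minimal Python sketch. The pipeline code itself is not shown in this description, so everything here is an assumption made for illustration: the function name `run_pre_stages`, the `post_comment` callback, and the "PASS"/"FAIL" return values are hypothetical and are not the pipeline's actual implementation.

```python
from typing import Callable, List, Tuple


def run_pre_stages(stages: List[Tuple[str, Callable[[], None]]],
                   post_comment: Callable[[str], None]) -> str:
    """Run each pre-stage in order; on the first failure, report it and stop.

    Returns the state a final step would use to set the label
    ("PASS" or "FAIL"), so a failure never leaves the label hanging.
    """
    for name, stage in stages:
        try:
            stage()
        except Exception as err:
            # Report the failure back to the PR instead of exiting silently.
            post_comment(f"{name} failed: {err}")
            return "FAIL"
    return "PASS"


# Hypothetical stand-ins for the checkout and build stages.
def checkout() -> None:
    pass


def build() -> None:
    raise RuntimeError("see build logs")


messages: List[str] = []
final_state = run_pre_stages([("Checkout", checkout), ("Build", build)],
                             messages.append)
print(final_state, messages)
```

The point of returning a state rather than raising is that the finalize step always runs and always has something to write to the label.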

Minor Maintenance updates:

  • Fix for STALLED cases revealed by PR 2700 (just needed a lambda specifier)
  • Fixed path to experiment directory in PR message (EXPDIR had been dropped from the path)
  • Added latin-1 decoding when reading log files for publishing
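
As a rough illustration of why a latin-1 decoder helps when reading logs: latin-1 maps every byte value to a character, so decoding cannot raise, whereas UTF-8 rejects stray bytes. The helper name `read_log_text` and the sample bytes below are hypothetical and are not taken from the PR.

```python
def read_log_text(raw: bytes) -> str:
    """Decode raw log bytes without raising on non-UTF-8 content."""
    # latin-1 maps each byte 0-255 to one code point, so this cannot fail.
    return raw.decode("latin-1")


# A lone 0xE9 byte followed by a space is not valid UTF-8.
sample = b"build step \xe9 finished\n"

try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

text = read_log_text(sample)
print(utf8_ok, repr(text))
```

The trade-off is that bytes which were really UTF-8 multi-byte sequences will show up as mojibake, which is usually acceptable for publishing a build log.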

Added Python unit tests for CI functionality:

  • Installed Rocoto and wxflow in the GitHub Runner for testing key CI utility codes
  • Cached the install of Rocoto in the GitHub Runners to greatly reduce setup time for running the unit tests
  • Unit test Python scripts added
    • test_rocotostat.py: rocoto_statcount(), rocoto_summary(), rocoto_stalled()
    • test_setup.py: setup_expt(), test_setup_xml()
    • test_create_experiment.py: test_create_experiment()
      • Runs all PR cases that do not have ICs in the GitHub Runner
    • Reporting mechanism in the Actions tab for Python unit testing results
    • Test case data for STALLED and RUNNING stored on S3 and pulled using wget during runtime of tests
  • Bug fix (fixes something broken)
  • Maintenance (new CI unit tests)
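
To give a feel for what the STALLED and RUNNING test cases exercise, here is a simplified sketch of classifying a workflow from per-task states. It is not the logic of the CI utility under test: the state names, the `count_states` and `classify` helpers, and the rule for STALLED are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical state names used only for this illustration.
ACTIVE_STATES = {"QUEUED", "SUBMITTING", "RUNNING"}


def count_states(task_states: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many tasks are in each state."""
    return dict(Counter(state for _task, state in task_states))


def classify(counts: Dict[str, int]) -> str:
    """Return FAIL, RUNNING, DONE, or STALLED for one snapshot of counts."""
    if counts.get("DEAD", 0) > 0:
        return "FAIL"
    if any(counts.get(state, 0) > 0 for state in ACTIVE_STATES):
        return "RUNNING"
    total = sum(counts.values())
    if total > 0 and counts.get("SUCCEEDED", 0) == total:
        return "DONE"
    # Nothing is active, nothing has failed, yet not everything succeeded.
    return "STALLED"


def test_classify_stalled() -> None:
    counts = count_states([("prep", "SUCCEEDED"), ("fcst", "UNKNOWN")])
    assert classify(counts) == "STALLED"


def test_classify_running() -> None:
    counts = count_states([("prep", "SUCCEEDED"), ("fcst", "RUNNING")])
    assert classify(counts) == "RUNNING"
```

Written this way, the tests can be collected by pytest, and the input snapshots can be swapped for stored test case data.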

How has this been tested?

Unit tests were run in the GitHub Runner using .github/workflows/ci_unit_tests.yaml

test_create_experiment.py::test_create_experiment PASSED                 [ 12%]
test_rocotostat.py::test_rocoto_statcount PASSED                         [ 25%]
test_rocotostat.py::test_rocoto_summary PASSED                           [ 37%]
test_rocotostat.py::test_rocoto_done PASSED                              [ 50%]
test_rocotostat.py::test_rocoto_stalled PASSED                           [ 62%]
test_setup.py::test_setup_expt PASSED                                    [ 75%]
test_setup.py::test_setup_xml PASSED                                     [ 87%]
test_setup.py::test_setup_xml_fail_config_env_cornercase PASSED          [100%]

image (1)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

@TerrenceMcGuinness-NOAA
Collaborator Author

TerrenceMcGuinness-NOAA commented Jul 3, 2024

It did get picked up, but the last integration before I left had a syntax error in the pipeline, so it doesn't run.

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA removed the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Jul 3, 2024
TerrenceMcGuinness-NOAA and others added 2 commits July 3, 2024 16:40
yes

Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Jul 4, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Jul 4, 2024
@emcbot

emcbot commented Jul 4, 2024

Checkout Failed on Hera: Could not perform submodule update

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Jul 4, 2024
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Jul 4, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Jul 4, 2024
@TerrenceMcGuinness-NOAA
Collaborator Author

TerrenceMcGuinness-NOAA commented Jul 4, 2024

Needs the "cluster" update, and a catch fix similar to the build-fail one needs to be applied to Finalize in the pipeline.

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/global/CI/2740/gefs//workflow/create_experiment.py", line 101, in <module>
    setup_xml.main(setup_xml_args)
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/setup_xml.py", line 73, in main
    xml = rocoto_xml_factory.create(f'{net}_{mode}', app_config, rocoto_param_dict)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/wxflow/factory.py", line 71, in create
    return self._builders[key](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/rocoto/gefs_xml.py", line 14, in __init__
    super().__init__(app_config, rocoto_config)
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/rocoto/workflow_xml.py", line 27, in __init__
    task_list = get_wf_tasks(app_config)
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/rocoto/workflow_tasks.py", line 21, in get_wf_tasks
    tasks.append(task_obj.get_task(task_name))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/rocoto/tasks.py", line 257, in get_task
    return getattr(self, task_name, *args, **kwargs)()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/rocoto/gefs_tasks.py", line 72, in stage_ic
    resources = self.get_resource('stage_ic')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/rocoto/tasks.py", line 229, in get_resource
    if task_config['CLUSTERS'] not in ["", '@CLUSTERS@']:
       ~~~~~~~~~~~^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/global/CI/2740/gefs/workflow/wxflow/attrdict.py", line 84, in __missing__
    raise KeyError(name)
KeyError: 'CLUSTERS'
script returned exit code 1
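
The KeyError above comes from indexing a missing key. As a general illustration, and not necessarily the fix applied in this PR, a lookup with a default avoids the exception. The helper name `uses_clusters` is hypothetical, and a plain dict stands in for the configuration object.

```python
def uses_clusters(task_config: dict) -> bool:
    """Return True when a real CLUSTERS value is configured."""
    # .get() returns the default instead of raising KeyError when the key is absent.
    return task_config.get("CLUSTERS", "") not in ["", "@CLUSTERS@"]


print(uses_clusters({}))                            # key missing
print(uses_clusters({"CLUSTERS": "@CLUSTERS@"}))    # unfilled template placeholder
print(uses_clusters({"CLUSTERS": "some_cluster"}))  # hypothetical real value
```

Whether a silent default or a loud failure is preferable depends on whether a missing CLUSTERS entry is a valid configuration for the host.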

@DavidHuber-NOAA
Contributor

@TerrenceMcGuinness-NOAA Is this ready to be re-reviewed and tested?

@emcbot

emcbot commented Jul 11, 2024

Build FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2740/gefs/sorc/logs/build_ww3prepost.log

Follow link here to view the contents of the above file(s): (link)

@emcbot

emcbot commented Jul 11, 2024

Checkout Failed on Orion: Error cloning remote repo 'origin'

1 similar comment
@emcbot

emcbot commented Jul 11, 2024

Checkout Failed on Orion: Error cloning remote repo 'origin'

@emcbot

emcbot commented Jul 11, 2024

CI Passed Hera at
Built and ran in directory /scratch1/NCEPDEV/global/CI/2740__2


Experiment C48_ATM_6c74ed7b Completed 1 Cycles: *SUCCESS* at Thu Jul 11 03:09:41 UTC 2024
Experiment C48mx500_3DVarAOWCDA_6c74ed7b Completed 2 Cycles: *SUCCESS* at Thu Jul 11 03:21:50 UTC 2024
Experiment C96C48_hybatmDA_6c74ed7b Completed 3 Cycles: *SUCCESS* at Thu Jul 11 04:17:14 UTC 2024
Experiment C96_atm3DVar_6c74ed7b Completed 3 Cycles: *SUCCESS* at Thu Jul 11 04:23:16 UTC 2024
Experiment C48_S2SWA_gefs_6c74ed7b Completed 1 Cycles: *SUCCESS* at Thu Jul 11 04:34:51 UTC 2024
Experiment C48_S2SW_6c74ed7b Completed 1 Cycles: *SUCCESS* at Thu Jul 11 05:01:01 UTC 2024
Experiment C96_atmaerosnowDA_6c74ed7b Completed 3 Cycles: *SUCCESS* at Thu Jul 11 05:11:52 UTC 2024

Contributor

@aerorahul aerorahul left a comment

looks good.
A self-test should be run on develop ASAP after this is merged.

Labels

CI/CD Issue related to CI/CD CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed

4 participants