
[develop] Make CIs test NCO mode properly.#418

Merged
danielabdi-noaa merged 1 commit into
ufs-community:develop from
danielabdi-noaa:feature/nco_cis
Oct 30, 2022

@danielabdi-noaa
Collaborator

@danielabdi-noaa danielabdi-noaa commented Oct 14, 2022

DESCRIPTION OF CHANGES:

This PR addresses issue #416 through the alternative means described there.

  • Both Jenkins and GitHub Actions should now be able to test NCO test cases properly.
  • Symlinks to the log files stored under the NCO operations directory are created under EXPTDIR/log, making "nco" mode indistinguishable from "community" mode as far as the CIs are concerned.
  • Having symlinks to the log files under the experiment directory is also convenient, since the user would otherwise need to know the workflow ID.
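
The symlink approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `link_nco_logs` and both directory arguments are hypothetical, and the real workflow may locate and name its log files differently.

```python
from pathlib import Path


def link_nco_logs(nco_log_dir, expt_log_dir):
    """Symlink every regular file in nco_log_dir into expt_log_dir.

    Both paths are hypothetical stand-ins for the NCO operations log
    directory and EXPTDIR/log. Returns the list of links created.
    """
    src = Path(nco_log_dir)
    dst = Path(expt_log_dir)
    dst.mkdir(parents=True, exist_ok=True)

    linked = []
    for log_file in sorted(src.iterdir()):
        if not log_file.is_file():
            continue
        link = dst / log_file.name
        if link.is_symlink():
            # Replace a stale link so that reruns do not fail.
            link.unlink()
        elif link.exists():
            # Never clobber a real file that already lives in EXPTDIR/log.
            continue
        link.symlink_to(log_file.resolve())
        linked.append(link)
    return linked
```

With links like these in place, a CI script that only inspects EXPTDIR/log would see the same layout in both modes.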

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

Ran one test on Hera and Jet successfully in NCO mode and confirmed that EXPTDIR/log is populated with log files.
Testing on one system should be enough for this PR.

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

None

DOCUMENTATION:

None required

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@danielabdi-noaa danielabdi-noaa changed the title from "Make CIs test NCO mode properly." to "[develop] Make CIs test NCO mode properly." Oct 14, 2022
@danielabdi-noaa danielabdi-noaa added the labels ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) and ci-jet-intel-WE (Kicks off automated workflow test on jet with intel) Oct 14, 2022
@venitahagerty venitahagerty removed the labels ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) and ci-jet-intel-WE (Kicks off automated workflow test on jet with intel) Oct 14, 2022
@venitahagerty
Collaborator

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1087695356/20221014172009/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 9 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE

@venitahagerty
Collaborator

venitahagerty commented Oct 14, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1087695356/20221014172018/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 9 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR

Collaborator

@MichaelLueken MichaelLueken left a comment


A test on Hera has shown that the "nco" fundamental test is populating the expt_dir/log directory. This is the expected behavior for this PR, so I give my approval.

@MichaelLueken MichaelLueken added the label run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests) Oct 14, 2022
@danielabdi-noaa
Collaborator Author

I believe this PR passed the fundamental tests on all systems except Orion, where one test case failed.
The failure looks like a wall-clock time limit error in make_lbcs, which is unrelated to this PR: the failing test case runs in community mode.

slurmstepd: error: *** JOB 7217653 ON Orion-16-71 CANCELLED AT 2022-10-14T17:11:36 DUE TO TIME LIMIT *** 
slurmstepd: error: *** JOB 7217653 STEPD TERMINATED ON Orion-16-71 AT 2022-10-14T17:15:37 DUE TO JOB NOT ENDING WITH SIGNALS *** 
slurmstepd: error: Unable to destroy container 315627 in cgroup plugin, giving up after 255 sec
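
A time-limit cancellation like the one above can be told apart from other failures by scanning the job log. The sketch below is only an illustration: the helper `find_time_limit_cancellations` is hypothetical and not part of this PR or the CI scripts, and the pattern assumes the slurmstepd message format shown in the lines above.

```python
import re

# Matches slurmstepd lines of the form shown above, e.g.
# "*** JOB <id> ON <node> CANCELLED AT <timestamp> DUE TO TIME LIMIT ***"
TIME_LIMIT_RE = re.compile(
    r"\*\*\* JOB (?P<job_id>\d+) ON (?P<node>\S+) CANCELLED AT "
    r"(?P<when>\S+) DUE TO TIME LIMIT \*\*\*"
)


def find_time_limit_cancellations(log_text):
    """Return one dict (job_id, node, when) per time-limit cancellation."""
    return [m.groupdict() for m in TIME_LIMIT_RE.finditer(log_text)]
```

An empty result would suggest the job failed for some other reason and needs a closer look.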

@MichaelLueken
Collaborator

@danielabdi-noaa It appears as though a condition occurred on Orion on Friday that required immediate maintenance. A rerun of the Jenkins pipeline for Orion has successfully passed. All Jenkins-based fundamental tests have now successfully passed. Once a second approval is given, this work will be ready to be merged.

@danielabdi-noaa
Collaborator Author

Can someone review this PR? It is a critical piece for enabling NCO mode in Jenkins, and we have already had multiple issues with NCO mode being broken.

Collaborator

@JeffBeck-NOAA JeffBeck-NOAA left a comment


Looks good!

@danielabdi-noaa
Collaborator Author

@JeffBeck-NOAA Thanks for the review!

@danielabdi-noaa danielabdi-noaa merged commit ce024c4 into ufs-community:develop Oct 30, 2022
danielabdi-noaa added a commit to danielabdi-noaa/ufs-srweather-app that referenced this pull request Oct 30, 2022

Labels

enhancement New feature or request Priority: medium run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests


4 participants