
Enable regression tests on Acorn#1307

Merged
jkbk2004 merged 23 commits into ufs-community:develop from DusanJovic-NOAA:rt_on_acorn
Jul 27, 2022

Conversation

@DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA DusanJovic-NOAA commented Jul 1, 2022

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

This PR updates rt scripts and adds configuration files required to run regression tests on Acorn.

Several tests need to be disabled due to various issues:

datm_cdeps_debug_cfsr, cpld_debug_p8 and cpld_debug_noaero_p8 crash with the following runtime error:

+ mpiexec -n 40 -ppn 40 -depth 1 ./fv3.exe
forrtl: error (182): floating invalid - possible uninitialized real/complex variable.
Image              PC                Routine            Line        Source            
fv3.exe            0000000008BB480B  Unknown               Unknown  Unknown
libpthread-2.31.s  000015198DCAF8C0  Unknown               Unknown  Unknown
fv3.exe            00000000061969D3  ice_comp_nuopc_mp         186  ice_comp_nuopc.F90
fv3.exe            0000000001C4A176  _ZN5ESMCI6FTable1        2167  ESMCI_FTable.C
fv3.exe            0000000001C4DDEA  ESMCI_FTableCallE         824  ESMCI_FTable.C
fv3.exe            000000000235178F  _ZN5ESMCI3VMK5ent        2308  ESMCI_VMKernel.C
fv3.exe            000000000233B139  _ZN5ESMCI2VM5ente        1216  ESMCI_VM.C
fv3.exe            0000000001C4B5F7  c_esmc_ftablecall         981  ESMCI_FTable.C
fv3.exe            000000000148FD6F  esmf_compmod_mp_e        1222  ESMF_Comp.F90

fv3_regional_netcdf_parallel fails while comparing baselines:

Traceback (most recent call last):
  File "/lfs/h1/emc/eib/noscrub/dusan.jovic/ufs/rt_on_acorn/ufs-weather-model/tests/compare_ncfile.py", line 6, in <module>
    with Dataset(sys.argv[1]) as nc1, Dataset(sys.argv[2]) as nc2:
  File "src/netCDF4/_netCDF4.pyx", line 2330, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 1948, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'/lfs/h1/emc/nems/noscrub/emc.nems/RT/NEMSfv3gfs/develop-20220623/INTEL/fv3_regional_netcdf_parallel/dynf000.nc'
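
A quick way to narrow down an "HDF error" like the one above is to check whether the baseline file even starts with a recognizable netCDF signature, which separates an empty or overwritten file from a problem deeper in the HDF5 structure. The sketch below is not part of this PR: `sniff_format` is a hypothetical helper that uses only the Python standard library and relies on two well-known facts, namely that HDF5 (and so netCDF-4) files begin with an 8-byte signature and that classic netCDF files begin with `CDF` plus a version byte. It only inspects offset 0, so a file can pass this check and still be damaged further in.

```python
# Hypothetical helper, not part of tests/compare_ncfile.py.

# HDF5 files (and therefore netCDF-4 files) begin with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Classic netCDF files begin with "CDF" followed by a version byte (1, 2 or 5).
CLASSIC_PREFIX = b"CDF"
CLASSIC_VERSIONS = (b"\x01", b"\x02", b"\x05")


def sniff_format(path):
    """Return 'hdf5', 'classic', 'empty' or 'unknown' from the first 8 bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if not head:
        return "empty"
    if head == HDF5_SIGNATURE:
        return "hdf5"
    if head[:3] == CLASSIC_PREFIX and head[3:4] in CLASSIC_VERSIONS:
        return "classic"
    return "unknown"
```

Calling `sniff_format` on the `dynf000.nc` path from the traceback would return `'empty'` or `'unknown'` if the baseline file was truncated at the start or overwritten. A result of `'hdf5'` would mean the header is intact and the damage, if any, lies elsewhere in the file.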

cpld_bmark_p8 and cpld_restart_bmark_p8 require 11 nodes and just wait in the queue for hours.
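
For reference, the node count of a job follows from its MPI task count and the number of tasks placed on each node (the `-ppn` value in the `mpiexec` line shown earlier). The sketch below shows that arithmetic only. `nodes_needed` is a hypothetical helper, and the actual task counts and per-node layout of cpld_bmark_p8 are not given in this PR, so the numbers in the comments are illustrative.

```python
import math


def nodes_needed(mpi_tasks, tasks_per_node):
    """Smallest number of nodes that can hold mpi_tasks at tasks_per_node each."""
    if mpi_tasks <= 0 or tasks_per_node <= 0:
        raise ValueError("mpi_tasks and tasks_per_node must be positive")
    return math.ceil(mpi_tasks / tasks_per_node)


# The mpiexec line quoted above runs 40 tasks at 40 per node, i.e. one node.
single_node = nodes_needed(40, 40)

# Illustrative only: a hypothetical 440-task job at 40 tasks per node.
many_nodes = nodes_needed(440, 40)
```

Larger requests generally have to wait until enough nodes are free at the same time, which is consistent with the queue waits reported for the two benchmark tests.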

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

  • fixes #<issue_number>
  • fixes noaa-emc/fv3atm/issues/<issue_number>

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • acorn.intel
  • opnReqTest for newly added/changed feature
  • CI

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first?
If so add the "waiting for other repos" label and list the upstream PRs

  • waiting on noaa-emc/nems/pull/<pr_number>
  • waiting on noaa-emc/fv3atm/pull/<pr_number>

Comment thread modulefiles/ufs_acorn.intel
Comment thread modulefiles/ufs_acorn.intel
Comment thread tests/rt.conf Outdated
@jkbk2004 jkbk2004 mentioned this pull request Jul 26, 2022
16 tasks
@jkbk2004
Collaborator

@DusanJovic-NOAA we can start working on this PR. @DeniseWorthen @DavidHuber-NOAA Can we combine #1299 and #1318 into this PR? With no baseline change, we can make quick progress on tests.

@DavidHuber-NOAA
Collaborator

@jkbk2004 Yes, I am OK with combining #1318 with this PR.

@DusanJovic-NOAA
Collaborator Author

My branch is up to date with develop. @DavidHuber-NOAA @DeniseWorthen please open PRs to my branch.

DusanJovic-NOAA and others added 3 commits July 26, 2022 15:21
* update CMEPS
* add fields needed by cmeps for wave-ice coupling
* remove blank lines from ciceC test
@DusanJovic-NOAA
Collaborator Author

#1299 and #1318 are merged into this branch. Ready for regression testing.

@DusanJovic-NOAA DusanJovic-NOAA added the No Baseline Change label Jul 26, 2022
@jkbk2004
Collaborator

#1299 and #1318 are merged into this branch. Ready for regression testing.

@DusanJovic-NOAA thanks! I will start with Cheyenne. Hopefully I will catch up on Orion and Jet before they start maintenance.

on-behalf-of @ufs-community <brian.curtis@noaa.gov>
@BrianCurtis-NOAA
Collaborator

@DusanJovic-NOAA Line 197 in rt.conf is the HAFS-ALL compile. Please block wcoss2.intel for that compile and the tests that follow it. We have to wait for NCO to install pio/2.5.3 on WCOSS2 before those are run.

BrianCurtis-NOAA and others added 4 commits July 26, 2022 19:16
@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: RT
[RT] Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/985259239/20220726190007/ufs-weather-model
[RT] Error: Test cpld_control_c192_p8 007 failed in run_test failed
[RT] Error: Test compile_011 failed in run_compile failed
Please make changes and add the following label back: gaea-intel-RT

@BrianCurtis-NOAA
Collaborator

@DusanJovic-NOAA Could you edit the PR template to include Acorn.intel ?

@DusanJovic-NOAA
Collaborator Author

@DusanJovic-NOAA Could you edit the PR template to include Acorn.intel ?

I already added acorn.intel to the PR template. See 1482143

@DeniseWorthen
Collaborator

CMEPS PR has been merged. New hash is f9f7541

@jkbk2004 jkbk2004 requested a review from binli2337 July 27, 2022 13:34
@jkbk2004
Collaborator

@DusanJovic-NOAA can you resolve the conversations? @binli2337 @junwang-noaa it looks like the PR is ready to merge. Can you take a look and approve?

@DavidHuber-NOAA
Collaborator

All regression tests passed on S4.

@jkbk2004 jkbk2004 merged commit 67eef06 into ufs-community:develop Jul 27, 2022
@DusanJovic-NOAA DusanJovic-NOAA deleted the rt_on_acorn branch July 27, 2022 15:39

Labels

No Baseline Change


6 participants