Reenable Orion Cycling Support #2877

Merged
WalterKolczynski-NOAA merged 29 commits into
NOAA-EMC:develop from
DavidHuber-NOAA:feature/orion_upp_update
Sep 7, 2024

Conversation

@DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Aug 29, 2024

Description

This updates the model hash to include the UPP update needed to be able to run the post processor on Orion, thus reenabling support on that system.

A note on the UPP: it is using a newer version of g2tmpl that requires a separate spack-stack 1.6.0 installation. This version of g2tmpl will be standard in spack-stack 1.8.0, but for now requires loading separate modules for the UPP.

A note on running analyses on Orion: due to a yet-unknown issue causing the BUFR library to run much slower on Orion when compared with Rocky 8, the GSI and GDASApp are expected to run significantly slower than on any other platform (on the order of an hour longer).

Lastly, I made adjustments to the build_all.sh script to send more cores to compiling the UFS and GDASApp. Under this configuration, the GSI, UPP, UFS_Utils, and WW3 pre/post executables finish compiling before the UFS when run with 20 cores.

Resolves #2694
Resolves #2851

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? YES (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

  • Build on Orion
  • C48_ATM on Orion
  • C48_S2SW on Orion
  • C96C48_hybatmDA on Orion

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes

Comment thread sorc/link_workflow.sh
Comment thread jobs/rocoto/upp.sh Outdated
@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA

The spack-stack environment used by the UPP (upp-addon) does not include xarray and uses older versions of numpy (v1.22.3) and jinja2 (v3.0.3) than the gsi-addon environment. To make this work, I had to add a kludge to ush/python/pygfs/__init__.py so that the marine tasks' import of xarray does not cause the UPP to fail. This can be removed once we get to spack-stack v1.8.0.

With these changes, the gdasatmanlupp and gfsatmanlupp jobs ran to completion. Wen, would you be able to look at the output files and verify they are OK? You can find a pair of examples here:

/work/noaa/global/dhuber/para/orion/COMROOT/c96_4denvar/gdas.20211221/00/model/atmos/master/gdas.t00z.master.grb2anl
/work/noaa/global/dhuber/para/orion/COMROOT/c96_4denvar/gfs.20211221/00/model/atmos/master/gfs.t00z.master.grb2anl
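
One way such a guard could look is sketched below. This is not the code from the PR: the helper name `optional_import` is hypothetical, and the sketch only assumes the goal described above, namely that a missing xarray should not break tasks that never use it.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ImportError also covers ModuleNotFoundError; the caller decides
        # how to proceed when the module is unavailable.
        return None


# Hypothetical use in a package __init__.py: tasks that need xarray would
# only be exposed when the environment actually provides it.
xr = optional_import("xarray")
HAS_XARRAY = xr is not None
```

Code that depends on xarray can then check `HAS_XARRAY` before importing the marine tasks, so environments such as upp-addon are left unaffected.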

@WenMeng-NOAA
Contributor

@DavidHuber-NOAA Can you change the access permission of /work/noaa/global/dhuber/para/orion/COMROOT/c96_4denvar/gdas.20211221/00/model/atmos/master/?

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA Yes, done.

Comment thread sorc/link_workflow.sh
@WenMeng-NOAA
Contributor

@DavidHuber-NOAA Somehow, the 6 aerosol fields (ATOK) are missing in both gfs and gdas master files. To output them from UPP, itag should be set as:

&nampgb
  kpo = 57,
  po = 1000.0, 975.0, 950.0, 925.0, 900.0, 875.0, 850.0, 825.0, 800.0, 775.0, 750.0, 725.0, 700.0, 675.0, 650.0, 625.0, 600.0, 575.0, 550.0, 525.0, 500.0, 475.0, 450.0, 425.0, 400.0, 375.0, 350.0, 325.0, 300.0, 275.0, 250.0, 225.0, 200.0, 175.0, 150.0, 125.0, 100.0, 70.0, 50.0, 40.0, 30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.7, 0.4, 0.2, 0.1, 0.07, 0.04, 0.02, 0.01,
  rdaod = .true.
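
As a quick sanity check on the values above, kpo should equal the number of entries in po. A minimal Python sketch follows; the function name is hypothetical, and the values are copied from the namelist.

```python
def levels_consistent(kpo, po):
    """Check that the declared level count matches the level list."""
    return kpo == len(po)


# Pressure levels (hPa) copied from the nampgb namelist above.
po = [
    1000.0, 975.0, 950.0, 925.0, 900.0, 875.0, 850.0, 825.0, 800.0, 775.0,
    750.0, 725.0, 700.0, 675.0, 650.0, 625.0, 600.0, 575.0, 550.0, 525.0,
    500.0, 475.0, 450.0, 425.0, 400.0, 375.0, 350.0, 325.0, 300.0, 275.0,
    250.0, 225.0, 200.0, 175.0, 150.0, 125.0, 100.0, 70.0, 50.0, 40.0,
    30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.7,
    0.4, 0.2, 0.1, 0.07, 0.04, 0.02, 0.01,
]
kpo = 57

namelist_ok = levels_consistent(kpo, po)
```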

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA Here are the contents of the nampgb namelist used to generate the master file:

&nampgb
  kpo = 57,
  po = 1000.0, 975.0, 950.0, 925.0, 900.0, 875.0, 850.0, 825.0, 800.0, 775.0, 750.0, 725.0, 700.0, 675.0, 650.0, 625.0, 600.0, 575.0, 550.0, 525.0, 500.0, 475.0, 450.0, 425.0, 400.0, 375.0, 350.0, 325.0, 300.0, 275.0, 250.0, 225.0, 200.0, 175.0, 150.0, 125.0, 100.0, 70.0, 50.0, 40.0, 30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.0, 0.7, 0.4, 0.2, 0.1, 0.07, 0.04, 0.02, 0.01,
  rdaod = .true.

These appear to be the same. The experiment that I was running was ATM-only, so that may be why the aerosol fields were not present. I will try running the C96C48_hybatmaerosnowDA test case and let you know when I have an analysis master grib2 file ready.

@DavidHuber-NOAA
Contributor Author

@WenMeng-NOAA I have finished running the C96C48_hybatmaerosnowDA test case. The analysis grib2 file can be found here: /work/noaa/global/dhuber/para/orion/COMROOT/hybaerosnow/gdas.20211221/00/model/atmos/master/gdas.t00z.master.grb2anl. Running wgrib2 | grep ATOK returned no entries. Is this a feature that requires a new g2 or g2tmpl module? Or is there perhaps a change in another namelist elsewhere?
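
The same check can be scripted. The sketch below filters a wgrib2 inventory by variable name; it assumes the default inventory layout, in which the fourth colon-separated field holds the variable name. The function name and the sample inventory lines are illustrative and are not taken from the file above.

```python
def find_fields(inventory_text, name):
    """Return inventory lines whose variable-name field equals `name`."""
    matches = []
    for line in inventory_text.splitlines():
        fields = line.split(":")
        # Default wgrib2 inventory: record:offset:date:variable:level:...
        if len(fields) > 3 and fields[3] == name:
            matches.append(line)
    return matches


# Illustrative inventory text, not real output from the master file.
sample = (
    "1:0:d=2021122100:PRMSL:mean sea level:anl:\n"
    "2:233889:d=2021122100:TMP:2 m above ground:anl:\n"
)
```

The inventory text itself could be captured with `subprocess.run(["wgrib2", path], capture_output=True, text=True).stdout`. Matching on the exact field avoids the false positives a plain substring grep can produce.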

aerorahul and others added 4 commits September 5, 2024 14:36
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
fixes for UPP and compression in ufswm
@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion label Sep 5, 2024
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion labels Sep 5, 2024
@aerorahul aerorahul added the CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. label Sep 5, 2024
@emcbot emcbot added CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 and removed CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Wcoss2-Ready PR is ready for CI testing on WCOSS2. labels Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

CI Update on Wcoss2 at 09/05/24 08:36:54 PM
============================================
Cloning and Building global-workflow PR: 2877
with PID: 57410 on host: dlogin03

@emcbot emcbot added CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Thu Sep  5 20:39:48 UTC 2024 on dlogin03
---------------------------------------------------
Build: Completed at 09/05/24 09:21:14 PM
Case setup: Completed for experiment C48_ATM_c8710fe9
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_c8710fe9
Case setup: Skipped for experiment C48_S2SWA_gefs_c8710fe9
Case setup: Completed for experiment C48_S2SW_c8710fe9
Case setup: Completed for experiment C96_atm3DVar_extended_c8710fe9
Case setup: Skipped for experiment C96_atm3DVar_c8710fe9
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_c8710fe9
Case setup: Completed for experiment C96C48_hybatmDA_c8710fe9
Case setup: Completed for experiment C96C48_ufs_hybatmDA_c8710fe9

@emcbot emcbot added the CI-Wcoss2-Failed CI testing on WCOSS for this PR has failed label Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

Experiment C96_atm3DVar_extended_c8710fe9 FAIL on Wcoss2 at 09/05/24 10:36:32 PM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f000.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f001.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f002.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f003.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f004.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f005.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f006.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/logs/2021122100/gfsgoesupp_f007.log

Follow link here to view the contents of the above file(s): (link)

@WalterKolczynski-NOAA
Contributor

The index file isn't being created in the GOES UPP job. I checked $DATA and it isn't there (GFSGOES.GrbF00 is there).

2024-09-05 22:30:49,031 - INFO     - upp         : Copy 'goes' processed data to COM/ directory
2024-09-05 22:30:49,034 - INFO     - file_utils  : Copied /lfs/h2/emc/stmp/terry.mcguinness/RUNDIRS/C96_atm3DVar_extended_c8710fe9/gfs.2021122100/upp.161988/GFSGOES.GrbF00 to /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/RUNTESTS/COMROOT/C96_atm3DVar_extended_c8710fe9/gfs.20211221/00//model/atmos/master/gfs.t00z.special.grb2f000
Traceback (most recent call last):
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/fsutils.py", line 85, in cp
    shutil.copy2(source, target)
  File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/shutil.py", line 432, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/shutil.py", line 261, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/lfs/h2/emc/stmp/terry.mcguinness/RUNDIRS/C96_atm3DVar_extended_c8710fe9/gfs.2021122100/upp.161988/GFSGOES.GrbF00.idx'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/scripts/exglobal_atmos_upp.py", line 48, in <module>
    main()
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/scripts/exglobal_atmos_upp.py", line 44, in main
    upp.finalize(upp_dict.upp_run, upp_yaml)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/pygfs/task/upp.py", line 264, in finalize
    FileHandler(upp_yaml[upp_run].data_out).sync()
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/file_utils.py", line 43, in sync
    sync_factory[action](files)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/file_utils.py", line 63, in _copy_files
    cp(src, dest)
  File "/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2877/global-workflow/ush/python/wxflow/fsutils.py", line 87, in cp
    raise OSError(f"unable to copy {source} to {target}")
OSError: unable to copy /lfs/h2/emc/stmp/terry.mcguinness/RUNDIRS/C96_atm3DVar_extended_c8710fe9/gfs.2021122100/upp.161988/GFSGOES.GrbF00.idx to 
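
A pre-check along the following lines would surface the missing index file before the copy step raises. This is a sketch only: `missing_sources` is a hypothetical helper, not part of wxflow, and it assumes the copy list is a sequence of (source, destination) pairs.

```python
import os


def missing_sources(pairs):
    """Return the source paths in `pairs` that do not exist as files."""
    return [src for src, _dest in pairs if not os.path.isfile(src)]
```

Logging the result before syncing would name every absent file at once, rather than stopping at the first failed copy.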

@emcbot

emcbot commented Sep 6, 2024

CI Passed on Orion in Build# 5
Built and ran in directory /work2/noaa/stmp/CI/ORION/2877


Experiment C48_ATM_c8710fe9 Completed 1 Cycles: *SUCCESS* at Thu Sep  5 05:32:18 PM CDT 2024
Experiment C96C48_hybatmDA_c8710fe9 Completed 3 Cycles: *SUCCESS* at Thu Sep  5 06:52:35 PM CDT 2024
Experiment C96_atm3DVar_c8710fe9 Completed 3 Cycles: *SUCCESS* at Thu Sep  5 06:52:43 PM CDT 2024
Experiment C48_S2SWA_gefs_c8710fe9 Completed 1 Cycles: *SUCCESS* at Thu Sep  5 06:59:03 PM CDT 2024
Experiment C48_S2SW_c8710fe9 Completed 1 Cycles: *SUCCESS* at Thu Sep  5 07:10:15 PM CDT 2024

@aerorahul
Contributor

Let's disable the GOES product generation and merge this PR.
Something has changed unintentionally in there that needs to be further investigated.
HR4 needs to commence immediately.

@aerorahul
Contributor

NOAA-EMC/UPP@97ea655
In this commit, the output name was changed from GFSPRS to GFSGOES in the post_gfs_goes control file.

https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/python/pygfs/task/upp.py#L203-L207
The index file is created only for the GFSPRS and GFSFLX files. The above change from GFSPRS to GFSGOES broke this part of the workflow.

@aerorahul
Copy link
Copy Markdown
Contributor

@DavidHuber-NOAA

for ftype in ['PRS', 'FLX']:

Can you add GOES to the list here?
['PRS', 'FLX', 'GOES']
This should be done properly, but this will do in a pinch.
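
A minimal sketch of what the extended loop could look like is given below. It is not the actual contents of upp.py: the function name `index_targets` is hypothetical, and the `GFS<type>.GrbF<hh>` file-name pattern is an assumption inferred from the `GFSGOES.GrbF00` name in the log above.

```python
import os


def index_targets(workdir, forecast_hour, ftypes=("PRS", "FLX", "GOES")):
    """List (grib, index) path pairs for the GRIB2 files present in workdir."""
    targets = []
    for ftype in ftypes:
        # Assumed naming pattern, e.g. GFSGOES.GrbF00 for forecast hour 0.
        grib = os.path.join(workdir, f"GFS{ftype}.GrbF{forecast_hour:02d}")
        if os.path.exists(grib):
            targets.append((grib, f"{grib}.idx"))
    return targets
```

Skipping file types that are absent lets one loop serve both the regular and the GOES UPP jobs, since each produces only a subset of these outputs.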

@emcbot

emcbot commented Sep 6, 2024

CI Update on Wcoss2 at 09/06/24 12:29:07 PM
============================================
Cloning and Building global-workflow PR: 2877
with PID: 216532 on host: dlogin03

@emcbot

emcbot commented Sep 6, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Fri Sep  6 12:32:39 UTC 2024 on dlogin03
---------------------------------------------------
Build: Completed at 09/06/24 01:13:16 PM
Case setup: Completed for experiment C48_ATM_8f7bbc44
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_8f7bbc44
Case setup: Skipped for experiment C48_S2SWA_gefs_8f7bbc44
Case setup: Completed for experiment C48_S2SW_8f7bbc44
Case setup: Completed for experiment C96_atm3DVar_extended_8f7bbc44
Case setup: Skipped for experiment C96_atm3DVar_8f7bbc44
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_8f7bbc44
Case setup: Completed for experiment C96C48_hybatmDA_8f7bbc44
Case setup: Completed for experiment C96C48_ufs_hybatmDA_8f7bbc44

@aerorahul aerorahul mentioned this pull request Sep 6, 2024
7 tasks
@emcbot

emcbot commented Sep 7, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_8f7bbc44 *** SUCCESS *** at 09/06/24 02:36:12 PM
Experiment C48_S2SW_8f7bbc44 *** SUCCESS *** at 09/06/24 02:51:12 PM
Experiment C96C48_hybatmDA_8f7bbc44 *** SUCCESS *** at 09/06/24 03:48:26 PM
Experiment C96C48_hybatmaerosnowDA_8f7bbc44 *** SUCCESS *** at 09/06/24 04:48:41 PM
Experiment C96C48_ufs_hybatmDA_8f7bbc44 *** SUCCESS *** at 09/06/24 05:42:35 PM
Experiment C96_atm3DVar_extended_8f7bbc44 *** SUCCESS *** at 09/07/24 03:12:49 AM


Labels

  • CI-Hera-Passed: **Bot use only** CI testing on Hera for this PR has completed successfully
  • CI-Hercules-Passed: **Bot use only** CI testing on Hercules for this PR has completed successfully
  • CI-Orion-Passed: **Bot use only** CI testing on Orion for this PR has completed successfully
  • CI-Wcoss2-Passed: CI testing on WCOSS for this PR has completed successfully

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Update ufs_model.fd with new commit from ufs-weather-model
  • Orion: Migration to Rocky9 OS

6 participants