
Move machine-based options from config.base to host files #3053

Merged
aerorahul merged 19 commits into
NOAA-EMC:develop from
DavidHuber-NOAA:feature/machine_opts
Nov 13, 2024

Conversation

@DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Nov 1, 2024

Description

This moves all machine-specific options to the workflow/hosts files from the config.* files.

This also turns HPSS archiving on for WCOSS2 by default.

Resolves #2942
Resolves #3087
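
To illustrate the idea of keeping machine-specific settings in per-host files rather than in branching logic inside the config.* files, here is a minimal sketch. It is not the code from this PR: the option names, the values, and the `resolve_host_options` helper are hypothetical, and plain dictionaries stand in for the YAML files under workflow/hosts.

```python
from copy import deepcopy

# Hypothetical defaults shared by every machine (illustrative names only).
DEFAULTS = {"archive_to_hpss": False, "queue": "batch"}

# Hypothetical per-host overrides, standing in for workflow/hosts/<machine>.yaml.
# The values are made up and do not reflect the settings merged in this PR.
HOST_OVERRIDES = {
    "wcoss2": {"archive_to_hpss": True, "queue": "dev"},
    "hercules": {},
}


def resolve_host_options(machine, defaults=DEFAULTS, hosts=HOST_OVERRIDES):
    """Return a copy of the defaults updated with one machine's overrides."""
    if machine not in hosts:
        raise KeyError(f"unknown machine: {machine}")
    options = deepcopy(defaults)
    options.update(hosts[machine])
    return options


if __name__ == "__main__":
    for name in HOST_OVERRIDES:
        print(name, resolve_host_options(name))
```

With this layout, supporting a new machine means adding one entry of overrides instead of another `if` branch in every config file.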

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

CI testing on Hercules. The tracker and genesis jobs did not run (as expected).
WCOSS2 testing still needs to be performed.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added

Comment thread parm/config/gfs/config.prepoceanobs Fixed
Comment thread parm/config/gfs/config.prepoceanobs Fixed
@DavidHuber-NOAA
Contributor Author

This will break the C96_atm3DVar_extended test case until a fix is made for the gfs_downstream.tar file (as mentioned in #3019).

@DavidHuber-NOAA
Contributor Author

There is an issue with the gfs_bufrsnd job that is preventing it from creating the output gfs_collective BUFR files. I am re-disabling HPSS archiving on WCOSS2 for the time being and will add this information to #3019.

Marking ready for review.

@DavidHuber-NOAA DavidHuber-NOAA added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress CI-Wcoss2-Failed CI testing on WCOSS for this PR has failed and removed CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress labels Nov 1, 2024
@DavidHuber-NOAA DavidHuber-NOAA removed the CI-Wcoss2-Failed CI testing on WCOSS for this PR has failed label Nov 4, 2024
Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA left a comment

This is a commendable PR and a key demonstration of how to refactor an "if-y diff-y" system-specific configuration construct into a more centralized and Pythonic approach. I particularly liked how you enhanced the formation of the defaulting overrides.

@DavidHuber-NOAA DavidHuber-NOAA added the CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules label Nov 5, 2024
@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Nov 5, 2024
Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA left a comment

Minor things and one thing to sneak in (or leave for another time).

Comment thread workflow/hosts/wcoss2.yaml Outdated
Comment thread parm/config/gefs/config.base Outdated
Comment thread workflow/setup_expt.py Outdated
Comment thread workflow/setup_expt.py
@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress CI-Wcoss2-Building CI testing is cloning/building on WCOSS2 labels Nov 9, 2024
@WalterKolczynski-NOAA
Contributor

WCOSS is failing for a couple of reasons:

  1. GDAS fails to build. I believe this is a known issue.
  2. A failure to create experiments:
Running create_experiment.py for 5 cases
C48_ATM
The create_experiment command (./create_experiment.py -y ../ci/cases/pr/C48_ATM.yaml --overwrite) failed with a non-zero status.  Output:
Traceback (most recent call last):
  File "./create_experiment.py", line 33, in <module>
    from wxflow import AttrDict, parse_j2yaml, Logger, logit
  File "/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3053/global-workflow/workflow/wxflow/__init__.py", line 13, in <module>
    from .jinja import Jinja
  File "/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3053/global-workflow/workflow/wxflow/jinja.py", line 16, in <module>
    @jinja2.pass_eval_context
AttributeError: module 'jinja2' has no attribute 'pass_eval_context'

Seems like the WCOSS upgrade changed something with jinja.

CC @aerorahul
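
For context on the traceback above: `jinja2.pass_eval_context` was introduced in Jinja2 3.0, so the `AttributeError` is consistent with an older Jinja2 being picked up by the active Python environment. The following is a minimal sketch of a check for that condition, assuming only the standard library; the helper name `has_attribute` is hypothetical and is not part of wxflow or global-workflow.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if the module imports and exposes the named attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed in this environment at all.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    if has_attribute("jinja2", "pass_eval_context"):
        print("jinja2 provides pass_eval_context")
    else:
        print("jinja2 is missing or too old for pass_eval_context")
```

Running this with the same interpreter that runs create_experiment.py shows whether the environment, rather than the workflow code, is the source of the failure.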

@DavidHuber-NOAA
Contributor Author

Updated the WCOSS2 module files to load Python 3.8.6 into the environment. This should resolve the errors seen during CI testing and issue #3087.

@WalterKolczynski-NOAA
Contributor

CI Tests set up to run in /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3053/RUNTESTS on WCOSS

aerorahul previously approved these changes Nov 12, 2024
Contributor

@aerorahul aerorahul left a comment

Looks good. Thanks for including the fix to the jinja2 error on WCOSS2.

@WalterKolczynski-NOAA
Contributor

CI Tests set up to run in /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3053/RUNTESTS on WCOSS

@WalterKolczynski-NOAA
Contributor

C96C48_ufs_hybatmDA is failing because the gdas build fails due to issues with oops. I believe this is a known issue, but the test should be disabled in the meantime.

Other active tests pass.

@DavidHuber-NOAA
Contributor Author

I disabled the UFS DA tests for WCOSS2 as part of this PR. Thanks for the heads up.

@aerorahul
Contributor

@DavidHuber-NOAA
Are you confident that merging this is OK? You have run offline tests on WCOSS2, correct?

@DavidHuber-NOAA
Contributor Author

Yes, I am confident in this PR. Here is a summary of testing done on WCOSS2:

  • I ran the setup scripts on WCOSS2 yesterday, and just now, to confirm the correct Python version is being used and that the aerosol input data paths are being set correctly.
  • I successfully ran CI tests on November 1st-2nd.
  • Walter ran CI tests yesterday (11/12). All tests passed except the UFS DA test, as (temporarily) expected.
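
A check of the interpreter version like the one described in the first bullet can be sketched as follows. This is not the procedure that was actually run: the minimum of 3.8.6 is an assumption taken from the version mentioned earlier in this thread, and `meets_minimum` is a hypothetical helper.

```python
import sys

# Assumed minimum, based on the Python version named earlier in this thread.
MINIMUM_PYTHON = (3, 8, 6)


def meets_minimum(version, minimum=MINIMUM_PYTHON):
    """Compare the (major, minor, micro) parts of two version tuples."""
    return tuple(version[:3]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum(sys.version_info) else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
```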

@aerorahul
Contributor

@RussTreadon-NOAA
If you agree that disabling the UFS DA tests is acceptable, please let us know. We would like to merge this PR.

@CoryMartin-NOAA
Contributor

I have a hunch (but no evidence) that #3124 may be related to this?


Labels

  • CI-Hera-Passed: **Bot use only** CI testing on Hera for this PR has completed successfully
  • CI-Hercules-Passed: **Bot use only** CI testing on Hercules for this PR has completed successfully
  • CI-Wcoss2-Passed: CI testing on WCOSS for this PR has completed successfully

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Jinja error during experiment creation on WCOSS
  • Move all machine-specific options to workflow/hosts files

7 participants