Add capability to run forecast in segments #2795

Merged
DavidHuber-NOAA merged 16 commits into NOAA-EMC:develop from WalterKolczynski-NOAA:feature/fcst_segments
Aug 12, 2024

Conversation

@WalterKolczynski-NOAA
Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA commented Jul 25, 2024

Description

Adds the ability to run a forecast in segments instead of all at once. To accomplish this, a new local variable, checkpnts, is introduced in config.base to hold a comma-separated list of intermediate stopping points for the forecast. It is combined with FHMIN_GFS and FHMAX_GFS to create a comma-separated string, FCST_SEGMENTS, containing all the segment start and end points, which is used by config.fcst and the rocoto workflow. The capability to parse these strings into Python lists was added to wxflow in an accompanying PR. If checkpnts is an empty string, the result is a single-segment forecast.
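
As a rough illustration of the paragraph above, the sketch below shows one way the segment boundaries could be assembled from the start hour, the end hour, and the intermediate stopping points. The function names `build_segments` and `segments_to_string` are hypothetical, and the sorting and range validation are assumptions; this is not the code in config.base.

```python
def build_segments(fhmin: int, fhmax: int, checkpnts: str = "") -> list[int]:
    """Return the ordered segment boundaries, endpoints included (hypothetical helper)."""
    # An empty string means no intermediate stops, i.e. a single segment.
    inner = [int(tok) for tok in checkpnts.split(",") if tok.strip()]
    for hour in inner:
        # Assumption: every stopping point lies strictly between start and end.
        if not fhmin < hour < fhmax:
            raise ValueError(f"checkpoint {hour} is outside ({fhmin}, {fhmax})")
    return [fhmin, *sorted(inner), fhmax]


def segments_to_string(bounds: list[int]) -> str:
    """Join the boundaries into a comma-separated string like FCST_SEGMENTS."""
    return ",".join(str(b) for b in bounds)


if __name__ == "__main__":
    bounds = build_segments(0, 384, "48,120")
    print(segments_to_string(bounds))
    # Consecutive pairs are the (start, end) hours of each segment.
    print(list(zip(bounds, bounds[1:])))
```

With an empty `checkpnts`, the list reduces to the two endpoints, which corresponds to the single-segment case described above.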

To accommodate the new segment metatasks, which must run serially, create_task() was expanded to accept a dictionary key, is_serial, that controls whether a metatask is parallel or serial using pre-existing capability in rocoto. When the key is not given, the default is parallel (as for most metatasks).
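
A minimal sketch of the idea follows, assuming the serial/parallel switch maps onto the `mode` attribute of a rocoto `<metatask>` element. Only the `is_serial` key comes from this PR; the function `metatask_xml`, the other dictionary keys, and the task names are hypothetical and do not reproduce the real create_task() signature.

```python
import xml.etree.ElementTree as ET


def metatask_xml(task_dict: dict) -> str:
    """Render a bare-bones metatask element (illustrative only)."""
    # 'is_serial' is the key named in the PR; absent means parallel.
    mode = "serial" if task_dict.get("is_serial", False) else "parallel"
    meta = ET.Element("metatask", name=task_dict["name"], mode=mode)
    # Hypothetical 'var_dict': metatask variable name -> list of values.
    for var_name, values in task_dict["var_dict"].items():
        var = ET.SubElement(meta, "var", name=var_name)
        var.text = " ".join(str(v) for v in values)
    # Hypothetical 'task_name': the templated name of the inner task.
    ET.SubElement(meta, "task", name=task_dict["task_name"])
    return ET.tostring(meta, encoding="unicode")


if __name__ == "__main__":
    print(metatask_xml({
        "name": "gfsfcst",
        "task_name": "gfsfcst_seg#seg#",
        "var_dict": {"seg": [0, 1]},
        "is_serial": True,
    }))
```

Defaulting to parallel keeps existing metatask definitions unchanged, so only the segment metatasks need to set the key.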

Resolves #2274
Refs NOAA-EMC/wxflow#39
Refs NOAA-EMC/wxflow#40

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

  • Forecast-only on Hercules
  • GEFS on Hercules

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@WalterKolczynski-NOAA
Contributor Author

I still have some more testing to do (and I think documentation to update), but wanted to get a draft out since people are going on leave.

@WalterKolczynski-NOAA
Contributor Author

Will check and see if the documentation needs to be updated Friday, but I think the last few bugs from this update are gone.

@WalterKolczynski-NOAA marked this pull request as ready for review July 25, 2024 22:43
@WalterKolczynski-NOAA added the CI-Hercules-Ready label Jul 25, 2024
@emcbot added the CI-Hercules-Building, CI-Hercules-Running, and CI-Hercules-Passed labels and removed the CI-Hercules-Ready and CI-Hercules-Building labels Jul 25, 2024
@emcbot

emcbot commented Jul 26, 2024

CI Passed Hercules at
Built and ran in directory /work2/noaa/stmp/CI/HERCULES/2795


Experiment C48_ATM_cd3d7c8b Completed 1 Cycles: *SUCCESS* at Thu Jul 25 19:58:13 CDT 2024
Experiment C96_atm3DVar_cd3d7c8b Completed 3 Cycles: *SUCCESS* at Thu Jul 25 21:16:52 CDT 2024
Experiment C96C48_hybatmDA_cd3d7c8b Completed 3 Cycles: *SUCCESS* at Thu Jul 25 21:17:05 CDT 2024
Experiment C48_S2SW_cd3d7c8b Completed 1 Cycles: *SUCCESS* at Thu Jul 25 21:41:48 CDT 2024
Experiment C48_S2SWA_gefs_cd3d7c8b Completed 1 Cycles: *SUCCESS* at Thu Jul 25 22:22:53 CDT 2024

Comment thread parm/config/gefs/yaml/defaults.yaml Outdated
@DavidHuber-NOAA
Contributor

Hercules testing completed successfully, resetting label.

@DavidHuber-NOAA removed the CI-Hercules-Running label Jul 26, 2024
@WalterKolczynski-NOAA
Contributor Author

Appears no documentation updates are needed at this time after all.

Comment thread parm/config/gfs/config.base Outdated
Comment on lines +294 to +304
export FCST_SEGMENTS_STR_GFS="@FCST_SEGMENTS_GFS@"
IFS=', ' read -ra FCST_SEGMENTS_GFS <<< "${FCST_SEGMENTS_STR_GFS}"
if (( ${FCST_SEGMENT:- -1} < 0 )); then
    # Jobs other than the forecast don't care about segments, only the
    # absolute start and end
    declare -x FHMIN_GFS=${FCST_SEGMENTS_GFS[0]}
    declare -x FHMAX_GFS=${FCST_SEGMENTS_GFS[-1]}
else
    declare -x FHMIN_GFS=${FCST_SEGMENTS_GFS[${FCST_SEGMENT}]}
    declare -x FHMAX_GFS=${FCST_SEGMENTS_GFS[${FCST_SEGMENT}+1]}
fi
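
For readers more comfortable with Python, the selection logic in the hunk above can be sketched as follows. The function name `segment_window` is hypothetical, and the explicit bounds check is an added assumption that the bash version does not perform.

```python
def segment_window(bounds: list[int], segment: int = -1) -> tuple[int, int]:
    """Return (fhmin, fhmax) for a segment index (hypothetical helper)."""
    if segment < 0:
        # Jobs other than the forecast only need the absolute start and end.
        return bounds[0], bounds[-1]
    if segment + 1 >= len(bounds):
        # Assumption: fail loudly instead of reading past the last boundary.
        raise IndexError(f"segment {segment} does not exist")
    return bounds[segment], bounds[segment + 1]


if __name__ == "__main__":
    bounds = [0, 48, 120, 384]
    print(segment_window(bounds))      # whole forecast
    print(segment_window(bounds, 1))   # second segment
```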
Contributor

Is there a way to not do any of this in a config file and calculate this in a j-job or exscript?

Contributor Author

Not without massive additional changes. FHMAX_GFS especially gets used later in this config, and then also in the job-specific configs that would be sourced immediately afterwards.

Contributor Author

Redone as we discussed Monday afternoon.

@WalterKolczynski-NOAA added the CI-Hercules-Ready label and removed the CI-Hercules-Passed label Jul 29, 2024
@emcbot added the CI-Hercules-Building and CI-Hercules-Running labels and removed the CI-Hercules-Ready and CI-Hercules-Building labels Jul 29, 2024
@emcbot added the CI-Hercules-Passed label Jul 30, 2024
Changes the way forecast segments are defined. Restores the original
`FHMIN_GFS` and `FHMAX_GFS` and then adds a local `breakpnts` variable
that contains the intermediate stopping points (if any). The original
list of segment endpoints is then constructed from that.

The determination of the `FHMIN` and `FHMAX` based on the segment is
moved from `config.base` to `config.fcst`. This required adding some
additional checks in `config.fcst` to clip other `FHMAX` variables to
`FHMAX`.
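
The clipping described in this commit message amounts to capping each dependent limit at the segment's end hour. The sketch below illustrates that under stated assumptions: `clip_to_fhmax` is a hypothetical helper, and the variable names passed to it are examples only, not the list of variables that config.fcst actually checks.

```python
def clip_to_fhmax(fhmax: int, **limits: int) -> dict[str, int]:
    """Cap each named limit at fhmax (hypothetical helper)."""
    return {name: min(value, fhmax) for name, value in limits.items()}


if __name__ == "__main__":
    # A segment ending at hour 48 cannot have limits beyond hour 48.
    print(clip_to_fhmax(48, FHMAX_HF=120, FHMAX_WAV=24))
```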

Updates the extended GFS case to use forecast segments to test that
capability for the GFS system (the GEFS case already tests segments
as well).

An earlier update to parse comma-separated bash variables as lists in
Python means we no longer need to do that in the task scripts.

This commit is incomplete until a follow-up PR that will update the
wxflow hash.
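
To make the parsing behaviour concrete, here is a minimal sketch of turning a comma-separated shell value into a Python list. It is an assumption about the general approach, not wxflow's implementation; `parse_value` and `to_scalar` are hypothetical names. Under this rule a value with no comma stays a scalar, so a one-element list needs a trailing comma.

```python
def to_scalar(token: str):
    """Convert a token to int when possible, otherwise keep the stripped string."""
    token = token.strip()
    try:
        return int(token)
    except ValueError:
        return token


def parse_value(raw: str):
    """Return a list if the value contains a comma, else a scalar (hypothetical)."""
    if "," not in raw:
        return to_scalar(raw)
    # Empty tokens (e.g. after a trailing comma) are dropped.
    return [to_scalar(tok) for tok in raw.split(",") if tok.strip()]


if __name__ == "__main__":
    print(parse_value("6"))         # scalar
    print(parse_value("6,"))        # one-element list
    print(parse_value("0,48,120"))  # multi-element list
```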

With comma-separated lists now being read into Python as lists, the
jinja templates for the archive job had to be updated so they no longer
attempt to create them.

Missed updating one of the dependencies from task to metatask.

@TerrenceMcGuinness-NOAA
Collaborator

TerrenceMcGuinness-NOAA commented Aug 8, 2024

All Cases Passed on Hercules

mterry (hercules-login-4) RUNTESTS $ pwd
/work2/noaa/stmp/CI/HERCULES/2795/RUNTESTS

mterry (hercules-login-4) RUNTESTS $ cat ci-run_check.log 
Experiment C48_ATM_d0927f02 Completed 1 Cycles: *SUCCESS* at Wed Aug  7 22:55:06 CDT 2024
Experiment C96C48_hybatmDA_d0927f02 Completed 3 Cycles: *SUCCESS* at Wed Aug  7 23:55:47 CDT 2024
Experiment C96_atm3DVar_d0927f02 Completed 3 Cycles: *SUCCESS* at Thu Aug  8 00:01:41 CDT 2024
Experiment C48_S2SW_d0927f02 Completed 1 Cycles: *SUCCESS* at Thu Aug  8 00:56:06 CDT 2024
Experiment C48_S2SWA_gefs_d0927f02 Completed 1 Cycles: *SUCCESS* at Thu Aug  8 01:15:09 CDT 2024

Manually setting state label for Hercules to PASSED

@WalterKolczynski-NOAA
Contributor Author

The extended test is working when I try manually, so I am going to attempt CI again.

@emcbot

emcbot commented Aug 9, 2024

CI Update on Wcoss2 at 08/09/24 10:25:01 PM
============================================
Cloning and Building global-workflow PR: 2795
with PID: 59851 on host: clogin03

@emcbot

emcbot commented Aug 9, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Fri Aug  9 22:31:24 UTC 2024 on clogin03
---------------------------------------------------
Build: Completed at 08/09/24 11:10:59 PM
Case setup: Completed for experiment C48_ATM_7179c004
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_7179c004
Case setup: Skipped for experiment C48_S2SWA_gefs_7179c004
Case setup: Completed for experiment C48_S2SW_7179c004
Case setup: Completed for experiment C96_atm3DVar_extended_7179c004
Case setup: Skipped for experiment C96_atm3DVar_7179c004
Case setup: Completed for experiment C96_atmaerosnowDA_7179c004
Case setup: Completed for experiment C96C48_hybatmDA_7179c004
Case setup: Completed for experiment C96C48_ufs_hybatmDA_7179c004

@emcbot

emcbot commented Aug 10, 2024

Experiment C96_atm3DVar_extended_7179c004 FAIL on Wcoss2 at 08/10/24 01:42:27 AM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2795/RUNTESTS/COMROOT/C96_atm3DVar_extended_7179c004/logs/2021122100/gfsfcst_seg1.log

Follow link here to view the contents of the above file(s): (link)

@emcbot

emcbot commented Aug 10, 2024

CI Update on Wcoss2 at 08/10/24 04:45:02 AM
============================================
Cloning and Building global-workflow PR: 2795
with PID: 230669 on host: clogin03

@emcbot

emcbot commented Aug 10, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Sat Aug 10 04:50:17 UTC 2024 on clogin03
---------------------------------------------------
Build: Completed at 08/10/24 05:25:54 AM
Case setup: Completed for experiment C48_ATM_3877cc5e
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_3877cc5e
Case setup: Skipped for experiment C48_S2SWA_gefs_3877cc5e
Case setup: Completed for experiment C48_S2SW_3877cc5e
Case setup: Completed for experiment C96_atm3DVar_extended_3877cc5e
Case setup: Skipped for experiment C96_atm3DVar_3877cc5e
Case setup: Completed for experiment C96_atmaerosnowDA_3877cc5e
Case setup: Completed for experiment C96C48_hybatmDA_3877cc5e
Case setup: Completed for experiment C96C48_ufs_hybatmDA_3877cc5e

@emcbot

emcbot commented Aug 10, 2024

Experiment C96C48_hybatmDA_3877cc5e FAIL on Wcoss2 at 08/10/24 05:42:38 AM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2795/RUNTESTS/COMROOT/C96C48_hybatmDA_3877cc5e/logs/2021122018/enkfgdasfcst_mem001.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2795/RUNTESTS/COMROOT/C96C48_hybatmDA_3877cc5e/logs/2021122018/enkfgdasfcst_mem002.log

Follow link here to view the contents of the above file(s): (link)

@emcbot

emcbot commented Aug 10, 2024

CI Update on Wcoss2 at 08/10/24 06:08:53 AM
============================================
Cloning and Building global-workflow PR: 2795
with PID: 174460 on host: clogin03

@emcbot

emcbot commented Aug 10, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Sat Aug 10 06:15:51 UTC 2024 on clogin03
---------------------------------------------------
Build: Completed at 08/10/24 06:51:39 AM
Case setup: Completed for experiment C48_ATM_c73eecd2
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_c73eecd2
Case setup: Skipped for experiment C48_S2SWA_gefs_c73eecd2
Case setup: Completed for experiment C48_S2SW_c73eecd2
Case setup: Completed for experiment C96_atm3DVar_extended_c73eecd2
Case setup: Skipped for experiment C96_atm3DVar_c73eecd2
Case setup: Completed for experiment C96_atmaerosnowDA_c73eecd2
Case setup: Completed for experiment C96C48_hybatmDA_c73eecd2
Case setup: Completed for experiment C96C48_ufs_hybatmDA_c73eecd2

@emcbot

emcbot commented Aug 10, 2024

Experiment C96C48_hybatmDA_c73eecd2 FAIL on Wcoss2 at 08/10/24 07:06:33 AM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2795/RUNTESTS/COMROOT/C96C48_hybatmDA_c73eecd2/logs/2021122018/enkfgdasfcst_mem001.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2795/RUNTESTS/COMROOT/C96C48_hybatmDA_c73eecd2/logs/2021122018/enkfgdasfcst_mem002.log

Follow link here to view the contents of the above file(s): (link)

@emcbot

emcbot commented Aug 10, 2024

CI Update on Wcoss2 at 08/10/24 07:40:51 AM
============================================
Cloning and Building global-workflow PR: 2795
with PID: 168059 on host: clogin03

@emcbot

emcbot commented Aug 10, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Sat Aug 10 07:46:57 UTC 2024 on clogin03
---------------------------------------------------
Build: Completed at 08/10/24 08:22:39 AM
Case setup: Completed for experiment C48_ATM_d0cd5295
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_d0cd5295
Case setup: Skipped for experiment C48_S2SWA_gefs_d0cd5295
Case setup: Completed for experiment C48_S2SW_d0cd5295
Case setup: Completed for experiment C96_atm3DVar_extended_d0cd5295
Case setup: Skipped for experiment C96_atm3DVar_d0cd5295
Case setup: Completed for experiment C96_atmaerosnowDA_d0cd5295
Case setup: Completed for experiment C96C48_hybatmDA_d0cd5295
Case setup: Completed for experiment C96C48_ufs_hybatmDA_d0cd5295

@emcbot

emcbot commented Aug 10, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_d0cd5295 *** SUCCESS *** at 08/10/24 09:36:13 AM
Experiment C48_S2SW_d0cd5295 *** SUCCESS *** at 08/10/24 09:48:16 AM
Experiment C96C48_hybatmDA_d0cd5295 *** SUCCESS *** at 08/10/24 10:42:29 AM
Experiment C96_atmaerosnowDA_d0cd5295 *** SUCCESS *** at 08/10/24 11:33:18 AM
Experiment C96C48_ufs_hybatmDA_d0cd5295 *** SUCCESS *** at 08/10/24 12:03:21 PM
Experiment C96_atm3DVar_extended_d0cd5295 *** SUCCESS *** at 08/10/24 09:57:35 PM

  export JEDIYAML="${PARMgfs}/gdas/aero/variational/3dvar_fgat_gfs_aero.yaml.j2"
  else
- export aero_bkg_times="6"
+ export aero_bkg_times="6," # Trailing comma is necessary so this is treated as a list
Contributor

Very clever!

Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Looks good to me.

Labels

CI-Hera-Passed, CI-Hercules-Passed, CI-Wcoss2-Passed

Development

Successfully merging this pull request may close these issues.

Need capability to run multiple forecast segments

7 participants