
Add output logging#166

Merged
jiandewang merged 21 commits into
NOAA-EMC:dev/emcfrom
DeniseWorthen:feature/chk4output
Dec 15, 2025


@DeniseWorthen
Collaborator

@DeniseWorthen DeniseWorthen commented Nov 2, 2025

This PR adds a new module, mom_cap_outputlog.F90 (a stub for CESMCOUPLED), that enables output logging diagnostics at a given hourly output frequency, using Alarm processing in ModelAdvance.

It should be remembered that the model clock does not advance until the end of the ModelAdvance, thus each Alarm is referenced to the model's "nextTime". However, the output actually closes one full interval after the end of the averaging period (I don't know why). So the 00-06 average closes at the fh=12 + 1 advance.
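
The timing above can be checked against the debug output shown further down in this description, where the file for the 00-06 average (ocn_2011_10_01_03_00.nc) is first reported complete on the advance from 12:00 to 13:00. The following is a minimal sketch of that arithmetic, assuming a 6-hour averaging interval and a 1-hour ModelAdvance step. The function name and the rule itself are inferred from the log output and are not taken from the module.

```python
from datetime import datetime, timedelta

def first_complete_advance(avg_start, interval_hours=6, step_hours=1):
    """Return (currTime, nextTime) of the advance on which the file for the
    averaging period starting at avg_start is first reported complete.

    Assumption inferred from the debug log: the file closes one full
    interval after the averaging period ends, so the first successful
    check happens on the advance that starts at that time.
    """
    interval = timedelta(hours=interval_hours)
    step = timedelta(hours=step_hours)
    avg_end = avg_start + interval
    curr = avg_end + interval  # one full interval after the average ends
    return curr, curr + step

curr, nxt = first_complete_advance(datetime(2011, 10, 1, 0))
print(curr.isoformat(), nxt.isoformat())
```

With a start of 2011-10-01T06:00:00 the same rule gives the 18:00 to 19:00 advance, which is where the second file in the log is reported complete.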

When an output alarm rings, a flag is set to check for file completion on the next ModelAdvance. Once the file is found complete, a set of information is written to a log file and the check flag is set to false. The file is assumed to be complete if the unlimited dimension length is > 0 and (if required) the file size is greater than the file size when first created.
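
The completeness test described above can be sketched as a small predicate. This is an illustration only: the function and argument names are hypothetical, and the real module obtains the dimension length and file sizes from the file itself rather than taking them as arguments.

```python
def file_is_complete(unlimited_len, current_size, initial_size, use_filesize):
    """Completeness test as described in the PR text.

    unlimited_len : current length of the file's unlimited dimension
    current_size  : file size in bytes now
    initial_size  : file size in bytes when the file was first created
    use_filesize  : whether the additional size comparison is required
    """
    if unlimited_len <= 0:
        return False
    if use_filesize and current_size <= initial_size:
        return False
    return True

# File sizes are taken from the debug output quoted later in this
# conversation; the dimension length of 1 is an assumption.
print(file_is_complete(1, 9415276, 9415276, use_filesize=True))
print(file_is_complete(1, 90532460, 9415276, use_filesize=True))
```

The first call corresponds to the "not complete" state (size unchanged since creation) and the second to the "complete" state.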

The actual behavior of the file creation/closing can be seen by setting a MOM6 attribute (debug_outputlog = true), which enables a series of messages written to stdout. The following is a snippet, where the field (still 0 / complete) gives the status check using the unlimited dimension length and the final field is the state of the check_nextAdvance flag.

438: 40: MOM_cap:(outputlog_run) fname ./ocn_2011_10_01_03_00.nc  2011-10-01T11:00:00  2011-10-01T12:00:00
439: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_03_00.nc exists 2011-10-01T11:00:00  2011-10-01T12:00:00 still 0  T
445: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_03_00.nc exists 2011-10-01T12:00:00  2011-10-01T13:00:00 complete  F
446: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_03_00.nc exists 2011-10-01T13:00:00  2011-10-01T14:00:00 complete  F
449: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_03_00.nc exists 2011-10-01T14:00:00  2011-10-01T15:00:00 complete  F
450: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_03_00.nc exists 2011-10-01T15:00:00  2011-10-01T16:00:00 complete  F
451: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_03_00.nc exists 2011-10-01T16:00:00  2011-10-01T17:00:00 complete  F
457: 40: MOM_cap:(outputlog_run) fname ./ocn_2011_10_01_09_00.nc  2011-10-01T17:00:00  2011-10-01T18:00:00
458: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_09_00.nc exists 2011-10-01T17:00:00  2011-10-01T18:00:00 still 0  T
464: 40: MOM_cap:(outputlog_run)./ocn_2011_10_01_09_00.nc exists 2011-10-01T18:00:00  2011-10-01T19:00:00 complete  F

When the model stop time is reached, any pending output is written, along with the output (if any) for the final interval. This produces two log files: one for the interval prior to the final interval and one for the final interval. The log file for the final interval includes the tag stop (e.g., 20111002.000000.mom6.stop.06h).

* adding lstop to the log name does not appear to be required
* add test for filesize in addition to unlimited dimension>0
* add filetest function to simplify code
@DeniseWorthen DeniseWorthen marked this pull request as ready for review November 21, 2025 14:31
@jiandewang
Collaborator

@DeniseWorthen can you share your cpld_control_gfsv17_iau_intel run directory (or that of any other case that uses this feature)?

@DeniseWorthen
Collaborator Author

@jiandewang You can check

/scratch4/NCEPDEV/stmp/Denise.Worthen/RT_RUNDIRS/Denise.Worthen/rt.outputlog

which was a run I did w/ the optional debug-flag turned on to record the logging logic.

I've run the RTs and the only test which changes baselines is the gfsv17 iau test, and that is because the mom6 output file now contains the minutes in the file name.

@jiandewang
Collaborator

jiandewang commented Nov 24, 2025

@DeniseWorthen This is the feature we have been wanting for a long time.

I checked /scratch4/NCEPDEV/stmp/Denise.Worthen/RT_RUNDIRS/Denise.Worthen/rt.outputlog/cpld_control_p8_intel

grep outputlog_run out|grep 15_00

150: MOM_cap:(outputlog_run) fname ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc  2021-03-22T23:00:00  2021-03-23T00:00:00 checkflag  T use_filesize  T           9415276               1
150: MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc exists 2021-03-22T23:00:00  2021-03-23T00:00:00 not complete, chkflag  T         9415276         9415276
150: MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc exists 2021-03-23T00:00:00  2021-03-23T01:00:00     complete, chkflag  F         9415276        90532460
150: MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc exists 2021-03-23T01:00:00  2021-03-23T02:00:00     complete, chkflag  F         9415276        90532460
150: MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc exists 2021-03-23T02:00:00  2021-03-23T03:00:00     complete, chkflag  F         9415276        90532460
150: MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc exists 2021-03-23T03:00:00  2021-03-23T04:00:00     complete, chkflag  F         9415276        90532460
150: MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2021_03_22_15_00.nc exists 2021-03-23T04:00:00  2021-03-23T05:00:00     complete, chkflag  F         9415276        90532460

Why does "complete, chkflag F" appear 5 times?

Also, log.cmeps.fxxx will be the trigger for post, right? This file will not be on disk until the file has been completely written, right?

if (ierr /= nf90_noerr) then
write(0, '(A)') 'FATAL ERROR: ' // trim(string)// ' : ' // trim(nf90_strerror(ierr))
! This fails on WCOSS2 with Intel 19 compiler. See
! https://community.intel.com/t5/Intel-Fortran-Compiler/STOP-and-ERROR-STOP-with-variable-stop-codes/m-p/1182521#M149254
Collaborator

line too long, failed in doxygen checking

Collaborator Author

OK, I'll fix it. Thanks.

Collaborator Author

@jiandewang what is the line length allowed?

Collaborator

max. 132

Collaborator Author

I get that this line ends in column 125.

Collaborator

Let me ask Marshall about the exact maximum length allowed.

Collaborator

I think it is 120.
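
A quick way to check a source file against a candidate limit is sketched below. The helper name is hypothetical, and the default of 120 is only the value suggested in this thread, not a confirmed project setting.

```python
def long_lines(text, limit=120):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]

# A line ending in column 125 passes a limit of 132 but fails a limit of 120.
sample = "x" * 125 + "\n" + "y" * 80
print(long_lines(sample, limit=132))
print(long_lines(sample, limit=120))
```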

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Nov 24, 2025

@jiandewang The feature sets a flag, "chkflag", that says whether it is still checking a given file for completion. "chkflag=T" means it will check on the next pass through the model advance, and "chkflag=F" means it is done checking that file (and found it complete). The extra chkflag lines appear because, as the integration continues, the debug output keeps reporting that the file no longer needs to be checked.
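
The flag lifecycle can be sketched as a small state machine. This is an illustration under stated assumptions, not the module's code: the class, method, and attribute names are hypothetical, and the file test result is passed in rather than read from a file.

```python
class OutputCheck:
    """Tracks the check flag for one output file (names are illustrative)."""

    def __init__(self, fname):
        self.fname = fname
        self.check_next_advance = False  # plays the role of "chkflag"
        self.logs_written = 0

    def alarm_rings(self):
        # The output alarm sets the flag; checking starts from this advance.
        self.check_next_advance = True

    def advance(self, complete):
        """Process one ModelAdvance; `complete` is the file test result."""
        if self.check_next_advance and complete:
            self.logs_written += 1           # the log file would be written here
            self.check_next_advance = False  # done checking this file
        status = "complete" if complete else "not complete"
        flag = "T" if self.check_next_advance else "F"
        return f"{status}, chkflag {flag}"


chk = OutputCheck("ocn_2021_03_22_15_00.nc")
chk.alarm_rings()
for complete in [False, True, True, True]:
    print(chk.advance(complete))
```

In this sketch the log is written once, on the first advance where the file tests complete. Every later advance only repeats the "complete, chkflag F" status, which is the repetition seen in the debug output.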

Note this is with the "debug output logging" set true; normally none of this info appears in stdout.

This feature is really unrelated to log.cmeps. That is a log file written when CMEPS has finished writing its restart, so all components must also have finished writing their restarts.

Originally, the request was for this feature in order to enable ocean-post. I think now the purpose is also to know which restart file can be used (if the model crashes), because you want continuous output. Your most recent restart might not coincide w/ your most recent output (history file). See NOAA-EMC/global-workflow#4249
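
One possible selection rule for the restart use case is sketched below: choose the most recent restart that is not later than the last completed history output, so that a rerun leaves no gap in the output. This is an assumption made for illustration, not the Global Workflow's actual logic, and the function and argument names are hypothetical.

```python
from datetime import datetime

def latest_usable_restart(restart_times, completed_output_times):
    """Pick the most recent restart time that does not lie beyond the last
    completed history output, so output stays continuous after a rerun.

    completed_output_times holds the end time of each averaging period
    whose file has been logged as complete (an assumption of this sketch).
    """
    if not restart_times or not completed_output_times:
        return None
    last_output = max(completed_output_times)
    candidates = [t for t in restart_times if t <= last_output]
    return max(candidates) if candidates else None

restarts = [datetime(2021, 3, 22, 12), datetime(2021, 3, 22, 18), datetime(2021, 3, 23, 0)]
outputs = [datetime(2021, 3, 22, 12), datetime(2021, 3, 22, 18)]
print(latest_usable_restart(restarts, outputs))
```

Here the newest restart (2021-03-23 00:00) is skipped because the output covering the period up to that time has not been logged as complete.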

@jiandewang jiandewang requested a review from sanAkel November 24, 2025 23:00
@jiandewang
Collaborator

@DeniseWorthen I will use a squash merge since you have ~20 commits. Can you write a short description of this PR for me to put in the commit message?

@DeniseWorthen
Collaborator Author

Is this verbose enough, or does it need a full description?

Add mom_cap_outputlog.F90 that enables output logging diagnostics at a given hourly output frequency

@jiandewang
Collaborator

@DeniseWorthen can you expand it a bit so that people can understand the purpose of this PR?
For example, you could phrase it this way: "Add mom_cap_outputlog.F90 that enables output logging diagnostics at a given hourly output frequency. This is for UFS operational purposes because ............"

@DeniseWorthen
Collaborator Author

This feature is required for UFS operational configurations and is used to determine when MOM6 output (diagnostics and restart) has been completed. The log files created by this feature can be queried by the Global Workflow either to trigger downstream jobs or to ensure that, if a run fails and a restart is required, model output consistent with a given restart file is available.

@jiandewang jiandewang merged commit 41f39db into NOAA-EMC:dev/emc Dec 15, 2025
52 checks passed
Development

Successfully merging this pull request may close these issues.

Add logging of diagnostic output and restart files
