Add output logging #166
* required for e.g. output freq=24
90d7b3d to c0009ef
1046ccb to 5c9ead4
1b64053 to 88bbd7b
* adding lstop to the log name does not appear to be required
* add test for filesize in addition to unlimited dimension > 0
* add filetest function to simplify code
3ae9dbc to 2ddc0f2
@DeniseWorthen can you share your cpld_control_gfsv17_iau_intel (or any other case that uses this feature) run directory with me?
@jiandewang You can check the run I did with the optional debug flag turned on to record the logging logic. I've run the RTs, and the only test that changes baselines is the gfsv17 iau test, because the mom6 output file now contains the minutes in the file name.
@DeniseWorthen This is the feature we have been looking for for a long time. I checked /scratch4/NCEPDEV/stmp/Denise.Worthen/RT_RUNDIRS/Denise.Worthen/rt.outputlog/cpld_control_p8_intel with grep outputlog_run out | grep 15_00. Why does "complete, chkflag F" appear 5 times? Also, log.cmeps.fxxx will be the trigger for post, right? This file will not be on disk until the file has been completely written, right?
if (ierr /= nf90_noerr) then
  write(0, '(A)') 'FATAL ERROR: ' // trim(string)// ' : ' // trim(nf90_strerror(ierr))
  ! This fails on WCOSS2 with Intel 19 compiler. See
  ! https://community.intel.com/t5/Intel-Fortran-Compiler/STOP-and-ERROR-STOP-with-variable-stop-codes/m-p/1182521#M149254
Line too long; this fails the doxygen check.
OK, I'll fix it. Thanks.
@jiandewang what is the allowed line length?
I get that this line ends in column 125.
Let me ask Marshall what the exact maximum allowed length is.
@jiandewang The feature sets a flag, "chkflag", indicating whether it is done checking a certain file for completion. "chkflag=T" means it will check the next time through ModelAdvance, and "chkflag=F" means it is done checking that file (and found it complete). The extra "chkflag F" statements appear because, as the model continues integrating, it keeps reporting that it no longer needs to check the file. Note this is with the "debug output logging" set to true; normally none of this information appears in stdout. This feature is really unrelated to log.cmeps, which is a log file written when CMEPS has completed writing its restart, so all components must also have finished writing their restarts. Originally, this feature was requested to enable ocean-post. I think the purpose now is also to know which restart file can be used if the model crashes, because you want continuous output: your most recent restart might not coincide with your most recent output (history file). See NOAA-EMC/global-workflow#4249
@DeniseWorthen I will use a squash merge since you have ~20 commits. Can you write a short description of this PR for me to put in the commit message?
Is this verbose enough, or does it need a full description?
Add mom_cap_outputlog.F90 that enables output logging diagnostics at a given hourly output frequency
@DeniseWorthen can you expand it a bit so that people can understand the purpose of this PR?
This feature is required for UFS operational configurations and is used to determine when MOM6 output (diagnostics and restarts) has been completed. The log files created by this feature can be queried by the Global Workflow either to trigger downstream jobs or to ensure that, if a run fails and a restart is required, model output is available consistent with a given restart file.
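To make the restart-consistency use case concrete, here is a minimal Python sketch of the selection a workflow might make. It assumes (hypothetically) that the workflow has already turned the output log files into a list of completed output forecast hours, that this output is contiguous, and that it knows the hours at which restarts were written. The function name `latest_consistent_restart` is illustrative only and is not part of MOM6 or the Global Workflow.

```python
def latest_consistent_restart(restart_hours, completed_output_hours):
    """Return the latest restart hour not beyond the last completed output hour.

    Hypothetical helper: restarting from a restart written *after* the last
    logged output would leave a gap in the history output, so only restarts
    at or before that hour qualify. Returns None if no restart qualifies.
    Assumes the completed output hours are contiguous.
    """
    if not completed_output_hours:
        return None
    last_output = max(completed_output_hours)
    candidates = [h for h in restart_hours if h <= last_output]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Output logged as complete through fh=18; restarts written every 12 h.
    print(latest_consistent_restart([12, 24, 36], [6, 12, 18]))  # 12
```

The restart at fh=24 is skipped because output for 18-24 was never logged as complete; rerunning from fh=12 regenerates that output.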
This PR adds a new module, mom_cap_outputlog.F90 (with a stub for CESMCOUPLED), that enables output logging diagnostics at a given hourly output frequency, using Alarms processed in ModelAdvance. Remember that the model clock does not advance until the end of ModelAdvance, so each Alarm is referenced to the model's "nextTime". However, the output file actually closes one full interval after the end of the averaging period (I don't know why). So the 00-06 average closes at the fh=12 + 1 advance.
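The timing rules above can be sketched in a few lines of Python. This is a simplified illustration, not the module's Fortran code: it assumes the alarm is a plain "next time is a multiple of the output frequency" test, uses a hypothetical 720 s coupling step, and encodes the observed (not designed) rule that a window's file closes one full interval after the window ends.

```python
SECONDS_PER_HOUR = 3600


def alarm_rings(curr_sec, dt_sec, freq_hours):
    """True if the output alarm rings during the advance starting at curr_sec.

    The clock only moves at the end of ModelAdvance, so the alarm is tested
    against nextTime = currTime + dt rather than against currTime.
    """
    next_sec = curr_sec + dt_sec
    return next_sec % (freq_hours * SECONDS_PER_HOUR) == 0


def window_close_hour(window_start_hour, freq_hours):
    """Forecast hour at which the file for [start, start + freq) is closed,
    following the observed behavior: one full interval after the window ends."""
    return window_start_hour + 2 * freq_hours


if __name__ == "__main__":
    dt = 720  # hypothetical coupling time step, in seconds
    ringing = [t for t in range(0, 13 * SECONDS_PER_HOUR, dt) if alarm_rings(t, dt, 6)]
    print(ringing)  # [20880, 42480]: the advances whose nextTime is fh=6 and fh=12
    print(window_close_hour(0, 6))  # 12
```

With these assumptions, the 00-06 window closes at fh=12, so its completion can only be detected on an advance after that point.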
When an output alarm rings, a flag is set to check for file completion on the next ModelAdvance. Once the file is found to be complete, a set of information is written to a log file and the check flag is set to false. The file is assumed to be complete if the length of the unlimited dimension is > 0 and (if required) the file size is greater than the file size when the file was first created.
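A minimal Python sketch of the completion test and check flag described above. All names here (`PendingFile`, `is_complete`, `on_advance`) are hypothetical; the real module is Fortran and reads the unlimited-dimension length and file size from disk, whereas this sketch takes them as arguments so the logic can be exercised without netCDF.

```python
from dataclasses import dataclass


@dataclass
class PendingFile:
    """State for one output file being watched (hypothetical names)."""
    name: str
    initial_size: int              # bytes, recorded when the file was first created
    need_size_check: bool          # the "(if required)" file-size test
    check_next_advance: bool = True


def is_complete(unlim_len, size_now, pending):
    """Complete if the unlimited dimension is > 0 and, when required,
    the file has grown beyond its size at creation."""
    if unlim_len <= 0:
        return False
    if pending.need_size_check and size_now <= pending.initial_size:
        return False
    return True


def on_advance(pending, unlim_len, size_now, log_lines):
    """Called once per ModelAdvance; checks only while the flag is set."""
    if not pending.check_next_advance:
        return
    if is_complete(unlim_len, size_now, pending):
        log_lines.append(f"{pending.name} complete")
        pending.check_next_advance = False  # done checking this file


if __name__ == "__main__":
    pending = PendingFile("ocn_example.nc", initial_size=100, need_size_check=True)
    log = []
    # (unlimited-dimension length, file size) seen on successive advances
    for unlim_len, size in [(0, 100), (1, 100), (1, 5000), (1, 5000)]:
        on_advance(pending, unlim_len, size, log)
    print(log, pending.check_next_advance)  # ['ocn_example.nc complete'] False
```

The second advance shows why the size test can matter: the unlimited dimension is already 1, but the file has not yet grown, so it is not treated as complete.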
The actual behavior of file creation/closing can be seen by setting the MOM6 attribute debug_outputlog = true, which enables a series of messages written to stdout. In these messages, the field showing "still 0" or "complete" gives the status check based on the unlimited dimension length, and the final field is the state of the check_nextAdvance flag.

When the model stop time is reached, any pending output is written, along with the output (if any) for the final interval. This produces two log files: one for the interval prior to the final interval and one for the final interval. The log file for the final interval includes the tag stop (e.g. 20111002.000000.mom6.stop.06h).
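For workflow scripts that need to find these files, the sketch below reconstructs a naming pattern from the single example above. This is an assumption, not the module's documented format: it guesses that the name is the interval end time (`YYYYMMDD.HHMMSS`), then `mom6`, an optional `stop` tag, and the output frequency in hours. The helpers `log_name` and `logs_at_stop` are hypothetical.

```python
from datetime import datetime, timedelta


def log_name(valid_time, freq_hours, stop=False):
    """Hypothetical log-file name inferred from 20111002.000000.mom6.stop.06h."""
    tag = ".stop" if stop else ""
    return f"{valid_time:%Y%m%d.%H%M%S}.mom6{tag}.{freq_hours:02d}h"


def logs_at_stop(stop_time, freq_hours):
    """The two logs expected at the stop time: one for the interval before
    the final one, and one (tagged 'stop') for the final interval.
    Assumes each log is stamped with its interval's end time."""
    prior = stop_time - timedelta(hours=freq_hours)
    return [log_name(prior, freq_hours), log_name(stop_time, freq_hours, stop=True)]


if __name__ == "__main__":
    print(logs_at_stop(datetime(2011, 10, 2), 6))
    # ['20111001.180000.mom6.06h', '20111002.000000.mom6.stop.06h']
```

Check the actual files in a run directory before relying on this pattern.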