Extend diag_mediator to allow the piecemeal posting of diagnostics #809
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## dev/gfdl #809 +/- ##
===========================================
Coverage ? 21.81%
===========================================
Files ? 137
Lines ? 33019
Branches ? 5850
===========================================
Hits ? 7202
Misses ? 25260
Partials ? 557
Hallberg-NOAA
left a comment
I think that this capability should be extended to include the ability to initialize a diagnostic buffer to land values to avoid the need to call the piecemeal posting for all columns, even over land.
Also, this PR should be revised to adhere to the guidance in the MOM6 style guide at https://github.com/NOAA-GFDL/MOM6/wiki/Code-style-guide, as described in specific comments.
|
@Hallberg-NOAA Thank you for the comments. I have added descriptions of the units (see the inline response) and made the if-statements conform to the style guide. Additionally, the fill value for the arrays can now be specified; for the purposes of the piecemeal posting of the diagnostics these are set to the |
b892a0d to
bc2a91f
8380b62 to
2b83320
Hallberg-NOAA
left a comment
With this latest set of changes, all of my concerns have been addressed, and I think that this PR is now ready to be merged in.
2b83320 to
4eebe2f
Hallberg-NOAA
left a comment
Please modify the syntax of this code so that it compiles and works correctly with all 3 of the intel, gnu and nvidia compilers.
4eebe2f to
3891e33
|
Unfortunately we are now getting segmentation faults with all three of the intel, gnu and nvidia compilers when we try to run the pipeline testing with this PR. The GFDL regression testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/28215 (and in particular the intel run job at This did pass our TC testing, but I triggered the test again a few hours later and confirmed that these problems are not arising because of a temporary problem with the Gaea computer. @ashao, if you get the chance to revisit this, that would be appreciated. |
|
Just to clarify, is this a new regression for this branch, or had this branch not been run on the internal pipeline before? |
|
I went back and looked at the original regression testing that we did back in late February at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/26532. That version did not compile with the Nvidia compiler, but it did compile with gnu and intel. When it ran, it was also getting segmentation faults and bus errors. We focused on the compile failures, but the run-time errors were also problematic, and we should have pointed these out at the same time. The failed tests include everything using ePBL, the simplest of which would be something like ocean_only/single_column/EPBL. |
|
@Hallberg-NOAA: Thanks! That will definitely save some time in the debugging. |
|
Closing while debugging |
|
@Hallberg-NOAA @marshallward: These changes should resolve the segmentation faults. Additionally, I have confirmed that the |
Some quantities in MOM6 are calculated in subroutines which expect slices of the model's 2d or 3d arrays. Diagnosing these quantities can be challenging because the usual post_data calls expect whole arrays. To solve this problem, the changes here introduce a dynamic buffer that can grow as needed over the course of a simulation. This buffer keeps track of the diagnostic IDs and the index in the buffer. When a slot in the buffer is no longer needed, for example if the whole array has been computed and posted, the slot is marked as "available" for overwriting. The buffer is only allowed to grow if all the currently allocated slots are in use. Any computational cost associated with growing this buffer will only be incurred in the first few steps of the model, as post_data for requested diagnostics is called for the first time in that run.
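The slot bookkeeping described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the module, type, and procedure names (slot_buffer_sketch, slot_buffer, find_or_claim_slot, release_slot) are hypothetical, an ID of 0 is assumed to mark an available slot, and only the ID tracking is shown, not the data arrays that the real buffers also hold.

```fortran
!> Illustrative sketch only; all names here are hypothetical, not the MOM6 API.
module slot_buffer_sketch
  implicit none ; private
  public :: slot_buffer, find_or_claim_slot, release_slot

  !> Tracks which diagnostic ID occupies each slot of a growable buffer
  type :: slot_buffer
    integer, allocatable :: ids(:) !< Diagnostic ID per slot, or 0 if the slot is available (assumption)
  end type slot_buffer

contains

  !> Return the slot for diag_id, reusing an available slot or growing the buffer by one
  function find_or_claim_slot(buf, diag_id) result(slot)
    type(slot_buffer), intent(inout) :: buf     !< Buffer bookkeeping
    integer,           intent(in)    :: diag_id !< Positive diagnostic ID
    integer :: slot
    integer :: i

    if (.not. allocated(buf%ids)) allocate(buf%ids(0))

    ! Reuse the slot already assigned to this diagnostic, if there is one
    do i = 1, size(buf%ids)
      if (buf%ids(i) == diag_id) then
        slot = i
        return
      endif
    enddo

    ! Otherwise claim the first slot that has been marked as available
    do i = 1, size(buf%ids)
      if (buf%ids(i) == 0) then
        buf%ids(i) = diag_id
        slot = i
        return
      endif
    enddo

    ! Every allocated slot is in use, so this is the only case where the buffer grows
    buf%ids = [buf%ids, diag_id]
    slot = size(buf%ids)
  end function find_or_claim_slot

  !> Mark the slot held by diag_id as available for overwriting
  subroutine release_slot(buf, diag_id)
    type(slot_buffer), intent(inout) :: buf     !< Buffer bookkeeping
    integer,           intent(in)    :: diag_id !< Positive diagnostic ID
    integer :: i

    if (.not. allocated(buf%ids)) return
    do i = 1, size(buf%ids)
      if (buf%ids(i) == diag_id) buf%ids(i) = 0
    enddo
  end subroutine release_slot

end module slot_buffer_sketch

program slot_buffer_demo
  use slot_buffer_sketch, only : slot_buffer, find_or_claim_slot, release_slot
  implicit none
  type(slot_buffer) :: buf
  integer :: s1, s2, s3

  s1 = find_or_claim_slot(buf, 11) ! Grows the buffer to one slot
  s2 = find_or_claim_slot(buf, 22) ! Grows the buffer to two slots
  call release_slot(buf, 11)       ! Slot 1 becomes available
  s3 = find_or_claim_slot(buf, 33) ! Reuses slot 1 instead of growing

  if (s1 /= 1 .or. s2 /= 2 .or. s3 /= 1) error stop "unexpected slot assignment"
  if (size(buf%ids) /= 2) error stop "buffer grew when a slot was available"
  print '(a)', "slot reuse behaves as described"
end program slot_buffer_demo
```

In this sketch the buffer only grows inside find_or_claim_slot when no slot is free, which mirrors the statement that growth happens only when all allocated slots are in use.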
The diag mediator has been extended to add a dynamic buffer to each axes group. Three new methods have also been added to enable the piecemeal posting of a diagnostic (by column or by point), along with a 'final' method that allows the buffer to be reused later.
ePBL calculates the vertical diffusivity column by column. This provides a convenient sanity check of the new piecemeal posting of diagnostics. The original diagnostic Kd_ePBL is done by posting the full 3d prognostic array, whereas a new diagnostic Kd_ePBL_col_by_col posts the same array from within ePBL but does so column by column.
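The idea behind this sanity check can be sketched in a stand-alone program. It does not call the diag mediator or ePBL: the array sizes, the fill value, and the column_value and compute_column procedures are hypothetical stand-ins used only to show that a buffer filled one column at a time ends up identical to the array computed as a whole.

```fortran
!> Illustrative sketch only; none of these names come from MOM6.
program column_posting_sketch
  implicit none
  integer, parameter :: ni = 4, nj = 3, nk = 5 ! Hypothetical array extents
  real, parameter :: fill_value = -1.0e34      ! Hypothetical initial value for the buffer
  real :: kd_full(ni,nj,nk)   ! Field computed as a whole 3d array
  real :: kd_buffer(ni,nj,nk) ! Buffer filled one column at a time
  real :: kd_col(nk)          ! Work array for a single column
  integer :: i, j, k

  ! Whole-array version, analogous to posting the full 3d array
  do k = 1, nk ; do j = 1, nj ; do i = 1, ni
    kd_full(i,j,k) = column_value(i, j, k)
  enddo ; enddo ; enddo

  ! Piecemeal version: start from the fill value, then store each column as it is computed
  kd_buffer(:,:,:) = fill_value
  do j = 1, nj ; do i = 1, ni
    call compute_column(i, j, kd_col)
    kd_buffer(i,j,:) = kd_col(:)
  enddo ; enddo

  ! Both paths use the same arithmetic, so the arrays should match exactly
  if (any(kd_buffer /= kd_full)) error stop "column-by-column buffer differs from the full array"
  print '(a)', "column-by-column buffer matches the full array"

contains

  !> Stand-in for a quantity computed at one point of one column
  pure function column_value(i, j, k) result(val)
    integer, intent(in) :: i, j, k
    real :: val
    val = real(i + 10*j + 100*k)
  end function column_value

  !> Stand-in for a routine that only ever sees a single column
  subroutine compute_column(i, j, col)
    integer, intent(in)  :: i, j
    real,    intent(out) :: col(:)
    integer :: k
    do k = 1, size(col)
      col(k) = column_value(i, j, k)
    enddo
  end subroutine compute_column

end program column_posting_sketch
```

Initializing the buffer to a fill value before the column loop is what would let columns that are never visited, such as land points, keep a well-defined value.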
Fixes failures in the CI due to some procedures and type members not having docstrings.
Two issues were leading to sporadic compilation errors and segmentation faults: 1) post_diagnostics_by_point only looked up the slot in the buffer but would not expand the buffer; it now uses check_capacity_by_id, which expands the buffer if needed. 2) post_data_3d_final now checks that the buffer slot is not zero before taking any action.
Fixes a bug where the Kd_ePBL_col_by_col diagnostic's post_data calls were not being checked before being called. Also, move setting of extents for buffers on native axes to mask generation
Update the extents for the piecemeal buffers after the axes info is set. Resolves a problem where the extents were not being retained.
f1597b6 to
6bdd4a7
Hallberg-NOAA
left a comment
This PR is now passing all of our testing. The pipeline testing has passed at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/30328 with the expected warning about the addition of a new diagnostic.
Building MOM_diag_mediator.F90 with GCC and -O2 became much slower after
diagnostic buffers were added.
```
commit 6f6975a
Author: Andrew Shao <andrew.shao@hpe.com>
Date: Wed Mar 18 12:17:43 2026 -0700
+Extend diag_mediator to allow the piecemeal posting of diagnostics (#809)
```
Before this commit, MOM_diag_mediator.F90 built in about 15 seconds.
After this commit, build time increased to more than 170 seconds.
The slowdown was isolated to derived types with multiple allocatable
array components, not to abstract types, inheritance, or type-bound
methods. A minimal form of the triggering pattern is shown below.
```
type :: diag_buffer_2d
real, allocatable :: buffer(:,:,:)
integer, allocatable :: ids(:)
end type
```
The component did not need to be referenced in MOM_diag_mediator.F90.
It was enough for diag_buffer_2d to appear as a component of axes_grp.
Somehow, this was resolved by defining an explicit dummy finalizer in
`diag_buffer_2d`.
```
type :: diag_buffer_2d
! ...
contains
final :: finalize_diag_buffer_2d
end type diag_buffer_2d
subroutine finalize_diag_buffer_2d(this)
type(diag_buffer_2d), intent(inout) :: this
end subroutine finalize_diag_buffer_2d
```
With this change, build time returns to about 15 seconds.
* The finalizers are intentionally empty. Allocatable components are
automatically deallocated by the language, so no explicit deallocate()
calls are needed here.
Finalizers should only be used for custom cleanup, such as external
resources or manually managed pointer targets.
* `diag_buffer_[23]d` contains an array of `buffer_[23]d` objects. The
language standard notes that components are finalized before the
parent.
* No similar compile-time benefit was seen from adding dummy finalizers
to buffer_[23]d, so those types are left unchanged.
Some quantities in MOM6 are calculated in subroutines which expect slices of the model's 2d or 3d arrays. Diagnosing these quantities can be challenging because the usual post_data calls expect whole arrays. To solve this problem, the changes here introduce a dynamic buffer that can grow as needed over the course of a simulation.
This buffer keeps track of the diagnostic IDs and the index in the buffer. When a slot in the buffer is no longer needed, for example if the whole array has been computed and posted, the slot is marked as "available" for overwriting. The buffer is only allowed to grow if all the currently allocated slots are in use.
Any computational cost associated with growing this buffer will only happen in the first few steps of the model as post_data for requested diagnostics is called for the first time in that run.
This new piecemeal posting of a diagnostic was implemented for
`Kd_ePBL`. The new diagnostic `Kd_ePBL_col_by_col` is the same as the original field in the `Baltic_OM4_05` case.
@adcroft @marshallward: Two points in particular that I'd like your feedback on:
`post_data` call to be done without an extra copy. I have included a `store` subroutine that could be used if we made it private, but otherwise those should be deleted.