Extend diag_mediator to allow the piecemeal posting of diagnostics #809

Merged
Hallberg-NOAA merged 13 commits into NOAA-GFDL:dev/gfdl from ashao:post_piecemeal on Mar 18, 2026


@ashao ashao commented Jan 20, 2025

Some quantities in MOM6 are calculated in subroutines that expect slices of the model's 2d or 3d arrays. Diagnosing these quantities can be challenging because the usual post_data calls expect whole arrays. To solve this problem, the changes here introduce a dynamic buffer that can grow as needed over the course of a simulation.

This buffer keeps track of the diagnostic IDs and the index in the buffer. When a slot in the buffer is no longer needed, for example if the whole array has been computed and posted, the slot is marked as "available" for overwriting. The buffer is only allowed to grow if all the currently allocated slots are in use.
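
As an illustration only (this is not the code in MOM_diag_buffers.F90), the slot bookkeeping described above could be sketched roughly as follows. The names slot_buffer, claim_slot, release_slot and first_match, the use of 0 as the "available" marker (which assumes diagnostic IDs are positive), and the (i, j, slot) storage order are all assumptions made for this sketch.

```fortran
module slot_buffer_sketch
  implicit none ; private
  public :: slot_buffer, claim_slot, release_slot

  !> Hypothetical stand-in for a 2d diagnostic buffer with one field per slot
  type :: slot_buffer
    integer :: ni = 0, nj = 0             !< Horizontal extents of each stored field
    integer, allocatable :: ids(:)        !< Diagnostic ID held by each slot, or 0 if available
    real,    allocatable :: field(:,:,:)  !< Stored data, indexed (i, j, slot) in this sketch
  end type slot_buffer

contains

  !> Return the first position in ids equal to val, or 0 if there is none
  pure function first_match(ids, val) result(slot)
    integer, intent(in) :: ids(:)
    integer, intent(in) :: val
    integer :: slot
    integer :: n
    slot = 0
    do n=1,size(ids)
      if (ids(n) == val) then
        slot = n
        exit
      endif
    enddo
  end function first_match

  !> Find the slot for a diagnostic, reusing an available slot before growing
  subroutine claim_slot(buf, id, slot)
    type(slot_buffer), intent(inout) :: buf
    integer, intent(in)  :: id
    integer, intent(out) :: slot
    integer, allocatable :: new_ids(:)
    real,    allocatable :: new_field(:,:,:)
    integer :: n

    if (.not. allocated(buf%ids)) then
      allocate(buf%ids(0), buf%field(buf%ni, buf%nj, 0))
    endif
    slot = first_match(buf%ids, id)                 ! Already held by this diagnostic?
    if (slot == 0) slot = first_match(buf%ids, 0)   ! Otherwise reuse an available slot
    if (slot == 0) then                             ! Every slot is in use, so grow by one
      n = size(buf%ids)
      allocate(new_ids(n+1), new_field(buf%ni, buf%nj, n+1))
      new_ids(1:n) = buf%ids(:)
      new_field(:,:,1:n) = buf%field(:,:,:)
      new_field(:,:,n+1) = 0.0
      call move_alloc(new_ids, buf%ids)
      call move_alloc(new_field, buf%field)
      slot = n + 1
    endif
    buf%ids(slot) = id
  end subroutine claim_slot

  !> Mark the slot held by a diagnostic as available for overwriting
  subroutine release_slot(buf, id)
    type(slot_buffer), intent(inout) :: buf
    integer, intent(in) :: id
    integer :: slot
    if (.not. allocated(buf%ids)) return
    slot = first_match(buf%ids, id)
    if (slot /= 0) buf%ids(slot) = 0
  end subroutine release_slot

end module slot_buffer_sketch

program slot_buffer_demo
  use slot_buffer_sketch, only : slot_buffer, claim_slot, release_slot
  implicit none
  type(slot_buffer) :: buf
  integer :: s1, s2, s3

  buf%ni = 2 ; buf%nj = 3
  call claim_slot(buf, 101, s1)   ! No slots yet, so the buffer grows to 1 slot
  call claim_slot(buf, 202, s2)   ! Slot 1 is in use, so the buffer grows to 2 slots
  call release_slot(buf, 101)     ! Diagnostic 101 is done and slot 1 becomes available
  call claim_slot(buf, 303, s3)   ! Reuses slot 1 without growing
  if (s1 /= 1 .or. s2 /= 2 .or. s3 /= 1) error stop "unexpected slot"
  if (size(buf%ids) /= 2) error stop "buffer grew unexpectedly"
  print '(a,i0)', "number of slots: ", size(buf%ids)
end program slot_buffer_demo
```

In this sketch the buffer only reallocates when no slot is available, which is why the growth cost would be confined to the first few calls for each requested diagnostic.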

Any computational cost associated with growing this buffer will only happen in the first few steps of the model as post_data for requested diagnostics is called for the first time in that run.

This new piecemeal posting of a diagnostic was implemented for Kd_ePBL. The new diagnostic Kd_ePBL_col_by_col is the same as the original field in the Baltic_OM4_05 case.

@adcroft @marshallward: Two points in particular that I'd like your feedback on:

  • The underlying buffer is public, but I cannot think of a way to make it private and still allow a post_data call to be done without an extra copy. I have included a store subroutine that could be used if we made it private, but otherwise those should be deleted.
  • Each time we post the data in a piecemeal fashion, the slot index is computed. We could put in some logic so that we store it once per loop at the beginning of the section where the diagnostic is being computed/posted. I am not sure that it would be worth the effort/complexity for what I am assuming would be a marginal performance gain.


codecov Bot commented Jan 20, 2025

Codecov Report

Attention: Patch coverage is 0% with 200 lines in your changes missing coverage. Please review.

Please upload report for BASE (dev/gfdl@40a59f7). Learn more about missing BASE report.
Report is 2 commits behind head on dev/gfdl.

Files with missing lines               Patch %   Lines
src/framework/MOM_diag_buffers.F90     0.00%     181 Missing ⚠️
src/framework/MOM_diag_mediator.F90    0.00%     19 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             dev/gfdl     #809   +/-   ##
===========================================
  Coverage            ?   21.81%           
===========================================
  Files               ?      137           
  Lines               ?    33019           
  Branches            ?     5850           
===========================================
  Hits                ?     7202           
  Misses              ?    25260           
  Partials            ?      557           


Comment thread src/framework/MOM_diag_buffers.F90 Outdated
Comment thread src/framework/MOM_diag_buffers.F90 Outdated

@Hallberg-NOAA Hallberg-NOAA left a comment

I think that this capability should be extended to include the ability to initialize a diagnostic buffer to land values to avoid the need to call the piecemeal posting for all columns, even over land.

Also, this PR should be revised to adhere to the guidance in the MOM6 style guide at https://github.com/NOAA-GFDL/MOM6/wiki/Code-style-guide, as described in specific comments.

@ashao ashao requested a review from Hallberg-NOAA January 23, 2025 19:02

ashao commented Jan 23, 2025

@Hallberg-NOAA Thank you for the comments. I have added descriptions of the units (see the inline response) and made the if-statements conform to the style guide. Additionally, the fill value for the arrays can now be specified; for the piecemeal posting of diagnostics, it is set to the DIAG_MISSING_VALUE specified in MOM_input.
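
A rough sketch of why an initial fill value helps is given below: if the stored field starts out at the missing value, only ocean columns need to be posted and land columns keep the fill. Everything here is an assumption for illustration, including the array sizes, the mask convention (1 for ocean, 0 for land), and the constant used as a stand-in for DIAG_MISSING_VALUE; none of it is taken from the PR's code.

```fortran
program fill_value_sketch
  implicit none
  integer, parameter :: ni = 3, nj = 2, nk = 4
  real, parameter :: missing = -1.0e34  ! Stand-in for DIAG_MISSING_VALUE, not the model's value
  real :: mask(ni,nj)      ! 1 for ocean points, 0 for land points (assumed convention)
  real :: field(ni,nj,nk)  ! The buffered diagnostic, filled one column at a time
  real :: column(nk)       ! A single column of computed values
  integer :: i, j, k

  mask(:,:) = 1.0 ; mask(2,1) = 0.0   ! Make one point land

  ! Initialize the whole buffer to the fill value before any column is posted
  field(:,:,:) = missing

  do j=1,nj ; do i=1,ni
    if (mask(i,j) > 0.0) then
      ! Stand-in for a column-wise calculation
      do k=1,nk ; column(k) = real(100*i + 10*j + k) ; enddo
      ! Piecemeal post: store only this column
      field(i,j,:) = column(:)
    endif
  enddo ; enddo

  ! The land column was never posted, so it still holds the fill value
  if (any(field(2,1,:) /= missing)) error stop "land column was overwritten"
  if (field(3,2,4) /= 324.0) error stop "ocean column was not stored"
  print '(a)', "land columns retain the fill value"
end program fill_value_sketch
```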

Comment thread src/framework/MOM_diag_buffers.F90 Outdated
@ashao ashao force-pushed the post_piecemeal branch 4 times, most recently from b892a0d to bc2a91f on February 3, 2025 20:48
Comment thread src/framework/MOM_diag_buffers.F90
@Hallberg-NOAA Hallberg-NOAA added the enhancement New feature or request label Feb 22, 2025

@Hallberg-NOAA Hallberg-NOAA left a comment

With this latest set of changes, all of my concerns have been addressed, and I think that this PR is now ready to be merged in.

Comment thread src/framework/MOM_diag_buffers.F90 Outdated

@Hallberg-NOAA Hallberg-NOAA left a comment

Please modify the syntax of this code so that it compiles and works correctly with all 3 of the intel, gnu and nvidia compilers.


Hallberg-NOAA commented Jul 20, 2025

Unfortunately we are now getting segmentation faults with all three of the intel, gnu and nvidia compilers when we try to run the pipeline testing with this PR. The GFDL regression testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/28215 (and in particular the intel run job at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/jobs/163349) seems to indicate that the problem in various test cases arises with the allocate() statement in grow_3d() on line 225 of MOM_diag_buffers.F90, or perhaps with the allocated(this%ids) test on line 105 in find_buffer_slot(), or with the reference to this%ids(:) on line 125 in grow_ids(). I suspect that these arise from calls with an unallocated diag_buffer object.

This did pass our TC testing, but I triggered the test again a few hours later and confirmed that these problems are not arising because of a temporary problem with the Gaea computer. @ashao, if you get the chance to revisit this, that would be appreciated.


ashao commented Jul 20, 2025

Just to clarify: is this a new regression for this branch, or had this branch not previously been run on the internal pipeline?

@Hallberg-NOAA

I went back and looked at the original regression testing that we did back in late February at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/26532. That version did not compile with the Nvidia compiler, but it did compile with gnu and intel. When it ran, it was also getting segmentation faults and bus errors. We focused on the compile failures, but the run-time errors were also problematic, and we should have pointed these out at the same time. The failed tests include everything using ePBL, the simplest of which would be something like ocean_only/single_column/EPBL.


ashao commented Jul 21, 2025

@Hallberg-NOAA: Thanks! That will definitely save some time in the debugging.


ashao commented Mar 11, 2026

Closing while debugging

@ashao ashao closed this Mar 11, 2026
@ashao ashao reopened this Mar 12, 2026

ashao commented Mar 12, 2026

@Hallberg-NOAA @marshallward: These changes should resolve the segmentation faults. Additionally, I have confirmed that the Kd_ePBL_col_by_col and Kd_ePBL diagnostics are the same (as long as the fill value is 0.0).

@ashao ashao requested a review from Hallberg-NOAA March 12, 2026 18:13
Comment thread src/framework/MOM_diag_buffers.F90
ashao and others added 13 commits March 17, 2026 17:00
The diag mediator has been extended to add a dynamic buffer to
each axes group. Three new methods have also been added to enable
the piecemeal posting (by column, by point) of a diagnostic and a
'final' method to allow the buffer to be reused later.
ePBL calculates the vertical diffusivity column by column. This
provides a convenient sanity check of the new piecemeal posting
of diagnostics. The original diagnostic Kd_ePBL is done by posting
the full 3d prognostic array, whereas a new diagnostic
Kd_ePBL_col_by_col posts the same array from within ePBL but does
so column by column.
Fixes failures in the CI due to some procedures and type members
not having docstrings.
Two issues were leading to sporadic compilation errors and
segmentation faults:

1) post_diagnostics_by_point only looked up the slot in the
buffer and would not expand the buffer. It now uses
check_capacity_by_id, which expands the buffer if needed.

2) post_data_3d_final now checks that the buffer slot is not
zero before taking any action.
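
The second fix can be illustrated with a small sketch, assuming that a lookup returns 0 when a diagnostic has no slot. The routines find_slot and finish_posting below are hypothetical stand-ins written for this note, not the PR's find_buffer_slot or post_data_3d_final.

```fortran
module final_guard_sketch
  implicit none ; private
  public :: find_slot, finish_posting

contains

  !> Return the slot holding id, or 0 if the diagnostic was never stored
  pure function find_slot(ids, id) result(slot)
    integer, intent(in) :: ids(:)
    integer, intent(in) :: id
    integer :: slot
    integer :: n
    slot = 0
    do n=1,size(ids)
      if (ids(n) == id) then
        slot = n
        exit
      endif
    enddo
  end function find_slot

  !> Release the slot held by id, doing nothing if no slot was ever claimed
  subroutine finish_posting(ids, id, released)
    integer, intent(inout) :: ids(:)
    integer, intent(in)    :: id
    logical, intent(out)   :: released
    integer :: slot

    slot = find_slot(ids, id)
    released = (slot /= 0)
    ! Without this guard, ids(0) would be referenced, which is out of bounds
    if (slot == 0) return
    ids(slot) = 0
  end subroutine finish_posting

end module final_guard_sketch

program final_guard_demo
  use final_guard_sketch, only : finish_posting
  implicit none
  integer :: ids(3)
  logical :: released

  ids(:) = (/ 11, 22, 33 /)
  call finish_posting(ids, 22, released)   ! Diagnostic 22 holds slot 2, which is released
  if (.not. released .or. ids(2) /= 0) error stop "slot 2 should have been released"
  call finish_posting(ids, 99, released)   ! Diagnostic 99 was never stored, so nothing happens
  if (released) error stop "nothing should have been released"
  if (ids(1) /= 11 .or. ids(3) /= 33) error stop "other slots were modified"
  print '(a)', "guarded release behaves as expected"
end program final_guard_demo
```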
Fixes a bug where the Kd_ePBL_col_by_col diagnostic's post_data
calls were not being checked before being called.

Also, move setting of extents for buffers on native axes to mask
generation
Update the extents for the piecemeal buffers after the axes info
is set. Resolves a problem where the extents were not being
retained.

@Hallberg-NOAA Hallberg-NOAA left a comment

This PR is now passing all of our testing. The pipeline testing has passed at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/30328 with the expected warning about the addition of a new diagnostic.

@Hallberg-NOAA Hallberg-NOAA merged commit 6f6975a into NOAA-GFDL:dev/gfdl Mar 18, 2026
52 checks passed
Hallberg-NOAA pushed a commit that referenced this pull request May 21, 2026
Building MOM_diag_mediator.F90 with GCC and -O2 became much slower after
diagnostic buffers were added.
```
    commit 6f6975a
    Author: Andrew Shao <andrew.shao@hpe.com>
    Date:   Wed Mar 18 12:17:43 2026 -0700

        +Extend diag_mediator to allow the piecemeal posting of diagnostics (#809)
```
Before this commit, MOM_diag_mediator.F90 built in about 15 seconds.
After this commit, build time increased to more than 170 seconds.

The slowdown was isolated to derived types with multiple allocatable
array components, not to abstract types, inheritance, or type-bound
methods.  A minimal form of the triggering pattern is shown below.
```
  type :: diag_buffer_2d
    real, allocatable :: buffer(:,:,:)
    integer, allocatable :: ids(:)
  end type
```
The component did not need to be referenced in MOM_diag_mediator.F90.
It was enough for diag_buffer_2d to appear as a component of axes_grp.

Somehow, this was resolved by defining an explicit dummy finalizer in
`diag_buffer_2d`.
```
  type :: diag_buffer_2d
    ! ...
  contains
    final :: finalize_diag_buffer_2d
  end type diag_buffer_2d

  subroutine finalize_diag_buffer_2d(this)
    type(diag_buffer_2d), intent(inout) :: this
  end subroutine finalize_diag_buffer_2d
```
With this change, build time returns to about 15 seconds.

* The finalizers are intentionally empty.  Allocatable components are
  automatically deallocated by the language, so no explicit deallocate()
  calls are needed here.

  Finalizers should only be used for custom cleanup, such as external
  resources or manually managed pointer targets.

* `diag_buffer_[23]d` contains an array of `buffer_[23]d` objects.  The
  language standard notes that components are finalized before the
  parent.

* No similar compile-time benefit was seen from adding dummy finalizers
  to buffer_[23]d, so those types are left unchanged.
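
For reference, a self-contained version of the pattern might look like the sketch below. It mirrors the minimal type shown above rather than the full MOM6 type, and the driver program and the use_buffer routine are assumptions added only to show that the allocatable components are released automatically even though the finalizer is empty. It says nothing about build times, which depend on the compiler and its version.

```fortran
module finalizer_sketch
  implicit none ; private
  public :: diag_buffer_2d

  !> Minimal stand-in for the buffer type, with an explicit dummy finalizer
  type :: diag_buffer_2d
    real,    allocatable :: buffer(:,:,:)  !< Stored diagnostic data
    integer, allocatable :: ids(:)         !< Diagnostic IDs for each slot
  contains
    final :: finalize_diag_buffer_2d
  end type diag_buffer_2d

contains

  !> Intentionally empty: allocatable components are deallocated by the language
  subroutine finalize_diag_buffer_2d(this)
    type(diag_buffer_2d), intent(inout) :: this
  end subroutine finalize_diag_buffer_2d

end module finalizer_sketch

program finalizer_demo
  use finalizer_sketch, only : diag_buffer_2d
  implicit none

  ! Calling twice shows that each call starts from unallocated components
  call use_buffer()
  call use_buffer()
  print '(a)', "components were released automatically between calls"

contains

  !> Allocate the components of a local buffer and let it go out of scope
  subroutine use_buffer()
    type(diag_buffer_2d) :: buf  ! Unsaved local: finalized, then deallocated, on return
    if (allocated(buf%ids) .or. allocated(buf%buffer)) &
      error stop "components should start unallocated"
    allocate(buf%ids(4), buf%buffer(2,3,4))
    buf%ids(:) = 0 ; buf%buffer(:,:,:) = 0.0
  end subroutine use_buffer

end program finalizer_demo
```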