Skip to content

fix for problems in ctests using debug mode of GSI release/rrfs1.0.0 #862

Merged
ShunLiu-NOAA merged 6 commits into
NOAA-EMC:release/rrfs.v1.0.0from
TingLei-NOAA:gsi-release-debug_fix
Jul 25, 2025
Merged

fix for problems in ctests using debug mode of GSI release/rrfs1.0.0 #862
ShunLiu-NOAA merged 6 commits into
NOAA-EMC:release/rrfs.v1.0.0from
TingLei-NOAA:gsi-release-debug_fix

Conversation

@TingLei-NOAA
Copy link
Copy Markdown
Contributor

@TingLei-NOAA TingLei-NOAA commented Apr 4, 2025

This is a draft PR to show fix for problems when running ctests using debug mode built GSI at rrfs.1.0.0.

Resolves #860

Now the working ctests for this implementation are

Test project /lfs/h2/emc/da/noscrub/Ting.Lei/dr-emc-gsi-release_debug/GSI/build
  Test #1: global_4denvar
  Test #2: rtma
  
  Test #4: netcdf_fv3_regional
  Test #5: hafs_4denvar_glbens
  Test #6: hafs_3denvar_hybens
  Test #7: global_enkf

With this PR, the update runs of all ctests (except for global_enkf) successfully ran to completion.
The global_enkf issue (a known issue) has been investigated at #776.
Also, letkf is not used in the current rrfs workflow.
Hence, the following tests would be to use real cases from current rrfsv1 to test the GSI::gsi and GSI :: regional EnKF

@ShunLiu-NOAA ShunLiu-NOAA requested a review from hu5970 May 7, 2025 19:42
@TingLei-NOAA TingLei-NOAA changed the title fix for problems in ctests using debug mode of GSI release/rrfs1.1.0 fix for problems in ctests using debug mode of GSI release/rrfs1.0.0 May 15, 2025
@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

Using this PR built with the release mode in comparison with the rrfs1.0.0 result, it is found both GSI::gsi and GSI::enkf produced identical results. But this PR is significantly faster for GSI:: gsi while slower for GSI::enkf.
They are maybe caused by the real time situations. But I would do some optimization for the changes for enkf.

@ShunLiu-NOAA
Copy link
Copy Markdown
Contributor

@TingLei-daprediction, could you give a number of how slow it is for GSI:enkf?

@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

TingLei-NOAA commented May 15, 2025

In the runs I already have, the PR Enkf used time 631 S while the control used 545s.
Update, in a second run , the PR Enkf used 581 s, still longer than the previous control 's.

@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

@ShunLiu-NOAA @hu5970 To avoid the additional computational time, I have removed the additional mpi_barrier as the work-around to prevent that runtime error with debug mode GSI::enkf. I also did some optimization of the openmp directive in the changes to avoid "false sharing". The current PR shows using less time in the recent two runs for the control and the PR. Hence, I think this PR is ready to be reviewed before merging.
For the incorporation of the recent GSI::gsi optimization (issue 879, I would suggest we use a different PR because that optimization commit (by including merging with then-current develop branch) has all the changes in the develop branch which can't be singled out by "git cherry-pick" easily and hence would be more effort-consuming.

@ShunLiu-NOAA
Copy link
Copy Markdown
Contributor

Thank you @TingLei-NOAA . We need another ISSURE/PR for GSI optimization test.

@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

@ShunLiu-NOAA I had opened the issue #879.

@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

@ShunLiu-NOAA The numbers you asked for: now,(skipped the writing of ensemble step ), the PR used the clock time 505 seconds while the control (RRFS1.0.0) used 519 seconds.
The ensemble write-out step was skipped because I want to avoid real time env impacts on the speeding up. The IO step is most vulnerable to them.

@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

@ShunLiu-NOAA An update for the runs of the complete EnKF (with the step of ensemble output0. The PR used 590 seconds and the control 561 Seconds. They also produced the identical results by comparison the min/max of the ensemble increament mean in their standard output . The difference of the clock time should come from the real time system conditions.
I think this PR is ready for reviewing .

@ShunLiu-NOAA
Copy link
Copy Markdown
Contributor

Thank you Ting.

Comment thread src/enkf/enkf.f90 Outdated
Comment thread src/gsi/combine_radobs.f90 Outdated
Comment thread src/gsi/combine_radobs.f90
@SamuelDegelia-NOAA
Copy link
Copy Markdown

Had a couple of minor comments but overall the changes make sense and look good to me!

Copy link
Copy Markdown
Contributor

@GangZhao-NOAA GangZhao-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ting,
The modifications look good to me.
Thank you!
-Gang

@RussTreadon-NOAA
Copy link
Copy Markdown
Contributor

@TingLei-NOAA , what is the status of this PR? Should we keep it open, merge it, or close it?

@TingLei-NOAA
Copy link
Copy Markdown
Contributor Author

TingLei-NOAA commented Jun 30, 2025

@RussTreadon-NOAA This is up to @ShunLiu-NOAA 's decision. We have some concerns on the rrfs1.0.0 (the base of this PR) 's behavior (slower than expected) recently found on wcoss2. This delayed this PR 's merging into rrfs1.0.0.

@RussTreadon-NOAA
Copy link
Copy Markdown
Contributor

Thank you. @TingLei-NOAA , for the update.

@ShunLiu-NOAA
Copy link
Copy Markdown
Contributor

@RussTreadon-NOAA I am going to test this PR again with real time parallel to check if it is slower than the current version.

@RussTreadon-NOAA
Copy link
Copy Markdown
Contributor

Thank you @ShunLiu-NOAA for testing.

Taking a step back, does the RRFS GSI we are preparing to send to NCO successfully run the RRFS in debug mode?

  • If yes, there is less need to merge the changes in this PR into NOAA-EMC:release/rrfs.v1.0.0.
  • If no, do the changes in this PR allow the debug gsi.x successfully run the RRFS in debug mode?
    • If yes, we need these changes or a variant of them in NOAA-EMC:release/rrfs.v1.0.0.
    • If no, we need to identify and fix the reason(s) for debug RRFS gsi.x failure(s).

@ShunLiu-NOAA
Copy link
Copy Markdown
Contributor

Thank you @ShunLiu-NOAA for testing.

Taking a step back, does the RRFS GSI we are preparing to send to NCO successfully run the RRFS in debug mode?

  • If yes, there is less need to merge the changes in this PR into NOAA-EMC:release/rrfs.v1.0.0.

  • If no, do the changes in this PR allow the debug gsi.x successfully run the RRFS in debug mode?

    • If yes, we need these changes or a variant of them in NOAA-EMC:release/rrfs.v1.0.0.
    • If no, we need to identify and fix the reason(s) for debug RRFS gsi.x failure(s).

Yes. RRFS GSI will be sent to NCO. This PR is to ensure RRFS GSI is can run in debug mode. I tested this PR with RRFS parallel on WCOSS2. It performs as expected. We will merge the change to rrfs1.0.0.

@ShunLiu-NOAA ShunLiu-NOAA merged commit abf57f6 into NOAA-EMC:release/rrfs.v1.0.0 Jul 25, 2025
@ShunLiu-NOAA ShunLiu-NOAA mentioned this pull request Feb 20, 2026
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

problems running debug mode GSI on wcoss2 for release/rrfs1.0.0

7 participants