Skip to content

Move ESMF-based surface regridding code from GDASApp to UFS_UTILS#1038

Merged
GeorgeGayno-NOAA merged 30 commits into
ufs-community:developfrom
ClaraDraper-NOAA:feature/mv_regridding
Apr 4, 2025
Merged

Move ESMF-based surface regridding code from GDASApp to UFS_UTILS#1038
GeorgeGayno-NOAA merged 30 commits into
ufs-community:developfrom
ClaraDraper-NOAA:feature/mv_regridding

Conversation

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator

@ClaraDraper-NOAA ClaraDraper-NOAA commented Mar 14, 2025

DESCRIPTION OF CHANGES:

Added a new directory, with the surface regridding code that was in GDASApp.

Changes to the files, from what was in GDASApp:

-Added a regression test for the regridding code, which has been tested on hera only. The regression test inputs and baseline files are here:

/scratch2/BMC/gsienkf/Clara.Draper/gerrit-hera/PRprep/new_test

-Added Doxygen documentation files, plus necessary changes to the fortran routines.

There are are no changes to the executable code in the fortran files.

TESTS CONDUCTED:

If there are changes to the build or source code, the tests below must be conducted. Contact a repository manager if you need assistance.

  • Compile branch on all Tier 1 machines using Intel (Orion, Jet, Hera, Hercules and WCOSS2). Done using 009456b.
  • Compile branch on Hera using GNU. Done using 009456b.
  • Compile branch in 'Debug' mode on WCOSS2. Done using 009456b.
  • Compile with Doxygen on any machine with no errors. For details, see DOCUMENTATION section.
  • Run unit tests locally on any Tier 1 machine. Done on Hera using 9f313c7 and GNU. All tests passed.
  • Run new regrid_sfc consistency test locally on all Tier 1 machines. Done successfully using 009456b.

Describe any additional tests performed.

Also, checked that the regridding executable can be run from global_workflow. This will require a spack-stack update

DEPENDENCIES:

None.

DOCUMENTATION:

All new and updated source code must be documented with Doxygen.

  • Doxygen is updated. Using 009456b, the Doxygen was checked and looked correct. See: here

ISSUE:

Fixes issue 1032

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA Creating as a draft, as I haven't done the unit tests for the new executable. I'm hoping you can still help me with that. Everything else is ready to go (I hope!).

Comment thread reg_tests/regrid_sfc/driver.s4.sh Outdated
@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

@ClaraDraper-NOAA - I know that other consistency tests have separate driver scripts for each machine, but one of my goals is to combine them if possible. Please see my script on Hera: /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.clara/reg_tests/regrid_sfc/driver.sh.

If that script works on the RDHPCS machines, I can expand it for WCOSS2.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA The driver.sh script works for hera. I've added it, and deleted the separate scripts.

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

@ClaraDraper-NOAA - I can't get the consistency test to run because the /sfcincr_gsi file is missing. I can also set up the input and baseline directories on each machine.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@ClaraDraper-NOAA - I can't get the consistency test to run because the /sfcincr_gsi file is missing. I can also set up the input and baseline directories on each machine.

The sfcinc file is here:
/scratch2/BMC/gsienkf/Clara.Draper/gerrit-hera/PRprep/ufs_utils.fd/reg_tests/regrid_sfc/input_data_noahmp/

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

Is sfcincr_gsi input only or does the program modify it? That is one of the files the consistency test is comparing against the baseline.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

input only.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

I see what the problem is. I'm fixing it (and also trying to figure out why my tests are passing, when they shouldn't have).

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

I fixed both issues.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

When I moved the namelist read routine to its own routine, I did not get the Doxygen correct. Please baseline my updated version: /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.clara/sorc/regrid_sfc.fd/readin_setup.F90

done.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA thanks. I've addressed all your comments.

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

@GeorgeGayno-NOAA thanks. I've addressed all your comments.

The Doxygen looks good. See: https://www.emc.ncep.noaa.gov/jcsda/ggayno/doxygen/html/regrid_sfc/index.html

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA thanks. I've addressed all your comments.

The Doxygen looks good. See: https://www.emc.ncep.noaa.gov/jcsda/ggayno/doxygen/html/regrid_sfc/index.html

Thanks George. What's the next step?

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

@GeorgeGayno-NOAA thanks. I've addressed all your comments.

The Doxygen looks good. See: https://www.emc.ncep.noaa.gov/jcsda/ggayno/doxygen/html/regrid_sfc/index.html

Thanks George. What's the next step?

Please baseline my updates to the regression test in /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.clara

  • ./reg_tests/rt.sh
  • ./reg_tests/regrid_sfc/driver.sh
  • ./reg_tests/regrid_sfc/gauss2fv3incr.sh

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA thanks. I've addressed all your comments.

The Doxygen looks good. See: https://www.emc.ncep.noaa.gov/jcsda/ggayno/doxygen/html/regrid_sfc/index.html

Thanks George. What's the next step?

Please baseline my updates to the regression test in /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.clara

  • ./reg_tests/rt.sh
  • ./reg_tests/regrid_sfc/driver.sh
  • ./reg_tests/regrid_sfc/gauss2fv3incr.sh

Done.

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

Will do some final tests, then merge.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

Will do some final tests, then merge.

Thanks - let me know if I can do anything. Otherwise, have a great weekend!

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

I tried running the consistency test on WCOSS2 in 'debug' mode and it stopped with an array mismatch:

forrtl: severe (408): fort: (33): Shape mismatch: The extent of dimension 2 of array LON_PTR_COORD is 65 and 
the corresponding extent of array LON_PTR_FIELD is 64

Image              PC                Routine            Line        Source
regridStates.x     00000000020CB74F  Unknown               Unknown  Unknown
regridStates.x     000000000043DC75  grids_io_mp_creat         647  grids_IO.F90
regridStates.x     00000000004294A1  grids_io_mp_setup          72  grids_IO.F90
regridStates.x     000000000042368B  MAIN__                    107  regridStates.F90

This is the line in the grids_IO.F90:

     ! set coord pointe to field pointer
     lon_ptr_coord = lon_ptr_field

It only happens on PET 5. On all other PETS, dimension 2 is 64 for both pointers. LAT_PTR_COORD has the same problem on PET 5.

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

I forgot to update the consistency test driver script for WCOSS2. I placed the updated version on Hera: /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.clara/reg_tests/regrid_sfc/driver.sh

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

I forgot to update the consistency test driver script for WCOSS2. I placed the updated version on Hera: /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.clara/reg_tests/regrid_sfc/driver.sh

I pushed your change. Did that fix the issue with debug?

Comment thread sorc/regrid_sfc.fd/grids_IO.F90 Outdated
character(len=15), dimension(2) :: latvar_list

! center coord names
lonvar_list(1) = 'grid_center_lon'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are specifying center and corner variables.

Comment thread sorc/regrid_sfc.fd/grids_IO.F90 Outdated
latvar_list(2) = 'grid_corner_lat'

! Create the fields
do v=1,2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this loop, you are only specifying 'center' fields.

Comment thread sorc/regrid_sfc.fd/grids_IO.F90 Outdated
@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA I've fixed the bug, and then wound up replacing that section of code with setting-up the gaussian grid directly from the scrip file, which is much neater. I don't have access to WCOSS, but this now compiles in debug mode on hera (previous didn't). It fails the reg_test, but the differences are very minor (< 10**-15). I also checked that it's wrapping correctly on 0 longitude (i.e., correctly using a periodic dimenson).

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

New test output:
/scratch2/BMC/gsienkf/Clara.Draper/gerrit-hera/PRprep/new_test/regrid_sfc/baseline_data

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

@GeorgeGayno-NOAA I've fixed the bug, and then wound up replacing that section of code with setting-up the gaussian grid directly from the scrip file, which is much neater. I don't have access to WCOSS, but this now compiles in debug mode on hera (previous didn't). It fails the reg_test, but the differences are very minor (< 10**-15). I also checked that it's wrapping correctly on 0 longitude (i.e., correctly using a periodic dimenson).

I was able to compile and run the tests on WCOSS2 in 'debug' mode. I noted the same minor differences from the baseline.

@ClaraDraper-NOAA
Copy link
Copy Markdown
Collaborator Author

@GeorgeGayno-NOAA I think this is ready to go. I changed the pet distribution for the Gaussian case so that the npets does not need to be a factor of the longitudinal dimension (which was causing workflow problems). It does now need to be a multiple of 6 though - and the Gaussian longitudinal dimension also needs to be a multiple of 6 - this is true for all our grids, except C128, which has Gaussian output at 512x256.
@aerorahul suggested we merge this now, then fix that later.

Note: the results are not reproducible if we change the number of tasks. With 6 tasks at C192, the wall time is 6 seconds (5 seconds with 36 tasks). I suspect that this is not a new problem. For the workflow, I suggest we proceed for now, and fix the tasks at 6 for reproducibility. @CatherineThomas-NOAA @aerorahul - does this sound OK?

This passes the test, and compiles and runs in debug on hera.

More info:
The issue with the restrictions on the number of tasks is coming from the fact that I can't get the unabalanced DE distributions in the CreateGrid routines to work - and I see from the comments in the code @GeorgeGayno-NOAA sent me that this was also a problem for the setting up the cube sphere grid (which is where the multiples of 6 requirements is coming from). I can probably fix this later, but if my proposal above is not OK and we need it fixed sooner, I'll need some help from a software specialist.

@GeorgeGayno-NOAA
Copy link
Copy Markdown
Collaborator

@GeorgeGayno-NOAA I think this is ready to go. I changed the pet distribution for the Gaussian case so that the npets does not need to be a factor of the longitudinal dimension (which was causing workflow problems). It does now need to be a multiple of 6 though - and the Gaussian longitudinal dimension also needs to be a multiple of 6 - this is true for all our grids, except C128, which has Gaussian output at 512x256. @aerorahul suggested we merge this now, then fix that later.

Note: the results are not reproducible if we change the number of tasks. With 6 tasks at C192, the wall time is 6 seconds (5 seconds with 36 tasks). I suspect that this is not a new problem. For the workflow, I suggest we proceed for now, and fix the tasks at 6 for reproducibility. @CatherineThomas-NOAA @aerorahul - does this sound OK?

This passes the test, and compiles and runs in debug on hera.

More info: The issue with the restrictions on the number of tasks is coming from the fact that I can't get the unabalanced DE distributions in the CreateGrid routines to work - and I see from the comments in the code @GeorgeGayno-NOAA sent me that this was also a problem for the setting up the cube sphere grid (which is where the multiples of 6 requirements is coming from). I can probably fix this later, but if my proposal above is not OK and we need it fixed sooner, I'll need some help from a software specialist.

In the interest of time, I will run the final tests, then merge what we have.

@GeorgeGayno-NOAA GeorgeGayno-NOAA merged commit 3f498ed into ufs-community:develop Apr 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants