Component level PIO initialization for external applications such as HAFS #158

Merged
uturuncoglu merged 10 commits into ESCOMP:master from hafs-community:feature/pio_fix_comp
Mar 5, 2021

Conversation

@uturuncoglu
Collaborator

@uturuncoglu uturuncoglu commented Feb 18, 2021

Description of changes

This PR brings component-level PIO initialization to the HAFS application (or UFS Weather Model) without using shr_pio_mod.F90. The shr_pio_mod.F90 file is also removed from the util/ directory.

Specific notes

Testing under UFS:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 8957306..d086bb8 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -343,8 +343,7 @@ elseif(CDEPS_DOCN)
 endif()
 
 if(CMEPS)
-  list(APPEND _ufs_defs_private PIO
-                                CMEPS
+  list(APPEND _ufs_defs_private CMEPS
                                 FRONT_CMEPS=MED)
   add_dependencies(ufs cmeps)
   target_link_libraries(ufs PUBLIC cmeps)
@@ -356,8 +355,7 @@ if(CDEPS)
 endif()
 
 if(S2S)
-  list(APPEND _ufs_defs_private PIO
-                                FRONT_MOM6=mom_cap_mod
+  list(APPEND _ufs_defs_private FRONT_MOM6=mom_cap_mod
                                 FRONT_CICE6=ice_comp_nuopc
                                 CMEPS
                                 FRONT_CMEPS=MED)
@@ -386,10 +384,6 @@ if(WW3)
   add_dependencies(ufs_model ww3_nems)
 endif()
 
-if (CMEPS OR S2S)
-  list(APPEND _ufs_model_defs_private PIO)
-endif()
-
 target_compile_definitions(ufs_model PRIVATE "${_ufs_model_defs_private}")
 
 if(DATM OR CDEPS)

Contributors other than yourself, if any:

CMEPS Issues Fixed (include github issue #):

Are changes expected to change answers?

  • bit for bit
  • different at roundoff level
  • more substantial

Any User Interface Changes (namelist or namelist defaults changes)?

  • Yes
    The UFS model no longer requires the *_modelio.nml and pio_in files. Those settings are now controlled by ESMF attributes. I also set defaults for them in case they are not available in nems.configure.
To ALLCOMP_attributes

  pio_rearr_comm_enable_hs_comp2io = .true.
  pio_rearr_comm_enable_hs_io2comp = .false.
  pio_rearr_comm_enable_isend_comp2io = .false.
  pio_rearr_comm_enable_isend_io2comp = .true.
  pio_rearr_comm_fcd = "2denable"
  pio_rearr_comm_max_pend_req_comp2io = 0
  pio_rearr_comm_max_pend_req_io2comp = 64
  pio_rearr_comm_type = "p2p"

To [ATM|OCN|*]_attributes

  pio_netcdf_format = 64bit_offset
  pio_numiotasks = -99
  pio_rearranger = 1
  pio_root = 1
  pio_stride = 40
  pio_typename = netcdf
  • No

Testing performed if application target is CESM:(either UFS-S2S or CESM testing is required):

  • (recommended) CIME_DRIVER=nuopc scripts_regression_tests.py
    • machines: Cheyenne - Intel+MPT
    • details (e.g. failed tests): qcmd -l walltime=4:00:00 -- "CIME_DRIVER=nuopc ./scripts_regression_tests.py"
  • (recommended) CESM testlist_drv.xml
    • machines and compilers: Cheyenne - Intel+MPT
    • details (e.g. failed tests): qcmd -l walltime=4:00:00 -- ./create_test --xml-testlist ../src/drivers/nuopc/cime_config/testdefs/testlist_drv.xml --xml-machine cheyenne --xml-category nuopc --compare feb02 --baseline-root /glade/p/cesmdata/cseg/cmeps_baselines
      FAIL ERS_Vnuopc_Ln9_N3.f19_g17_rx1.A.cheyenne_intel TPUTCOMP Error: TPUTCOMP: Computation time increase > 25% from baseline
   FAIL SMS_Vnuopc_Ld1_N3.f19_g17_rx1.A.cheyenne_intel TPUTCOMP Error: TPUTCOMP: Computation time increase > 25% from baseline

I am looking into the failed test. It seems that the error is coming from dshr_stream_mod_mp_shr_stream_init_from_inline_ and I think I can fix it soon.

  • (optional) CESM prealpha test
    • machines and compilers
    • details (e.g. failed tests):
  • (other) please describe in detail
    • machines and compilers
    • details (e.g. failed tests):

Testing performed if application target is UFS-coupled:

  • (recommended) UFS-coupled testing
    • description:
    • details (e.g. failed tests):

Testing performed if application target is UFS-HAFS:

  • (recommended) UFS-HAFS testing
    • description:
    • details (e.g. failed tests): The HAFS-specific regression tests were run, and the data-component configurations (DATM+DOCN and DATM+HYCOM) passed. The full RT tests are on the way. I'll update the PR once I have the results. I also need to test the fully coupled case.

Hashes used for testing:

  • CESM:
  • UFS-coupled, then umbrella repository to check out and associated hash:
    • repository to check out:
    • branch:
    • hash:
  • UFS-HAFS, then umbrella repository to check out and associated hash:
    • repository to check out:
    • branch: feature/hafs_couplehycom_cdeps (still need to push a commit that removes shr_pio_mod.F90 from CMEPS-interface)
    • hash: db3d4c8

@uturuncoglu
Collaborator Author

@jedwards4b I think the failure in ERS_Vnuopc_Ln5.f19_f19_mg17.F2000Nuopc.cheyenne_intel.cam-nuopc_cap is related to the ICE component, because it calls shr_strdata_init_from_inline without passing the new PIO-related arguments. I think I need to make them optional.

@uturuncoglu
Collaborator Author

uturuncoglu commented Feb 18, 2021

@jedwards4b I am not sure how this is built. Is shr_strdata_init_from_inline coming from CESM/CIME or CDEPS? It might fail right away in the build. Anyway, let me know what you think. I also realized that CLM is using shr_strdata_init_from_inline as well.

@uturuncoglu
Collaborator Author

@jedwards4b okay, I see that it builds because the interface is the same. Something else is going on there. I need to look at it more carefully.

@uturuncoglu
Collaborator Author

I ran the CDEPS and CMEPS tests again after the last commit and updated the PR description. Everything seems fine now. I am still waiting for the regression tests.

@uturuncoglu uturuncoglu marked this pull request as ready for review February 22, 2021 16:18
@uturuncoglu
Collaborator Author

@danrosen25 could you test it with the fully coupled configuration? @DeniseWorthen you could also test with S2S. I have already run the RTs without any problem (some of them failed due to the wall clock time limit on Orion), but you might want to test it. You can find more information about testing under UFS in the description section.

@DeniseWorthen
Collaborator

@uturuncoglu I will run the ufs-coupled and ufs-datm tests.

@uturuncoglu
Collaborator Author

@DeniseWorthen thanks. Let me know if you have any problem.

@DeniseWorthen
Collaborator

DeniseWorthen commented Feb 22, 2021

I've had at least 9 test failures on Cheyenne where the wall clock time was exceeded. This will be a problem. We can't just make all the tests run longer.

@DeniseWorthen
Collaborator

@uturuncoglu can you point me to a branch where you've made all the changes other than to cmeps and the cmeps CMakelist?

@uturuncoglu
Collaborator Author

@DeniseWorthen the diff for the rest of the mods is in the PR description. BTW, do you have any idea why it takes more time?

@uturuncoglu
Collaborator Author

@DeniseWorthen also, we have a special branch for NEMS (feature/pio_fix_comp).

@DeniseWorthen
Collaborator

DeniseWorthen commented Feb 22, 2021

@uturuncoglu Yes, but since your diff shows w/in your HAFS branch (which includes CDEPS), I can't use the line numbers and I want to make sure that I get the top CMakeList.txt correct. I realized my previous test was w/o making the changes in NEMS.

@uturuncoglu
Collaborator Author

@DeniseWorthen if you put your CMakeList file somewhere on Cheyenne, I could check and fix it for testing this PR.

@uturuncoglu
Collaborator Author

@DeniseWorthen BTW, it would be nice to run the failed test with the PIO configuration currently used in S2S. You can find those options in the pio_in file (they need to be added to ALLCOMP_attributes in nems.configure) and the *_modelio.nml files (same, but in this case added to [ATM|OCN|*]_attributes).

@DeniseWorthen
Collaborator

Thanks. I've got the code checked out here: /glade/work/worthen/ufs_testpiochange

I have not changed the top CMakeList yet.

@uturuncoglu
Collaborator Author

@DeniseWorthen it seems that your CMakeList is fine. There is no need to change anything. You just need to remove shr_pio_mod.F90 from CMEPS-interface. That is all. Let me know how it goes and whether you get any errors in the build.

@uturuncoglu
Collaborator Author

After updating CIME and the MizuRoute component under CESM, scripts_regression_tests.py works fine.

@DeniseWorthen
Collaborator

OK, let me try one of the timed-out tests again.

Our nems.configure does not have either the pio_in or med_modelio.nml settings so I assume I will be getting the values specified in your new med_io_init. I noticed our current setting in med_modelio.nml has pio_stride set to 36 (the only difference). Would that make a noticeable timing impact?

@uturuncoglu
Collaborator Author

@DeniseWorthen in your run directory you have pio_in and *modelio.nml. Those files were used to configure PIO, but with this implementation the settings moved to nems.configure. So, for example, you could set pio_stride as follows:

MED_attributes::
  pio_stride = 36
::
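
For anyone generating nems.configure from a script, the block layout above can be reproduced with a few lines of Python. This is only a minimal sketch: `render_attributes` is a hypothetical helper name, not something that exists in CMEPS or NEMS, and it assumes nothing beyond the `<COMP>_attributes::` ... `::` layout shown in the example.

```python
def render_attributes(component, settings):
    """Render an attribute block such as MED_attributes for nems.configure.

    `component` is the prefix (e.g. "MED", "ATM", "ALLCOMP") and `settings`
    maps attribute names to values. Hypothetical helper for illustration.
    """
    lines = [f"{component}_attributes::"]
    for key, value in settings.items():
        # Two-space indentation, matching the example in this thread.
        lines.append(f"  {key} = {value}")
    lines.append("::")
    return "\n".join(lines)


block = render_attributes("MED", {"pio_stride": 36})
print(block)
```

The same helper could be called with "ALLCOMP" for the pio_rearr_comm_* options listed in the PR description.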

@DeniseWorthen
Collaborator

It seems that writing mediator restarts is taking a lot of time. This case is writing mediator restarts every 6 hours. I was watching the PET log message and it almost seemed to hang it was taking so long to write. The run is here (I've copied in the latest build from /glade/work/worthen/ufs_testpiochange).

/glade/scratch/worthen/FV3_RT/rt_35151/cpld_controlfrac_c192_prod

@uturuncoglu
Collaborator Author

@DeniseWorthen okay. Thanks for helping with the tests. I'll look at your run directory. Just to clarify, this is using the default PIO parameters, right?

@DeniseWorthen
Collaborator

The rpointer.cpl is unused in this test.

@uturuncoglu
Collaborator Author

@DeniseWorthen I changed the defaults and fixed a couple of minor things. Could you test it again after updating CMEPS (same branch)? I think you could perform two tests:

  • nothing related to PIO in nems.configure. The mediator will pick a parameter combination for you. I also set the default rearranger to box, as you use in S2S.
  • you could add the following options to MED_attributes to use exactly the same configuration as S2S
      pio_rearranger = box
      pio_stride = 36
      pio_numiotasks = 5

BTW, I realized that if you want to set pio_stride you also need to set pio_numiotasks; otherwise the interface tries to find a combination for you and replaces your setting. I am not expecting any performance difference between those two tests. I saw similar performance to your control run (simulated years, ~0.026).
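
The pairing rule for pio_stride and pio_numiotasks can be checked before a run with a small pre-flight script. The sketch below is an assumption-laden illustration: `check_pio_layout` is a hypothetical name, it only looks at whether the two keys are present in a dictionary of MED_attributes-style settings, and it does not reproduce the actual logic in med_io_init.

```python
def check_pio_layout(attrs):
    """Return warnings for a dict of MED_attributes-style PIO settings.

    Hypothetical pre-flight check: it flags the case where only one of
    pio_stride and pio_numiotasks is given, since the mediator may then
    pick its own combination and replace the user's value.
    """
    has_stride = "pio_stride" in attrs
    has_ntasks = "pio_numiotasks" in attrs
    warnings = []
    if has_stride != has_ntasks:
        given = "pio_stride" if has_stride else "pio_numiotasks"
        missing = "pio_numiotasks" if has_stride else "pio_stride"
        warnings.append(
            f"{given} is set without {missing}; the setting may be replaced"
        )
    return warnings


# Settings taken from the S2S-like example in this comment.
print(check_pio_layout({"pio_rearranger": "box", "pio_stride": 36, "pio_numiotasks": 5}))
# Only the stride is given, so a warning is returned.
print(check_pio_layout({"pio_stride": 36}))
```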

@DeniseWorthen
Collaborator

I've completed two full RT tests for the cpld and datm configurations on cheyenne. Both used your latest CMEPS commit and the NEMS commit. Both passed all the baselines. The two cases were:

  1. no pio settings in nems.configure (/glade/scratch/worthen/FV3_RT/piotest_dflt)
  2. pio settings in nems.configure that you showed above (/glade/scratch/worthen/FV3_RT/piotest_nemsconfig)

Two points:

  1. The test times seem all very similar (and similar to the test times I see in the current develop branch). I'm looking at the 'total wall time' in the log out.
  2. The pio settings in the nems.configure still seem to be triggering a 'resetting to defaults' in the med_io_init in some of the tests.

I'm very unclear on how to set these pio parameters optimally. One thing I'm absolutely sure of is that we're using whatever had originally been set up at the time of bringing CMEPS into ufs and not adjusting them based on PE count etc.

I think if med_io_init can set these parameters and we obtain similar wall clock times to our current develop branch that is fine. I don't think we need to try to replicate the settings we're currently using by putting things in nems.configure. Do you agree?

@uturuncoglu
Collaborator Author

@DeniseWorthen it is nice to see similar timing. There is some logic in med_io_init that updates the PIO parameters, but if you provide your own settings then it should be fine. At least in my test, I provided pio_stride and pio_numiotasks together and they were used in the run. This part of the code comes from shr_pio_mod and I did not change anything. I think it is fine for now; if we need some fine tuning in the future we could implement it. I still need to look at using pnetcdf as pio_iotype. Last night I tried to run the model with that option, but FV3 was failing without much information. I need to ask Jim about that. Besides the pnetcdf issue, I think this PR looks fine, but I'll run the full test suite again on Orion to be sure.

@DeniseWorthen
Collaborator

DeniseWorthen commented Feb 23, 2021

@uturuncoglu I agree this PR looks ready so I'll go ahead and approve and you can merge when you are done w/ your final testing and other approvals.

Regarding pnetcdf, I have a very vague memory of pnetcdf not working for us early on. It may now be fine---a lot has changed since those early days of cmeps integration.

@uturuncoglu
Collaborator Author

@DeniseWorthen yes, that is how I remember it too. It wasn't working before. It is still failing and we are trying to find the source of the issue. I'll keep you updated. Thanks for testing and for your help.

@uturuncoglu
Collaborator Author

@danrosen25 if you don't mind, could you review this PR? I ran the full test suite without any problem, but if you could test it externally with the fully coupled case, that would be great.

@uturuncoglu
Collaborator Author

@DeniseWorthen I think pnetcdf is working now, with great help from @jedwards4b. I tested on Orion with all data components in HAFS without any problem (I used them to avoid the complexity of the other model components), but I am also planning to test with S2S. Here are the details:

  • PIO needs to be installed with PnetCDF support. The following is an example on Orion.
module use /apps/contrib/NCEP/libs/hpc-stack/modulefiles/stack

module load hpc/1.1.0
module load hpc-intel/2018.4
module load hpc-impi/2018.4
module load hdf5/1.10.6-parallel
module load netcdf/4.7.4-parallel

CC=mpiicc FC=mpiifort cmake -DPIO_ENABLE_FORTRAN=ON -DWITH_PNETCDF=ON -DPnetCDF_C_LIBRARY=$PNETCDF_LIBRARY_DIRS/libpnetcdf.a -DPnetCDF_C_INCLUDE_DIR=$PNETCDF_INCLUDE_DIRS -DPIO_ENABLE_LOGGING=ON -DPIO_ENABLE_TIMING=OFF -DCMAKE_INSTALL_PREFIX=/work/noaa/nems/tufuk/progs/pio-2.5.2/install ../
  • The UFS model cmake build is updated to use pnetcdf. I created a preliminary version of CMakeModules/Modules/FindPnetCDF.cmake and made the corresponding changes in the top-level CMakeList.txt to use the pnetcdf library in the link stage. Otherwise, the build was failing. I still need to clean up FindPnetCDF.cmake and make PNETCDF optional in the cmake build system, so I have not pushed it yet.

  • Then, I set the following options in MED_attributes:

pio_rearranger = subset
pio_stride = 1
pio_numiotasks = 120
pio_typename = pnetcdf

Anyway, it would be nice to compare the performance of the model with and without pnetcdf. I'll let you know when it is ready for performance comparison.

@uturuncoglu
Collaborator Author

@DeniseWorthen @jedwards4b @mvertens @rsdunlapiv Below is the initial performance comparison for netcdf vs. pnetcdf. It seems that pnetcdf performs better than netcdf, and I think the difference could be more pronounced if you increase the frequency of mediator history and restart output. Of course this is just a single test, and it is better to run each configuration a couple of times to get more robust results. For example, I did two tests with netcdf; one of them gave 0.010 simulated years / cmp-day and the other gave 0.014. Actually, I was not expecting such a big difference and maybe @jedwards4b could comment on it. It could be an issue related to the load on the platform. Anyway, I am merging this PR at this point. If you need more information about setting up PIO with pnetcdf, just let me know.

Test results on Orion: 2-day run with a 3-hour restart interval (cpld_controlfrac_prod)
pnetcdf:
Options:

      pio_rearranger = subset
      pio_stride = 40
      pio_numiotasks = 3
      pio_typename = pnetcdf

Performance from mediator.log:
simulated years / cmp-day = 0.110

Performance from stdout:

  0: Tabulating mpp_clock statistics across    144 PEs...
  0:
  0:                                           tmin          tmax          tavg          tstd  tfrac grain pemin pemax
  0: Total runtime                       120.713748    120.713886    120.713813      0.000053  1.000     0     0   143
  0: Initialization                        0.000000      0.000000      0.000000      0.000000  0.000     0     0   143
  0: FV dy-core                           28.570453     34.255114     30.946457      1.029037  0.256    11     0   143
  0: FV subgrid_z                          0.085232      0.106753      0.098361      0.005198  0.001    11     0   143
  0: FV Diag                               0.251791      1.036582      0.334647      0.147768  0.003    11     0   143
  0: GFS Step Setup                        2.365903      3.098185      2.663432      0.266708  0.022     1     0   143
  0: GFS Radiation                         6.287957     10.050448      8.618888      0.602491  0.071     1     0   143
  0: GFS Physics                           3.292284      4.328692      3.915805      0.196795  0.032     1     0   143
  0: Dynamics get state                    0.296023      0.313769      0.305355      0.003791  0.003     1     0   143
  0: Dynamics update state                 2.148558      7.787839      4.592129      0.989244  0.038     1     0   143
  0: FV3 Dycore                           29.103950     34.780858     31.553365      0.993771  0.261     1     0   143
  0:  MPP_STACK high water mark=           0
  0:   wrt grid comp destroy time=  0.347625970840454
  0:
  0:
  0:      ENDING DATE-TIME    FEB 26,2021  22:17:43.973   57  FRI   2459272
  0:      PROGRAM nems      HAS ENDED.
  0: * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
  0: *****************RESOURCE STATISTICS*******************************
  0: The total amount of wall time                        = 126.973885
  0: The total amount of time in user mode                = 113.816894
  0: The total amount of time in sys mode                 = 5.354081
  0: The maximum resident set size (KB)                   = 618032
  0: Number of page faults without I/O activity           = 208399
  0: Number of page faults with I/O activity              = 21
  0: Number of times filesystem performed INPUT           = 33752
  0: Number of times filesystem performed OUTPUT          = 296504
  0: Number of Voluntary Context Switches                 = 10511
  0: Number of InVoluntary Context Switches               = 240
  0: *****************END OF RESOURCE STATISTICS*************************

netcdf:
Options:

      pio_rearranger = subset
      pio_stride = 40
      pio_numiotasks = 3
      pio_typename = netcdf

Performance from mediator.log:
simulated years / cmp-day = 0.014

Performance from stdout:

  0: Tabulating mpp_clock statistics across    144 PEs...
  0:
  0:                                           tmin          tmax          tavg          tstd  tfrac grain pemin pemax
  0: Total runtime                       706.206202    706.206746    706.206488      0.000235  1.000     0     0   143
  0: Initialization                        0.000000      0.000000      0.000000      0.000000  0.000     0     0   143
  0: FV dy-core                           28.467060     33.540359     30.647396      1.070105  0.043    11     0   143
  0: FV subgrid_z                          0.085443      0.106754      0.099031      0.005013  0.000    11     0   143
  0: FV Diag                               0.256450      1.113360      0.354713      0.171503  0.001    11     0   143
  0: GFS Step Setup                        2.322996      2.612965      2.513327      0.095351  0.004     1     0   143
  0: GFS Radiation                         6.499035     10.188439      8.662936      0.629107  0.012     1     0   143
  0: GFS Physics                           3.301958      4.323191      3.927162      0.194902  0.006     1     0   143
  0: Dynamics get state                    0.298493      0.317037      0.307999      0.003630  0.000     1     0   143
  0: Dynamics update state                 2.039517      7.072745      4.298874      1.022846  0.006     1     0   143
  0: FV3 Dycore                           29.016406     34.076674     31.275134      1.024055  0.044     1     0   143
  0:  MPP_STACK high water mark=           0
  0:   wrt grid comp destroy time=  0.412357807159424
  0:
  0:
  0:      ENDING DATE-TIME    FEB 26,2021  23:11:06.839   57  FRI   2459272
  0:      PROGRAM nems      HAS ENDED.
  0: * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
  0: *****************RESOURCE STATISTICS*******************************
  0: The total amount of wall time                        = 709.486479
  0: The total amount of time in user mode                = 669.559845
  0: The total amount of time in sys mode                 = 32.175304
  0: The maximum resident set size (KB)                   = 617564
  0: Number of page faults without I/O activity           = 295629
  0: Number of page faults with I/O activity              = 27
  0: Number of times filesystem performed INPUT           = 33728
  0: Number of times filesystem performed OUTPUT          = 296680
  0: Number of Voluntary Context Switches                 = 10761
  0: Number of InVoluntary Context Switches               = 398
  0: *****************END OF RESOURCE STATISTICS*************************

@uturuncoglu
Collaborator Author

I tested netcdf with pio_rearranger = box and it now performs better than with pio_rearranger = subset. It is still slower than pnetcdf, but much closer.

Performance from mediator.log:
# simulated years / cmp-day = 0.098

Performance from stdout:

  0: Tabulating mpp_clock statistics across    144 PEs...
  0:
  0:                                           tmin          tmax          tavg          tstd  tfrac grain pemin pemax
  0: Total runtime                       133.737439    133.737556    133.737501      0.000039  1.000     0     0   143
  0: Initialization                        0.000000      0.000000      0.000000      0.000000  0.000     0     0   143
  0: FV dy-core                           28.475112     33.566421     30.653098      1.076239  0.229    11     0   143
  0: FV subgrid_z                          0.083372      0.105635      0.098551      0.005268  0.001    11     0   143
  0: FV Diag                               0.249191      0.979971      0.330283      0.160687  0.002    11     0   143
  0: GFS Step Setup                        2.399862      2.539952      2.483874      0.039399  0.019     1     0   143
  0: GFS Radiation                         6.348360     10.262042      8.698659      0.650156  0.065     1     0   143
  0: GFS Physics                           3.331662      4.329270      3.920093      0.194209  0.029     1     0   143
  0: Dynamics get state                    0.297753      0.315214      0.305651      0.003768  0.002     1     0   143
  0: Dynamics update state                 2.040887      7.095129      4.278616      1.025718  0.032     1     0   143
  0: FV3 Dycore                           29.023413     34.090902     31.256110      1.026425  0.234     1     0   143
  0:  MPP_STACK high water mark=           0
  0:   wrt grid comp destroy time=  0.391751050949097
  0:
  0:
  0:      ENDING DATE-TIME    FEB 26,2021  23:24:46.155   57  FRI   2459272
  0:      PROGRAM nems      HAS ENDED.
  0: * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
  0: *****************RESOURCE STATISTICS*******************************
  0: The total amount of wall time                        = 136.850044
  0: The total amount of time in user mode                = 122.075589
  0: The total amount of time in sys mode                 = 5.759515
  0: The maximum resident set size (KB)                   = 617360
  0: Number of page faults without I/O activity           = 214366
  0: Number of page faults with I/O activity              = 22
  0: Number of times filesystem performed INPUT           = 54432
  0: Number of times filesystem performed OUTPUT          = 296488
  0: Number of Voluntary Context Switches                 = 10992
  0: Number of InVoluntary Context Switches               = 115
  0: *****************END OF RESOURCE STATISTICS*************************
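
To put the three runs side by side, the "total amount of wall time" values from the RESOURCE STATISTICS blocks in this thread can be turned into ratios relative to the pnetcdf run. This is just a quick sketch; the dictionary keys are labels made up for the comparison, and the numbers are copied from the logs above.

```python
# Total wall time in seconds, copied from the RESOURCE STATISTICS blocks.
wall_time_s = {
    "pnetcdf, subset": 126.973885,
    "netcdf, subset": 709.486479,
    "netcdf, box": 136.850044,
}

# Express every run as a multiple of the pnetcdf wall time.
baseline = wall_time_s["pnetcdf, subset"]
ratios = {name: seconds / baseline for name, seconds in wall_time_s.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the pnetcdf wall time")
```

With these numbers, netcdf with the subset rearranger takes roughly 5.6 times the pnetcdf wall time, while netcdf with the box rearranger is within about 8%. These are single runs each, so the ratios are only indicative.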

@uturuncoglu
Collaborator Author

Hi All, currently @danrosen25 is having trouble getting bit-for-bit identical results for the FV3+HYCOM coupled configuration against his baseline. We will look at the issue and keep you updated. Until this issue is resolved we won't merge this PR. It is strange, because all tests (full RT under HAFS, CESM and S2S) are identical and pass with https://github.com/hafs-community/HAFS and the feature/hafs_couplehycom_cdeps branch.

@uturuncoglu
Collaborator Author

Hi All,

I tried to reproduce the issue that @danrosen25 had, but I could not. Here are the details of the tests that I performed recently.

Two tests were conducted to check for a possible answer change:

TEST 1 (HAFS/support)

  • checkout code
git clone --recursive https://github.com/hafs-community/HAFS HAFS_dan
cd HAFS_dan
git checkout ae718076c632351cc9b1761d83ab10a3c3f305a4
git submodule update --init --recursive
  • build for workflow
cd sorc/
./build_all.sh
./install_all.sh
./link_fix.sh
  • modify workflow for Orion
cd parm
ln -s system.conf.orion system.conf
edit system.conf and add the following

disk_project=gmtb
tape_project=emc-gmtb
cpu_account=nems
CDSAVE=/work/noaa/{disk_project}/{ENV[USER]}

edit cronjob_hafs_rt.sh and keep only the following

# MSU Orion
 HOMEhafs=/work/noaa/gmtb/tufuk/HAFS_dan
 dev="-s sites/orion.ent -f"
 PYTHON3=/apps/intel-2020/intel-2020/intelpython3/bin/python3

 ${PYTHON3} ./run_hafs.py -t ${dev} 2020082512 00L HISTORY \
     config.EXPT=${EXPT} config.SUBEXPT=${EXPT}_rt_regional_static_cplocean3 \
     config.NHRS=12 ${scrubopt} \
     ../parm/hafs_regional_static.conf \
     ../parm/hafs_hycom.conf

Test 2 (feature/hafs_couplehycom_cdeps)

Everything is the same, except that I updated the model using the following commands

git clone -b feature/hafs_couplehycom_cdeps --recursive https://github.com/hafs-community/HAFS
cd HAFS
git merge ae718076c632351cc9b1761d83ab10a3c3f305a4 <--- to bring app related changes for fully coupled configuration since the app is not up to date
git checkout feature/hafs_couplehycom_cdeps <--- checkout CDEPS branch in model level
git submodule update --init --recursive

Then I compared the dynf* and phyf* files of the 12-hour forecast with cprnc. They are identical.
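
The comparison step can be scripted so that every output file present in both run directories gets its own cprnc call. The following is a rough sketch rather than the procedure actually used here: `cprnc_commands` is a hypothetical helper, the directory names in the usage note are placeholders, and it only assumes the basic `cprnc file1 file2` invocation.

```python
from pathlib import Path


def cprnc_commands(dir_a, dir_b, patterns=("dynf*", "phyf*")):
    """Build one `cprnc file1 file2` command per file found in both directories.

    Files are matched by name; a file that exists only in dir_a is skipped.
    Hypothetical helper for illustration.
    """
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    commands = []
    for pattern in patterns:
        for file_a in sorted(dir_a.glob(pattern)):
            file_b = dir_b / file_a.name
            if file_b.exists():
                commands.append(["cprnc", str(file_a), str(file_b)])
    return commands
```

Each returned list can be passed to `subprocess.run`, for example `for cmd in cprnc_commands("test1_rundir", "test2_rundir"): subprocess.run(cmd, check=True)`, where both directory names are placeholders for the two forecast output directories.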

Collaborator

@danrosen25 danrosen25 left a comment

I tested within HAFS and produced bit-for-bit results once REPRO=Y was set. I have one concern: PIO and CMEPS are now a package deal. A user must pre-build PIO and link it with their project before CMEPS can be used.

ae718076c632351cc9b1761d83ab10a3c3f305a4 root
+db3d4c8138b458a9c5cddd5c29dcb987d6e1d58f sorc/hafs_forecast.fd (heads/feature/hafs_couplehycom_cdeps)
+ee0aa4c02ffdae2e26acd398c8ac9d0ad7cdb15f sorc/hafs_forecast.fd/CDEPS (heads/feature/pio_fix_comp)
+48b9136b991c10a439968a1aadd3bb03d1203d81 sorc/hafs_forecast.fd/CMEPS-interface/CMEPS (cmeps_v0.4.1-559-g48b9136)
+6dac966624a91381aa14c17c9c9654753ff85917 sorc/hafs_forecast.fd/NEMS (hafs_coupledhycom.v0.0.0-53-g6dac966)

Comment thread mediator/med_io_mod.F90
@uturuncoglu
Collaborator Author

@danrosen25 it is good to know that you could reproduce the results. I am not fully sure whether REPRO=Y is forced at the app level by default. If not, I could reproduce the results without REPRO=Y and you probably could too. Anyway, if you don't have any other concerns I'll merge this PR. Thanks for testing.

@uturuncoglu
Collaborator Author

uturuncoglu commented Mar 4, 2021

Since PIO is now an integral part of the UFS (it is used by both CMEPS and CDEPS), I think the external dependency on PIO is not a big deal. Also, this was already done in a previous PR, when the PIO submodule was removed from CMEPS.

Collaborator

@danrosen25 danrosen25 left a comment

I approve of the changes but if there's a way to abstract the I/O and not hard code a dependency on an external library that would be great. The external library is an extra burden on the end-user and it will possibly cause headaches in building and testing future systems.

@jedwards4b
Collaborator

@danrosen25 I don't understand that comment - every component has some dependency on external IO libraries - FMS, netcdf, hdf5, pio and other external dependencies (eg ESMF). We hope to fold pio into esmf in a future update but this will have to do for now.

@uturuncoglu uturuncoglu merged commit 74f7751 into ESCOMP:master Mar 5, 2021
@danrosen25
Collaborator

It's just a preference I have that the fewer external dependencies the better. The Land Information System (LIS) and ESMF both have external dependencies but they make them optional. Even MPI is optional in ESMF.
https://github.com/NASA-LIS/LISF/blob/master/lis/arch/Config.pl#L464
http://earthsystemmodeling.org/docs/release/ESMF_8_0_1/ESMF_usrdoc/node9.html#SECTION00094200000000000000


korsbakken pushed a commit to korsbakken/CMEPS that referenced this pull request Oct 23, 2025
* addition -f new streams for datm
* fixes for getting streams with vector fields to work correctly
* added ndep stream data
* added ndep stream functionality
* fixed compile bug
* more updates
* added ndep to core2 and jra forcing
* updated stream definition file

4 participants