Component level PIO initialization for external applications such as HAFS #158
|
@jedwards4b I think that failure in |
|
@jedwards4b i am not sure how this is built. Is this |
|
@jedwards4b okay, I see that it builds because the interface is the same. Something else is going on there. I need to look at it more carefully. |
|
I ran the CDEPS and CMEPS tests again after the last commit and updated the PR description. Everything seems fine now. I am still waiting for the regression tests. |
|
@danrosen25 could you test it with the fully coupled configuration? @DeniseWorthen you could also test it with S2S. I have already run the RTs without any problems (some of them were failing due to the wall clock time limit on Orion), but you might want to test it yourself. You can find more information about testing under UFS in the description section. |
|
@uturuncoglu I will run the ufs-coupled and ufs-datm tests. |
|
@DeniseWorthen thanks. Let me know if you have any problem. |
|
I've had at least 9 test failures on Cheyenne where the wall clock time was exceeded. This will be a problem. We can't just make all the tests run longer. |
|
@uturuncoglu can you point me to a branch where you've made all the changes other than to cmeps and the cmeps CMakelist? |
|
@DeniseWorthen the diff for the rest of the mods is in the PR description. BTW, do you have any idea why it takes more time? |
|
@DeniseWorthen we also have a special branch for NEMS (feature/pio_fix_comp). |
|
@uturuncoglu Yes, but since your diff shows w/in your HAFS branch (which includes CDEPS), I can't use the line numbers and I want to make sure that I get the top CMakeList.txt correct. I realized my previous test was w/o making the changes in NEMS. |
|
@DeniseWorthen if you put your CMakeList file somewhere on Cheyenne, I could check and fix it for testing this PR. |
|
@DeniseWorthen BTW, it would be nice to run the failed tests with the PIO configuration currently used in S2S. You can find those options in the pio_in file (they need to be added to ALLCOMP_attributes in nems.configure) and in the modelio.nml files (same, but in this case under [ATM|OCN|]_attributes). |
|
Thanks. I've got the code checked out here: /glade/work/worthen/ufs_testpiochange. I have not changed the top CMakeList yet. |
|
@DeniseWorthen it seems that your CMakeList is fine. There is no need to change anything. You just need to remove mod_shr_pio from CMEPS-interface. That is all. Let me know how it goes and whether you get any build errors. |
|
after updating CIME and MizuRoute component under CESM, |
|
OK, let me try one of the timed-out tests again. Our nems.configure does not have either the pio_in or med_modelio.nml settings, so I assume I will be getting the values specified in your new med_io_init. I noticed our current setting in med_modelio.nml has pio_stride set to 36 (the only difference). Would that make a noticeable timing impact? |
|
@DeniseWorthen in your run directory you have pio_in and *modelio.nml. Those files are used by PIO, but with this implementation I moved their settings into nems.configure. So, for example, you could set pio_stride as follows, |
|
It seems that writing mediator restarts is taking a lot of time. This case is writing mediator restarts every 6 hours. I was watching the PET log message and it almost seemed to hang it was taking so long to write. The run is here (I've copied in the latest build from /glade/work/worthen/ufs_testpiochange). /glade/scratch/worthen/FV3_RT/rt_35151/cpld_controlfrac_c192_prod |
|
@DeniseWorthen okay. Thanks for helping with the testing. I'll look at your run directory. Just to clarify, this is using the default PIO parameters, right? |
|
The rpointer.cpl is unused in this test. |
|
@DeniseWorthen I changed the defaults and fixed a couple of minor things. Could you test it again after updating CMEPS (same branch)? I think you could perform two tests:
BTW, I realized that if you want to set pio_stride you also need to set pio_numiotasks; otherwise the interface tries to find a combination for you and replaces your setting. I am not expecting any performance difference between those two tests. I saw similar performance to your control run (simulated years, ~0.026). |
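A minimal sketch of the rule described in the comment above, written in Python purely for illustration. It is not the CMEPS implementation: the function name `resolve_pio_layout` and the fallback heuristic (roughly one I/O task per four tasks) are assumptions. Only the behavior that a lone `pio_stride` is replaced unless `pio_numiotasks` is also given comes from the thread.

```python
def resolve_pio_layout(ntasks, pio_stride=None, pio_numiotasks=None, pio_root=0):
    """Return (stride, numiotasks, io_ranks) for a component with ntasks MPI tasks.

    Hypothetical rule: the user's stride is honored only when pio_numiotasks
    is also given; otherwise a combination is derived and the stride replaced.
    """
    if ntasks < 1 or not 0 <= pio_root < ntasks:
        raise ValueError("need ntasks >= 1 and 0 <= pio_root < ntasks")
    for name, value in (("pio_stride", pio_stride), ("pio_numiotasks", pio_numiotasks)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be >= 1")

    if pio_stride is not None and pio_numiotasks is not None:
        # Both values supplied: use them exactly as given.
        stride, numiotasks = pio_stride, pio_numiotasks
    else:
        # Assumed fallback (not taken from CMEPS): about one I/O task per
        # four tasks unless a count was given, spread evenly from the root.
        if pio_numiotasks is not None:
            numiotasks = pio_numiotasks
        else:
            numiotasks = max(1, ntasks // 4)
        stride = max(1, (ntasks - pio_root) // numiotasks)

    io_ranks = [pio_root + i * stride for i in range(numiotasks)]
    if io_ranks[-1] >= ntasks:
        raise ValueError("I/O tasks do not fit in the component's task range")
    return stride, numiotasks, io_ranks


# Stride and task count given together: the stride of 36 is kept.
print(resolve_pio_layout(144, pio_stride=36, pio_numiotasks=4))
# Stride given alone: under the assumed fallback it is replaced.
print(resolve_pio_layout(144, pio_stride=36)[:2])
```

In this sketch, a component with 144 tasks keeps a stride of 36 only when the number of I/O tasks accompanies it, which mirrors why both values should be set together.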
|
I've completed two full RT tests for the cpld and datm configurations on cheyenne. Both used your latest CMEPS commit and the NEMS commit. Both passed all the baselines. The two cases were:
Two points:
I'm very unclear on how to set these pio parameters optimally. One thing I'm absolutely sure of is that we're using whatever had originally been set up at the time of bringing CMEPS into ufs and not adjusting them based on PE count etc. I think if med_io_init can set these parameters and we obtain similar wall clock times to our current develop branch that is fine. I don't think we need to try to replicate the settings we're currently using by putting things in nems.configure. Do you agree? |
|
@DeniseWorthen it is nice to see similar timing. There is some logic in the |
|
@uturuncoglu I agree this PR looks ready, so I'll go ahead and approve, and you can merge when you are done w/ your final testing and other approvals. Regarding pnetcdf, I have a very vague memory of pnetcdf not working for us early on. It may now be fine---a lot has changed since those early days of cmeps integration. |
|
@DeniseWorthen yes, that is how I remember it too. It wasn't working before. It is still failing, and we are trying to find the source of the issue. I'll keep you updated. Thanks for the testing and help. |
|
@danrosen25 if you don't mind, could you review this PR? I ran the full test suite without any problems, but if you could test it externally with a fully coupled case, that would be great. |
|
@DeniseWorthen I think pnetcdf is working now, with great help from @jedwards4b. I tested on Orion with all data components in HAFS without any problems (I used data components to avoid the complexity of the other model components), and I am also planning to test with S2S. Here are the details,
Anyway, it would be nice to compare the performance of the model with and without pnetcdf. I'll let you know when it is ready for performance comparison. |
|
@DeniseWorthen @jedwards4b @mvertens @rsdunlapiv Here is the initial performance comparison for netcdf vs. pnetcdf. It seems that pnetcdf performs better than netcdf, and I think the difference could become more pronounced if you increase the frequency of mediator history and restart output. Of course, this is just a single test, and it would be better to test each configuration a couple of times to get more robust results. For example, I did two tests with netcdf; one of them gave 0.010 cmp-day and the other gave 0.014 cmp/day. Actually, I was not expecting such a big difference, and maybe @jedwards4b could comment on it. It could be an issue related to the load on the platform. Anyway, I am merging this PR at this point. If you need more information about setting up PIO with pnetcdf, just let me know. Test results on Orion: 2-day run with a 3-hour restart interval (cpld_controlfrac_prod) Performance from mediator.log: Performance from stdout: netcdf: Performance from mediator.log: Performance from stdout: |
|
I tested netcdf with Performance from mediator.log: Performance from stdout: |
|
Hi All, currently @danrosen25 is having trouble getting bit-for-bit identical results against his baseline for the FV3+HYCOM coupled configuration. We will look at the issue and keep you updated. Until this issue is resolved we won't merge this PR. It is strange, because all tests (full RT under HAFS, CESM and S2S) are identical and pass with https://github.com/hafs-community/HAFS and the feature/hafs_couplehycom_cdeps branch. |
|
Hi All, I tried to reproduce the issue that @danrosen25 had, but I could not. Here are the details of the tests that I performed recently. Two tests were conducted to check for a possible answer change: TEST 1 (HAFS/support)
TEST 2 (feature/hafs_couplehycom_cdeps): Everything is the same, except that I updated the model using the following commands. Then I compared the dynf* and phyf* files of the 12-hour forecast with cprnc. They are identical. |
danrosen25
left a comment
I tested within HAFS and produced bit-for-bit results once REPRO=Y. I have one concern, and it's that PIO and CMEPS are now a package deal. A user must pre-build PIO and link it with their project before CMEPS can be used.
ae718076c632351cc9b1761d83ab10a3c3f305a4 root
+db3d4c8138b458a9c5cddd5c29dcb987d6e1d58f sorc/hafs_forecast.fd (heads/feature/hafs_couplehycom_cdeps)
+ee0aa4c02ffdae2e26acd398c8ac9d0ad7cdb15f sorc/hafs_forecast.fd/CDEPS (heads/feature/pio_fix_comp)
+48b9136b991c10a439968a1aadd3bb03d1203d81 sorc/hafs_forecast.fd/CMEPS-interface/CMEPS (cmeps_v0.4.1-559-g48b9136)
+6dac966624a91381aa14c17c9c9654753ff85917 sorc/hafs_forecast.fd/NEMS (hafs_coupledhycom.v0.0.0-53-g6dac966)
|
@danrosen25 it is good to know that you could reproduce the results. I am not sure whether REPRO=Y is forced at the app level by default. If not, I could reproduce the results without REPRO=Y, and you probably could too. Anyway, if you don't have any other concerns, I'll merge this PR. Thanks for testing. |
|
Since PIO is now an integral part of the UFS (it is used by both CMEPS and CDEPS), I think the external dependency on PIO is not a big deal. Also, this was already done in a previous PR, when the PIO submodule was removed from CMEPS. |
danrosen25
left a comment
I approve of the changes but if there's a way to abstract the I/O and not hard code a dependency on an external library that would be great. The external library is an extra burden on the end-user and it will possibly cause headaches in building and testing future systems.
|
@danrosen25 I don't understand that comment - every component has some dependency on external IO libraries - FMS, netcdf, hdf5, pio and other external dependencies (eg ESMF). We hope to fold pio into esmf in a future update but this will have to do for now. |
|
It's just a preference I have that the fewer external dependencies the better. The Land Information System (LIS) and ESMF both have external dependencies, but they make them optional. Even MPI is optional in ESMF.
|
* addition of new streams for datm
* fixes for getting streams with vector fields to work correctly
* added ndep stream data
* added ndep stream functionality
* fixed compile bug
* more updates
* added ndep to core2 and jra forcing
* updated stream definition file
Description of changes
This PR brings component level PIO initialization to the HAFS application (or the UFS Weather Model) without using shr_pio_mod.F90. The shr_pio_mod.F90 file is also removed from the util/ directory.
Specific notes
Testing under UFS:
Contributors other than yourself, if any:
CMEPS Issues Fixed (include github issue #):
Are changes expected to change answers?
Any User Interface Changes (namelist or namelist defaults changes)?
The UFS model no longer requires the *_modelio.nml and pio_in files. Those settings are now controlled by ESMF attributes. I also set defaults for them in case they are not available in nems.configure.
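The fallback behavior described above can be sketched as follows. This is an illustration in Python, not the Fortran code in this PR: the helper `pio_settings`, the `pio_typename` key, and every default value shown are placeholders chosen for the example, and the attribute values are modeled as strings on the assumption that they arrive that way from a configuration file.

```python
def pio_settings(attributes, defaults):
    """Merge component attributes over defaults, casting to the default's type.

    `attributes` stands in for the values read from a component's attribute
    block; keys that are absent fall back to `defaults`.
    """
    settings = dict(defaults)
    for key, default in defaults.items():
        if key in attributes:
            # Cast the string from the config to the type of the default.
            settings[key] = type(default)(attributes[key])
    return settings


# Placeholder defaults for illustration only (not the values in med_io_init).
DEFAULTS = {"pio_stride": 4, "pio_numiotasks": 4, "pio_typename": "netcdf"}

# No PIO attributes supplied: every setting comes from the defaults.
print(pio_settings({}, DEFAULTS))
# Stride and I/O task count supplied together override the defaults.
print(pio_settings({"pio_stride": "36", "pio_numiotasks": "4"}, DEFAULTS))
```

The point of the design is that a run directory with no PIO-specific input still gets a complete set of settings, while any attribute that is present takes precedence.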
Testing performed if application target is CESM: (either UFS-S2S or CESM testing is required):
qcmd -l walltime=4:00:00 -- "CIME_DRIVER=nuopc ./scripts_regression_tests.py"
qcmd -l walltime=4:00:00 -- ./create_test --xml-testlist ../src/drivers/nuopc/cime_config/testdefs/testlist_drv.xml --xml-machine cheyenne --xml-category nuopc --compare feb02 --baseline-root /glade/p/cesmdata/cseg/cmeps_baselines
I am looking into the failed test. It seems that the error is coming from dshr_stream_mod_mp_shr_stream_init_from_inline_ and I think I can fix it soon.
Testing performed if application target is UFS-coupled:
Testing performed if application target is UFS-HAFS:
Hashes used for testing: