Closed
davegill pushed a commit that referenced this pull request Jan 12, 2019
TYPE: bug fix

KEYWORDS: obs nudging, max number of tasks

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem: The maximum number of processors, 1024, is hard coded in module_dm.F for observation nudging. If a user requests more MPI tasks than this maximum, the result is a segmentation fault.
Solution: In the routine where the dimension of the variables is defined as the maximum number of MPI tasks, those two variables are now declared as ALLOCATABLE, and then they are allocated based on the total number of MPI ranks.

LIST OF MODIFIED FILES:
M external/RSL_LITE/module_dm.F

TESTS CONDUCTED:
Applied new code to a user's case, which shows the code works as expected.

No bit-wise diffs with smaller test case, before vs after mods: I built the code with the ./configure -d option and ran a small test case with 1 processor and 36 processors, respectively. OBS nudging is turned on. Both runs cover a 3-hour period. Results are identical.

Test case with > 1024 MPI tasks: A large case (derived from a user's case) is also tested. In this case, the code is built with the ./configure -D option. Without the change, the case crashed immediately. The error message is:

    OBS NUDGING is requested on a total of 2 domain(s).
    ++++++CALL ERROB AT KTAU = 0 AND INEST = 1: NSTA = 0 ++++++
    At line 5741 of file module_dm.f90
    Fortran runtime error: Index '1025' of dimension 1 of array 'idisplacement' above upper bound of 1024
    Error termination. Backtrace:
    #0 0x782093 in __module_dm_MOD_get_full_obs_vector
        at /glade/scratch/chenming/WRFHELP/WRFV3.9.1.1_intel_dmpar_large-file/frame/module_dm.f90:5741
    #1 0xffffffffffffffff in ???

With the code change, the case can run successfully for 6 hours.

RELEASE NOTE: After removing a hard-coded limit for an assumed maximum number of MPI tasks, the observation nudging code for WRF now supports more than 1024 MPI tasks. If users previously ran the obs nudging code with 1024 or fewer MPI tasks, the original code is OK. However, if users tried to run obs nudging with > 1024 MPI tasks, the code likely died from a segmentation fault while trying to access an address for an array index that was not available.
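The difference between a fixed-size work array and one sized from the actual number of MPI ranks can be illustrated with a minimal sketch. This is not the WRF Fortran code: the function names, the `max_tasks` parameter, and the per-rank counts are hypothetical, and Python lists stand in for the Fortran arrays. It only shows why an array dimensioned to 1024 fails once a 1025th rank is indexed, while an array allocated from the rank count does not.

```python
def displacements_fixed(counts, max_tasks=1024):
    """Offsets into a gathered vector, using a work array of fixed size.

    Mirrors the failure mode described above: the array is sized to an
    assumed maximum, so indexing past it raises an error.
    """
    work = [0] * max_tasks  # hard-coded upper bound (hypothetical stand-in)
    for rank in range(1, len(counts)):
        # Raises IndexError as soon as rank >= max_tasks.
        work[rank] = work[rank - 1] + counts[rank - 1]
    return work[:len(counts)]


def displacements_allocated(counts):
    """Offsets into a gathered vector, sized from the actual rank count.

    Analogous in spirit to declaring the array ALLOCATABLE and allocating
    it with the total number of MPI ranks.
    """
    nprocs = len(counts)
    work = [0] * nprocs  # sized at run time from the number of ranks
    for rank in range(1, nprocs):
        work[rank] = work[rank - 1] + counts[rank - 1]
    return work


if __name__ == "__main__":
    counts = [2] * 1025  # 1025 ranks, each contributing 2 values
    print(displacements_allocated(counts)[-1])
    try:
        displacements_fixed(counts)
    except IndexError as err:
        print("fixed-size array failed:", err)
```

With 1024 or fewer entries in `counts`, both functions return the same offsets, which matches the release note's statement that the original code was fine at or below the old limit.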
davegill added a commit that referenced this pull request Feb 15, 2019
davegill added a commit that referenced this pull request Feb 15, 2019
TYPE: text only

KEYWORDS: version_decl, v4.1-alpha

SOURCE: internal

DESCRIPTION OF CHANGES:
Update the character string inside the WRF system from 4.0.3 to 4.1-alpha.

LIST OF MODIFIED FILES:
M inc/version_decl

TESTS CONDUCTED:
- [x] Code runs and v4.1-alpha is the version printed from the WRF system programs.
```
> ncdump -h wrfinput_d01 | grep TITLE
:TITLE = " OUTPUT FROM REAL_EM V4.1-alpha PREPROCESSOR" ;
> ncdump -h wrfinput_initialized_d01 | grep TITLE
:TITLE = " OUTPUT FROM WRF V4.1-alpha MODEL" ;
> ncdump -h met_em.d01.2019-02-15_12:00:00.nc | grep TITLE
:TITLE = "OUTPUT FROM METGRID V4.1" ;
> ncdump -h wrfout_d01_2019-02-16_12:00:00 | grep TITLE
:TITLE = " OUTPUT FROM WRF V4.1-alpha MODEL" ;
```
davegill added a commit that referenced this pull request Apr 22, 2019
davegill added a commit that referenced this pull request May 30, 2019
… data (wrf-model#875)

TYPE: bug fix

KEYWORDS: LBC, valid time

SOURCE: identified by Michael Duda (NCAR/MMM), fixed internally

DESCRIPTION OF CHANGES:
Problem:
1. If a user tried to start a simulation _after_ the last LBC valid period, the WRF model would get into a nearly infinite loop and print out repeated statements:
```
THIS TIME 2000-01-24_18:00:00, NEXT TIME 2000-01-25_00:00:00
d01 2000-01-25_06:00:00 Input data is acceptable to use: wrfbdy_d01
2 input_wrf: wrf_get_next_time current_date: 2000-01-24_18:00:00 Status = -4
d01 2000-01-25_06:00:00 ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
```
2. If a user tries to extend the model simulation beyond the valid times of the LBC, the code behavior is not controlled (nearly infinite loops on some machines, or runtime errors with a backtrace on other machines).

Solution:
In another routine, the lateral boundary condition is read to get to the correct time. Once inside of share/input_wrf.F, we should be at the correct time. There is no need to try to get to the next time. In this particular case, the effort to get to the next time fails, but we try again (and again and again).

This solution fixes both problems identified above.

ISSUE:
Fixes wrf-model#769 "WRF doesn't halt when beginning LBC time is not in wrfbdy_d01 file"

LIST OF MODIFIED FILES:
M share/input_wrf.F

TESTS CONDUCTED:
1. Without the fix, start the model after the last valid time of the LBC file => lots of repeated messages
```
THIS TIME 2000-01-24_18:00:00, NEXT TIME 2000-01-25_00:00:00
d01 2000-01-25_06:00:00 Input data is acceptable to use: wrfbdy_d01
2 input_wrf: wrf_get_next_time current_date: 2000-01-24_18:00:00 Status = -4
d01 2000-01-25_06:00:00 ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
```
2. With this fix, when LBC stops at 2000 01 25 00, and WRF starts at 2000 01 25 06
```
d01 2000-01-25_06:00:00 Input data is acceptable to use: wrfbdy_d01
THIS TIME 2000-01-24_12:00:00, NEXT TIME 2000-01-24_18:00:00
d01 2000-01-25_06:00:00 Input data is acceptable to use: wrfbdy_d01
THIS TIME 2000-01-24_18:00:00, NEXT TIME 2000-01-25_00:00:00
d01 2000-01-25_06:00:00 Input data is acceptable to use: wrfbdy_d01
2 input_wrf: wrf_get_next_time current_date: 2000-01-24_18:00:00 Status = -4
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1134
---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
-------------------------------------------
```
3. Without this fix, if we try to extend the model simulation beyond the valid lateral boundary times
```
Timing for main: time 2000-01-24_23:54:00 on domain 1: 0.53782 elapsed seconds
Timing for main: time 2000-01-24_23:57:00 on domain 1: 0.51111 elapsed seconds
Timing for main: time 2000-01-25_00:00:00 on domain 1: 0.54507 elapsed seconds
Timing for Writing wrfout_d01_2000-01-25_00:00:00 for domain 1: 0.03793 elapsed seconds
d01 2000-01-25_00:00:00 Input data is acceptable to use: wrfbdy_d01
2 input_wrf: wrf_get_next_time current_date: 2000-01-25_00:00:00 Status = -4
d01 2000-01-25_00:00:00 ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
At line 777 of file module_date_time.f90
Fortran runtime error: Bad value during integer read
Error termination. Backtrace:
#0 0x10e67c36c
#1 0x10e67d075
#2 0x10e67d7e9
```
4. With this fix, if we try to extend the model simulation beyond the valid lateral boundary times
```
Timing for main: time 2000-01-24_23:54:00 on domain 1: 0.60755 elapsed seconds
Timing for main: time 2000-01-24_23:57:00 on domain 1: 0.57641 elapsed seconds
Timing for main: time 2000-01-25_00:00:00 on domain 1: 0.60817 elapsed seconds
Timing for Writing wrfout_d01_2000-01-25_00:00:00 for domain 1: 0.04499 elapsed seconds
d01 2000-01-25_00:00:00 Input data is acceptable to use: wrfbdy_d01
2 input_wrf: wrf_get_next_time current_date: 2000-01-25_00:00:00 Status = -4
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1134
---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
-------------------------------------------
```

MMM Classroom regtest; em_real, nmm, em_chem; GNU only
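The intended behavior after the fix (stop once with a fatal error when no valid boundary period covers the current model time, rather than retrying) can be sketched as follows. This is an illustration only and not the logic in share/input_wrf.F: the function name, the exception class, and the list of valid times are all hypothetical, and the boundary file is modeled as a simple list of time strings in the format seen in the logs above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"  # matches the timestamps printed in the logs


class BoundaryDataExhausted(RuntimeError):
    """Hypothetical stand-in for the fatal 'ran out of boundary conditions' error."""


def select_boundary_period(valid_times, current_time):
    """Return the (this_time, next_time) pair that brackets current_time.

    The list is scanned once. If no period covers current_time, the
    function raises instead of retrying, so the caller halts cleanly.
    """
    times = [datetime.strptime(t, TIME_FORMAT) for t in valid_times]
    now = datetime.strptime(current_time, TIME_FORMAT)
    for this_time, next_time in zip(times, times[1:]):
        if this_time <= now < next_time:
            return (this_time.strftime(TIME_FORMAT),
                    next_time.strftime(TIME_FORMAT))
    raise BoundaryDataExhausted(
        "Ran out of valid boundary conditions at " + current_time
    )


if __name__ == "__main__":
    lbc_times = [
        "2000-01-24_12:00:00",
        "2000-01-24_18:00:00",
        "2000-01-25_00:00:00",
    ]
    print(select_boundary_period(lbc_times, "2000-01-24_20:00:00"))
    try:
        select_boundary_period(lbc_times, "2000-01-25_06:00:00")
    except BoundaryDataExhausted as err:
        print("FATAL:", err)
```

In this sketch both problem cases (starting after the last valid period, and running up to the end of the last valid period) end in the same single error, which is the controlled behavior shown in tests 2 and 4.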
Does Kelly need to approve this?