Fixing sms_3dtke km_opt=5 which breaks km_opt=2#1009
Fixing sms_3dtke km_opt=5 which breaks km_opt=2#1009janmandel wants to merge 8 commits intowrf-model:developfrom
Conversation
…fails error message: FIRE:crash: NaN detected -------------- FATAL CALLED --------------- FATAL CALLED FROM FILE: <stdin> LINE: 67 crash: NaN detected reason: new option km_opt=5 SMS 3DTKE scheme in d059afe broke km_opt=2 details: Memory corruption due to several variables used in dyn_em/module_diffusion_em.F with both km_opt=5 and km_opt=2 are in package sms_3dtke km_opt==5 in Registry/Registry.EM_COMMON but there is no package for other values of km_opt. This results in km_opt=2 using grid% variables that were nor properly allocated.
to make the variables allocated all the time, because some are used also for other km_opt values.
Jan, |
…sed in horizontal_diffusion_s which is called with km_opt=2
Dave, |
3287863 to
57780c9
Compare
|
@janmandel |
|
@davegill yes PR #1012 is essentially the same as b6407f3 in this PR. When you asked which variables, I reverted that, and using hardware watchpoints in debugger found that l_scale, th_h_tend, and tke_diffusion_h_tend made my km_opt=2 test case fail, and with them removed from the package it did not fail. But I do not know if sms_3dtke even with the packaging removed does not break something in a different way. Maybe d059afe should be simply reverted until the developers fix it and verify that it does not change the results of existing ideal test cases? Such testing does not seem to be have been done. I thought WTF does this. |
|
I think Dave confirmed that removing the packaging does give results that are unaffected,but he can confirm that again,Jimy
On Sunday, November 17, 2019, 11:59:44 PM MST, Jan Mandel <notifications@github.com> wrote:
@davegill
Dave,
yes PR #1012 is essentially the same as b6407f3 in this PR. When you asked which variables, I reverted that, and using hardware watchpoints in debugger found that l_scale, th_h_tend, and tke_diffusion_h_tend made my km_opt=2 test case fail, and with them removed from the package it did not fail. But I do not know if sms_3dtke even with the packaging removed does not break something in a different way. Maybe d059afe should be simply reverted until the developers fix it and verify that it does not change the results of existing ideal test cases? Such testing does not seem to be have been done. I thought WTF does this.
May I ask few things about testing please?
Are ideal problems in test/em_XXX/XXX tested automatically? It does not seem so. I found the km_opt=5 bug in test/em_fire/twofires (which is created dynamically by ./compile em_fire as you wanted).
I am getting some commits failed by Jenkins and marked with red cross and pull request flagged but I do not know why because I am not getting the emails with outputs for failed tests, only for successful ones.
Finally, what is the process for testing real runs, please?
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
|
@janmandel @dudhia @weiwangncar @smileMchen
Jan, In my PR, I will reference the fields you suggest as a starting point for the developer to review. The testing for WRF has been difficult for a while. We have not had access to a robust testing harness for more than a year. We are close to getting our testing handled with this new Jenkins component. As you see though, the PASS / FAIL status that is reported by Jenkins is uncorrelated with the actual test results. We are working on this. For proposed real-data cases, take a look at this web page: |
|
@janmandel |
TYPE: bug fix
KEYWORDS: 3DTKE, Fire
SOURCE: Jan Mandel
DESCRIPTION OF CHANGES:
Removed variables l_scale, th_h_tend, and tke_diffusion_h_tend included in registry package sms_3dtke for km_opt=5 which are used outside of km_opt=5 but should not be and broke km_opt=2.
When running with bound checks on, I also found and fixed an index overrun in vertical staggering I originally made in openwfm@be6d402 The memory allocated in the vertical grid direction is by one one less than what I thought. The fix is, do not allow the index exceed the end memory index.
LIST OF MODIFIED FILES:
Registry/Registry.EM_COMMON | 7 ++++++-
phys/module_fr_fire_util.F | 6 +++---
TESTS CONDUCTED: Jenkins, test/em_fire/two_fires serial (build option 13) only
RELEASE NOTE:
Symptoms: /compile em_fire; cd test/em_fire/two_fires; ./ideal.exe; ./wrf.exe failed with error message FIRE:crash: NaN detected
Tested with
-fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u
uncommened in configure.wrf
It is possible that more variables need to be removed from sms_3dtke for other values of km_opt.