Skip to content

Fixing sms_3dtke km_opt=5 which breaks km_opt=2#1009

Closed
janmandel wants to merge 8 commits intowrf-model:developfrom
openwfm:3DTKE-fix
Closed

Fixing sms_3dtke km_opt=5 which breaks km_opt=2#1009
janmandel wants to merge 8 commits intowrf-model:developfrom
openwfm:3DTKE-fix

Conversation

@janmandel
Copy link
Contributor

@janmandel janmandel commented Nov 10, 2019

TYPE: bug fix

KEYWORDS: 3DTKE, Fire

SOURCE: Jan Mandel

DESCRIPTION OF CHANGES:

Removed variables l_scale, th_h_tend, and tke_diffusion_h_tend included in registry package sms_3dtke for km_opt=5 which are used outside of km_opt=5 but should not be and broke km_opt=2.

When running with bound checks on, I also found and fixed an index overrun in vertical staggering I originally made in openwfm@be6d402 The memory allocated in the vertical grid direction is by one one less than what I thought. The fix is, do not allow the index exceed the end memory index.

LIST OF MODIFIED FILES:
Registry/Registry.EM_COMMON | 7 ++++++-
phys/module_fr_fire_util.F | 6 +++---

TESTS CONDUCTED: Jenkins, test/em_fire/two_fires serial (build option 13) only

RELEASE NOTE:

Symptoms: /compile em_fire; cd test/em_fire/two_fires; ./ideal.exe; ./wrf.exe failed with error message FIRE:crash: NaN detected

Tested with
-fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u
uncommened in configure.wrf

It is possible that more variables need to be removed from sms_3dtke for other values of km_opt.

…fails

error message:
FIRE:crash: NaN detected
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:      67
crash: NaN detected

reason:
new option km_opt=5 SMS 3DTKE scheme in d059afe broke km_opt=2

details:
Memory corruption due to several variables used in dyn_em/module_diffusion_em.F with both
km_opt=5 and km_opt=2 are in package  sms_3dtke km_opt==5 in Registry/Registry.EM_COMMON
but there is no package for other values of km_opt. This results in km_opt=2 using
grid% variables that were nor properly allocated.
to make the variables allocated all the time, because some are used
also for other km_opt values.
@janmandel janmandel requested a review from a team as a code owner November 10, 2019 06:22
@davegill
Copy link
Contributor

@janmandel

Several variables used in dyn_em/module_diffusion_em.F with both
km_opt=5 and km_opt=2 are in package sms_3dtke km_opt==5 in Registry/Registry.EM_COMMON
but there is no package for other values of km_opt. Hence the variables used by km_opt=2
are not properly allocated.

Jan,
I am tripping over this error also. What variables are packaged for km_opt=5 that are also used for km_opt=2?

WRF allocates too small kme or one should not run all the way to the top

(cherry picked from commit bc8878d)
…sed in horizontal_diffusion_s

which is called with km_opt=2
@janmandel
Copy link
Contributor Author

janmandel commented Nov 14, 2019

@janmandel

Several variables used in dyn_em/module_diffusion_em.F with both
km_opt=5 and km_opt=2 are in package sms_3dtke km_opt==5 in Registry/Registry.EM_COMMON
but there is no package for other values of km_opt. Hence the variables used by km_opt=2
are not properly allocated.

Jan,
I am tripping over this error also. What variables are packaged for km_opt=5 that are also used for km_opt=2?

Dave,
I found l_scale, th_h_tend, and tke_diffusion_h_tend and removed from the km_opt=5 package.
My test case goes through.
Would you like to merge this, please?

@janmandel janmandel changed the title km_opt=2 test for Jenkins sms_3dtke km_opt=5 breaks km_opt=2 Nov 14, 2019
@janmandel janmandel changed the title sms_3dtke km_opt=5 breaks km_opt=2 Fixing sms_3dtke km_opt=5 breaks km_opt=2 Nov 14, 2019
@janmandel janmandel changed the title Fixing sms_3dtke km_opt=5 breaks km_opt=2 Fixing sms_3dtke km_opt=5 which breaks km_opt=2 Nov 14, 2019
@janmandel janmandel force-pushed the 3DTKE-fix branch 2 times, most recently from 3287863 to 57780c9 Compare November 14, 2019 09:57
@davegill
Copy link
Contributor

@janmandel
Jan,
Take a look at PR #1012 and #1013
I have gone the route of ham-handedness to get this fixed. We can get the developers to fix the Registry packaging and the associated computations so that the shared km_opt=2 and km_opt=5 variables are correctly identified.

@janmandel
Copy link
Contributor Author

@davegill
Dave,

yes PR #1012 is essentially the same as b6407f3 in this PR. When you asked which variables, I reverted that, and using hardware watchpoints in debugger found that l_scale, th_h_tend, and tke_diffusion_h_tend made my km_opt=2 test case fail, and with them removed from the package it did not fail. But I do not know if sms_3dtke even with the packaging removed does not break something in a different way. Maybe d059afe should be simply reverted until the developers fix it and verify that it does not change the results of existing ideal test cases? Such testing does not seem to be have been done. I thought WTF does this.
May I ask few things about testing please?
Are ideal problems in test/em_XXX/XXX tested automatically? It does not seem so. I found the km_opt=5 bug in test/em_fire/twofires (which is created dynamically by ./compile em_fire as you wanted).
I am getting some commits failed by Jenkins and marked with red cross and pull request flagged but I do not know why because I am not getting the emails with outputs for failed tests, only for successful ones.
Finally, what is the process for testing real runs, please?

@dudhia
Copy link
Collaborator

dudhia commented Nov 18, 2019 via email

@davegill
Copy link
Contributor

@janmandel @dudhia @weiwangncar @smileMchen

Maybe d059afe should be simply reverted until the developers fix it and verify that it does not change the results of existing ideal test cases?

Jan,
Because the broken option is to a public branch, we should not really change history with a revert. Since this broken option went into the repository in August, several months of development has happened assuming this code. The easiest solution for us is to put in a second change to fix the problem in a temporary fashion (and that fix is to remove the packaging).

In my PR, I will reference the fields you suggest as a starting point for the developer to review.

The testing for WRF has been difficult for a while. We have not had access to a robust testing harness for more than a year. We are close to getting our testing handled with this new Jenkins component. As you see though, the PASS / FAIL status that is reported by Jenkins is uncorrelated with the actual test results. We are working on this.

For proposed real-data cases, take a look at this web page:
http://www2.mmm.ucar.edu/wrf/users/contrib_info.php

@davegill
Copy link
Contributor

@janmandel
I am closing this PR.
I have used the guidance provided from this PR in #1012 (SMS3dTKE: Remove broken packaging for km_opt=5).

@davegill davegill closed this Nov 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants