Option to SIGQUIT or throw error during ESMF_Abort #296
Comments
Dan proposes making this a runtime option, so that ESMF quits on error and outputs diagnostic information. This allows easier troubleshooting and debugging. Bob: looks reasonable, and it could go into 8.8 because it is not heavyweight. The new method will be optional.
Bill: CESM also uses this, so it makes sense to offer it as an option.
Look at the LogSetError option for abort on error.
Design consideration: how to handle MPI aborts is what makes this story a medium. This ticket may benefit CESM: CESM backtraces are only available with certain compilers, so this feature may help. Bill: is there a C mechanism for producing a backtrace?
Testing on macOS and Derecho: executing SIGQUIT on rank 2
Adding a sleep longer than the walltime
Branch is ready to be discussed. An alternative option is to use execinfo, which provides backtrace and backtrace_symbols. This will not provide a core dump. This is already available for writing to the ESMF PET logs using
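For reference, a minimal sketch of the execinfo approach mentioned above. The helper name `capture_backtrace` is hypothetical and is not an ESMF function; it assumes a platform that ships `<execinfo.h>` (glibc or macOS), and it does not show how ESMF writes to its PET logs.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc, macOS)
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of ESMF): returns one string per stack frame.
std::vector<std::string> capture_backtrace(int max_frames = 64) {
    assert(max_frames > 0);
    std::vector<void*> frames(static_cast<std::size_t>(max_frames));

    // Fill 'frames' with return addresses; 'depth' is the number captured.
    int depth = backtrace(frames.data(), max_frames);

    std::vector<std::string> lines;
    char** symbols = backtrace_symbols(frames.data(), depth);
    if (symbols == nullptr) {
        return lines;  // allocation failed: return an empty trace
    }
    for (int i = 0; i < depth; ++i) {
        lines.emplace_back(symbols[i]);
    }
    std::free(symbols);  // a single allocation holds all the strings
    return lines;
}
```

Symbol names are often only resolved when the executable is linked with `-rdynamic` (or the platform equivalent); otherwise the strings may contain raw addresses. As noted above, this path produces a trace but no core dump.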
I split the LogMsgAbort and ESMF_Abort settings into two options, as suggested by @billsacks. Then I tested initializing ESMF with a config file. I had to move some code, but that is working now.
I also added SIGABRT, which calls std::abort() from the standard library. std::abort() in turn raises the signal SIGABRT. After some further reading: SIGQUIT is usually initiated externally and is not available on Windows/MinGW. The POSIX documentation says that both SIGQUIT and SIGABRT produce a core dump.
PR open: #361
SIGABRT
SIGQUIT
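To illustrate the difference between the two signals, here is a small POSIX-only sketch. `AbortMode`, `abort_with`, and `terminating_signal` are hypothetical names and are not the options or code in the PR. The `fork` is used only so the demonstration can observe which signal ended the child; it is not a suggestion for how an MPI application should abort.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical runtime choice between the two abort behaviors.
enum class AbortMode { Sigabrt, Sigquit };

[[noreturn]] void abort_with(AbortMode mode) {
    if (mode == AbortMode::Sigquit) {
        // Restore the default action (terminate and dump core) in case the
        // process inherited an ignored SIGQUIT, then send it to ourselves.
        std::signal(SIGQUIT, SIG_DFL);
        std::raise(SIGQUIT);
    }
    // std::abort() raises SIGABRT; it is also the fallback if SIGQUIT
    // did not terminate the process.
    std::abort();
}

// Runs abort_with() in a child process and returns the signal that
// terminated it, or -1 on failure.
int terminating_signal(AbortMode mode) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        abort_with(mode);  // child: never returns
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Under these assumptions, `terminating_signal(AbortMode::Sigabrt)` reports `SIGABRT` and `terminating_signal(AbortMode::Sigquit)` reports `SIGQUIT`. Whether a core file is actually written also depends on the system's core size limit (for example `ulimit -c`).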
Also tested on Derecho with Intel and MPICH.
The current method to debug ESMF errors is to build a backtrace using ESMF_LogSetError and rc. This gives you a limited amount of information about the state at the time of the error. I started investigating throwing a SIGQUIT error, which can print a backtrace and dump a core. The core dump can be analyzed to see the state causing the error.