Crash in ReLAPACK #2066
Comments
You could try building OpenBLAS with debug information (setting DEBUG=1, or adding |
This is with
ETA: with CBLAS, the first two lines are:
|
Not sure what to make of this: dgetrf.c line 67 is where it forwards the call to stock LAPACK dgetf2, on the assumption that the problem size is too small to make the recursive block approach worthwhile. I believe dgetf2 would complain if n actually managed to become zero or negative, but this may need confirmation. |
Nope, not Windows, it's Ubuntu 14.04. |
At first glance the kludge from #723 should keep it from accessing anything beyond the end of the array at line 83, as long as the jp value from line 97 remains positive. If jp did not stay positive, though, it should already have crashed at line 100, which has basically the same assignment. Could you try to obtain the values in ipiv at the time of failure by running your program from gdb? |
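For anyone inspecting the pivots by hand: in the LAPACK convention ipiv is 1-based, and entry i records the row that was swapped with row i, so it should satisfy i <= ipiv(i) <= m. A minimal sketch of a check along those lines follows; the name `first_bad_pivot` is made up for illustration and is not part of OpenBLAS or ReLAPACK.

```c
/* Hypothetical debugging helper, not an OpenBLAS/ReLAPACK function.
   Returns the 0-based position of the first pivot outside the range
   that LAPACK's convention allows, or -1 if all entries look sane. */
int first_bad_pivot(const int *ipiv, int count, int m)
{
    for (int i = 0; i < count; i++) {
        int row = i + 1;                 /* LAPACK rows are 1-based */
        if (ipiv[i] < row || ipiv[i] > m)
            return i;                    /* zero, negative, or too large */
    }
    return -1;
}
```

Calling it on a copy of ipiv right after the factorization returns would show whether a pivot went out of range before the row swaps used it.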
I don't know if I'm doing it right, but with
I guess |
Yes, |
Does this help?
Maybe I should try with a newer compiler (gcc 4.8.5 at the moment). |
I have a nagging feeling that this is related to the INTERFACE64=1 build - not sure if I thought/knew to make ReLAPACK compatible with this when I added it to the build some two years ago, and int/long argument mismatch might explain the astonishing growth in the value of n. |
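To illustrate how such a mismatch could inflate a value: if one side stores a 32-bit integer and the other side reads 64 bits from the same address, the extra four bytes come from whatever happens to sit next to it in memory. The sketch below only models that effect with `memcpy` (so it stays well-defined C). The function name and the "neighbor" value are assumptions for illustration, not taken from the ReLAPACK sources.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: what a callee built for 64-bit integers would see
   if the caller only stored a 32-bit value at the address it passed. */
int64_t read_as_int64(int32_t n, int32_t neighbor)
{
    int32_t memory[2] = { n, neighbor };  /* n followed by unrelated data */
    int64_t seen;
    memcpy(&seen, memory, sizeof seen);   /* callee reads 8 bytes, not 4 */
    return seen;
}
```

With a nonzero neighbor, a small n such as 5 comes back as a value beyond the 32-bit range on both little- and big-endian machines, which would look like n suddenly growing enormously.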
Unfortunately I still see a segfault in your test 036 even with a quick hack for the (assumed) INTERFACE64 problem, |
The good thing is you could reproduce it. It is possible to run
but note that it will first run
and then run:
(It will write some scratch files in the current directory, be warned.) |
You can try |
I don't understand the offset calculation above. |
Similar symptoms can be seen with the LAPACK tests actually, although ReLAPACK still passes its own tests. (I can only assume that back when I created the PR to merge ReLAPACK, the OpenBLAS build of lapack/TESTING was incomplete and/or I was not aware of its importance.) There are several spots in the code (notably xPBTRF) where local work arrays are allocated based on runtime parameters that may (legitimately?) become negative depending on input. Not sure yet if fixing these will solve all problems, but at least this appears to be one cause of stack corruption. |
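As a rough sketch of the kind of guard this suggests (not necessarily what the actual fix looks like): a variable-length array with a zero or negative size is undefined behaviour in C, so the element count can be clamped before the array is declared. The helper name `safe_work_elems` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: element count for an ld-by-n scratch array.
   Falls back to 1 so that a declaration such as
       double work[safe_work_elems(ld, n)];
   stays valid even when a runtime parameter is zero or negative. */
size_t safe_work_elems(int ld, int n)
{
    if (ld <= 0 || n <= 0)
        return 1;
    return (size_t)ld * (size_t)n;
}
```

For large sizes a heap allocation with an error check would be the more cautious choice than a stack array, but that is a separate design decision.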
Down to
now |
That looks good, "grayzone" tests are not expected to pass. |
Great. I have now released 0.3.6 with the fixes. (The ReLAPACK build still shows some errors in the LAPACK testsuite as follows:
compared to
for a build with Reference-LAPACK from netlib, but I expect different rounding will be a factor) |
The more operations are done per point the more (rounding) error is accumulated. |
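A small, self-contained way to see this effect, unrelated to the LAPACK test suite itself: add the same single-precision value repeatedly and compare against a double-precision reference. The function name is made up for illustration.

```c
/* Illustration only: absolute difference between a float running sum of
   `count` copies of 0.1f and the double-precision product count * 0.1. */
double accumulated_error(int count)
{
    float sum = 0.0f;
    for (int i = 0; i < count; i++)
        sum += 0.1f;                      /* each addition rounds to float */
    double diff = (double)sum - (double)count * 0.1;
    return diff < 0.0 ? -diff : diff;     /* absolute value without math.h */
}
```

The error for a million additions is far larger than for a thousand, which is why two correct implementations that order or group their operations differently need not agree to the last digit.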
I tried to use OpenBLAS with ReLAPACK in OpenMolcas (https://gitlab.com/Molcas/OpenMolcas) and got a crash, apparently in RELAPACK_dgetrf_rec at dgetrf.c.
I compiled OpenBLAS with:
and the crash occurs, for example, when I run:
Without ReLAPACK, I don't see the problem.
What other information can I provide or how can I debug it further?