Memory error in dswap_k/dgetf2_k #671
Comments
Looks like some kind of calling convention problem between C and Fortran. Does the code snippet in https://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=23 work for you?
I don't think it's a calling convention problem, because in that case the code should fail regardless of the input. If I add sensible matrix data, e.g.,
instead of
Looking closer, the following pivots are computed by dgetrf:
The first index, 9, is out of bounds for a matrix of size 8. (So the downstream problems valgrind detects in dgetri might be due to that.) But there must be a bug in the OpenBLAS implementation of dgetrf_ for it to return this number.
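A minimal sketch of the range check implied above, assuming the usual LAPACK convention that dgetrf returns 1-based row indices in ipiv, so every entry for an n x n matrix must lie in 1..n. The helper name `first_bad_pivot` is hypothetical and not part of LAPACK or OpenBLAS.

```c
/* Hypothetical helper, not a LAPACK/OpenBLAS routine.
 * Returns the 0-based position of the first pivot outside [1, n],
 * or -1 if every entry is a valid 1-based row index. */
int first_bad_pivot(const int *ipiv, int n)
{
    for (int i = 0; i < n; ++i) {
        if (ipiv[i] < 1 || ipiv[i] > n)
            return i;
    }
    return -1;
}
```

Calling this right after dgetrf_ and before dgetri_ would flag a leading pivot of 9 for n = 8 before any row swap touches memory outside the matrix.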
Alright, it is something about the handling of NaN values in the input (or even a matrix composed exclusively of them): the same code works fine even if I use infinity() for all the values. (And just for grins: I can make it work with NaNs by setting only mat[7] to a non-NaN value. This does not work with just any matrix element, perhaps related to how the task is divided internally?)
@kronbichler, because of the NaNs, idamax (used by dgetf2 to select the pivot) may return a wrong, out-of-range index. Please look at #624.
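To illustrate why NaN input is awkward for a pivot search, here is a sketch of a textbook "index of largest absolute value" loop. It is not the OpenBLAS kernel, and the name `naive_idamax` is hypothetical. How the optimized kernel ends up with an out-of-range index is not shown in this thread. The sketch only demonstrates that every ordered comparison involving NaN is false, so the result depends entirely on how the loop is initialized.

```c
#include <math.h>

/* Hypothetical textbook version, for illustration only.
 * Returns the 1-based index of the entry with the largest
 * absolute value, or 0 if n < 1. */
int naive_idamax(int n, const double *x)
{
    if (n < 1)
        return 0;
    int best = 1;                 /* initialized to a valid index */
    double max_abs = fabs(x[0]);
    for (int i = 1; i < n; ++i) {
        /* Any comparison with NaN is false, so a NaN entry never wins
         * and a NaN max_abs is never replaced. */
        if (fabs(x[i]) > max_abs) {
            max_abs = fabs(x[i]);
            best = i + 1;
        }
    }
    return best;
}
```

Because `best` starts at a valid index, this particular loop stays in range even for an all-NaN vector. A kernel that starts from a sentinel and relies on at least one comparison succeeding has no such guarantee.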
@xianyi Thanks for the info. This could very well be the cause. Unfortunately, it looks like the implementation is done
@martin-frbg If I call the netlib LAPACK (linked with -llapack -lblas using the system libraries), I get the following ipiv array:
Similarly, MKL gives me
Regarding undefined behavior: I don't know the specifics of the LAPACK interface with NaN numbers, but I would rather not work around, in our code, a memory bug that sits two layers away from the LAPACK/BLAS implementation (I call the functions in a big C++ project that encapsulates the Trilinos EPETRA_LAPACK methods, which in turn provide access to OpenBLAS). On the other hand, I really like the OpenBLAS project since it provides very good performance, much better than all other open-source alternatives we've tried.
I agree that it is a bug; I am arguing that it "only" has these drastic consequences when the input is dubious already. Perhaps a kludgy temporary solution could be to clamp the value returned by idamax.
@martin-frbg, I agree with you about the temporary fix. I check the return value of i?max.
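A minimal sketch of the clamping idea discussed above, assuming the pivot search returns a 1-based index and the valid range is 1..n. The helper name `clamp_pivot_index` and where it would be called inside the factorization are assumptions, not the actual OpenBLAS change.

```c
/* Hypothetical helper: force a 1-based index into [1, n]
 * so that a later row swap cannot address memory outside the matrix. */
int clamp_pivot_index(int idx, int n)
{
    if (idx < 1)
        return 1;
    if (idx > n)
        return n;
    return idx;
}
```

For example, `clamp_pivot_index(9, 8)` yields 8. The factorization of a NaN matrix is still meaningless, but the swap stays inside the allocated storage.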
For an (admittedly corner-case) simple 8x8 matrix inversion problem given by the following code:
I get memory access errors in both the factorization phase and the inversion phase:
The error seems to come from the dswap routines that do partial pivoting. The matrix contains only NaN values, so inversion makes no sense, but OpenBLAS should not create memory access errors.
I compiled OpenBLAS from the latest git source and also checked release 0.2.14. The error appears with both the Haswell build (see above) and the Penryn build. Compilers: gcc/gfortran 5.2, with no other special options in the OpenBLAS build process.