Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LAPACK_dtr_nancheck called by LAPACKE_dlantr assumes triangular matrix #642

Closed
kortschak opened this issue Sep 12, 2015 · 5 comments
Closed

Comments

@kortschak
Copy link
Contributor

LAPACKE_dlantr works on a m by n triangular or trapezoidal matrix, but calls LAPACKE_dtr_nancheck with n. Under some circumstances (m < n), this can result in a segfault (reproducer in called from Go - pure C reproducer not available):

./cgo
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0xc820500028 pc=0x7f718a6dce80]
runtime stack:
runtime.throw(0x66cd60, 0x2a)
    /usr/local/go/src/runtime/panic.go:527 +0x90
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:12 +0x5a
runtime.systemstack(0xc800000065)
    /usr/local/go/src/runtime/asm_amd64.s:279 +0xab
runtime.mHeap_AllocStack(0x5, 0x49, 0x55)
    /usr/local/go/src/runtime/mheap.go:498 +0x25
goroutine 6 [syscall, locked to thread]:
runtime.cgocall(0x42a660, 0xc820041a78, 0xc800000000)
    /usr/local/go/src/runtime/cgocall.go:120 +0x11b fp=0xc820041a30 sp=0xc820041a00
github.com/gonum/lapack/cgo/clapack._Cfunc_LAPACKE_dlantr(0x554c4900000065, 0xa00000005, 0xc8204ffdc0, 0x7f710000000b, 0x0)
    github.com/gonum/lapack/cgo/clapack/_obj/_cgo_gotypes.go:6622 +0x3a fp=0xc820041a78 sp=0xc820041a30
github.com/gonum/lapack/cgo/clapack.Dlantr(0x49, 0x4c, 0x55, 0x5, 0xa, 0xc8204ffdc0, 0x37, 0x37, 0xb, 0x80)
    /home/travis/gopath/src/github.com/gonum/lapack/cgo/clapack/clapack.go:3112 +0xa1 fp=0xc820041ab8 sp=0xc820041a78
github.com/gonum/lapack/cgo.Implementation.Dlantr(0x7f714ad6c449, 0x7a, 0x84, 0x5, 0xa, 0xc8204ffdc0, 0x37, 0x37, 0xb, 0xc8204f7600, ...)
    github.com/gonum/lapack/cgo/_test/_obj_test/lapack.go:167 +0x277 fp=0xc820041b20 sp=0xc820041ab8
github.com/gonum/lapack/cgo.(*Implementation).Dlantr(0x95db98, 0x49, 0x7a, 0x84, 0x5, 0xa, 0xc8204ffdc0, 0x37, 0x37, 0xb, ...)
    <autogenerated>:3 +0x12c fp=0xc820041b90 sp=0xc820041b20
github.com/gonum/lapack/testlapack.DlantrTest(0xc820020120, 0x7f714ad6c728, 0x95db98)
    /home/travis/gopath/src/github.com/gonum/lapack/testlapack/dlantr.go:73 +0x607 fp=0xc820041f28 sp=0xc820041b90
github.com/gonum/lapack/cgo.TestDlantr(0xc820020120)
    /home/travis/gopath/src/github.com/gonum/lapack/cgo/lapack_test.go:21 +0x76 fp=0xc820041f68 sp=0xc820041f28
testing.tRunner(0xc820020120, 0x927a98)
    /usr/local/go/src/testing/testing.go:456 +0x98 fp=0xc820041fa0 sp=0xc820041f68
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820041fa8 sp=0xc820041fa0
created by testing.RunTests
    /usr/local/go/src/testing/testing.go:561 +0x86d
goroutine 1 [chan receive]:
testing.RunTests(0x6841e8, 0x927a80, 0xe, 0xe, 0xc82007c301)
    /usr/local/go/src/testing/testing.go:562 +0x8ad
testing.(*M).Run(0xc820043f08, 0x93a180)
    /usr/local/go/src/testing/testing.go:494 +0x70
main.main()
    github.com/gonum/lapack/cgo/_test/_testmain.go:126 +0x252
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

This fault is prevented by making the following change:

diff --git a/lapack-netlib/lapacke/src/lapacke_dlantr.c b/lapack-netlib/lapacke/
index 2cde1eb..9d44972 100644
--- a/lapack-netlib/lapacke/src/lapacke_dlantr.c
+++ b/lapack-netlib/lapacke/src/lapacke_dlantr.c
@@ -46,7 +46,7 @@ double LAPACKE_dlantr( int matrix_order, char norm, char uplo,
     }
 #ifndef LAPACK_DISABLE_NAN_CHECK
     /* Optionally check input matrices for NaNs */
-    if( LAPACKE_dtr_nancheck( matrix_order, uplo, diag, n, a, lda ) ) {
+    if( LAPACKE_dtr_nancheck( matrix_order, uplo, diag, MIN(m,n), a, lda ) ) {
         return -7;
     }
 #endif

I don't believe that this is the proper fix though, as that leaves the non-triangular portion of a trapezoidal matrix unchecked for NaN. The correct fix is to either fix the signature for LAPACKE_*tr_nancheck and include the m (this requires changes in ~80 call sites) or add a trapezoidal NaN check version of *tr_nancheck.

@xianyi
Copy link
Collaborator

xianyi commented Oct 5, 2015

@kortschak , sorry for the delay. Did you report it to LAPACK forum?

@kortschak
Copy link
Contributor Author

I have now.

@btracey
Copy link
Contributor

btracey commented Oct 5, 2015

I downloaded the new OpenBLAS to check on issue #615 , and I'm getting Dlantr failures again. The failures are not consistent, but I sometimes get the following test failure:

brendan:~/Documents/mygo/src/github.com/gonum/lapack/cgo$ go test
--- FAIL: TestDlantr (0.04s)
    dlantr.go:77: Norm mismatch. norm = O, unitdiag = true, upper = true, m = 5, n = 10, lda = 11, Want -7, got 4.4771955720516665.

(where the -7 is from cgo, and should be impossible from a norm)

and less frequently get a panic trace like in @kortschak 's post

@btracey
Copy link
Contributor

btracey commented Oct 5, 2015

The issue is as Dan suggests, the nancheck size is wrong.

julielangou pushed a commit to Reference-LAPACK/lapack-www that referenced this issue Apr 29, 2016
Bug report and bug fix from Dan Kortschak.

See:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=4810
OpenMathLib/OpenBLAS#642

This is a quick bug fix, this only checks the triangular part of the trapezoid,
the rectangular part is not checked. Another commit should follow up soon.
julielangou pushed a commit to Reference-LAPACK/lapack that referenced this issue Apr 29, 2016
Bug report and bug fix from Dan Kortschak.

See:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=4810
OpenMathLib/OpenBLAS#642

This is a quick bug fix, this only checks the triangular part of the trapezoid,
the rectangular part is not checked. Another commit should follow up soon.
@martin-frbg
Copy link
Collaborator

Suggesting to close this as the "quick and dirty bug fix" from Reference-LAPACK has been imported since. (I see no indication of continuing work towards a "proper" solution there however, still the problem seems to be fundamentally a Reference-LAPACK one)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants