accuracy problems with the new LAPACK 3.10 *LARTG implementations #629
Comments
Thanks for this interesting problem; it took us a while to figure out what was happening. I first thought it was a bug, but at this point I am not convinced there is a bug. I would like to share some thoughts.
My conclusions at this point are:
I'm not convinced at all. This is certainly a bug. The norm of the initial vector could be anything, especially in generalized eigenvalue problems. The fact that everything is OK for random data, or that the test passes for other input data, or that a paper was published, does not mean that everything works properly.
Do you suggest telling users that their matrix is not good if they get wrong results?
Hi Sergey,

We are still trying to figure out the issue. You are reporting a severe issue and we are trying to understand it. For now, we do not understand where the issue comes from. Reverting to the 3.9 *LARTG is definitely an option, and for now it seems like the only option.

Using test.zip, we observe, as you pointed out, that for the test you provided, 3.9 is significantly better than 3.10. However, if we change the data in the test, we can also observe the reverse: 3.10 is significantly better than 3.9. We are still working on it, trying to figure out some ideas. We were trying to avoid running a full-scale eigensolver to debug 2x2 rotations.

That being said: after a rotation the norm needs to be conserved, but due to finite precision arithmetic it either decreases a little or increases a little. One idea was that maybe 3.10 systematically reduces the norm after each rotation, whereas 3.9 sometimes increases and sometimes decreases it. After many 3.10 rotations, you would then keep reducing the norm more and more and get a catastrophic reduction of norm that 3.9 avoids. This hypothesis does not seem to be true from our experiments. Still looking at it.

We only looked at conservation of norm; we did not yet look at conservation of inner product. So: start with the 2x2 identity matrix, apply many, many rotations, obtain Q, and check not only whether the 2 columns have norm 1 but also whether they are orthogonal. (That is, compute || I - Q^T Q ||.) We should look at this as well.

One idea is that c^2+|s|^2 is not exactly 1, and how far c^2+|s|^2 is from 1 affects the quality of the rotation. So maybe, after computing s, we should compute c with a formula that makes c^2+|s|^2 as close to 1 as possible. We have not tried this yet. Another idea is that maybe the rotation sometimes needs to reduce the norm a little and sometimes increase it a little, so that over time the errors compensate; so maybe add some random noise.

Cheers,
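The norm-drift hypothesis above can be sketched with a small experiment. This is a minimal illustration using the textbook real Givens formula, not the LAPACK *LARTG code, and random rotation angles rather than the dumped sequence from test.zip:

```python
import math
import random

def givens(f, g):
    # Textbook real Givens rotation (NOT the LAPACK *LARTG algorithm):
    # returns (c, s) such that c*f + s*g = r and -s*f + c*g = 0.
    r = math.hypot(f, g)
    if r == 0.0:
        return 1.0, 0.0
    return f / r, g / r

random.seed(0)
x, y = 6.0, 8.0                       # initial vector, 2-norm is 10
norm0 = math.hypot(x, y)
worst = 0.0
for _ in range(5000):
    f, g = random.uniform(-1, 1), random.uniform(-1, 1)
    c, s = givens(f, g)
    x, y = c * x + s * y, -s * x + c * y   # apply the rotation
    worst = max(worst, abs(math.hypot(x, y) - norm0))
print("max norm drift over 5000 rotations:", worst)
```

With this formula the drift stays near machine precision; the issue reported here is that the 3.10 formulas let the drift accumulate far beyond that in single precision.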
I didn't like either lartg implementation; I just stated that our tests pass with the LAPACK 3.9 lartg. We had two types of failures, and unfortunately both are for customer-provided test cases that we are not allowed to change. The customer provided object files containing a call to DGESDD, and DGESDD is called on a 600 by 600 matrix. The orthogonality norm slightly exceeds the threshold, but unfortunately we are not allowed to change this kind of test. The other failure happens in a ScaLAPACK regression test for the general eigensolver in single and double complex precision, but only the single complex precision fails. The test calls pcgehrd, pclahqr, and pctrevc in sequence (right and left eigenvectors are computed). The matrix is complex but pretty small for ScaLAPACK, 96x96, with a distribution block size of 16 and 4 MPI nodes; the absolute values of several eigenvalues are very small, which is why this failure is really scary. The test reports a loss of orthogonality of the right eigenvectors: the tolerance is 1.43503e-07 but the computed value of || U^T U - I || is 0.00736551. All off-diagonal entries of U^T U - I were in a reasonable range, but the diagonal entries were not. All plane rotations used in the test I sent you were taken from this failure: I just dumped a sequence of plane rotations and used them to generate the test.
I also forgot to mention another minor problem which also causes test failures. Some of our old tests use precomputed singular vectors or precomputed eigenvectors, and such a test is considered passed if and only if || computed eigen/singular vector - precomputed eigen/singular vector || is less than the given threshold. Since the 3.10 lartg generates the sine and cosine differently, the sign of some eigenvectors or singular vectors might differ from those computed with LAPACK 3.9. We understand that both results are correct, but it would be great if an implementation of lartg generated sine and cosine with the same signs as the LAPACK 3.9 version. Anyway, it's a minor problem and we can make the needed change.
For the 600x600 matrix:

(*) Is the matrix nonsymmetric?

(*) I assume that the test was working with 3.9 and is now not working with 3.10. So that is a problem. But I am trying to understand better what is going on.

(*) Can you try the 600x600 matrix with LAPACK, and do you see the same issue? It seems that you do not have access to the matrix but only to an object file that you can link to ScaLAPACK MKL, so I assume the answer is no; asking just in case. The algorithms in LAPACK and ScaLAPACK are quite different by now, so I am not sure whether this would help, but I would be curious to know if the matrix breaks LAPACK as well.

(*) "All plane rotations used in my test on your site were taken from the failure. I just dumped a sequence of plane rotations and used them for test generation." Yes, great, perfect. I am sure this was a lot of work, and it surely is useful. Thanks.

(*) To repeat: we understand that there is a problem with 3.10, as shown by the sequence of plane rotations, and that there is no problem with 3.9 for this same sequence. However, if we slightly change the sequence, we see the reverse problem: 3.10 works and 3.9 does not. It is possible that the sequences of rotations on which 3.9 fails do not occur in practical applications, in which case 3.9 is the better version for sure. Well, we'll keep digging.
"Some of our old tests use precomputed singular vectors or precomputed eigenvectors, and such a test is considered passed if and only if || computed eigen/singular vector - precomputed eigen/singular vector || is less than the given threshold."

Do you see the issue only for real arithmetic, or for both complex and real arithmetic? Since you mention "signs" as opposed to multiplication by e^(i * theta) (a complex number of modulus one), I believe you might only be referring to real arithmetic. For real arithmetic, there is the following convention in 3.9 and in 3.10: "If F exceeds G in magnitude, CS will be positive." So that should give some kind of continuity between versions. I do not think our test suite checks for this, though, so it is possible that 3.10 does not respect the convention; I have not looked at this. Also, we do not know what the convention is when F does not exceed G in magnitude: should we assume that CS is not positive, or should we assume nothing? I do not know. You pointed us in an email to the following lines of code. I do not have a good understanding of all this at the moment.
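The sign convention quoted above can be enforced as a post-processing step. The sketch below is hypothetical (it is not LAPACK code); it relies on the fact that negating c, s, and r together gives a mathematically equivalent rotation:

```python
def normalize_sign(f, g, c, s, r):
    # Hypothetical post-processing (not LAPACK source): enforce the 3.9
    # convention "if |F| exceeds |G| in magnitude, CS will be positive".
    # Flipping the signs of c, s, and r together leaves the rotation
    # mathematically equivalent.
    if abs(f) > abs(g) and c < 0.0:
        return -c, -s, -r
    return c, s, r

# Example: a rotation for (f, g) = (4, 3) computed with a negative r.
c, s, r = normalize_sign(4.0, 3.0, -0.8, -0.6, -5.0)
print(c, s, r)   # -> 0.8 0.6 5.0
```

This is the same idea as the Fortran IF-block workaround mentioned later in this thread.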
I meant real arithmetic, and I saw these failures for real arithmetic only. And yes, these lines make some tests pass.
Please find below my answers to your questions:
These dimensions (e.g. 600 by 600) are for the DGESDD test, and DGESDD is a LAPACK routine. By the way, adding the lines

      IF( ABS( F ).GT.ABS( G ) .AND. CS.LT.ZERO ) THEN
         CS = -CS
         SN = -SN
         R = -R
      END IF

to the LAPACK 3.10 version makes the test pass. I have not investigated yet why it fails without these lines.
The ScaLAPACK test fails for a 96 by 96 matrix, and yes, it calls these ScaLAPACK routines: pcgehrd, pclahqr, and pctrevc. The test checks the eigenvector residual using the 1-norm. It also tests the normalization of E, where E(j) is the j-th eigenvector and m-norm is the max-norm of a vector. The test fails because RESULT(2) is larger than expected; the value of RESULT(1) is OK. In fact the test prints RESULT(1)eps and RESULT(2)eps. I have not checked yet whether LAPACK CGEEV, or the same or a similar set of LAPACK routines, works for this particular matrix, since I have a lot of other work to do. But the test certainly passes if the LAPACK 3.9 clartg is used.
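The normalization check described above can be sketched as follows. This is an assumed form, not the actual ScaLAPACK test code (which differs in details such as scaling); it only illustrates the idea that each eigenvector column should have max-norm 1:

```python
import math

def normalization_error(E, ulp):
    # Sketch (assumed form, NOT the ScaLAPACK test routine): each
    # eigenvector column E[j] should have max-norm (largest absolute
    # entry) equal to 1; report the worst deviation scaled by n*ulp.
    n = len(E)
    worst = max(abs(max(abs(v) for v in col) - 1.0) for col in E)
    return worst / (n * ulp)

# Two well-normalized eigenvector columns give a zero error.
E = [[1.0, 0.25], [0.5, -1.0]]
print(normalization_error(E, 2.0**-23))   # 2**-23 ~ single-precision ulp
```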
See the answer above.
I don't know how to distinguish a practical problem from a non-practical one. It seems you do. Can you provide guidance on how to do it? What should we do if an issue with a non-practical problem is filed?
Thanks for checking this. Great. This is probably because the checks are of the kind || v_reference - v_new || (where v_reference is a reference value and v_new is what you want to check), as opposed to a check like || A * v_new - v_new * lambda_new ||.
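The difference between these two kinds of checks can be shown on a hypothetical 2x2 example (not taken from the issue's data): the same eigenvector with a flipped sign fails the reference-vector comparison but passes the residual check.

```python
import math

# Symmetric 2x2 matrix with eigenpair (lambda = 3, v = [1, 1]/sqrt(2)).
A = [[2.0, 1.0], [1.0, 2.0]]
lam = 3.0
inv = 1.0 / math.sqrt(2.0)
v_reference = [inv, inv]
v_new = [-inv, -inv]          # correct eigenvector, opposite sign

# Reference-vector check: large, even though v_new is correct.
diff = math.hypot(v_new[0] - v_reference[0], v_new[1] - v_reference[1])

# Residual check: tiny, because A * v_new = lambda * v_new.
Av = [A[0][0] * v_new[0] + A[0][1] * v_new[1],
      A[1][0] * v_new[0] + A[1][1] * v_new[1]]
resid = math.hypot(Av[0] - lam * v_new[0], Av[1] - lam * v_new[1])

print(diff, resid)
```

Residual-style checks are insensitive to the sign (or, in complex arithmetic, the unit-modulus phase) of the computed vector, which is why they are robust across lartg versions.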
What I meant is the following. We have two subroutines, (1) and (2).

(A) We have unit test suites that show that (1) and (2) are equally buggy: for 50% of the unit-test cases, (1) fails and (2) succeeds; for the other 50%, (1) succeeds and (2) fails.

(B) Then we also check (1) and (2) in various test suites, many, many test suites, as many as we can. We also give (1) and (2) to users, everyone is testing, and everyone reports that their applications work great with (1) and do not work with (2).

Well, this is definitely a gray area.
Right, that's a good question. If an issue with a non-practical problem is filed, and we fix the issue so that we now solve the non-practical problem, but the fix breaks the test suites of practical problems, we should wait, not release the fix, and keep trying to understand what is going on.

I think there is an order of practicality. Givens rotations are a means for our eigensolvers, singular value solvers, least squares solvers, user applications, etc. to work. The first priority is that eigensolvers, singular value solvers, least squares solvers, and user applications work. The second priority is to have unit tests that exercise the Givens rotations over a wide range, so that we can make sure we understand these few lines of code, can assess that they are correct and behave as expected in the range we target, and pass the unit tests.

You would hope that there is no disconnect between the unit tests and what applications need, that is, that passing the unit tests means passing any application. Well, there is a disconnect: passing the unit tests does not imply passing the practical-problem tests, and passing the practical-problem tests does not imply passing the unit tests. Bridging this gap is an important goal.
@sergey-v-kuznetsov. To let you know that @weslleyspereira is proposing:
Hi @sergey-v-kuznetsov. I prepared a PR with a new version of the Givens rotation algorithms. Please, see #631. The code in #631 gives the following output for the tests you proposed:
Any update on this? It's a pity that #631 missed 3.10.1, as this might further extend the timeline for MKL conformance (which we depend on to release new LAPACK versions in conda-forge).
Hi. There is one update: @langou and I are working on a report about #629 and the changes in #631. We hope to finish it soon. Also, the algorithm for Givens rotations will change with #631; I think a minor release, 3.11.0, is a better place for new algorithms. I don't think it will take long to have it released.
Great to hear, thanks for the update!
Yes! Finally :)
Description
Under certain conditions the new *LARTG implementations can substantially increase the 2-norm of the initial vector. This can lead to a significant loss of accuracy in singular vectors or eigenvectors in ScaLAPACK or LAPACK if the matrix size is large enough. The attached test demonstrates the problem. The test is for single complex precision, but the precision doesn't matter.
In the test, 5000 plane rotations are applied to a vector of length 2 with an initial 2-norm of 10. If the test is linked with the reference LAPACK 3.10 built with
-bash-4.2$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
then the 2-norm of the resulting vector is 0.10001301E+02
-bash-4.2$ gfortran test.f liblapack.a librefblas.a
-bash-4.2$ ./a.out
…
The 2-norm of the first vector =
0.10001301E+02
…
Max of all errors printed above
0.13008118E-02
Everything is excellent with the LAPACK 3.9 clartg, and here is the output for that release
-bash-4.2$ ./a.out
...
The 2-norm of the first vector =
0.99999943E+01
..
Error abs(norm of the first column - initial norm )
0.57220459E-05
...
Max of all errors printed above
0.57220459E-05
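The structure of the attached test can be sketched as follows. This sketch uses the textbook complex Givens formula with a real cosine (not the actual *LARTG source) and random data instead of the dumped rotation sequence, so it shows the shape of the experiment rather than reproducing the failure:

```python
import math
import random

def crotg(f, g):
    # Textbook complex Givens rotation with real cosine (a sketch of
    # what CLARTG computes, NOT the LAPACK source code).
    if f == 0:
        return 0.0, complex(1.0, 0.0)
    d = math.sqrt(abs(f)**2 + abs(g)**2)
    c = abs(f) / d
    s = (f / abs(f)) * g.conjugate() / d
    return c, s

random.seed(2)
x, y = complex(6.0, 0.0), complex(0.0, 8.0)    # initial 2-norm is 10
norm0 = math.sqrt(abs(x)**2 + abs(y)**2)
for _ in range(5000):
    f = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    g = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    c, s = crotg(f, g)
    # Apply the unitary rotation [[c, s], [-conj(s), c]].
    x, y = c*x + s*y, -s.conjugate()*x + c*y
drift = abs(math.sqrt(abs(x)**2 + abs(y)**2) - norm0)
print("norm drift after 5000 rotations:", drift)
```

In double precision this drift stays near machine precision; the attached test shows the 3.10 single-precision clartg drifting to about 1.3e-3 on the dumped rotation sequence.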
Checklist
[x] I've included a minimal example to reproduce the issue
test.zip