New algorithms for computing Givens rotations #631

weslleyspereira · 2021-10-15T20:13:35Z

Closes #629

New algorithms for computing Givens rotations

@sergey-v-kuznetsov highlighted in #629 that the new Givens rotation routines may have lower accuracy than the ones that were in LAPACK up to release 3.9. This was verified by applying several rotations to an initial unitary matrix. This PR proposes:

A new algorithm for computing complex Givens rotations.
A slight modification in the algorithms for computing real-valued Givens rotations.

Both modifications aim to improve the accuracy of the output.
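The kind of check described above (applying many rotations to an initial unitary matrix and watching how far the result drifts from unitary) could be sketched as follows. This is only an illustrative Python sketch, not the test used in #629: `real_givens` is a textbook rotation, and `orthogonality_loss`, its matrix size, and the random inputs are all hypothetical choices.

```python
import math
import random


def real_givens(f, g):
    """Textbook real rotation with c*f + s*g = r and -s*f + c*g = 0 (illustrative only)."""
    r = math.hypot(f, g)
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return f / r, g / r, r


def orthogonality_loss(n_rotations, size=4, seed=0):
    """Apply random plane rotations to the identity and return max |(Q Q^T - I)_ij|."""
    rng = random.Random(seed)
    q = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n_rotations):
        i, j = rng.sample(range(size), 2)
        c, s, _ = real_givens(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        # Rotate rows i and j of Q; in exact arithmetic Q stays orthogonal.
        for k in range(size):
            a, b = q[i][k], q[j][k]
            q[i][k] = c * a + s * b
            q[j][k] = -s * a + c * b
    loss = 0.0
    for i in range(size):
        for j in range(size):
            dot = sum(q[i][k] * q[j][k] for k in range(size))
            loss = max(loss, abs(dot - (1.0 if i == j else 0.0)))
    return loss


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, orthogonality_loss(n))
```

Swapping `real_givens` for a different rotation routine and comparing the reported loss is one way to compare accuracy between algorithms.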

@langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.

Minor modifications

Use rtmin = sqrt( safmin ) instead of rtmin = sqrt( safmin / epsilon ). The first value is sufficient to guarantee that all real variables used in the intermediate steps of the new algorithm belong to the interval [safmin,safmax].
Set rtmax to sqrt( safmax/4 ), sqrt( safmax/2 ) or sqrt( safmax ), depending on where it is used in the algorithm. Each value is the largest one for which all real variables used in the intermediate steps of the new algorithm belong to the interval [safmin,safmax].
Eliminate intermediate computations like p = one / d, uu = one / u and vv = one / v. These precomputed reciprocals reduce the number of divisions in the code at the cost of possibly increasing the accumulated rounding error. Since the goal is to improve accuracy, I removed them at the cost of additional floating-point divisions.
When f = 0, check if real(g) == 0 or aimag(g) == 0 to avoid an unnecessary ABSSQ( g ) followed by sqrt( g2 ). This change reduces the accumulated error when real(g) == 0 (analogously aimag(g) == 0) and aimag(g)**2 (analogously real(g)**2) cannot be represented in the working precision. We chose not to use the intrinsic complex abs because its implementation is compiler-dependent.
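Two of the points above can be illustrated with a small Python sketch in double precision. It is not the LAPACK code: `abssq` mirrors the ABSSQ statement function, while `abs_via_abssq`, `abs_special_cased`, and `reciprocal_mismatches` are hypothetical helpers introduced only for this illustration.

```python
import math


def abssq(t: complex) -> float:
    """Squared modulus, analogous to ABSSQ( t ) = real(t)**2 + aimag(t)**2."""
    return t.real * t.real + t.imag * t.imag


def abs_via_abssq(g: complex) -> float:
    """|g| as sqrt( ABSSQ( g ) ); the square can underflow or overflow."""
    return math.sqrt(abssq(g))


def abs_special_cased(g: complex) -> float:
    """|g| with the purely real and purely imaginary cases handled without squaring."""
    if g.real == 0.0:
        return abs(g.imag)
    if g.imag == 0.0:
        return abs(g.real)
    return math.sqrt(abssq(g))


def reciprocal_mismatches(n: int) -> int:
    """Count integers k in 1..n for which k * (1/k) is not exactly 1 in floating point."""
    return sum(1 for k in range(1, n + 1) if k * (1.0 / k) != 1.0)


if __name__ == "__main__":
    tiny = complex(0.0, 1e-200)
    print(abs_via_abssq(tiny), abs_special_cased(tiny))  # squaring underflows to zero
    huge = complex(1e200, 0.0)
    print(abs_via_abssq(huge), abs_special_cased(huge))  # squaring overflows to inf
    print(reciprocal_mismatches(100))
```

A single division `x / d` is rounded once, whereas `x * (one / d)` is rounded twice, which is why the precomputed reciprocal can lose accuracy even though it saves divisions.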

Major changes

The algorithm for computing complex Givens rotations was revisited. This is the new code in (c,z)ROTG and (c,z)LARTG for the unscaled part:

f2 = ABSSQ( f )
g2 = ABSSQ( g )
h2 = f2 + g2
! safmin <= f2 <= h2 <= safmax 
if( f2 >= h2 * safmin ) then
    ! safmin <= f2/h2 <= 1, and h2/f2 is finite
    c = sqrt( f2 / h2 )
    r = f / c
    rtmax = rtmax * 2
    if( f2 > rtmin .and. h2 < rtmax ) then
        ! safmin <= sqrt( f2*h2 ) <= safmax
        s = conjg( g ) * ( f / sqrt( f2*h2 ) )
    else
        s = conjg( g ) * ( r / h2 )
    end if
else
    ! f2/h2 <= safmin may be subnormal, and h2/f2 may overflow.
    ! Moreover,
    !  safmin <= f2*f2 * safmax < f2 * h2 < h2*h2 * safmin <= safmax,
    !  sqrt(safmin) <= sqrt(f2 * h2) <= sqrt(safmax).
    ! Also,
    !  g2 >> f2, which means that h2 = g2.
    d = sqrt( f2 * h2 )
    c = f2 / d
    if( c >= safmin ) then
        r = f / c
    else
        ! f2 / sqrt(f2 * h2) < safmin, then
        !  sqrt(safmin) <= f2 * sqrt(safmax) <= h2 / sqrt(f2 * h2) <= h2 * (safmin / f2) <= h2 <= safmax
        r = f * ( h2 / d )
    end if
    s = conjg( g ) * ( f / d )
end if

The worst-case scenario analysis shows this algorithm is more accurate than the algorithms from both 3.9.1 and 3.10.0.
All real variables used in intermediate computations belong to the interval [safmin,safmax].
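For readers who want to experiment outside Fortran, the unscaled branch above can be transcribed to Python roughly as follows. This is a sketch under assumptions: `SAFMIN` is taken as the smallest normal double and `SAFMAX` as its reciprocal, the function name `givens_unscaled` is hypothetical, and the range check at the top stands in for the scaling logic that the real routines apply before reaching this branch.

```python
import math
import sys

SAFMIN = sys.float_info.min  # assumption: plays the role of LAPACK's safmin
SAFMAX = 1.0 / SAFMIN        # assumption: plays the role of LAPACK's safmax


def abssq(t: complex) -> float:
    """Analogous to ABSSQ( t ) = real(t)**2 + aimag(t)**2."""
    return t.real * t.real + t.imag * t.imag


def givens_unscaled(f: complex, g: complex):
    """Return (c, s, r) with c*f + s*g = r and -conj(s)*f + c*g = 0."""
    rtmin = math.sqrt(SAFMIN)
    rtmax = math.sqrt(SAFMAX / 4)
    f1 = max(abs(f.real), abs(f.imag))
    g1 = max(abs(g.real), abs(g.imag))
    if not (rtmin < f1 < rtmax and rtmin < g1 < rtmax):
        raise ValueError("inputs need scaling; only the unscaled branch is sketched")
    f2 = abssq(f)
    g2 = abssq(g)
    h2 = f2 + g2
    if f2 >= h2 * SAFMIN:
        # safmin <= f2/h2 <= 1, and h2/f2 is finite
        c = math.sqrt(f2 / h2)
        r = f / c
        rtmax *= 2
        if f2 > rtmin and h2 < rtmax:
            # safmin <= sqrt( f2*h2 ) <= safmax
            s = g.conjugate() * (f / math.sqrt(f2 * h2))
        else:
            s = g.conjugate() * (r / h2)
    else:
        # f2/h2 may be subnormal and h2/f2 may overflow; here h2 equals g2
        d = math.sqrt(f2 * h2)
        c = f2 / d
        if c >= SAFMIN:
            r = f / c
        else:
            r = f * (h2 / d)
        s = g.conjugate() * (f / d)
    return c, s, r


if __name__ == "__main__":
    f, g = 3 + 4j, 1 - 2j
    c, s, r = givens_unscaled(f, g)
    print(c, s, r)
    print(abs(-s.conjugate() * f + c * g))  # residual of the zeroed entry
```

Calling it with `f = complex(1e-153, 0.0)` and `g = complex(1e153, 0.0)` exercises the second branch, where f2 is negligible next to g2.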

Acknowledgements

Thanks to the people who contributed to the discussions about this code:

@sergey-v-kuznetsov, @ecanesc, @alilotfi90 and @langou,
People from the @BallisticLA team, especially Prof. Jim Demmel.

Checklist

The documentation has been updated.
Time measurements.

codecov · 2021-10-15T20:28:42Z

Codecov Report

Merging #631 (0c3cdcb) into master (79aa0f2) will not change coverage.
The diff coverage is 0.00%.

❗ Current head 0c3cdcb differs from pull request most recent head 95b6e84. Consider uploading reports for the commit 95b6e84 to get more accurate results

@@           Coverage Diff           @@
##           master     #631   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1894     1894           
  Lines      184021   184035   +14     
=======================================
- Misses     184021   184035   +14

Impacted Files	Coverage Δ
SRC/cgeqrf.f	`0.00% <0.00%> (ø)`
SRC/cgerqf.f	`0.00% <0.00%> (ø)`
SRC/clarrv.f	`0.00% <0.00%> (ø)`
SRC/clartg.f90	`0.00% <0.00%> (ø)`
SRC/dgeqrf.f	`0.00% <0.00%> (ø)`
SRC/dgerqf.f	`0.00% <0.00%> (ø)`
SRC/dlarrv.f	`0.00% <0.00%> (ø)`
SRC/dlartg.f90	`0.00% <0.00%> (ø)`
SRC/sgeqrf.f	`0.00% <0.00%> (ø)`
SRC/sgerqf.f	`0.00% <0.00%> (ø)`
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 79aa0f2...95b6e84.

langou · 2021-10-16T23:37:15Z

SRC/clartg.f90

+            c = (1 / sqrt( one + g2/f2 )) * w
+         else
+            c = ( f2*p )*w
+         end if
         c = ( f2*p )*w


@weslleyspereira: I think you want to remove line 236.

Yes. I forgot it. Thanks!

langou · 2021-10-17T02:14:44Z

From Jim: Do we know if the new clartg is consistent with what we proposed in section 2.3.5 of our exception handling document? Might be nice to avoid having yet another version in the future (or at least know what we need to change).

langou

Let us wait on feedback from Sergey though.

weslleyspereira · 2021-12-13T20:24:55Z

I have just updated the description (first comment) of this PR. @langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.

weslleyspereira · 2022-11-09T16:22:07Z

And here is the report: https://arxiv.org/abs/2211.04010.
I am ready to merge it.

Solves a precision bug in clartg

2495f1c

langou reviewed Oct 16, 2021

weslleyspereira added 2 commits October 18, 2021 09:49

Removes one line from clartg

b89b15b

Several changes to reduce the computation error

ac11f62

langou previously approved these changes Oct 27, 2021

weslleyspereira added 2 commits October 28, 2021 10:27

Starting to modify zlartg

4320882

Fix all other Givens rotation routines

37a1a1e

weslleyspereira dismissed langou’s stale review via 37a1a1e November 3, 2021 00:49

weslleyspereira marked this pull request as ready for review November 3, 2021 00:51

weslleyspereira added 2 commits November 3, 2021 10:37

Merge branch 'fix-precision-in-clartgf90-2' of github.com:weslleysper…

a49a659

…eira/lapack into fix-precision-in-clartgf90-2

Algorithm precise and with no bias in the error

2904d87

langou previously approved these changes Nov 27, 2021

Updates Givens rotations with preciser algorithms

95b6e84

weslleyspereira dismissed langou’s stale review via 95b6e84 December 10, 2021 22:43

weslleyspereira mentioned this pull request Dec 13, 2021

Fix accuracy issues in the Givens rotations #630

Closed

2 tasks

vladimir-ch reviewed Dec 13, 2021

SRC/dlartg.f90 (outdated, resolved)

vladimir-ch reviewed Dec 13, 2021

SRC/zlartg.f90 (outdated, resolved)

Fix documentation thanks to @vladimir-ch

cdc8f33

weslleyspereira changed the title ~~Solves a precision bug in clartg~~ New algorithms for computing Givens rotation Dec 13, 2021

weslleyspereira changed the title ~~New algorithms for computing Givens rotation~~ New algorithms for computing Givens rotations Dec 13, 2021

Minor changes

c362fff

weslleyspereira mentioned this pull request Dec 14, 2021

accuracy problems with the new LAPACK 3.10 *LARTG implementations #629

Closed

1 task

weslleyspereira added this to the LAPACK 3.11.0 milestone Apr 19, 2022

langou approved these changes Nov 9, 2022

weslleyspereira merged commit c30e502 into Reference-LAPACK:master Nov 9, 2022

vladimir-ch added a commit to gonum/gonum that referenced this pull request Nov 24, 2022

lapack/gonum: improve accuracy of Dlartg

341eae7

This incorporates modifications introduced in LAPACK 3.11.0 in Reference-LAPACK/lapack#631

vladimir-ch mentioned this pull request Nov 24, 2022

lapack/gonum: improve accuracy of Dlartg gonum/gonum#1840

Merged

vladimir-ch added a commit to gonum/gonum that referenced this pull request Nov 24, 2022

lapack/gonum: improve accuracy of Dlartg

4e39290

This incorporates modifications introduced in LAPACK 3.11.0 in Reference-LAPACK/lapack#631
