Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New algorithms for computing Givens rotations #631

Conversation

weslleyspereira
Copy link
Collaborator

@weslleyspereira weslleyspereira commented Oct 15, 2021

Closes #629

New algorithms for computing Givens rotations

@sergey-v-kuznetsov highlighted in #629 that the new Givens rotations operations may have lower accuracy than the ones that were in LAPACK up to release 3.9. This was verified after applying several rotations to a initial unitary matrix. This PR proposes:

  1. A new algorithm for computing complex Givens rotations.
  2. A slight modification in the algorithms for computing real-valued Givens rotations.

Both modifications target the improvement of the output's accuracy.

@langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.

Minor modifications

  1. Use rtmin = sqrt( safmin ) instead of rtmin = sqrt( safmin / epsilon ). The first condition is sufficient to guarantee all real variables used in the intermediate steps of the new algorithm belong to the interval [safmin,safmax].
  2. Set rtmax to either sqrt( safmin/4 ), sqrt( safmin/2 ) or sqrt( safmin ). This variable depends on where it is in the algorithm. The value is the maximum possible in order that all real variables used in the intermediate steps of the new algorithm belong to the interval [safmin,safmax].
  3. Eliminate the intermediate computations like p = one / d, uu = one / u and vv = one / v. These operations reduce the number of divisions in the code at the cost of possibly increasing the accumulation error. I am trying to improve accuracy, so I remove the intermediate operations at the cost of having additional floating-point divisions.
  4. When f = 0, check if real(g) == 0 or aimag(g) == 0 to avoid unnecessary ABSSQ( g ); sqrt( g2 ). This change reduces the accumulation error when real(g) == 0 (analogously aimag(g) == 0) and aimag(g)**2 (analogously real(g)**2) cannot be stored in the respective finite precision. We choose not to use the intrinsic complex abs because its implementation is compiler-dependent.

Major changes

The algorithm for computing complex Givens rotations was revisited. This is the new code in (c,z)ROTG and (c,z)LARTG for the unscaled part:

f2 = ABSSQ( f )
g2 = ABSSQ( g )
h2 = f2 + g2
! safmin <= f2 <= h2 <= safmax 
if( f2 >= h2 * safmin ) then
    ! safmin <= f2/h2 <= 1, and h2/f2 is finite
    c = sqrt( f2 / h2 )
    r = f / c
    rtmax = rtmax * 2
    if( f2 > rtmin .and. h2 < rtmax ) then
        ! safmin <= sqrt( f2*h2 ) <= safmax
        s = conjg( g ) * ( f / sqrt( f2*h2 ) )
    else
        s = conjg( g ) * ( r / h2 )
    end if
else
    ! f2/h2 <= safmin may be subnormal, and h2/f2 may overflow.
    ! Moreover,
    !  safmin <= f2*f2 * safmax < f2 * h2 < h2*h2 * safmin <= safmax,
    !  sqrt(safmin) <= sqrt(f2 * h2) <= sqrt(safmax).
    ! Also,
    !  g2 >> f2, which means that h2 = g2.
    d = sqrt( f2 * h2 )
    c = f2 / d
    if( c >= safmin ) then
        r = f / c
    else
        ! f2 / sqrt(f2 * h2) < safmin, then
        !  sqrt(safmin) <= f2 * sqrt(safmax) <= h2 / sqrt(f2 * h2) <= h2 * (safmin / f2) <= h2 <= safmax
        r = f * ( h2 / d )
    end if
    s = conjg( g ) * ( f / d )
end if
  • The worst-case scenario analysis shows this algorithm is more accurate than both the algorithms from 3.9.1 and 3.10.0.
  • All real variables used in intermediate computations belong to the interval [safmin,safmax].

Acknowledgements

Thanks to people that contributed in the discussions about this code:

Checklist

  • The documentation has been updated.
  • Time measurements.

@codecov
Copy link

codecov bot commented Oct 15, 2021

Codecov Report

Merging #631 (0c3cdcb) into master (79aa0f2) will not change coverage.
The diff coverage is 0.00%.

❗ Current head 0c3cdcb differs from pull request most recent head 95b6e84. Consider uploading reports for the commit 95b6e84 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master     #631   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1894     1894           
  Lines      184021   184035   +14     
=======================================
- Misses     184021   184035   +14     
Impacted Files Coverage Δ
SRC/cgeqrf.f 0.00% <0.00%> (ø)
SRC/cgerqf.f 0.00% <0.00%> (ø)
SRC/clarrv.f 0.00% <0.00%> (ø)
SRC/clartg.f90 0.00% <0.00%> (ø)
SRC/dgeqrf.f 0.00% <0.00%> (ø)
SRC/dgerqf.f 0.00% <0.00%> (ø)
SRC/dlarrv.f 0.00% <0.00%> (ø)
SRC/dlartg.f90 0.00% <0.00%> (ø)
SRC/sgeqrf.f 0.00% <0.00%> (ø)
SRC/sgerqf.f 0.00% <0.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79aa0f2...95b6e84. Read the comment docs.

SRC/clartg.f90 Outdated
c = (1 / sqrt( one + g2/f2 )) * w
else
c = ( f2*p )*w
end if
c = ( f2*p )*w
Copy link
Contributor

@langou langou Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weslleyspereira: I think you want to remove line 236.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I forgot it. Thanks!

@langou
Copy link
Contributor

langou commented Oct 17, 2021

From Jim: Do we know if the new clartg is consistent with what we proposed in section 2.3.5 of our exception handling document? Might be nice to avoid having yet another version in the future (or at least know what we need to change).

langou
langou previously approved these changes Oct 27, 2021
Copy link
Contributor

@langou langou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us wait on feedback from Sergey though.

langou
langou previously approved these changes Nov 27, 2021
SRC/dlartg.f90 Outdated Show resolved Hide resolved
SRC/zlartg.f90 Outdated Show resolved Hide resolved
@weslleyspereira weslleyspereira changed the title Solves a precision bug in clartg New algorithms for computing Givens rotation Dec 13, 2021
@weslleyspereira weslleyspereira changed the title New algorithms for computing Givens rotation New algorithms for computing Givens rotations Dec 13, 2021
@weslleyspereira
Copy link
Collaborator Author

I have just updated the description (first comment) of this PR. @langou and I are preparing a report with the numerical analysis and experiments comparing the different algorithms for computing Givens rotations. We will share the document here when it is ready to review.

@weslleyspereira
Copy link
Collaborator Author

And here is the report: https://arxiv.org/abs/2211.04010.
I am ready to merge it.

@weslleyspereira weslleyspereira merged commit c30e502 into Reference-LAPACK:master Nov 9, 2022
vladimir-ch added a commit to gonum/gonum that referenced this pull request Nov 24, 2022
This incorporates modifications introduced in LAPACK 3.11.0 in
Reference-LAPACK/lapack#631
vladimir-ch added a commit to gonum/gonum that referenced this pull request Nov 24, 2022
This incorporates modifications introduced in LAPACK 3.11.0 in
Reference-LAPACK/lapack#631
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

accuracy problems with the new LAPACK 3.10 *LARTG implementations
3 participants