-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drotmg.f: Replace GAM**2 by existing GAMSQ #1000
base: master
Are you sure you want to change the base?
Conversation
Hi Sebastian, I am looking more into this. I only had time for a cursory review at this point. I am reading in the comments of the routine that
But I think in "double," so in drotmg, (what you did,) it might be OK to use GAMSQ. (Because GAMSQ is exactly GAM**2.) Whereas in "single," so in srotmg, it might not be OK to use GAMSQ. In the "single" precision case, according to the comment, using GAM twice would be better. (But then why DD1*GAM**2 and why not (DD1*GAM)*GAM.) Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ. I'll have to look more into this. I just wanted to point to this comment in the subroutine. (Also there is a typo in the comment on line 153 of xrotmg that we might want to correct "This code path if [sic] here for safety." The "if" should be an "is".) Julien |
In the double precision code, |
Does Fortran have edit edit2 Fortran supports exponentiation in its syntax (try |
Yes, of course, you are correct. This is a good point. Thanks. The 5.9604645D-8 is not a good approximation! In base 10, RGAMSQ is 5.9604644775390625D-08, I think that this base 10 representation will be mapped exactly to RGAMSQ (which is 2^(-24)) in the computer in double precision arithmetic. I did not check. |
xROTMG was introduced in 1979 in ACM Algorithm 539. The linked page contains a freely available tarball with the Fortran77 source code (open Fortran supports exponentiation in its syntax. My suggestion is to use this notation and to let the compiler take care of optimizing expressions like REAL, PARAMETER :: GAM = 2.0E0**12
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12 edit fixed the right-hand sides in the Fortran code above. Thanks, Julien. /edit |
The longer that I look at this function, the longer I think it should be left as is: the change would not significantly improve performance, there are no callers of this function in LAPACK, and in general nobody seems to be using it. The constant 4096 exists to avoid under- and overflows and the same value is used for single and double precision. Picking a larger value would avoid unnecessary re-scaling operations. Another optimization opportunity is the possibly dead code branch starting in line 153 (there is a comment above the branch saying that the authors do not expect the branch to be taken) which could be removed. Finally, there exist clones of this implementation, e.g., a C clone created by f2c in BLIS or in golang. The commit for the golang code add DROTMG is 13 years old and reads |
See also #244, where a paper with a clearer implementation is referenced and a bug related to scaling is mentioned |
I looked into this this morning. Here is a brain dump. I now see that @angsch added a comment. Thanks. I am copy-pasting my brain dump. I'll try to work more on this later in a few days. If anyone wants to make progress, please do.
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12
DOUBLE PRECISION, PARAMETER :: GAMSQ = 2.0D0**24
DOUBLE PRECISION, PARAMETER :: RGAMSQ = 2.0D0**(-24)
I'll have to read more the gonum thread. |
Related. I was reading the code and the WHILE loop looked suspicious. I think you can get an infinite loop if you call the function with DD1 is +Infinity for example. See code below. I did not spend much time on it. Just looked suspicious and the code below seems to have an infinite loop. program test
use iso_fortran_env, only: sp => real32, dp => real64
IMPLICIT NONE
DOUBLE PRECISION DPARAM(5)
DOUBLE PRECISION DD1, DD2, DX1, DY1
real(dp) :: pinf_dp = real(z'7ff0000000000000',dp)
DD1 = pinf_dp
DD2 = 1.0D+00
DX1 = 1.0D+00
DY1 = 1.0D+00
CALL DROTMG(DD1,DD2,DX1,DY1,DPARAM)
print *,DD1
end program |
No description provided.