inline ^ for literal powers of numbers #20637
Conversation
base/gmp.jl
@@ -416,6 +416,10 @@ function ^(x::BigInt, y::Culong)
    ccall((:__gmpz_pow_ui, :libgmp), Void, (Ptr{BigInt}, Ptr{BigInt}, Culong), &z, &x, y)
    return z
end
@generated function ^{p}(x::BigInt, ::Type{Val{p}})
    p < 0 && return :(inv(x)^p)
^(-p)
This really need not be a generated function. Constant folding should be able to take care of this just fine.
Would constant folding eliminate the type instability?
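To make the type-stability question concrete, here is a minimal sketch in current Julia syntax (the helper names `unstable_pow` and `stable_pow` are hypothetical, not code from this PR):

```julia
# With a runtime exponent, the result type of integer^p depends on the sign of p,
# so the compiler cannot infer a single concrete return type:
unstable_pow(x::Int, p::Int) = p < 0 ? inv(x)^(-p) : x^p   # Int or Float64, value-dependent

# Dispatching on the literal exponent as a type parameter gives one specialization
# per p, so the branch folds away and each instance has a concrete return type:
stable_pow(x::Int, ::Val{p}) where {p} = p < 0 ? inv(x)^(-p) : x^p

stable_pow(2, Val(3))    # 8      (Int)
stable_pow(2, Val(-3))   # 0.125  (Float64)
```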
base/intfuncs.jl
# for numbers, inline the power_by_squaring method for literal p.
# (this also allows integer^-p to give a floating-point result
# in a type-stable fashion).
@generated function ^{p}(x::Number, ::Type{Val{p}})
This really shouldn't be `@generated`. It's too likely to cause runtime performance issues with limited type information and potentially other issues with compile-time convergence / termination.

The `power_by_squaring` definition here is also inaccurate except for `::Integer` (llvm already implements this optimization for those), and `Complex{<:Integer}`. This definition would likely also be slower for non-bitstypes, such as `BigInt`.
`BigInt` overrides this. We are already using `power_by_squaring` for `Complex{Float}` and in all other cases where this method is called, so there will be no accuracy regressions. "Inaccurate" seems like an exaggeration, since the roundoff errors grow extremely slowly, as O(sqrt(log(p))) I think.

Can you give an example of where this would cause a runtime performance issue compared to calling `power_by_squaring`?
It seems analogous to saying we should default to `sum_kbn` because `sum(::Array)` is "inaccurate." Yes, `sum_kbn` is slightly more accurate, but only slightly in most cases, and it is vastly slower.
Should it be: …
What I had in mind when I described this as an experimental feature was that we don't use it by default but have a …
Does this mean that `n = -2; y = 2^n` and `y = 2^(-2)` will give different results? If so, I'm not sure I'm so keen about that.
@simonbyrne, yes, one case throws an error, as opposed to both cases. This was discussed in #20527.
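For concreteness, a small illustration of that asymmetry as discussed here and in #20527 (a sketch of the intended behavior; the exact error may differ between versions):

```julia
2^(-2)      # literal exponent: handled by the Val-specialized method, gives 0.25
n = -2
2^n         # runtime exponent: ordinary integer power, throws a DomainError
```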
@StefanKarpinski, if we only use it in one package that hardly anyone uses, then we won't really get experience in whether it causes confusion, so we won't be able to decide whether to really use this in 1.0. If we use it for …
Updated to use optimal addition chains for powers < 256. Also updated so that …
Optimal addition chains are so cool! 😍
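As a quick illustration of why addition chains help (a sketch, not the PR's actual power-tree code): for p = 15, binary powering by squaring takes 6 multiplications, while the chain 1 → 2 → 3 → 5 → 10 → 15 takes only 5:

```julia
function pow15(x)
    x2  = x  * x     # x^2
    x3  = x2 * x     # x^3
    x5  = x3 * x2    # x^5
    x10 = x5 * x5    # x^10
    return x10 * x5  # x^15, five multiplications in total
end

pow15(2) == 2^15  # true
```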
ex = Union{Expr,Symbol}[] # expressions to compute intermediate powers
if p < 0
    x′ = gensym()
    push!(ex, :($x′ = inv($x)))
Isn't this likely to be more accurate if the inversion happens at the end?
Maybe, for integers, but in that case it's also likelier to overflow. For floating-point values, I don't see why it would be more accurate to put the `inv` at the end.
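A hedged example of the overflow concern for integer bases (illustrative only, not from the PR): deferring the inversion computes the positive power in integer arithmetic first, where it can silently overflow, whereas inverting first keeps the whole computation in floating point:

```julia
x = 10
inv(x^30)    # 10^30 overflows Int64 first, so the result is meaningless
inv(x)^30    # ≈ 1.0e-30, computed entirely in Float64
```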
Ironically, I'm seeing:

    julia> function ff()
               x = 3^17
               return 2x + 1
           end
    ff (generic function with 1 method)

    julia> ff()
    258280327

    julia> @code_llvm ff()
    define i64 @julia_ff_67552() #0 !dbg !5 {
    top:
      %0 = call i64 @jlsys_internal_pow_53152(i64 3, i8** inttoptr (i64 4532589264 to i8**))
      %1 = shl i64 %0, 1
      %2 = or i64 %1, 1
      ret i64 %2
    }

    julia> @code_native ff()
        .section __TEXT,__text,regular,pure_instructions
    Filename: REPL[46]
        pushq %rbp
        movq %rsp, %rbp
    Source line: 2
        movabsq $internal_pow, %rax
        movabsq $4532589264, %rsi ## imm = 0x10E29D2D0
        movl $3, %edi
        callq *%rax
    Source line: 3
        leaq 1(%rax,%rax), %rax
        popq %rbp
        retq
        nopw %cs:(%rax,%rax)
@StefanKarpinski, yes, it doesn't inline for large powers (I forget what the threshold is) because the code is long enough to fail the inlining heuristic. It's not clear to me what the desired behavior is here, e.g. if you have several …
I suppose it depends – if the whole thing can be boiled down to a constant, then you'd want to inline it, of course, but that's a separate optimization issue from this PR. I tried this patch:

    diff --git a/base/intfuncs.jl b/base/intfuncs.jl
    index 0585bf22b7..d57bc4d706 100644
    --- a/base/intfuncs.jl
    +++ b/base/intfuncs.jl
    @@ -201,7 +201,7 @@ end
     # To avoid ambiguities for methods that dispatch on the
     # first argument, we dispatch the fallback via internal_pow:
     ^(x, p) = internal_pow(x, p)
    -internal_pow{p}(x, ::Type{Val{p}}) = x^p
    +@inline internal_pow{p}(x, ::Type{Val{p}}) = x^p
     # This table represents the optimal "power tree"
     # based on Knuth's "TAOCP vol 2: Seminumerical Algorithms",
    @@ -305,10 +305,10 @@ inlined_pow(x::Symbol, p::Integer) = inlined_pow(x, Int(p))
     # the unrolled expression for literal p
     # (this also allows integer^-p to give a floating-point result
     # in a type-stable fashion).
    -@generated internal_pow{p}(x::Number, ::Type{Val{p}}) = inlined_pow(:x, p)
    +@inline @generated internal_pow{p}(x::Number, ::Type{Val{p}}) = inlined_pow(:x, p)
     # for hardware floating-point types, we already call powi or powf
    -internal_pow{p}(x::Union{Float32,Float64}, ::Type{Val{p}}) = x^p
    +@inline internal_pow{p}(x::Union{Float32,Float64}, ::Type{Val{p}}) = x^p

Would that be beneficial to ensure that …
Now that 0.6 is out with special-cased parsing of integer literal powers and the world has not come to a fiery death, we should consider these function changes (or similar ones) for 0.7/1.0.
There was a suggestion to make sure that the various …
@@ -443,9 +443,6 @@ end
^(x::Integer, y::BigInt ) = bigint_pow(BigInt(x), y)
^(x::Bool   , y::BigInt ) = Base.power_by_squaring(x, y)

# override default inlining of x^2 and x^3 etc.
^{p}(x::BigInt, ::Type{Val{p}}) = x^p
Shouldn't we still call GMP in this case?
There's some disagreement on the triage call about this. @JeffBezanson and @vtjnash feel that most of this should live in a package that people opt into if they want this level of sophistication in their exponentiation. It's also somewhat unclear when such large powers would be useful. I want at least the negative literal exponent part so that …
There would be some changes in the roundoff errors for … I'm not so worried about dimensional types, since those can and do easily support `^literal` already. The negative-exponent case is somewhat independent; I …
We can probably remove the 1.0 milestone from this now that #24240 has merged the negative-power parts, since the remaining parts are effectively just an optimization?
Agree: if we change this it should only be if it is both faster and at least as accurate, which seems like an acceptable 1.x change. It might change the results of someone's code, but then so can changing CPUs or numbers of threads or just running threaded code again.
Is this something we should revisit?
For now, I pulled this code out into a FastPow.jl package that provides a …
Note that the https://github.com/JuliaMath/FastPow.jl package now exists to provide a …
I think that between improvements in …
Taking advantage of #20530, this PR inlines `x^p` for literal powers `p` of numbers where `power_by_squaring` is normally called, excluding floating-point numbers which already use an LLVM intrinsic (#19890). As discussed in #20527, this has the advantages:

* … calls.
* `x^-n` now works for integer `x` and literal `n` (closes "Exponentiation by a negative power throws a DomainError" #3024).

(Needs tests, doc updates.)
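For readers unfamiliar with the mechanism, here is a rough sketch of the idea (the name `literal_pow_demo` is hypothetical, not the PR's exact code): literal powers are lowered so that the exponent appears as a type parameter, letting dispatch choose an unrolled method per literal exponent:

```julia
# A literal power like x^3 can be rewritten into a call carrying the exponent
# in the type, so a specialized method can unroll the multiplications:
literal_pow_demo(x, ::Val{3}) = x * x * x           # unrolled for the literal 3
literal_pow_demo(x, ::Val{p}) where {p} = x^p       # generic fallback

literal_pow_demo(2.0, Val(3))   # 8.0, via the unrolled method
literal_pow_demo(2.0, Val(7))   # 128.0, via the fallback
```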