Consideration of workarounds for cbrt and exp precompilation problems #425
Comments
Although I don't want to do it much, specializing |
This issue will be addressed together with the measures for issue #477. In other words, it will be treated not as a precompilation problem but as a speed optimization issue. |
The following is based on the techniques used in
# Approximation of the reciprocal of the cube root, x^(-1/3).
# Assuming that x > 0.003, the conditional branches are omitted.
@inline function rcbrt(x::Float64)
    ix = reinterpret(UInt64, x)
    e0 = (ix >> 0x34) % UInt32 # biased exponent (x > 0, so the sign bit is 0)
    ed = e0 ÷ 0x3
    er = e0 - ed * 0x3 # e0 mod 3
    a = 0x000b_f2d7 - 0x0005_5718 * er
    e = (UInt32(1363) - ed) << 0x14 | a
    t1 = reinterpret(Float64, UInt64(e) << 0x20) # initial estimate from the bit pattern
    h1 = muladd(t1^2, -x * t1, 1.0) # residual 1 - x * t1^3
    t2 = muladd(@evalpoly(h1, 1/3, 2/9, 14/81), h1 * t1, t1)
    h2 = muladd(t2^2, -x * t2, 1.0) # residual 1 - x * t2^3
    t3 = muladd(muladd(2/9, h2, 1/3), h2 * t2, t2)
    # clear the low 31 bits of the result
    reinterpret(Float64, reinterpret(UInt64, t3) & 0xffff_ffff_8000_0000)
end
@inline function cbrt01(x::Float64)
    r = rcbrt(x) # x^(-1/3)
    h = muladd(r^2, -x * r, 1.0)
    e = muladd(2/9, h, 1/3) * h * r
    # x * x^(-2/3)
    muladd(r, x * r, x * e * (r + r + e))
end
julia> @noinline cbrt01n(x) = cbrt01(x);
julia> xs = max.(rand(1000), 0.003);
julia> @btime cbrt.($xs);
  6.880 μs (1 allocation: 7.94 KiB)
julia> @btime cbrt01n.($xs);
  6.680 μs (1 allocation: 7.94 KiB)
julia> @btime cbrt01.($xs); # Note that this is different from the actual use case.
  2.378 μs (1 allocation: 7.94 KiB)
Although this is slightly less accurate than However, for |
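The refinement steps in rcbrt can be read as a truncated binomial series: with h = 1 - x*t^3, x^(-1/3) = t * (1 - h)^(-1/3) = t * (1 + h/3 + 2h^2/9 + 14h^3/81 + ...). The following is a minimal sketch of that single step in isolation. The name refine_rcbrt and the crude starting guess are illustrative assumptions and are not part of the code above.

```julia
# One series-based refinement of an estimate t of x^(-1/3).
# refine_rcbrt is a hypothetical helper name used only for this sketch.
function refine_rcbrt(x::Float64, t::Float64)
    h = muladd(t^2, -x * t, 1.0)                     # residual h = 1 - x * t^3
    muladd(@evalpoly(h, 1/3, 2/9, 14/81), h * t, t)  # t * (1 + h/3 + 2h^2/9 + 14h^3/81)
end

x = 0.5
t0 = 1.25                    # deliberately crude guess for x^(-1/3)
t1 = refine_rcbrt(x, t0)
t2 = refine_rcbrt(x, t1)
ref = 1 / cbrt(x)            # reference value from Base.cbrt

println("error after 1 step:  ", abs(t1 - ref))
println("error after 2 steps: ", abs(t2 - ref))
```

Because the series is truncated after the cubic term, the residual shrinks roughly to its fourth power at each step, which is why two steps are enough once the bit-level initial estimate is reasonable.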
The aim of Lines 428 to 432 in 4645452
For example:
@inline function fxyz2lab(v)
    ka = oftype(v, 841 / 108) # (29/6)^2 / 3 = xyz_kappa / 116
    kb = oftype(v, 16 / 116) # 4/29
    vc = @fastmath max(v, oftype(v, xyz_epsilon))
    @fastmath min(cbrt01(vc), muladd(ka, v, kb))
end |
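The max/min form above works without a branch because the line ka*v + kb is tangent to the cube root at v = (6/29)^3. Since the cube root is concave, the line lies on or above it everywhere, so the minimum selects the cube root above the threshold and the line below it. The sketch below checks this against an explicit branch. It assumes the threshold is the standard CIE value 216/24389 and substitutes Base.cbrt for cbrt01; the names XYZ_EPSILON, f_branch and f_branchless are made up for illustration.

```julia
# Assumed threshold: the standard CIE value (6/29)^3, defined locally for this sketch.
const XYZ_EPSILON = 216 / 24389

# Reference definition with an explicit branch.
f_branch(v) = v > XYZ_EPSILON ? cbrt(v) : (841 / 108) * v + 4 / 29

# Branchless form: clamp the argument, then take the minimum with the tangent line.
# Base.cbrt stands in for cbrt01 here.
f_branchless(v) = min(cbrt(max(v, XYZ_EPSILON)), muladd(841 / 108, v, 4 / 29))

# Largest disagreement between the two forms on a grid over [0, 1].
maxdiff = maximum(abs(f_branch(v) - f_branchless(v)) for v in 0.0:0.001:1.0)
println("max difference: ", maxdiff)
```

Any remaining difference comes from rounding only (the fused muladd versus separate operations, and the two curves nearly touching just above the threshold).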
The use of It is important to note that functions that internally convert colors (e.g. I will open a new issue, as the discussion has moved beyond specific functions such as |
BTW, the next most problematic function is In practice, the current
julia> atand(50.85106f0, -24.293373f0), Float32(atand(big(50.85106f0), big(-24.293373f0)))
(115.53545f0, 115.53548f0) |
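A comparison like the one above can be turned into an error figure by measuring the gap to a BigFloat reference in units of the Float32 spacing (ulp). This is only a sketch of one way to quantify the discrepancy: the variable names are illustrative, and the result depends on the Julia version in use, so no expected value is given.

```julia
# Inputs taken from the example above.
y, x = 50.85106f0, -24.293373f0

approx = atand(y, x)              # Float32 result under test
exact  = atand(big(y), big(x))    # BigFloat reference

# Error in units of the Float32 spacing near the correctly rounded result.
ulp = eps(Float32(exact))
err_ulps = Float64(abs(big(approx) - exact) / big(ulp))
println("atand error ≈ ", round(err_ulps; digits = 2), " ulp")
```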
Edit: For the exp case, a workaround has been introduced in PR #483.
As JuliaLang/julia#35972 revealed, cbrt and exp have problems with precompilation.
In Colors.jl, cbrt and exp are used in the XYZ --> {Lab, Luv} and XYZ <--> RGB conversions.
Colors.jl/src/conversions.jl
Lines 430 to 436 in bea98d9
Colors.jl/src/conversions.jl
Lines 525 to 536 in bea98d9
Colors.jl/src/utilities.jl
Lines 21 to 27 in bea98d9
Note that the conversion methods from DIN99 and its variants also use exp, but they don't cause this problem since they are not precompiled. RGB{Float32} -> XYZ and RGB{N0f8} -> XYZ are also OK because of the specialization of invert_srgb_compand. Moreover, although I don't clearly understand the reason, RGB{Float64} -> XYZ seems to have no problem. (I guess the reason is the @noinline. exp potentially and definitely have the problem.)
In the case of the MWE, f() will eventually be recompiled because f() is in a "shallow" place. However, in Colors.jl, cbrt is in "deep" places relative to the public API (e.g. colordiff and distinguishable_colors), so it is difficult to avoid the problem.
Even if this problem is fixed by a compiler improvement, that improvement may not be backported to older versions. (It may not even be backported to v1.4, depending on the release timing of v1.4.3.) So I think we should look for workarounds that can be handled in Colors.jl.
One simple workaround is not to precompile the functions which cause the problem. However, this detracts from the benefits of precompilation.