Consideration of workarounds for `cbrt` and `exp` precompilation problems #425

kimikage · 2020-05-23T09:38:51Z

Edit: For the exp case, a workaround has been introduced in PR #483.

As JuliaLang/julia#35972 revealed, cbrt and exp have problems with precompilation.

In the Colors.jl, cbrt and exp are used in XYZ-->{Lab, Luv} and XYZ <--> RGB conversions.

Lines 430 to 436 in bea98d9

    
           function fxyz2lab(v) 
        
               v > xyz_epsilon ? cbrt(v) : (xyz_kappa * v + 16) / 116 
        
           end 
        
           function cnvt(::Type{Lab{T}}, c::XYZ, wp::XYZ) where T 
        
               fx, fy, fz = fxyz2lab(c.x / wp.x), fxyz2lab(c.y / wp.y), fxyz2lab(c.z / wp.z) 
        
               Lab{T}(116fy - 16, 500(fx - fy), 200(fy - fz)) 
        
           end

Colors.jl/src/conversions.jl

Lines 525 to 536 in bea98d9

    
           function cnvt(::Type{Luv{T}}, c::XYZ, wp::XYZ = WP_DEFAULT) where T 
        
               (u_wp, v_wp) = xyz_to_uv(wp) 
        
               (u_, v_) = xyz_to_uv(c) 
        
               y = c.y / wp.y 
        
               l = y > xyz_epsilon ? 116 * cbrt(y) - 16 : xyz_kappa * y 
        
               u = 13 * l * (u_ - u_wp) + zero(T) 
        
               v = 13 * l * (v_ - v_wp) + zero(T) 
        
               Luv{T}(l, u, v) 
        
           end

Colors.jl/src/utilities.jl

Lines 21 to 27 in bea98d9

    
           # `pow5_12` is called from `srgb_compand`. 
        
           # `cbrt` generates a function call, so there is little benefit of `@fastmath`. 
        
           pow5_12(x) = pow3_4(x) / cbrt(x) # 5/12 == 1/2 + 1/4 - 1/3 == 3/4 - 1/3 
        
           # `pow12_5` is called from `invert_srgb_compand`. 
        
           # x^y ≈ exp(y*log(x)) ≈ exp2(y*log2(y)); the middle form is faster 
        
           @noinline pow12_5(x) = x^2 * exp(0.4 * log(x)) # 12/5 == 2.4 == 2 + 0.4

Note that the conversion methods from the DIN99 and its variants also use exp but they don't cause this problem since they are not precompiled.

RGB{Float32}->XYZ and RGB{N0f8}->XYZ are also OK because of the specialization of invert_srgb_compand. Moreover, although I don't clearly understand the reason, RGB{Float64}->XYZ seems to have no problem. (I guess the reason is the @noinline. exp potentially and definitely have the problem.)

In the case of MWE, f() will be eventually recompiled because f() is in a "shallow" place. However, in the Colors.jl, cbrt is in "deep" places from the public API (e.g. colordiff and distinguishable_colors), so it is difficult to avoid the problem.

Even if this problem is fixed by a compiler improvement, that improvement may not be backported to older versions. (It may not even be backported to v1.4 depending on the release timing of ~~v1.4.2~~ v1.4.3.) So I think we should look for the workarounds which can be handled in Colors.jl.

One simple workaround is not to precompile the functions which cause the problem. However, this detracts from the benefits of precompilation.

The text was updated successfully, but these errors were encountered:

kimikage · 2020-05-24T15:36:19Z

Although I don't want to do it much, specializing cbrt in Colors.jl is an option. Since it is sufficient for Colors.jl to be able to calculate mainly for the [0, 1] domain, some optimizations may be available.

kimikage · 2021-05-31T01:30:54Z

This issue will be addressed with the measure for issue #477. In other words, this will not be addressed as a precompilation problem, but as a issue of speed optimizations.

kimikage · 2021-06-16T12:27:54Z

The following is based on the techniques used in Base.cbrt(::Float64). The main difference is that the following calculates 1 / cbrt(x) and then finds cbrt(x). The 1 / cbrt(x) could be used in the pow5_12.

# Approximation of the reciprocal of the cube root, x^(-1/3).
# Assuming that x > 0.003, the conditional branches are omitted.
@inline function rcbrt(x::Float64)
    ix = reinterpret(UInt64, x)
    e0 = (ix >> 0x34) % UInt32
    ed = e0 ÷ 0x3
    er = e0 - ed * 0x3
    a = 0x000b_f2d7 - 0x0005_5718 * er
    e = (UInt32(1363) - ed) << 0x14 | a
    t1 = reinterpret(Float64, UInt64(e) << 0x20)
    h1 = muladd(t1^2, -x * t1, 1.0)
    t2 = muladd(@evalpoly(h1, 1/3, 2/9, 14/81), h1 * t1, t1)
    h2 = muladd(t2^2, -x * t2, 1.0)
    t3 = muladd(muladd(2/9, h2, 1/3), h2 * t2, t2)
    reinterpret(Float64, reinterpret(UInt64, t3) & 0xffff_ffff_8000_0000)
end

@inline function cbrt01(x::Float64)
    r = rcbrt(x) # x^(-1/3)
    h = muladd(r^2, -x * r, 1.0)
    e = muladd(2/9, h, 1/3) * h * r
    # x * x^(-2/3)
    muladd(r, x * r, x * e * (r + r + e))
end

julia> @noinline cbrt01n(x) = cbrt01(x);

julia> xs = max.(rand(1000), 0.003);

julia> @btime cbrt.($xs);
  6.880 μs (1 allocation: 7.94 KiB)

julia> @btime cbrt01n.($xs);
  6.680 μs (1 allocation: 7.94 KiB)

julia> @btime cbrt01.($xs); # Note that this is different from the actual use case.
  2.378 μs (1 allocation: 7.94 KiB)

Although this is slightly less accurate than Base.cbrt, the accuracy of 1ULP is satisfied.

However, for pow5_12, the spline-based PR #485 is faster than using this. Also, I haven't found an implementation which is faster than Base.cbrt(::Float32).

kimikage · 2021-06-17T00:50:16Z

The aim of cbrt01 is to eliminate the following branch.

Colors.jl/src/conversions.jl

Lines 428 to 432 in 4645452

    
           function fxyz2lab(v) 
        
               ka = oftype(v, 841 / 108) # (29/6)^2 / 3 = xyz_kappa / 116 
        
               kb = oftype(v, 16 / 116) # 4/29 
        
               v > oftype(v, xyz_epsilon) ? cbrt(v) : muladd(ka, v, kb) 
        
           end

For example:

@inline function fxyz2lab(v)
    ka = oftype(v, 841 / 108) # (29/6)^2 / 3 = xyz_kappa / 116
    kb = oftype(v, 16 / 116) # 4/29
    vc = @fastmath max(v, oftype(v, xyz_epsilon))
    @fastmath min(cbrt01(vc), muladd(ka, v, kb))
end

kimikage · 2021-06-20T06:16:28Z

The use of cbrt is limited by PR #489 and PR #491. This makes the propagation of precompilation problems less likely. However, there are still problems with some conversions.

It is important to note that functions that internally convert colors (e.g. colordiff and distinguishable_colors) may be affected by the problems.

I will open a new issue, as the discussion has moved beyond specific functions such as exp and cbrt.

kimikage · 2021-06-21T01:57:45Z

BTW, the next most problematic function is atan (atand). The function is slow, even if there were no precompilation problems.

In practice, the current atand(::Float32) is not so accurate.

julia> atand(50.85106f0, -24.293373f0), Float32(atand(big(50.85106f0), big(-24.293373f0)))
(115.53545f0, 115.53548f0)

kimikage mentioned this issue May 23, 2020

Allocates when precompiled JuliaLang/julia#35972

Closed

kimikage mentioned this issue May 27, 2020

Improve type stability in conversion methods #427

Merged

kimikage added this to the 0.13 milestone May 31, 2021

kimikage mentioned this issue Jun 4, 2021

Improve speed and mean accuracy of pow12_5 #483

Merged

kimikage mentioned this issue Jun 18, 2021

Make the cube root to be inlined #489

Merged

kimikage mentioned this issue Jun 20, 2021

Improve Luv<-->XYZ conversions #491

Merged

This comment has been minimized.

Sign in to view

kimikage closed this as completed in #491 Jun 22, 2021

kimikage mentioned this issue Jun 22, 2021

Fatal performance degradation due to precompilation or inference bugs #496

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consideration of workarounds for `cbrt` and `exp` precompilation problems #425

Consideration of workarounds for `cbrt` and `exp` precompilation problems #425

kimikage commented May 23, 2020 •

edited

Loading

kimikage commented May 24, 2020

kimikage commented May 31, 2021

kimikage commented Jun 16, 2021 •

edited

Loading

kimikage commented Jun 17, 2021

kimikage commented Jun 20, 2021

kimikage commented Jun 21, 2021 •

edited

Loading

This comment has been minimized.

Consideration of workarounds for cbrt and exp precompilation problems #425

Consideration of workarounds for cbrt and exp precompilation problems #425

Comments

kimikage commented May 23, 2020 • edited Loading

kimikage commented May 24, 2020

kimikage commented May 31, 2021

kimikage commented Jun 16, 2021 • edited Loading

kimikage commented Jun 17, 2021

kimikage commented Jun 20, 2021

kimikage commented Jun 21, 2021 • edited Loading

This comment has been minimized.

Consideration of workarounds for `cbrt` and `exp` precompilation problems #425

Consideration of workarounds for `cbrt` and `exp` precompilation problems #425

kimikage commented May 23, 2020 •

edited

Loading

kimikage commented Jun 16, 2021 •

edited

Loading

kimikage commented Jun 21, 2021 •

edited

Loading