inference: revive `CachedMethodTable` mechanism #46535

aviatesk · 2022-08-29T12:40:46Z

CachedMethodTable was removed within #44240 as we couldn't confirm any
performance improvement then. However it turns out the optimization was
critical in some real world cases (e.g. #46492), so this commit revives
the mechanism with the following tweaks that should make it more effective:

create method table cache per inference (rather than per local
inference on a function call as on the previous implementation)
only use cache mechanism for abstract types (since we already cache
lookup result at the next level as for concrete types)

As a result, the following snippet reported at #46492 recovers the
compilation performance:

using ControlSystems
a_2 = [-5 -3; 2 -9]
C_212 = ss(a_2, [1; 2], [1 0; 0 1], [0; 0])
@time norm(C_212)

on master

julia> @time norm(C_212)
364.489044 seconds (724.44 M allocations: 92.524 GiB, 6.01% gc time, 100.00% compilation time)
0.5345224838248489

on this commit

julia> @time norm(C_212)
 26.539016 seconds (62.09 M allocations: 5.537 GiB, 5.55% gc time, 100.00% compilation time)
0.5345224838248489

aviatesk · 2022-08-29T13:06:33Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2022-08-29T13:51:41Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Keno · 2022-08-29T16:47:36Z

Can we come up with a nanosoldier benchmark that mirrors what ControlSystems does here, so we'd catch regressions of this form on benchmarking runs?

aviatesk · 2022-08-30T10:41:27Z

I dumped the cached method signatures and AFAICT it seems like that their code base involves many broadcasting operations. Still not sure how the code contains so many broadcasting operations, but I may try to come up with a target benchmark example.

aviatesk · 2022-08-30T11:01:41Z

Okay, I come up with a following target example (quite artificial though),

let # see https://github.com/JuliaLang/julia/pull/45276
    n = 100
    ex = Expr(:block)
    var = gensym()
    push!(ex.args, :(x .= rand(); y = sum(x)))
    for i = 1:n
        push!(ex.args, :(x .= $i; y += sum(x)))
    end
    push!(ex.args, :(return y))
    @eval global function issue46535(x)
        $ex
    end
end

on master

julia> using BaseBenchmarks;

julia> BaseBenchmarks.load!("inference");

julia> run(BaseBenchmarks.SUITE["inference"]["abstract interpretation"]["issue46535"]; verbose = true)
BenchmarkTools.Trial: 74 samples with 1 evaluation.
 Range (min … max):  12.535 ms …    1.083 s  ┊ GC (min … max): 0.00% … 1.32%
 Time  (median):     13.432 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.582 ms ± 124.252 ms  ┊ GC (mean ± σ):  0.67% ± 0.15%

   ▂▂█ ▂                                                        
  ████▇█▅▄▄▄▅▃▃▁▃▃▃▁▃▁▁▁▁▁▁▁▃▁▃▁▁▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  12.5 ms         Histogram: frequency by time           25 ms <

 Memory estimate: 3.38 MiB, allocs estimate: 57163.

on this PR

julia> using BaseBenchmarks;

julia> BaseBenchmarks.load!("inference");

julia> run(BaseBenchmarks.SUITE["inference"]["abstract interpretation"]["issue46535"]; verbose = true)
BenchmarkTools.Trial: 78 samples with 1 evaluation.
 Range (min … max):  10.696 ms …    1.005 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.302 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.720 ms ± 112.423 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▇█▅▂                                                         
  █████▄█▃▃▅▃▄▁▃▁▄▁▄▃▁▁▃▁▁▁▁▁▁▁▁▁▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▃ ▁
  10.7 ms         Histogram: frequency by time           21 ms <

 Memory estimate: 3.15 MiB, allocs estimate: 52055.

This benchmark target corresponds to JuliaLang/julia/pull/46535. `method_match_cache` contains many artificially generated broadcasting operations, that will lead to constant propagations and accompanying method match analysis on call sites with same abstract signatures, e.g. - `Tuple{typeof(convert), Type{Int64}, Int64}` - `Tuple{typeof(convert), Type{Float64}, Float64}` Since we currently don't cache method match result for abstract call signatures on the level of the runtime system, we can obtain the best performance if we cache such abstract method match results on Julia level, that will be revived by JuliaLang/julia/pull/46535.

aviatesk · 2022-08-30T11:49:57Z

Can we come up with a nanosoldier benchmark that mirrors what ControlSystems does here, so we'd catch regressions of this form on benchmarking runs?

Filed as JuliaCI/BaseBenchmarks.jl#299.

baggepinnen · 2022-08-30T11:58:51Z

This PR appears to improve the compile time with over 13x 🥳 , but the benchmark shows a 1.17x improvement only, is that really enough to catch regressions that are smaller than 13x?

aviatesk · 2022-08-30T12:03:51Z

I think we probably won't want to replicate the entire call graph of the norm(C_212) example (as it is quite big call graph and thus the benchmark would be slow). I believe the method_match_cache function would be enough as it exploits the same compiler performance pitfall as norm(C_212) though.

#299) This benchmark target corresponds to JuliaLang/julia/pull/46535. `method_match_cache` contains many artificially generated broadcasting operations, that will lead to constant propagations and accompanying method match analysis on call sites with same abstract signatures, e.g. - `Tuple{typeof(convert), Type{Int64}, Int64}` - `Tuple{typeof(convert), Type{Float64}, Float64}` Since we currently don't cache method match result for abstract call signatures on the level of the runtime system, we can obtain the best performance if we cache such abstract method match results on Julia level, that will be revived by JuliaLang/julia/pull/46535.

@time

`CachedMethodTable` was removed within #44240 as we couldn't confirm any performance improvement then. However it turns out the optimization was critical in some real world cases (e.g. #46492), so this commit revives the mechanism with the following tweaks that should make it more effective: - create method table cache per inference (rather than per local inference on a function call as on the previous implementation) - only use cache mechanism for abstract types (since we already cache lookup result at the next level as for concrete types) As a result, the following snippet reported at #46492 recovers the compilation performance: ```julia using ControlSystems a_2 = [-5 -3; 2 -9] C_212 = ss(a_2, [1; 2], [1 0; 0 1], [0; 0]) @time norm(C_212) ``` > on master ``` julia> @time norm(C_212) 364.489044 seconds (724.44 M allocations: 92.524 GiB, 6.01% gc time, 100.00% compilation time) 0.5345224838248489 ``` > on this commit ``` julia> @time norm(C_212) 26.539016 seconds (62.09 M allocations: 5.537 GiB, 5.55% gc time, 100.00% compilation time) 0.5345224838248489 ```

aviatesk · 2022-08-30T17:38:37Z

@nanosoldier runbenchmarks("inference", vs=":master")

vtjnash · 2022-08-30T17:47:40Z

SGTM

nanosoldier · 2022-08-30T18:27:00Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

aviatesk · 2022-08-30T23:32:34Z

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier · 2022-08-31T00:20:57Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

KristofferC added compiler:inference Type inference backport 1.8 Change should be backported to release-1.8 labels Aug 29, 2022

aviatesk mentioned this pull request Aug 30, 2022

inference: add benchmark target to check benefit of method match cache JuliaCI/BaseBenchmarks.jl#299

Merged

aviatesk force-pushed the avi/46492 branch from 8445744 to fd178c3 Compare August 30, 2022 11:50

KristofferC mentioned this pull request Aug 30, 2022

release-1.8: Backports for Julia 1.8.1 #46376

Merged

35 tasks

aviatesk force-pushed the avi/46492 branch from fd178c3 to 2d05615 Compare August 30, 2022 17:38

aviatesk merged commit f066855 into master Aug 31, 2022

aviatesk deleted the avi/46492 branch August 31, 2022 02:40

KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Sep 7, 2022

sostock mentioned this pull request Sep 13, 2022

Broadcasting picks wrong method in 1.8.1 #46722

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

inference: revive `CachedMethodTable` mechanism #46535

inference: revive `CachedMethodTable` mechanism #46535

aviatesk commented Aug 29, 2022

aviatesk commented Aug 29, 2022

nanosoldier commented Aug 29, 2022

Keno commented Aug 29, 2022

aviatesk commented Aug 30, 2022

aviatesk commented Aug 30, 2022 •

edited

Loading

aviatesk commented Aug 30, 2022

baggepinnen commented Aug 30, 2022 •

edited

Loading

aviatesk commented Aug 30, 2022

aviatesk commented Aug 30, 2022

vtjnash commented Aug 30, 2022

nanosoldier commented Aug 30, 2022

aviatesk commented Aug 30, 2022

nanosoldier commented Aug 31, 2022

inference: revive CachedMethodTable mechanism #46535

inference: revive CachedMethodTable mechanism #46535

Conversation

aviatesk commented Aug 29, 2022

aviatesk commented Aug 29, 2022

nanosoldier commented Aug 29, 2022

Keno commented Aug 29, 2022

aviatesk commented Aug 30, 2022

aviatesk commented Aug 30, 2022 • edited Loading

aviatesk commented Aug 30, 2022

baggepinnen commented Aug 30, 2022 • edited Loading

aviatesk commented Aug 30, 2022

aviatesk commented Aug 30, 2022

vtjnash commented Aug 30, 2022

nanosoldier commented Aug 30, 2022

aviatesk commented Aug 30, 2022

nanosoldier commented Aug 31, 2022

inference: revive `CachedMethodTable` mechanism #46535

inference: revive `CachedMethodTable` mechanism #46535

aviatesk commented Aug 30, 2022 •

edited

Loading

baggepinnen commented Aug 30, 2022 •

edited

Loading