
inference: revive CachedMethodTable mechanism #46535

Merged
merged 1 commit into master from avi/46492 on Aug 31, 2022
Conversation

aviatesk
Member

`CachedMethodTable` was removed in #44240 since we couldn't confirm any
performance improvement at the time. However, it turns out the optimization
is critical in some real-world cases (e.g. #46492), so this commit revives
the mechanism with the following tweaks that should make it more effective:

- create the method table cache per inference run (rather than per local
  inference on a function call, as in the previous implementation)
- use the caching mechanism only for abstract call signatures (since lookup
  results for concrete signatures are already cached at the next level);
  see the sketch below
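
The revived mechanism amounts to roughly the following (a minimal sketch, not the actual `Core.Compiler` implementation; the type layout and the `expensive_lookup` helper are illustrative assumptions):

```julia
# Sketch: a per-inference cache over method-table lookups, consulted only
# for non-concrete (abstract) call signatures.
struct CachedMethodTable
    cache::IdDict{Any,Any}  # call signature => method match result
end
CachedMethodTable() = CachedMethodTable(IdDict{Any,Any}())

# Stand-in for the real method-match query; the internal
# `Base._methods_by_ftype` is used here purely for illustration.
expensive_lookup(@nospecialize(sig)) =
    Base._methods_by_ftype(sig, -1, Base.get_world_counter())

function find_matches(mt::CachedMethodTable, @nospecialize(sig))
    # concrete signatures are cached at the next level already, so bypass
    isconcretetype(sig) && return expensive_lookup(sig)
    return get!(() -> expensive_lookup(sig), mt.cache, sig)
end
```

Because the cache lives on the inference state rather than on each local call-site query, repeated lookups of the same abstract signature within one inference run become dictionary hits.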

As a result, the compilation performance of the following snippet reported
in #46492 recovers:

```julia
using ControlSystems
a_2 = [-5 -3; 2 -9]
C_212 = ss(a_2, [1; 2], [1 0; 0 1], [0; 0])
@time norm(C_212)
```

> on master
```
julia> @time norm(C_212)
364.489044 seconds (724.44 M allocations: 92.524 GiB, 6.01% gc time, 100.00% compilation time)
0.5345224838248489
```

> on this commit
```
julia> @time norm(C_212)
 26.539016 seconds (62.09 M allocations: 5.537 GiB, 5.55% gc time, 100.00% compilation time)
0.5345224838248489
```

@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@Keno
Member

Keno commented Aug 29, 2022

Can we come up with a nanosoldier benchmark that mirrors what ControlSystems does here, so we'd catch regressions of this form on benchmarking runs?

@KristofferC added the compiler:inference (Type inference) and backport 1.8 (Change should be backported to release-1.8) labels Aug 29, 2022
@aviatesk
Member Author

I dumped the cached method signatures, and AFAICT their code base involves many broadcasting operations. I'm still not sure why the code contains so many broadcasting operations, but I may try to come up with a target benchmark example.

@aviatesk
Member Author

aviatesk commented Aug 30, 2022

Okay, I came up with the following target example (quite artificial, though):

```julia
let # see https://github.com/JuliaLang/julia/pull/45276
    n = 100
    ex = Expr(:block)
    var = gensym()
    push!(ex.args, :(x .= rand(); y = sum(x)))
    for i = 1:n
        push!(ex.args, :(x .= $i; y += sum(x)))
    end
    push!(ex.args, :(return y))
    @eval global function issue46535(x)
        $ex
    end
end
```
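
For intuition, the metaprogramming above expands to a function whose body looks roughly like this (a hand-written sketch of the first few of the 100 generated statements):

```julia
function issue46535(x)
    x .= rand(); y = sum(x)
    x .= 1; y += sum(x)
    x .= 2; y += sum(x)
    # ... and so on through x .= 100 ...
    return y
end
```

Each `x .= i` statement drives inference through the same broadcasting machinery, so the same abstract method-match queries recur many times, which is exactly the pattern the per-inference cache short-circuits.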

> on master
```
julia> using BaseBenchmarks;

julia> BaseBenchmarks.load!("inference");

julia> run(BaseBenchmarks.SUITE["inference"]["abstract interpretation"]["issue46535"]; verbose = true)
BenchmarkTools.Trial: 74 samples with 1 evaluation.
 Range (min … max):  12.535 ms …    1.083 s  ┊ GC (min … max): 0.00% … 1.32%
 Time  (median):     13.432 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.582 ms ± 124.252 ms  ┊ GC (mean ± σ):  0.67% ± 0.15%

   ▂▂█ ▂
  ████▇█▅▄▄▄▅▃▃▁▃▃▃▁▃▁▁▁▁▁▁▁▃▁▃▁▁▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  12.5 ms         Histogram: frequency by time           25 ms <

 Memory estimate: 3.38 MiB, allocs estimate: 57163.
```

> on this PR
```
julia> using BaseBenchmarks;

julia> BaseBenchmarks.load!("inference");

julia> run(BaseBenchmarks.SUITE["inference"]["abstract interpretation"]["issue46535"]; verbose = true)
BenchmarkTools.Trial: 78 samples with 1 evaluation.
 Range (min … max):  10.696 ms …    1.005 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.302 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.720 ms ± 112.423 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▇█▅▂
  █████▄█▃▃▅▃▄▁▃▁▄▁▄▃▁▁▃▁▁▁▁▁▁▁▁▁▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▃ ▁
  10.7 ms         Histogram: frequency by time           21 ms <

 Memory estimate: 3.15 MiB, allocs estimate: 52055.
```

aviatesk added a commit to JuliaCI/BaseBenchmarks.jl that referenced this pull request Aug 30, 2022
This benchmark target corresponds to JuliaLang/julia/pull/46535.

`method_match_cache` contains many artificially generated broadcasting
operations, which lead to constant propagation and accompanying
method match analysis on call sites with the same abstract signatures, e.g.
- `Tuple{typeof(convert), Type{Int64}, Int64}`
- `Tuple{typeof(convert), Type{Float64}, Float64}`

Since we currently don't cache method match results for abstract call
signatures at the runtime-system level, we obtain the best performance by
caching such abstract method match results at the Julia level, which is
what JuliaLang/julia/pull/46535 revives.
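
One of these lookups can be reproduced directly (using the internal `Base._methods_by_ftype`, shown here only for illustration; it is not a public API):

```julia
# Reproduce the method-match query inference performs for one of the
# signatures listed above.
sig = Tuple{typeof(convert), Type{Int64}, Int64}
world = Base.get_world_counter()
matches = Base._methods_by_ftype(sig, -1, world)  # -1 means no match limit
length(matches)  # the match set inference computes for this signature
```

Every call site that hits the same signature repeats this lookup during inference; with a per-inference cache, only the first one pays full price.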
@aviatesk
Member Author

> Can we come up with a nanosoldier benchmark that mirrors what ControlSystems does here, so we'd catch regressions of this form on benchmarking runs?

Filed as JuliaCI/BaseBenchmarks.jl#299.

@baggepinnen
Contributor

baggepinnen commented Aug 30, 2022

This PR appears to improve the compile time by over 13x 🥳, but the benchmark shows only a 1.17x improvement. Is that really enough to catch regressions that are smaller than 13x?

@aviatesk
Member Author

I think we probably don't want to replicate the entire call graph of the `norm(C_212)` example (it is a quite big call graph, so the benchmark would be slow). I believe the `method_match_cache` benchmark is enough, though, as it exploits the same compiler performance pitfall as `norm(C_212)`.

aviatesk added a commit to JuliaCI/BaseBenchmarks.jl that referenced this pull request Aug 30, 2022 (#299)
@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@vtjnash
Member

vtjnash commented Aug 30, 2022

SGTM

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk merged commit f066855 into master Aug 31, 2022
@aviatesk deleted the avi/46492 branch August 31, 2022 02:40
@KristofferC removed the backport 1.8 (Change should be backported to release-1.8) label Sep 7, 2022