optimizer: alias-aware SROA #43888

Draft: wants to merge 1 commit into master

Conversation

@aviatesk aviatesk commented Jan 21, 2022

Enhances SROA of mutables using the novel Julia-level
escape analysis (on top of #43800):

  1. alias-aware SROA, mutable ϕ-node elimination
  2. isdefined check elimination
  3. load-forwarding for non-eliminable but analyzable mutables

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutable allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested example in a single analysis/optimization pass:

julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0

EA can also analyze the escape of a ϕ-node as well as its aliasing.
Mutable ϕ-nodes can be eliminated even in very tricky cases, e.g.:

julia> code_typed((Bool,String,)) do cond, x
           # these allocations form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any

Combined with the powerful alias analysis and ϕ-node handling above,
allocations in the following realistic examples will be fully eliminated:

julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y);

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end;

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @allocated compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)
0

julia> @allocated compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)
0

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @allocated compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))
0

julia> @allocated compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))
0

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @allocated compute_point!(10000, af, bf)
0

julia> @allocated compute_point!(10000, ac, bc)
0

2. isdefined check elimination

This commit also implements a simple optimization that eliminates
isdefined calls by checking load-forwardability.
This optimization may be especially useful for eliminating the extra allocation
involved with a capturing closure, e.g. from @vchuravy's old example:

julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}

3. load-forwarding for non-eliminable but analyzable mutables

EA also allows us to forward loads even when the mutable allocation itself
can't be eliminated but its fields are still known precisely.
This load forwarding can be useful since it may derive new type information
that succeeding optimization passes can use (or simply because it enables
simpler code transformations downstream of the load):

julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis is general and already reasons about arrays, so
this EA-based SROA will hopefully be generalized to array SROA as well.

@aviatesk aviatesk added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Jan 21, 2022
@aviatesk

Replaces #43267 and #43505

@oscardssmith

With this PR, will the pattern of @btime f(Ref($(x))[]) no longer prevent the benchmark from specializing on x?

Keno commented Jan 22, 2022

With this PR, will the pattern of @btime f(Ref($(x))[]) no longer prevent the benchmark from specializing on x?

This is an optimizer-time change, so that pattern is still fine for blocking inference-time specialization (which is most of what we wanted to block; LLVM was already able to SROA this). That said, we have Base.inferencebarrier, which uses this pattern and will be updated if inference ever learns to look through it.
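
For reference, a minimal sketch of the Base.inferencebarrier behavior mentioned above (illustrative only; `g` is a hypothetical example function, and the inferred result is what one would expect rather than output taken from this PR):

```julia
# Base.inferencebarrier returns its argument while hiding its type from
# inference (it is built on the same Ref pattern), so the call below
# should be inferred as returning `Any` rather than `Int`.
g(x) = Base.inferencebarrier(x) + 1

Base.return_types(g, (Int,))  # expected to report Any
```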

vtjnash commented Jan 22, 2022

The actual benchmarking pattern is @btime f($(Ref(x))[]), since any other combination of parentheses is vulnerable to optimization distortions.
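
For illustration, a minimal sketch of the two interpolation patterns being discussed (assumes the BenchmarkTools package; `identity` stands in for an arbitrary `f`):

```julia
using BenchmarkTools

x = 1

# Pattern from the original question: only `x` is interpolated, so the
# `Ref(...)[]` wrapper is part of the benchmarked expression; with this PR
# the Julia-level optimizer may SROA it, but it still blocks inference-time
# specialization on the value.
@btime identity(Ref($(x))[])

# Pattern recommended above: the whole `Ref(x)` is interpolated, so the
# benchmarked code only sees a load from a pre-existing Ref.
@btime identity($(Ref(x))[])
```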

@aviatesk

All tests should be green now.

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/EscapeAnalysis branch 2 times, most recently from 6f2c4a5 to ca12688 on January 24, 2022 05:57
@aviatesk aviatesk changed the base branch from avi/EscapeAnalysis to master January 24, 2022 06:43
@aviatesk aviatesk changed the base branch from master to avi/EscapeAnalysis January 24, 2022 06:43
aviatesk added a commit that referenced this pull request Jan 24, 2022
Implements a simple Julia-level array allocation elimination on top of #43888.

```julia
julia> code_typed((String,String)) do s, t
           a = Vector{Base.RefValue{String}}(undef, 2)
           a[1] = Ref(s)
           a[2] = Ref(t)
           return a[1][]
       end
```
```diff
diff --git a/master b/pr
index 9c8da14380..5b63d08190 100644
--- a/master
+++ b/pr
@@ -1,11 +1,4 @@
 1-element Vector{Any}:
  CodeInfo(
-1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Base.RefValue{String}}, svec(Any, Int64), 0, :(:ccall), Vector{Base.RefValue{String}}, 2, 2))::Vector{Base.RefValue{String}}
-│   %2 = %new(Base.RefValue{String}, s)::Base.RefValue{String}
-│        Base.arrayset(true, %1, %2, 1)::Vector{Base.RefValue{String}}
-│   %4 = %new(Base.RefValue{String}, t)::Base.RefValue{String}
-│        Base.arrayset(true, %1, %4, 2)::Vector{Base.RefValue{String}}
-│   %6 = Base.arrayref(true, %1, 1)::Base.RefValue{String}
-│   %7 = Base.getfield(%6, :x)::String
-└──      return %7
+1 ─     return s
 ) => String
```

Still, this array SROA handling is very limited and can only handle
trivial examples (though I confirmed this version already eliminates a
few array allocations during the sysimg build).
For those interested, I added some discussion of array optimization
[here](https://aviatesk.github.io/EscapeAnalysis.jl/dev/#EA-Array-Analysis).
@aviatesk

Some non-scientific numbers:

on master (580f51d)

~/julia/julia master 12s
❯ ./usr/bin/julia --project=~/julia/plot -e "@time using Plots; @time plot(rand(10,3));"
  4.708875 seconds (8.71 M allocations: 585.312 MiB, 3.42% gc time, 30.38% compilation time)
  4.998614 seconds (14.32 M allocations: 824.033 MiB, 5.63% gc time, 99.82% compilation time)

on this PR (619edba)

~/julia/julia4 remotes/origin/avi/EASROA
❯ ./usr/bin/julia --project=~/julia/plot -e "@time using Plots; @time plot(rand(10,3));"
  4.840917 seconds (8.72 M allocations: 586.129 MiB, 3.27% gc time, 28.89% compilation time)
  4.864891 seconds (14.34 M allocations: 827.091 MiB, 5.38% gc time, 99.83% compilation time)

on #43909 (e22ef67)

~/julia/julia3 avi/ArraySROA* 13s
❯ ./usr/bin/julia --project=~/julia/plot -e "@time using Plots; @time plot(rand(10,3));"
  4.693940 seconds (8.69 M allocations: 582.534 MiB, 3.89% gc time, 28.87% compilation time)
  4.935694 seconds (14.47 M allocations: 835.919 MiB, 7.81% gc time, 99.83% compilation time)

@aviatesk

@nanosoldier runtests(["AbstractAlgebra", "AutomotiveSimulator", "BlochSim", "CSDP", "ClusteringGA", "DIVAnd", "FunSQL", "GaussianMixtureAlignment", "InfiniteOpt", "IntervalTrees", "InvertibleNetworks", "Lighthouse", "MathOptInterface", "ONNXNaiveNASflux", "PermutationGroups", "ReactiveMP", "StaticKernels"])

@aviatesk

@nanosoldier runbenchmarks("inference" || "misc", vs=":master")

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier

Your package evaluation job has completed - possible issues were detected. A full report can be found here.

aviatesk added a commit that referenced this pull request Feb 15, 2022
This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations in Julia's high-level compilation pipeline.
Possible target optimizations include alias-aware SROA (#43888),
array SROA (#43909), `mutating_arrayfreeze` optimization (#42465),
stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would be also interesting if LLVM-level optimizations can consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA in this PR is to check its impact
on latency as well as to get feedback from a broader range of developers.
The plan is that we first introduce EA in this commit, and then merge the
dependent PRs built on top of this commit, like #43888, #43909 and #42465.

This commit simply defines and runs EA inside the Julia base compiler and
enables the existing test suite with it. In this commit, we just run EA
before inlining to generate the IPO cache. In the dependent PRs, EA will be
invoked again after inlining to be used for various local optimizations.
aviatesk added a commit that referenced this pull request Feb 16, 2022
This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations in Julia's high-level compilation pipeline.
Possible target optimizations include alias-aware SROA (#43888),
array SROA (#43909), `mutating_arrayfreeze` optimization (#42465),
stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would be also interesting if LLVM-level optimizations can consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA in this PR is to check its impact
on latency as well as to get feedback from a broader range of developers.
The plan is that we first introduce EA into Julia Base with this commit,
and then merge the dependent PRs built on top of this commit later.

This commit simply defines EA inside the Julia base compiler and enables the
existing test suite with it. In this commit we don't run EA at all, so this
commit shouldn't affect Julia-level compilation latency.

In the dependent PRs, EA will run in two stages:
- `IPO EA`: run EA on pre-inlining state to generate IPO-valid cache
- `Local EA`: run EA on post-inlining state to generate local escape
              information used for various optimizations

In order to integrate `IPO EA` with our compilation cache system,
this commit also implements a new `CodeInstance.argescapes` field that
keeps the IPO-valid cache generated by `IPO EA`.
Base automatically changed from avi/EscapeAnalysis to master February 16, 2022 16:04
aviatesk added a commit that referenced this pull request Mar 23, 2022
Mostly adapted from #43888.
Should be more robust and cover more cases.
@vchuravy

@aviatesk can you rebase this?
