
Conversation

@serenity4
Member

Setting world bounds on the created `CodeInfo` allows us to interpret opaque closures faster.

Taking the following example:

```julia
julia> f(x, y) = x + y
f (generic function with 1 method)

julia> ir = Base.code_ircode_by_type(Tuple{typeof(f), Int, Int})[1][1]
1 1 ─ %1 = intrinsic Base.add_int(_2, _3)::Int64    │╻ +
  └──      return %1                                ││

julia> ir.argtypes[1] = Tuple{}  # slot 1 is the closure's capture environment; empty here
Tuple{}

julia> oc = Core.OpaqueClosure(ir; do_compile=true)
(::Int64, ::Int64)->◌::Int64
```

This is what we emitted before:

```julia
julia> @code_typed oc(1, 2)
Pair{Core.CodeInfo, Any}(CodeInfo(
    @ REPL[8]:1 within `f`
   ┌ @ int.jl:87 within `+`
1 ─│ %1 =   dynamic Base.add_int(none@_2, none@_3)::Int64
└──│      return %1
   └
), Int64)

julia> using BenchmarkTools; @btime $oc(1, 2)
  39.765 ns (0 allocations: 0 bytes)
```

And now:

```julia
julia> @code_typed oc(1, 2)
Pair{Core.CodeInfo, Any}(CodeInfo(
    @ REPL[93]:1 within `f`
   ┌ @ int.jl:87 within `+`
1 ─│ %1 = intrinsic Base.add_int(none@_2, none@_3)::Int64
└──│      return %1
   └
), Int64)

julia> using BenchmarkTools; @btime $oc(1, 2)
  2.678 ns (0 allocations: 0 bytes)
```

The overhead notably grows with every statement; for ~20 statements it amounted to more than 1 µs and multiple allocations. It is observed on 1.12+ only (1.11 evaluates as fast as it does with this change), which suggests it may have been surfaced by the partitioned bindings feature.
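
To see that scaling, here is a rough reproduction following the same recipe as above. The function `g` and the names `ir20`/`oc20` are illustrative only, and timings will differ by machine; a chain of additions yields roughly 20 `add_int` statements in the IR, enough for the per-statement overhead to dominate:

```julia
julia> g(x) = x + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20;

julia> ir20 = Base.code_ircode_by_type(Tuple{typeof(g), Int})[1][1];

julia> ir20.argtypes[1] = Tuple{};  # same as above: no captured environment

julia> oc20 = Core.OpaqueClosure(ir20; do_compile=true);

julia> @btime $oc20(1);  # the point is the before/after gap, not the absolute number
```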

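Coming back to the opening point, here is a minimal sketch of what setting world bounds on the created `CodeInfo` amounts to. `Core.CodeInfo` does carry `min_world`/`max_world` fields, but the helper below is illustrative only and not the actual change in this PR:

```julia
# Illustrative sketch: stamp the CodeInfo with the world it was built in,
# so the interpreter/codegen can resolve bindings once instead of treating
# every lookup as potentially stale.
function stamp_world_bounds!(ci::Core.CodeInfo)   # hypothetical helper
    ci.min_world = Base.get_world_counter()       # valid from the current world...
    ci.max_world = typemax(UInt)                  # ...onward, until invalidated
    return ci
end
```

Bounded worlds line up with the `@code_typed` output above, where the call goes from `dynamic Base.add_int` back to `intrinsic Base.add_int`.
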
@serenity4 serenity4 self-assigned this Sep 22, 2025
@serenity4 serenity4 added the backport 1.12 and performance labels Sep 22, 2025
@KristofferC KristofferC mentioned this pull request Sep 24, 2025
24 tasks
@topolarity topolarity requested a review from Keno September 24, 2025 12:46
@oscardssmith oscardssmith merged commit a5576b4 into JuliaLang:master Sep 24, 2025
10 checks passed
KristofferC pushed a commit that referenced this pull request Sep 30, 2025
KristofferC pushed a commit that referenced this pull request Sep 30, 2025
@KristofferC KristofferC mentioned this pull request Sep 30, 2025
47 tasks
KristofferC pushed a commit that referenced this pull request Sep 30, 2025
xal-0 pushed a commit to xal-0/julia that referenced this pull request Sep 30, 2025
KristofferC pushed a commit that referenced this pull request Oct 12, 2025
@KristofferC KristofferC removed the backport 1.12 label Oct 21, 2025
