Type creation in the absence of info from code_typed #25267

Closed
timholy opened this issue Dec 25, 2017 · 6 comments
Labels
performance Must go faster

Comments

@timholy
Member

timholy commented Dec 25, 2017

In the context of my work on #23692, I'm wondering if there's anything that can be done at the compiler level to improve the performance of code in this demo: (EDIT: performance issues worked around, see below)

julia> using BenchmarkTools, Profile

julia> a = rand(2, 2);

julia> b = similar(a);

julia> include("newbc.jl")
_copyto! (generic function with 1 method)

julia> @btime mybroadcast!(+, $b, $a, 1);
  29.467 ns (2 allocations: 96 bytes)

julia> @btime broadcast!(+, $b, $a, 1);
  16.169 ns (0 allocations: 0 bytes)

where newbc.jl is defined in this gist. Moreover, if you change that last @noinline to @inline, then you get

julia> @btime mybroadcast!(+, $b, $a, 1);
  2.133 μs (13 allocations: 544 bytes)

What's strange about this is that if you look at the @code_warntype output for either the inline or the noinline version, it looks pretty harmless: every reasonable variable is optimized out, there are no type instabilities, etc. For the @inline variant, there are not even any calls that create a new type. Yet if you add @profile in front of those @btime statements and look at the resulting output (the interesting part is near the end of the display), you'll see that a lot of time is spent gc-marking and/or instantiating the Type.
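For readers who want to reproduce this kind of measurement, here is a minimal sketch of collecting and printing a profile with the Profile standard library. The `work` function is a hypothetical stand-in and has nothing to do with the gist's code; only `@profile`, `Profile.clear`, and `Profile.print` come from the standard library.

```julia
using Profile

# Hypothetical workload, used only to have something to sample.
work(n) = sum(sqrt(i) for i in 1:n)

work(10)          # run once so compilation is not part of the profile
Profile.clear()   # discard any samples from earlier runs

@profile for _ in 1:1000
    work(100_000)
end

# Flat listing sorted by sample count; the hottest frames end up
# near the end of the printed output.
Profile.print(format = :flat, sortedby = :count)
```

Frames inside the runtime (GC marking, type instantiation) only show up if C frames are included, for example with `Profile.print(C = true)`.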

@JeffBezanson added the "performance (Must go faster)" label Dec 26, 2017
@timholy
Member Author

timholy commented Dec 27, 2017

OK, I've looked at this a bit more. What seems really suspicious is that if you look at the lowered and type-inferred code, there are no references to any kind of Broadcasted object; it appears to be fully optimized out (apologies for the length):

julia> @code_warntype mybroadcast!(+, b, a, 1)
Variables:
  f<optimized out>
  dest::Array{Float64,2}
  args@_4::Tuple{Array{Float64,2},Int64}
  style<optimized out>
  newargs<optimized out>
  bc<optimized out>
  ibc<optimized out>
  T<optimized out>
  axes<optimized out>
  _newindexer<optimized out>
  indexing<optimized out>
  #temp#@_24::Bool
  #temp#@_25::Bool
  #temp#@_26::CartesianIndex{1}
  i#1041::Int64
  val<optimized out>
  iterfirst<optimized out>
  iterlast<optimized out>
  #temp#@_37::CartesianIndex{1}
  px<optimized out>
  py<optimized out>
  R<optimized out>

Body:
  begin
      #= line 168 =#
      SSAValue(5) = (Core.getfield)(args@_4::Tuple{Array{Float64,2},Int64}, 1)::Array{Float64,2}
      SSAValue(6) = (Core.getfield)(args@_4::Tuple{Array{Float64,2},Int64}, 2)::Int64
      #= line 170 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl instantiate 102
      #= line 104 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl instantiate 109
      # meta: location broadcast.jl combine_indices 294
      # meta: location broadcast.jl broadcast_indices 222
      # meta: location broadcast.jl broadcast_indices 226
      # meta: location abstractarray.jl axes 80
      # meta: location array.jl size 113
      SSAValue(134) = (Base.arraysize)(SSAValue(5), 1)::Int64
      SSAValue(135) = (Base.arraysize)(SSAValue(5), 2)::Int64
      # meta: pop location
      # meta: location tuple.jl map 151
      # meta: location range.jl Type 180
      # meta: location range.jl Type 178
      # meta: location promotion.jl max 397
      # meta: location int.jl < 49
      SSAValue(152) = (Base.slt_int)(SSAValue(134), 0)::Bool
      # meta: pop location
      SSAValue(151) = (Base.select_value)(SSAValue(152), 0, SSAValue(134))::Int64
      # meta: pop locations (3)
      # meta: location range.jl Type 180
      # meta: location range.jl Type 178
      # meta: location promotion.jl max 397
      # meta: location int.jl < 49
      SSAValue(163) = (Base.slt_int)(SSAValue(135), 0)::Bool
      # meta: pop location
      SSAValue(162) = (Base.select_value)(SSAValue(163), 0, SSAValue(135))::Int64
      # meta: pop locations (8)
      #= line 110 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl instantiate 120
      #= line 121 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl mapTupleLL 32
      # meta: location /Users/timholy/src/julia/newbc2.jl _newindexer 119
      # meta: location broadcast.jl newindexer 350
      # meta: location broadcast.jl broadcast_indices 222
      # meta: location broadcast.jl broadcast_indices 226
      # meta: location abstractarray.jl axes 80
      # meta: location array.jl size 113
      SSAValue(298) = (Base.arraysize)(SSAValue(5), 1)::Int64
      SSAValue(299) = (Base.arraysize)(SSAValue(5), 2)::Int64
      # meta: pop location
      # meta: location tuple.jl map 151
      # meta: location range.jl Type 180
      # meta: location range.jl Type 178
      # meta: location promotion.jl max 397
      # meta: location int.jl < 49
      SSAValue(316) = (Base.slt_int)(SSAValue(298), 0)::Bool
      # meta: pop location
      SSAValue(315) = (Base.select_value)(SSAValue(316), 0, SSAValue(298))::Int64
      # meta: pop locations (3)
      # meta: location range.jl Type 180
      # meta: location range.jl Type 178
      # meta: location promotion.jl max 397
      # meta: location int.jl < 49
      SSAValue(327) = (Base.slt_int)(SSAValue(299), 0)::Bool
      # meta: pop location
      SSAValue(326) = (Base.select_value)(SSAValue(327), 0, SSAValue(299))::Int64
      # meta: pop locations (7)
      # meta: location broadcast.jl shapeindexer 353
      #= line 354 =#
      # meta: location broadcast.jl shapeindexer 353
      #= line 355 =#
      # meta: location range.jl == 595
      # meta: location promotion.jl == 386
      SSAValue(382) = (SSAValue(162) === SSAValue(326))::Bool
      # meta: pop location
      # meta: location bool.jl & 36
      SSAValue(383) = (Base.and_int)(true, SSAValue(382))::Bool
      # meta: pop locations (3)
      #= line 355 =#
      # meta: location range.jl == 595
      # meta: location promotion.jl == 386
      SSAValue(411) = (SSAValue(151) === SSAValue(315))::Bool
      # meta: pop location
      # meta: location bool.jl & 36
      SSAValue(412) = (Base.and_int)(true, SSAValue(411))::Bool
      # meta: pop locations (9)
      #= line 171 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl copyto! 177
      # meta: location /Users/timholy/src/julia/newbc2.jl copyto! 182
      # meta: location broadcast.jl check_broadcast_indices 327
      # meta: location broadcast.jl broadcast_indices 222
      # meta: location broadcast.jl broadcast_indices 226
      # meta: location abstractarray.jl axes 80
      # meta: location array.jl size 113
      SSAValue(471) = (Base.arraysize)(dest::Array{Float64,2}, 1)::Int64
      SSAValue(472) = (Base.arraysize)(dest::Array{Float64,2}, 2)::Int64
      # meta: pop location
      # meta: location tuple.jl map 151
      # meta: location range.jl Type 180
      # meta: location range.jl Type 178
      # meta: location promotion.jl max 397
      # meta: location int.jl < 49
      SSAValue(489) = (Base.slt_int)(SSAValue(471), 0)::Bool
      # meta: pop location
      SSAValue(488) = (Base.select_value)(SSAValue(489), 0, SSAValue(471))::Int64
      # meta: pop locations (3)
      # meta: location range.jl Type 180
      # meta: location range.jl Type 178
      # meta: location promotion.jl max 397
      # meta: location int.jl < 49
      SSAValue(500) = (Base.slt_int)(SSAValue(472), 0)::Bool
      # meta: pop location
      SSAValue(499) = (Base.select_value)(SSAValue(500), 0, SSAValue(472))::Int64
      # meta: pop locations (7)
      # meta: location broadcast.jl check_broadcast_shape 324
      # meta: location broadcast.jl _bcsm 313
      # meta: location range.jl == 595
      # meta: location promotion.jl == 386
      SSAValue(530) = (SSAValue(151) === SSAValue(488))::Bool
      # meta: pop location
      # meta: location bool.jl & 36
      SSAValue(531) = (Base.and_int)(true, SSAValue(530))::Bool
      # meta: pop locations (2)
      unless SSAValue(531) goto 122
      #temp#@_24::Bool = SSAValue(531)
      goto 127
      122: 
      # meta: location promotion.jl == 386
      SSAValue(537) = (SSAValue(488) === 1)::Bool
      # meta: pop location
      #temp#@_24::Bool = SSAValue(537)
      127: 
      # meta: pop location
      SSAValue(503) = #temp#@_24::Bool
      unless SSAValue(503) goto 132
      goto 137
      132: 
      # meta: location array.jl Type 12
      SSAValue(539) = $(Expr(:new, :(Base.DimensionMismatch), "array could not be broadcast to match destination"))
      # meta: pop location
      (Base.Broadcast.throw)(SSAValue(539))::Union{}
      137: 
      #= line 325 =#
      # meta: location broadcast.jl check_broadcast_shape 324
      # meta: location broadcast.jl _bcsm 313
      # meta: location range.jl == 595
      # meta: location promotion.jl == 386
      SSAValue(577) = (SSAValue(162) === SSAValue(499))::Bool
      # meta: pop location
      # meta: location bool.jl & 36
      SSAValue(578) = (Base.and_int)(true, SSAValue(577))::Bool
      # meta: pop locations (2)
      unless SSAValue(578) goto 151
      #temp#@_25::Bool = SSAValue(578)
      goto 156
      151: 
      # meta: location promotion.jl == 386
      SSAValue(584) = (SSAValue(499) === 1)::Bool
      # meta: pop location
      #temp#@_25::Bool = SSAValue(584)
      156: 
      # meta: pop location
      SSAValue(550) = #temp#@_25::Bool
      unless SSAValue(550) goto 161
      goto 166
      161: 
      # meta: location array.jl Type 12
      SSAValue(586) = $(Expr(:new, :(Base.DimensionMismatch), "array could not be broadcast to match destination"))
      # meta: pop location
      (Base.Broadcast.throw)(SSAValue(586))::Union{}
      166: 
      # meta: pop locations (3)
      #= line 183 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl _copyto! 189
      # meta: location simdloop.jl
      #= line 66 =#
      # meta: location multidimensional.jl start 277
      # meta: location multidimensional.jl first 308
      # meta: location multidimensional.jl Type 67
      # meta: location multidimensional.jl Type 64
      SSAValue(641) = $(Expr(:new, CartesianIndex{1}, (1,)))
      # meta: pop locations (3)
      #= line 278 =#
      # meta: location sysimg.jl getproperty 8
      SSAValue(659) = (Base.getfield)(SSAValue(641), :I)::Tuple{Int64}
      # meta: pop location
      # meta: location tuple.jl map 169
      # meta: location tuple.jl getindex 21
      SSAValue(665) = (Base.getfield)(SSAValue(659), 1, true)::Int64
      # meta: pop location
      # meta: location operators.jl > 250
      # meta: location int.jl < 49
      SSAValue(668) = (Base.slt_int)(SSAValue(162), SSAValue(665))::Bool
      # meta: pop locations (3)
      unless SSAValue(668) goto 205
      #= line 279 =#
      # meta: location multidimensional.jl + 116
      # meta: location tuple.jl map 150
      # meta: location multidimensional.jl #5 116
      # meta: location int.jl + 53
      SSAValue(686) = (Base.add_int)(SSAValue(162), 1)::Int64
      # meta: pop locations (2)
      SSAValue(682) = (Core.tuple)(SSAValue(686))::Tuple{Int64}
      # meta: pop location
      # meta: location multidimensional.jl Type 64
      SSAValue(690) = $(Expr(:new, CartesianIndex{1}, SSAValue(682)))
      # meta: pop locations (2)
      #temp#@_37::CartesianIndex{1} = SSAValue(690)
      goto 208
      205: 
      #= line 281 =#
      #temp#@_37::CartesianIndex{1} = SSAValue(641)
      208: 
      # meta: pop location
      #temp#@_26::CartesianIndex{1} = #temp#@_37::CartesianIndex{1}
      211: 
      # meta: location multidimensional.jl done 296
      # meta: location sysimg.jl getproperty 8
      SSAValue(699) = (Base.getfield)(#temp#@_26::CartesianIndex{1}, :I)::Tuple{Int64}
      # meta: pop location
      # meta: location tuple.jl getindex 21
      SSAValue(700) = (Base.getfield)(SSAValue(699), 1, true)::Int64
      # meta: pop location
      # meta: location operators.jl > 250
      # meta: location int.jl < 49
      SSAValue(707) = (Base.slt_int)(SSAValue(162), SSAValue(700))::Bool
      # meta: pop locations (3)
      SSAValue(592) = (Base.not_int)(SSAValue(707))::Bool
      unless SSAValue(592) goto 316
      # meta: location multidimensional.jl next 284
      # meta: location sysimg.jl getproperty 8
      SSAValue(716) = (Base.getfield)(#temp#@_26::CartesianIndex{1}, :I)::Tuple{Int64}
      # meta: pop location
      # meta: location multidimensional.jl inc 288
      # meta: location tuple.jl getindex 21
      SSAValue(749) = (Base.getfield)(SSAValue(716), 1, true)::Int64
      # meta: pop location
      # meta: location int.jl + 53
      SSAValue(750) = (Base.add_int)(SSAValue(749), 1)::Int64
      # meta: pop location
      SSAValue(748) = (Core.tuple)(SSAValue(750))::Tuple{Int64}
      # meta: pop location
      # meta: location multidimensional.jl Type 67
      # meta: location multidimensional.jl Type 64
      SSAValue(756) = $(Expr(:new, CartesianIndex{1}, SSAValue(748)))
      # meta: pop locations (2)
      SSAValue(1067) = #temp#@_26::CartesianIndex{1}
      # meta: pop location
      #temp#@_26::CartesianIndex{1} = SSAValue(756)
      #= line 68 =#
      # meta: location int.jl < 49
      SSAValue(767) = (Base.slt_int)(0, SSAValue(151))::Bool
      # meta: pop location
      unless SSAValue(767) goto 314
      #= line 70 =#
      i#1041::Int64 = 0
      #= line 71 =#
      253: 
      # meta: location int.jl < 49
      SSAValue(768) = (Base.slt_int)(i#1041::Int64, SSAValue(151))::Bool
      # meta: pop location
      unless SSAValue(768) goto 312
      #= line 72 =#
      # meta: location multidimensional.jl simd_index 327
      # meta: location int.jl + 53
      SSAValue(779) = (Base.add_int)(i#1041::Int64, 1)::Int64
      # meta: pop location
      # meta: location sysimg.jl getproperty 8
      SSAValue(780) = (Base.getfield)(SSAValue(1067), :I)::Tuple{Int64}
      # meta: pop location
      SSAValue(782) = (Core.getfield)(SSAValue(780), 1)::Int64
      # meta: pop location
      #= line 73 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl
      #= line 190 =#
      # meta: location /Users/timholy/src/julia/newbc2.jl _broadcast_getindex 140
      # meta: location /Users/timholy/src/julia/newbc2.jl _getindex 134
      # meta: location /Users/timholy/src/julia/newbc2.jl _getidx 133
      # meta: location broadcast.jl newindex 341
      # meta: location broadcast.jl _newindex 342
      # meta: location operators.jl ifelse 319
      SSAValue(829) = (Base.select_value)(SSAValue(412), SSAValue(779), 1)::Int64
      # meta: pop location
      # meta: location broadcast.jl _newindex 342
      # meta: location operators.jl ifelse 319
      SSAValue(855) = (Base.select_value)(SSAValue(383), SSAValue(782), 1)::Int64
      # meta: pop locations (4)
      # meta: location broadcast.jl _broadcast_getindex 373
      # meta: location broadcast.jl _broadcast_getindex 376
      # meta: location multidimensional.jl getindex 449
      # meta: location array.jl getindex 640
      SSAValue(920) = (Base.arrayref)(false, SSAValue(5), SSAValue(829), SSAValue(855))::Float64
      # meta: pop locations (6)
      # meta: location promotion.jl + 290
      # meta: location promotion.jl promote 261
      # meta: location promotion.jl _promote 219
      # meta: location number.jl convert 7
      # meta: location float.jl Type 57
      SSAValue(962) = (Base.sitofp)(Float64, SSAValue(6))::Float64
      # meta: pop locations (4)
      # meta: location float.jl + 394
      SSAValue(974) = (Base.add_float)(SSAValue(920), SSAValue(962))::Float64
      # meta: pop locations (3)
      # meta: location multidimensional.jl setindex! 451
      # meta: location array.jl setindex! 678
      (Base.arrayset)(false, dest::Array{Float64,2}, SSAValue(974), SSAValue(779), SSAValue(782))::Array{Float64,2}
      # meta: pop locations (3)
      #= line 74 =#
      # meta: location int.jl + 53
      SSAValue(1043) = (Base.add_int)(i#1041::Int64, 1)::Int64
      # meta: pop location
      i#1041::Int64 = SSAValue(1043)
      #= line 75 =#
      $(Expr(:simdloop))
      310: 
      goto 253
      312: 
      #= line 79 =#
      314: 
      goto 211
      316: 
      # meta: pop locations (4)
      return dest::Array{Float64,2}
  end::Array{Float64,2}

Yet if I set a breakpoint in jl_apply_type I get this:

(lldb) b jltypes.c:930
Breakpoint 1: where = libjulia-debug.0.7.0.dylib`jl_apply_type + 907 at jltypes.c:930, address = 0x00000001000957fb
(lldb) c
Process 4870 resuming
julia> mybroadcast!(+, b, a, 1);
Process 4870 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000957fb libjulia-debug.0.7.0.dylib`jl_apply_type(tc=0x000000010da23d50, params=0x00007fff5fbfea10, n=1) at jltypes.c:930
   927 	                jl_type_error_rt("Type", jl_symbol_name(ua->var->name), (jl_value_t*)ua->var, pi);
   928 	        }
   929 	
-> 930 	        tc = jl_instantiate_unionall(ua, pi);
   931 	    }
   932 	    JL_GC_POP();
   933 	    return tc;
Target 0: (julia) stopped.
(lldb) call jl_(tc)
Main.Broadcasted{Style, ElType, Axes, Indexing, F, Args} where Args<:(Main.TupleLL{T, Rest} where Rest where T) where F where Indexing<:Union{Nothing, Main.TupleLL{T, Rest} where Rest where T} where Axes where ElType where Style<:Union{Nothing, Base.Broadcast.BroadcastStyle}

Not only does the lowered & inferred code suggest that it shouldn't be creating a Broadcasted, it certainly seems like it shouldn't be creating a UnionAll since all the types are specified concretely by the various constructors.
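As a small illustration of what "instantiating a UnionAll" means at the Julia level, here is a sketch with a hypothetical parametric type `Wrapper` (not the gist's `Broadcasted`). Applying a parameter to the unparameterized type yields a concrete `DataType`; when the compiler cannot resolve that application ahead of time, the runtime does it through `jl_apply_type`, which is where the breakpoint above was set.

```julia
# Hypothetical parametric type, unrelated to the gist's Broadcasted.
struct Wrapper{T}
    x::T
end

# Without parameters, Wrapper is a UnionAll (Wrapper{T} where T).
is_unionall = Wrapper isa UnionAll

# Supplying the parameter instantiates it to a concrete DataType.
is_datatype = Wrapper{Int} isa DataType
is_concrete = isconcretetype(Wrapper{Int})

# Core.apply_type is the builtin behind the Wrapper{Int} syntax.
same_type = Core.apply_type(Wrapper, Int) === Wrapper{Int}

println((is_unionall, is_datatype, is_concrete, same_type))
```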

@vtjnash
Member

vtjnash commented Dec 28, 2017

Are you just missing forced specialization on some Function argument?

@timholy
Member Author

timholy commented Dec 31, 2017

The thing that confuses me most is that if you copy/paste the typed code into an editor and search for Broadcasted, invoke, or call, you'll see that there aren't any. So why is it creating a Broadcasted? Why does it depend on whether a function is inlined or not? It seems there's some extra state affecting the compilation process that probably shouldn't be.

@KristofferC
Member

As vtjnash said, isn't this about the specialization heuristics? E.g., writing map(f::F, x) where {F} instead of map(f, x). I think the code_ macros (@code_warntype, @code_typed, etc.) always show the code as if it were specialized on f, but at runtime that specialization may not happen unless you force it.
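A minimal sketch of the pattern being described, using hypothetical function names. Julia's heuristics may skip specializing on a `Function` argument that is only passed through to another call rather than called directly; adding a type parameter for it forces specialization.

```julia
# Hypothetical names; neither function is from the gist.

# `f` is only forwarded to ntuple, never called here, so Julia may
# compile a single unspecialized version of this method.
passthrough(f, n) = ntuple(f, n)

# Binding the function's type to a parameter forces a specialization
# for each distinct function type F.
passthrough_spec(f::F, n) where {F} = ntuple(f, n)

square(i) = i^2
println(passthrough(square, 3))
println(passthrough_spec(square, 3))
```

Both versions return the same result; the difference is only in what gets compiled, which is why the output of the `code_` macros alone cannot distinguish them.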

@timholy changed the title from "Runtime overhead from unnecessary (?) gc-marking and type instantiation" to "Type creation in the absence of info from code_typed" Jan 4, 2018
@timholy
Member Author

timholy commented Jan 4, 2018

Adding specialization to the top-level call largely fixes the performance problem. I'm still puzzled about what the compiler is doing here and why it's so different from what's suggested by code_typed.
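The gist's actual signature is not reproduced in this thread, so the following is only a hypothetical stand-in for what "adding specialization to the top-level call" can look like: a type parameter for the function and an explicit `Vararg{Any,N}` for the trailing arguments, both of which force specialization.

```julia
# Hypothetical stand-in for a top-level broadcast entry point;
# this is NOT the gist's mybroadcast!.
function mybroadcast_sketch!(f::F, dest::AbstractArray,
                             args::Vararg{Any,N}) where {F,N}
    # Delegate to Base's broadcasting; the point here is the signature.
    dest .= f.(args...)
    return dest
end

a = [1.0 2.0; 3.0 4.0]
b = similar(a)
mybroadcast_sketch!(+, b, a, 1)
println(b)
```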

@vtjnash
Member

vtjnash commented Jan 4, 2018

dup #19137

@vtjnash closed this as completed Jan 4, 2018