inference/LLVM codegen/native codegen time split #251
@bkamins you can already dig into LLVM optimization and native codegen timings. We haven't written big blog posts about those yet, I think, but there is at least some information in the documentation site.

Part of the difficulty, though, is that the signal you get from those two measurements is less useful than the signal you get from inference profiling. This is because for inference, julia currently performs type inference via a depth-first traversal, so by measuring trace spans, we get a nice flamegraph that you can look at, to understand both inclusive and exclusive inference timings. For LLVM optimization, on the other hand, we pass functions through LLVM optimization in a somewhat random order, batched together arbitrarily, which makes it more difficult to attribute optimization time to individual functions.

Hope that's helpful! :) If you find any of that useful, documentation PRs are always welcome ❤️ :)
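For context, a minimal sketch of the inference-measurement workflow described above might look like this (assuming SnoopCompile's `@snoopi_deep`/`flamegraph` API and an installed ProfileView; the workload is just a placeholder):

```julia
using SnoopCompile, ProfileView

# Measure type-inference time for a fresh workload. Because inference
# is a depth-first traversal, the result nests naturally into a tree.
tinf = @snoopi_deep begin
    sum(rand(10) .^ 2)   # any not-yet-compiled workload
end

fg = flamegraph(tinf)    # convert the inference tree to a flamegraph
ProfileView.view(fg)     # click bars to inspect inclusive/exclusive times
```

This is why inference timings are easier to attribute to individual functions than LLVM or native codegen timings: the tree structure directly mirrors the call structure being inferred.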
The problem is that they start a new Julia session and I need to run them in an already running session so that I do not measure the cost of first compilation. I am interested only in consecutive compilation times. Is this possible?
Reproducible example on Julia nightly: …

And I am discussing the difference between ~0.11 for …
Ah, I understand! Yes, I fully agree!! I actually have a branch on my laptop that adds support for this, but never pushed it up for some dumb reason... I'm out for the day, but when I get home tomorrow I'll push the branch up and open a PR, and maybe you can help me wrap it up and get it over the finish line? :)

Basically it's just adding a flag to all those functions that controls whether to run the measurements in a separate process or in the current one. And then I had a mega-function that tied it all together. Will find it and push it up tomorrow!
Excellent. Thank you! Please ping me in the PR.
There's definitely some weird recompilation going on.

```julia
@snoopi_deep begin
    select(g, :n => (n -> n .> mean(n)))
end
```

seems to reinfer `mean(::SubArray{Int64, 1, Vector{Int64}, Tuple{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}, false})` every time. It doesn't seem to be constant-prop. However,

```julia
julia> function mymean(v)
           s = zero(eltype(v))
           for item in v
               s += item
           end
           return s/length(v)
       end
```

doesn't show exactly the same thing (the broadcasting part gets reinferred, but not `mymean`).
(You can display the result in a flamegraph and click on the bars in ProfileView.)
I still don't know what's going on at the level of the compiler, but if I make this change:

```diff
~/.julia/dev/DataFrames$ git diff
diff --git a/src/groupeddataframe/callprocessing.jl b/src/groupeddataframe/callprocessing.jl
index 740a9df7..d0c079b4 100644
--- a/src/groupeddataframe/callprocessing.jl
+++ b/src/groupeddataframe/callprocessing.jl
@@ -87,7 +87,7 @@ function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
     end
 end

-function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+function do_call(@nospecialize(f::Base.Callable), idx::AbstractVector{<:Integer},
                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
                  gd::GroupedDataFrame, incols::Tuple{AbstractVector}, i::Integer)
     idx = view(idx, starts[i]:ends[i])
```

then I cut the time roughly in half and I don't recompile, which is a case in which inference is no longer a substantial bottleneck.
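For readers unfamiliar with `@nospecialize`, here is a minimal standalone illustration of the trade-off it makes (hypothetical toy functions, not DataFrames internals):

```julia
# Without @nospecialize, Julia compiles a fresh specialization of
# `apply_spec` for every distinct function type passed to it.
apply_spec(f, x) = f(x)

# With @nospecialize, one compiled method body handles all callables,
# at the cost of a runtime dispatch at the `f(x)` call site.
apply_nospec(@nospecialize(f), x) = f(x)

apply_spec(sin, 1.0);  apply_spec(cos, 1.0)    # two compilations of apply_spec
apply_nospec(sin, 1.0); apply_nospec(cos, 1.0) # one shared compilation
```

This is why the diff above cuts compile time: each new anonymous function a user passes no longer triggers recompilation of `do_call` itself.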
But if you do, I think it is better to benchmark the decision about adding `@nospecialize`, as in that method we call the user-supplied function.

In this case this indeed improves performance (which is strange):

your proposal with `@nospecialize`: …

But if we use a simpler function the situation is reversed:

your proposal with `@nospecialize`: …
Thanks for specifying a case where the performance might matter. I hadn't noticed, until I found the caller just by inserting an error, that you're using tasks. That may help me figure out a MWE for the compiler issue.
And I would assume that in these two other places it matters to avoid dynamic dispatch, as in both of them the passed function is called. If you start Julia with multiple threads, these operations indeed support multi-threading.
Can you give me a workload that is guaranteed to call those? Maybe this one does, but if so:

```julia
julia> using MethodAnalysis

julia> include("select_subset_test.jl")

julia> mctwf = methods(DataFrames._combine_tables_with_first!).ms[1]
_combine_tables_with_first!(first::Union{AbstractDataFrame, NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}, outcols::Tuple{Vararg{AbstractVector, N}}, idx::Vector{Int64}, rowstart::Integer, colstart::Integer, f::Union{Function, Type}, gd::GroupedDataFrame, incols::Union{Nothing, Tuple, AbstractVector, NamedTuple}, colnames::Tuple{Vararg{Symbol, N}}, firstmulticol::DataFrames.FirstColCount) where N in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:354

julia> methodinstances(mctwf)
3-element Vector{Core.MethodInstance}:
 MethodInstance for DataFrames._combine_tables_with_first!(::NamedTuple{(:x1,), Tuple{BitVector}}, ::Tuple{Vector{Bool}}, ::Vector{Int64}, ::Int64, ::Int64, ::Function, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Tuple{Symbol}, ::DataFrames.FirstSingleCol)
 MethodInstance for DataFrames._combine_tables_with_first!(::NamedTuple{(:x1,), _A} where _A<:Tuple{AbstractVector}, ::Tuple{AbstractVector}, ::Vector{Int64}, ::Int64, ::Int64, ::Function, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Tuple{Symbol}, ::DataFrames.FirstSingleCol)
 MethodInstance for DataFrames._combine_tables_with_first!(::Union{AbstractDataFrame, NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}, ::Tuple{Vararg{AbstractVector, N}}, ::Vector{Int64}, ::Int64, ::Int64, ::Union{Function, Type}, ::GroupedDataFrame{DataFrame}, ::Union{Nothing, Tuple, NamedTuple, AbstractVector}, ::Tuple{Vararg{Symbol, N}}, ::Union{DataFrames.FirstMultiCol, DataFrames.FirstSingleCol}) where N
```

so it doesn't seem to specialize on `f`.

```julia
julia> utyps = Base.IdSet(); for mi in methodinstances(mctwf)
           push!(utyps, Base.unwrap_unionall(mi.specTypes).parameters[7])
       end

julia> utyps
Base.IdSet{Any} with 2 elements:
  Function
  Union{Function, Type}
```

There were 153 MethodInstances, so this seems likely to be all the type-diversity for that argument. For the other, it's definitely not being called in this workload (there are no MethodInstances). UPDATE: after running the tests there are 1343 MethodInstances (wow!), with 317 different types for `f`.

Bottom line: if it's not specialized on the specific function, then runtime dispatch gets used. In such cases there should be no cost to deferring it until you get to actual work performed by the function.

In your benchmarking, be careful of potential confusion from world-splitting. If you have 3 or fewer applicable MethodInstances, Julia will use world-splitting, but define a 4th method and it will use runtime dispatch. That can distort your estimate of performance hotspots. You may have to decide whether you want to optimize the very first use or the use case where the user has supplied many different functions in a single session. In the demo above,

```julia
julia> m = methods(DataFrames.do_call).ms[2]
do_call(f::Union{Function, Type}, idx::AbstractVector{<:Integer}, starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer}, gd::GroupedDataFrame, incols::Tuple{AbstractVector}, i::Integer) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/callprocessing.jl:90

julia> methodinstances(m)
5-element Vector{Core.MethodInstance}:
 MethodInstance for DataFrames.do_call(::DataFrames.var"#496#497"{var"#11#12"}, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
 MethodInstance for DataFrames.do_call(::Function, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
 MethodInstance for DataFrames.do_call(::var"#13#14", ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
 MethodInstance for DataFrames.do_call(::DataFrames.var"#496#497"{var"#15#16"}, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
 MethodInstance for DataFrames.do_call(::var"#17#18", ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
```

this method is already past the world-splitting limit, but isn't if you run fewer tests. However, apply my `@nospecialize` change and:

```julia
julia> methodinstances(m)
2-element Vector{Core.MethodInstance}:
 MethodInstance for DataFrames.do_call(::Function, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
 MethodInstance for DataFrames.do_call(::Union{Function, Type}, ::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}, ::GroupedDataFrame{DataFrame}, ::Tuple{Vector{Int64}}, ::Int64)
```

This is few enough to keep this method of `do_call` within the world-splitting limit.
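The world-splitting caveat can be illustrated with a small self-contained sketch (the types here are illustrative, and the limit of 3 reflects the compiler's default method-count heuristic):

```julia
abstract type Shape end
struct Circle <: Shape; r::Float64 end
struct Square <: Shape; s::Float64 end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.s^2

# With only two applicable methods for `area(::Shape)`, inference can
# "world-split": callers compile to a branch over the known methods
# instead of a full runtime dispatch.
total_area(shapes::Vector{Shape}) = sum(area, shapes)

# Defining more subtypes and `area` methods (pushing the applicable
# count past the default limit) makes freshly compiled callers fall
# back to runtime dispatch, which can shift where time shows up when
# you benchmark.
```

So a benchmark run against a session with only a couple of user functions defined may behave differently from one where many functions have been supplied.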
By the way, on when you might want to do "pass-through" specialization: the pattern is

```julia
function outer(f, args...)
    for i = 1:n
        # do lots of work (independent of f)
        inner(f, otherargs...)
    end
end
```

and the idea is to take the part that actually depends on `f` and move it into `inner`, so that `outer` doesn't have to be specialized for every `f`.
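A self-contained sketch of that pass-through pattern (function names here are illustrative, not from DataFrames):

```julia
# `inner` is the function barrier: it is small and gets specialized
# for each concrete `f` at the point of the (runtime) dispatch.
@noinline inner(f, x) = f(x)

# `outer` opts out of specializing on `f`, so one compiled version
# serves every callable; only the `inner` call pays a dispatch cost.
function outer(@nospecialize(f), xs)
    total = 0.0
    for x in xs
        # ... lots of f-independent work would go here ...
        total += inner(f, x)
    end
    return total
end

outer(sqrt, [1.0, 4.0, 9.0])  # one compiled `outer` serves every f
```

The loop body stays fast because the f-dependent work runs inside the specialized `inner`, while the bulky `outer` is compiled only once.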
Running this script:

```julia
using DataFrames, CategoricalArrays
using SnoopCompile, Profile
using PyPlot: PyPlot, plt

delay = 0.1
Profile.init(n=10^7, delay=delay)
tinf = @snoopi_deep include("grouping.jl")
@profile include("grouping.jl")
mref, ax = pgdsgui(tinf; by=exclusive, t0=delay)
```

from within the directory containing `grouping.jl`. You can also do some analysis:

```julia
# To see a list of DataFrames methods that have run times (within our
# poor temporal resolution of 0.1s) less than 1/10th of their inference time
# and an inference time of at least 0.1s:
ridata = runtime_inferencetime(tinf; by=exclusive)
filter(ridata) do pr
    m, rid = pr
    (rid.trun < rid.tinf/10 && rid.tinf > delay) || return false
    return isa(m, Method) ? m.module === DataFrames : occursin("DataFrames", m.file)
end
```

which yields

```
29-element Vector{Pair{Union{Method, SnoopCompile.MethodLoc}, SnoopCompile.PGDSData}}:
 transform!(gd::GroupedDataFrame{DataFrame}, args::Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector{T} where T, Type, All, Between, Cols, InvertedIndex, AbstractVecOrMat{var"#s180"} where var"#s180"<:Pair}...; ungroup, renamecols) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:769 => SnoopCompile.PGDSData(0.0, 0.0, 0.32326337400000005, 5)
 var"#transform!#638"(ungroup::Bool, renamecols::Bool, ::typeof(transform!), gd::GroupedDataFrame{DataFrame}, args::Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector{T} where T, Type, All, Between, Cols, InvertedIndex, AbstractVecOrMat{var"#s284"} where var"#s284"<:Pair}...) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:769 => SnoopCompile.PGDSData(0.0, 0.0, 0.124490108, 19)
 (agg::DataFrames.Aggregate{typeof(Statistics.var), C} where C)(incol::AbstractVector{T} where T, gd::GroupedDataFrame) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/fastaggregates.jl:263 => SnoopCompile.PGDSData(0.0, 0.0, 0.17673247900000003, 39)
 wrap(x) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/callprocessing.jl:25 => SnoopCompile.PGDSData(0.0, 0.0, 0.107918367, 69)
 hashrows_col!(h::Vector{UInt64}, n::Vector{Bool}, v::AbstractVector{T} where T, rp, firstcol::Bool) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/utils.jl:21 => SnoopCompile.PGDSData(0.0, 0.0, 0.22342205600000004, 39)
 (::DataFrames.var"#select##kw")(::Any, ::typeof(select), df::AbstractDataFrame, args...) in DataFrames at /home/tim/.julia/dev/DataFrames/src/abstractdataframe/selection.jl:940 => SnoopCompile.PGDSData(0.0, 0.0, 0.103290706, 126)
 (::DataFrames.var"#397#414")(col, refpool, missingind, nmind) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/utils.jl:339 => SnoopCompile.PGDSData(0.0, 0.0, 0.27375607799999996, 51)
 getindex(df::DataFrame, row_ind::Integer, col_ind::Union{Signed, Unsigned}) in DataFrames at /home/tim/.julia/dev/DataFrames/src/dataframe/dataframe.jl:462 => SnoopCompile.PGDSData(0.0, 0.0, 0.206627946, 81)
 (::DataFrames.var"#667#669")(k) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:145 => SnoopCompile.PGDSData(0.0, 0.0, 0.135824883, 131)
 var"#fillfirst!#581"(rev::Bool, ::typeof(DataFrames.fillfirst!), condf, outcol::AbstractVector{T} where T, incol::AbstractVector{T} where T, gd::GroupedDataFrame) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:131 => SnoopCompile.PGDSData(0.0, 0.0, 0.29025164800000014, 68)
 groupreduce_init(op, condf, adjust, incol::AbstractVector{U}, gd::GroupedDataFrame) where U in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/fastaggregates.jl:77 => SnoopCompile.PGDSData(0.0, 0.0, 0.147357272, 163)
 (::DataFrames.var"#select##kw")(::Any, ::typeof(select), gd::GroupedDataFrame, args::Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector{T} where T, Type, All, Between, Cols, InvertedIndex, AbstractVecOrMat{var"#s179"} where var"#s179"<:Pair}...) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:722 => SnoopCompile.PGDSData(0.0, 0.0, 0.282880664, 98)
 var"#select#632"(copycols::Bool, keepkeys::Bool, ungroup::Bool, renamecols::Bool, ::typeof(select), gd::GroupedDataFrame, args::Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector{T} where T, Type, All, Between, Cols, InvertedIndex, AbstractVecOrMat{var"#s284"} where var"#s284"<:Pair}...) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/splitapplycombine.jl:722 => SnoopCompile.PGDSData(0.0, 0.0, 0.24843195300000004, 127)
 row_group_slots(cols::Tuple{Vararg{AbstractVector{T} where T, N} where N}, hash::Val, groups::Union{Nothing, Vector{Int64}}, skipmissing::Bool, sort::Union{Nothing, Bool}) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/utils.jl:184 => SnoopCompile.PGDSData(0.0, 0.0, 0.19911604800000016, 177)
 hashrows(cols::Tuple{Vararg{AbstractVector{T} where T, N} where N}, skipmissing::Bool) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/utils.jl:59 => SnoopCompile.PGDSData(0.0, 0.0, 0.2374651519999999, 151)
 view(adf::AbstractDataFrame, rowinds, colinds::Union{Colon, Regex, AbstractVector{T} where T, All, Between, Cols, InvertedIndex}) in DataFrames at /home/tim/.julia/dev/DataFrames/src/subdataframe/subdataframe.jl:136 => SnoopCompile.PGDSData(0.0, 0.0, 0.13630385300000003, 263)
 _names_match(r1, t::Tuple{Vararg{Symbol, N}}) where N in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:84 => SnoopCompile.PGDSData(0.0, 0.0, 0.18915692899999992, 201)
 do_call(f::Union{Function, Type}, idx::AbstractVector{var"#s285"} where var"#s285"<:Integer, starts::AbstractVector{var"#s284"} where var"#s284"<:Integer, ends::AbstractVector{var"#s180"} where var"#s180"<:Integer, gd::GroupedDataFrame, incols::Nothing, i::Integer) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/callprocessing.jl:137 => SnoopCompile.PGDSData(0.0, 0.0, 0.196004527, 233)
 SubDataFrame(parent::DataFrame, rows::AbstractVector{Int64}, cols) in DataFrames at /home/tim/.julia/dev/DataFrames/src/subdataframe/subdataframe.jl:76 => SnoopCompile.PGDSData(0.0, 0.0, 0.21583261100000004, 263)
 isagg(::Pair{var"#s284", var"#s180"} where {var"#s284"<:Union{AbstractString, Signed, Symbol, Unsigned}, var"#s180"<:(Pair{var"#s179", var"#s55"} where {var"#s179", var"#s55"<:Union{AbstractString, Symbol}})}, gdf::GroupedDataFrame) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/fastaggregates.jl:310 => SnoopCompile.PGDSData(0.0, 0.0, 0.21159961600000013, 310)
 lookupname(l::Dict{Symbol, Int64}, idx::Symbol) in DataFrames at /home/tim/.julia/dev/DataFrames/src/other/index.jl:283 => SnoopCompile.PGDSData(0.0, 0.0, 0.155423984, 473)
 (::Core.var"#Type##kw")(::Any, ::Type{DataFrame}, columns::Union{Vector{Any}, Vector{AbstractVector{T} where T}}, colindex::DataFrames.Index) in DataFrames at /home/tim/.julia/dev/DataFrames/src/dataframe/dataframe.jl:170 => SnoopCompile.PGDSData(0.0, 0.0, 0.23239490200000001, 362)
 getproperty(df::AbstractDataFrame, col_ind::Symbol) in DataFrames at /home/tim/.julia/dev/DataFrames/src/abstractdataframe/abstractdataframe.jl:356 => SnoopCompile.PGDSData(0.0, 0.0, 0.12287589699999993, 703)
 row_group_slots(cols::Tuple{Vararg{AbstractVector{T} where T, N}}, refpools::Tuple{Vararg{AbstractVector{T} where T, N}}, refarrays::Tuple{Vararg{Union{Missings.EachReplaceMissing{var"#s284", U} where {var"#s284"<:(AbstractVector{var"#s180"} where var"#s180"<:Union{Missing, Real}), U}, AbstractVector{var"#s285"} where var"#s285"<:Real}, N}}, hash::Val{false}, groups::Vector{Int64}, skipmissing::Bool, sort::Bool) where N in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/utils.jl:254 => SnoopCompile.PGDSData(0.0, 0.0, 1.061802504, 92)
 fill_row!(row, outcols::Tuple{Vararg{AbstractVector{T} where T, N}}, i::Integer, colstart::Integer, colnames::Tuple{Vararg{Symbol, N}}) where N in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/complextransforms.jl:93 => SnoopCompile.PGDSData(0.0, 0.0, 0.480759434, 323)
 (::DataFrames.var"#618#624")() in DataFrames at threadingconstructs.jl:169 => SnoopCompile.PGDSData(0.0, 0.0, 0.34025604699999995, 681)
 groupreduce!_helper(res::AbstractVector{T} where T, f, op, condf, adjust, checkempty::Bool, incol::AbstractVector{T} where T, groups::Vector{Int64}, counts::Vector{Int64}, batches) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/fastaggregates.jl:160 => SnoopCompile.PGDSData(0.0, 0.0, 1.133820778, 628)
 var"#DataFrame#171"(kwargs, ::Type{DataFrame}) in DataFrames at /home/tim/.julia/dev/DataFrames/src/dataframe/dataframe.jl:288 => SnoopCompile.PGDSData(0.0, 0.0, 1.4798948850000004, 725)
 do_call(f::Union{Function, Type}, idx::AbstractVector{var"#s285"} where var"#s285"<:Integer, starts::AbstractVector{var"#s284"} where var"#s284"<:Integer, ends::AbstractVector{var"#s180"} where var"#s180"<:Integer, gd::GroupedDataFrame, incols::Tuple{AbstractVector{T} where T}, i::Integer) in DataFrames at /home/tim/.julia/dev/DataFrames/src/groupeddataframe/callprocessing.jl:90 => SnoopCompile.PGDSData(0.0, 0.0, 1.1230623329999998, 996)
```

These may be your best candidates for reduced specialization, with the important caveat that a test suite seems unlikely to be a particularly good example of the kind of workload you'd like to optimize for. (It would be better to re-run this with something more realistic.) Interestingly, the dot with the highest inference time (but also significant runtime, nearly 10s of each) is …
Thank you for your comments and tips. Just two quick remarks:

1. The examples I gave: …
2. Yes, I have thought about it. The problem is that we have: …
I tried to verify your thinking vs. mine, but I hit the following problem (i.e. it seems I cannot use standard macros to check what happens):

So we seem to have an observer effect: the fact that I used a macro apparently changes what gets compiled. Interestingly, in the H2O benchmarks https://h2oai.github.io/db-benchmark/ macros are of course used to measure performance, so the question is whether using the macro affects the benchmarks :) (This is not a super important question; it just occurred to me, so I thought of writing it down.) CC @nalimilan. Thank you!
No, my point is that this is a technique for allowing the outer method to skip specialization while the inner one still specializes.

If you want to force specialization, beware of Julia's heuristics: arguments of kinds like `Function` and `Type` are not specialized on by default when merely passed through, unless you force it with a type parameter:

```julia
@noinline inner(f::F, x) where F = f(x)

function outer(f::F, x) where F
    ...
end
```

Adding an apparently-spurious type parameter is kind of like the opposite of adding `@nospecialize`:

```julia
julia> methodinstances(outer)
4-element Vector{Core.MethodInstance}:
 MethodInstance for outer(::typeof(sin), ::Int64)
 MethodInstance for outer(::typeof(identity), ::String)
 MethodInstance for outer(::typeof(cos), ::Float64)
 MethodInstance for outer(::typeof(-), ::Rational{Int64})
```

Interesting about the observer effect. Cthulhu shows the same pattern. I guess we're explicitly asking for that specialized form.
Yes - this is clear. I had somehow forgotten that this was the point.

This is also clear - I wanted to know about the case when specialization is not wanted.

Thank you again :).
If I understand things correctly, SnoopCompile.jl currently allows one to dig into inference timings. It would be great to be able to similarly dig into LLVM codegen and native codegen timings.

This is relevant for packages like DataFrames.jl, where in many cases LLVM codegen and/or native codegen is the chief cost of running some functions (and this is unavoidable, since for many functions users pass anonymous functions as arguments).
An example of such a case is here: JuliaData/DataFrames.jl#2806