This issue was moved to a discussion.

Reducing latency: acceptable strategies discussion #792

Closed
timholy opened this issue Dec 18, 2020 · 21 comments

Labels: planning (For discussion and planning development)

Comments

@timholy (Contributor) commented Dec 18, 2020

I'm opening this to ask what changes the devs here are willing to accept in order to reduce latency. As a case study, let's consider AbstractPlotting.draw_axis2d, which has a mere 43 arguments. If I run the test suite and then do

using MethodAnalysis
mis = methodinstances(AbstractPlotting.draw_axis2d)

it gives me 8 inferred MethodInstances. I've edited the output and stashed it in this gist, turning it into a compile-time benchmark. Here's what I get:

julia> include("/tmp/draw_axis2d_demo.jl")
  0.000012 seconds (5 allocations: 976 bytes)   # the all-Any MethodInstance
  4.711192 seconds (11.49 M allocations: 681.750 MiB, 7.89% gc time, 100.00% compilation time)
  0.000014 seconds (5 allocations: 976 bytes)   # the mostly-Any MethodInstance
  0.142401 seconds (232.42 k allocations: 12.758 MiB, 99.99% compilation time)
  0.111768 seconds (138.38 k allocations: 6.993 MiB, 99.98% compilation time)
  0.181831 seconds (349.69 k allocations: 19.633 MiB, 99.99% compilation time)
  0.094713 seconds (176.63 k allocations: 9.740 MiB, 99.98% compilation time)
  0.084915 seconds (105.29 k allocations: 5.205 MiB, 99.97% compilation time)
true

Obviously, the first "real" one (not the all-Any) takes a lot of time because it's also compiling a bunch of dependent functions that can be partially reused on future compiles. But the noteworthy thing here is that later "real" instances each take ~100ms. And this is just one method (and its callees), one that is not horrifically huge by Makie/AbstractPlotting/Layout standards.
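The benchmark pattern here, timing `precompile` on each captured signature, can be sketched with a toy stand-in (`slow_kernel` is hypothetical; the real gist targets the eight draw_axis2d signatures):

```julia
# Hypothetical sketch of how such a compile-time benchmark file is structured:
# call @time on precompile() for each signature, so only compilation (not
# execution) is measured. `slow_kernel` is a stand-in for draw_axis2d.
slow_kernel(x, y) = sum(abs2, x) + sum(abs2, y)

for sig in ((Vector{Float64}, Vector{Float64}),
            (Vector{Int}, Vector{Int}))
    @time precompile(slow_kernel, sig)   # forces inference/compilation only
end
```

Running a file like this in a fresh session makes the first signature pay for shared callees, just as described above for draw_axis2d.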

Let's try to get a more systematic view. This is complex, so let me just show you how to investigate this yourself. First, you need to be running the master branch of SnoopCompile. We're going to try this on a demo that matches the user experience, i.e., the first plot in a new session:

julia> using SnoopCompile, AbstractPlotting

julia> tinf = @snoopi_deep scatter(rand(10^4), rand(10^4))
Core.Compiler.Timings.Timing(InferenceFrameInfo for Core.Compiler.Timings.ROOT()) with 658 children

julia> tminfo = flatten_times(tinf);

julia> count_time = Dict{Method,Tuple{Int,Float64}}();

julia> for (t, info) in tminfo
           mi = info.mi
           if isa(mi.def, Method)
               m = mi.def::Method
               n, tm = get(count_time, m, (0, 0.0))
               count_time[m] = (n + 1, tm + t)
           end
       end

julia> sort!(collect(count_time); by = pr -> pr.second[2])
2131-element Vector{Pair{Method, Tuple{Int64, Float64}}}:
< lots of output >

This is a list of method => (number of instances, total time to compile *just* this method and not its callees) pairs. (To emphasize, that's different from what we measured above with draw_axis2d, where the time was inclusive of the compile time for the callees.) You'll see one object, ROOT, which measures all time outside of inference (including codegen). But more than half the time is inference.

I encourage you to run this yourself; there are some interesting takeaways. Here's a log plot of the time for each of these 2131 methods (measuring the self-inference time, in seconds):

[plot: self-inference time per method, log scale]

You can see it's really dominated by a "few" bad players: 171 methods account for more than 0.01s of total inference time across all of their argument-type combinations. Some of these, like setproperty!, have hundreds of instances (and are expected to), but quite a few of the dominant ones have relatively few MethodInstances (here I'm showing only the last, most dominant 171 methods):

[plot: instance counts for the 171 most expensive methods]

Moreover, many of the methods that have too many instances to precompile directly might still get precompiled as callees of methods that have fewer instances (that works for as many calls as inference succeeds on, so YMMV).

There is a pretty natural solution, one I've discussed before: https://docs.julialang.org/en/v1/manual/style-guide/#Handle-excess-argument-diversity-in-the-caller. The idea is if you have

foo(x, y, z, ...)

you can design it like this:

foo(x::TX, y::TY, z::TZ, ...) = # the "real" version, big and slow to compile
foo(x, y, z, ...) = foo(convert(TX, x)::TX, convert(TY, y)::TY, convert(TZ, z)::TZ, ...)  # tiny and fast to compile

Obviously that's a pretty big change from the code base as it is right now. Certainly, the user's data might come in any number of variants, but a lot of AbstractPlotting's internals seem to consist of passing around data computed from it: things like axis limits, fonts, etc. These seem much more standardizable.
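A minimal runnable sketch of this pattern (`draw_line` and its argument types are hypothetical stand-ins, not AbstractPlotting API):

```julia
# The "real" method: concrete signature, big and slow to compile,
# but compiled (and precompilable) exactly once.
function _draw_line(width::Float64, color::Symbol, limits::NTuple{2,Float64})
    # ... expensive work would go here ...
    (width, color, limits)
end

# The public entry point: tiny and fast to compile for any argument types,
# because it just normalizes the inputs and forwards them.
draw_line(width, color, limits) =
    _draw_line(convert(Float64, width)::Float64,
               Symbol(color),
               convert(NTuple{2,Float64}, limits)::NTuple{2,Float64})
```

With this split, `draw_line(1, "red", (0, 1))` and `draw_line(1.0f0, :red, (0.0, 1.0))` both funnel into the single compiled instance of `_draw_line`.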

Anyway, I've decided this is more than I could tackle solo, so I'm not going to implement this without buy-in from Makie's developers. But if there is interest I'm happy to help, especially to teach the tools so that you can run these diagnostics yourself. I'm slowly getting towards a big new release of SnoopCompile, but I've realized I need a real-world test bed (beyond what I got in JuliaLang/julia#38906, since getting changes made to Base is probably off the table for many packages) and Makie seems like a good candidate.

@timholy (Contributor Author) commented Dec 18, 2020

Link to WIP blog post using DataFrames (I will probably rewrite it heavily, but it does give an overview, slightly outdated, of the tools): JuliaLang/www.julialang.org#1093

@timholy (Contributor Author) commented Dec 18, 2020

Just a couple more details about how I'd envision a collaboration working. What's easy for me: analyzing method instances, inference times, invalidations, inference failures, suggesting precompile directives, and improving the analysis tools. What's hard for me: understanding how to improve AbstractPlotting to make it cheaper to infer/more precompilable. A lot of the improvements will involve either (1) refactoring methods (breaking up big methods into smaller pieces with greater independence and thus lower argument-type diversity) and/or (2) imposing constraints on the types of objects. When I look at an untyped variable in AbstractPlotting's source code, it's usually a fairly painful (and probably error-prone) process for me to figure out what types it might possibly have, and whether I can impose some constraints with or without normalization. That's where having someone with greater familiarity with the package will help.

So, I can identify some juicy targets and point out some of the problems, but it's best if someone else with greater awareness of the codebase than I have time to acquire ends up implementing some of the fixes. Here are some of the steps I'd imagine and the expected schedule for getting a "win":

  • Standardizing argument types on slow-to-compile functions: together with precompilation, this may lead to immediate but modest decreases in latency.
  • Improving inference: hard work (the hardest step of this whole process), and in and of itself it will not visibly reduce latency. However, I think it's essential for success. Your latency has a "few" dominant players (~200), but many of them are methods like getproperty(::MyType, fieldname). You cannot directly precompile such methods, because even though you own MyType you do not own getproperty. However, if getproperty is called at compile time from a method you do own (i.e., the chain of inference succeeds between methods you own and methods you do not), then the backedge this introduces will allow you to precompile it successfully. There's a visualization below that may help in understanding these issues.
  • Adding precompiles: especially if the previous step has increased the length of successful inference chains, this will give you a big win. (If your inference chains are short, the benefit will be pretty marginal.)
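For concreteness, a precompile directive is just a call like the following placed in a package's module body (the function and signature here are hypothetical; AbstractPlotting's real directives live in its precompile.jl):

```julia
# A directive forces inference for one concrete signature at package
# build time, so the cached result lands in the .ji file. The longer the
# successful inference chain below this entry point, the more it covers.
draw_line(width::Float64, color::Symbol) = (width, color)

precompile(draw_line, (Float64, Symbol))  # returns true if inference succeeded
```
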

The master branch of SnoopCompile lets you visualize inference chains as a flame graph. Here I'll disable AbstractPlotting's precompile.jl file (it doesn't have that many entries anyway) and do this:

julia> using SnoopCompile, AbstractPlotting
[ Info: Precompiling AbstractPlotting [537997a7-5e4e-5d89-9595-2241ea00577e]

julia> tinf = @snoopi_deep scatter(rand(10^4), rand(10^4))
Core.Compiler.Timings.Timing(InferenceFrameInfo for Core.Compiler.Timings.ROOT()) with 646 children

julia> using FlameGraphs, ProfileView
< annoying Gtk warnings suppressed >

julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:20848427294))

julia> ProfileView.view(fg)
Gtk.GtkWindowLeaf(< long property dump elided >, title="Profile", ...)

Here's what I get:

[flame graph: inference during the first scatter plot with AbstractPlotting]

The empty gaps between the vertical bars correspond to times when AbstractPlotting is doing something other than inference...which you can see is not very much of the time.

Let's contrast that with Revise:

julia> tinf = @snoopi_deep begin
           cp("Example1.jl", "Example.jl"; force=true)
           sleep(0.01)   # for FileWatching
           revise()
       end

and visualizing it similarly:

[flame graph: inference during a Revise revision]

(That's also without Revise's precompile file...it's even better with it.) You can see more empty spaces where codegen is happening, plus a small amount where Revise is doing real work.

Improving inference in AbstractPlotting will have the effect of consolidating the periods of inference: you'll take a bunch of skinny flames and merge them into fewer, fatter flames. (There are a few fat flames right now, but most of the time is taken up by lots of skinny ones.) Each flame (i.e., a tower rising from the bottom) corresponds to a new entry into inference. Having a smaller number of entry points into inference will make the package more precompilable, because with fewer precompile directives you'll be able to force inference on a larger fraction of the code. But that also explains why improving inference will not itself reduce latency: you're just moving your inference burden around, not reducing it. Once you add precompiles, though, you'll see the benefit.

The advantage of doing the inference work first is that you avoid adding gazillions of precompile statements that later prove totally unnecessary and just clutter your source code.

@jkrumbiegel (Member) commented

Thanks so much for the work put into this. I had already read the blog post and think this new functionality makes it much easier to spot bad apples in terms of compilation.

I'm definitely guilty of using methods with tons of arguments, because that's usually what happens when lifting multiple attributes of a plot. In the LAxis code there are places with more than 15 or so arguments. I have talked with Simon about how this could be improved, and one core idea is to replace the bundle-of-nodes approach with a node of a dictionary for attributes. This would massively reduce the number of arguments of a lot of the consumers, but it's also a deep change that will take time.

In general, I think the optimization potential is in nospecializing / un-parameterizing a lot of the higher level code which passes values downstream to lower level code. A plotting package obviously has to deal with an extraordinary amount of input types but we probably specialize too much on these where it's not necessary.

One question I have is whether every nospecialize or struct with an untyped field makes compilation slower than compiling a couple of specialized versions would be. That is, what is the overhead of deliberately hiding useful information from the compiler to reduce the number of compiled methods, while simultaneously increasing the complexity of the compiled instructions?
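For readers unfamiliar with the macro, `@nospecialize` marks an argument so that one compiled method body serves all value types, at the cost of dynamic dispatch inside it. A hypothetical sketch of the trade-off being asked about (`describe_attr` is illustrative, not Makie API):

```julia
# One compiled instance of this method handles every `value` type;
# operations on `value` become runtime dispatches instead of being
# statically resolved, which is exactly the overhead in question.
function describe_attr(name::Symbol, @nospecialize(value))
    string(name, " = ", value)
end
```
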

@timholy (Contributor Author) commented Dec 20, 2020

I have talked with Simon how this could be improved, and one core idea is to replace the bundle of nodes approach with a node of a dictionary for attributes.

Or just define more internal structs, if the same bits of data are needed in many places.

A plotting package obviously has to deal with an extraordinary amount of input types but we probably specialize too much on these where it's not necessary.

Both are true. You do have to accept really diverse inputs, but in general what I think you can do is normalize most of the values you compute from them to just a few types. Not the raw data, but axis limits, fonts and fontsizes, etc. Everything you'd need, for example, to compute a layout.
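A hypothetical sketch of such normalization via an internal struct (the name and fields are illustrative, not Makie API): whatever types the user supplied, downstream code sees exactly one concrete type.

```julia
# Internal container with concrete fields: every consumer of these values
# compiles against one signature, regardless of what the user passed in.
struct AxisStyle
    limits::NTuple{2,Float64}
    fontsize::Float64
    font::String
end

# Normalizing outer constructor: accepts diverse inputs, converts once.
AxisStyle(limits, fontsize, font) =
    AxisStyle(convert(NTuple{2,Float64}, limits),
              convert(Float64, fontsize),
              String(font))
```

Now `AxisStyle((0, 1), 12, "Arial")` and `AxisStyle((0.0f0, 1.0f0), 12.0, "Arial")` produce the same concrete type, so functions taking an `AxisStyle` need only one MethodInstance.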

One question I have is if every nospecialize or a struct with an untyped field makes compilation slower ...

It's really hard to predict in general. Let's take the case of draw_axis2d, using the absolutely-current SnoopCompile master branch (I've been working on it a fair bit recently, mostly to add functionality while also improving ergonomics):

julia> using SnoopCompile, AbstractPlotting

julia> tinf = @snoopi_deep scatter(rand(10^4), rand(10^4))
Core.Compiler.Timings.Timing(InferenceFrameInfo for Core.Compiler.Timings.ROOT()) with 658 children

julia> tf = flatten_times(tinf; sorted=false);   # sorted=false flattens them in the order they were inferred

julia> m = methods(AbstractPlotting.draw_axis2d).ms[1]    # find the ones for draw_axis2d
draw_axis2d(textbuffer, frame_linebuffer, grid_linebuffer, tickmarks_linebuffer, m, padding, limits, xyrange_labels, showgrid, showticks, showtickmarks, g_linewidth, g_linecolor, g_linestyle, t_linewidth, t_linecolor, t_linestyle, t_textcolor, t_textsize, t_rotation, t_align, t_font, t_gap, t_title_gap, tm_linewidth, tm_linecolor, tm_linestyle, tm_length, f_linewidth, f_linecolor, f_linestyle, f_axis_position, f_axis_arrow, f_arrow_size, f_frames, ti_labels, ti_textcolor, ti_textsize, ti_rotation, ti_align, ti_font, ti_title) in AbstractPlotting at /home/tim/.julia/dev/AbstractPlotting/src/basic_recipes/axis.jl:443

julia> filter(pr -> Method(pr.second) == m, tf)
3-element Vector{Pair{Float64, Core.Compiler.Timings.InferenceFrameInfo}}:
 0.026801339 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
 0.006816421 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::StaticArrays.SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.025635156 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)

From this we see that the ones with abstract signature took almost 4x more "exclusive time" than the one with concrete signature...abstract inference is expensive. However, now let's look at the "inclusive time" (meaning, incorporating the inference on all the callees):

julia> tinc = InclusiveTiming(tinf)
InclusiveTiming: 20.75259714 for MethodInstance for Core.Compiler.Timings.ROOT() with 658 direct children

julia> tf = flatten_times(tinc; sorted=false);

julia> filter(pr -> Method(pr.second) == m, tf)
3-element Vector{Pair{Float64, Core.Compiler.Timings.InferenceFrameInfo}}:
 1.544234134 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
 0.587577583 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::StaticArrays.SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.048427101 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)

The second one was pretty slow too, because once you have concrete types you can dive much deeper into the later dispatch chain. The first one was slow partly because it was abstract, but mostly because it was first: if we start a fresh session and run inference on just the concrete signature:

julia> using SnoopCompile, AbstractPlotting, StaticArrays

julia> tinf = @snoopi_deep precompile(AbstractPlotting.draw_axis2d, (Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}, Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, StaticArrays.SMatrix{4, 4, Float32, 16}, Float64, Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, Tuple{Bool, Bool}, Tuple{Bool, Bool}, Tuple{Bool, Bool}, Tuple{Float64, Float64}, Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, Tuple{Nothing, Nothing}, Tuple{Int64, Int64}, Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, Tuple{Nothing, Nothing}, Tuple{Symbol, Symbol}, Tuple{Int64, Int64}, Tuple{Float64, Float64}, Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, Tuple{String, String}, Int64, Int64, Tuple{Int64, Int64}, Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, Tuple{Nothing, Nothing}, Tuple{Float64, Float64}, Float64, Symbol, Nothing, Nothing, Bool, Float64, Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, Tuple{String, String}, Tuple{Symbol, Symbol}, Tuple{Int64, Int64}, Tuple{Float64, Float64}, Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, Tuple{String, String}, Nothing))
Core.Compiler.Timings.Timing(InferenceFrameInfo for Core.Compiler.Timings.ROOT()) with 2 children

julia> tinc = InclusiveTiming(tinf)
InclusiveTiming: 4.185188563 for MethodInstance for Core.Compiler.Timings.ROOT() with 2 direct children

julia> m = methods(AbstractPlotting.draw_axis2d).ms[1]
draw_axis2d(textbuffer, frame_linebuffer, grid_linebuffer, tickmarks_linebuffer, m, padding, limits, xyrange_labels, showgrid, showticks, showtickmarks, g_linewidth, g_linecolor, g_linestyle, t_linewidth, t_linecolor, t_linestyle, t_textcolor, t_textsize, t_rotation, t_align, t_font, t_gap, t_title_gap, tm_linewidth, tm_linecolor, tm_linestyle, tm_length, f_linewidth, f_linecolor, f_linestyle, f_axis_position, f_axis_arrow, f_arrow_size, f_frames, ti_labels, ti_textcolor, ti_textsize, ti_rotation, ti_align, ti_font, ti_title) in AbstractPlotting at /home/tim/.julia/dev/AbstractPlotting/src/basic_recipes/axis.jl:443

julia> filter(pr -> Method(pr.second) == m, tf)
1-element Vector{Pair{Float64, Core.Compiler.Timings.InferenceFrameInfo}}:
 2.784754542 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)

we see that it's the most expensive of all, even more than the sum of the first two above. (When we ran it in the context of scatter, a lot of the inference time for draw_axis2d with a concrete signature must have been eliminated due to caching the results of previous inference attributed to other methods.)

@timholy (Contributor Author) commented Dec 20, 2020

While it's less related to the typical user experience, for more insight into the concrete-vs-abstract and first-vs-later issues, here are the results when running the entire test suite (since the tests don't pass, you have to use tinf = @snoopi_deep try include("runtests.jl") catch end):

julia> tf = flatten_times(tinf; sorted=false);

julia> filter(pr -> Method(pr.second) == m, tf)
8-element Vector{Pair{Float64, Core.Compiler.Timings.InferenceFrameInfo}}:
 0.031903985 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
 0.007949048 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.031357103 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
 0.007187978 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Int64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.007625902 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Int64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.007632252 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Int64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.007827706 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.006590775 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)

julia> tinc = InclusiveTiming(tinf)
InclusiveTiming: 189.34281908 for MethodInstance for Core.Compiler.Timings.ROOT() with 4958 direct children

julia> tf = flatten_times(tinc; sorted=false);

julia> filter(pr -> Method(pr.second) == m, tf)
8-element Vector{Pair{Float64, Core.Compiler.Timings.InferenceFrameInfo}}:
 2.029182911 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
 0.513283067 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.060381376 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
 0.027713052 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Int64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.014189973 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Int64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.049258881 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Int64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.021972931 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)
 0.008082211 => InferenceFrameInfo for AbstractPlotting.draw_axis2d(::Annotations{Tuple{Vector{Tuple{String, Point{2, Float32}}}}}, ::LineSegments{Tuple{Vector{Point{2, Float32}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::Tuple{LineSegments{Tuple{Vector{Point{2, Float32}}}}, LineSegments{Tuple{Vector{Point{2, Float32}}}}}, ::SMatrix{4, 4, Float32, 16}, ::Float64, ::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, ::Tuple{Tuple{Vector{Float64}, Vector{Float64}}, Tuple{Vector{String}, Vector{String}}}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, ::Int64, ::Int64, ::Tuple{Int64, Int64}, ::Tuple{Tuple{Symbol, Float64}, Tuple{Symbol, Float64}}, (nothing, nothing)::Tuple{Nothing, Nothing}, ::Tuple{Float64, Float64}, ::Float64, ::Symbol, nothing::Nothing, nothing::Nothing, ::Bool, ::Float64, ::Tuple{Tuple{Bool, Bool}, Tuple{Bool, Bool}}, ::Tuple{String, String}, ::Tuple{Symbol, Symbol}, ::Tuple{Int64, Int64}, ::Tuple{Float64, Float64}, ::Tuple{Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}}, ::Tuple{String, String}, nothing::Nothing)

So the inclusive time for these later concrete signatures is on the order of 30ms (and the exclusive time is ~7ms, meaning a big chunk is spent in the method itself). But keep in mind that there's also codegen, and the total, as measured in the OP, is around 100ms.

@timholy commented Dec 20, 2020

One other point with regard to specialization: you probably want to stay below 30 arguments because of
https://github.com/JuliaLang/julia/blob/58d6684e5e938eeb62dd3a472e593a04c54b2431/src/julia.h#L325
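A common way to stay under such a limit (a sketch with invented names, not Makie's code) is to gather related positional arguments into a NamedTuple or struct, so the method sees one argument:

```julia
# Hypothetical: collapse several tick-related options into one NamedTuple.
function draw_ticks(opts::NamedTuple)
    return opts.showticks ? opts.ticksize * opts.count : 0
end

draw_ticks((showticks = true, ticksize = 2, count = 5))  # one argument instead of three
```

The NamedTuple's concrete type still carries the full type information of its fields, so this doesn't hurt inference; it only reduces the argument count.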

@timholy commented Dec 20, 2020

A general summary of the discussion above: while adding @nospecialize may be useful in many cases, in some cases the normalization strategy is even better:

function foo(x::TX, y::TY, z::TZ, ...)
    # the "real" version, big and slow to compile
    ...
end
foo(x, y, z, ...) = foo(convert(TX, x)::TX, convert(TY, y)::TY, convert(TZ, z)::TZ, ...)  # tiny and fast to compile

because you do expensive inference (for big methods) only for concrete signatures, not abstract ones. You can of course combine these:

function foo(@nospecialize(x::Union{TXa,TXb}), ...)
    # or you can just add `@nospecialize` here to ensure it never gets more specialized than the signature you supply
    ...
end
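As a self-contained sketch of this normalization pattern (the Marker type and area function are invented for illustration, not Makie code):

```julia
struct Marker
    size::Float64
    visible::Bool
end
Base.convert(::Type{Marker}, t::Tuple) = Marker(t...)

# The "real" method: inferred and compiled once, for this concrete signature.
area(m::Marker) = m.visible ? m.size^2 : 0.0

# The tiny front-end: normalizes whatever it receives, then calls the
# concrete method, so expensive inference happens only for Marker.
area(m) = area(convert(Marker, m)::Marker)

area(Marker(2.0, true))  # hits the concrete method directly
area((2.0, true))        # normalized, then reuses the same compiled instance
```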

You've probably tried this yourself, but I experimented with adding module-level @nospecialize to every module defined in AbstractPlotting and GLMakie; it was actually a bit slower than the specialized version. So the easy "just quit specializing" unfortunately isn't effective. But I am confident that we can reduce the inference cost to a small fraction of its current state, and that should reduce the TTFP by at least a factor of 2. (If we also end up compiling less code, it could be even better than that.)

@timholy commented Dec 21, 2020

Using scatter(rand(10^4), rand(10^4)), I've done an analysis of the "inference triggers," calls that force a "fresh" entrance into inference. This happens when (1) inference cannot predict the callee from the caller (aka, requires runtime dispatch), and (2) the callee has not previously been inferred for those argument types. The fewer such triggers we have, the more precompilable the package becomes.

There were 673 such triggers, falling into roughly 193 non-Base source-code locations. I've posted a gist here, where the results are sorted by the number of distinct callee signatures. 98 of them had only a single callee signature, and most of those look precompilable, so it's mostly the 95 multiple-callee locations that merit attention. That actually seems like a manageable number. Of course, keep in mind that this is a pretty crude analysis, and sometimes the fix might lie in the caller of one of these locations, but at least it gives a sense of the scope of the challenge.
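For reference, this kind of trigger analysis can be reproduced with SnoopCompile along roughly these lines (a sketch; run it in a fresh session so the workload isn't already compiled):

```julia
using SnoopCompile, GLMakie

# Profile inference for the whole workload.
tinf = @snoopi_deep scatter(rand(10^4), rand(10^4))

# Each trigger is a fresh entrance into inference forced by runtime dispatch.
itrigs = inference_triggers(tinf)

# Group the triggers by the source location of the calling method.
mtrigs = accumulate_by_source(Method, itrigs)
```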

If you scroll down to the bottom of the list, you can see that the handling of Observables is the single biggest source of trouble, followed by handling of properties via dictlike.jl. A lot of these have straightforward fixes. For example, the node_pairs failure (second-to-last) can be fixed by adding a special Observable{Any} constructor

diff --git a/src/Observables.jl b/src/Observables.jl
index 0abdc9c..da6d005 100644
--- a/src/Observables.jl
+++ b/src/Observables.jl
@@ -24,6 +24,8 @@ mutable struct Observable{T} <: AbstractObservable{T}
     val::T
     Observable{T}() where {T} = new{T}([])
     Observable{T}(val) where {T} = new{T}([], val)
+    # Construct an Observable{Any} without runtime dispatch
+    Observable{Any}(@nospecialize(val)) = new{Any}([], val)
 end
 
 function Base.copy(observable::Observable{T}) where T

and then replacing convert(Node{Any}, obj) with

# Version of `convert(Node{Any}, obj)` that rarely requires runtime dispatch (only when converting from another poorly inferred `Node{T}`)
node_any(@nospecialize(obj)) = isa(obj, Node{Any}) ? obj :
                               isa(obj, Node) ? convert(Node{Any}, obj) : Node{Any}(obj)

For me that knocks the number of triggers down to 609 (eliminating 64 triggers), a pretty huge gain for very little effort. Shall I submit some of these changes, or do you want to poke at them a bit yourself instead?

@timholy commented Dec 22, 2020

Perhaps to try to extract a bit more feedback, I think the most important decision you have to make is the type-normalization question: are you willing to provide inference with a bit more to go on, and if so what is your preferred strategy? Let's take a few examples:

https://github.com/JuliaPlots/AbstractPlotting.jl/blob/6346fe87f18bb1df09f0065f0ad6b2b699c20b94/src/interfaces.jl#L336-L346
What types might plot[:color] have?

https://github.com/JuliaPlots/AbstractPlotting.jl/blob/6346fe87f18bb1df09f0065f0ad6b2b699c20b94/src/basic_recipes/annotations.jl#L21-L64
What types do the sargs entries have?

https://github.com/JuliaPlots/AbstractPlotting.jl/blob/6346fe87f18bb1df09f0065f0ad6b2b699c20b94/src/basic_recipes/axis.jl#L266-L280
What types do the args entries have?

https://github.com/JuliaPlots/AbstractPlotting.jl/blob/6346fe87f18bb1df09f0065f0ad6b2b699c20b94/src/basic_recipes/axis.jl#L540-L589
What types do most of the variables have?

There are really three strategies:

  • use a generic strategy to improve inferrability:
    • use a struct with fields like color::Union{Nothing,RGB{Float32},RGBA{Float32}}
    • use a Dict{Symbol,Any} or Dict{String,Any} as a "fake struct" by writing a macro @extract plot[:color] that converts this into plot.attributes[:color]::Union{Nothing,RGB{Float32},RGBA{Float32}}. (The macro would apply the type constraints using the name :color as a cue.)
  • annotate each use-location. https://github.com/jlapeyre/ManualDispatch.jl might be a nice helper.
  • it's not possible to limit the types these variables may have---they could be anything.
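The "fake struct" macro in the first strategy could look roughly like this (the type table and all names are invented for illustration, not Makie's actual attributes):

```julia
# Hypothetical table of per-attribute type constraints.
const ATTR_TYPES = Dict{Symbol,Any}(
    :color     => Union{Nothing,Symbol,String},
    :linewidth => Union{Int,Float64},
)

# `@extract plot[:color]` lowers to a type-asserted Dict lookup, so callers
# see a small Union instead of Any.
macro extract(ex)
    obj, key = ex.args          # ex is `plot[:color]`, an Expr(:ref, ...)
    T = ATTR_TYPES[key.value]   # key is a QuoteNode wrapping the Symbol
    return esc(:(($obj).attributes[$key]::$T))
end
```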

The last option is perfectly fine, but if so you need to tell me so I can avoid wasting effort.

There's really not much I can do to help until the devs make a decision about this issue.

@jkrumbiegel (Member) commented:

Hi Tim, it's great how much you're paving the way here for substantial improvements of the Makie ecosystem, and I wish I had the capacity right now to really get into the weeds. I am personally pretty short on time right now and will be for the foreseeable future. All my time spent on Makie will probably go into some specific features I have been working on for a while and only then could I really look at the compilation issues. I say this just to manage expectations because I know it will be frustrating for you to put in more and more work while we don't reciprocate. So again thanks a lot for your work on this, and I'm looking forward to when I can spend time on working on your suggestions!

@timholy commented Dec 22, 2020

Sounds good, I appreciate getting an answer! I'll wait to work on Makie's latency until there can be a meeting of schedules.

@SimonDanisch (Member) commented:

Thank you so much @timholy, this is very exciting and I'd absolutely love to reduce the latency of Makie. Sadly, I'm currently in a similar situation as @jkrumbiegel, and all the minutes I can squeeze out right now go into cleaning up Makie to make it more easily maintainable and fix some of the most annoying bugs and inconsistencies.

I do say, though, that I'm very willing to make drastic changes if they reduce latency - as long as they're easy to maintain and simplify the code base ;)
Btw, it looks like one of our main goals should be to use the type information found in convert_attribute(value, ::key"attribute") in https://github.com/JuliaPlots/AbstractPlotting.jl/blob/master/src/conversions.jl#L708 to make getproperty on plot objects type stable. The T in getproperty(plot, :attribute)::T should be the same as in convert_attribute(plot.attribute, key"attribute"())::T, and it should be fairly stable per attribute. I'm also open to making convert_attribute stricter so that it converts to fewer types.
If we then use the type-inferable getproperty (or maybe add a function like get_converted that is type stable), it should be possible to make large parts of the code inferable.
A complication here is that we need to figure out how to make this work with Observables -.- I will think a bit about this!
But this could make something like:

    sargs = (
        plot.model, plot.font,
        plot[1],
        getindex.(plot, (:color, :textsize, :align, :rotation, :justification, :lineheight))...,
    )

Immediately type stable without touching that code ;)
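One possible sketch of that idea (the converted_type helper and get_converted are my invention, not an existing Makie API): derive a type assertion for each attribute lookup from what the converter guarantees for that key.

```julia
# Toy type table standing in for what convert_attribute guarantees per key.
converted_type(::Val{:linewidth}) = Float64
converted_type(::Val{:marker})    = Symbol

# Type-stable getter over a Dict{Symbol,Any} store: for a compile-time-known
# key, converted_type constant-folds, and the assertion lets callers infer a
# concrete type instead of Any.
function get_converted(store::Dict{Symbol,Any}, ::Val{key}) where {key}
    return store[key]::converted_type(Val(key))
end

store = Dict{Symbol,Any}(:linewidth => 2.5)
get_converted(store, Val(:linewidth))  # inferred as Float64
```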

@timholy commented Dec 22, 2020

These sound like good ideas. In general, it sounds like we agree that pushing the type constraints into the background is the best approach---it will keep the operational stuff more readable.

WRT Observables, you can do something similar even when it's an Observable{Any}; if you know types more precisely than the Any suggests, you can impose them. It just might take a bit of macro-magic or something to make it fairly unintrusive.

Given all of our time constraints, one hard thing is that I suspect the most effective approach will be to "eat our vegetables" first and then "get the dessert," but that does mean the reward comes all at the end. There's just not much point in adding a bunch of precompiles that are only marginally effective and of which the majority will become unnecessary as inferrability improves, so one might as well work on inferrability and specialization until they're at a reasonable point, and then start adding precompiles. Still, I've been down this road before and it has worked well, so I'm pretty confident it will work well for Makie. When we all have the time to do it.

Just ping me if you get excited about this.

@timholy commented Dec 26, 2020

I submitted a bit of the stuff I'd played with, just in case it will be a while before anyone has time to tackle this.

@Denis-Titov commented Aug 16, 2021

Apologies if this was discussed elsewhere, but I'm curious whether @SimonDanisch, @jkrumbiegel, and the Makie team still think this is worthwhile or plan to work on it in the near future.
I switched to using Makie as my main plotting package recently and love it.
Since the recent introduction of LaTeX equation support, Makie's TTFP is probably at the top of my Makie improvement wishlist (and likely that of many other users), so I'm very curious about any plans/timeline the Makie team has for working on this.

On my laptop, Makie's TTFP is decisively slower than that of Plots, for example:

julia> @time using Plots
4.325755 seconds (8.26 M allocations: 571.164 MiB, 7.14% gc time, 0.10% compilation time)

julia> @time display(Plots.plot([1,2,3,4,5,6],[1,2,3,4,5,6]))
8.889164 seconds (19.50 M allocations: 1.092 GiB, 4.09% gc time, 14.47% compilation time)

julia> @time using CairoMakie
3.393199 seconds (5.64 M allocations: 366.921 MiB, 4.08% gc time)

julia> @time display(CairoMakie.lines([1,2,3,4,5,6],[1,2,3,4,5,6]))
50.223172 seconds (88.59 M allocations: 5.279 GiB, 3.72% gc time, 20.62% compilation time)

It seems like other complex packages have had success using @timholy's great tools to decisively decrease latency (e.g., SciML/DifferentialEquations.jl#786), so I presume large gains in TTFP can be achieved for Makie as well?

Thanks for all your excellent work on this beautiful package!

@timholy commented Aug 17, 2021

Until Julia can cache native code, I estimate Makie can only get a 2x benefit (because about half the TTFP is type-inference). 2x is great, but not the 7x seen for large parts of the SciML stack (and more to come in the fullness of time).

However, in the short term it seems likely that if/when native-code caching lands, it will have the same requirements that caching type-inference has now. Consequently, any strides made to improve caching type-inference results may have cascading benefits as Julia itself becomes more capable. (And conversely, failure to address the shortcomings in caching type-inference will block Makie from taking advantage of new possibilities.)

I am quite sure this is on the devs' radar, but it's also fair to say that this is going to be a much bigger lift for Makie than it has been for SciML---there's much more redesign needed here. I have not given #1085 a serious review yet, but I suspect it's the most important step towards improvement.

@Denis-Titov commented:

Thanks @timholy. That makes sense. I am also hoping that part of the plan for #1085 is to improve type inference to reduce latency, as is mentioned there. 2x would be great. Any thoughts on the time frame for when/if native-code caching will become a reality, or perhaps some kind of tiered JIT/interpreter setup?

@bjarthur (Contributor) commented:

so what's the latest on reducing TTFP for Makie? have the 2x improvements mentioned above been implemented already? i ask because i have a snazzy interactive plot that takes 45 seconds to display on a brand new machine. very sad.

@SimonDanisch (Member) commented:

Yeah, 2x was achieved!
We should get some further improvements from:
#3281
#3113
Otherwise you should try to put your code in a package and use PrecompileTools to cache the compilation for the plot ;)
Should be possible to get it close to a second, but with longer precompile times

@bjarthur (Contributor) commented:

thanks, got PrecompileTools.jl working with my Makie code!

one quirk i came across is that an Observable used in an on callback has to be declared inside @compile_workload. if it's outside, in the @setup_workload block, bytecode is printed forever (at least overnight with my full code). here's an MWE:

using GLMakie, PrecompileTools   # added for completeness; goes at the top of the package module

@setup_workload begin
    ob = Observable(1)    ### move this inside `@compile_workload` or bytecode is printed forever
    @compile_workload begin
        fig = Figure()
        to = Toggle(fig[1,2])
        on(ob) do i_timepoint
            println(to.active)
        end
        display(GLMakie.Screen(), fig)
    end
end 

let me know if you'd like me to file an issue about this here or elsewhere.

@SimonDanisch (Member) commented:

That's pretty weird! I'd open an issue with PrecompileTools!

@ffreyer ffreyer added the planning For discussion and planning development label Aug 22, 2024
@MakieOrg MakieOrg locked and limited conversation to collaborators Aug 22, 2024
@ffreyer ffreyer converted this issue into discussion #4192 Aug 22, 2024
