From f9c7d192cb0a6d5d83c4124652e791694513b296 Mon Sep 17 00:00:00 2001
From: Tim Holy
Date: Sat, 21 Sep 2024 05:39:49 -0500
Subject: [PATCH 1/2] Docs: add material on constprop

---
 docs/src/tutorials/snoop_inference.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/src/tutorials/snoop_inference.md b/docs/src/tutorials/snoop_inference.md
index ebd1389a..b3c29cfb 100644
--- a/docs/src/tutorials/snoop_inference.md
+++ b/docs/src/tutorials/snoop_inference.md
@@ -155,10 +155,17 @@ Users are encouraged to read the ProfileView documentation to understand how to
 - ctrl-click can be used to zoom in
 - empty horizontal spaces correspond to activities other than type-inference
 - any boxes colored red (there are none in this particular example, but you'll see some later) correspond to *naively non-precompilable* `MethodInstance`s, in which the method is owned by one module but the types are from another unrelated module. Such `MethodInstance`s are omitted from the precompile cache file unless they've been "marked" by `PrecompileTools.@compile_workload` or an explicit `precompile` directive.
-- any boxes colored orange-yellow (there is one in this demo) correspond to methods inferred for specific constants (constant propagation)
+- any boxes colored orange-yellow (there is one in this demo) correspond to methods inferred for specific constants (constant propagation). You can explore this flamegraph and compare it to the output from `print_tree`.
 
+!!! note
+    Orange-yellow boxes that appear at the base of a flame are worth special attention: they may represent something that you thought you had precompiled. For example, suppose your workload "exercises" `myfun(args...; warn=true)`; you might then think you have `myfun` covered for the corresponding argument *types*. But constant-propagation (as indicated by the orange-yellow coloration) results in (re)compilation for specific *values*: if Julia has decided that `myfun` merits constant-propagation, a call `myfun(args...; warn=false)` might need to be compiled separately.
+
+    When you want to prevent constant-propagation from hurting your TTFX, you have two options:
+    - Precompile for all relevant argument *values* as well as types. The most common argument types to trigger Julia's constprop heuristics are numbers (`Bool`/`Int`/etc.) and `Symbol`.
+    - Disable constant-propagation for this method by adding `Base.@constprop :none` in front of your definition of `myfun`. Constant-propagation can be a big performance boost when it changes how performance-sensitive code is optimized for specific input values, but when this doesn't apply you can safely disable it.
+
 Finally, [`flatten`](@ref), on its own or together with [`accumulate_by_source`](@ref), allows you to get a sense for the cost of individual `MethodInstance`s or `Method`s.
 
 The tools here allow you to get an overview of where inference is spending its time.
 
 

From a70584d4555a737b4e1a66db9f41684dd5881d97 Mon Sep 17 00:00:00 2001
From: Tim Holy
Date: Sat, 21 Sep 2024 05:40:06 -0500
Subject: [PATCH 2/2] Fix discussion on inference timing

---
 docs/src/tutorials/snoop_inference.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/src/tutorials/snoop_inference.md b/docs/src/tutorials/snoop_inference.md
index b3c29cfb..8407b9f8 100644
--- a/docs/src/tutorials/snoop_inference.md
+++ b/docs/src/tutorials/snoop_inference.md
@@ -124,9 +124,9 @@ The second number is the *inclusive* time, which is the exclusive time plus the
 Therefore, the inclusive time is always at least as large as the exclusive time.
 The `ROOT` node is a bit different: its exclusive time measures the time spent on all operations *except* inference.
 
-In this case, we see that the entire call took approximately 10ms, of which 9.3ms was spent on activities besides inference.
+In this case, we see that the entire call took approximately 3.3ms, of which 2.7ms was spent on activities besides inference.
 Almost all of that was code-generation, but it also includes the time needed to run the code.
-Just 0.76ms was needed to run type-inference on this entire series of calls.
+Just 0.55ms was needed to run type-inference on this entire series of calls.
 As you will quickly discover, inference takes much more time on more complicated code.
 
 We can also display this tree as a flame graph, using the [ProfileView.jl](https://github.com/timholy/ProfileView.jl) package:
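
The `Base.@constprop :none` option mentioned in the first patch can be sketched as follows. This is a hypothetical illustration (the `myfun` name and its `warn` keyword come from the note's example, not from an actual package):

```julia
# Hypothetical example: `warn` is a small constant (a Bool), so Julia's
# heuristics may constant-propagate it and compile a value-specialized
# version of the method for `warn=true` and another for `warn=false`.
# Prefixing the definition with `Base.@constprop :none` opts this method
# out of constant propagation.
Base.@constprop :none function myfun(x; warn=true)
    warn && println("warning enabled")
    return 2x
end

myfun(3; warn=false)  # returns 6
```

The trade-off is the one the note describes: disabling constprop avoids value-specific recompilation (helping TTFX), but forfeits any optimizations Julia could have made for specific input values, so it is only appropriate when those optimizations don't matter for the method in question.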