More OptimizeMe updates; delete broken refs
timholy committed Jul 24, 2024
1 parent 3c328f1 commit 354f18d
Showing 4 changed files with 24 additions and 100 deletions.
2 changes: 1 addition & 1 deletion docs/src/tutorials/invalidations.md
@@ -208,7 +208,7 @@ There are quite a few other tools for working with `invs` and `trees`, see the [
`tallyscores` and `playgame` were compiled in `Blackjack`, a "world" where the `score` method defined in `BlackjackFacecards` does not yet exist. When you load the `BlackjackFacecards` package, Julia must ask itself: now that this new `score` method exists, am I certain that I would compile `tallyscores` the same way? If the answer is "no," Julia invalidates the old compiled code, and compiles a fresh version with full awareness of the new `score` method in `BlackjackFacecards`.

Why would the compilation of `tallyscores` change? Evidently, `cards` is a `Vector{Any}`, and this means that `tallyscores` can't guess what kind of object `card` might be, and thus it can't guess what kind of objects are passed into `score`. The crux of the invalidation is thus:
- when `Blackjack` is compiled, inference does not know which `score` method will be called. However, at the time of compilation the only `score` method is for `Int`. Thus Julia will reason that anything that isn't an `Int` is going to trigger an error anyway, and so you might as well optimize `tallyscores` expecting all cards to be `Int`s. (More information about how `tallyscores` gets optimized can be found in [World-splitting](@ref).)
- when `Blackjack` is compiled, inference does not know which `score` method will be called. However, at the time of compilation the only `score` method is for `Int`. Thus Julia will reason that anything that isn't an `Int` is going to trigger an error anyway, and so you might as well optimize `tallyscores` expecting all cards to be `Int`s.
- however, when `BlackjackFacecards` is loaded, suddenly there are two `score` methods supporting both `Int` and `Char`. Now Julia's guess that all `cards` will probably be `Int`s doesn't seem so likely to be true, and thus `tallyscores` should be recompiled.

Thus, invalidations arise from optimization based on what methods and types are "in the world" at the time of compilation (sometimes called *world-splitting*). This form of optimization can have performance benefits, but it also leaves your code vulnerable to invalidation.
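The world-splitting story above can be sketched in a plain Julia session. The module and function names mirror the tutorial's, but the method bodies here are illustrative stand-ins:

```julia
# Sketch of the invalidation scenario (illustrative method bodies).
module Blackjack
score(card::Int) = card                  # the only `score` method in this "world"
tallyscores(cards) = sum(score, cards)   # `cards::Vector{Any}` forces dynamic dispatch to `score`
end

# Compiling `tallyscores` here bakes in the assumption that `score(::Int)`
# is the only method that can be called:
Blackjack.tallyscores(Any[2, 10])

# Loading "BlackjackFacecards" corresponds to defining a new `score` method:
module BlackjackFacecards
using ..Blackjack
Blackjack.score(card::Char) = card == 'A' ? 11 : 10   # face cards and aces
end

# The new method widens the possible targets of `score`, so the previously
# compiled `tallyscores` is invalidated and recompiled on next use:
Blackjack.tallyscores(Any[2, 'A'])
```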
2 changes: 1 addition & 1 deletion docs/src/tutorials/pgdsgui.md
@@ -243,7 +243,7 @@ The "standardizing method" `foo(x, y)` is short and therefore quick to compile,
Without it, `foo(x, y)` might call itself in an infinite loop, ultimately triggering a `StackOverflowError`.
`StackOverflowError`s are a particularly nasty form of error, and the typeassert ensures that you get a simple `TypeError` instead.

In other contexts, such typeasserts would also have the effect of fixing inference problems even if the type of `x` is not well-inferred (this will be discussed in more detail [later](@ref typeasserts)), but in this case dispatch to `foo(x::X, y::Y)` would have ensured the same outcome.
In other contexts, such typeasserts would also have the effect of fixing inference problems even if the type of `x` is not well-inferred, but in this case dispatch to `foo(x::X, y::Y)` would have ensured the same outcome.
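A minimal sketch of this pattern, with hypothetical types `X` and `Y` (and conversions) standing in for the tutorial's:

```julia
# Hypothetical types for illustration.
struct X; val::Int; end
struct Y; val::Int; end

# The specialized method does the real work:
foo(x::X, y::Y) = x.val + y.val

# The short "standardizing method" converts, then re-dispatches.
# The typeasserts guarantee that a misbehaving `convert` (one that fails
# to return an `X` or `Y`) raises a TypeError instead of recursing back
# into this same method and overflowing the stack:
foo(x, y) = foo(convert(X, x)::X, convert(Y, y)::Y)

# Conversions for the argument types we want to accept:
Base.convert(::Type{X}, x::Int) = X(x)
Base.convert(::Type{Y}, y::Int) = Y(y)
```

With these definitions, `foo(1, 2)` funnels through the standardizing method into the single compiled `foo(::X, ::Y)`.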

There are of course cases where you can't implement your code in this way: after all, part of the power of Julia is the ability of generic methods to "do the right thing" for a wide variety of types. But in cases where you're doing a standard task, e.g., writing some data to a file, there's really no good reason to recompile your `save` method for a filename encoded as a `String` and again for a `SubString{String}` and again for a `SubstitutionString` and again for an `AbstractString` and ...: after all, the core of the `save` method probably isn't sensitive to the precise encoding of the filename. In such cases, it should be safe to convert all filenames to `String`, thereby reducing the diversity of input arguments for expensive-to-compile methods.
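For instance, a hypothetical `save` function could funnel every string-like filename through a single compiled core:

```julia
# Hypothetical example: standardize the filename before the expensive work.
save(filename::AbstractString, data) = save(String(filename)::String, data)

function save(filename::String, data)
    # ...the expensive-to-compile core, compiled only once, for String...
    open(io -> write(io, data), filename, "w")
end
```

Calling `save` with a `SubString{String}` (or any other `AbstractString`) hits the short standardizing method, which converts and re-dispatches to the `String` method.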

5 changes: 2 additions & 3 deletions docs/src/tutorials/snoop_inference_analysis.md
@@ -7,8 +7,7 @@ Throughout this page, we'll use the `OptimizeMe` demo, which ships with `SnoopCo

```@repl fix-inference
using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
cd(joinpath(pkgdir(SnoopCompile), "examples"))
include("OptimizeMe.jl")
include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
tinf = @snoop_inference OptimizeMe.main();
fg = flamegraph(tinf)
```
@@ -79,7 +78,7 @@ You can still "dig deep" into individual triggers:
itrig = mtrig.itrigs[1]
```

This is useful if you want to analyze with [`Cthulhu.ascend`](@ref ascend-itrig).
This is useful if you want to analyze with `Cthulhu.ascend`.
`Method`-based triggers, which may aggregate many different individual triggers, can be useful because tools like [Cthulhu.jl](https://github.com/JuliaDebug/Cthulhu.jl) show you the inference results for the entire `MethodInstance`, allowing you to fix many different inference problems at once.

### Trigger trees
115 changes: 20 additions & 95 deletions docs/src/tutorials/snoop_inference_parcel.md
@@ -13,61 +13,45 @@ In such cases, one alternative is to create a manual list of precompile directiv
`precompile` directives have to be emitted by the module that owns the method and/or types.
SnoopCompile comes with a tool, `parcel`, that splits out the "root-most" precompilable MethodInstances into their constituent modules.
This will typically correspond to the bottom row of boxes in the [flame graph](@ref flamegraph).
In cases where you have some non-precompilable MethodInstances, they will include MethodInstances from higher up in the call tree.
In cases where you have some that are not naively precompilable, they will include MethodInstances from higher up in the call tree.

Let's use `SnoopCompile.parcel` on [`OptimizeMeFixed`](@ref inferrability):
Let's use `SnoopCompile.parcel` on our [`OptimizeMe`](@ref inferrability) demo:

```julia
julia> ttot, pcs = SnoopCompile.parcel(tinf);

julia> ttot
0.6084431670000001

julia> pcs
4-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
Core => (0.000135179, [(0.000135179, MethodInstance for (NamedTuple{(:sizehint,), T} where T<:Tuple)(::Tuple{Int64}))])
Base => (0.028383533000000002, [(3.2456e-5, MethodInstance for getproperty(::IOBuffer, ::Symbol)), (4.7474e-5, MethodInstance for ==(::Type, ::Nothing)), (5.7944e-5, MethodInstance for typeinfo_eltype(::Type)), (0.00039092299999999994, MethodInstance for show(::IOContext{IOBuffer}, ::Any)), (0.000433143, MethodInstance for IOContext(::IOBuffer, ::IOContext{Base.TTY})), (0.000484984, MethodInstance for Pair{Symbol, DataType}(::Any, ::Any)), (0.000742383, MethodInstance for print(::IOContext{Base.TTY}, ::String, ::String, ::Vararg{String, N} where N)), (0.001293705, MethodInstance for Pair(::Symbol, ::Type)), (0.0018914350000000003, MethodInstance for show(::IOContext{IOBuffer}, ::UInt16)), (0.010604793000000001, MethodInstance for show(::IOContext{IOBuffer}, ::Tuple{String, Int64})), (0.012404293, MethodInstance for show(::IOContext{IOBuffer}, ::Vector{Int64}))])
Base.Ryu => (0.15733664599999997, [(0.05721630600000001, MethodInstance for writeshortest(::Vector{UInt8}, ::Int64, ::Float32, ::Bool, ::Bool, ::Bool, ::Int64, ::UInt8, ::Bool, ::UInt8, ::Bool, ::Bool)), (0.10012033999999997, MethodInstance for show(::IOContext{IOBuffer}, ::Float32))])
Main.OptimizeMeFixed => (0.4204474180000001, [(0.4204474180000001, MethodInstance for main())])
```@repl parcel-inference
using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
tinf = @snoop_inference OptimizeMe.main();
ttot, pcs = SnoopCompile.parcel(tinf);
ttot
pcs
```

This tells us that a total of ~0.6s was spent on inference.
`parcel` discovered precompilable MethodInstances for four modules, `Core`, `Base`, `Base.Ryu`, and `OptimizeMeFixed`.
`ttot` shows the total amount of time spent on type-inference.
`parcel` discovered precompilable MethodInstances for four modules (`Core`, `Base.Multimedia`, `Base`, and `OptimizeMe`) that might benefit from precompile directives.
These are listed in increasing order of inference time.

Let's look specifically at `OptimizeMeFixed`, since that's under our control:

```julia
julia> pcmod = pcs[end]
Main.OptimizeMeFixed => (0.4204474180000001, Tuple{Float64, Core.MethodInstance}[(0.4204474180000001, MethodInstance for main())])

julia> tmod, tpcs = pcmod.second;

julia> tmod
0.4204474180000001

julia> tpcs
1-element Vector{Tuple{Float64, Core.MethodInstance}}:
(0.4204474180000001, MethodInstance for main())
```@repl parcel-inference
pcmod = pcs[end]
tmod, tpcs = pcmod.second;
tmod
tpcs
```

0.42s of that time is due to `OptimizeMeFixed`, and `parcel` discovered a single MethodInstance to precompile, `main()`.
This indicates the amount of time spent specifically on `OptimizeMe`, plus the list of calls that could be precompiled in that module.

We could look at the other modules (packages) similarly.

## SnoopCompile.write

You can generate files that contain ready-to-use `precompile` directives using `SnoopCompile.write`:

```julia
julia> SnoopCompile.write("/tmp/precompiles_OptimizeMe", pcs)
Core: no precompile statements out of 0.000135179
Base: precompiled 0.026194226 out of 0.028383533000000002
Base.Ryu: precompiled 0.15733664599999997 out of 0.15733664599999997
Main.OptimizeMeFixed: precompiled 0.4204474180000001 out of 0.4204474180000001
```@repl parcel-inference
SnoopCompile.write("/tmp/precompiles_OptimizeMe", pcs)
```

You'll now find a directory `/tmp/precompiles_OptimizeMe`, and inside you'll find three files, for `Base`, `Base.Ryu`, and `OptimizeMeFixed`, respectively.
You'll now find a directory `/tmp/precompiles_OptimizeMe`, and inside you'll find files for modules that could have precompile directives added manually.
The contents of the last of these should be recognizable:

@@ -81,64 +81,5 @@ The first `ccall` line ensures we only pay the cost of running these `precompile` directives when we're building the package.
(It would also matter if you've set `__precompile__(false)` at the top of your module, but if so why are you reading this?)
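For reference, the generated file typically has roughly this shape (the exact directives depend on your trace and your SnoopCompile version; `main` here matches the demo's entry point):

```julia
# Sketch of a file produced by SnoopCompile.write (details vary by version).
function _precompile_()
    # Run the directives only while Julia is generating precompiled output:
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    precompile(main, ())    # directives discovered by `parcel` go here
    nothing
end
```

In an ordinary (non-precompiling) session, `jl_generating_output` returns 0, so `_precompile_()` returns immediately without running any directives.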

This file is ready to be moved into the `OptimizeMe` repository and `include`d into your module definition.
Since we added `warmup` manually, you could consider moving `precompile(warmup, ())` into this function.

You might also consider submitting some of the other files (or their `precompile` directives) to the packages you depend on.
In some cases, the specific argument type combinations may be too "niche" to be worth specializing; one such case is found here, a `show` method for `Tuple{String, Int64}` for `Base`.
But in other cases, these may be very worthy additions to the package.

## Final results

Let's check out the final results of adding these `precompile` directives to `OptimizeMeFixed`.
First, let's build both modules as precompiled packages:

```julia
julia> push!(LOAD_PATH, ".")
4-element Vector{String}:
"@"
"@v#.#"
"@stdlib"
"."

julia> using OptimizeMe
[ Info: Precompiling OptimizeMe [top-level]

julia> using OptimizeMeFixed
[ Info: Precompiling OptimizeMeFixed [top-level]
```
Now in fresh sessions,
```julia
julia> @time (using OptimizeMe; OptimizeMe.main())
3.14 is great
2.718 is jealous
Object x: 7
3.159908 seconds (10.63 M allocations: 582.091 MiB, 5.19% gc time, 99.67% compilation time)
```
versus
```julia
julia> @time (using OptimizeMeFixed; OptimizeMeFixed.main())
3.14 is great
2.718 is jealous
Object x: 7
1.840034 seconds (5.38 M allocations: 289.402 MiB, 5.03% gc time, 96.70% compilation time)
```
We've cut down on the latency by nearly a factor of two.
Moreover, if Julia someday caches generated code, we're well-prepared to capitalize on the benefits, because the same improvements in "code ownership" are almost certain to pay dividends there too.
If you inspect the results, you may sometimes suffer a few disappointments: some methods that we expected to precompile don't "take."
At the moment there appears to be a small subset of methods that fail to precompile, and the reasons are not yet widely understood.
At present, the best advice seems to be to comment-out any precompile directives that don't "take," since otherwise they increase the build time for the package without material benefit.
These failures may be addressed in future versions of Julia.
It's also worth appreciating how much we have succeeded in reducing latency, with the awareness that we may be able to get even greater benefit in the future.

## Summary

`@snoop_inference` collects enough data to learn which methods are triggering inference, how heavily methods are being specialized, and so on.
Examining your code from the standpoint of inference and specialization may be unfamiliar at first, but like other aspects of package development (testing, documentation, and release compatibility management) it can lead to significant improvements in the quality-of-life for you and your users.
