WIP: add blog post on new latency-reduction tools #1093
Conversation
This focuses on the new Core.Compiler.Timings inference-timing tools, and the utilities in SnoopCompile for analyzing the results. These tools were introduced by Nathan Daly, who is a co-author of the post.
Once the build has completed, you can preview your PR at this URL: https://julialang.netlify.app/previews/PR1093/
- two arguments (`first` and `incols`) could potentially be `NamedTuple`s, and since `(x=1,)` and `(y=1,)` are different `NamedTuple` types, these arguments alone have potentially-huge possibility for specialization. (If these are specialized for the particular column names in a DataFrame, then the scope for specialization is essentially limitless.) Indeed, a check `methodinstances(DataFrames._combine_with_first)` reveals that many of these specializations are for different `NamedTuple`s.

- the `f::Base.Callable` argument is either a function or a type, again a potentially-limitless source of specialization. However, checking the output of `methodinstances`, you'll see that this argument is not specialized. Presumably this is due to the major callers of `_combine_with_first` using a `@nospecialize` on their corresponding argument. In this case, over-specialization does not seem to be a concern, but generally speaking function or type arguments are prime candidates for risk of over-specialization.
Isn't the absence of specialization just due to the fact that these methods don't call `f`, but pass it to another function?
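This comment's hypothesis can be illustrated with a minimal sketch (names invented for illustration, not DataFrames code):

```julia
# `inner` actually calls f, so Julia specializes it per function type.
inner(f, x) = f(x)

# `outer` only passes f along; with @nospecialize, Julia compiles a single
# instance of `outer` that works for any callable.
function outer(@nospecialize(f), x)
    return inner(f, x)
end
```

Under this pattern, `methodinstances(outer)` would show one instance while `methodinstances(inner)` grows with each distinct callable, which matches the behavior the comment describes.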
\toc
[The Julia programming language][Julia] delivers remarkable runtime performance and flexibility. Julia's flexibility depends on the ability to of methods to handle arguments of many different types. This flexibility would be in competition with runtime performance, were it not for the "trick" of *method specialization*. Julia compiles a separate "instance" of a method for each distinct combination of argument types; this specialization allows code to be optimized to take advantage of specific features of the inputs, eliminating most of the *runtime* cost that would otherwise be the result of Julia's flexibility.
"the ability to of methods" -> "the ability of methods" I think? :)
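The quoted point about per-type instances can be made concrete with a small sketch (using `methodinstances` from the MethodAnalysis package, which the post relies on elsewhere):

```julia
using MethodAnalysis   # provides `methodinstances`

double(x) = 2x         # one method definition...

double(3)              # ...but calling it compiles an instance for Int
double(3.0)            # ...and a second instance for Float64

methodinstances(double)  # should list one MethodInstance per argument type seen
```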
Unfortunately, method specialization has its own cost: compiler latency. Since compilation is expensive, there is a measurable delay that occurs on first invokation of a method for a specific combination of argument types. There are cases where one can do some of this work once, in advance, using utilities like [`precompile`] or building a custom system with [PackageCompiler]. In other cases, the number of distinct argument types that a method might be passed seems effectively infinite, and in such cases precompilation seems unlikely to be a comprehensive solution.
"first invokation of" -> "first invocation of" I think? :)
"using utilities like [`precompile`] or building a custom system with [PackageCompiler]" sounds slightly off to me. Perhaps "or building" -> "or by building" and "custom system" -> "custom system image" or so? :)
In this post, we'll walk through the process of analyzing and optimizing the [DataFrames] package. We chose DataFrames for several reasons:

- DataFrames is widely used
- the DataFrames API seems fairly stable, and they are approaching their 1.0 release
Perhaps "the" -> "The" for consistency with capitalization later in the list (or decapitalize the "In" below, alternatively)? :)
Hm, perhaps add terminating periods to the items on this list for consistency with the last item and the presence of punctuation in some of the bodies of the items? :)
- DataFrames is developed by a sophisticated and conscientious team, and the package has already been [aggressively optimized for latency](https://discourse.julialang.org/t/release-announcements-for-dataframes-jl/18258/112?u=tim.holy) using tools that were, until now, state-of-the-art; this sets a high bar for any new tools (don't worry, we're going to crest that bar ;-) )
- In a previous [blog post][invalidations], one of the authors indirectly "called out" DataFrames (and more accurately its dependency [CategoricalArrays]) for having a lot of difficult-to-fix invalidations. To their credit, the developers made changes that dropped the number of invalidations by about 10×. This post is partly an attempt to return the favor. That said, we hope they don't mind being guinea pigs for these new tools.
This post is based on DataFrames 0.22.1, and version 0.9 of the underlying CategoricalArrays. If you follow the steps of this blog post with different versions, you're likely to get different results from those shown here, partly because many of the issues we identified have been fixed in more recent releases. It should also be emphasize that these analysis tools are only supported on Julia 1.6 and above; at the time of this post, Julia 1.6 not yet to "alpha" release phase but can be obtained from [nightly] snapshots or built from [source].
"It should also be emphasize that" -> "It should also be emphasized that" I think? :)
"Julia 1.6 not yet to" -> "Julia 1.6 is not yet to" I think? :)
## Identifying the most costly-to-infer methods

Our first goal is to identify methods that cost the most in inference. |
Perhaps "cost the most in inference" -> "cost the most to infer"? :)
⋮
```
`@snoopi_deep` is a new tool in [SnoopCompile] which leverages new functionality in Julia. Like the older `@snoopi`, it measures what is being inferred and how much time it takes. However, `@snoopi` measures aggregate time for each "entrance" into inference, and it includes the time spent inferring all the methods that get inferrably dispatched from the entrance point. In contrast, `@snoopi_deep` extracts this data for each method instance, regardless of whether it is an "entrance point" or called by something else.
Perhaps "extracts this data for each method instance" -> "extracts the time spent inferring each method instance exclusive of time spent inferring other (e.g. callee) method instances" or similar? :)
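For readers who want to try this themselves, the basic workflow looks roughly like the following (the workload here is a toy placeholder; any not-yet-compiled call works):

```julia
using SnoopCompile

# a fresh method, so its first call triggers inference
firstsum(list) = sum(first, list)

tinf = @snoopi_deep firstsum([(1, 2), (3, 4)])

flatten_times(tinf)   # per-MethodInstance inference times
```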
│ │ │ ⋮
```
Each branch of a node indents further to the right, and represents callees of the node. The `ROOT` object is special: it measures the approximate time spent on the entire operation, excepting inference, and consequently combines native code generation and runtime. Each other entry reports the time needed to infer just that method instance, not including the time spent inferring its callees.
Perhaps "Each other entry" -> "Every other entry"? :)
- the `f::Base.Callable` argument is either a function or a type, again a potentially-limitless source of specialization. However, checking the output of `methodinstances`, you'll see that this argument is not specialized. Presumably this is due to the major callers of `_combine_with_first` using a `@nospecialize` on their corresponding argument. In this case, over-specialization does not seem to be a concern, but generally speaking function or type arguments are prime candidates for risk of over-specialization.
Depending on which style guide you prefer, "potentially-limitless" -> "potentially limitless", or not :).
Some strategies, like adding `@nospecialize`s, might be effective in reducing compile-time cost. But without knowing a lot more about this package, it is difficult to know whether this might have undesirable effects on runtime performance. So here we pursue a different strategy: let's focus on the fact that inference has to be performed for each unique combination of input types. Since we have two highly-diverse argument types, the effect is essentially *multiplicative*. But we also note that `incols` is just "passed through"; while we might want to preserve this type information, specializing on `incols` does not affect any portion of the body of this method other than the final calls to `_combine_tables_with_first!` or `_combine_rows_with_first!`. Consequently, we may be wasting a lot of time specializing code that doesn't actually change depending on the type of `incols`.
Likewise here, depending on which style guide you prefer, "highly-diverse" -> "highly diverse", or not :).
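One way to avoid the multiplicative specialization cost discussed above is a pass-through barrier: despecialize the outer method and let only the inner kernel specialize. A hypothetical sketch (invented names, not DataFrames' actual code):

```julia
# Inferred once: specializes on neither f nor incols
function combine_outer(@nospecialize(f), @nospecialize(incols))
    # ...setup work that is identical regardless of incols...
    return combine_inner!(f, incols)   # dynamic dispatch to the specialized kernel
end

# Specialization happens here, where it actually changes the generated code
combine_inner!(f, incols) = map(f, incols)   # stand-in for the real work
```

The cost of the one dynamic dispatch at the barrier is typically far smaller than the cost of inferring the whole outer method once per type combination.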
I tried it out and noticed the name changed from `accumulate_by_method` to `accumulate_by_source`.
```julia
julia> using DataFrames; tinf = @snoopi_deep include("grouping.jl");

julia> tm = accumulate_by_method(flatten_times(tinf))
```
Suggested change:
```suggestion
julia> tm = accumulate_by_source(flatten_times(tinf))
```
and after we had

```julia
julia> tm = accumulate_by_method(flatten_times(tinf))
```
Suggested change:
```suggestion
julia> tm = accumulate_by_source(flatten_times(tinf))
```
This is a truncated version of the output; if you look at more of the entries carefully, you'll notice a number of near-duplicates: `do_call` appears numerous times, with different argument types. While `do_call` has eight methods, there are many more entries in `flatten_times(tinf)` than these eight, and this is explained by multiple specializations of single methods. It's of particular interest to aggregate all the instances of a particular method, since this represents the cost of the method itself:

```julia
julia> tm = accumulate_by_method(flatten_times(tinf))
```
Suggested change:
```suggestion
julia> tm = accumulate_by_source(flatten_times(tinf))
```
The aggregate cost is a sum of the cost of all individual `MethodInstance`s.
(`do_call` has even more instances, at 1260, but some of these instances must be must less time-consuming than the worst offender we noted above.)
"must be must less" -> "must be much less" I think? :)
Let's apply this to DataFrames. After collecting the data with `@snoopi_deep include("runtests.jl")`, we can see inference failures with

```julia
julia> ibs = SnoopCompile.inference_breaks(tinf)
```
I can't find `inference_breaks` in SnoopCompile (latest master), did the name change?
Yeah, it's changed a lot. Your best source now is timholy/SnoopCompile.jl#192, though I'm going to push a couple more changes before merging. I am almost certainly going to replace this wholesale, starting off the foundation in #1111, so for safety I'll close this. |
This focuses on the new Core.Compiler.Timings inference-timing tools, and the utilities in SnoopCompile for analyzing the results (`@snoopi_deep` and friends). These tools were introduced by Nathan Daly, who is a co-author of the post. CC @NHDaly

This is WIP in part because it depends on quite a few outstanding PRs:

- `module_roots`: timholy/SnoopCompile.jl#157
- `inference_triggers`: timholy/SnoopCompile.jl#159

Nevertheless it seemed time to post this so that @NHDaly, among others, can collaborate on the writing and so that the DataFrames developers can get a sense for the overall context.