
WIP: add blog post on new latency-reduction tools #1093

Closed

Conversation

@timholy (Member) commented Nov 28, 2020

This focuses on the new Core.Compiler.Timings inference-timing tools, and
the utilities in SnoopCompile for analyzing the results (@snoopi_deep and friends). These tools were
introduced by Nathan Daly, who is a co-author of the post. CC @NHDaly

This is a WIP in part because it depends on quite a few outstanding PRs:

Nevertheless it seemed time to post this so that @NHDaly, among others, can collaborate on the writing and so that the DataFrames developers can get a sense for the overall context.

@github-actions commented:

Once the build has completed, you can preview your PR at this URL: https://julialang.netlify.app/previews/PR1093/


- two arguments (`first` and `incols`) could potentially be `NamedTuple`s, and since `(x=1,)` and `(y=1,)` are different `NamedTuple` types, these arguments alone have potentially-huge possibility for specialization. (If these are specialized for the particular column names in a DataFrame, then the scope for specialization is essentially limitless.) Indeed, a check `methodinstances(DataFrames._combine_with_first)` reveals that many of these specializations are for different `NamedTuple`s.

- the `f::Base.Callable` argument is either a function or a type, again a potentially-limitless source of specialization. However, checking the output of `methodinstances`, you'll see that this argument is not specialized. Presumably this is due to the major callers of `_combine_with_first` using a `@nospecialize` on their corresponding argument. In this case, over-specialization does not seem to be a concern, but generally speaking function or type arguments are prime candidates for risk of over-specialization.
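To make the `NamedTuple` point above concrete, here is a toy illustration (a hypothetical example, not taken from the post): the field names are part of a `NamedTuple`'s type, so every distinct set of column names yields a distinct type, and thus a distinct specialization.

```julia
# NamedTuples with different field names are distinct types, so a method
# taking a NamedTuple argument specializes separately for each set of names.
nt1 = (x = 1,)
nt2 = (y = 1,)

typeof(nt1) === typeof(nt2)  # false: the field name is part of the type
```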
Review comment (Member):

Isn't the absence of specialization just due to the fact that these methods don't call f, but pass it to another function?


\toc

[The Julia programming language][Julia] delivers remarkable runtime performance and flexibility. Julia's flexibility depends on the ability to of methods to handle arguments of many different types. This flexibility would be in competition with runtime performance, were it not for the "trick" of *method specialization*. Julia compiles a separate "instance" of a method for each distinct combination of argument types; this specialization allows code to be optimized to take advantage of specific features of the inputs, eliminating most of the *runtime* cost that would otherwise be the result of Julia's flexibility.
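As a toy sketch of what method specialization means (a hypothetical example, not from the post): a single method definition yields a separate compiled "instance" for each combination of concrete argument types.

```julia
# One generic method definition...
add(a, b) = a + b

# ...but each distinct call signature below compiles its own
# specialized instance of the method:
add(1, 2)        # add(::Int64, ::Int64)
add(1.0, 2.0)    # add(::Float64, ::Float64)
add(1, 2.0)      # add(::Int64, ::Float64)
```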
Review comment (Member):
"the ability to of methods" -> "the ability of methods" I think? :)



Unfortunately, method specialization has its own cost: compiler latency. Since compilation is expensive, there is a measurable delay that occurs on first invokation of a method for a specific combination of argument types. There are cases where one can do some of this work once, in advance, using utilities like [`precompile`] or building a custom system with [PackageCompiler]. In other cases, the number of distinct argument types that a method might be passed seems effectively infinite, and in such cases precompilation seems unlikely to be a comprehensive solution.
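For readers unfamiliar with `precompile`, a minimal sketch (with a hypothetical function name) of paying the compilation cost up front rather than at first call:

```julia
double(x) = 2x

# Ask Julia to type-infer (and cache) the specialization for Int arguments
# now, rather than on first invocation; returns true on success.
precompile(double, (Int,))

double(21)  # this first call no longer pays inference cost for this signature
```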
Review comment (Member):
"first invokation of" -> "first invocation of" I think? :)


Review comment (Member):
"using utilities like [precompile] or building a custom system with [PackageCompiler]" sounds slightly off to me. Perhaps "or building" -> "or by building" and "custom system" -> "custom system image" or so? :)

In this post, we'll walk through the process of analyzing and optimizing the [DataFrames] package. We chose DataFrames for several reasons:

- DataFrames is widely used
- the DataFrames API seems fairly stable, and they are approaching their 1.0 release
Review comment (Member):
Perhaps "the" -> "The" for consistency with capitalization later in the list (or decapitalize the "In" below, alternatively)? :)


Review comment (Member):
Hm, perhaps add terminating periods to the items on this list for consistency with the last item and the presence of punctuation in some of the bodies of the items? :)

- DataFrames is developed by a sophisticated and conscientious team, and the package has already been [aggressively optimized for latency](https://discourse.julialang.org/t/release-announcements-for-dataframes-jl/18258/112?u=tim.holy) using tools that were, until now, state-of-the-art; this sets a high bar for any new tools (don't worry, we're going to crest that bar ;-) )
- In a previous [blog post][invalidations], one of the authors indirectly "called out" DataFrames (and more accurately its dependency [CategoricalArrays]) for having a lot of difficult-to-fix invalidations. To their credit, the developers made changes that dropped the number of invalidations by about 10×. This post is partly an attempt to return the favor. That said, we hope they don't mind being guinea pigs for these new tools.

This post is based on DataFrames 0.22.1, and version 0.9 of the underlying CategoricalArrays. If you follow the steps of this blog post with different versions, you're likely to get different results from those shown here, partly because many of the issues we identified have been fixed in more recent releases. It should also be emphasize that these analysis tools are only supported on Julia 1.6 and above; at the time of this post, Julia 1.6 not yet to "alpha" release phase but can be obtained from [nightly] snapshots or built from [source].
Review comment (Member):
"It should also be emphasize that" -> "It should also be emphasized that" I think? :)

Review comment (Member):
"Julia 1.6 not yet to" -> "Julia 1.6 is not yet to" I think? :)


## Identifying the most costly-to-infer methods

Our first goal is to identify methods that cost the most in inference.
Review comment (Member):
Perhaps "cost the most in inference" -> "cost the most to infer"? :)


`@snoopi_deep` is a new tool in [SnoopCompile] which leverages new functionality in Julia. Like the older `@snoopi`, it measures what is being inferred and how much time it takes. However, `@snoopi` measures aggregate time for each "entrance" into inference, and it includes the time spent inferring all the methods that get inferrably dispatched from the entrance point. In contrast, `@snoopi_deep` extracts this data for each method instance, regardless of whether it is an "entrance point" or called by something else.
Review comment from @Sacha0 (Member), Dec 14, 2020:
Perhaps "extracts this data for each method instance" -> "extracts the time spent inferring each method instance exclusive of time spent inferring other (e.g. callee) method instances" or similar? :)

```
│ │ │ ⋮
```

Each branch of a node indents further to the right, and represents callees of the node. The `ROOT` object is special: it measures the approximate time spent on the entire operation, excepting inference, and consequently combines native code generation and runtime. Each other entry reports the time needed to infer just that method instance, not including the time spent inferring its callees.
Review comment (Member):
Perhaps "Each other entry" -> "Every other entry"? :)


Review comment (Member):
Depending on which style guide you prefer, "potentially-limitless" -> "potentially limitless", or not :).



Some strategies, like adding `@nospecialize`s, might be effective in reducing compile-time cost. But without knowing a lot more about this package, it is difficult to know whether this might have undesirable effects on runtime performance. So here we pursue a different strategy: let's focus on the fact that inference has to be performed for each unique combination of input types. Since we have two highly-diverse argument types, the effect is essentially *multiplicative*. But we also note that `incols` is just "passed through"; while we might want to preserve this type information, specializing on `incols` does not affect any portion of the body of this method other than the final calls to `_combine_tables_with_first!` or `_combine_rows_with_first!`. Consequently, we may be wasting a lot of time specializing code that doesn't actually change depending on the type of `incols`.
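As background, `@nospecialize` is the standard way to ask the compiler not to specialize on a particular argument; a minimal sketch with hypothetical names, not DataFrames code:

```julia
# With @nospecialize, one compiled instance of apply_n can serve any
# callable f; the call f(x) then goes through dynamic dispatch instead
# of being specialized and inlined for each distinct f.
function apply_n(@nospecialize(f), x, n)
    for _ in 1:n
        x = f(x)
    end
    return x
end

apply_n(sqrt, 16.0, 2)  # sqrt(sqrt(16.0)) == 2.0
```

The trade-off is exactly the one discussed above: less compile-time cost per distinct `f`, at the price of possibly slower calls inside the loop.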
Review comment (Member):
Likewise here, depending on which style guide you prefer, "highly-diverse" -> "highly diverse", or not :).

@thofma (Contributor) left a comment:
I tried it out and noticed the name changed from accumulate_by_method to accumulate_by_source.

```julia
julia> using DataFrames; tinf = @snoopi_deep include("grouping.jl");

julia> tm = accumulate_by_method(flatten_times(tinf))
```

Suggested change (Contributor):

```diff
- julia> tm = accumulate_by_method(flatten_times(tinf))
+ julia> tm = accumulate_by_source(flatten_times(tinf))
```

and after we had

```julia
julia> tm = accumulate_by_method(flatten_times(tinf))
```

Suggested change (Contributor):

```diff
- julia> tm = accumulate_by_method(flatten_times(tinf))
+ julia> tm = accumulate_by_source(flatten_times(tinf))
```

This is a truncated version of the output; if you look at more of the entries carefully, you'll notice a number of near-duplicates: `do_call` appears numerous times, with different argument types. While `do_call` has eight methods, there are many more entries in `flatten_times(tinf)` than these eight, and this is explained by multiple specializations of single methods. It's of particular interest to aggregate all the instances of a particular method, since this represents the cost of the method itself:

```julia
julia> tm = accumulate_by_method(flatten_times(tinf))
```

Suggested change (Contributor):

```diff
- julia> tm = accumulate_by_method(flatten_times(tinf))
+ julia> tm = accumulate_by_source(flatten_times(tinf))
```

The aggregate cost is a sum of the cost of all individual `MethodInstance`s.
(`do_call` has even more instances, at 1260, but some of these instances must be must less time-consuming than the worst offender we noted above.)
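The idea behind this aggregation can be sketched in plain Julia (a hypothetical stand-in, not SnoopCompile's actual implementation): sum the per-instance inference times, grouped by the method they specialize.

```julia
# Hypothetical (method name, inference seconds) pairs for several
# MethodInstances; two of them specialize the same method.
times = [("do_call", 0.5), ("do_call", 0.25), ("combine", 0.125)]

# Aggregate: total inference cost per method.
agg = Dict{String,Float64}()
for (meth, t) in times
    agg[meth] = get(agg, meth, 0.0) + t
end

agg["do_call"]  # 0.75: the combined cost of both do_call instances
```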
Review comment (Member):
"must be must less" -> "must be much less" I think? :)

Let's apply this to DataFrames. After collecting the data with `@snoopi_deep include("runtests.jl")`, we can see inference failures with

```julia
julia> ibs = SnoopCompile.inference_breaks(tinf)
```
Review comment:
I can't find inference_breaks in SnoopCompile (latest master), did the name change?

@timholy (Member, Author) commented Jan 2, 2021:
Yeah, it's changed a lot. Your best source now is timholy/SnoopCompile.jl#192, though I'm going to push a couple more changes before merging.

I am almost certainly going to replace this wholesale, starting off the foundation in #1111, so for safety I'll close this.

@timholy timholy closed this Jan 2, 2021
@DilumAluthge DilumAluthge deleted the teh/latency_deep branch February 8, 2021 05:15