-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: overhaul inliner #61502
Comments
Change https://go.dev/cl/511565 mentions this issue: |
Change https://go.dev/cl/511560 mentions this issue: |
Change https://go.dev/cl/511559 mentions this issue: |
Change https://go.dev/cl/511555 mentions this issue: |
Change https://go.dev/cl/511566 mentions this issue: |
Change https://go.dev/cl/511564 mentions this issue: |
Change https://go.dev/cl/511562 mentions this issue: |
Change https://go.dev/cl/511557 mentions this issue: |
Change https://go.dev/cl/511561 mentions this issue: |
Change https://go.dev/cl/511558 mentions this issue: |
Change https://go.dev/cl/511563 mentions this issue: |
Change https://go.dev/cl/511556 mentions this issue: |
Christmas in July! |
The 1.22 RC is planned for December, so I think it'll be more like Christmas in December. 😄 |
I would argue that pattern is more of a workaround for the current inliner limitations than a deliberate one aiding readability, so if possible it would be great to avoid ossifying it further. Instead, especially since PGO is now available, I would rather suggest we should go in the direction (not necessarily as part of this overhaul) of supporting also a form of function splitting - so that if the compiler detects what is basically a hot fast path and one or more cold paths in the same function, the fast path is inlined in the callers and the cold path(s) are left where they are. This would allow to curb the proliferation of the |
Function splitting/outlining is tricker, especially for tracebacks and APIs that converts between function and PC (e.g. runtime.FuncForPC). It is probably solvable, but I'm not sure it is within the scope of this issue. |
@aclements So excited for this work! I'm curious if there is a corpus of code that the team is planning to use to guide the changes (a corpus would probably also function as a test bed). |
@evanphx, for static metrics, we're doing some experiments over the Go module cache (that is, basically all public Go modules). We aren't really at performance metrics yet, but the basic corpus we'll use is golang.org/x/benchmarks. |
Add "NewInliner" to the list of Go experiments, used for enabling an updated/improved version of the function inlining phase within the Go compiler. Updates #61502. Change-Id: I3218b3ae59a2d05156e8017cd9ee1d7b66cad031 Reviewed-on: https://go-review.googlesource.com/c/go/+/511555 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> Run-TryBot: Than McIntosh <[email protected]>
Change https://go.dev/cl/517595 mentions this issue: |
Add definitions for a set of Go function "properties" intended to be useful for driving inlining decisions. This CL just defines a set of flags and a container to hold them; a subsequent CL will add code to compute the properties for a function given its IR/AST representation. Updates #61502. Change-Id: Ifa26c1ad055c02ca0ce9cf37078cee7b3385e18a Reviewed-on: https://go-review.googlesource.com/c/go/+/511556 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> Run-TryBot: Than McIntosh <[email protected]> Reviewed-by: Than McIntosh <[email protected]>
Add some machinery to support computing function "properties" for use in driving inlining heuristics, and a unit testing framework to check to see if the property computations are correct for a given set of canned Go source files. This CL is mainly the analysis skeleton and a testing framework; the code to compute the actual props will arrive in a later patch. Updates #61502. Change-Id: I7970b64f713d17d7fdd7e8e9ccc7d9b0490571bf Reviewed-on: https://go-review.googlesource.com/c/go/+/511557 Reviewed-by: Matthew Dempsky <[email protected]> Run-TryBot: Than McIntosh <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Rename the ir-local function "reassigned" to "Reassigned" so that it can be used as part of inline heuristic analysis. Fix up the header comment along that way, which had some stale material. Add support for detecting reassignments via OASOP (as opposed to just simple assignments). Updates #61502. Change-Id: I50f40f81263c0d7f61f30fcf0258f0b0f93acdca Reviewed-on: https://go-review.googlesource.com/c/go/+/511560 Reviewed-by: Matthew Dempsky <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Than McIntosh <[email protected]>
This CL adds a builder and trybot for GOEXPERIMENT=newinliner, which is currently under development. Updates golang/go#61502. Fixes golang/go#61883. Change-Id: Iefb37ca7c9feca30bb2783170f1a8cdfa9fd02af Reviewed-on: https://go-review.googlesource.com/c/build/+/517595 TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Matthew Dempsky <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Auto-Submit: Matthew Dempsky <[email protected]> Reviewed-by: Than McIntosh <[email protected]>
…stics Add code to compute whether a given function appears to unconditionally call panic or exit, as a means of driving inlining decisions. Note that this determination is based on heuristics/guesses, as opposed to strict safety analysis; in some cases we may miss a function that does indeed always panic, or mark a function as always invoking panic when it doesn't; the intent is get the right answer in "most" cases. Updates #61502. Change-Id: Ibba3e60c06c2e54cf29b3ffa0f816518aaacb9a3 Reviewed-on: https://go-review.googlesource.com/c/go/+/511558 Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Add code to analyze properties of function result values, specifically heuristics for cases where we always return allocated memory, always return the same constant, or always return the same function. Updates #61502. Change-Id: I8b0a3295b5be7f7ad4c2d5b9803925aea0639376 Reviewed-on: https://go-review.googlesource.com/c/go/+/511559 Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Add code to analyze properties of function params, specifically heuristics to look for cases where unmodified params feed into "if" and "switch" statements in ways that might enable constant folding and/or dead code elimination if the call were inlined at a callsite that passes a constant to the correct param. We also look for cases where a function parameter feeds unmodified into an interface method call or indirect call. Updates #61502. Change-Id: Iaf7297e19637daeabd0ec72be88d654b545546ae Reviewed-on: https://go-review.googlesource.com/c/go/+/511561 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]>
Augment the ir.Inline container to include an entry for function properties (currently serialized as a string), and if GOEXPERIMENT=newinliner is set, compute and store function properties for all inline candidates processed by the inliner. The idea here is that if the function properties are going to drive inlining decisions, we'd like to have the same info from non-local / imported functions as for local / in-package functions, hence we need to include the properties in the export data. Hand testing on the compiler itself and with k8s kubelet shows that this increases the size of export data overall by about 2-3 percent, so a pretty modest increase. Updates #61502. Change-Id: I9d1c311aa8418d02ffea3629c3dd9d8076886d15 Reviewed-on: https://go-review.googlesource.com/c/go/+/511562 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]>
Extend the code that computes various properties and parameter flags to incorporate information from export data in addition to things we can get from the current package. Specifically: - when deciding whether the current function always calls panic/exit, check to see whether it has an unconditional call to some other function that has that flag. - when computing "parameter feeds" properties, look not just for cases where a parameter feeds an interesting construct (if/switch, indirect/interface call, etc) but where it feeds a call whose corresponding param has that flag. - when computing return properties, if a given return is always the results of a call to X, then set the return properties to those of X. Updates #61502. Change-Id: I6472fe98759cccad05b8eed58e4fc568201d88ad Reviewed-on: https://go-review.googlesource.com/c/go/+/511563 Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Build up a table of (potentially) inlinable call sites during inline heuristic analysis, and introduce a framework for analyzing each call site to collect applicable flags (for example, is call nested in loop). This patch doesn't include any of the flag analysis, just the machinery to collect the callsites and a regression test harness. Updates #61502. Change-Id: Ieaf4a008ac9868e9762c63f5b59bd264dc71ab30 Reviewed-on: https://go-review.googlesource.com/c/go/+/511564 Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Add code to detect call sites that are nested in loops, call sites that are on an unconditional path to panic/exit, and call sites within "init" functions. The panic-path processing reuses some of the logic+state already present for the function flag version of "calls panic/exit". Updates #61502. Change-Id: I1d728e0763282d3616a9cbc0a07c5cda115660f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/511565 Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Assign scores to callsites based on previously computed function properties and callsite properties. This currently works by taking the size score for the function (as computed by CanInline) and then making a series of adjustments, positive or negative based on various function and callsite properties. NB: much work also remaining on deciding what are the best score adjustment values for specific heuristics. I've picked a bunch of arbitrary constants, but they will almost certainly need tuning and tweaking to arrive at something that has good performance. Updates #61502. Change-Id: I887403f95e76d7aa2708494b8686c6026861a6ed Reviewed-on: https://go-review.googlesource.com/c/go/+/511566 Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Change https://go.dev/cl/549395 mentions this issue: |
Add some material to the "compiler" portion of the release notes describing the 1.22 changes to the inliner. For #61422. Updates #61502. Change-Id: Ic7f1cb7f70752446d2465ea3da6bd7488436342b Reviewed-on: https://go-review.googlesource.com/c/go/+/549395 Reviewed-by: David Chase <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
@aclements asked me to leave this example here. Lots of code using utf8.DecodeRune and friends has manual checks for the ASCII case to avoid the function call. It would be nice to rewrite utf8.DecodeRune and friends to be "outlined": do the ASCII case and then call an unexported general-case function. Then this form would get inlined and we could delete all the manual optimizations. This doesn't work today. For example here is what we'd like to write:
but instead people write things like this:
If we rewrote utf8.DecodeRune to be:
then the hope is that inlining would turn the NumSmileys code into the NumSmileysFast code instead of us having to do it. This fails today for at least two reasons. First, the inlining budgets penalize function calls too much. The budget is 80 and the cost of a call is 57 (!?), with the result that the rewritten DecodeRune comes in at 87. I would suggest something more like 40: if you are doing an outline-call pattern you can spend half what you normally would. Second, with the budgets changed, the inlining is not as efficient as the original code. The problem is that s[i:] is still computed before doing the fast path, so there are a bunch of memory operations to make this slice value. It would be nice if somehow the compiler recognized that in
i < len(s) implies len(t) > 0 and also t[0] is just s[i], so the code could delay materializing t entirely until the actual slowpath call. Generally speaking, making it not necessary to manually optimize these kinds of loops is a good goal for the new inliner. |
Add some material to the "compiler" portion of the release notes describing the 1.22 changes to the inliner. For golang#61422. Updates golang#61502. Change-Id: Ic7f1cb7f70752446d2465ea3da6bd7488436342b Reviewed-on: https://go-review.googlesource.com/c/go/+/549395 Reviewed-by: David Chase <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Just a small idea: about how to identify hot functions in PGO, whether the total cost of the parent function should be used as the standard (that is, cum_rate), rather than the flat_rate of the child function. The reason is that if the cum_rate of a function is large and the flat_rate is small, then the downstream function of this function will be inline. The benefit will be relatively high (because it is highly probable that this function has many downstream calls, so at least it will save a lot of function call overhead). |
I would like to discuss the topic of better supporting automatic instrumentation frameworks, because they will be affected by the planned Go inliner improvements. One example is eBPF which allows to access user code and variables by analyzing the stack and CPU registers via uprobes. Uprobes do not require Go code changes and are usually installed on function-level to read parameters and return values. However, this requires to be able to predict if a function might get inlined. Only functions which never get inlined are potential candidates for installing uprobes. I'm bringing up this topic, because I think it would be a great opportunity to improve automatic monitoring capabilities of Golang. |
Hi. Is this new inliner included in Go 1.23? |
@objectref No mention of the new inliner in the provisional release notes for 1.23, so I would say no. |
@jub0bs Pity. But I would love to know the current state of this, it will be such an improvement! |
The new inliner went out in 1.22. |
In addition to the improvements that @randall77 linked to in Go 1.22, Go 1.23 includes improved stack slot allocation that directly came out of the inlining project. That change is really important for more aggressive inlining, though it happens to lift all boats. Unfortunately, other than these few changes, our broader plans for the inliner have currently been shelved because of personnel shifts. It's still in the backlog, but we're weighing it against other opportunities (again). The heuristic changes that can be enabled with the GOEXPERIMENT never produced reliable performance improvements, so they remain off by default, and one of the big next steps here would be to really understand why that is and what can be used from those heuristics, or if we just need a totally different approach to the heuristics. |
We’re starting a project to overhaul the inlining optimization pass in the Go compiler, with the goal of having a far more effective inliner in Go 1.22. This issue will track this work. This document motivates this project and outlines our general design goals. More detailed design documents will follow.
This project is being driven by @thanm and @mdempsky, but we would love to have input from the community. Inlining is inherently very heuristic and can be a deep rabbit hole to go down, so I’m sure many of our design decisions will be motivated by pragmatism and trying to ship something better than what we have for Go 1.22, even if it’s not perfect, with the goal of further improving it in later releases.
The text was updated successfully, but these errors were encountered: