-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/trace: TestTraceCPUProfile fails intermittently starting 2022-05-03 #52693
Comments
Since there is relatively little overlap between (CC @golang/runtime) |
Indeed, it also affects
In addition to those explicit failures, there was also an |
Change https://go.dev/cl/404055 mentions this issue: |
@rhysh, this test has a very high failure rate — please aim to have it resolved today. (If there isn't a quick solution to the underlying problem, either roll back the changes that introduced or exposed it, or add call to |
Of the results for All four of those claim that the CPU profile included a stack with a doubled frame, and in all four of those the doubling is 2022-05-06T04:25:34-f87e28d/openbsd-arm-jsing
2022-05-05T23:41:16-f99511d/linux-mips64le-rtrk
2022-05-05T22:53:14-86536b9/linux-mips64le-rtrk
2022-05-05T18:48:17-e94fe09/linux-386-longtest
The following all predate the attempted fix at 7b1f8b6, so don't indicate an existing problem. |
Change https://go.dev/cl/404698 mentions this issue: |
Change https://go.dev/cl/404995 mentions this issue: |
The builder failures continue to point to doubled
|
The compiler may choose to inline multiple layers of function call, such that A calling B calling C may end up with all of the instructions for B and C written as part of A's function body. Within that function body, some PCs will represent code from function A. Some will represent code from function B, and for each of those the runtime will have an instruction attributable to A that it can report as its caller. Others will represent code from function C, and for each of those the runtime will have an instruction attributable to B and an instruction attributable to A that it can report as callers. When a profiling signal arrives at an instruction in B (as inlined in A) that the runtime also uses to describe calls to C, the profileBuilder ends up with an incorrect cache of allFrames results. That PC should lead to a location record in the profile that represents the frames B<-A, but the allFrames cache's view should expand the PC only to the B frame. Otherwise, when a profiling signal arrives at an instruction in C (as inlined in B in A), the PC stack C,B,A can get expanded to the frames C,B<-A,A as follows: The inlining deck starts empty. The first tryAdd call proposes PC C and frames C, which the deck accepts. The second tryAdd call proposes PC B and, due to the incorrect caching, frames B,A. (A fresh call to allFrames with PC B would return the frame list B.) The deck accepts that PC and frames. The third tryAdd call proposes PC A and frames A. The deck rejects those because a call from A to A cannot possibly have been inlined. This results in a new location record in the profile representing the frames C<-B<-A (good), as called by A (bad). The bug is the cached expansion of PC B to frames B<-A. That mapping is only appropriate for the resulting protobuf-format profile. The cache needs to reflect the results of a call to allFrames, which expands the PC B to the single frame B. For #50996 For #52693 Fixes #52764 Change-Id: I36d080f3c8a05650cdc13ced262189c33b0083b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/404995 Reviewed-by: Bryan Mills <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Rhys Hiltner <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
No failures here since May 13, |
The failures I've seen so far are on android/386 and openbsd/amd64. Both show a sample starting in
runtime.mcall
(which I think is important to the bug) and leading to how a preemption is reported in the execution trace (which I think is less important). I'll take a look at what mcall means for getg(), and getg().m.p.ptr().CC @bcmills (you'll see these when you do the next dashboard triage).
2022-05-03T21:11:24-23f1325/android-386-emu
2022-05-03T21:11:24-23f1325/openbsd-amd64-70
The text was updated successfully, but these errors were encountered: