runtime: GC turning on/off disruptive to scheduler and locality #20307
Comments
CL https://golang.org/cl/43110 mentions this issue.
Updates #20307

With -c=2:

```
name        old time/op       new time/op       delta
Template    140ms ± 3%        139ms ± 4%        -1.06%  (p=0.003 n=50+50)
Unicode     81.1ms ± 4%       81.9ms ± 4%       +0.96%  (p=0.006 n=50+49)
GoTypes     375ms ± 3%        374ms ± 3%        ~       (p=0.094 n=48+48)
Compiler    1.69s ± 2%        1.68s ± 2%        -0.41%  (p=0.004 n=49+48)
SSA         3.05s ± 1%        3.05s ± 2%        ~       (p=0.953 n=47+49)
Flate       86.3ms ± 2%       85.9ms ± 2%       -0.49%  (p=0.011 n=49+48)
GoParser    99.5ms ± 3%       99.3ms ± 3%       ~       (p=0.394 n=48+49)
Reflect     262ms ± 3%        261ms ± 3%        ~       (p=0.354 n=47+49)
Tar         81.4ms ± 3%       79.7ms ± 4%       -1.98%  (p=0.000 n=47+50)
XML         133ms ± 3%        133ms ± 3%        ~       (p=0.992 n=50+49)
[Geo mean]  236ms             235ms             -0.36%

name        old user-time/op  new user-time/op  delta
Template    249ms ± 5%        242ms ± 7%        -2.61%  (p=0.000 n=48+50)
Unicode     111ms ± 4%        111ms ± 6%        ~       (p=0.407 n=46+47)
GoTypes     753ms ± 2%        748ms ± 3%        -0.65%  (p=0.010 n=48+50)
Compiler    3.28s ± 2%        3.27s ± 2%        -0.40%  (p=0.026 n=49+47)
SSA         7.03s ± 2%        7.01s ± 3%        ~       (p=0.154 n=45+50)
Flate       154ms ± 3%        154ms ± 3%        ~       (p=0.306 n=49+49)
GoParser    180ms ± 4%        179ms ± 4%        ~       (p=0.148 n=48+48)
Reflect     427ms ± 2%        428ms ± 3%        ~       (p=0.502 n=46+49)
Tar         142ms ± 5%        135ms ± 9%        -4.83%  (p=0.000 n=46+50)
XML         247ms ± 3%        247ms ± 4%        ~       (p=0.921 n=49+49)
[Geo mean]  426ms             422ms             -0.92%
```

Change-Id: I4746234439ddb9a7e5840fc783b8857da6a4a680
Reviewed-on: https://go-review.googlesource.com/43110
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Interesting, thanks for the traces.

What HW is this running on? I ask since the lower-level cache is probably shared, and switching HW threads isn't a big deal if they are on the same core.

That said, the GC's tracing is likely to do far more to smash the cache and TLB than moving goroutines between cores on the same chip.
It's running on my laptop (8-core 2.9 GHz Intel Core i7, macOS), although presumably this happens on all hardware, not just mine. If it's easy to hack together a patch (unsightly or not) that experiments with remembering G locations, I'm happy to benchmark it. The concurrent compiler seems like a good test case for it: NumCPU long-running, memory-heavy calculations.
Perhaps so. Seems worth an experiment, if feasible.
@josharian Do you know why the buffered channel makes such a big difference? Is it because a lot of time is wasted (for reasons not yet understood) on context switching in the unbuffered case, and less context switching is needed when there's a buffered channel?
@griesemer I think that's exactly it. Those gaps in the first trace are (I suspect) caused by waiting for the scheduler to coordinate handing over work to a new worker, which requires scheduling both the producer and the consumer. The buffered channel means that (to a first approximation) there's always work available for a worker without needing the scheduler, thus no gaps.
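To make the mechanism concrete, here is a minimal sketch of the producer/worker pattern under discussion, written from scratch rather than taken from the compiler; the `run` function, its parameters, and the trivial workload in `main` are illustrative assumptions.

```go
package main

import (
	"runtime"
	"sync"
)

// run distributes work items to NumCPU workers through a channel. With
// an unbuffered channel, every handoff is a rendezvous: the producer
// blocks until the scheduler pairs it with a receiving worker. With a
// buffer, the producer can run ahead, so an idle worker usually finds
// work already queued and never has to wait on the scheduler.
func run(work []func(), buffered bool) {
	capacity := 0
	if buffered {
		capacity = len(work) // producer never blocks on send
	}
	ch := make(chan func(), capacity)

	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for fn := range ch {
				fn() // e.g. compile one function
			}
		}()
	}
	for _, fn := range work {
		ch <- fn // unbuffered: blocks here until a worker is ready
	}
	close(ch)
	wg.Wait()
}

func main() {
	work := make([]func(), 64)
	for i := range work {
		work[i] = func() {} // stand-in for a memory-heavy computation
	}
	run(work, true) // flip to false to observe the unbuffered variant
}
```

Sizing the buffer to the whole work list is the simplest way to guarantee the producer never blocks; a smaller buffer would trade memory for occasional rendezvous.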
If I'm reading the goroutine line in the trace correctly, there are runnable goroutines that aren't being run during these gaps. If all the goroutines were blocked on some channel, none would be runnable. We need to understand this better regardless of whether the buffered-channel workaround solves the compiler's issue.

As for adding CPU affinity to our goroutines, I'm a bit nervous. Not only would it complicate the Go scheduler, but more importantly it would increase interaction between Go and the various OS schedulers we support. There are a lot of layers between a goroutine and the HW thread it runs on.
That's correct.
I'm not suggesting CPU affinity. (At least, I don't think I am.) I'm suggesting that when we go into STW, we record which Ps are running which Gs, and put those Gs back on those Ps when we leave STW. That's it. But maybe that does involve breaking many layers of abstraction; if so, I understand your reluctance.
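As a thought experiment, the record-and-restore idea might look something like the sketch below. Everything in it is hypothetical: `g`, `p`, `stopTheWorld`, and `startTheWorld` are drastic simplifications standing in for the runtime's real structures and functions in runtime/proc.go, not the actual scheduler code.

```go
package stwsketch

// Hypothetical illustration only; not the runtime's real types.
type g struct{ id int64 }

type p struct {
	running *g   // G currently executing on this P, if any
	runq    []*g // local run queue
}

// stwSnapshot remembers which G each P was running when the world stopped.
var stwSnapshot map[*p]*g

func stopTheWorld(allp []*p) {
	stwSnapshot = make(map[*p]*g, len(allp))
	for _, pp := range allp {
		if pp.running != nil {
			stwSnapshot[pp] = pp.running // record the preempted G
			pp.running = nil
		}
	}
	// ... GC or other stop-the-world work happens here ...
}

func startTheWorld(allp []*p) {
	for _, pp := range allp {
		if gp, ok := stwSnapshot[pp]; ok {
			// Requeue the preempted G at the *front* of its old P's run
			// queue (the regular preemption path puts it at the back),
			// so it resumes on the same P, with warmer caches.
			pp.runq = append([]*g{gp}, pp.runq...)
		}
	}
	stwSnapshot = nil
}
```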
We do that already, actually. STW leaves the per-P run queues in place. However, it does kick the currently running G to the end of the P's run queue (because it's using the regular preemption path), which is probably why it doesn't look like we have affinity in the execution traces. It might be relatively easy to hack in a fix for that, but it won't matter much, because we definitely don't maintain P-to-M affinity over STW, so the goroutine will almost certainly be resumed on a different OS thread even if it's still on the same P. That's harder to fix.
Yes, and the goroutine shuffling during GC periods, combined with a NUMA architecture, will make things even worse.
I generated a trace of the concurrent compiler backend, and see some unfortunate things in it.
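(For anyone wanting to reproduce this kind of trace: one generic way, which may or may not be how this particular trace was captured, is to wrap the workload with the runtime/trace package and open the result with `go tool trace`.)

```go
// Capture an execution trace of a workload, then inspect it with:
//   go tool trace trace.out
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	workload() // the code whose scheduling behavior you want to see
}

func workload() {
	// Placeholder for the real work, e.g. a compiler-like fan-out of
	// NumCPU long-running, memory-heavy goroutines.
}
```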
Screenshot:
On the left hand side of this screenshot is the beginning of concurrent backend compilation. It continues off-screen to the right. The disconcerting things I see are:
For reference, the concurrency (fan-out, etc.) is all contained in a single function, compileFunctions, in pgen.go: https://github.com/golang/go/blob/master/src/cmd/compile/internal/gc/pgen.go#L255
Apologies if this is a duplicate. I can split this into two issues if that would be helpful.
cc @aclements @RLH