Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/trace: "preempted" StateTransition sometimes has Stack of single zeroed StackFrame #68090

Open
rhysh opened this issue Jun 20, 2024 · 5 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Jun 20, 2024

Go version

go version devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000 darwin/arm64

Output of go env in your module/workspace:

$ go env -changed

$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/rhysh/Library/Caches/go-build'
GOENV='/Users/rhysh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/rhysh/work/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/rhysh/work'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/rhysh/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/pw/d_qmtcrd3vs0890gvmrq8qx80000gn/T/go-build2075620848=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

$ go test net/http -run='^$' -bench='BenchmarkClientServerParallel/64/h1' -benchtime=100ms -trace=/tmp/trace
goos: darwin
goarch: arm64
pkg: net/http
cpu: Apple M1
BenchmarkClientServerParallel/64/h1-8               1748             64048 ns/op           22199 B/op        131 allocs/op
--- BENCH: BenchmarkClientServerParallel/64/h1-8
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59008->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59007->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59010->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59013->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59014->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59015->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59016->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59017->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59018->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59020->127.0.0.1:58995: read: connection reset by peer
        ... [output truncated]
PASS
ok      net/http        0.404s
$ go tool trace -d=1 /tmp/trace | grep -B 2 -A 2 '@ 0x0' | head -n 30
M=6163918848 P=0 G=25 StateTransition Time=775708360289024 Resource=Goroutine(25) Reason="preempted" GoID=25 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--
M=6163918848 P=0 G=26 StateTransition Time=775708372081984 Resource=Goroutine(26) Reason="preempted" GoID=26 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--
M=6162198528 P=7 G=2881 StateTransition Time=775708382371201 Resource=Goroutine(2881) Reason="preempted" GoID=2881 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--

What did you see happen?

Some StateTransition Events include a Stack and StateTransition.Stack that are not equal to NoStack, but which also don't contain a stack from the Event's goroutine. Instead, they yield a single zeroed StackFrame (PC of 0x0, Line of 0, File and Func of "").

I've only seen this on Running->Runnable transitions, with Reason="preempted".

It's also present in go1.22.4.

Here's the sort of stack I'd expect to see from that execution trace's view of goroutines 25, 26, and 2881:

$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=25 StateTransition/,/^M/ p' | head -n 50
[snip]
M=6162198528 P=4 G=25 StateTransition Time=775708359254528 Resource=Goroutine(25) Reason="system goroutine wait" GoID=25 Running->Waiting
TransitionStack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

Stack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=26 StateTransition/,/^M/ p' | head -n 50
[snip]
M=8191703744 P=2 G=26 StateTransition Time=775708359184320 Resource=Goroutine(26) Reason="system goroutine wait" GoID=26 Running->Waiting
TransitionStack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

Stack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=2881 StateTransition/,/^M/ p' | head -n 50
M=6163345408 P=4 G=2881 StateTransition Time=775708380563968 Resource=Goroutine(2881) Reason="sync" GoID=2881 Running->Waiting
TransitionStack=
        sync.(*Mutex).Lock @ 0x100bf30ff
                /usr/local/go/src/sync/mutex.go:92
        sync.(*Pool).pinSlow @ 0x100bf3094
                /usr/local/go/src/sync/pool.go:227
        sync.(*Pool).pin @ 0x100bf301b
                /usr/local/go/src/sync/pool.go:220
        sync.(*Pool).Get @ 0x100bf2d6f
                /usr/local/go/src/sync/pool.go:135
        fmt.newPrinter @ 0x100c4dff3
                /usr/local/go/src/fmt/print.go:152
        fmt.Fprintf @ 0x100c4e467
                /usr/local/go/src/fmt/print.go:223
        net/http.(*Request).write @ 0x100e4aacb
                /usr/local/go/src/net/http/request.go:680
        net/http.(*persistConn).writeLoop @ 0x100e73df7
                /usr/local/go/src/net/http/transport.go:2522

Stack=
        sync.(*Mutex).Lock @ 0x100bf30ff
                /usr/local/go/src/sync/mutex.go:92
        sync.(*Pool).pinSlow @ 0x100bf3094
                /usr/local/go/src/sync/pool.go:227
        sync.(*Pool).pin @ 0x100bf301b
                /usr/local/go/src/sync/pool.go:220
        sync.(*Pool).Get @ 0x100bf2d6f
                /usr/local/go/src/sync/pool.go:135
        fmt.newPrinter @ 0x100c4dff3
                /usr/local/go/src/fmt/print.go:152
        fmt.Fprintf @ 0x100c4e467
                /usr/local/go/src/fmt/print.go:223
        net/http.(*Request).write @ 0x100e4aacb
                /usr/local/go/src/net/http/request.go:680
        net/http.(*persistConn).writeLoop @ 0x100e73df7
                /usr/local/go/src/net/http/transport.go:2522

[snip]

What did you expect to see?

I expected the stack to be trace.NoStack when no stack was available, or for the stack to contain PC/Func/File/Line corresponding to code that the goroutine had on its stack. I should not see PC of 0x0.

CC @mknyszek @golang/runtime

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 20, 2024
@mknyszek
Copy link
Contributor

I have a suspicion as to how this is happening, but not a complete picture yet.

The problematic transition you point out also happens from a thread stack. The two cases where such a transition may appear are gopreempt_m and goyield_m. The former is only called from newstack while the latter is only called from goyield which is on the sema path (for a direct handoff).

I suspect that in one of these paths, gp.sched is still empty somehow, though I'm not sure how that's possible. That's at least a case where traceStack may produce a length-1 buffer with a single zero PC in it.

@mknyszek
Copy link
Contributor

#68093 may be related, but I suspect not.

@mknyszek
Copy link
Contributor

mknyszek commented Jun 20, 2024

Here's a simple reproducer:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	trace.Start(f)

	go func() {
		for {
			// Non-stop preemption points.
			g()
		}
	}()

	runtime.GC()
	trace.Stop()
}

//go:noinline
func g() {

}

It disproves my theory about gp.sched being empty. Also, funnily enough, the goroutine created in main always shows up fine -- it's the GC mark worker that's the problem.

@mknyszek
Copy link
Contributor

mknyszek commented Jun 20, 2024

OK, I figured it out. It's that the stack trace has exactly 1 frame in it, but the skip count is also 1.

In the reproducer, the victim goroutine is the GC mark worker (just like in the original post) and when I lower the skip count from 1 to 0, I see:

M=683710 P=2 G=68 StateTransition Time=253871909721536 Resource=Goroutine(68) Reason="preempted" GoID=68 Running->Runnable
TransitionStack=
        runtime.gcMarkDone @ 0x416865
                /usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824

Stack=
        runtime.gcMarkDone @ 0x416865
                /usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824

Unfortunately, there's still the question as to why the bottom frame in the mark worker isn't showing up.

@joedian joedian added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 25, 2024
@griesemer griesemer added this to the Go1.24 milestone Jun 26, 2024
@gopherbot gopherbot modified the milestones: Go1.24, Go1.25 Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

6 participants