runtime: multi-arch build via qemu fails to exec go binary [1.23 backport] #69259

Closed
prattmic opened this issue Sep 4, 2024 · 10 comments
Labels: CherryPickApproved (used during the release process for point releases), compiler/runtime (issues related to the Go compiler and/or runtime)

@prattmic
Member

prattmic commented Sep 4, 2024

I am requesting issue #68976 to be considered for backport to the next 1.23 minor release.

We already had a backport (#68995) to make failure to start the telemetry subprocess non-fatal. Unfortunately we misunderstood the problem (thinking that telemetry was special), when the real problem here is that any use of os.StartProcess / os/exec is broken under QEMU user mode >= 7.2, <8.0.

FWIW, QEMU user mode is not a first-class supported OS. As I understand it, only real Linux is first class; QEMU user mode is an alternative Linux implementation. There is a workaround: use a QEMU release older than 7.2, or 8.0 or newer. That said, Debian bookworm ships QEMU 7.2 in its apt repository, which I believe is why so many people are on such a narrow window of QEMU releases.
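To make the affected window concrete, here is a minimal sketch of the version range described above (7.2 or newer, but older than 8.0). The helper `qemuAffected` and its signature are hypothetical and exist only for illustration; nothing like it appears in the Go tree.

```go
package main

import "fmt"

// qemuAffected reports whether a QEMU user-mode release falls in the window
// discussed in this issue: >= 7.2 and < 8.0.
// Hypothetical helper, not part of Go or QEMU.
func qemuAffected(major, minor int) bool {
	return major == 7 && minor >= 2
}

func main() {
	// One version below the window, one inside it, one above it.
	for _, v := range [][2]int{{7, 1}, {7, 2}, {8, 0}} {
		fmt.Printf("QEMU %d.%d affected: %v\n", v[0], v[1], qemuAffected(v[0], v[1]))
	}
}
```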

@prattmic prattmic added CherryPickCandidate Used during the release process for point releases compiler/runtime Issues related to the Go compiler and/or runtime. labels Sep 4, 2024
@prattmic prattmic added this to the Go1.23.2 milestone Sep 4, 2024
@gopherbot
Contributor

Change https://go.dev/cl/592078 mentions this issue: os: add clone(CLONE_PIDFD) check to pidfd feature check

@gopherbot
Contributor

Change https://go.dev/cl/612218 mentions this issue: [release-branch.go1.23] os: add clone(CLONE_PIDFD) check to pidfd feature check

@prattmic prattmic reopened this Sep 12, 2024
@aktau
Contributor

aktau commented Sep 26, 2024

We've noticed the new check sometimes hanging in production. On this line:

_, _, errno = Syscall6(SYS_WAITID, _P_PIDFD, uintptr(pidfd), 0, WEXITED, 0, 0)

For some reason, on QEMU the spawned process sometimes doesn't actually exit (or the wait fails to return). I wonder if this has also been observed by others.
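For readers unfamiliar with the call quoted above, this is a rough, Linux-only sketch of the same `waitid` invocation using only the exported `syscall` package. The helper name `waitPidfd` and the local constant `pPIDFD` are assumptions made for this example; the value 3 is `P_PIDFD` from the Linux headers, which the `syscall` package does not export.

```go
package main

import (
	"fmt"
	"syscall"
)

// pPIDFD mirrors P_PIDFD from <linux/wait.h>. It is declared locally because
// the syscall package keeps its own copy unexported.
const pPIDFD = 3

// waitPidfd blocks until the child referred to by pidfd exits, then reaps it.
// Hypothetical helper for illustration; Linux only.
func waitPidfd(pidfd int) error {
	// The siginfo (3rd) and rusage (5th) arguments are left nil, as in the
	// quoted line: only the wakeup matters, not the exit details.
	_, _, errno := syscall.Syscall6(syscall.SYS_WAITID, pPIDFD, uintptr(pidfd), 0, syscall.WEXITED, 0, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	// -1 is never a valid pidfd, so this returns an error instead of blocking.
	fmt.Println(waitPidfd(-1))
}
```

The hang reported here would correspond to this call never returning even though the child has already exited.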

@tianon
Contributor

tianon commented Oct 2, 2024

@aktau huh, I wonder if that's the same hang I was seeing in #68976 (comment) 🤔

@aktau
Contributor

aktau commented Oct 2, 2024

It's possible. Try sending a SIGQUIT if you see it again to see where it's stuck.

@mknyszek
Contributor

mknyszek commented Oct 2, 2024

@prattmic just clarifying, should the upstream issue be closed, so the backport can continue? Is there any reason not to backport this? I moved this into "Waiting For Info" so we can figure out how to move forward. Thanks.

@prattmic
Member Author

prattmic commented Oct 2, 2024

Yes, the upstream bug was intended to be closed. This CL should still be backported.

With regard to #69259 (comment), that may be a Google-specific issue, but I've proactively sent https://go.dev/cl/617417 since the fix is semantically equivalent anyway. I think we should backport that CL as well.

@gopherbot
Contributor

Change https://go.dev/cl/617716 mentions this issue: [release-branch.go1.23] syscall: use SYS_EXIT_GROUP in CLONE_PIDFD feature check child

armfazh added a commit to armfazh/circl that referenced this issue Oct 7, 2024
armfazh added a commit to cloudflare/circl that referenced this issue Oct 8, 2024
@cherrymui cherrymui added the CherryPickApproved Used during the release process for point releases label Oct 9, 2024
@gopherbot gopherbot removed the CherryPickCandidate Used during the release process for point releases label Oct 9, 2024
@cherrymui
Member

Approved for backport.

gopherbot pushed a commit that referenced this issue Oct 11, 2024
…ture check

clone(CLONE_PIDFD) was added in Linux 5.2 and pidfd_open was added in
Linux 5.3. Thus our feature check for pidfd_open should be sufficient to
ensure that clone(CLONE_PIDFD) works.

Unfortunately, some alternative Linux implementations may not follow
this strict ordering. For example, QEMU 7.2 (Dec 2022) added pidfd_open,
but clone(CLONE_PIDFD) was only added in QEMU 8.0 (Apr 2023).

Debian bookworm provides QEMU 7.2 by default.

For #68976.
Fixes #69259.

Change-Id: Ie3f3dc51f0cd76944871bf98690abf59f68fd7bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/592078
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
(cherry picked from commit 7a5fc9b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/612218
gopherbot pushed a commit that referenced this issue Oct 11, 2024
…ature check child

Inside Google we have seen issues with QEMU user mode failing to wake a
parent waitid when this child exits with SYS_EXIT. This bug appears to
not affect SYS_EXIT_GROUP.

It is currently unclear if this is a general QEMU bug or specific to
Google's configuration, but SYS_EXIT and SYS_EXIT_GROUP are semantically
equivalent here, so we can use the latter in case this is a general
QEMU bug.

For #68976.
For #69259.

Change-Id: I34e51088c9a6b7493a060e2a719a3cc4a3d54aa0
Reviewed-on: https://go-review.googlesource.com/c/go/+/617417
Reviewed-by: Ian Lance Taylor <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
(cherry picked from commit 47a9935)
Reviewed-on: https://go-review.googlesource.com/c/go/+/617716
@cherrymui
Member

CLs are submitted. Closing.
