runtime: build fails when run via QEMU for linux/amd64 running on linux/arm64 #69255
Comments
I agree, it looks quite different. #68976 is very specific to pidfd use in os/syscall. This looks like some form of corruption. Do you know if this build is running a full Linux kernel in a VM, or using QEMU user mode Linux emulation? |
This seems like a sign extension issue when right shifting the packed value (See https://cs.opensource.google/go/go/+/master:src/runtime/lfstack.go;l=26-30, specifically I could imagine this being a code generation issue, or an issue in QEMU instruction emulation. cc @golang/compiler |
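To make the sign-extension suspicion concrete, here is a minimal sketch of packing an address and a small counter into one 64-bit word and unpacking it again. It is not the runtime's code: the names `pack`, `unpackSigned` and `unpackUnsigned` are hypothetical, and the constants (48 address bits, 8-byte alignment) are assumptions chosen to match the 48-bit layout discussed in this thread. It only illustrates how an arithmetic right shift changes the upper bits once bit 47 of the address is set.

```go
package main

import "fmt"

const (
	addrBits = 48                // assumed number of address bits
	tagBits  = 64 - addrBits + 3 // 19: spare high bits plus 3 alignment bits
)

// pack stores an 8-byte-aligned address in the upper bits of a word
// and a small counter in the lower tagBits bits.
func pack(addr, count uint64) uint64 {
	return addr<<(64-addrBits) | count&(1<<tagBits-1)
}

// unpackSigned recovers the address with an arithmetic (sign-extending)
// right shift, so bit 47 of the address is copied into bits 48..63.
func unpackSigned(packed uint64) uint64 {
	return uint64(int64(packed) >> tagBits << 3)
}

// unpackUnsigned recovers the address with a logical right shift,
// leaving bits 48..63 clear.
func unpackUnsigned(packed uint64) uint64 {
	return packed >> tagBits << 3
}

func main() {
	for _, addr := range []uint64{0x00007fffdeadbee8, 0x0000800000000008} {
		p := pack(addr, 5)
		fmt.Printf("addr=%#016x signed=%#016x unsigned=%#016x\n",
			addr, unpackSigned(p), unpackUnsigned(p))
	}
}
```

For an address with bit 47 clear both unpack variants round-trip. For an address with bit 47 set, only the upper bits differ between the two results, which is the shape of mismatch a miscompiled or mis-emulated shift would produce.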
Does the same issue occur on Go 1.22? |
Yes. Indeed similar looking stacks for 1.21.13, 1.22.6, 1.23.0. Confirmed via:
|
I'm miles out of my depth here, but in case this is useful:
|
... but just to be super clear, I'm doing this via Docker: https://docs.docker.com/build/building/multi-platform/#qemu (so I'm actually unsure whether the host system |
I will see if I can reproduce when I get a chance. As a workaround, do you actually need to do linux-amd64 builds via QEMU emulation? Go can cross-compile on its own well, though perhaps you have cgo dependencies that make it difficult? |
We did end up with a two-stage Dockerfile where the builder is on the host platform, cross-compiles to the target platform without cgo, and then the second stage builds an image for the target platform. So while we are not blocked by this bug as there's a workaround, it's probably worth keeping it open for a fix. |
We did some investigation for https://gitlab.com/qemu-project/qemu/-/issues/2560, and we suspect the fault comes down to aarch64 only having 47 or 39 bits of address space while the x86_64 GC assumes 48 bits. Under linux-user emulation we are limited by the host address space. However, I do note that 48 was chosen for all arches, so I wonder how this works on native aarch64 builds of Go? |
Thanks for taking a look! cc @mknyszek who can speak more definitively about the address space layout, but I don't think a smaller address space should be a problem. Go is pretty lenient about what it gets from mmap. I don't think we ever demand to be able to get a mapping with the 47th bit set. If you haven't already seen it, take a look at #69255 (comment). My suspicion is that this is some sort of sign-extension bug, given that the only difference between the expected and actual output is the value of the upper bits. |
That said, on further thought, the input address |
https://cs.opensource.google/go/go/+/master:src/runtime/malloc.go;l=149-210 this comment is about the heap address layout. We do use smaller address spaces on a few platforms, e.g. ios/arm64 is 40-bit, but the bits are set as constants so it would probably equally apply to native build and QEMU. (We could consider a qemu build tag?) |
Yes, we configure a larger heap address layout, but will anything break if the OS simply never returns addresses in the upper range? There isn't a case I can think of, provided our biggest mappings fit in the restricted address space. (Notice that amd64 configures 48-bit address space, even though Linux will only return addresses in the lower 47 bits) In gVisor, we would restrict the Go runtime to a 39-bit region of address space without problem or modification to the Go runtime. |
I think nothing would break if the OS never returns high addresses. The heapAddrBits is an upper limit, I think. |
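One way to check how much of the address space the OS (or emulator) actually hands out is to look at the addresses of a few allocations. The following is only a diagnostic sketch: `addrBitsUsed` and `widestAllocAddr` are hypothetical helper names, the allocation sizes are arbitrary, and a handful of samples says nothing about the upper limit the runtime is configured for.

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// addrBitsUsed reports the number of significant bits in the address p.
func addrBitsUsed(p unsafe.Pointer) int {
	return bits.Len64(uint64(uintptr(p)))
}

// widestAllocAddr allocates a few buffers of different sizes and returns
// the largest number of significant address bits seen among them.
func widestAllocAddr() int {
	widest := 0
	for _, size := range []int{16, 4 << 10, 1 << 20, 64 << 20} {
		buf := make([]byte, size)
		if n := addrBitsUsed(unsafe.Pointer(&buf[0])); n > widest {
			widest = n
		}
	}
	return widest
}

func main() {
	fmt.Printf("widest allocation address seen uses %d bits\n", widestAllocAddr())
}
```

Running the same binary natively and under QEMU user-mode emulation would show whether the emulated process receives addresses from a narrower range.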
Are there any runes for running the Go test cases? (Nothing jumped out at me.) If we can trigger the failure with a direct test case rather than deep in a Docker image, we can take a look at verifying the instruction behaviour. |
I have not personally reproduced, but in #69255 (comment) it is the compiler itself crashing, so theoretically it should reproduce by:
This will hopefully crash somewhere in the toolchain/compiler. That said, From outside QEMU (on any type of host), run
|
I stumbled upon this issue and found a solution (at least for my setup).

docker:
  runs-on: ubuntu-latest-arm64-kong # our private arm64 runner instance
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Set up QEMU
      uses: docker/setup-qemu-action@v3
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      with:
        install: true
    - name: Build mailbox Container
      uses: docker/build-push-action@v6
      with:
        context: .
        file: cmd/Dockerfile
        push: true
        cache-from: type=gha
        cache-to: type=gha,mode=max
        platforms: linux/amd64,linux/arm64
        tags: foo

ARG BUILDPLATFORM
# this is really important
FROM --platform=$BUILDPLATFORM golang:1.23-bullseye AS build
ARG TARGETARCH
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build \
    -o /build/my-binary ./cmd/main.go

So what happens in an Then when checking the build progress you'll notice those instructions:

[linux/arm64->amd64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build
[linux/arm64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build

Hope this helps. |
Possibly related, Go is failing to compile any non-trivial application on a Vultr virtual machine running FreeBSD as a guest on FreeBSD 14.1 and 14.2-RELEASE, tested on 1.21 and latest, 1.23.4. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283314 On real hardware, no issues compiling or running; however when I move the binary to the VM, unpredictable panics happen and eventually a seg fault in the application (mox, a full stack mail server). I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1 and this does seem to have a positive effect on compilation; I'm running mox compiled with this option and so far so good but it is unclear to me what the overall impact of this is. |
This usually indicates that the virtual machine (or the OS running on it) has some bug in handling asynchronous signals. You could probably test it with a C program that sends itself a lot of asynchronous signals. (See also #46272, and some test programs linked from it.) Are you also running an AMD64 VM instance on an ARM64 machine? |
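As a rough Go analogue of the self-signalling test suggested above (this is not one of the programs linked from #46272), the sketch below floods the process with SIGUSR1 while repeating a deterministic floating-point computation and counts results that differ from a reference value. The function names `work` and `stress`, the choice of SIGUSR1, and the loop sizes are all assumptions made for illustration. On a healthy system the mismatch count should stay at zero; a non-zero count would hint at register state being corrupted around signal delivery.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"os/signal"
	"syscall"
)

// work performs a deterministic floating-point computation whose result
// should be bit-identical on every call.
func work() float64 {
	sum := 0.0
	for i := 1; i <= 20000; i++ {
		sum += math.Sqrt(float64(i))
	}
	return sum
}

// stress sends n SIGUSR1 signals to the current process from a separate
// goroutine while repeatedly calling work, and returns how many results
// differed from the reference value.
func stress(n int) int {
	ch := make(chan os.Signal, 64)
	signal.Notify(ch, syscall.SIGUSR1) // catch the signal so it is harmless
	defer signal.Stop(ch)

	sent := make(chan struct{})
	go func() {
		defer close(sent)
		pid := syscall.Getpid()
		for i := 0; i < n; i++ {
			_ = syscall.Kill(pid, syscall.SIGUSR1)
		}
	}()

	want := work()
	mismatches := 0
	for {
		select {
		case <-sent:
			return mismatches
		case <-ch: // discard a delivered notification
		default:
		}
		if got := work(); got != want {
			mismatches++
		}
	}
}

func main() {
	fmt.Println("mismatches:", stress(100000))
}
```

The signal sender runs concurrently so that signals can arrive while the floating-point loop is in progress rather than only between iterations.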
The problem VM is an AMD64 VM instance on what appears to be AMD64; the provider is Vultr.com; the actual hw is said to be Xeon CPUs. Reported by the VM:
From #46272 I ran the @kostikbel 's The code runs without apparent issue (10 minutes each before I interrupted) on:
I first noted unusual behaviour on FreeBSD 14.1 on the VM in question with random panics that didn't make sense from a Go mail server (SMTP, IMAP etc) that I migrated in November to FreeBSD from Linux on that very same VM instance. There were no panics on Linux. cc @emaste @kostikbel from the runtime: possible memory corruption on FreeBSD issue. |
It sounds like you have more-or-less narrowed this down to a VMM bug on Vultr's side, likely related to save/restore of FPU state. If you have not already, you should definitely take this up with them. |
Go version
go version go1.23.0 linux/arm64
Output of go env in your module/workspace:
What did you do?
Given:
Running:
What did you see happen?
My setup: the host machine is linux/arm64 with QEMU installed, following the approach described at https://docs.docker.com/build/building/multi-platform/#qemu, to build for linux/amd64.
This has definitely worked in the past, which leads me to suggest that something other than Go has changed or been broken here. However, I note the virtually identical call stack reported in #54104, hence raising it here in the first instance.
What did you expect to see?
Successful run of docker build.