Add initial support for RISC-V architecture#963

Merged
mvo5 merged 1 commit into osbuild:main from imguoguo:upstream
Aug 18, 2025

Conversation

@imguoguo
Contributor

Testing

Native Builds (Successful)

The build is successful with the following command on a riscv64 physical machine. (Note: since riscv64 is not yet an officially supported architecture in Fedora, the Containerfile is changed to use a community-maintained base image, docker.io/fedorariscv/base:42. Once Fedora officially supports riscv64, registry.fedoraproject.org/fedora:42 can be used directly. More details can be found in the following commit: link):

podman run \
    --rm \
    -it \
    --privileged \
    --pull=newer \
    --security-opt label=type:unconfined_t \
    -v $(pwd)/output:/output \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    docker.io/fedorariscv/bootc-image-builder:latest \
    --local \
    --rootfs ext4 \
    --progress verbose \
    --type qcow2 \
    --target-arch riscv64 \
    docker.io/fedorariscv/bootc:42

bootupctl Limitations

Currently, bootupctl cannot be used for the installation. Although rust-bootupd has merged riscv64 support in its codebase, a new version has not been tagged and released. This results in the following error:

> bootupctl backend install --write-uuid --device /dev/loop0 /run/osbuild/mounts
No components available for this platform.

A manual workaround is required: place grubriscv64.efi and grub.cfg in the /boot/efi/EFI/fedora directory, place a startup.nsh with the content FS0:\EFI\Fedora\grubriscv64.efi in the /boot/efi directory, and place another grub.cfg in the /boot directory to make the image bootable.

Validation on RISC-V Hardware

A bootable bootc image was successfully created on a riscv64 physical machine. It can be launched using QEMU after installing the necessary packages (dnf install -y edk2-riscv64 qemu-system-riscv).

QEMU Launch Command:

qemu-system-riscv64 -M virt,pflash0=pflash0,acpi=off \
    -m 6G -smp 4 \
    -nographic \
    -blockdev node-name=pflash0,read-only=on,driver=qcow2,file.driver=file,file.filename=/usr/share/edk2/riscv/RISCV_VIRT_CODE.qcow2 \
    -device virtio-net-device,netdev=usernet \
    -netdev user,id=usernet \
    -drive file=Fedora-Minimal-bootc-42-20250617000000.riscv64.QEMU.Generic.qcow2,id=hd0 -device virtio-blk-device,drive=hd0

For easy debugging, the default account and password are root / riscv.

Image Download Link: Fedora-Minimal-bootc-42-20250617000000.riscv64.QEMU.Generic.qcow2.gz

fastfetch information

[root@fedora ~]# fastfetch
             .',;::::;,'.                 root@fedora
         .';:cccccccccccc:;,.             -----------
      .;cccccccccccccccccccccc;.          OS: Fedora Linux 42 (Adams Prerelease) riscv64
    .:cccccccccccccccccccccccccc:.        Host: riscv-virtio,qemu
  .;ccccccccccccc;.:dddl:.;ccccccc;.      Kernel: Linux 6.14.0-63.fc42.riscv64
 .:ccccccccccccc;OWMKOOXMWd;ccccccc:.     Uptime: 1 min
.:ccccccccccccc;KMMc;cc;xMMc;ccccccc:.    Packages: 498 (rpm)
,cccccccccccccc;MMM.;cc;;WW:;cccccccc,    Shell: bash 5.2.37
:cccccccccccccc;MMM.;cccccccccccccccc:    Terminal: vt220
:ccccccc;oxOOOo;MMM000k.;cccccccccccc:    CPU: rv64gch (4)
cccccc;0MMKxdd:;MMMkddc.;cccccccccccc;    Memory: 280.27 MiB / 5.76 GiB (5%)
ccccc;XMO';cccc;MMM.;cccccccccccccccc'    Swap: Disabled
ccccc;MMo;ccccc;MMW.;ccccccccccccccc;     Disk (/): 6.84 MiB / 6.84 MiB (100%) - overlay [Read-only]
ccccc;0MNc.ccc.xMMd;ccccccccccccccc;      Disk (/sysroot): 3.21 GiB / 8.28 GiB (39%) - ext4 [Read-only]
cccccc;dNMWXXXWM0:;cccccccccccccc:,       Local IP (eth0): 10.0.*.*/24
cccccccc;.:odl:.;cccccccccccccc:,.        Locale: C.UTF-8
ccccccccccccccccccccccccccccc:'.
:ccccccccccccccccccccccc:;,..
 ':cccccccccccccccc::;,.
[root@fedora ~]#

Cross-Architecture Builds (Failed)

Attempting to build a riscv64 image from an x86_64 host currently fails (the image builder's base image is registry.fedoraproject.org/fedora:42). The build command:

podman run \
    --rm \
    -it \
    --privileged \
    --pull=newer \
    --security-opt label=type:unconfined_t \
    -v $(pwd)/output:/output \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    docker.io/fedorariscv/bootc-image-builder:latest \
    --local \
    --rootfs ext4 \
    --progress verbose \
    --type qcow2 \
    --target-arch riscv64 \
    docker.io/fedorariscv/bootc:42

fails with the following error messages:

Error: failed to get new shm lock manager: failed to create 2048 locks in /libpod_lock: operation not supported

Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.bootc.install-to-filesystem", line 53, in <module>
    r = main(args["options"], args["inputs"], args["paths"])
  File "/run/osbuild/bin/org.osbuild.bootc.install-to-filesystem", line 48, in main
    subprocess.run(pargs, env=env, check=True)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/subprocess.py", line 577, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bootc', 'install', 'to-filesystem', '--source-imgref', 'containers-storage:[overlay@/run/osbuild/containers/storage+/run/containers/storage]2c1b8672d2be7f49f7c21520b14c438bef93b981c2a1768db9004150bed33a24', '--skip-fetch-check', '--generic-image', '--karg', 'rw', '--karg', 'console=tty0', '--karg', 'console=ttyS0', '--target-imgref', 'docker.io/fedorariscv/bootc:42', '/run/osbuild/mounts']' returned non-zero exit status 1.

Contributor

@mvo5 mvo5 left a comment


Thank you! This looks nice! Ideally we would have a proper test for this, but given that it's not an official Fedora arch yet, it's fine.

What diff do I need to apply to reproduce the cross-arch build failure btw? Does the normal "aarch64" (arm64) cross arch build work for you? The error indicates it might be related to the environment this runs in but I would like to reproduce to be sure.

@imguoguo
Contributor Author

> Thank you! This looks nice! Ideally we would have a proper test for this, but given that it's not an official Fedora arch yet, it's fine.
>
> What diff do I need to apply to reproduce the cross-arch build failure btw? Does the normal "aarch64" (arm64) cross arch build work for you? The error indicates it might be related to the environment this runs in but I would like to reproduce to be sure.

Thank you for taking the time to review this!

My apologies, I should have tested aarch64 first. I just ran a test for the aarch64 cross-build, and it seems to fail as well, though with a different error message. It might be due to my local environment or incorrect parameters on my end. I will continue to investigate this.

Here is the command I used:

sudo podman run \
   --rm \
   -it \
   --privileged \
   --pull=newer \
   --security-opt label=type:unconfined_t \
   -v $(pwd)/output:/output \
   -v /var/lib/containers/storage:/var/lib/containers/storage \
   quay.io/centos-bootc/bootc-image-builder:latest \
   --local \
   --rootfs ext4 \
   --progress verbose \
   --type qcow2 \
   --target-arch aarch64 \
   quay.io/fedora/fedora-bootc:42-aarch64

And the error was:

Installing image: docker://quay.io/fedora/fedora-bootc:42-aarch64
Initializing ostree layout
ERROR Installing to filesystem: Creating ostree deployment: Creating importer: Function not implemented (os error 38)
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.bootc.install-to-filesystem", line 53, in <module>
    r = main(args["options"], args["inputs"], args["paths"])
  File "/run/osbuild/bin/org.osbuild.bootc.install-to-filesystem", line 48, in main
    subprocess.run(pargs, env=env, check=True)
  File "/usr/lib64/python3.13/subprocess.py", line 577, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bootc', 'install', 'to-filesystem', '--source-imgref', 'containers-storage:[overlay@/run/osbuild/containers/storage+/run/containers/storage]fe87f59ec47e4d29b7479d84f16644f2578cbdb24de2b9097cd3065c603c75f0', '--skip-fetch-check', '--generic-image', '--karg', 'rw', '--karg', 'console=tty0', '--karg', 'console=ttyS0', '--target-imgref', 'quay.io/fedora/fedora-bootc:42-aarch64', '/run/osbuild/mounts']' returned non-zero exit status 1.

To reproduce the original riscv64 cross-build failure, the commit in this PR is all you need. The key is to ensure the command uses the docker.io/fedorariscv/bootc:42 bootc container image and sets the target architecture to riscv64. Run podman pull docker.io/fedorariscv/bootc:42 first, then build bootc-image-builder with the commit in this PR, and then run (note that the builder image below needs to be changed to your locally built one):

podman run \
    --rm \
    -it \
    --privileged \
    --pull=newer \
    --security-opt label=type:unconfined_t \
    -v $(pwd)/output:/output \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    docker.io/fedorariscv/bootc-image-builder:latest \
    --local \
    --rootfs ext4 \
    --type qcow2 \
    --target-arch riscv64 \
    docker.io/fedorariscv/bootc:42

The most important arguments are:

  --rootfs ext4 \
  --type qcow2 \
  --target-arch riscv64 \
  docker.io/fedorariscv/bootc:42

Thanks again for your help!

@mvo5
Contributor

mvo5 commented Jun 24, 2025

Thanks for the extra info, that was most helpful! I can reproduce the issue now:
bib-riscv64-cross-build-test.diff.txt

(interestingly the cross-arch aarch64 test is working for f42)

With the above diff applied I can run the cross build in pytest and get:

$ sudo pytest -s -v ./test/test_build_cross.py::test_image_boots_cross[container_ref=docker.io/fedorariscv/bootc:42,image=raw,rootfs=ext4,target_arch=riscv64]
...
Installing image: docker://docker.io/fedorariscv/bootc:42
Unsupported syscall: 435
Unsupported syscall: 435
Unsupported syscall: 435
Unsupported syscall: 435
Unsupported syscall: 435
Initializing ostree layout
Unsupported syscall: 293
Unsupported ioctl: cmd=0xffffffffc0046686
ERROR Installing to filesystem: Creating ostree deployment: Creating imgstorage: Initializing images: Subprocess failed: ExitStatus(unix_wait_status(32000))
Unsupported syscall: 293
Unsupported target signal #61, ignored
Unsupported target signal #62, ignored
Unsupported target signal #63, ignored
Unsupported target signal #64, ignored
Unsupported syscall: 435
Unsupported syscall: 277
Error: failed to get new shm lock manager: failed to create 2048 locks in /libpod_lock: operation not supported

The ioctl failure cmd=0xffffffffc0046686 is https://docs.kernel.org/filesystems/fsverity.html#fs-ioc-measure-verity
The syscalls are:

$ scmp_sys_resolver -a riscv64 277
seccomp
$ scmp_sys_resolver -a riscv64 293
rseq
$ scmp_sys_resolver -a riscv64 435
clone3
$ scmp_sys_resolver -a riscv64 430
fsopen

It looks like the error happens in https://github.com/containers/podman/blob/main/libpod/lock/shm/shm_lock.c#L76

[edit:
with a small test program and

~/devel/containers/podman$ git diff && cat cmd/1/main.go && CGO_ENABLED=1 GOARCH=riscv64 go build ./cmd/1 &&  ./1
diff --git a/libpod/lock/shm/shm_lock.c b/libpod/lock/shm/shm_lock.c
index 397fb8993..61bdb9b77 100644
--- a/libpod/lock/shm/shm_lock.c
+++ b/libpod/lock/shm/shm_lock.c
@@ -9,6 +9,8 @@
 #include <sys/types.h>
 #include <unistd.h>
 
+#include <stdio.h>
+
 #include "shm_lock.h"
 
 // Compute the size of the SHM struct
@@ -169,6 +171,8 @@ shm_struct_t *setup_lock_shm(char *path, uint32_t num_locks, int *error_code) {
   // Initialize the mutex that protects the bitmaps using the mutex attributes
   ret_code = pthread_mutex_init(&(shm->segment_lock), &attr);
   if (ret_code != 0) {
+    printf("bad pthread_mutex_init()\n");
+
     *error_code = -1 * ret_code;
     goto CLEANUP_FREEATTR;
   }
---
package main

import (
	"fmt"

	"github.com/containers/podman/v4/libpod/lock/shm"
)

func main() {
	l, err := shm.CreateSHMLock("/libpod_test3", 2048)
	fmt.Println(l, err)
}
---
bad pthread_mutex_init()
<nil> failed to create 2048 locks in /libpod_test3: operation not supported

so that is the culprit.]

[edit2: it looks like the riscv64 implementation of pthread_mutex_init needs clone3 and qemu-user does not implement clone3 (yet), only the plain old clone()]

@mvo5
Contributor

mvo5 commented Jun 26, 2025

Adding (basic) clone3() to qemu-user is straightforward (cf. 0001-linux-user-add-basic-clone3-support.patch.txt), but this turned out to be a red herring.

However it still fails in my test; it seems https://github.com/containers/podman/blob/main/libpod/lock/shm/shm_lock.c#L153 is causing this. I don't see anything from qemu-user in the "unimp" logs, so it's a bit unclear where this is unsupported.

[edit: it seems it's the combination of PTHREAD_PROCESS_SHARED + PTHREAD_MUTEX_ROBUST that is not liked, which looks like https://github.com/bminor/glibc/blob/glibc-2.41/nptl/pthread_mutex_init.c#L96, and that is because the set_robust_list syscall is not available under qemu-user https://github.com/qemu/qemu/blob/v10.0.2/linux-user/syscall.c#L12930 (glibc checks for this via https://github.com/bminor/glibc/blob/glibc-2.41/sysdeps/nptl/dl-tls_init_tp.c#L93)]
[edit2: it's still mysterious because https://github.com/bminor/glibc/blob/glibc-2.41/sysdeps/unix/sysv/linux/riscv/kernel-features.h#L25 should unset this (unless it was built against old kernel headers, but according to the build log a 6.x linux-libc-dev was used, so this #if should not have triggered)]
[edit3: what is even more curious is that the test program works with arm qemu-user, even though arm also has a #if __LINUX_KERNEL_VERSION < 0x020620 (but with a much lower version)]
[edit4: so for some reason my libc6 does not have __ASSUME_SET_ROBUST_LIST set:
$ for libc in /usr/lib/*/libc.so.6; do echo "$libc:"; strings "$libc"| grep -n __nptl_set_robust_list_avail; done
/usr/lib/aarch64-linux-gnu/libc.so.6:
/usr/lib/arm-linux-gnueabihf/libc.so.6:
/usr/lib/arm-linux-gnueabi/libc.so.6:
/usr/lib/riscv64-linux-gnu/libc.so.6:
1963:__nptl_set_robust_list_avail
/usr/lib/x86_64-linux-gnu/libc.so.6:
]

@mvo5
Contributor

mvo5 commented Jun 26, 2025

So in summary, I am not sure if we can fix this with qemu-user (we could do what Colin suggested and do full-system emulation, which would make this problem go away, but it would be slower).

The issue is that

  1. __ASSUME_SET_ROBUST_LIST is not set during the glibc build (on riscv):
 $ podman run -it --arch riscv64 docker.io/fedorariscv/toolbox:42 bash -c "sudo dnf install -y binutils && strings /lib64/lp64d/libc.so.6|grep __nptl_set_robust_list_avail"
...
Complete!
__nptl_set_robust_list_avail
  2. this triggers a syscall for get/set_robust_list, which is not supported/supportable under qemu-user
  3. so either the riscv libc would have to be compiled differently or podman needs a fallback

Interestingly, on aarch64 on Fedora __ASSUME_SET_ROBUST_LIST is set:

$ podman run -it --arch aarch64 fedora:42 bash -c "sudo dnf install -y binutils && strings /*/*/libc.so.6|grep __nptl_set_robust_list_avail"
...
Complete!

(no output from strings, i.e. __nptl_set_robust_list_avail is absent)

[edit: I will try to make a case on the qemu-user mailing list to just implement set_robust_list as a NOP instead of the current -ENOSYS, based on the fact that on x86_64 and aarch64 this is effectively the behavior, as the result of the syscall is never checked when __ASSUME_SET_ROBUST_LIST is set. The point of robust lists is to avoid deadlocks, which on a real system require a reboot, but under qemu-user it just needs killing the qemu process, so I would argue that having the support is better than the (rare) case when it's not fully working (which can then be dealt with via kill). But let's see how this goes :)]

mvo5 added a commit to mvo5/osbuild that referenced this pull request Jun 26, 2025
This commit drops the `QEMU_LOG=+unimp` and replaces it with
`QEMU_LOG=unimp`. The `+` format does not work and we found
this in osbuild/bootc-image-builder#963 (comment)
thozza pushed a commit to osbuild/osbuild that referenced this pull request Jun 27, 2025
This commit drops the `QEMU_LOG=+unimp` and replaces it with
`QEMU_LOG=unimp`. The `+` format does not work and we found
this in osbuild/bootc-image-builder#963 (comment)
@mvo5
Contributor

mvo5 commented Jul 4, 2025

Fwiw, I submitted an RFC patch to the qemu-devel mailing list that returns "0" unconditionally for the set_robust_list() syscall. When I do this locally the podman locking works, so I'm 90% confident that it solves the problem. I have no sense of how likely it is that the patch gets accepted though (my guess is not very likely); it's arguably wrong, but in practice glibc does not care. It is a risk for non-glibc threading implementations, which would be a good reason to reject it (otoh most languages like Rust/Java use glibc it seems, Go does not use set_robust_list, and musl https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutex_trylock.c#n30 also does not check the return code).

[edit: https://lists.nongnu.org/archive/html/qemu-devel/2025-07/msg01553.html]

@imguoguo
Contributor Author

imguoguo commented Jul 4, 2025

> Fwiw, I submitted an RFC patch to the qemu-devel mailing list that returns "0" unconditionally for the set_robust_list() syscall. When I do this locally the podman locking works, so I'm 90% confident that it solves the problem. I have no sense of how likely it is that the patch gets accepted though (my guess is not very likely); it's arguably wrong, but in practice glibc does not care. It is a risk for non-glibc threading implementations, which would be a good reason to reject it (otoh most languages like Rust/Java use glibc it seems, Go does not use set_robust_list, and musl https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutex_trylock.c#n30 also does not check the return code).
>
> [edit: https://lists.nongnu.org/archive/html/qemu-devel/2025-07/msg01553.html]

Thank you so much for digging into this so deeply! To be honest, this level of system debugging is beyond my current knowledge, so I apologize that I can't be of more help with the QEMU patch discussion. I've pretty much reached the limit of my abilities after submitting the initial PR. Thank you again for moving this forward!

@mvo5
Contributor

mvo5 commented Jul 4, 2025

> Fwiw, I submitted an RFC patch to the qemu-devel mailing list that returns "0" unconditionally for the set_robust_list() syscall. When I do this locally the podman locking works, so I'm 90% confident that it solves the problem. I have no sense of how likely it is that the patch gets accepted though (my guess is not very likely); it's arguably wrong, but in practice glibc does not care. It is a risk for non-glibc threading implementations, which would be a good reason to reject it (otoh most languages like Rust/Java use glibc it seems, Go does not use set_robust_list, and musl https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutex_trylock.c#n30 also does not check the return code).
> [edit: https://lists.nongnu.org/archive/html/qemu-devel/2025-07/msg01553.html]
>
> Thank you so much for digging into this so deeply! To be honest, this level of system debugging is beyond my current knowledge, so I apologize that I can't be of more help with the QEMU patch discussion. I've pretty much reached the limit of my abilities after submitting the initial PR. Thank you again for moving this forward!

No worries, this is deep stuff! Your PR and help with the reproducer were super valuable, thanks for that!

@github-actions

github-actions bot commented Aug 4, 2025

This PR is stale because it had no activity for the past 30 days. Remove the "Stale" label or add a comment, otherwise this PR will be closed in 7 days.

@github-actions github-actions bot added the Stale Issue or PR with no activity for extended period of time label Aug 4, 2025
@github-actions

This PR was closed because it has been stalled for 30+7 days with no activity.

@github-actions github-actions bot closed this Aug 12, 2025
@mvo5 mvo5 reopened this Aug 18, 2025
Contributor

@mvo5 mvo5 left a comment


I would still like to merge this; it will not work in the cross-arch case because of the discussed qemu-user limitations (might be worth adding the link to the qemu-user patch in the commit message of the patch and/or in a comment in the code). But native building (or full system emulation) should work, so let's include it.

@mvo5 mvo5 enabled auto-merge August 18, 2025 15:53
@mvo5 mvo5 added this pull request to the merge queue Aug 18, 2025
Merged via the queue into osbuild:main with commit e5a3379 Aug 18, 2025
21 of 24 checks passed
