can't start nginx image with runc and user namespaces (works with crun) #4475
Could you confirm that it works with crun?
root@iZj6cgggwb62cxurec74geZ:/opt/bb# crun/crun run -d test
cat: can't open '/dev/stderr': Permission denied
root@iZj6cgggwb62cxurec74geZ:/opt/bb# crun/crun delete test
root@iZj6cgggwb62cxurec74geZ:/opt/bb# /root/go/src/github.com/opencontainers/runc/runc run -d test
cat: can't open '/dev/stderr': Permission denied
root@iZj6cgggwb62cxurec74geZ:/opt/bb# /root/go/src/github.com/opencontainers/runc/runc delete test
root@iZj6cgggwb62cxurec74geZ:/opt/bb#
Maybe the core reason is that the permission of the
dr-x------ 2 root root 4 Oct 26 01:10 .
dr-xr-xr-x 9 root root 0 Oct 26 01:10 ..
lr-x------ 1 root root 64 Oct 26 01:10 0 -> pipe:[2264321]
l-wx------ 1 root root 64 Oct 26 01:10 1 -> pipe:[2264322]
l-wx------ 1 root root 64 Oct 26 01:10 2 -> pipe:[2264323]
lr-x------ 1 root root 64 Oct 26 01:10 3
ls: /proc/self/fd/3: cannot read link: No such file or directory
But in the host:
dr-x------ 2 root root 4 Oct 26 09:28 .
dr-xr-xr-x 9 root root 0 Oct 26 09:28 ..
lrwx------ 1 root root 64 Oct 26 09:28 0 -> /dev/pts/0
lrwx------ 1 root root 64 Oct 26 09:28 1 -> /dev/pts/0
lrwx------ 1 root root 64 Oct 26 09:28 2 -> /dev/pts/0
lr-x------ 1 root root 64 Oct 26 09:28 3 -> /proc/629444/fd
The content of config.json:
{
"ociVersion": "1.0.2-dev",
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"cat", "/dev/stderr"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"ambient": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
"root": {
"path": "rootfs",
"readonly": true
},
"hostname": "runc",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
],
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
]
},
"namespaces": [
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "cgroup"
},
{
"type": "user"
}
],
"uidMappings": [{"hostID": 100000, "containerID": 0, "size": 65534}],
"gidMappings": [{"hostID": 100000, "containerID": 0, "size": 65534}],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
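The uidMappings entry in this config is why container root cannot touch the shim's pipes. A minimal Go sketch of the ID arithmetic, assuming only the single mapping shown above (the IDMap type and the toHost/toContainer helpers are hypothetical, not runc's API): container uid 0 becomes host uid 100000, and host uid 0 has no identity inside the container at all.

```go
package main

import (
	"errors"
	"fmt"
)

// IDMap mirrors one entry of "uidMappings" / "gidMappings" in config.json.
type IDMap struct {
	ContainerID uint32
	HostID      uint32
	Size        uint32
}

var errUnmapped = errors.New("id not mapped")

// toHost translates an ID seen inside the container to the host ID it maps to.
func toHost(maps []IDMap, containerID uint32) (uint32, error) {
	for _, m := range maps {
		if containerID >= m.ContainerID && containerID-m.ContainerID < m.Size {
			return m.HostID + (containerID - m.ContainerID), nil
		}
	}
	return 0, errUnmapped
}

// toContainer is the reverse lookup: which container ID, if any, a host ID has.
func toContainer(maps []IDMap, hostID uint32) (uint32, error) {
	for _, m := range maps {
		if hostID >= m.HostID && hostID-m.HostID < m.Size {
			return m.ContainerID + (hostID - m.HostID), nil
		}
	}
	return 0, errUnmapped
}

func main() {
	// The single uidMappings entry from the config.json above.
	uidMaps := []IDMap{{ContainerID: 0, HostID: 100000, Size: 65534}}

	if h, err := toHost(uidMaps, 0); err == nil {
		fmt.Println("container uid 0 is host uid", h)
	}
	// Host uid 0 (the shim, which owns the stdio pipes) has no
	// identity inside the container.
	if _, err := toContainer(uidMaps, 0); err != nil {
		fmt.Println("host uid 0:", err)
	}
}
```

Running it prints "container uid 0 is host uid 100000" followed by "host uid 0: id not mapped". For a running container, the kernel exposes the same ranges in /proc/<pid>/uid_map.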
We should make stdio accessible in a userns container. Please see opencontainers#4475. Because the default permission of stdio is 0o700, other users can't access it. If we don't change the permission to 0o666, we get an error message when accessing stdio in a userns container: ***: /dev/std***: Permission denied. Signed-off-by: lifubang <[email protected]>
This is not the real reason; the core reason is that the permission of
I think we need to go back to square one with this. What we know now is:
What we don't know:
I think we need to have some sort of a reproducer first.
It should also be noted that file passthrough for stdio is something folks should avoid if possible. It's only really useful if you are using runc directly from shell scripts and don't want to use runc as a stdio forwarder (maybe you want to give a container a proper terminal or a network socket as stdio). Within the context of containerd and Kubernetes it makes very little sense to use this mode. Using consoles improves container isolation, and all of these higher-level tools have built-in support for handling consoles.
It seems that if TTY is disabled, containerd-shim creates a pipe and passes the writable fd through to runc. The init process then inherits that fd as stdout or stderr and can write data to it. However, the pipe's inode belongs to the (host) root user, so when the init process tries to open it (for example via /dev/stderr), it doesn't have permission to access the pipe.
In contrast, the TTY fd is created by runc init within the user namespace, so the nginx container is able to open that file. cc @rata
Yes, I think so. If the target of the stderr/stdout link in runc were owned by the user namespace, this issue would not occur.
One other question (I forgot) is why this works with crun. AFAIK it uses the same shim as runc.
Looking at the crun implementation, it uses
Do you think we should change
This was done in containers/crun#755 to fix containers/crun#1019. I tried the repro from there with the current runc head, and it works:
podman run --rm --userns=auto:size=65534 --runtime=`pwd`/runc alpine sh -c 'echo hello >> /dev/stdout'
hello
[kir@kir-tp1 runc]$ ./runc --version
runc version 1.2.0+dev
commit: v1.2.0-22-g4ad9f7fd
spec: 1.2.0
go: go1.23.2
libseccomp: 2.5.5
So it's not it, I guess.
I guess you ran crun first and then tested runc in the same terminal. The owner of that terminal's stdio had already been changed by crun.
If the root user in the container is different from the current root user, we need to change the owner of stdio before we enter the user namespace; otherwise we may not be able to access stdio in the container. Please see opencontainers#4475. Signed-off-by: lifubang <[email protected]>
If the user in the container is different from the current user, we need to change the owner of stdio before we enter the user namespace; otherwise we may not be able to access stdio in the container. Please see opencontainers#4475. Signed-off-by: lifubang <[email protected]>
@fuweid @kolyshkin @lifubang Thank you very much for investigating this! Sorry, I was sick and got hit harder than I expected. I want to use this message to give you the debugging ninja #fixallthethings award! :-D @fuweid is that output you pasted also retsnoop?
Yes. Based on the result, I realized that the stdio pipe is created by the shim and is owned by root.
Description
Starting a Kubernetes container with userns using the official nginx image fails. This was reported in containerd/containerd#10598 by @ctrox.
@ctrox also found a workaround: adding "tty: true" to the Kubernetes pod makes it work.
A simpler repro: a container with userns that just runs "cat /dev/stderr" also fails with permission denied.
I guess you need to run it detached (as containerd does) to hit this; otherwise it uses your shell's stdio, and that probably works.
@ctrox thanks for the great bug report!
Sorry for the brevity, I'm sick ATM. I'll add more info when I recover.
Steps to reproduce the issue
No response
Describe the results you received and expected
Works, as without user namespaces.
What version of runc are you using?
runc 1.2.0
Host OS information
No response
Host kernel information
No response