Error during unshare(CLONE_NEWUSER): Operation not permitted #1901

Closed
nmiculinic opened this issue Oct 8, 2019 · 39 comments

@nmiculinic

Description

I cannot run buildah bud

Steps to reproduce the issue:

docker run --rm -it ubuntu

Within the docker container I run the following:

https://github.com/containers/buildah/blob/master/install.md#ubuntu

root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# buildah bud -f Dockerfile  .
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO[0000] error parsing PID "": strconv.Atoi: parsing "": invalid syntax 
ERRO[0000] (unable to determine exit status)            
root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# buildah --version
buildah version 1.10.1 (image-spec 1.0.1, runtime-spec 1.0.1-dev)
root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# cat /proc/sys/user/max_user_namespaces
62901
root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# cat "/proc/sys/kernel/unprivileged_userns_clone"
1
root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# 

Describe the results you expected:

I expected everything to work out and the OCI image to build.

Output of rpm -q buildah or apt list buildah:

root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# apt list buildah
Listing... Done
buildah/bionic,now 1.10.1-1~ubuntu18.04~ppa1 amd64 [installed]

Output of buildah version:

buildah version 1.10.1 (image-spec 1.0.1, runtime-spec 1.0.1-dev)

Output of podman version if reporting a podman build issue:
not installed

Output of cat /etc/*release:

root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"

Output of uname -a:

root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# uname -a
Linux dbdb5cd66273 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

(( default one ))

root@dbdb5cd66273:/rootfs/ci/dockerfiles/test# cat /etc/containers/storage.conf
# storage.conf is the configuration file for all tools
# that share the containers/storage libraries
# See man 5 containers-storage.conf for more information

# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver
driver = "overlay"

# Temporary storage location
runroot = "/var/run/containers/storage"

# Primary read-write location of container storage
graphroot = "/var/lib/containers/storage"

[storage.options]
# AdditionalImageStores is used to pass paths to additional read-only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Size is used to set a maximum size of the container image.  Only supported by
# certain container storage drivers (currently overlay, zfs, vfs, btrfs)
size = ""

# OverrideKernelCheck tells the driver to ignore kernel checks based on kernel version
override_kernel_check = "true"
@rhatdan
Member

rhatdan commented Oct 8, 2019

We recommend that people running buildah within a locked down container use images from quay.io.
https://quay.io/repository/buildah/stable
Basically running straight buildah within a locked down container will fail, because the unshare command is blocked. We recommend using the --isolation=chroot, which eliminates the unshare call.

@nmiculinic
Author

It doesn't seem to help at all:

docker run --rm -it -v $(pwd):/rootfs quay.io/buildah/stable
[root@664c4f767a70 test]# buildah bud --isolation=chroot  -f Dockerfile  .  
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO error parsing PID "": strconv.Atoi: parsing "": invalid syntax 
ERRO (unable to determine exit status)            

Also it appears to be default isolation as well in the container.

@rhatdan
Member

rhatdan commented Oct 10, 2019

Could you try this with podman?
Also could you try
docker run --security-opt seccomp=/usr/share/containers/seccomp.json --rm -it -v $(pwd):/rootfs quay.io/buildah/stable

I think Docker might be blocking the unshare syscall.

@JamesWrigley

Not sure if this is the case on Ubuntu, but on Debian the kernel itself disables the unsharing: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808915.

I had to manually allow unprivileged users to unshare, get Docker to use Podman's seccomp profile, and then Buildah ran in the container. Using --isolation=chroot had no effect, unfortunately.

@rhatdan
Member

rhatdan commented Oct 15, 2019

I don't fully understand what you are saying. Did Buildah work or not work within the container?

@JamesWrigley

Yes it did (on a Debian host), once I ran:

echo 1 > /proc/sys/kernel/unprivileged_userns_clone

I'm not sure why this is necessary if --isolation=chroot eliminates the unshare call.

Then when using Podman's seccomp profile, Buildah worked in the container:

docker run --security-opt seccomp=/usr/share/containers/seccomp.json --rm -it quay.io/buildah/stable
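
The two conditions described above (the Debian/Ubuntu sysctl and an active seccomp filter) can be inspected before attempting a build. This is a minimal sketch, not part of Buildah: the function names userns_clone_state and seccomp_mode are hypothetical, and the sysctl file only exists on kernels carrying the Debian patch. A seccomp mode of 2 only says that some filter is active; it does not prove that unshare is the syscall being blocked.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Debian/Ubuntu-specific sysctl
# permits unprivileged user namespaces. The path argument defaults to the
# file quoted in this thread.
userns_clone_state() {
    sysctl_file="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ ! -r "$sysctl_file" ]; then
        # Kernels without the Debian patch do not expose this file.
        echo "absent"
    elif [ "$(cat "$sysctl_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Hypothetical helper: print the seccomp mode of a process
# (0 = disabled, 1 = strict, 2 = filter), read from a status file
# that defaults to /proc/self/status.
seccomp_mode() {
    status_file="${1:-/proc/self/status}"
    awk '$1 == "Seccomp:" { print $2 }' "$status_file"
}
```

Calling userns_clone_state and seccomp_mode with no arguments inside the container prints the current state for that container.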

@rhatdan
Member

rhatdan commented Oct 16, 2019

@nalind @giuseppe Are we still unsharing the namespace if we are doing --isolation=chroot?

@giuseppe
Member

@nalind @giuseppe Are we still unsharing the namespace if we are doing --isolation=chroot

yes, a new user namespace is still necessary when the user has no CAP_SYS_ADMIN in the container.

@rhatdan
Member

rhatdan commented Oct 18, 2019

@giuseppe Why, what do we need this for? I guess we are still bind mounting the /proc and /sys into the chroot.

@giuseppe
Member

@giuseppe Why, what do we need this for? I guess we are still bind mounting the /proc and /sys into the chroot.

yes, we still need to be able to create bind mounts to create the environment used by the chroot
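
Whether the process already holds CAP_SYS_ADMIN (and so would not need the extra user namespace for those bind mounts) can be read from its effective capability mask. A minimal sketch under stated assumptions: the helper names has_cap_sys_admin and current_cap_eff are hypothetical, and the check only looks at bit 21, which is the number assigned to CAP_SYS_ADMIN in linux/capability.h.

```shell
#!/bin/sh
# CAP_SYS_ADMIN is capability number 21 in <linux/capability.h>.
CAP_SYS_ADMIN_BIT=21

# Hypothetical helper: succeed if the given hexadecimal capability mask
# (the CapEff value from /proc/<pid>/status) includes CAP_SYS_ADMIN.
has_cap_sys_admin() {
    mask="$1"
    [ $(( (0x$mask >> CAP_SYS_ADMIN_BIT) & 1 )) -eq 1 ]
}

# Hypothetical helper: read the effective capability mask of this process.
current_cap_eff() {
    awk '$1 == "CapEff:" { print $2 }' /proc/self/status
}
```

For example, `has_cap_sys_admin "$(current_cap_eff)" && echo "CAP_SYS_ADMIN present"` prints the message only when the capability is in the effective set.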

@rhatdan
Member

rhatdan commented Oct 21, 2019

Thanks, I had figured that out.

@rhatdan
Member

rhatdan commented Oct 21, 2019

So Docker's seccomp.json file blocking unshare is the issue, and it should be changed, or, as I recommend, use Podman/CRI-O for running these containers. You can also run Docker with Podman's /usr/share/containers/seccomp.json file.
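
The Docker invocation with Podman's profile only works if that profile file is actually present on the Docker host. A small sketch of a wrapper that prints the command from this thread and adds the override only when the file exists; the function name buildah_docker_cmd is hypothetical, and the default path and image are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the docker command from this thread, adding the
# seccomp override only when the profile file is present on the host.
buildah_docker_cmd() {
    profile="${1:-/usr/share/containers/seccomp.json}"
    image="${2:-quay.io/buildah/stable}"
    if [ -f "$profile" ]; then
        echo "docker run --security-opt seccomp=$profile --rm -it $image"
    else
        # Without the profile, Docker falls back to its own default filter.
        echo "docker run --rm -it $image"
    fi
}
```

The function only prints the command, so it can be reviewed before running it.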

@rhatdan rhatdan closed this as completed Oct 21, 2019
dkliban added a commit to dkliban/pulp_container that referenced this issue May 17, 2020
@qhaas

qhaas commented Jun 2, 2020

Could you try this with podman?

Seeing this error in podman on a ppc64le RHEL 7.6 host with a CentOS7 container.

# whoami
root
# sestatus | grep mode
Current mode:                   permissive
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 (Maipo)
# arch
ppc64le
# podman --version
podman version 1.4.4
# podman run --rm -it ppc64le/centos:7
# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (AltArch)
# yum install -y buildah
...
# buildah --version
buildah version 1.11.6 (image-spec 1.0.1-dev, runtime-spec 1.0.1-dev)
# buildah from scratch
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO error parsing PID "": strconv.Atoi: parsing "": invalid syntax 
ERRO (unable to determine exit status)
# buildah --isolation=chroot from scratch
Error during unshare(CLONE_NEWUSER): Operation not permitted
ERRO error parsing PID "": strconv.Atoi: parsing "": invalid syntax 
ERRO (unable to determine exit status)

If one starts podman with superpowers, one gets a different error:

# podman run --cap-add ALL --privileged --rm -it ppc64le/centos:7
...
# buildah from scratch  
ERRO 'overlay' is not supported over overlayfs    
'overlay' is not supported over overlayfs: backing file system is unsupported for this graph driver
# buildah --isolation=chroot from scratch
ERRO 'overlay' is not supported over overlayfs    
'overlay' is not supported over overlayfs: backing file system is unsupported for this graph driver
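
The second error comes from the storage driver rather than from unshare: the graph root inside the container sits on overlayfs, and the overlay driver refuses to stack on it. One possible workaround, which is an assumption here and not something proposed in this thread, is to fall back to the vfs driver via Buildah's global --storage-driver option. The helper name choose_storage_driver is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pick a containers/storage driver from the type of the
# filesystem that backs the graph root. "vfs" is slower and uses more disk,
# but it does not need kernel overlay support.
choose_storage_driver() {
    fs_type="$1"
    case "$fs_type" in
        overlay|overlayfs) echo "vfs" ;;
        *) echo "overlay" ;;
    esac
}
```

With GNU stat, the filesystem type can be obtained with `stat -f -c %T /var/lib/containers`, and the result passed on as `buildah --storage-driver "$(choose_storage_driver "$fs_type")" from scratch`.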

@rhatdan
Member

rhatdan commented Jun 3, 2020

If you are in a container, then you should use buildah from --isolation=chroot, no reason to use container technology within a container.

We do a lot of configuration to make buildah run within a locked down container.

https://github.com/containers/buildah/blob/master/contrib/buildahimage/stable/Dockerfile

@GJaminon

no reason to use container technology within a container.

Sorry, but building an image from inside a Jenkins container agent is useful. Since dockerd is deprecated in Kubernetes we need an alternative. Is it possible with Buildah, or do we need to find something else?

@rhatdan
Member

rhatdan commented Dec 30, 2020

The comment should have been more specific. Basically locking down a process within a container with additional duplicative lock down is not worth it. So if I have dropped caps, and running with SELinux lock down and seccomp rules locked down, then don't attempt to do them again. If the container engines attempt to, they will be blocked, because of the existing container lockdown and your container engine will fail.

It is possible to run buildah and podman within a container. The issue is how much security you lock said container down with.

Running docker within a container has the same issues. It requires a --privileged container or a container with a leaked docker.socket from the host into the container, which is arguably less secure than just running --privileged.

@GJaminon

The goal is not to have a docker socket available in a Container but to build a container image inside a CI agent running in K8S

@rhatdan
Member

rhatdan commented Jan 4, 2021

Sure, but in order to run most containers, you need more than one UID within the container, and a lot of the time the process needs some Linux capabilities. Podman requires these (as does Docker).

VannTen added a commit to VannTen/meteor-operator that referenced this issue Nov 24, 2022
This seems to be the recommended way to run buildah when already inside
a container, see
containers/buildah#1901 (comment)
VannTen added a commit to VannTen/meteor-operator that referenced this issue Nov 28, 2022
This seems to be the recommended way to run buildah when already inside
a container, see
containers/buildah#1901 (comment)
@awildturtok

Still encountering this issue on quay.io/containers/buildah:v1.28 doing

buildah build --isolation=chroot ${CI_PROJECT_DIR}/Dockerfile

The container is run inside a Gitlab CI Pipeline

@nikolaseu

Still encountering this issue on quay.io/containers/buildah:v1.28 doing

buildah build --isolation=chroot ${CI_PROJECT_DIR}/Dockerfile

The container is run inside a Gitlab CI Pipeline

Same for me

@giuseppe
Member

I think the default seccomp profile blocks unshare. You need to use a different seccomp profile

@rhatdan
Member

rhatdan commented Apr 26, 2023

Docker/containerd block unshare and mount. Podman, Buildah, and CRI-O do not.

@giuseppe
Member

CRI-O blocks unshare by default as well. You need to change the seccomp profile with CRI-O too.

@rhatdan
Member

rhatdan commented Apr 30, 2023

Ok CRI-O Should be using the same seccomp.json file as podman and buildah.
rpm -qf /usr/share/containers/seccomp.json
containers-common-1-89.fc38.noarch

@mrunalp @haircommander @saschagrunert WDYT?

@giuseppe
Member

That was disabled AFAIK because user namespaces open up a lot of new features that can be abused. Many security issues in the kernel in the last years were caused by user namespaces and Docker/containerd were not affected while CRI-O was. Personally I think it makes sense for CRI-O to be more locked up than Podman and allow more kernel features only when strictly necessary

@haircommander
Collaborator

haircommander commented May 1, 2023

Ok CRI-O Should be using the same seccomp.json file as podman and buildah.

we actually typically embed the seccomp profile by default inside of the binary, but we also do manually remove unshare from it: https://github.com/cri-o/cri-o/blob/main/internal/config/seccomp/seccomp.go#L45 and this was done for the reasons @giuseppe mentions

@rhatdan
Member

rhatdan commented May 1, 2023

I can hear Eric B screaming from the hinterlands. How would a user add unshare back to their own seccomp profile?

@rhatdan rhatdan reopened this May 1, 2023
@haircommander
Collaborator

they can either specify a separate profile inside of a pod spec (or unconfined if they feel so bold) or they can point cri-o to a profile on the node (like the one you attached above)
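
The pod-spec option can be sketched with the standard Kubernetes securityContext.seccompProfile field. This is an illustrative manifest, not a tested configuration: the function name render_buildah_pod, the pod name, and the profile path profiles/buildah.json are hypothetical placeholders, and localhostProfile is resolved relative to the kubelet's seccomp directory on the node, so the profile (for example a copy of /usr/share/containers/seccomp.json) has to be placed there first.

```shell
#!/bin/sh
# Hypothetical helper: print a pod manifest that requests a node-local
# seccomp profile. Pod name and profile path are placeholders.
render_buildah_pod() {
    profile="${1:-profiles/buildah.json}"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: buildah-build
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: $profile
  containers:
    - name: buildah
      image: quay.io/buildah/stable
      command: ["sleep", "infinity"]
EOF
}
```

The output can be piped to `kubectl apply -f -`. Replacing the two profile lines with `type: Unconfined` corresponds to the "unconfined" option mentioned above and removes seccomp filtering for the pod entirely.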

@pkit

pkit commented May 1, 2023

Many security issues in the kernel in the last years were caused by user namespaces and Docker/containerd were not affected while CRI-O was.

But docker (or any other purely container tech) is inherently insecure anyway. What's the point?

@giuseppe
Member

giuseppe commented May 2, 2023

Many security issues in the kernel in the last years were caused by user namespaces and Docker/containerd were not affected while CRI-O was.

But docker (or any other purely container tech) is inherently insecure anyway. What's the point?

what do you mean with that? The point of seccomp for containers is to try to make them safer, as much as possible with the right trade-off between security and what programs would break. If you need a custom profile you can provide that.

User namespaces open up a wider kernel attack surface since more kernel features can be used (e.g. mount APIs). So to play safe it is better to disable it by default, at least on a cluster, and allow it only when it is necessary and in a controlled way.

IMO this should not be changed for CRI-O and unshare should be left disabled by default.

@pkit

pkit commented May 2, 2023

what do you mean with that?

I mean that docker is insecure.
Either you need to fully embrace seccomp (i.e. total lockdown and a user space kernel, see gvisor)
Or fully embrace a real VM (see firecracker)
All the other half-solutions only create a false sense that something is secure.

@giuseppe
Member

giuseppe commented May 3, 2023

Well there are compromises. Allowing unshare would give more possibilities to the malicious agent, e.g. https://unit42.paloaltonetworks.com/cve-2022-0492-cgroups/ could be avoided with unshare blocked.

I am closing the issue since I don't think we should change the default we currently have in CRI-O

@giuseppe giuseppe closed this as completed May 3, 2023
@abitrolly

@giuseppe am I right that people need to "unblock unshare" anyway to build containers in containers with buildah? In that case the decision just makes a false claim of security. It looks like buildah is placing blame on users/developers without providing any secure alternative.

I also came here from GitLab, because I saw buildah as an alternative to Docker in Docker. I thought it was just a simple user space tool that takes files and packs them. It is very frustrating to spend time in yet another layer of problems with no result.

@awildturtok

awildturtok commented May 7, 2023

I also came here from GitLab, because I saw buildah as an alternative to Docker in Docker. I thought it was just a simple user space tool that takes files and packs them. It is very frustrating to spend time in yet another layer of problems with no result.

@abitrolly
I had the same aspirations to use buildah - since we're 100% on podman anyway. I've since switched to kaniko which was a breeze to get going

@abitrolly

abitrolly commented May 7, 2023

@awildturtok thanks for the pointer. Going to try kaniko. I understand that Linux container security is hard, but I would rather see big companies spending time on making Kurzgesagt style videos so that more people could understand how to improve them. With SELinux and podman/buildah I admit that most of the time when dealing with their errors I don't know what I am doing, and this is what frustrates me most. High respect to people who understand all that stuff. I am just not one of you.

EDIT: https://gitlab.com/abitrolly/gitlab-elasticsearch-indexer/-/jobs/4250152765#L22 kaniko rocks. )

@terinjokes

It seems a bit weird to need unshare to build a multiarch manifest from already built images. AFAIK, there are no users or privileged operations happening.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2023