Merged
4 changes: 2 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -162,8 +162,8 @@ FROM scratch AS cni-plugins-export-squashed
COPY --from=cni-plugins-export / /

FROM --platform=$BUILDPLATFORM alpine:${ALPINE_VERSION} AS binfmt-filter
# built from https://github.com/tonistiigi/binfmt/releases/tag/buildkit%2Fv9.2.0-50
COPY --link --from=tonistiigi/binfmt:buildkit-v9.2.0-50@sha256:ff21b00e7238dce3bbd74fbe25591f7213837a77861b47b2df5e019540ec33fa / /out/
# built from https://github.com/tonistiigi/binfmt/releases/tag/buildkit%2Fv9.2.2-54
COPY --link --from=tonistiigi/binfmt:buildkit-v9.2.2-54@sha256:e60fbf01e26c75efa816224f4de31c2ef63c5486b20c3e8fa1e5da2aff368ba9 / /out/
WORKDIR /out/
RUN rm buildkit-qemu-loongarch64 buildkit-qemu-mips64 buildkit-qemu-mips64el

7 changes: 7 additions & 0 deletions cmd/buildkitd/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -1067,6 +1067,13 @@ func getCDIManager(cfg config.CDIConfig) (*cdidevices.Manager, error) {
if err := cdiCache.Refresh(); err != nil {
return nil, err
}
if errs := cdiCache.GetErrors(); len(errs) > 0 {
for dir, errs := range errs {
for _, err := range errs {
bklog.L.Warnf("CDI setup error %v: %+v", dir, err)
}
}
}
return cdiCache, nil
}()
if err != nil {
111 changes: 72 additions & 39 deletions docs/rootless.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,34 +12,48 @@ Rootless mode allows running BuildKit daemon as a non-root user.

[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.

```console
$ rootlesskit buildkitd
```bash
rootlesskit buildkitd
```

```console
$ buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```bash
buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```

To isolate BuildKit daemon's network namespace from the host (recommended):
```console
$ rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
```
> [!TIP]
> To isolate BuildKit daemon's network namespace from the host (recommended):
> ```bash
> rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
> ```

## Running BuildKit in Rootless mode (containerd worker)

[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.

Run containerd in rootless mode using RootlessKit, following [containerd's documentation](https://github.com/containerd/containerd/blob/main/docs/rootless.md).

```bash
containerd-rootless.sh

CONTAINERD_NAMESPACE=default containerd-rootless-setuptool.sh install-buildkit-containerd
```
$ containerd-rootless.sh
```

Then let buildkitd join the same namespace as containerd.
<details>
<summary>Advanced guide</summary>

<p>


Alternatively, you can specify the full command line flags as follows:
```bash
containerd-rootless.sh --config /path/to/config.toml

containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true
```
$ containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true --containerd-worker-snapshotter=native
```

</p>

</details>

## Containerized deployment

Expand All @@ -48,36 +62,45 @@ See [`../examples/kubernetes`](../examples/kubernetes).

### Docker

```console
$ docker run \
```bash
docker run \
--name buildkitd \
-d \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--device /dev/fuse \
moby/buildkit:rootless --oci-worker-no-process-sandbox
$ buildctl --addr docker-container://buildkitd build ...
```
--security-opt systempaths=unconfined \
moby/buildkit:rootless

If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:

```console
$ docker run --name buildkitd -d --privileged moby/buildkit:rootless
buildctl --addr docker-container://buildkitd build ...
```

#### About `--device /dev/fuse`
Adding `--device /dev/fuse` to the `docker run` arguments is required only if you want to use `fuse-overlayfs` snapshotter.
> [!TIP]
> If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:
>
> ```bash
> docker run --name buildkitd -d --privileged moby/buildkit:rootless
> ```

#### About `--oci-worker-no-process-sandbox`
Justification of the `--security-opt` flags:

By adding `--oci-worker-no-process-sandbox` to the `buildkitd` arguments, BuildKit can be executed in a container without adding `--privileged` to `docker run` arguments.
However, you still need to pass `--security-opt seccomp=unconfined --security-opt apparmor=unconfined` to `docker run`.
* `seccomp=unconfined`: For allowing several syscalls such as `unshare` (used by runc) and `mount` (used by snapshotters, etc).

Note that `--oci-worker-no-process-sandbox` allows build executor containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
* `apparmor=unconfined`: For allowing mounting filesystems, etc.
This flag is not needed when the host operating system does not use AppArmor.

To allow running rootless `buildkitd` without `--oci-worker-no-process-sandbox`, run `docker run` with `--security-opt systempaths=unconfined`. (For Kubernetes, set `securityContext.procMount` to `Unmasked`.)
* `systempaths=unconfined`: For disabling the masks for the `/proc` mount in the container, so that each of `ExecOp`
(corresponds to a `RUN` instruction in Dockerfile) can have a dedicated `/proc` filesystem.
`systempaths=unconfined` potentially allows reading and writing dangerous kernel files from a container, but it is safe when you are running `buildkitd` as non-root.

The `--security-opt systempaths=unconfined` flag disables the masks for the `/proc` mount in the container and potentially allows reading and writing dangerous kernel files, but it is safe when you are running `buildkitd` as non-root.
> [!TIP]
> Instead of `--security-opt systempaths=unconfined`, `buildkitd` can also be executed with `--oci-worker-no-process-sandbox` (a flag of `buildkitd`, not of `docker`)
> to avoid creating a new PID namespace and mounting a new `/proc` for it.
>
> Using `--oci-worker-no-process-sandbox` is discouraged, as it cannot terminate processes that did not exit during an `ExecOp`.
> Also, `--oci-worker-no-process-sandbox` allows `ExecOp` containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
>
> Despite these caveats, the [Kubernetes examples](../examples/kubernetes) use `--oci-worker-no-process-sandbox`, as Kubernetes lacks the equivalent of `systempaths=unconfined`.
> (`securityContext.procMount=Unmasked` is similar, but differs in that it depends on `hostUsers: false`.)

### Change UID/GID

Expand All @@ -90,7 +113,7 @@ Actual ID (shown in the host and the BuildKit daemon container)| Mapped ID (show
... | ...
165535 | 65536

```
```console
$ docker exec buildkitd id
uid=1000(user) gid=1000(user)
$ docker exec buildkitd ps aux
Expand All @@ -99,15 +122,16 @@ PID USER TIME COMMAND
13 user 0:00 /proc/self/exe buildkitd --addr tcp://0.0.0.0:1234
21 user 0:00 buildkitd --addr tcp://0.0.0.0:1234
29 user 0:00 ps aux

$ docker exec buildkitd cat /etc/subuid
user:100000:65536
```
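The table above follows simple arithmetic, which can be sketched as below. This is an illustration only: the helper name `actual_id` is hypothetical, and it assumes the single `/etc/subuid` entry shown above (`user:100000:65536`) and that mapped ID 0 corresponds to the unprivileged user itself (UID 1000 here).

```bash
# Hypothetical helper: translate an ID seen inside the rootless user
# namespace ("mapped ID") into the ID seen on the host ("actual ID").
# $1: mapped ID
# $2: UID of the unprivileged user (assumed to back mapped ID 0)
# $3: first subordinate ID from /etc/subuid
# $4: number of subordinate IDs from /etc/subuid
actual_id() {
  if [ "$1" -eq 0 ]; then
    echo "$2"
  elif [ "$1" -ge 1 ] && [ "$1" -le "$4" ]; then
    # Mapped IDs 1..count are backed by the subordinate range.
    echo $(( $3 + $1 - 1 ))
  else
    echo "mapped ID $1 is outside the configured range" >&2
    return 1
  fi
}

actual_id 0 1000 100000 65536      # prints 1000
actual_id 65536 1000 100000 65536  # prints 165535
```

The last call reproduces the final row of the table (165535 on the host, 65536 inside the namespace).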

To change the UID/GID configuration, you need to modify and build the BuildKit image manually.
```
$ vi Dockerfile
$ make images
$ docker run ... moby/buildkit:local-rootless ...
```bash
vi Dockerfile
make images
docker run ... moby/buildkit:local-rootless ...
```

## Troubleshooting
Expand All @@ -120,7 +144,9 @@ $ rootlesskit buildkitd --oci-worker-snapshotter=fuse-overlayfs
```

### Error related to `fuse-overlayfs`
Try running `buildkitd` with `--oci-worker-snapshotter=native`:
Run `docker run` with `--device /dev/fuse`.

Also try running `buildkitd` with `--oci-worker-snapshotter=native`:

```console
$ rootlesskit buildkitd --oci-worker-snapshotter=native
Expand All @@ -137,12 +163,19 @@ Run `sysctl -w user.max_user_namespaces=N` (N=positive integer, like 63359) on t

See [`../examples/kubernetes/sysctl-userns.privileged.yaml`](../examples/kubernetes/sysctl-userns.privileged.yaml).

### Error `fork/exec /proc/self/exe: permission denied` with `This error might have happened because /proc/sys/kernel/apparmor_restrict_unprivileged_userns is set to 1`
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` and run `sudo sysctl -p`, or add it to a file under `/etc/sysctl.d` and run `sudo sysctl --system`.
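
Before changing the setting, it can help to confirm that the restriction is actually active. The following is a minimal sketch; the function name `userns_restricted` is hypothetical, and the optional path argument exists only so the check can be exercised against a regular file.

```bash
# Hypothetical helper: succeed if AppArmor restricts unprivileged user namespaces.
# $1 (optional): path to the knob; defaults to the procfs file named in the error.
userns_restricted() {
  knob="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
  # If the file is missing or unreadable, treat the restriction as absent.
  [ -r "$knob" ] && [ "$(cat "$knob")" = "1" ]
}

if userns_restricted; then
  echo "restricted: set kernel.apparmor_restrict_unprivileged_userns=0"
else
  echo "not restricted"
fi
```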

### Error `mount proc:/proc (via /proc/self/fd/6), flags: 0xe: operation not permitted`
This error is known to happen when BuildKit is executed in a container without the `--oci-worker-no-sandbox` flag.
Make sure that `--oci-worker-no-process-sandbox` is specified (See [below](#docker)).
This error is known to happen when BuildKit is executed in a container without the `--security-opt systempaths=unconfined` flag.
Make sure to specify it (See [above](#docker)).
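
To see whether `/proc` is masked in the current container, one option is to list the mounts that sit on top of it. This is a diagnostic sketch, not part of BuildKit: the function name `proc_overmounts` is hypothetical, and it assumes that the masks show up as extra mount points under `/proc` in `/proc/self/mountinfo`.

```bash
# Hypothetical helper: print every mount point located below /proc.
# Field 5 of each mountinfo line is the mount point.
# $1 (optional): mountinfo file to read; defaults to the current process.
proc_overmounts() {
  awk '$5 ~ /^\/proc\// { print $5 }' "${1:-/proc/self/mountinfo}"
}

if [ -r /proc/self/mountinfo ]; then
  proc_overmounts
fi
```

An empty result suggests that `/proc` is not masked; entries such as paths under `/proc` mounted from `/dev/null` or `tmpfs` suggest that the masks are still in place.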

## Distribution-specific hint
Using the Ubuntu kernel is recommended.

### Ubuntu, 24.04 or later
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` and run `sudo sysctl -p`, or add it to a file under `/etc/sysctl.d` and run `sudo sysctl --system`.

### Container-Optimized OS from Google
Make sure to have an `emptyDir` volume below:
```yaml
56 changes: 34 additions & 22 deletions examples/kubernetes/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,26 @@ This directory contains Kubernetes manifests for `Pod`, `Deployment` (with `Serv
* `StatefulSet`: good for client-side load balancing, without registry-side cache
* `Job`: good if you don't want to have daemon pods

Using Rootless mode (`*.rootless.yaml`) is recommended because Rootless mode image is executed as non-root user (UID 1000) and doesn't need `securityContext.privileged`.
See [`../../docs/rootless.md`](../../docs/rootless.md).
## Variants

See also ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).
- `*.privileged.yaml`: Launches the Pod as the fully privileged root user.
- `*.rootless.yaml`: Launches the Pod as a non-root user, whose UID is 1000.
- `*.userns.yaml`: Launches the Pod as a non-root user. The UID is determined by kubelet.
Needs kubelet and kube-apiserver to be reconfigured to enable the
[`UserNamespacesSupport`](https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/) feature gate.

It is recommended to use `*.rootless.yaml` to minimize the chance of container breakout attacks.

See also:
- [`../../docs/rootless.md`](../../docs/rootless.md).
- ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).

## `Pod`

```console
$ kubectl apply -f pod.rootless.yaml
$ buildctl \
```bash
kubectl apply -f pod.rootless.yaml

buildctl \
--addr kube-pod://buildkitd \
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```
Expand All @@ -29,25 +39,27 @@ If rootless mode doesn't work, try `pod.privileged.yaml`.
Setting up mTLS is highly recommended.

`./create-certs.sh SAN [SAN...]` can be used for creating certificates.
```console
$ ./create-certs.sh 127.0.0.1
```bash
./create-certs.sh 127.0.0.1
```

The daemon certificates are created as a `Secret` manifest named `buildkit-daemon-certs`.
```console
$ kubectl apply -f .certs/buildkit-daemon-certs.yaml
```bash
kubectl apply -f .certs/buildkit-daemon-certs.yaml
```

Apply the `Deployment` and `Service` manifest:
```console
$ kubectl apply -f deployment+service.rootless.yaml
$ kubectl scale --replicas=10 deployment/buildkitd
```bash
kubectl apply -f deployment+service.rootless.yaml

kubectl scale --replicas=10 deployment/buildkitd
```

Run `buildctl` with TLS client certificates:
```console
$ kubectl port-forward service/buildkitd 1234
$ buildctl \
```bash
kubectl port-forward service/buildkitd 1234

buildctl \
--addr tcp://127.0.0.1:1234 \
--tlscacert .certs/client/ca.pem \
--tlscert .certs/client/cert.pem \
Expand All @@ -58,10 +70,10 @@ $ buildctl \
## `StatefulSet`
`StatefulSet` is useful for consistent hash mode.

```console
$ kubectl apply -f statefulset.rootless.yaml
$ kubectl scale --replicas=10 statefulset/buildkitd
$ buildctl \
```bash
kubectl apply -f statefulset.rootless.yaml
kubectl scale --replicas=10 statefulset/buildkitd
buildctl \
--addr kube-pod://buildkitd-4 \
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```
Expand All @@ -70,8 +82,8 @@ See [`./consistenthash`](./consistenthash) for how to use consistent hashing.
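
As a rough illustration of client-side pod selection, the sketch below picks a `StatefulSet` pod name from a build key. Note that this is plain modulo hashing, not consistent hashing, so changing the replica count reshuffles most keys; the `./consistenthash` directory is the reference for the real approach. The function name `pick_pod` and the example key are hypothetical.

```bash
# Hypothetical helper: map a key to one of N StatefulSet pods (buildkitd-0 .. buildkitd-(N-1)).
# $1: key (for example a repository name), $2: number of replicas
pick_pod() {
  # cksum prints "<crc> <size>"; keep only the CRC.
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo "buildkitd-$(( crc % $2 ))"
}

pod=$(pick_pod "example/repo" 10)
echo "$pod"
# The result could then be used as: buildctl --addr "kube-pod://$pod" build ...
```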

## `Job`

```console
$ kubectl apply -f job.rootless.yaml
```bash
kubectl apply -f job.rootless.yaml
```

To push the image to the registry, you also need to mount `~/.docker/config.json`
5 changes: 3 additions & 2 deletions examples/kubernetes/deployment+service.rootless.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,6 @@ spec:
metadata:
labels:
app: buildkitd
annotations:
container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
# see buildkit/docs/rootless.md for caveats of rootless mode
spec:
containers:
Expand Down Expand Up @@ -54,6 +52,9 @@ spec:
# Needs Kubernetes >= 1.19
seccompProfile:
type: Unconfined
# Needs Kubernetes >= 1.30
appArmorProfile:
type: Unconfined
# To change UID/GID, you need to rebuild the image
runAsUser: 1000
runAsGroup: 1000