unprivileged: add CLI options for isolation and storage #220
gabemontero
left a comment
@nalind @adambkaplan - perhaps I missed it in scrum, but is this change the final step for the builder image at least (i.e., ignoring OCM changes we would make at some point, or global build config defaults) to support unprivileged builds? weren't there some k8s changes we were waiting on as well? if so, are those now there? or were those k8s changes for caching? (I've forgotten these details more than once before :-) )
Should this PR be associated with https://issues.redhat.com/browse/BUILD-119?
all in all, auspicious stuff :-)
```go
switch isolationSpec {
case "", "oci":
	isolation = buildah.IsolationOCI
case "chroot":
```
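The switch above maps the `--isolation` value onto buildah's isolation types. As a self-contained sketch, where the `Isolation` type, its constants, and `parseIsolation` are hypothetical stand-ins for buildah's own types and rejecting unknown values is an assumption, the parsing might look like:

```go
package main

import "fmt"

// Isolation is a hypothetical stand-in for buildah's isolation type.
type Isolation int

const (
	IsolationOCI Isolation = iota
	IsolationChroot
)

// parseIsolation (hypothetical) maps an --isolation flag value to an Isolation.
// An empty value falls back to OCI, mirroring the `case "", "oci"` branch.
func parseIsolation(spec string) (Isolation, error) {
	switch spec {
	case "", "oci":
		return IsolationOCI, nil
	case "chroot":
		return IsolationChroot, nil
	default:
		// Assumption: unknown values are rejected rather than silently defaulted.
		return 0, fmt.Errorf("unrecognized isolation type %q", spec)
	}
}

func main() {
	for _, spec := range []string{"", "chroot", "bogus"} {
		iso, err := parseIsolation(spec)
		fmt.Printf("%q -> %d, err=%v\n", spec, iso, err)
	}
}
```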
should we be adding chroot isolation back, @nalind, given the now-unembargoed CVE that resulted in us removing it?
or is there some non-root user combination with chroot isolation that avoids the concerns the CVE cited?
chrooting doesn't provide the isolation that you'd expect from a container, so a process we execute using it has privileges similar to the process that called it. When called from inside of an unprivileged container, the unprivilegedness of that container provides the isolation we'd expect.
ok, so to map that back to my wording, chrooting when we are unprivileged avoids the pitfalls the CVE unveiled
agree with ^^ @nalind ?
Not having a device control group set up is not an issue we have to avoid.
ok that sounds like the "why" for "chrooting when we are unprivileged avoids the pitfalls the CVE unveiled"
When we're launched with privileged: false, we don't have enough privileges to run runc.
The build controller would still need to set "privileged: false" for the build pod, and optionally specify which type of isolation to use via the CLI "--isolation" flag to override the default that this PR adds.
If "rootless" there means "uid != 0" from the node's point of view, then no, this doesn't get us that.
Yeah, those are the OCM-related changes I'm referring to.
I honestly don't remember what was meant by "rootless" when Mrunal opened BUILD-119 and when @siamaksade opened https://issues.redhat.com/browse/BUILD-141. I'll defer to @adambkaplan.
/test e2e-aws-builds
/hold
/retest
e2e-aws-builds -> cluster could not get off the ground
bd6d4de to 9b489dc
1291d5a to 85fb2b7
/hold cancel
/test e2e-aws-image-ecosystem
@nalind you'll need openshift/origin#25985 to merge in order to get a passing image-ecosystem
/test e2e-aws-builds
/test e2e-aws-builds
/test e2e-aws-image-ecosystem
Rebased.
Rebased. This may solve bug 1937069, where "chown" during a build takes a while, which I suspect is due to copy-up that can be avoided if we use the metacopy feature; this PR attempts to detect metacopy and use it if it's supported.
/retest
/retest
@gabemontero @coreydaley PTAL
We don't use the CreateContainer() method of pkg/build/builder.DockerClient, so remove it and the implementation we have of it.
Signed-off-by: Nalin Dahyabhai <[email protected]>
per discussion in team scrum, including with @nalind: /assign @adambkaplan to minimally get the approve label on this, and, if he has cycles, to decide whether it's an lgtm from him. I'll be taking another pass after posting this comment.
gabemontero
left a comment
Aside from the request for a comment below, @nalind, some added unit tests in https://github.com/openshift/builder/blob/master/pkg/build/builder/cmd/builder_test.go and https://github.com/openshift/builder/blob/master/pkg/build/builder/daemonless_test.go, to validate that the new parameters being passed in are processed correctly, seem warranted.
Otherwise looks great / thanks !!
```go
{"overlay", ``, nil},
{"overlay", `["mount_program=/usr/bin/fuse-overlayfs","mountopt=metacopy=on"]`, builderCanUseOverlayFUSE},
{"overlay", `["mount_program=/usr/bin/fuse-overlayfs"]`, builderCanUseOverlayFUSE},
{"vfs", "", nil},
```
ah the order of precedence @nalind mentioned in office hours today :-)
cmd/main.go
```go
if err != nil {
	fmt.Printf("Error setting up service CA cert: %v", err)
	os.Exit(1)
if !inUserNamespace() {
```
So why do we only do this when not in user namespaces?
How is this setup provided when we are in user namespaces?
Looking for a comment that answers those questions. Thanks.
If we've set up a user namespace, then we know that we weren't started as UID 0 in the container, so these locations are not writable. If we try to write to those locations anyway, we hit a permissions error and exit. It's an unsolved problem for cases where we start the container as a UID other than 0.
OK, capture that in a comment in the code and I'm good on this one.
So as non-root we can't write to /etc/docker/certs.d? Interesting, may need to determine how we can have the build controller mount these for us.
Will this also impact our entrypoint script that generates the trust bundle?
Mounting the certs.d subpath of the build-ca-bundles volume to /etc/docker/certs.d should work, since the directory isn't even in the builder image to begin with. The cluster.crt that we drop in /etc/pki/tls/certs is trickier, since the directory already includes content that we want to still have there. We should be able to mount the file itself from the volume that contains the service account secrets, but I haven't personally tried that.
This may all be moot, though, since I think ensuring the directories exist and setting them group-writeable in the image will also work, with no code changes.
Going to try creating the directories and setting them group-writable in the Dockerfile instead. That'll probably work better for the entrypoint script, too.
OK, the if !inUserNamespace() condition has been removed, along with the Dockerfile changes @nalind mentions ^^
adambkaplan
left a comment
/approve
There are definite issues if we run in a user namespace around certs that need to be addressed. The scope is way beyond this PR and should be captured in JIRA.
cmd/builder.go
```go
flags := cmd.Flags()
flags.StringVar(&isolation, "isolation", isolation, "type of process `isolation` to use for RUN instructions")
flags.StringVar(&ociRuntime, "oci-runtime", ociRuntime, "runtime to invoke for OCI isolation")
```
What happens if --oci-runtime is not specified? Is there a reasonable default for the empty string?
We can retrieve a default from github.com/containers/common/pkg/config's DefaultConfig() function. I'll add that.
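The fallback described here could look roughly like the following. To keep the sketch self-contained, the lookup of the configured default is passed in as a function; in the builder it would presumably come from containers/common's configuration, as the comment says, but that wiring and the `resolveOCIRuntime` name are assumptions. The sketch uses the standard library's `flag` package, whereas the PR's `cmd.Flags()` appears to be cobra's:

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// resolveOCIRuntime (hypothetical) returns the runtime named on the command
// line, or, when the flag was left empty, whatever lookupDefault reports.
func resolveOCIRuntime(fromFlag string, lookupDefault func() (string, error)) (string, error) {
	if fromFlag != "" {
		return fromFlag, nil
	}
	def, err := lookupDefault()
	if err != nil {
		return "", fmt.Errorf("looking up default OCI runtime: %w", err)
	}
	if def == "" {
		return "", errors.New("no OCI runtime specified and no default configured")
	}
	return def, nil
}

func main() {
	ociRuntime := flag.String("oci-runtime", "", "runtime to invoke for OCI isolation")
	flag.Parse()

	// Hypothetical default lookup; a real one would consult the containers configuration.
	lookup := func() (string, error) { return "runc", nil }

	rt, err := resolveOCIRuntime(*ociRuntime, lookup)
	fmt.Println(rt, err)
}
```

Injecting the lookup keeps the empty-string case testable without reading configuration files from the builder image.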
cmd/builder.go
```go
flags := cmd.Flags()
flags.StringVar(&isolation, "isolation", isolation, "type of process `isolation` to use for RUN instructions")
flags.StringVar(&ociRuntime, "oci-runtime", ociRuntime, "runtime to invoke for OCI isolation")
```
Same as above
```go
func builderDefaultIsolation() (string, error) {
	if inUserNamespace() {
		// We probably don't have enough privileges to use a proper runtime.
		return "chroot", nil
```
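The `inUserNamespace()` helper isn't shown. A common way to detect a user namespace on Linux, assumed here rather than taken from the PR, is to read /proc/self/uid_map: in the initial namespace it holds the single identity mapping `0 0 4294967295`. The sketch combines that with the default-isolation choice above; `uidMapIsIdentity` and `defaultIsolation` are hypothetical names, and returning "oci" outside a user namespace is an assumption (the excerpt is cut off), consistent with the PR description's statement that privileged pods keep using the OCI runtime:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// uidMapIsIdentity (hypothetical) reports whether uid_map contents describe the
// initial user namespace: one line mapping 4294967295 IDs starting at 0 onto 0.
func uidMapIsIdentity(uidMap string) bool {
	fields := strings.Fields(uidMap)
	return len(fields) == 3 && fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295"
}

// inUserNamespace reports whether we appear to run in a non-initial user namespace.
func inUserNamespace() bool {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		return false // no uid_map (e.g., not Linux): assume no user namespace
	}
	return !uidMapIsIdentity(string(data))
}

// defaultIsolation (hypothetical) mirrors builderDefaultIsolation: inside a user
// namespace we probably can't run a full OCI runtime, so fall back to chroot.
func defaultIsolation(inUserNS bool) string {
	if inUserNS {
		return "chroot"
	}
	return "oci" // assumption: the excerpt above doesn't show this branch
}

func main() {
	fmt.Println("default isolation:", defaultIsolation(inUserNamespace()))
}
```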
If we are in a user namespace, does this make us vulnerable to https://access.redhat.com/security/cve/CVE-2021-20182?
In an unprivileged container, the builder container will already have a device control group configured that limits which devices the kernel will allow direct access to, even when the device nodes exist, and /dev will have the smaller set of entries that we grant to unprivileged containers. I expect we'll also be running without CAP_MKNOD.
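Whether CAP_MKNOD is actually absent can be checked from inside the container by reading the effective capability mask (the hex `CapEff` field) in /proc/self/status. CAP_MKNOD is capability number 27 on Linux; the `hasEffectiveCap` helper is hypothetical:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

const capMknod = 27 // CAP_MKNOD, from linux/capability.h

// hasEffectiveCap (hypothetical) parses the CapEff line of /proc/<pid>/status
// content and reports whether capability number capNum is in the effective set.
func hasEffectiveCap(status string, capNum uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		mask, err := strconv.ParseUint(hex, 16, 64)
		if err != nil {
			return false, err
		}
		return mask&(1<<capNum) != 0, nil
	}
	return false, errors.New("no CapEff line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	ok, err := hasEffectiveCap(string(data), capMknod)
	fmt.Println("CAP_MKNOD effective:", ok, err)
}
```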
fwiw @adambkaplan we also dove into the CVE ramifications back in March with #220 (comment)
bottom line: "we appear good"
that said, I think @nalind's comment here ^^ would make a good code comment for when we see this code again in 6 months and ask the same question, not remembering this discussion :-)
Adding: "Lean on the container that we're in being itself unprivileged (i.e., having control groups including the device cgroup configured for us, being provided with a smaller set of devices in /dev, and likely running without a few capabilities that we don't need), and reduce the degree of isolation that we try to use to what we know we're actually allowed to do in an unprivileged container."
thanks @nalind
I'm good with that ^^ but for me this highlights the need for comments I noted earlier. Even a TODO comment with a ref to whatever Jira gets opened for this.
/test e2e-aws-builds
Add CLI flags for controlling the type of process isolation we use for RUN instructions, and for controlling the storage driver and options that we pass to it, along with logic for guessing the right defaults. Drop environment variables that controlled these settings. Have the docker and sti builders re-exec themselves in user namespaces when they're passed --uidmap or --gidmap flags, or run as non-root or without CAP_SYS_ADMIN, so that they will be able to create new namespaces when processing RUN instructions. Signed-off-by: Nalin Dahyabhai <[email protected]>
The last e2e-aws-builds run died in the AWS install. All the prior comments have been addressed, so /lgtm, but let's keep an eye on whether future e2e-aws-builds runs fail in the actual tests. Also, given that we are touching the cert path, I'm launching a /test e2e-aws-proxy (pretty sure we made that an optional job for this repo).
/test e2e-aws-proxy
The one e2e-aws failure is supposed to be a low-rate flake according to sippy, with associated bugs 1972490, 1973266, and 1978829.
/test e2e-aws
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: adambkaplan, gabemontero, nalind, rhatdan
Add CLI flags for controlling the type of process isolation we use for RUN instructions, and for controlling the storage driver and options that we pass to it, along with logic for guessing the right defaults. Drop environment variables that controlled these settings. This renders the static storage.conf file in the builder image largely irrelevant.

Add /etc/subuid and /etc/subgid files to the builder image that allow root to map the entire ID namespace.

Have the docker and sti builders re-exec themselves in user namespaces when they're run as non-UID=0 or without the CAP_SYS_ADMIN capability, so that they will be able to create new namespaces when processing RUN instructions.

With no CLI flags supplied and in a privileged pod, the builder should continue using the overlay driver with the kernel's overlay filesystem, and the OCI runtime.
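/etc/subuid and /etc/subgid use the `name:start:count` line format described in subuid(5). The exact entries added to the builder image aren't shown here, so the sketch below only parses that format; `subIDRange` and `parseSubIDLine` are hypothetical names, and the sample line in `main` is a typical example, not the image's entry:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// subIDRange (hypothetical) is one entry from /etc/subuid or /etc/subgid.
type subIDRange struct {
	Name  string
	Start uint32
	Count uint32
}

// parseSubIDLine parses a single "name:start:count" line.
func parseSubIDLine(line string) (subIDRange, error) {
	parts := strings.Split(strings.TrimSpace(line), ":")
	if len(parts) != 3 {
		return subIDRange{}, fmt.Errorf("expected name:start:count, got %q", line)
	}
	start, err := strconv.ParseUint(parts[1], 10, 32)
	if err != nil {
		return subIDRange{}, fmt.Errorf("bad start in %q: %w", line, err)
	}
	count, err := strconv.ParseUint(parts[2], 10, 32)
	if err != nil {
		return subIDRange{}, fmt.Errorf("bad count in %q: %w", line, err)
	}
	return subIDRange{Name: parts[0], Start: uint32(start), Count: uint32(count)}, nil
}

func main() {
	r, err := parseSubIDLine("root:100000:65536") // typical example, not the image's entry
	fmt.Printf("%+v %v\n", r, err)
}
```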