Skip to content

Conversation

@nalind
Copy link
Member

@nalind nalind commented Mar 9, 2021

Add CLI flags for controlling the type of process isolation we use for RUN instructions, and for controlling the storage driver and options that we pass to it, along with logic for guessing the right defaults. Drop environment variables that controlled these settings. This renders the static storage.conf file in the builder image largely irrelevant.

Add /etc/subuid and /etc/subgid files to the builder image that allow root to map the entire ID namespace.

Have the docker and sti builders re-exec themselves in user namespaces when they're run as non-UID=0 or without the CAP_SYS_ADMIN capabiity, so that they will be able to create new namespaces when processing RUN instructions.

With no CLI flags supplied and in a privileged pod, the builder should continue using the overlay driver using the kernel's overlay filesystem, and the OCI runtime.

Copy link
Contributor

@gabemontero gabemontero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalind @adambkaplan - perhaps I missed it in scrum, but is this change the final step for the builder image at least (i.e. ignore OCM changes we would make at some point, or global build config defaults) to support unprivileged builds? weren't there some k8s changes we were waiting on as well? if so, are those now there? or were those k8s changes for caching (I've forgotten these details more than once before :-) )

Should this PR be assoicated with https://issues.redhat.com/browse/BUILD-119

all in all, auspicious stuff :-)

switch isolationSpec {
case "", "oci":
isolation = buildah.IsolationOCI
case "chroot":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be adding chroot isolation back @nalind, given the now unembargoed CVE that resulted in us removing it ?

or is there some non-root user combination with chroot isolation that avoids the concerns the CVE cited?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chrooting doesn't provide the isolation that you'd expect from a container, so a process we execute using it has privileges similar to the process that called it. When called from inside of an unprivileged container, the unprivilegedness of that container provides the isolation we'd expect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so to map that back to my wording, chrooting when we are unprivileged avoids the pitfalls the CVE unveiled

agree with ^^ @nalind ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having a device control group set up is not an issue we have to avoid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that sounds like the "why" for "chrooting when we are unprivileged avoids the pitfalls the CVE unveiled"

@nalind
Copy link
Member Author

nalind commented Mar 9, 2021

When we're launched with privileged: false, we don't have enough privileges to run runc.

@nalind @adambkaplan - perhaps I missed it in scrum, but is this change the final step for the builder image at least (i.e. ignore OCM changes we would make at some point, or global build config defaults) to support unprivileged builds? weren't there some k8s changes we were waiting on as well? if so, are those now there? or were those k8s changes for caching (I've forgotten these details more than once before :-) )

The build controller would still need to set "privileged: false" for the build pod, and optionally specify what type of isolation to use using the CLI "--isolation" flag to override the default that this PR adds.

Should this PR be assoicated with https://issues.redhat.com/browse/BUILD-119

If "rootless" there means "uid != 0" from the node's point of view, then no, this doesn't get us that.

@gabemontero
Copy link
Contributor

gabemontero commented Mar 9, 2021

When we're launched with privileged: false, we don't have enough privileges to run runc.

@nalind @adambkaplan - perhaps I missed it in scrum, but is this change the final step for the builder image at least (i.e. ignore OCM changes we would make at some point, or global build config defaults) to support unprivileged builds? weren't there some k8s changes we were waiting on as well? if so, are those now there? or were those k8s changes for caching (I've forgotten these details more than once before :-) )

The build controller would still need to set "privileged: false" for the build pod, and optionally specify what type of isolation to use using the CLI "--isolation" flag to override the default that this PR adds.

Yeah that is the OCM related changes I'm referring to

Should this PR be assoicated with https://issues.redhat.com/browse/BUILD-119

If "rootless" there means "uid != 0" from the node's point of view, then no, this doesn't get us that.

I honestly don't remember what was meant by "rootless" when Mrunal opened BUILD-119 and when @siamaksade opened https://issues.redhat.com/browse/BUILD-141

I'll defer to @adambkaplan

@nalind
Copy link
Member Author

nalind commented Mar 9, 2021

/test e2e-aws-builds

@nalind
Copy link
Member Author

nalind commented Mar 10, 2021

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 10, 2021
@nalind
Copy link
Member Author

nalind commented Mar 16, 2021

/retest

@gabemontero
Copy link
Contributor

e2e-aws-builds -> Cannot exceed quota for InstanceProfilesPerAccount: 1000

cluster could not get off the ground

@nalind
Copy link
Member Author

nalind commented Mar 19, 2021

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 19, 2021
@nalind
Copy link
Member Author

nalind commented Mar 19, 2021

/test e2e-aws-image-ecosystem
/test e2e-aws-builds

@gabemontero
Copy link
Contributor

@nalind you'll need openshift/origin#25985 to merge in order to get a passing image-ecosystem

@nalind
Copy link
Member Author

nalind commented Mar 19, 2021

/test e2e-aws-builds

1 similar comment
@nalind
Copy link
Member Author

nalind commented Mar 23, 2021

/test e2e-aws-builds

@nalind
Copy link
Member Author

nalind commented Mar 23, 2021

/test e2e-aws-image-ecosystem

@nalind
Copy link
Member Author

nalind commented Mar 24, 2021

Rebased.

@nalind
Copy link
Member Author

nalind commented Apr 6, 2021

Rebased. This may solve bug 1937069, where "chown" during a build takes a while, which I suspect due to copy-up that can be avoided if we use the metacopy feature, which this PR attempts to detect and use if it's supported.

@nalind
Copy link
Member Author

nalind commented Jun 23, 2021

/retest

@nalind
Copy link
Member Author

nalind commented Jun 28, 2021

/retest

@rhatdan
Copy link
Contributor

rhatdan commented Jun 28, 2021

@gabemontero @coreydaley PTAL

We don't use the CreateContainer() method of
pkg/build/builder.DockerClient, so remove it and the implementation we
have of it.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@gabemontero
Copy link
Contributor

per discussion in team scrum including with @nalind

/assign @adambkaplan

to minimally get the approve label on this, and if he has cycles decide whether it lgtm to him

I'll be taking another pass after posting this comment

Copy link
Contributor

@gabemontero gabemontero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the request for a comment below @nalind ideally some added unit tests in https://github.com/openshift/builder/blob/master/pkg/build/builder/cmd/builder_test.go and https://github.com/openshift/builder/blob/master/pkg/build/builder/daemonless_test.go to validate that the new parameters being passed in are processed correctly seems warranted.

Otherwise looks great / thanks !!

{"overlay", ``, nil},
{"overlay", `["mount_program=/usr/bin/fuse-overlayfs","mountopt=metacopy=on"]`, builderCanUseOverlayFUSE},
{"overlay", `["mount_program=/usr/bin/fuse-overlayfs"]`, builderCanUseOverlayFUSE},
{"vfs", "", nil},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah the order of precedence @nalind mentioned in office hours today :-)

cmd/main.go Outdated
if err != nil {
fmt.Printf("Error setting up service CA cert: %v", err)
os.Exit(1)
if !inUserNamespace() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why do we only do this when not in user namespaces?

How is this setup provided when we are in user namespaces?

Looking for a comment that answers those questions. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we've set up a user namespace, then we know that we weren't started as UID 0 in the container, so these locations are not writable. If we try to write to those locations anyway, we hit a permissions error and exit. It's an unsolved problem for cases where we start the container as a UID other than 0.

Copy link
Contributor

@gabemontero gabemontero Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK capture that in a comment in the code and I'm good on this one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as non-root we can't write to /etc/docker/certs.d? Interesting, may need to determine how we can have the build controller mount these for us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also impact our entrypoint script that generates the trust bundle?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mounting the certs.d subpath of the build-ca-bundles volume to /etc/docker/certs.d should work, since the directory isn't even in the builder image to begin with. The cluster.crt that we drop in /etc/pki/tls/certs is trickier, since the directory already includes content that we want to still have there. We should be able to mount the file itself from the volume that contains the service account secrets, but I haven't personally tried that.
This may all be moot, though, since I think ensuring the directories exist and setting them group-writeable in the image will also work, with no code changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to try creating the directories and setting them group-writable in the Dockerfile instead. That'll probably work better for the entrypoint script, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK if !inUserNamespace() condition has been removed along with the Dockerfile changes @nalind mentions ^^

Copy link
Contributor

@adambkaplan adambkaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

There are definite issues if we run in a user namespace around certs that need to be addressed. The scope is way beyond this PR and should be captured in JIRA.

cmd/builder.go Outdated

flags := cmd.Flags()
flags.StringVar(&isolation, "isolation", isolation, "type of process `isolation` to use for RUN instructions")
flags.StringVar(&ociRuntime, "oci-runtime", ociRuntime, "runtime to invoke for OCI isolation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if --oci-runtime is not specified? Is there a reasonable default for the empty string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can retrieve a default from github.com/containers/common/pkg/config's DefaultConfig() function. I'll add that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd/builder.go Outdated

flags := cmd.Flags()
flags.StringVar(&isolation, "isolation", isolation, "type of process `isolation` to use for RUN instructions")
flags.StringVar(&ociRuntime, "oci-runtime", ociRuntime, "runtime to invoke for OCI isolation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func builderDefaultIsolation() (string, error) {
if inUserNamespace() {
// We probably don't have enough privileges to use a proper runtime.
return "chroot", nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are in a user namespace, does this make us vulnerable to https://access.redhat.com/security/cve/CVE-2021-20182?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an unprivileged container, the builder container will already be have a device control group configured for limiting which devices the kernel will allow direct access to, even when the nodes exist, and /dev will have the smaller set of entries in it which we grant to unprivileged containers. I expect we'll also be running without CAP_MKNOD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw @adambkaplan we also dove into the CVE ramifications back in March with #220 (comment)

bottom line: "we appear good"

that said, I think @nalind 's comment here ^^ would make a good comment when we see this code again in 6 months and ask the same question, not remembering this discussion :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding Lean on the container that we're in being itself unprivileged (i.e., having control groups including the device cgroup configured for us, being provided with a smaller set of devices in /dev, and likely running without a few capabilities that we don't need), and reduce the degree of isolation that we try to use to what we know we're actually allowed to do in an unprivileged container.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @nalind

cmd/main.go Outdated
if err != nil {
fmt.Printf("Error setting up service CA cert: %v", err)
os.Exit(1)
if !inUserNamespace() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as non-root we can't write to /etc/docker/certs.d? Interesting, may need to determine how we can have the build controller mount these for us.

cmd/main.go Outdated
if err != nil {
fmt.Printf("Error setting up service CA cert: %v", err)
os.Exit(1)
if !inUserNamespace() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also impact our entrypoint script that generates the trust bundle?

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 1, 2021
@gabemontero
Copy link
Contributor

/approve

There are definite issues if we run in a user namespace around certs that need to be addressed. The scope is way beyond this PR and should be captured in JIRA.

I'm good with that ^^ but for me this highlights the need for comments I noted earlier. Even a TODO comment with a ref to whatever Jira gets opened for this.

@nalind
Copy link
Member Author

nalind commented Jul 2, 2021

/test e2e-aws-builds

Add CLI flags for controlling the type of process isolation we use for
RUN instructions, and for controlling the storage driver and options
that we pass to it, along with logic for guessing the right defaults.
Drop environment variables that controlled these settings.

Have the docker and sti builders re-exec themselves in user namespaces
when they're passed --uidmap or --gidmap flags, or run as non-root or
without CAP_SYS_ADMIN, so that they will be able to create new
namespaces when processing RUN instructions.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@gabemontero
Copy link
Contributor

last e2e-aws-builds run died in aws install

all the prior comments have been addressed, so

/lgtm

but let's keep an eye on whether possible future e2e-aws-build fail in the actual tests

also, given the fact that we are touching the cert path, I'm launching a

/test e2e-aws-proxy

(pretty sure we made that an optional possibility with this repo)

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2021
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2021
@nalind
Copy link
Member Author

nalind commented Jul 2, 2021

/test e2e-aws-proxy

@gabemontero
Copy link
Contributor

any hint what changed from f0950af to f1de8bd @nalind ?

@gabemontero
Copy link
Contributor

the one e2e-aws failure is suppose to be a low rate flake according to sippy with

Associated Bugs: 1972490 1973266 1978829

/test e2e-aws

@nalind
Copy link
Member Author

nalind commented Jul 12, 2021

any hint what changed from f0950af to f1de8bd @nalind ?

That should have dropped a helper function that was introduced in an earlier version of this PR but which is no longer used.

@gabemontero
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, gabemontero, nalind, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants