2 changes: 1 addition & 1 deletion install/0000_80_machine-config-operator_00_namespace.yaml
@@ -10,7 +10,7 @@ metadata:
workload.openshift.io/allowed: "management"
labels:
name: openshift-machine-config-operator
openshift.io/run-level: "1"
openshift.io/run-level: "" # specifying no run-level turns it off on install and upgrades
openshift.io/cluster-monitoring: "true"
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit: privileged
4 changes: 4 additions & 0 deletions manifests/machineconfigdaemon/clusterrole.yaml
@@ -22,6 +22,10 @@ rules:
- apiGroups: ["machineconfiguration.openshift.io"]
resources: ["machineconfigs"]
verbs: ["*"]
- apiGroups: ["security.openshift.io"]
Contributor:
A few questions on the clusterrole changes:

  1. Is it required as part of the runlevel removal, or is it a parallel nice-to-have? Is that tracked somewhere, e.g. via another bug?
  2. I see that in the commits you added them for the controller, but then removed them in a later commit. What is the reasoning behind that?
  3. Could you squash the commits and provide more context via the commit message?


  1. It is; otherwise the MCO will fail to start. Specifically, the daemon will fail to start due to a lack of permissions. For example:
2s Warning FailedCreate daemonset/machine-config-daemon Error creating: pods "machine-config-daemon-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNe...

From this requirement, plus the fact that it needs:

hostNetwork: true
hostPID: true
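
To make the admission failure above concrete, here is a minimal sketch (not the actual manifest from this repo; names, labels and the image are illustrative) of the pod-spec fields in the machine-config-daemon daemonset that the restricted SCC rejects and the privileged SCC allows:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: machine-config-daemon
  namespace: openshift-machine-config-operator
spec:
  selector:
    matchLabels:
      k8s-app: machine-config-daemon
  template:
    metadata:
      labels:
        k8s-app: machine-config-daemon
    spec:
      hostNetwork: true          # "Host network is not allowed to be used" under restricted
      hostPID: true              # "Host PID is not allowed to be used" under restricted
      containers:
      - name: machine-config-daemon
        image: machine-config-daemon:latest   # placeholder image
        securityContext:
          privileged: true       # "Privileged containers are not allowed" under restricted
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs     # illustrative mount point
      volumes:
      - name: rootfs
        hostPath:
          path: /                # "hostPath volumes are not allowed to be used" under restricted

Each commented field corresponds to one of the violations listed in the FailedCreate event above, which is why the daemon's service account needs "use" on the privileged SCC rather than a more limited one.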

I gave the config-server hostnetwork to allow that to run as well, from memory due to


2. The controller doesn't appear to need any special permissions. I originally added them, but then realized the controller didn't need anything special. Looking at its deployment.yaml, there's nothing there that would require extra permissions, so it can run restricted.

With the daemon running privileged, the server with hostnetwork, and the controller restricted, all pods seem to start OK:

# oc get pods
NAME READY STATUS RESTARTS AGE
machine-config-controller-b559cbf8-2wp7v 1/1 Running 0 80m
machine-config-daemon-4s9tb 2/2 Running 0 80m
machine-config-daemon-r82bm 2/2 Running 0 51m
machine-config-daemon-z5xfb 2/2 Running 0 80m
machine-config-daemon-zzc9d 2/2 Running 0 80m
machine-config-operator-68cdd77788-kxd92 1/1 Running 1 102m
machine-config-server-gqrsn 1/1 Running 0 79m
machine-config-server-kmz9b 1/1 Running 0 79m
machine-config-server-ttp5t 1/1 Running 0 79m

And no errors related to admission:

# oc get events | grep -i failed
81m Warning FailedMount pod/machine-config-daemon-4s9tb MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
81m Warning FailedMount pod/machine-config-daemon-z5xfb MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
81m Warning FailedMount pod/machine-config-daemon-zzc9d MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
102m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 no nodes available to schedule pods
102m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 no nodes available to schedule pods
84m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
83m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
81m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
# oc get events | grep -i error
# oc get events | grep -i scc
104m        Normal    CreatedSCCRanges    namespace/openshift-machine-config-operator     created SCC ranges

Of course @yuqi-zhang, if you know of anything in the controller that might need more than the restricted SCC then we can definitely alter that. But from testing, the MCO seems to run OK and doesn't fail admission.

  3. @yuvalk happy to squash and add more context to the commit message.

Contributor:

Thanks for the detailed comments! One follow-up, since I don't know runlevels that well: are there any other risks to revoking runlevel 1 for the MC* pods, other than hostnetwork and privileged being required?


I think the main risk is just permissions. When using runlevels, SCC admission is pretty much bypassed entirely and the pods basically run with no restrictions, hence why we're keen to minimize what uses it.

Given it passes admission (meaning it has the required permissions), I think it's fairly unlikely we'll see it fail later on. As far as I'm aware we've had no issues with components like kni removing it either: #2627

It used to be a requirement back in the early days of OpenShift (4.0) due to slow startup, but since about 4.6 there seems to be no requirement for it. We'll be looking at pushing for the CVO to do the same too: openshift/cluster-version-operator#623, now that we've confirmed it should be fine to remove there as well.

resourceNames: ["privileged"]
resources: ["securitycontextconstraints"]
verbs: ["use"]
- apiGroups:
- authentication.k8s.io
resources:
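
For context on how the "use" rule above takes effect: the ClusterRole has to be bound to the daemon's service account, and SCC admission then validates machine-config-daemon pods against the privileged SCC. The binding isn't part of this diff; the following is a hypothetical sketch of the standard pattern with assumed names:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: machine-config-daemon
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: machine-config-daemon       # the ClusterRole edited in this file
subjects:
- kind: ServiceAccount
  name: machine-config-daemon       # assumed service account used by the daemonset pods
  namespace: openshift-machine-config-operator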
4 changes: 4 additions & 0 deletions manifests/machineconfigserver/clusterrole.yaml
@@ -7,3 +7,7 @@ rules:
- apiGroups: ["machineconfiguration.openshift.io"]
resources: ["machineconfigs", "machineconfigpools"]
verbs: ["*"]
- apiGroups: ["security.openshift.io"]
resourceNames: ["hostnetwork"]
resources: ["securitycontextconstraints"]
verbs: ["use"]
8 changes: 8 additions & 0 deletions pkg/operator/assets/bindata.go

Some generated files are not rendered by default.