Conversation

@yuvalk
Contributor

@yuvalk yuvalk commented Jul 2, 2021

- What I did

- How to verify it

- Description for the changelog

@kikisdeliveryservice
Contributor

Can you please update the description & commit to include something explaining why this is necessary?

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold label Jul 2, 2021
@kikisdeliveryservice
Contributor

Closing in favor of #2627, which contains only the appropriate on-prem run-level changes. We don't need the change this PR makes to the MCO itself, which has been in effect basically forever.

@yuqi-zhang yuqi-zhang changed the title remove run-level info from operators namespaces Bug 1978581: remove run-level info from operators namespaces Jul 3, 2021
@yuqi-zhang
Contributor

I believe this is associated with https://bugzilla.redhat.com/show_bug.cgi?id=1978581, which, on top of https://bugzilla.redhat.com/show_bug.cgi?id=1805488, has a bit more context. Reopening for now and associating.

Please add some more description to the commit/PR message, or mark WIP if this is just for testing, thanks!

@openshift-ci
Contributor

openshift-ci bot commented Jul 3, 2021

@yuvalk: This pull request references Bugzilla bug 1978581, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

Bug 1978581: remove run-level info from operators namespaces

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/severity-low and bugzilla/invalid-bug labels Jul 3, 2021
@yuqi-zhang yuqi-zhang reopened this Jul 3, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 3, 2021

@yuvalk: This pull request references Bugzilla bug 1978581, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.


In response to this:

Bug 1978581: remove run-level info from operators namespaces

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mcoops

mcoops commented Aug 6, 2021

With this change, the pods for the machine-config-operator will fail admission, for example:

LAST SEEN TYPE REASON OBJECT MESSAGE
12s Warning FailedCreate daemonset/machine-config-daemon Error creating: pods "machine-config-daemon-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": ...

We need to set the correct privileges for each ServiceAccount. I've opened a PR against this PR's source branch: yuvalk#1

That then allows the pods to run without errors and behave in the expected way:

# oc get pods
NAME                                         READY   STATUS    RESTARTS   AGE
machine-config-controller-5d5d9f9657-dbxsl   1/1     Running   0          11h
machine-config-daemon-94gfw                  2/2     Running   0          10h
machine-config-daemon-fkzvf                  2/2     Running   0          11h
machine-config-daemon-hvmd7                  2/2     Running   0          11h
machine-config-daemon-m495h                  2/2     Running   0          11h
machine-config-operator-84756558d7-g84bd     1/1     Running   0          11h
machine-config-server-2cccr                  1/1     Running   0          11h
machine-config-server-pzfcn                  1/1     Running   0          11h
machine-config-server-slg5z                  1/1     Running   0          11h
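
For reference, the imperative equivalent of what that PR grants would look roughly like the following (a sketch only: it assumes the daemon and server ServiceAccounts are named machine-config-daemon and machine-config-server, and the PR itself presumably does this declaratively, e.g. via a ClusterRole with the "use" verb on the SCC):

# oc adm policy add-scc-to-user privileged -z machine-config-daemon -n openshift-machine-config-operator
# oc adm policy add-scc-to-user hostnetwork -z machine-config-server -n openshift-machine-config-operator

The first grant covers the daemon's hostNetwork/hostPID/hostPath/privileged requirements; the hostnetwork SCC is enough for the config-server, which only needs host networking.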

@yuvalk
Contributor Author

yuvalk commented Aug 18, 2021

/retest

1 similar comment

@yuvalk
Contributor Author

yuvalk commented Aug 22, 2021

/retest

@yuvalk
Contributor Author

yuvalk commented Aug 22, 2021

/retest-required

@yuvalk
Contributor Author

yuvalk commented Aug 22, 2021

/retest

Contributor

@yuqi-zhang yuqi-zhang left a comment

I think this is a good opportunity to revisit this. Some comments below. Also please rebase on master when you get a chance. Thanks!

Contributor

A few questions on the clusterrole changes:

  1. Is it required as part of the run-level removal, or is it a parallel nice-to-have? Is that tracked somewhere, e.g. via another bug?
  2. I see that in the commits you added them for the controller, but then removed them in a later commit. What is the reasoning behind that?
  3. Could you squash the commits and provide more context via the commit message?


  1. It is. Otherwise the MCO will fail to start; specifically, the daemon will fail to start due to a lack of permissions. For example:
2s Warning FailedCreate daemonset/machine-config-daemon Error creating: pods "machine-config-daemon-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNe...

From this requirement:

plus the fact that it needs:
hostNetwork: true
hostPID: true

I gave the config-server hostnetwork to allow that to run as well, from memory due to


  2. The controller doesn't appear to need any special permissions. I originally added them, but then realized the controller didn't need anything special. Looking at its deployment.yaml, there's nothing there that would require extra permissions, so it can run restricted.

With the daemon running privileged, the server with hostnetwork, and the controller restricted, all pods seem to start OK:

# oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
machine-config-controller-b559cbf8-2wp7v   1/1     Running   0          80m
machine-config-daemon-4s9tb                2/2     Running   0          80m
machine-config-daemon-r82bm                2/2     Running   0          51m
machine-config-daemon-z5xfb                2/2     Running   0          80m
machine-config-daemon-zzc9d                2/2     Running   0          80m
machine-config-operator-68cdd77788-kxd92   1/1     Running   1          102m
machine-config-server-gqrsn                1/1     Running   0          79m
machine-config-server-kmz9b                1/1     Running   0          79m
machine-config-server-ttp5t                1/1     Running   0          79m

And no errors related to admission:

# oc get events | grep -i failed
81m Warning FailedMount pod/machine-config-daemon-4s9tb MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
81m Warning FailedMount pod/machine-config-daemon-z5xfb MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
81m Warning FailedMount pod/machine-config-daemon-zzc9d MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
102m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 no nodes available to schedule pods
102m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 no nodes available to schedule pods
84m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
83m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
81m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
# oc get events | grep -i error
# oc get events | grep -i scc
104m        Normal    CreatedSCCRanges    namespace/openshift-machine-config-operator     created SCC ranges
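
If useful, a quick way to confirm which SCC each pod was actually admitted under is the openshift.io/scc annotation that admission stamps on the pods, e.g.:

# oc -n openshift-machine-config-operator get pods -o yaml | grep 'openshift.io/scc'

Roughly, I'd expect privileged for the daemon pods, hostnetwork for the server pods and restricted for the controller.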

Of course @yuqi-zhang, if you know of anything in the controller that might need more than the restricted SCC, we can definitely alter that. But from testing, the MCO seems to run OK and doesn't fail admission.

  3. @yuvalk, happy to squash and add context to the commit message?

Contributor

Thanks for the detailed comments! One follow-up, since I don't know run-levels that well: are there any other risks to revoking run-level 1 for the MC* pods, other than hostNetwork and privileged being required?


I think the main risk is just permissions. Using run-levels pretty much bypasses SCC entirely, so the pods basically run with no restrictions, which is why we're keen to minimize what uses it.

Given it passes admission (meaning it has the required permissions), I think it's fairly unlikely we'll see it fail later on. As far as I'm aware we've had no issues with components like kni removing it either: #2627

It used to be a requirement back in the early days of OpenShift (4.0) due to slow startup, but since about 4.6 there no longer seems to be any requirement for it. We'll be looking at pushing for the CVO to do the same: openshift/cluster-version-operator#623, now that we've confirmed it should be fine to remove there as well.
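
For context, it's easy to see which namespaces still set the label by selecting on it, e.g.:

# oc get namespaces -l openshift.io/run-level --show-labels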

@sinnykumari
Contributor

/cc @cgwalters will be nice to know your thoughts on this PR

@openshift-ci openshift-ci bot requested a review from cgwalters November 4, 2021 09:11
@openshift-ci
Contributor

openshift-ci bot commented Nov 4, 2021

@sinnykumari: GitHub didn't allow me to request PR reviews from the following users: to, your, will, be, on, this, PR, nice, know, thoughts.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @cgwalters will be nice to know your thoughts on this PR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Member

cgwalters commented Nov 9, 2021

Honestly I don't have any more expertise in this subject than anyone else here. I read through the PR and some of the linked bugzilla comments, and based on that I am fine to say
/approve

AFAICS the run-level dates to the primordial epoch of bf6ac87; it may be that it was never necessary.

@mcoops

mcoops commented Nov 10, 2021

Dang it. Similar to the CVO (openshift/cluster-version-operator#623), we're going to have to account for upgrades too.

curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2655/pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade/1456000980163235840/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/namespaces.json | jq '.items[].metadata | select(.name == "openshift-machine-config-operator").labels'
{
  "kubernetes.io/metadata.name": "openshift-machine-config-operator",
  "name": "openshift-machine-config-operator",
  "olm.operatorgroup.uid/59920ef9-792f-461b-b4cd-364910998083": "",
  "openshift.io/cluster-monitoring": "true",
  "openshift.io/run-level": "1",
  "pod-security.kubernetes.io/audit": "privileged",
  "pod-security.kubernetes.io/enforce": "privileged",
  "pod-security.kubernetes.io/warn": "privileged"
}
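
On a cluster that has already upgraded, the stale label can be checked and, purely as a manual workaround, dropped by hand (the real fix presumably needs the operator to actively remove it on upgrade, similar to what's being discussed for the CVO):

# oc get namespace openshift-machine-config-operator --show-labels
# oc label namespace openshift-machine-config-operator openshift.io/run-level-

The trailing "-" removes the label.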

@mcoops

mcoops commented Nov 10, 2021

/hold

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

14 similar comments

@yuvalk
Contributor Author

yuvalk commented Nov 26, 2021

/retest

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@openshift-ci
Contributor

openshift-ci bot commented Nov 26, 2021

@yuvalk: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                             Commit    Details   Required   Rerun command
ci/prow/okd-e2e-aws                   8c4dda1   link      false      /test okd-e2e-aws
ci/prow/e2e-aws-workers-rhel7         8c4dda1   link      false      /test e2e-aws-workers-rhel7
ci/prow/e2e-aws-upgrade-single-node   8c4dda1   link      false      /test e2e-aws-upgrade-single-node
ci/prow/e2e-aws-disruptive            8c4dda1   link      false      /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

7 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 2d8c153 into openshift:master Nov 26, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 26, 2021

@yuvalk: All pull requests linked via external trackers have merged:

Bugzilla bug 1978581 has been moved to the MODIFIED state.


In response to this:

Bug 1978581: remove run-level info from operators namespaces

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
