Bug 1978581: remove run-level info from operators namespaces #2655
Conversation
|
Can you please update the description & commit to include something explaining why this is necessary? /hold |
|
Closing in favor of #2627, which only contains the appropriate on-prem run-level changes. We don't need the change this PR makes to the MCO itself, which has been in effect basically forever. |
|
I believe this is associated with https://bugzilla.redhat.com/show_bug.cgi?id=1978581, which on top of https://bugzilla.redhat.com/show_bug.cgi?id=1805488 has a bit more context. Reopening for now and associating. Please add some more description to the commit/PR message, or mark WIP if this is just for testing, thanks! |
|
@yuvalk: This pull request references Bugzilla bug 1978581, which is invalid. |
|
With this change we need to set the correct privileges for each ServiceAccount. Opened a PR against this branch: yuvalk#1. That then allows the pods to run without errors and behave in the expected way. |
|
/retest |
1 similar comment
|
/retest |
|
/retest-required |
|
/retest |
yuqi-zhang left a comment
I think this is a good opportunity to revisit this. Some comments below. Also please rebase on master when you get a chance. Thanks!
A few questions on the clusterrole changes:
- is it required as part of the runlevel removal, or is it a parallel nice-to-have? Is that tracked somewhere, e.g. via another bug?
- I see that in the commits you added them for the controller, but then removed that in a later commit. What is the reasoning behind that?
- Could you squash the commits and provide more context via the commit message?
1. It is, otherwise the MCO will fail to start; specifically, the daemon will fail to start due to a lack of permissions. For example:
2s Warning FailedCreate daemonset/machine-config-daemon Error creating: pods "machine-config-daemon-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNe...
From this requirement:

privileged: true

and from machine-config-operator/manifests/machineconfigdaemon/daemonset.yaml (lines 81 to 82 in 3328619):

hostNetwork: true
hostPID: true
I gave the config-server hostnetwork to allow that to run as well, from memory due to:

hostNetwork: true

(See the sketch after this list for how these grants can be expressed.)
2. The controller doesn't appear to need any special permissions. I originally added the permissions, but then realized the controller didn't need anything special. Looking at its deployment.yaml, there's nothing there that would require extra permissions, so it can run as restricted.
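For illustration only, and not necessarily how this PR wires it up: the equivalent grants can be expressed imperatively with oc. The ServiceAccount names below (machine-config-daemon, machine-config-server) are assumptions based on the component names, not taken from this PR.

# Let the daemon's ServiceAccount use the built-in privileged SCC (SA name assumed)
oc adm policy add-scc-to-user privileged -z machine-config-daemon -n openshift-machine-config-operator
# Let the server's ServiceAccount use the built-in hostnetwork SCC (SA name assumed)
oc adm policy add-scc-to-user hostnetwork -z machine-config-server -n openshift-machine-config-operator

In a manifest-driven operator this would normally be expressed declaratively, as a role granting the use verb on the SCC bound to the ServiceAccount, rather than run by hand; the commands are just a compact way to show which SCC each component needs once the run-level bypass is gone.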
With the daemon running privileged, the server with hostnetwork, and the controller as restricted, all pods seem to start OK:
# oc get pods
NAME READY STATUS RESTARTS AGE
machine-config-controller-b559cbf8-2wp7v 1/1 Running 0 80m
machine-config-daemon-4s9tb 2/2 Running 0 80m
machine-config-daemon-r82bm 2/2 Running 0 51m
machine-config-daemon-z5xfb 2/2 Running 0 80m
machine-config-daemon-zzc9d 2/2 Running 0 80m
machine-config-operator-68cdd77788-kxd92 1/1 Running 1 102m
machine-config-server-gqrsn 1/1 Running 0 79m
machine-config-server-kmz9b 1/1 Running 0 79m
machine-config-server-ttp5t 1/1 Running 0 79m
And no errors related to admission:
# oc get events | grep -i failed
81m Warning FailedMount pod/machine-config-daemon-4s9tb MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
81m Warning FailedMount pod/machine-config-daemon-z5xfb MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
81m Warning FailedMount pod/machine-config-daemon-zzc9d MountVolume.SetUp failed for volume "proxy-tls" : secret "proxy-tls" not found
102m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 no nodes available to schedule pods
102m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 no nodes available to schedule pods
84m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/2 nodes are available: 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
83m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
81m Warning FailedScheduling pod/machine-config-operator-68cdd77788-kxd92 0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
# oc get events | grep -i error
# oc get events | grep -i scc
104m Normal CreatedSCCRanges namespace/openshift-machine-config-operator created SCC ranges
Of course @yuqi-zhang if you know anything in the controller that might need more than the restricted SCC, we can definitely alter that. But from testing, the MCO seems to run OK and doesn't fail admission.
- @yuvalk happy to squash and add to the commit message?
Thanks for the detailed comments! One follow-up, since I don't know run-levels that well: are there any other risks to revoking run-level 1 for the MC* pods, other than hostnetwork and privileged being required?
I think the main risk is just permissions. When using run-levels, SCC is pretty much bypassed entirely and the pods basically run with no restrictions, which is why we're keen to minimize what uses it.
Given it passes admission (meaning it has the required permissions), I think it's fairly unlikely we'll see it fail later on. As far as I'm aware we've had no issues with components like kni removing it either: #2627
It used to be a requirement back in the early days of OpenShift (4.0) due to a slow startup, but since about 4.6 there seems to be no requirement for it. We'll be looking at pushing for the CVO to do the same: openshift/cluster-version-operator#623, now that we've confirmed it should be fine to remove there as well.
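Purely as an illustration, namespaces that still carry the label can be listed with an existence selector on the label key; per the comment above, anything in that list is effectively skipping SCC admission:

# List namespaces that still set openshift.io/run-level (any value)
oc get namespaces -l openshift.io/run-level --show-labels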
|
/cc @cgwalters will be nice to know your thoughts on this PR |
|
@sinnykumari: GitHub didn't allow me to request PR reviews from the following users: to, your, will, be, on, this, PR, nice, know, thoughts. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. |
|
Honestly I don't have any more expertise in this subject than anyone else here. I read through the PR and some of the linked Bugzilla comments, and based on that I am fine to say that, AFAICS, the runlevel dates to the primordial epoch of bf6ac87; it may be it was never necessary. |
|
Dang it. Similar to the CVO (openshift/cluster-version-operator#623), we're going to have to account for upgrades too.
curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2655/pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade/1456000980163235840/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/namespaces.json | jq '.items[].metadata | select(.name == "openshift-machine-config-operator").labels'
{
"kubernetes.io/metadata.name": "openshift-machine-config-operator",
"name": "openshift-machine-config-operator",
"olm.operatorgroup.uid/59920ef9-792f-461b-b4cd-364910998083": "",
"openshift.io/cluster-monitoring": "true",
"openshift.io/run-level": "1",
"pod-security.kubernetes.io/audit": "privileged",
"pod-security.kubernetes.io/enforce": "privileged",
"pod-security.kubernetes.io/warn": "privileged"
} |
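The manifest change only affects fresh installs; on an upgraded cluster the label sticks around until something actively removes it, as the output above shows. As a manual illustration of what that implies (not something the operator does automatically today), the stale label could be deleted with:

# Remove the run-level label from the namespace; the trailing '-' deletes the label
oc label namespace openshift-machine-config-operator openshift.io/run-level-

In practice the operator would presumably need to apply an equivalent removal itself during upgrade, similar to what the CVO PR referenced above has to handle.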
|
/hold |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
14 similar comments
|
/retest |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
@yuvalk: The following tests failed.
Full PR test history. Your PR dashboard. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
7 similar comments
|
@yuvalk: All pull requests linked via external trackers have merged: Bugzilla bug 1978581 has been moved to the MODIFIED state. |
- What I did
- How to verify it
- Description for the changelog