
AUTH-2: reenable PodSecurity on privileged level #1308

Merged
openshift-merge-robot merged 1 commit into openshift:master from stlaz:psa_baseline on Apr 15, 2022

Conversation

@stlaz (Contributor) commented Feb 8, 2022

This enables the PodSecurity admission with baseline level (for now). I suspect it might break tests, which is where @s-urbaniak's upstream involvement in the e2e framework might come handy.

/cc @s-urbaniak
/cc @openshift/openshift-team-auth-maintainers
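For context, enabling the PodSecurity admission plugin at a given level comes down to a kube-apiserver admission configuration along these lines. This is an illustrative sketch of the upstream v1.23 configuration format, not the operator's actual diff; the levels shown mirror where this PR ended up (enforce privileged, audit/warn bumped to restricted during review):

```yaml
# Illustrative only: upstream PodSecurity admission configuration shape
# (pod-security.admission.config.k8s.io/v1beta1 in Kubernetes v1.23).
# The defaults actually set by cluster-kube-apiserver-operator are in the PR diff.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1beta1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "privileged"     # final state of this PR (hence the title rename)
        enforce-version: "latest"
        audit: "restricted"       # audit/warn were bumped during review
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces: []
```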

openshift-ci bot requested a review from s-urbaniak on February 8, 2022 15:07
@s-urbaniak (Contributor)

Yes, I see the failure reasons in the e2e logs; however, the audit logs don't contain them, i.e. from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1308/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-serial/1491066281317634048:

% gzcat *-audit-*log.gz | jq '. | select(.annotations | has("pod-security.kubernetes.io/audit"))' | wc -l
       0

@stlaz can you check why we have no audit entries for pod-security?

@s-urbaniak (Contributor)

xref'ing openshift/kubernetes#1128

this is the backport PR which I am keeping up-to-date with upstream.

@s-urbaniak (Contributor)

@stlaz I see you bumped the audit level to restricted. I was surprised not to see violations against baseline, though, given that the e2e tests didn't pass 🤔

@stlaz (Contributor, Author) commented Feb 9, 2022

I did not notice your earlier comments and was bumping the audit and warn levels in the meantime to see what's going to block us from moving further to the more restrictive level. I'll check the audit logs now and see whether we've got something.

@stlaz (Contributor, Author) commented Feb 9, 2022

gzcat *-audit-*log.gz | jq '. | select(.annotations | has("pod-security.kubernetes.io/audit"))' | wc -l

@s-urbaniak I see what the issue is: upstream changed the annotation to pod-security.kubernetes.io/audit-violations in the meantime.

@s-urbaniak (Contributor) commented Feb 10, 2022

changed the annotation to pod-security.kubernetes.io/audit-violations in the meantime.

Doh, you're right. Now we get a nice listing of the e2e test offenders:

% gzcat *-audit-*log.gz | jq '. | select(.annotations | has("pod-security.kubernetes.io/audit-violations")) | {namespace: .objectRef.namespace, podSecurityReason: .annotations["pod-security.kubernetes.io/audit-violations"] }' | jq -C -s --sort-keys '. | unique' | head -n 20
[
  {
    "namespace": "e2e-disruption-131",
    "podSecurityReason": "would violate PodSecurity \"baseline:latest\": hostPort (container \"donothing\" uses hostPort 5555)"
  },
  {
    "namespace": "e2e-disruption-8214",
    "podSecurityReason": "would violate PodSecurity \"baseline:latest\": hostPort (container \"donothing\" uses hostPort 5555)"
  },
  {
    "namespace": "e2e-persistent-local-volumes-test-4715",
    "podSecurityReason": "would violate PodSecurity \"baseline:latest\": host namespaces (hostNetwork=true), hostPath volumes (volume \"rootfs\"), privileged (container \"agnhost-container\" must not set securityContext.privileged=true)"
  },
  {
    "namespace": "e2e-persistent-local-volumes-test-5428",
    "podSecurityReason": "would violate PodSecurity \"baseline:latest\": host namespaces (hostNetwork=true), hostPath volumes (volume \"rootfs\"), privileged (container \"agnhost-container\" must not set securityContext.privileged=true)"
  },
  {
    "namespace": "e2e-provisioning-2371",
    "podSecurityReason": "would violate PodSecurity \"baseline:latest\": host namespaces (hostNetwork=true), hostPath volumes (volume \"rootfs\"), privileged (container \"agnhost-container\" must not set securityContext.privileged=true)"
...

I am wrapping up the upstream e2e tests; once I have a 👍 from liggitt we can ping on merging the downstream backport and fix the downstream e2e tests.
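The jq filter above can be mirrored in a few lines of Python for scripting around audit logs. The sample entries below are made up, modeled on the events shown above, and use the renamed annotation key pod-security.kubernetes.io/audit-violations:

```python
import json

# Hypothetical audit-log lines, modeled on the Kubernetes audit event
# shape used in the jq queries above (objectRef.namespace, annotations).
SAMPLE_AUDIT_LINES = [
    '{"objectRef": {"namespace": "e2e-disruption-131"}, '
    '"annotations": {"pod-security.kubernetes.io/audit-violations": '
    '"would violate PodSecurity \\"baseline:latest\\": hostPort"}}',
    '{"objectRef": {"namespace": "e2e-clean"}, "annotations": {}}',
]

def pod_security_violations(lines):
    """Return {namespace, podSecurityReason} for each audit event that
    carries the pod-security.kubernetes.io/audit-violations annotation."""
    results = []
    for line in lines:
        event = json.loads(line)
        reason = event.get("annotations", {}).get(
            "pod-security.kubernetes.io/audit-violations")
        if reason is not None:
            results.append({
                "namespace": event.get("objectRef", {}).get("namespace"),
                "podSecurityReason": reason,
            })
    return results

print(pod_security_violations(SAMPLE_AUDIT_LINES))
```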

@s-urbaniak (Contributor)

/retest

1 similar comment

@s-urbaniak (Contributor)

retesting, as the first one most likely didn't have the necessary changes yet. I will:

  1. Verify the most recent retest has UPSTREAM: 106454: SQUASH: test/e2e: let e2e tests specify pod security admiss… kubernetes#1128 included
  2. Iterate over the failures and start fixing tests

@s-urbaniak (Contributor)

/retest

@stlaz (Contributor, Author) commented Mar 22, 2022

/retest

7 similar comments

@stlaz (Contributor, Author) commented Mar 26, 2022

/retest
o/origin vendor bump with fixed tests merged, let's see what's gonna happen

@s-urbaniak (Contributor)

/retest

stlaz changed the title from "AUTH-2: reenable PodSecurity on baseline level" to "AUTH-2: reenable PodSecurity on privileged level" on Mar 29, 2022
@stlaz (Contributor, Author) commented Mar 29, 2022

Moving the enforce level to privileged so that we can iterate on tests more easily. We'll move to restricted soon(-ish, maybe).

@stlaz (Contributor, Author) commented Mar 29, 2022

/retest
weeee, infra quota issues again

@s-urbaniak (Contributor)

/lgtm

@stlaz (Contributor, Author) commented Mar 30, 2022

/retest-required

@stlaz (Contributor, Author) commented Apr 13, 2022

/hold
another o/origin PR is brewing

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Apr 13, 2022
@s-urbaniak (Contributor)

/retest-required

@stlaz (Contributor, Author) commented Apr 13, 2022

/hold cancel
/retest-required
let's pick up the latest o/origin changes

openshift-ci bot removed the do-not-merge/hold label on Apr 13, 2022
@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

@stlaz (Contributor, Author) commented Apr 14, 2022

/hold
@s-urbaniak discovered some discrepancies in the upgrade test code

openshift-ci bot added the do-not-merge/hold label on Apr 14, 2022
@stlaz (Contributor, Author) commented Apr 14, 2022

/hold cancel
/retest-required

openshift-ci bot removed the do-not-merge/hold label on Apr 14, 2022
@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci bot (Contributor) commented Apr 14, 2022

@stlaz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
ci/prow/e2e-aws-operator-disruptive-single-node | 77ec527 | false | /test e2e-aws-operator-disruptive-single-node
ci/prow/e2e-aws-single-node | 77ec527 | false | /test e2e-aws-single-node

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

openshift-merge-robot merged commit 85a0325 into openshift:master on Apr 15, 2022
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 5, 2022
Issue: https://bugzilla.redhat.com/show_bug.cgi?id=2079292

Problem: In Kubernetes v1.23 the PodSecurity feature is enabled by default. OpenShift makes use of this feature, and some expectations are already set by default [1]; these expectations end up triggering a lot of alarms in the logs of CMO and of workloads running in the UWM namespace.
[1] openshift/cluster-kube-apiserver-operator#1308

Solution: Add PodSecurity namespace labels to allow workloads running in UWM to have full privileges, which will silence the alerts.
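The namespace-label approach this commit describes corresponds to the standard per-namespace PSA labels from upstream. A sketch, where the label keys are the upstream ones and the namespace name is an assumption about the UWM namespace:

```yaml
# Sketch only: per-namespace PodSecurity admission labels.
# The namespace name is assumed; the label keys are the upstream PSA keys.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-user-workload-monitoring
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```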
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 11, 2022
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 11, 2022
Issue: https://bugzilla.redhat.com/show_bug.cgi?id=2079292

Problem: In Kubernetes v1.23 the PodSecurity feature is enabled by default. OpenShift makes use of this feature, and some expectations are already set by default [1]; these expectations end up triggering a lot of alarms in the logs of CMO and of workloads running in the UWM namespace.
[1] openshift/cluster-kube-apiserver-operator#1308

Solution: Annotate deployments that violate the restricted PodSecurity mode in the UWM namespace.
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 11, 2022
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 19, 2022
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 24, 2022
Issue: https://bugzilla.redhat.com/show_bug.cgi?id=2079292

Problem: In Kubernetes v1.23 the PodSecurity feature is enabled by default. OpenShift makes use of this feature, and some expectations are already set by default [1]; these expectations end up triggering a lot of alarms in the logs of CMO and of workloads running in the UWM namespace.
[1] openshift/cluster-kube-apiserver-operator#1308

Solution: Annotate deployments that violate the restricted PodSecurity mode in the UWM namespace, and update the SCC used by the SA of the Prometheus instance in UWM.
JoaoBraveCoding added a commit to JoaoBraveCoding/cluster-monitoring-operator that referenced this pull request May 24, 2022
Labels: approved (PR has been approved by an approver from all required OWNERS files), lgtm (PR is ready to be merged)

5 participants