Conversation

@bentito bentito commented Oct 17, 2025

Added a wait step so the router service account’s RBAC settles before we create or update routes that use external certificates. The new helper impersonates the router SA and polls for get/list/watch access on the referenced secret, which eliminates the Forbidden errors that were flaking CI when the admission webhook fired during RBAC propagation.

@openshift-ci-robot added the following labels on Oct 17, 2025: jira/severity-moderate (referenced Jira bug's severity is moderate for the branch this PR is targeting), jira/valid-reference (this PR references a valid Jira ticket of any type), jira/invalid-bug (a referenced Jira bug is invalid for the branch this PR is targeting).
@openshift-ci-robot

@bentito: This pull request references Jira Issue OCPBUGS-62929, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from candita and miheer October 17, 2025 17:38
@openshift-ci
Contributor

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bentito
Once this PR has been reviewed and has the lgtm label, please assign candita for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bentito
Author

bentito commented Oct 17, 2025

/jira refresh

@openshift-ci-robot added the jira/valid-bug label (a referenced Jira bug is valid for the branch this PR is targeting) on Oct 17, 2025.
@openshift-ci-robot

@bentito: This pull request references Jira Issue OCPBUGS-62929, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

@openshift-ci-robot removed the jira/invalid-bug label (a referenced Jira bug is invalid for the branch this PR is targeting) on Oct 17, 2025.
@openshift-ci openshift-ci bot requested a review from lihongan October 17, 2025 17:39
@openshift-trt

openshift-trt bot commented Oct 17, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: e3f4c0f

| Job Name | New Test Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | Medium - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] reflector doesn't support receiving resources as Tables [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by dynamic client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by informers when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by metadatainformer when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit. |

New tests seen in this PR at sha: e3f4c0f

  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] reflector doesn't support receiving resources as Tables [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by dynamic client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by informers when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by metadatainformer when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]

@bentito
Author

bentito commented Oct 18, 2025

/retest

@bentito
Author

bentito commented Oct 19, 2025

/retest

Contributor

@Miciah Miciah left a comment

The PR has a more detailed description than the commit message. Can you add the PR description to the commit message, and also add a link to https://issues.redhat.com/browse/OCPBUGS-62929?

Do we need a similar wait for the tests that delete RBAC or secrets? I don't know whether those tests have been flaky, but it seems to me that we might have race conditions in those tests too.

Ensure the RouteExternalCertificate tests wait until the openshift-ingress/router serviceaccount can get/list/watch the external certificate secret before creating or patching routes so the admission webhook no longer races RBAC propagation.

OCPBUGS-62929

https://issues.redhat.com/browse/OCPBUGS-62929
@bentito
Author

bentito commented Oct 21, 2025

> Do we need a similar wait for the tests that delete RBAC or secrets? I don't know whether those tests have been flaky, but it seems to me that we might have race conditions in those tests too.

I don't think so. Here's why:

  • Secret deletions already flow through checkRouteStatus, which polls until the router reports ExternalCertificateValidationFailed, so we’re effectively waiting for the controller to observe the change (test/extended/router/external_certificate.go:239).
  • RBAC deletions exercised in the “routes are not reachable” path also use that same status poll, so propagation is covered there (test/extended/router/external_certificate.go:293).
  • The update scenarios that expect an API call to be rejected rely on the apiserver RBAC authorizer evaluating permissions synchronously at request time. Once the role binding is deleted, the admission stack should block the request immediately.

So I don't think an extra wait is needed at the moment.
NB: I didn't hunt for related flakes, though.

	}

	return wait.PollUntilContextTimeout(context.Background(), time.Second, changeTimeoutSeconds*time.Second, false, func(ctx context.Context) (bool, error) {
		for _, secretName := range secretNames {
Member

Since we want to poll for each secret, shouldn't we call the wait.PollUntilContextTimeout function inside the loop?

Author

The existing structure already iterates over every secret on each poll tick and only returns true once all of them succeed. Moving the polling inside the loop would serialize the waits and could stretch the total wait time to number of secrets * timeout.

			return false, err
		}

		watcher, err := client.CoreV1().Secrets(namespace).Watch(ctx, metav1.ListOptions{
Member

Can't we reuse listOpts, which was defined earlier?

Author

We can’t safely reuse the earlier listOpts because it has Limit: 1. Watches reject requests that include a limit, so the code intentionally builds a fresh ListOptions without that field for the watch call. Reusing the old struct would risk sending an invalid watch request unless we mutate it first, which would be harder to read than just constructing a new one.
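For comparison, the "mutate it first" alternative mentioned above might look roughly like the sketch below. This is not what the PR does, and it makes several assumptions: `listOptions` is a trimmed-down local stand-in for `metav1.ListOptions` so the example builds without Kubernetes dependencies, `watchOptionsFrom` is a hypothetical helper, and the field selector value is made up for illustration.

```go
package main

import "fmt"

// listOptions is a hypothetical, trimmed-down stand-in for
// metav1.ListOptions, defined locally so the sketch has no dependencies.
type listOptions struct {
	FieldSelector string
	Limit         int64
}

// watchOptionsFrom returns a copy of the list options with Limit cleared.
// Go passes structs by value, so the caller's options are left untouched.
func watchOptionsFrom(list listOptions) listOptions {
	list.Limit = 0
	return list
}

func main() {
	// Illustrative values only; the real selector is not shown in the diff.
	listOpts := listOptions{FieldSelector: "metadata.name=my-secret", Limit: 1}
	watchOpts := watchOptionsFrom(listOpts)

	fmt.Printf("list:  %+v\n", listOpts)
	fmt.Printf("watch: %+v\n", watchOpts)
}
```

The extra helper is the readability cost referred to above: constructing a fresh options value at the watch call site keeps the absence of a limit visible where it matters.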

@bentito
Author

bentito commented Oct 22, 2025

/retest

@bentito
Author

bentito commented Oct 22, 2025

/retest

@openshift-ci
Contributor

openshift-ci bot commented Oct 22, 2025

@bentito: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | bf0e17c | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt

openshift-trt bot commented Oct 22, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: bf0e17c

| Job Name | New Test Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-microshift | High - "Import the release payload "nightly-arm64" from an external source" is a new test that was not present in all runs against the current commit. |
| pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial | High - "Import the release payload "nightly-arm64" from an external source" is a new test that was not present in all runs against the current commit. |

New tests seen in this PR at sha: bf0e17c

  • "Import the release payload "nightly-arm64" from an external source" [Total: 4, Pass: 4, Fail: 0, Flake: 0]

@bentito bentito requested a review from Miciah October 23, 2025 12:31
@bentito
Author

bentito commented Oct 23, 2025

@Miciah: In the previous cycle there were 5 e2e failures, none of them caused by the flake in question. Currently there is 1 e2e failure, and it is also not caused by this flake. Can you take another review pass?

@bentito
Author

bentito commented Oct 23, 2025

/assign @Miciah

@candita
Contributor

candita commented Oct 30, 2025

/assign @rfredette

Labels

  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

6 participants