-
Notifications
You must be signed in to change notification settings - Fork 4.8k
OCPBUGS-62929: Check router RBAC before external cert ops #30395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@bentito: This pull request references Jira Issue OCPBUGS-62929, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bentito The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@bentito: This pull request references Jira Issue OCPBUGS-62929, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: e3f4c0f
New tests seen in this PR at sha: e3f4c0f
|
|
/retest |
1 similar comment
|
/retest |
e3f4c0f to
eaaec8d
Compare
Miciah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR has a more detailed description than the commit message. Can you add the PR description to the commit message, and also add a link to https://issues.redhat.com/browse/OCPBUGS-62929?
Do we need a similar wait for the tests that delete RBAC or secrets? I don't know whether those tests have been flaky, but it seems to me that we might have race conditions in those tests too.
Ensure the RouteExternalCertificate tests wait until the openshift-ingress/router serviceaccount can get/list/watch the external certificate secret before creating or patching routes so the admission webhook no longer races RBAC propagation. OCPBUGS-62929 https://issues.redhat.com/browse/OCPBUGS-62929
eaaec8d to
bf0e17c
Compare
I don't think so, here's why:
So there’s no extra wait needed, I think, atm. |
| } | ||
|
|
||
| return wait.PollUntilContextTimeout(context.Background(), time.Second, changeTimeoutSeconds*time.Second, false, func(ctx context.Context) (bool, error) { | ||
| for _, secretName := range secretNames { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we want to poll for each secret, shouldn't we call thewait.PollUntilContextTimeout function inside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing structure already iterates over every secret on each poll tick and only returns true once all of them succeed. Moving the polling inside the loop would serialize the waits and could stretch the total wait time to number of secrets * timeout.
| return false, err | ||
| } | ||
|
|
||
| watcher, err := client.CoreV1().Secrets(namespace).Watch(ctx, metav1.ListOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot we use listOpts, which was defined earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can’t safely reuse the earlier listOpts because it has Limit: 1. Watches reject requests that include a limit, so the code intentionally builds a fresh ListOptions without that field for the watch call. Reusing the old struct would risk sending an invalid watch request unless we mutate it first, which would be harder to read than just constructing a new one.
|
/retest |
1 similar comment
|
/retest |
|
@bentito: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: bf0e17c
New tests seen in this PR at sha: bf0e17c
|
|
@Miciah : The cycle before, there were 5 failing e2e but none for this flake in question, and currently we have 1 failing e2e and not b/c of this flake. Can you take another review pass? |
|
/assign @Miciah |
|
/assign @rfredette |
Added a wait step so the router service account’s RBAC settles before we create or update routes that use external certificates. The new helper impersonates the router SA and polls for get/list/watch access on the referenced secret, which eliminates the Forbidden errors that were flaking CI when the admission webhook fired during RBAC propagation