-
Notifications
You must be signed in to change notification settings - Fork 213
OCPBUGS-17418: Handle cache.DeletedFinalStateUnknown #952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ncdc: This pull request references Jira Issue OCPBUGS-17418, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@petr-muller: This pull request references Jira Issue OCPBUGS-17418, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
petr-muller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
LGTM, holding because Lala also said he's gonna take a look
| if !ok { | ||
| klog.Errorf("Unexpected type %T", obj) | ||
| return | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
I have looked into our other DeleteFunc handlers and it seems like none of them need a similar treatment because they do not use the object from the informer but just queue something else for processing.
DeleteFunc: func(obj interface{}) { ctrl.queue.Add(key) }, cluster-version-operator/pkg/cvo/cvo.go
Lines 504 to 506 in 9c0a4ac
DeleteFunc: func(obj interface{}) { optr.queue.Add(workQueueKey) }, DeleteFunc: func(obj interface{}) { c.queue.Add("cluster") },
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And just added 2763543 to make that more obvious.
The exposure has been around since 4.8 added admin acks [1], and we're not sure if watch-interruption has increased recently, relevant-ConfigMap deletion has increased, and/or our panic-monitoring has improved. [1]: https://github.com/openshift/cluster-version-operator/pull/647/files#diff-45df4901f88b6867d7fbf7e50690f376812864a0b85ec80dda5f77e6e19097b9R396-R402 Signed-off-by: Andy Goldstein <[email protected]>
|
Run the suite that panicked, to ensure we excercise the adjusted ConfigMap delete handler (even if we probably miss the well-timed watch-interruption that triggered this panic): /payload-job periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial |
|
@wking: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6e37dea0-3538-11ee-9524-c74145fe9934-0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commit message suggestion from @wking which looks good to me.
This is to hint to any future developers that there is no existing operand handling in the functions, and they need to add all of that themselves, including (hopefully they realize) DeletedFinalStateUnknown handling. That's a fair bit of hope to get DeletedFinalStateUnknown in place, but calling that specific aspect out in a comment or something would risk them reading the comment and assuming we were giving them an exhaustive list of guards to consider, when they may need additional guards to match distant-future informer changes (they hopefully read [1]). And the amount of churn expected for our other delete handlers is low, so investing even more time in trying to protect those future devs seems uneccessary.
[1]: https://pkg.go.dev/k8s.io/client-go/tools/cache#ResourceEventHandler
This is to hint to any future developers that there is no existing event object handling in the functions, and they need to add all of that themselves, including (hopefully they realize) DeletedFinalStateUnknown handling. That's a fair bit of hope to get DeletedFinalStateUnknown in place, but calling that specific aspect out in a comment or something would risk them reading the comment and assuming we were giving them an exhaustive list of guards to consider, when they may need additional guards to match distant-future informer changes (they hopefully read [1]). And the amount of churn expected for our other delete handlers is low, so investing even more time in trying to protect those future devs seems uneccessary. [1]: https://pkg.go.dev/k8s.io/client-go/tools/cache#ResourceEventHandler Signed-off-by: Andy Goldstein <[email protected]>
LalatenduMohanty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LalatenduMohanty, ncdc, petr-muller The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Serial rehearsal hiccuped while reporting Telemetry, but that's unrelated to this change, which suggests that the new code doesn't break under common ConfigMap-deletion conditions. We don't know for sure whether there was a watch hiccup overlapping with a deletion, but 🤷, we can keep an eye out for that in post-merge periodics. /hold cancel |
|
@ncdc: Jira Issue OCPBUGS-17418: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-17418 has been moved to the MODIFIED state. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
No description provided.