feat: replace StatusPoller w/ StatusWatcher #572
Conversation
|
I still have some tests to write and refactoring to do, but I got all the tests passing locally. |
eba7e3d to
d760ea7
Compare
|
/test cli-utils-presubmit-master-stress |
192d01d to
0d50a15
Compare
e408395 to
5b850da
Compare
|
Extracted #574 |
|
Extracted #575 |
5b850da to
50ec259
Compare
|
Fixed the flaky test with a new |
|
Comments addressed. Tests passed. Please take another look. |
mortent
left a comment
/lgtm
|
/lgtm |
|
/hold Block until we get a release with k8s v1.24 out. |
ash2k
left a comment
Solid work! I've left a few comments. This PR adds quite a few TODOs. Are you planning to address those later?
| // TODO: Retry with backoff if in namespace-scoped mode, to allow CRDs & namespaces to be created asynchronously | ||
| type ObjectStatusReporter struct { | ||
| // InformerFactory is used to build informers | ||
| InformerFactory *DynamicInformerFactory |
I suggest we make this an interface for both (future) testing and composability reasons.
Seems like a reasonable enhancement, but we can probably wait to change it to use an interface until we add a 2nd impl. I try not to prematurely add interfaces, if I can avoid it. Sometimes having the struct itself is more flexible, and in this case the coupling is pretty tight to the SharedIndexInformer impl details. For example, we know that passing an unstructured example means the callbacks will almost always be unstructured and can be safely cast.
|
|
||
| if tombstone, ok := iobj.(cache.DeletedFinalStateUnknown); ok { | ||
| // Last state unknown. Possibly stale. | ||
| // TODO: Should we propagate this uncertainty to the caller? |
We probably should?
How tho? The callback interface informer uses is pretty terrible, because it uses interface{} objects as arguments, making it impossible to know what type you're going to get as a caller without looking at the implementation. This particular pattern of wrapping the object with a struct with a getter is especially bad because it's not a standard object type. I didn't even know it was a possible edge case until an e2e test failed randomly.
The only idea I've had is to maybe add an annotation, but that would be equally hard to discover and would require removal by the receiver.
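For readers following this thread, one possible shape for propagating the uncertainty is a small result type with a staleness flag. This is only a sketch under stated assumptions: `DeletedFinalStateUnknown` below is a local stand-in that mirrors the `Key`/`Obj` fields of `cache.DeletedFinalStateUnknown` in client-go, while `Object`, `DeleteEvent`, and `unwrapDelete` are hypothetical names that do not appear in the PR.

```go
package main

import "fmt"

// DeletedFinalStateUnknown is a local stand-in mirroring the shape of
// cache.DeletedFinalStateUnknown from k8s.io/client-go/tools/cache.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// Object is a hypothetical placeholder for *unstructured.Unstructured.
type Object struct {
	Name string
}

// DeleteEvent is a hypothetical result type that carries the uncertainty.
type DeleteEvent struct {
	Obj   *Object
	Stale bool // true when the last known state came from a tombstone
}

// unwrapDelete converts the interface{} argument of an informer delete
// callback into a typed DeleteEvent, unwrapping a tombstone if present.
func unwrapDelete(iobj interface{}) (DeleteEvent, error) {
	stale := false
	if tombstone, ok := iobj.(DeletedFinalStateUnknown); ok {
		// Last state unknown. Possibly stale.
		iobj = tombstone.Obj
		stale = true
	}
	obj, ok := iobj.(*Object)
	if !ok {
		return DeleteEvent{}, fmt.Errorf("unexpected object type: %T", iobj)
	}
	return DeleteEvent{Obj: obj, Stale: stale}, nil
}

func main() {
	ev, err := unwrapDelete(DeletedFinalStateUnknown{Key: "default/web", Obj: &Object{Name: "web"}})
	fmt.Println(ev.Obj.Name, ev.Stale, err)
}
```

A flag on the event keeps the object itself unmodified, which avoids the discoverability and cleanup concerns raised about an annotation, at the cost of changing the event type the receiver sees.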
| id := object.UnstructuredToObjMetadata(obj) | ||
| klog.Warningf("Invalid CRD added: missing group and/or kind: %v", id) | ||
| // Don't return an error, because this should not interrupt the task queue. | ||
| // TODO: Allow non-fatal errors to be reported using a specific error type. |
👍
| } | ||
|
|
||
| //go:generate stringer -type=RESTScopeStrategy -linecomment | ||
| type RESTScopeStrategy int |
nit: This can probably be byte =)
I've never seen a Go enum use byte. Int is more extensible. But perhaps you're suggesting that this particular enum will never be extended?
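For context on the enum under discussion, here is a minimal sketch of the int-based pattern with `iota`. The constant names and their string values are assumptions chosen for illustration, and the `String` method is hand-written here, whereas the PR generates it with `stringer -linecomment`.

```go
package main

import "fmt"

// RESTScopeStrategy selects how informers are scoped.
type RESTScopeStrategy int

// The constant names below are assumed for illustration only.
const (
	RESTScopeAutomatic RESTScopeStrategy = iota // automatic
	RESTScopeRoot                               // root
	RESTScopeNamespace                          // namespace
)

// String returns the line-comment text for each value, which is what
// stringer -linecomment would generate from the comments above.
func (s RESTScopeStrategy) String() string {
	switch s {
	case RESTScopeAutomatic:
		return "automatic"
	case RESTScopeRoot:
		return "root"
	case RESTScopeNamespace:
		return "namespace"
	default:
		return fmt.Sprintf("RESTScopeStrategy(%d)", int(s))
	}
}

func main() {
	fmt.Println(RESTScopeAutomatic, RESTScopeRoot, RESTScopeNamespace)
}
```

Switching the underlying type to `byte` would compile unchanged for these three values, so the choice mostly comes down to convention and room for future values.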
Honestly, if the watcher works well, I don't think I'll get back around to the TODOs for a while. The main goal here is performance/responsiveness and reduced memory use, and I don't want perfect to be the enemy of good. This codebase already has a pile of tech debt and the TODO issues I've made keep getting auto-closed, so I figured TODOs in the code were at least a little more persistent and might be discovered again, the next time someone is in this code. I will address some of your comments tho. Thanks for reviewing! |
c1e5a9f to
7af34af
Compare
49705fa to
2e4c358
Compare
- Add DefaultStatusWatcher that wraps DynamicClient and manages
informers for a set of resource objects.
- Supports two modes: root-scoped & namespace-scoped.
- Root-scoped mode uses root-scoped informers for efficiency and
performance.
- Namespace-scoped mode uses namespace-scoped informers to
minimize the permissions needed to run and the size of the
in-memory object cache.
- Automatic mode selects which mode to use based on whether the
objects being watched are in one or multiple namespaces.
This is the default mode, optimizing for performance.
- If CRDs are being watched, the creation/deletion of CRDs can
cause informers for those custom resources to be created/deleted.
- In namespace-scope mode, if namespaces are being watched, the
creation/deletion of namespaces can also trigger informers to
be created/deleted.
- All creates/updates/deletes to CRDs also cause RESTMapper reset.
- Allow pods to be unschedulable for 15s before reporting the
status as Failed. Any update resets the timer.
- Add BlindStatusWatcher for testing and disabling for dry-run.
- Add DynamicClusterReader that wraps DynamicClient.
This is now used to look up generated resources
(ex: Deployment > ReplicaSets > Pods).
- Add DefaultStatusReader which uses a DelegatingStatusReader to
wrap a list of conventional and specific StatusReaders.
This should make it easier to extend the list of StatusReaders.
- Move some pending WaitEvents to be optional in tests, now that
StatusWatcher can resolve their status before the WaitTask starts.
- Add a new Thousand Deployments stress test (10x kind nodes)
- Add some new logs for easier debugging
- Add internal SyncEvent so that apply/delete tasks don't start
until the StatusWatcher has finished initial synchronization.
This helps avoid missing events from actions that happen while
synchronization is incomplete.
- Filter optional pending WaitEvents when testing.
BREAKING CHANGE: Replace StatusPoller w/ StatusWatcher
BREAKING CHANGE: Remove PollInterval (obsolete with watcher)
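The automatic mode described in the commit message could be sketched roughly as follows. This is not the PR's implementation: `Scope`, `ObjRef`, and `selectScope` are hypothetical names, and the sketch assumes that objects in a single namespace map to namespace-scoped informers, that objects spread across several namespaces map to root-scoped informers, and that an object with an empty namespace is cluster-scoped and forces root scope.

```go
package main

import "fmt"

// Scope is a hypothetical name for the informer scope chosen at runtime.
type Scope string

const (
	ScopeRoot      Scope = "root"
	ScopeNamespace Scope = "namespace"
)

// ObjRef is a hypothetical, simplified object identifier.
type ObjRef struct {
	Namespace string
	Name      string
}

// selectScope picks a scope from the set of objects being watched.
// Assumption: one namespace -> namespace scope; several namespaces or any
// cluster-scoped object (empty namespace) -> root scope.
func selectScope(objs []ObjRef) Scope {
	namespaces := map[string]struct{}{}
	for _, o := range objs {
		if o.Namespace == "" {
			// A namespace-scoped informer cannot observe this object.
			return ScopeRoot
		}
		namespaces[o.Namespace] = struct{}{}
	}
	if len(namespaces) > 1 {
		return ScopeRoot
	}
	return ScopeNamespace
}

func main() {
	objs := []ObjRef{{Namespace: "a", Name: "x"}, {Namespace: "b", Name: "y"}}
	fmt.Println(selectScope(objs))
}
```

The trade-off follows the commit message: root scope needs fewer informers, while namespace scope needs fewer permissions and a smaller in-memory cache.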
2e4c358 to
c469493
Compare
ash2k
left a comment
/lgtm
Thanks!
mortent
left a comment
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ash2k, karlkfi, mortent The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/unhold v0.30.0 passed all tests in Config Sync, so this is unblocked for merge. |