-
Notifications
You must be signed in to change notification settings - Fork 4.8k
Add controller retry support scaffolding #1091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
25d3075 to
f8300d2
Compare
f8300d2 to
c3d52ff
Compare
pkg/controller/controller.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add support for maxRetries==-1 meaning "forever"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also, I hate using a constructor func rather than field initialization for this thing, but then the map would have to be lazily initialized. Ugh!
c3d52ff to
578eb49
Compare
|
Needs more testing around the retry behavior, but the implementation looks sound to me. I am slightly biased though. |
578eb49 to
b43a252
Compare
pkg/controller/controller.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the decision to retry be an interface passed to RetryController
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea... done. Was on the fence about actually defining an interface, ended up just passing a function and providing a stock implementation.
b43a252 to
e908a4a
Compare
|
Note: Before a merge, I want to add more tests for this (especially integration tests). To avoid too much rework on my part, I'm putting them off for now until we have more consensus. |
pkg/controller/controller.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// a maxRetries value of -1 is interpreted as retry forever.
|
absurdly minor nits on my part, lgtm. |
97ce277 to
8f0179e
Compare
|
I added a nice test, |
|
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_openshift3/1183/) |
|
Provisionally LGTM but I need to review the latest code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a test like TestFIFO_basic that randomly does an AddIfNotPresent instead of just Add ?
|
I really like the retry controller thing, this is informing my work on delta-based stores and controllers that do this. |
8f0179e to
164b4aa
Compare
|
@ironcladlou great work with this 👍 |
pkg/controller/controller.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need these anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate a little more? The default ShouldRetry impl is RetryOnRetryableError which checks for RetryableError, and the deployment controllers return it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deployment controllers should be implementing their own retry behavior which is either retry on error, or return a special type. Wrapping an error like this is questionable - instead handle should return opaque errors, and the controller retry handler should make a decision and either log "not retrying" or "retrying". Using someone else's error is coupling unrelated code.
On Feb 24, 2015, at 8:05 AM, Dan Mace [email protected] wrote:
In pkg/controller/controller.go:
- if err != nil {
if c.ShouldRetry(resource, err) {c.RetryManager.Retry(resource)return}- }
- c.RetryManager.Forget(resource)
+}
+// RetryableError is an error that is recoverable and which should
+// signal another attempt at processing a resource.
+type RetryableError struct {
- err error
+}
+func (e *RetryableError) Error() string {
Can you elaborate a little more? The default ShouldRetry impl is RetryOnRetryableError which checks for RetryableError, and the deployment controllers return it.—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you force me to implement my own retry behavior in my controller, it's going to look exactly like this default implementation... i'm going to return an error that clearly indicates "retry me" and write a retrier that recognizes such errors. i'm not going to separate my logic such that in one place i'm trying to do things, and in another place i'm trying to figure out whether i should retry that thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone using the RetryController wants to establish their own error taxonomy and retry handler around it, they're free to do so and the abstractions here will facilitate them. However, all our existing use cases in deployment and builds (I think) can use this OOTB error type and handler for now, IMO.
I could be misunderstanding the proposed alternative design... maybe try once more to clarify just to be sure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might see what you mean now, @smarterclayton. Will add a new commit for discussion based on what you're saying.
ef64bdc to
d312958
Compare
|
@smarterclayton @bparees @pmorie I added a temporary commit to demonstrate a different way of dealing with errors. The new approach addresses @smarterclayton's feedback (I think), but also incorporates a strategy I learned about from the Go standard library based on this very relevant golang-nuts discussion. Check out the way custom errors and error transience are handled in the golang net package. I used the stdlib approach for error definition/taxonomy, and used @smarterclayton's idea to move the decision-making to the factory's Let me know what you think. I won't bother changing any tests until we get this part nailed down. |
|
@ironcladlou I like it. 👊 |
Add an AddIfNotPresent function to support single producer/consumer retry scenarios. Provides the consumer with a means to prefer items already enqueued by the producer at the point of retry.
5b01476 to
d3c81ee
Compare
|
@smarterclayton @bparees Still haven't touched tests yet in response to this latest change. When the structure of this one controller looks good to all, I'll fix the tests, get this PR merged, and then start porting all the other deploy controllers in followup PRs. |
|
I'm about to take off - that was my last substantive issue (no one outside of controller and its tests should know about the errors, and caking util.HandleError before you eat one).
|
d3c81ee to
bae61d3
Compare
|
[test] |
|
Going to take a last pass at this before merge |
bae61d3 to
ddef62d
Compare
Introduce a reusable controller package which facilitates safe retries for controllers which make use of cache.FIFO. Port the DeploymentConfigController to use the new support as a reference implementation of the pattern.
ddef62d to
6c68632
Compare
|
LGTM [merge] |
|
Evaluated for origin up to 6c68632 |
|
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/1050/) (Image: devenv-fedora_906) |
Merged by openshift-bot
Followup to openshift#1091 in support of openshift#824. Summary of the changes: * Each controller is now in its own package (`pkg/deploy/controller/$type`) with a normalized name (`controller.go`, `factory.go`, etc.) * `DeploymentController` has been split into `DeploymentController` and `DeployerPodController` * All controllers now propagate semantic errors (e.g. fatal or nonfatal) * Improved private interface usage throughout the controllers * All controllers are now wrapped in a `RetryController` via their corresponding factories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems backwards that you're logging the error when you retry, but not logging it if it's fatal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's backwards. Logging transients is good, but fatal is critical.
On Feb 28, 2015, at 11:54 AM, Ben Parees [email protected] wrote:
In pkg/deploy/controller/deploymentconfig/factory.go:
return factory.KubeClient.ReplicationControllers(namespace).Create(deployment)},},makeDeployment: func(config _deployapi.DeploymentConfig) (_kapi.ReplicationController, error) {return deployutil.MakeDeployment(config, factory.Codec)},- }
- return &controller.RetryController{
Queue: queue,RetryManager: controller.NewQueueRetryManager(queue, cache.MetaNamespaceKeyFunc, 1),ShouldRetry: func(obj interface{}, err error) bool {if _, isFatal := err.(fatalError); isFatal {return false} seems backwards that you're logging the error when you retry, but not logging it if it's fatal.kutil.HandleError(err)—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I caught and fixed this in the followup (#1170). Good eye!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good but slow apparently :)
…service-catalog/' changes from 8f07b7b..7e650e7 7e650e7 origin build: add origin tooling f32eec2 unit test for ./pkg/rest/core/fake rest client, addresses openshift#860 Test ready to be reviewed (openshift#1113) e388aee explicitly always prefer latest OSBAPI Version (openshift#1138) 962429e Merge branch 'pr/1135' b6ee7ef fix rbac cb1beb9 Merge branch 'pr/1131' ecc5c01 Update Code of Conduct (openshift#1137) e7c5ab3 address one more PR comment ddcbbad address PR comments 652a83b fix test expectation to match the new error message for missing service class 33417cc address PR comments 565fccf Use the chart name instead of the namespace (openshift#1102) bc61919 Add new terminal failure binding condition (openshift#1057) 4e642d5 Added more detailed instructions on how to setup the repo (openshift#1114) bdaea23 update unit tests (openshift#1123) 88a9642 validate the apiserver options (openshift#1116) b0af5fc fix whitespace in the copyright section dee796a generated type changes ef585c4 Rename the directory from default to defaultservicename to conform to go style guide. Wire admission controller into the apiserver 0b5d6c6 add firewall troubleshooting section (openshift#1040) fd9e6bc Fix Typo in Events Code of Conduct (openshift#1126) ebe6506 Fix Typo in Terminology (openshift#1128) 0038b1e Merge branch 'pr/1122' 8411f31 make deprovisioning an instance asynchronously not fall-through to synchronous deprovision (openshift#1067) 76c1d93 handle failures from list and test the not ready condition, cleanup 9241296 finish unit tests, passing ed75774 Minor fixes based on go report card 9911e8d Add GoReport Widget (openshift#1121) dd24e5c clean up old cruft 08276c6 generated file changes 6489d90 Implement the default plan in admission controller a6bb576 Code: Instance/Binding parameters from secret (openshift#1079) 10bb148 Update generated files (openshift#1115) 5291e6f v0.0.15 (openshift#1118) 28a1ea6 Merge branch 'pr/1104' bb4a2d2 Merge branch 'pr/1097' 1c14a90 push all arch images on release tags (openshift#1108) b587b2c Improve log output for deprovision 8887561 Remove PodPreset embedding from Binding (openshift#1030) 1abdcc8 Adjust helm/tiller installation instructions (openshift#1091) f636f99 only skip tls verify if not behind the aggregator (openshift#1101) 43b40ab controller_broker unit test bullet-proofing openshift#1077 (openshift#1099) bb596b8 Use data store instead of database (openshift#1100) 04fa477 Implementation: Support for Bearer token auth between Service Catalog and brokers (openshift#1053) 9e46d3c refactor Jenkins e2e tests (openshift#1082) 1f0a41e remove old/misleading comments about only doing soft delete if it's "our turn--" i.e. only if the finalizer we care about is at the head of the finalizers list. 5c1d9b8 Update OSB client (openshift#1085) a6e80ea Only do work for instances from a single queue (openshift#1074) 2bd85d6 Merge branch 'pr/1076' e324287 Tweaks to the walkthrough for local-up-cluster d8b7899 Add a note to the walkthrough about getting bindings when using the aggregator (openshift#1078) ea44cf1 msg on Environment Variables to set for e2e (openshift#1070) d15554a Merge branch 'pr/1017' faf966e Add comment re: async race condition in integration tests ed2e096 v0.0.14 (openshift#1071) fc84ffd more PR feedback 283bed4 Add integration tests and some error checking; PR feedback 903a7a7 Add terminal condition for instance and do not retry failed provisions REVERT: 8f07b7b origin: add required patches git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: 7e650e7e39c3fc79a8ecc061cce2a70e899406ff
* Adjust helm/tiller installation instructions - tiller needs access later to install service-catalog objects * suggested updates
Introduce a reusable controller package which facilitates safe retries
for controllers which make use of cache.FIFO.
Port the DeploymentConfigController to use the new support as a reference
implementation of the pattern.