Add controller retry support scaffolding #1091

ironcladlou · 2015-02-20T20:07:50Z

Introduce a reusable controller package which facilitates safe retries
for controllers which make use of cache.FIFO.

Port the DeploymentConfigController to use the new support as a reference
implementation of the pattern.

ironcladlou · 2015-02-20T20:09:33Z

bparees · 2015-02-20T20:23:41Z

pkg/controller/controller.go

let's add support for maxRetries==-1 meaning "forever"

Done. Also, I hate using a constructor func rather than field initialization for this thing, but then the map would have to be lazily initialized. Ugh!

bparees · 2015-02-20T20:34:18Z

Needs more testing around the retry behavior, but the implementation looks sound to me. I am slightly biased though.

smarterclayton · 2015-02-20T22:21:50Z

pkg/controller/controller.go

Don't do this.

Make the decision to retry be an interface passed to RetryController

Good idea... done. Was on the fence about actually defining an interface, ended up just passing a function and providing a stock implementation.

ironcladlou · 2015-02-23T14:32:30Z

Note: Before a merge, I want to add more tests for this (especially integration tests). To avoid too much rework on my part, I'm putting them off for now until we have more consensus.

bparees · 2015-02-23T20:06:13Z

pkg/controller/controller.go

// a maxRetries value of -1 is interpreted as retry forever.

bparees · 2015-02-23T20:14:01Z

absurdly minor nits on my part, lgtm.

ironcladlou · 2015-02-23T22:14:18Z

I added a nice test, TestRetryController_realFifoEventOrdering, which does a miniature integration-like test with the RetryController+QueueRetryManager+FIFO to demonstrate event order guarantees with an asynchronous queue producer. Unless there are any other special requests, I feel like this is in good shape for another merge review. Going to run it through a jenkins [test].

openshift-bot · 2015-02-23T22:17:00Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_openshift3/1183/)

pmorie · 2015-02-23T22:42:06Z

Provisionally LGTM but I need to review the latest code.

pmorie · 2015-02-23T23:55:20Z

Godeps/_workspace/src/github.com/GoogleCloudPlatform/kubernetes/pkg/client/cache/fifo_test.go

Should there be a test like TestFIFO_basic that randomly does an AddIfNotPresent instead of just Add ?

pmorie · 2015-02-23T23:56:45Z

I really like the retry controller thing, this is informing my work on delta-based stores and controllers that do this.

soltysh · 2015-02-24T10:40:07Z

@ironcladlou great work with this 👍

smarterclayton · 2015-02-24T16:02:39Z

pkg/controller/controller.go

Don't need these anymore

Can you elaborate a little more? The default ShouldRetry impl is RetryOnRetryableError which checks for RetryableError, and the deployment controllers return it.

The deployment controllers should be implementing their own retry behavior which is either retry on error, or return a special type. Wrapping an error like this is questionable - instead handle should return opaque errors, and the controller retry handler should make a decision and either log "not retrying" or "retrying". Using someone else's error is coupling unrelated code.

On Feb 24, 2015, at 8:05 AM, Dan Mace [email protected] wrote:

In pkg/controller/controller.go:

if err != nil {

if c.ShouldRetry(resource, err) {

c.RetryManager.Retry(resource)

return

}

}

c.RetryManager.Forget(resource)
+}

+// RetryableError is an error that is recoverable and which should
+// signal another attempt at processing a resource.
+type RetryableError struct {

err error
+}

+func (e *RetryableError) Error() string {
Can you elaborate a little more? The default ShouldRetry impl is RetryOnRetryableError which checks for RetryableError, and the deployment controllers return it.

—
Reply to this email directly or view it on GitHub.

if you force me to implement my own retry behavior in my controller, it's going to look exactly like this default implementation... i'm going to return an error that clearly indicates "retry me" and write a retrier that recognizes such errors. i'm not going to separate my logic such that in one place i'm trying to do things, and in another place i'm trying to figure out whether i should retry that thing.

If someone using the RetryController wants to establish their own error taxonomy and retry handler around it, they're free to do so and the abstractions here will facilitate them. However, all our existing use cases in deployment and builds (I think) can use this OOTB error type and handler for now, IMO.

I could be misunderstanding the proposed alternative design... maybe try once more to clarify just to be sure?

I think I might see what you mean now, @smarterclayton. Will add a new commit for discussion based on what you're saying.

ironcladlou · 2015-02-25T14:35:07Z

@smarterclayton @bparees @pmorie

I added a temporary commit to demonstrate a different way of dealing with errors. The new approach addresses @smarterclayton's feedback (I think), but also incorporates a strategy I learned about from the Go standard library based on this very relevant golang-nuts discussion. Check out the way custom errors and error transience are handled in the golang net package. I used the stdlib approach for error definition/taxonomy, and used @smarterclayton's idea to move the decision-making to the factory's ShouldRetry implementation.

Let me know what you think. I won't bother changing any tests until we get this part nailed down.

pmorie · 2015-02-25T15:19:18Z

@ironcladlou I like it. 👊

Add an AddIfNotPresent function to support single producer/consumer retry scenarios. Provides the consumer with a means to prefer items already enqueued by the producer at the point of retry.

ironcladlou · 2015-02-26T15:39:08Z

@smarterclayton @bparees
The latest revision moves DeploymentConfigController/Factory into its own package. DeploymentConfigController has a private fatal error type which the factory recognizes to drive retries in a simple manner.

Still haven't touched tests yet in response to this latest change. When the structure of this one controller looks good to all, I'll fix the tests, get this PR merged, and then start porting all the other deploy controllers in followup PRs.

smarterclayton · 2015-02-26T16:09:43Z

I'm about to take off - that was my last substantive issue (no one outside of controller and its tests should know about the errors, and caking util.HandleError before you eat one).

On Feb 26, 2015, at 7:39 AM, Dan Mace [email protected] wrote:

@smarterclayton @bparees
The latest revision moves DeploymentConfigController/Factory into its own package. DeploymentConfigController has a private fatal error type which the factory recognizes to drive retries in a simple manner.

Still haven't touched tests yet. When the structure of this one controller looks good to all, I'll fix the tests, get this PR merged, and then start porting all the other deploy controllers in followup PRs.

—
Reply to this email directly or view it on GitHub.

ironcladlou · 2015-02-26T16:42:54Z

[test]

pmorie · 2015-02-26T16:50:19Z

Going to take a last pass at this before merge

Introduce a reusable controller package which facilitates safe retries for controllers which make use of cache.FIFO. Port the DeploymentConfigController to use the new support as a reference implementation of the pattern.

pmorie · 2015-02-26T18:09:58Z

LGTM

[merge]

openshift-bot · 2015-02-26T18:12:01Z

Evaluated for origin up to 6c68632

openshift-bot · 2015-02-26T18:12:09Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/1050/) (Image: devenv-fedora_906)

Merged by openshift-bot

Followup to openshift#1091 in support of openshift#824. Summary of the changes: * Each controller is now in its own package (`pkg/deploy/controller/$type`) with a normalized name (`controller.go`, `factory.go`, etc.) * `DeploymentController` has been split into `DeploymentController` and `DeployerPodController` * All controllers now propagate semantic errors (e.g. fatal or nonfatal) * Improved private interface usage throughout the controllers * All controllers are now wrapped in a `RetryController` via their corresponding factories

bparees · 2015-02-28T16:53:57Z

pkg/deploy/controller/deploymentconfig/factory.go

seems backwards that you're logging the error when you retry, but not logging it if it's fatal.

Yeah that's backwards. Logging transients is good, but fatal is critical.

On Feb 28, 2015, at 11:54 AM, Ben Parees [email protected] wrote:

In pkg/deploy/controller/deploymentconfig/factory.go:

return factory.KubeClient.ReplicationControllers(namespace).Create(deployment)

},

},

makeDeployment: func(config _deployapi.DeploymentConfig) (_kapi.ReplicationController, error) {

return deployutil.MakeDeployment(config, factory.Codec)

},

}

return &controller.RetryController{

Queue: queue,

RetryManager: controller.NewQueueRetryManager(queue, cache.MetaNamespaceKeyFunc, 1),

ShouldRetry: func(obj interface{}, err error) bool {

if _, isFatal := err.(fatalError); isFatal {

return false

}

kutil.HandleError(err)
seems backwards that you're logging the error when you retry, but not logging it if it's fatal.

—
Reply to this email directly or view it on GitHub.

I caught and fixed this in the followup (#1170). Good eye!

good but slow apparently :)

…service-catalog/' changes from 8f07b7b..7e650e7 7e650e7 origin build: add origin tooling f32eec2 unit test for ./pkg/rest/core/fake rest client, addresses openshift#860 Test ready to be reviewed (openshift#1113) e388aee explicitly always prefer latest OSBAPI Version (openshift#1138) 962429e Merge branch 'pr/1135' b6ee7ef fix rbac cb1beb9 Merge branch 'pr/1131' ecc5c01 Update Code of Conduct (openshift#1137) e7c5ab3 address one more PR comment ddcbbad address PR comments 652a83b fix test expectation to match the new error message for missing service class 33417cc address PR comments 565fccf Use the chart name instead of the namespace (openshift#1102) bc61919 Add new terminal failure binding condition (openshift#1057) 4e642d5 Added more detailed instructions on how to setup the repo (openshift#1114) bdaea23 update unit tests (openshift#1123) 88a9642 validate the apiserver options (openshift#1116) b0af5fc fix whitespace in the copyright section dee796a generated type changes ef585c4 Rename the directory from default to defaultservicename to conform to go style guide. Wire admission controller into the apiserver 0b5d6c6 add firewall troubleshooting section (openshift#1040) fd9e6bc Fix Typo in Events Code of Conduct (openshift#1126) ebe6506 Fix Typo in Terminology (openshift#1128) 0038b1e Merge branch 'pr/1122' 8411f31 make deprovisioning an instance asynchronously not fall-through to synchronous deprovision (openshift#1067) 76c1d93 handle failures from list and test the not ready condition, cleanup 9241296 finish unit tests, passing ed75774 Minor fixes based on go report card 9911e8d Add GoReport Widget (openshift#1121) dd24e5c clean up old cruft 08276c6 generated file changes 6489d90 Implement the default plan in admission controller a6bb576 Code: Instance/Binding parameters from secret (openshift#1079) 10bb148 Update generated files (openshift#1115) 5291e6f v0.0.15 (openshift#1118) 28a1ea6 Merge branch 'pr/1104' bb4a2d2 Merge branch 'pr/1097' 1c14a90 push all arch images on release tags (openshift#1108) b587b2c Improve log output for deprovision 8887561 Remove PodPreset embedding from Binding (openshift#1030) 1abdcc8 Adjust helm/tiller installation instructions (openshift#1091) f636f99 only skip tls verify if not behind the aggregator (openshift#1101) 43b40ab controller_broker unit test bullet-proofing openshift#1077 (openshift#1099) bb596b8 Use data store instead of database (openshift#1100) 04fa477 Implementation: Support for Bearer token auth between Service Catalog and brokers (openshift#1053) 9e46d3c refactor Jenkins e2e tests (openshift#1082) 1f0a41e remove old/misleading comments about only doing soft delete if it's "our turn--" i.e. only if the finalizer we care about is at the head of the finalizers list. 5c1d9b8 Update OSB client (openshift#1085) a6e80ea Only do work for instances from a single queue (openshift#1074) 2bd85d6 Merge branch 'pr/1076' e324287 Tweaks to the walkthrough for local-up-cluster d8b7899 Add a note to the walkthrough about getting bindings when using the aggregator (openshift#1078) ea44cf1 msg on Environment Variables to set for e2e (openshift#1070) d15554a Merge branch 'pr/1017' faf966e Add comment re: async race condition in integration tests ed2e096 v0.0.14 (openshift#1071) fc84ffd more PR feedback 283bed4 Add integration tests and some error checking; PR feedback 903a7a7 Add terminal condition for instance and do not retry failed provisions REVERT: 8f07b7b origin: add required patches git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog git-subtree-split: 7e650e7e39c3fc79a8ecc061cce2a70e899406ff

* Adjust helm/tiller installation instructions - tiller needs access later to install service-catalog objects * suggested updates

ironcladlou force-pushed the deploy-retries branch 2 times, most recently from 25d3075 to f8300d2 Compare February 20, 2015 20:09

ironcladlou force-pushed the deploy-retries branch from f8300d2 to c3d52ff Compare February 20, 2015 20:12

bparees reviewed Feb 20, 2015
View reviewed changes

ironcladlou force-pushed the deploy-retries branch from c3d52ff to 578eb49 Compare February 20, 2015 20:30

ironcladlou force-pushed the deploy-retries branch from 578eb49 to b43a252 Compare February 20, 2015 20:56

smarterclayton reviewed Feb 20, 2015
View reviewed changes

ironcladlou force-pushed the deploy-retries branch from b43a252 to e908a4a Compare February 21, 2015 13:23

bparees reviewed Feb 23, 2015
View reviewed changes

pkg/controller/controller.go Outdated

Copy link

Contributor

bparees Feb 23, 2015

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// a maxRetries value of -1 is interpreted as retry forever.

ironcladlou force-pushed the deploy-retries branch 6 times, most recently from 97ce277 to 8f0179e Compare February 23, 2015 22:06

pmorie reviewed Feb 23, 2015
View reviewed changes

ironcladlou force-pushed the deploy-retries branch from 8f0179e to 164b4aa Compare February 24, 2015 02:01

smarterclayton reviewed Feb 24, 2015
View reviewed changes

ironcladlou force-pushed the deploy-retries branch 2 times, most recently from ef64bdc to d312958 Compare February 24, 2015 21:33

UPSTREAM: Support AddIfNotPresent function

abd821b

Add an AddIfNotPresent function to support single producer/consumer retry scenarios. Provides the consumer with a means to prefer items already enqueued by the producer at the point of retry.

ironcladlou force-pushed the deploy-retries branch from 5b01476 to d3c81ee Compare February 26, 2015 15:36

ironcladlou force-pushed the deploy-retries branch from d3c81ee to bae61d3 Compare February 26, 2015 16:39

ironcladlou force-pushed the deploy-retries branch from bae61d3 to ddef62d Compare February 26, 2015 17:06

Add controller retry support scaffolding

6c68632

Introduce a reusable controller package which facilitates safe retries for controllers which make use of cache.FIFO. Port the DeploymentConfigController to use the new support as a reference implementation of the pattern.

ironcladlou force-pushed the deploy-retries branch from ddef62d to 6c68632 Compare February 26, 2015 17:07

openshift-bot pushed a commit that referenced this pull request Feb 26, 2015

Merge pull request #1091 from ironcladlou/deploy-retries

174086a

Merged by openshift-bot

openshift-bot merged commit 174086a into openshift:master Feb 26, 2015

ironcladlou deleted the deploy-retries branch February 26, 2015 19:13

This was referenced Feb 26, 2015

Refactor remaining deployment controllers for retry support #1170

Merged

Controller framework kubernetes/kubernetes#4877

Closed

bparees reviewed Feb 28, 2015
View reviewed changes

jpeeler pushed a commit to jpeeler/origin that referenced this pull request Feb 1, 2018

Adjust helm/tiller installation instructions (openshift#1091)

1abdcc8

* Adjust helm/tiller installation instructions - tiller needs access later to install service-catalog objects * suggested updates

Add controller retry support scaffolding #1091

Add controller retry support scaffolding #1091

Uh oh!

Conversation

ironcladlou commented Feb 20, 2015

Uh oh!

ironcladlou commented Feb 20, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bparees commented Feb 20, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ironcladlou commented Feb 23, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bparees commented Feb 23, 2015

Uh oh!

ironcladlou commented Feb 23, 2015

Uh oh!

openshift-bot commented Feb 23, 2015

Uh oh!

pmorie commented Feb 23, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmorie commented Feb 23, 2015

Uh oh!

soltysh commented Feb 24, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ironcladlou commented Feb 25, 2015

Uh oh!

pmorie commented Feb 25, 2015

Uh oh!

ironcladlou commented Feb 26, 2015

Uh oh!

smarterclayton commented Feb 26, 2015

Uh oh!

ironcladlou commented Feb 26, 2015

Uh oh!

pmorie commented Feb 26, 2015

Uh oh!

pmorie commented Feb 26, 2015

Uh oh!

openshift-bot commented Feb 26, 2015

Uh oh!

openshift-bot commented Feb 26, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects