Skip to content

Conversation

@ironcladlou
Copy link
Contributor

Introduce a reusable controller package which facilitates safe retries
for controllers which make use of cache.FIFO.

Port the DeploymentConfigController to use the new support as a reference
implementation of the pattern.

@ironcladlou ironcladlou force-pushed the deploy-retries branch 2 times, most recently from 25d3075 to f8300d2 Compare February 20, 2015 20:09
@ironcladlou
Copy link
Contributor Author

@pmorie @bparees PTAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add support for maxRetries==-1 meaning "forever"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also, I hate using a constructor func rather than field initialization for this thing, but then the map would have to be lazily initialized. Ugh!

@bparees
Copy link
Contributor

bparees commented Feb 20, 2015

Needs more testing around the retry behavior, but the implementation looks sound to me. I am slightly biased though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the decision to retry be an interface passed to RetryController

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea... done. Was on the fence about actually defining an interface, ended up just passing a function and providing a stock implementation.

@ironcladlou
Copy link
Contributor Author

Note: Before a merge, I want to add more tests for this (especially integration tests). To avoid too much rework on my part, I'm putting them off for now until we have more consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// a maxRetries value of -1 is interpreted as retry forever.

@bparees
Copy link
Contributor

bparees commented Feb 23, 2015

absurdly minor nits on my part, lgtm.

@ironcladlou ironcladlou force-pushed the deploy-retries branch 6 times, most recently from 97ce277 to 8f0179e Compare February 23, 2015 22:06
@ironcladlou
Copy link
Contributor Author

I added a nice test, TestRetryController_realFifoEventOrdering, which does a miniature integration-like test with the RetryController+QueueRetryManager+FIFO to demonstrate event order guarantees with an asynchronous queue producer. Unless there are any other special requests, I feel like this is in good shape for another merge review. Going to run it through a jenkins [test].

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_openshift3/1183/)

@pmorie
Copy link
Contributor

pmorie commented Feb 23, 2015

Provisionally LGTM but I need to review the latest code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a test like TestFIFO_basic that randomly does an AddIfNotPresent instead of just Add ?

@pmorie
Copy link
Contributor

pmorie commented Feb 23, 2015

I really like the retry controller thing, this is informing my work on delta-based stores and controllers that do this.

@soltysh
Copy link
Contributor

soltysh commented Feb 24, 2015

@ironcladlou great work with this 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need these anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate a little more? The default ShouldRetry impl is RetryOnRetryableError which checks for RetryableError, and the deployment controllers return it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deployment controllers should be implementing their own retry behavior which is either retry on error, or return a special type. Wrapping an error like this is questionable - instead handle should return opaque errors, and the controller retry handler should make a decision and either log "not retrying" or "retrying". Using someone else's error is coupling unrelated code.

On Feb 24, 2015, at 8:05 AM, Dan Mace [email protected] wrote:

In pkg/controller/controller.go:

  • if err != nil {
  •   if c.ShouldRetry(resource, err) {
    
  •       c.RetryManager.Retry(resource)
    
  •       return
    
  •   }
    
  • }
  • c.RetryManager.Forget(resource)
    +}

+// RetryableError is an error that is recoverable and which should
+// signal another attempt at processing a resource.
+type RetryableError struct {

  • err error
    +}

+func (e *RetryableError) Error() string {
Can you elaborate a little more? The default ShouldRetry impl is RetryOnRetryableError which checks for RetryableError, and the deployment controllers return it.


Reply to this email directly or view it on GitHub.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you force me to implement my own retry behavior in my controller, it's going to look exactly like this default implementation... i'm going to return an error that clearly indicates "retry me" and write a retrier that recognizes such errors. i'm not going to separate my logic such that in one place i'm trying to do things, and in another place i'm trying to figure out whether i should retry that thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone using the RetryController wants to establish their own error taxonomy and retry handler around it, they're free to do so and the abstractions here will facilitate them. However, all our existing use cases in deployment and builds (I think) can use this OOTB error type and handler for now, IMO.

I could be misunderstanding the proposed alternative design... maybe try once more to clarify just to be sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might see what you mean now, @smarterclayton. Will add a new commit for discussion based on what you're saying.

@ironcladlou ironcladlou force-pushed the deploy-retries branch 2 times, most recently from ef64bdc to d312958 Compare February 24, 2015 21:33
@ironcladlou
Copy link
Contributor Author

@smarterclayton @bparees @pmorie

I added a temporary commit to demonstrate a different way of dealing with errors. The new approach addresses @smarterclayton's feedback (I think), but also incorporates a strategy I learned about from the Go standard library based on this very relevant golang-nuts discussion. Check out the way custom errors and error transience are handled in the golang net package. I used the stdlib approach for error definition/taxonomy, and used @smarterclayton's idea to move the decision-making to the factory's ShouldRetry implementation.

Let me know what you think. I won't bother changing any tests until we get this part nailed down.

@pmorie
Copy link
Contributor

pmorie commented Feb 25, 2015

@ironcladlou I like it. 👊

Add an AddIfNotPresent function to support single producer/consumer
retry scenarios. Provides the consumer with a means to prefer items
already enqueued by the producer at the point of retry.
@ironcladlou
Copy link
Contributor Author

@smarterclayton @bparees
The latest revision moves DeploymentConfigController/Factory into its own package. DeploymentConfigController has a private fatal error type which the factory recognizes to drive retries in a simple manner.

Still haven't touched tests yet in response to this latest change. When the structure of this one controller looks good to all, I'll fix the tests, get this PR merged, and then start porting all the other deploy controllers in followup PRs.

@smarterclayton
Copy link
Contributor

I'm about to take off - that was my last substantive issue (no one outside of controller and its tests should know about the errors, and caking util.HandleError before you eat one).

On Feb 26, 2015, at 7:39 AM, Dan Mace [email protected] wrote:

@smarterclayton @bparees
The latest revision moves DeploymentConfigController/Factory into its own package. DeploymentConfigController has a private fatal error type which the factory recognizes to drive retries in a simple manner.

Still haven't touched tests yet. When the structure of this one controller looks good to all, I'll fix the tests, get this PR merged, and then start porting all the other deploy controllers in followup PRs.


Reply to this email directly or view it on GitHub.

@ironcladlou
Copy link
Contributor Author

[test]

@pmorie
Copy link
Contributor

pmorie commented Feb 26, 2015

Going to take a last pass at this before merge

Introduce a reusable controller package which facilitates safe retries
for controllers which make use of cache.FIFO.

Port the DeploymentConfigController to use the new support as a reference
implementation of the pattern.
@pmorie
Copy link
Contributor

pmorie commented Feb 26, 2015

LGTM

[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin up to 6c68632

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/1050/) (Image: devenv-fedora_906)

openshift-bot pushed a commit that referenced this pull request Feb 26, 2015
@openshift-bot openshift-bot merged commit 174086a into openshift:master Feb 26, 2015
@ironcladlou ironcladlou deleted the deploy-retries branch February 26, 2015 19:13
ironcladlou added a commit to ironcladlou/origin that referenced this pull request Feb 27, 2015
Followup to openshift#1091 in support of openshift#824.

Summary of the changes:

* Each controller is now in its own package (`pkg/deploy/controller/$type`) with a normalized name (`controller.go`, `factory.go`, etc.)
* `DeploymentController` has been split into `DeploymentController` and `DeployerPodController`
* All controllers now propagate semantic errors (e.g. fatal or nonfatal)
* Improved private interface usage throughout the controllers
* All controllers are now wrapped in a `RetryController` via their corresponding factories
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems backwards that you're logging the error when you retry, but not logging it if it's fatal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's backwards. Logging transients is good, but fatal is critical.

On Feb 28, 2015, at 11:54 AM, Ben Parees [email protected] wrote:

In pkg/deploy/controller/deploymentconfig/factory.go:

  •           return factory.KubeClient.ReplicationControllers(namespace).Create(deployment)
    
  •       },
    
  •   },
    
  •   makeDeployment: func(config _deployapi.DeploymentConfig) (_kapi.ReplicationController, error) {
    
  •       return deployutil.MakeDeployment(config, factory.Codec)
    
  •   },
    
  • }
  • return &controller.RetryController{
  •   Queue:        queue,
    
  •   RetryManager: controller.NewQueueRetryManager(queue, cache.MetaNamespaceKeyFunc, 1),
    
  •   ShouldRetry: func(obj interface{}, err error) bool {
    
  •       if _, isFatal := err.(fatalError); isFatal {
    
  •           return false
    
  •       }
    
  •       kutil.HandleError(err)
    
    seems backwards that you're logging the error when you retry, but not logging it if it's fatal.


Reply to this email directly or view it on GitHub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I caught and fixed this in the followup (#1170). Good eye!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good but slow apparently :)

jpeeler pushed a commit to jpeeler/origin that referenced this pull request Aug 17, 2017
…service-catalog/' changes from 8f07b7b..7e650e7

7e650e7 origin build: add origin tooling
f32eec2 unit test for ./pkg/rest/core/fake rest client, addresses openshift#860 Test ready to be reviewed (openshift#1113)
e388aee explicitly always prefer latest OSBAPI Version (openshift#1138)
962429e Merge branch 'pr/1135'
b6ee7ef fix rbac
cb1beb9 Merge branch 'pr/1131'
ecc5c01 Update Code of Conduct (openshift#1137)
e7c5ab3 address one more PR comment
ddcbbad address PR comments
652a83b fix test expectation to match the new error message for missing service class
33417cc address PR comments
565fccf Use the chart name instead of the namespace (openshift#1102)
bc61919 Add new terminal failure binding condition (openshift#1057)
4e642d5 Added more detailed instructions on how to setup the repo (openshift#1114)
bdaea23 update unit tests (openshift#1123)
88a9642 validate the apiserver options (openshift#1116)
b0af5fc fix whitespace in the copyright section
dee796a generated type changes
ef585c4 Rename the directory from default to defaultservicename to conform to go style guide. Wire admission controller into the apiserver
0b5d6c6 add firewall troubleshooting section (openshift#1040)
fd9e6bc Fix Typo in Events Code of Conduct (openshift#1126)
ebe6506 Fix Typo in Terminology (openshift#1128)
0038b1e Merge branch 'pr/1122'
8411f31 make deprovisioning an instance asynchronously not fall-through to synchronous deprovision (openshift#1067)
76c1d93 handle failures from list and test the not ready condition, cleanup
9241296 finish unit tests, passing
ed75774 Minor fixes based on go report card
9911e8d Add GoReport Widget (openshift#1121)
dd24e5c clean up old cruft
08276c6 generated file changes
6489d90 Implement the default plan in admission controller
a6bb576 Code: Instance/Binding parameters from secret (openshift#1079)
10bb148 Update generated files (openshift#1115)
5291e6f v0.0.15 (openshift#1118)
28a1ea6 Merge branch 'pr/1104'
bb4a2d2 Merge branch 'pr/1097'
1c14a90 push all arch images on release tags (openshift#1108)
b587b2c Improve log output for deprovision
8887561 Remove PodPreset embedding from Binding (openshift#1030)
1abdcc8 Adjust helm/tiller installation instructions (openshift#1091)
f636f99 only skip tls verify if not behind the aggregator (openshift#1101)
43b40ab controller_broker unit test bullet-proofing openshift#1077 (openshift#1099)
bb596b8 Use data store instead of database (openshift#1100)
04fa477 Implementation: Support for Bearer token auth between Service Catalog and brokers (openshift#1053)
9e46d3c refactor Jenkins e2e tests (openshift#1082)
1f0a41e remove old/misleading comments about only doing soft delete if it's "our turn--" i.e. only if the finalizer we care about is at the head of the finalizers list.
5c1d9b8 Update OSB client (openshift#1085)
a6e80ea Only do work for instances from a single queue (openshift#1074)
2bd85d6 Merge branch 'pr/1076'
e324287 Tweaks to the walkthrough for local-up-cluster
d8b7899 Add a note to the walkthrough about getting bindings when using the aggregator (openshift#1078)
ea44cf1 msg on Environment Variables to set for e2e (openshift#1070)
d15554a Merge branch 'pr/1017'
faf966e Add comment re: async race condition in integration tests
ed2e096 v0.0.14 (openshift#1071)
fc84ffd more PR feedback
283bed4 Add integration tests and some error checking; PR feedback
903a7a7 Add terminal condition for instance and do not retry failed provisions
REVERT: 8f07b7b origin: add required patches

git-subtree-dir: cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog
git-subtree-split: 7e650e7e39c3fc79a8ecc061cce2a70e899406ff
jpeeler pushed a commit to jpeeler/origin that referenced this pull request Feb 1, 2018
* Adjust helm/tiller installation instructions

 - tiller needs access later to install service-catalog objects

* suggested updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants