pkg/assets/create: Allow mid-create cancels #226

wking · 2019-02-05T19:38:09Z

As mentioned in the interval comment I'm removing, the REST client configuration includes its own requests-per-second configuration. There's no need for an additional 200ms sleep between loops on top of that.

I'm also passing the context into create() so we can cancel without waiting for all of the manifests to be attempted. At the default 5 requests per second, the installer's 30+ manifests would take 6+ seconds in the "negligible network time" best-case, and we can be more responsive to cancels than that.

I've also shifted retryCount into the for loop, because that's where I'm used to seeing iteration counters. And I've dropped logging on returned errors, because the caller will have the error and can make that logging decision itself. Logging should be reserved for internal progress updates, discarded errors, and other activity which the caller cannot access.

CC @mfojtik, @sttts

abhinavdahiya · 2019-02-05T20:50:38Z

pkg/assets/create/creater.go

+		if retryCount > 1 {
+			select {
+			case <-ctx.Done():
+				if lastCreateError == nil {


nit: completely ignorable ;)

if lastCreateError != nil { return lastCreateError } return ctx.Err()

IMO that is more "Return the last observed set of errors from the create process instead of timeout error." ...

nit: completely ignorable ;)

if lastCreateError != nil { return lastCreateError } return ctx.Err()

Isn't that just reversing the order (but not changing the behavior) of my current:

if lastCreateError == nil { return ctx.Err() } return lastCreateError

? I'm fine updating if you like, but when I have both if and else cases (with a flattened else here) with short bodies, I prefer the positive == to the negative != so you don't have a double-negative in your head for the else case.

openshift-merge-robot · 2019-02-06T06:28:22Z

/retest

sttts · 2019-02-06T09:06:41Z

pkg/assets/create/create_test.go

 	}
-	if !strings.Contains(out.String(), "unable to get REST mapping") {
-		t.Fatalf("expected error logged to output when verbose is on, got: %s\n", out.String())
+	if !strings.Contains(err.Error(), "unable to get REST mapping") {


@mfojtik I should have commented earlier: we need a test that verifies the error string did not change in apimachinery.

@sttts if it changes, this test will fail (I don't think we rely on this message in the code).

sttts · 2019-02-06T10:04:36Z

pkg/assets/create/creater.go

 	)
-	err = wait.PollImmediateUntil(interval, func() (bool, error) {
-		retryCount++
+	for retryCount := 1; ; retryCount++ {


wait.PollImmediateUntil is the canonical kubernetes way of such a loop. I would stay with it.

Yeah, it is used and well tested in many kube components and packages.

wait.PollImmediateUntil is the canonical kubernetes way of such a loop. I would stay with it.

And just set the delay to zero? It doesn't seem useful enough if all it does is wrap an initial Context check.

mfojtik · 2019-02-06T10:06:55Z

pkg/assets/create/create_test.go

 	if err == nil {
 		t.Fatal("expected error creating kubeapiserverconfig resource, got none")
 	}
-	if !strings.Contains(out.String(), "unable to get REST mapping") {


this was in fact testing that the messages are seen with verbose:true ;-) the error is just an aggregation of those messages.

this was in fact testing that the messages are seen with verbose:true ;-) the error is just an aggregation of those messages.

As I discuss in the commit message, I think the returned error is more important. But I can update this to test both.

mfojtik · 2019-02-06T10:08:26Z

pkg/assets/create/creater.go


-	// Default QPS in client (when not specified) is 5 requests/per second
-	// This specifies the interval between "create-all-resources", no need to make this configurable.
-	interval := 200 * time.Millisecond


This interval just matches the client QPS, so PollImmediateUntil(200*time.Millisecond) just does what the client does (there is no delay).

mfojtik · 2019-02-06T10:12:58Z

pkg/assets/create/creater.go

+		if i > 0 {
+			select {
+			case <-ctx.Done():
+				return ctx.Err(), false


I'm not a fan of cancelling in the middle of the bulk run. I would rather give the loop the extra time to potentially get a positive outcome (maybe it will succeed this run) instead of cancelling and erroring out.

I would rather give the loop the extra time to potentially get a positive outcome (maybe it will succeed this run)...

Then don't cancel it? I don't think we want to re-interpret "please give up and gently shut down" as "I'm starting to get bored, but go ahead and run for another several minutes if you think it might work" ;).

I'm not a fan of cancelling in the middle of the bulk run. I would rather give the loop the extra time to potentially get a positive outcome (maybe it will succeed this run) instead of cancelling and erroring out.

I may let this play out. I wouldn't gate it on that if above though.

I wouldn't gate it on that if above though.

So cancel immediately if requested by the Context, even if we've done nothing? I'm fine with that, but just want to make sure I understand before rerolling.

mfojtik · 2019-02-06T10:14:42Z

pkg/assets/create/creater.go

+		if err == nil {
+			return nil
+		}
+		if err == context.Canceled || err == context.DeadlineExceeded {


i believe this is handled in context.Done() channel already.

i believe this is handled in context.Done() channel already.

Yeah, this appears to just duplicate the existing code.

sttts · 2019-02-06T10:20:13Z

I like the overall change of adding the context handling in create. But let's strip the change down to the actual behaviour change, and not mix it with coding style changes (especially the wait.PollImmediateUntil change). This makes the PR twice as big as necessary, and 10 times harder to review.

openshift-merge-robot · 2019-02-06T13:02:59Z

/retest

pkg/assets/create/creater.go

Passing the context into create() allows callers to cancel without waiting for all of the manifests to be attempted. At the default 5 requests per second, the installer's 30+ manifests would take 6+ seconds in the "negligible network time" best-case, and we can be more responsive to cancels than that. There's a bit more of a dance around lastCreateError now that create can also return boring canceled and timed-out errors, while we still want to prefer the last actual manifest-creation failure.

wking · 2019-02-07T00:57:34Z

I still think that PollImmediateUntil is more complicated than for here, but since that seems contentious I've pushed bfedebc -> 85e4130 with a minimal pivot to a cancel-able create as requested by @sttts. There's a bit more of a dance around lastCreateError now that create can also return boring canceled and timed-out errors, while we still want to prefer the last actual manifest-creation failure.

mfojtik · 2019-02-07T11:57:41Z

/lgtm

openshift-ci-robot · 2019-02-07T11:57:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, wking

Needs approval from an approver in each of these files:

~~pkg/assets/create/OWNERS~~ [mfojtik]

openshift-ci-robot requested review from deads2k and mfojtik February 5, 2019 19:38

openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 5, 2019

wking mentioned this pull request Feb 5, 2019

assets: add creater based on dynamic client #220

Merged

wking changed the title ~~spkg/assets/create: Drop PollImmediateUntil and allow mid-create cancels~~ pkg/assets/create: Drop PollImmediateUntil and allow mid-create cancels Feb 5, 2019

wking force-pushed the mid-create-cancel branch from bb2e0bd to 7d5d9ca Compare February 5, 2019 19:41

abhinavdahiya reviewed Feb 5, 2019

wking force-pushed the mid-create-cancel branch from 7d5d9ca to bfedebc Compare February 5, 2019 21:22

sttts reviewed Feb 6, 2019

mfojtik reviewed Feb 6, 2019

deads2k reviewed Feb 6, 2019

wking force-pushed the mid-create-cancel branch from bfedebc to 9bb26a0 Compare February 7, 2019 00:54

openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 7, 2019

wking force-pushed the mid-create-cancel branch from 9bb26a0 to 85e4130 Compare February 7, 2019 00:56

wking changed the title ~~pkg/assets/create: Drop PollImmediateUntil and allow mid-create cancels~~ pkg/assets/create: Allow mid-create cancels Feb 7, 2019

openshift-ci-robot assigned mfojtik Feb 7, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2019

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 7, 2019

openshift-merge-robot merged commit 5af1f6f into openshift:master Feb 7, 2019

bertinatto pushed a commit to bertinatto/library-go that referenced this pull request Jul 2, 2020

Merge pull request openshift#226 from wking/mid-create-cancel

78d49a8

pkg/assets/create: Allow mid-create cancels

wking deleted the mid-create-cancel branch August 24, 2020 18:57
