pkg/assets/create: Allow mid-create cancels #226
Force-pushed from bb2e0bd to 7d5d9ca.
pkg/assets/create/creater.go (outdated):

```go
if retryCount > 1 {
	select {
	case <-ctx.Done():
		if lastCreateError == nil {
```
nit: completely ignorable ;)

```go
if lastCreateError != nil {
	return lastCreateError
}
return ctx.Err()
```

IMO that is more ... Return the last observed set of errors from the create process instead of timeout error. ...
> nit: completely ignorable ;)
> `if lastCreateError != nil { return lastCreateError } return ctx.Err()`
Isn't that just reversing the order (but not changing the behavior) of my current code?

```go
if lastCreateError == nil {
	return ctx.Err()
}
return lastCreateError
```

I'm fine updating if you like, but when I have both if and else cases (with a flattened else here) with short bodies, I prefer the positive `==` to the negative `!=`, so you don't have a double negative in your head for the else case.
Force-pushed from 7d5d9ca to bfedebc.
/retest
pkg/assets/create/create_test.go (outdated):

```go
}
if !strings.Contains(out.String(), "unable to get REST mapping") {
	t.Fatalf("expected error logged to output when verbose is on, got: %s\n", out.String())
if !strings.Contains(err.Error(), "unable to get REST mapping") {
```
@mfojtik I should have commented earlier: we need a test that checks the error string did not change in apimachinery.
@sttts If it changes, this test will fail (I don't think we rely on this message in the code).
pkg/assets/create/creater.go (outdated):

```go
)
err = wait.PollImmediateUntil(interval, func() (bool, error) {
	retryCount++
for retryCount := 1; ; retryCount++ {
```
`wait.PollImmediateUntil` is the canonical Kubernetes way to write such a loop. I would stay with it.
Yeah, it is used and well tested in many kube components and packages.
> `wait.PollImmediateUntil` is the canonical kubernetes way of such a loop. I would stay with it.
And just set the delay to zero? It doesn't seem useful enough if all it does is wrap an initial Context check.
```go
if err == nil {
	t.Fatal("expected error creating kubeapiserverconfig resource, got none")
}
if !strings.Contains(out.String(), "unable to get REST mapping") {
```
This was in fact testing that the messages are seen with `verbose: true` ;-) The error is just an aggregation of those messages.
> this was in fact testing that the messages are seen with verbose:true ;-) the error is just aggregation on those messages.
As I discuss in the commit message, I think the returned error is more important. But I can update this to test both.
```go
// Default QPS in client (when not specified) is 5 requests/per second
// This specifies the interval between "create-all-resources", no need to make this configurable.
interval := 200 * time.Millisecond
```
This interval just matches the client QPS, so `PollImmediateUntil(200*time.Millisecond)` just does what the client does (there is no additional delay).
pkg/assets/create/creater.go (outdated):

```go
if i > 0 {
	select {
	case <-ctx.Done():
		return ctx.Err(), false
```
I'm not a fan of cancelling in the middle of the bulk run. I would rather give the loop the extra time to potentially get a positive outcome (maybe it will succeed this run) instead of cancelling and erroring out.
> i would rather give the loop the extra time to potentially get positive outcome (maybe it will succeed this run)...
Then don't cancel it? I don't think we want to re-interpret "please give up and gently shut down" as "I'm starting to get bored, but go ahead and run for another several minutes if you think it might work" ;).
> i'm not fan of cancelling in middle of the bulk run. i would rather give the loop the extra time to potentially get positive outcome (maybe it will succeed this run) instead of cancelling and error out.

I may let this play out. I wouldn't gate it on that `if` above, though.
> I wouldn't gate it on that if above though.
So cancel immediately if requested by the Context, even if we've done nothing? I'm fine with that, but just want to make sure I understand before rerolling.
pkg/assets/create/creater.go (outdated):

```go
if err == nil {
	return nil
}
if err == context.Canceled || err == context.DeadlineExceeded {
```
I believe this is already handled by the `context.Done()` channel.
> i believe this is handled in `context.Done()` channel already.
Yeah, this appears to just duplicate the existing code.
I like the overall change of adding the context handling in

/retest
Force-pushed from bfedebc to 9bb26a0.
Passing the context into `create()` allows callers to cancel without waiting for all of the manifests to be attempted. At the default 5 requests per second, the installer's 30+ manifests would take 6+ seconds even in the best case of negligible network time, and we can be more responsive to cancels than that. There's a bit more of a dance around `lastCreateError` now that `create()` can also return boring canceled and timed-out errors, while we still want to prefer the last actual manifest-creation failure.
Force-pushed from 9bb26a0 to 85e4130.
I still think that

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mfojtik, wking.
pkg/assets/create: Allow mid-create cancels
As mentioned in the `interval` comment I'm removing, the REST client configuration includes its own requests-per-second configuration. There's no need for an additional 200ms sleep between loops on top of that.

I'm also passing the context into `create()` so we can cancel without waiting for all of the manifests to be attempted. At the default 5 requests per second, the installer's 30+ manifests would take 6+ seconds in the "negligible network time" best case, and we can be more responsive to cancels than that.

I've also shifted `retryCount` into the `for` loop, because that's where I'm used to seeing iteration counters. And I've dropped logging on returned errors, because the caller will have the error and can make that logging decision itself. Logging should be reserved for internal progress updates, discarded errors, and other activity which the caller cannot access.
CC @mfojtik, @sttts