Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Feb 5, 2019

As mentioned in the interval comment I'm removing, the REST client configuration includes its own requests-per-second configuration. There's no need for an additional 200ms sleep between loops on top of that.

I'm also passing the context into create() so we can cancel without waiting for all of the manifests to be attempted. At the default 5 requests per second, the installer's 30+ manifests would take 6+ seconds in the "negligable network time" best-case, and we can be more responsive to cancels than that.

I've also shifted retryCount into the for loop, because that's where I'm used to seeing iteration counters. And I've dropped logging on returned errors, because the caller will have the error and can make
that logging decision itself. Logging should be reserved for internal progress updates, discarded errors, and other activity which the caller cannot access.

CC @mfojtik, @sttts

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 5, 2019
@wking wking changed the title spkg/assets/create: Drop PollImmediateUntil and allow mid-create cancels pkg/assets/create: Drop PollImmediateUntil and allow mid-create cancels Feb 5, 2019
@wking wking force-pushed the mid-create-cancel branch from bb2e0bd to 7d5d9ca Compare February 5, 2019 19:41
if retryCount > 1 {
select {
case <-ctx.Done():
if lastCreateError == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: completely ignorable ;)

if lastCreateError != nil {
    return lastCreateError
}
return ctx.Err()

IMO that is more Return the last observed set of errors from the create process instead of timeout error. ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: completely ignorable ;)

if lastCreateError != nil {
    return lastCreateError
}
return ctx.Err()

Isn't that just reversing the order (but not changing the behavior) of my current:

if lastCreateError == nil {
  return ctx.Err()
}
return lastCreateError

? I'm fine updating if you like, but when I have both if and else cases (with a flattened else here) with short bodies, I prefer the positive == to the negative != so you don't have a double-negative in your head for the else case.

@wking wking force-pushed the mid-create-cancel branch from 7d5d9ca to bfedebc Compare February 5, 2019 21:22
@openshift-merge-robot
Copy link
Contributor

/retest

}
if !strings.Contains(out.String(), "unable to get REST mapping") {
t.Fatalf("expected error logged to output when verbose is on, got: %s\n", out.String())
if !strings.Contains(err.Error(), "unable to get REST mapping") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfojtik I should commented earlier: we need a test for this that the error string did not change in apimachinery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sttts if it change, this test will fail (i don't this we rely on this message in the code).

)
err = wait.PollImmediateUntil(interval, func() (bool, error) {
retryCount++
for retryCount := 1; ; retryCount++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait.PollImmediateUntil is the canonical kubernetes way of such a loop. I would stay with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is used and well tested in many kube components and packages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait.PollImmediateUntil is the canonical kubernetes way of such a loop. I would stay with it.

And just set the delay to zero? It doesn't seem useful enough if all it does is wrap an initial Context check.

if err == nil {
t.Fatal("expected error creating kubeapiserverconfig resource, got none")
}
if !strings.Contains(out.String(), "unable to get REST mapping") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was in fact testing that the messages are seen with verbose:true ;-) the error is just aggregation on those messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was in fact testing that the messages are seen with verbose:true ;-) the error is just aggregation on those messages.

As I discuss in the commit message, I think the returned error is more important. But I can update this to test both.


// Default QPS in client (when not specified) is 5 requests/per second
// This specifies the interval between "create-all-resources", no need to make this configurable.
interval := 200 * time.Millisecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interval just match the client QPS, to the PollImmediateUntil(200*time.Millisecond) just does what client does (there is no delay).

if i > 0 {
select {
case <-ctx.Done():
return ctx.Err(), false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not fan of cancelling in middle of the bulk run. i would rather give the loop the extra time to potentially get positive outcome (maybe it will succeed this run) instead of cancelling and error out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rather give the loop the extra time to potentially get positive outcome (maybe it will succeed this run)...

Then don't cancel it? I don't think we want to re-interpret "please give up and gently shut down" as "I'm starting to get bored, but go ahead and run for another several minutes if you think it might work" ;).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not fan of cancelling in middle of the bulk run. i would rather give the loop the extra time to potentially get positive outcome (maybe it will succeed this run) instead of cancelling and error out.

I may let this play out. I wouldn't gate it on that if above though.

Copy link
Member Author

@wking wking Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't gate it on that if above though.

So cancel immediately if requested by the Context, even if we've done nothing? I'm fine with that, but just want to make sure I understand before rerolling.

if err == nil {
return nil
}
if err == context.Canceled || err == context.DeadlineExceeded {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this is handled in context.Done() channel already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this is handled in context.Done() channel already.

Yeah, this appears to just duplicate the existing code.

@sttts
Copy link
Contributor

sttts commented Feb 6, 2019

I like the overall change of adding the context handling in create. But let's strip the change down to the actual behaviour change, and not mix it with coding style changes (especially the wait. PollImmediateUntil change). This makes the PR twice as big as necessary, an 10 times harder to review.

@openshift-merge-robot
Copy link
Contributor

/retest

@wking wking force-pushed the mid-create-cancel branch from bfedebc to 9bb26a0 Compare February 7, 2019 00:54
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 7, 2019
Passing the context into create() allows callers to cancel without
waiting for all of the manifests to be attempted.  At the default 5
requests per second, the installer's 30+ manifests would take 6+
seconds in the "negligable network time" best-case, and we can be more
responsive to cancels than that.

There's a bit more of a dance around lastCreateError now that create
can also return boring canceled and timed-out errors, while we still
want to prefer the last actual manifest-creation failure.
@wking wking force-pushed the mid-create-cancel branch from 9bb26a0 to 85e4130 Compare February 7, 2019 00:56
@wking
Copy link
Member Author

wking commented Feb 7, 2019

I still think that PollImmediateUntil is more complicated than for here, but since that seems contentious I've pushed bfedebc -> 85e4130 with a minimal pivot to a cancel-able create as requested by @sttts. There's a bit more of a dance around lastCreateError now that create can also return boring canceled and timed-out errors, while we still want to prefer the last actual manifest-creation failure.

@wking wking changed the title pkg/assets/create: Drop PollImmediateUntil and allow mid-create cancels pkg/assets/create: Allow mid-create cancels Feb 7, 2019
@mfojtik
Copy link
Contributor

mfojtik commented Feb 7, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 7, 2019
@openshift-merge-robot openshift-merge-robot merged commit 5af1f6f into openshift:master Feb 7, 2019
bertinatto pushed a commit to bertinatto/library-go that referenced this pull request Jul 2, 2020
pkg/assets/create: Allow mid-create cancels
@wking wking deleted the mid-create-cancel branch August 24, 2020 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants