@cgwalters
Member

Saw this in a log:

I0211 21:20:46.924255   61902 daemon.go:660] Unable to apply update: rpc error: code = Unknown desc =

It must be from the drain; let's make that clear.

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 11, 2019
Member

@ashcrow ashcrow left a comment

👍

@ashcrow
Member

ashcrow commented Feb 11, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 11, 2019
@cgwalters
Member Author

/lgtm cancel

based on #408 (comment)

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 11, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow

@runcom
Member

runcom commented Feb 11, 2019

Pretty sure the error reported on Slack comes from:

func (dn *Daemon) updateOSAndReboot(newConfig *mcfgv1.MachineConfig) error {
	if err := dn.updateOS(newConfig); err != nil {
		return err
	}

	// Skip draining of the node when we're not cluster driven
	if dn.onceFrom == "" {
		glog.Info("Update prepared; draining the node")

		node, err := dn.kubeClient.CoreV1().Nodes().Get(dn.name, metav1.GetOptions{})
		if err != nil {
			return err
		}

Otherwise we should have gotten a timeout error, right?

@jlebon
Member

jlebon commented Feb 11, 2019

Yeah, I agree the error is likely from the Nodes().Get(). See #409.
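
The ambiguity in this thread (an opaque `rpc error` with no hint of which call produced it) is the usual argument for annotating errors at each call site. A minimal sketch of that idea follows. It uses the standard library's `fmt.Errorf` with `%w` rather than the `errors.Wrapf` from `github.com/pkg/errors` seen in this PR, and `getNode`, `drainNode`, and `updateAndDrain` are hypothetical stand-ins, not the daemon's real functions:

```go
package main

import (
	"errors"
	"fmt"
)

// errRPC stands in for the opaque error seen in the log.
var errRPC = errors.New("rpc error: code = Unknown desc =")

// getNode and drainNode are hypothetical stand-ins for the API calls
// made before reboot; here getNode always fails and drainNode succeeds.
func getNode(name string) error   { return errRPC }
func drainNode(name string) error { return nil }

// updateAndDrain annotates each failure with the step that produced it,
// so the log line identifies the failing call.
func updateAndDrain(name string) error {
	if err := getNode(name); err != nil {
		return fmt.Errorf("getting node %q: %w", name, err)
	}
	if err := drainNode(name); err != nil {
		return fmt.Errorf("draining node %q: %w", name, err)
	}
	return nil
}

func main() {
	fmt.Println(updateAndDrain("worker-0"))
}
```

With the prefix in place, the same log line would say which step failed instead of leaving readers to guess between the node lookup and the drain.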

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 11, 2019
return err
if lastErr != nil {
	return errors.Wrapf(lastErr, "Failed to drain node (%d tries)", backoff.Steps)
}
Member

@runcom runcom Feb 11, 2019

We're going to miss the wait.* error this way, because errors other than a timeout can occur if someone someday adds another case that returns a real error:

// If the condition never returns true, ErrWaitTimeout is returned. All other
// errors terminate immediately.
func ExponentialBackoff(backoff Backoff, condition ConditionFunc) error {

I was thinking something like https://github.com/openshift/machine-config-operator/blob/master/pkg/operator/sync.go#L357-L362

Member

@runcom runcom Feb 11, 2019

This is not blocking, of course: we control the ConditionFunc here today, but if someone jumps in and adds a branch that returns false, err, we're going to miss it. The suggestion is just to follow the same pattern.
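
The concern in this review thread can be illustrated with a stdlib-only sketch. `retryWithBackoff` is a hypothetical stand-in that mimics the documented contract quoted above (a timeout error when the condition never returns true, immediate termination on any other error); it is not the `k8s.io/apimachinery` implementation. `drainWithRetry`, `errTransient`, and `errFatal` are likewise illustrative names. The caller keeps the last drain failure for context, but still returns non-timeout errors instead of silently dropping them:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errWaitTimeout plays the role of wait.ErrWaitTimeout in this sketch.
var errWaitTimeout = errors.New("timed out waiting for the condition")

// Hypothetical drain failures: one retryable, one not.
var (
	errTransient = errors.New("pod eviction pending")
	errFatal     = errors.New("node not found")
)

// retryWithBackoff calls cond up to steps times, doubling the delay between
// attempts. A non-nil error from cond terminates immediately.
func retryWithBackoff(steps int, initial time.Duration, cond func() (bool, error)) error {
	delay := initial
	for i := 0; i < steps; i++ {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if i < steps-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return errWaitTimeout
}

// drainWithRetry remembers the last drain failure and distinguishes a
// timeout from any other error returned by the backoff helper.
func drainWithRetry(steps int, drain func() error) error {
	var lastErr error
	err := retryWithBackoff(steps, time.Millisecond, func() (bool, error) {
		lastErr = drain()
		switch {
		case lastErr == nil:
			return true, nil
		case errors.Is(lastErr, errFatal):
			return false, lastErr // real error: stop retrying
		default:
			return false, nil // retryable: try again
		}
	})
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errWaitTimeout) && lastErr != nil:
		return fmt.Errorf("failed to drain node (%d tries): %w", steps, lastErr)
	default:
		return fmt.Errorf("failed to drain node: %w", err)
	}
}

func main() {
	fmt.Println(drainWithRetry(3, func() error { return errTransient }))
	fmt.Println(drainWithRetry(3, func() error { return errFatal }))
}
```

The final `default` branch is the part the review asks for: it is unreachable while the condition only ever returns `false, nil`, but it keeps a real error from being lost if that changes later.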

@runcom
Member

runcom commented Feb 11, 2019

Just a comment, which is really a nit since we control the ConditionFunc.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 11, 2019
IgnoreDaemonsets: true,
})
if err != nil {
glog.Infof("Draining failed with: %v; retrying...", err)
Member

Why not keep the logging here to show some progress if we're retrying?

Member

Follow up for this in #412!
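
One way to keep per-attempt progress visible, as the review question above asks, is to wrap each attempt in a small logging adapter. This is only a sketch under stated assumptions: it is not what #412 does, `withRetryLogging`, `logFunc`, and `errPending` are hypothetical names, and the standard library's `log.Printf` is swapped in for `glog.Infof`:

```go
package main

import (
	"errors"
	"log"
)

// errPending is a hypothetical retryable drain failure.
var errPending = errors.New("eviction still pending")

// logFunc has the same shape as log.Printf and glog.Infof.
type logFunc func(format string, args ...interface{})

// withRetryLogging turns one drain attempt into a condition function that
// logs every failure with its attempt number and reports "not done yet".
func withRetryLogging(logf logFunc, attempt func() error) func() (bool, error) {
	tries := 0
	return func() (bool, error) {
		tries++
		if err := attempt(); err != nil {
			logf("Draining failed (attempt %d): %v; retrying...", tries, err)
			return false, nil
		}
		return true, nil
	}
}

func main() {
	failures := 2
	cond := withRetryLogging(log.Printf, func() error {
		if failures > 0 {
			failures--
			return errPending
		}
		return nil
	})
	// Call the condition until it succeeds; a real caller would hand it
	// to a backoff helper instead of looping directly.
	for {
		if done, _ := cond(); done {
			break
		}
	}
}
```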

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit afcc5e2 into openshift:master Feb 12, 2019
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

7 participants