Always retry provisioning operations on failure (continue) #610
metal3-io-bot merged 1 commit into metal3-io:master
force-pushed from f8fa6dc to e7e1dd7
/test-integration
    	return backOffDuration
    }

    func (r actionFailed) Result() (result reconcile.Result, err error) {
FWIW it would be fine to change the signature of Result() to pass the error count if it were more convenient for us to not calculate it until later.
force-pushed from 63258fa to c2a937b
force-pushed from 7e4bdfc to d5bea5a
/test-integration
Would you mind squashing the commits, please?
force-pushed from f460be4 to 9d8411e
/test-integration
force-pushed from 9d8411e to e5d6d9b
/test govet
/test unit
/test-integration
/approve
zaneb left a comment:
Refactoring in a separate patch is good, but please squash stuff like "Fix imports".
/approve
force-pushed from e5d6d9b to 0e97bf4
/test-integration
Improve the reconciliation loop whenever an action failure (or a credential error) is detected by applying a retry pattern with exponential backoff and jitter, to avoid overloading the service. The currently affected statuses are: Deprovisioning, Externally Provisioned, Inspecting, Provisioned, Provisioning and Registering. A new `ErrorCount` field has been added to `BareMetalHostStatus` to support this behavior: it is incremented every time an action failure is recorded, and it is cleared when the action completes successfully.
force-pushed from 0e97bf4 to b1b31f7
/test-integration
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andfasano, dhellmann, zaneb
/lgtm
/test-integration
This PR replaces the one started by @honza in PR #584.
The reconciliation loop now retries the operation, with a backoff, whenever an action failure is detected. The backoff calculation has been reviewed, jitter has been added, and the error count is reset to zero when the action succeeds.