The stack controller seems to retry failed updates without using a backoff. The built-in reconciliation backoff kicks in only when reconcile returns an error, which isn't the case here.
Consider looking at `lastUpdate.endTime` to implement a backoff strategy when `lastUpdate.status` is `failed`. Presumably the stack would stay marked as Reconciling (rather than Stalled).
Currently, if our automation API calls fail, they return non-nil errors
to the operator. In #676 I modified `Update` to translate these errors
into a "failed" status on the Update/Stack, but other operations
(preview etc.) still surface these errors and automatically re-queue.
We'd like to retry these failed updates much less aggressively than we
retry transient network errors, for example. To accomplish this we do a
few things:
* We consolidate the update controller's streaming logic for consistent
error handling across all operations.
* We return errors with known gRPC status codes as-is, but unknown
status codes are translated into failed results for all operations.
* We start tracking the number of times a stack has attempted an update.
This is used to determine how much exponential backoff to apply.
* A failed update is considered synced for a cooldown period before we
retry it. The cooldown period starts at 5 minutes and doubles for every
failed attempt, eventually maxing out at 24 hours.
Fixes #677