feat(*)!: surface application errors in status #379
Merged
Feature or Problem
This PR adds an additional field to the BackoffWrapper (previously BackoffAwareScaler) that can store a failure message received by wadm in response to a command. Commands in wasmCloud, particularly ScaleComponentCommand and StartProvider, have corresponding events that indicate whether the command succeeded (the component scaled or the provider started) or failed. In this PR, I reused the structure we already have for notifying scalers of expected events to also determine whether a newly received event reports a failure to process the command. When the event is a failure, we now store the status in the BackoffWrapper and report that status for 5 seconds, effectively backing off that scaler from additional reconciliation or status reporting until that status is cleared.
As mentioned in some comments, a follow-up PR that fully fixes #253 should also back this off exponentially, up to a ceiling duration. To keep the discussion focused, I scoped this PR to just the logic that pulls the failure message off of the event.
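The store-and-hold behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BackoffSketch`, `Status`, `Event`, and all method names are hypothetical stand-ins, and the current time is passed in explicitly (an assumption made here so the behavior is easy to test without sleeping).

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified status type; wadm's real status types differ.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Deployed,
    Failed(String),
}

/// Hypothetical event shape: a command either succeeded or failed with a message.
#[derive(Debug, Clone)]
enum Event {
    Succeeded,
    Failed { message: String },
}

/// Illustrative stand-in for the wrapper described above, not wadm's actual type.
struct BackoffSketch {
    /// Stored failure status and the instant at which it expires.
    failure: Option<(Status, Instant)>,
    /// How long a failure status is reported (5 seconds in this PR).
    hold: Duration,
}

impl BackoffSketch {
    fn new(hold: Duration) -> Self {
        Self { failure: None, hold }
    }

    /// Store a failure status if the event reports one; other events leave
    /// the stored state untouched.
    fn handle_event(&mut self, event: &Event, now: Instant) {
        if let Event::Failed { message } = event {
            self.failure = Some((Status::Failed(message.clone()), now + self.hold));
        }
    }

    /// Report the stored failure while it is still held; once it has expired,
    /// clear it and fall through to the inner scaler's status.
    fn status(&mut self, inner: Status, now: Instant) -> Status {
        if let Some((status, expires)) = &self.failure {
            if now < *expires {
                return status.clone();
            }
        }
        self.failure = None;
        inner
    }

    /// Reconciliation is skipped while a failure status is being held.
    fn should_reconcile(&self, now: Instant) -> bool {
        !matches!(&self.failure, Some((_, expires)) if now < *expires)
    }
}
```

With a 5-second hold, a failure recorded at time `t` is reported by `status` until `t + 5s`, after which the inner status is returned again and `should_reconcile` becomes true. A fixed hold is used here to match this PR; the exponential backoff mentioned for the follow-up is not modeled.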
Requesting the status of an application that, for example, refers to a provider that doesn't exist, will look like this:
Related Issues
Lays the groundwork for #253
Release Information
wadm 0.14.0
Consumer Impact
This didn't require any over-the-wire breaking changes, but operators should ensure that all of their wadm instances are updated to use this version. Running this version alongside an older version of wadm may result in inconsistent statuses, as the newer wadm scalers understand the backing-off status but the older ones do not.
Testing
Unit Test(s)
Acceptance or Integration
Manual Verification
I manually verified this worked for a provider reference, but there's more testing to be done around the general functionality.