-
Notifications
You must be signed in to change notification settings - Fork 10.1k
Matejrisek/actions/on failure #37945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
DanielMSchmidt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, I think the basis looks really good, there are a couple of "edge" topics that are still open, but it looks like the right course to me.
| }, | ||
| }, | ||
|
|
||
| "trigger on_failure set to 'fail' fails the resource": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test cases could be more extensive:
- we should assert that the resource change has been made (so the right RPC call on the provider has been made)
- we should check that
on_failure = continuecontinues with the other actions in theactionslist and in otheraction_triggerblocks. - we should make sure the behavior with multiple action triggers where the on_failure behavior is mixed is consistent with the expectations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey thanks for this pointers.
They should have been address in this commit: 2a46bea.
Just for the clarification:
we should assert that the resource change has been made (so the right RPC call on the provider has been made)
Do you mean we should assert action invocations or we do something else to verify changes?
| ) tfdiags.Diagnostics { | ||
| switch aii.ActionTrigger.TriggerOnFailure() { | ||
| case configs.ActionTriggerOnFailureContinue: | ||
| if currentDiags.HasErrors() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of printing the errors we want to continue with we probably want to wrap them in warning level diagnostics so that consumers that work directly with the returned diagnostics (e.g. the stacks runtime) can appropriately handle these diagnostics as well.
Also we should only do this for errors coming from the provider complete event, if a hook sends a diagnostic it is unrelated to the on_failure behavior and we should still pass them through.
| ActionTriggerBlockIndex: at.actionTriggerBlockIndex, | ||
| ActionsListIndex: at.actionListIndex, | ||
| ActionTriggerEvent: triggeringEvent, | ||
| ActionTriggerOnFailure: at.onFailure, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now part of plans.LifecycleActionTrigger, so it needs to also be handled in the JSON representation:
| case *plans.LifecycleActionTrigger: |
Also I think we need to handle this in plans.ActionInvocationInstanceSrc and in the serialization to and from the planfile:
terraform/internal/plans/planfile/tfplan.go
Line 1357 in a051ac6
| ret.ActionTrigger = &plans.LifecycleActionTrigger{ |
mildwonkey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋🏻 hi, sorry! This isn't an actual review - I just wanted to make sure we're not actually merging any changes to actions before the PRD gets approved and RFCs are written and approved. We are ready for prototypes and RFCs, not mergable PRs 😁
|
Just throwing some context in here for thought as the RFC are finalized. These may not end up being the final requirements, but are important to consider when trying to move provisioners to this new model. In comparison to provisioners:
The current implementation can't fulfill either of those two points, because the separate apply nodes are not in the dependency chain, and are not evaluated until after the resource has been already recorded as complete. |
Introduce
on_failureattribute to theaction_triggerblock.This attribute will allow consumers to define the behavior if an action fails.
As of now we default to failing run if an error has been returned as part of action invocation.
We'll keep that the default behavior but introduce two options:
failis the default behavior - we're just making our choice explicit.continueis the new behavior we're adding - it will instruct terraform to ignore errors from the action invocation, log that the error has occurred and continue with the run execution.The
on_failureattribute takes inspiration from the namesake attribute in provisioners.Example configuration
Testing
In order to manually test this feature one should use a provider that allows for controlled erroring of actions. For that purpose I've used the modified
tfcoremock[GH] provider which will always fail on the action invocation.Fixes #
Target Release
1.15.x
Rollback Plan
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.
CHANGELOG entry
Add
on_failureattribute to theaction_triggerblock to allow defining different behavior on action failure. For now we support the defaultfailkeyword as well as the newcontinuekeyword which instructs terraform to ignore action failure and continue with the run.