Adds Workflow State Retention #95
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,162 @@ | ||
| # Workflow: History Purge TTL on Completion | ||
|
|
||
| * Author(s): @joshvanl | ||
|
|
||
| ## Overview | ||
|
|
||
| This proposal adds functionality to the workflow runtime that lets users delete completed workflow state from the actor state store after a configured time. | ||
| Each workflow instance may be configured with its own TTL at workflow scheduling time. | ||
| The default remains that workflow state will _not_ be deleted from the actor state store, and will remain there indefinitely. | ||
|
|
||
| ## Background | ||
|
|
||
| Currently, to delete old workflow state from the actor state store database, users must either call the Purge Workflow API or delete the state from the database directly, whether through database operations outside of Dapr or through a first-class TTL feature of that database. | ||
| Users typically want to delete old workflow state after some period of time from when the workflow has reached a terminal state. | ||
|
|
||
| https://github.com/dapr/dapr/issues/9020 | ||
|
|
||
| ## Design | ||
|
|
||
| When scheduling a workflow, users will be able to configure a duration after which, once the workflow has reached a terminal state, the workflow will be purged from the actor state store. | ||
| The duration will only start once the workflow has reached either a TERMINATED, COMPLETED, or FAILED state. | ||
|
Member
Would it make sense to support different TTL per state? Users might not want to keep
Contributor Author
Yeah, that makes sense I think. The only thing is that it will balloon the options in SDKs a bit.
should we support setting a global ttl in daprd? so clients have to do nothing?
Contributor Author
So current thinking is having a Then the following optional proto message under the message InstacePurgeTTL {
optional google.protobuf.Duration defaultTerminal = 1;
optional google.protobuf.Duration completed = 2;
optional google.protobuf.Duration failed = 3;
optional google.protobuf.Duration terminated = 4;
}
perfect, imo even just the dapr configuration is sufficient, so no real need to modify the SDKs
I like having both, but happy to see either! My feeling is the more we can let users control in terms of behaviour in code, the better.
Member
I'd just call it Regarding adding a configuration for this, how would this handle changes in the configuration for in-flight workflows? Like if a workflow started when the TTL was set at 60s but the configuration changes to 30s, which TTL will the workflow experience? I would assume it's 60s because it's 'saved' in the workflow, but it might feel confusing for users if they change the configuration but still see workflows being purged after 60s...
Contributor Author
In this case it would be the configuration at the time at which the workflow reaches that terminal state. It would be possible to see what the TTL is with |
||
|
|
||
| Any duration may be given, e.g. days, weeks, or years. | ||
| A duration of `0` may also be given to delete the workflow actor state immediately after the workflow reaches a terminal state. | ||
|
|
||
| ### Usage | ||
|
|
||
| #### CLI | ||
|
|
||
| Users can give a Go-style duration string when running a workflow from the CLI. | ||
|
|
||
| ```bash | ||
| $ dapr run my-workflow --purge-ttl=120h | ||
| ``` | ||
|
|
||
| ```bash | ||
| $ dapr run my-workflow --purge-ttl=0s | ||
| ``` | ||
|
|
||
| The new purge reminders will be displayed like: | ||
|
|
||
| ```bash | ||
| $ dapr scheduler list | ||
| NAME BEGIN COUNT LAST TRIGGER | ||
| purge-workflow/my-workflow 96h 0 | ||
| ``` | ||
|
|
||
| #### Go | ||
|
|
||
| ```go | ||
| wf.ScheduleWorkflow(ctx, "my-workflow", workflow.WithPurgeTTL(time.Hour*24*5)) | ||
| ``` | ||
|
|
||
| ```go | ||
| wf.ScheduleWorkflow(ctx, "my-workflow", workflow.WithPurgeTTL(0)) | ||
| ``` | ||
|
|
||
| #### Python | ||
|
|
||
| ```python | ||
| wfClient.schedule_new_workflow(workflow=my_workflow, purge_ttl=timedelta(days=5)) | ||
|
||
| ``` | ||
|
|
||
| ```python | ||
| wfClient.schedule_new_workflow(workflow=my_workflow, purge_ttl=timedelta(seconds=0)) | ||
| ``` | ||
|
|
||
| #### Javascript | ||
|
|
||
| ```js | ||
| workflowClient.scheduleNewWorkflow({workflow: MyWorkflow, purge_ttl: Temporal.Duration.from({days: 5})}) | ||
| ``` | ||
|
|
||
| ```js | ||
| workflowClient.scheduleNewWorkflow({workflow: MyWorkflow, purge_ttl: Temporal.Duration.from({seconds: 0})}) | ||
| ``` | ||
|
||
|
|
||
| #### .NET | ||
|
|
||
| ```dotnet | ||
| workflowClient.ScheduleNewWorkflowAsync( | ||
| name: nameof(MyWorkflow), | ||
| stateTTL: TimeSpan.FromDays(5) | ||
| ); | ||
| ``` | ||
|
|
||
| ```dotnet | ||
| workflowClient.ScheduleNewWorkflowAsync( | ||
| name: nameof(MyWorkflow), | ||
| stateTTL: TimeSpan.FromSeconds(0) | ||
|
||
| ); | ||
| ``` | ||
|
|
||
| #### Java | ||
|
|
||
| ```java | ||
| opts.setPurgeTTL(Duration.ofDays(5)); | ||
| workflowClient.scheduleNewWorkflow(OrderProcessingWorkflow.class, opts); | ||
| ``` | ||
|
|
||
| ```java | ||
| opts.setPurgeTTL(Duration.ofSeconds(0)); | ||
| workflowClient.scheduleNewWorkflow(OrderProcessingWorkflow.class, opts); | ||
| ``` | ||
|
|
||
| ### Runtime | ||
|
|
||
| #### protos | ||
|
|
||
| The following protos will be updated with the new purge TTL duration field so it is piped from workflow creation to execution. | ||
|
|
||
| The new option will be added to `CreateInstanceRequest`, populated by the client. | ||
|
|
||
| ```proto | ||
| message CreateInstanceRequest { | ||
| string instanceId = 1; | ||
| string name = 2; | ||
| // EXISTING | ||
| google.protobuf.Duration purgeTTL = 10; // NEW | ||
| } | ||
| ``` | ||
|
|
||
| `ExecutionStartedEvent` will carry the TTL duration, which specifies how long after the workflow reaches a terminal state its state should be purged. | ||
| This field will be persisted in the history log. | ||
| This field will be populated by the durabletask backend executor, piping the field from `CreateInstanceRequest`. | ||
|
|
||
| ```proto | ||
| message ExecutionStartedEvent { | ||
| string name = 1; | ||
| // EXISTING | ||
| google.protobuf.Duration purgeTTL = 10; // NEW | ||
| } | ||
| ``` | ||
|
|
||
| #### Actors | ||
|
|
||
| Upon the workflow reaching a terminal state, after the orchestration actor has written the result to the actor state store, it will create an actor reminder if the `purgeTTL` field is present in the execution started event. | ||
|
|
||
| This reminder will target a new workflow actor type, with the reminder name being the instance ID of the workflow. | ||
|
|
||
| The new actor type will follow convention and have the following form: | ||
|
|
||
| ``` | ||
| dapr.internal.<namespace>.<app-id>.purge-workflow | ||
| ``` | ||
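
For illustration, the actor type name in that form could be built as follows. The `purgeActorType` helper is a hypothetical name, and the namespace and app ID values are placeholders.

```go
package main

import "fmt"

// purgeActorType is a hypothetical helper that builds the purge actor type
// name from the namespace and app ID, following the form shown above.
func purgeActorType(namespace, appID string) string {
	return fmt.Sprintf("dapr.internal.%s.%s.purge-workflow", namespace, appID)
}

func main() {
	fmt.Println(purgeActorType("default", "my-app"))
}
```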
|
|
||
| When the reminder fires, the new purge actor will be activated, call the purge API on the workflow orchestrator actor for the given instance ID, and then deactivate itself. | ||
| Along with the other workflow actor types, this type will be registered when the workflow worker client connects, and unregistered when it disconnects. | ||
|
|
||
| By using a new actor type, this feature is fully backwards compatible as older clients will not register for this new purge workflow type. | ||
|
|
||
|
|
||
| ``` | ||
| WORKFLOW COMPLETE -> orchestrator -> create purge reminder -...> execute purge reminder -> execute purge actor -> execute purge on orchestrator | ||
| ``` | ||
|
|
||
| ## Alternatives | ||
|
|
||
| Another option is to use the actor state store TTL functionality to delete keys based on individual key TTLs. | ||
| This is not appropriate because workflow data _must_ only be deleted from the state store once the workflow has reached a terminal state. | ||
| Not doing so would corrupt the workflow processing. | ||
| It is therefore necessary that the Purge API is used to delete the stored data, which itself processes the request inside the same workflow state machine. | ||