Skip to content

Conversation

@JoshVanL
Copy link
Contributor

No description provided.

20251108-RIS-workflow-state-ttl.md

Signed-off-by: joshvanl <[email protected]>
Signed-off-by: joshvanl <[email protected]>
## Design

When scheduling a workflow, users will be able to configure some duration which upon elapsing after the workflow has reached a terminal state, the workflow will be purged from the actor state store.
The duration will only start once the workflow has reached either a TERMINATED, COMPLETED, or FAILED state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to support different TTL per state? Users might not want to keep COMPLETED workflows for too long, but might want to keep FAILED ones for longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense i think- only think is that it will balloon the options in SDKs a bit

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we support setting a global ttl in daprd? so clients have to do nothing?

Copy link
Contributor Author

@JoshVanL JoshVanL Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So current thinking is having a spec.workflow.purgeTTL.{defaultTerminal,completed,failed,terminated} as time durations in the Dapr config.

Then the following optional proto message under the CreateInstanceRequest and ExecutionStartedEvent messages. The workflow runtime will pick the min duration which matches the terminal state of the workflow. Per workflow requests with a matching state will have preference over the Dapr config. If a specific terminal state, and defaultTerminal are defined, the specific terminal state will take precedence.
wdyt?

message InstacePurgeTTL {
  optional google.protobuf.Duration defaultTerminal = 1;
  optional google.protobuf.Duration completed = 2;
  optional google.protobuf.Duration failed = 3;
  optional google.protobuf.Duration terminated = 4;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect, imo even just the dapr configuration is sufficient, so no real need to modify the SDKs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having both, but happy to see either! My feeling is the more we can let users control in terms of behaviour in code, the better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just call it default because TTL purging is always at terminal states, right?

Regarding adding a configuration for this, how would this handle changes in the configuration for in-flight workflows? Like if a workflow started when the TTL was set at 60s but the configuration changes to 30s, which TTL will the workflow experience? I would assume it's 60s because it's 'saved' in the workflow, but it might feel confusing for users if they change the configuration but still see workflows being purged after 60s...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it would the configuration a the time at which the workflow reach that terminal state. It would be possible to see what the TTL is with dapr scheduler list --filter workflow-retention

Co-authored-by: Albert Callarisa <[email protected]>
Signed-off-by: Josh van Leeuwen <[email protected]>
@JoshVanL JoshVanL changed the title Adds Workflow Purge TTL Adds Workflow State Retention Nov 12, 2025
Copy link
Contributor

@cicoyle cicoyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 binding - just a few comments from me. Approving bc my comments are on smaller things, overall I agree with the path forward with this proposal

spec:
workflow:
stateRetentionPolicy:
anyTerminal: "5s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work if all are set in this example, like anyTerminal with all the rest below? Does the specific one, like completed override the anyTerminal setting?

}

message InstanceStateRetentionPolicy {
optional google.protobuf.Duration allTerminal = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
optional google.protobuf.Duration allTerminal = 1;
optional google.protobuf.Duration all_terminal = 1;

By using a new actor type, this feature is fully backwards compatible as older clients will not register for this new purge workflow type.

```
WORKFLOW COMPLETE -> orestrator -> create retentioner reminder -...> execute retentioner reminder -> execute retentioner actor -> execute purge on orchestrator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
WORKFLOW COMPLETE -> orestrator -> create retentioner reminder -...> execute retentioner reminder -> execute retentioner actor -> execute purge on orchestrator
WORKFLOW COMPLETE -> orchestrator -> create retentioner reminder -...> execute retentioner reminder -> execute retentioner actor -> execute purge on orchestrator -> cleanup retentioner actor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats right?

Copy link

@passuied passuied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants