Honor non-determinism fail workflow policy #1287

taylanisikdemir · 2023-11-03T17:45:38Z

What changed?
Users can specify NonDeterministicWorkflowPolicy in worker options. If the FailWorkflow policy is chosen, the workflow is expected to terminate as soon as it reaches a nondeterministic state (e.g. the activity order changed).
However, this wasn't honored for one category of nondeterminism cases. This PR fixes that, so workflows now fail as soon as any nondeterminism scenario is encountered.

There are two categories of nondeterminism cases, based on how the client library detects them:

1. The issue bubbles up to the task handler as an illegal state panic. This covers most actual prod cases.
2. The issue is caught when comparing replay decisions with history. This covers replay test scenarios and a subset of prod cases.

The FailWorkflow policy was honored for category 2 but not for category 1.
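
To illustrate the intended behavior after this change, here is a minimal, self-contained sketch. It is not the client library's code: `Policy`, `Outcome`, `illegalStatePanic`, `errHistoryMismatch`, `applyPolicy` and `handleTask` are hypothetical names made up for the example, and `BlockWorkflow` is an assumed alternative policy that leaves the workflow open. The point is only that both detection paths (panic and history mismatch) go through the same policy check.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy is a hypothetical stand-in for NonDeterministicWorkflowPolicy.
type Policy int

const (
	BlockWorkflow Policy = iota // assumed alternative: keep the workflow open
	FailWorkflow                // fail the workflow on any detected nondeterminism
)

// Outcome is what this simplified task handler reports.
type Outcome string

const (
	Completed      Outcome = "decision task completed"
	TaskFailed     Outcome = "decision task failed, workflow stays open"
	WorkflowFailed Outcome = "workflow failed"
)

// illegalStatePanic models category 1: the issue surfaces as a panic.
type illegalStatePanic struct{ msg string }

// errHistoryMismatch models category 2: replay decisions differ from history.
var errHistoryMismatch = errors.New("replay decisions do not match history")

// applyPolicy is the single place where the policy is consulted.
func applyPolicy(p Policy) Outcome {
	if p == FailWorkflow {
		return WorkflowFailed
	}
	return TaskFailed
}

// handleTask runs a replay step and routes both categories through applyPolicy.
func handleTask(p Policy, replay func() error) (out Outcome) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		if _, ok := r.(illegalStatePanic); !ok {
			panic(r) // not a nondeterminism signal, re-raise
		}
		out = applyPolicy(p) // category 1 now honors the policy too
	}()
	if err := replay(); err != nil {
		if errors.Is(err, errHistoryMismatch) {
			return applyPolicy(p) // category 2
		}
		return TaskFailed // unrelated error
	}
	return Completed
}

func main() {
	panics := func() error { panic(illegalStatePanic{"unexpected activity order"}) }
	mismatch := func() error { return errHistoryMismatch }

	fmt.Println(handleTask(FailWorkflow, panics))
	fmt.Println(handleTask(FailWorkflow, mismatch))
	fmt.Println(handleTask(BlockWorkflow, panics))
}
```

In this sketch, both FailWorkflow calls report a failed workflow regardless of which path detected the problem, while the assumed BlockWorkflow policy leaves the workflow open.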

Why?
To make NonDeterministicWorkflowPolicy feature correct/complete.

How did you test it?
Added an integration test to simulate this scenario.

Potential risks
Users depending on the existing buggy behavior can be impacted. This would happen only if both of the following hold true:

- user set FailWorkflow policy
- user expects category 1 workflows (see above) not to terminate

This is not a very realistic expectation, because users don't know about these subcategories of nondeterminism detection mechanisms.
So the risk of this fix should be minimal.

internal/internal_task_handlers.go

Groxx

dropping some notes from a first skim.

overall very close I think, I just need another read in an IDE or something to make sure it's good to go since this area in general is a little bit risky (but totally worth it)

Groxx

Make sure there's a fairly-prominent thing in the changelog (this PR or another, as long as it happens before release), but yep. LGTM, thanks for tracking down the details on this :)

…ndeterminism

taylanisikdemir requested review from Groxx, agautam478 and jakobht November 3, 2023 17:45

taylanisikdemir commented Nov 3, 2023

Groxx reviewed Nov 9, 2023
