
Conversation

@mjameswh
Contributor

@mjameswh mjameswh commented Jan 7, 2026

What was changed and why

  • Track instances of AsyncLocalStorage created inside the workflow sandbox and automatically clean them up on workflow context disposal, after execution of the dispose hook on internal interceptors.

  • The AsyncLocalStorage instances created by the SDK to track cancellation and update scopes are now automatically cleaned up at sandbox VM destruction time.

    • PR 1834 added a workflowModule.dispose() call during cleanup of the reusable VM workflow, but that resulted in a regression in 1.14.0 where a workflow might end up getting unexpectedly cancelled if evicted from cache while blocked on a condition(fn, timeout) (see investigation details below). Fixes [Bug] Signal caused condition to fail with CancelledFailure on 1.14.0 #1866.

      • condition with timeout creates a CancellationScope to perform the race between the sleep and the blocking condition src
      • On completion, the current CancellationScope is canceled
      • The current CancellationScope is stored in an AsyncLocalStorage, which falls back to the workflow cancellation scope when there is no active store in the current context. src
      • PR 1834 added a workflowModule.dispose() call to the dispose method of the reusable VM workflow
      • This in turn disables the AsyncLocalStorage that stores the cancellation tokens (src). Any subsequent getStore calls that happen before a new run will return undefined.
      • If a workflow ends up getting disposed after the condition scope is created, but before the scope is cancelled, then the workflow itself ends up getting cancelled instead of the condition scope (a minimal sketch of this failure mode follows this list).
      • This lines up with the trace provided by the user in #1866, which is a CancelledFailure: Workflow cancelled, but originating from the cancel inside of the condition scope on this line
  • Properly dispose a workflow execution context on early failure (e.g. if requested workflow type doesn't resolve to an exported workflow function).

    • We do not add the workflow execution context to the lang-side workflow cache if it failed during early initialization. It is therefore not possible to clean it up at a later time, i.e. on cache eviction. This could previously cause a memory leak if some non-garbage-collectible resources (e.g. AsyncLocalStorage) had been allocated during early initialization. We now ensure that dispose is properly called in that scenario.
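
To make the cancellation-scope failure mode above concrete, here is a minimal, self-contained sketch using plain Node.js AsyncLocalStorage. It is not the SDK's actual code; the Scope class and the scope names are invented for illustration.

// Illustration of the 1.14.0 regression using plain Node.js APIs.
// `Scope`, `rootScope` and the scope names are hypothetical.
import { AsyncLocalStorage } from 'node:async_hooks';

class Scope {
  constructor(public readonly name: string) {}
  cancel(): void {
    console.log(`cancelling ${this.name}`);
  }
}

const rootScope = new Scope('workflow-root');
const scopeStorage = new AsyncLocalStorage<Scope>();

// Mirrors the SDK's fallback: if no scope is active, use the workflow's root scope.
function currentScope(): Scope {
  return scopeStorage.getStore() ?? rootScope;
}

scopeStorage.run(new Scope('condition-timeout'), () => {
  // While the store is active, cancellation targets only the condition scope.
  currentScope().cancel(); // -> "cancelling condition-timeout"

  // PR 1834's dispose() during cache eviction effectively did this mid-flight:
  scopeStorage.disable();

  // After disable(), getStore() returns undefined, so the fallback kicks in
  // and the workflow's root scope gets cancelled instead of the condition scope.
  currentScope().cancel(); // -> "cancelling workflow-root"
});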

Checklist

  1. How was this tested:
  • Added a failing integration test in the first commit; the second commit fixes the test.
  • The actual AsyncLocalStorage disposal lifecycle can't really be asserted through automated testing. We temporarily added console.log calls inside our mocked ALS class (roughly sketched after this checklist) to confirm that dispose() is called at the correct moment in various scenarios.
  2. Any docs updates needed?
  • We should consider documenting the use of AsyncLocalStorage in workflow context.
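
For reference, the temporary instrumentation mentioned above was along these lines (a hypothetical sketch, not committed code): a subclass whose disable() logs when it fires, so manual runs can confirm that cleanup happens exactly when the workflow context is disposed.

import { AsyncLocalStorage } from 'node:async_hooks';

class LoggingAsyncLocalStorage<T> extends AsyncLocalStorage<T> {
  disable(): void {
    // Log when workflow context disposal (or anything else) disables this instance.
    console.log('AsyncLocalStorage.disable() called');
    super.disable();
  }
}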

@mjameswh mjameswh requested a review from a team as a code owner January 7, 2026 16:20
Member

@chris-olszewski chris-olszewski left a comment


Overall makes sense to me.

Just a few small questions.

const wf3 = await startWorkflow(asyncLocalStorageWorkflow);

await worker.runUntil(Promise.all([wf1.result(), wf2.result(), wf3.result()]));
t.pass();
Member


Are there any additional assertions that would be valuable here aside from "it doesn't throw"?

Contributor Author


I played around with the possibility of observing the destruction of ALS by escaping the sandbox to create a FinalizationRegistry, which worked. The problem, though, is that I couldn't get that test to fail with the previous code, so that's not proving anything.

With that in mind, and given the complexity and fragility of that test, I decided not to commit it.

And I really can't think of any other side effect to look for.
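
For context, the FinalizationRegistry probe described above might look roughly like this (a hypothetical sketch, not the committed test; it requires starting node with --expose-gc so a GC can be forced):

import { AsyncLocalStorage } from 'node:async_hooks';

// The registry itself must stay reachable for its cleanup callback to run.
const collected: string[] = [];
const registry = new FinalizationRegistry<string>((label) => collected.push(label));

let als: AsyncLocalStorage<unknown> | undefined = new AsyncLocalStorage<unknown>();
registry.register(als, 'workflow-als');

// ... escape the sandbox so the workflow uses `als`, run the workflow, evict
// and dispose the workflow context, then drop the only strong reference ...
als = undefined;

// With --expose-gc, force a collection and check later whether the callback fired.
(globalThis as any).gc?.();
setTimeout(() => console.log(collected), 100); // expect ['workflow-als'] eventually

As noted above, though, a test along these lines never failed against the pre-fix code, so it wouldn't have guarded against the regression.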

Comment on lines 144 to 150
disable(): void {
super.disable();
}

__temporal_disableForReal(): void {
super.disable();
}
Member


If both of these actually disable the ALS, do we need both?

Contributor Author


Yeah, I was hesitant about whether we want the "normal" disable() function to work properly, or if it should be a no-op. I'm now leaning toward the former, so I'll delete the __temporal one.
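
For clarity, the shape this converges on is roughly the following (illustrative only, not the actual SDK source): the sandbox tracks every AsyncLocalStorage created inside it and disables them all when the workflow execution context is disposed, while disable() keeps its normal behavior.

import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical tracking set owned by the sandbox.
const trackedInstances = new Set<AsyncLocalStorage<any>>();

class TrackedAsyncLocalStorage<T> extends AsyncLocalStorage<T> {
  constructor() {
    super();
    trackedInstances.add(this);
  }
}

// Called by the sandbox when the workflow execution context is torn down,
// after the dispose hook on internal interceptors has run.
function disposeTrackedAsyncLocalStorages(): void {
  for (const als of trackedInstances) {
    als.disable();
  }
  trackedInstances.clear();
}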

Member


👍 on having it function properly

Co-authored-by: Chris Olszewski <[email protected]>
@chris-olszewski
Member

Looking into how the lint violation got through CI and why it isn't showing up locally, but fix is in #1875

@mjameswh
Contributor Author

mjameswh commented Jan 8, 2026

Looking into how the lint violation got through CI and why it isn't showing up locally, but fix is in #1875

Ah, yeah, I was also looking at that just now. Let's merge your dep upgrade PR first.

@mjameswh mjameswh merged commit bc32cf1 into temporalio:main Jan 8, 2026
24 of 27 checks passed
@mjameswh mjameswh deleted the asynclocalstorage-auto-cleanup branch January 9, 2026 23:21