Ensure WriteMessagesAsync/SaveAsync is called asynchronously in Async… #8163

Merged: Aaronontheweb merged 3 commits into akkadotnet:dev from schdooz:NonBlockingWriteMessagesAsync on Apr 23, 2026

Conversation

schdooz (Contributor) commented Apr 16, 2026

Fixes #8162

Changes

Call `await Task.Yield()` before calling `AsyncWriteJournal.WriteMessagesAsync()`/`SnapshotStore.SaveAsync()` so that these methods execute entirely off the actor context.

Without the `await Task.Yield()`, the initial synchronous portion of `AsyncWriteJournal.WriteMessagesAsync()`/`SnapshotStore.SaveAsync()` executes in the `AsyncWriteJournal`/`SnapshotStore` actor context and blocks further message processing by the actor until that portion has completed. This can cause performance issues if expensive work (e.g. serialization) is done at the beginning of the method.
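
The effect described above can be reproduced outside Akka.NET. The following is a minimal sketch, assuming a plain .NET program with no `SynchronizationContext`; `PreambleDemo`, `WriteAsync` and `WriteAfterYieldAsync` are hypothetical names standing in for a plugin method and its caller, not code from this PR.

```csharp
using System;
using System.Threading.Tasks;

public static class PreambleDemo
{
    // Hypothetical stand-in for a plugin's WriteMessagesAsync: everything before
    // the first incomplete await runs synchronously on the caller's thread.
    public static async Task<int> WriteAsync()
    {
        int preambleThread = Environment.CurrentManagedThreadId; // synchronous preamble
        await Task.Delay(10);                                    // asynchronous part
        return preambleThread;
    }

    // Same call, preceded by Task.Yield(): control returns to the caller first,
    // and the preamble runs wherever the continuation is scheduled.
    public static async Task<int> WriteAfterYieldAsync()
    {
        await Task.Yield();
        return await WriteAsync();
    }
}
```

Called from a thread that then blocks on the result, `WriteAsync` reports the caller's own thread id, while `WriteAfterYieldAsync` reports a different (thread-pool) thread under the stated assumption.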

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

Latest dev Benchmarks

Benchmarked with this repo

| Method | Mean | Error | StdDev |
|---|---|---|---|
| Persist100Events | 15.81 ms | 0.187 ms | 0.156 ms |
| Persist1000Events | 141.52 ms | 2.721 ms | 2.272 ms |
| Persist10000Events | 1,464.09 ms | 28.519 ms | 30.515 ms |

This PR's Benchmarks

Benchmarked with this repo

| Method | Mean | Error | StdDev |
|---|---|---|---|
| Persist100Events | 9.236 ms | 0.1831 ms | 0.2035 ms |
| Persist1000Events | 82.233 ms | 1.2815 ms | 1.1988 ms |
| Persist10000Events | 823.042 ms | 15.0341 ms | 13.3273 ms |
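
The two sets of means can be compared with a small calculation. This is a minimal sketch; `BenchmarkMath` is a hypothetical helper name and is not part of the PR or the benchmark repo.

```csharp
public static class BenchmarkMath
{
    // Fractional reduction in mean time: (before - after) / before.
    public static double TimeReduction(double beforeMs, double afterMs) =>
        (beforeMs - afterMs) / beforeMs;

    // Throughput ratio for a fixed workload: before / after.
    public static double Speedup(double beforeMs, double afterMs) =>
        beforeMs / afterMs;
}
```

For `Persist1000Events` (141.52 ms on dev, 82.233 ms on this PR) the reduction is about 0.419 and the speedup about 1.72. The other two rows give reductions of roughly 42% and 44%.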

Aaronontheweb (Member) commented:

Looks like these changes are tripping the circuit breaker under some conditions, at least in the tests we're running on the health checks. Let me dig into this.

Aaronontheweb (Member) left a comment:

LGTM

Had to harden the timing assertions in these specs due to the new Task.Yield behavior on the journal / snapshot stores, but the health check implementations themselves look to be unaffected by this.

Same timing issues with the health checks here - replaced all of the fragile test scheduler stuff with AwaitAssertAsync
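
For readers unfamiliar with the idea behind replacing fixed timing assertions with a retrying one, here is a minimal stand-alone sketch. `Polling.RetryUntilPassesAsync` is a hypothetical helper written for illustration; it is not the Akka.TestKit `AwaitAssertAsync` implementation and makes no claim about that method's signature.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Polling
{
    // Hypothetical helper: re-runs the assertion until it stops throwing.
    // Once the timeout has elapsed, the last failure propagates to the caller.
    public static async Task RetryUntilPassesAsync(Action assertion, TimeSpan timeout, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                assertion();
                return; // assertion passed
            }
            catch (Exception) when (stopwatch.Elapsed < timeout)
            {
                await Task.Delay(interval); // not yet true: wait and retry
            }
        }
    }
}
```

A test written this way tolerates the extra scheduling hop that `Task.Yield()` introduces, because it waits for the condition rather than assuming it holds at a fixed instant.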

    {
        // Ensure WriteMessagesAsync is not called in AsyncWriteJournal
        // actor context and so doesn't block message handling
        await Task.Yield();

Looks fine to me - does exactly what is advertised.

Aaronontheweb (Member) commented:

nice work @schdooz - this looks like a genuinely good perf improvement for all Akka.Persistence plugins. We'll ship it in the next v1.5 release.

@Aaronontheweb Aaronontheweb merged commit afabfef into akkadotnet:dev Apr 23, 2026
10 of 12 checks passed
Aaronontheweb added a commit that referenced this pull request Apr 24, 2026
#8163)

* Ensure WriteMessagesAsync/SaveAsync is called asynchronously in AsyncWriteJournal/SnapshotStore.

* Fix persistence health check timing tests.

---------

Co-authored-by: Mark Dinh <mark.dinh@youlend.com>
Co-authored-by: Aaron Stannard <aaron@petabridge.com>
Co-authored-by: Aaron Stannard <aaron@aaronstannard.com>
Aaronontheweb added a commit that referenced this pull request Apr 26, 2026
…napshotStore (#8163)

This reverts the Task.Yield() additions from PR #8163 in AsyncWriteJournal.ExecuteBatch
and SnapshotStore.ReceiveSnapshotStore, while preserving the health check test
improvements from that same PR.

PR #8163 added `await Task.Yield()` before calling `WriteMessagesAsync` and `SaveAsync`
inside their respective circuit breaker lambdas. The intent was to move expensive byte
serialization off the actor's message-processing thread, which showed ~45% throughput
improvement in benchmarks.

However, this silently broke the implicit contract that persistence plugins relied on:
that the synchronous preamble of `WriteMessagesAsync`/`SaveAsync` executes in actor
context. Moving execution to the thread pool caused:

1. Plugins that access `Self` inside `WriteMessagesAsync` (e.g. Akka.Persistence.Sql,
   Akka.Persistence.EventStore) throw `NotSupportedException` because there is no
   active ActorContext on a thread pool thread.

2. Plugins that use non-thread-safe collections like `Dictionary<string, Task>` for
   write tracking (e.g. Akka.Persistence.Sql, Akka.Persistence.EventStore) are now
   subject to concurrent access from both the actor thread and thread pool threads,
   causing `InvalidOperationException` or silent data corruption.

3. Plugins that send messages to subscribers after writes complete (e.g.
   Akka.Persistence.Redis) access shared actor state off the actor thread.
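
The first failure mode in the list above can be illustrated without Akka.NET. This is a minimal sketch that assumes the ambient context is tracked per thread; `AmbientContext` and `ContextLossDemo` are hypothetical types invented for the example and are not Akka.NET's actual implementation.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an actor's ambient context (assumption: per-thread storage).
public static class AmbientContext
{
    [ThreadStatic] private static string _current;

    public static string Current =>
        _current ?? throw new NotSupportedException("No active context on this thread.");

    public static void Enter(string name) => _current = name;
}

public static class ContextLossDemo
{
    // Plugin-style method that reads the ambient context in its synchronous preamble.
    public static async Task<string> WriteAsync()
    {
        string self = AmbientContext.Current; // analogous to reading Self
        await Task.Delay(10);
        return self;
    }

    public static async Task<string> WriteAfterYieldAsync()
    {
        await Task.Yield();        // continuation resumes on a thread-pool thread
        return await WriteAsync(); // preamble now runs where no context was entered
    }
}
```

After `AmbientContext.Enter("journal")` on the calling thread, `WriteAsync` succeeds, while `WriteAfterYieldAsync` faults with `NotSupportedException` because the thread that runs the preamble never entered the context.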

The change was too blunt an instrument — it applied uniformly to all plugins via the
base class, removing their ability to do any actor-thread setup before async work begins.
Ironically, the plugins that benefit most from off-thread serialization (MongoDB, Azure
Table Storage) don't access actor context at all, while the plugins that break (SQL,
EventStore, Redis) already perform serialization off-thread in their async pipelines.

A future version may reintroduce this optimization with a more surgical approach
(e.g. opt-in property or Template Method pattern) that preserves the plugin threading
contract.
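
One way the opt-in property idea mentioned above might look is sketched here. Everything in it is an assumption made for illustration: `JournalBase`, `YieldBeforeWrite`, `ExecuteBatchAsync` and the two subclasses are hypothetical names, not Akka.NET API and not a design the maintainers have committed to.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical base class; the names are illustrative only.
public abstract class JournalBase
{
    // Opt-in switch, off by default so existing plugins keep the old threading contract.
    protected virtual bool YieldBeforeWrite => false;

    protected abstract Task<int> WriteMessagesAsync();

    public async Task<int> ExecuteBatchAsync()
    {
        if (YieldBeforeWrite)
            await Task.Yield(); // only opted-in plugins leave the caller's thread
        return await WriteMessagesAsync();
    }
}

// Keeps the default: the write starts on the caller's thread.
public sealed class ContextBoundJournal : JournalBase
{
    protected override Task<int> WriteMessagesAsync() =>
        Task.FromResult(Environment.CurrentManagedThreadId);
}

// Opts in: the write starts after yielding.
public sealed class OffThreadJournal : JournalBase
{
    protected override bool YieldBeforeWrite => true;

    protected override Task<int> WriteMessagesAsync() =>
        Task.FromResult(Environment.CurrentManagedThreadId);
}
```

With the default left at `false`, plugins that rely on actor-thread setup are unaffected, and only plugins that explicitly override the property pay the extra scheduling hop.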
Aaronontheweb added a commit that referenced this pull request Apr 26, 2026
…napshotStore (#8163) (#8189)

This was referenced May 21, 2026

Development

Successfully merging this pull request may close these issues.

[PERF] Long event recovery times during high event persist and snapshot load, but DB query time is fast
