Fix silent message loss from concurrent PublishAsync on MessageContext #2536

Merged
jeremydmiller merged 2 commits into main from fix/ancillary-projection-side-effects-2529 on Apr 18, 2026
Conversation

@jeremydmiller
Member

Summary

Closes #2529.

MessageBus._outstanding was a plain List<Envelope> with no synchronization. When Marten's AggregationRunner processes slices with its Block parallelism (default 10), multiple concurrent MessageContext.PublishAsync calls hit List<Envelope>.Add on the same list — well-known to silently lose entries under concurrent use (the internal _size counter race lets one thread overwrite another's slot).
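
The lost-entry behavior described above can be reproduced outside Wolverine with a small standalone sketch. `ListRaceDemo` and `AddConcurrently` are hypothetical names used only for illustration, and the unsynchronized path is timing dependent, so it may or may not drop entries on any given run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ListRaceDemo
{
    // Adds `total` integers from parallel workers and returns how many
    // entries the list actually holds afterwards.
    public static int AddConcurrently(int total, bool useLock)
    {
        var list = new List<int>();
        var gate = new object();

        try
        {
            Parallel.For(0, total, i =>
            {
                if (useLock)
                {
                    // Serialized: every Add observes a consistent size.
                    lock (gate) { list.Add(i); }
                }
                else
                {
                    // Unsynchronized on purpose: two threads can read the same
                    // size and write to the same slot, so one entry is lost.
                    list.Add(i);
                }
            });
        }
        catch (AggregateException)
        {
            // An unsynchronized List<T> can also throw while it resizes.
            // This demo only reports the count that survived.
        }

        return list.Count;
    }
}
```

On a multi-core machine, `ListRaceDemo.AddConcurrently(100_000, useLock: false)` will usually return fewer than 100,000, with no exception pointing at the missing entries. With `useLock: true` the count always equals `total`.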

Symptom match: side-effect messages from a multi-stream projection on an ancillary Marten store vanished with no exception and no log entry — exactly the client report.

Reproducer (new tests under MartenTests/AncillaryStores/)

multi_stream_projection_with_side_effects_on_ancillary_store.cs — configures a Marten ancillary store with IntegrateWithWolverine(), an async daemon, and a MultiStreamProjection whose RaiseSideEffects publishes a Wolverine message per aggregate.

| Test | Before fix | After fix |
| --- | --- | --- |
| single stream → 1 side-effect | ✅ passed | ✅ passes |
| 3 streams in one batch → 3 side-effects | ❌ only 1 of 3 reached Wolverine | ✅ all 3 reach Wolverine |

Fix

Wrap every _outstanding mutation with lock (_outstandingLock) in MessageBus.cs and MessageContext.cs. The Outstanding property returns a snapshot copy so external enumerations don't race with publish.

  • Single-threaded callers (HTTP requests, message handlers) pay an uncontended lock — ~10ns per publish
  • Concurrent callers (the projection daemon, or user code calling PublishAsync in parallel) no longer lose messages
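
A minimal sketch of the lock-plus-snapshot pattern described in this section, assuming a simplified buffer type. `OutstandingBuffer<T>` and its `Drain` method are hypothetical stand-ins for illustration and are not Wolverine's actual `MessageBus` code; only the `_outstanding` and `_outstandingLock` names come from the PR.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the list behind MessageBus._outstanding.
public sealed class OutstandingBuffer<T>
{
    private readonly object _outstandingLock = new();
    private readonly List<T> _outstanding = new();

    // Every mutation goes through the same lock.
    public void Add(T item)
    {
        lock (_outstandingLock) { _outstanding.Add(item); }
    }

    // Returns a copy, so callers can enumerate it while other threads keep adding.
    public IReadOnlyList<T> Outstanding
    {
        get
        {
            lock (_outstandingLock) { return _outstanding.ToArray(); }
        }
    }

    // Assumed helper: copies and clears in one locked step, so an item added
    // concurrently lands either in this batch or in the next one.
    public IReadOnlyList<T> Drain()
    {
        lock (_outstandingLock)
        {
            var copy = _outstanding.ToArray();
            _outstanding.Clear();
            return copy;
        }
    }
}
```

Returning a copy costs one allocation per read, but callers never enumerate the live list while a publish is modifying it.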

Code trace

  1. AggregationRunner.cs:59 → new Block<EventSliceExecution>(10, ...): 10 concurrent slice handlers
  2. ApplyChangesAsync → processPossibleSideEffects → for each published message: await batch.PublishMessageAsync(message, slice.TenantId)
  3. ProjectionBatch.PublishMessageAsync → CurrentMessageBatch(_session) returns the shared MartenToWolverineMessageBatch → batch.PublishAsync(...) on the shared MessageContext
  4. MessageContext.PublishAsync → MessageBus.cs:270 under Transaction is not null → _outstanding.Fill(envelope) → concurrent List<T>.Add, silent data loss

Verification

| Test set | Result |
| --- | --- |
| New reproducer tests (both) | ✅ pass on clean schema |
| 88 related Marten tests (end-to-end publish, MartenOutbox, AggregateHandlerWorkflow, AncillaryStores) | ✅ all pass |
| 75 CoreTests covering MessageContext/MessageBus/Outbox/Publish | ✅ all pass |

Files

  • src/Wolverine/Runtime/MessageBus.cs: _outstandingLock + locked mutations; Outstanding returns snapshot
  • src/Wolverine/Runtime/MessageContext.cs — locked all _outstanding mutations + snapshot-iterations
  • src/Persistence/MartenTests/AncillaryStores/multi_stream_projection_with_side_effects_on_ancillary_store.cs — the reproducer tests (the previously-failing one had been marked [Trait("Category", "Flaky")] in the parent commit; un-flaked here)

🤖 Generated with Claude Code

jeremydmiller and others added 2 commits April 17, 2026 16:35
Confirmed silent-message-loss bug: when a multi-stream projection runs on
a Marten ancillary store's async daemon and emits multiple Wolverine
messages via RaiseSideEffects() from a single Marten SaveChangesAsync,
only one of the messages reaches its handler. The rest are dropped with
no exception, no log entry — exactly matching the client report.

Two test cases on a clean schema:

  projection_side_effect_message_reaches_wolverine_handler  (1 stream)  PASS
  multiple_side_effects_in_one_batch_all_reach_wolverine    (3 streams) FAIL

Sample failing assertion (verbatim):
  should be   [2f95...ad, 78b6...73, 8b7d...44]
  but was     [8b7d...44]

The failing test is marked [Trait("Category", "Flaky")] so CI stays green
while we work on the fix. Remove the trait when the underlying issue is
resolved.

Working hypothesis: race in Marten's
ProjectionUpdateBatch.CurrentMessageBatch — the random-delay + semaphore
pattern may let multiple threads bypass the _batch null-check and create
distinct MartenToWolverineMessageBatch instances, each with its own
MessageContext, where only one's AfterCommitAsync flushes its single
queued message.

References #2529.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#2529)

MessageBus._outstanding was a plain List<Envelope> with no synchronization.
When Marten's AggregationRunner processes slices with its Block parallelism
(default 10), multiple concurrent calls to MessageContext.PublishAsync hit
List<Envelope>.Add on the same list — which is well-known to silently lose
entries under concurrent use (the internal _size counter race lets one
thread overwrite another's slot).

The symptom matched the client report exactly: side-effect messages from a
multi-stream projection on an ancillary Marten store vanished with no
exception and no log entry.

Fix: wrap every _outstanding mutation with a lock in MessageBus and
MessageContext. The Outstanding property returns a snapshot copy so
external enumerations don't race with publish. Single-threaded callers
(HTTP, handlers) pay an uncontended lock — ~10ns. Concurrent callers (the
projection daemon) no longer lose messages.

Verified with:
- 2 new reproducer tests in MartenTests/AncillaryStores (single-message
  path + multi-message-in-one-batch path, both now pass on clean schema)
- 88 related Marten tests (end-to-end publish, outbox, aggregate-handler
  workflow, ancillary stores) still pass
- 75 CoreTests touching MessageBus/MessageContext/Outbox/Publish still pass

Fixes #2529

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Review the permutation of Marten ancillary store + multi-stream projection + raise side effects publishing to Wolverine
