Fix silent message loss from concurrent PublishAsync on MessageContext #2536
Merged
jeremydmiller merged 2 commits into main on Apr 18, 2026
Confirmed silent-message-loss bug: when a multi-stream projection runs on
a Marten ancillary store's async daemon and emits multiple Wolverine
messages via RaiseSideEffects() from a single Marten SaveChangesAsync,
only one of the messages reaches its handler. The rest are dropped with
no exception, no log entry — exactly matching the client report.
Two test cases on a clean schema:
projection_side_effect_message_reaches_wolverine_handler (1 stream) PASS
multiple_side_effects_in_one_batch_all_reach_wolverine (3 streams) FAIL
Sample failing assertion (verbatim):
should be [2f95...ad, 78b6...73, 8b7d...44]
but was [8b7d...44]
The failing test is marked [Trait("Category", "Flaky")] so CI stays green
while we work on the fix. Remove the trait when the underlying issue is
resolved.
Working hypothesis: race in Marten's
ProjectionUpdateBatch.CurrentMessageBatch — the random-delay + semaphore
pattern may let multiple threads bypass the _batch null-check and create
distinct MartenToWolverineMessageBatch instances, each with its own
MessageContext, where only one's AfterCommitAsync flushes its single
queued message.
References #2529.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#2529)
MessageBus._outstanding was a plain List<Envelope> with no
synchronization. When Marten's AggregationRunner processes slices with
its Block parallelism (default 10), multiple concurrent calls to
MessageContext.PublishAsync hit List<Envelope>.Add on the same list,
which is well-known to silently lose entries under concurrent use (the
internal _size counter race lets one thread overwrite another's slot).
The symptom matched the client report exactly: side-effect messages from
a multi-stream projection on an ancillary Marten store vanished with no
exception and no log entry.
Fix: wrap every _outstanding mutation with a lock in MessageBus and
MessageContext. The Outstanding property returns a snapshot copy so
external enumerations don't race with publish.
Single-threaded callers (HTTP, handlers) pay an uncontended lock, ~10ns.
Concurrent callers (the projection daemon) no longer lose messages.
Verified with:
- 2 new reproducer tests in MartenTests/AncillaryStores (single-message
  path + multi-message-in-one-batch path, both now pass on clean schema)
- 88 related Marten tests (end-to-end publish, outbox, aggregate-handler
  workflow, ancillary stores) still pass
- 75 CoreTests touching MessageBus/MessageContext/Outbox/Publish still
  pass
Fixes #2529
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
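The fix described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the actual Wolverine code: OutstandingEnvelopes and the Envelope record are hypothetical stand-ins, and only the _outstandingLock name and the snapshot-returning Outstanding property are taken from the PR text.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for Wolverine's Envelope; only identity matters here.
public sealed record Envelope(Guid Id);

// Hypothetical, simplified holder for the outstanding envelopes.
public sealed class OutstandingEnvelopes
{
    private readonly object _outstandingLock = new();
    private readonly List<Envelope> _outstanding = new();

    // Every mutation takes the lock, so concurrent Add calls
    // cannot overwrite each other's slot in the backing array.
    public void Add(Envelope envelope)
    {
        lock (_outstandingLock)
        {
            _outstanding.Add(envelope);
        }
    }

    // Readers receive a copy taken under the lock, so enumerating
    // the result cannot race with a concurrent Add.
    public IReadOnlyList<Envelope> Outstanding
    {
        get
        {
            lock (_outstandingLock)
            {
                return _outstanding.ToArray();
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var outstanding = new OutstandingEnvelopes();
        const int workers = 10;       // mirrors the Block parallelism of 10
        const int perWorker = 1_000;  // arbitrary load for the illustration

        Parallel.For(0, workers, _ =>
        {
            for (var i = 0; i < perWorker; i++)
            {
                outstanding.Add(new Envelope(Guid.NewGuid()));
            }
        });

        // With the lock in place, no envelope is lost.
        Console.WriteLine(outstanding.Outstanding.Count == workers * perWorker); // prints "True"
    }
}
```

Returning a copy costs an allocation per read, but it means callers can enumerate the result without holding the lock while other threads keep publishing.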
Summary
Closes #2529.
MessageBus._outstanding was a plain List<Envelope> with no synchronization. When Marten's AggregationRunner processes slices with its Block parallelism (default 10), multiple concurrent MessageContext.PublishAsync calls hit List<Envelope>.Add on the same list, which is well-known to silently lose entries under concurrent use (the internal _size counter race lets one thread overwrite another's slot).
Symptom match: side-effect messages from a multi-stream projection on an ancillary Marten store vanished with no exception and no log entry, exactly the client report.
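The race described in the summary can be observed outside Wolverine with a small standalone program. The sketch below is illustrative only: ListAddRaceDemo and CountAfterUnsynchronizedAdds are hypothetical names, and it uses a plain List<int> rather than the real _outstanding field. The result is nondeterministic; on a multi-core machine the reported count is usually lower than the number of Add calls, and occasionally the race surfaces as an exception instead.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ListAddRaceDemo
{
    // Runs `workers` parallel loops that each append `perWorker` items to one
    // unsynchronized List<int>, then returns the count the list reports.
    public static int CountAfterUnsynchronizedAdds(int workers, int perWorker)
    {
        var list = new List<int>();
        try
        {
            Parallel.For(0, workers, _ =>
            {
                for (var i = 0; i < perWorker; i++)
                {
                    list.Add(i); // no lock: this is the unsafe pattern
                }
            });
        }
        catch (AggregateException)
        {
            // Concurrent resizing can also fail loudly instead of losing items.
        }

        return list.Count;
    }

    public static void Main()
    {
        const int workers = 10;        // mirrors the Block parallelism of 10
        const int perWorker = 100_000; // large enough to make the race likely
        var expected = workers * perWorker;
        var actual = CountAfterUnsynchronizedAdds(workers, perWorker);

        Console.WriteLine($"expected {expected}, list reports {actual}");
    }
}
```

A count below the expected value corresponds to the silent loss in the client report: the missing items raised no exception and left no log entry.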
Reproducer (new tests under MartenTests/AncillaryStores/)
multi_stream_projection_with_side_effects_on_ancillary_store.cs: configures a Marten ancillary store with IntegrateWithWolverine(), an async daemon, and a MultiStreamProjection whose RaiseSideEffects publishes a Wolverine message per aggregate.
Fix
Wrap every _outstanding mutation with lock (_outstandingLock) in MessageBus.cs and MessageContext.cs. The Outstanding property returns a snapshot copy so external enumerations don't race with publish.
Concurrent callers (the projection daemon calling PublishAsync in parallel) no longer lose messages.
Code trace
- AggregationRunner.cs:59: new Block<EventSliceExecution>(10, ...), i.e. 10 concurrent slice handlers
- ApplyChangesAsync → processPossibleSideEffects → for each published message: await batch.PublishMessageAsync(message, slice.TenantId)
- ProjectionBatch.PublishMessageAsync → CurrentMessageBatch(_session) returns the shared MartenToWolverineMessageBatch → batch.PublishAsync(...) on the shared MessageContext
- MessageContext.PublishAsync → MessageBus.cs:270 under Transaction is not null → _outstanding.Fill(envelope): concurrent List<T>.Add, silent data loss
Verification
Files
- src/Wolverine/Runtime/MessageBus.cs: _outstandingLock + locked mutations; Outstanding returns snapshot
- src/Wolverine/Runtime/MessageContext.cs: locked all _outstanding mutations + snapshot-iterations
- src/Persistence/MartenTests/AncillaryStores/multi_stream_projection_with_side_effects_on_ancillary_store.cs: the reproducer tests (the previously-failing one had been marked [Trait("Category", "Flaky")] in the parent commit; un-flaked here)
🤖 Generated with Claude Code