Pre-populate chain.AncillaryStoreType in Phase A so the inbox-routing map sees [MartenStore] (closes #2944) #2948
Merged
…-routing map sees [MartenStore] (closes #2944)

Closes #2944. Reported by @fadrian23.

## The bug

A message arriving from an external system in INTEROP mode (no Wolverine envelope headers) whose handler targets an ancillary Marten store via [MartenStore] had its durable inbox envelope persisted in the MAIN store's inbox instead of the ancillary store's inbox. The handler ran correctly, but the envelope landed in the wrong inbox.

## Root cause

[MartenStore] derives from ModifyChainAttribute, so MartenStoreAttribute.Modify() runs in PHASE B (lazy, inside HandlerChain.applyCustomizations at first-codegen time). The message-type-to-ancillary-store map that WolverineRuntime.HostService builds during startMessagingTransportsAsync runs in PHASE A (eager, at handler-graph compile). At that point chain.AncillaryStoreType is still null on every chain, so the map is built empty. When a message arrives, DurableLocalQueue / DurableReceiver consults that map, finds nothing, and falls back to the main store.

Same Phase A vs Phase B trap as GH-2941. See [[reference_handler_chain_customization_phases]] for the broader pattern.

The prior fix at WolverineRuntime.HostService.cs:447 (AllChains() over Chains, refs GH-2576) addressed per-endpoint sticky chains under MultipleHandlerBehavior.Separated, but didn't address the ordering trap - by the time it runs, the chains still haven't had their AncillaryStoreType set.

## Fix

MartenStoreEagerPolicy (IHandlerPolicy, lives in Wolverine.Marten, registered in MartenIntegration alongside MartenAggregateHandlerStrategy). Runs in Phase A and pre-populates chain.AncillaryStoreType by walking each HandlerChain's handler-type and handler-method for [MartenStore] - matching the discovery rules in Chain.applyAttributesAndConfigureMethods. Also walks the per-endpoint sticky child chains (ByEndpoint) so Separated-mode keeps working alongside #2576's AllChains() fix.

The Phase B MartenStoreAttribute.Modify() still runs later: the AncillaryStoreType reassignment is idempotent, and the AncillaryOutboxFactoryFrame middleware insertion stays where it has to be (it participates in codegen).

## Tests

New Bug_2944_interop_ancillary_inbox in Wolverine.RabbitMQ.Tests/Bugs/ mirrors the reporter's repro: publish a raw JSON message (no Wolverine headers) to a durable RabbitMQ queue whose default incoming message type has a [MartenStore]-decorated handler, then assert the inbox envelope landed in the ancillary store, not the main store.

## Verification

Local (with fix):

- Bug_2944_interop_ancillary_inbox: 1/1 pass.
- Marten ancillary + bug-family sweep (AncillaryStores + Bug_2318 + Bug_2382 + Bug_2576 + Bug_2669 + Bug_2887 + Bug_ancillary + Distribution.with_ancillary): 30/30.
- EfCoreTests.Bug_DurableLocalQueue_ancillary: 3/3.
- Wolverine.RabbitMQ.Tests.Bug_2155_ancillary: 1/1.
- Wolverine.Http.Tests.using_ancillary_stores: 1/1.
- Full Marten Bugs: 45/45.
- Full AggregateHandlerWorkflow: 66/66.

Negative-control: commenting out the policy registration makes Bug_2944_interop_ancillary_inbox fail with exactly the assertion the reporter described - 'The interop message should have been persisted in the ancillary store's inbox' - which confirms the test exercises the bug.

PersistenceTests.ModularMonoliths.end_to_end_modular_monolith and .registration_of_message_stores cannot be validated locally (their SQL-Server-touching fixtures time out under emulated SQL Server 2025 on Apple Silicon - the existing Postgres-only subset (13/37) passes; the remaining 24 are a local-environment limitation, not a regression). CI on native Linux SQL Server will validate them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
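The ordering trap and the eager-policy fix described in the commit message above can be modeled in a few lines. This is a minimal, self-contained sketch and not Wolverine's code: `StoreAttribute`, `Chain`, `Phases`, `IOrdersStore`, `OrderPlaced`, and `OrderPlacedHandler` are hypothetical stand-ins, and the "method first, then type" lookup order is an assumption made for the illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for illustration only; none of these are Wolverine types.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class StoreAttribute : Attribute
{
    public StoreAttribute(Type storeType) => StoreType = storeType;
    public Type StoreType { get; }
}

public interface IOrdersStore { }
public sealed class OrderPlaced { }

[Store(typeof(IOrdersStore))]
public sealed class OrderPlacedHandler
{
    public void Handle(OrderPlaced message) { }
}

public sealed class Chain
{
    public Chain(Type messageType, Type handlerType, MethodInfo method)
    {
        MessageType = messageType;
        HandlerType = handlerType;
        Method = method;
    }

    public Type MessageType { get; }
    public Type HandlerType { get; }
    public MethodInfo Method { get; }
    public Type? AncillaryStoreType { get; set; }

    // "Phase B": lazy customization, standing in for the attribute's Modify() call.
    public void ApplyCustomizations() => Phases.AssignStoreFromAttribute(this);
}

public static class Phases
{
    // Checks the handler method first, then the handler type (an assumption of this sketch).
    public static void AssignStoreFromAttribute(Chain chain)
    {
        var att = chain.Method.GetCustomAttribute<StoreAttribute>()
                  ?? chain.HandlerType.GetCustomAttribute<StoreAttribute>();
        if (att != null) chain.AncillaryStoreType = att.StoreType;
    }

    // "Phase A" fix: an eager policy that pre-populates every chain before the map is built.
    public static void ApplyEagerStorePolicy(IEnumerable<Chain> chains)
    {
        foreach (var chain in chains) AssignStoreFromAttribute(chain);
    }

    // "Phase A": eager map from message type to ancillary store type.
    // Chains whose AncillaryStoreType is still null contribute nothing.
    public static Dictionary<Type, Type> BuildInboxRoutingMap(IEnumerable<Chain> chains) =>
        chains.Where(c => c.AncillaryStoreType != null)
              .ToDictionary(c => c.MessageType, c => c.AncillaryStoreType!);

    public static Chain SampleChain() => new Chain(
        typeof(OrderPlaced),
        typeof(OrderPlacedHandler),
        typeof(OrderPlacedHandler).GetMethod(nameof(OrderPlacedHandler.Handle))!);
}
```

Building the map from a fresh `Phases.SampleChain()` gives an empty dictionary, which mirrors the bug. Running `Phases.ApplyEagerStorePolicy` first gives one entry mapping `OrderPlaced` to `IOrdersStore`. A later `ApplyCustomizations()` call assigns the same value again, which is the sense in which the Phase B reassignment is idempotent.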
This was referenced May 28, 2026
Closed
outofrange-consulting pushed a commit to outofrange-consulting/wolverine that referenced this pull request May 28, 2026
…JasperFx#2949)

Closes JasperFx#2949.

The flake symptom on three consecutive PRs (JasperFx#2943, JasperFx#2947, JasperFx#2948 in May 2026) was:

System.TimeoutException : Timed out waiting for expected response Wolverine.Runtime.Agents.AgentsStarted for original message <id> of type Wolverine.Runtime.Agents.StartAgents with a configured timeout of 10000 milliseconds

Tests affected: RavenDbTests.LeaderElection.leadership_election_compliance.take_over_leader_ship_if_leader_becomes_stale and .leader_switchover_between_nodes.

Root cause: WolverineRuntime.Agents.InvokeAsync<T>(NodeDestination, IAgentCommand) had an asymmetric timeout - same-node calls got 30s, remote-node calls got 10s. The asymmetry is backwards: a remote request-reply traverses the control endpoint + serialization + network, so it should have AT LEAST as much budget as a same-node in-memory invocation, not less. Under load on shared GitHub runners that 10s was a real timing race on the cross-node leadership-takeover scenarios the failing tests exercise, where StartAgents -> AgentsStarted is sent from the new leader to a target node and the runner can stall just long enough for the reply not to land in time.

Fix: align remote with same-node at 30s. This is the production constant, not a test-only setting - the same flake would (rarely) bite users on busy or slow-network multi-node Wolverine clusters whose leadership-takeover scenario hits the same StartAgents -> AgentsStarted ack. Same-node already accepts 30s so the change is conservatively in-line with existing precedent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
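The timeout asymmetry in the referenced commit can be illustrated with a small request-reply sketch. Everything here is hypothetical and written only for illustration: `RequestReplySketch`, `TimeoutFor`, and `WaitForReplyAsync` are not Wolverine APIs, and the only values taken from the commit message are the 10s and 30s budgets.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; these names do not exist in Wolverine.
public static class RequestReplySketch
{
    public static readonly TimeSpan SameNodeTimeout = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan OldRemoteTimeout = TimeSpan.FromSeconds(10);

    // aligned == false models the old asymmetric behavior,
    // aligned == true models the fix (remote gets the same budget as same-node).
    public static TimeSpan TimeoutFor(bool isRemote, bool aligned) =>
        isRemote && !aligned ? OldRemoteTimeout : SameNodeTimeout;

    // Waits for a reply, or throws TimeoutException once the budget is spent.
    public static async Task<T> WaitForReplyAsync<T>(Task<T> reply, TimeSpan timeout)
    {
        var finished = await Task.WhenAny(reply, Task.Delay(timeout));
        if (finished != reply)
        {
            throw new TimeoutException(
                $"Timed out waiting for expected response with a configured timeout of {timeout.TotalMilliseconds} milliseconds");
        }

        return await reply;
    }
}
```

With `aligned` set to `true`, `TimeoutFor(true, true)` and `TimeoutFor(false, true)` return the same 30s budget, so a remote reply that arrives after a runner stall of more than 10s no longer trips the timeout.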
Closes #2944. Reported by @fadrian23.

## The bug

A message arriving from an external system in interop mode (raw JSON, no Wolverine envelope headers) whose handler targets an ancillary Marten store via `[MartenStore]` had its durable-inbox envelope persisted in the main store's inbox instead of the ancillary store's. The handler still ran (Wolverine's message routing finds the chain fine), but the inbox row landed in the wrong place, so the reporter's downstream operational assertions about ancillary-store atomicity broke.

The reporter pinpointed the exact line in their issue: it returns 0 records, because `chain.AncillaryStoreType` is still `null` at that point.

## Root cause
Same Phase A vs Phase B ordering trap as GH-2941:

- `[MartenStore]` derives from `ModifyChainAttribute`. `MartenStoreAttribute.Modify()` runs in Phase B (lazy, inside `HandlerChain.applyCustomizations` at first-codegen time).
- The message-type-to-ancillary-store map is built by `WolverineRuntime.HostService` during `startMessagingTransportsAsync`, which runs in Phase A (eager, at handler-graph compile).

At the time of the Phase A loop, no chain has had its `applyCustomizations` triggered yet, so `chain.AncillaryStoreType` is `null` on every chain and the map is built empty. When an interop message arrives, `DurableLocalQueue` / `DurableReceiver` calls `Stores.TryFindAncillaryStoreForMessageType(envelope.MessageType)`, hits an empty map, and falls back to the main store.

The prior fix at `HostService.cs:447` (`AllChains()` over `Chains`, GH-2576) addressed per-endpoint sticky chains under `MultipleHandlerBehavior.Separated` but didn't address the ordering trap. By the time it runs, the chains still have a null `AncillaryStoreType`.

## Fix
A new internal `MartenStoreEagerPolicy : IHandlerPolicy` in `Wolverine.Marten`, registered by `MartenIntegration` alongside `MartenAggregateHandlerStrategy`. It runs in Phase A and pre-populates `chain.AncillaryStoreType` by walking each `HandlerChain`'s handler-type and handler-method for `[MartenStore]`, matching the discovery rules already used in `Chain.applyAttributesAndConfigureMethods` (handler-type + handler-method). It also walks the per-endpoint sticky child chains (`ByEndpoint`) so `Separated`-mode keeps working alongside #2576.

The Phase B `MartenStoreAttribute.Modify()` still runs later: the `AncillaryStoreType` reassignment is idempotent, and the `AncillaryOutboxFactoryFrame` middleware insertion stays where it has to be (it participates in codegen).

## Tests + verification
New regression: `Bug_2944_interop_ancillary_inbox` in `Wolverine.RabbitMQ.Tests/Bugs/` mirrors the reporter's repro. It publishes a raw JSON message (no Wolverine headers) to a durable RabbitMQ queue whose default incoming type has a `[MartenStore]`-decorated handler, then asserts the inbox envelope landed in the ancillary store and not the main store.

Run locally with the fix:

- `Bug_2944_interop_ancillary_inbox` (new)
- Marten ancillary + bug-family sweep (`AncillaryStores` + `Bug_2318` + `Bug_2382` + `Bug_2576` + `Bug_2669` + `Bug_2887` + `Bug_ancillary` + `Distribution.with_ancillary`)
- `EfCoreTests.Bug_DurableLocalQueue_ancillary`
- `Wolverine.RabbitMQ.Tests.Bug_2155_ancillary`
- `Wolverine.Http.Tests.using_ancillary_stores`
- `dotnet build wolverine.slnx -c Release`

Negative-control: commenting out the policy registration makes `Bug_2944_interop_ancillary_inbox` fail with exactly the assertion the reporter described, "The interop message should have been persisted in the ancillary store's inbox", confirming the test exercises the bug.

`PersistenceTests.ModularMonoliths.end_to_end_modular_monolith` and `.registration_of_message_stores` could not be validated locally: their SQL-Server-touching fixtures time out under emulated SQL Server 2025 on Apple Silicon. The same tests fail the same way on the baseline (no fix), so this is a pre-existing local-environment limitation, not a regression introduced here. CI on native Linux SQL Server validates them.

🤖 Generated with Claude Code
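The receive-side fallback described under Root cause (look up the message type, find nothing, use the main store) can be sketched as follows. This is an illustrative model only: `InboxRouter`, `TryFindAncillaryStore`, and `ResolveInboxStore` are hypothetical names, the store identifiers are plain strings for brevity, and the real signature of `Stores.TryFindAncillaryStoreForMessageType` is not shown in this PR, so none of this should be read as Wolverine's API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the receive-side store lookup; not Wolverine's API.
public sealed class InboxRouter
{
    private readonly IReadOnlyDictionary<string, string> _ancillaryStoreByMessageType;
    private readonly string _mainStore;

    public InboxRouter(IReadOnlyDictionary<string, string> ancillaryStoreByMessageType, string mainStore)
    {
        _ancillaryStoreByMessageType = ancillaryStoreByMessageType;
        _mainStore = mainStore;
    }

    // Returns true only when the eagerly built map has an entry for this message type.
    public bool TryFindAncillaryStore(string messageType, out string? store) =>
        _ancillaryStoreByMessageType.TryGetValue(messageType, out store);

    // Falls back to the main store on a miss, which is where the bug surfaced:
    // an empty map means every message takes the fallback branch.
    public string ResolveInboxStore(string messageType) =>
        TryFindAncillaryStore(messageType, out var store) && store != null ? store : _mainStore;
}
```

With an empty map, `ResolveInboxStore("OrderPlaced")` returns the main store name no matter how the handler is decorated. Once the map contains an `"OrderPlaced"` entry, the same call returns the ancillary store name.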