Pre-populate chain.AncillaryStoreType in Phase A so the inbox-routing map sees [MartenStore] (closes #2944)#2948

Merged: jeremydmiller merged 1 commit into main from fix-2944-marten-store-eager-policy on May 28, 2026

Conversation

@jeremydmiller

Closes #2944. Reported by @fadrian23.

The bug

A message arriving from an external system in interop mode (raw JSON, no Wolverine envelope headers) whose handler targets an ancillary Marten store via [MartenStore] had its durable-inbox envelope persisted in the main store's inbox instead of the ancillary store's. The handler still ran (Wolverine's message routing finds the chain fine), but the inbox row landed in the wrong place — so the reporter's downstream operational assertions about ancillary-store atomicity broke.

The reporter pinpointed the exact line in their issue:

// WolverineRuntime.HostService.cs:447
foreach (var chain in Handlers.AllChains().Where(c => c.AncillaryStoreType != null))

returns 0 records, because chain.AncillaryStoreType is still null at that point.

Root cause

Same Phase A vs Phase B ordering trap as GH-2941:

  • [MartenStore] derives from ModifyChainAttribute. MartenStoreAttribute.Modify() runs in Phase B (lazy, inside HandlerChain.applyCustomizations at first-codegen time).
  • The message-type-to-ancillary-store map that WolverineRuntime.HostService builds during startMessagingTransportsAsync is built in Phase A (eager, at handler-graph compile).

At the time of the Phase A loop, no chain has had its applyCustomizations triggered yet — so chain.AncillaryStoreType is null on every chain and the map is built empty. When an interop message arrives, DurableLocalQueue / DurableReceiver calls Stores.TryFindAncillaryStoreForMessageType(envelope.MessageType), hits an empty map, and falls back to the main store.
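The ordering trap can be reproduced in a toy model. The sketch below is not Wolverine code: ToyChain, ApplyCustomizations, and BuildMap are hypothetical stand-ins for a handler chain, its lazy Phase B customization, and the eager Phase A map. It only illustrates why a map built before the lazy step comes out empty.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a handler chain; NOT Wolverine's HandlerChain.
public sealed class ToyChain
{
    private readonly Type? _declaredStore;
    private bool _customized;

    public ToyChain(Type messageType, Type? declaredStore)
    {
        MessageType = messageType;
        _declaredStore = declaredStore;
    }

    public Type MessageType { get; }

    // Stays null until something assigns it.
    public Type? AncillaryStoreType { get; set; }

    // "Phase B": lazy, only runs when code is first generated for the chain.
    public void ApplyCustomizations()
    {
        if (_customized) return;
        _customized = true;
        AncillaryStoreType = _declaredStore;
    }
}

public static class OrderingDemo
{
    // "Phase A": eager map from message type to ancillary store type.
    public static Dictionary<Type, Type> BuildMap(IEnumerable<ToyChain> chains) =>
        chains.Where(c => c.AncillaryStoreType != null)
              .ToDictionary(c => c.MessageType, c => c.AncillaryStoreType!);

    public static void Main()
    {
        var chains = new[] { new ToyChain(typeof(string), typeof(int)) };

        var eagerMap = BuildMap(chains);   // built before Phase B has run
        foreach (var chain in chains) chain.ApplyCustomizations();
        var lateMap = BuildMap(chains);    // built after Phase B has run

        Console.WriteLine($"before: {eagerMap.Count}, after: {lateMap.Count}");
    }
}
```

In this model a lookup against eagerMap finds nothing for any message type, which corresponds to the fallback to the main store described above.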

The prior fix at HostService.cs:447 (AllChains() over Chains, GH-2576) addressed per-endpoint sticky chains under MultipleHandlerBehavior.Separated — but didn't address the ordering trap. By the time it runs, the chains still have null AncillaryStoreType.

Fix

The fix adds a new internal MartenStoreEagerPolicy : IHandlerPolicy in Wolverine.Marten, registered by MartenIntegration alongside MartenAggregateHandlerStrategy. It runs in Phase A and pre-populates chain.AncillaryStoreType by checking each HandlerChain's handler type and handler method for [MartenStore], matching the discovery rules already used in Chain.applyAttributesAndConfigureMethods. It also walks the per-endpoint sticky child chains (ByEndpoint) so that Separated mode keeps working alongside the #2576 fix.

The Phase B MartenStoreAttribute.Modify() still runs later — the AncillaryStoreType reassignment is idempotent, and the AncillaryOutboxFactoryFrame middleware insertion stays where it has to be (it participates in codegen).
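The eager attribute lookup can be sketched with plain reflection. This is a minimal illustration, not the MartenStoreEagerPolicy source: ToyStoreAttribute, IInvoicingStore, InvoiceReceived, InvoiceHandler, and FindStoreType are hypothetical names, and checking the method before the declaring type is an assumption, since the description only says both locations are inspected.

```csharp
#nullable enable
using System;
using System.Reflection;

// Hypothetical marker attribute standing in for [MartenStore].
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class ToyStoreAttribute : Attribute
{
    public ToyStoreAttribute(Type storeType) => StoreType = storeType;
    public Type StoreType { get; }
}

// Hypothetical ancillary store marker and message type.
public interface IInvoicingStore { }
public record InvoiceReceived(string Id);

public static class InvoiceHandler
{
    [ToyStore(typeof(IInvoicingStore))]
    public static void Handle(InvoiceReceived message) { }
}

public static class EagerScan
{
    // Assumed order: method-level attribute first, then the handler type.
    // Returns null when neither carries the attribute (main store applies).
    public static Type? FindStoreType(MethodInfo handlerMethod)
    {
        var attribute = handlerMethod.GetCustomAttribute<ToyStoreAttribute>()
            ?? handlerMethod.DeclaringType?.GetCustomAttribute<ToyStoreAttribute>();
        return attribute?.StoreType;
    }

    public static void Main()
    {
        var method = typeof(InvoiceHandler).GetMethod(nameof(InvoiceHandler.Handle))!;
        Console.WriteLine(FindStoreType(method)?.Name ?? "(main store)");
    }
}
```

Because the scan only reads metadata, it can run before any code generation happens, which is what lets an eager policy fill in the store type in time for the Phase A map.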

Tests + verification

New regression: Bug_2944_interop_ancillary_inbox in Wolverine.RabbitMQ.Tests/Bugs/ mirrors the reporter's repro — publishes a raw JSON message (no Wolverine headers) to a durable RabbitMQ queue whose default incoming type has a [MartenStore]-decorated handler, then asserts the inbox envelope landed in the ancillary store and not the main store.

| Suite | Result |
|---|---|
| Bug_2944_interop_ancillary_inbox (new) | 1/1 pass |
| Marten ancillary + bug-family (AncillaryStores + Bug_2318 + Bug_2382 + Bug_2576 + Bug_2669 + Bug_2887 + Bug_ancillary + Distribution.with_ancillary) | 30/30 pass |
| EfCoreTests.Bug_DurableLocalQueue_ancillary | 3/3 pass |
| Wolverine.RabbitMQ.Tests.Bug_2155_ancillary | 1/1 pass |
| Wolverine.Http.Tests.using_ancillary_stores | 1/1 pass |
| Full Marten Bugs | 45/45 pass |
| Full AggregateHandlerWorkflow | 66/66 pass |
| dotnet build wolverine.slnx -c Release | 0 warnings, 0 errors |

Negative-control: commenting out the policy registration makes Bug_2944_interop_ancillary_inbox fail with exactly the assertion the reporter described — "The interop message should have been persisted in the ancillary store's inbox" — confirming the test exercises the bug.

PersistenceTests.ModularMonoliths.end_to_end_modular_monolith and .registration_of_message_stores could not be validated locally — their SQL-Server-touching fixtures time out under emulated SQL Server 2025 on Apple Silicon. The same tests fail the same way on the baseline (no fix) — this is a pre-existing local-environment limitation, not a regression introduced here. CI on native Linux SQL Server validates them.

🤖 Generated with Claude Code

…-routing map sees [MartenStore] (closes #2944)

Closes #2944. Reported by @fadrian23.

## The bug

A message arriving from an external system in INTEROP mode (no Wolverine
envelope headers) whose handler targets an ancillary Marten store via
[MartenStore] had its durable inbox envelope persisted in the MAIN store's
inbox instead of the ancillary store's inbox. The handler ran correctly, but
the envelope landed in the wrong inbox.

## Root cause

[MartenStore] derives from ModifyChainAttribute, so MartenStoreAttribute.Modify()
runs in PHASE B (lazy, inside HandlerChain.applyCustomizations at first-codegen
time). The message-type-to-ancillary-store map that
WolverineRuntime.HostService builds during startMessagingTransportsAsync is
built in PHASE A (eager, at handler-graph compile). At that point
chain.AncillaryStoreType is still null on every chain, so the map is built
empty. When a message arrives, DurableLocalQueue / DurableReceiver consults
that map, finds nothing, and falls back to the main store.

Same Phase A vs Phase B trap as GH-2941. See
[[reference_handler_chain_customization_phases]] for the broader pattern.

The prior fix at WolverineRuntime.HostService.cs:447 (AllChains() over Chains,
refs GH-2576) addressed per-endpoint sticky chains under
MultipleHandlerBehavior.Separated, but didn't address the ordering trap - by
the time it runs, the chains still haven't had their AncillaryStoreType set.

## Fix

MartenStoreEagerPolicy (IHandlerPolicy, lives in Wolverine.Marten, registered
in MartenIntegration alongside MartenAggregateHandlerStrategy). Runs in
Phase A and pre-populates chain.AncillaryStoreType by walking each
HandlerChain's handler-type and handler-method for [MartenStore] - matching
the discovery rules in Chain.applyAttributesAndConfigureMethods. Also walks
the per-endpoint sticky child chains (ByEndpoint) so Separated-mode keeps
working alongside #2576's AllChains() fix.

The Phase B MartenStoreAttribute.Modify() still runs later: the
AncillaryStoreType reassignment is idempotent, and the AncillaryOutboxFactoryFrame
middleware insertion stays where it has to be (it participates in codegen).

## Tests

New Bug_2944_interop_ancillary_inbox in Wolverine.RabbitMQ.Tests/Bugs/ mirrors
the reporter's repro: publish a raw JSON message (no Wolverine headers) to a
durable RabbitMQ queue whose default incoming message type has a
[MartenStore]-decorated handler, then assert the inbox envelope landed in the
ancillary store, not the main store.

## Verification

Local (with fix):
  - Bug_2944_interop_ancillary_inbox: 1/1 pass.
  - Marten ancillary + bug-family sweep (AncillaryStores + Bug_2318 +
    Bug_2382 + Bug_2576 + Bug_2669 + Bug_2887 + Bug_ancillary +
    Distribution.with_ancillary): 30/30.
  - EfCoreTests.Bug_DurableLocalQueue_ancillary: 3/3.
  - Wolverine.RabbitMQ.Tests.Bug_2155_ancillary: 1/1.
  - Wolverine.Http.Tests.using_ancillary_stores: 1/1.
  - Full Marten Bugs: 45/45.
  - Full AggregateHandlerWorkflow: 66/66.

Negative-control: commenting out the policy registration makes
Bug_2944_interop_ancillary_inbox fail with exactly the assertion the reporter
described - 'The interop message should have been persisted in the ancillary
store's inbox' - confirming that the test exercises the bug.

PersistenceTests.ModularMonoliths.end_to_end_modular_monolith and
.registration_of_message_stores cannot be validated locally: their
SQL-Server-touching fixtures time out under emulated SQL Server 2025 on
Apple Silicon. The existing Postgres-only subset (13/37) passes; the
remaining 24 reflect a local-environment limitation, not a regression. CI on
native Linux SQL Server will validate them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller merged commit 3e0913e into main May 28, 2026
22 of 24 checks passed
outofrange-consulting pushed a commit to outofrange-consulting/wolverine that referenced this pull request May 28, 2026
…JasperFx#2949)

Closes JasperFx#2949. The flake symptom on three consecutive PRs (JasperFx#2943, JasperFx#2947, JasperFx#2948
in May 2026) was:

  System.TimeoutException : Timed out waiting for expected response
  Wolverine.Runtime.Agents.AgentsStarted for original message <id>
  of type Wolverine.Runtime.Agents.StartAgents with a configured timeout
  of 10000 milliseconds

Tests affected: RavenDbTests.LeaderElection.leadership_election_compliance.
take_over_leader_ship_if_leader_becomes_stale and .leader_switchover_between_nodes.

Root cause: WolverineRuntime.Agents.InvokeAsync<T>(NodeDestination, IAgentCommand)
had an asymmetric timeout - same-node calls got 30s, remote-node calls got 10s.
The asymmetry is backwards: a remote request-reply traverses the control
endpoint + serialization + network, so it should have AT LEAST as much budget
as a same-node in-memory invocation, not less. Under load on shared GitHub
runners that 10s was a real timing race on the cross-node leadership-takeover
scenarios the failing tests exercise, where StartAgents -> AgentsStarted is
sent from the new leader to a target node and the runner can stall just long
enough for the reply not to land in time.

Fix: align remote with same-node at 30s. This is the production constant, not a
test-only setting - the same flake would (rarely) bite users on busy or
slow-network multi-node Wolverine clusters whose leadership-takeover scenario
hits the same StartAgents -> AgentsStarted ack. Same-node calls already use
30s, so the change is conservatively in line with existing precedent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Message from external system is stored in main store instead of ancillary store (ignoring [MartenStore] attribute at the message handler)
