Fix #4705: composite projection shards stall at seq 1 under per-tenant event partitioning #4707
Merged
Root cause: a composite projection runs an "optimized rebuild" via CompositeReplayExecutor on
startup, whose ceiling comes from IEventDatabase.FetchHighestEventSequenceNumber(). Marten
implemented that as `select last_value from mt_events_sequence` — the store-global sequence,
which is NEVER advanced under UseTenantPartitionedEvents (each tenant draws seq_id from its own
mt_events_sequence_{suffix}). So it read as 1 and the composite replayed only events 0..1, then
parked at seq 1. Standalone projections were immune: their continuous agent is driven by the
high-water detector (HighWaterMark, computed from max(seq_id)), not that method.
Fix: under UseTenantPartitionedEvents, FetchHighestEventSequenceNumber reads
`coalesce(max(seq_id), 0) from mt_events` — the real high-water — so the composite replay covers
the whole stream. Also made FetchEventStoreStatistics.EventSequenceNumber partition-aware for
consistency (the two "highest sequence" APIs now agree; previously both read the stale sequence).
The non-partitioned path is unchanged (still mt_events_sequence.last_value).
Note: the projection VERSION is irrelevant (the reporter's stalled shards merely happened to be
versioned); the guard reproduces the stall at both v1 and v2.
Tests:
- Regressions/Bug_4705_versioned_composite_per_tenant: single-DB, single-tenant continuous-daemon
guard at v1 AND v2 — composite (bundle + members) and a standalone control must both reach the
high-water.
- Admin/event_store_statistics_under_partitioning: the two pins that documented the stale-by-design
behavior are updated to the corrected contract (FetchHighest / FetchMax / stats.EventSequenceNumber
all agree on max(seq_id) under partitioning).
Verified: TenantPartitionedEventsTests 183/183; non-partitioned DaemonTests.Composites 10/10.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 11, 2026
Closes #4705.
Root cause
On startup a composite projection runs an "optimized rebuild" via `CompositeReplayExecutor`, whose ceiling comes from `IEventDatabase.FetchHighestEventSequenceNumber()`. Marten implemented that as `select last_value from mt_events_sequence`.

Under `Events.UseTenantPartitionedEvents` that global sequence is never advanced: each tenant draws `seq_id` from its own `mt_events_sequence_{suffix}`, so the global sequence's `last_value` reads as 1. The composite therefore replayed only events `0..1`, marked itself caught up, and parked at `last_seq_id = 1`, exactly as reported. A standalone projection was immune because its continuous agent is driven by the high-water detector (`HighWaterMark`, computed from `max(seq_id)`), not this method.

Fix
`MartenDatabase.FetchHighestEventSequenceNumber` now reads the real high-water under per-tenant partitioning, `coalesce(max(seq_id), 0) from mt_events`, so the composite single-pass replay gets the correct ceiling and replays the whole stream.

`FetchEventStoreStatistics.EventSequenceNumber` is made partition-aware the same way, so the two "highest sequence" APIs (and `FetchMaxEventSequenceAsync`) agree; previously both reported the stale global sequence. The non-partitioned path is unchanged.

The projection version is irrelevant: the reporter's stalled shards merely happened to be versioned. The guard reproduces the stall at both v1 and v2.
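The gap between the two ceilings can be reproduced in plain PostgreSQL. The following is a minimal sketch, not Marten's actual DDL: the table is reduced to two columns, and the `tenant_a` suffix and tenant id are hypothetical names chosen for illustration. It relies only on the fact that a sequence on which `nextval` has never been called reports its start value as `last_value`.

```sql
-- Simplified, hypothetical schema: NOT Marten's real DDL.
create sequence mt_events_sequence;            -- store-global, never advanced under partitioning
create sequence mt_events_sequence_tenant_a;   -- per-tenant sequence (suffix is illustrative)

create table mt_events (
    seq_id    bigint primary key,
    tenant_id text   not null
);

-- Append 40 events, drawing seq_id from the per-tenant sequence only.
insert into mt_events (seq_id, tenant_id)
select nextval('mt_events_sequence_tenant_a'), 'tenant_a'
from generate_series(1, 40);

-- Pre-fix ceiling: the untouched global sequence still reports its start value.
select last_value from mt_events_sequence;        -- 1

-- Post-fix ceiling under partitioning: the real high-water mark.
select coalesce(max(seq_id), 0) from mt_events;   -- 40
```

The `coalesce(..., 0)` keeps the result at 0 rather than `NULL` when `mt_events` is empty.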
Tests
- `Regressions/Bug_4705_versioned_composite_per_tenant`: single-DB, single-tenant, continuous-daemon guard at v1 AND v2. The composite (bundle + member stages) and a standalone control must both reach the high-water. Reproduces the stall pre-fix (composite parks at 1 while standalone reaches 40); passes post-fix.
- `Admin/event_store_statistics_under_partitioning`: the two pins that previously documented the "stale by design" behavior are updated to the corrected contract. `FetchHighestEventSequenceNumber`, `FetchMaxEventSequenceAsync`, and `FetchEventStoreStatistics.EventSequenceNumber` now all agree on `max(seq_id)` under partitioning.

Verified locally: `TenantPartitionedEventsTests` 183/183; non-partitioned `DaemonTests.Composites` 10/10.

🤖 Generated with Claude Code