Fix #4705 — composite projection shards stall at seq 1 under per-tenant event partitioning#4707

Merged · jeremydmiller merged 1 commit into master from feature/4705-composite-replay-ceiling · Jun 9, 2026

@jeremydmiller

Closes #4705.

Root cause

On startup a composite projection runs an "optimized rebuild" via CompositeReplayExecutor, whose ceiling comes from IEventDatabase.FetchHighestEventSequenceNumber():

var ceiling = await _database.FetchHighestEventSequenceNumber(cancellation); // CompositeReplayExecutor.cs

Marten implemented that as:

select last_value from {schema}.mt_events_sequence;   -- the store-global sequence

Under Events.UseTenantPartitionedEvents that global sequence is never advanced: each tenant draws seq_id from its own mt_events_sequence_{suffix}, so the global sequence's last_value stays at 1. The composite therefore replayed only events 0..1, marked itself caught up, and parked at last_seq_id = 1, exactly as reported in #4705. A standalone projection was immune because its continuous agent is driven by the high-water detector (HighWaterMark, computed from max(seq_id)), not by this method.

Fix

MartenDatabase.FetchHighestEventSequenceNumber now reads the real high-water under per-tenant partitioning:

Options.Events.UseTenantPartitionedEvents
    ? "select coalesce(max(seq_id), 0) from {schema}.mt_events;"
    : "select last_value from {schema}.mt_events_sequence;"   // unchanged

so the composite single-pass replay gets the correct ceiling and replays the whole stream. FetchEventStoreStatistics.EventSequenceNumber is made partition-aware the same way, so the two "highest sequence" APIs (and FetchMaxEventSequenceAsync) agree — previously both reported the stale global sequence. The non-partitioned path is unchanged.
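
To make the difference between the two ceilings concrete, here is a minimal in-memory sketch. It is a toy model, not Marten code: `ToySequence` and `ToyEventStore` are hypothetical names, and it assumes each per-tenant sequence starts at 1 and that a never-advanced sequence reports `last_value = 1`, as described above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database sequence; not a Marten type.
public sealed class ToySequence
{
    private bool _called;

    // Assumption: a sequence that was never advanced still reports last_value = 1.
    public long LastValue { get; private set; } = 1;

    public long Next()
    {
        // The first call hands out the start value; later calls increment.
        if (_called) LastValue++;
        _called = true;
        return LastValue;
    }
}

// Hypothetical stand-in for the event store; not a Marten type.
public sealed class ToyEventStore
{
    private readonly bool _useTenantPartitionedEvents;
    private readonly Dictionary<string, ToySequence> _tenantSequences = new();
    private readonly List<long> _seqIds = new();

    public ToyEventStore(bool useTenantPartitionedEvents)
    {
        _useTenantPartitionedEvents = useTenantPartitionedEvents;
    }

    public ToySequence GlobalSequence { get; } = new();

    public void Append(string tenant)
    {
        _seqIds.Add(SequenceFor(tenant).Next());
    }

    // Pre-fix ceiling: the global sequence's last_value.
    public long CeilingFromGlobalSequence() => GlobalSequence.LastValue;

    // Post-fix ceiling under partitioning: coalesce(max(seq_id), 0).
    public long CeilingFromMaxSeqId() => _seqIds.Count == 0 ? 0 : _seqIds.Max();

    private ToySequence SequenceFor(string tenant)
    {
        // Without partitioning every event draws from the global sequence.
        if (!_useTenantPartitionedEvents) return GlobalSequence;

        // With partitioning each tenant draws from its own sequence,
        // so the global sequence is never touched.
        if (!_tenantSequences.TryGetValue(tenant, out var sequence))
        {
            sequence = new ToySequence();
            _tenantSequences[tenant] = sequence;
        }
        return sequence;
    }
}
```

In this model, appending 40 events for a single tenant with partitioning enabled leaves `CeilingFromGlobalSequence()` at 1 while `CeilingFromMaxSeqId()` returns 40, mirroring the stalled composite versus the standalone control. With partitioning disabled, both return 40, which is why the non-partitioned path could stay unchanged.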

The projection version is irrelevant — the reporter's stalled shards merely happened to be versioned. The guard reproduces the stall at both v1 and v2.

Tests

  • Regressions/Bug_4705_versioned_composite_per_tenant — single-DB, single-tenant, continuous-daemon guard at v1 AND v2: the composite (bundle + member stages) and a standalone control must both reach the high-water. Reproduces the stall pre-fix (composite parks at 1 while standalone reaches 40); passes post-fix.
  • Admin/event_store_statistics_under_partitioning — the two pins that previously documented the "stale by design" behavior are updated to the corrected contract: FetchHighestEventSequenceNumber, FetchMaxEventSequenceAsync, and FetchEventStoreStatistics.EventSequenceNumber now all agree on max(seq_id) under partitioning.

Verified locally: TenantPartitionedEventsTests 183/183; non-partitioned DaemonTests.Composites 10/10.

🤖 Generated with Claude Code

…nt event partitioning

Root cause: a composite projection runs an "optimized rebuild" via CompositeReplayExecutor on
startup, whose ceiling comes from IEventDatabase.FetchHighestEventSequenceNumber(). Marten
implemented that as `select last_value from mt_events_sequence` — the store-global sequence,
which is NEVER advanced under UseTenantPartitionedEvents (each tenant draws seq_id from its own
mt_events_sequence_{suffix}). So it read as 1 and the composite replayed only events 0..1, then
parked at seq 1. Standalone projections were immune: their continuous agent is driven by the
high-water detector (HighWaterMark, computed from max(seq_id)), not that method.

Fix: under UseTenantPartitionedEvents, FetchHighestEventSequenceNumber reads
`coalesce(max(seq_id), 0) from mt_events` — the real high-water — so the composite replay covers
the whole stream. Also made FetchEventStoreStatistics.EventSequenceNumber partition-aware for
consistency (the two "highest sequence" APIs now agree; previously both read the stale sequence).
The non-partitioned path is unchanged (still mt_events_sequence.last_value).

Note: the projection VERSION is irrelevant (the reporter's stalled shards merely happened to be
versioned); the guard reproduces the stall at both v1 and v2.

Tests:
- Regressions/Bug_4705_versioned_composite_per_tenant: single-DB, single-tenant continuous-daemon
  guard at v1 AND v2 — composite (bundle + members) and a standalone control must both reach the
  high-water.
- Admin/event_store_statistics_under_partitioning: the two pins that documented the stale-by-design
  behavior are updated to the corrected contract (FetchHighest / FetchMax / stats.EventSequenceNumber
  all agree on max(seq_id) under partitioning).

Verified: TenantPartitionedEventsTests 183/183; non-partitioned DaemonTests.Composites 10/10.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Composite projection shards stall at sequence 1 under per-tenant event partitioning (sharded conjoined tenancy)
