Fix #4761 + #4763: per-tenant non-stale wait + sharded reassignment tenant_count #4764
Merged
…ount

Two independent multi-tenant + tenant-partitioned bugs (repros from #4762, thanks @erdtsieck):

#4761: WaitForNonStaleData never reports non-stale when multiple tenants share a shard. Under UseTenantPartitionedEvents each tenant has its own mt_events_sequence, so seq_ids overlap and a single store-global "initial" (the max across tenants) is not a valid bar for every per-tenant progression row. A tenant with fewer events legitimately tops out below the global max, so the "all rows >= initial" check could never pass and the wait timed out even though every tenant's data was fully projected. WaitForNonStaleDataAsync is now per-tenant aware under partitioning: it requires each registered projection shard to have caught its OWN tenant up to that tenant's HighWaterMark:<tenant> mark, and guards against a premature pass before any work is done. The non-partitioned path is unchanged.

#4763: ShardedTenancy.AssignTenantAsync only recomputed the TARGET shard's tenant_count on a re-assignment, leaving the SOURCE shard inflated forever (which made UseSmallestDatabaseAssignment mis-rank it). It now captures the tenant's prior shard before the upsert and recomputes both the source and target counts.

Full TenantPartitionedEventsTests suite green (191); non-partitioned WaitForNonStaleData consumers unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
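To make the #4761 failure mode concrete, here is a minimal, dependency-free sketch of the two checks. It is a model of the logic described in the commit message, not Marten's implementation: the `ProgressRow` type, the function names, and the `Trips:All:<tenant>` shard-row naming are hypothetical. Only the `HighWaterMark:<tenant>` prefix comes from the text above. The guard is modeled, as an assumption, by treating a missing row as "not caught up".

```python
from dataclasses import dataclass

HIGH_WATER_PREFIX = "HighWaterMark:"  # row-name prefix taken from the PR text


@dataclass(frozen=True)
class ProgressRow:
    name: str      # "HighWaterMark:<tenant>" or "<shard>:<tenant>" (latter is hypothetical)
    sequence: int  # last seq_id recorded for that row


def global_check(rows, initial):
    """Old behaviour: every row must reach the store-global max seq_id."""
    return all(r.sequence >= initial for r in rows)


def per_tenant_check(rows, shard_names, tenants):
    """Sketch of the per-tenant check: each shard must reach its own tenant's mark."""
    by_name = {r.name: r.sequence for r in rows}
    for tenant in tenants:
        mark = by_name.get(f"{HIGH_WATER_PREFIX}{tenant}")
        if mark is None:
            return False  # assumed guard: no high-water mark recorded yet
        for shard in shard_names:
            progress = by_name.get(f"{shard}:{tenant}")
            if progress is None or progress < mark:
                return False  # shard row missing, or still behind its tenant
    return True


# Two tenants on one shard with overlapping seq_ids (illustrative values).
rows = [
    ProgressRow("HighWaterMark:tenant_x", 3),
    ProgressRow("HighWaterMark:tenant_y", 2),
    ProgressRow("Trips:All:tenant_x", 3),
    ProgressRow("Trips:All:tenant_y", 2),
]

print(global_check(rows, initial=3))                                   # False
print(per_tenant_check(rows, ["Trips:All"], ["tenant_x", "tenant_y"]))  # True
```

With these values tenant_y is fully projected at 2, yet the global bar of 3 can never be met, which is why the old check timed out. Passing only the two HighWaterMark rows to `per_tenant_check` returns False, which is the kind of premature pass the guard is meant to prevent.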
jeremydmiller added a commit that referenced this pull request Jun 18, 2026
…ant-partitioned (#4767)

Locks in the fix from #4764. Under MultiTenantedWithShardedDatabases + UseTenantPartitionedEvents, an async CompositeProjection was not driven to a non-stale state by the normal daemon catch-up path (StartAllAsync + WaitForNonStaleData): the member read models stayed empty after catch-up returned.

Root cause was shared with #4761: WaitForNonStaleData's "caught up" check could be satisfied by the HighWaterMark rows alone, without requiring the composite's per-tenant projection-progression row. #4764 fixed that.

This test pins the behavior: a sharded, tenant-partitioned composite must materialize BOTH stages via the async daemon (not just via RebuildProjectionAsync). Passes on JasperFx 2.13.0 (the merged #4764 fix is sufficient; the defensive JasperFx ExecutionStage guard from #457 is a separate invariant assertion that rides the next routine JasperFx bump).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes #4761. Closes #4763. Reproductions courtesy of #4762 (@erdtsieck) — those two repro tests are included here and now pass. (#4751 from that same repro PR needs a JasperFx-side change and is being handled separately.)
#4761: WaitForNonStaleData never completes for a multi-tenant shard

Under MultiTenantedWithShardedDatabases + UseTenantPartitionedEvents, when two tenants share a shard, each has its own mt_events_sequence_<suffix> (overlapping seq_ids). WaitForNonStaleDataAsync checked projections.All(x => x.Sequence >= initial.EventSequenceNumber), where initial is the global max seq_id. A tenant with fewer events legitimately tops out below that max (e.g. HighWaterMark:tenant_y = 2 while global = 3), so the check could never pass: the wait timed out ("...reaching the initial sequence of 3") even though both tenants' data was fully projected.

Fix: WaitForNonStaleDataAsync is now per-tenant aware under partitioning. It requires each registered projection shard to have caught its own tenant up to that tenant's HighWaterMark:<tenant> mark (and guards against a premature pass before the daemon has done any work). The non-partitioned path is byte-for-byte unchanged.

#4763: sharded reassignment leaves the source shard's tenant_count inflated

ShardedTenancy.AssignTenantAsync recomputed tenant_count only for the target shard. Re-assigning a tenant A→B never decremented A, so A stayed inflated forever and UseSmallestDatabaseAssignment kept mis-ranking it as fuller.

Fix: capture the tenant's prior shard before the upsert and recompute both the source and target shard counts.
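The #4763 fix can be sketched with an in-memory stand-in for the assignment table. This is an illustrative model only, under stated assumptions: `ShardRegistry`, `assign`, and `smallest` are hypothetical names, and the real code works against database rows rather than dictionaries. The point it shows is the ordering: read the prior shard before the upsert, then recount both shards.

```python
class ShardRegistry:
    """Hypothetical in-memory model of tenant-to-shard assignments."""

    def __init__(self, shards):
        self.assignments = {}                       # tenant_id -> shard_id
        self.tenant_count = {s: 0 for s in shards}  # cached count per shard

    def _recount(self, shard_id):
        # Recompute from the assignments instead of incrementing/decrementing,
        # so the cached count cannot drift from the source of truth.
        self.tenant_count[shard_id] = sum(
            1 for s in self.assignments.values() if s == shard_id
        )

    def assign(self, tenant_id, target):
        prior = self.assignments.get(tenant_id)  # capture BEFORE the upsert
        self.assignments[tenant_id] = target     # upsert
        self._recount(target)
        if prior is not None and prior != target:
            self._recount(prior)                 # the step the old code skipped

    def smallest(self):
        # Stand-in for a "smallest database" ranking over the cached counts.
        return min(self.tenant_count, key=self.tenant_count.get)


registry = ShardRegistry(["A", "B"])
registry.assign("t1", "A")
registry.assign("t2", "A")
registry.assign("t1", "B")    # re-assignment A -> B
print(registry.tenant_count)  # {'A': 1, 'B': 1}
```

Without the second `_recount` call, shard A would still report 2 after the re-assignment, and any ranking built on the cached counts would treat it as fuller than it is.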
Tests
Bug_4761_per_tenant_progression_same_shard and Bug_4763_reassign_count_divergence (from #4762, "Failing repros: sharded tenant-partitioned daemon catch-up + tenant-count divergence (#4751, #4761, #4763)") now pass.
TenantPartitionedEventsTests suite green (191); non-partitioned WaitForNonStaleData consumers (e.g. querying_with_non_stale_data) unaffected.

🤖 Generated with Claude Code