
Fix #4761 + #4763: per-tenant non-stale wait + sharded reassignment tenant_count#4764

Merged
jeremydmiller merged 1 commit into master from fix-4761-4763-sharded-tenant-partitioned on Jun 18, 2026


@jeremydmiller

Member

Closes #4761. Closes #4763. Reproductions courtesy of #4762 (@erdtsieck) — those two repro tests are included here and now pass. (#4751 from that same repro PR needs a JasperFx-side change and is being handled separately.)

#4761: WaitForNonStaleData never completes for a multi-tenant shard

Under MultiTenantedWithShardedDatabases + UseTenantPartitionedEvents, when two tenants share a shard, each has its own mt_events_sequence_<suffix> (overlapping seq_ids). WaitForNonStaleDataAsync checked projections.All(x => x.Sequence >= initial.EventSequenceNumber) where initial is the global max seq_id. A tenant with fewer events legitimately tops out below that max (e.g. HighWaterMark:tenant_y = 2 while global = 3), so the check could never pass — the wait timed out ("...reaching the initial sequence of 3") even though both tenants' data was fully projected.
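To make the failure concrete, here is a small Python model of the pre-fix check. It is not Marten's C# code. It assumes progression rows can be treated as a name-to-sequence map. The `Orders:All:<tenant>` shard names and the `global_check` helper are hypothetical; only the `HighWaterMark:<tenant>` naming comes from the description above.

```python
# Hypothetical model of progression rows under UseTenantPartitionedEvents.
# Each tenant has its own event sequence, so seq_ids overlap across tenants.
progress = {
    "HighWaterMark:tenant_x": 3,
    "HighWaterMark:tenant_y": 2,
    "Orders:All:tenant_x": 3,  # projection fully caught up for tenant_x
    "Orders:All:tenant_y": 2,  # projection fully caught up for tenant_y
}


def global_check(rows: dict[str, int]) -> bool:
    """Pre-fix shape: every row must reach the store-wide max seq_id."""
    initial = max(seq for name, seq in rows.items() if name.startswith("HighWaterMark:"))
    return all(seq >= initial for seq in rows.values())


# tenant_y legitimately tops out at 2 while the global max is 3,
# so this can never become True, even though both tenants are fully projected.
print(global_check(progress))  # False
```

With a single tenant the same check passes, which is why the bug only shows up when two tenants share a shard.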

Fix: WaitForNonStaleDataAsync is now per-tenant aware under partitioning — it requires each registered projection shard to have caught its own tenant up to that tenant's HighWaterMark:<tenant> mark (and guards against a premature pass before the daemon has done any work). The non-partitioned path is byte-for-byte unchanged.
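A minimal Python sketch of the per-tenant rule, using the same name-to-sequence model of progression rows. The `per_tenant_check` and `tenant_of` helpers and the `Orders:All:<tenant>` shard names are hypothetical. Treating the "no work done yet" guard as "no high-water marks recorded" is also an assumption. The real implementation is C# inside Marten and may differ.

```python
def tenant_of(row_name: str) -> str:
    # Hypothetical naming: the tenant id is the last ':'-separated segment.
    return row_name.rsplit(":", 1)[1]


def per_tenant_check(rows: dict[str, int], projection_shards: list[str]) -> bool:
    """Each registered projection shard must reach ITS OWN tenant's high-water mark."""
    high_water = {
        tenant_of(name): seq
        for name, seq in rows.items()
        if name.startswith("HighWaterMark:")
    }
    # Assumed guard against a premature pass: with no high-water marks
    # recorded, the daemon has done no work, so nothing is reported non-stale.
    if not high_water:
        return False
    for tenant, mark in high_water.items():
        for shard in projection_shards:
            # A missing progression row is treated as sequence 0 (not caught up).
            if rows.get(f"{shard}:{tenant}", 0) < mark:
                return False
    return True


progress = {
    "HighWaterMark:tenant_x": 3,
    "HighWaterMark:tenant_y": 2,
    "Orders:All:tenant_x": 3,
    "Orders:All:tenant_y": 2,
}
print(per_tenant_check(progress, ["Orders:All"]))  # True
print(per_tenant_check({}, ["Orders:All"]))        # False: no work done yet
```

Comparing each tenant against its own mark is what makes overlapping per-tenant sequences safe: tenant_y at 2 is fully caught up, no matter what tenant_x's sequence is.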

#4763 — sharded reassignment leaves the source shard's tenant_count inflated

ShardedTenancy.AssignTenantAsync recomputed tenant_count only for the target shard. Re-assigning a tenant A→B never decremented A, so A stayed inflated forever and UseSmallestDatabaseAssignment kept mis-ranking it as fuller.
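The bookkeeping bug can be reproduced with a hypothetical in-memory model of the shard assignments. `assignments`, `tenant_count`, `recount`, and `assign_tenant_buggy` are illustrative names, not Marten APIs.

```python
from collections import Counter

# Hypothetical model: tenant -> shard assignment, plus a cached count per shard.
assignments = {"t1": "A", "t2": "A", "t3": "B"}
tenant_count = dict(Counter(assignments.values()))  # {'A': 2, 'B': 1}


def recount(shard: str) -> None:
    # Recompute one shard's cached tenant_count from the assignments.
    tenant_count[shard] = sum(1 for s in assignments.values() if s == shard)


def assign_tenant_buggy(tenant: str, shard: str) -> None:
    assignments[tenant] = shard  # upsert the assignment
    recount(shard)               # pre-fix: only the TARGET shard is recomputed


assign_tenant_buggy("t1", "B")
print(tenant_count)  # {'A': 2, 'B': 2}, but A really holds only 1 tenant now
```

Any policy that ranks shards by these cached counts, such as a smallest-database assignment, now sees A as fuller than it is.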

Fix: capture the tenant's prior shard before the upsert and recompute both the source and target shard counts.
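Here is the fixed flow under the same hypothetical in-memory model. The names are illustrative, not Marten's API. It captures the prior shard before the upsert and then recounts both shards. `smallest_shard` is a rough stand-in for a smallest-database policy, assumed here to pick the shard with the fewest tenants.

```python
from collections import Counter

# Hypothetical model of the sharded tenancy bookkeeping.
assignments = {"t1": "A", "t2": "A", "t3": "B"}
tenant_count = dict(Counter(assignments.values()))  # {'A': 2, 'B': 1}


def recount(shard: str) -> None:
    tenant_count[shard] = sum(1 for s in assignments.values() if s == shard)


def assign_tenant(tenant: str, shard: str) -> None:
    previous = assignments.get(tenant)  # capture the prior shard BEFORE the upsert
    assignments[tenant] = shard         # upsert the assignment
    recount(shard)                      # target shard
    if previous is not None and previous != shard:
        recount(previous)               # source shard, so it no longer stays inflated


def smallest_shard() -> str:
    # Assumed policy: the shard with the fewest tenants wins.
    return min(tenant_count, key=tenant_count.__getitem__)


assign_tenant("t1", "B")
print(tenant_count)      # {'A': 1, 'B': 2}
print(smallest_shard())  # A
```

A first-time assignment has no prior shard, so only the target is recounted. Re-assigning a tenant to the shard it already lives on recounts that shard once.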

Tests

🤖 Generated with Claude Code

…ount

Two independent multi-tenant + tenant-partitioned bugs (repros from #4762, thanks
@erdtsieck):

#4761 — WaitForNonStaleData never reports non-stale when multiple tenants share a
shard. Under UseTenantPartitionedEvents each tenant has its own mt_events_sequence,
so seq_ids overlap and a single store-global "initial" (the max across tenants) is
not a valid bar for every per-tenant progression row — a tenant with fewer events
legitimately tops out below the global max, so the "all rows >= initial" check could
never pass and the wait timed out even though every tenant's data was fully
projected. WaitForNonStaleDataAsync is now per-tenant aware under partitioning: it
requires each registered projection shard to have caught its OWN tenant up to that
tenant's HighWaterMark:<tenant> mark, and guards against a premature pass before any
work is done. The non-partitioned path is unchanged.

#4763 — ShardedTenancy.AssignTenantAsync only recomputed the TARGET shard's
tenant_count on a re-assignment, leaving the SOURCE shard inflated forever (which
made UseSmallestDatabaseAssignment mis-rank it). It now captures the tenant's prior
shard before the upsert and recomputes both the source and target counts.

Full TenantPartitionedEventsTests suite green (191 tests); non-partitioned
WaitForNonStaleData consumers unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller merged commit 504bda0 into master Jun 18, 2026
9 checks passed
@jeremydmiller jeremydmiller deleted the fix-4761-4763-sharded-tenant-partitioned branch June 18, 2026 12:50
jeremydmiller added a commit that referenced this pull request Jun 18, 2026
…ant-partitioned (#4767)

Locks in the fix from #4764. Under MultiTenantedWithShardedDatabases +
UseTenantPartitionedEvents, an async CompositeProjection was not driven to a
non-stale state by the normal daemon catch-up path (StartAllAsync +
WaitForNonStaleData) — the member read models stayed empty after catch-up
returned.

Root cause was shared with #4761: WaitForNonStaleData's "caught up" check could
be satisfied by the HighWaterMark rows alone, without requiring the composite's
per-tenant projection-progression row. #4764 fixed that. This test pins the
behavior: a sharded, tenant-partitioned composite must materialize BOTH stages
via the async daemon (not just via RebuildProjectionAsync).

Passes on JasperFx 2.13.0 (the merged #4764 fix is sufficient; the defensive
JasperFx ExecutionStage guard from #457 is a separate invariant
assertion that rides the next routine JasperFx bump).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
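The root cause described in the #4767 commit message can be illustrated with a Python sketch. The row names are assumed and this is not Marten's code. A check that only compares progression rows that already exist passes vacuously when just the HighWaterMark rows are present. A check that demands a row for every registered projection shard per tenant correctly reports the composite as stale. `Composite:All` and both check functions are hypothetical.

```python
# Hypothetical rows after catch-up "returned": only the high-water marks
# exist; the composite never wrote its per-tenant progression row.
rows = {"HighWaterMark:tenant_x": 5, "HighWaterMark:tenant_y": 4}
registered = ["Composite:All"]  # hypothetical composite shard name


def high_water_marks(rows: dict[str, int]) -> dict[str, int]:
    return {n.split(":", 1)[1]: s for n, s in rows.items() if n.startswith("HighWaterMark:")}


def loose_check(rows: dict[str, int]) -> bool:
    """Compares only rows that exist, so HighWaterMark rows alone pass vacuously."""
    marks = high_water_marks(rows)
    return all(
        seq >= marks.get(name.rsplit(":", 1)[1], 0)
        for name, seq in rows.items()
        if not name.startswith("HighWaterMark:")
    )


def strict_check(rows: dict[str, int], registered: list[str]) -> bool:
    """Requires a progression row for every registered shard, per tenant."""
    marks = high_water_marks(rows)
    return bool(marks) and all(
        f"{shard}:{tenant}" in rows and rows[f"{shard}:{tenant}"] >= mark
        for tenant, mark in marks.items()
        for shard in registered
    )


print(loose_check(rows))               # True  (stale, yet reported caught up)
print(strict_check(rows, registered))  # False (composite row is missing)
```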