#4665: bump JasperFx 2.8.0 → 2.8.2 + CatchUpAsync per-tenant repro test #4673
Merged
…seTenantPartitionedEvents

Reproduction of #4665. Shipped as a Skip pin until the JasperFx-side fix lands: locally the test hangs the catch-up loop because the global gap detector waits indefinitely for sequence values that will never appear in mt_events under per-tenant partitioning. Leaving it un-skipped would stall CI.

Root cause (JasperFx-side): JasperFx.Events.Daemon.JasperFxAsyncDaemon.CatchUpAsync(CancellationToken), at src/JasperFx.Events/Daemon/JasperFxAsyncDaemon.cs:899, calls _highWater.CheckNowAsync() (the store-global HighWaterAgent) and then HighWaterMark() (also store-global, sourced from Tracker.HighWaterMark, which the global agent writes). It never consults _tenantHighWater (TenantedHighWaterCoordinator), even though the same daemon already uses it for the normally running poll loop and for per-tenant rebuilds via rebuildProjectionForTenant. The bug is purely in the test-automation catch-up code path.

Why it bites under UseTenantPartitionedEvents: each tenant has its own mt_events_sequence_{suffix} that backs mt_events.seq_id for that tenant's events. The global mt_events_sequence is created by the schema bootstrap but never advanced (QuickAppendEventFunction calls nextval on the per-tenant sequence under partitioning). The global HighWaterDetector reads pg_sequences.last_value off the unused global sequence, finds it low, and the safe-zone gap walker tries to fill the "gap" between that low last_value and any committed per-tenant seq_id. It walks indefinitely because the missing values will never appear.

Marten side is ready: HighWaterDetector exposes SupportsTenantPartitioning, DetectForTenantsAsync, and DetectInSafeZoneForTenantsAsync (#4596 Phase 2). TenantedHighWaterCoordinator + VectorizedHighWaterMonitor already drive the running daemon's per-tenant catch-up correctly. The fix is purely on the JasperFx caller side: CatchUpAsync needs to dispatch on SupportsTenantPartitioning to the per-tenant coordinator instead of the global agent.
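The stall described above can be illustrated with a deliberately simplified toy model in Python. It is not the HighWaterDetector algorithm, and walk_to_target is a hypothetical helper. The numbers are borrowed from the reproduction in this commit message: the global sequence is set to 10,000, and each tenant receives 16 rounds × 2 events = 32 events, which assumes each per-tenant sequence starts at 1.

```python
def walk_to_target(committed_seq_ids, target, max_polls=3):
    """Toy gap walker: advance a mark across contiguous committed ids.

    Returns (mark, reached_target). A real detector would sleep and re-query
    between polls; here the committed set never changes, so a target above
    the committed range can never be reached.
    """
    committed = set(committed_seq_ids)
    mark = 0
    for _ in range(max_polls):
        # Move forward only while the next sequence value actually exists.
        while mark + 1 in committed:
            mark += 1
        if mark >= target:
            return mark, True
    return mark, False


# Assumption: per-tenant seq_id values run 1..32, while the unused global
# sequence reports last_value = 10_000 (the setval used by the reproduction).
per_tenant_ids = range(1, 33)
stalled = walk_to_target(per_tenant_ids, target=10_000)
healthy = walk_to_target(per_tenant_ids, target=32)
```

In this model the walker only terminates when its target comes from the same sequence that produced the committed ids, which is the intuition behind routing catch-up through the per-tenant coordinator.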
Reproduction shape:
* DocumentStore (not host) with UseTenantPartitionedEvents + Conjoined + QuickWithServerTimestamps: the reporter's environment.
* 5 tenants × 16 rounds × 2 events = 160 events appended in cross-tenant interleaved rounds.
* setval on the global mt_events_sequence to 10,000: simulates the "accumulated waste" gap shape that a long-lived shared test database builds up (the issue says the bug only fires after enough cross-test accumulation; the setval bump reproduces that shape deterministically in one test).
* daemon.CatchUpAsync(cts.Token): the buggy JasperFx method, reached store-direct so a parallel live daemon can't mask the bug by advancing via the per-tenant path before the buggy catch-up runs.
* Headline assertion: every tenant's LAST appended stream's projection doc must reflect the appended TripLeg (Distance == 1.0). Under the bug this hangs (the catch-up loop never returns) or, if it did return, every tenant's last stream's projection doc would be missing.

When to unskip: after JasperFx ships JasperFxAsyncDaemon.CatchUpAsync routing through the per-tenant coordinator and the Marten-side JasperFx.Events version bump in Directory.Packages.props lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
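As a rough check on the arithmetic of the reproduction shape, the interleaved append pattern can be modelled in a few lines of Python. This is only a sketch: append_interleaved and the tenant names are hypothetical, and it assumes each per-tenant sequence starts at 1. It does not touch a database or any Marten API.

```python
def append_interleaved(tenants, rounds, events_per_round):
    """Return (tenant, seq_id) pairs in append order, one sequence per tenant."""
    next_seq = {tenant: 1 for tenant in tenants}  # assumption: sequences start at 1
    appended = []
    for _ in range(rounds):
        for tenant in tenants:  # cross-tenant interleaving within each round
            for _ in range(events_per_round):
                appended.append((tenant, next_seq[tenant]))
                next_seq[tenant] += 1
    return appended


tenants = [f"tenant{i}" for i in range(1, 6)]  # hypothetical tenant ids
events = append_interleaved(tenants, rounds=16, events_per_round=2)

# Highest seq_id each tenant ever committed.
highest = {t: max(seq for name, seq in events if name == t) for t in tenants}

# The reproduction bumps the unused global sequence to this value via setval.
GLOBAL_LAST_VALUE = 10_000
unfillable_gap = GLOBAL_LAST_VALUE - max(highest.values())
```

Under these assumptions no tenant's sequence gets anywhere near the global last_value, so a catch-up that targets the global sequence is waiting on values that are never written.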
jeremydmiller added a commit to JasperFx/jasperfx that referenced this pull request Jun 5, 2026
…r-tenant partitioning; bump to 2.8.2
JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) — the test-automation
catch-up path that ForceAllMartenDaemonActivityToCatchUpAsync drives —
used the store-global _highWater path even when _tenantHighWater was
non-null. Under per-tenant event partitioning the store-global
mt_events_sequence is never advanced (per-tenant mt_events_sequence_{suffix}
values power mt_events.seq_id), so _highWater.CheckNowAsync() leaves the
global high-water pinned at the unused sequence's last_value. Driving
catch-up off HighWaterMark() in that mode leaves every catch-up loop
stuck at zero — the helper returned "success" while every async projection
was still behind.
Fix: when _tenantHighWater is non-null AND the database implements
ICrossTenantRebuildSource (the partitioned-store contract — Marten's
MartenDatabase implements it), fan out per tenant. For each base shard:
* Discover every tenant the projection knows about via
ICrossTenantRebuildSource.FindRebuildTenantsAsync.
* Activate the tenants in _tenantHighWater.PolledTenants and drive one
vectorized poll to fetch fresh ceilings.
* For each (shard, tenant), build a tenant-scoped agent
(asyncShard with { Name = asyncShard.Name.ForTenant(tenantId) }) and
catch it up to that tenant's ceiling.
Mirrors the rebuildProjectionForTenant ceiling-lookup pattern that already
existed for per-tenant rebuilds. Single-tenant stores and non-partitioned
multi-tenant stores keep the byte-for-byte global path.
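The dispatch described above can be sketched as a toy Python model. Everything here is hypothetical: plan_catch_up, its parameters, and the "@" naming for tenant-scoped agents are illustrative stand-ins, not the JasperFx types or the actual ForTenant naming format. The sketch only shows which ceiling each agent would be driven to.

```python
from typing import Dict, List, Optional


def plan_catch_up(
    shards: List[str],
    global_mark: int,
    tenant_ceilings: Optional[Dict[str, int]] = None,
    is_cross_tenant_source: bool = False,
) -> Dict[str, int]:
    """Return {agent_name: target_sequence} for a catch-up pass.

    tenant_ceilings stands in for the per-tenant coordinator (None means
    no coordinator); is_cross_tenant_source stands in for the check that
    the database implements the partitioned-store contract.
    """
    # Global path: unchanged for single-tenant and non-partitioned stores.
    if tenant_ceilings is None or not is_cross_tenant_source:
        return {shard: global_mark for shard in shards}

    # Per-tenant path: one tenant-scoped agent per (shard, tenant) pair,
    # each caught up to that tenant's own ceiling.
    targets: Dict[str, int] = {}
    for shard in shards:
        for tenant_id, ceiling in tenant_ceilings.items():
            targets[f"{shard}@{tenant_id}"] = ceiling  # hypothetical naming
    return targets


global_plan = plan_catch_up(["Trip:All"], global_mark=0)
tenant_plan = plan_catch_up(
    ["Trip:All"], 0, {"a": 32, "b": 30}, is_cross_tenant_source=True
)
```

The point of the design is that both conditions must hold before the per-tenant branch is taken, so stores that never opted into partitioning see no behavioural change.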
Marten side: src/TenantPartitionedEventsTests/Regressions/Bug_4665_catch_up_uses_global_high_water.cs
(JasperFx/marten#4673) is the failing reproduction shipped as Skip pin;
when this lands on NuGet and Marten bumps Directory.Packages.props to
2.8.2, the test gets unskipped.
Version bump: 2.8.0 → 2.8.2 (skipping 2.8.1 to avoid clashing with any
local prerelease lines).
Verified locally:
* dotnet build src/JasperFx.Events/JasperFx.Events.csproj -f net9.0 — clean.
* dotnet test src/EventStoreTests -f net9.0 — 72/72 ✅.
* dotnet test src/EventTests -f net9.0 — 351/351 ✅.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reproduction test

JasperFx 2.8.2 (#418) shipped the JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) per-tenant dispatch: when _tenantHighWater is non-null AND Database implements ICrossTenantRebuildSource, the catch-up fans out per tenant and uses each tenant's per-tenant high-water ceiling instead of the store-global mark that's pinned at zero under UseTenantPartitionedEvents.

Bumps on the Marten side:
* Directory.Packages.props: JasperFx + JasperFx.Events + JasperFx.SourceGenerator + JasperFx.Events.SourceGenerator from 2.8.0 → 2.8.2. The bump comment now records both 2.8.1 (#416 empty-page CalculateCeiling guard) and 2.8.2 (this fix).
* Bug_4665 reproduction test loses its Skip attribute. The test now runs and passes end-to-end: the catch-up loop advances every tenant's projection past its events, and the headline assertion (every tenant's last appended stream materializes with Distance == 1.0) holds.

Verified locally on net9.0:
* dotnet restore src/Marten.slnx: clean pull of JasperFx.Events 2.8.2.
* dotnet build src/Marten/Marten.csproj -f net9.0: clean.
* Bug_4665 test: Passed, 996ms (was hanging the catch-up loop indefinitely before, which is why we shipped it Skip on the first cut).
* dotnet test src/TenantPartitionedEventsTests -f net9.0: 177 / 177 ✅.
* dotnet test src/DaemonTests -f net9.0: 188 / 188 ✅.

Closes #4665 once this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Jun 8, 2026
Closes #4665.

JasperFx side (jasperfx#418) already shipped as JasperFx.Events 2.8.2. This PR:
* Bumps the JasperFx packages (JasperFx, JasperFx.Events, JasperFx.SourceGenerator, JasperFx.Events.SourceGenerator) from 2.8.0 → 2.8.2. The bump comment now records both 2.8.1 (jasperfx#416 empty-page CalculateCeiling guard for marten#4663) and 2.8.2 (this fix).
* Removes the Skip attribute on the Bug_4665 reproduction test. The test now runs end-to-end and passes; that's the actual verification-of-fix gate.
Fix shape (lives in JasperFx 2.8.2)
JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) now dispatches on _tenantHighWater != null && Database is ICrossTenantRebuildSource. When that holds, it fans out per tenant: discovers every tenant via ICrossTenantRebuildSource.FindRebuildTenantsAsync, activates them in _tenantHighWater.PolledTenants, drives one vectorized poll per shard, and catches up a tenant-scoped agent per (shard, tenant) pair to that tenant's CeilingFor(tenantId). Mirrors the existing rebuildProjectionForTenant ceiling-lookup pattern. Single-tenant / non-partitioned stores keep the byte-for-byte global path.
Why the test was Skip on the first cut
Before the JasperFx fix, the global gap-detector would walk indefinitely waiting for sequence values that never appear (the per-tenant mt_events_sequence_{suffix} values power mt_events.seq_id, and the unused global mt_events_sequence stays at last_value = 1). The test hung the catch-up loop past its CancellationToken timeout and would stall CI. Shipping as Skip was the safe option until the upstream fix landed.
Verification (local, net9.0)
* dotnet restore src/Marten.slnx
* dotnet build src/Marten/Marten.csproj -f net9.0
* dotnet test --filter "Bug_4665"
* dotnet test src/TenantPartitionedEventsTests -f net9.0
* dotnet test src/DaemonTests -f net9.0

🤖 Generated with Claude Code