
#4665 — bump JasperFx 2.8.0 → 2.8.2 + CatchUpAsync per-tenant repro test #4673

Merged
jeremydmiller merged 2 commits into master from fix/4665-catch-up-tenant-aware-repro on Jun 5, 2026

@jeremydmiller (Member) commented Jun 5, 2026

Closes #4665.

The JasperFx side (jasperfx#418) already shipped as JasperFx.Events 2.8.2. This PR:

  1. Bumps the four CPM-pinned JasperFx packages (JasperFx, JasperFx.Events,
    JasperFx.SourceGenerator, JasperFx.Events.SourceGenerator) from
    2.8.0 to 2.8.2. The bump comment now records both 2.8.1
    (jasperfx#416, the empty-page CalculateCeiling guard for marten#4663)
    and 2.8.2 (this fix).
  2. Removes the Skip attribute on the Bug_4665 reproduction test. The
    test now runs end-to-end and passes — that's the actual
    verification-of-fix gate.

Fix shape (lives in JasperFx 2.8.2)

JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) now dispatches on
_tenantHighWater != null && Database is ICrossTenantRebuildSource. When
that holds, it fans out per tenant: discovers every tenant via
ICrossTenantRebuildSource.FindRebuildTenantsAsync, activates them in
_tenantHighWater.PolledTenants, drives one vectorized poll per shard,
and catches up a tenant-scoped agent per (shard, tenant) pair to that
tenant's CeilingFor(tenantId). Mirrors the existing
rebuildProjectionForTenant ceiling-lookup pattern. Single-tenant /
non-partitioned stores keep the byte-for-byte global path.
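The dispatch rule above reduces to one decision plus a fan-out. The sketch below is a language-neutral model in Python, not the JasperFx C# implementation: `Database`, `catch_up_targets`, and the `tenant_ceilings` map are hypothetical stand-ins for the store database, the ICrossTenantRebuildSource contract, and CeilingFor(tenantId). It only computes which (shard, tenant) pairs get caught up and to which sequence number; it does not run projections.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Database:
    """Hypothetical stand-in for the store's database in this model."""
    global_high_water: int
    # Only partitioned stores carry per-tenant ceilings; in this model that
    # plays the role of implementing ICrossTenantRebuildSource.
    tenant_ceilings: Optional[Dict[str, int]] = None

    def is_cross_tenant_rebuild_source(self) -> bool:
        return self.tenant_ceilings is not None

    def find_rebuild_tenants(self) -> List[str]:
        # Models FindRebuildTenantsAsync: every tenant the store knows about.
        return sorted(self.tenant_ceilings or {})


def catch_up_targets(
    shards: List[str], database: Database, has_tenant_high_water: bool
) -> Dict[Tuple[str, Optional[str]], int]:
    """Map (shard, tenant) to the sequence number it must reach.

    tenant is None on the global path.
    """
    if has_tenant_high_water and database.is_cross_tenant_rebuild_source():
        tenants = database.find_rebuild_tenants()
        # Models the single vectorized poll: read every tenant's ceiling once.
        ceilings = {t: database.tenant_ceilings[t] for t in tenants}
        # One tenant-scoped catch-up per (shard, tenant) pair.
        return {(s, t): ceilings[t] for s in shards for t in tenants}
    # Single-tenant / non-partitioned stores: one global ceiling per shard.
    return {(s, None): database.global_high_water for s in shards}


partitioned = Database(global_high_water=1, tenant_ceilings={"t1": 32, "t2": 30})
print(catch_up_targets(["shard-1"], partitioned, True))
# {('shard-1', 't1'): 32, ('shard-1', 't2'): 30}
```

On the partitioned path each tenant is caught up to its own ceiling, so the unused global sequence value never enters the computation.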

Why the test was marked Skip on the first cut

Before the JasperFx fix, the global gap-detector would walk indefinitely
waiting for sequence values that never appear (the per-tenant
mt_events_sequence_{suffix} values power mt_events.seq_id, and the
unused global mt_events_sequence stays at last_value = 1). The test
hung the catch-up loop past its CancellationToken timeout and would
stall CI. Shipping as Skip was the safe option until the upstream fix
landed.

Verification (local, net9.0)

| Step | Result |
| --- | --- |
| dotnet restore src/Marten.slnx | clean pull of JasperFx.Events 2.8.2 |
| dotnet build src/Marten/Marten.csproj -f net9.0 | clean |
| dotnet test --filter "Bug_4665" | Passed, 996ms (was hanging before) |
| dotnet test src/TenantPartitionedEventsTests -f net9.0 | 177 / 177 ✅ |
| dotnet test src/DaemonTests -f net9.0 | 188 / 188 ✅ |

🤖 Generated with Claude Code

…seTenantPartitionedEvents

Reproduction of #4665. Shipped as a Skip pin until the JasperFx-side fix lands:
locally the test hangs the catch-up loop because the global gap detector waits
indefinitely for sequence values that will never appear in mt_events under
per-tenant partitioning. Leaving it un-skipped would stall CI.

Root cause (JasperFx-side):
JasperFx.Events.Daemon.JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) —
src/JasperFx.Events/Daemon/JasperFxAsyncDaemon.cs:899 — calls
_highWater.CheckNowAsync() (store-global HighWaterAgent) and then
HighWaterMark() (also store-global, sourced from Tracker.HighWaterMark which
the global agent writes). It never consults _tenantHighWater
(TenantedHighWaterCoordinator) even though the same daemon already uses it
for the normally-running poll loop AND for per-tenant rebuilds via
rebuildProjectionForTenant. The bug is purely in the test-automation
catch-up code path.

Why it bites under UseTenantPartitionedEvents:
Each tenant has its own mt_events_sequence_{suffix} that backs
mt_events.seq_id for that tenant's events. The global mt_events_sequence is
created by the schema bootstrap but never advanced (QuickAppendEventFunction
calls nextval on the per-tenant sequence under partitioning). The global
HighWaterDetector reads pg_sequences.last_value off the unused global
sequence, finds it low, and the safe-zone gap walker tries to fill the
"gap" between that low last_value and any committed per-tenant seq_id;
it walks indefinitely because the missing values will never appear.
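A toy model makes the mismatch concrete. The Python sketch below is deliberately simplified: `contiguous_mark` and `toy_catch_up` are hypothetical and do not reproduce Marten's HighWaterDetector or its safe-zone logic. It assumes each per-tenant sequence starts at 1 and that the walker treats a sequence's last_value as the highest value it must see committed, never reporting a mark above it.

```python
from typing import Iterable, Optional


def contiguous_mark(committed: Iterable[int]) -> int:
    """Highest n such that every seq_id 1..n has been committed."""
    seen = set(committed)
    mark = 0
    while mark + 1 in seen:
        mark += 1
    return mark


def toy_catch_up(committed: Iterable[int], sequence_last_value: int) -> Optional[int]:
    """Ceiling the toy walker settles on, or None if it would wait forever.

    Toy rules (assumptions, not the real detector): trust sequence_last_value
    as "highest assigned", never report a mark above it, and wait until every
    value up to it is committed. Nothing new is committed in this model, so a
    shortfall never resolves.
    """
    if contiguous_mark(committed) < sequence_last_value:
        return None  # waiting on seq_ids that will never appear
    return sequence_last_value


# One tenant's events under partitioning: seq_ids 1..32 from its own sequence.
tenant_ids = range(1, 33)
print(toy_catch_up(tenant_ids, 32))      # per-tenant sequence: 32
print(toy_catch_up(tenant_ids, 10_000))  # global sequence after setval: None
print(toy_catch_up(tenant_ids, 1))       # untouched global sequence: 1
```

The last two calls line up with the two symptoms described in this thread: a hang when the global value sits above anything committed (the repro's setval bump), and a catch-up that "succeeds" at a ceiling far below each tenant's real progress.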

Marten side is ready:
HighWaterDetector exposes SupportsTenantPartitioning,
DetectForTenantsAsync, and DetectInSafeZoneForTenantsAsync (#4596 Phase 2).
TenantedHighWaterCoordinator + VectorizedHighWaterMonitor already drive the
running daemon's per-tenant catch-up correctly. The fix is purely on the
JasperFx caller side — CatchUpAsync needs to dispatch on
SupportsTenantPartitioning to the per-tenant coordinator instead of the
global agent.

Reproduction shape:
* DocumentStore (not host) with UseTenantPartitionedEvents + Conjoined +
  QuickWithServerTimestamps — the reporter's environment.
* 5 tenants × 16 rounds × 2 events = 160 events appended in cross-tenant
  interleaved rounds.
* setval on the global mt_events_sequence to 10,000 — simulates the
  "accumulated waste" gap shape that a long-lived shared test database
  builds up (the issue says the bug only fires after enough cross-test
  accumulation; the setval bump reproduces that shape deterministically in
  one test).
* daemon.CatchUpAsync(cts.Token) — the buggy JasperFx method, reached
  store-direct so a parallel live daemon can't mask the bug by advancing
  via the per-tenant path before the buggy catch-up runs.
* Headline assertion: every tenant's LAST appended stream's projection
  doc must reflect the appended TripLeg (Distance == 1.0). Under the bug
  this hangs (the catch-up loop never returns) or, if it did return,
  every tenant's last stream's projection doc would be missing.
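The reproduction's event shape can be tabulated with a quick model. This Python sketch is not the test code: the tenant ids, `append_interleaved`, and the assumption that each per-tenant sequence starts at 1 are illustrative. It shows the numbers the catch-up logic has to reconcile: 5 × 16 × 2 = 160 events in total, a per-tenant ceiling of 32 for every tenant, and a global sequence that setval has pushed to 10,000.

```python
from collections import defaultdict
from typing import Dict, List

TENANTS = [f"tenant{i}" for i in range(1, 6)]  # hypothetical tenant ids
ROUNDS = 16
EVENTS_PER_ROUND = 2  # per tenant, per round


def append_interleaved() -> Dict[str, List[int]]:
    """Return the seq_ids each tenant's events receive in this model.

    Assumption: under partitioning every append calls nextval on the
    tenant's own sequence, so each tenant counts 1, 2, 3, ... independently.
    The global mt_events_sequence is never consulted.
    """
    per_tenant_seq: Dict[str, int] = defaultdict(int)
    seq_ids: Dict[str, List[int]] = defaultdict(list)
    for _ in range(ROUNDS):
        for tenant in TENANTS:  # each round interleaves all tenants
            for _ in range(EVENTS_PER_ROUND):
                per_tenant_seq[tenant] += 1
                seq_ids[tenant].append(per_tenant_seq[tenant])
    return dict(seq_ids)


GLOBAL_LAST_VALUE = 10_000  # models the test's setval on mt_events_sequence

seq_ids = append_interleaved()
print(sum(len(ids) for ids in seq_ids.values()))  # 160
print({tenant: ids[-1] for tenant, ids in seq_ids.items()})  # each tenant: 32
print(max(ids[-1] for ids in seq_ids.values()) < GLOBAL_LAST_VALUE)  # True
```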

When to unskip:
After JasperFx ships JasperFxAsyncDaemon.CatchUpAsync routing through the
per-tenant coordinator and the Marten-side JasperFx.Events version bump
in Directory.Packages.props lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremydmiller added a commit to JasperFx/jasperfx that referenced this pull request Jun 5, 2026
…r-tenant partitioning; bump to 2.8.2

JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) — the test-automation
catch-up path that ForceAllMartenDaemonActivityToCatchUpAsync drives —
used the store-global _highWater path even when _tenantHighWater was
non-null. Under per-tenant event partitioning the store-global
mt_events_sequence is never advanced (per-tenant mt_events_sequence_{suffix}
values power mt_events.seq_id), so _highWater.CheckNowAsync() leaves the
global high-water pinned at the unused sequence's last_value. Driving
catch-up off HighWaterMark() in that mode leaves every catch-up loop
stuck at zero — the helper returned "success" while every async projection
was still behind.

Fix: when _tenantHighWater is non-null AND the database implements
ICrossTenantRebuildSource (the partitioned-store contract — Marten's
MartenDatabase implements it), fan out per tenant. For each base shard:

* Discover every tenant the projection knows about via
  ICrossTenantRebuildSource.FindRebuildTenantsAsync.
* Activate the tenants in _tenantHighWater.PolledTenants and drive one
  vectorized poll to fetch fresh ceilings.
* For each (shard, tenant), build a tenant-scoped agent
  (asyncShard with { Name = asyncShard.Name.ForTenant(tenantId) }) and
  catch it up to that tenant's ceiling.
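The tenant-scoped agent construction relies on a C# record `with` expression: copy the base shard, replacing only its name. A Python analogue using frozen dataclasses and `dataclasses.replace` shows the idea; `ShardName`, `AsyncShard`, `batch_size`, and `tenant_scoped` are hypothetical stand-ins, not JasperFx types, and the real ForTenant naming scheme is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ShardName:
    projection: str
    shard_key: str
    tenant: Optional[str] = None

    def for_tenant(self, tenant_id: str) -> "ShardName":
        # Hypothetical analogue of Name.ForTenant(tenantId).
        return replace(self, tenant=tenant_id)


@dataclass(frozen=True)
class AsyncShard:
    name: ShardName
    batch_size: int = 500  # hypothetical stand-in for settings that carry over


def tenant_scoped(shard: AsyncShard, tenant_id: str) -> AsyncShard:
    # Mirrors C#'s `asyncShard with { Name = asyncShard.Name.ForTenant(tenantId) }`:
    # a copy that differs only in its name; the base shard is not mutated.
    return replace(shard, name=shard.name.for_tenant(tenant_id))


base = AsyncShard(ShardName("Trip", "All"), batch_size=100)
scoped = tenant_scoped(base, "tenant1")
print(scoped.name.tenant, scoped.batch_size, base.name.tenant)  # tenant1 100 None
```

Because the base shard is never mutated, the same base definition can be copied once per tenant within a single catch-up pass.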

Mirrors the rebuildProjectionForTenant ceiling-lookup pattern that already
existed for per-tenant rebuilds. Single-tenant stores and non-partitioned
multi-tenant stores keep the byte-for-byte global path.

Marten side: src/TenantPartitionedEventsTests/Regressions/Bug_4665_catch_up_uses_global_high_water.cs
(JasperFx/marten#4673) is the failing reproduction, shipped as a Skip pin;
when this lands on NuGet and Marten bumps Directory.Packages.props to
2.8.2, the test gets unskipped.

Version bump: 2.8.0 → 2.8.2 (skipping 2.8.1 to avoid clashing with any
local prerelease lines).

Verified locally:
* dotnet build src/JasperFx.Events/JasperFx.Events.csproj -f net9.0 — clean.
* dotnet test src/EventStoreTests -f net9.0 — 72/72 ✅.
* dotnet test src/EventTests -f net9.0 — 351/351 ✅.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reproduction test

JasperFx 2.8.2 (#418) shipped the
JasperFxAsyncDaemon.CatchUpAsync(CancellationToken) per-tenant dispatch: when
_tenantHighWater is non-null AND Database implements ICrossTenantRebuildSource,
the catch-up fans out per tenant and uses each tenant's per-tenant
high-water ceiling instead of the store-global mark that's pinned at zero
under UseTenantPartitionedEvents.

Bumps on the Marten side:

* Directory.Packages.props: JasperFx + JasperFx.Events + JasperFx.SourceGenerator
  + JasperFx.Events.SourceGenerator from 2.8.0 → 2.8.2. The bump comment now
  records both 2.8.1 (#416 empty-page CalculateCeiling guard) and
  2.8.2 (this fix).
* Bug_4665 reproduction test loses its Skip attribute. The test now runs and
  passes end-to-end: the catch-up loop advances every tenant's projection
  past its events, and the headline assertion (every tenant's last appended
  stream materializes with Distance == 1.0) holds.

Verified locally on net9.0:
* dotnet restore src/Marten.slnx — clean pull of JasperFx.Events 2.8.2.
* dotnet build src/Marten/Marten.csproj -f net9.0 — clean.
* Bug_4665 test: Passed, 996ms (was hanging the catch-up loop indefinitely
  before, which is why we shipped it as Skip on the first cut).
* dotnet test src/TenantPartitionedEventsTests -f net9.0 — 177 / 177 ✅.
* dotnet test src/DaemonTests -f net9.0 — 188 / 188 ✅.

Closes #4665 once this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller changed the title #4665 — reproduction pin: CatchUpAsync uses global high-water under UseTenantPartitionedEvents #4665 — bump JasperFx 2.8.0 → 2.8.2 + CatchUpAsync per-tenant repro test Jun 5, 2026
@jeremydmiller jeremydmiller merged commit 4c5c260 into master Jun 5, 2026
8 checks passed
@jeremydmiller jeremydmiller deleted the fix/4665-catch-up-tenant-aware-repro branch June 5, 2026 21:12
