
#4712 + #4717 — per-tenant partitioning daemon: safe-harbor high-water + per-tenant progression (JasperFx 2.9.4) #4714

Merged
jeremydmiller merged 4 commits into master from fix/4712-safe-harbor-high-water
Jun 10, 2026

Conversation

@jeremydmiller


Fixes #4712. Follow-up to #4705.

Problem

Under UseTenantPartitionedEvents on a sharded conjoined store, composite projection rebuilds hang. The daemon logs a SafeHarborTime of 0001-01-01 (≈ DateTime.MinValue + the 3s stale threshold), the gap-skip becomes a no-op, and the store-global high-water agent loops forever — silently freezing some composite rebuilds.
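The bogus value is plain date arithmetic. As a minimal sketch (Python standing in for the .NET types; `unset_timestamp` and `stale_threshold` are illustrative names, not Marten identifiers), an unset timestamp plus the 3s threshold lands three seconds into year 1:

```python
from datetime import datetime, timedelta, timezone

# Stand-in for default(DateTimeOffset): 0001-01-01T00:00:00+00:00.
unset_timestamp = datetime.min.replace(tzinfo=timezone.utc)

# The 3s stale threshold quoted in the problem description.
stale_threshold = timedelta(seconds=3)

# Adding the threshold to an unset timestamp stays in year 1,
# far earlier than any real event timestamp.
safe_harbor_time = unset_timestamp + stale_threshold
print(safe_harbor_time.isoformat())
```

This only reproduces the arithmetic behind the logged value; how the daemon consumes SafeHarborTime during the gap-skip is not modeled here.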

Root cause (the #4705 bug class, one query that was missed)

The store-global HighWaterStatisticsDetector reads select last_value from mt_events_sequence for HighestSequence. Under per-tenant partitioning the store-global mt_events_sequence is never advanced (each tenant draws seq_ids from its own mt_events_sequence_{suffix}), so HighestSequence reads 1 while the true mark is far higher → the agent treats the store as perpetually Stale. And because no store-global HighWaterMark progression row is read, HighWaterStatistics.Timestamp is left at default(DateTimeOffset) = 0001-01-01 — the source of the bogus SafeHarborTime.

Fix (mirrors the #4705 FetchHighestEventSequenceNumber change)

  • Read coalesce(max(seq_id), 0) from mt_events when UseTenantPartitionedEvents.
  • Stamp Timestamp from that first result (which always returns a row), so it can never be left at 0001-01-01 when the progression row is absent.
  • Non-partitioned stores keep reading last_value from mt_events_sequence.
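The before/after behavior described above can be modeled in a few lines. This is a toy sketch, not Marten code: `ToyEventStore`, `detect_before_fix`, `detect_after_fix`, and the `db_now` parameter are hypothetical names, and taking the fallback timestamp from a database clock value is an assumption about what "stamp Timestamp from that first result" means.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Stand-in for default(DateTimeOffset), i.e. 0001-01-01.
UNSET = datetime.min.replace(tzinfo=timezone.utc)


@dataclass
class ToyEventStore:
    """Toy model of the sequences named in this PR; not Marten code."""
    partitioned: bool
    global_nextval_calls: int = 0                          # draws from mt_events_sequence
    tenant_sequences: dict = field(default_factory=dict)   # mt_events_sequence_{suffix}
    seq_ids: list = field(default_factory=list)            # seq_id column of mt_events
    progression_timestamp: Optional[datetime] = None       # store-global HighWaterMark row

    def append(self, tenant: str, count: int) -> None:
        for _ in range(count):
            if self.partitioned:
                # Each tenant draws from its own sequence, starting at 1.
                nxt = self.tenant_sequences.get(tenant, 0) + 1
                self.tenant_sequences[tenant] = nxt
            else:
                self.global_nextval_calls += 1
                nxt = self.global_nextval_calls
            self.seq_ids.append(nxt)

    def global_last_value(self) -> int:
        # A PostgreSQL sequence that was never drawn from reports last_value = 1.
        return max(1, self.global_nextval_calls)


def detect_before_fix(store: ToyEventStore):
    # Always reads the store-global sequence; timestamp stays unset without a row.
    return store.global_last_value(), store.progression_timestamp or UNSET


def detect_after_fix(store: ToyEventStore, db_now: datetime):
    if store.partitioned:
        highest = max(store.seq_ids, default=0)  # coalesce(max(seq_id), 0)
    else:
        highest = store.global_last_value()
    # Assumption: the first query also yields a usable timestamp (db_now).
    return highest, store.progression_timestamp or db_now
```

With a partitioned store holding 40 events for one tenant, `detect_before_fix` returns `(1, UNSET)` while `detect_after_fix` returns 40 and the supplied timestamp, matching the numbers quoted in the Test section below.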

Test

Bug_4712_safe_harbor_high_water drives HighWaterDetector.Detect directly under per-tenant partitioning. Before the fix: HighestSequence=1, Timestamp=0001-01-01 (with CurrentMark=40). After: HighestSequence=40 and a real timestamp. Deterministic single-DB single-tenant repro (the detector-level seam; the sharded multi-composite hang is the downstream symptom). Non-partitioned high-water detection tests unchanged (10/10).

🤖 Generated with Claude Code

…nant partitioning

Follow-up to #4705. The store-global HighWaterStatisticsDetector read
`select last_value from mt_events_sequence` for HighestSequence. Under
UseTenantPartitionedEvents the store-global sequence is never advanced (each
tenant draws seq_ids from its own mt_events_sequence_{suffix}), so HighestSequence
read 1 while the true mark was far higher. The store-global high-water agent then
treated the store as perpetually Stale and, because no store-global HighWaterMark
progression row was read, left HighWaterStatistics.Timestamp at
default(DateTimeOffset) = 0001-01-01 — which the daemon turned into a bogus
SafeHarborTime (0001-01-01 + 3s threshold), making the gap-skip a no-op and
hanging composite projection rebuilds.

Fix (mirrors the #4705 FetchHighestEventSequenceNumber change):
- read coalesce(max(seq_id),0) from mt_events when UseTenantPartitionedEvents;
- stamp Timestamp from that first result (which always returns a row) so it can
  never be left at default/0001-01-01 when the progression row is absent.
Non-partitioned stores keep reading last_value from mt_events_sequence.

Regression test Bug_4712_safe_harbor_high_water drives HighWaterDetector.Detect
directly under per-tenant partitioning: before the fix HighestSequence=1 and
Timestamp=0001-01-01 (with CurrentMark=40); after, HighestSequence=40 and a real
timestamp. Single-DB single-tenant — per-tenant partitioning is the only factor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jeremydmiller and others added 2 commits June 10, 2026 10:47
…l progression (skipped)

Follow-up to #4712/#4714. Demonstrates the #4717 requirement: under
UseTenantPartitionedEvents the async daemon must persist PER-TENANT progression
records (per tenant per projection) plus a per-tenant high-water, because each
tenant's events use its own mt_events_sequence_{suffix} starting at 1 — a single
store-global <Projection>:All shard cannot track multiple tenants.

Bug_4717_per_tenant_progression runs two tenants of DIFFERENT heights (20 and 12
events) on one per-tenant-partitioned database, with BOTH a composite projection
and a standalone async projection running continuously, then asserts a per-tenant
progression row per (projection, tenant) at that tenant's own height plus a
per-tenant HighWaterMark row.

Proven RED on master/JasperFx 2.9.2 — mt_event_progression holds only:
    20 | bug4717-composite:All / Bug4717Count:All / Bug4717Standalone:All / Bug4717Trip:All
    20 | HighWaterMark
(no <Projection>:tenant rows; tenant B's 12 events untracked). The continuous
daemon starts one store-global agent per projection (JasperFxAsyncDaemon.StartAllAsync)
and never fans out per tenant; the per-tenant high-water machinery is read/route-only.

Skipped pending the JasperFx per-tenant continuous-progression fix (separate PR);
un-skip + bump once it ships.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…high-water persistence

JasperFx 2.9.4 makes the async daemon fan out a continuous agent per (shard, tenant)
under UseTenantPartitionedEvents, so each tenant's projection advances against its own
high-water and persists its own <Projection>:All:<tenant> progression row (marten#4717).
2.9.4 also carries the projections-rebuild subscription fix (#438).

Marten side:
- Bump JasperFx* 2.9.2 -> 2.9.4.
- HighWaterDetector.MarkHighWaterForTenantAsync: implement the new IHighWaterDetector hook
  to persist a durable per-tenant HighWaterMark:<tenant> row (keyed on
  HighWaterShardIdentity.PerTenant) — invoked by JasperFx's TenantedHighWaterCoordinator.
- Un-skip Bug_4717_per_tenant_progression: now green — two tenants of different heights,
  composite + standalone, each get per-tenant projection rows AND per-tenant high-water rows
  at their own height.
- sharded_daemon_per_shard_progression: POLL for the per-tenant rows/docs instead of asserting
  immediately after WaitForNonStaleData. That helper's caught-up check counts store-global
  shards, so it can return before a per-tenant agent commits on a partitioned store — making
  the immediate assert racy. (Per-tenant progression itself is correct; only the test wait was
  racy. Hardening WaitForNonStaleData for partitioned stores is a separate follow-up.)

Full TenantPartitionedEventsTests 186/186 on clean DBs; sharded tests stable across repeated runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jeremydmiller changed the title from "Fix #4712 — composite rebuilds hang under per-tenant partitioning (SafeHarborTime 0001-01-01)" to "#4712 + #4717 — per-tenant partitioning daemon: safe-harbor high-water + per-tenant progression (JasperFx 2.9.4)" on Jun 10, 2026
@jeremydmiller


Follow-up: #4717 per-tenant progression (consume JasperFx 2.9.4)

Building on the #4712 fix, this PR now also adopts the per-tenant continuous-progression work shipped in JasperFx.Events 2.9.4 and resolves #4717.

  • Bumped JasperFx* 2.9.2 → 2.9.4 (per-tenant continuous agents fan out one per (shard, tenant); also carries the projections-rebuild subscription fix, #438).
  • HighWaterDetector.MarkHighWaterForTenantAsync — Marten implements the new IHighWaterDetector hook to persist a durable HighWaterMark:<tenant> row.
  • Un-skipped Bug_4717_per_tenant_progression — now green: two tenants of different heights, composite + standalone, each get per-tenant <Projection>:All:<tenant> rows and per-tenant high-water rows at their own height.
  • sharded_daemon_per_shard_progression (×2) now poll for the per-tenant rows/docs. WaitForNonStaleData's caught-up check counts store-global shards, so it can return before a per-tenant agent commits on a partitioned store — making the immediate assert racy. Per-tenant progression itself is correct; only the test wait was racy.
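The row shapes named in the list above can be sketched as a small model. The function names and the tenant ids `tenant_a` / `tenant_b` are hypothetical; the row-name patterns (`<Projection>:All:<tenant>`, `HighWaterMark:<tenant>`) and the heights 20 and 12 come from this PR. Using the tallest tenant's height for the store-global rows is an assumption chosen to match the RED output quoted in the commit message.

```python
def store_global_rows(projections, tenant_heights):
    """Rows seen before the fix: one store-global row per projection."""
    # Assumption: the single global mark equals the tallest tenant's height.
    mark = max(tenant_heights.values())
    rows = {f"{name}:All": mark for name in projections}
    rows["HighWaterMark"] = mark
    return rows


def per_tenant_rows(projections, tenant_heights):
    """Rows expected after the fix: one per (projection, tenant) plus high-water."""
    rows = {}
    for tenant, height in tenant_heights.items():
        for name in projections:
            rows[f"{name}:All:{tenant}"] = height
        rows[f"HighWaterMark:{tenant}"] = height
    return rows


heights = {"tenant_a": 20, "tenant_b": 12}
for row_name, seq in per_tenant_rows(["Bug4717Standalone"], heights).items():
    print(seq, "|", row_name)
```

In the store-global shape, the second tenant's 12 events have no row of their own, which is the gap #4717 describes.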

Verification: full TenantPartitionedEventsTests 186/186 on clean DBs; the sharded tests are stable across repeated runs (including under within-run shard accumulation).

Investigation note: the earlier "sharded regression" / "1-event-per-tenant commit bug" turned out to be a flaky WaitForNonStaleData + dirty-DB artifact — not a product bug. Per-tenant progression is production-correct, so no JasperFx 2.9.5 was needed. Hardening WaitForNonStaleData to be per-tenant-aware (so the public helper is reliable on partitioned stores) is a worthwhile separate follow-up.
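For illustration, the "poll instead of assert immediately" pattern used in the test fix looks roughly like this. It is a generic sketch in Python, not the helper used in the Marten test suite; `poll_until` and its parameters are hypothetical names.

```python
import time


def poll_until(condition, timeout=5.0, interval=0.05):
    """Re-check `condition` until it is true or `timeout` seconds elapse.

    Returns True if the condition became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then call something like `assert poll_until(lambda: per_tenant_row_exists())` (where `per_tenant_row_exists` is a placeholder for the actual row query) rather than asserting right after the wait helper returns.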

…ssion)

2.9.4 carried a source-generator regression (#432) that dropped the
generated dispatcher for self-aggregating projections, so EventSourcingTests
(SingleStreamProjection<SimpleAggregate, Guid>) failed CI with
"No source-generated dispatcher found". Fixed in JasperFx 2.9.5 (#439):
the Pipeline-1 dedupe now uses a set distinct from the cross-pipeline `seen`.

EventSourcingTests Aggregation green against published 2.9.5; per-tenant suite
186/186 unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Composite projection rebuilds hang under per-tenant event partitioning (9.7.1) — SafeHarborTime computed as DateTime.MinValue (follow-up to #4705)
