#4666 Phase A: Marten.ScaleTesting CLI + event seeder (#4672)
Merged
New src/Marten.ScaleTesting/ project: a JasperFx.CommandLine console runner that drives 20M+ event seeds for the async-daemon scale harness. Internal dev tool only: not packed, not wired into CI, not a benchmark. This is the regression bed for the #4667 race fixes (now landed) and the optimization-engine basis for the upcoming daemon-thread-safety work.

What ships in Phase A:

* Scaffold + CLI: Microsoft.Extensions.Hosting + JasperFx.CommandLine, modelled on src/EventAppenderPerfTester/. Spectre.Console comes in transitively via JasperFx 2.8 (which requires Spectre 0.55+) so we don't pin it directly and trip the CPM downgrade gate.
* Lifted Telehealth domain (Domain/): Appointment / Board / ProviderShift aggregates + their events, plus Patient / Provider / RoutingReason / Specialty reference data. Copy-paste from src/DaemonTests/TeleHealth/; the harness owns its own fork so we can extend without disturbing test fixtures (per the issue's "lift, don't share" directive).
* Event seeder (Seeding/):
  * per-stream generators with realistic Telehealth event shapes (4-8 events/Appointment with 5% early cancel, 6-14 events/Board with optional alert + finish, 4-10 events/ProviderShift with N cycles)
  * weighted-random k-way merge across stream types (70/25/5 appointment/shift/board)
  * bounded Channel<EventBatch> producer with N-way writer fan-out
  * deterministic-via-seed (`(rootSeed, tenantIdx, streamKind, streamIdx)`-derived RNG per stream)
* Cross-stream interleaving happens at the events-table level via the producer's draw order across tenants and stream types. Each batch is one full stream, collapsing earlier per-stream chunking that raced on the per-stream version sequence under Quick append. One stream per batch = full writer fan-out with no contention.
* `seed` subcommand:

  marten-scaletest seed [--wipe] --tenants N --events-per-tenant M --writers W --seed S --buckets B

  Defaults match the issue: 50 tenants × 400K events × 8 hash buckets × 8 writers = 20M events. Idempotent: queries mt_events grouped by tenant_id and skips tenants already meeting the target.
* Reference data seeder per tenant (200 patients + 50 providers + 4 routing reasons + 8 specialties) so the Phase B enrichment projections have data to look up.
* Conjoined multi-tenancy + AllDocumentsAreMultiTenantedWithPartitioning (ByHash buckets) bootstrap that mirrors multi_stage_projections.cs:246-254, the schema shape Phase B will rebuild against.

Phase A intentionally does NOT register any Snapshot<T>: registering would require running the JasperFx.Events.SourceGenerator on this project to emit each aggregate's partial-class dispatcher, which is dead weight for a pure-seeding run. StartStream<T> on the seeder just tags the stream with the aggregate type name; Phase B's CompositeReplayExecutor reads the tag back when picking the right projection.

Verified locally (Postgres on docker-compose's localhost:5432):

* Build clean (561 pre-existing warnings carried in from Marten, 0 errors).
* Seed 2 tenants × 5,000 events × 4 writers: 11,292 events in 0.9s (~12,352 evts/sec on the dev box, well above the 5K/sec/writer EventAppenderPerfTester baseline).
* Idempotency: second run with same args + no `--wipe` correctly skipped with `Seed skipped — all 2 tenants already have ≥ 5000 events.`
* The Marten.slnx and Directory.Packages.props edits: add the new project + add Microsoft.Extensions.Hosting 10.0.0 to CPM (Hosting.Abstractions was already there).

Phases B / C / D from the issue are deferred:

* Phase B: composite projection topology (4 stage-1 + 2 stage-2 + 2 NEW stage-3) + `rebuild` subcommand on the single-pass CompositeReplayExecutor path.
* Phase C: `validate` (single-shard baseline diff) + `stress` chain + JSON metrics sink.
* Phase D: use the harness to drive the daemon-thread-safety synthesis fixes per the #4667 follow-up plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
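The idempotency rule for `seed` (roll up `mt_events` by `tenant_id`, skip tenants already at the target) can be illustrated with a small language-neutral sketch. This is not the harness's C# code: the function names, the exact SQL text, and the returned strings are assumptions made for illustration; only the table name, the column name, and the default numbers come from the commit message.

```python
# Illustrative only: hypothetical helper names, not the harness's implementation.

# Assumed shape of the per-tenant rollup; the real query text may differ
# (for example, it may be schema-qualified).
ROLLUP_SQL = "SELECT tenant_id, count(*) FROM mt_events GROUP BY tenant_id"

# Defaults quoted in the commit message: 50 tenants x 400K events each.
DEFAULT_TENANTS = 50
DEFAULT_EVENTS_PER_TENANT = 400_000
DEFAULT_TOTAL = DEFAULT_TENANTS * DEFAULT_EVENTS_PER_TENANT


def tenants_needing_seed(existing_counts, tenant_ids, events_per_tenant):
    """Return the tenants whose current event count is below the target.

    existing_counts maps tenant_id -> row count, as the rollup would return;
    tenants with no rows yet are simply absent from the mapping.
    """
    return [t for t in tenant_ids if existing_counts.get(t, 0) < events_per_tenant]


def plan(existing_counts, tenant_ids, events_per_tenant):
    """Summarise what a re-run would do: skip everything or seed the shortfall."""
    todo = tenants_needing_seed(existing_counts, tenant_ids, events_per_tenant)
    if not todo:
        return f"skip: all {len(tenant_ids)} tenants already have >= {events_per_tenant} events"
    return f"seed {len(todo)} of {len(tenant_ids)} tenants"
```

Under this reading, a tenant that overshot its target on the first run is still skipped on the second, because the comparison is "at least the target" rather than "exactly the target".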
This was referenced Jun 8, 2026
Phase A of #4666: internal dev-tool harness for driving 20M+ event seeds against the async daemon. The regression bed for the #4667 race fixes (now landed) and the optimization-engine basis for the upcoming daemon-thread-safety work.
Not packed, not in CI, not a benchmark. Pure tooling.
What ships in Phase A
* `src/Marten.ScaleTesting/` project: JasperFx.CommandLine console runner, modelled on `src/EventAppenderPerfTester/`.
* Lifted Telehealth domain (`Domain/`): copied from `src/DaemonTests/TeleHealth/`; the harness owns its own fork per the issue's "lift, don't share" directive.
* Event seeder (`Seeding/`): bounded `Channel<EventBatch>` + N-way writer fan-out. Deterministic via `(rootSeed, tenantIdx, streamKind, streamIdx)`-derived RNG per stream.
* Conjoined multi-tenancy + `AllDocumentsAreMultiTenantedWithPartitioning(ByHash)`, mirroring `multi_stage_projections.cs:246-254`. The schema shape Phase B will rebuild against.
* `seed` subcommand: `marten-scaletest seed [--wipe] --tenants N --events-per-tenant M --writers W --seed S --buckets B`. Defaults: 50 × 400K × 8 buckets × 8 writers = 20M. Idempotent.

Design note: one batch per stream
The first cut chunked each stream into multiple smaller batches to interleave mid-stream events at the per-batch level. Under Quick append mode this raced on the per-stream version sequence (`pk_mt_events_stream_and_version` unique-constraint violation when two writers picked up consecutive batches of the same stream).

Final shape: each batch is one complete stream. Cross-stream interleaving at the `mt_events` table level still happens via the producer's draw order across tenants and stream types. Writer fan-out is fully parallel because no two batches ever touch the same stream.
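To make the final shape concrete, here is a minimal Python sketch of a producer that draws tenants and stream types in a seeded order and emits one complete stream per batch. It is an illustration under stated assumptions, not the harness's C# seeder: every name (`EventBatch` fields, `stream_rng`, `produce`), the SHA-256 seed derivation, and the placeholder event payloads are hypothetical. Only the 70/25/5 weights, the per-stream event-count ranges, and the `(rootSeed, tenantIdx, streamKind, streamIdx)` key come from the description above.

```python
import hashlib
import random
from dataclasses import dataclass

# Weights and event-count ranges as quoted in the PR; everything else is hypothetical.
STREAM_WEIGHTS = {"appointment": 70, "shift": 25, "board": 5}
EVENT_RANGES = {"appointment": (4, 8), "shift": (4, 10), "board": (6, 14)}


@dataclass(frozen=True)
class EventBatch:
    tenant_idx: int
    stream_kind: str
    stream_idx: int
    events: tuple  # all events of exactly one stream, in version order


def stream_rng(root_seed, tenant_idx, stream_kind, stream_idx):
    """Derive an independent RNG per stream from the four-part key (assumed scheme)."""
    key = f"{root_seed}:{tenant_idx}:{stream_kind}:{stream_idx}".encode()
    return random.Random(int.from_bytes(hashlib.sha256(key).digest()[:8], "big"))


def produce(root_seed, tenants, events_per_tenant):
    """Yield whole-stream batches, interleaved across tenants and stream kinds."""
    draw = random.Random(root_seed)  # controls the draw order only
    remaining = {t: events_per_tenant for t in range(tenants)}
    next_idx = {}  # (tenant, kind) -> next unused stream index
    kinds = list(STREAM_WEIGHTS)
    weights = list(STREAM_WEIGHTS.values())
    while remaining:
        tenant = draw.choice(sorted(remaining))
        kind = draw.choices(kinds, weights=weights)[0]
        idx = next_idx.get((tenant, kind), 0)
        next_idx[(tenant, kind)] = idx + 1
        rng = stream_rng(root_seed, tenant, kind, idx)
        low, high = EVENT_RANGES[kind]
        count = rng.randint(low, high)
        events = tuple(f"{kind}-event-v{v}" for v in range(1, count + 1))
        yield EventBatch(tenant, kind, idx, events)
        remaining[tenant] -= count
        if remaining[tenant] <= 0:  # tenant has met its target; stop drawing it
            del remaining[tenant]
```

Because each `(tenant_idx, stream_kind, stream_idx)` triple is emitted exactly once, any number of writers can take batches in any order without two of them appending to the same stream, which is the property the design note relies on. Interleaving in the events table then comes purely from the order in which batches are drawn and written.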
Phase A intentionally skips snapshot registration
Registering `Snapshot<T>` would require running the `JasperFx.Events.SourceGenerator` on this project to emit each aggregate's partial-class dispatcher, which is dead weight for a pure-seeding run. `StartStream<T>` on the seeder just tags the stream with the aggregate type name; Phase B's `CompositeReplayExecutor` reads the tag back when picking the right projection.
Acceptance criteria
* `seed` runs to completion at the requested event count
* Idempotent on re-run (`mt_events` per-tenant rollup)
* `mt_events` row count matches the requested total

Verification
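Local on net10.0 against docker-compose's Postgres on `localhost:5432`:

* Build: clean (561 pre-existing warnings carried in from Marten, 0 errors).
* Seed 2 tenants × 5,000 events × 4 writers: 11,292 events in 0.9s (~12,352 evts/sec on the dev box).
* Idempotency (second run with the same args, no `--wipe`): `Seed skipped — all 2 tenants already have ≥ 5000 events.`

Deferred (Phases B / C / D from the issue)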
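* Phase B: composite projection topology (4 stage-1 + 2 stage-2 + 2 NEW stage-3) + `rebuild` subcommand on the single-pass `CompositeReplayExecutor` path.
* Phase C: `validate` (single-shard baseline diff) + `stress` chain + JSON metrics sink.
* Phase D: use the harness to drive the daemon-thread-safety synthesis fixes per the #4667 follow-up plan.

🤖 Generated with Claude Code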