Add EnableStrictStreamIdentityEnforcement flag for cross-partition stream id uniqueness#4293

Merged
jeremydmiller merged 1 commit into master from strict-stream-identity-enforcement on Apr 26, 2026

@jeremydmiller
Member

Summary

Under UseArchivedStreamPartitioning the mt_streams PK must include is_archived (PostgreSQL requires partition keys in unique constraints). Archiving physically moves a row from mt_streams_default to mt_streams_archived, which leaves (id, FALSE) free for reuse. A fresh StartStream against a previously archived id therefore succeeds silently, diverging from the non-partitioned mode, where the unique constraint still fires.
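
To make the gap concrete, here is a toy in-memory model of the two key shapes. It is only a sketch: the function names and the `invoice-42` id are hypothetical, a Python set stands in for the unique constraint, and none of this is Marten's actual schema or code.

```python
# Toy model of the uniqueness gap described above. Not Marten code.

def unique_key(stream_id: str, is_archived: bool, partitioned: bool) -> tuple:
    """Partitioned mode: the key must include is_archived.
    Non-partitioned mode: the key is the stream id alone."""
    return (stream_id, is_archived) if partitioned else (stream_id,)


def restart_after_archive_succeeds(partitioned: bool) -> bool:
    """Start a stream, archive it, then report whether a second start
    with the same id would pass the uniqueness check."""
    keys = set()  # stands in for the unique constraint on mt_streams
    stream_id = "invoice-42"  # hypothetical user-supplied id

    # StartStream: insert the active row.
    keys.add(unique_key(stream_id, False, partitioned))

    # Archive: the row stops being active and becomes archived.
    keys.remove(unique_key(stream_id, False, partitioned))
    keys.add(unique_key(stream_id, True, partitioned))

    # A second StartStream collides only if its key is already present.
    return unique_key(stream_id, False, partitioned) not in keys


partitioned_allows_reuse = restart_after_archive_succeeds(partitioned=True)
non_partitioned_allows_reuse = restart_after_archive_succeeds(partitioned=False)
```

In this model the partitioned mode allows the reuse because `(id, False)` was freed by the archive, while the non-partitioned mode still sees the id as occupied.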

This PR adds an opt-in StoreOptions.Events.EnableStrictStreamIdentityEnforcement flag (default false) that closes that gap.

When enabled:

  • A sibling non-partitioned mt_streams_identity table is created whose PK is just the stream identity (plus tenant_id under conjoined tenancy).
  • The InsertStream codegen wraps the mt_streams INSERT in a modifying CTE that pipes the identity columns into mt_streams_identity in the same prepared statement, so a duplicate identity raises a unique violation that InsertStreamBase.matches() recognizes and TryTransform translates into the same ExistingStreamIdCollisionException you'd get without partitioning.
  • Archive does not touch mt_streams_identity, so the protection spans the active / archived divide.
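
The behaviour described in the bullets above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the generated InsertStream code: `ToyStreamStore` and `ExistingStreamIdCollision` are hypothetical stand-ins, plain sets stand in for the tables, and checking both constraints before writing anything approximates the single-statement atomicity of the modifying CTE.

```python
# Toy model of strict stream identity enforcement. Not Marten code.

class ExistingStreamIdCollision(Exception):
    """Hypothetical stand-in for Marten's ExistingStreamIdCollisionException."""


class ToyStreamStore:
    def __init__(self, strict: bool):
        self.strict = strict
        self.active = set()    # stands in for mt_streams_default
        self.archived = set()  # stands in for mt_streams_archived
        self.identity = set()  # stands in for mt_streams_identity

    def start_stream(self, stream_id: str) -> None:
        # Check both constraints before writing either table, so the two
        # inserts succeed or fail together, as they would in one statement.
        duplicate_active = stream_id in self.active
        duplicate_identity = self.strict and stream_id in self.identity
        if duplicate_active or duplicate_identity:
            raise ExistingStreamIdCollision(stream_id)
        self.active.add(stream_id)
        if self.strict:
            self.identity.add(stream_id)

    def archive(self, stream_id: str) -> None:
        # The row moves between partitions; the identity table is untouched.
        self.active.remove(stream_id)
        self.archived.add(stream_id)


def reuse_after_archive_rejected(strict: bool) -> bool:
    store = ToyStreamStore(strict)
    store.start_stream("invoice-42")  # hypothetical user-supplied id
    store.archive("invoice-42")
    try:
        store.start_stream("invoice-42")
    except ExistingStreamIdCollision:
        return True
    return False
```

With `strict=True` the second start is rejected because the identity entry survives the archive; with `strict=False` the model reproduces the pre-existing reuse behaviour.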

The flag is primarily aimed at stores whose stream identity is generated outside of Marten (most often user-supplied string keys); Marten-generated Guids almost never need it, so leaving it off remains the right default.

Why a separate table (and not a unique index on mt_streams.id)

PostgreSQL forbids unique constraints on a partitioned table that don't include the partition key, so a literal CREATE UNIQUE INDEX ON mt_streams (id) is rejected under UseArchivedStreamPartitioning. Adding the index per partition doesn't help either, because archiving removes the row from the active partition and so frees the id there. A non-partitioned sibling table is the smallest mechanism that genuinely spans both partitions.

We considered an alternative: "marker rows in mt_streams_default with keep_until + background sweep". That route is viable but larger in scope (an archive function rewrite, every select … from mt_streams reader needing to filter markers out, plus a sweep entry point), and it changes the semantics by allowing id reuse after the TTL. It is left for a possible follow-up; the current flag's name reads as absolute and matches what the non-partitioned mode already guarantees.

Test plan

  • strict_stream_identity_enforcement — 8 facts, all pass:
    • Guid + string × partitioned + non-partitioned start → archive → start throws
    • duplicate-without-archive still throws (sanity)
    • subsequent Append to an existing stream still works
    • flag-off + partitioned reuse still succeeds (documents the pre-existing behavior the flag exists to fix)
  • start_stream_with_id_of_previously_archived_stream — 4 facts, standalone analysis baselines for both partition modes
  • All 33 existing collision + archiving tests still green
  • All 179 end-to-end + quick-append tests still green (codegen change is safe)
  • All 128 EventSourcingTests.Bugs / start_stream / archiving / strict_stream tests green

Documentation

docs/events/archiving.md gets a new Strict Stream Identity After Archive section that walks through the partitioned-PK mechanics, when to use the flag (external string ids), and the explicit trade-off that mt_streams_identity grows monotonically (a "background sweep / TTL" feature is left as a possible follow-up).

🤖 Generated with Claude Code

…ream id uniqueness

Under UseArchivedStreamPartitioning the mt_streams primary key is forced to
include is_archived (PostgreSQL requires partition keys in unique constraints).
Archive physically moves a row from mt_streams_default to mt_streams_archived,
which leaves (id, FALSE) free for reuse and silently allows a fresh
StartStream to land on the identity of a previously archived stream — a
divergence from the non-partitioned mode, where flipping is_archived=TRUE on
the same row keeps the id occupied and the unique constraint still fires.

Add an opt-in flag, EnableStrictStreamIdentityEnforcement (default false),
that closes that gap. When enabled Marten creates a sibling, non-partitioned
mt_streams_identity table whose primary key is just the stream identity
(plus tenant_id under conjoined tenancy). The InsertStream codegen wraps the
mt_streams INSERT in a modifying CTE that pipes the identity columns into
mt_streams_identity in the same prepared statement, so a duplicate identity
raises a unique violation that InsertStreamBase.matches() recognizes and
TryTransform translates into ExistingStreamIdCollisionException — exactly
what the non-partitioned mode already does.

The flag is meant primarily for stores whose stream identity is generated
outside of Marten (most often user-supplied string keys); Marten-generated
Guids almost never need it, so leaving it off remains the right default.

Tests:
  * strict_stream_identity_enforcement (8 facts):
      - Guid + string × partitioned + non-partitioned start/archive/start
      - duplicate-without-archive still throws (sanity)
      - append to existing stream still works (the sibling row is only
        written on first append, not on every Append)
      - flag-off + partitioned reuse still succeeds (documents the
        pre-existing behavior the flag exists to fix)
  * start_stream_with_id_of_previously_archived_stream (4 facts):
      Standalone analysis baselines for both modes; useful regression
      anchors that also serve as user-facing documentation of the divergence.

Docs: new "Strict Stream Identity After Archive" section in
docs/events/archiving.md walks through the partitioned-PK mechanics, why
the flag exists, when it is and isn't worth turning on, and the trade-off
that mt_streams_identity grows monotonically (a separate "background
sweep / TTL" feature is left for a follow-up if there's actual demand).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>