perf(sqlserver-transport): index dequeue path + opt-in clustered queue layout #3277

Merged
jeremydmiller merged 1 commit into main from perf/sqlserver-transport
Jun 29, 2026

@jeremydmiller
Member

Summary

Performance work on the SQL Server queue transport. The dequeue path (SELECT TOP(n) ... ORDER BY timestamp) had no supporting index, so every poll scanned and sorted the whole queue table. The only index was the clustered primary key on a random Guid, which also gives poor write locality. This PR fixes the default layout and adds an opt-in high-throughput layout.

Changes

Default (safe, additive — no opt-in)

  • Index the queue table timestamp column so the ordered TOP(n) dequeue is a seek instead of a full scan + sort.
  • Robust idempotency: replaced locale-fragile e.Message.Contains("Violation of PRIMARY KEY constraint") with SQL error numbers (2627 / 2601).
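The error-number check in the second bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes the `Microsoft.Data.SqlClient` provider, and the `IdempotentInsertSketch` class and its methods are hypothetical names.

```csharp
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

// Hypothetical helper for illustration; not part of Wolverine's API.
public static class IdempotentInsertSketch
{
    // 2627: violation of a PRIMARY KEY or UNIQUE constraint.
    // 2601: duplicate key row in a unique index.
    // Error numbers do not change with the server's message language,
    // unlike the text in SqlException.Message.
    public static bool IsDuplicateKey(SqlException e) =>
        e.Number is 2627 or 2601;

    // Returns true if the row was inserted, false if it already existed.
    // Any other SQL error still propagates to the caller.
    public static async Task<bool> TryInsertAsync(SqlCommand insert)
    {
        try
        {
            await insert.ExecuteNonQueryAsync();
            return true;
        }
        catch (SqlException e) when (IsDuplicateKey(e))
        {
            return false;
        }
    }
}
```

The exception filter keeps unrelated failures (timeouts, deadlocks) out of the idempotency path, which a broad `catch` with a string comparison would not.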

Opt-in: OptimizeQueueThroughput()

A new fluent setting on the Sql Server transport config:

opts.UseSqlServerPersistenceAndTransport(connectionString)
    .OptimizeQueueThroughput();

The queue and scheduled tables are then clustered on a monotonic seq identity (FIFO dequeue as a clustered seek + physically contiguous deletes), with a unique non-clustered index on the message id (idempotent sends rely on the 2601 unique-index violation) and a filtered index on keep_until for the expiry sweep. This mirrors the layout Wolverine's own NServiceBus interop transport already uses (NServiceBusQueueTable).

Default is off; enabling it on an existing database triggers a one-time queue-table rebuild, so it's documented as a maintenance-window change. New apps can turn it on from the start. The NServiceBus interop transport already uses the equivalent clustered layout, so no opt-in is exposed there.
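For orientation, the two physical layouts described above might look roughly like the T-SQL below, held here as C# constants. This is a hand-written sketch and not the DDL Weasel generates: the table names, index names, the `id` and `body` columns, and the `keep_until IS NOT NULL` predicate are all assumptions. Only `seq`, `keep_until`, and `timestamp` are named in this PR.

```csharp
// Hypothetical T-SQL for illustration only; names and column types are assumed.
public static class QueueLayoutSketch
{
    // Default layout: an additive index so that
    // SELECT TOP(n) ... ORDER BY [timestamp] can seek instead of scan + sort.
    public const string DefaultTimestampIndex = """
        CREATE NONCLUSTERED INDEX ix_example_queue_timestamp
            ON dbo.example_queue ([timestamp]);
        """;

    // Opt-in layout: clustered on a monotonic identity, so FIFO dequeue reads
    // the head of the clustered index and deletes touch contiguous pages.
    public const string OptimizedLayout = """
        CREATE TABLE dbo.example_queue_optimized (
            seq        bigint IDENTITY(1,1) NOT NULL,
            id         uniqueidentifier     NOT NULL,
            body       varbinary(max)       NOT NULL,
            keep_until datetimeoffset       NULL
        );

        CREATE UNIQUE CLUSTERED INDEX cx_example_queue_seq
            ON dbo.example_queue_optimized (seq);

        -- A duplicate message id raises error 2601 from this unique index.
        CREATE UNIQUE NONCLUSTERED INDEX ux_example_queue_id
            ON dbo.example_queue_optimized (id);

        -- Filtered index: only rows that can expire are indexed for the sweep.
        CREATE NONCLUSTERED INDEX ix_example_queue_keep_until
            ON dbo.example_queue_optimized (keep_until)
            WHERE keep_until IS NOT NULL;
        """;
}
```

Because the clustering key changes, moving an existing table from the first layout to the second means rewriting the table, which is why the PR treats enabling the option on a live database as a maintenance-window change.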

Docs

New "Optimizing Queue Throughput" section in the Sql Server guide.

Benchmarks

Raw-DDL A/B/C benchmark (batch 50, 30k-row backlog), included as a skipped manual test:

| Layout | Drain throughput | Deep-pop median | Deep-pop p95 |
|---|---|---|---|
| baseline (clustered GUID, no index) | 98/s | 845 ms | 1860 ms |
| default (+ timestamp index) | 498/s | 2.4 ms | 853 ms |
| OptimizeQueueThroughput() (clustered seq) | 34,612/s | 2.4 ms | 3.7 ms |

Follow-up (not in this PR)

Batched outbox→queue send via a TVP, using a default-method ISender extension (the durable send path is per-envelope today). Tracked separately.

Validation / reviewer note

  • ✅ Builds clean (full wolverine.slnx, Release).
  • ✅ Physical schema designs proven by the raw-DDL benchmark; Weasel model uses the same APIs as the existing NServiceBusQueueTable (AutoNumber() + IsClustered + filtered Predicate).
  • ⚠️ I could not run the xUnit transport suite / Weasel auto-migration locally — the dev SQL Server containers were unusable (one OOM-crashes, the other times out servicing Weasel's schema-introspection queries). The Weasel migration for the opt-in layout (fresh create and the existing-DB rebuild) needs to be validated by CI. A new integration test verify_optimized_schema_provisions_and_roundtrips exercises exactly that path, and the existing transport compliance suite covers the default layout.

🤖 Generated with Claude Code

…e layout

The Sql Server queue transport's dequeue path (TOP(n) ... ORDER BY timestamp)
had no supporting index, so every poll scanned and sorted the whole queue table
(the only index was the clustered PK on a random Guid). On a backlog this is
catastrophic — benchmarked at ~845ms median to pop a batch of 50 from a 30k-row
queue.

Default (safe, additive):
- Add an index on the queue table's `timestamp` column so the ordered TOP(n)
  dequeue is a seek. Benchmark: deep-pop median 845ms -> 2.4ms, drain 98 -> 498
  msg/s. No migration risk (additive index).
- Replace locale-fragile idempotency checks (e.Message.Contains("Violation of
  PRIMARY KEY constraint")) with SQL error numbers (2627 / 2601).

Opt-in via OptimizeQueueThroughput() on the Sql Server transport config:
- Queue and scheduled tables are clustered on a monotonic `seq` identity (FIFO
  dequeue as a clustered seek + contiguous deletes), with a unique non-clustered
  index on the message id (idempotent sends rely on 2601) and a filtered index
  on keep_until for the expiry sweep. Mirrors the proven NServiceBus SQL
  transport layout (NServiceBusQueueTable). Benchmark vs baseline: drain 98 ->
  34,612 msg/s, deep-pop p95 1860ms -> 3.7ms.
- Default off: enabling it on an existing DB triggers a one-time queue-table
  rebuild, so it's opt-in and documented as a maintenance-window change.

The NServiceBus interop transport already uses the equivalent clustered layout,
so no opt-in is exposed (or possible) there.

Docs: new "Optimizing Queue Throughput" section in the Sql Server guide.

Includes a manual A/B/C benchmark (skipped in CI) and a CI integration test
that verifies the optimized schema provisions and round-trips.

Batched send (outbox -> queue via a TVP) is intentionally deferred to a
follow-up PR using a default-method ISender extension.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller merged commit b66c241 into main Jun 29, 2026
26 checks passed