
Fix DLQ replay loop for buffered local queues (GH-1942)#2538

Merged
jeremydmiller merged 3 commits into main from
fix/non-durable-dlq-replay-loop-1942
Apr 20, 2026
Conversation

@jeremydmiller
Member

Summary

  • Fixes #1942 (Re-evaluate the usage of database backed DLQ on non-Durable endpoints): replaying a database-backed DLQ message to a local queue with BufferedInMemory() left the row in wolverine_incoming forever, causing it to be reprocessed on every host restart.
  • Root cause: BufferedLocalQueue.EnqueueDirectlyAsync (the recovery entry point for local queues) used the BufferedReceiver itself as the channel callback. Its CompleteAsync is a no-op, so the inbox row was never marked Handled. NodeAgentController.StartLocally then reset owner_id = 0 on the next startup and recovery picked it up again.
  • Fix mirrors the #1594 (RabbitMQ DLQ Replay not triggered) pattern already used in ListeningAgent.EnqueueDirectlyAsync for transports: wrap the recovered envelope in a tiny LocalQueueRecoveryListener whose CompleteAsync calls MarkIncomingEnvelopeAsHandledAsync, then route through IReceiver.ReceivedAsync so the existing _completeBlock fires the wrapper.
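
The wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the actual Wolverine source: the IListener surface is abridged and the inbox abstraction (IMessageInbox) and its method signature are assumptions made for the sketch.

```csharp
// Minimal sketch, NOT the actual Wolverine source. The point of the type:
// unlike BufferedReceiver, its CompleteAsync is not a no-op.
internal class LocalQueueRecoveryListener : IListener
{
    private readonly IMessageInbox _inbox; // assumed abstraction over the message store

    public LocalQueueRecoveryListener(IMessageInbox inbox) => _inbox = inbox;

    // Completing a recovered envelope marks its wolverine_incoming row as
    // Handled, so durability recovery will not pick it up again on the
    // next host restart.
    public ValueTask CompleteAsync(Envelope envelope)
        => new(_inbox.MarkIncomingEnvelopeAsHandledAsync(envelope));

    // Deferral (and the rest of the abridged IListener members) stays a
    // no-op for this recovery-only path.
    public ValueTask DeferAsync(Envelope envelope) => ValueTask.CompletedTask;
}
```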

Scope

  • Only the local-queue + BufferedInMemory path was broken. Transport-backed Buffered/Inline endpoints (RabbitMQ, etc.) already work correctly via the #1594 (RabbitMQ DLQ Replay not triggered) fix — a new RabbitMQ + Inline + Postgres reproducer is included that passed both before and after this change, pinning that behavior down.
  • TCP listeners don't support .ProcessInline() at all (the mode setter throws), so there is no TCP case to cover.

Files

  • src/Wolverine/Transports/Local/LocalQueueRecoveryListener.cs (new) — IListener wrapper used only for the recovery path into a non-durable local queue.
  • src/Wolverine/Transports/Local/BufferedLocalQueue.cs — constructor stores the runtime; EnqueueDirectlyAsync now dispatches through IReceiver.ReceivedAsync with the wrapper.
  • src/Persistence/PostgresqlTests/Bugs/Bug_1942_replay_dlq_to_buffered_or_inline.cs (new) — two tests: a buffered local queue test (fails pre-fix, passes post-fix) and a RabbitMQ Inline test (passes both before and after, locking in #1594).
  • src/Persistence/PostgresqlTests/PostgresqlTests.csproj — added Wolverine.RabbitMQ project reference for the second test.

Test plan

  • `dotnet test src/Persistence/PostgresqlTests/PostgresqlTests.csproj --framework net9.0 --filter Bug_1942` — both tests pass
  • `dotnet test src/Persistence/PostgresqlTests/PostgresqlTests.csproj --framework net9.0` — 357 / 357 pass
  • `dotnet test src/Testing/CoreTests/CoreTests.csproj --framework net9.0` — 1344 / 1344 pass
  • `dotnet test src/Transports/RabbitMQ/Wolverine.RabbitMQ.Tests/Wolverine.RabbitMQ.Tests.csproj --framework net9.0 --filter Bug_1594` — 3 / 3 pass (#1594 regression locked in)
  • RabbitMQ DLQ mechanics + `Bug_DLQ_*` — 23 / 23 pass
  • CI across net8 / net9 / net10

🤖 Generated with Claude Code

jeremydmiller and others added 3 commits April 20, 2026 05:55
When a non-durable local queue persisted a failed message to the
database-backed DLQ, marking the row replayable caused the durability
agent to dispatch it back via BufferedLocalQueue.EnqueueDirectlyAsync.
That path used the BufferedReceiver itself as the channel callback,
whose CompleteAsync is a no-op, so the inbox row was never marked
Handled and got re-recovered on every host restart.

This fix mirrors the GH-1594 pattern used in ListeningAgent for
transport-backed endpoints: wrap the recovered envelope in a
LocalQueueRecoveryListener whose CompleteAsync marks the row as
Handled, and route through IReceiver.ReceivedAsync so the existing
_completeBlock invokes the wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First pass routed the recovery path through IReceiver.ReceivedAsync,
which fires _completeBlock eagerly at receipt time. That regressed
SqlServerTests.*.should_reasign_incoming_envelope_to_owner_id because
scheduled messages dispatched via runtime.EnqueueDirectlyAsync were
being marked Handled before the test could observe them as Incoming
(and before the handler had actually run).

Instead: attach the LocalQueueRecoveryListener to envelope.Listener
during EnqueueDirectlyAsync, and have BufferedReceiver.CompleteAsync
(the pipeline's IChannelCallback) delegate to it. Now the inbox row is
only marked Handled after the pipeline successfully completes, which
matches DurableReceiver's semantics and preserves the durability
guarantee for scheduled messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
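
The delegation described in this commit can be sketched roughly as below. Member names and the surrounding type are assumptions for illustration, not the exact Wolverine source:

```csharp
// Sketch of BufferedReceiver.CompleteAsync after the second pass — names
// and types here are assumptions, not copied from the repository.
public ValueTask CompleteAsync(Envelope envelope)
{
    // Only envelopes that arrived via the DLQ recovery path carry the
    // recovery listener; every other in-memory message keeps the
    // historical no-op completion behavior.
    if (envelope.Listener is LocalQueueRecoveryListener recovery)
    {
        return recovery.CompleteAsync(envelope);
    }

    return ValueTask.CompletedTask;
}
```

Deferring the Handled mark until the pipeline's own completion callback is what keeps scheduled messages visible as Incoming until they have actually run.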
When built with the .NET 10 SDK (now required via global.json rollForward:
latestMajor), WolverineWebApiFSharp.dll was ending up with a reference to
FSharp.Core 10.0.0.0 even though Directory.Packages.props and the csproj
pinned 9.0.303. The existing `<PackageReference Update="FSharp.Core"
VersionOverride="9.0.303"/>` form was not overriding the SDK's implicit
F# Core reference.

At runtime, JasperFx's Roslyn-based codegen resolves FSharp.Core 9.0.0.0
(the package version), and the embedded 10.0.0.0 reference in the F#
assembly then fails with CS1705 — manifesting as a 500 on end-to-end
tests like `post_returning_fsharp_taskunit` and cascading into
`verify_open_api_expectations` via the shared Alba fixture.

Fix: set DisableImplicitFSharpCoreReference=true and use an explicit
`<PackageReference Include="FSharp.Core"/>`, which picks up the
centrally-managed 9.0.303.

Verified locally: `WolverineWebApiFSharp.dll` now references
FSharp.Core 9.0.0.0, and the full Wolverine.Http.Tests suite is
631/631 green on net9.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
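
The csproj change described in this commit would look roughly like the following fragment (a sketch — the surrounding project file layout is assumed, not shown in this page):

```xml
<!-- Sketch of the WolverineWebApiFSharp.csproj change; exact layout assumed. -->
<PropertyGroup>
  <!-- Stop the SDK from injecting its implicit FSharp.Core reference
       (which resolves to 10.0.0.0 under the .NET 10 SDK) -->
  <DisableImplicitFSharpCoreReference>true</DisableImplicitFSharpCoreReference>
</PropertyGroup>
<ItemGroup>
  <!-- Explicit reference picks up the centrally managed 9.0.303
       from Directory.Packages.props -->
  <PackageReference Include="FSharp.Core" />
</ItemGroup>
```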
