Fix deadlock in DrainAsync causing Redis scheduling test failure #2294

Merged
jeremydmiller merged 1 commit into main from fix/2291-redis-scheduling-graceful-shutdown
Mar 12, 2026

Conversation

@jeremydmiller

Summary

Root Cause

PR #2288 added WaitForCompletionAsync to DrainAsync() to wait for in-flight messages during graceful shutdown. However, DrainAsync() is also called from within the handler pipeline when a rate-limited message triggers PauseListenerContinuation:

_receiver block execute(message B)
  → pipeline.InvokeAsync()
    → RateLimitContinuation → ReScheduleAsync (stores in Redis sorted set)
    → PauseListenerContinuation → agent.PauseAsync()
      → StopAndDrainAsync()
        → receiver.DrainAsync()
          → _receiver.WaitForCompletionAsync()  ← waits for message B to finish
                                                  but message B is waiting for THIS to return

This circular wait stalls the pipeline for the full DrainTimeout (30 seconds). By the time it clears, the rate-limited message's retry window has long passed. The test's 20-second polling timeout expires before the wait even ends, so the test fails.
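
The self-wait can be reproduced in isolation. The following is a minimal sketch, not Wolverine's code: `SelfWaitDemo`, `HandleMessageAsync`, and the `TaskCompletionSource` standing in for "message B is still in flight" are all hypothetical names, and the timeout is a parameter so the demo does not need a full 30 seconds.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration only; none of these types exist in Wolverine.
public static class SelfWaitDemo
{
    // Stand-in for a drain that waits for in-flight work, bounded by a timeout.
    // Returns true if the in-flight work finished, false if the timeout won.
    public static async Task<bool> DrainAsync(Task inFlight, TimeSpan drainTimeout)
    {
        var finished = await Task.WhenAny(inFlight, Task.Delay(drainTimeout));
        return finished == inFlight;
    }

    // Stand-in for the handler of "message B". Its own completion is the
    // in-flight work the drain waits on, so the drain can only time out.
    public static async Task<bool> HandleMessageAsync(TimeSpan drainTimeout)
    {
        var completion = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        try
        {
            // Draining from inside the handler: waits on a task that is
            // completed only after this method leaves the try block.
            return await DrainAsync(completion.Task, drainTimeout);
        }
        finally
        {
            completion.SetResult(true);
        }
    }
}
```

Calling `SelfWaitDemo.HandleMessageAsync(TimeSpan.FromMilliseconds(100))` returns `false`: the wait ends only because the timeout elapses, which mirrors how the real stall is bounded by DrainTimeout rather than resolved.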

Fix

Use the _latched flag to distinguish the two call paths:

  • Shutdown: OnApplicationStopping calls Latch() first → _latched is already true when DrainAsync() runs → safe to wait
  • Pipeline pause: no prior Latch() call → _latched is false when DrainAsync() runs → skip the wait to avoid deadlock
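
The two call paths above can be sketched roughly as follows. This is an assumption-laden illustration rather than the merged change: `SketchReceiver` and `WaitedOnLastDrain` are hypothetical, the wait is injected as a delegate, and the flag is modeled as an `int` updated with `Interlocked.Exchange`, which may differ from how `_latched` is actually declared.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a receiver; not Wolverine's actual class.
public sealed class SketchReceiver
{
    private int _latched; // 0 = not latched, 1 = latched (assumed representation)
    private readonly Func<Task> _waitForCompletion;

    public SketchReceiver(Func<Task> waitForCompletion)
    {
        _waitForCompletion = waitForCompletion;
    }

    // Exposed only so the sketch can be observed from outside.
    public bool WaitedOnLastDrain { get; private set; }

    // Shutdown path: called before DrainAsync.
    public void Latch() => Interlocked.Exchange(ref _latched, 1);

    public async Task DrainAsync()
    {
        // Exchange returns the previous value, so this tells us whether
        // someone latched before this call or whether we are the first.
        var wasLatched = Interlocked.Exchange(ref _latched, 1) == 1;
        WaitedOnLastDrain = wasLatched;

        if (!wasLatched)
        {
            // Pipeline-triggered pause: the caller is itself in flight,
            // so waiting here would be a self-wait. Skip it.
            return;
        }

        // Shutdown: no handler is calling us, so waiting is safe.
        await _waitForCompletion();
    }
}
```

With `Latch()` called first, `DrainAsync()` awaits the injected wait; without it, `DrainAsync()` returns immediately. In this sketch a second `DrainAsync()` call would see the flag already set, so how and when the real code clears the flag is outside what the sketch covers.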

Test plan

  • rate_limited_messages_are_delayed_with_native_scheduling passes consistently (was failing every run)
  • All 87 Wolverine.Redis.Tests pass
  • All 1160 CoreTests pass

Closes #2291

🤖 Generated with Claude Code

When a rate-limited message triggers PauseListenerContinuation, the
pause calls StopAndDrainAsync → DrainAsync from within the receiver
block's execute function. The WaitForCompletionAsync added in #2288
waits for in-flight items to finish, but the current message IS an
in-flight item — creating a deadlock that times out after DrainTimeout
(30s), causing the Redis rate limiting test to fail.

Fix: only wait for completion when Latch() was previously called
(indicating shutdown via OnApplicationStopping), not when DrainAsync
is the first to set _latched (indicating a pipeline-triggered pause).

Closes #2291

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Graceful shutdown causes problems with redis scheduling
