
Add jitter support to retry cooldown policies (#2501) #2504

Merged
jeremydmiller merged 1 commit into JasperFx:main from BlackChepo:feature/2501_jitter
Apr 14, 2026


@BlackChepo
Contributor

Closes #2501.

Summary

Adds additive jitter to Wolverine's delay-based error policies so distributed
nodes don't retry in lockstep after a shared downstream failure.

Three new fluent methods on IAdditionalActions:

opts.OnException<DownstreamUnavailableException>()
    .RetryWithCooldown(50.Milliseconds(), 100.Milliseconds(), 250.Milliseconds())
    .WithFullJitter();

opts.OnException<DownstreamUnavailableException>()
    .ScheduleRetry(1.Seconds(), 5.Seconds(), 30.Seconds())
    .WithBoundedJitter(0.25);

opts.OnException<DownstreamUnavailableException>()
    .PauseThenRequeue(5.Seconds())
    .WithExponentialJitter();

Applies to RetryWithCooldown, ScheduleRetry, ScheduleRetryIndefinitely
(including the infinite source), and PauseThenRequeue. The three strategies
are mutually exclusive per error rule.

Strategies

All strategies are additive — the configured delay d is the lower bound
of the effective delay d':

┌───────────────────────┬────────────────────────────────┬──────────────────────────┐
│       Strategy        │            Formula             │          Range           │
├───────────────────────┼────────────────────────────────┼──────────────────────────┤
│ WithFullJitter        │ d + random(0, d)               │ [d, 2d]                  │
├───────────────────────┼────────────────────────────────┼──────────────────────────┤
│ WithBoundedJitter(p)  │ d + random(0, d * p)           │ [d, d · (1 + p)]         │
├───────────────────────┼────────────────────────────────┼──────────────────────────┤
│ WithExponentialJitter │ d + random(0, d * attempt * 2) │ [d, d · (1 + 2·attempt)] │
└───────────────────────┴────────────────────────────────┴──────────────────────────┘

Invariant — jitter only extends, never shortens, the configured delay.
Users' carefully chosen cooldown values remain the floor.
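
For readers who want to see the arithmetic, here is a minimal sketch of the three formulas in the table. It is not the code from IJitterStrategy.cs: the class name JitterSketch, the method names, and the Scale helper are hypothetical, the attempt number is assumed to be 1-based, and the percent validation that the real WithBoundedJitter performs is left out because its exact rules are not spelled out here.

```csharp
using System;

// Hypothetical illustration only; it mirrors the formulas in the table,
// not the actual strategy classes added by this PR.
public static class JitterSketch
{
    // d + random(0, d): effective delay falls in [d, 2d]
    public static TimeSpan Full(TimeSpan d, Random rng) =>
        d + Scale(d, rng.NextDouble());

    // d + random(0, d * p): effective delay falls in [d, d * (1 + p)]
    public static TimeSpan Bounded(TimeSpan d, double p, Random rng) =>
        d + Scale(d, p * rng.NextDouble());

    // d + random(0, d * attempt * 2): effective delay falls in [d, d * (1 + 2 * attempt)]
    // Assumes attempt starts at 1; with attempt = 0 no jitter would be added.
    public static TimeSpan Exponential(TimeSpan d, int attempt, Random rng) =>
        d + Scale(d, 2.0 * attempt * rng.NextDouble());

    // Multiplies a delay by a non-negative factor, truncating to whole ticks.
    private static TimeSpan Scale(TimeSpan d, double factor) =>
        TimeSpan.FromTicks((long)(d.Ticks * factor));
}
```

Because Random.NextDouble() returns a value in [0, 1), the added term is never negative, so each method returns at least d. With d = 100 ms, Bounded(d, 0.25, rng) stays within 100 to 125 ms and Exponential(d, 3, rng) within 100 to 700 ms.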

Design notes

- WithExponentialJitter is attempt-scaled and stateless, not the
textbook "decorrelated jitter" from the AWS architecture blog. Tracking
the previous actual delay per envelope would have required a new
Envelope field and a persisted schema change for scheduled retries.
The attempt-scaled formula gives the desired property (spread widens
with retries) without touching persistence. The method name reflects
this instead of claiming to be "decorrelated."
- Jitter applies once per error rule, not per Then segment. Calling
WithXxxJitter a second time (even after .Then) throws
InvalidOperationException to prevent silent strategy replacement.
- Internal singletons (RetryInlineContinuation.Instance,
RequeueContinuation.Instance) used by RetryOnce / RetryTimes /
Requeue / RequeueIndefinitely reject TrySetJitter because they
have no delay. Since these instances are shared, the rejection also
prevents jitter state from leaking between rules.
- _jitter fields are volatile, matching the pattern used by
InlineReceiver._latched for single-write / many-read fields set at
configuration time and consumed by worker threads.
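
To make the first design note concrete, the sketch below contrasts the attempt-scaled formula with decorrelated jitter as the AWS architecture blog describes it, min(cap, random(base, previous * 3)). The class and method names are hypothetical and neither method is Wolverine code. The point is only that the decorrelated variant needs the previous actual delay as an input, which is the per-envelope state this PR avoids persisting.

```csharp
using System;

// Hypothetical contrast, not part of Wolverine.
public static class BackoffContrast
{
    // Attempt-scaled: depends only on the configured delay and the attempt
    // number, so nothing has to be remembered between retries.
    public static TimeSpan AttemptScaled(TimeSpan d, int attempt, Random rng) =>
        d + TimeSpan.FromTicks((long)(d.Ticks * 2.0 * attempt * rng.NextDouble()));

    // Decorrelated: min(cap, random(base, previous * 3)).
    // The caller must supply the delay that was actually used last time.
    public static TimeSpan Decorrelated(
        TimeSpan baseDelay, TimeSpan previous, TimeSpan cap, Random rng)
    {
        long lo = baseDelay.Ticks;
        long hi = Math.Max(lo, previous.Ticks * 3);
        long next = lo + (long)((hi - lo) * rng.NextDouble());
        return TimeSpan.FromTicks(Math.Min(cap.Ticks, next));
    }
}
```

A caller of Decorrelated would have to store each returned value and pass it back in as previous on the next attempt, whereas AttemptScaled can be recomputed from the attempt counter alone.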

Files

Production code:
- src/Wolverine/ErrorHandling/IJitterStrategy.cs (new) — internal interface + three strategy implementations using Random.Shared
- src/Wolverine/ErrorHandling/IJitterable.cs (new) — marker interface on delay-carrying continuations
- src/Wolverine/ErrorHandling/RetryInlineContinuation.cs,
ScheduledRetryContinuation.cs, RequeueContinuation.cs — implement IJitterable
- src/Wolverine/ErrorHandling/FailureSlot.cs — adds ApplyJitter helper
- src/Wolverine/ErrorHandling/PolicyExpression.cs — the three WithXxxJitter methods and per-rule validation

Docs: new Jitter section in docs/guide/handlers/error-handling.md.

Test plan

- Strategy unit tests: ranges, floor invariant, non-constant output,
attempt-scaling, constructor validation (JitterStrategyTests.cs)
- Per-continuation tests: RetryInlineContinuation,
ScheduledRetryContinuation, RequeueContinuation — jitter attaches
to delay variants, singletons reject it
- FailureSlot.ApplyJitter contract tests
- Fluent-API integration tests: happy path, mutual-exclusion rejection,
no-delay-slot rejection, percent validation, .Then interaction,
ScheduleRetryIndefinitely infinite-source coverage
- End-to-end behaviour tests: measure actual Task.Delay duration for
RetryWithCooldown + WithFullJitter; capture ReScheduleAsync argument
for ScheduleRetry + WithBoundedJitter and
ScheduleRetryIndefinitely + WithFullJitter
- Regression: all 119 pre-existing ErrorHandling tests remain green

Result: 137/137 ErrorHandling tests passing (+18 new).
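
The end-to-end timing check mentioned in the test plan could be approached roughly as follows. This is a sketch under assumptions, not the test code in this PR: DelayMeasurement and MeasureAsync are hypothetical names, and a real test would wrap the handler pipeline rather than a bare Task.Delay.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper for timing an awaited operation.
public static class DelayMeasurement
{
    // Runs the action and returns how long it took to complete.
    public static async Task<TimeSpan> MeasureAsync(Func<Task> action)
    {
        var stopwatch = Stopwatch.StartNew();
        await action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A test built on this would assert that the elapsed time is at least the configured delay minus a small tolerance, since timer resolution can make an awaited delay appear to finish slightly early, and that it stays below the strategy's upper bound plus scheduling slack.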

Breaking change

Three new members on the public IAdditionalActions interface. The
interface has only one implementation (internal FailureActions) in this
repo; no samples or tests implement it externally. Downstream code that
does implement IAdditionalActions will need to add the three new
methods.

Introduces WithFullJitter, WithBoundedJitter, and WithExponentialJitter
fluent methods on IAdditionalActions. Jitter adds randomness to retry
delays so distributed nodes don't retry in lockstep after a shared
downstream failure.

Applies to RetryWithCooldown, ScheduleRetry, ScheduleRetryIndefinitely
(including the infinite source), and PauseThenRequeue. The three
strategies are mutually exclusive per error rule.

Invariant: jitter only extends the configured delay, never shortens it.
All strategies are additive:

  - Full:        d + random(0, d)                 → [d, 2d]
  - Bounded:     d + random(0, d * percent)       → [d, d*(1+percent)]
  - Exponential: d + random(0, d * attempt * 2)   → [d, d*(1+2*attempt)]

Exponential is attempt-scaled and stateless — it deliberately avoids
tracking the previous actual delay per envelope, keeping the envelope
shape and persistence schema unchanged.

Closes JasperFx#2501
@jeremydmiller
Member

@BlackChepo Very cool, thank you!
