Skip to content

Fix nested retry loops for transient network errors#17267

Closed
shayonj wants to merge 1 commit intoastral-sh:mainfrom
shayonj:s/retry-fix
Closed

Fix nested retry loops for transient network errors#17267
shayonj wants to merge 1 commit intoastral-sh:mainfrom
shayonj:s/retry-fix

Conversation

@shayonj
Copy link

@shayonj shayonj commented Dec 30, 2025

Summary

Fixes a regression where transient network errors (like DNS failures) cause nested retry loops,
resulting in ~16 retry attempts instead of 4 and significantly longer failure times (~17-19s vs ~4s).

Problem

After #17104, both the middleware (RetryTransientMiddleware) and the cached client layer
(RetryState::should_retry) independently retry transient network errors. When the middleware
exhausts its retry budget and returns an error, should_retry starts a fresh retry loop,
causing nested retries.

Fix

RetryState::should_retry now only tracks middleware retries for error reporting instead of
initiating additional retries for transient errors. The middleware already handles transient
retry logic, so duplicating it here causes the nested loops.

Test Plan

Comment on lines +1148 to +1149
/// Kept for API compatibility with callers that pass `retry_policy`.
_retry_policy: ExponentialBackoff,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to break API compatibility here, every release is considered breaking at this time.

@shayonj
Copy link
Author

shayonj commented Dec 30, 2025

looks like there is a fix proposed in #17274. Closing on behalf

@shayonj shayonj closed this Dec 30, 2025
@shayonj shayonj deleted the s/retry-fix branch December 30, 2025 18:13
@EliteTK
Copy link
Contributor

EliteTK commented Dec 30, 2025

@shayonj ah, my apologies, I had missed that you were already working on a fix.

@shayonj
Copy link
Author

shayonj commented Dec 30, 2025

no probs at all! thank you for the fast fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants