
Use the same retry logic across uv #17105

Merged
konstin merged 5 commits into main from konsti/better-retry-handling-5 on Dec 18, 2025

Use the same retry logic across uv#17105
konstin merged 5 commits intomainfrom
konsti/better-retry-handling-5

Conversation

@konstin
Member

@konstin konstin commented Dec 12, 2025

We were using slightly different retry code in multiple places; this PR unifies it.

Also fixes retry undercounting in publish if the retry middleware was involved.

@konstin konstin temporarily deployed to uv-test-registries December 12, 2025 17:47 — with GitHub Actions Inactive
@konstin konstin force-pushed the konsti/better-retry-handling-4 branch 2 times, most recently from 1faec8b to f4a3fac Compare December 15, 2025 13:45
@konstin konstin force-pushed the konsti/better-retry-handling-5 branch from d3e11dd to ad4c75d Compare December 15, 2025 13:53
@konstin konstin force-pushed the konsti/better-retry-handling-5 branch from ad4c75d to 259c2c3 Compare December 15, 2025 13:57
@konstin konstin force-pushed the konsti/better-retry-handling-5 branch from 259c2c3 to b1c61b8 Compare December 15, 2025 14:00
@konstin konstin marked this pull request as ready for review December 15, 2025 14:00
@konstin konstin force-pushed the konsti/better-retry-handling-5 branch from b1c61b8 to c79ce52 Compare December 15, 2025 14:37
@konstin konstin requested a review from EliteTK December 15, 2025 17:45
@konstin konstin temporarily deployed to uv-test-registries December 16, 2025 09:54 — with GitHub Actions Inactive
@konstin konstin added the internal (A refactor or improvement that is not user-facing) label Dec 16, 2025
Contributor

@EliteTK EliteTK left a comment

I spent a bit more time on this because I wanted to understand the code better.

I think the approach of hiding the timing/debug stuff may be confusing. I think maybe this would be better served as a get_retry_delay<E, P>(err: E, policy: &P, time, retries) -> Result<Duration, (E, u32)> function or something, with the time tracking, waiting, and debug! being explicitly outside.

To demonstrate the above idea, and to understand if it would help, I tried having a function like:

pub async fn retry_with_policy<P, T, E, F>(
    retry_policy: &P,
    url: &Url,
    mut make_request: F,
) -> Result<T, (E, u32)>
where
    P: RetryPolicy,
    E: TriedRequestError,
    F: AsyncFnMut() -> Result<T, E>,
{
    let start_time = SystemTime::now();
    let mut total_retries = 0;

    loop {
        match make_request().await {
            Ok(ok) => return Ok(ok),
            Err(err) => {
                let delay = get_retry_delay(err, retry_policy, start_time, total_retries)?;
                debug!(
                    "Transient failure while handling response from {}; retrying after {:.1}s...",
                    DisplaySafeUrl::ref_cast(url),
                    delay.as_secs_f32(),
                );
                tokio::time::sleep(delay).await;
                total_retries += 1;
            }
        }
    }
}

Which you would use like (in CachedClient::get_cacheable_with_retry):

retry_with_policy(&self.uncached().retry_policy(), &req.url(), async || {
    let fresh_req = req.try_clone().expect("HTTP request must be cloneable");
    self.get_cacheable(fresh_req, cache_entry, cache_control, &response_callback)
        .await
})
.await
.map_err(|(err, retries)| err.with_retries(retries))

But I couldn't come up with a good approach which would work for uv_publish::upload. It does work for all the other cases though.

I guess one option would then be to just use the function in the places where it works, and elsewhere use get_retry_delay, maybe with some helper for the waiting and the debug!.

This would reduce the boilerplate even further, railroad users (developers) towards the happy path more, and avoid the possibility of starting the timer too early.

Just an idea though, may not be appropriate for this PR in particular.

);
// TODO(konsti): Should we show a spinner plus a message in the CLI while
// waiting?
tokio::time::sleep(duration).await;
Contributor

I think it looks odd for a function named should_retry to be async and to sleep.

Maybe the alternative is to return e.g. the metadata for how long the function expects to sleep, and to handle that externally? The debug! and the URL could then also be handled externally...

Not sure, I think I have a better overall idea for how this may be approached later but maybe this is fine for now.
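A minimal sketch of the alternative floated above, where the function returns the intended delay and the caller does the sleeping and logging. The name `retry_delay`, the attempt cap, and the backoff formula here are illustrative assumptions, not uv's actual code:

```rust
use std::time::Duration;

// Illustrative cap; uv's real retry policy is configurable.
const MAX_RETRIES: u32 = 3;

/// Return how long the caller should sleep before the next attempt,
/// or `None` when the attempts are exhausted. No sleeping, logging,
/// or URL formatting happens in here; that stays with the caller.
fn retry_delay(attempt: u32) -> Option<Duration> {
    if attempt >= MAX_RETRIES {
        return None;
    }
    // Exponential backoff: 1s, 2s, 4s, ...
    Some(Duration::from_secs(1u64 << attempt))
}
```

The caller would then do the debug! and the `tokio::time::sleep(delay).await` itself, keeping the decision function synchronous and trivially testable.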

Member Author

I've renamed the function. The motivation is that I want to keep the implementations from diverging. A secondary motivation is that we have this boilerplate for each streaming request, and I'd like to keep it concise. Ideally it would be part of the client, but that doesn't work with e.g. our streaming hashing and unpacking:

loop {
    let result = fetch(...).await;
    match result {
        Ok(download_result) => return Ok(download_result),
        Err(err) => {
            if retry_state
                .handle_retry_and_backoff(&err, err.retries())
                .await
            {
                continue;
            }
            return if retry_state.total_retries() > 0 {
                Err(Error::NetworkErrorWithRetries {
                    err: Box::new(err),
                    retries: retry_state.total_retries(),
                })
            } else {
                Err(err)
            };
        }
    };
}

Contributor

@EliteTK EliteTK Dec 16, 2025

Yeah, I understand the motivation; the previous situation wasn't great. That's why I suggested this potential boilerplate (which would work for most of the cases here):

return retry_with_policy(retry_policy, url, async || fetch(...).await)
    .await
    .map_err(|(err, retries)| {
        if retries > 0 {
            Error::NetworkErrorWithRetries { err: Box::new(err), retries }
        } else {
            err
        }
    })

But I agree that this code is better than the previous situation, it was just something to consider for the future.

@konstin
Member Author

konstin commented Dec 16, 2025

I think the approach of hiding the timing/debug stuff may be confusing. I think maybe this would be better served as a get_retry_delay<E, P>(err: E, policy: &P, time, retries) -> Result<Duration, (E, u32)> function or something, with the time tracking, waiting, and debug! being explicitly outside.

The motivation for this change is that the implementations diverged slightly between the different code locations, so I wanted to move them into a single place; otherwise, I'd have kept the logging and the waiting in the loop where they were before.

@konstin konstin temporarily deployed to uv-test-registries December 16, 2025 16:17 — with GitHub Actions Inactive
@konstin konstin temporarily deployed to uv-test-registries December 16, 2025 16:53 — with GitHub Actions Inactive
@konstin konstin temporarily deployed to uv-test-registries December 17, 2025 17:18 — with GitHub Actions Inactive
Contributor

@EliteTK EliteTK left a comment

I prefer this split up "should_retry" and "sleep_backoff" approach more than the previous combined function.

The names much better represent what the functions are doing.

There are a few things I would do differently but at this point they're just stylistic nitpicks/different opinions.
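The split the review approves of can be sketched in miniature. `RetryState`, its fields, and the backoff constants below are hypothetical stand-ins for uv's actual types, shown only to illustrate keeping the decision and the delay computation separate from the sleeping:

```rust
use std::time::Duration;

/// Hypothetical stand-in for uv's retry-tracking state; field names
/// and constants are illustrative only.
struct RetryState {
    total_retries: u32,
    max_retries: u32,
}

impl RetryState {
    /// Purely a decision: is another attempt allowed? No sleeping here.
    fn should_retry(&self) -> bool {
        self.total_retries < self.max_retries
    }

    /// Compute the backoff delay for the current attempt; the caller
    /// performs the actual sleep (the "sleep_backoff" step).
    fn backoff_delay(&self) -> Duration {
        Duration::from_millis(250 * (1u64 << self.total_retries))
    }
}
```

Keeping both functions synchronous means the retry decision and the delay schedule can be unit-tested without an async runtime.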

@konstin konstin force-pushed the konsti/better-retry-handling-4 branch from f4a3fac to 24e22f2 Compare December 18, 2025 10:11
Base automatically changed from konsti/better-retry-handling-4 to main December 18, 2025 10:51
konstin and others added 4 commits December 18, 2025 13:22
# This is the 1st commit message:

Use the same retry logic across uv

We were using slightly different retry code in multiple places, which this PR unifies.

Also fixes retry undercounting in publish if the retry middleware was involved.

# The commit message #2 will be skipped:

# fixup! Use the same retry logic across uv
Co-authored-by: Tomasz Kramkowski <tom@astral.sh>
@konstin konstin force-pushed the konsti/better-retry-handling-5 branch from d5cbcd8 to eb5c88d Compare December 18, 2025 12:23
@konstin konstin enabled auto-merge (squash) December 18, 2025 12:29
@konstin konstin temporarily deployed to uv-test-registries December 18, 2025 12:32 — with GitHub Actions Inactive
@konstin konstin merged commit e2a775d into main Dec 18, 2025
154 of 162 checks passed
@konstin konstin deleted the konsti/better-retry-handling-5 branch December 18, 2025 12:44