I spent a bit more time on this because I wanted to understand the code better.
I think the approach of hiding the timing/debug stuff may be confusing. I think maybe this would be better served as a `get_retry_delay<E, P>(err: E, policy: &P, time, retries) -> Result<Duration, (E, u32)>` function or something, with the time tracking, waiting, and `debug!` being explicitly outside.
To demonstrate the above idea, and to understand if it would help, I tried having a function like:
```rust
pub async fn retry_with_policy<P, T, E, F>(
    retry_policy: &P,
    url: &Url,
    mut make_request: F,
) -> Result<T, (E, u32)>
where
    P: RetryPolicy,
    E: TriedRequestError,
    F: AsyncFnMut() -> Result<T, E>,
{
    let start_time = SystemTime::now();
    let mut total_retries = 0;
    loop {
        match make_request().await {
            Ok(ok) => return Ok(ok),
            Err(err) => {
                let delay = get_retry_delay(err, retry_policy, start_time, total_retries)?;
                debug!(
                    "Transient failure while handling response from {}; retrying after {:.1}s...",
                    DisplaySafeUrl::ref_cast(url),
                    delay.as_secs_f32(),
                );
                tokio::time::sleep(delay).await;
                total_retries += 1;
            }
        }
    }
}
```

Which you would use like (in `CachedClient::get_cacheable_with_retry`):
```rust
retry_with_policy(&self.uncached().retry_policy(), &req.url(), async || {
    let fresh_req = req.try_clone().expect("HTTP request must be cloneable");
    self.get_cacheable(fresh_req, cache_entry, cache_control, &response_callback)
        .await
})
.await
.map_err(|(err, retries)| err.with_retries(retries))
```

But I couldn't come up with a good approach which would work for `uv_publish::upload`. It does work for all the other cases though.
I guess one option would then be to just use the function in the places where it works, and use `get_retry_delay`, maybe with some helper for the waiting and `debug!`.
This would reduce the boilerplate even further, railroad users (developers) towards the happy path more, and avoid the possibility of starting the timer too early.
Just an idea though, may not be appropriate for this PR in particular.
`crates/uv-client/src/base_client.rs` (outdated):

```rust
);
// TODO(konsti): Should we show a spinner plus a message in the CLI while
// waiting?
tokio::time::sleep(duration).await;
```
I think it looks odd for a function named `should_retry` to be async and to sleep.
Maybe the alternative is to return e.g. the metadata for how long the function expects to sleep, and handle that externally? The `debug!` and URL could also then be handled externally...
Not sure; I think I have a better overall idea for how this may be approached later, but maybe this is fine for now.
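To sketch that alternative: `should_retry` can stay synchronous and just report how long the caller should back off, with the actual `tokio::time::sleep` and `debug!` done at the call site. The struct and fields below are invented for illustration (error classification is omitted for brevity); this is not uv's actual retry state type:

```rust
use std::time::Duration;

/// Hypothetical retry bookkeeping; uv's real state type differs.
pub struct RetryState {
    retries: u32,
    max_retries: u32,
    base_delay: Duration,
}

impl RetryState {
    pub fn new(max_retries: u32, base_delay: Duration) -> Self {
        Self { retries: 0, max_retries, base_delay }
    }

    /// Pure decision: if another attempt is allowed, record it and
    /// return the backoff to apply; `None` means give up. The caller
    /// is responsible for sleeping and logging.
    pub fn should_retry(&mut self) -> Option<Duration> {
        if self.retries >= self.max_retries {
            return None;
        }
        let delay = self.base_delay * 2u32.saturating_pow(self.retries);
        self.retries += 1;
        Some(delay)
    }

    /// How many retries were actually performed.
    pub fn total_retries(&self) -> u32 {
        self.retries
    }
}
```

The call site would then do something like `if let Some(delay) = state.should_retry() { debug!(...); tokio::time::sleep(delay).await; continue; }`, keeping the async and logging concerns out of the decision function.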
I've renamed the function; the motivation is that I want to keep the implementations from diverging. A secondary motivation is that we have this boilerplate for each streaming request, and I'd like to keep it concise. Ideally it would be part of the client, but that doesn't work with e.g. our streaming hashing and unpacking:
```rust
loop {
    let result = fetch(...).await;
    match result {
        Ok(download_result) => return Ok(download_result),
        Err(err) => {
            if retry_state
                .handle_retry_and_backoff(&err, err.retries())
                .await
            {
                continue;
            }
            return if retry_state.total_retries() > 0 {
                Err(Error::NetworkErrorWithRetries {
                    err: Box::new(err),
                    retries: retry_state.total_retries(),
                })
            } else {
                Err(err)
            };
        }
    };
}
```
Yeah, I understand the motivation; the previous situation wasn't great, but that's why I suggested this potential boilerplate (which would work for most of the cases here):
```rust
return retry_with_policy(retry_policy, url, async || fetch(...).await)
    .await
    .map_err(|(err, retries)| {
        if retries > 0 {
            Error::NetworkErrorWithRetries { err: Box::new(err), retries }
        } else {
            err
        }
    })
```

But I agree that this code is better than the previous situation; it was just something to consider for the future.
The motivation for this change is that the implementations diverged slightly between the different code locations, so I wanted to move them into a single location; otherwise I'd have kept the logging and the waiting in the …
EliteTK left a comment:
I prefer this split-up `should_retry` and `sleep_backoff` approach to the previous combined function.
The names much better represent what the functions are doing.
There are a few things I would do differently but at this point they're just stylistic nitpicks/different opinions.
# This is the 1st commit message:

Use the same retry logic across uv

We were using slightly different retry code in multiple places, which this PR unifies. Also fixes retry undercounting in publish if the retry middleware was involved.

# The commit message #2 will be skipped:
# fixup! Use the same retry logic across uv
Co-authored-by: Tomasz Kramkowski <tom@astral.sh>
We were using slightly different retry code in multiple places; this PR unifies it.
Also fixes retry undercounting in publish if the retry middleware was involved.