uv downloads are slow on fallback to streamed wheel downloads #5073
My guess is that you're being rate-limited or throttled somehow.
I am not rate limited: I can download torch at ~450 MB/s, while uv downloads at 1 MB/s for the first 10 MB or so, then it increases to double-digit MB/s. It's better toward the end but still below expectations. There is something wrong with uv.
What do you see if you download just torch, though? Your examples above all feature concurrent downloads.
Installing torch alone (it requires typing-extensions too), I see the progress bar appear with a few MB already done; the download progresses very slowly (around 1 MB/s) from 5 to 10 MB completed, then the speed increases to double-digit MB/s (hard to estimate by eye) until the end.
That invocation gives me the following:
Interesting... From the logs, I get both the "resolved" and "prepared" lines at 19 and 20 seconds. That's unexpected, to say the least? Why does the resolution take this long for me, and why is it so much faster for you? Our web server is Artifactory behind some caching server; maybe it doesn't support range requests. Is that possible?
If your server doesn't support range requests, we can't download just the part of the wheel we need to extract metadata, so we need to download the entire thing to perform resolution. Servers that do not support range requests are expected to see significant degradation of resolver performance.
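To make the range-request point concrete, here is a minimal sketch (not uv's code; it assumes the reqwest crate with its blocking feature, and the wheel URL and tail size are placeholders) of how a client can fetch only the end of a wheel, which is enough to read the zip central directory and pull out *.dist-info/METADATA:

```rust
// Sketch only: why range support matters for resolution. Assumes the
// `reqwest` crate (blocking feature); the URL and tail size are placeholders.
use reqwest::blocking::Client;
use reqwest::header::{ACCEPT_RANGES, RANGE};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://registry.example/packages/torch-2.4.0-cp311-cp311-linux_x86_64.whl";
    let client = Client::new();

    // Ask whether the server advertises byte-range support.
    let head = client.head(url).send()?;
    let supports_ranges = head
        .headers()
        .get(ACCEPT_RANGES)
        .map_or(false, |v| v == "bytes");

    if supports_ranges {
        // Only the last chunk of the zip is needed to read the central
        // directory and locate *.dist-info/METADATA.
        let tail = client.get(url).header(RANGE, "bytes=-102400").send()?.bytes()?;
        println!("fetched {} bytes instead of the whole wheel", tail.len());
    } else {
        // Without ranges, the whole wheel has to be downloaded just to
        // learn its dependencies, which is what degrades resolution here.
        println!("no range support: full download required for metadata");
    }
    Ok(())
}
```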
That shouldn't be the problem though, since we should be caching that wheel (and we need to download it in the next step anyway).
(As an aside, I feel like your tone assumes we will disagree with you. I'm not here to tell you that you're wrong and that uv is perfect! I'm just trying to get more information so we can diagnose the problem.)
It does sound weird that we don't cache and use it in the subsequent step. Maybe there's a bug there if you're seeing that log twice?
Even still, that doesn't explain why the download is slow.
Apologies if the tone came across wrong, I am not a native speaker :sorry: Note that we are testing with
Thanks! No problem, we're happy to help out. As Charlie noted, it is still confusing that the download is slow separately from the other problems. Maybe we don't chunk the download well when streaming the whole file? Maybe we need to try to reproduce with a registry that does not allow range requests.
Do you see this problem using the standard PyPI registry?
Sorry, I am not able to test that for comparison; connections to pypi.org are blocked by the firewall at my workplace. Running in verbose debug, I see the logs from the resolution phase below. Is the relevant code in or around uv_client::registry_client::read_metadata_stream?
I see similar logs in the prepare phase below.
No worries, I just want to make clear that we're on your side and want to help :) Let me take a look at the streaming code to verify (1) that the chunk size is what we expect, and (2) that we're caching the wheel in that case.
I see 3 calls to copy(); 1 call is in https://github.com/astral-sh/uv/blob/e34ab96e807764fb0bfdfa0ca9c96d258d0d22de/crates/uv-extract/src/stream.rs#L53C13-L53C28. It seems to me the copy() is defaulting to 2k chunks? (Just a thought after browsing through the code; I could be completely wrong, I have no experience in Rust.)
Similarly, I see 2 usages of tokio::io::BufWriter::new and one usage of tokio::io::BufWriter::with_capacity passing the file size. The default write buffer seems to be 8k; that could explain some of the performance. The torch wheel is 1800 MB, so that could be 200k+ I/O calls to write(), and the overhead would be quite sizable.
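As a rough illustration of that concern (not uv's actual code; tokio is assumed, and the 1 MiB capacity is an arbitrary example value), a larger write buffer bounds the number of write() calls when streaming a large wheel to disk:

```rust
// Sketch only, not uv's code: shows how the write-buffer size bounds the
// number of write() calls when streaming a large wheel to disk. Assumes tokio.
use tokio::fs::File;
use tokio::io::{self, AsyncRead, AsyncWriteExt, BufWriter};

async fn save_wheel<R>(mut body: R, path: &str) -> io::Result<u64>
where
    R: AsyncRead + Unpin,
{
    let file = File::create(path).await?;

    // tokio's BufWriter defaults to an 8 KiB buffer; for a ~1800 MB wheel that
    // is on the order of 200k small writes. A 1 MiB buffer (arbitrary choice)
    // cuts the write count by roughly 128x.
    let mut writer = BufWriter::with_capacity(1 << 20, file);

    // tokio::io::copy still reads in fixed-size chunks on the input side;
    // the BufWriter only coalesces the writes to disk.
    let bytes = io::copy(&mut body, &mut writer).await?;
    writer.flush().await?;
    Ok(bytes)
}
```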
(I will trace through these, it's a little tricky because sometimes we already wrap in a buffer elsewhere.)
Ok so for one, I believe we are downloading the wheel twice if your server doesn't support range requests, because we stream the wheel and stop as soon as we see a
Hi! Just wanted to check whether there has been any progress on this one? We initially tried
@zanieb -- I think we should probably just revert that change and revisit. It looks like it did more harm than good. Any objections?
If anyone has an example of a wheel that was slow to download, that's mirrored from PyPI (so I can inspect it), it would be helpful.
Quick update on this ticket:
We still have no clue why, and we are not closer to a resolution or an understanding of the root cause.
Packages are downloaded a first time to identify dependencies, then a second time to do the installation.
Empirically it proved to be significantly worse in practice to “always download the wheel”. We measured it with users.
I'm running into an issue with this as well. A package of around 160 MB is hosted in a private CodeArtifact repository. I get the following warnings for every single version of the package in CodeArtifact, which means multiple gigabytes of data are downloaded, seemingly just to determine the package version? Additionally, I have the package version locked in the pyproject.toml.
@jpedrick-numeus unfortunately this is a problem with CodeArtifact: they should support the modern metadata API so we don't need to download wheels to inspect their requirements, and if they don't support that, they should at least support range requests so we can extract the requirements efficiently.
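For context, here is a sketch of the fallback chain a client can use to obtain a wheel's dependencies, cheapest first. It assumes reqwest with its blocking feature; the helper name is invented, the URL handling follows PEP 658's convention of serving the METADATA file at `<wheel URL>.metadata`, and this is not a claim about uv's or CodeArtifact's exact behavior:

```rust
// Sketch of the metadata fallback chain, cheapest first. Assumes `reqwest`
// (blocking feature); `wheel_url` is a placeholder, and this is illustrative,
// not uv's or CodeArtifact's exact logic.
use reqwest::blocking::Client;
use reqwest::header::{ACCEPT_RANGES, RANGE};

fn describe_metadata_path(client: &Client, wheel_url: &str) -> reqwest::Result<&'static str> {
    // 1. PEP 658/714: the index serves METADATA alongside the wheel at
    //    `<wheel URL>.metadata` (a few KiB).
    if client.get(format!("{wheel_url}.metadata")).send()?.status().is_success() {
        return Ok("PEP 658 metadata file");
    }

    // 2. Range requests: read just the tail of the zip to extract METADATA
    //    (tens of KiB).
    let head = client.head(wheel_url).send()?;
    if head.headers().get(ACCEPT_RANGES).map_or(false, |v| v == "bytes") {
        let _tail = client.get(wheel_url).header(RANGE, "bytes=-65536").send()?;
        return Ok("range request on the wheel");
    }

    // 3. Neither is available: the entire wheel (possibly gigabytes) must be
    //    streamed just to learn its dependencies.
    Ok("full wheel download")
}
```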
Summary: I think a better tradeoff here is to skip fetching metadata, even though we can't validate the extras. It will help with situations like #5073 (comment) in which, otherwise, we have to download the wheels twice.
@zanieb does UV have to download every single version? I tried setting a version constraint
Can you share the
@charliermarsh Based on the logs, and without understanding how the code works, I have observed that the full wheels for multiple versions are being downloaded during the resolution phase, and then, I guess, just one version is picked in the next step.
@ewianda -- That part is totally normal. If you're using a registry that doesn't implement either the latest standards or range requests, then we have to download a wheel in order to determine its dependencies.
@charliermarsh does UV have to download every version though? It's a bit unexpected since the version is right in the path: It makes sense to download full packages for all valid versions, but filtering the versions should happen before downloading (I think).
We're not requesting the version. We need the dependencies in order to compute an accurate resolution.
(For example, if you run with
Does pip perform any differently here? Without access to logs, it's not possible to figure out whether uv is making an "efficient" resolution, and if not, why not. There are weaknesses in both the way uv and pip perform resolution, but they are different.
Unfortunately, this goes a bit too far... I want
Candidly, I think your diagnosis is not quite correct. We already know the versions when we request those URLs. So whatever you're seeing is something else. I can help if you share the
I'd love to make this better; I just need a little more data.
Thanks! I'll respond with the verbose logs when I get a chance today.
This is a log I generated in #6104. Not sure if that is useful, but I see this:
So what's happening there is that we're finding that we have to try a lot of
(The batch prefetching is extremely helpful in most cases because fetching metadata is usually cheap.)
I've seen reported cases, on the pip side, where packages are multiple GB big and they are the cause of backtracking, so you should probably avoid prefetching whole wheels. Unless there's some way to reliably detect the size before downloading; at the other end of the spectrum, I do notice that with some wheels the majority of the size is the metadata, so if it's only small, like 50 KiB or less, you could probably make an exception there.
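A hypothetical sketch of that suggestion, gating prefetch on the advertised Content-Length. The 50 MiB threshold and the should_prefetch helper are invented for illustration, reqwest with its blocking feature is assumed, and this is not uv's actual policy:

```rust
// Hypothetical sketch: gate metadata prefetching on the wheel's size so that
// backtracking over multi-GB wheels doesn't trigger huge speculative
// downloads. `PREFETCH_LIMIT` and `should_prefetch` are invented for
// illustration; this is not uv's actual policy. Assumes reqwest (blocking).
use reqwest::blocking::Client;
use reqwest::header::CONTENT_LENGTH;

const PREFETCH_LIMIT: u64 = 50 * 1024 * 1024; // 50 MiB, arbitrary threshold

fn should_prefetch(client: &Client, wheel_url: &str) -> reqwest::Result<bool> {
    // A HEAD request is enough to learn the size without downloading anything.
    let head = client.head(wheel_url).send()?;
    let size = head
        .headers()
        .get(CONTENT_LENGTH)
        .and_then(|v| v.to_str().ok())
        .and_then(|v| v.parse::<u64>().ok());

    // Prefetch small wheels eagerly; defer large (or unknown-size) wheels
    // until the resolver actually needs them.
    Ok(matches!(size, Some(n) if n <= PREFETCH_LIMIT))
}
```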
In case others missed it, @charliermarsh already made this change in #7226 and a new uv was released: https://github.com/astral-sh/uv/releases/tag/0.4.8
@notatallshaw @charliermarsh this fixes the issue for me! Thanks!
Hello,
I'm trying the latest version of uv and I see that you added progress bars. :)
uv is downloading tensorflow at less than 1 MB/s.
When tensorflow is the last remaining package, the speed increases to a more reasonable double-digit MB/s.
for reference, I have pip downloading at 400+ MB/s for tensorflow/torch (with one patch pending to merge to fix the progress bar :D ).
There is a chance it's not the progress bars themselves, but the addition of progress bars made the issue visible.
Looking at the output:
It will also cause issues with proxies/firewalls and internal PyPI servers that can't take the load and drop connections.
Can I suggest reducing that to 20 at most? It would be faster and more stable.
As a nice side effect, it would allow the output to fit on the screen (the default console is around 80 x 24 lines). The default output of 50 lines never fits on the screen ^^
I do note that both the resolver and the downloads seem to use the same concurrency setting (50 by default). It might make sense for the settings to be separate: the resolver makes small requests and is more affected by latency, while downloading is more about bandwidth.
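A hypothetical sketch of what separate limits could look like. The Limits struct, the numbers 50 and 8, and the use of tokio semaphores are all illustrative, not uv's configuration:

```rust
// Hypothetical sketch, not uv's configuration: separate concurrency limits for
// metadata requests (latency-bound) and wheel downloads (bandwidth-bound).
use std::sync::Arc;
use tokio::sync::Semaphore;

struct Limits {
    metadata: Arc<Semaphore>,
    downloads: Arc<Semaphore>,
}

impl Limits {
    fn new() -> Self {
        Self {
            // Many small metadata requests in flight hides per-request latency.
            metadata: Arc::new(Semaphore::new(50)),
            // A handful of large downloads is enough to saturate bandwidth
            // without overwhelming proxies or internal mirrors.
            downloads: Arc::new(Semaphore::new(8)),
        }
    }

    async fn fetch_metadata(&self, url: &str) {
        let _permit = self.metadata.acquire().await.expect("semaphore closed");
        // ... small, latency-bound request for version metadata here ...
        let _ = url;
    }

    async fn download_wheel(&self, url: &str) {
        // Each wheel download holds a permit for its whole duration.
        let _permit = self.downloads.acquire().await.expect("semaphore closed");
        // ... stream the wheel to disk here ...
        let _ = url;
    }
}
```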
Can you review what the chunk size is for reading from the network, and for writing to disk?
The screen flashes a lot, as if it's trying to rerender all the progress bars every few kB. Is there some sort of limit on how often the bars are refreshed?
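One way such a cap could look, assuming the indicatif crate (0.17 API); this is a sketch, not necessarily how uv drives its progress bars:

```rust
// Sketch only, assuming the indicatif crate (0.17): cap redraws instead of
// repainting on every received chunk.
use std::time::Duration;
use indicatif::{ProgressBar, ProgressDrawTarget};

fn download_bar(total_bytes: u64) -> ProgressBar {
    let bar = ProgressBar::new(total_bytes);
    // Redraw at most ~5 times per second, rather than once per few-kB chunk,
    // which avoids the constant screen flashing described above.
    bar.set_draw_target(ProgressDrawTarget::stderr_with_hz(5));
    // A steady tick keeps the bar visibly alive between slow chunks.
    bar.enable_steady_tick(Duration::from_millis(200));
    bar
}
```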
Is there some throttling or prioritization where it tries to download packages at the top of the list first, by any chance?
The list is sorted with bigger packages to the bottom.
DEBUG RUN
The download time is part of the "prepare 394 packages" step.