Odd UV Cache behaviour around freshly published package #5351
So, as background, we respect HTTP caching headers for the simple API responses (so, e.g., we cache PyPI responses for 10 minutes, per their headers). Are you seeing this behavior consistently, or was it only for the first 10 minutes after publishing? (I assume the latter.) Either way, running Is there any |
Adding on, after running the compile once with Which I found in #2538 |
Thanks for the thorough write-up! Some people on the team are more familiar with the cache and can probably weigh in on what's going on, but hopefully I can unblock you quickly. Unlike pip, uv respects the Did this happen outside that range? |
Ah three comments at once :) |
The |
The nexus repo response doesn't set any cache header from what I can see (sent the response in the screenshot), but maybe I'm not looking at the right thing. We noticed this first occurrence about 30 minutes after the |
If you run, e.g.,
|
Thanks a lot, that indeed is giving a lot better info. Apologies for the big blob below, cut out anything not related to importlib-metadata.
|
\cc @BurntSushi who can probably explain the headers / behavior (but is working on some other high-priority things right now) |
Yeah, if your index supports range requests, we can resolve faster. Even better are the |
Interesting, I had not seen this. Nexus repo definitely does not have support for these, unfortunately. Either way, keen to get some further input from you guys whenever available; it seems possible uv is not handling caching properly for repos without Cache-Control headers, or is too relaxed with its TTL for these cases. |
Will take a closer look at the logs when we can! |
From a quick glance, I'm not seeing anything interesting with respect to the HTTP cache semantics we have implemented. It seems to detect a "fresh" response without issue, which means, "the data we have cached is fresh after considering HTTP cache headers." And thus, a "fresh" response is one that can be used. The only thing that doesn't seem to be cached is this:
But that's not in our HTTP semantics. That's just a result of not being able to find any cached file for that wheel. |
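For readers following along, here is a minimal sketch of what "fresh" means in standard HTTP caching terms: a cached response can be reused while its current age is strictly below its freshness lifetime. The `CacheDecision` enum and `decide` function are hypothetical names chosen for illustration and are not taken from uv's `httpcache` module.

```rust
use std::time::Duration;

/// Outcome of checking a cached response (illustrative names, not uv's API).
#[derive(Debug, PartialEq, Eq)]
enum CacheDecision {
    /// The cached copy may be used without contacting the server.
    Fresh,
    /// The cached copy must be revalidated or fetched again.
    Stale,
}

/// A response is fresh while its age is strictly below its freshness lifetime.
fn decide(age: Duration, freshness_lifetime: Duration) -> CacheDecision {
    if age < freshness_lifetime {
        CacheDecision::Fresh
    } else {
        CacheDecision::Stale
    }
}
```

With a 600-second lifetime, a response that is 30 seconds old would be reused as-is, while one that is exactly 600 seconds old would be treated as stale.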
The Of course, the problem could be outside that module. |
Just ran into another case with the cache thinking it's up to date for a package pushed yesterday evening ( Same behavior, new version available for several hours (this time >~12), but UV thinking its cache is up to date. Got the full trace output this time. Going through the release notes, I see multiple bug fixes that could be cache related, however I'm not sure. Even more interesting, downgrading to Failing
Working
Running with |
Afraid this happened again just now, this time with We have a package pushed to our internal repo about ~40 minutes ago, and are still running into UV not finding it via either The behavior seems similar to the other cases, with uv thinking the cache for package specific index is up to date. Replicated both in our CI and on local Windows
RUST_LOG=trace uv pip install company-auth==2.5.0 -vvv
Any help / further insight into the cache behavior is highly appreciated |
It's all the same issue: uv thinks that the response is cacheable indefinitely based on something in your cache headers. While we figure it out, I would suggest adding: [tool.uv.pip]
refresh = true ...to your |
Can you share the exact headers you get back from hitting |
Here you go http get https://<internal-repo>/repository/pypi-group/simple/importlib-metadata/
|
Thank you. Would you be able to send one for a package that is getting a "stale" response too? |
Since there isn't an uv/crates/uv-client/src/httpcache/mod.rs Lines 946 to 950 in cb505d2
We could probably use better logging here, particularly around choices made based on heuristics. And perhaps the heuristic ought to be tweaked. (Assuming I've diagnosed the problem correctly.) Of course, the HTTP server should probably have better cache headers, but |
👍 Yeah, I'm mostly wondering if the last-modified header is being updated or not for those "stale" packages. |
Having some issues replicating it now following the Will share headers when it pops up again |
Thanks @jbw-vtl. Clearly something off here so appreciate any info you can provide. |
This has popped up again just now in our CI, however I was not able to reproduce it locally, I'm afraid. See the below, think this is roughly what happened with If UV retrieves the simple index for
Duration in this case would be set to Believe this to be way too aggressive, as it is quite common for packages to be updated a lot less frequently, which could cause the cache to last for weeks. Think there could be two possible changes here
We are still awaiting nexus to come back to us regarding implementing proper cache control headers, however have not got strong hopes here. |
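To make the arithmetic concrete, here is a small sketch of a "10% of the interval between `Last-Modified` and `Date`" heuristic, which is the behavior described for responses that carry no explicit caching headers. The function name and the use of Unix-epoch seconds as inputs are assumptions made for this example; this is not uv's actual code.

```rust
use std::time::Duration;

/// Heuristic freshness: 10% of the time between `Last-Modified` and `Date`.
/// Both inputs are seconds since the Unix epoch (an assumption for this sketch).
fn heuristic_freshness(last_modified_secs: u64, date_secs: u64) -> Duration {
    // saturating_sub yields 0 if Last-Modified is (incorrectly) later than Date.
    let interval = date_secs.saturating_sub(last_modified_secs);
    Duration::from_secs(interval / 10)
}
```

Under this heuristic, an index page that was last modified 90 days before it was fetched would be considered fresh for 9 days, so a release published the next morning would not be seen until the cached entry expired or was refreshed manually.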
We own the HTTP caching semantics here, so we could also just use a hard-coded duration instead of trying to get too smart with the heuristic. @charliermarsh What do you think about just hard-coding 600 seconds as the time we treat things as "fresh" in this specific case? |
That seems fine. |
Thanks for exploring this @jbw-vtl. We will change to 600 for now and consider making it configurable. |
The comment in the code explains the bulk of this:

```rust
// We previously computed this heuristic freshness lifetime by
// looking at the difference between the last modified header and
// the response's date header. We then asserted that the cached
// response ought to be "fresh" for 10% of that interval.
//
// It turns out that this can result in very long freshness
// lifetimes[1] that lead to uv caching too aggressively.
//
// Since PyPI sets a max-age of 600 seconds and since we're
// principally just interacting with Python package indices here,
// we just assume a freshness lifetime equal to what PyPI has.
//
// Note though that a better solution here is for the index to
// support proper HTTP caching headers (ideally Cache-Control, but
// Expires also works too, as above).
```

We also remove the `heuristic_percent` field on `CacheConfig`. And since that's actually part of the cache itself, we bump the simple cache version.

Finally, we add some more `trace!` calls that should hopefully make diagnosing issues related to the freshness lifetime a bit easier in the future.

Fixes #5351
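As a rough illustration of the resulting behavior, the sketch below picks a freshness lifetime from a `Cache-Control` header when one is present and otherwise falls back to a fixed 600 seconds. It is deliberately simplified: `Expires` handling and directives such as `no-cache` or `no-store` are left out, and `DEFAULT_FRESHNESS`, `parse_max_age` and `freshness_lifetime` are hypothetical names, not identifiers from uv.

```rust
use std::time::Duration;

/// Fallback lifetime when the index sends no usable caching headers,
/// matching the 600-second max-age that PyPI sets.
const DEFAULT_FRESHNESS: Duration = Duration::from_secs(600);

/// Extract `max-age=N` from a Cache-Control header value, if present.
fn parse_max_age(cache_control: &str) -> Option<Duration> {
    cache_control.split(',').map(str::trim).find_map(|directive| {
        // Directives without a value (e.g. `public`) are skipped.
        let (name, value) = directive.split_once('=')?;
        if name.trim().eq_ignore_ascii_case("max-age") {
            let seconds = value.trim().trim_matches('"').parse::<u64>().ok()?;
            Some(Duration::from_secs(seconds))
        } else {
            None
        }
    })
}

/// Use the explicit max-age when available, else the fixed default.
fn freshness_lifetime(cache_control: Option<&str>) -> Duration {
    cache_control
        .and_then(parse_max_age)
        .unwrap_or(DEFAULT_FRESHNESS)
}
```

The point of the fixed fallback is predictability: an index that omits caching headers gets the same 10-minute window as PyPI, rather than a lifetime that grows with the age of the package page.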
Hi Team,
We unfortunately ran into a nasty issue with uv seemingly incorrectly handling the repository cache for a newly released package (in this case `importlib-metadata==8.1.0`, which was released this morning).

Poetry happily installs the version (assuming they are handling cache differently), however UV fails to compile, complaining `8.1.0` is not found. The previous version `8.0.0` is found as expected.

Removing the cache or running without cache succeeds.

Now locally this would be fine, however this is happening in our CI environment, which is using a mix of `uv` and `poetry`. Dropping the cache, which is shared across builds, would work as a one-time fix, however is not sustainable of course.

Struggling to investigate why UV might behave like this, any help is appreciated.
Compiling all the info below with my current investigation.
uv pip compile .\requirements.txt -vvv
If I understand the output correctly, it believes the cache file for importlib-metadata to be up to date.
I am however unsure how to parse the cache headers / any other information that would indicate why UV thinks so.
Below a manual head request for that index
HEAD https://<internal-repo>/repository/pypi-group/simple/importlib-metadata/
The repo does have the `8.1.0` release (we essentially have a pass-through mirror using nexus with very minimal caching, so it should have gotten the new release shortly after it was pushed on PyPI).

GET https://<internal-repo>/repository/pypi-group/simple/importlib-metadata/
Now the plot thickens, when running without cache, uv correctly installs / compiles the package (as expected I assume)
uv pip compile .\requirements.txt --no-cache -v
Now dropping all of my cache would probably resolve the issue as well, however more interestingly, dropping just the importlib-metadata cache does not resolve the issue.
So far so good, however running the original command again, it clearly still has a cache file relating to importlib-metadata (`...\simple-v9\dcb525f06f275a06\importlib-metadata.rkyv`; I'm not familiar with how to read rust .rkyv files, however opening it with a text editor clearly shows there are several importlib-metadata versions in there, but no 8.1.0). I would expect this to be deleted, but then maybe that's me misunderstanding how some internals work.
uv pip compile .\requirements.txt -vvv
So two questions:
`*.rkyv` file despite deleting the cache for the package? (not so important)

Any help is massively appreciated as always