Don't download source tarball if not needed. #9819
yshui wants to merge 3 commits into NixOS:master from
Instead of just reporting the cache doesn't exist.
If 1) the store path for the unpacked source is valid, and 2) the file at the source has not changed, downloadTarball shouldn't redownload the source file.

The other option is to do what prefetch is already doing: don't call … I am fine with either option.

BTW, other input types have similar problems as well. For example, for GitHub sources, the request to …

Is this trying to fix #9814?

```diff
     bool locked,
-    const Headers & headers = {});
+    const Headers & headers = {},
+    bool onlyIfChanged = false);
```
When do we want to download if not changed?
Errr, it's fully explained in my comments above:
Assuming the file at the source URL hasn't changed, when I run `nix flake update` and both paths exist, all is good: `nix` fetches the source URL with an `etag`, gets a `304` response, and does nothing. But if the source file has been deleted from the store, `nix` redownloads the source file, regardless of whether the unpacked store path is still valid.
To summarize: if the unpacked tarball is available in the store, we don't need to download the source tarball unless it has changed.
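The "fetch with an `etag`, get a `304`" step is a standard HTTP conditional request: the client sends the cached ETag in an `If-None-Match` header, and the server answers `304 Not Modified`, with no body, if the resource still matches. A minimal sketch of that logic, assuming a map of header names to values; `Headers`, `CachedEntry`, `buildRequestHeaders` and `unchangedAtSource` are hypothetical names, not Nix's file-transfer API.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified types: not Nix's real file-transfer API.
using Headers = std::map<std::string, std::string>;

struct CachedEntry {
    std::string etag;  // ETag recorded from the previous successful download
};

// With a cached ETag, ask the server to answer 304 if the resource is unchanged.
Headers buildRequestHeaders(const std::optional<CachedEntry> & cached) {
    Headers headers;
    if (cached && !cached->etag.empty())
        headers["If-None-Match"] = cached->etag;
    return headers;
}

// 304 Not Modified: the resource still matches the ETag and no body was sent.
bool unchangedAtSource(int httpStatus) {
    return httpStatus == 304;
}
```

The point of the PR is that this cached ETag is still worth sending when the tarball itself has been garbage-collected, as long as the unpacked copy is valid.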
Err, sorry, I might have gotten the negation backwards. What you propose, downloading if and only if needed, sounds reasonable. I am asking when one would not want the semantics you proposed, e.g. when would one want to download again if we already have it?
Personally, I can't come up with a scenario where re-downloading would be preferable. Do you have something in mind?
Errrr, did I misunderstand your question?
In code, `onlyIfChanged` would be `false` for a "normal" `downloadFile` call. So if the user calls `builtins.fetchurl`, the file will be downloaded if it's unchanged but not in the store.
@yshui Sorry, I am still a bit confused. It seems in both cases you are saying "if we have it in the store and it didn't change, we don't need to redownload it". I am missing something; maybe I'm missing that `nix flake update` need not care what is in the store?
> I am missing something

Fine, let me just list all the possible cases here.
So, there are two blobs involved in `fetchTarball`:
`tarball.tar.gz` and `unpacked-tarball/`. For our purposes, we assume both have an entry in the fetcher cache. Now let's consider the different cases, depending on which of them are in the store.
1. Both `tarball.tar.gz` and `unpacked-tarball/` are in the store.
   `fetchTarball` fetches `tarball.tar.gz` with the `etag` from the cache, gets a 304, and stops. Which is good.
2. Only `tarball.tar.gz` is in the store.
   `fetchTarball` fetches `tarball.tar.gz` with the `etag` from the cache, gets a 304, and unpacks the tarball already in the store. This is good too.
3. Neither is in the store.
   `fetchTarball` ignores the cache entry for `tarball.tar.gz`, redownloads it, and unpacks it. Good.
4. Only `unpacked-tarball/` is in the store.
   a. `tarball.tar.gz` has changed at the specified URL.
      `fetchTarball` ignores the cache entry for `tarball.tar.gz`, redownloads it, and unpacks it. Good.
   b. `tarball.tar.gz` is unchanged at the specified URL.
      `fetchTarball` ignores the cache entry for `tarball.tar.gz`, redownloads it, then reuses the `unpacked-tarball/` in the store. NOT good.
This PR avoids redownloading `tarball.tar.gz` in case (4.b).
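The case analysis can be condensed into a small decision model. This is a minimal sketch, assuming both blobs have fetcher-cache entries as stated above; `StoreState`, `downloadsBeforePR` and `downloadsAfterPR` are hypothetical names for illustration, not code from Nix or from this PR.

```cpp
// Hypothetical model of the four cases above, assuming both blobs have
// fetcher-cache entries. Not Nix code; the names are illustrative.
struct StoreState {
    bool tarballInStore;   // tarball.tar.gz is still a valid store path
    bool unpackedInStore;  // unpacked-tarball/ is still a valid store path
};

// Returns true if fetchTarball transfers the tarball body.
bool downloadsBeforePR(const StoreState & s, bool changedAtSource) {
    if (s.tarballInStore)
        return changedAtSource;  // cases 1-2: conditional request, 304 if unchanged
    return true;                 // cases 3-4: cache entry ignored, always redownload
}

bool downloadsAfterPR(const StoreState & s, bool changedAtSource) {
    if (s.tarballInStore || s.unpackedInStore)
        return changedAtSource;  // cases 1, 2 and 4: only a changed tarball is fetched
    return true;                 // case 3: nothing usable in the store
}
```

The two functions differ only in case 4.b (`{false, true}` with an unchanged source), which is exactly the redundant download the PR removes.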
OTOH, `fetchurl` only puts one blob into the store. This is the case where `downloadFile` is called with `onlyIfChanged = false`, and the cases for `fetchurl` are:
1. The file is not in the store.
   Whether or not the file at the source URL has changed, `downloadFile` needs to download it. This is why `onlyIfChanged` should be `false`.
2. The file is in the store.
   `downloadFile` downloads the file if it has changed at the source URL. This is the current behavior and is unchanged.
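At the `downloadFile` level, the flag only matters when a cache entry exists but its store path is gone. A minimal sketch of that decision, assuming the behavior described in this thread; `Outcome` and `downloadFileModel` are hypothetical names and do not reflect the real signature of Nix's `downloadFile`.

```cpp
// Hypothetical model of the downloadFile decision described above; names and
// signatures are illustrative and do not match Nix's actual source.
struct Outcome {
    bool transferredBody;     // the file body was (re)downloaded
    bool storePathAvailable;  // a valid store path for the file exists afterwards
};

Outcome downloadFileModel(bool haveCacheEntry, bool storePathValid,
                          bool changedAtSource, bool onlyIfChanged) {
    // Send the cached ETag only if a 304 is useful to the caller: either the
    // file is still in the store, or the caller (e.g. a tarball fetcher that
    // holds a valid unpacked path) only needs to know whether it changed.
    bool conditional = haveCacheEntry && (storePathValid || onlyIfChanged);
    if (conditional && !changedAtSource) {
        // 304 Not Modified: nothing is transferred.
        return {false, storePathValid};
    }
    // Unconditional request, or a 200 in reply to a conditional one.
    return {true, true};
}
```

With `onlyIfChanged = true` and an unchanged source, the model reports no transfer and no store path for the file; the caller is then expected to fall back on its still-valid unpacked path.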
Thanks very much @yshui. The big thing I was forgetting was that there are both unpacked and packed versions.
@edolstra has a PR, which we are merging soon, that changes how the unpacked stuff is downloaded. I suspect we should land that first.
My only remaining question is: why does `builtins.fetchurl` use the tarball fetcher at all? Shouldn't it just use the file fetcher?
> why does `builtins.fetchurl` use the tarball fetcher at all?

Maybe the way I phrased it caused some confusion, but `builtins.fetchurl` doesn't call `downloadTarball`; it calls `downloadFile`.
`builtins.fetchTarball`, on the other hand, calls `downloadTarball`, which in turn calls `downloadFile`.
No, they are distantly related, but this PR won't fix that issue.

This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-01-22-nix-team-meeting-minutes-117/38838/1

Hi, so what's the decision on this PR? What do I need to do to get this merged?

@Ericson2314 What does #10038 do?

@yshui It caches downloaded tarballs (not individual files) in a separate Git repo (for file-granularity dedup) instead of the store. I suppose it is still possible for data to be in the cache DB but not in the Git repo, so the principle you are trying to enforce still applies, but the implementation details are quite different.
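Whatever the storage backend, the underlying issue is that a cache entry can outlive the data it points to. The PR's approach (per its Context section) is to have the cache lookup return the entry together with whether its store path is still valid, instead of discarding it. A minimal sketch under that assumption; `CacheEntry`, `LookupResult` and `FakeCache` are hypothetical, and this `lookupExpired` only borrows the name of the Nix method, not its real signature.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical sketch of a cache lookup that reports store-path validity
// instead of discarding the entry; not Nix's actual fetcher-cache API.
struct CacheEntry {
    std::map<std::string, std::string> infoAttrs;  // e.g. {"etag", ...}
    std::string storePath;
};

struct LookupResult {
    CacheEntry entry;
    bool storePathValid;  // false if the path was garbage-collected
};

struct FakeCache {
    std::map<std::string, CacheEntry> entries;  // keyed by URL
    std::set<std::string> validPaths;           // stand-in for the store

    std::optional<LookupResult> lookupExpired(const std::string & url) const {
        auto it = entries.find(url);
        if (it == entries.end())
            return std::nullopt;
        // Keep the entry (and its ETag) even if the path is gone; the caller decides.
        return LookupResult{it->second, validPaths.count(it->second.storePath) > 0};
    }
};
```

Returning the validity flag rather than `std::nullopt` is what lets a caller like `downloadTarball` still use the cached ETag for a conditional request.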

@Ericson2314 Hmm, interesting. So if I do a …

@yshui …

@Ericson2314 I looked into it, and I think implementing this became a bit easier with the new Git cache, but it looks like the cache also introduced some bugs (#10080). I'll update this PR after things settle a bit more.

Hold on, why do we need to cache the tarball at all? What we want is the unpacked tarball, so why are we keeping the tarball file in the cache? (Or did I misunderstand? Is the unpacked tarball being cached?)
Motivation
Flake tarball inputs generate two store paths: Nix first downloads the source file into the store, then unpacks it into another store path.

Assuming the file at the source URL hasn't changed, when I run `nix flake update` and both paths exist, all is good: `nix` fetches the source URL with an `etag`, gets a `304` response, and does nothing. But if the source file has been deleted from the store, `nix` redownloads the source file, regardless of whether the unpacked store path is still valid.

This commit makes sure that if the unpacked source exists in the store, the source file is only downloaded if it has changed.
Context
- Change `cache->lookupExpired(store, attrs)` so it returns the cache entry along with whether the store path is still valid, instead of just throwing it away when the path is invalid.
- Add `onlyIfChanged` to `downloadFile`. When `true`, `downloadFile` only downloads the source file if it differs from the cached version, even when the source file doesn't exist in the store.
- Change `downloadTarball` to use this parameter to achieve what I described in the Motivation section.

Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.