
Binary cache upload optimization #4075

Open

roberth opened this issue Sep 26, 2020 · 7 comments

Comments

@roberth
Member

roberth commented Sep 26, 2020

Is your feature request related to a problem? Please describe.

Binary cache uploads can take longer than necessary.

Describe the solution you'd like

I'd like to avoid compressing the NAR before determining that it's already available in the cache.
This can be done by naming the NAR file after the uncompressed hash.
It has a further benefit for the upload implementation: when the ValidPathInfo is already known (currently always the case), we can compress and upload simultaneously rather than sequentially, saving time even when we do actually need to compress and upload.
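
A minimal sketch of the intended flow (the URL, hash value, and helper names are illustrative; this is not the actual BinaryCacheStore code):

```cpp
// Sketch: skip compression when the cache already holds the NAR, keyed by
// the *uncompressed* NAR hash, which is already known from the ValidPathInfo.
#include <curl/curl.h>
#include <iostream>
#include <string>

// Returns true if `url` exists in the cache (HTTP 200 on a HEAD request).
static bool objectExists(const std::string & url) {
    CURL * curl = curl_easy_init();
    if (!curl) return false;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); // HEAD, don't fetch the body
    bool ok = curl_easy_perform(curl) == CURLE_OK;
    long status = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    curl_easy_cleanup(curl);
    return ok && status == 200;
}

int main() {
    // Placeholder: the uncompressed NAR hash comes from the ValidPathInfo,
    // so no compression is needed to compute the object key.
    std::string narHash = "<nar-hash-base32>";
    std::string url = "https://cache.example.org/nar/" + narHash + ".nar.xz";

    if (objectExists(url)) {
        std::cout << "NAR already in cache; skipping compression and upload\n";
    } else {
        std::cout << "uploading " << url << "\n";
        // ...stream the compressor's output directly into the upload,
        // rather than compressing fully before starting the upload...
    }
}
```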

Describe alternatives you've considered

Do we have a good reason to name the NAR file after its compressed hash? All I can think of is that it simplifies a possible integrity check, but that check can still be done by reading the narinfo files. In fact you probably want to read those anyway so you can garbage collect any incomplete uploads that didn't make it to the narinfo upload step.
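
For reference, a narinfo already records both hashes, roughly like this (all values are placeholders): the URL is currently derived from FileHash (the compressed hash), while NarHash is the uncompressed hash.

```
StorePath: /nix/store/<hash>-example-1.0
URL: nar/<file-hash>.nar.xz
Compression: xz
FileHash: sha256:<file-hash>
FileSize: 41232
NarHash: sha256:<nar-hash>
NarSize: 206080
References: <hash>-example-1.0
Sig: cache.example.org-1:<signature>
```

Under the proposed scheme the URL would become nar/<nar-hash>.nar.xz instead, so an integrity check would read the narinfo to find the expected FileHash rather than inferring it from the file name.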

Additional context

@zimbatm
Member

zimbatm commented Sep 26, 2020

Assuming a compromised cache, decompressing before verification also opens the door to zip-bomb equivalents that explode disk usage. I just wanted to raise the challenge; I don't think it's a significant enough threat to prevent this change :D

@roberth
Member Author

roberth commented Sep 26, 2020

@zimbatm that's an interesting thought though!

> Assuming a compromised cache

Normally I'd say you have bigger problems in that case, but perhaps it's useful to have an untrusted cache for content-addressable stuff like sources: a cache where you don't configure a public key and instead rely on CA hashes for verification.

But even then, before unpacking we will have queried the narinfo, which tells us the size, and we can use that to abort decompression if necessary. Or of course we can bail out even earlier, if we decide that the recorded size is too big in the first place.

Currently Nix doesn't seem to have any protection against large NARs, so it's kind of an orthogonal issue.
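
A minimal sketch of that guard, assuming the NarSize from the narinfo is passed down to the decompression sink (the names are illustrative, not the actual Nix Sink API):

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string_view>

// Wraps a downstream sink and aborts as soon as more bytes arrive than
// the NarSize recorded in the narinfo promised.
struct SizeLimitedSink {
    std::function<void(std::string_view)> inner; // downstream consumer
    size_t expectedNarSize;                      // NarSize from the narinfo
    size_t received = 0;

    void operator()(std::string_view data) {
        received += data.size();
        if (received > expectedNarSize)
            throw std::runtime_error(
                "decompressed NAR exceeds the NarSize recorded in the narinfo");
        inner(data);
    }
};
```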

@Ericson2314
Member

Hehe, I hadn't heard about zip bombs, how fun! I think the risk is lower for us because we compress whole archives rather than individual files? When we decompress, we can just bail out if the output exceeds the NAR size, which is also specified in the narinfo file.

@zimbatm
Member

zimbatm commented Sep 27, 2020

Oops, I didn't mean to start a thread, please ignore my comment :D

I think that this issue is a good idea.

@stale

stale bot commented Mar 31, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Mar 31, 2021
@stale

stale bot commented Apr 19, 2022

I closed this issue due to inactivity.

@stale stale bot closed this as completed Apr 19, 2022
@thufschmitt thufschmitt reopened this Feb 24, 2023
@roberth
Member Author

roberth commented Feb 28, 2023

Perhaps of interest to @zhaofengli, author of attic.

@stale stale bot removed the stale label Feb 28, 2023