-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
narinfo: Change NAR URLs to be addressed on the NAR hash instead of the compressed hash #4464
Conversation
6d6f1c6
to
28bd796
Compare
Nice. Did you check what content-types were being used on upload to S3? |
The content type is always set to |
…he compressed hash This change is to simplify [Trustix](https://github.com/tweag/trustix) indexing and makes it possible to reconstruct this URL regardless of the compression used. In particular this means that https://github.com/tweag/trustix/blob/7c2e9ca597de233846e0b265fb081626ca6c59d8/contrib/nix/nar/nar.go#L61-L71 can be removed and only the bits that are required to establish trust needs to be published in the Trustix build logs.
28bd796
to
144cad9
Compare
The problem with this scheme is that it makes the files in |
I don't think this is semantically meaningful.
Afaik Nix checks for the remote store narinfo before copying. This should be a no-op on existing narinfos.
I'm not proposing getting rid of these entirely, at least not in this iteration.
|
I'm not sure on this, the file being named for what it is makes it easy to tell if the file is valid from an operational point of view. Additionally, I'm not sure it is a good idea: the structure presented in Nix is just one possible layout of NARs in the cache. It is very easy to imagine a binary cache which segments its data in a |
I just had a discussion with @adisbladis about this and decided to merge. Since Trustix stores trust mappings from store paths to NAR hashes, there is no way to get the URL of the compressed file in arbitrary binary caches if it's not named after the NAR hash. The main remaining issue is that we need to start ignoring the Compression field but we can do that in a future PR. |
I feel this is a big loss, but okay. Does this mean binary caches which don't store this way can't participate in trustix? I guess I wish Trustix didn't ad-hoc change properties of Nix's binary caches without a bit more discussion, and I feel disappointed I couldn't participate in representing my concerns further. |
You could participate purely to track reproducibility, but actually using it as an upstream would be unwieldy. Without a canonical address to access the NARs it's not possible for a binary cache proxy to serve NARs from multiple caches as compression schemes may differ, and therefore URLs are unpredictable, even though the actual unpacked contents are identical. This isn't just an issue for Trustix, but for anyone who wants to build a highly available binary cache proxy substituting from multiple upstreams. |
My main concerns here are how quickly the Nix codebase adopted this change, and how little discussion there was in how this could impact other use cases. This could easily spread as an assumption in other parts of the codebase, breaking use cases that are important to me. I'm not interested in building on sand, and I feel like my world has become sand. |
I want to raise a few problems with this:
Not saying that those are blockers, but they do affect our future decisions. |
To elaborate a bit further: When a request for a NAR comes in, and that URL is not strictly reproducible our cache proxy would have to store a substantial amount of state to be able to go from the non-reproducible URL back to the store path prefix so we can actively poll known binary caches for this content to get the actual URLs. |
Additionally:
I would much prefer binary cache proxies follow what I feel is the spec,: query the upstream cache for the narinfo to locate the nar, instead of trying to play games reconstructing URLs. |
You only need to know the derivation's hash, since that is how you look up the narinfo to find the nar... which is how binary caches work. |
That's not practical at all as I've already explained because of the massive amounts of state required.
You could implement this with redirects, additionally this doesn't need to invalidate any client caches. |
I don't understand why you need a lot of state to look up the narinfo file on a binary cache. |
There is an unbounded time period between when a client requests (and caches) a narinfo and when it (possibly) requests a NAR. When the request for the NAR finally comes in we have lost the context of which narinfo this NAR is associated with, so it's not possible to poll other binary caches for the same NAR but with a different compression. |
It looks to me that this change might be a bit more controversial was orginally thought, given the activity post-merge on this PR. Given that @grahamc and @domenkozar have both expressed (some of) their hesitations/problems with the change, would it be perhaps a good idea to revert this change for now and start an RFC for this? |
@adisbladis I noticed you reacted with a thumbs down, which means you do not agree with what I suggest. Could you please elaborate on why you do not agree? Do you think the change is not controversial, or perhaps you have another reason? @edolstra Would you mind commenting on your view? |
Going straight from "needs more discussion" to "write an RFC" is extreme and an excellent way to quash dissenting view points as it's puts an unreasonable burden on the author to push their work forward. At this moment (this is just a quick summary, any values should be seen as rough estimates) the average time to get an RFC merged is 113 days and a median of 48 days. I think reverting with the reason "needs more discussion" and report an issue is annoying but within the reasonable realm of responses, going straight for an RFC isn't. |
I don't follow the reasoning. RFCs are to explore the design space and come up with a great solution. I don't see it being extreme, but I suppose I also don't see the RFC process as punishment. I think big changes should get proper consideration and time for the large community of users to contribute to it. I would hope that we as a project can follow the process we have agreed to for exploring significant changes. |
Perhaps I should have suggested two things in stead:
@adisbladis Does that make more sense, and do we agree on that path forward? (edited) |
This sounds like a much more reasonable path forwards. |
Okay, let's revert until after the 2.4 branch-off for now. We can continue to discuss it here. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-7/11552/1 |
We can work around the deficiency that I tried to fix in NixOS/nix#4464 by constructing URLs which embed all the metadata until such a time that upstream Nix has a truly content-addressed NAR solution.
This change is to simplify Trustix indexing and makes it possible to reconstruct this URL regardless of the compression used.
In particular this means that https://github.com/tweag/trustix/blob/7c2e9ca597de233846e0b265fb081626ca6c59d8/contrib/nix/nar/nar.go#L61-L71 can be removed and only the bits that are required to establish trust needs to be published in the Trustix build logs.
A notable change is that the compressed files no longer comes with a file extension indicating their compression. This should be fine as the associated narinfo file has the compression metadata.
For applications where this is not feasible (like Trustix) the application could check the compression by the magic numbers.
I have manually tested if my local version of Nix (2.3.10) can still substitute from a binary cache created after this change and everything still works.
Related issues
Acknowledgements
This work is done within the context of Trustix and has been sponsored by a NLNet Foundation NGI-0 PET grant.
cc @Ericson2314 @domenkozar