fetchgit: add git signature verification #330457
flandweber wants to merge 1 commit into NixOS:master
Conversation
1d0313a to e2e2ab6
This feels pretty hacky so I'd be glad for any ideas of how to improve this.
In my view it should be the default to include the passed reference/tag in the .git output, but changing this now would break backwards-reproducibility.
Maybe this could be an additional --fetch-tag flag for nix-prefetch-git?
Requesting a review from myself to look later. I like this idea, but the hacks make me think there ought to be precursor patches that (say) make the callers of
479ebf4 to e567d25
rebased to master
Would it be possible to perform verification directly in the first pass?
Otherwise we are introducing a performance/security tradeoff, and some people may choose not to enable signature verification, to avoid bloating the source size in the store.
Do you mean as part of the downloading fetcher?
Sadly not, as the fetcher is a fixed-output derivation.
If the verification were part of it instead of a separate component, it would not run if files with the same checksum already exist in the Nix store.
I tried to explain the problem in detail in https://landweber.xyz/ba.pdf Section 5.1.5
Also see #43233 (comment)
From my understanding, enabling leaveDotGit should not inflate the source size badly, but I didn't test this.
In any case, this is only relevant when building the derivation. When substituting it, users will only download the files (unless they themselves enabled leaveDotGit).
Sorry for the late reply, life was somewhat difficult this year. 😓
From my understanding, enabling leaveDotGit should not inflate the source size badly, but I didn't test this.
It can in repositories with deep histories, like nixpkgs itself. (Sorry, I tried to produce an example but GitHub is having issues tonight, so fetchGit was taking ~forever.)
Sadly not, as the fetcher is a fixed-output derivation. If the verification were part of it instead of a separate component, it would not run if files with the same checksum already exist in the Nix store.
In your BSc. thesis (and in particular the section you mentioned) the goal is to validate signatures when fetching nixpkgs itself, using builtins.fetchGit and without a known output hash. pkgs.fetchgit is different, and requires a known output hash; in fact, if one isn't specified, lib.fakeHash is used so there will be a mismatch and the user can fill in the actual hash, effectively creating a “trust on first use” model.
If the output hash is known a priori, an attacker cannot manipulate the fetcher's output. However, git signature validation would still have value in nixpkgs to protect that initial fetch (by a maintainer) and authenticate the versions of upstream packages which are being pinned (by hash) in nixpkgs.
In that case, wouldn't a single-pass implementation address the same threat model?
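For readers unfamiliar with the flow: a hedged sketch of that trust-on-first-use pattern. pkgs.fetchgit and lib.fakeHash are the real interfaces; the URL and tag are made up.

```nix
src = pkgs.fetchgit {
  url = "https://example.org/project.git";  # hypothetical repository
  rev = "refs/tags/v1.2.3";                 # hypothetical tag
  # First build: the fake hash guarantees a mismatch, and the resulting
  # error message reports the actual hash, which the maintainer then
  # pins here — "trust on first use".
  hash = lib.fakeHash;
};
```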
[...] When substituting it, users will only download the files (unless they themselves enabled leaveDotGit).
I think there are two distinct cases to consider:
1. users downloading a prebuilt (and preverified) source tree from a substituter;
2. users of fetchGit who are actually fetching from a repository (as well as performing signature verification).
In both single- and two-pass implementations, users in case (1) only receive the necessary data.
My concern is that the time and storage overhead of the two-pass implementation would cause users of fetchGit, whether nixpkgs contributors or nix users defining custom derivations, to simply not use signature validation (because it makes their builds slower or their system run out of disk space).
Sorry for the late reply, life was somewhat difficult this year. 😓
I'm sorry it was, and thankful for your participation! :)
It can in repositories with deep histories, like nixpkgs itself. (Sorry, I tried to produce an example but GitHub is having issues tonight, so fetchGit was taking ~forever.)
Usually fetchgit will only receive the latest commit (shallow clone).
Enabling deep cloning implies leaveDotGit, so the fetching would not add any storage overhead.
In your BSc. thesis (and in particular the section you mentioned) the goal is to validate signatures when fetching nixpkgs itself, using builtins.fetchGit and without a known output hash.
I'm sorry you got that impression, but the section "5.1.5 Verification in fixed-output derivations" does not talk about the builtin fetcher. It states:
[T]here are two distinct ways of implementing verifying fetchers in Nix. [...] As almost all fetchers depend upon network access at build time for the file download, they typically contain a fixed-output derivation and thus require a hash.
If the output hash is known a priori, an attacker cannot manipulate the fetcher's output. However, git signature validation would still have value in nixpkgs to protect that initial fetch (by a maintainer) and authenticate the versions of upstream packages which are being pinned (by hash) in nixpkgs. In that case, wouldn't a single-pass implementation address the same threat model?
Right, that's a good question. Usually hash updates will not be performed by maintainers but by the merge bot.
Having the signature verification execute on every build ensures that the bot updated to a signed release.
I also loathe verifications that are not run on every rebuild but sit in the code as if they were. It should not be possible to build a fetchGit derivation with the correct commit/sha256 hash but an invalid signature key.
My concern is that the time and storage overhead of the two-pass implementation would cause users of fetchGit, whether nixpkgs contributors or nix users defining custom derivations, to simply not use signature validation (because it makes their builds slower or their system run out of disk space).
I share your concern regarding the time overhead (not space, as described above), because making the .git folder 'deterministic' is expensive on large repositories. However, as I don't see the one-pass approach as a viable option, I'd be willing to put this computation requirement on fetchGit users.
Thanks a lot for the good work in here, @flandweber! ❤️
79f902b to 80007ef
philiptaron left a comment
Thanks for your patience in waiting for a reviewer.
I'm approving the approach. As I understand it, that is:
- Clean the git checkout if needed in the fixed-output derivation.
- Do the signature checking in a separate derivation. I buy your argument that the current FOD implementation makes these sorts of verifications in nix code hard to get right.
I'd like to see three "neatness" changes:
- Move the hook into nativeBuildInputs
- Move the verifying Nix code into its own file (maybe pkgs/build-support/fetchgit/verify.nix)
- Import and use that code if the user requests it.
I also wonder if you need to copy the contents from fetchresult, and whether you could just make out be a symlink to the fetchresult.
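If I understand the suggestion, the verifying derivation could be sketched roughly like this (hypothetical names throughout; verifySignatures stands in for whatever performs the actual checks):

```nix
# Non-FOD wrapper: run the signature checks, then expose the fetched
# FOD unchanged via a symlink instead of copying its contents.
verified = runCommand "${fetchresult.name}-verified" { } ''
  ${verifySignatures}        # hypothetical script performing the checks
  ln -s ${fetchresult} $out  # no copy; output just points at the FOD
'';
```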
Let's avoid this style of hook -- by my count using rg _HOOK -tnix --sort=path, this would be the only one in nixpkgs. I think you can use nativeBuildInputs to add the hook conditionally.
I believe I don't yet understand your suggestion.
Do you mean that the hook should be placed in its own script and conditionally passed to nativeBuildInputs?
@flandweber Philip is presumably talking about setup hooks.
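For context, a minimal sketch of that conditional setup-hook structure, assuming a hypothetical verify-hook.sh file holding the current hook logic (makeSetupHook is the standard nixpkgs helper):

```nix
# Hypothetical sketch: wrap the verification script as a setup hook...
verifyHook = makeSetupHook { name = "git-verify-signature-hook"; } ./verify-hook.sh;

# ...and pull it in only when the user asked for verification, instead
# of exporting an ad-hoc *_HOOK environment variable:
nativeBuildInputs = lib.optionals (verifyCommit || verifyTag) [ verifyHook ];
```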
e319dd7 to 060124f
@philiptaron Thank you so much for your review! I would need some elaboration on the
yes, except when
e103897 to 7dc4bd9
I've added Adam @me-and on this review; they've helped enormously with all Git-related PRs in the last few months. This PR is not currently in a reviewable state due to the errors from CI and the merge conflicts, but it does represent something I would in concept like to merge.
Thanks!
I agree that two-pass is the only reasonable way to implement this. That said, I don't think you need to copy the entire contents into the second derivation. I believe it will work to do the verification, then have the verifying derivation contain a symlink to the first derivation unchanged.
452d9fe to 4eb8f5e
Could you work on getting to a clean CI bill of health? You can run many of these tests locally. See ci/eval/README.md for the eval tests, which are failing, and run
4eb8f5e to cacceb1
Thanks for the tag! The high-level concept sounds excellent to me, although I'm helping run an event in a couple of weeks so I'm not going to be able to provide any more useful feedback until mid/late September.
cacceb1 to 89c966b
89c966b to 510a5a1
510a5a1 to 075850d
One thing that came up during redrafting of this PR was key expiry, which is why I added a mechanism so that gpg signatures are checked at the creation time of the corresponding commit. For tags it would be nicer to check their signature at tag creation time, but I didn't figure out how to extract that yet. Other than that, the current state is a redraft of the work from a year ago. Regarding I'll add a proper commit message when we get closer to landing.
Referring to #8567 I presume. I should have a bit of time while waiting for transit to NixCon today, but it might slip a week or more.
Possibly @philiptaron has something cunning that I'm not aware of, but I think the
A big stack of small change requests, but I want to revisit one of the key assumptions for this approach.
From #330457 (comment):
Usually hash updates will not be performed by maintainers but by the merge bot. Having the signature verification execute on every build ensures that the bot updated to a signed release. I also loathe verifications that are not run on every rebuild but sit in the code as if they were. It should not be possible to build a fetchGit derivation with the correct commit/sha256 hash but an invalid signature key.
I don't think this holds. There are three scenarios I can think of:
1. fetchgit resolves to a derivation that is already realised in your local store. In that case, no build stage will be performed, and therefore there's no chance to perform any signature checking.
2. fetchgit resolves to a derivation that isn't realised in your local store but is available from a substituter/binary cache. Again, no build will be performed, so there's no chance to check signatures.
3. fetchgit resolves to a derivation that isn't available anywhere. In that case, Nix will attempt a build which will either complete successfully and with an output that matches the given FOD hash, or it'll fail. That's the case regardless of where the FOD hash came from, and this build step provides an opportunity to verify the hashes without needing the two-step process this PR uses.
A single-step approach here is sufficient: if the signatures validate, we can produce the output and validate the FOD hash. If the signatures don't validate, bail out (similar to requireFile) so no output gets produced.
Edit to add: the approach I'd suggest is to not have a two-stage derivation, but to perform the signature checking in NIX_PREFETCH_GIT_CHECKOUT_HOOK. If the signatures validate, great, carry on and finish the build. If the signatures don't validate, error out in the hook, which will cause the entire fetch to fail.
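A hedged sketch of that single-pass idea as a plain shell fragment. The function name is hypothetical, and git is stubbed out below so the sketch is self-contained; the real hook would run git verify-commit against the freshly fetched repository.

```shell
#!/usr/bin/env bash
# Hypothetical checkout-hook body: verify the signature in-pass, and
# fail the whole fetch when verification fails.
checkout_hook() {
  local rev="$1"
  if git verify-commit "$rev" 2>/dev/null; then
    echo "signature OK for $rev"
  else
    echo "signature verification failed for $rev" >&2
    return 1   # non-zero status aborts the fetch
  fi
}

git() { return 1; }   # stub for illustration: pretend verification fails
checkout_hook HEAD || echo "fetch aborted"
```

With the stub in place, the hook takes the failure branch and the caller sees the fetch abort.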
# create a keyring containing gpgKeys
gpgKeyring = runCommand "gpgKeyring" { buildInputs = [ gnupg ]; } ''
  gpg --homedir /build --no-default-keyring --keyring $out --fingerprint # create empty keyring at $out
  for KEY in ${lib.concatStringsSep " " gpgKeys}
gpgKeys is going to be a list of store paths, so I think this should be lib.escapeShellArgs gpgKeys.
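Sketched against the quoted snippet; the --import line is my assumption, since the loop body isn't shown in the diff:

```nix
gpgKeyring = runCommand "gpgKeyring" { buildInputs = [ gnupg ]; } ''
  gpg --homedir /build --no-default-keyring --keyring $out --fingerprint
  # lib.escapeShellArgs quotes each list element individually, so paths
  # with characters special to the shell survive word splitting:
  for KEY in ${lib.escapeShellArgs gpgKeys}; do
    gpg --homedir /build --no-default-keyring --keyring $out --import "$KEY"  # assumed loop body
  done
'';
```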
gitOnRepo
];
text = ''
  committerTime="$(gitOnRepo -c core.pager=cat log --format="%cd" --date=raw -n 1 ${revWithTag})"
core.pager shouldn't need to be set here; I think that adds complexity for no gain.
}
''
gpgWithKeys -k
if test "$verifyCommit" == 1; then
test "$verifyCommit" == 1 feels very fragile to me: it only works because runCommand converts true to the string 1 and false to the null string.
I'd much prefer [[ "$verifyCommit" ]]. Which does depend on false becoming the null string, but that's (IME) a much more common shell idiom, so I think is much more likely to hold indefinitely.
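A quick illustration of the two serializations in question (runCommand turns the Nix boolean true into the string "1" and false into the empty string):

```shell
#!/usr/bin/env bash
verifyCommit=1            # what Nix `true` becomes
if [[ "$verifyCommit" ]]; then   # non-empty string => truthy
  echo "verify: yes"
fi

verifyCommit=""           # what Nix `false` becomes
if [[ "$verifyCommit" ]]; then   # empty string => falsy
  echo "verify: yes"
else
  echo "verify: no"
fi
```

The `[[ "$var" ]]` form only relies on false becoming the empty string, not on true becoming exactly "1".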
sparseCheckout = builtins.concatStringsSep "\n" sparseCheckout;

if verifyCommit || verifyTag then
  verifySignature {
I think there should be documentation, probably here but definitely somewhere in these changes, to explain the need for the two-step verification. Something like the explanation at #330457 (comment) (notwithstanding my overall review comment) needs to be included in the repository itself, rather than needing someone to go looking for it in the PR discussions.
gitOnRepo \
  -c gpg.ssh.allowedSignersFile="${allowedSignersFile}" \
  -c gpg.program="gpgWithKeys" \
  verify-commit ${revWithTag}
I think ${revWithTag} needs to be ${lib.escapeShellArg revWithTag}: it's rare, but I'm fairly sure it's entirely valid for tags and branch names and so forth to contain spaces or other characters that need to be escaped in shell.
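A quick shell illustration of why the escaping matters, using a hypothetical tag name containing a space (lib.escapeShellArg on the Nix side produces the quoted form):

```shell
#!/usr/bin/env bash
rev='release 1.0'        # hypothetical tag name containing a space

set -- $rev              # unquoted interpolation: word splitting applies
echo "unquoted: $# arguments"

set -- "$rev"            # quoted/escaped: stays a single argument
echo "quoted: $# arguments"
```

Unquoted, the ref reaches git as two separate arguments; quoted, it arrives intact.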
name = "gitOnRepo";
runtimeInputs = [ git ];
text = ''
  git -C "${fetchresult}" -c safe.directory='*' "$@"
Nit: you're quoting a store path here, but not in other bits of shell code. I don't particularly mind which you do (personally, I'd either pass things in as environment variables then quote the variable, or use lib.escapeShellArg again), but consistency across this file would be nice.
gpgKeys = lib.catAttrs "key" keysPartitioned.right;
sshKeys = keysPartitioned.wrong;
I think these are confusingly inconsistent:
- gpgKeys is a list of store paths, where the paths are the result of the derivations in the keys attribute of the given publicKeys attrsets.
- sshKeys is a list of attrsets, with keys stored as strings in the key attribute of each attrset.
To fix this, I think (a) there should be different names for values that are expected to be paths versus strings, e.g. use key in SSH keys and keypath in GPG keys, and (b) the values in gpgKeys shouldn't be broken out until they're used, as with sshKeys, or sshKeys should be renamed to something that's clearly different like sshKeyAttrs.
Picking up a comment I missed in the scrollback:
So the scenario here would be:
I think this is a plausible attack vector, but I don't think this PR protects against it: specifying GPG/SSH signatures in the
I don't think this is true: if you build your system from scratch without using a substituter, you'd need to fetch the source code from the upstream repository, so you'd have as much chance to verify the signatures in a one-pass build as you would in a two-pass build.
Hey @me-and, thanks for taking the time!
I should've clarified the trust scenario further: in this example we do trust the nixpkgs builder/cache infrastructure but assume the update bot is compromised. If
Yes, thanks, that's a much clearer way of phrasing it.
A two-pass approach would have a separate verified source non-FOD, which
As you don't have the tor signing keys going into
Because of link rot you can't rebuild nixpkgs without relying on the FODs stored in the cache (because the original sources got lost). However, for using those you wouldn't have to trust the cache, as FODs are content-addressed.
Ah, I'd managed to miss that the derivation that has the signatures verified isn't a FOD. That makes more sense now, thank you!
I'd really like to see this scenario covered by a test, both demonstrating the vulnerability using regular
I also wonder if there's some sensible way to make it more obvious that
I expect the way to do that would be to make a new function, so a call would look something like this:
I'm not desperately sold on the idea, or even sure it'd work at all, but that structure or something like it does at least make it more obvious that the signature-verified derivation isn't a FOD, it just depends on one.
Right, I see. I'm not sure this is a model that's currently supported by Nix: I don't think you can use a cache for FODs while forcing anything that's not a FOD to be compiled locally. I'm also not a fan of the assertion that "you can't rebuild nixpkgs without relying on the FODs stored in the cache". That's probably true if you consider the entirety of nixpkgs, but the solution to that IMO is fixing the link rot, not relying on the binary cache, and I very much hope and expect that maintained packages have such problems fixed fairly rapidly. Fundamentally, the binary cache exists to speed up using Nixpkgs, and I'm not comfortable relying on it as a source code backup system. I've no problem with adding code here that would help resolve this concern, but I don't think it can be a motivator for bringing in these changes until and unless there's a mechanism for people to use a cache only for FODs.
But would this be feasible in nix-only code? If I'm not missing something, we'd be talking about spinning up a binary cache in a nixos-test. I'd love to see this demonstrated, but I'm not sure I'd be willing to go to such lengths to do so.
I totally understand where the confusion comes from when discussing the technical details of the PR; however, I don't think this would be of any relevance to the user, and it feels much more natural from a user's perspective to have this included in the fetcher (similar to the nix fetcher), but I see that it puts some burden on the fetchers in the form of added complexity. Of course I also have the bias of not wishing to re-implement all this, but a year ago I did consciously decide to change the fetcher itself. I guess in the end it also comes down to taste.
Well, yes you can by changing the store directory. I was investigating this for some time but got stuck because I don't have the resources to bootstrap nixpkgs.
I agree this isn't ideal nor this PR's primary aim, even though I may differ in the conclusion slightly ^^
@philiptaron @me-and Without wanting to seem pushy, I'd very much like to move this forward before the merge conflicts stack up again and it moves outside of all our attention spans ^^
Due to the SC election, I haven't had a chance to look at this. I appreciate your continued patience. I do want to land it.
Very much seconded, although the things getting in my way are entirely offline distractions. Spending some more quality time with this PR remains high on my to-do list.
Description of changes
This makes it possible to include git commit and tag signature verification when using fetchgit.
The motivation for this is to enable maintainers to ensure an update to the source code was authorized by the developers.
As signing keys are supposed to stay the same over multiple versions, they should remain unchanged when updating to a new revision of the source code.
The verification is done in a non-fixed-output derivation so it will always be executed.
See https://landweber.xyz/ba.pdf Section 5.1.5 and #43233 (comment).
This PR mimics the behavior of the verified-fetches experimental feature introduced by NixOS/nix#8848.