
fetchgit: add git signature verification#330457

Open
flandweber wants to merge 1 commit into NixOS:master from flandweber:fetchgit-verified

Conversation

@flandweber
Contributor

@flandweber flandweber commented Jul 27, 2024

Description of changes

This makes it possible to include git commit and tag signature verification when using fetchgit.

The motivation for this is to enable maintainers to ensure an update to the source code was authorized by the developers.
As signing keys are supposed to stay the same over multiple versions, they should remain unchanged when updating to a new revision of a given source code.

The verification is done in a non-fixed-output derivation so it will always be executed.
See https://landweber.xyz/ba.pdf Section 5.1.5 and #43233 (comment).

The PR mimics the behavior of the verified-fetches experimental feature introduced by NixOS/nix#8848.
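For illustration, a call using the new arguments might look like this (a hypothetical sketch, not the final API: the attribute names verifyTag and publicKeys follow the verified-fetches feature and the diff discussed later in this thread; the URL, revision, hash, and key are placeholders):

```nix
# Hypothetical sketch; attribute names follow Nix's verified-fetches
# feature, and all concrete values here are placeholders.
fetchgit {
  url = "https://example.org/upstream/project.git";
  rev = "refs/tags/v1.2.3";
  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  verifyTag = true;
  publicKeys = [
    { key = "ssh-ed25519 AAAAC3Nza... maintainer@example.org"; }
  ];
}
```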

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the 6.topic: fetch Fetchers (e.g. fetchgit, fetchsvn, ...) label Jul 27, 2024
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. labels Jul 27, 2024
@ofborg ofborg bot added 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. and removed 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Jul 30, 2024
Contributor Author

This feels pretty hacky so I'd be glad for any ideas of how to improve this.
In my view it should be the default to include the passed reference/tag in the .git output, but changing this now would break backwards-reproducibility.
Maybe this could be an additional --fetch-tag flag for nix-prefetch-git?

@philiptaron philiptaron self-requested a review September 5, 2024 18:27
@philiptaron
Contributor

Requesting a review from myself to look later. I like this idea, but the hacks make me think there ought to be precursor patches that (say) make the callers of fetchgit tell you whether something is a tag or not.

@infinisil infinisil added the 1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Sep 6, 2024
@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Sep 10, 2024
@flandweber flandweber removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Sep 11, 2024
@flandweber
Contributor Author

rebased to master

Contributor

@nicoonoclaste nicoonoclaste Sep 18, 2024

Would it be possible to perform verification directly in the first pass?

Otherwise we are introducing a performance/security tradeoff, and some people may choose not to enable signature verification, to avoid bloating the source size in the store.

Contributor Author

Do you mean as part of the downloading fetcher?

Sadly not, as the fetcher is a fixed-output derivation.
If the verification were part of it instead of a separate component, it would not run if files with the same checksum already exist in the Nix store.
I tried to explain the problem in detail in https://landweber.xyz/ba.pdf Section 5.1.5
Also see #43233 (comment)

From my understanding, enabling leaveDotGit should not inflate the source size badly, but I didn't test this.
In any case, this is only relevant when building the derivation. When substituting it, users will only download the files (unless they themselves enabled leaveDotGit).

Contributor

Sorry for the late reply, life was somewhat difficult this year. 😓

From my understanding, enabling leaveDotGit should not inflate the source size badly, but I didn't test this.

It can in repositories with deep histories, like nixpkgs itself. (Sorry, I tried to produce an example but GitHub is having issues tonight, so fetchGit was taking ~forever.)

Sadly not, as the fetcher is a fixed-output derivation. If the verification were part of it instead of a separate component, it would not run if files with the same checksum already exist in the Nix store.

In your BSc. thesis (and in particular the section you mentioned) the goal is to validate signatures when fetching nixpkgs itself, using builtins.fetchGit and without a known output hash. pkgs.fetchgit is different, and requires a known output hash; in fact, if one isn't specified, lib.fakeHash is used so there will be a mismatch and the user can fill in the actual hash, effectively creating a “trust on first use” model.

If the output hash is known a priori, an attacker cannot manipulate[1] the fetcher's output. However, git signature validation would still have value in nixpkgs to protect that initial fetch (by a maintainer) and authenticate the versions of upstream packages which are being pinned (by hash) in nixpkgs.

In that case, wouldn't a single-pass implementation address the same threat model?

[...] When substituting it, users will only download the files (unless they themselves enabled leaveDotGit).

I think there are two distinct cases to consider:

  1. users downloading a prebuilt (and preverified) source tree from a substituter;
  2. users of fetchGit who are actually fetching from a repository (as well as performing signature verification).

In both single- and two-pass implementations, (1) only receives the necessary data.

My concern is that the time and storage overhead of the two-pass implementation would cause users of fetchGit, whether nixpkgs contributors or Nix users defining custom derivations, simply not to use signature validation (because it makes their builds slower or their systems run out of disk space).

Footnotes

  1. I am implicitly assuming the hash function to be 2nd-preimage resistant, but that is also a requirement for cryptographic signatures to work.

Contributor Author

Sorry for the late reply, life was somewhat difficult this year. 😓

I'm sorry to hear that, and thankful for your participation! :)


It can in repositories with deep histories, like nixpkgs itself. (Sorry, I tried to produce an example but GitHub is having issues tonight, so fetchGit was taking ~forever.)

Usually fetchgit will only receive the latest commit (shallow clone).
Enabling deep cloning implies leaveDotGit, so the fetching would not add any storage overhead.


In your BSc. thesis (and in particular the section you mentioned) the goal is to validate signatures when fetching nixpkgs itself, using builtins.fetchGit and without a known output hash.

I'm sorry you got that impression, but the section "5.1.5 Verification in fixed-output derivations" does not talk about the builtin fetcher. It states:

[T]here are two distinct ways of implementing verifying fetchers in Nix. [...] As almost all fetchers depend upon network access at build time for the file download, they typically contain a fixed-output derivation and thus require a hash.


If the output hash is known a priori, an attacker cannot manipulate[1] the fetcher's output. However, git signature validation would still have value in nixpkgs to protect that initial fetch (by a maintainer) and authenticate the versions of upstream packages which are being pinned (by hash) in nixpkgs. In that case, wouldn't a single-pass implementation address the same threat model?

Right, that's a good question. Usually hash updates will not be performed by maintainers but by the merge bot.
Having the signature verification execute on every build ensures that the bot updated to a signed release.
I also loathe verifications that are not run on every rebuild but sit in the code as if they were. It should not be possible to build a fetchGit derivation with the correct commit/sha256 hash but an invalid signature key.


My concern is that the time and storage overhead of the two-pass implementation would cause users of fetchGit, whether nixpkgs contributors or Nix users defining custom derivations, simply not to use signature validation (because it makes their builds slower or their systems run out of disk space).

I share your concern regarding the time overhead (not space, as described above), because making the .git folder 'deterministic' is expensive on large repositories. However, as I don't see the one-pass approach as a viable option, I'd be willing to put this computational cost on fetchGit users.

@nicoonoclaste
Contributor

Thanks a lot for the good work in here, @flandweber ! ❤️

@flandweber flandweber force-pushed the fetchgit-verified branch 3 times, most recently from 79f902b to 80007ef Compare September 18, 2024 10:50
Contributor

@philiptaron philiptaron left a comment

Thanks for your patience in waiting for a reviewer.

I'm approving the approach. As I understand it, that is:

  1. Clean the git checkout if needed in the fixed-output derivation.
  2. Do the signature checking in a separate derivation. I buy your argument that the current FOD implementation makes these sorts of verifications in nix code hard to get right.

I'd like to see three "neatness" changes:

  1. Move the hook into nativeBuildInputs
  2. Move the verifying Nix code into its own file (maybe pkgs/build-support/fetchgit/verify.nix)
  3. Import and use that code if the user requests it.

I also wonder if you need to copy the contents from fetchresult, and whether you could just make out be a symlink to the fetchresult.
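That suggestion might be sketched like this (hypothetical; `fetchresult` stands for the first-pass fixed-output derivation, and the actual verification commands are elided):

```nix
# Hypothetical sketch of the symlink idea: verify first, then link
# instead of copying the whole tree into a second store path.
runCommand "source-verified" { nativeBuildInputs = [ git gnupg ]; } ''
  # ... run commit/tag signature verification against ${fetchresult} ...
  ln -s ${fetchresult} $out
''
```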

Contributor

Let's avoid this style of hook -- by my count using rg _HOOK -tnix --sort=path, this would be the only one in nixpkgs. I think you can use nativeBuildInputs to add the hook conditionally.
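A sketch of that suggestion (hypothetical; `verifySignatureHook` names a setup-hook derivation that doesn't exist under that name in this PR):

```nix
# Hypothetical: add the setup hook only when verification is requested,
# rather than exporting a custom *_HOOK environment variable.
nativeBuildInputs = lib.optionals (verifyCommit || verifyTag) [
  verifySignatureHook
];
```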

Contributor Author

I believe I don't yet understand your suggestion.

Do you mean that the hook should be placed in its own script and conditionally passed to nativeBuildInputs?

Contributor

@flandweber Philip is presumably talking about setup hooks.

@flandweber
Contributor Author

flandweber commented Nov 9, 2024

@philiptaron Thank you so much for your review!

I would need some elaboration on the nativeBuildInputs idea.

I also wonder if you need to copy the contents from fetchresult, and whether you could just make out be a symlink to the fetchresult.

Yes, except when leaveDotGit = false.

@philiptaron philiptaron requested a review from me-and September 1, 2025 12:57
@philiptaron
Contributor

I've added Adam @me-and on this review; they've helped enormously with all Git-related PRs in the last few months. This PR is not currently in a reviewable state due to the errors from CI and the merge conflicts, but it does represent something I would in concept like to merge.

@flandweber
Contributor Author

I've added Adam @me-and on this review; they've helped enormously with all Git-related PRs in the last few months. This PR is not currently in a reviewable state due to the errors from CI and the merge conflicts, but it does represent something I would in concept like to merge.

Thanks!
I wanted to get the latest concept discussion out of the way before putting in the work to rewrite the PR based on master, but if you're on board I'll try to find the time soon to move this forward and get it up to date again.
Do you have thoughts on the one-pass/two-pass discussion?

@philiptaron
Contributor

philiptaron commented Sep 1, 2025

I agree that two-pass is the only reasonable way to implement this. That said, I don't think you need to copy the entire contents into the second derivation. I believe it will work to do the verification, then have the verifying derivation contain a symlink to the first derivation unchanged.

@philiptaron
Contributor

Could you work on getting to a clean CI bill of health? You can run many of these tests locally. See ci/eval/README.md for the eval tests, which are failing, and run nix-shell --run treefmt in the root to format your files, also failing.

@nixpkgs-ci nixpkgs-ci bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Sep 2, 2025
@me-and
Member

me-and commented Sep 2, 2025

I've added Adam @me-and on this review; they've helped enormously with all Git-related PRs in the last few months. This PR is not currently in a reviewable state due to the errors from CI and the merge conflicts, but it does represent something I would in concept like to merge.

Thanks for the tag! The high-level concept sounds excellent to me, although I'm helping run an event in a couple of weeks so I'm not going to be able to provide any more useful feedback until mid/late September.

@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Sep 2, 2025
@flandweber
Contributor Author

One thing that came up during the redrafting of this PR was key expiry, which is why I added a mechanism so that GPG signatures are checked against the creation time of the corresponding commit. For tags it would be nicer to check their signature against the tag creation time, but I haven't figured out how to extract that yet.

Other than that, the current state is a redraft of the work from a year ago.

Regarding nativeBuildInputs (#330457 (comment)), I still don't understand how they can be used to fetch the respective tag after fetching (and removing tags) but before making .git (semi-)deterministic. @philiptaron, could you spell it out for me a little further? Then I'm happy to implement something cleaner.

I'll add a proper commit message when we get closer to landing.

@philiptaron
Contributor

(semi-)deterministic

Referring to #8567 I presume.

I should have a bit of time while waiting for transit to NixCon today, but it might slip a week or more.

@me-and
Member

me-and commented Sep 22, 2025

Regarding nativeBuildInputs (#330457 (comment)), I still don't understand how they can be used to fetch the respective tag after fetching (and removing tags) but before making .git (semi-)deterministic. @philiptaron, could you spell it out for me a little further? Then I'm happy to implement something cleaner.

Possibly @philiptaron has something cunning that I'm not aware of, but I think the NIX_PREFETCH_GIT_CHECKOUT_HOOK is the only viable approach right now. Per my comment in 8567, I have grand plans to make that hook more accessible and usable, but I have far more plans than I do time to enact them ><

Member

@me-and me-and left a comment

A big stack of small change requests, but I want to revisit one of the key assumptions for this approach.

From #330457 (comment):

Usually hash updates will not be performed by maintainers but by the merge bot. Having the signature verification execute on every build ensures that the bot updated to a signed release. I also loathe verifications that are not run on every rebuild but sit in the code as if they were. It should not be possible to build a fetchGit derivation with the correct commit/sha256 hash but an invalid signature key.

I don't think this holds. There are three scenarios I can think of:

  • fetchgit resolves to a derivation that is already realised in your local store. In that case, no build stage will be performed, and therefore there's no chance to perform any signature checking.
  • fetchgit resolves to a derivation that isn't realised in your local store but is available from a substituter/binary cache. Again, no build will be performed, so there's no chance to check signatures.
  • fetchgit resolves to a derivation that isn't available anywhere. In that case, Nix will attempt a build which will either complete successfully and with an output that matches the given FOD hash, or it'll fail. That's the case regardless of where the FOD hash came from, and this build step provides an opportunity to verify the signatures without needing the two-step process this PR uses.

A single-step approach here is sufficient: if the signatures validate, we can produce the output and validate the FOD hash. If the signatures don't validate, bail out (similar to requireFile) so no output gets produced.

Edit to add: the approach I'd suggest is to not have a two-stage derivation, but to perform the signature checking in NIX_PREFETCH_GIT_CHECKOUT_HOOK. If the signatures validate, great, carry on and finish the build. If the signatures don't validate, error out in the hook, which will cause the entire fetch to fail.
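A minimal sketch of that suggestion, assuming the hook is a shell snippet run inside the fetcher after checkout (the hook variable name comes from this thread; the allowed-signers wiring is illustrative):

```nix
# Hypothetical sketch: abort the fixed-output fetch itself when the
# signature does not verify, so no output path is ever produced.
NIX_PREFETCH_GIT_CHECKOUT_HOOK = ''
  git -c gpg.ssh.allowedSignersFile=${allowedSignersFile} \
    verify-commit HEAD || exit 1
'';
```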

# create a keyring containing gpgKeys
gpgKeyring = runCommand "gpgKeyring" { buildInputs = [ gnupg ]; } ''
gpg --homedir /build --no-default-keyring --keyring $out --fingerprint # create empty keyring at $out
for KEY in ${lib.concatStringsSep " " gpgKeys}
Member

gpgKeys is going to be a list of store paths, so I think this should be lib.escapeShellArgs gpgKeys.
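That is, a sketch of the suggested fix to the quoted line (same embedded-shell context as the diff above):

```nix
for KEY in ${lib.escapeShellArgs gpgKeys}
```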

gitOnRepo
];
text = ''
committerTime="$(gitOnRepo -c core.pager=cat log --format="%cd" --date=raw -n 1 ${revWithTag})"
Member

core.pager shouldn't need to be set here; I think that adds complexity for no gain.

}
''
gpgWithKeys -k
if test "$verifyCommit" == 1; then
Member

test "$verifyCommit" == 1 feels very fragile to me: it only works because runCommand converts true to the string 1 and false to the null string.

I'd much prefer [[ "$verifyCommit" ]]. Which does depend on false becoming the null string, but that's (IME) a much more common shell idiom, so I think is much more likely to hold indefinitely.
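To illustrate the two idioms (assuming, as described above, that runCommand turns true into the string 1 and false into the empty string):

```shell
#!/usr/bin/env bash
verifyCommit=1    # what runCommand produces for `true`
[[ "$verifyCommit" ]] && echo "verification enabled"

verifyCommit=""   # what runCommand produces for `false`
[[ "$verifyCommit" ]] || echo "verification skipped"
```

`[[ "$var" ]]` tests only for a non-empty string, so it keeps working even if the truthy representation ever changes from 1.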

sparseCheckout = builtins.concatStringsSep "\n" sparseCheckout;

if verifyCommit || verifyTag then
verifySignature {
Member

I think there should be documentation, probably here but definitely somewhere in these changes, to explain the need for the two-step verification. Something like the explanation at #330457 (comment) (notwithstanding my overall review comment) needs to be included in the repository itself, rather than needing someone to go looking for it in the PR discussions.

gitOnRepo \
-c gpg.ssh.allowedSignersFile="${allowedSignersFile}" \
-c gpg.program="gpgWithKeys" \
verify-commit ${revWithTag}
Member

I think ${revWithTag} needs to be ${lib.escapeShellArg revWithTag}: it's rare, but I'm fairly sure it's entirely valid for tags and branch names and so forth to contain spaces or other characters that need to be escaped in shell.

name = "gitOnRepo";
runtimeInputs = [ git ];
text = ''
git -C "${fetchresult}" -c safe.directory='*' "$@"
Member

Nit: you're quoting a store path here, but not in other bits of shell code. I don't particularly mind which you do (personally, I'd either pass things in as environment variables then quote the variable, or use lib.escapeShellArg again), but consistency across this file would be nice.

Comment on lines +23 to +24
gpgKeys = lib.catAttrs "key" keysPartitioned.right;
sshKeys = keysPartitioned.wrong;
Member

I think these are confusingly inconsistent:

  • gpgKeys is a list of store paths, where the paths are the result of the derivations in the keys attribute of the given publicKeys attrsets.
  • sshKeys is a list of attrsets, with keys stored as strings in the key attribute of each attrset.

To fix this, I think (a) there should be different names for values that are expected to be paths versus strings, e.g. use key in SSH keys and keypath in GPG keys, and (b) the values in gpgKeys shouldn't be broken out until they're used, as with sshKeys, or sshKeys should be renamed to something that's clearly different like sshKeyAttrs.
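A sketch of the suggested renaming (the names here are hypothetical):

```nix
# Hypothetical renaming: make path-valued vs string-valued keys explicit.
gpgKeyPaths = lib.catAttrs "key" keysPartitioned.right;  # store paths (derivation outputs)
sshKeyAttrs = keysPartitioned.wrong;                     # attrsets with string `key`s
```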

@me-and
Member

me-and commented Sep 22, 2025

Picking up a comment I missed in the scrollback:

For an example of how an attack abusing signing with a one-pass approach would work, let's say I wanted to substitute pkgs.tor with a malicious version and it was protected by a signature as shown in the tests. I could then create a repository containing the modified tor src alongside an unrelated program. I would then add the unrelated program as a package to nixpkgs (it might be sufficient to get the PR built by Hydra, not sure how caching works here), adding the source code to the Nix cache (without any keys being checked). As a final step, I could then create a PR 'updating' tor to a new version and substituting the hash with the modified source code's hash. It would look like the developers' keys are being checked, but in effect they never are, because a package with the same hash already exists in the cache.

So the scenario here would be:

  • I create a malicious repo that can be used to build trojan and tor, where trojan is some benign tool, and tor looks like the regular tor package but has a malicious backdoor.
  • I get pkgs.trojan added to Nixpkgs, where pkgs.trojan.src is a FOD that fetches this repo.
  • Hydra / other unsuspecting users build pkgs.trojan, and therefore pick up pkgs.trojan.src.
  • I update pkgs.tor.src to have the same FOD hash as pkgs.trojan.src. Then when someone attempts to build pkgs.tor, it uses the code from the malicious repository that Hydra cached when it built pkgs.trojan.

I think this is a plausible attack vector, but I don't think this PR protects against it: specifying GPG/SSH signatures in the fetchgit call wouldn't stop Nix using the cached malicious code from the local store or binary cache if the FOD hash matched.

A further example would be trying to build your system from scratch, only relying on nixpkgs but not trusting the cache: a one-pass approach would not (re-)verify source signatures, while a two-pass would.

I don't think this is true: if you build your system from scratch without using a substituter, you'd need to fetch the source code from the upstream repository, so you'd have as much chance to verify the signatures in a one-pass build as you would in a two-pass build.

@flandweber
Contributor Author

Hey @me-and, thanks for taking the time!

Usually hash updates will not be performed by maintainers but by the merge bot. Having the signature verification execute on every build ensures that the bot updated to a signed release. I also loathe verifications that are not run on every rebuild but sit in the code as if they were. It should not be possible to build a fetchGit derivation with the correct commit/sha256 hash but an invalid signature key.

I don't think this holds. There are three scenarios I can think of:

* `fetchgit` resolves to a derivation that is already realised in your local store.  In that case, no build stage will be performed, and therefore there's no chance to perform any signature checking.

* `fetchgit` resolves to a derivation that isn't realised in your local store but is available from a substituter/binary cache. Again, no build will be performed, so there's no chance to check signatures.

* `fetchgit` resolves to a derivation that isn't available anywhere.  In that case, Nix will attempt a build which will either complete successfully _and_ with an output that matches the given FOD hash, or it'll fail.  That's the case regardless of where the FOD hash came from, and this build step provides an opportunity to verify the signatures without needing the two-step process this PR uses.

A single-step approach here is sufficient: if the signatures validate, we can produce the output and validate the FOD hash. If the signatures don't validate, bail out (similar to requireFile) so no output gets produced.

I should've clarified the trust scenario further: in this example we do trust the nixpkgs builder/cache infrastructure but assume the update bot is compromised. If fetchgit used one-pass signature verification, the update bot could therefore provide the hash of any FOD that was already built by the build infrastructure and have it accepted. It's essentially the same scenario laid out in #330457 (comment) and discussed in the following.

So the scenario here would be:

* I create a malicious repo that can be used to build `trojan` and `tor`, where `trojan` is some benign tool, and `tor` looks like the regular `tor` package but has a malicious backdoor.

* I get `pkgs.trojan` added to Nixpkgs, where `pkgs.trojan.src` is a FOD that fetches this repo.

* Hydra / other unsuspecting users build `pkgs.trojan`, and therefore pick up `pkgs.trojan.src`.

* I update `pkgs.tor.src` to have the same FOD hash as `pkgs.trojan.src`.  Then when someone attempts to build `pkgs.tor`, it uses the code from the malicious repository that Hydra cached when it built `pkgs.trojan`.

Yes, thanks, that's a much clearer way of phrasing it.

I think this is a plausible attack vector, but I don't think this PR protects against it: specifying GPG/SSH signatures in the fetchgit call wouldn't stop Nix using the cached malicious code from the local store or binary cache if the FOD hash matched.

A two-pass approach would have a separate verified-source non-FOD, which pkgs.tor would rely on.
The build graph would probably look something like this:

pkgs.trojan.src -> pkgs.trojan

one-pass fetchgit: pkgs.trojan.src -> pkgs.tor
two-pass fetchgit: pkgs.trojan.src -> trojan-src-verified -> pkgs.tor

As you don't have the tor signing keys going into trojan-src-verified, the two-pass build will fail on any honest builder, while the one-pass would indeed not protect against this attack.

A further example would be trying to build your system from scratch, only relying on nixpkgs but not trusting the cache: a one-pass approach would not (re-)verify source signatures, while a two-pass would.

I don't think this is true: if you build your system from scratch without using a substituter, you'd need to fetch the source code from the upstream repository, so you'd have as much chance to verify the signatures in a one-pass build as you would in a two-pass build.

Because of link rot you can't rebuild nixpkgs without relying on the FODs stored in the cache (because the original sources got lost). However, for using those you wouldn't have to trust the cache, as FODs are content-addressed.
In fact, when using a different store dir than /nix/store, FODs can still be fetched from the cache while non-FODs will be rebuilt.

@me-and
Member

me-and commented Sep 23, 2025

So the scenario here would be:

  • I create a malicious repo that can be used to build trojan and tor, where trojan is some benign tool, and tor looks like the regular tor package but has a malicious backdoor.

  • I get pkgs.trojan added to Nixpkgs, where pkgs.trojan.src is a FOD that fetches this repo.

  • Hydra / other unsuspecting users build pkgs.trojan, and therefore pick up pkgs.trojan.src.

  • I update pkgs.tor.src to have the same FOD hash as pkgs.trojan.src. Then when someone attempts to build pkgs.tor, it uses the code from the malicious repository that Hydra cached when it built pkgs.trojan.

Yes, thanks, that's a much clearer way of phrasing it.

I think this is a plausible attack vector, but I don't think this PR protects against it: specifying GPG/SSH signatures in the fetchgit call wouldn't stop Nix using the cached malicious code from the local store or binary cache if the FOD hash matched.

A two-pass approach would have a separate verified-source non-FOD, which pkgs.tor would rely on. The build graph would probably look something like this:

pkgs.trojan.src -> pkgs.trojan

one-pass fetchgit: pkgs.trojan.src -> pkgs.tor
two-pass fetchgit: pkgs.trojan.src -> trojan-src-verified -> pkgs.tor

As you don't have the tor signing keys going into trojan-src-verified, the two-pass build will fail on any honest builder, while the one-pass would indeed not protect against this attack.

Ah, I'd managed to miss that the derivation that has the signatures verified isn't a FOD. That makes more sense now, thank you!

I'd really like to see this scenario covered by a test, both demonstrating the vulnerability using regular fetchgit commands and failing the build when using signature verification. That's probably a much more complex test than the existing fetchgit tests, but I think it'll be a really valuable demonstration of the attack vector and how this PR protects against it.

I also wonder if there's some sensible way to make it more obvious that fetchgit { verifyTags = true; hash = <value>; ... } isn't a fixed output derivation. I think it's likely to be very surprising that fetchgit and its siblings go from always producing FODs to only sometimes producing FODs. I think that's where my confusion came from: I'm so used to that pattern generating a FOD that it hadn't occurred to me that it's doing something different here.

I expect the way to do that would be to make a new function, so a call would look something like this:

fetchgitchecksigs {
  verifyTags = true;
  keys = ...;
  src = fetchgit { # Or fetchFromGitHub or whatever
    url = ...;
    hash = ...;
    leaveDotGit = true;
  };
}

I'm not desperately sold on the idea, or even sure it'd work at all, but that structure or something like it does at least make it more obvious that the signature-verified derivation isn't a FOD, it just depends on one.

A further example would be trying to build your system from scratch, only relying on nixpkgs but not trusting the cache: a one-pass approach would not (re-)verify source signatures, while a two-pass would.

I don't think this is true: if you build your system from scratch without using a substituter, you'd need to fetch the source code from the upstream repository, so you'd have as much chance to verify the signatures in a one-pass build as you would in a two-pass build.

Because of link rot you can't rebuild nixpkgs without relying on the FODs stored in the cache (because the original sources got lost). However, using those doesn't require trusting the cache, as FODs are content-addressed. In fact, when using a store directory other than /nix/store, FODs can still be fetched from the cache while non-FODs will be rebuilt.

Right, I see. I'm not sure this is a model that's currently supported by Nix: I don't think you can use a cache for FODs while forcing anything that's not a FOD to be compiled locally.

I'm also not a fan of the assertion that "you can't rebuild nixpkgs without relying on the FODs stored in the cache". That's probably true if you consider the entirety of nixpkgs, but the solution to that IMO is fixing the link rot, not relying on the binary cache, and I very much hope and expect that maintained packages have such problems fixed fairly rapidly. Fundamentally, the binary cache exists to speed up using Nixpkgs, and I'm not comfortable relying on it as a source code backup system.

I've no problem with adding code here that would help resolve this concern, but I don't think it can be a motivator for bringing in these changes until and unless there's a mechanism for people to use a cache only for FODs.

@flandweber
Contributor Author

I'd really like to see this scenario covered by a test, both demonstrating the vulnerability using regular fetchgit commands and failing the build when using signature verification. That's probably a much more complex test than the existing fetchgit tests, but I think it'll be a really valuable demonstration of the attack vector and how this PR protects against it.

But would this be feasible in Nix-only code? If I'm not missing something, we'd be talking about spinning up a binary cache in a NixOS test. I'd love to see this demonstrated, but I'm not sure I'd be willing to go to such lengths to do so.

I also wonder if there's some sensible way to make it more obvious that fetchgit { verifyTags = true; hash = <value>; ... } isn't a fixed output derivation. I think it's likely to be very surprising that fetchgit and its siblings go from always producing FODs to only sometimes producing FODs. I think that's where my confusion came from: I'm so used to that pattern generating a FOD that it hadn't occurred to me that it's doing something different here.
I'm not desperately sold on the idea, or even sure it'd work at all, but that structure or something like it does at least make it more obvious that the signature-verified derivation isn't a FOD, it just depends on one.

I totally understand where the confusion comes from when discussing the technical details of the PR. However, I don't think this would be of any relevance to the user, and it feels much more natural from a user's perspective to have this included in the fetcher (similar to the Nix fetcher), though I see that it puts some burden on the fetchers in the form of added complexity.

Of course I also have the bias of not wishing to re-implement all this, but a year ago I did consciously decide to change the fetcher itself.

I guess in the end it also comes down to taste.

Because of link rot you can't rebuild nixpkgs without relying on the FODs stored in the cache (because the original sources got lost). However, using those doesn't require trusting the cache, as FODs are content-addressed. In fact, when using a store directory other than /nix/store, FODs can still be fetched from the cache while non-FODs will be rebuilt.

Right, I see. I'm not sure this is a model that's currently supported by Nix: I don't think you can use a cache for FODs while forcing anything that's not a FOD to be compiled locally.

Well, yes, you can by changing the store directory. I was investigating this for some time but got stuck because I don't have the resources to bootstrap nixpkgs.
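For reference, the kind of invocation being described is roughly the following (paths are hypothetical; whether a given substituter actually serves outputs into a non-default store depends on its configuration):

```shell
# With a non-default store directory, cached non-FOD outputs no longer
# apply (their paths embed /nix/store), so they are rebuilt locally,
# while fixed-output fetches can still be satisfied from the cache and
# are checked against their declared hashes.
nix build --store /home/alice/alt-store \
  --substituters https://cache.nixos.org \
  nixpkgs#hello
```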

I'm also not a fan of the assertion that "you can't rebuild nixpkgs without relying on the FODs stored in the cache". That's probably true if you consider the entirety of nixpkgs, but the solution to that IMO is fixing the link rot, not relying on the binary cache, and I very much hope and expect that maintained packages have such problems fixed fairly rapidly. Fundamentally, the binary cache exists to speed up using Nixpkgs, and I'm not comfortable relying on it as a source code backup system.

I've no problem with adding code here that would help resolve this concern, but I don't think it can be a motivator for bringing in these changes until and unless there's a mechanism for people to use a cache only for FODs.

I agree this isn't ideal, nor is it this PR's primary aim, even though I may differ slightly in the conclusion ^^

@flandweber
Contributor Author

@philiptaron @me-and Without wanting to seem pushy, I'd very much like to move this forward before the merge conflicts stack up again and it slips outside all our attention spans ^^
For me, text works well as a mode of communication (when I'm interested), but we could also hop on a call to talk things through if you think that would be easier.

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Sep 30, 2025
@philiptaron
Contributor

Due to the SC election, I haven't had a chance to look at this. I appreciate your continued patience. I do want to land it.

@flandweber
Contributor Author

Related: https://discourse.nixos.org/t/fix-your-fods-fods-and-security/71531/3

@me-and
Member

me-and commented Nov 2, 2025

Due to the SC election, I haven't had a chance to look at this. I appreciate your continued patience. I do want to land it.

Very much seconded, although the things getting in my way are entirely offline distractions. Spending some more quality time with this PR remains high on my to-do list.


Labels

1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. 2.status: merge conflict This PR has merge conflicts with the target branch 6.topic: fetch Fetchers (e.g. fetchgit, fetchsvn, ...) 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux.
