Optionally allow git filters to be applied when reading blobs#13993

Open
GrahamDennis wants to merge 8 commits into NixOS:master from GrahamDennis:gdennis/git-filter-config

Conversation

@GrahamDennis
Contributor

Motivation

Nix 2.20 introduced a regression (a breaking change) in which git repositories do not have filters applied properly (specifically, filters that are configured via .gitattributes). This manifests as "NAR hash mismatch" errors when users upgrade from earlier versions and have a dependency that uses .gitattributes to manipulate line endings. curl is the most notable example I am aware of: its .gitattributes specifies that *.bat files should have CRLF line endings.

This change differs from #13428 in that the nix 2.20 behavior is retained as the default, with the previous behavior now being opt-in via the optional applyFilters attribute to git flake inputs. As such, this PR does not introduce a breaking change.
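
As a minimal sketch of what opting in looks like from the shell, the helper below builds a `builtins.fetchGit` expression with or without `applyFilters = true`. The attribute name comes from this PR; the helper name `fetch_git_expr` and the example repository path are illustrative assumptions.

```bash
# Hypothetical helper: print a builtins.fetchGit expression for a repository.
# Usage: fetch_git_expr URL REF [true|false]
# The third argument opts in to applyFilters (the attribute added by this PR).
fetch_git_expr() {
  local url="$1"
  local ref="$2"
  local apply="${3:-false}"
  if [ "${apply}" = "true" ]; then
    printf '(builtins.fetchGit { url = "%s"; ref = "%s"; applyFilters = true; })' "${url}" "${ref}"
  else
    printf '(builtins.fetchGit { url = "%s"; ref = "%s"; })' "${url}" "${ref}"
  fi
}

# Example (requires a Nix build that includes this PR):
#   nix eval --raw --impure --expr "$(fetch_git_expr "file:///path/to/repo" master true).narHash"
```

Leaving the third argument out produces the expression for the default (Nix 2.20 and later) behaviour, so the two NAR hashes can be compared side by side.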

Context

Fixes #11428, alternative to #13428.

The current nix behaviour seems completely broken when it comes to handling git filters.

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

readonly REPO="https://github.com/curl/curl.git"
readonly COMMIT="6b951a6928811507d493303b2878e848c077b471"

readonly FILES=(
  # This file does not contain CRLF but has `eol=crlf` in `.gitattributes`.
  buildconf.bat

  # This file does not contain CRLF.
  README.md

  # This file contains CRLF and isn't matched by `.gitattributes`.
  winbuild/README.md
)

# Don't respect user/system configuration.
export GIT_CONFIG_GLOBAL=/dev/null
export GIT_CONFIG_NOSYSTEM=true

function check_crlf() {
  # Cache each clone in a directory keyed by the git options under test.
  local OUTPUT_DIR
  OUTPUT_DIR="curl-$(echo "$@" | sha256sum | awk '{print $1}')"

  if ! test -d "${OUTPUT_DIR}"; then
    git "$@" clone --quiet "${REPO}" "${OUTPUT_DIR}"
    git "$@" -C "${OUTPUT_DIR}" checkout --quiet "${COMMIT}"
  fi

  for FILE in "${FILES[@]}"; do
    if grep --quiet $'\r' "${OUTPUT_DIR}/${FILE}"; then
      echo "git $* => ${FILE} contains CRLF"
    else
      echo "git $* => ${FILE} does not contain CRLF"
    fi
  done

  echo
}

nix --version
for FILE in "${FILES[@]}"; do
  if grep --quiet $'\r' "$(nix eval --raw --expr "(builtins.fetchGit { url = \"${REPO}\"; ref = \"master\"; rev = \"${COMMIT}\"; }).outPath")/${FILE}"; then
    echo "nix $(nix --version) => ${FILE} contains CRLF"
  else
    echo "nix $(nix --version) => ${FILE} does not contain CRLF"
  fi
done
echo

check_crlf
check_crlf -c core.eol=lf
check_crlf -c core.eol=crlf
check_crlf -c core.eol=native
check_crlf -c core.autocrlf=false
check_crlf -c core.autocrlf=input
check_crlf -c core.autocrlf=true

The output of the above script on my machine is as follows:

nix (Nix) 2.29.1
nix nix (Nix) 2.29.1 => buildconf.bat does not contain CRLF
nix nix (Nix) 2.29.1 => README.md does not contain CRLF
nix nix (Nix) 2.29.1 => winbuild/README.md contains CRLF

git  => buildconf.bat contains CRLF
git  => README.md does not contain CRLF
git  => winbuild/README.md contains CRLF

git -c core.eol=lf => buildconf.bat contains CRLF
git -c core.eol=lf => README.md does not contain CRLF
git -c core.eol=lf => winbuild/README.md contains CRLF

git -c core.eol=crlf => buildconf.bat contains CRLF
git -c core.eol=crlf => README.md does not contain CRLF
git -c core.eol=crlf => winbuild/README.md contains CRLF

git -c core.eol=native => buildconf.bat contains CRLF
git -c core.eol=native => README.md does not contain CRLF
git -c core.eol=native => winbuild/README.md contains CRLF

git -c core.autocrlf=false => buildconf.bat contains CRLF
git -c core.autocrlf=false => README.md does not contain CRLF
git -c core.autocrlf=false => winbuild/README.md contains CRLF

git -c core.autocrlf=input => buildconf.bat contains CRLF
git -c core.autocrlf=input => README.md does not contain CRLF
git -c core.autocrlf=input => winbuild/README.md contains CRLF

git -c core.autocrlf=true => buildconf.bat contains CRLF
git -c core.autocrlf=true => README.md contains CRLF
git -c core.autocrlf=true => winbuild/README.md contains CRLF

This shows that the way Nix is handling git repositories is inconsistent with git, irrespective of the user and/or system git configuration.
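
The carriage-return check that the script relies on can be exercised in isolation. The following is a small sketch, with a hypothetical helper name `has_crlf`, that applies the same grep-based check to two temporary files; it does not involve Nix or git at all.

```bash
# Hypothetical helper: succeed if FILE contains at least one carriage return.
# Same idea as the grep in the script above, written without bash's $'\r'.
has_crlf() {
  local cr
  cr=$(printf '\r')
  grep -q "${cr}" "$1"
}

# Create one CRLF file and one LF-only file, then report on each.
demo_dir=$(mktemp -d)
printf 'echo hello\r\n' > "${demo_dir}/crlf.bat"
printf 'echo hello\n' > "${demo_dir}/lf.sh"
for f in crlf.bat lf.sh; do
  if has_crlf "${demo_dir}/${f}"; then
    echo "${f} contains CRLF"
  else
    echo "${f} does not contain CRLF"
  fi
done
rm -rf "${demo_dir}"
```

Note that the check only looks for a carriage return anywhere in the file, so it cannot tell a file with mixed line endings from one that uses CRLF throughout.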

Another manifestation of this bug is that the NAR hash for a git repository can change depending on the order of evaluations. This can be demonstrated by the following script:

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

nix --version
git clone --quiet https://github.com/joshuaspence/nix-crlf-test.git

rm -rf ~/.cache/nix
nix eval --impure --expr "builtins.fetchGit \"file://$(readlink -f nix-crlf-test)\""
nix eval --impure --expr "builtins.fetchGit { url = \"file://$(readlink -f nix-crlf-test)\"; ref = \"$(git -C nix-crlf-test branch --show-current)\"; rev = \"$(git -C nix-crlf-test rev-parse HEAD)\"; }"

rm -rf ~/.cache/nix
nix eval --impure --expr "builtins.fetchGit { url = \"file://$(readlink -f nix-crlf-test)\"; ref = \"$(git -C nix-crlf-test branch --show-current)\"; rev = \"$(git -C nix-crlf-test rev-parse HEAD)\"; }"
nix eval --impure --expr "builtins.fetchGit \"file://$(readlink -f nix-crlf-test)\""

rm -rf nix-crlf-test

The output of this script is as follows:

nix (Nix) 2.29.1
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-k7u7RAaF+OvrbtT3KCCDQA8e9uOdflUo5zSgsosoLzA="; outPath = "/nix/store/pbm7g5wjg44d1z7byaivhcs9rrv58fqf-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-k7u7RAaF+OvrbtT3KCCDQA8e9uOdflUo5zSgsosoLzA="; outPath = "/nix/store/pbm7g5wjg44d1z7byaivhcs9rrv58fqf-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-BBhuj+vOnwCUnk5az22PwAnF32KE1aulWAVfCQlbW7U="; outPath = "/nix/store/9vi7nc2507l5fjyd0cg6fgbrikncpjmw-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-BBhuj+vOnwCUnk5az22PwAnF32KE1aulWAVfCQlbW7U="; outPath = "/nix/store/9vi7nc2507l5fjyd0cg6fgbrikncpjmw-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the with-tests (Issues related to testing. PRs with tests have some priority) and fetching (Networking with the outside (non-Nix) world, input locking) labels Sep 15, 2025
@roberth roberth self-assigned this Sep 24, 2025
@xokdvium
Contributor

Discussed in team meeting today:

  • Preferred to flipping the default.
  • @roberth volunteers for review.

@xokdvium xokdvium moved this from Triage to 🏁 Review in Nix team Sep 24, 2025
@GrahamDennis
Contributor Author

For awareness, I have built a second patch on top of this patch to "smooth over" the breaking change that was made between nix < 2.20 and nix >= 2.20 with the handling of .gitattributes. At a high-level this second patch:

  • If a NAR hash mismatch is identified using the modern treatment of .gitattributes (ignored), it tries again with the legacy behaviour (respect .gitattributes). This enables both "old" and "new" NAR hashes to be supported transparently for reading.
  • It creates an experimental feature which, when enabled, causes new NAR hashes to be written in the old, legacy format. When disabled (the default), NAR hashes are written in the new format.
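
The read path in the first bullet amounts to a two-step check. The sketch below illustrates that control flow only; it is not the code from the patch. `hash_without_filters` and `hash_with_filters` are hypothetical stand-ins for computing a NAR hash with .gitattributes ignored and respected, stubbed here with fixed strings so that the logic can be run on its own.

```bash
# Hypothetical stand-ins for the two ways of hashing a fetched tree.
# A real implementation would compute NAR hashes; these are stubs.
hash_without_filters() { echo "sha256-modern"; }  # .gitattributes ignored (Nix >= 2.20)
hash_with_filters()    { echo "sha256-legacy"; }  # .gitattributes respected (Nix < 2.20)

# Print which treatment matches the locked hash, or fail if neither does.
verify_locked_hash() {
  local expected="$1"
  if [ "$(hash_without_filters)" = "${expected}" ]; then
    echo "modern"
  elif [ "$(hash_with_filters)" = "${expected}" ]; then
    # Only reached after the modern hash mismatched: retry with legacy behaviour.
    echo "legacy"
  else
    echo "NAR hash mismatch" >&2
    return 1
  fi
}

verify_locked_hash "sha256-legacy"  # falls back to the legacy treatment
```

Trying the modern treatment first keeps the common case cheap; the legacy computation only runs for locks written by older Nix versions.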

This second patch is intended to support a staged migration at Day Job:

  • Read-only nix workflows can upgrade to this patched version of nix (from nix < 2.20) immediately.
  • Read-write nix workflows can upgrade to this patched version of nix with the experimental feature enabled to continue writing in the old legacy format. (During this window all NAR hashes remain in the old format)
  • After a declared deprecation period, we will then disable the experimental feature (now NAR hashes will be written in the new format)
  • After a declared deprecation period, everyone can then upgrade to an unpatched version of nix.

This additional patch is available here (for visibility): 2.31-maintenance...GrahamDennis:nix:gdennis/backcompat-2.31.2

If it is desired or considered more broadly useful, I can add this as a separate PR on top of this PR (assuming a version of this PR is accepted). My assumption is that this use case is sufficiently niche to not be worth merging into nix.

@xokdvium
Contributor

xokdvium commented Nov 8, 2025

@roberth, do you plan on reviewing this PR still or should I take over?

Member

@roberth roberth left a comment

@xokdvium I'd appreciate that, as I don't have the bandwidth right now for an in depth review.

My main remaining worry is that a current or future libgit2 version may apply smudge/clean filters as part of the current function call. Those are a significant risk in terms of fetcher reproducibility and therefore evaluation reproducibility, which is why we only opt in to well-behaved filters like Git LFS and possibly others in the future.

This risk should be mitigated with another test case that sets up a custom smudge/clean filter and asserts that the filter did not run.
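
One possible shape for such a test is sketched below. Everything in it is an assumption made for illustration: the filter name `marker`, the function names, and the choice to define the filter in a separate git config file that the test then exports as `GIT_CONFIG_GLOBAL`. Whether the fetcher consults that configuration at all is exactly what the test would probe.

```bash
# Sketch of a fixture: a repo whose .gitattributes requests a custom filter,
# plus a filter definition that leaves a marker file whenever it runs.
# Usage: setup_marker_filter_repo REPO_DIR GIT_CONFIG_FILE MARKER_FILE
setup_marker_filter_repo() {
  local repo="$1"
  local config="$2"
  local marker="$3"
  git init --quiet "${repo}"
  git -C "${repo}" symbolic-ref HEAD refs/heads/master
  printf '*.txt filter=marker\n' > "${repo}/.gitattributes"
  printf 'hello\n' > "${repo}/file.txt"
  git -C "${repo}" add .gitattributes file.txt
  git -C "${repo}" -c user.name=test -c user.email=test@example.com \
    commit --quiet -m "add filtered file"
  # Define the filter only after committing, then clear any stale marker.
  git config --file "${config}" filter.marker.smudge "touch '${marker}' && cat"
  git config --file "${config}" filter.marker.clean "touch '${marker}' && cat"
  rm -f "${marker}"
}

# Fetch with applyFilters = true and fail if the custom filter left its marker.
# Requires a Nix build that includes this PR.
assert_filter_not_run() {
  local repo="$1"
  local marker="$2"
  nix eval --raw --impure --expr \
    "(builtins.fetchGit { url = \"file://${repo}\"; ref = \"master\"; applyFilters = true; }).outPath" \
    > /dev/null || return 1
  if [ -e "${marker}" ]; then
    echo "custom filter ran during fetch" >&2
    return 1
  fi
}

# Example wiring inside the test suite:
#   export GIT_CONFIG_GLOBAL="$TEST_ROOT/gitconfig"
#   setup_marker_filter_repo "$TEST_ROOT/filter-repo" "$GIT_CONFIG_GLOBAL" "$TEST_ROOT/filter-ran"
#   assert_filter_not_run "$TEST_ROOT/filter-repo" "$TEST_ROOT/filter-ran"
```

A marker file is used instead of inspecting the fetched content because a pass-through filter leaves the content unchanged, so only a side effect can show that it ran.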

My apologies for the delay.

rm -rf $TEST_HOME/.cache/nix
export GIT_CONFIG_GLOBAL="$TEST_ROOT/gitconfig"
git config --global core.autocrlf true
narhash=$(nix eval --raw --impure --expr "(builtins.fetchGit { url = \"$repo\"; ref = \"master\"; }).narHash")
Member

This may succeed due to the fetcher cache.
Maybe run this as the very first test case.
Then at least the error will be caught by the second run (without autocrlf), where it produces a hash mismatch.
It should be possible to mitigate this with --tarball-ttl 0, but I haven't checked.

"publicKeys",
"url", "ref", "rev", "shallow", "submodules", "lfs", "exportIgnore",
"lastModified", "revCount", "narHash", "allRefs", "name", "dirtyRev", "dirtyShortRev",
"verifyCommit", "keytype", "publicKey", "publicKeys", "applyFilters",
Member

As an aside, it's strange that this doesn't need a change to .clang-format.

git_blob_filter_options opts = GIT_BLOB_FILTER_OPTIONS_INIT;

opts.attr_commit_id = state->oid;
opts.flags = GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT;
Member

Suggested change:
- opts.flags = GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT;
+ opts.flags = GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT | GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES;

For reproducibility, we don't want the filtering to depend on the user's configuration.

[[ "$narhash" = "sha256-k7u7RAaF+OvrbtT3KCCDQA8e9uOdflUo5zSgsosoLzA=" ]]


# Ensure that NAR hash doesn't depend on user configuration.
Member

Surprised this succeeds given that GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES isn't currently passed.


nix eval --impure --expr "let attrs = builtins.fetchGit $empty; in assert attrs.lastModified != 0; assert attrs.rev != \"0000000000000000000000000000000000000000\"; assert attrs.revCount == 1; true"

# Test a repo with `eol=crlf`.
Member

One thing we need to test here is what happens if a file has the text attribute but no eol attribute, e.g.

*.txt text

According to the git man page, it will use the system default, i.e. convert to CR/LF on Windows. That's bad because it means the NAR hash of a flake input will depend on the system type.

However, it's not clear to me what libgit2 does. It does have

#ifdef GIT_WIN32
	GIT_EOL_NATIVE = GIT_EOL_CRLF,
#else
	GIT_EOL_NATIVE = GIT_EOL_LF,
#endif
	GIT_EOL_DEFAULT = GIT_EOL_NATIVE,

but from what I can tell this might not have an effect if GIT_BLOB_FILTER_NO_SYSTEM_ATTRIBUTES is passed. But we should add a test that checks that such a repo produces the expected NAR hash.
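
A rough sketch of such a test follows. The function names are hypothetical, and the expected NAR hash is deliberately left as a parameter: it would have to be recorded once from a known-good run, because its value is not given anywhere in this discussion.

```bash
# Sketch of a fixture for the "text attribute without eol" case:
# an LF-only file matched by `*.txt text`, with no eol attribute set.
setup_text_attr_repo() {
  local repo="$1"
  git init --quiet "${repo}"
  git -C "${repo}" symbolic-ref HEAD refs/heads/master
  printf '*.txt text\n' > "${repo}/.gitattributes"
  printf 'line one\nline two\n' > "${repo}/file.txt"
  git -C "${repo}" add .gitattributes file.txt
  git -C "${repo}" -c user.name=test -c user.email=test@example.com \
    commit --quiet -m "add text file without eol attribute"
}

# Compare the NAR hash obtained with applyFilters = true against a pinned value.
# Usage: check_text_attr_narhash REPO_DIR EXPECTED_NARHASH
# Requires a Nix build that includes this PR.
check_text_attr_narhash() {
  local repo="$1"
  local expected="$2"
  local narhash
  narhash=$(nix eval --raw --impure --expr \
    "(builtins.fetchGit { url = \"file://${repo}\"; ref = \"master\"; applyFilters = true; }).narHash") || return 1
  [ "${narhash}" = "${expected}" ]
}
```

Running the same check with the same pinned hash on every supported platform is what would expose a dependency on the system's native line ending.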

opts.attr_commit_id = state->oid;
opts.flags = GIT_BLOB_FILTER_ATTRIBUTES_FROM_COMMIT;

int error = git_blob_filter(&filtered, blob.get(), path.rel_c_str(), &opts);
Member

One perhaps theoretical concern is that this applies all the builtin libgit2 filters. So if a new filter is added to libgit2 in the future, a version of Nix linked against the new libgit2 might produce a different result than when it's linked against the old one.

It may be better to use git_filter_list_apply_to_blob because that allows specifying an explicit list of filters. However, I don't see a way to construct a git_filter_list containing only the expected filters...

[[ "$narhash" = "sha256-BBhuj+vOnwCUnk5az22PwAnF32KE1aulWAVfCQlbW7U=" ]]
narhash=$(nix eval --raw --impure --expr "(builtins.fetchGit { url = \"$repo\"; ref = \"master\"; applyFilters = true; }).narHash")
[[ "$narhash" = "sha256-k7u7RAaF+OvrbtT3KCCDQA8e9uOdflUo5zSgsosoLzA=" ]]
unset GIT_CONFIG_GLOBAL
Member

We should probably have a test here for the other builtin filter, ident, which replaces $Id$ in source files.

edolstra added a commit to DeterminateSystems/nix-src that referenced this pull request Nov 20, 2025
Taken from NixOS#13993.

Co-authored-by: Josh Spence <jspence@anduril.com>
Co-authored-by: Graham Dennis <gdennis@anduril.com>

Labels

fetching: Networking with the outside (non-Nix) world, input locking
with-tests: Issues related to testing. PRs with tests have some priority

Projects

Status: 🏁 Review

Development

Successfully merging this pull request may close these issues.

NAR hash mismatch cloning git repository with crlfs introduced in 2.20

5 participants