libfetchers/git-utils: Avoid using git_writestream for small files #14689

Merged: Ericson2314 merged 3 commits into master from tarball-cache-faster on Dec 2, 2025

xokdvium (Contributor) commented Dec 2, 2025

Motivation

It turns out that libgit2 is quite naive here: each git_writestream creates
a new temporary file, such as .cache/nix/tarball-cache/objects/streamed_git2_6a82bb68dc0a3918,
which it then reads back. It does no internal buffering.
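As a rough illustration of the idea in the title, small files can be collected in memory and handed over in one call, falling back to a stream only when a file grows past some limit. This is a minimal sketch, not the code from this PR: `SmallFileSink`, its callbacks and the threshold are hypothetical names, and the callbacks stand in for whatever libgit2 calls the real implementation uses.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sink: buffers file contents in memory and only switches to a
// streaming writer once the contents exceed `threshold` bytes.
class SmallFileSink {
public:
    using WriteWhole = std::function<void(std::string_view)>;   // one-shot write of a whole small file
    using StreamChunk = std::function<void(std::string_view)>;  // write one chunk to a stream
    using StreamFinish = std::function<void()>;                 // finalize the stream

    SmallFileSink(std::size_t threshold, WriteWhole whole, StreamChunk chunk, StreamFinish finishStream)
        : threshold(threshold)
        , whole(std::move(whole))
        , chunk(std::move(chunk))
        , finishStream(std::move(finishStream))
    {
    }

    void write(std::string_view data)
    {
        if (streaming) {
            chunk(data);
            return;
        }
        if (buffer.size() + data.size() > threshold) {
            // Too large for the in-memory path: flush what we have into the
            // stream and keep streaming from here on.
            streaming = true;
            if (!buffer.empty())
                chunk(buffer);
            buffer.clear();
            chunk(data);
            return;
        }
        buffer.append(data);
    }

    void finish()
    {
        if (streaming)
            finishStream();
        else
            whole(buffer); // small file: a single call, no stream involved
    }

private:
    std::size_t threshold;
    WriteWhole whole;
    StreamChunk chunk;
    StreamFinish finishStream;
    std::string buffer;
    bool streaming = false;
};
```

With a sink like this, a file that stays under the threshold never touches the streaming path, so it would not trigger the per-stream temporary file described above.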

Doing (with a fresh fetcher cache) a simple:

strace -c nix flake metadata "https://releases.nixos.org/nixos/25.05/nixos-25.05.813095.1c8ba8d3f763/nixexprs.tar.xz" --store "dummy://?read-only=false"

(Before)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 31.05    2.372728           9    259790     81917 openat
 19.21    1.467784          30     48157           unlink
 10.43    0.796793           4    162898           getdents64
  7.75    0.592637           4    145969           read
  7.67    0.585976           3    177877           close
  7.11    0.543032           4    129970       190 newfstatat
  6.98    0.533211          10     48488           write
  4.09    0.312585           3     81443     81443 utimensat
  3.22    0.246158           3     81552           fstat

(After second commit)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 29.61    0.639393           3    162898           getdents64
 26.26    0.567119           3    163523     81934 openat
 12.50    0.269835           3     81848       207 newfstatat
 11.60    0.250429           3     81443     81443 utimensat
  9.82    0.212053           2     81593           close
  9.33    0.201390           2     81544           fstat
  0.18    0.003814           9       406        17 futex

(After third commit)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 33.39    0.646359           3    162898           getdents64
 29.34    0.567866           3    163523     81934 openat
 14.81    0.286739           3     81835       203 newfstatat
 10.98    0.212550           2     81593           close
 10.56    0.204458           2     81544           fstat
  0.15    0.002814           3       870           mmap
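To compare the tables above without eyeballing them, the call counts can be pulled out of the `strace -c` summary text. The helper below is a small sketch with a hypothetical name (`parseCalls`); it assumes the column layout shown above (percent, seconds, usecs/call, calls, optional errors, syscall name) and skips header and separator rows.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse the data rows of an `strace -c` summary into {syscall name -> calls}.
// Rows that do not start with four numeric columns (header, separator) are skipped.
std::map<std::string, long> parseCalls(const std::string & summary)
{
    std::map<std::string, long> calls;
    std::istringstream lines(summary);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        double percent = 0, seconds = 0;
        long usecsPerCall = 0, count = 0;
        if (!(fields >> percent >> seconds >> usecsPerCall >> count))
            continue;
        // The errors column is optional, so the syscall name is whatever
        // token comes last on the line.
        std::string token, name;
        while (fields >> token)
            name = token;
        if (!name.empty())
            calls[name] = count;
    }
    return calls;
}
```

Feeding it the "Before" and "After third commit" tables gives 259790 and 163523 openat calls respectively, a difference of 96267, and shows unlink absent from the rows listed after the change.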

Context

Should improve the situation with #10683


Makes private functions static and removes code that was
once used for fetching but is now dead.
xokdvium requested a review from roberth December 2, 2025 01:50
xokdvium requested a review from edolstra as a code owner December 2, 2025 01:50
github-actions bot added the fetching label Dec 2, 2025
xokdvium (Contributor, Author) commented Dec 2, 2025

On my machine this shaves off a lot of syscall time with just nix flake metadata "https://releases.nixos.org/nixos/25.05/nixos-25.05.813095.1c8ba8d3f763/nixexprs.tar.xz" --store "dummy://?read-only=false" and a fresh tarball cache (rm ~/.cache/nix/ -rf):

(After)

        User time (seconds): 21.17
        System time (seconds): 1.24
        Percent of CPU this job got: 160%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.95

(Before)

        User time (seconds): 21.76
        System time (seconds): 5.19
        Percent of CPU this job got: 146%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.46

if (git_repository_open(Setter(repo), path.string().c_str()))
    throw Error("opening Git repository %s: %s", path, git_error_last()->message);

/* Create a fresh object database because by default the repo also
Member

Is this creating a fresh thing on disk?

Contributor Author

Nope, just a libgit2 object.

Contributor Author (xokdvium, Dec 2, 2025)

So this is necessary to prevent libgit2 from looking up loose objects. We are just passing in the 2 backends that we care about: mempack and packfile. All the others can be disabled.
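To make the point about backends concrete, here is a toy model (not libgit2's API) of an object database that only ever consults the backends it was given. `Backend`, `ObjectDb`, the "higher priority first" ordering and the `probed` log are all made up for illustration; in libgit2 the corresponding calls would presumably be along the lines of `git_odb_new` and `git_odb_add_backend`, which this sketch does not use.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy backend: a name, a priority, and a lookup function.
struct Backend {
    std::string name;
    int priority;
    std::function<std::optional<std::string>(const std::string &)> read;
};

// Toy object database: lookups walk the registered backends in priority
// order (assumed: higher priority first) and stop at the first hit.
class ObjectDb {
public:
    // Names of the backends consulted so far, kept only for illustration.
    std::vector<std::string> probed;

    void addBackend(Backend backend)
    {
        backends.push_back(std::move(backend));
        std::stable_sort(
            backends.begin(), backends.end(),
            [](const Backend & lhs, const Backend & rhs) { return lhs.priority > rhs.priority; });
    }

    std::optional<std::string> read(const std::string & oid)
    {
        for (auto & backend : backends) {
            probed.push_back(backend.name);
            if (auto object = backend.read(oid))
                return object;
        }
        return std::nullopt;
    }

private:
    std::vector<Backend> backends;
};
```

If only a "mempack" and a "pack" backend are registered, a miss probes exactly those two. A loose-object backend that was never added is never asked, which is the effect described in the comment above.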

Member

Maybe extend the comment to indicate that we're just manually doing some creation of in-memory / ephemeral data structures?

Otherwise the reader may get the false impression that we're changing the .git dir in some way, and amplifying reads into writes.

Contributor Author

Done

xokdvium force-pushed the tarball-cache-faster branch 3 times, most recently from 9175415 to dca1ad0, December 2, 2025 03:05
Ericson2314 enabled auto-merge December 2, 2025 03:07
…tarball cache

Now the unnecessary utimensat syscalls from the previous commit
are completely gone:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 33.39    0.646359           3    162898           getdents64
 29.34    0.567866           3    163523     81934 openat
 14.81    0.286739           3     81835       203 newfstatat
 10.98    0.212550           2     81593           close
 10.56    0.204458           2     81544           fstat
  0.15    0.002814           3       870           mmap

The rather large number of getdents64 calls is still there, though.
xokdvium force-pushed the tarball-cache-faster branch from dca1ad0 to 1b2cb1d, December 2, 2025 03:10
Ericson2314 added this pull request to the merge queue Dec 2, 2025
Merged via the queue into master with commit e67c97b Dec 2, 2025
20 checks passed
Ericson2314 deleted the tarball-cache-faster branch December 2, 2025 04:39
edolstra mentioned this pull request Dec 9, 2025
Labels: fetching (networking with the outside (non-Nix) world, input locking), performance
