libfetchers/git-utils: Avoid using git_writestream for small files#14689
libfetchers/git-utils: Avoid using git_writestream for small files#14689Ericson2314 merged 3 commits intomasterfrom
Conversation
Makes private functions static and removes dead code that was used for fetching, but is currently dead.
It turns out that libgit2 is incredibly naive and each git_writestream creates a new temporary file like .cache/nix/tarball-cache/objects/streamed_git2_6a82bb68dc0a3918 that it reads from afterwards. It doesn't do any internal buffering. Doing (with a fresh fetcher cache) a simple: strace -c nix flake metadata "https://releases.nixos.org/nixos/25.05/nixos-25.05.813095.1c8ba8d3f763/nixexprs.tar.xz" --store "dummy://?read-only=false" (Before) % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ------------------ 31.05 2.372728 9 259790 81917 openat 19.21 1.467784 30 48157 unlink 10.43 0.796793 4 162898 getdents64 7.75 0.592637 4 145969 read 7.67 0.585976 3 177877 close 7.11 0.543032 4 129970 190 newfstatat 6.98 0.533211 10 48488 write 4.09 0.312585 3 81443 81443 utimensat 3.22 0.246158 3 81552 fstat (After) % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ------------------ 29.61 0.639393 3 162898 getdents64 26.26 0.567119 3 163523 81934 openat 12.50 0.269835 3 81848 207 newfstatat 11.60 0.250429 3 81443 81443 utimensat 9.82 0.212053 2 81593 close 9.33 0.201390 2 81544 fstat 0.18 0.003814 9 406 17 futex
|
On my machine this shaves off a LOT of syscalls time with just (After) (Before) |
src/libfetchers/git-utils.cc
Outdated
| if (git_repository_open(Setter(repo), path.string().c_str())) | ||
| throw Error("opening Git repository %s: %s", path, git_error_last()->message); | ||
|
|
||
| /* Create a fresh object database because by default the repo also |
There was a problem hiding this comment.
Is this creating a fresh thing on disk?
There was a problem hiding this comment.
Nope, just a libgit2 object.
There was a problem hiding this comment.
So this is necessary to prevent libgit2 from looking up loose objects. We are just passing in the 2 backends that we care about: mempack and packfile. All the others can be disabled.
There was a problem hiding this comment.
Maybe extend the comment to indicate that we're just manually doing some creation of in-memory / ephemeral data structures?
Otherwise the reader may get the false impression that we're changing the .git dir in some way, and amplifying reads into writes.
9175415 to
dca1ad0
Compare
…tarball cache Now the unnecessary utimensat syscalls from the previous commit are completely gone: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ------------------ 33.39 0.646359 3 162898 getdents64 29.34 0.567866 3 163523 81934 openat 14.81 0.286739 3 81835 203 newfstatat 10.98 0.212550 2 81593 close 10.56 0.204458 2 81544 fstat 0.15 0.002814 3 870 mmap The rather crazy amount of getdents64 is still there though.
dca1ad0 to
1b2cb1d
Compare
Motivation
It turns out that libgit2 is incredibly naive and each git_writestream creates
a new temporary file like .cache/nix/tarball-cache/objects/streamed_git2_6a82bb68dc0a3918
that it reads from afterwards. It doesn't do any internal buffering.
Doing (with a fresh fetcher cache) a simple:
strace -c nix flake metadata "https://releases.nixos.org/nixos/25.05/nixos-25.05.813095.1c8ba8d3f763/nixexprs.tar.xz" --store "dummy://?read-only=false"(Before)
(After second commit)
(After third commit)
Context
Should improve the situation with #10683
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.