Skip to content

libutil/tarfile: Create the scratch std::vector only once#13175

Merged
Mic92 merged 1 commit intoNixOS:masterfrom
xokdvium:optimise-fetchtarball
May 13, 2025
Merged

libutil/tarfile: Create the scratch std::vector only once#13175
Mic92 merged 1 commit intoNixOS:masterfrom
xokdvium:optimise-fetchtarball

Conversation

@xokdvium
Copy link
Contributor

@xokdvium xokdvium commented May 13, 2025

Motivation

AFAICT there is no reason to keep a copy of the data since it always gets fed into the sink and there are no coroutines/threads in sight.

I can't find a good way to benchmark in isolation from the git cache, but common sense dictates that creating (and destroying) a 131KiB std::vector for each 131KiB chunk of a regular file from the archive imposes quite a significant overhead regardless of the IO bound git cache.

cc @edolstra git blame points me to #9485 for when this was first introduced. Since then code was moved quite a bit by @Ericson2314. Is there some reason this was done this way and stuff just got lost during refactorings?

Context

Should help a bit with #10683


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

I can't find a good way to benchmark in isolation from the
git cache, but common sense dictates that creating (and destroying)
a 131KiB std::vector for each regular file from the archive imposes
quite a significant overhead regardless of the IO bound git cache.

AFAICT there is no reason to keep a copy of the data since
it always gets fed into the sink and there are no coroutines/threads
in sight.
@xokdvium xokdvium requested a review from edolstra as a code owner May 13, 2025 11:07
@Mic92
Copy link
Member

Mic92 commented May 13, 2025

Don't see a good reason.

@Mic92 Mic92 merged commit f0f196c into NixOS:master May 13, 2025
13 checks passed
@xokdvium
Copy link
Contributor Author

For reference:

A simple patch for ad-hoc measurements:

diff --git a/src/libfetchers/tarball.cc b/src/libfetchers/tarball.cc
index 1bd7e3e59..c5934569d 100644
--- a/src/libfetchers/tarball.cc
+++ b/src/libfetchers/tarball.cc
@@ -10,6 +10,8 @@
 #include "nix/store/store-api.hh"
 #include "nix/fetchers/git-utils.hh"
 
+#include <chrono>
+
 namespace nix::fetchers {
 
 DownloadFileResult downloadFile(
@@ -171,8 +173,12 @@ static DownloadTarballResult downloadTarball_(
         : TarArchive{*source};
     auto tarballCache = getTarballCache();
     auto parseSink = tarballCache->getFileSystemObjectSink();
+    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
     auto lastModified = unpackTarfileToSink(archive, *parseSink);
     auto tree = parseSink->flush();
+    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
+    debug("time in unpackTarfileToSink: %s us", std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count());
 
     act.reset();

For the cases when the cache is not empty fetching new nixpkgs revisions (not cached fully) yield in the ballpark of ~150ms improvement (out of ~7s) for .zip tarball input pretty consistently on my machine.

@xokdvium xokdvium deleted the optimise-fetchtarball branch May 13, 2025 12:37
@Ericson2314
Copy link
Member

Ericson2314 commented May 13, 2025

Thanks! Yes I was not intentionally reallocating from me. Thanks for finding!

@roberth roberth added backports created Does not require attention and can be filtered away backport 2.28-maintenance Automatically creates a PR against the branch labels Jul 30, 2025
mergify bot added a commit that referenced this pull request Jul 30, 2025
…3175

libutil/tarfile: Create the scratch `std::vector` only once (backport #13175)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.28-maintenance Automatically creates a PR against the branch backports created Does not require attention and can be filtered away

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants