Skip to content

libstore: Optimize BinaryCacheStore::queryValidPaths#14877

Open
Zaczero wants to merge 1 commit intoNixOS:masterfrom
Zaczero:zaczero/zesty-tiger
Open

libstore: Optimize BinaryCacheStore::queryValidPaths#14877
Zaczero wants to merge 1 commit intoNixOS:masterfrom
Zaczero:zaczero/zesty-tiger

Conversation

@Zaczero
Copy link
Member

@Zaczero Zaczero commented Dec 28, 2025

Store::queryValidPaths() uses queryPathInfo() per path. For HTTP binary caches, that means GET hash.narinfo and parsing it. Override BinaryCacheStore::queryValidPaths() to use isValidPath() instead. For binary caches, isValidPathUncached() is already implemented in terms of fileExists(hash.narinfo); HttpBinaryCacheStore::fileExists() uses HEAD probes rather than GET+parse.

Motivation

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Dec 28, 2025
@tomberek tomberek added the idea approved The given proposal has been discussed and approved by the Nix team. An implementation is welcome. label Dec 29, 2025
@xokdvium
Copy link
Contributor

xokdvium commented Jan 1, 2026

Using an ad-hoc thread-pool like this doesn't make much sense, since the libcurl is already single-threaded. Something more async would make more sense. I have #14786 up, but it might need a bit more work. Would be nice if we had a more general approach to this than sprinkling ad hoc threading everywhere.

@Zaczero
Copy link
Member Author

Zaczero commented Jan 1, 2026

@xokdvium I believe I did notice that and I agree the broader libcurl rework is ideal. This PR is an easy win because without specialization, it already uses multiple threads for networking but instead of going through GET+parse it just does HEADs. So it's not like this change adds more threading.

@xokdvium
Copy link
Contributor

xokdvium commented Jan 1, 2026

But networking is still single-threaded - it's only the dispatching that's parallel (and that should just be async). Our filetransfer code runs on a singleton curlFileTransfer that only has a single worker thread. The bottleneck here is not the networking, but the usage of blocking completion handlers.

@xokdvium
Copy link
Contributor

xokdvium commented Jan 1, 2026

broader libcurl rework is ideal

TBH there isn't much to rework in filetransfer. It's more about making the downstream usages more ergonomic and efficient by avoiding the usage of blocking completion handlers and instead bridging the callback-based handlers into whatever is more appropriate to use for these sort of fork-style concurrency without paying the absolutely unnecessary overhead of spawning threads (though yes, it's quite low), but we are not going to get anywhere if we keep on sprinkling thread pools everywhere that we can't be bothered to avoid using blocking handlers. Ideally, all the download(), upload() variants would just be removed or they could be reworked to return an awaitable.

@Zaczero
Copy link
Member Author

Zaczero commented Jan 1, 2026

The bottleneck here is not the networking, but the usage of blocking completion handlers.

Let me repeat that the point of this is to perform HEADs instead of GET+parse. The pooling code is mostly copied over from the existing, non-specializing implementation.

@xokdvium
Copy link
Contributor

xokdvium commented Jan 1, 2026

The pooling code is mostly copied over from the existing, non-specializing implementation.

That code is a hack and we shouldn't spread it further.

Let me repeat that the point of this is to perform HEADs instead of GET+parse.

FWIW the GET does have the benefit of filling the on-disk negative (and positive) pathinfo cache, so that quite minimal cost is paid just once. I think there's general preference to query path infos and not just existence checks for this reason: #14769.

@Zaczero
Copy link
Member Author

Zaczero commented Jan 1, 2026

FWIW the GET does have the benefit of filling the on-disk negative (and positive) pathinfo cache

Ye I saw that but from at a glance check it didn't benefit all queryValidPaths call sites. If that's wanted maybe using HEADs is a mistake in the first place? Maybe this PR should be other way around? Removing HEADs and replacing with GETs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

idea approved The given proposal has been discussed and approved by the Nix team. An implementation is welcome. store Issues and pull requests concerning the Nix store

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants