Just general ideas about things.
Nix should:
- query substituters in parallel
- support a bulk query endpoint to avoid requests for each individual dependency
- use build times from Hydra to decide how to allocate jobs
- improve its remote build protocol to be aware of system load
- support GPUs as platforms -- enable Nix to understand GPUs as system platforms, with all the nuance that brings (code built for one platform can't necessarily run on another; forward compatibility, like with linux-v1, -v2, etc., but with NVIDIA PTX)
- have something akin to `brokenConditions` and `badPlatformsConditions` -- instead of setting `broken` or `badPlatforms` directly, have attribute sets in meta which map strings to booleans. The keys are explanations of why something is broken or unsupported, and the boolean value indicates whether `broken` or `badPlatforms` should be set
- have a way to guard against evaluation under OfBorg or nixpkgs-review (both of which allow `broken` packages) that does not involve setting `badPlatforms`
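The `brokenConditions` semantics fit in a few lines. This is a C++ model purely for illustration (in nixpkgs it would be Nix code in `meta`); `Conditions`, `deriveBroken`, and `activeReasons` are hypothetical names. `broken` would be true exactly when at least one condition holds, and the keys whose value is true double as the explanation shown to the user.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of meta.brokenConditions: each key explains why the
// package might be broken, each value says whether that condition applies.
using Conditions = std::map<std::string, bool>;

// broken (or badPlatforms) would be set if any condition is true.
bool deriveBroken(const Conditions & conditions) {
    return std::any_of(conditions.begin(), conditions.end(),
                       [](const auto & entry) { return entry.second; });
}

// The explanations whose condition holds, e.g. for an evaluation error message.
std::vector<std::string> activeReasons(const Conditions & conditions) {
    std::vector<std::string> reasons;
    for (const auto & [reason, applies] : conditions)
        if (applies) reasons.push_back(reason);
    return reasons;
}
```

Because every reason stays in the set with its own boolean, a package can list all known failure modes at once and the message can say which of them triggered.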
- NixOS/nix#11712
- NixOS/nix#11683
- NixOS/nix#11642
- NixOS/nix#11557
- NixOS/nix#11555
- NixOS/nix#11359
- NixOS/nix#11268
- NixOS/nix#11228
- NixOS/nix#11161
- NixOS/nix#11101
- NixOS/nix#11001
- NixOS/nix#11744
- NixOS/nix#11746
- NixOS/nix#11719
- NixOS/nix#11506
- NixOS/nix#11373
- NixOS/nix#11294
- NixOS/nix#11143
- NixOS/nix#11130
- NixOS/nix#10937
- NixOS/nix#10590
- NixOS/nix#10511
- NixOS/nix#10505
- NixOS/nix#10280
- NixOS/nix#10218
- NixOS/nix#10201
- NixOS/nix#9967
- NixOS/nix#9895
- pennae mentions reducing the size of a Value to a single tagged pointer
- NixOS/nix#9551
- NixOS/nix#9429
- NixOS/nix#9287
- NixOS/nix#9145
- NixOS/nix#8585
- NixOS/nix#8105
- NixOS/nix#7247
- NixOS/nix#6855
Nix currently queries each substituter in sequence. This is inefficient, as it requires a round trip to each substituter for each path in the closure of dependencies. Instead, Nix should query all substituters in parallel, and then wait for all responses before continuing.
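A minimal sketch of the parallel query for a single store path. `Substituter`, `hasPath`, and `findSubstituter` are hypothetical names, not Nix's API; `hasPath` stands in for the per-path request (e.g. fetching the `.narinfo`), and the vector is assumed to be sorted by priority already. All requests start at once via `std::async`, then every reply is awaited and the highest-priority hit wins.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a configured substituter.
struct Substituter {
    std::string uri;
    std::function<bool(const std::string &)> hasPath; // one network round trip
};

// Start a query to every substituter concurrently, wait for all of them,
// and return the URI of the earliest-listed (highest-priority) one that
// can provide `path`, if any.
std::optional<std::string> findSubstituter(
    const std::vector<Substituter> & substituters, const std::string & path)
{
    std::vector<std::future<bool>> replies;
    replies.reserve(substituters.size());
    for (const auto & sub : substituters)
        replies.push_back(std::async(std::launch::async, sub.hasPath, path));

    std::optional<std::string> chosen;
    for (std::size_t i = 0; i < replies.size(); ++i) {
        bool has = replies[i].get(); // blocks until this reply arrives
        if (has && !chosen) chosen = substituters[i].uri;
    }
    return chosen;
}
```

Waiting for every reply keeps the choice deterministic (the same substituter wins no matter which answers first), at the cost of being as slow as the slowest substituter.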
A large amount of traffic is generated by the way Nix queries substituters for binaries. Currently, we iterate through the closure of dependencies and then through each configured substituter. Ideally, having computed the transitive closure of dependencies, we would fire off requests to the bulk-query endpoint of each substituter in parallel. This would avoid a large number of HTTP HEAD requests to HTTP binary caches and, for a binary cache backed directly by S3, potentially lessen the cost of maintaining it by reducing the number of API calls.
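Assuming each substituter exposed a bulk endpoint that takes a list of store paths and returns the subset it can serve (hypothetical; Attic's missing-paths endpoint answers the complementary question), the client side could look roughly like this: one request per substituter for the whole closure, all in flight at once, with each path assigned to the highest-priority substituter that has it. `BulkQuery`, `BulkSubstituter`, and `planSubstitution` are illustrative names.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical bulk endpoint: many store paths in, the available subset out.
using BulkQuery =
    std::function<std::set<std::string>(const std::vector<std::string> &)>;

struct BulkSubstituter {
    std::string uri;
    BulkQuery query;
};

// Send the whole closure to every substituter at once (one request each,
// instead of one request per path per substituter), then map each path to
// the first substituter, in priority order, that reported having it.
// Paths absent from the result would have to be built locally.
std::map<std::string, std::string> planSubstitution(
    const std::vector<BulkSubstituter> & substituters,
    const std::vector<std::string> & closure)
{
    std::vector<std::future<std::set<std::string>>> replies;
    for (const auto & sub : substituters)
        replies.push_back(std::async(std::launch::async, sub.query, closure));

    std::map<std::string, std::string> plan; // store path -> substituter URI
    for (std::size_t i = 0; i < replies.size(); ++i) {
        std::set<std::string> available = replies[i].get();
        for (const auto & path : closure)
            if (available.count(path))
                // emplace does not overwrite, so the earlier (higher-priority)
                // substituter keeps the path
                plan.emplace(path, substituters[i].uri);
    }
    return plan;
}
```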
Prior art includes Attic (https://github.com/zhaofengli/attic), which has an endpoint to find out which paths are missing (https://github.com/zhaofengli/attic/blob/717cc95983cdc357bc347d70be20ced21f935843/server/src/api/v1/get_missing_paths.rs).
Sample tasklist:
- Gain an understanding of how the HTTP binary store protocol currently works
- Investigate prior art (Attic; potentially others)
- Identify stakeholders (e.g., Cachix, Garnix, Flox, Determinate Systems, NixOS Archivists and those with visibility into bandwidth usage for the main NixOS cache)
- Create an RFC for Nix, collaborating with stakeholders
- Shepherd the RFC through to approval
- Implement the bulk API endpoint
Idea one: allocate memory in bulk, since lists are strict in their length and attribute sets are strict in their keys. For lists, for example, the list builder would allocate one contiguous block of memory for the pointers to values and another contiguous block for the values themselves. To implement that, I’d introduce a new `allocValues` function in `eval-inline.hh` which allocates multiple values and then increments the global counter for the number of values once, by the number allocated. (`allocValue` increments it on each call.)
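A self-contained sketch of the difference between per-value and bulk allocation. `Value`, `Allocator`, and `makeListElems` are toy stand-ins, not Nix's types: Nix allocates through the Boehm GC, whereas this uses owning `std::unique_ptr` arrays so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for Nix's Value.
struct Value {
    int type = 0;
    long payload = 0;
};

// Hypothetical model of the evaluator's allocation bookkeeping.
struct Allocator {
    std::size_t nrValues = 0; // analogue of the global "number of values" counter
    std::vector<std::unique_ptr<Value[]>> blocks;

    // Existing pattern: one allocation and one counter bump per value.
    Value * allocValue() {
        blocks.push_back(std::make_unique<Value[]>(1));
        ++nrValues;
        return blocks.back().get();
    }

    // Proposed allocValues: n contiguous values, counter bumped once.
    Value * allocValues(std::size_t n) {
        assert(n > 0);
        blocks.push_back(std::make_unique<Value[]>(n));
        nrValues += n;
        return blocks.back().get();
    }
};

// A list of n elements: one block of Value* (what the list stores) pointing
// into one block of Values, instead of n separate allocValue() calls.
std::vector<Value *> makeListElems(Allocator & alloc, std::size_t n) {
    if (n == 0) return {};
    Value * values = alloc.allocValues(n);
    std::vector<Value *> elems(n);
    for (std::size_t i = 0; i < n; ++i)
        elems[i] = values + i;
    return elems;
}
```

Keeping the values adjacent also means consecutive list elements are likely to share cache lines, which ties in with idea three.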
Idea two: looking at the builtins that handle lists or attribute sets, there’s a fair amount of pointer arithmetic, referencing, and dereferencing going on inside for loops. With something like `allocValues` allocating memory ahead of time (and handling the increment of the global counter for the number of elements), I thought using OpenMP’s SIMD pragma on some of the pointer-heavy for loops in the builtins might improve performance, provided the length of the list or attribute set is larger than some threshold.
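Roughly what that could look like for the pointer-setup loop of a list builder, assuming its values came from one contiguous block. `fillElems` and `kSimdThreshold` are made up for illustration, and the threshold value is a placeholder to be found by benchmarking. `#pragma omp simd` only takes effect when compiling with `-fopenmp` or `-fopenmp-simd`; otherwise the compiler ignores it.

```cpp
#include <cstddef>

struct Value { long payload; }; // toy stand-in for Nix's Value

// Placeholder: below this length, vectorization is assumed not to pay off.
constexpr std::size_t kSimdThreshold = 64;

// Point elems[i] at values[i]. With contiguous values every iteration is
// independent, so the loop is a candidate for SIMD.
void fillElems(Value ** elems, Value * values, std::size_t n) {
    if (n >= kSimdThreshold) {
#pragma omp simd
        for (std::size_t i = 0; i < n; ++i)
            elems[i] = values + i;
    } else {
        for (std::size_t i = 0; i < n; ++i)
            elems[i] = values + i;
    }
}
```

A loop this simple may be auto-vectorized at higher optimization levels anyway, so the pragma would matter most on loops the compiler can't prove safe on its own; measuring with and without it would tell.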
Idea three: regardless of bulk memory or value allocation, I understand that cache lines are very important in modern processor architectures. The current Value structure weighs in at 24 bytes on my x64 machine: 8 bytes for the enum with padding and 16 bytes for the actual payload. I’m curious whether there would be benefits to getting it down to 16 bytes. I had thought of using something like tagged pointers, though that introduces the need for bit twiddling and much stronger encapsulation than Value currently has. Another alternative was to keep the enum but shrink the payload to 8 bytes: everything that fits in 8 bytes is inlined (Null, Boolean, Integer, Float, string without context (which is just a char pointer), empty/singleton list, etc.), while everything else is represented as a reference to a larger structure (path, string with context, lists of two or more elements, attribute sets, etc.). This avoids the need for bit twiddling but introduces additional dereferences; I’m not sure how expensive those are on modern processors.
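A sketch of the second option, mainly to check the arithmetic: a one-byte tag padded to 8 bytes plus an 8-byte union gives 16 bytes on a typical 64-bit target. The tag names and payload structs are hypothetical and don't mirror Nix's actual `Value` definition.

```cpp
#include <cstddef>
#include <cstdint>

struct Value;    // needed for the list cases
struct Bindings; // attribute sets, left opaque here

// Hypothetical out-of-line payloads for things that don't fit in 8 bytes.
struct PathPayload   { const void * accessor; const char * path; };
struct StringWithCtx { const char * s; const char ** context; };
struct ListPayload   { std::size_t size; Value ** elems; };

enum class Tag : std::uint8_t {
    Null, Bool, Int, Float, String, EmptyList, SingletonList, // inline
    Path, ContextString, List, Attrs                          // boxed
};

struct Value {
    Tag tag; // padded to 8 bytes by the union's 8-byte alignment
    union {
        bool boolean;
        std::int64_t integer;
        double fpoint;
        const char * string;  // string without context
        Value * singleton;    // the only element of a one-element list
        const PathPayload * path;
        const StringWithCtx * contextString;
        const ListPayload * list;
        const Bindings * attrs;
    };
};

static_assert(sizeof(void *) != 8 || sizeof(Value) == 16,
              "on 64-bit targets the value fits in 16 bytes");

Value mkInt(std::int64_t n) {
    Value v;
    v.tag = Tag::Int;
    v.integer = n;
    return v;
}
```

The extra dereference shows up in the boxed cases: reading an element of a two-element list goes through `v.list->elems[i]`, one more pointer chase than if the size and element pointer lived in the value itself.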
nixpkgs-review should:
- be able to skip packages which use `requireFile` as their `src` instead of reporting them as broken every time
Attic should:
- be refactored into a client, frontend (HTTP binary cache protocol), and backend (chunk store and NAR assembly)
- be able to run on Cloudflare via Workers and D1
- naive object storage (except maybe MinIO) will not be performant due to the number of small files being fetched simultaneously https://blog.min.io/challenge-big-data-small-files/
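A rough sketch of the backend split (chunk store plus NAR assembly), to show where the small-file traffic comes from. `ChunkStore`, `MemoryChunkStore`, and `assembleNar` are hypothetical; Attic itself is written in Rust and this is not its API. The in-memory maps stand in for object storage and a metadata database such as D1.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical backend interface: content-addressed chunks, plus the ordered
// chunk list that makes up each NAR.
class ChunkStore {
public:
    virtual ~ChunkStore() = default;
    virtual std::string getChunk(const std::string & chunkHash) const = 0;
    virtual std::vector<std::string> getNarChunks(const std::string & narHash) const = 0;
};

// In-memory backend for illustration.
class MemoryChunkStore : public ChunkStore {
public:
    std::map<std::string, std::string> chunks;            // chunk hash -> bytes
    std::map<std::string, std::vector<std::string>> nars; // NAR hash -> chunk hashes

    std::string getChunk(const std::string & h) const override {
        auto it = chunks.find(h);
        if (it == chunks.end()) throw std::out_of_range("missing chunk " + h);
        return it->second;
    }
    std::vector<std::string> getNarChunks(const std::string & h) const override {
        return nars.at(h);
    }
};

// NAR assembly: concatenate the chunks in order. A frontend speaking the
// HTTP binary cache protocol would call this when serving a NAR.
std::string assembleNar(const ChunkStore & store, const std::string & narHash) {
    std::string nar;
    for (const auto & chunkHash : store.getNarChunks(narHash))
        nar += store.getChunk(chunkHash);
    return nar;
}
```

Each NAR request becomes one `getChunk` call per chunk, so on object storage a single download turns into many small, concurrent reads.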