Multithreaded evaluator #125

edolstra · 2025-06-23T14:54:15Z

Motivation

Updated version of NixOS#10938.

Context

Summary by CodeRabbit

New Features
- Experimental parallel evaluation: thread-pool executor, eval-cores setting, new __parallel primitive; parallelized flake checks, flake show, search, and JSON/XML rendering.
Performance
- Memory-aware GC sizing and stack fixes; evaluator threading; Boost thread support; default remote-store max connections increased to 64.
UX Improvements
- Better rendering for failed values in text/JSON/XML; improved error-location accuracy and diagnostics.
CI/Tests
- Expanded CI matrix and richer diagnostic output; updated functional tests for adjusted error locations.


coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

src/libexpr/eval-gc.cc (1)

99-106: Restore pthread_attr_get_np fallback; current #error breaks BSD/non-glibc builds

The FreeBSD/non-glibc path is hard-erroring. Implement the pthread_attr_get_np branch with proper error checks.

Apply:

-#    ifdef HAVE_PTHREAD_GETATTR_NP
-    if (pthread_getattr_np(pthread_id, &pattr))
-        throw Error("fixupBoehmStackPointer: pthread_getattr_np failed");
-#    else
-#      error "Need  `pthread_attr_get_np`"
-#    endif
+#    ifdef HAVE_PTHREAD_GETATTR_NP
+    if (pthread_getattr_np(pthread_id, &pattr))
+        throw Error("fixupBoehmStackPointer: pthread_getattr_np failed");
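+#    elif defined(HAVE_PTHREAD_ATTR_GET_NP)
+    /* FreeBSD's pthread_attr_get_np() expects an initialized attribute object. */
+    if (pthread_attr_init(&pattr))
+        throw Error("fixupBoehmStackPointer: pthread_attr_init failed");
+    if (pthread_attr_get_np(pthread_id, &pattr))
+        throw Error("fixupBoehmStackPointer: pthread_attr_get_np failed");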
+#    else
+#      error "Need pthread_getattr_np or pthread_attr_get_np"
+#    endif

src/libexpr/include/nix/expr/symbol-table.hh (2)

240-244: Add bounds and alignment checks in operator[] and launder the pointer

Prevent out-of-bounds access and misaligned loads; also use std::launder for safety after placement construction.

     SymbolStr operator[](Symbol s) const
     {
-        assert(s.id);
-        return SymbolStr(*reinterpret_cast<const SymbolValue *>(arena.data + s.id));
+        assert(s.id);
+        // Basic integrity checks
+        assert(s.id < arena.size); // offset within arena
+        assert((s.id % alignment) == 0); // start of an aligned record
+        assert(s.id + sizeof(SymbolValue) <= arena.size); // header fits
+        auto p = std::launder(reinterpret_cast<const SymbolValue *>(arena.data + s.id));
+        return SymbolStr(*p);
     }

256-273: Fix dump() record-advance logic to account for header in alignment calculation

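Current padding rounds up only the string length (plus NUL), so records stay in sync only as long as sizeof(Value) is a multiple of alignment. Padding the whole record (Value header + string + NUL) keeps the reader on the same record-size formula as the allocator regardless of the header size; otherwise iteration can desynchronize and hit asserts or skip symbols.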

     void dump(T callback) const
     {
         std::string_view left{arena.data, arena.size};
         left = left.substr(alignment);
         while (true) {
             if (left.empty())
                 break;
             left = left.substr(sizeof(Value));
             auto p = left.find('\0');
             assert(p != left.npos);
             auto sym = left.substr(0, p);
             callback(sym);
             // skip alignment padding
-            auto n = sym.size() + 1;
-            left = left.substr(n + (n % alignment ? alignment - (n % alignment) : 0));
+            auto n = sizeof(Value) + sym.size() + 1; // header + string + NUL
+            auto alignedSize = ((n + alignment - 1) / alignment) * alignment;
+            left = left.substr(alignedSize - sizeof(Value));
         }
     }
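To make the alignment arithmetic concrete, here is a self-contained sketch of the record layout the comment assumes: a fixed-size header, the string, a NUL, then zero padding up to the next multiple of the alignment. `kHeaderSize`, `kAlignment`, `appendRecord()` and `dumpRecords()` are hypothetical stand-ins for `sizeof(Value)`, `alignment`, the arena allocator and `dump()`, and the real layout may differ (for example, the real arena reserves offset 0). The header is deliberately 12 bytes so that the difference between the two formulas shows up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-ins: a 12-byte header (deliberately not a multiple of
// the alignment) and an 8-byte record alignment.
constexpr std::size_t kHeaderSize = 12;
constexpr std::size_t kAlignment = 8;

constexpr std::size_t alignUp(std::size_t n)
{
    return (n + kAlignment - 1) / kAlignment * kAlignment;
}

// Append one record: header bytes, the string, a NUL, then zero padding so
// the next record starts on an aligned offset. Returns the record's offset.
std::size_t appendRecord(std::string & arena, std::string_view s)
{
    std::size_t offset = arena.size();
    std::size_t used = kHeaderSize + s.size() + 1;
    arena.append(kHeaderSize, 'H'); // placeholder header bytes
    arena.append(s);
    arena.push_back('\0');
    arena.append(alignUp(used) - used, '\0');
    return offset;
}

// Walk the arena using the same record-size formula the writer used.
std::vector<std::string> dumpRecords(std::string_view arena)
{
    std::vector<std::string> out;
    while (!arena.empty()) {
        std::string_view body = arena.substr(kHeaderSize);
        auto nul = body.find('\0');
        assert(nul != body.npos);
        out.emplace_back(body.substr(0, nul));
        arena.remove_prefix(alignUp(kHeaderSize + nul + 1));
    }
    return out;
}
```

With these sizes, a record holding `"foo"` occupies 16 bytes. The string-only formula would instead consume 12 + alignUp(4) = 20 bytes and resume in the middle of the next record.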

🧹 Nitpick comments (11)

src/libexpr/eval-gc.cc (4)
38-40: Fix comment: /proc/meminfo, not /proc/cpuinfo

The comment references cpuinfo but the code reads meminfo.

Apply:
-    /* On Linux, use the `MemAvailable` or `MemFree` fields from
-       /proc/cpuinfo. */
+    /* On Linux, use the `MemAvailable` or `MemFree` fields from
+       /proc/meminfo. */
41-59: Harden Linux meminfo parsing against read failures

readFile("/proc/meminfo") can throw; let’s fall back instead of propagating.

Apply:
-    {
+    try {
         std::unordered_map<std::string, std::string> fields;
         for (auto & line :
              tokenizeString<std::vector<std::string>>(readFile(std::filesystem::path("/proc/meminfo")), "\n")) {
             auto colon = line.find(':');
             if (colon == line.npos)
                 continue;
             fields.emplace(line.substr(0, colon), trim(line.substr(colon + 1)));
         }

         auto i = fields.find("MemAvailable");
         if (i == fields.end())
             i = fields.find("MemFree");
         if (i != fields.end()) {
             auto kb = tokenizeString<std::vector<std::string>>(i->second, " ");
             if (kb.size() == 2 && kb[1] == "kB")
                 return string2Int<size_t>(kb[0]).value_or(0) * 1024;
         }
-    }
+    } catch (const std::exception &) {
+        /* Fall through to generic sysconf-based fallback below. */
+    }
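Separating the parsing from the file read makes the fallback path easy to test. Below is a sketch in standard C++ that assumes the usual `Key:   value kB` line format of `/proc/meminfo`. `parseMemInfoBytes()` is a hypothetical helper, not a Nix function, and it uses `std::istringstream` in place of Nix's `tokenizeString`/`string2Int`. Unlike the code above, it also tries `MemFree` when `MemAvailable` is present but malformed.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>

// Parse /proc/meminfo-style text and return MemAvailable (falling back to
// MemFree) in bytes, or std::nullopt if neither field is usable.
std::optional<std::size_t> parseMemInfoBytes(const std::string & contents)
{
    std::unordered_map<std::string, std::string> fields;
    std::istringstream lines(contents);
    std::string line;
    while (std::getline(lines, line)) {
        auto colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        fields.emplace(line.substr(0, colon), line.substr(colon + 1));
    }

    for (const char * key : {"MemAvailable", "MemFree"}) {
        auto i = fields.find(key);
        if (i == fields.end())
            continue;
        std::istringstream value(i->second);
        std::size_t amount = 0;
        std::string unit;
        // Expect "<number> kB"; anything else is treated as unusable.
        if (value >> amount >> unit && unit == "kB")
            return amount * 1024;
    }
    return std::nullopt;
}
```

The caller would wrap `readFile()` in `try`/`catch` as suggested and drop through to the generic fallback whenever this returns `std::nullopt`.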
151-159: Comment/code mismatch (80% vs 50%)

Comment says 80% but code uses ~50% of free RAM.

Apply one of:

Update comment to 50%:
-    /* Set the initial heap size to something fairly big (80% of
+    /* Set the initial heap size to something fairly big (50% of
        free RAM, up to a maximum of 4 GiB) so that in most cases
Or change code to 80% (if intended).

164-166: Avoid float math; use integer halving

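free * 0.5 converts the size_t to double and back. Integer halving avoids the round trip and is exact for every size_t value, whereas a double represents integers exactly only up to 2^53.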

Apply:
-        auto free = getFreeMem();
-        size = std::max(size, std::min((size_t) (free * 0.5), maxSize));
+        auto free = getFreeMem();
+        auto halfFree = free / 2;
+        size = std::max(size, std::min(halfFree, maxSize));
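Here is a sketch of the integer-only sizing rule as a pure function. The names `initialHeapSize`, `defaultSize`, `freeMem` and `maxSize` are hypothetical stand-ins for the variables in `eval-gc.cc`, and the values in the test are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Grow the initial heap to half of the free memory, capped at maxSize,
// but never below defaultSize. Uses integer halving only.
constexpr std::size_t initialHeapSize(std::size_t defaultSize, std::size_t freeMem, std::size_t maxSize)
{
    return std::max(defaultSize, std::min(freeMem / 2, maxSize));
}
```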
src/libexpr/include/nix/expr/symbol-table.hh (7)
27-40: Make arena.data writable without const_casts and keep the pointer itself const

Construction writes through data; using const here forces const_cast elsewhere. Prefer a const pointer to mutable memory.
-    const char * data;
+    char * const data;
Follow-up: drop const_casts at call sites (e.g., SymbolStr::SymbolStr in symbol-table.cc).

32-36: Confirm cache-line isolation intent is achieved

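aligned(64) on the atomic makes the compiler insert whatever padding is needed so that its offset is a multiple of 64, and it raises the struct's own alignment to 64. The field's address is 64-byte aligned only if the enclosing object is itself suitably aligned. Automatic and static storage guarantee this, as does `new` since C++17, but custom or GC allocators might not. Fields declared after the atomic can also still share its cache line. Please verify with static_asserts or sizeof/offsetof in a unit test.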

Example check:

Expect offsetof(ContiguousArena, size) % 64 == 0.

Optionally add alignas(64) to the struct and reorder members if needed.
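A compile-time version of that check might look like the following. `ArenaLike` is a hypothetical stand-in whose member list is assumed, not copied from `ContiguousArena`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ContiguousArena; the real member list may differ.
struct ArenaLike
{
    char * data;
    std::size_t capacity;
    // Force the counter onto a 64-byte boundary.
    alignas(64) std::atomic<std::size_t> size{0};
};

// The compiler inserts the padding needed to satisfy alignas, and the
// struct's own alignment becomes 64.
static_assert(offsetof(ArenaLike, size) % 64 == 0);
static_assert(alignof(ArenaLike) == 64);
```

These assertions only cover the start of the line. Members declared after `size` would still share its cache line unless the struct pads them out explicitly.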

101-109: Key stores arena by reference; ensure lifetime and thread-safety are clear

Key captures ContiguousArena& and is used inside a concurrent container; document that arena outlives the set and is append-only. If not guaranteed, consider a pointer.

139-149: size()/empty() are O(n) due to std::string_view length computation

This is fine if not hot; otherwise consider storing the length next to the string to avoid repeated scans.

219-225: Constructor reserves ID 0; also assert arena invariants

Add cheap asserts to catch early misuse.
     SymbolTable()
         : arena(1 << 30)
     {
         // Reserve symbol ID 0 and ensure alignment of the first allocation.
         arena.allocate(alignment);
+        assert((uintptr_t)arena.data % alignment == 0);
+        assert(arena.size == alignment);
     }
251-254: Atomic load semantics for totalSize()

If used for metrics only, relaxed is fine and cheaper; otherwise use acquire. Make it explicit.
-    size_t totalSize() const
-    {
-        return arena.size;
-    }
+    size_t totalSize() const
+    {
+        return arena.size.load(std::memory_order_relaxed);
+    }
22-24: Consistent access: consider using Value APIs or document the aliasing

Returning a string_view by reinterpreting (this + 1) is correct for your layout but bypasses Value’s accessor; add a comment asserting that mkString stored the same pointer and that the trailing bytes remain immutable.


📥 Commits

Reviewing files that changed from the base of the PR and between 3d8127c and 3425c5d.

📒 Files selected for processing (4)

src/libexpr/eval-gc.cc (3 hunks)
src/libexpr/include/nix/expr/counter.hh (1 hunks)
src/libexpr/include/nix/expr/symbol-table.hh (7 hunks)
src/libexpr/meson.build (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/libexpr/include/nix/expr/counter.hh
src/libexpr/meson.build


🔇 Additional comments (2)

src/libexpr/eval-gc.cc (2)

82-84: Confirm assumption: no GC roots in coroutines

Assuming no GC roots live on coroutine stacks is risky. Please confirm or document invariants/tests ensuring this remains true under parallel eval.

148-150: SP corrector hook: LGTM

Hooking and asserting the corrector is installed looks good.


coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (4)

src/libexpr/include/nix/expr/value.hh (4)
440-542: Complex concurrency design requires careful analysis.

The atomic p0 and non-atomic p1 design with the ordering constraint ("p1 must be written before p0") is a sophisticated approach to lock-free value updates. However, this creates several concerns:

Race condition potential: The requirement that "p1 must always be updated before p0" (Line 459) must be strictly enforced across all code paths.

Memory ordering: The use of memory_order_release for p0 writes and memory_order_acquire for reads is correct, but the implementation must be consistent.

Debug output in production: Lines 539-541 contain debug output that may not be appropriate for production builds.

Apply this fix for the debug output:
-        else if (pd == pdThunk) {
-            printError("BAD FINISH %x", this);
-            unreachable();
-        }
+        else if (pd == pdThunk)
+            unreachable();
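The ordering rule can be illustrated in isolation. The sketch below is not Nix's `ValueStorage`; `Slot`, `kPending`, `kFinished`, `publish()`, `tryRead()` and `publishAndWait()` are all hypothetical. It shows why the payload word must be written before the release-store of the tag word. A reader that observes the finished tag through an acquire load is then guaranteed to see the payload.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>
#include <thread>

// Hypothetical two-word slot: p0 is the atomic discriminator, p1 a plain payload.
// Assumes publish() is called at most once; rewriting p1 while readers may
// be looking at it would be a data race.
struct Slot
{
    static constexpr std::uint64_t kPending = 0;
    static constexpr std::uint64_t kFinished = 1;

    std::atomic<std::uint64_t> p0{kPending};
    std::uint64_t p1 = 0;

    // Writer: store the payload first, then publish with a release store.
    void publish(std::uint64_t payload)
    {
        p1 = payload;
        p0.store(kFinished, std::memory_order_release);
    }

    // Reader: only touch p1 after an acquire load has observed kFinished.
    std::optional<std::uint64_t> tryRead() const
    {
        if (p0.load(std::memory_order_acquire) == kFinished)
            return p1;
        return std::nullopt;
    }
};

// Usage: one writer thread publishes while the calling thread spins.
std::uint64_t publishAndWait(std::uint64_t payload)
{
    Slot slot;
    std::thread writer([&] { slot.publish(payload); });
    std::optional<std::uint64_t> v;
    while (!(v = slot.tryRead()))
        std::this_thread::yield();
    writer.join();
    return *v;
}
```

If the two stores in `publish()` were swapped, a reader could see `kFinished` and still read a stale `p1`.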
1308-1312: Race condition warning for thunk accessor.

The comment "FIXME: remove this since reading it is racy" correctly identifies that accessing thunk data is unsafe in a multithreaded context. This method should be marked deprecated or removed.

Consider deprecating this method or adding runtime checks:
+    [[deprecated("Reading thunk data is racy in multithreaded evaluation")]]
     ClosureThunk thunk() const noexcept
1319-1323: Race condition warning for app accessor.

The comment "FIXME: remove this since reading it is racy" correctly identifies the same threading issue as with the thunk accessor. This should also be addressed.

781-791: Testing-only blackhole function should be restricted.

The mkBlackhole() function is marked as "only used for testing" but is in the public interface. This could be accidentally used in production code.


📥 Commits

Reviewing files that changed from the base of the PR and between 3425c5d and 2a001a8.

📒 Files selected for processing (1)

src/libexpr/include/nix/expr/value.hh (17 hunks)


🔇 Additional comments (16)

src/libexpr/include/nix/expr/value.hh (16)

336-336: LGTM! Added Failed payload type mapping.

The macro definition correctly includes the Failed payload type in the storage system.

544-572: LGTM! Proper memory ordering in payload setters.

The payload setting functions correctly use the finish() method to ensure proper memory ordering. The bit manipulation and alignment assertions provide good safety checks.

587-621: LGTM! Efficient internal type decoding.

The getInternalType() function efficiently decodes the bit-packed discriminator information. The switch statement correctly handles all primary discriminator cases.

623-641: LGTM! Clean macro-based payload accessors.

The macro generates appropriate getter/setter pairs for pointer-based payloads, maintaining the correct memory ordering through the established payload setter functions.

643-698: LGTM! Consistent payload access patterns.

The storage getters correctly extract data from p1 for single-word payloads and use proper bit manipulation for pointer-based payloads.

695-698: LGTM! Failed payload accessor.

The Failed pointer getter is implemented consistently with other pointer-based payload types.

750-753: LGTM! Failed payload setter.

The setter correctly uses setSingleDWordPayload for storing the Failed pointer.

793-807: LGTM! Forward declarations for multithreaded operations.

The forward declarations for force, waitOnThunk, and notifyWaiters establish the interface for the multithreaded evaluation system.

810-814: LGTM! Template specialization declarations.

The explicit template specializations ensure that the multithreaded operations are properly instantiated for the target pointer size.

1028-1032: LGTM! Efficient finished check.

The isFinished() method provides a clean interface to check if a value is in weak-head normal form.

1048-1052: LGTM! Updated blackhole detection.

The isBlackhole() implementation correctly identifies pending and awaited states as blackhole conditions in the new state machine.

1070-1073: LGTM! Failed value predicate.

The isFailed() method provides a clean interface to check for failed values.

1079-1116: LGTM! Updated type mapping with failure support.

The type() method correctly maps the new tFailed internal type to nFailed and handles the new pending/awaited states as thunk types.

1221-1224: LGTM! Failed value constructor.

The mkFailed() method correctly creates a Failed value with the current exception. The use of std::current_exception() is appropriate for capturing the active exception.

1244-1251: LGTM! Updated trivial value detection.

The isTrivial() documentation correctly reflects that finished values are considered trivial in the new evaluation model.

1335-1338: LGTM! Failed value accessor.

The failed() method provides clean access to the Failed payload, consistent with other payload accessors.


coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

src/libexpr/include/nix/expr/value.hh (1)

572-585: Release-store in setThunkPayload is correct.

This now preserves the “p1-before-p0” ordering without spurious wakeups for awaited states. Thanks for addressing the earlier fence concern.

🧹 Nitpick comments (3)

src/libexpr/include/nix/expr/eval.hh (1)

377-381: Good move to SharedSync for caches; consider similar treatment for remaining maps.

lookupPathResolved, primOpCalls, and attrSelects are still marked non-thread-safe. Consider SharedSync or per-thread aggregation with periodic merge to avoid lock contention.

Also applies to: 392-393, 398-398
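For reference, here is the read-mostly pattern expressed with the standard `std::shared_mutex`. `SharedCache` is a hypothetical illustration and does not reproduce Nix's `SharedSync` API.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical read-mostly cache guarded by a reader/writer lock.
template<typename K, typename V>
class SharedCache
{
    mutable std::shared_mutex mutex;
    std::unordered_map<K, V> map;

public:
    std::optional<V> lookup(const K & key) const
    {
        std::shared_lock lock(mutex); // many readers at once
        auto i = map.find(key);
        if (i == map.end())
            return std::nullopt;
        return i->second;
    }

    // Returns the value that ends up in the cache (first writer wins).
    V insert(const K & key, V value)
    {
        std::unique_lock lock(mutex); // exclusive
        return map.try_emplace(key, std::move(value)).first->second;
    }
};
```

For write-heavy statistics such as primOpCalls, per-thread counters merged at the end would avoid even the shared lock. Which option is preferable depends on how hot each path is.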

src/libexpr/include/nix/expr/value.hh (2)

1217-1221: Failed payload: confirm GC ownership and lifetime.

mkFailed() allocates with new on a gc-derived type. Please confirm this is traced by the GC in both Boehm and non-Boehm builds and that any printers/visitors handle nFailed without dereferencing a null ex.

Optionally add a brief docstring to ValueType::nFailed and Value::failed() clarifying semantics.

Also applies to: 1331-1335

1044-1049: Expose racy accessors only where safe.

isBlackhole() is fine. The “FIXME: remove this since reading it is racy” for thunk()/app() remains—consider #ifdef UNIT_TESTS or moving to a test-only header to prevent accidental use.

Also applies to: 1304-1309, 1315-1320


📥 Commits

Reviewing files that changed from the base of the PR and between 2a001a8 and 92788a0.

📒 Files selected for processing (4)

src/libexpr/eval-gc.cc (3 hunks)
src/libexpr/include/nix/expr/eval.hh (10 hunks)
src/libexpr/include/nix/expr/symbol-table.hh (7 hunks)
src/libexpr/include/nix/expr/value.hh (17 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/libexpr/include/nix/expr/symbol-table.hh
src/libexpr/eval-gc.cc


🔇 Additional comments (3)

src/libexpr/include/nix/expr/eval.hh (3)

193-193: RegexCache ownership change looks good; verify all call sites.

Return type moved to ref<RegexCache> and the member matches. Please confirm all makeRegexCache() callers were updated and no lingering shared_ptr assumptions remain.

Also applies to: 408-408

537-540: Inline forceValue forwarding is correct.

780-781: Per-thread callDepth via thread_local is appropriate.
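A minimal sketch of why `thread_local` fits here: each evaluator thread gets its own counter, so depth tracking needs no synchronization. The `CallDepthGuard` RAII helper and `recurse()` are hypothetical illustrations, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Each thread gets its own counter, so no synchronization is needed.
thread_local std::size_t callDepth = 0;

// Hypothetical RAII guard: increments on entry, decrements on scope exit.
struct CallDepthGuard
{
    CallDepthGuard() { ++callDepth; }
    ~CallDepthGuard() { --callDepth; }
    CallDepthGuard(const CallDepthGuard &) = delete;
    CallDepthGuard & operator=(const CallDepthGuard &) = delete;
};

// Recurse n levels and report the depth reached at the innermost call.
std::size_t recurse(std::size_t n)
{
    CallDepthGuard guard;
    if (n == 0)
        return callDepth;
    return recurse(n - 1);
}
```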

coderabbitai · 2025-09-02T14:55:49Z

src/libexpr/include/nix/expr/eval.hh

+    ref<Executor> executor;
+


💡 Verification agent

❓ Verification inconclusive

Ensure executor is always non-null (avoid nullptr deref).

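waitOnThunk() uses state.executor->evalCores; if executor is unset you'll crash. Enforce non-null in the constructor (initialize a single-threaded default) or guard all uses. (If ref<T> here is Nix's non-nullable wrapper, which has no default constructor and throws when given a null pointer, every constructor is already forced to initialize it, so this may be a non-issue; worth confirming.)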

Would you like a patch that wires a default Executor{evalCores=1} in EvalState’s ctor?

Initialize executor in EvalState’s constructor
Ensure executor is never null by assigning a default single-threaded Executor (e.g. with evalCores=1) in the EvalState ctor to avoid null-pointer deref in waitOnThunk().

🤖 Prompt for AI Agents

In src/libexpr/include/nix/expr/eval.hh around lines 227-228, EvalState's member ref<Executor> executor is not initialized; update the EvalState constructor to create and assign a default single-threaded Executor (e.g. construct Executor with evalCores=1) so executor is never null and waitOnThunk() cannot dereference a null pointer. Instantiate the Executor with appropriate ownership semantics and assign it to the executor field in the ctor, ensuring any required headers/params are provided.
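The following sketch enforces the "never null" invariant at construction. `NonNull`, `Executor` and `EvalStateLike` are hypothetical stand-ins: Nix's own `ref<T>` plays the role of `NonNull`, and the real `Executor` constructor may take different parameters.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical non-null shared pointer: construction from null throws, and
// there is no default constructor, so every owner must initialize it.
template<typename T>
class NonNull
{
    std::shared_ptr<T> p;

public:
    explicit NonNull(std::shared_ptr<T> ptr)
        : p(std::move(ptr))
    {
        if (!p)
            throw std::invalid_argument("null pointer passed to NonNull");
    }
    T * operator->() const { return p.get(); }
};

// Hypothetical executor with a configurable number of evaluation threads.
struct Executor
{
    unsigned evalCores;
    explicit Executor(unsigned cores) : evalCores(cores) {}
};

struct EvalStateLike
{
    NonNull<Executor> executor;

    // Default to a single-threaded executor when none is supplied.
    explicit EvalStateLike(std::shared_ptr<Executor> e = nullptr)
        : executor(e ? std::move(e) : std::make_shared<Executor>(1))
    {
    }
};
```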


When useMaster is true, startMaster() acquires the state lock, then calls isMasterRunning(), which calls addCommonSSHOpts(), which tries to acquire the state lock again, causing a deadlock. The solution is to move tmpDir out of the state. It doesn't need to be there in the first place because it never changes.
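The deadlock and its fix can be reproduced in miniature. In this hypothetical sketch (`Master`, `State` and `commonOpts()` are not Nix's classes), `tmpDir` is set once in the constructor and never changes, so helpers can read it without taking the state lock. Only genuinely mutable data stays behind the mutex. If `commonOpts()` locked `stateMutex` while `startMaster()` already held it, the program would deadlock (formally, re-locking a `std::mutex` from its owning thread is undefined behavior).

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class Master
{
    // Immutable after construction: safe to read without the lock.
    const std::string tmpDir;

    struct State
    {
        std::string socketPath;
    };
    std::mutex stateMutex;
    State state;

    // Reads only immutable members, so it never takes stateMutex.
    std::vector<std::string> commonOpts() const
    {
        return {"-oUserKnownHostsFile=" + tmpDir + "/known_hosts"};
    }

public:
    explicit Master(std::string dir) : tmpDir(std::move(dir)) {}

    std::string startMaster()
    {
        std::lock_guard<std::mutex> lock(stateMutex);
        if (state.socketPath.empty()) {
            // Calling the helper here is safe only because it doesn't lock stateMutex.
            std::vector<std::string> args = commonOpts();
            state.socketPath = tmpDir + "/ssh.sock";
            args.push_back("-S");
            args.push_back(state.socketPath);
            // ... spawning `ssh` with args is omitted in this sketch ...
        }
        return state.socketPath;
    }
};
```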

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/libstore/ssh.cc (2)
118-125: Control master check ignores custom -S path; can skip starting our master and then use a non-existent socket

isMasterRunning() doesn’t pass the control socket path, but startMaster() later returns state->socketPath and callers use “-S <socketPath>”. If a user-level master exists (default ControlPath), you’ll return our tmpDir path even though no master is listening there.

Apply these changes to scope the check to our socket:
-bool SSHMaster::isMasterRunning()
+bool SSHMaster::isMasterRunning(Path controlPath)
 {
     Strings args = {"-O", "check", hostnameAndUser};
     addCommonSSHOpts(args);
+    if (!controlPath.empty())
+        args.insert(args.end(), {"-S", controlPath});
And in the header:
-    bool isMasterRunning();
+    bool isMasterRunning(Path controlPath = "");
And in startMaster (see below).

249-253: Pass the socket path when checking master state

Tie the check to the socket we’ll actually use.
-    if (isMasterRunning())
+    if (isMasterRunning(state->socketPath))
         return state->socketPath;
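Here is the scoped check sketched in standard C++. `checkMasterArgs()` is hypothetical. `-O check` and `-S ctl_path` are documented `ssh(1)` options, and an `-S` given on the command line takes precedence over a ControlPath from the user's configuration. The sketch places the options before the destination, which is the conventional order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the argument list for `ssh -O check`, pinned to our own control
// socket when one is known so that an unrelated user-level master (on the
// default ControlPath) is not mistaken for ours.
std::vector<std::string> checkMasterArgs(const std::string & hostnameAndUser, const std::string & controlPath = "")
{
    std::vector<std::string> args = {"ssh", "-O", "check"};
    if (!controlPath.empty())
        args.insert(args.end(), {"-S", controlPath});
    args.push_back(hostnameAndUser);
    return args;
}
```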

🧹 Nitpick comments (2)

src/libstore/include/nix/store/ssh.hh (1)
30-31: Make tmpDir immutable and document lifetime/thread-safety

tmpDir is effectively write-once; marking it const prevents accidental reassignment and clarifies intent. Also, because Connections don’t hold a ref to tmpDir, verify SSHMaster outlives any in-flight ssh child using files under tmpDir.
-    ref<AutoDelete> tmpDir;
+    const ref<AutoDelete> tmpDir;
If you want to make usage clearer and avoid repeated recomputation of paths, consider adding:

a cached known-hosts file path (e.g., Path knownHostsFile;)

a cached control socket path (e.g., Path controlSocket;)
src/libstore/ssh.cc (1)
241-242: Use consistent path joining; avoid deref/cast of ref

Dereferencing tmpDir and C-style casting to Path is brittle. Use the same path API you used above.
-    state->socketPath = (Path) *tmpDir + "/ssh.sock";
+    state->socketPath = (tmpDir->path() / "ssh.sock").string();


📥 Commits

Reviewing files that changed from the base of the PR and between 92788a0 and 4b34658.

📒 Files selected for processing (2)

src/libstore/include/nix/store/ssh.hh (2 hunks)
src/libstore/ssh.cc (2 hunks)


🔇 Additional comments (2)

src/libstore/include/nix/store/ssh.hh (1)

4-4: Include looks correct and necessary

Adding ref.hh is appropriate for the new ref<...> member.

src/libstore/ssh.cc (1)

87-88: Good: secure temp dir creation

Creating the tmp dir at construction with 0700 is correct and avoids races.

coderabbitai · 2025-09-03T13:44:39Z

src/libstore/ssh.cc

    if (!keyFile.empty())
        args.insert(args.end(), {"-i", keyFile});
    if (!sshPublicHostKey.empty()) {
-        std::filesystem::path fileName = state->tmpDir->path() / "host-key";
+        std::filesystem::path fileName = tmpDir->path() / "host-key";
        writeFile(fileName.string(), authority.host + " " + sshPublicHostKey + "\n");
        args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()});
    }


🛠️ Refactor suggestion

⚠️ Potential issue

Known_hosts line is malformed (missing key type and base64; wrong host format)

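OpenSSH’s known_hosts requires “host keytype base64”. You’re writing raw bytes without a key type, and you don’t bracket non-default ports ([host]:port). This will cause host key verification to fail and may be racy if multiple threads rewrite the same file. (Caveat: this assumes sshPublicHostKey holds raw key bytes. If it is the base64-decoded contents of a public-key file line, which is how Nix documents its base64-encoded host-key setting, it already contains the key type and base64 key, and re-encoding it would corrupt the entry. Check this before applying.)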

Apply this fix (defaults to ssh-ed25519; ideally plumb the real type):

-    if (!sshPublicHostKey.empty()) {
-        std::filesystem::path fileName = tmpDir->path() / "host-key";
-        writeFile(fileName.string(), authority.host + " " + sshPublicHostKey + "\n");
-        args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()});
-    }
+    if (!sshPublicHostKey.empty()) {
+        std::filesystem::path fileName = tmpDir->path() / "known_hosts";
+        // Use "[host]:port" when a non-default port is set.
+        std::string hostForKnownHosts =
+            authority.port ? fmt("[%s]:%d", authority.host, *authority.port)
+                           : std::string(authority.host);
+        // TODO: pass the real key type instead of defaulting.
+        const char *keyType = "ssh-ed25519";
+        writeFile(
+            fileName.string(),
+            hostForKnownHosts + " " + keyType + " " + base64::encode(sshPublicHostKey) + "\n");
+        args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()});
+    }

Optionally, write the file once (e.g., at construction) to avoid concurrent rewrites. I can send a follow-up patch using std::once_flag.
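Here is a sketch of the write-once approach using `std::call_once`. `KnownHostsFile` and `knownHostsEntry()` are hypothetical, and the sketch uses `std::ofstream` rather than Nix's `writeFile`. The entry format follows the sshd(8) known_hosts format: `[host]:port` for non-default ports, followed by the key. `keyLine` is assumed to already hold `keytype base64key`, so whether Nix's `sshPublicHostKey` has that shape needs to be checked separately.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Format one known_hosts line. Non-default ports use the "[host]:port" form.
std::string knownHostsEntry(const std::string & host, std::optional<unsigned> port, const std::string & keyLine)
{
    std::string hostField = (port && *port != 22) ? "[" + host + "]:" + std::to_string(*port) : host;
    return hostField + " " + keyLine + "\n";
}

// Writes the file at most once, no matter how many threads call path().
// If the write throws, the once_flag stays unset and a later call retries.
class KnownHostsFile
{
    std::filesystem::path file;
    std::string contents;
    std::once_flag written;

public:
    KnownHostsFile(std::filesystem::path p, std::string entry)
        : file(std::move(p)), contents(std::move(entry))
    {
    }

    const std::filesystem::path & path()
    {
        std::call_once(written, [this] {
            std::ofstream out(file, std::ios::trunc);
            if (!(out << contents))
                throw std::runtime_error("cannot write " + file.string());
        });
        return file;
    }
};
```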


🤖 Prompt for AI Agents

In src/libstore/ssh.cc around lines 97 to 103, the known_hosts line is malformed: you currently write raw key bytes without a key type or base64 and you don’t bracket hosts with non-default ports, plus the file may be rewritten concurrently; fix by formatting the entry as "host keytype base64key" (default keytype to "ssh-ed25519" if the real type isn’t available), base64-encode sshPublicHostKey before writing, bracket host as "[host]:port" when a non-default port is used, and ensure the file is created/written only once (e.g., move writeFile to construction or guard with a once_flag) so multiple threads don’t race rewriting it.

edolstra added 30 commits May 20, 2024 10:09

EvalState: Make the parse/eval caches thread-safe

6760b39

LRUCache: Mark size() as const

d3854d1

Sync: Add support for shared locks

945cd69

WIP

5f3b1a3

WIP3

9ddca98

WIP4

d133aca

Disable some blackhole tests for now

1a55754

WIP working

d623dfb

Better hash

a9e3594

Symbol table concurrency hack

b63a132

Hacks

76f822f

Fix failures due to value reuse

6a85af7

Revive the Boehm GC alloc cache

6eafc52

Make RegexCache thread-safe

f018a55

Add some stats

ec8593d

Cleanup

105dea5

Make EvalState::srcToStore thread-safe

27fb652

PosixSourceAccessor: Use SharedSync

d990974

FileParseCache, FileEvalCache: Use read lock

eba54f5

Add getOptional()

a25a5b7

EvalState: Add importResolutionCache

ca11328

This is a mapping from paths to "resolved" paths (i.e. with `default.nix` added, if appropriate). `fileParseCache` and `fileEvalCache` are now keyed on the resolved path *only*.

Make fileEvalCache insertion more efficient

c2c01d8

Ensure that files are parsed/evaluated only once

9b88021

Previously, the optimistic concurrency approach in `evalFile()` meant that a `nix search nixpkgs ^` would do hundreds of duplicated parsings/evaluations. Now, we reuse the thunk locking mechanism to ensure it's done only once.

Small optimization

708e0e8

SymbolStr: Remove std::string conversion

cc38822

This refactoring allows the symbol table to be stored as something other than std::strings.

Use a contiguous arena for storing symbols

424e01e

This allows symbol IDs to be offsets into an arena whose base offset never moves, and can therefore be dereferenced without any locks.

Executor: Randomize the work queue

c663076

This makes it less likely that we concurrently execute tasks that would block on a common subtask, e.g. evaluating `libfoo` and `libfoo_variant` are likely to have common dependencies.

Provide std::hash<SourcePath>

adcc351

Provide std::hash<Symbol>

3988faf

Remove unused #include

a70ec9e

edolstra added 3 commits September 2, 2025 15:38

Remove untested code path

1f89c8a

Code review

b7c36a8

Use alignment constant

3425c5d

github-actions bot temporarily deployed to pull request September 2, 2025 13:47 Inactive

coderabbitai bot reviewed Sep 2, 2025

src/libexpr/eval-gc.cc (resolved)

src/libexpr/include/nix/expr/symbol-table.hh (outdated, resolved)

GC-allocate Failed

2a001a8

github-actions bot temporarily deployed to pull request September 2, 2025 14:02 Inactive

coderabbitai bot reviewed Sep 2, 2025

src/libexpr/include/nix/expr/value.hh (resolved)

src/libexpr/include/nix/expr/value.hh (resolved)

edolstra added 2 commits September 2, 2025 16:42

ValueStorage: Fix ordering in operator =

a73bf2e

Add TODO item

92788a0

edolstra force-pushed the multithreaded-eval-v2 branch from ceb8c69 to 92788a0 September 2, 2025 14:45

github-actions bot temporarily deployed to pull request September 2, 2025 14:49 Inactive

coderabbitai bot reviewed Sep 2, 2025

edolstra enabled auto-merge September 2, 2025 15:00

cole-h disabled auto-merge September 2, 2025 15:23

grahamc added this pull request to the merge queue Sep 2, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 3, 2025

edolstra added this pull request to the merge queue Sep 3, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 3, 2025

cole-h enabled auto-merge September 3, 2025 13:35

github-actions bot temporarily deployed to pull request September 3, 2025 13:39 Inactive

coderabbitai bot reviewed Sep 3, 2025

cole-h added this pull request to the merge queue Sep 3, 2025

Merged via the queue into main with commit 2bdbe00 Sep 3, 2025
49 checks passed

cole-h deleted the multithreaded-eval-v2 branch September 3, 2025 14:33

CertainLach mentioned this pull request Sep 4, 2025

Refactor: switch to using bindings to nix instead of REPL deltarocks/fleet#12

Merged

edolstra mentioned this pull request Sep 7, 2025

Introduce a "failed" value type NixOS/nix#13930

Open

coderabbitai bot mentioned this pull request Oct 21, 2025

Flake schemas #217

Open

Multithreaded evaluator #125

edolstra commented Jun 23, 2025 (edited by coderabbitai bot)

3 participants
