-
Notifications
You must be signed in to change notification settings - Fork 2
Multithreaded evaluator #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a mapping from paths to "resolved" paths (i.e. with `default.nix` added, if appropriate). `fileParseCache` and `fileEvalCache` are now keyed on the resolved path *only*.
Previously, the optimistic concurrency approach in `evalFile()` meant that a `nix search nixpkgs ^` would do hundreds of duplicated parsings/evaluations. Now, we reuse the thunk locking mechanism to ensure it's done only once.
This refactoring allows the symbol table to be stored as something other than std::strings.
This allows symbol IDs to be offsets into an arena whose base offset never moves, and can therefore be dereferenced without any locks.
This makes it less likely that we concurrently execute tasks that would block on a common subtask, e.g. evaluating `libfoo` and `libfoo_variant` are likely to have common dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/libexpr/eval-gc.cc (1)
99-106: Restore pthread_attr_get_np fallback; current #error breaks BSD/non-glibc buildsThe FreeBSD/non-glibc path is hard-erroring. Implement the pthread_attr_get_np branch with proper error checks.
Apply:
-# ifdef HAVE_PTHREAD_GETATTR_NP - if (pthread_getattr_np(pthread_id, &pattr)) - throw Error("fixupBoehmStackPointer: pthread_getattr_np failed"); -# else -# error "Need `pthread_attr_get_np`" -# endif +# ifdef HAVE_PTHREAD_GETATTR_NP + if (pthread_getattr_np(pthread_id, &pattr)) + throw Error("fixupBoehmStackPointer: pthread_getattr_np failed"); +# elif defined(HAVE_PTHREAD_ATTR_GET_NP) + if (pthread_attr_get_np(pthread_id, &pattr)) + throw Error("fixupBoehmStackPointer: pthread_attr_get_np failed"); +# else +# error "Need pthread_getattr_np or pthread_attr_get_np" +# endifsrc/libexpr/include/nix/expr/symbol-table.hh (2)
240-244: Add bounds and alignment checks in operator[] and launder the pointerPrevent out-of-bounds access and misaligned loads; also use std::launder for safety after placement construction.
SymbolStr operator[](Symbol s) const { - assert(s.id); - return SymbolStr(*reinterpret_cast<const SymbolValue *>(arena.data + s.id)); + assert(s.id); + // Basic integrity checks + assert(s.id < arena.size); // offset within arena + assert((s.id % alignment) == 0); // start of an aligned record + assert(s.id + sizeof(SymbolValue) <= arena.size); // header fits + auto p = std::launder(reinterpret_cast<const SymbolValue *>(arena.data + s.id)); + return SymbolStr(*p); }
256-273: Fix dump() record-advance logic to account for header in alignment calculationCurrent padding uses only the string length; it should include the Value header to remain aligned. Otherwise, iteration can desynchronize and hit asserts or skip symbols.
void dump(T callback) const { std::string_view left{arena.data, arena.size}; left = left.substr(alignment); while (true) { if (left.empty()) break; left = left.substr(sizeof(Value)); auto p = left.find('\0'); assert(p != left.npos); auto sym = left.substr(0, p); callback(sym); // skip alignment padding - auto n = sym.size() + 1; - left = left.substr(n + (n % alignment ? alignment - (n % alignment) : 0)); + auto n = sizeof(Value) + sym.size() + 1; // header + string + NUL + auto alignedSize = ((n + alignment - 1) / alignment) * alignment; + left = left.substr(alignedSize - sizeof(Value)); } }
🧹 Nitpick comments (11)
src/libexpr/eval-gc.cc (4)
38-40: Fix comment: /proc/meminfo, not /proc/cpuinfoThe comment references cpuinfo but the code reads meminfo.
Apply:
- /* On Linux, use the `MemAvailable` or `MemFree` fields from - /proc/cpuinfo. */ + /* On Linux, use the `MemAvailable` or `MemFree` fields from + /proc/meminfo. */
41-59: Harden Linux meminfo parsing against read failuresreadFile("/proc/meminfo") can throw; let’s fall back instead of propagating.
Apply:
- { + try { std::unordered_map<std::string, std::string> fields; for (auto & line : tokenizeString<std::vector<std::string>>(readFile(std::filesystem::path("/proc/meminfo")), "\n")) { auto colon = line.find(':'); if (colon == line.npos) continue; fields.emplace(line.substr(0, colon), trim(line.substr(colon + 1))); } auto i = fields.find("MemAvailable"); if (i == fields.end()) i = fields.find("MemFree"); if (i != fields.end()) { auto kb = tokenizeString<std::vector<std::string>>(i->second, " "); if (kb.size() == 2 && kb[1] == "kB") return string2Int<size_t>(kb[0]).value_or(0) * 1024; } - } + } catch (const std::exception &) { + /* Fall through to generic sysconf-based fallback below. */ + }
151-159: Comment/code mismatch (80% vs 50%)Comment says 80% but code uses ~50% of free RAM.
Apply one of:
- Update comment to 50%:
- /* Set the initial heap size to something fairly big (80% of + /* Set the initial heap size to something fairly big (50% of free RAM, up to a maximum of 4 GiB) so that in most cases
- Or change code to 80% (if intended).
164-166: Avoid float math; use integer halvingfree * 0.5 promotes to double and back. Use integer division for precision.
Apply:
- auto free = getFreeMem(); - size = std::max(size, std::min((size_t) (free * 0.5), maxSize)); + auto free = getFreeMem(); + auto halfFree = free / 2; + size = std::max(size, std::min(halfFree, maxSize));src/libexpr/include/nix/expr/symbol-table.hh (7)
27-40: Make arena.data writable without const_casts and keep the pointer itself constConstruction writes through data; using const here forces const_cast elsewhere. Prefer a const pointer to mutable memory.
- const char * data; + char * const data;Follow-up: drop const_casts at call sites (e.g., SymbolStr::SymbolStr in symbol-table.cc).
32-36: Confirm cache-line isolation intent is achievedaligned(64) on the atomic ensures its start is 64B-aligned, but only if the compiler inserts padding before it. Given preceding fields, this is likely fine, but please verify with static_asserts or sizeof/offsetof in a unit test.
Example check:
- Expect offsetof(ContiguousArena, size) % 64 == 0.
- Optionally add alignas(64) to the struct and reorder members if needed.
101-109: Key stores arena by reference; ensure lifetime and thread-safety are clearKey captures ContiguousArena& and is used inside a concurrent container; document that arena outlives the set and is append-only. If not guaranteed, consider a pointer.
139-149: size()/empty() are O(n) due to std::string_view length computationThis is fine if not hot; otherwise consider storing the length next to the string to avoid repeated scans.
219-225: Constructor reserves ID 0; also assert arena invariantsAdd cheap asserts to catch early misuse.
SymbolTable() : arena(1 << 30) { // Reserve symbol ID 0 and ensure alignment of the first allocation. arena.allocate(alignment); + assert((uintptr_t)arena.data % alignment == 0); + assert(arena.size == alignment); }
251-254: Atomic load semantics for totalSize()If used for metrics only, relaxed is fine and cheaper; otherwise use acquire. Make it explicit.
- size_t totalSize() const - { - return arena.size; - } + size_t totalSize() const + { + return arena.size.load(std::memory_order_relaxed); + }
22-24: Consistent access: consider using Value APIs or document the aliasingReturning a string_view by reinterpreting (this + 1) is correct for your layout but bypasses Value’s accessor; add a comment asserting that mkString stored the same pointer and that the trailing bytes remain immutable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/libexpr/eval-gc.cc(3 hunks)src/libexpr/include/nix/expr/counter.hh(1 hunks)src/libexpr/include/nix/expr/symbol-table.hh(7 hunks)src/libexpr/meson.build(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/libexpr/include/nix/expr/counter.hh
- src/libexpr/meson.build
🧰 Additional context used
🧬 Code graph analysis (2)
src/libexpr/eval-gc.cc (2)
src/libutil/file-system.cc (7)
readFile(251-264)readFile(251-251)readFile(266-269)readFile(266-266)readFile(271-298)readFile(271-271)path(850-850)src/libutil/unix/file-descriptor.cc (2)
readFile(32-39)readFile(32-32)
src/libexpr/include/nix/expr/symbol-table.hh (2)
src/libexpr/include/nix/expr/value.hh (14)
s(1143-1146)s(1143-1143)s(1148-1148)s(1150-1150)s(1152-1152)p(1199-1199)n(205-208)n(205-205)n(1128-1131)n(1128-1128)n(1133-1136)n(1133-1133)n(1216-1219)n(1216-1216)src/libexpr/symbol-table.cc (1)
SymbolStr(45-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build_aarch64-darwin / build
- GitHub Check: build_x86_64-darwin / build
- GitHub Check: build_x86_64-linux / build
- GitHub Check: build_aarch64-linux / build
🔇 Additional comments (2)
src/libexpr/eval-gc.cc (2)
82-84: Confirm assumption: no GC roots in coroutinesAssuming no GC roots live on coroutine stacks is risky. Please confirm or document invariants/tests ensuring this remains true under parallel eval.
148-150: SP corrector hook: LGTMHooking and asserting the corrector is installed looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (4)
src/libexpr/include/nix/expr/value.hh (4)
440-542: Complex concurrency design requires careful analysis.The atomic
p0and non-atomicp1design with the ordering constraint ("p1 must be written before p0") is a sophisticated approach to lock-free value updates. However, this creates several concerns:
Race condition potential: The requirement that "p1 must always be updated before p0" (Line 459) must be strictly enforced across all code paths.
Memory ordering: The use of
memory_order_releaseforp0writes andmemory_order_acquirefor reads is correct, but the implementation must be consistent.Debug output in production: Lines 539-541 contain debug output that may not be appropriate for production builds.
Apply this fix for the debug output:
- else if (pd == pdThunk) { - printError("BAD FINISH %x", this); - unreachable(); - } + else if (pd == pdThunk) + unreachable();
1308-1312: Race condition warning for thunk accessor.The comment "FIXME: remove this since reading it is racy" correctly identifies that accessing thunk data is unsafe in a multithreaded context. This method should be marked deprecated or removed.
Consider deprecating this method or adding runtime checks:
+ [[deprecated("Reading thunk data is racy in multithreaded evaluation")]] ClosureThunk thunk() const noexcept
1319-1323: Race condition warning for app accessor.The comment "FIXME: remove this since reading it is racy" correctly identifies the same threading issue as with the thunk accessor. This should also be addressed.
781-791: Testing-only blackhole function should be restricted.The
mkBlackhole()function is marked as "only used for testing" but is in the public interface. This could be accidentally used in production code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/libexpr/include/nix/expr/value.hh(17 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/libexpr/include/nix/expr/value.hh (2)
src/libexpr/parallel-eval.cc (4)
notifyWaiters(248-253)notifyWaiters(248-248)waitOnThunk(186-245)waitOnThunk(186-186)src/libexpr/include/nix/expr/eval.hh (24)
v(536-539)v(536-536)v(541-541)v(547-547)v(552-552)v(553-553)v(554-554)v(556-556)v(561-561)v(565-565)v(566-566)v(567-572)v(573-573)v(592-592)v(757-757)v(842-842)pos(594-595)pos(611-618)pos(641-641)pos(646-646)pos(651-655)pos(672-672)pos(786-786)pos(932-932)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build_aarch64-darwin / build
- GitHub Check: build_aarch64-linux / build
- GitHub Check: build_x86_64-darwin / build
- GitHub Check: build_x86_64-linux / build
🔇 Additional comments (16)
src/libexpr/include/nix/expr/value.hh (16)
336-336: LGTM! Added Failed payload type mapping.The macro definition correctly includes the Failed payload type in the storage system.
544-572: LGTM! Proper memory ordering in payload setters.The payload setting functions correctly use the
finish()method to ensure proper memory ordering. The bit manipulation and alignment assertions provide good safety checks.
587-621: LGTM! Efficient internal type decoding.The
getInternalType()function efficiently decodes the bit-packed discriminator information. The switch statement correctly handles all primary discriminator cases.
623-641: LGTM! Clean macro-based payload accessors.The macro generates appropriate getter/setter pairs for pointer-based payloads, maintaining the correct memory ordering through the established payload setter functions.
643-698: LGTM! Consistent payload access patterns.The storage getters correctly extract data from
p1for single-word payloads and use proper bit manipulation for pointer-based payloads.
695-698: LGTM! Failed payload accessor.The Failed pointer getter is implemented consistently with other pointer-based payload types.
750-753: LGTM! Failed payload setter.The setter correctly uses
setSingleDWordPayloadfor storing the Failed pointer.
793-807: LGTM! Forward declarations for multithreaded operations.The forward declarations for
force,waitOnThunk, andnotifyWaitersestablish the interface for the multithreaded evaluation system.
810-814: LGTM! Template specialization declarations.The explicit template specializations ensure that the multithreaded operations are properly instantiated for the target pointer size.
1028-1032: LGTM! Efficient finished check.The
isFinished()method provides a clean interface to check if a value is in weak-head normal form.
1048-1052: LGTM! Updated blackhole detection.The
isBlackhole()implementation correctly identifies pending and awaited states as blackhole conditions in the new state machine.
1070-1073: LGTM! Failed value predicate.The
isFailed()method provides a clean interface to check for failed values.
1079-1116: LGTM! Updated type mapping with failure support.The
type()method correctly maps the newtFailedinternal type tonFailedand handles the new pending/awaited states as thunk types.
1221-1224: LGTM! Failed value constructor.The
mkFailed()method correctly creates a Failed value with the current exception. The use ofstd::current_exception()is appropriate for capturing the active exception.
1244-1251: LGTM! Updated trivial value detection.The
isTrivial()documentation correctly reflects that finished values are considered trivial in the new evaluation model.
1335-1338: LGTM! Failed value accessor.The
failed()method provides clean access to the Failed payload, consistent with other payload accessors.
ceb8c69 to
92788a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/libexpr/include/nix/expr/value.hh (1)
572-585: Release-store insetThunkPayloadis correct.This now preserves the “p1-before-p0” ordering without spurious wakeups for awaited states. Thanks for addressing the earlier fence concern.
🧹 Nitpick comments (3)
src/libexpr/include/nix/expr/eval.hh (1)
377-381: Good move toSharedSyncfor caches; consider similar treatment for remaining maps.
lookupPathResolved,primOpCalls, andattrSelectsare still marked non-thread-safe. ConsiderSharedSyncor per-thread aggregation with periodic merge to avoid lock contention.Also applies to: 392-393, 398-398
src/libexpr/include/nix/expr/value.hh (2)
1217-1221:Failedpayload: confirm GC ownership and lifetime.
mkFailed()allocates withnewon agc-derived type. Please confirm this is traced by the GC in both Boehm and non-Boehm builds and that any printers/visitors handlenFailedwithout dereferencing a nullex.Optionally add a brief docstring to
ValueType::nFailedandValue::failed()clarifying semantics.Also applies to: 1331-1335
1044-1049: Expose racy accessors only where safe.
isBlackhole()is fine. The “FIXME: remove this since reading it is racy” forthunk()/app()remains—consider#ifdef UNIT_TESTSor moving to a test-only header to prevent accidental use.Also applies to: 1304-1309, 1315-1320
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/libexpr/eval-gc.cc(3 hunks)src/libexpr/include/nix/expr/eval.hh(10 hunks)src/libexpr/include/nix/expr/symbol-table.hh(7 hunks)src/libexpr/include/nix/expr/value.hh(17 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/libexpr/include/nix/expr/symbol-table.hh
- src/libexpr/eval-gc.cc
🧰 Additional context used
🧬 Code graph analysis (2)
src/libexpr/include/nix/expr/value.hh (2)
src/libexpr/parallel-eval.cc (4)
notifyWaiters(248-253)notifyWaiters(248-248)waitOnThunk(186-245)waitOnThunk(186-186)src/libexpr/include/nix/expr/eval.hh (24)
v(537-540)v(537-537)v(542-542)v(548-548)v(553-553)v(554-554)v(555-555)v(557-557)v(562-562)v(566-566)v(567-567)v(568-573)v(574-574)v(593-593)v(758-758)v(843-843)pos(595-596)pos(612-619)pos(642-642)pos(647-647)pos(652-656)pos(673-673)pos(787-787)pos(933-933)
src/libexpr/include/nix/expr/eval.hh (2)
src/libexpr/primops.cc (2)
struct State(4312-4315)re(4310-4334)src/libexpr/eval.cc (3)
EvalState(378-380)EvalState(187-375)eval(1758-1784)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build_aarch64-darwin / build
- GitHub Check: build_x86_64-darwin / build
- GitHub Check: build_x86_64-linux / build
🔇 Additional comments (3)
src/libexpr/include/nix/expr/eval.hh (3)
193-193:RegexCacheownership change looks good; verify all call sites.Return type moved to
ref<RegexCache>and the member matches. Please confirm allmakeRegexCache()callers were updated and no lingeringshared_ptrassumptions remain.Also applies to: 408-408
537-540: InlineforceValueforwarding is correct.
780-781: Per-threadcallDepthviathread_localis appropriate.
| ref<Executor> executor; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
❓ Verification inconclusive
Ensure executor is always non-null (avoid nullptr deref).
waitOnThunk() uses state.executor->evalCores; if executor is unset you'll crash. Enforce non-null in the constructor (initialize a single-threaded default) or guard all uses.
Would you like a patch that wires a default Executor{evalCores=1} in EvalState’s ctor?
Initialize executor in EvalState’s constructor
Ensure executor is never null by assigning a default single-threaded Executor (e.g. with evalCores=1) in the EvalState ctor to avoid null-pointer deref in waitOnThunk().
🤖 Prompt for AI Agents
In src/libexpr/include/nix/expr/eval.hh around lines 227-228, EvalState's member
ref<Executor> executor is not initialized; update the EvalState constructor to
create and assign a default single-threaded Executor (e.g. construct Executor
with evalCores=1) so executor is never null and waitOnThunk() cannot dereference
a null pointer. Instantiate the Executor with appropriate ownership semantics
and assign it to the executor field in the ctor, ensuring any required
headers/params are provided.
When useMaster is true, startMaster() acquires the state lock, then calls isMasterRunning(), which calls addCommonSSHOpts(), which tries to acquire the state lock again, causing a deadlock. The solution is to move tmpDir out of the state. It doesn't need to be there in the first place because it never changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/libstore/ssh.cc (2)
118-125: Control master check ignores custom -S path; can skip starting our master and then use a non-existent socketisMasterRunning() doesn’t pass the control socket path, but startMaster() later returns state->socketPath and callers use “-S ”. If a user-level master exists (default ControlPath), you’ll return our tmpDir path even though no master is listening there.
Apply these changes to scope the check to our socket:
-bool SSHMaster::isMasterRunning() +bool SSHMaster::isMasterRunning(Path controlPath) { Strings args = {"-O", "check", hostnameAndUser}; addCommonSSHOpts(args); + if (!controlPath.empty()) + args.insert(args.end(), {"-S", controlPath});And in the header:
- bool isMasterRunning(); + bool isMasterRunning(Path controlPath = "");And in startMaster (see below).
249-253: Pass the socket path when checking master stateTie the check to the socket we’ll actually use.
- if (isMasterRunning()) + if (isMasterRunning(state->socketPath)) return state->socketPath;
🧹 Nitpick comments (2)
src/libstore/include/nix/store/ssh.hh (1)
30-31: Make tmpDir immutable and document lifetime/thread-safetytmpDir is effectively write-once; marking it const prevents accidental reassignment and clarifies intent. Also, because Connections don’t hold a ref to tmpDir, verify SSHMaster outlives any in-flight ssh child using files under tmpDir.
- ref<AutoDelete> tmpDir; + const ref<AutoDelete> tmpDir;If you want to make usage clearer and avoid repeated recomputation of paths, consider adding:
- a cached known-hosts file path (e.g., Path knownHostsFile;)
- a cached control socket path (e.g., Path controlSocket;)
src/libstore/ssh.cc (1)
241-242: Use consistent path joining; avoid deref/cast of refDereferencing tmpDir and C-style casting to Path is brittle. Use the same path API you used above.
- state->socketPath = (Path) *tmpDir + "/ssh.sock"; + state->socketPath = (tmpDir->path() / "ssh.sock").string();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/libstore/include/nix/store/ssh.hh(2 hunks)src/libstore/ssh.cc(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/libstore/ssh.cc (2)
src/libutil/include/nix/util/file-system.hh (1)
createTempDir(330-330)src/libstore/include/nix/store/ssh.hh (2)
args(42-42)getNixSshOpts(12-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build_x86_64-darwin / build
- GitHub Check: build_aarch64-linux / build
- GitHub Check: build_x86_64-linux / build
- GitHub Check: build_aarch64-darwin / build
🔇 Additional comments (2)
src/libstore/include/nix/store/ssh.hh (1)
4-4: Include looks correct and necessaryAdding ref.hh is appropriate for the new ref<...> member.
src/libstore/ssh.cc (1)
87-88: Good: secure temp dir creationCreating the tmp dir at construction with 0700 is correct and avoids races.
| if (!keyFile.empty()) | ||
| args.insert(args.end(), {"-i", keyFile}); | ||
| if (!sshPublicHostKey.empty()) { | ||
| std::filesystem::path fileName = state->tmpDir->path() / "host-key"; | ||
| std::filesystem::path fileName = tmpDir->path() / "host-key"; | ||
| writeFile(fileName.string(), authority.host + " " + sshPublicHostKey + "\n"); | ||
| args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()}); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Known_hosts line is malformed (missing key type and base64; wrong host format)
OpenSSH’s known_hosts requires “host keytype base64”. You’re writing raw bytes without a key type, and you don’t bracket non-default ports ([host]:port). This will cause host key verification to fail and may be racy if multiple threads rewrite the same file.
Apply this fix (defaults to ssh-ed25519; ideally plumb the real type):
- if (!sshPublicHostKey.empty()) {
- std::filesystem::path fileName = tmpDir->path() / "host-key";
- writeFile(fileName.string(), authority.host + " " + sshPublicHostKey + "\n");
- args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()});
- }
+ if (!sshPublicHostKey.empty()) {
+ std::filesystem::path fileName = tmpDir->path() / "known_hosts";
+ // Use "[host]:port" when a non-default port is set.
+ std::string hostForKnownHosts =
+ authority.port ? fmt("[%s]:%d", authority.host, *authority.port)
+ : std::string(authority.host);
+ // TODO: pass the real key type instead of defaulting.
+ const char *keyType = "ssh-ed25519";
+ writeFile(
+ fileName.string(),
+ hostForKnownHosts + " " + keyType + " " + base64::encode(sshPublicHostKey) + "\n");
+ args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()});
+ }Optionally, write the file once (e.g., at construction) to avoid concurrent rewrites. I can send a follow-up patch using std::once_flag.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (!keyFile.empty()) | |
| args.insert(args.end(), {"-i", keyFile}); | |
| if (!sshPublicHostKey.empty()) { | |
| std::filesystem::path fileName = state->tmpDir->path() / "host-key"; | |
| std::filesystem::path fileName = tmpDir->path() / "host-key"; | |
| writeFile(fileName.string(), authority.host + " " + sshPublicHostKey + "\n"); | |
| args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()}); | |
| } | |
| if (!keyFile.empty()) | |
| args.insert(args.end(), {"-i", keyFile}); | |
| if (!sshPublicHostKey.empty()) { | |
| std::filesystem::path fileName = tmpDir->path() / "known_hosts"; | |
| // Use "[host]:port" when a non-default port is set. | |
| std::string hostForKnownHosts = | |
| authority.port ? fmt("[%s]:%d", authority.host, *authority.port) | |
| : std::string(authority.host); | |
| // TODO: pass the real key type instead of defaulting. | |
| const char *keyType = "ssh-ed25519"; | |
| writeFile( | |
| fileName.string(), | |
| hostForKnownHosts + " " + keyType + " " + base64::encode(sshPublicHostKey) + "\n"); | |
| args.insert(args.end(), {"-oUserKnownHostsFile=" + fileName.string()}); | |
| } |
🤖 Prompt for AI Agents
In src/libstore/ssh.cc around lines 97 to 103, the known_hosts line is
malformed: you currently write raw key bytes without a key type or base64 and
you don’t bracket hosts with non-default ports, plus the file may be rewritten
concurrently; fix by formatting the entry as "host keytype base64key" (default
keytype to "ssh-ed25519" if the real type isn’t available), base64-encode
sshPublicHostKey before writing, bracket host as "[host]:port" when a
non-default port is used, and ensure the file is created/written only once
(e.g., move writeFile to construction or guard with a once_flag) so multiple
threads don’t race rewriting it.
Motivation
Updated version of NixOS#10938.
Context
Summary by CodeRabbit
New Features
Performance
UX Improvements
CI/Tests