feat(libstore): make cycle errors more verbose#14218
Open
lovesegfault wants to merge 20 commits intoNixOS:masterfrom
Open
feat(libstore): make cycle errors more verbose#14218lovesegfault wants to merge 20 commits intoNixOS:masterfrom
lovesegfault wants to merge 20 commits intoNixOS:masterfrom
Conversation
xokdvium
requested changes
Oct 11, 2025
Ericson2314
reviewed
Oct 11, 2025
src/libstore/build/find-cycles.cc
Outdated
Comment on lines
94
to
98
| auto st = lstat(path); | ||
|
|
||
| debug("scanForCycleEdges2: scanning path = %s", path); | ||
|
|
||
| if (S_ISREG(st.st_mode)) { |
Member
There was a problem hiding this comment.
We would want to do this with std::filesystem stuff, so it is portable to Windows
Contributor
There was a problem hiding this comment.
Can't we reuse NarAccessor to avoid the filesystem stuff completely?
Member
There was a problem hiding this comment.
Probably. In generally I do get the sense that there is a lot of deduplication that can still be done here. I just haven't looked into the details yet.
(Also, FWIW, my plan with CA derivations is to make cycles impossible by construction.)
Member
Author
There was a problem hiding this comment.
I've refactored it to use NarAccessor
d7e7203 to
6c39e5e
Compare
Co-Authored-By: Milan Hauth <milahu@gmail.com>
Previously, `CycleEdgeScanSink::operator()` copied the entire `getResult()` `StringSet` twice on every 64KB chunk to detect newly found hashes. For large files, this created O(n * chunks) overhead. Now we track which hashes have been recorded for the current file using `recordedForCurrentFile`, avoiding the set copies. The insert() returns true only for newly seen hashes, making this O(1) per hash found.
The `hashPathMap` was being passed to `CycleEdgeScanSink` and stored as a member variable, but was never actually used. The sink only needs the hash strings for detection via `RefScanSink`, not the full `StorePath` mapping.
Moved the `edges` member variable from public to private section for better encapsulation. Access is provided through the `getEdges()` method.
Replaced raw `read()` with `readFull()` helper, which properly handles partial reads and `EINTR`. The previous code manually checked for errors but didn't handle the case where `read()` returns fewer bytes than requested.
After refactoring to use `RefScanSink`, we no longer manually search for hashes in buffers, so the `refLength` constant (hash length) is unused. `RefScanSink` handles this internally.
The single-pass greedy algorithm could fail to connect all edges if they arrived in certain orders. For example, edges A→B, C→D, D→A, B→C would result in two paths [D→A→B→C] and [C→D] instead of one complete cycle. Added a second pass that repeatedly tries to merge existing multiedges with each other until no more merges are possible. This ensures we find complete cycle paths regardless of edge discovery order.
…File APIs Replaced POSIX-specific file operations with portable alternatives to improve Windows compatibility.
Added explicit `#include "nix/util/archive.hh"` since we use the caseHackSuffix constant from it.
Replace string concatenation for path joining with type-safe `std::filesystem::path operator/`.
…anPath The "2" suffix was unclear and didn't communicate the function's purpose. The new name better describes what it does: walks the filesystem tree and scans each file using the provided sink.
Changed `walkAndScanPath` to take `std::filesystem::path` instead of `string`, eliminating repeated string→path conversions throughout the function.
…acOS only Since caseHackSuffix is only used inside `#ifdef __APPLE__` blocks, gate the archive.hh include with the same condition.
Move `find-cycles.hh` to `src/libstore/include/nix/store/build/` to ensure it is properly installed as a public header and can be used by tests and other consumers of the library.
Replace the multi-pass O(n²) algorithm with a single-pass O(n) approach using hashmaps to track path endpoints. Before: - First pass: O(n*m) to join each edge with existing multiedges - Second pass: O(m²) repeated merging until no more merges possible - Required multiple iterations through the entire dataset Now: - Single O(n) pass through edges - Two hashmaps (pathStartingAt, pathEndingAt) enable O(1) lookups - Immediately identifies connections without linear searches - Handles all cases in one pass: * Extending existing paths (prepend/append) * Joining two separate paths * Forming cycles when a path connects to itself
Replace direct `std::filesystem` operations with `SourceAccessor`..
6c39e5e to
a9aaf0e
Compare
This was referenced Oct 25, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Attempting to revive the extremely useful #6585
Context
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.