This is slightly faster than doing a read() into a buffer just to discard the data.
Walkthrough

Reuses string buffering and guard logic in `importPaths` to avoid recomputing and re-adding NARs for already-present paths; `LocalStore` cleanup now skips over the NAR when its size is known instead of parsing it; adds a `StringSource::skip` override.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Importer as importPaths
    participant Store as Store
    participant Buffer as StringSink
    participant Source as StringSource
    Note over Importer,Buffer: Version 1 import loop (buffer reused)
    loop each path
        Importer->>Store: isValidPath(path)?
        alt not valid
            Importer->>Buffer: clear() (reuse)
            Importer->>Importer: build NAR into Buffer
            Importer->>Importer: compute narHash, narSize
            Importer->>Store: addToStore(path, ValidPathInfo, source from Buffer)
            Note right of Store: path added
        else already valid
            Note right of Importer: skip NAR build & add
        end
        Importer->>Importer: push path into result
    end
    Note over Source: StringSource::skip(len)
    Source->>Source: pos += len
    alt pos > size
        Source->>Source: pos = size
        Source->>Source: throw EndOfFile
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/libstore/export-import.cc (1)
95-101: Optional: consider pre-reserving or spilling for very large NARs

For extremely large single NARs, holding the whole dump in memory can spike RSS. If this becomes a concern, consider pre-reserving a heuristic capacity or spilling to a temp file when `saved.s.size()` exceeds a threshold. Not a blocker.

Would you like a follow-up patch that spills to a temp file past a configurable threshold?
Also applies to: 122-125
src/libutil/include/nix/util/serialise.hh (1)
257-260: Header override added — align docs with throw semantics

Adding `void skip(size_t len) override;` to `StringSource` is appropriate. Consider documenting that `skip` may throw `EndOfFile` if `len` exceeds the available data, matching base behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/libstore/export-import.cc (3 hunks)
- src/libstore/local-store.cc (1 hunks)
- src/libutil/include/nix/util/serialise.hh (1 hunks)
- src/libutil/serialise.cc (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/libutil/serialise.cc (1)
src/libutil/include/nix/util/serialise.hh (7)
- `len` (101-101), `len` (206-206), `len` (259-259), `s` (155-162), `s` (155-155), `s` (196-196), `s` (196-196)
src/libstore/local-store.cc (3)
src/libstore/dummy-store.cc (4)
- `info` (160-184), `info` (160-160), `source` (186-250), `source` (186-193)

src/libstore/include/nix/store/local-store.hh (4)

- `info` (238-238), `info` (361-361), `info` (383-383), `info` (384-384)

src/libstore/include/nix/store/store-api.hh (3)

- `info` (525-529), `info` (759-759), `source` (540-540)
src/libstore/export-import.cc (3)
src/nix/nario.cc (20)
- `store` (62-69), `store` (62-62), `store` (96-100), `store` (96-96), `path` (111-122), `path` (111-111), `path` (124-128), `path` (124-124), `path` (130-158), `path` (130-130), `path` (160-164), `path` (160-160), `path` (252-256), `path` (252-253), `path` (302-305), `path` (302-302), `path` (318-321), `path` (318-318), `info` (268-288), `info` (269-269)

src/libstore/include/nix/store/store-api.hh (4)

- `path` (308-308), `path` (330-330), `info` (525-529), `info` (759-759)

src/libutil/hash.cc (2)

- `hashString` (345-353), `hashString` (345-345)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build_aarch64-darwin / build
- GitHub Check: build_x86_64-linux / build
🔇 Additional comments (2)
src/libstore/export-import.cc (2)
87-96: Good: reusing a persistent StringSink reduces reallocs/kernel overhead in the v1 loop

Reusing `StringSink saved;` and clearing its string each iteration is a solid micro-optimization for large NARs. LGTM.
113-125: Guarded add avoids unnecessary hashing/import when the path exists

Deferring `ValidPathInfo` construction/hashing and calling `addToStore` only when `!store.isValidPath(path)` is correct and avoids wasted work. Unconditionally pushing `path` to `res` preserves behavior.

Also applies to: 127-127
…v1 case

This speeds up re-importing a 15 GiB closure in version 1 of the nario format from 24.3s to 16.0s.
`StringSink` turns out to be pretty expensive if you have a bunch of very large strings: once the string gets big, it's allocated outside of the regular heap via mmap, so every 4096 bytes we take a page fault, and the whole string is returned to the OS when the `StringSink` is destroyed, giving a huge system-time overhead. Reusing the `StringSink` for each NAR avoids this. It speeds up importing a 15 GB NixOS system closure (with numerous large NARs) from 15.7s to 7.4s.
edf45d6 to bbdd59c
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
3e33516 to bd74502
Motivation
Some optimizations that speed up re-importing a nario (version 1) from 24.3s to 7.4s for a 15 GB NixOS system closure.
The main gain is from realizing that `StringSource` has very high kernel overhead for large strings.