
nario v1 optimizations#238

Merged
edolstra merged 5 commits into main from nario-import-optimization on Oct 22, 2025

edolstra (Collaborator) commented Oct 21, 2025

Motivation

Some optimizations that speed up re-importing a nario (version 1) from 24.3s to 7.4s for a 15 GB NixOS system closure.

The main gain comes from realizing that StringSink has very high kernel overhead for large strings.

Context

Summary by CodeRabbit

  • Refactor
    • Reduced memory allocations during import operations by reusing buffers and deferring expensive computations until needed, improving import performance and memory use.
    • Cleaned up control flow for import/export paths for better maintainability.
  • Bug Fix / Reliability
    • Strengthened string-reading behavior with stricter bounds checking and clearer end-of-data errors to prevent partial-consumption issues.

This is slightly faster than doing a read() into a buffer just to
discard the data.
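To illustrate why overriding skip helps, here is a minimal sketch using hypothetical ToySource/ToyStringSource classes (not Nix's actual Source/StringSource). A generic skip has to read bytes into a scratch buffer and throw them away, whereas a string-backed source can just advance its cursor. The bounds handling follows the behavior described for StringSource::skip in this PR: clamp to the end and throw EndOfFile (here a hypothetical local type) when asked to skip past the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for Nix's EndOfFile error.
struct EndOfFile : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical minimal source interface (not Nix's actual Source class).
struct ToySource {
    virtual ~ToySource() = default;

    // Reads up to len bytes into data and returns how many were read;
    // throws EndOfFile when no data is left.
    virtual size_t read(char * data, size_t len) = 0;

    // Generic skip: copy bytes into a scratch buffer and discard them.
    virtual void skip(size_t len) {
        char buf[8192];
        while (len > 0) {
            size_t n = read(buf, std::min(len, sizeof(buf)));
            len -= n;
        }
    }
};

// String-backed source: skipping only moves an offset.
struct ToyStringSource : ToySource {
    std::string_view s;
    size_t pos = 0;

    explicit ToyStringSource(std::string_view s) : s(s) {}

    size_t read(char * data, size_t len) override {
        if (pos == s.size()) throw EndOfFile("end of string reached");
        size_t n = s.copy(data, len, pos);  // copies min(len, s.size() - pos) bytes
        pos += n;
        return n;
    }

    // No copying: advance the cursor, clamping to the end and throwing on overshoot.
    void skip(size_t len) override {
        if (len > s.size() - pos) {
            pos = s.size();
            throw EndOfFile("end of string reached");
        }
        pos += len;
    }
};
```

The override does O(1) work regardless of len, while the generic version touches every skipped byte; for multi-gigabyte NARs held in memory that difference is what the commit message refers to.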
coderabbitai bot commented Oct 21, 2025

Walkthrough

Reuses a string buffer and adds guard logic in importPaths to avoid recomputing the NAR hash and re-adding NARs for already-present paths; the LocalStore cleanup path now skips over the NAR when its size is known instead of parsing it; adds StringSource::skip(size_t) with end-of-string bounds checking.

Changes

  • Import buffer reuse & guarded addition (src/libstore/export-import.cc): Reuse one StringSink buffer across iterations of the version-1 import loop, clearing it per iteration, and move the NAR hash and ValidPathInfo construction behind a guard so that addToStore runs only when !store.isValidPath(path); the path is always pushed into the results. Also adds an explicit brace-delimited block for the case branch.
  • LocalStore cleanup: conditional skip (src/libstore/local-store.cc): In the cleanup path of LocalStore::addToStore, if narSize is known, skip the source forward by narSize instead of unconditionally parsing the NAR into a sink; behavior is unchanged when the size is unknown.
  • StringSource skip API & implementation (src/libutil/include/nix/util/serialise.hh, src/libutil/serialise.cc): Adds void skip(size_t len) override; to StringSource and implements it to advance the position by len; if len overshoots the end, the position is clamped to the end and EndOfFile("end of string reached") is thrown.
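The LocalStore cleanup change described above can be sketched as a small helper. Everything here is hypothetical: CursorSource stands in for the real source, std::optional models an "unknown" narSize (the real code may encode that differently), and parseAndDiscard stands in for parsing the NAR into a sink that drops the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string_view>

// Hypothetical in-memory source with a cheap skip (not Nix's Source).
struct CursorSource {
    std::string_view data;
    size_t pos = 0;

    void skip(size_t len) {
        if (len > data.size() - pos) throw std::runtime_error("end of data reached");
        pos += len;
    }
};

// Hypothetical cleanup helper: consume a NAR that will not be stored.
// Returns true if the fast path (known size) was taken.
bool drainUnusedNar(CursorSource & source, std::optional<uint64_t> narSize,
                    const std::function<void(CursorSource &)> & parseAndDiscard)
{
    if (narSize) {
        // Size known: jump straight over the NAR bytes.
        source.skip(static_cast<size_t>(*narSize));
        return true;
    }
    // Size unknown: the NAR has to be parsed to find where it ends.
    parseAndDiscard(source);
    return false;
}
```

Parsing remains the fallback because a NAR is self-delimiting only through its structure; without a recorded size there is no way to know how many bytes to skip.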

Sequence Diagram(s)

sequenceDiagram
    participant Importer as importPaths
    participant Store as Store
    participant Buffer as StringSink
    participant Source as StringSource

    Note over Importer,Buffer: Version 1 import loop (buffer reused)
    loop each path
        Importer->>Buffer: clear() (reuse)
        Importer->>Buffer: copy NAR from stream into Buffer
        Importer->>Importer: read store path & metadata
        Importer->>Store: isValidPath(path)?
        alt not valid
            Importer->>Importer: compute narHash, narSize
            Importer->>Store: addToStore(path, ValidPathInfo, source from Buffer)
            Note right of Store: path added
        else already valid
            Note right of Importer: skip hashing & addToStore
        end
        Importer->>Importer: push path into result
    end

    Note over Source: StringSource::skip(len)
    Source->>Source: pos += len
    alt pos > size
        Source->>Source: pos = size
        Source->>Source: throw EndOfFile
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • cole-h

Poem

🐰 I nibble bytes and clear my bowl,
Reuse the sink, avoid the toll,
I skip a hop when bounds are bent,
Add only when the path's not meant—
A small hop saved, a tidy roll.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title "nario v1 optimizations" is directly related to the main objective of this pull request. The changeset consists of performance improvements across multiple files (export-import.cc, local-store.cc, and serialise.cc/hh) that optimize re-importing nario v1, which the PR description says cuts re-import time from 24.3s to 7.4s. The title is concise, clear, and specific enough for teammates scanning the commit history to understand that the pull request addresses performance enhancements for nario v1.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edf45d6 and bd74502.

📒 Files selected for processing (3)
  • src/libstore/export-import.cc (3 hunks)
  • src/libstore/local-store.cc (1 hunks)
  • src/libutil/serialise.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/libstore/export-import.cc
  • src/libstore/local-store.cc
  • src/libutil/serialise.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_x86_64-linux / build
  • GitHub Check: build_aarch64-darwin / build


github-actions bot commented Oct 21, 2025

github-actions bot temporarily deployed to pull request October 21, 2025 20:47

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/libstore/export-import.cc (1)

95-101: Optional: consider pre-reserving or spilling for very large NARs

For extremely large single NARs, holding the whole dump in memory can spike RSS. If this becomes a concern, consider pre-reserving a heuristic capacity or spilling to a temp file when saved.s.size() exceeds a threshold. Not a blocker.

Would you like a follow-up patch that spills to a temp file past a configurable threshold?

Also applies to: 122-125
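The spill-to-disk idea in this nitpick can be sketched as a sink that buffers in memory up to a threshold and then moves everything to an anonymous temporary file. This is only an illustration of the suggestion, not code from the PR: SpillingSink is hypothetical, its operator()(std::string_view) merely mirrors the shape of a Nix Sink, and a real version would also need to hash incrementally and stream the data back rather than reading it all into a string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical sink: keeps data in memory until it would exceed `threshold`
// bytes, then moves everything to a temporary file created with std::tmpfile.
class SpillingSink {
public:
    explicit SpillingSink(size_t threshold) : threshold(threshold) {}

    void operator()(std::string_view data) {
        if (!file && buf.size() + data.size() > threshold) spill();
        if (file) writeToFile(data);
        else buf.append(data);
        total += data.size();
    }

    bool spilled() const { return file != nullptr; }
    size_t size() const { return total; }

    // Reads everything back; meant as a final step (no writes afterwards).
    std::string readAll() {
        if (!file) return buf;
        std::fflush(file.get());
        std::rewind(file.get());
        std::string out(total, '\0');
        out.resize(std::fread(out.data(), 1, total, file.get()));
        return out;
    }

private:
    struct FileCloser {
        void operator()(std::FILE * f) const { std::fclose(f); }
    };

    void writeToFile(std::string_view data) {
        if (std::fwrite(data.data(), 1, data.size(), file.get()) != data.size())
            throw std::runtime_error("write to temporary file failed");
    }

    void spill() {
        file.reset(std::tmpfile());
        if (!file) throw std::runtime_error("cannot create temporary file");
        writeToFile(buf);
        buf.clear();
        buf.shrink_to_fit();  // release the in-memory copy
    }

    size_t threshold;
    std::string buf;
    std::unique_ptr<std::FILE, FileCloser> file;
    size_t total = 0;
};
```

The trade-off is that spilling gives up exactly the in-memory reuse that made the v1 import fast, so the threshold would need to be well above typical NAR sizes.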

src/libutil/include/nix/util/serialise.hh (1)

257-260: Header override added — align docs with throw semantics

Adding void skip(size_t len) override; to StringSource is appropriate. Consider documenting that skip may throw EndOfFile if len exceeds available data, matching base behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 000b572 and edf45d6.

📒 Files selected for processing (4)
  • src/libstore/export-import.cc (3 hunks)
  • src/libstore/local-store.cc (1 hunks)
  • src/libutil/include/nix/util/serialise.hh (1 hunks)
  • src/libutil/serialise.cc (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/libutil/serialise.cc (1)
src/libutil/include/nix/util/serialise.hh (7)
  • len (101-101)
  • len (206-206)
  • len (259-259)
  • s (155-162)
  • s (155-155)
  • s (196-196)
  • s (196-196)
src/libstore/local-store.cc (3)
src/libstore/dummy-store.cc (4)
  • info (160-184)
  • info (160-160)
  • source (186-250)
  • source (186-193)
src/libstore/include/nix/store/local-store.hh (4)
  • info (238-238)
  • info (361-361)
  • info (383-383)
  • info (384-384)
src/libstore/include/nix/store/store-api.hh (3)
  • info (525-529)
  • info (759-759)
  • source (540-540)
src/libstore/export-import.cc (3)
src/nix/nario.cc (20)
  • store (62-69)
  • store (62-62)
  • store (96-100)
  • store (96-96)
  • path (111-122)
  • path (111-111)
  • path (124-128)
  • path (124-124)
  • path (130-158)
  • path (130-130)
  • path (160-164)
  • path (160-160)
  • path (252-256)
  • path (252-253)
  • path (302-305)
  • path (302-302)
  • path (318-321)
  • path (318-318)
  • info (268-288)
  • info (269-269)
src/libstore/include/nix/store/store-api.hh (4)
  • path (308-308)
  • path (330-330)
  • info (525-529)
  • info (759-759)
src/libutil/hash.cc (2)
  • hashString (345-353)
  • hashString (345-345)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_aarch64-darwin / build
  • GitHub Check: build_x86_64-linux / build
🔇 Additional comments (2)
src/libstore/export-import.cc (2)

87-96: Good: Reusing a persistent StringSink reduces reallocs/kernel overhead in v1 loop

Reusing the single StringSink saved and clearing its string each iteration is a solid micro-optimization for large NARs. LGTM.


113-125: Guarded add avoids unnecessary hashing/import when path exists

Deferring ValidPathInfo/hashing and calling addToStore only when !store.isValidPath(path) is correct and avoids wasted work. Unconditionally pushing path to res preserves behavior.

Also applies to: 127-127
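The guarded loop discussed in these comments can be sketched as follows. The types are hypothetical (ToyStore, NarEntry, importPathsSketch are not Nix APIs), and std::hash is only a placeholder for the real SHA-256 NAR hash. The NAR is still copied into the reused buffer every iteration, because in the v1 format it must be consumed from the stream before the path is known; only hashing and addToStore are skipped for paths that are already valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stream entry: the NAR bytes and the path they belong to.
struct NarEntry {
    std::string path;
    std::string nar;
};

// Hypothetical store with a counter so the saved work is observable.
struct ToyStore {
    std::set<std::string> valid;
    size_t hashesComputed = 0;

    bool isValidPath(const std::string & p) const { return valid.count(p) > 0; }
    void addToStore(const std::string & p, size_t /*narHash*/, const std::string & /*nar*/) { valid.insert(p); }
};

std::vector<std::string> importPathsSketch(ToyStore & store, const std::vector<NarEntry> & stream)
{
    std::vector<std::string> res;
    std::string saved;  // one buffer shared by all iterations
    for (const auto & entry : stream) {
        saved.clear();            // drop old contents, keep the allocation
        saved.append(entry.nar);  // the NAR is always consumed from the stream
        if (!store.isValidPath(entry.path)) {
            // Hash and add only when the path is missing.
            size_t narHash = std::hash<std::string>{}(saved);  // placeholder for SHA-256
            ++store.hashesComputed;
            store.addToStore(entry.path, narHash, saved);
        }
        res.push_back(entry.path);  // pushed unconditionally
    }
    return res;
}
```

Re-importing the same stream a second time then costs only the copy into the buffer, which is the re-import scenario the PR benchmarks.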

…v1 case

This speeds up re-importing a 15 GiB closure in version 1 of the nario
format from 24.3s to 16.0s.
`StringSink` turns out to be pretty expensive if you have a bunch of
very large strings, since once the string gets big, it's allocated
outside of the regular heap via mmap. So every 4096 bytes we get a
page fault, and the whole string is returned to the OS when
`StringSink` is destroyed. So we have a huge system time overhead.

So reuse the `StringSink` for each NAR. This speeds up importing a 15
GB NixOS system closure (with numerous large NARs) from 15.7s to 7.4s.
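A minimal sketch of why reusing the buffer helps: a brand-new string must grow (allocate) for every large NAR, while a reused one grows only when a NAR is larger than any seen before. This assumes glibc-style malloc, which serves large requests with mmap, so each growth of a large buffer means fresh pages and page faults. countGrowths is a hypothetical helper, and the capacity-after-clear() behavior it relies on holds in mainstream implementations (libstdc++, libc++, MSVC) although the C++ standard does not strictly require it for std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how often the buffer had to grow while "importing" NARs of the given
// sizes, either with a fresh buffer per NAR or with one reused buffer.
size_t countGrowths(const std::vector<size_t> & narSizes, bool reuse)
{
    size_t growths = 0;
    std::string shared;  // survives across iterations when reuse == true
    for (size_t size : narSizes) {
        std::string local;  // a brand-new buffer for this NAR
        std::string & buf = reuse ? shared : local;
        buf.clear();  // keeps capacity in mainstream implementations
        size_t before = buf.capacity();
        buf.assign(size, 'x');  // stand-in for dumping the NAR into the sink
        if (buf.capacity() > before) ++growths;
    }
    return growths;
}
```

With three 1 MiB NARs the fresh-buffer variant grows three times and the reused one once; growth only recurs when a NAR exceeds the largest one so far.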
edolstra force-pushed the nario-import-optimization branch from edf45d6 to bbdd59c on October 22, 2025 13:09
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
edolstra force-pushed the nario-import-optimization branch from 3e33516 to bd74502 on October 22, 2025 13:11
github-actions bot temporarily deployed to pull request October 22, 2025 13:16
edolstra added this pull request to the merge queue Oct 22, 2025
Merged via the queue into main with commit 60db4b9 Oct 22, 2025
35 checks passed
edolstra deleted the nario-import-optimization branch October 22, 2025 15:27