
Limit the number of active curl handles #315

Merged
edolstra merged 3 commits into main from
eelcodolstra/nix-243-hang-when-querying-many-narinfos-in-nix-3121
Jan 14, 2026

@edolstra (Collaborator) commented Jan 14, 2026

Motivation

Upstream: NixOS#14993

Previously, calling queryValidPaths() with a large number (e.g. 100K) of store paths failed because Nix immediately creates a TransferItem for each .narinfo, which is then registered as a handle with curl. However, curl appears to scale poorly internally: even though only a few downloads are actually started (up to the connections/streams limits), it spends a lot of CPU time dealing with the inactive handles. As a result, the curl thread sits at 100% CPU, the active downloads stall and time out, and everything grinds to a halt.

So now we limit the number of curl handles to http-connections * 5. With this, fetching 100K .narinfo files from localhost succeeds in ~15 seconds.
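
A minimal C++ sketch of the capping idea, assuming a simplified model in which each transfer is just an integer id and "registering a handle with curl" is modeled as moving that id into an active set. `HandleLimiter`, `enqueue()`, `admit()` and `finish()` are hypothetical names, not the actual Nix code (per the review below, the real change leaves excess items in the worker's incoming queue).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>

// Hypothetical model: pending transfers wait in `pending`, and at most
// `maxActive` of them are "registered with curl" (moved to `active`) at once.
class HandleLimiter {
public:
    explicit HandleLimiter(std::size_t httpConnections)
        : maxActive(httpConnections * 5) // mirrors the http-connections * 5 cap
    {
        assert(httpConnections > 0);
    }

    void enqueue(int id) { pending.push_back(id); }

    // Move as many pending items as the cap allows into the active set.
    // Returns the number of items admitted by this call.
    std::size_t admit()
    {
        std::size_t admitted = 0;
        while (!pending.empty() && active.size() < maxActive) {
            active.insert(pending.front());
            pending.pop_front();
            ++admitted;
        }
        return admitted;
    }

    // A transfer completed: free its slot so a pending item can be admitted.
    void finish(int id) { active.erase(id); }

    std::size_t activeCount() const { return active.size(); }
    std::size_t pendingCount() const { return pending.size(); }

private:
    const std::size_t maxActive;
    std::deque<int> pending;
    std::unordered_set<int> active;
};
```

With 100K queued items, only `maxActive` of them ever exist as live handles; the rest cost no more than a queue entry until a slot frees up.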

Also, we now create the Activity associated with a download later. There can be a long delay between the creation of a TransferItem and the start of its curl download, which can make the reported download durations and progress bar status misleading. So now we create the Activity and update startTime when curl actually starts the download.
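
A minimal sketch of the lazy-creation pattern, assuming a stand-in `Activity` struct (the real `nix::Activity` constructor takes a logger, verbosity level, activity type and fields, which are omitted here) and a steady-clock `startTime`. `TransferItemSketch` is a hypothetical name; only the `_act` / `act()` shape follows the change described in the review.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <string>
#include <utility>

// Stand-in for nix::Activity; the real constructor signature differs.
struct Activity {
    std::string description;
    explicit Activity(std::string d) : description(std::move(d)) {}
    void progress(long done, long expected) { lastDone = done; lastExpected = expected; }
    long lastDone = 0, lastExpected = 0;
};

struct TransferItemSketch {
    using Clock = std::chrono::steady_clock;

    std::string uri;
    Clock::time_point startTime = Clock::now(); // set when the item is queued
    std::unique_ptr<Activity> _act;              // empty until the download starts

    explicit TransferItemSketch(std::string u) : uri(std::move(u)) {}

    // Create the Activity on first use and restart the clock, so reported
    // durations cover the actual download rather than time spent queued.
    Activity & act()
    {
        if (!_act) {
            _act = std::make_unique<Activity>("downloading '" + uri + "'");
            startTime = Clock::now();
        }
        return *_act;
    }
};
```

Every call site goes through `act()`, so whichever callback fires first creates the Activity, and later calls reuse the same object without touching `startTime` again.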

Context

Summary by CodeRabbit

  • Bug Fixes

    • More accurate and consistent download progress reporting and completion status.
    • Progress updates now occur only after downloads have actually started, rather than from the moment a transfer is queued.
  • Performance

    • Improved concurrent-download handling with stricter queueing to cap active transfers and reduce overload.


@coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Activity construction for TransferItem is now lazy: the Activity is created on first use, and the libcurl resolver-start callback triggers that initialization. curlFileTransfer gains a maxQueueSize that limits the number of concurrent curl handles; the worker enforces it when selecting items and during state transitions, deferring excess items.

Changes

Cohort: File transfer core
File(s): src/libstore/filetransfer.cc
Summary: Replaces the in-place Activity with std::unique_ptr<Activity> _act; adds an Activity & act() accessor and a static resolverCallbackWrapper(...) to ensure lazy Activity creation on resolver start; updates the progress/finish call sites to use act(); adds const size_t maxQueueSize = fileTransferSettings.httpConnections.get() * 5; to curlFileTransfer; enforces queue-size checks in worker selection and state transitions to defer items when full. Net: +46/-9 lines.

Sequence Diagram(s)

sequenceDiagram
    participant Resolver as libcurl Resolver
    participant Item as TransferItem
    participant Act as Activity
    participant Queue as curlFileTransfer

    Resolver->>Item: resolverCallbackWrapper(clientp)
    activate Item
    Item->>Item: act() (lazy create)
    activate Act
    Act->>Act: set startTime, init
    deactivate Act
    Item-->>Resolver: resolver started
    deactivate Item

    Note over Queue,Item: Worker selects items while checking maxQueueSize
    Queue->>Queue: check maxQueueSize
    alt queue has space
        Queue->>Item: start transfer
    else queue full
        Queue->>Item: defer item
    end

    Resolver->>Item: progress(...)
    activate Item
    Item->>Item: act() (exists)
    Item->>Act: progress(...)
    deactivate Item

    Resolver->>Item: finish(...)
    activate Item
    Item->>Item: act() (exists)
    Item->>Act: finish(...)
    deactivate Item

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I waited till the very start,
Then stitched my activity to the heart—
Queues hold fast, but when space is true,
I hop, report, and finish too! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 9.09%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Limit the number of active curl handles' directly and accurately reflects the main change in the pull request: adding maxQueueSize to cap concurrent curl handles and enforce queue-size checks.

🧹 Recent nitpick comments
src/libstore/filetransfer.cc (1)

397-404: Consider swallowing the exception instead of aborting the transfer.

Returning 1 from the resolver callback aborts the transfer. If act() throws (e.g., OOM during Activity allocation), the download fails even though it could otherwise succeed, since the Activity would be created in progressCallback anyway.

The prior review suggested swallowing the exception for this reason.

♻️ Suggested change
     static int resolverCallbackWrapper(void *, void *, void * clientp) noexcept
     try {
         // Create the `Activity` associated with this download.
         ((TransferItem *) clientp)->act();
         return 0;
     } catch (...) {
-        return 1;
+        // Swallow the exception; Activity will be created in progressCallback if needed.
+        return 0;
     }
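
The difference between the two variants comes down to what the `catch (...)` handler of a `noexcept` function-try-block returns. A minimal sketch, assuming a hypothetical `Item` whose lazy initialization fails a configurable number of times (standing in for an allocation failure in `act()`). The callback signature follows libcurl's resolver-start callback, `int (void * resolver_state, void * reserved, void * userdata)`, where a non-zero return aborts the transfer.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical item whose lazy initialization may throw.
struct Item {
    int failuresLeft = 0;       // how many init attempts should fail
    std::unique_ptr<int> state; // stand-in for the lazily created Activity

    int & act()
    {
        if (!state) {
            if (failuresLeft > 0) {
                --failuresLeft;
                throw std::runtime_error("init failed");
            }
            state = std::make_unique<int>(42);
        }
        return *state;
    }
};

// Variant in the PR: any exception aborts the transfer (non-zero return).
static int abortingCallback(void *, void *, void * clientp) noexcept
try {
    static_cast<Item *>(clientp)->act();
    return 0;
} catch (...) {
    return 1;
}

// Suggested variant: swallow the exception; a later act() call retries.
static int swallowingCallback(void *, void *, void * clientp) noexcept
try {
    static_cast<Item *>(clientp)->act();
    return 0;
} catch (...) {
    return 0;
}
```

In both variants the handler returns explicitly, so no exception escapes the `noexcept` function. With the swallowing variant curl proceeds, and the next `act()` call (for example from the progress callback) gets another chance to create the Activity.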

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c08722c and fa2250a.

📒 Files selected for processing (1)
  • src/libstore/filetransfer.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_aarch64-darwin / build
  • GitHub Check: build_x86_64-linux / build
🔇 Additional comments (6)
src/libstore/filetransfer.cc (6)

56-56: LGTM!

Using std::unique_ptr<Activity> enables the lazy initialization pattern correctly. This avoids heap allocation until the Activity is actually needed.


298-316: LGTM!

The lazy act() accessor is well-designed:

  • Defers Activity creation until the download actually starts
  • Correctly resets startTime when Activity is created, so durations reflect actual download time
  • Thread-safe since all call sites (resolver callback, progress callback, finish) execute on the worker thread

734-735: LGTM!

The queue size limit of httpConnections * 5 aligns with the PR objective. Computing it once at construction is appropriate since this setting rarely changes at runtime.


865-871: LGTM!

The queue-limiting logic correctly caps active curl handles:

  • The >= comparison ensures we never exceed maxQueueSize
  • The 100ms wakeup prevents busy-waiting while allowing prompt retry when handles complete
  • Items remain in the priority queue and will be processed in embargo order once space is available
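
A minimal sketch of one worker iteration with this deferral rule, assuming a hypothetical `Pending` item that carries an embargo time and a hypothetical `selectItems()` helper. The real worker operates on `TransferItem`s and curl itself, which this model leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical queued item: it may not start before `embargo`.
struct Pending {
    int id;
    Clock::time_point embargo;
    bool operator>(const Pending & o) const { return embargo > o.embargo; }
};

// Min-heap ordered by embargo time (earliest first).
using PendingQueue = std::priority_queue<Pending, std::vector<Pending>, std::greater<Pending>>;

struct Selection {
    std::vector<int> started;
    Clock::time_point nextWakeup;
};

// Start eligible items while below the cap. If the cap is reached, leave the
// rest queued and schedule a re-check in 100 ms instead of busy-waiting.
Selection selectItems(PendingQueue & incoming, std::size_t & active,
                      std::size_t maxQueueSize, Clock::time_point now)
{
    Selection sel{{}, Clock::time_point::max()};
    while (!incoming.empty()) {
        const Pending & top = incoming.top();
        if (top.embargo > now) {
            // Not allowed to start yet: wake up when its embargo expires.
            sel.nextWakeup = std::min(sel.nextWakeup, top.embargo);
            break;
        }
        if (active >= maxQueueSize) {
            const Clock::time_point retry = now + std::chrono::milliseconds(100);
            sel.nextWakeup = std::min(sel.nextWakeup, retry);
            break;
        }
        sel.started.push_back(top.id);
        incoming.pop();
        ++active;
    }
    return sel;
}
```

Because deferred items never leave the heap, they are picked up in embargo order on a later iteration once `active` drops below the cap.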

320-320: LGTM!

Consistent use of lazy act() accessor ensures the Activity exists before progress reporting.


576-576: LGTM!

Final progress report via act().progress() ensures the Activity is created even if the resolver callback or progress callback paths were somehow bypassed.

@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/libstore/filetransfer.cc`:
- Around line 521-524: The code uses CURLOPT_RESOLVER_START_FUNCTION and
CURLOPT_RESOLVER_START_DATA without a libcurl version guard; wrap the two
curl_easy_setopt calls that reference CURLOPT_RESOLVER_START_FUNCTION and
resolverCallbackWrapper (and CURLOPT_RESOLVER_START_DATA) with a preprocessor
check for LIBCURL_VERSION_NUM >= 0x073b00 to match the project's existing guards
so builds with older libcurl versions skip these options.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e66eb48 and c08722c.

📒 Files selected for processing (1)
  • src/libstore/filetransfer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
src/libstore/filetransfer.cc (1)
src/libstore/include/nix/store/filetransfer.hh (6)
  • request (282-282)
  • request (289-289)
  • request (294-294)
  • request (299-299)
  • request (304-304)
  • request (311-311)
🔇 Additional comments (6)
src/libstore/filetransfer.cc (6)

56-56: LGTM!

Using std::unique_ptr<Activity> for lazy initialization is appropriate here. The underscore prefix clearly indicates this is an internal member accessed through the act() accessor.


298-316: LGTM!

The lazy initialization pattern is well-implemented. Resetting startTime when the Activity is created ensures accurate download duration reporting, which addresses the misleading progress-bar status mentioned in the PR objectives.


318-329: LGTM!

The use of act().progress(...) ensures the Activity is properly initialized before progress reporting. This is consistent with the lazy initialization pattern.


574-574: LGTM!

Correctly uses act() to ensure the Activity exists before reporting final progress in the success path.


732-733: LGTM!

The queue size limit of httpConnections * 5 is a reasonable heuristic that allows some pipelining while preventing curl from being overwhelmed by too many inactive handles. This directly addresses the CPU overhead issue described in the PR objectives.


863-869: LGTM!

The queue-size check correctly defers excess items by leaving them in state->incoming and scheduling a wakeup in 100ms. This prevents starvation while maintaining the limit on active curl handles.


Comment on lines +521 to +524
// This seems to be the earliest libcurl callback that signals that the download is happening, so we can
// call act().
curl_easy_setopt(req, CURLOPT_RESOLVER_START_FUNCTION, resolverCallbackWrapper);
curl_easy_setopt(req, CURLOPT_RESOLVER_START_DATA, this);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

CURLOPT_RESOLVER_START_FUNCTION libcurl version introduced

💡 Result:

CURLOPT_RESOLVER_START_FUNCTION was added in libcurl 7.59.0 (released March 14, 2018). [1][2]

Sources:
[1] curl manual page for CURLOPT_RESOLVER_START_FUNCTION.
[2] curl 7.59.0 changelog / release notes.


🏁 Script executed:

# First, let's examine the file at the specified lines and context
cd src/libstore && wc -l filetransfer.cc

Repository: DeterminateSystems/nix-src

Length of output: 90


🏁 Script executed:

# Get lines 521-524 and surrounding context
sed -n '510,530p' src/libstore/filetransfer.cc

Repository: DeterminateSystems/nix-src

Length of output: 972


🏁 Script executed:

# Check the version guards mentioned (lines 435-443 and 745-747)
sed -n '430,450p' src/libstore/filetransfer.cc

Repository: DeterminateSystems/nix-src

Length of output: 1223


🏁 Script executed:

# Check lines 745-747
sed -n '740,755p' src/libstore/filetransfer.cc

Repository: DeterminateSystems/nix-src

Length of output: 689


🏁 Script executed:

# Search for CURL_VERSION_* guards in the file to understand the pattern
rg -n "CURL_VERSION|HAVE_LIBCURL" src/libstore/filetransfer.cc | head -20

Repository: DeterminateSystems/nix-src

Length of output: 570


🏁 Script executed:

# Check project configuration for minimum libcurl version
find . -type f \( -name "*.nix" -o -name "CMakeLists.txt" -o -name "meson.build" -o -name "configure.ac" \) | xargs grep -l libcurl 2>/dev/null | head -5

Repository: DeterminateSystems/nix-src

Length of output: 119


Add version guard for CURLOPT_RESOLVER_START_FUNCTION at lines 521-524.

CURLOPT_RESOLVER_START_FUNCTION requires libcurl 7.59.0 (0x073b00) but is used without a version guard. The file consistently protects newer libcurl features with version checks (lines 435, 438, 742, 745) and supports libcurl versions going back to 7.30.0. Wrap the two calls with:

#if LIBCURL_VERSION_NUM >= 0x073b00
curl_easy_setopt(req, CURLOPT_RESOLVER_START_FUNCTION, resolverCallbackWrapper);
curl_easy_setopt(req, CURLOPT_RESOLVER_START_DATA, this);
#endif
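
The constant in the guard uses libcurl's `0xXXYYZZ` version layout (one byte each for major, minor and patch), the same scheme as `LIBCURL_VERSION_NUM`. A small sketch with a hypothetical helper `curlVersionNum` that checks the arithmetic: 59 is `0x3b`, so 7.59.0 is `0x073b00`, and 7.30.0 is `0x071e00`.

```cpp
#include <cassert>

// Hypothetical helper: pack a version into libcurl's 0xXXYYZZ layout.
constexpr unsigned curlVersionNum(unsigned major, unsigned minor, unsigned patch)
{
    return (major << 16) | (minor << 8) | patch;
}

// CURLOPT_RESOLVER_START_FUNCTION was added in libcurl 7.59.0.
static_assert(curlVersionNum(7, 59, 0) == 0x073b00, "7.59.0 encodes as 0x073b00");
// Oldest libcurl version the file is said to support.
static_assert(curlVersionNum(7, 30, 0) == 0x071e00, "7.30.0 encodes as 0x071e00");
```

Since the fields are packed most-significant first, a plain integer comparison such as `LIBCURL_VERSION_NUM >= 0x073b00` orders versions correctly.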

@cole-h (Member) left a comment

lgtm

@edolstra edolstra enabled auto-merge January 14, 2026 18:43
github-actions bot commented Jan 14, 2026

github-actions bot temporarily deployed to pull request January 14, 2026 18:46 (Inactive)
@edolstra edolstra added this pull request to the merge queue Jan 14, 2026
Merged via the queue into main with commit 316d552 Jan 14, 2026
28 checks passed
@edolstra edolstra deleted the eelcodolstra/nix-243-hang-when-querying-many-narinfos-in-nix-3121 branch January 14, 2026 19:35