Limit the number of active curl handles#315
Conversation
Previously, calling queryValidPaths() with a large number (e.g. 100K) of store paths failed because Nix immediately creates a `TransferItem` for each .narinfo, which is then registered as a handle with curl. However curl appears to scale poorly internally: even though only a few downloads are actually started (up to the connections/streams limits), it spends a lot of CPU time dealing with the inactive handles. So the curl thread is sitting at 100% CPU, the active downloads stall and time out, and everything grind to a halt. So now we limit the number of curl handles to http-connections * 5. With this, fetching 100K .narinfo files from localhost succeeds in ~15 seconds.
There can be a long time between the creation of `TransferItem` and the start of the curl download, which can lead to misleading download durations and progress bar status. So now we create the `Activity` and update `startTime` when curl actually starts the download.
📝 WalkthroughWalkthroughLazy Activity construction for TransferItem defers Activity creation until first use; resolver start now triggers lazy initialization. curlFileTransfer gains a maxQueueSize to limit concurrent handles and enforce queue-size checks in worker and state transitions to defer excess items. Changes
Sequence Diagram(s)sequenceDiagram
participant Resolver as libcurl Resolver
participant Item as TransferItem
participant Act as Activity
participant Queue as curlFileTransfer
Resolver->>Item: resolverCallbackWrapper(clientp)
activate Item
Item->>Item: act() (lazy create)
activate Act
Act->>Act: set startTime, init
deactivate Act
Item-->>Resolver: resolver started
deactivate Item
Note over Queue,Item: Worker selects items while checking maxQueueSize
Queue->>Queue: check maxQueueSize
alt queue has space
Queue->>Item: start transfer
else queue full
Queue->>Item: defer item
end
Resolver->>Item: progress(...)
activate Item
Item->>Item: act() (exists)
Item->>Act: progress(...)
deactivate Item
Resolver->>Item: finish(...)
activate Item
Item->>Item: act() (exists)
Item->>Act: finish(...)
deactivate Item
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (6)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/libstore/filetransfer.cc`:
- Around line 521-524: The code uses CURLOPT_RESOLVER_START_FUNCTION and
CURLOPT_RESOLVER_START_DATA without a libcurl version guard; wrap the two
curl_easy_setopt calls that reference CURLOPT_RESOLVER_START_FUNCTION and
resolverCallbackWrapper (and CURLOPT_RESOLVER_START_DATA) with a preprocessor
check for LIBCURL_VERSION_NUM >= 0x073b00 to match the project's existing guards
so builds with older libcurl versions skip these options.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/libstore/filetransfer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
src/libstore/filetransfer.cc (1)
src/libstore/include/nix/store/filetransfer.hh (6)
request(282-282)request(289-289)request(294-294)request(299-299)request(304-304)request(311-311)
🔇 Additional comments (6)
src/libstore/filetransfer.cc (6)
56-56: LGTM!Using
std::unique_ptr<Activity>for lazy initialization is appropriate here. The underscore prefix clearly indicates this is an internal member accessed through theact()accessor.
298-316: LGTM!The lazy initialization pattern is well-implemented. Resetting
startTimewhen the Activity is created ensures accurate download duration reporting, which addresses the misleading progress-bar status mentioned in the PR objectives.
318-329: LGTM!The use of
act().progress(...)ensures the Activity is properly initialized before progress reporting. This is consistent with the lazy initialization pattern.
574-574: LGTM!Correctly uses
act()to ensure the Activity exists before reporting final progress in the success path.
732-733: LGTM!The queue size limit of
httpConnections * 5is a reasonable heuristic that allows some pipelining while preventing curl from being overwhelmed by too many inactive handles. This directly addresses the CPU overhead issue described in the PR objectives.
863-869: LGTM!The queue-size check correctly defers excess items by leaving them in
state->incomingand scheduling a wakeup in 100ms. This prevents starvation while maintaining the limit on active curl handles.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| // This seems to be the earliest libcurl callback that signals that the download is happening, so we can | ||
| // call act(). | ||
| curl_easy_setopt(req, CURLOPT_RESOLVER_START_FUNCTION, resolverCallbackWrapper); | ||
| curl_easy_setopt(req, CURLOPT_RESOLVER_START_DATA, this); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
CURLOPT_RESOLVER_START_FUNCTION libcurl version introduced
💡 Result:
CURLOPT_RESOLVER_START_FUNCTION was added in libcurl 7.59.0 (released March 14, 2018). [1][2]
Sources:
[1] curl manual page for CURLOPT_RESOLVER_START_FUNCTION.
[2] curl 7.59.0 changelog / release notes.
🏁 Script executed:
# First, let's examine the file at the specified lines and context
cd src/libstore && wc -l filetransfer.ccRepository: DeterminateSystems/nix-src
Length of output: 90
🏁 Script executed:
# Get lines 521-524 and surrounding context
sed -n '510,530p' src/libstore/filetransfer.ccRepository: DeterminateSystems/nix-src
Length of output: 972
🏁 Script executed:
# Check the version guards mentioned (lines 435-443 and 745-747)
sed -n '430,450p' src/libstore/filetransfer.ccRepository: DeterminateSystems/nix-src
Length of output: 1223
🏁 Script executed:
# Check lines 745-747
sed -n '740,755p' src/libstore/filetransfer.ccRepository: DeterminateSystems/nix-src
Length of output: 689
🏁 Script executed:
# Search for CURL_VERSION_* guards in the file to understand the pattern
rg -n "CURL_VERSION|HAVE_LIBCURL" src/libstore/filetransfer.cc | head -20Repository: DeterminateSystems/nix-src
Length of output: 570
🏁 Script executed:
# Check project configuration for minimum libcurl version
find . -type f \( -name "*.nix" -o -name "CMakeLists.txt" -o -name "meson.build" -o -name "configure.ac" \) | xargs grep -l libcurl 2>/dev/null | head -5Repository: DeterminateSystems/nix-src
Length of output: 119
Add version guard for CURLOPT_RESOLVER_START_FUNCTION at lines 521-524.
CURLOPT_RESOLVER_START_FUNCTION requires libcurl 7.59.0 (0x073b00) but is used without a version guard. The file consistently protects newer libcurl features with version checks (lines 435, 438, 742, 745), and supports older versions going back to 7.30.0. Wrap this with:
`#if` LIBCURL_VERSION_NUM >= 0x073b00
curl_easy_setopt(req, CURLOPT_RESOLVER_START_FUNCTION, resolverCallbackWrapper);
curl_easy_setopt(req, CURLOPT_RESOLVER_START_DATA, this);
`#endif`
🤖 Prompt for AI Agents
In `@src/libstore/filetransfer.cc` around lines 521 - 524, The code uses
CURLOPT_RESOLVER_START_FUNCTION and CURLOPT_RESOLVER_START_DATA without a
libcurl version guard; wrap the two curl_easy_setopt calls that reference
CURLOPT_RESOLVER_START_FUNCTION and resolverCallbackWrapper (and
CURLOPT_RESOLVER_START_DATA) with a preprocessor check for LIBCURL_VERSION_NUM
>= 0x073b00 to match the project's existing guards so builds with older libcurl
versions skip these options.
Motivation
Upstream: NixOS#14993
Previously, calling
queryValidPaths()with a large number (e.g. 100K) of store paths failed because Nix immediately creates aTransferItemfor each.narinfo, which is then registered as a handle with curl. However curl appears to scale poorly internally: even though only a few downloads are actually started (up to the connections/streams limits), it spends a lot of CPU time dealing with the inactive handles. So the curl thread is sitting at 100% CPU, the active downloads stall and time out, and everything grind to a halt.So now we limit the number of curl handles to
http-connections * 5. With this, fetching 100K.narinfofiles fromlocalhostsucceeds in ~15 seconds.Also, we now create the
Activityassociated with a download later. There can be a long time between the creation ofTransferItemand the start of the curl download, which can lead to misleading download durations and progress bar status. So now we create theActivityand updatestartTimewhen curl actually starts the download.Context
Summary by CodeRabbit
Bug Fixes
Performance
✏️ Tip: You can customize this high-level summary in your review settings.