fix(shell): bound pipe read to MaxOutputChars before truncating (#1293) by Aaronontheweb · Pull Request #1298 · netclaw-dev/netclaw

Aaronontheweb · 2026-06-02T15:02:03Z

Summary

shell_execute previously called ReadToEndAsync on both stdout and stderr, buffering the entire child output before any size limit was applied. A command emitting hundreds of MB (e.g. kubectl logs, a large curl response) would allocate the full payload — plus several derived copies for StringBuilder, ToString(), Redact(), and the base64 inflation for LOH purposes — before TruncateOutput ran. The 32k MaxOutputChars cap protected the LLM's context window, not the process.
Replaces ReadToEndAsync with BoundedDrainAsync: reads into a head + tail ring-buffer window capped at MaxOutputChars. Chars beyond the cap are discarded on read but the pipe continues to drain so a running child never deadlocks on a full pipe buffer (the existing deadlock-prevention comment and CancellationToken.None rationale are preserved).
Redaction and result assembly now operate on the already-bounded strings — not the raw full output.
The model sees a head+tail view (...\n...\n...) when truncated, so tail-of-log content (error summaries, final exit status) survives instead of being silently dropped by head-only truncation.

Test plan

All existing ShellToolTests pass (30 total, including the updated Output_truncation_applies assertion)
6 new BoundedDrain_* unit tests covering: short output returned verbatim, empty input, output exactly at cap, long output with distinct head/tail markers, even head/tail split, disabled cap falls back to unbounded read
dotnet slopwatch analyze — 0 issues

Closes #1293.

netclawd builds on the Web SDK, which defaults ServerGarbageCollection to true. The daemon is a single-tenant, low-concurrency process typically run in a memory-limited container, where Server GC inflates peak RSS and is slow to return memory to the OS — turning transient allocation spikes into cgroup OOM kills. Switch to Workstation GC with background collection. Verified System.GC.Server=false in the generated netclawd.runtimeconfig.json.

…law-dev#1293) Previously ReadToEndAsync drained the entire child stdout/stderr into memory before TruncateOutput applied the cap. A command emitting 300MB of output (kubectl logs, curl of a large response) would allocate the full payload plus multiple derived copies — all on the LOH — before anything was discarded. The 32k cap protected the LLM context, not the process. Replace ReadToEndAsync with BoundedDrainAsync, which reads into a head+tail ring-buffer window capped at MaxOutputChars. Chars beyond the cap are discarded on read but the pipe continues to drain so a running child never deadlocks on a full pipe buffer. Redaction and assembly now operate on the already-bounded strings instead of the raw full output. The model sees a head+tail view with a "..." separator when output is truncated, so tail-of-log content (error summaries, final status) is preserved instead of being silently dropped by head-only truncation.

…flow + hygiene hardening (netclaw-dev#1293) The initial bounded-drain fix capped what the model sees but not what the drain allocates. A BenchmarkDotNet harness (benchmarks/Netclaw.Benchmarks) isolating the drain loop surfaced that allocation still scaled with total output (~1MB for 50M chars) plus two regressions on the common path. Rework the drain so allocation is O(cap), not O(total output): - allocate the tail ring lazily, only once the head fills, so the common case (output under the cap) allocates head-only; - read through ReadAsync(Memory<char>), which returns a non-allocating ValueTask when the pipe already has data buffered, and pool the scratch read buffer so it never lands on the LOH; - write into the ring with block copies (at most two per chunk) instead of a per-char modulo loop. Result: allocation is flat at ~188KB whether the child prints 1M or 50M chars, zero Gen2/LOH collections, and the large-output path is ~26x faster. The common small-output case is faster and lighter than the original ReadToEnd path. Hardening from review: - long totalChars and overflow-safe headCap so a multi-GB flood (which the bounded drain now lets the process keep producing) can't wrap the truncation check, and a near-int.MaxValue MaxOutputChars can't overflow headCap to a negative StringBuilder capacity; - Return the pooled buffer with clearArray:true so raw stdout/stderr (possibly secrets) is wiped from the shared pool; - reconstruct the result in place on the head StringBuilder, dropping a second builder and a head-sized copy on the truncation path; - correct the disabled-cap comment (0 is an explicit opt-out, not "previous behaviour") and the benchmark reader/csproj comments. Add a chunked-reader test exercising the ring's wraparound + start-advance path, which the StringReader-based tests (single 4KB read) did not cover.

Aaronontheweb · 2026-06-03T04:00:05Z

Benchmark results

BenchmarkDotNet ([MemoryDiagnoser]) over the benchmarks/Netclaw.Benchmarks harness added in this PR. The benchmark isolates the drain loop — it feeds BoundedDrainAsync a synthetic in-memory TextReader (no process spawn, no real pipe I/O), so the numbers reflect only the cost of turning a stream into a bounded, redaction-ready string. Cap is the production default MaxOutputChars = 32_000. Baseline is the old path (ReadToEndAsync + TruncateOutput).

Run config: --job short (3 iterations) on the optimized drain. Allocation figures (MemoryDiagnoser) are deterministic; the time error bars are wide on the small inputs due to the short job, but the order-of-magnitude story holds. The post-review hardening (in-place reconstruction on the truncated path) only reduces allocation further than shown here.

TotalChars	Implementation	Mean	Allocated	Gen2
1,000	ReadToEnd + Truncate (old)	1,971 ns	10.53 KB	—
1,000	BoundedDrainAsync (new)	521 ns	10.23 KB	—
32,000	ReadToEnd + Truncate (old)	12,409 ns	127.00 KB	—
32,000	BoundedDrainAsync (new)	8,146 ns	156.76 KB	—
1,000,000	ReadToEnd + Truncate (old)	1,502,686 ns	4,005.85 KB	498
1,000,000	BoundedDrainAsync (new)	51,219 ns	188.41 KB	—
50,000,000	ReadToEnd + Truncate (old)	54,004,217 ns	197,197.51 KB	2,600
50,000,000	BoundedDrainAsync (new)	2,042,472 ns	188.41 KB	—

What the numbers show

Allocation is bounded by the cap, not the output. The new path allocates 188.41 KB at both 1M and 50M chars — identical to the byte. The old path scales with total output (193 MB at 50M chars). That O(cap) flatness is the property that stops the cgroup OOM-kill on a memory-limited daemon (shell_execute buffers entire child output into memory before truncating (LOH/OOM risk) #1293).
Zero Gen2 / LOH collections on the new path at every size. The old path forces 2,600 Gen2 collections per 1k ops at 50M chars — that GC pressure was the kill mechanism.
Faster across the board, despite doing more bookkeeping: ~26× at 50M chars (54 ms → 2 ms), ~29× at 1M.
The common small-output case is faster and lighter than the old ReadToEnd path (521 ns / 10.23 KB vs 1,971 ns / 10.53 KB at 1,000 chars).

The one spot slightly above baseline is exactly at the cap (32k: 156.76 KB vs 127 KB) — head and tail buffers both allocate right at the boundary where nothing is truncated. It's bounded (~30 KB), only at that precise size, and the path is still ~34% faster there.

Reproduce: dotnet run -c Release --project benchmarks/Netclaw.Benchmarks -- --filter '*ShellDrain*' --job short

Output_truncation_applies used `python -c` to generate output on Windows, but the Windows CI runner resolves `python` to the Microsoft Store stub, which writes its message to stderr. With the new per-stream truncation markers the truncation then lands on stderr ("[stderr truncated"), so the test's "[stdout truncated" assertion failed on windows-latest only. Use `echo` with a long literal instead — a builtin on both bash and cmd.exe that deterministically writes >50 chars to stdout, with no interpreter dependency.

Aaronontheweb added 2 commits June 1, 2026 19:19

Aaronontheweb force-pushed the fix/1293-shell-bounded-drain branch from e838662 to a648cc9 Compare June 2, 2026 15:09

Aaronontheweb added tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc. performance Memory, CPU, and I/O optimization issues. labels Jun 3, 2026

Aaronontheweb enabled auto-merge (squash) June 3, 2026 03:58

Merge branch 'dev' into fix/1293-shell-bounded-drain

1aba751

This was referenced Jun 3, 2026

BackgroundJobExecutionActor buffers full process output in memory (same OOM shape as #1293) #1300

Closed

file_read default path materializes the whole file (OOM risk) and never redacts secrets #1301

Closed

Aaronontheweb added 2 commits June 2, 2026 23:09

Merge branch 'dev' into fix/1293-shell-bounded-drain

3c80e2b

Aaronontheweb merged commit cfa1550 into netclaw-dev:dev Jun 3, 2026
14 checks passed

Aaronontheweb mentioned this pull request Jun 3, 2026

Plan: bound tool output with file spill (closes #1300, #1301) #1305

Merged

Aaronontheweb deleted the fix/1293-shell-bounded-drain branch June 3, 2026 06:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(shell): bound pipe read to MaxOutputChars before truncating (#1293)#1298

fix(shell): bound pipe read to MaxOutputChars before truncating (#1293)#1298
Aaronontheweb merged 6 commits into
netclaw-dev:devfrom
Aaronontheweb:fix/1293-shell-bounded-drain

Aaronontheweb commented Jun 2, 2026

Uh oh!

Aaronontheweb commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aaronontheweb commented Jun 2, 2026

Summary

Test plan

Uh oh!

Aaronontheweb commented Jun 3, 2026

Benchmark results

What the numbers show

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant