fix(pty-daemon): raise daemon RLIMIT_NOFILE and surface real spawn errno #5067
Users on dense shared hosts hit "spawn failed (...): posix_spawnp failed." from the SDK. node-pty's native error drops the errno, and the message is thrown specifically when spawning node-pty's helper fails, i.e. resource exhaustion (EMFILE), not a bad shell/cwd. One org daemon was hosting 100+ worktrees on a single machine against macOS's 256 fd soft default.
- DaemonSupervisor: spawn the daemon through `/bin/sh -c 'ulimit -n ...; exec'` (POSIX only) so RLIMIT_NOFILE is raised before exec; exec preserves PID/argv/process-group, and handoff successors inherit the raised limit.
- Pty.spawn: on failure, re-probe the same shell+cwd with spawnSync (which surfaces the real code) and include `errno=` in the wire error, so the next occurrence is diagnosable (EMFILE/EAGAIN/ENOENT) instead of a bare string.
Walkthrough
Two independent runtime improvements: daemon process spawning now raises file descriptor limits on non-Windows platforms before executing the daemon binary, and PTY spawn errors now include reprobed errno details to surface underlying shell failures that the native node-pty library would otherwise hide.
Changes
Daemon File Descriptor Limit Raising
PTY Spawn Error Diagnosis
Estimated code review effort: 3 (Moderate), ~20 minutes
Greptile Summary
This PR addresses fd exhaustion on dense shared hosts where the per-org daemon was started with macOS's default 256 soft RLIMIT_NOFILE, causing node-pty to fail silently with
Confidence Score: 4/5
Both changes are well-scoped to the failure path; the ulimit wrapper preserves PID and process-group semantics via exec, and the reprobe probe only fires on an already-errored spawn.
The ulimit shell wrapper is a battle-tested pattern, exec correctly excludes $0 so the node daemon is invoked with the right arguments, and the Windows code path is untouched. The reprobeErrno helper is non-invasive and only runs after node-pty has already thrown. The one rough edge is that a resolved/transient failure produces errno=ok in the error message, which reads oddly in an exception context but does not affect correctness.
No files require special attention; both changed files are straightforward and the logic is confined to daemon-startup and error-reporting paths.
| Filename | Overview |
|---|---|
| packages/host-service/src/daemon/DaemonSupervisor.ts | Wraps non-Windows daemon spawn in /bin/sh with ulimit to raise RLIMIT_NOFILE before handing off to the node daemon; exec preserves the PID so adoption and ps-matching remain correct. |
| packages/pty-daemon/src/Pty/Pty.ts | Adds reprobeErrno to recover the actual errno from a failed node-pty spawn by immediately re-running spawnSync; annotates the thrown error with errno= for actionable wire-level diagnostics. Minor: returns the string ok on success, which reads strangely inside an error message. |
Sequence Diagram
sequenceDiagram
participant HS as DaemonSupervisor
participant SH as /bin/sh wrapper
participant D as pty-daemon (node)
participant NT as node-pty native
participant RE as reprobeErrno
HS->>SH: spawn /bin/sh with ulimit script
SH->>SH: ulimit -n 1048576 (raise RLIMIT_NOFILE)
SH->>D: exec node script --socket (PID preserved, limit inherited)
D->>NT: nodePty.spawn(shell, argv, cwd)
alt spawn succeeds
NT-->>D: IPty handle
else spawn fails e.g. EMFILE
NT-->>D: Error posix_spawnp failed, errno undefined
D->>RE: spawnSync(shell, cwd, timeout 1000)
RE-->>D: "probe.error.code = EMFILE"
D-->>HS: "Error spawn failed errno=EMFILE"
end
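The "limit inherited" step in the diagram rests on a resource limit set before `exec` carrying over to the exec'd process. A minimal sketch of that behaviour, assuming a POSIX host with `/bin/sh`; the helper name `softLimitAfterExec` is hypothetical and is not part of the PR. It lowers the limit rather than raising it, because a raise may be clamped by the host's hard limit.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper (not from the PR): sets the fd limit in a wrapper shell,
// then execs a second shell that prints its own soft limit.
export function softLimitAfterExec(limit: number): string {
  // "$@" expands to everything after the $0 placeholder ("sh") below.
  const script = `ulimit -n ${limit}; exec "$@"`;
  const result = spawnSync(
    "/bin/sh",
    ["-c", script, "sh", "/bin/sh", "-c", "ulimit -n"],
    { encoding: "utf8" },
  );
  // stdout comes from the exec'd shell, not from the wrapper.
  return result.stdout.trim();
}
```

Calling `softLimitAfterExec(64)` on such a host should report the lowered value from the exec'd shell, which is the same mechanism the wrapper relies on in the other direction.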
Actionable comments posted: 1
📒 Files selected for processing (2)
packages/host-service/src/daemon/DaemonSupervisor.ts
packages/pty-daemon/src/Pty/Pty.ts
const probe = childProcess.spawnSync(meta.shell, ["-c", ":"], {
  cwd: meta.cwd,
  timeout: 1000,
  stdio: "ignore",
});
Preserve spawn parity in reprobeErrno by passing meta.env to spawnSync.
reprobeErrno probes with cwd only, while the real nodePty.spawn uses env: meta.env; dropping env can misreport the errno for environment-driven failures.
Suggested patch
function reprobeErrno(meta: SessionMeta): string {
  try {
    const probe = childProcess.spawnSync(meta.shell, ["-c", ":"], {
      cwd: meta.cwd,
+     env: meta.env,
      timeout: 1000,
      stdio: "ignore",
    });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/pty-daemon/src/Pty/Pty.ts` around lines 147 - 151, The reprobeErrno
probe currently calls childProcess.spawnSync(meta.shell, ["-c", ":"], { cwd:
meta.cwd, timeout: 1000, stdio: "ignore" }) without passing environment, so
update the options to include env: meta.env to match nodePty.spawn behavior;
specifically modify the spawnSync call inside reprobeErrno to pass the meta.env
property (env: meta.env) alongside cwd, timeout, and stdio so environment-driven
errno results are reproduced correctly.
  timeout: 1000,
  stdio: "ignore",
});
if (!probe.error) return "ok";
"ok" in error message is confusing
When the re-probe succeeds (the EMFILE condition resolved between node-pty's failure and spawnSync), the thrown error message reads errno=ok, which is contradictory — the caller sees an exception but with a result string that implies success. A label like "non-sticky" or "resolved" would make it immediately clear that the condition was transient rather than suggesting the probe found no error.
2 issues found across 2 files
function reprobeErrno(meta: SessionMeta): string {
  try {
    const probe = childProcess.spawnSync(meta.shell, ["-c", ":"], {
      cwd: meta.cwd,
P2: Errno reprobe does not use the same environment as the failing spawn, so reported errno can be incorrect/misleading.
Suggested change:
- cwd: meta.cwd,
+ cwd: meta.cwd,
+ env: meta.env,
  timeout: 1000,
  stdio: "ignore",
});
if (!probe.error) return "ok";
P3: Returning "ok" here produces a confusing error message like errno=ok in the thrown exception — the caller sees an error but the embedded errno label suggests success. Use a label like "transient" or "non-reproducible" to clarify that the condition was not replicable at reprobe time.
Suggested change:
- if (!probe.error) return "ok";
+ if (!probe.error) return "non-reproducible";
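For reference, the two review suggestions (pass `env` to the probe, and replace the `"ok"` label) could be combined roughly as follows. This is a sketch only: `ProbeMeta` is a hypothetical stand-in for the real `SessionMeta`, and the `catch` branch is an assumption, since the review excerpts do not show how the original function handles a thrown exception.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical minimal shape; the real SessionMeta lives in the pty-daemon protocol code.
interface ProbeMeta {
  shell: string;
  cwd: string;
  env?: Record<string, string>;
}

export function reprobeErrno(meta: ProbeMeta): string {
  try {
    // Run a no-op (`:`) in the same shell, cwd, and env as the failed spawn.
    const probe = spawnSync(meta.shell, ["-c", ":"], {
      cwd: meta.cwd,
      env: meta.env,
      timeout: 1000,
      stdio: "ignore",
    });
    // The probe worked, so the original failure could not be reproduced.
    if (!probe.error) return "non-reproducible";
    const e = probe.error as NodeJS.ErrnoException;
    return e.code ?? e.message;
  } catch (err) {
    // Assumption: fall back to the message if spawnSync itself throws.
    return err instanceof Error ? err.message : String(err);
  }
}
```

Leaving `env` undefined makes `spawnSync` fall back to `process.env`, so callers that have no session environment still get a usable probe.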
* chore(desktop): bump version to 1.12.2 (host-service 0.8.17 -> 0.8.18)
* release(cli): cut cli v0.2.22

Bundled fixes since 0.2.21 (no command-surface changes):
- #5067 pty-daemon: raise RLIMIT_NOFILE + surface real spawn errno
- #5066 projects: restrict deletion to organization owners
- #5055 route organization creation through auth hooks

Also syncs bun.lock with the desktop 1.12.2 / host-service 0.8.18 bump that landed in e173c89 without a lockfile update.
Problem
Users on dense shared hosts were getting "spawn failed (...): posix_spawnp failed." from the SDK.
Diagnosis
- `Pty.spawn` (`statSync`) had already passed, so the cwd existed; this is not a deleted/missing worktree.
- `pty.fork` throws a bare `Error("posix_spawnp failed.")` with `.code: undefined`, `.errno: undefined`; the errno is lost at the boundary.
- `posix_spawnp failed.` is thrown specifically when spawning node-pty's helper fails, i.e. resource exhaustion (EMFILE/EAGAIN).
- One org (`thrive-holdings`) was hosting 118 live worktrees on a single machine (202 across 7 hosts), with the creation rate accelerating (1 → 21 → 42 → 56/week) and no pruning. The daemon is one process per org per host and never raised `RLIMIT_NOFILE`, against macOS's 256 soft default. ~40–50 worktrees with a live agent shell exhaust 256 fds, after which every new PTY spawn fails with EMFILE for every user on that daemon.
Changes
`DaemonSupervisor.ts`: raise `RLIMIT_NOFILE`
Spawn the daemon through `/bin/sh -c 'ulimit -n 1048576 …; exec "$@"'` (POSIX only; Windows path unchanged). `exec` preserves PID/argv/process-group (adoption-by-PID and `ps` matching unaffected), and handoff successors the daemon spawns from itself inherit the raised limit. `1048576` matches systemd's `LimitNOFILE` default; macOS clamps to the kernel ceiling.
`Pty.ts`: surface the real errno
On spawn failure, re-probe the same shell+cwd with `spawnSync` (which does report the errno) and include `errno=` in the wire error. This only runs on the (rare) failure path; resource errors like EMFILE are sticky enough to reproduce immediately.
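The failure-path behaviour described for `Pty.ts` can be sketched as a small wrapper. Everything here is illustrative: `spawnWithErrno`, its parameters, and the exact message layout are assumptions modelled on the "spawn failed (...): posix_spawnp failed." string quoted in this PR, not the code that was merged.

```typescript
// Hypothetical wrapper: `spawnFn` stands in for the node-pty spawn call and
// `probe` for the spawnSync-based re-probe; neither name comes from the PR.
export function spawnWithErrno<T>(
  spawnFn: () => T,
  probe: () => string,
  label: string,
): T {
  try {
    return spawnFn();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // The probe runs only here, so the success path pays no extra cost.
    throw new Error(`spawn failed (${label}): ${message} errno=${probe()}`);
  }
}
```

Keeping the probe behind the `catch` is what makes the extra `spawnSync` acceptable: a healthy daemon never executes it.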
Testing
- `bun run lint` (4618 files) ✅ · pty-daemon unit tests 13/13 ✅
- `Server` over a unix socket under `node --experimental-strip-types` at `ulimit -n 256`, fds exhausted → daemon PTY spawn fails → error frame correctly carries `code: ESPAWN` + `errno=EMFILE`. (Smoke test was removed rather than kept, since it can only run under a low ulimit and would self-skip in CI.)
- (`terminal disposal cleans up background process groups`) is pre-existing; it fails identically with this change stashed.
Summary by cubic
Raise the daemon's file descriptor limit to prevent EMFILE-triggered PTY spawn failures on dense hosts, and surface the real errno when a PTY spawn fails. This reduces crashes and makes errors easier to diagnose.
- `host-service`: On POSIX, start the daemon via `/bin/sh -c 'ulimit -n 1048576 || ulimit -n "$(ulimit -Hn)" || true; exec "$@"'` so `RLIMIT_NOFILE` is raised before exec and inherited by children; Windows unchanged.
- `pty-daemon`: On spawn failure, re-probe the same shell+cwd with `spawnSync` and include `errno=` in the error (e.g., EMFILE/EAGAIN/ENOENT); only runs on the failure path.
Written for commit f6536a1. Summary will update on new commits.
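One way the wrapper command could be assembled is shown below. It is a sketch under stated assumptions: the helper `wrapWithRaisedNofile` and the constant names are hypothetical, and only the shell script text is taken from the summary above.

```typescript
// Target limit named in the PR description.
const TARGET_NOFILE = 1048576;

// Try the target, fall back to the hard limit, never fail the launch,
// then exec the real command so the PID is preserved.
const ULIMIT_WRAPPER =
  `ulimit -n ${TARGET_NOFILE} || ulimit -n "$(ulimit -Hn)" || true; exec "$@"`;

// Hypothetical helper: returns the command and argv to hand to spawn().
export function wrapWithRaisedNofile(
  command: string,
  args: string[],
  platform: string = process.platform,
): { command: string; args: string[] } {
  // Windows has no RLIMIT_NOFILE, so the original command is left untouched.
  if (platform === "win32") return { command, args };
  // "sh" fills $0, so "$@" expands to the command and its arguments only.
  return {
    command: "/bin/sh",
    args: ["-c", ULIMIT_WRAPPER, "sh", command, ...args],
  };
}
```

The result would be passed to `child_process.spawn(w.command, w.args, ...)` with whatever options the supervisor already uses. The `|| true` guard means a host that refuses both `ulimit` calls still starts the daemon, just with its existing limit.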