
[codex] Harden PTY daemon auto-update#4460

Merged
Kitenite merged 2 commits into main from debug-daemon-change on May 12, 2026
Collaborator

@Kitenite Kitenite commented May 12, 2026

Summary

  • Fix adopted PTY sessions to wrap inherited master fds with tty.ReadStream instead of a generic fs read stream.
  • Make background daemon auto-update conservative around live sessions: idle stale daemons update automatically, live-session daemons defer to the manual Settings update path.
  • Reject adoption of reachable daemons that cannot answer the version probe, so incompatible/out-of-sync daemons are replaced instead of being treated as healthy.
  • Bump @superset/pty-daemon to 0.2.3 and add unit/real-spawn coverage for the new paths.

Root Cause

The typing/buffering issue came from the fd-handoff path introduced in #3971. Adopted PTY master fds were being read through fs.createReadStream, which is not the right wrapper for TTY fds after daemon handoff. The recent auto-update path made production exercise that latent issue more often.
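
To make the wrapper distinction concrete, here is a minimal sketch rather than the code from this PR. The helper names `readerKindFor` and `openReader` are hypothetical, and both the `tty.isatty` check and the `fs` fallback are assumptions added for illustration; per the summary above, the actual change constructs a `tty.ReadStream` for the adopted master fd.

```typescript
import * as fs from "node:fs";
import * as tty from "node:tty";

type ReaderKind = "tty" | "fs";

// Hypothetical helper: classify an inherited fd before choosing a stream wrapper.
function readerKindFor(fd: number): ReaderKind {
  return tty.isatty(fd) ? "tty" : "fs";
}

// Hypothetical helper: wrap the fd with the stream class that matches its kind.
function openReader(fd: number): tty.ReadStream | fs.ReadStream {
  if (readerKindFor(fd) === "tty") {
    // TTY-aware stream for terminal fds.
    return new tty.ReadStream(fd);
  }
  // Generic file stream; the path argument is ignored when `fd` is given.
  return fs.createReadStream("", { fd, autoClose: false });
}
```

A caller would pass the inherited fd number to `openReader` and attach its usual `data` listener to the returned stream.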

Behavior

When the daemon is version-skewed after an app update, host-service now probes the adopted daemon. If it is compatible and idle, it updates in the background. If live sessions exist, it keeps them running and leaves the update available for explicit user action. If the daemon is reachable but cannot speak the current probe protocol, adoption is rejected and a fresh daemon is spawned.
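
The three outcomes described above can be sketched as a pure decision function. This is an illustrative sketch under stated assumptions, not the supervisor's real code: `decideAdoption`, the outcome labels, and `isAtLeast` are hypothetical names, and `isAtLeast` is a simplified stand-in for a proper semver comparison that only understands plain `major.minor.patch` strings.

```typescript
// Hypothetical outcome labels; the real supervisor uses its own log events.
type AdoptionDecision =
  | "reject_and_respawn"
  | "adopt_up_to_date"
  | "adopt_update_in_background"
  | "adopt_defer_to_manual_update";

// Parse "major.minor.patch"; anything else is treated as unparseable.
function parseVersion(version: string): [number, number, number] | null {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Simplified stand-in for a semver ">=" check (assumption, not a semver library).
function isAtLeast(running: string, expected: string): boolean {
  const a = parseVersion(running);
  const b = parseVersion(expected);
  if (!a || !b) return false; // unparseable versions count as out of date
  const [aMajor, aMinor, aPatch] = a;
  const [bMajor, bMinor, bPatch] = b;
  if (aMajor !== bMajor) return aMajor > bMajor;
  if (aMinor !== bMinor) return aMinor > bMinor;
  return aPatch >= bPatch;
}

function decideAdoption(
  probedVersion: string | null,
  expectedVersion: string,
  aliveSessionCount: number,
): AdoptionDecision {
  // A reachable daemon that cannot answer the probe is not adopted.
  if (!probedVersion) return "reject_and_respawn";
  if (isAtLeast(probedVersion, expectedVersion)) return "adopt_up_to_date";
  // Version-skewed: update only when nothing is running in the daemon.
  return aliveSessionCount > 0
    ? "adopt_defer_to_manual_update"
    : "adopt_update_in_background";
}
```

Keeping the decision free of I/O makes each branch easy to cover in a unit test without spawning a daemon.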

Validation

  • bun test packages/host-service/src/daemon/DaemonSupervisor.test.ts
  • node --experimental-strip-types --test packages/host-service/src/daemon/DaemonSupervisor.node-test.ts
  • node --experimental-strip-types --test packages/pty-daemon/test/handoff.test.ts
  • bun test src/protocol src/SessionStore src/handlers src/Pty/Pty.test.ts test/no-encoding-hops.test.ts in packages/pty-daemon
  • bun run typecheck in packages/host-service
  • bun run typecheck in packages/pty-daemon
  • bun run lint
  • git diff --check

After the real-spawn integration run, no test daemon processes remained from this worktree.


Summary by cubic

Hardens PTY daemon auto‑update and adoption to protect live sessions, replace incompatible daemons, and fix adopted PTY read semantics with tty.ReadStream. Bumps @superset/pty-daemon to 0.2.3 with new tests.

  • New Features

    • Background auto‑update defers when live sessions exist or the session list is unavailable; idle daemons update automatically and the manual Settings action remains for live cases.
    • Adoption rejects reachable daemons that cannot answer the version probe (with retry/timeout); they’re terminated and replaced.
    • Dev default: kill stale daemons instead of adopting; opt in to adoption with SUPERSET_PTY_DAEMON_ADOPT_IN_DEV=1.
  • Bug Fixes

    • Adopted PTY sessions use tty.ReadStream for inherited master FDs to restore TTY semantics and fix input buffering.

Written for commit 784a7ae. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes

    • Auto-update now defers when live sessions exist and avoids disruptive force-restarts, preserving running work and session attachment.
    • Adoption now rejects unresponsive/incompatible daemons and ensures spawned daemons exit cleanly when adoption fails.
  • Refactor

    • Centralized and hardened auto-update and adoption flows for more reliable update handoffs and logging.
  • Chores

    • PTY daemon bumped to v0.2.3.
  • Tests

    • Expanded integration tests covering adoption, session survival, and auto-update timing.


@capy-ai

capy-ai Bot commented May 12, 2026

Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews.

@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 205ae789-7f11-4204-bc93-39c596a8b185

📥 Commits

Reviewing files that changed from the base of the PR and between 05799b9 and 784a7ae.

📒 Files selected for processing (3)
  • packages/host-service/src/daemon/DaemonSupervisor.node-test.ts
  • packages/host-service/src/daemon/DaemonSupervisor.ts
  • packages/pty-daemon/src/Pty/Pty.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/pty-daemon/src/Pty/Pty.ts
  • packages/host-service/src/daemon/DaemonSupervisor.node-test.ts
  • packages/host-service/src/daemon/DaemonSupervisor.ts

📝 Walkthrough

Walkthrough

DaemonSupervisor refactors auto-update to defer when live PTY sessions are active, adds session-aware guards before destructive restarts, improves adoption probing with retries and cleanup, introduces dev adoption control, switches PTY reader to tty.ReadStream, and enhances test infrastructure for deterministic async coordination and multi-supervisor scenarios.

Changes

Auto-update session awareness and deferral

| Layer / File(s) | Summary |
| --- | --- |
| Auto-update core logic and session awareness<br>packages/host-service/src/daemon/DaemonSupervisor.ts | Constants for timeouts, async runAutoUpdate() method with explicit logging and deferral paths, helpers deferAutoUpdate() and canAutoUpdateForceRestart() that re-check session liveness, and countAliveSessions() utility to decide when smooth handoff is safe. |
| Auto-update test infrastructure and async coordination<br>packages/host-service/src/daemon/DaemonSupervisor.test.ts | New test helpers mockListSessions, aliveSession, and flushAutoUpdate() for deterministic async coordination; existing auto-update tests updated to use these helpers and mock session lists for reproducible test execution. |
| Auto-update integration tests with live sessions<br>packages/host-service/src/daemon/DaemonSupervisor.node-test.ts, packages/host-service/src/daemon/DaemonSupervisor.test.ts | New integration and unit tests verifying auto-update deferral when live sessions exist, confirming no runUpdate call occurs, updatePending remains visible for manual update, and sessions stay attached to the predecessor daemon. |
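
The deferral and force-restart guard summarized in the table above might look roughly like the following. This is a sketch under assumptions, not the actual `runAutoUpdate()` implementation: the `SessionInfo` shape, the injected `AutoUpdateDeps` interface, and the outcome strings are hypothetical, and treating an unavailable session list as "skip" on the second check is an assumption chosen to stay conservative.

```typescript
// Hypothetical session shape; only liveness matters for this sketch.
interface SessionInfo {
  alive: boolean;
}

type UpdateResult = { ok: true } | { ok: false; reason: string };

type AutoUpdateOutcome =
  | "deferred:session_list_unavailable"
  | "deferred:live_sessions_present"
  | "auto_update_ok"
  | "force_restart_skipped"
  | "force_restarted";

// Dependencies are injected so the flow can be exercised without a daemon.
interface AutoUpdateDeps {
  listSessions: () => Promise<SessionInfo[] | null>;
  runUpdate: () => Promise<UpdateResult>;
  forceRestart: () => Promise<void>;
}

function countAliveSessions(sessions: SessionInfo[]): number {
  return sessions.filter((session) => session.alive).length;
}

async function runAutoUpdateSketch(deps: AutoUpdateDeps): Promise<AutoUpdateOutcome> {
  // First check: never start an update while shells are live or unknown.
  const before = await deps.listSessions();
  if (before === null) return "deferred:session_list_unavailable";
  if (countAliveSessions(before) > 0) return "deferred:live_sessions_present";

  const result = await deps.runUpdate();
  if (result.ok) return "auto_update_ok";

  // Second check: sessions may have appeared while the update was running.
  const after = await deps.listSessions();
  if (after === null || countAliveSessions(after) > 0) return "force_restart_skipped";

  await deps.forceRestart();
  return "force_restarted";
}
```

The second `listSessions` call is what protects a shell opened between the first check and a failed update.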

Adoption lifecycle improvements

| Layer / File(s) | Summary |
| --- | --- |
| Dev adoption control and cleanup<br>packages/host-service/src/daemon/DaemonSupervisor.ts | Exported shouldKillStaleDaemonForDev() helper gated by environment flag and disabled in production; start() now uses this helper instead of direct NODE_ENV checks; tests cover dev mode, production mode, and dev adoption override scenarios. |
| Adoption probing with retry and rejection<br>packages/host-service/src/daemon/DaemonSupervisor.ts | tryAdopt() now uses probeDaemonVersionWithRetry() with timeout governance; on version-probe failure, logs pty_daemon_adopt_rejected, terminates the adopted process tree, and clears the manifest. |
| Adoption test imports and helpers<br>packages/host-service/src/daemon/DaemonSupervisor.test.ts | Added Node child_process and fs imports, imported shouldKillStaleDaemonForDev and writePtyDaemonManifest, added process utilities (invokeTryAdopt, waitForProcessExit, isProcessAliveForTest) for adoption test scenarios. |
| Supervisor instance lifecycle management<br>packages/host-service/src/daemon/DaemonSupervisor.node-test.ts | dropSupervisorInstance() helper removes in-memory supervisor instances to prevent conflicts when another supervisor adopts a daemon; used in multi-supervisor adoption and external-death scenarios. |
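
As a rough illustration of the dev adoption control described above, the gate could be a small pure function over the environment. This is a sketch, not the exported helper: the name `shouldKillStaleDaemonForDevSketch` and its signature are hypothetical, and treating every non-production `NODE_ENV` value as "dev" is an assumption. Only the `SUPERSET_PTY_DAEMON_ADOPT_IN_DEV=1` opt-in comes from the summaries in this PR.

```typescript
type Env = Record<string, string | undefined>;

// Returns true when a stale daemon should be killed rather than adopted.
function shouldKillStaleDaemonForDevSketch(env: Env): boolean {
  // Production always goes through the normal adoption path.
  if (env["NODE_ENV"] === "production") return false;
  // In dev, kill by default; adoption is opt-in through the flag.
  return env["SUPERSET_PTY_DAEMON_ADOPT_IN_DEV"] !== "1";
}
```

Passing the environment as an argument, instead of reading `process.env` inside the function, keeps the dev, production, and override cases trivial to test.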

PTY daemon reader type migration

| Layer / File(s) | Summary |
| --- | --- |
| Reader implementation switch from fs to tty<br>packages/pty-daemon/src/Pty/Pty.ts, packages/pty-daemon/package.json | node:tty import added, AdoptedPty reader type changed to tty.ReadStream, constructor uses new tty.ReadStream(fd) instead of fs.createReadStream(); documentation and comments updated; package version bumped to 0.2.3. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nudge the daemons to be kind,
When shells are live, I wait a beat;
Probes and ttys keep things aligned,
Adoptions clean, updates discreet.
A tiny hop for safer ops—huzzah!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[codex] Harden PTY daemon auto-update' directly corresponds to the main change: improving auto-update safety by deferring when live sessions exist and rejecting incompatible daemons. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering all required sections: Summary, Root Cause, Behavior, and Validation with specific test commands. It clearly explains the changes and their rationale. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@greptile-apps
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR hardens the PTY daemon lifecycle around two root causes: adopted master fds were wrapped with fs.createReadStream (wrong for TTY fds) instead of tty.ReadStream, and the background auto-update path could disrupt live shells. The fix switches to tty.ReadStream for adopted fd I/O, gates the auto-update and force-restart paths behind a live-session check, and rejects adoption of daemons that cannot answer the version probe.

  • Pty.ts: Replaces fs.createReadStream with new tty.ReadStream(fd) so the adopted PTY master fd is read through the correct TTY-aware stream, resolving the typing/buffering regression.
  • DaemonSupervisor.ts: kickoffAutoUpdate is refactored into runAutoUpdate; it now calls listSessions upfront and defers if live shells are present, and forceRestartAfterAutoUpdateFailure gates on a second session check to protect against late-appearing sessions.
  • tryAdopt: Now uses probeDaemonVersionWithRetry (3 s total) and treats a failed probe as a hard rejection rather than allowing adoption with runningVersion = "unknown".
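
A retry-with-deadline probe of the kind described in the last bullet could be sketched like this. It is an assumption-laden illustration, not `probeDaemonVersionWithRetry` itself: the names `probeWithRetry` and `withTimeout`, the injected `probe` callback, and the delay parameter are all hypothetical. The only detail taken from the summary is that a total budget bounds the attempts and that running out of budget yields a rejection (`null` here).

```typescript
type Probe = () => Promise<string | null>;

// Resolve to null if the promise rejects or does not settle within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(null);
      },
    );
  });
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry the probe until it returns a version or the total budget is spent.
async function probeWithRetry(
  probe: Probe,
  totalMs: number,
  delayMs: number,
): Promise<string | null> {
  const deadline = Date.now() + totalMs;
  while (true) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) return null;
    // Each attempt is capped by whatever budget is left.
    const version = await withTimeout(probe(), remaining);
    if (version) return version;
    // Stop if another delay would exhaust the budget.
    if (deadline - Date.now() <= delayMs) return null;
    await sleep(delayMs);
  }
}
```

A caller that receives `null` would treat the daemon as rejected, for example by terminating it and spawning a fresh one, instead of adopting it with an unknown version.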

Confidence Score: 4/5

Safe to merge; the session-guard and probe-rejection logic is well-tested and the changes do not touch any data path outside the daemon lifecycle.

The tty.ReadStream adoption fix and the session-gated auto-update are both mechanically sound and covered by new unit and integration tests. The two remaining observations are both defended by existing try-catch blocks and do not change visible behavior.

Pty.ts — the fd ownership handoff between tty.ReadStream.destroy() and the explicit fs.closeSync deserves a follow-up comment or explicit autoClose setting to make the intent clear to future readers.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/host-service/src/daemon/DaemonSupervisor.ts | Core logic change: auto-update now checks for live sessions before updating/force-restarting; adoption rejects and terminates daemons that cannot answer the version probe. One minor redundant guard (!!probed) remains after the early-return. |
| packages/pty-daemon/src/Pty/Pty.ts | Fixes TTY buffering bug by replacing fs.createReadStream with tty.ReadStream for adopted PTY master fds. A double-close pattern is introduced (stream's libuv handle + explicit fs.closeSync), which is guarded but worth clarifying. |
| packages/host-service/src/daemon/DaemonSupervisor.test.ts | Adds unit tests for live-session deferral, force-restart skipping on late-arriving sessions, and adoption rejection on failed version probe. |
| packages/host-service/src/daemon/DaemonSupervisor.node-test.ts | Real-spawn integration tests updated: the force-restart test is replaced with live-session deferral, and dropSupervisorInstance helper is extracted. |
| packages/pty-daemon/package.json | Version bump from 0.2.2 to 0.2.3, consistent with bun.lock. |
| bun.lock | Lock file updated to reflect pty-daemon version bump to 0.2.3. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ensure orgId] --> B{Instance cached?}
    B -- Yes --> Z[Return cached instance]
    B -- No --> C[tryAdopt]
    C --> D{Manifest exists & PID alive?}
    D -- No --> SPAWN[spawn]
    D -- Yes --> E{Socket reachable?}
    E -- No --> KILL1[Terminate + remove manifest] --> SPAWN
    E -- Yes --> F{probeDaemonVersionWithRetry 3s with retries}
    F -- null/timeout --> REJECT[Log adopt_rejected, Terminate + remove manifest] --> SPAWN
    F -- version string --> ADOPTED[Return adopted instance]
    ADOPTED --> H[kickoffAutoUpdate fire-and-forget]
    H --> I{listSessions up to 1.5s}
    I -- null timeout --> DEFER[Defer: session_list_unavailable]
    I -- alive sessions > 0 --> DEFER2[Defer: live_sessions_present]
    I -- no live sessions --> J[startUpdate / runUpdate]
    J -- ok:true --> OK[Log auto_update_ok]
    J -- ok:false --> K{canAutoUpdateForceRestart}
    K --> L{listSessions up to 1.5s, any alive?}
    L -- alive sessions > 0 --> SKIP[Skip force-restart: live_sessions_present]
    L -- none --> FRESTART[forceRestart]

Comments Outside Diff (1)

  1. packages/pty-daemon/src/Pty/Pty.ts, line 237-247

    P2 Potential fd double-close with tty.ReadStream ownership

    tty.ReadStream (via net.Socket) registers the fd with a libuv TTY handle via uv_tty_init. When destroy() is called, libuv queues an async uv_close that will call close(fd) one event-loop tick later. The synchronous fs.closeSync(this.fd) runs first and closes the fd correctly, but then libuv's deferred close fires on the now-closed fd, producing an EBADF that libuv handles internally. This is safe in practice, but if any file-open call races between the two close events and reuses the same fd number, libuv would silently close the wrong descriptor. The try/catch comment says "already closed" but the actual close order is closeSync → libuv EBADF, which is the reverse of what the comment implies. Worth either removing the fs.closeSync and relying on tty.ReadStream's own cleanup, or setting autoClose: false explicitly as the old code did.


Reviews (1): Last reviewed commit: "harden pty daemon auto-update"

Comment on lines 876 to 878
const runningVersion = probed;
const updatePending =
!!probed && !semver.satisfies(probed, `>=${EXPECTED_DAEMON_VERSION}`);
Contributor


P2 After the if (!probed) early-return guard on line 865, probed is guaranteed to be a non-null, non-empty string — !!probed is unconditionally true. The guard is harmless but misleads the reader into thinking a falsy probed could reach this line.

Suggested change
 const runningVersion = probed;
 const updatePending =
-!!probed && !semver.satisfies(probed, `>=${EXPECTED_DAEMON_VERSION}`);
+!semver.satisfies(probed, `>=${EXPECTED_DAEMON_VERSION}`);

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/host-service/src/daemon/DaemonSupervisor.node-test.ts (1)

713-724: ⚡ Quick win

Simplify the mock implementation for clarity.

The current mock uses an IIFE within the return object literal, which is unnecessarily complex and makes the code harder to understand. The IIFE executes when the async function body runs (when constructing the return value), which is correct but non-idiomatic.

♻️ Clearer implementation
 			let runUpdateCalled = false;
 			(
 				sup as unknown as {
 					runUpdate: () => Promise<{ ok: false; reason: string }>;
 				}
-			).runUpdate = async () => ({
-				ok: false as const,
-				reason: (() => {
-					runUpdateCalled = true;
-					return "auto-update should have deferred before runUpdate";
-				})(),
-			});
+			).runUpdate = async () => {
+				runUpdateCalled = true;
+				return {
+					ok: false as const,
+					reason: "auto-update should have deferred before runUpdate"
+				};
+			};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/host-service/src/daemon/DaemonSupervisor.node-test.ts` around lines
713 - 724, The mock of sup.runUpdate is overcomplicated by using an IIFE to set
runUpdateCalled inside the returned object; simplify it by making runUpdate an
async function that sets runUpdateCalled = true and then returns { ok: false,
reason: "auto-update should have deferred before runUpdate" } directly. Locate
the assignment to (sup as unknown as { runUpdate: ... }).runUpdate and replace
the IIFE-based return with a straightforward async function body that flips
runUpdateCalled and returns the literal reason.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 706bab64-e9a5-4728-9f06-dd34c595eac3

📥 Commits

Reviewing files that changed from the base of the PR and between bb31359 and 05799b9.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • packages/host-service/src/daemon/DaemonSupervisor.node-test.ts
  • packages/host-service/src/daemon/DaemonSupervisor.test.ts
  • packages/host-service/src/daemon/DaemonSupervisor.ts
  • packages/pty-daemon/package.json
  • packages/pty-daemon/src/Pty/Pty.ts

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 6 files

@github-actions
Contributor

github-actions Bot commented May 12, 2026

🧹 Preview Cleanup Complete

The following preview resources have been cleaned up:

  • ✅ Neon database branch

Thank you for your contribution! 🎉

@Kitenite Kitenite merged commit 9662f0c into main May 12, 2026
17 checks passed
@Kitenite Kitenite deleted the debug-daemon-change branch May 12, 2026 19:26
MocA-Love pushed a commit to MocA-Love/superset that referenced this pull request May 25, 2026
* harden pty daemon auto-update

* address pty daemon review comments