feat: pty-daemon integration — terminal sessions survive host-service restarts (#3896)
New package @superset/pty-daemon implementing the long-lived PTY-owning process described in apps/desktop/plans/20260429-pty-daemon-implementation.md. This PR adds the daemon in isolation; host-service integration lands in a follow-up PR so both can be reviewed independently.

What's in:
- Versioned Unix-socket protocol (length-prefixed JSON frames; hello/ack handshake; open/input/resize/close/list/subscribe/unsubscribe ops).
- Pty wrapper around node-pty with dimension validation.
- SessionStore: in-memory map plus a 64KB ring buffer per session. No persistence — explicitly out of scope per the v1 lessons.
- Server: AF_UNIX SOCK_STREAM accept loop, file-mode 0600 auth boundary, per-connection subscription set, output/exit fan-out.
- Handlers: pure functions over (store, conn, msg). Stateless from the client's perspective.
- main.ts entrypoint: argv parsing, signal handling, graceful shutdown.

Runtime: Node ≥ 20, not Bun. Verified during implementation that node-pty 1.1's master-fd setup is incompatible with Bun 1.3's tty.ReadStream (onData/onExit silently never fire). The daemon ships as a Node script in the desktop app bundle; host-service stays on Bun.

Tests: 24 unit tests under bun test (protocol framing, SessionStore, handlers with a fake spawn), 6 integration tests under node --test spawning real shells through real Unix sockets. All green.

What's NOT in (separate PRs):
- Host-service DaemonClient + terminal.ts refactor + manifest adoption.
- Daemon-upgrade fd-inheritance handoff (Phase 2).
- Renderer / WS / tRPC changes (none required; the renderer is unchanged).
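The length-prefixed framing above can be pictured with a minimal sketch. This is illustrative only — the function names are assumptions, and the real framing code also enforces a max-frame cap and reassembles frames split across stream chunks:

```typescript
// Minimal sketch of length-prefixed JSON framing (illustrative only).
// Each frame is a 4-byte big-endian length header followed by UTF-8 JSON.
const HEADER_BYTES = 4;

function encodeFrame(msg: unknown): Buffer {
  const body = Buffer.from(JSON.stringify(msg), "utf8");
  const frame = Buffer.alloc(HEADER_BYTES + body.length);
  frame.writeUInt32BE(body.length, 0);
  body.copy(frame, HEADER_BYTES);
  return frame;
}

function decodeFrame(frame: Buffer): unknown {
  const len = frame.readUInt32BE(0);
  return JSON.parse(
    frame.subarray(HEADER_BYTES, HEADER_BYTES + len).toString("utf8"),
  );
}
```

Length-prefixing keeps the protocol self-delimiting over SOCK_STREAM, where message boundaries are otherwise lost.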
Adds an exhaustive control-plane integration test that exercises every usage pattern host-service can throw at the daemon end-to-end (real shells, real Unix socket), plus a production build pipeline matching the host-service pattern.

Test coverage (28 integration tests, all passing in ~2.5s):
- Handshake variants (non-hello first, unsupported version, mutual picking, duplicate hello)
- Session lifecycle (bad dims, duplicate id, ENOENT on missing, instant exit, SIGKILL of a hung shell)
- I/O patterns (resize during streaming, burst output, multi-byte UTF-8)
- Multi-client fan-out (two subscribers, unsubscribe stops delivery, dropped subscriber doesn't crash)
- Detach + reattach (late subscriber gets replay, full disconnect → new conn → continues live)
- Hostile input (malformed frames, oversized frames, input on a dead session)
- Concurrency (20 sessions on one conn, 10 conns in parallel)
- Server shutdown (in-flight clients disconnect cleanly)
- Frame splitting across TCP chunks

A reusable test client is extracted to test/helpers/client.ts (waitFor, collect, sendRaw, onClose).

Found and fixed during the suite: Server.close() now kills owned PTYs synchronously so the daemon process can actually exit (open master fds were keeping the event loop alive). This aligns with the v1-lessons "synchronous teardown only" rule.

Production build: build.ts mirrors packages/host-service/build.ts — Bun.build with target=node, externalizes node-pty, emits dist/pty-daemon.js that runs under Electron's bundled Node via process.execPath. No new runtime in the desktop bundle; Bun is build-only, the same shape as host-service today.
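The detach/reattach replay behavior rests on the per-session ring buffer: a late subscriber first receives the buffered tail, then live output. A rough sketch of such a bounded buffer (illustrative names, not the SessionStore's actual code):

```typescript
// Illustrative bounded output buffer: keeps roughly the most recent
// `capacity` bytes so a late subscriber can be replayed recent output.
class RingBuffer {
  private chunks: Buffer[] = [];
  private size = 0;

  constructor(private readonly capacity: number) {}

  push(chunk: Buffer): void {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Drop oldest chunks once over capacity (chunk-granular for simplicity).
    while (this.size > this.capacity && this.chunks.length > 1) {
      const dropped = this.chunks.shift()!;
      this.size -= dropped.length;
    }
  }

  // Everything a late subscriber should see before going live.
  snapshot(): Buffer {
    return Buffer.concat(this.chunks);
  }
}
```

A fixed byte budget (64KB here) bounds daemon memory per session regardless of how long a shell has been running.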
New module packages/host-service/src/terminal/DaemonClient/. Single
long-lived connection to pty-daemon, typed protocol API:
- connect() + handshake, exposing version + protocol number
- open / close / list as request/response promises
- input / resize as fire-and-forget
- subscribe(id, { replay }, { onOutput, onExit }) with multi-local-subscriber
  fan-out from one wire subscription; an unsubscribe function is returned
- onDisconnect(cb) for daemon-crash signaling
- dispose() for clean shutdown
Failure model is intentionally dumb: connection-level errors surface
via onDisconnect; the desktop coordinator is responsible for
respawning the daemon and host-service can reconnect by constructing
a new DaemonClient. No in-band reconnect logic.
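Under that failure model, the caller's recovery path is simply to build a fresh client. A rough sketch of the supervisor-side glue (hypothetical names; `DaemonClientLike` is a stand-in interface, not the real import):

```typescript
// Illustrative recovery under the intentionally dumb failure model:
// no in-band reconnect, just dispose the dead client and let a
// supervisor construct a brand-new one.
interface DaemonClientLike {
  connect(): Promise<void>;
  onDisconnect(cb: (err?: Error) => void): void;
  dispose(): void;
}

async function runWithSupervisor(
  makeClient: () => DaemonClientLike,
  onDown: (err?: Error) => void,
): Promise<DaemonClientLike> {
  const client = makeClient();
  await client.connect();
  client.onDisconnect((err) => {
    // Surface the failure; the supervisor decides whether to respawn
    // the daemon and call runWithSupervisor again with a new client.
    client.dispose();
    onDown(err);
  });
  return client;
}
```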
Adds @superset/pty-daemon as a workspace dependency (host-service was
already on node-pty 1.1; this layers the daemon protocol on top).
Enables allowImportingTsExtensions in host-service tsconfig because
the pty-daemon package's exports map points at .ts source files (Node
ESM requires explicit extensions).
Tests: 5 integration tests against a real Server (node --test):
- connect + handshake exposes daemon version
- open + subscribe + receive output + close
- input forwarded; resize updates dims
- multiple local subscribers fan out from one wire subscription
- disconnect callback fires when daemon goes away
Avoids parameter property shorthand in the constructor — Node's
--experimental-strip-types doesn't allow it.
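For context, the shape being avoided versus the erasable equivalent (a generic illustration, not this module's code; parameter properties emit runtime field assignments, so type stripping cannot simply erase them):

```typescript
// Not erasable — a parameter property generates an assignment, so
// Node's type stripping rejects this form:
//   class Client {
//     constructor(private readonly socketPath: string) {}
//   }

// Erasable equivalent: explicit field declaration + explicit assignment.
class Client {
  private readonly socketPath: string;

  constructor(socketPath: string) {
    this.socketPath = socketPath;
  }

  describe(): string {
    return `client for ${this.socketPath}`;
  }
}
```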
Doesn't touch terminal.ts yet — that's the next commit on this branch.
Sibling of HostServiceCoordinator that spawns/adopts the long-lived
pty-daemon and feeds its socket path to host-service via
SUPERSET_PTY_DAEMON_SOCKET. PTYs now live in a process whose lifetime
is decoupled from host-service, so host-service restarts don't kill
user shells.
Pieces:
- apps/desktop/src/main/lib/pty-daemon-manifest.ts — sibling of
host-service-manifest.ts. Manifest at
$SUPERSET_HOME_DIR/host/{orgId}/pty-daemon-manifest.json with
pid, socketPath, protocolVersions, daemonVersion, startedAt.
- apps/desktop/src/main/lib/pty-daemon-coordinator.ts — ensure() spawns
detached child or adopts existing daemon (PID alive AND socket
connectable). Same spawn shape as host-service: process.execPath +
bundled script, openRotatingLogFd for stdio, writes manifest after
socket-ready check.
- apps/desktop/src/main/pty-daemon/index.ts — Electron main entry that
imports @superset/pty-daemon's Server and provides argv/signal glue.
Sibling of src/main/host-service/index.ts.
- electron.vite.config.ts: register pty-daemon as a main entry so it
bundles to dist/main/pty-daemon.js next to host-service.js.
- host-service-coordinator: instantiate PtyDaemonCoordinator, ensure
daemon up before each host-service spawn, pass its socket path to
host-service via env. buildEnv signature gains a ptyDaemonSocket
parameter.
- tsconfig: enable allowImportingTsExtensions in apps/desktop and
packages/host-service so transitively imported pty-daemon source
type-checks (Node ESM requires explicit .ts extensions).
What works after this commit:
- Daemon spawns and listens on a 0600 Unix socket per organization.
- host-service receives SUPERSET_PTY_DAEMON_SOCKET in its env.
- Adoption: if a previous daemon is alive + reachable, reuse it.
- Stale daemons (PID alive, socket gone) get killed and respawned.
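The adoption rule above (reuse only when the PID is alive AND the socket accepts a connection) can be sketched like this; helper names are hypothetical, not the coordinator's actual code:

```typescript
import * as net from "node:net";

// Illustrative liveness probe for daemon adoption: the manifest's PID
// must be alive AND its socket must accept a connection; otherwise the
// stale daemon gets killed and respawned.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check, sends nothing
    return true;
  } catch {
    return false;
  }
}

function socketConnectable(socketPath: string, timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const sock = net.connect(socketPath);
    const done = (ok: boolean) => {
      sock.destroy();
      resolve(ok); // extra calls after the first resolve are no-ops
    };
    sock.once("connect", () => done(true));
    sock.once("error", () => done(false));
    sock.setTimeout(timeoutMs, () => done(false));
  });
}

async function canAdopt(pid: number, socketPath: string): Promise<boolean> {
  return pidAlive(pid) && (await socketConnectable(socketPath));
}
```

Checking both conditions is what distinguishes a healthy daemon from the "PID alive, socket gone" stale case.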
What does NOT work yet (next commit on this branch):
- terminal.ts in host-service still calls pty.spawn directly. The
daemon spawns but its DaemonClient isn't wired into terminal session
creation. That's the load-bearing refactor; landing it separately so
the coordinator change above can be reviewed in isolation.
The load-bearing change. terminal.ts no longer calls node-pty's spawn; PTY ownership lives in pty-daemon and host-service is a remote control. After this commit, killing host-service does not kill user shells — the daemon's session map and ring buffer survive the restart, and a fresh host-service connects to the existing daemon and re-subscribes with replay.

What changed:
- New: src/terminal/daemon-client-singleton.ts, a lazily initialized DaemonClient pulling SUPERSET_PTY_DAEMON_SOCKET from env. Surfaces daemon disconnect via console.error; the desktop coordinator is responsible for respawning the daemon and restarting host-service.
- terminal.ts: replaces pty.spawn / pty.onData / pty.onExit with daemon.open + daemon.subscribe(replay: true). The PTY field becomes a thin DaemonPty facade exposing write/resize/kill/onData/onExit unchanged for callers (teardown.ts, etc.).
- createTerminalSessionInternal becomes async (await daemon.open). All callers updated: trpc/router/terminal launchSession, workspace-creation/setup-terminal startSetupTerminalIfPresent, runtime/teardown runTeardownScript.
- session.unsubscribeDaemon is called on disposeSession to release the primary subscription cleanly.
- DaemonPty.onData / onExit register additional subscribers via daemon.subscribe; the daemon's multi-subscriber fan-out makes this safe.

Tests:
- pty-daemon: 24 bun unit + 28 node integration → all green
- host-service: 37 bun unit + 5 node DaemonClient integration → all green (existing terminal logic still passes its tests; the daemon is wired in but mocked out at the unit level)
- Workspace-wide tsc clean across all 27 packages.

Build/test plumbing:
- DaemonClient.test.ts → DaemonClient.node-test.ts so bun test won't pick it up (node-pty doesn't work under Bun). packages/host-service gains a new "test:integration" script that invokes node.
- tooling/typescript/base.json: enables allowImportingTsExtensions globally.
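The facade shape described above — callers keep a node-pty-like surface while every call is forwarded to a remote daemon session — might look roughly like this. The `DaemonApi` interface and method names are assumptions for the sketch, not the PR's actual types:

```typescript
// Illustrative DaemonPty facade: write/resize/kill/onData stay unchanged
// for callers, but each call is forwarded to a daemon-owned session.
interface DaemonApi {
  input(id: string, data: string): void;
  resize(id: string, cols: number, rows: number): void;
  close(id: string): void;
  subscribe(
    id: string,
    opts: { replay: boolean },
    cbs: { onOutput: (d: string) => void; onExit: (code: number) => void },
  ): () => void; // returns an unsubscribe function
}

class DaemonPty {
  private readonly daemon: DaemonApi;
  private readonly id: string;

  constructor(daemon: DaemonApi, id: string) {
    this.daemon = daemon;
    this.id = id;
  }

  write(data: string): void {
    this.daemon.input(this.id, data);
  }

  resize(cols: number, rows: number): void {
    this.daemon.resize(this.id, cols, rows);
  }

  kill(): void {
    this.daemon.close(this.id);
  }

  // Each caller gets its own local subscription; the daemon fans out a
  // single wire subscription to all local subscribers.
  onData(cb: (data: string) => void): () => void {
    return this.daemon.subscribe(
      this.id,
      { replay: false },
      { onOutput: cb, onExit: () => {} },
    );
  }
}
```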
Multiple packages now transitively pull in @superset/pty-daemon, whose source uses .ts-extension imports (Node ESM requires explicit extensions for directory-style resolution). This rolls back the per-package opt-ins added earlier in this branch.

What still needs verifying (separate commit/PR):
- End-to-end smoke: launch the desktop app, open a terminal, kill host-service, and observe that the shell survives and the renderer reattaches via the existing exponential-backoff WebSocket reconnect. The infrastructure is in place; this commit doesn't run the e2e itself.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable in the review settings.
📝 Walkthrough: Introduces a new long-lived PTY daemon package (…)
Sequence Diagram
sequenceDiagram
participant Desktop
participant Coordinator as PtyDaemonCoordinator
participant Daemon as "Pty Daemon"
participant HostSpawn as HostServiceCoordinator
participant Host as HostService
participant Client as DaemonClient
Desktop->>Coordinator: ensure(organizationId)
activate Coordinator
Coordinator->>Coordinator: read manifest / adopt?
alt adopt existing daemon
Coordinator-->>Desktop: return socketPath
else spawn new daemon
Coordinator->>Daemon: exec --socket=PATH
Daemon-->>Coordinator: listening on socket
Coordinator-->>Desktop: socketPath
end
deactivate Coordinator
Desktop->>HostSpawn: start host-service with SUPERSET_PTY_DAEMON_SOCKET
HostSpawn->>Host: launch process
Host->>Client: connect()
activate Client
Client->>Daemon: send hello
Daemon-->>Client: hello-ack
Client-->>Host: connected
Host->>Client: open(sessionId, meta)
Client->>Daemon: open
Daemon-->>Client: open-ok
Client-->>Host: open result
Host->>Client: subscribe(sessionId, {replay:true})
Client->>Daemon: subscribe
Daemon-->>Client: output (buffered/live)
Client-->>Host: onOutput callbacks
deactivate Client
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/host-service/src/trpc/router/workspace-creation/shared/finish-checkout.ts (1)
Lines 145-159: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Ensure progress cleanup always runs when setup-terminal launch fails.
At Line 146, the awaited call can throw and skip clearProgress at Line 159, leaving stale progress state.
Suggested fix
-  if (args.runSetupScript) {
-    const { terminal, warning } = await startSetupTerminalIfPresent({
-      ctx,
-      workspaceId: cloudRow.id,
-      worktreePath: args.worktreePath,
-    });
-    if (warning) {
-      warnings.push(warning);
-    }
-    if (terminal) {
-      terminals.push(terminal);
-    }
-  }
-
-  clearProgress(args.pendingId);
+  try {
+    if (args.runSetupScript) {
+      const { terminal, warning } = await startSetupTerminalIfPresent({
+        ctx,
+        workspaceId: cloudRow.id,
+        worktreePath: args.worktreePath,
+      });
+      if (warning) {
+        warnings.push(warning);
+      }
+      if (terminal) {
+        terminals.push(terminal);
+      }
+    }
+  } finally {
+    clearProgress(args.pendingId);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/host-service/src/trpc/router/workspace-creation/shared/finish-checkout.ts` around lines 145 - 159: The awaited call to startSetupTerminalIfPresent inside the args.runSetupScript block can throw and skip clearProgress; wrap the startSetupTerminalIfPresent invocation in a try/finally (or try/catch/finally) so that clearProgress(args.pendingId) always runs in the finally block, while preserving existing handling of { terminal, warning } and rethrowing the error if appropriate; reference the startSetupTerminalIfPresent call and clearProgress(args.pendingId) in finish-checkout.ts to locate and apply the change.

apps/desktop/src/main/lib/host-service-coordinator.ts (1)
Lines 387-406: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Roll back the "starting" state when daemon/env bootstrap fails.
At Line 398 and Line 400, a thrown error leaves the org instance in "starting" (already inserted at Line 393) without cleanup/status rollback. That can leave the coordinator in a stuck state after a failed start.
🛠️ Proposed fix
   this.instances.set(organizationId, instance);
   this.emitStatus(organizationId, "starting", null);
-  // Ensure the pty-daemon is up before host-service starts; host-service
-  // connects to it during boot for terminal ops.
-  const daemonInstance = await this.ptyDaemon.ensure(organizationId);
-
-  const childEnv = await this.buildEnv(
-    organizationId,
-    port,
-    secret,
-    config,
-    daemonInstance.socketPath,
-  );
+  let childEnv: Record<string, string>;
+  try {
+    // Ensure the pty-daemon is up before host-service starts; host-service
+    // connects to it during boot for terminal ops.
+    const daemonInstance = await this.ptyDaemon.ensure(organizationId);
+    childEnv = await this.buildEnv(
+      organizationId,
+      port,
+      secret,
+      config,
+      daemonInstance.socketPath,
+    );
+  } catch (error) {
+    this.instances.delete(organizationId);
+    this.emitStatus(organizationId, "stopped", "starting");
+    throw error;
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/main/lib/host-service-coordinator.ts` around lines 387 - 406, The code inserts a HostServiceProcess with status "starting" into this.instances before calling this.ptyDaemon.ensure and this.buildEnv so any exception leaves the org stuck in "starting"; wrap the daemon and env bootstrap in a try/catch and on error rollback the inserted instance (either remove the entry from this.instances or set its status to a terminal value like "stopped"/"failed") and call this.emitStatus(organizationId, "<terminal-status>", error) to surface the failure; reference HostServiceProcess, this.instances, this.emitStatus, this.ptyDaemon.ensure and this.buildEnv to locate and update the logic.
🧹 Nitpick comments (5)
packages/pty-daemon/src/protocol/version.ts (1)
Lines 3-4: ⚡ Quick win
Derive the supported list from the current version constant.
Keeping both literals independent increases the chance of version drift on future protocol bumps.
♻️ Proposed change
 export const CURRENT_PROTOCOL_VERSION = 1 as const;
-export const SUPPORTED_PROTOCOL_VERSIONS: readonly number[] = [1];
+export const SUPPORTED_PROTOCOL_VERSIONS = [CURRENT_PROTOCOL_VERSION] as const;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/pty-daemon/src/protocol/version.ts` around lines 3 - 4: Replace the hard-coded SUPPORTED_PROTOCOL_VERSIONS literal with a value derived from the CURRENT_PROTOCOL_VERSION constant so they cannot drift; update the export of SUPPORTED_PROTOCOL_VERSIONS to reference CURRENT_PROTOCOL_VERSION (e.g., construct the array from CURRENT_PROTOCOL_VERSION) while preserving the readonly/const typing and export, ensuring all usages of SUPPORTED_PROTOCOL_VERSIONS continue to work with the derived value.

packages/pty-daemon/src/protocol/framing.ts (1)
Lines 47-55: ⚡ Quick win
Align one-shot decode with the same max-frame guard used by FrameDecoder.
decodeFrame currently skips the 8MB cap check, which makes behavior inconsistent across decode paths.
Suggested fix
 export function decodeFrame(buf: Buffer): unknown {
   if (buf.length < HEADER_BYTES) throw new Error("short frame");
   const len = buf.readUInt32BE(0);
+  if (len > MAX_FRAME_BYTES) {
+    throw new Error(`frame too large: ${len} bytes`);
+  }
   if (buf.length !== HEADER_BYTES + len) {
     throw new Error(
       `frame length mismatch: header=${len} buf=${buf.length - HEADER_BYTES}`,
     );
   }
   return JSON.parse(buf.subarray(HEADER_BYTES).toString("utf8"));
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/pty-daemon/src/protocol/framing.ts` around lines 47 - 55: The one-shot decoder decodeFrame currently neglects the same max-frame check used by FrameDecoder; update decodeFrame to validate the parsed length (len) against the same maximum frame bytes constant used by FrameDecoder (e.g., MAX_FRAME_BYTES or the equivalent symbol) after reading len and before comparing buf length, and throw the same error when len exceeds that cap to keep behavior consistent with FrameDecoder while still keeping the existing short/frame length mismatch checks involving HEADER_BYTES.

packages/pty-daemon/src/Server/Server.ts (1)
Lines 245-252: 💤 Low value
Simplify the redundant fallback in pickProtocol.
The ternary (supported.has(CURRENT_PROTOCOL_VERSION) ? null : null) always evaluates to null, making it equivalent to just return best. The ?? null is also redundant since best is already null in that branch.
♻️ Suggested simplification
 function pickProtocol(hello: HelloMessage): number | null {
   const supported = new Set(SUPPORTED_PROTOCOL_VERSIONS);
   let best: number | null = null;
   for (const v of hello.protocols) {
     if (supported.has(v) && (best === null || v > best)) best = v;
   }
-  return best ?? (supported.has(CURRENT_PROTOCOL_VERSION) ? null : null);
+  return best;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/pty-daemon/src/Server/Server.ts` around lines 245 - 252: The pickProtocol function contains a redundant null-coalescing and ternary fallback: the expression "return best ?? (supported.has(CURRENT_PROTOCOL_VERSION) ? null : null)" always yields best or null, so simplify by returning best directly; update the function pickProtocol (and remove the unused reference to CURRENT_PROTOCOL_VERSION in that return) so it computes best as before (iterating hello.protocols against SUPPORTED_PROTOCOL_VERSIONS) and ends with "return best;".

packages/pty-daemon/src/Pty/Pty.ts (1)
Lines 35-38: 💤 Low value
Clarify the Buffer-to-string cast for future readers.
The as unknown as string cast works because node-pty's write() accepts both strings and Buffers at runtime, despite the TypeScript declarations only showing string. Consider adding a brief inline comment or linking to node-pty documentation.
Verify each finding against the current code and only fix it if needed. In `@packages/pty-daemon/src/Pty/Pty.ts` around lines 35 - 38: Add a brief inline comment above the Pty.write method explaining why the Buffer-to-string double-cast (as unknown as string) is used: note that node-pty's runtime write() accepts Buffers too even though its TypeScript types only declare string, and include a short reference/link to the node-pty write docs for future readers; update the comment that precedes the line this.term.write(data as unknown as string) inside the write(data: Buffer): void method to capture that rationale and point to node-pty's documentation.

packages/host-service/src/terminal/terminal.ts (1)
Lines 731-755: 💤 Low value
Race window between async session creation and early WebSocket messages.
The fire-and-forget async IIFE keeps the WebSocket open while createTerminalSessionInternal runs, but onMessage can fire before the session is registered. If the client sends input or resize messages during this window, they'll be silently dropped because sessions.get(terminalId) returns undefined (lines 778-779).
This is likely acceptable for the expected client behavior (the renderer waits for title/replay before sending input), but consider documenting this contract or adding a "pending" state if early messages become an issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/host-service/src/terminal/terminal.ts` around lines 731 - 755, Race condition: onMessage may receive 'input'/'resize' before createTerminalSessionInternal registers the session, causing sessions.get(terminalId) to be undefined and messages to be dropped. Fix by introducing a short-lived pending message queue keyed by terminalId (e.g., pendingMessages: Map<string, Array<{ws, msg}>>), push incoming messages in the onMessage handler when sessions.get(terminalId) is undefined instead of dropping them, and then flush those queued messages right after result.sockets.add(ws) and sendMessage(ws, { type: "title", ... }) in the async IIFE; also ensure to clear the queue on error (when "error" in result) and apply the same queue check in handlers that look up sessions (resize/input) so they either enqueue or handle immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/host-service/src/terminal/daemon-client-singleton.ts`:
- Around line 36-45: When creating the in-flight connection (the connecting
promise from client.connect()), ensure we dispose the underlying DaemonClient if
connect() rejects and make disposeDaemonClient() handle the in-flight connecting
promise too. Change the connecting chain returned by client.connect() (the code
that assigns connecting = client.connect().then(...) .finally(...)) to attach a
.catch that calls client.dispose() (or client.close()/shutdown method) and
rethrows the error, so a failed connect does not leak the created client; also
keep the existing .finally(() => { connecting = null }). Update
disposeDaemonClient() to check for an in-flight connecting promise: await
connecting.catch(() => null) to get the client (if it resolved) and call dispose
on it, and if connecting is still present and unresolved, ensure you either
await its resolution or forcibly dispose the underlying client instance created
before connect() was called so in-flight attempts are cleaned up. Use the
symbols connecting, client.connect(), cached, and disposeDaemonClient() to
locate and implement these changes.
In `@packages/host-service/src/terminal/DaemonClient/DaemonClient.node-test.ts`:
- Around line 63-64: Tests use fixed sleeps (await new Promise(r =>
setTimeout(r, 600))) before combining `chunks` with `Buffer.concat(...)`, which
causes flakiness; replace these hard-coded delays with a predicate-based wait
that polls until the expected condition is met (e.g., until `chunks.length`
reaches the expected count or the concatenated string contains the expected
substring) with a short interval and an overall timeout. Update both occurrences
(the wait before `const combined = Buffer.concat(chunks).toString("utf8")` and
the similar block around lines 147-149) to use the polling helper or inline loop
(awaiting small setTimeout ticks) that throws on timeout, so the test proceeds
as soon as the condition is satisfied instead of waiting a fixed duration.
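The predicate-based wait suggested here can be a small helper like the following (illustrative, matching the spirit of the suggestion rather than the suite's actual helper):

```typescript
// Poll a predicate until it holds or an overall timeout elapses;
// replaces fixed sleeps in async tests so they finish as soon as the
// condition is met and fail loudly when it never is.
async function waitFor(
  predicate: () => boolean,
  { timeoutMs = 5000, intervalMs = 10 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() > deadline) throw new Error("waitFor: timed out");
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

Usage in the flaky spot would be, e.g., `await waitFor(() => chunks.length >= expected)` before concatenating.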
In `@packages/host-service/src/terminal/DaemonClient/DaemonClient.ts`:
- Around line 69-77: The connect() method opens a socket and registers listeners
before awaiting this.handshake(), but if handshake rejects it currently returns
without cleaning up; update connect() to wrap the await this.handshake() in
try/catch and on any error immediately remove the added listeners (e.g.
socket.removeAllListeners() or remove specific "data","close","error" handlers),
destroy/close the socket (socket.destroy() or socket.end()), set this.socket
back to undefined/null and rethrow the error so state is not leaked; reference
the connect(), openSocket(), this.socket, and handshake() symbols when making
the change.
- Around line 260-262: The handler in list() currently treats any incoming
"error" frame as relevant and can settle the request from unrelated sessions;
update the listener added in list()/requestNonSession() so it only settles on
error frames that match the originating request's identifier (e.g., compare
m.sessionId or m.requestId/m.id to the request's stored id) in addition to the
expected type, and keep using the existing off/settle flow to unsubscribe and
resolve only when the ids match.
- Around line 316-318: When a protocol decode failure is caught in DaemonClient
(the catch that currently calls this.onClose(err as Error)), immediately
forcefully close the underlying transport to prevent further data delivery by
invoking the connection destroy method (for example call
this.transport.destroy() or this.socket.destroy() right after
this.onClose(...)), ensuring any thrown errors from destroy are
handled/suppressed; keep the onClose(err) call but follow it with an explicit
hard-close of the socket/transport.
- Around line 210-272: Add a request-level timeout to the promise logic that
waits for open/close responses and to requestNonSession (the promise that waits
for "list-reply") so the promise is rejected if no response arrives within N ms;
specifically, in the open/close handler (the block that checks req.type ===
"open" && m.type === "open-ok" / req.type === "close" && m.type === "closed")
and in requestNonSession add a timer (setTimeout) that calls the existing fail
with a clear Error like "daemon request timed out" after a chosen timeout (e.g.
30s), store the timer id, and ensure cleanup() clears the timer along with off()
and offDisc() so no leak occurs; optionally make the timeout value configurable
via a parameter or class-level constant.
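The request-level timeout described here is essentially a race between the pending response and a timer, with the timer cleared on settle. A minimal sketch (generic helper, not the DaemonClient's actual code):

```typescript
// Illustrative request timeout: reject a pending daemon request if no
// response arrives within `timeoutMs`, clearing the timer once the
// request settles so nothing leaks.
function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label}: daemon request timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```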
In `@packages/pty-daemon/package.json`:
- Around line 17-21: The package.json currently declares "engines.node": ">=20"
but the project uses node flags introduced in Node 22.6.0 (e.g., the
--experimental-strip-types flag used by the "start" and "test:integration"
scripts and the bin entry src/main.ts); update the engines.node field in
package.json to ">=22.6.0" (or ">=22" if you prefer a looser constraint) so the
declared engine matches requirements of start, test:integration, and the
experimental-strip-types usage.
In `@packages/pty-daemon/README.md`:
- Around line 32-61: The README.md has an untyped fenced code block that
triggers MD040; open the fenced block containing the src/ tree listing (the
block that includes lines like src/, main.ts, Server/ (Server.ts), build.ts) and
add a language identifier (e.g., change ``` to ```text) on the opening fence so
the block is typed and the markdown linter passes.
In `@packages/pty-daemon/src/main.ts`:
- Around line 26-31: The parsing of the --buffer-bytes flag in the
arg.startsWith("--buffer-bytes=") branch currently assigns Number.parseInt(...)
directly to args.bufferBytes allowing NaN or non-positive values; change this to
parse the value into a Number (e.g., const val = Number.parseInt(..., 10)),
validate that Number.isInteger(val) and val > 0 (or >= 0 if zero is allowed),
and if invalid throw or exit with a clear error before assigning to
args.bufferBytes; apply the same validation pattern to the other similar flag
handling on the similar branch referenced (line ~36).
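The strict parse suggested for numeric flags might look like this (an illustrative helper; the flag format follows the `--buffer-bytes=N` shape described above):

```typescript
// Illustrative strict parsing for a numeric CLI flag such as
// --buffer-bytes=65536: reject NaN, non-integers, trailing junk,
// and non-positive values instead of assigning parseInt blindly.
function parsePositiveIntFlag(arg: string, flag: string): number {
  const raw = arg.slice(`${flag}=`.length);
  const value = Number.parseInt(raw, 10);
  // `String(value) === raw.trim()` rejects inputs like "64KB" or "1e3"
  // where parseInt silently stops at the first non-digit.
  if (!Number.isInteger(value) || value <= 0 || String(value) !== raw.trim()) {
    throw new Error(`${flag} expects a positive integer, got ${JSON.stringify(raw)}`);
  }
  return value;
}
```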
- Around line 52-58: The shutdown handler (shutdown) must be exception-safe and
deterministic: add a reentrancy guard (e.g., a boolean like shuttingDown) and
wrap the await server.close() call in try/finally so process.exit(0) always runs
even if server.close() throws; ensure that subsequent signals or re-entry return
early when shuttingDown is true and log any error from server.close() before
exiting. Apply this change around the existing shutdown function and the
process.on("SIGINT")/process.on("SIGTERM") invocations.
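The exception-safe, reentrancy-guarded shutdown described in that prompt can be sketched as a factory (illustrative; `close` and `exit` stand in for `server.close()` and `process.exit`):

```typescript
// Illustrative shutdown handler: close() runs at most once even under
// repeated signals, and the exit path runs even if close() throws.
function makeShutdown(
  close: () => Promise<void>,
  exit: (code: number) => void,
) {
  let shuttingDown = false;
  return async function shutdown(): Promise<void> {
    if (shuttingDown) return; // ignore re-entry from repeated signals
    shuttingDown = true;
    try {
      await close();
    } catch (err) {
      console.error("error during shutdown:", err);
    } finally {
      exit(0); // deterministic exit regardless of close() outcome
    }
  };
}
```

Wiring would be along the lines of `process.on("SIGINT", shutdown); process.on("SIGTERM", shutdown);`.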
---
Outside diff comments:
In `@apps/desktop/src/main/lib/host-service-coordinator.ts`:
- Around line 387-406: The code inserts a HostServiceProcess with status
"starting" into this.instances before calling this.ptyDaemon.ensure and
this.buildEnv so any exception leaves the org stuck in "starting"; wrap the
daemon and env bootstrap in a try/catch and on error rollback the inserted
instance (either remove the entry from this.instances or set its status to a
terminal value like "stopped"/"failed") and call this.emitStatus(organizationId,
"<terminal-status>", error) to surface the failure; reference
HostServiceProcess, this.instances, this.emitStatus, this.ptyDaemon.ensure and
this.buildEnv to locate and update the logic.
In
`@packages/host-service/src/trpc/router/workspace-creation/shared/finish-checkout.ts`:
- Around line 145-159: The awaited call to startSetupTerminalIfPresent inside
the args.runSetupScript block can throw and skip clearProgress; wrap the
startSetupTerminalIfPresent invocation in a try/finally (or try/catch/finally)
so that clearProgress(args.pendingId) always runs in the finally block, while
preserving existing handling of { terminal, warning } and rethrowing the error
if appropriate; reference the startSetupTerminalIfPresent call and
clearProgress(args.pendingId) in finish-checkout.ts to locate and apply the
change.
---
Nitpick comments:
In `@packages/host-service/src/terminal/terminal.ts`:
- Around line 731-755: Race condition: onMessage may receive 'input'/'resize'
before createTerminalSessionInternal registers the session, causing
sessions.get(terminalId) to be undefined and messages to be dropped. Fix by
introducing a short-lived pending message queue keyed by terminalId (e.g.,
pendingMessages: Map<string, Array<{ws, msg}>>), push incoming messages in the
onMessage handler when sessions.get(terminalId) is undefined instead of dropping
them, and then flush those queued messages right after result.sockets.add(ws)
and sendMessage(ws, { type: "title", ... }) in the async IIFE; also ensure to
clear the queue on error (when "error" in result) and apply the same queue check
in handlers that look up sessions (resize/input) so they either enqueue or
handle immediately.
In `@packages/pty-daemon/src/protocol/framing.ts`:
- Around line 47-55: The one-shot decoder decodeFrame currently neglects the
same max-frame check used by FrameDecoder; update decodeFrame to validate the
parsed length (len) against the same maximum frame bytes constant used by
FrameDecoder (e.g., MAX_FRAME_BYTES or the equivalent symbol) after reading len
and before comparing buf length, and throw the same error when len exceeds that
cap to keep behavior consistent with FrameDecoder while still keeping the
existing short/frame length mismatch checks involving HEADER_BYTES.
In `@packages/pty-daemon/src/protocol/version.ts`:
- Around line 3-4: SUPPORT: Replace the hard-coded SUPPORTED_PROTOCOL_VERSIONS
literal with a value derived from the CURRENT_PROTOCOL_VERSION constant so they
cannot drift; update the export of SUPPORTED_PROTOCOL_VERSIONS to reference
CURRENT_PROTOCOL_VERSION (e.g., construct the array from
CURRENT_PROTOCOL_VERSION) while preserving the readonly/const typing and export,
ensuring all usages of SUPPORTED_PROTOCOL_VERSIONS continue to work with the
derived value.
In `@packages/pty-daemon/src/Pty/Pty.ts`:
- Around lines 35-38: Add a brief inline comment above the Pty.write method
explaining why the Buffer-to-string double-cast (as unknown as string) is
used: node-pty's runtime write() accepts Buffers too even though its
TypeScript types only declare string. Place the comment immediately before
this.term.write(data as unknown as string) inside write(data: Buffer): void,
and include a short reference/link to node-pty's write docs for future
readers.
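A sketch of the commented cast against a stub terminal — TermLike and FakeTerm stand in for node-pty's IPty here; only the write() typing matters:

```typescript
// Sketch of the double-cast with its explanatory comment in place.
interface TermLike {
  write(data: string): void; // node-pty's types declare string only
}

class FakeTerm implements TermLike {
  received: unknown[] = [];
  write(data: string): void {
    this.received.push(data); // captures whatever actually arrives at runtime
  }
}

class Pty {
  constructor(private term: TermLike) {}

  write(data: Buffer): void {
    // node-pty's runtime write() also accepts Buffers even though its
    // TypeScript types only declare string (see node-pty's write docs);
    // the double cast forwards raw bytes without a lossy UTF-8 round-trip.
    this.term.write(data as unknown as string);
  }
}
```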
In `@packages/pty-daemon/src/Server/Server.ts`:
- Around lines 245-252: The pickProtocol function ends with a redundant
fallback: the expression "return best ??
(supported.has(CURRENT_PROTOCOL_VERSION) ? null : null)" always yields best
or null regardless of the ternary. Compute best as before (iterating
hello.protocols against SUPPORTED_PROTOCOL_VERSIONS), end with "return
best;", and drop the now-unused reference to CURRENT_PROTOCOL_VERSION in
that return.
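The simplified function can be sketched as follows (the SUPPORTED set and the "highest mutual version" rule are illustrative assumptions):

```typescript
// pickProtocol: highest mutually supported version, or null when no overlap.
const SUPPORTED = new Set([1, 2]);

function pickProtocol(clientProtocols: number[]): number | null {
  let best: number | null = null;
  for (const v of clientProtocols) {
    if (SUPPORTED.has(v) && (best === null || v > best)) best = v;
  }
  return best; // no redundant ?? / ternary fallback needed
}
```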
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 756b201e-fde5-426f-84e6-00645920dc93
⛔ Files ignored due to path filters (1)
bun.lock is excluded by !**/*.lock
📒 Files selected for processing (43)
apps/desktop/electron.vite.config.ts
apps/desktop/package.json
apps/desktop/src/main/lib/host-service-coordinator.ts
apps/desktop/src/main/lib/pty-daemon-coordinator.ts
apps/desktop/src/main/lib/pty-daemon-manifest.ts
apps/desktop/src/main/pty-daemon/index.ts
packages/host-service/package.json
packages/host-service/src/runtime/teardown/teardown.ts
packages/host-service/src/terminal/DaemonClient/DaemonClient.node-test.ts
packages/host-service/src/terminal/DaemonClient/DaemonClient.ts
packages/host-service/src/terminal/DaemonClient/index.ts
packages/host-service/src/terminal/daemon-client-singleton.ts
packages/host-service/src/terminal/terminal.ts
packages/host-service/src/trpc/router/terminal/terminal.ts
packages/host-service/src/trpc/router/workspace-creation/procedures/create.ts
packages/host-service/src/trpc/router/workspace-creation/shared/finish-checkout.ts
packages/host-service/src/trpc/router/workspace-creation/shared/setup-terminal.ts
packages/pty-daemon/README.md
packages/pty-daemon/build.ts
packages/pty-daemon/package.json
packages/pty-daemon/src/Pty/Pty.test.ts
packages/pty-daemon/src/Pty/Pty.ts
packages/pty-daemon/src/Pty/index.ts
packages/pty-daemon/src/Server/Server.ts
packages/pty-daemon/src/Server/index.ts
packages/pty-daemon/src/SessionStore/SessionStore.test.ts
packages/pty-daemon/src/SessionStore/SessionStore.ts
packages/pty-daemon/src/SessionStore/index.ts
packages/pty-daemon/src/handlers/handlers.test.ts
packages/pty-daemon/src/handlers/handlers.ts
packages/pty-daemon/src/handlers/index.ts
packages/pty-daemon/src/index.ts
packages/pty-daemon/src/main.ts
packages/pty-daemon/src/protocol/framing.test.ts
packages/pty-daemon/src/protocol/framing.ts
packages/pty-daemon/src/protocol/index.ts
packages/pty-daemon/src/protocol/messages.ts
packages/pty-daemon/src/protocol/version.ts
packages/pty-daemon/test/control-plane.test.ts
packages/pty-daemon/test/helpers/client.ts
packages/pty-daemon/test/integration.test.ts
packages/pty-daemon/tsconfig.json
tooling/typescript/base.json
| "pty-daemon": "./src/main.ts" | ||
| }, | ||
| "engines": { | ||
| "node": ">=20" | ||
| }, |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking declared engine and strip-types usages..."
rg -n '"node"\s*:\s*".*"' packages/pty-daemon/package.json
rg -n 'experimental-strip-types|"\./src/main.ts"' packages/pty-daemon/package.json
echo
echo "Checking current Node binary support for --experimental-strip-types..."
node -v
node --help | rg -n 'strip-types|experimental-strip-types' || true
echo
echo "If this repo must truly support Node 20.x, verify in official Node docs/changelog"
echo "that --experimental-strip-types and TS entrypoint behavior are available in that floor version."
Repository: superset-sh/superset
Length of output: 660
🌐 Web query:
Node.js --experimental-strip-types when introduced which version
💡 Result:
The --experimental-strip-types flag was introduced in Node.js version 22.6.0.
Citations:
- 1: nodejs/node@35f92d9
- 2: module: add --experimental-strip-types nodejs/node#53725
- 3: https://nodejs.org/en/blog/release/v22.6.0
- 4: https://nodejs.org/learn/typescript/run-natively
- 5: https://nodejs.org/api/typescript.html
Update engine declaration to match actual Node.js requirements.
This manifest declares Node >=20, but the scripts and bin entry require --experimental-strip-types, which was introduced in Node.js 22.6.0. Update "engines.node" to at least ">=22.6.0" (or ">=22" for simplicity), otherwise execution will fail on Node 20.x and 21.x.
Relevant lines
"pty-daemon": "./src/main.ts"
},
"engines": {
"node": ">=20"
},
and
"start": "node --experimental-strip-types src/main.ts",
...
"test:integration": "node --experimental-strip-types --test test/integration.test.ts test/control-plane.test.ts"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/pty-daemon/package.json` around lines 17-21: package.json
currently declares "engines.node": ">=20", but the project uses Node flags
introduced in Node 22.6.0 — the --experimental-strip-types flag in the
"start" and "test:integration" scripts and the TypeScript bin entry
src/main.ts. Update the engines.node field to ">=22.6.0" (or ">=22" for a
looser constraint) so the declared engine matches what start,
test:integration, and the strip-types usage require.
| ``` | ||
| src/ | ||
| ├── main.ts # Node entrypoint: argv → Server.listen() | ||
| ├── index.ts # Public exports for host-service consumers | ||
| ├── protocol/ # Wire schemas + length-prefixed framing | ||
| │ ├── version.ts # CURRENT_PROTOCOL_VERSION + supported list | ||
| │ ├── messages.ts # ClientMessage / ServerMessage unions | ||
| │ ├── framing.ts # encodeFrame / FrameDecoder (4-byte BE prefix) | ||
| │ └── index.ts | ||
| ├── Pty/ # node-pty thin wrapper with dim validation | ||
| │ ├── Pty.ts | ||
| │ └── index.ts | ||
| ├── SessionStore/ # in-memory map + 64KB ring buffer per session | ||
| │ ├── SessionStore.ts | ||
| │ └── index.ts | ||
| ├── handlers/ # pure functions: open/input/resize/close/list/subscribe | ||
| │ ├── handlers.ts | ||
| │ └── index.ts | ||
| └── Server/ # AF_UNIX SOCK_STREAM accept loop, handshake, dispatch | ||
| ├── Server.ts | ||
| └── index.ts | ||
|
|
||
| test/ | ||
| ├── helpers/ | ||
| │ └── client.ts # reusable DaemonClient: connect, send, waitFor, collect | ||
| ├── integration.test.ts # smoke / happy-path (3 tests) | ||
| └── control-plane.test.ts # exhaustive control-plane coverage (25 tests) | ||
|
|
||
| build.ts # Bun bundler → dist/pty-daemon.js (target: node) | ||
| ``` |
Add a language to the fenced code block to satisfy markdown linting.
The fence starting on Line 32 is untyped and triggers MD040.
Suggested fix
-```
+```text
src/
├── main.ts # Node entrypoint: argv → Server.listen()
...
build.ts # Bun bundler → dist/pty-daemon.js (target: node)
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @packages/pty-daemon/README.md around lines 32-61: the README has an
untyped fenced code block that triggers MD040. Open the fenced block
containing the src/ tree listing (the block that includes lines like src/,
main.ts, Server/ (Server.ts), build.ts) and add a language identifier (e.g.,
change ``` to ```text) so the MD040 check passes.
Greptile Summary
This PR wires host-service terminal sessions through the long-lived pty-daemon. Two gaps need attention before the headline feature ("shell survival after restart") can be verified:
Confidence Score: 3/5
Not safe to merge as-is — the headline shell-survival-with-reattachment feature is broken by the EEXIST gap, and the missing request timeout can cause host-service hangs. Two P1 findings: (1) daemon.open returns EEXIST for surviving sessions, making post-restart reattachment impossible despite the PR's core promise; (2) requestSession has no deadline, so a live-but-unresponsive daemon freezes host-service indefinitely. Both are on the hot path for every terminal open. The surrounding infrastructure is solid, but these gaps block the primary user story.
Key files: packages/host-service/src/terminal/terminal.ts (EEXIST reattachment gap), packages/host-service/src/terminal/DaemonClient/DaemonClient.ts (requestSession timeout), packages/pty-daemon/src/handlers/handlers.ts (handleOpen must be idempotent or a new attach message needed)
| Filename | Overview |
|---|---|
| packages/host-service/src/terminal/terminal.ts | Core terminal refactor to use daemon; daemon.open on an already-existing daemon session returns EEXIST, breaking reattachment after host-service restart (P1); fire-and-forget async WebSocket handler has a minor closed-socket race (P2). |
| packages/host-service/src/terminal/DaemonClient/DaemonClient.ts | New Unix-socket client for pty-daemon; well-structured fan-out and handshake, but requestSession has no timeout (P1 hang risk) and second-subscriber replay is silently ignored (P2). |
| packages/pty-daemon/src/handlers/handlers.ts | handleOpen always returns EEXIST for an existing session with no idempotent/adopt path, which is the root cause of the post-restart reattachment gap flagged in terminal.ts. |
| apps/desktop/src/main/lib/pty-daemon-coordinator.ts | New coordinator that spawns or adopts the pty-daemon process; adopt logic (PID alive + socket connectable) is solid; no significant issues found. |
| packages/pty-daemon/src/Server/Server.ts | Daemon server with per-connection state, handshake enforcement, and broadcast fan-out; stale socket cleanup and owner-only chmod are correct. |
| packages/pty-daemon/src/SessionStore/SessionStore.ts | Ring-buffer session store with correct eviction logic and snapshot API; no issues found. |
| packages/host-service/src/terminal/daemon-client-singleton.ts | Lazy singleton with connection deduplication via connecting promise; disconnect handling correctly nulls the cache; no significant issues. |
| apps/desktop/src/main/lib/pty-daemon-manifest.ts | Manifest read/write with structural validation; owner-only file mode 0o600 matches security intent. |
| packages/pty-daemon/src/protocol/framing.ts | Length-prefixed framing with 8 MB hard cap; correct streaming decoder using subarray; no issues. |
| packages/pty-daemon/src/Pty/Pty.ts | Thin node-pty adapter with dim validation and raw-bytes encoding; correct buffer to string handling in onData. |
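The 64 KB ring buffer the table credits to SessionStore can be sketched as follows — chunk-granularity eviction is an assumption of this sketch; the real store may trim partial chunks:

```typescript
// Per-session output ring buffer: appends evict the oldest chunks first;
// snapshot() returns the retained bytes for subscribe-with-replay.
class RingBuffer {
  private chunks: Buffer[] = [];
  private size = 0;

  constructor(private readonly cap = 64 * 1024) {}

  append(chunk: Buffer): void {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Drop whole chunks from the front until we're back under the cap.
    while (this.size > this.cap && this.chunks.length > 0) {
      const dropped = this.chunks.shift()!;
      this.size -= dropped.length;
    }
  }

  snapshot(): Buffer {
    return Buffer.concat(this.chunks);
  }
}
```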
Sequence Diagram
sequenceDiagram
participant R as Renderer
participant HS as host-service
participant DC as DaemonClient
participant D as pty-daemon
participant PTY as bash/shell
Note over R,PTY: Normal session creation
R->>HS: WebSocket /terminal/:id?workspaceId
HS->>DC: getDaemonClient
DC->>D: hello {protocols:[1]}
D-->>DC: hello-ack {protocol:1}
HS->>DC: daemon.open(terminalId, meta)
DC->>D: open {id, shell, argv, env, cols, rows}
D->>PTY: node-pty spawn
D-->>DC: open-ok {id, pid}
HS->>DC: daemon.subscribe(id, replay=true)
DC->>D: subscribe {id, replay:true}
D-->>DC: output ring buffer replay
DC-->>HS: onOutput chunk
HS-->>R: data frame
Note over R,PTY: After host-service restart
HS->>HS: process killed sessions map empty
R->>HS: WebSocket reconnect
HS->>DC: getDaemonClient new connection to same daemon
HS->>DC: daemon.open(terminalId, meta)
DC->>D: open {id}
D-->>DC: error EEXIST session still live
DC-->>HS: throws open error
HS-->>R: WS close 1011 reattachment fails
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
packages/host-service/src/terminal/DaemonClient/DaemonClient.ts:204-242
**No timeout on `requestSession` — hangs indefinitely on live-but-unresponsive daemon**
`requestSession` waits forever for a daemon reply; only a full socket disconnect (`onDisconnect`) causes rejection. If the daemon is alive (socket connected) but stuck (e.g., `node-pty.spawn` blocks inside the daemon), every `daemon.open(...)` / `daemon.close(...)` call in host-service will hang indefinitely — blocking the tRPC mutation, the WebSocket `onOpen` IIFE, and the teardown handler with no recovery path.
Add a deadline similar to `waitForFrame`:
```typescript
private requestSession(
id: string,
req: ...,
timeoutMs = 10_000,
): Promise<ServerMessage> {
return new Promise<ServerMessage>((resolve, reject) => {
let resolved = false;
const timer = setTimeout(() => {
if (resolved) return;
resolved = true;
cleanup();
reject(new Error(`daemon: request timed out after ${timeoutMs}ms`));
}, timeoutMs);
const settle = (m: ServerMessage) => {
if (resolved) return;
resolved = true;
clearTimeout(timer);
cleanup();
resolve(m);
};
...
});
}
```
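The deadline can also be factored as a generic promise wrapper; withTimeout below is a hypothetical helper illustrating the pattern, not DaemonClient's actual API:

```typescript
// Reject if the wrapped promise hasn't settled within timeoutMs.
function withTimeout<T>(p: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`daemon: ${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}
```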
### Issue 2 of 4
packages/host-service/src/terminal/terminal.ts:499-515
**Existing daemon sessions return `EEXIST` on `daemon.open` — reattachment after host-service restart is broken**
After host-service restarts, the in-memory `sessions` map is empty. When the renderer reconnects to an existing terminal, `createTerminalSessionInternal` calls `daemon.open(terminalId, {...})`. The daemon's `handleOpen` returns an `EEXIST` error for any session that is already open in its `SessionStore`. `DaemonClient.open` converts this to a thrown error, so `createTerminalSessionInternal` returns `{ error: "open <id>: session already exists: <id>" }`, which closes the WebSocket.
This breaks the headline feature: the shell survives in the daemon's PTY, but the renderer cannot reattach to it because there is no "adopt existing session" path — only `open` (create-new) is called, never a plain `subscribe` to the surviving session.
To fix, the reattachment path needs either:
- An idempotent `open` variant in the daemon (return existing session's `pid` if same `id`), **or**
- A new `attach` message the host-service sends when it detects the daemon already holds the session (via `daemon.list()`), falling through to `daemon.subscribe(id, { replay: true })` without `daemon.open`.
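A sketch of the first option — an idempotent open that adopts the existing session instead of failing. The Session/OpenResult shapes are illustrative, not the daemon's actual handler types:

```typescript
// Idempotent open: same id → return the live session's pid, don't EEXIST.
type Session = { id: string; pid: number };
type OpenResult =
  | { type: "open-ok"; id: string; pid: number; existing: boolean }
  | { type: "error"; id: string; code: "EEXIST" };

function handleOpen(
  store: Map<string, Session>,
  id: string,
  spawn: () => number, // returns the new PTY's pid
): OpenResult {
  const existing = store.get(id);
  if (existing) {
    // Reattachment path after host-service restart: adopt, don't fail.
    return { type: "open-ok", id, pid: existing.pid, existing: true };
  }
  const pid = spawn();
  store.set(id, { id, pid });
  return { type: "open-ok", id, pid, existing: false };
}
```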
### Issue 3 of 4
packages/host-service/src/terminal/terminal.ts:730-755
**Fire-and-forget async WebSocket handler — closed WS added to session socket set**
The old code registered `result.sockets.add(ws)` synchronously in the same call frame. The new `void (async () => { ... })()` introduces a window: if the WebSocket closes before `await createTerminalSessionInternal` completes, the `onClose` handler fires (`session?.sockets.delete(ws)` is a no-op since the session doesn't exist yet), and then the async continuation adds the already-closed WS to `result.sockets`. The stale socket is cleaned up by `broadcastMessage` on the next broadcast, but a concurrent reconnect to the same `terminalId` before that cleanup could observe an inconsistent socket set.
Consider checking `ws.readyState` before adding it after the await:
```typescript
if (ws.readyState !== SOCKET_OPEN) return; // WS closed during open
result.sockets.add(ws);
```
### Issue 4 of 4
packages/host-service/src/terminal/DaemonClient/DaemonClient.ts:144-160
**Second `subscribe` call's `replay` option silently ignored**
When a second local subscriber registers for a session that already has a wire subscription, its `opts.replay` value is discarded — only the first subscriber's `replay` value was used when the wire `subscribe` was sent. In the current code this is benign because `DaemonPty.onData()` and `DaemonPty.onExit()` always pass `replay: false`. But if callers invoke `daemon.subscribe(id, { replay: true }, cb)` after the primary subscription exists, they receive no replay data and no error is raised, which can be a silent footgun.
Consider documenting this constraint more prominently, or asserting `opts.replay === false` for non-first subscribers.
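The assertion option can be sketched as follows — the wire array and names here are illustrative stand-ins for DaemonClient's internals:

```typescript
// Non-first subscribers must not request replay: only the first subscribe
// per session id reaches the wire, so a later replay=true would be ignored.
type Callbacks = { output: Set<(b: Buffer) => void> };

function subscribe(
  callbacks: Map<string, Callbacks>,
  wire: Array<{ type: "subscribe"; id: string; replay: boolean }>,
  id: string,
  opts: { replay: boolean },
  onOutput: (b: Buffer) => void,
): void {
  let entry = callbacks.get(id);
  const wasFirst = !entry;
  if (!entry) {
    entry = { output: new Set() };
    callbacks.set(id, entry);
  }
  if (!wasFirst && opts.replay) {
    // Fail loudly instead of silently dropping the replay request.
    throw new Error(`subscribe(${id}): replay requested but wire subscription already exists`);
  }
  entry.output.add(onOutput);
  if (wasFirst) wire.push({ type: "subscribe", id, replay: opts.replay });
}
```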
Reviews (1): Last reviewed commit: "feat(host-service): route terminal sessi..."
| private requestSession( | ||
| id: string, | ||
| req: | ||
| | { type: "open"; id: string; meta: SessionMeta } | ||
| | { type: "close"; id: string; signal: Signal }, | ||
| ): Promise<ServerMessage> { | ||
| return new Promise<ServerMessage>((resolve, reject) => { | ||
| let resolved = false; | ||
| const settle = (m: ServerMessage) => { | ||
| if (resolved) return; | ||
| resolved = true; | ||
| cleanup(); | ||
| resolve(m); | ||
| }; | ||
| const fail = (err: Error) => { | ||
| if (resolved) return; | ||
| resolved = true; | ||
| cleanup(); | ||
| reject(err); | ||
| }; | ||
| const off = this.on((m) => { | ||
| if (m.type === "error" && m.id === id) settle(m); | ||
| else if (req.type === "open" && m.type === "open-ok" && m.id === id) | ||
| settle(m); | ||
| else if (req.type === "close" && m.type === "closed" && m.id === id) | ||
| settle(m); | ||
| }); | ||
| const offDisc = this.onDisconnect((err) => | ||
| fail(err ?? new Error("daemon disconnected")), | ||
| ); | ||
| const cleanup = () => { | ||
| off(); | ||
| offDisc(); | ||
| }; | ||
| this.send(req); | ||
| }); | ||
| } | ||
|
|
||
| private requestNonSession( |
No timeout on `requestSession` — hangs indefinitely on live-but-unresponsive daemon
requestSession waits forever for a daemon reply; only a full socket disconnect (onDisconnect) causes rejection. If the daemon is alive (socket connected) but stuck (e.g., node-pty.spawn blocks inside the daemon), every daemon.open(...) / daemon.close(...) call in host-service will hang indefinitely — blocking the tRPC mutation, the WebSocket onOpen IIFE, and the teardown handler with no recovery path.
Add a deadline similar to waitForFrame:
private requestSession(
id: string,
req: ...,
timeoutMs = 10_000,
): Promise<ServerMessage> {
return new Promise<ServerMessage>((resolve, reject) => {
let resolved = false;
const timer = setTimeout(() => {
if (resolved) return;
resolved = true;
cleanup();
reject(new Error(`daemon: request timed out after ${timeoutMs}ms`));
}, timeoutMs);
const settle = (m: ServerMessage) => {
if (resolved) return;
resolved = true;
clearTimeout(timer);
cleanup();
resolve(m);
};
...
});
}
| @@ -420,6 +513,7 @@ export function createTerminalSessionInternal({ | |||
| error instanceof Error ? error.message : "Failed to start terminal", | |||
| }; | |||
| } | |||
Existing daemon sessions return `EEXIST` on `daemon.open` — reattachment after host-service restart is broken
After host-service restarts, the in-memory sessions map is empty. When the renderer reconnects to an existing terminal, createTerminalSessionInternal calls daemon.open(terminalId, {...}). The daemon's handleOpen returns an EEXIST error for any session that is already open in its SessionStore. DaemonClient.open converts this to a thrown error, so createTerminalSessionInternal returns { error: "open <id>: session already exists: <id>" }, which closes the WebSocket.
This breaks the headline feature: the shell survives in the daemon's PTY, but the renderer cannot reattach to it because there is no "adopt existing session" path — only open (create-new) is called, never a plain subscribe to the surviving session.
To fix, the reattachment path needs either:
- An idempotent
openvariant in the daemon (return existing session'spidif sameid), or - A new
attachmessage the host-service sends when it detects the daemon already holds the session (viadaemon.list()), falling through todaemon.subscribe(id, { replay: true })withoutdaemon.open.
| const themeType = parseThemeType(c.req.query("themeType")); | ||
| const result = createTerminalSessionInternal({ | ||
| terminalId, | ||
| workspaceId, | ||
| themeType, | ||
| db, | ||
| eventBus, | ||
| }); | ||
| // Daemon open is async; fire-and-forget while keeping the WS alive. | ||
| // On success: register the socket; on failure: surface and close. | ||
| void (async () => { | ||
| const result = await createTerminalSessionInternal({ | ||
| terminalId, | ||
| workspaceId, | ||
| themeType, | ||
| db, | ||
| eventBus, | ||
| }); | ||
|
|
||
| if ("error" in result) { | ||
| sendMessage(ws, { type: "error", message: result.error }); | ||
| ws.close(1011, result.error); | ||
| return; | ||
| } | ||
| if ("error" in result) { | ||
| sendMessage(ws, { type: "error", message: result.error }); | ||
| ws.close(1011, result.error); | ||
| return; | ||
| } | ||
|
|
||
| result.sockets.add(ws); | ||
| sendMessage(ws, { type: "title", title: result.title }); | ||
| result.sockets.add(ws); | ||
| sendMessage(ws, { type: "title", title: result.title }); | ||
|
|
||
| db.update(terminalSessions) | ||
| .set({ lastAttachedAt: Date.now() }) | ||
| .where(eq(terminalSessions.id, terminalId)) | ||
| .run(); | ||
| db.update(terminalSessions) | ||
| .set({ lastAttachedAt: Date.now() }) | ||
| .where(eq(terminalSessions.id, terminalId)) | ||
| .run(); | ||
| })(); |
Fire-and-forget async WebSocket handler — closed WS added to session socket set
The old code registered result.sockets.add(ws) synchronously in the same call frame. The new void (async () => { ... })() introduces a window: if the WebSocket closes before await createTerminalSessionInternal completes, the onClose handler fires (session?.sockets.delete(ws) is a no-op since the session doesn't exist yet), and then the async continuation adds the already-closed WS to result.sockets. The stale socket is cleaned up by broadcastMessage on the next broadcast, but a concurrent reconnect to the same terminalId before that cleanup could observe an inconsistent socket set.
Consider checking ws.readyState before adding it after the await:
if (ws.readyState !== SOCKET_OPEN) return; // WS closed during open
result.sockets.add(ws);
| let entry = this.callbacks.get(id); | ||
| const wasFirst = !entry; | ||
| if (!entry) { | ||
| entry = { output: new Set(), exit: new Set() }; | ||
| this.callbacks.set(id, entry); | ||
| } | ||
| entry.output.add(cb.onOutput); | ||
| entry.exit.add(cb.onExit); | ||
| // Only the first subscribe per session id sends the wire `subscribe`. | ||
| // Subsequent local callbacks just register into the existing entry. | ||
| if (wasFirst) { | ||
| this.send({ type: "subscribe", id, replay: opts.replay }); | ||
| } | ||
| return () => { | ||
| const e = this.callbacks.get(id); | ||
| if (!e) return; | ||
| e.output.delete(cb.onOutput); |
Second `subscribe` call's `replay` option silently ignored
When a second local subscriber registers for a session that already has a wire subscription, its opts.replay value is discarded — only the first subscriber's replay value was used when the wire subscribe was sent. In the current code this is benign because DaemonPty.onData() and DaemonPty.onExit() always pass replay: false. But if callers invoke daemon.subscribe(id, { replay: true }, cb) after the primary subscription exists, they receive no replay data and no error is raised, which can be a silent footgun.
Consider documenting this constraint more prominently, or asserting opts.replay === false for non-first subscribers.
If the daemon can't start (dev build without dist/main/pty-daemon.js,
node-pty native module mismatch, etc.), the prior commit took host-
service down with it — workspaces, git, chat, all unreachable. That's
the wrong coupling: the daemon's job is terminal survival, not gating
the rest of the API.
Now: catch the daemon-spawn error, log it loudly with the cause, and
spawn host-service with SUPERSET_PTY_DAEMON_SOCKET="" so terminal ops
fail with a specific message ("pty-daemon is not available: ...") and
everything else keeps working. The user can still use the app while
the daemon issue is investigated.
This unblocks the "workspace on sidebar but not found in db" symptom
seen during dev: the sidebar shows entries from cloud / local-db, but
host-service was never running so its DB queries return nothing.
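The degraded-mode contract above — an empty SUPERSET_PTY_DAEMON_SOCKET makes terminal ops fail with a specific message while the rest of the API keeps working — can be sketched as follows; requireDaemonSocket is a hypothetical helper, not the actual host-service code:

```typescript
// Terminal request handlers call this; every other route ignores the daemon.
function requireDaemonSocket(env: Record<string, string | undefined>): string {
  const socket = env.SUPERSET_PTY_DAEMON_SOCKET;
  if (socket === undefined || socket === "") {
    throw new Error(
      "pty-daemon is not available: daemon failed to start; " +
        "terminal features are disabled but the rest of the app keeps working",
    );
  }
  return socket;
}
```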
…xit code
When the daemon fails to come up, the prior coordinator just said "socket did not become ready within 5000ms" with no idea what went wrong. Now:
1. Refuse to spawn if scriptPath doesn't exist (e.g. dist/main/pty-daemon.js missing because electron-vite hasn't bundled the new entry yet). The error tells the user to restart the dev server.
2. Listen for child early-exit; include exit code or signal in the timeout error.
3. On timeout, read the daemon's log file (tail 2 KB) and include it in the thrown error.
4. Console.log the spawn args before fork so the dev terminal shows exactly what's being launched.
This makes the next failure self-diagnosing instead of opaque.
Implements Open Decision #3 from the implementation plan: detect crash loops and stop respawning the daemon when something is fundamentally broken, instead of burning CPU on a forever-loop respawn.
Behavior:
- Daemon exits we initiated (coordinator.stop) don't count toward the crash counter — tracked via a `stopping` set.
- Unexpected exits add a timestamp to the per-org crashTimes list. Older-than-60s timestamps are dropped on each accounting.
- Up to 3 crashes within 60s → auto-respawn.
- The 4th crash within the window → circuit OPEN. No more respawns until clearCrashCircuit(orgId) is called from the UI's "retry" affordance, or the desktop app restarts.
- ensure() fails fast with a clear error message when the circuit is open, instead of trying to spawn-and-time-out repeatedly.
Plumbing for the UI surface (telemetry + retry affordance) lands in a follow-up commit.
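The window accounting described in that commit can be sketched as follows (class and method names are illustrative, not the coordinator's actual API):

```typescript
// Crash circuit: up to maxCrashes unexpected exits inside windowMs still
// respawn; one more opens the circuit until it is explicitly cleared.
class CrashCircuit {
  private crashTimes: number[] = [];
  private open = false;

  constructor(
    private readonly maxCrashes = 3,
    private readonly windowMs = 60_000,
  ) {}

  /** Record an unexpected exit; returns true if a respawn is still allowed. */
  recordCrash(now: number): boolean {
    // Drop timestamps older than the window on each accounting.
    this.crashTimes = this.crashTimes.filter((t) => now - t < this.windowMs);
    this.crashTimes.push(now);
    if (this.crashTimes.length > this.maxCrashes) this.open = true;
    return !this.open;
  }

  isOpen(): boolean {
    return this.open;
  }

  /** The UI "retry" affordance calls this to allow respawns again. */
  clear(): void {
    this.crashTimes = [];
    this.open = false;
  }
}
```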
Wires the coordinator-side events the implementation plan called out.
Uses the existing main/lib/analytics track() helper that already feeds
PostHog with telemetry consent gating.
Events emitted:
- pty_daemon_spawn { organizationId, pid, socketPath }
- pty_daemon_adopt { organizationId, pid, ageSeconds }
- pty_daemon_spawn_failed { organizationId, reason, timeoutMs, earlyExitCode, earlyExitSignal }
- pty_daemon_crash { organizationId, exitCode, crashesInWindow, windowSeconds, ageSeconds }
- pty_daemon_circuit_open { organizationId, crashesInWindow }
Known gap (not in this commit): the host-service-side events from the
plan — pty_daemon_session_open, pty_daemon_session_exit,
host_service_restart_sessions_preserved (the headline metric) — need
host-service → desktop-main IPC since host-service runs as a separate
Node process with no PostHog client of its own. Tracked separately;
doesn't block the operational signals (spawn/adopt/crash/circuit-open)
from being available for monitoring.
Adds a test that spawns the bundled daemon as a child process, sends it a real SIGKILL (no Server.close, no graceful shutdown, no exit event broadcast), and asserts that connected clients see the socket close cleanly without hanging.
Different from the existing control-plane Server.close test, which exercises the cooperative shutdown path. Real production crashes don't go through Server.close — this test covers the actual path.
Wired into `bun run test:integration` script.
Move the pty-daemon supervisor (spawn / adopt / restart / version-detect / crash-circuit / manifest) from the desktop main process into host-service. The daemon is supervised by host-service so it can be deployed independently of Electron — that's the v2 thesis. Daemon outlives host-service crashes via detached spawn + manifest adoption (unchanged).

Renderer reads daemon state through `workspaceTrpc.terminal.daemon.*` instead of `electronTrpc.ptyDaemon.*`. Telemetry track() calls become structured `console.log` lines (JSON with `component: "pty-daemon-supervisor"`) — host-service has no PostHog plumbing yet.

Boot is fire-and-track: host-service kicks off `ensureDaemon(orgId)` at startup without awaiting; terminal request handlers `await waitForDaemonReady()` before using the supervisor's socket path. Non-terminal ops are unaffected if the daemon takes time to come up.

The `SUPERSET_PTY_DAEMON_SOCKET` env-var contract from desktop → host-service goes away in production. Kept as a test escape hatch for the in-process adoption integration test.

Tests: 21 supervisor unit tests moved to host-service. The 3 desktop real-spawn version-roundtrip tests are dropped — equivalent coverage already lives at the daemon package boundary.

Plan: apps/desktop/plans/20260430-pty-daemon-host-service-migration.md (in the dull-protocol design-doc branch).
Architecture/reference doc colocated with the supervisor code. Replaces the migration plan that was the input to this work — describes the end state for future contributors.
Address PR review on the daemon transport. Four related issues, all in the same family of "subtle bugs that bite under load":

- **Request-level timeouts** for open/close/list (15s/5s/5s). Without these, a live-but-stuck daemon (e.g. blocked node-pty.spawn) hangs callers indefinitely — only a full disconnect would unblock them.
- **list() filtered to non-session error frames.** Previously any error could settle a pending list, so a concurrent error from a session could resolve a list() call with the wrong reply.
- **Handshake failure tears down the socket.** A rejected handshake left the open socket and its listeners alive — leaked resources across retries. connect() now destroys + nulls on throw.
- **Decode failure hard-closes the transport.** A protocol decode error called onClose() but didn't destroy() the socket, so the connection could keep delivering frames after local teardown.

200 host-service tests pass (the 1 unrelated `pull-requests` failure is preexisting on the branch baseline).
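The request-level timeout pattern described above can be sketched like this — an illustrative reconstruction, not the actual DaemonClient internals (`withTimeout` and `RequestTimeoutError` are made-up names):

```typescript
class RequestTimeoutError extends Error {
  constructor(op: string, ms: number) {
    super(`daemon ${op} timed out after ${ms}ms`);
    this.name = "RequestTimeoutError";
  }
}

// Race a pending wire request against a timer; clear the timer on both
// settle paths so a resolved request never leaves a live timeout behind.
function withTimeout<T>(op: string, ms: number, work: Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new RequestTimeoutError(op, ms)), ms);
    work.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Per-op budgets from the commit: open 15s, close 5s, list 5s.
const TIMEOUTS = { open: 15_000, close: 5_000, list: 5_000 } as const;
```

A stuck `open` (e.g. blocked node-pty.spawn on the daemon side) then rejects the caller after 15s instead of hanging until a full disconnect.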
- DaemonClient.subscribe now throws if a second subscriber requests replay:true. The daemon's ring buffer is delivered once, on the first subscribe; later subscribers can't get historical data this way and used to silently miss it. Loud-fail the surprising case so callers pick a server-side replay path instead. Updated the existing fan-out test to use replay:false on the second subscriber (the right value for that use case anyway).
- pty-daemon main.ts: validate --buffer-bytes is a positive integer; wrap the shutdown handler in try/finally with a re-entry guard so a second SIGINT/SIGTERM during graceful close doesn't double-call server.close() and the process always exits deterministically.
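One strict way to do the --buffer-bytes validation is a digits-only parse — a sketch, since the shipped check may differ (later review noted `Number.parseInt` alone accepts trailing garbage like `"64k"`):

```typescript
// Hypothetical strict parser: reject anything that is not a plain positive
// decimal integer, including "64k", "1e3", "12.5", and out-of-range values.
function parseBufferBytes(raw: string): number {
  if (!/^\d+$/.test(raw)) {
    throw new Error(`--buffer-bytes must be a positive integer, got "${raw}"`);
  }
  const parsed = Number(raw);
  if (!Number.isSafeInteger(parsed) || parsed <= 0) {
    throw new Error(`--buffer-bytes must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```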
Adds 16 new tests across four files to close the test gaps for the pty-daemon migration. Most are bun unit tests; the supervisor integration test runs under node:test because the supervisor uses process.execPath to spawn the daemon (must be node, not bun).

- DaemonSupervisor.node-test.ts (5 real-spawn scenarios): fresh spawn, cross-instance adoption, version drift detection on adoption, user-restart kills + respawns, auto-respawn after SIGKILL.
- singleton.test.ts (6 cases): getSupervisor identity, fire-and-track bootstrap doesn't await, idempotent startDaemonBootstrap, waitForDaemonReady kicks off lazy bootstrap, failed bootstrap is retryable.
- terminal.daemon.test.ts (4 cases): tRPC procedure wiring against a stub supervisor — UNAUTHORIZED gating, getUpdateStatus delegation, listSessions awaits bootstrap before delegating, restart wiring.
- no-electron-coupling.test.ts (1 case): asserts host-service source has zero Electron imports/globals/APIs. Substitutes for a true headless smoke test until native-addon distribution is solved (better-sqlite3, node-pty, @parcel/watcher are bundle-external and currently expect Electron's resolution path).

Also exports __resetSupervisorForTesting from src/daemon/index.ts so tests can reset the singleton between runs, and registers the new node-test in the test:integration script. Total host-service test suite is now 211 pass / 1 fail (the failing one is a preexisting pull-requests test unrelated to the migration).
Per migration plan D5: in dev mode (NODE_ENV !== production), the host-service shutdown handler now stops the supervised daemon before exit. Production still keeps the daemon detached so PTYs survive host-service restarts (the original v2 thesis). Lets dev iteration on daemon code reset cleanly without manually killing the daemon between cycles.
The supervisor's `sideBySide` path resolution expects pty-daemon.js next to host-service.js in the same dist directory. The Electron deploy bundles host-service into apps/desktop/dist/main/ via electron-vite, and that pipeline still needs an entry to bundle the daemon alongside. Restoring `apps/desktop/src/main/pty-daemon/index.ts` as a thin shim: imports Server from @superset/pty-daemon (workspace dep), parses argv, handles signals, that's it. The daemon implementation still lives entirely in the package. Headless deploys can spawn the package's own main.ts directly via the supervisor's workspace-dist fallback path.
The Settings → Terminal route lives outside any WorkspaceClientProvider (those are per-workspace). Without one, workspaceTrpc hooks fall through to electron-trpc, which has no `terminal.daemon` namespace (we removed the desktop-side proxy in the migration). The renderer silently failed with "no procedure on path terminal.daemon.*".

V2SessionsSection now mounts its own WorkspaceClientProvider keyed to the active org's host URL from LocalHostServiceProvider. Hooks now reach host-service over HTTP correctly.

Also adds light startup logging on host-service:
- `[host-service] starting (org=..., port=..., NODE_ENV=...)`
- `[supervisor] kicking off bootstrap for org=...`
- `[supervisor] bootstrap OK for org=... pid=... version=... [update pending]`
- `[supervisor] bootstrap failed for org=...`

These survived the migration debugging session and are useful as production startup-trace lines. Per-call procedure logs were stripped to keep noise low.
resolveSupervisorScriptPath was walking two extra levels (`..`, `..`) when looking for pty-daemon.js next to host-service.js, which in the electron-vite bundle resolved to apps/desktop/pty-daemon.js (doesn't exist). The bundle emits both files in the same dist/main/ directory, so the path is just `path.resolve(here, "pty-daemon.js")`.

Manifested as "[pty-daemon] script not found at apps/pty-daemon/dist/pty-daemon.js" when triggering Restart daemon from Settings — the sideBySide check failed and the supervisor fell through to the workspace-source fallback path, which doesn't apply in a bundled deploy.
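The corrected side-by-side resolution amounts to this — a sketch assuming an ESM module, with illustrative variable names:

```typescript
import path from "node:path";
import { fileURLToPath } from "node:url";

// Directory containing the currently executing bundle (host-service.js).
const here = path.dirname(fileURLToPath(import.meta.url));

// electron-vite emits host-service.js and pty-daemon.js into the same
// dist/main/ directory, so no parent-directory hops are needed.
const sideBySide = path.resolve(here, "pty-daemon.js");
```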
Server.onExit marked sessions as exited and fanned out the exit event but never deleted them from the SessionStore. A comment claimed "we delete on next list/close" but neither path did. Result: every closed terminal pane left a permanent row in the daemon's map — list replies inflated, memory grew unbounded over time.

Now: delete the session row immediately after fanning out the exit event. Also clear matching subscriptions on the live conns so they don't carry a stale id forward.

Tradeoff: a late subscriber that connects after exit (e.g. host-service restarting *during* the exit window) gets ENOENT instead of buffered output + exit event. The renderer's xterm.js already has whatever was rendered before disconnect — what's lost is just the "Process exited with code N" footer for that narrow window. Accepted per project preference for simplest invariants.

Updated tests: dropped subscribe-with-replay-on-exited (its premise no longer holds) and replaced with a non-accumulation assertion + ENOENT expectation on post-exit input. EEXITED is no longer a returned code (still defined in protocol for forward-compat).
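A minimal sketch of the fixed exit path, assuming a map-backed store and a per-connection subscription set (names `sessions`, `conns`, `onPtyExit` are illustrative, not the daemon's actual identifiers):

```typescript
type ExitEvent = { type: "exit"; id: string; exitCode: number | null };

interface Conn {
  subscriptions: Set<string>;
  send(ev: ExitEvent): void;
}

function onPtyExit(
  sessions: Map<string, unknown>,
  conns: Set<Conn>,
  id: string,
  exitCode: number | null,
): void {
  const ev: ExitEvent = { type: "exit", id, exitCode };
  for (const conn of conns) {
    if (conn.subscriptions.has(id)) {
      conn.send(ev);                 // fan out the exit event first...
      conn.subscriptions.delete(id); // ...then drop the now-stale subscription
    }
  }
  // Delete immediately — no deferred "delete on next list/close" path.
  sessions.delete(id);
}
```

Any later op on the id then gets a clean ENOENT, matching the tradeoff described above.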
Two related dev-quality fixes uncovered during manual QA:
1. Adopted daemons aren't tracked by `child.on("exit")` — the
supervisor only attaches that handler to daemons it spawned.
When an adopted daemon dies externally (kill -9, OOM, etc.) the
supervisor's `instances` map carries a stale entry forever:
`getSocketPath` returns a socket nobody's listening on, terminal
ops fail with ECONNREFUSED until something forces a restart.
Fix: poll `process.kill(pid, 0)` every 2s for adopted PIDs.
On detected death, clear the instance + manifest so the next
`ensure()` respawns. Added integration test:
"detects when an adopted daemon dies externally".
2. host-service and pty-daemon stdout went to per-org rotating log
files in BOTH dev and prod, so dev iteration had no live log
visibility — every diagnostic required tailing files. Now in
dev (NODE_ENV !== production) child stdout/stderr pipes through
to the parent (host-service → desktop main → bun dev), each
line tagged with `[hs:<orgId>]` or `[ptyd:<orgId>]`.
Production stdio still backs to the rotating log file so
detached children can outlive the parent without losing logs.
Helper `pipeWithPrefix` splits chunks on \n so multi-line bursts
keep the prefix on every line (was: only the first line).
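The line-splitting logic behind `pipeWithPrefix` (the helper name is from the commit; this pure-function reconstruction is illustrative) has to keep the prefix on every line of a multi-line chunk and carry partial lines across chunk boundaries:

```typescript
class LinePrefixer {
  private pending = "";
  constructor(private readonly prefix: string) {}

  /** Feed a stdout/stderr chunk; returns fully terminated, prefixed lines. */
  push(chunk: string): string {
    const lines = (this.pending + chunk).split("\n");
    this.pending = lines.pop() ?? ""; // unterminated tail waits for more data
    return lines.map((l) => `${this.prefix} ${l}\n`).join("");
  }

  /** Flush a trailing partial line on stream end. */
  flush(): string {
    const tail = this.pending;
    this.pending = "";
    return tail ? `${this.prefix} ${tail}\n` : "";
  }
}
```

In the supervisor this would sit inside a `child.stdout.on("data", …)` handler with a prefix like `[hs:<orgId>]` or `[ptyd:<orgId>]`.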
… SIGTERM

The chain (DaemonClient.close, daemon handleClose, DaemonPty.kill) defaulted to SIGTERM. Interactive shells (especially `zsh -l`, the default macOS login shell) trap SIGTERM and stay alive — so every closed v2 terminal pane leaked a PTY process and a daemon session until something else SIGKILL'd it.

Verified: PID 46234 (`zsh -l`, status `Ss+`) survived the v2 pane-close path (which sends SIGTERM by default). Manual `kill -HUP 46234` killed it cleanly.

SIGHUP is the right semantic — it's what the kernel sends when a TTY actually closes. Default changed to SIGHUP at all three layers; explicit signals still pass through for callers that need stronger termination (e.g. SIGKILL for hung shells in test).

Regression test added: "default close (SIGHUP) terminates an interactive login shell". Without the fix, this would time out waiting for the exit event. The earlier integration tests didn't catch this because they used non-interactive scripts (`-c "true"`) that exit naturally — no signal handling involved.
…upgrade

The pty-daemon supervision migration adds a new `terminal.daemon` tRPC namespace and changes host-service-internal lifecycle (supervisor owns the daemon now, dev-mode stdio piping, etc.). Existing 0.4.x host-services running on user machines don't have any of this.

Without this version bump, the desktop coordinator's `tryAdopt` would adopt the old host-service in place — Settings → Manage daemon would 404 on the new procedures, and the v2 PTY-survival promise (the whole point of this PR) would silently not engage until something else forced a restart.

Bumping HOST_SERVICE_VERSION + MIN_HOST_SERVICE_VERSION to 0.5.0 forces the coordinator to SIGTERM old host-services on first launch of the new desktop build and respawn from the new bundle. One-time terminal-session loss for users on upgrade — covered in release notes.
…ehavior

Documents the additions from the manual-QA debugging session:
- Adopted-daemon liveness polling (separate code path from a spawned child's on-exit handler)
- SIGHUP default for close (interactive shells trap SIGTERM)
- Session deletion on PTY exit + the niche regression accepted
- Dev-mode stdio piping with per-line prefix
- Version-bump procedure (HOST_SERVICE_VERSION + MIN_HOST_SERVICE_VERSION in lockstep when the adoption floor matters)
- Phase 2 (daemon-upgrade fd-handoff) explicitly noted as deferred, with the design hooks already in place that future work will use

Existing sections (boot pattern, version detection, crash circuit, tests) updated to point at the new test files added in this PR.
13 issues found across 76 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. cubic prioritises the most important files to review.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/host-service/src/terminal/daemon-client-singleton.ts">
<violation number="1" location="packages/host-service/src/terminal/daemon-client-singleton.ts:52">
P1: `getDaemonClient()` has a race window: concurrent calls can create multiple `DaemonClient` instances because `connecting` is assigned only after an `await`.</violation>
<violation number="2" location="packages/host-service/src/terminal/daemon-client-singleton.ts:82">
P2: Do not silently swallow `dispose()` failures in connect-error cleanup; log warning context so cleanup issues are observable.
(Based on your team's feedback about handling async errors explicitly and avoiding silent catch blocks.) [FEEDBACK_USED]</violation>
</file>
<file name="packages/pty-daemon/src/main.ts">
<violation number="1" location="packages/pty-daemon/src/main.ts:28">
P3: `--buffer-bytes` validation accepts malformed numeric strings because `parseInt` partially parses input.</violation>
</file>
<file name="packages/pty-daemon/src/SessionStore/SessionStore.ts">
<violation number="1" location="packages/pty-daemon/src/SessionStore/SessionStore.ts:89">
P1: Oversized output chunks are fully discarded, so replay can lose all recent terminal output.</violation>
</file>
<file name="packages/host-service/src/no-electron-coupling.test.ts">
<violation number="1" location="packages/host-service/src/no-electron-coupling.test.ts:46">
P2: Self-file detection compares a filesystem path to a file URL string, which breaks on Windows and can make this test fail by matching its own pattern definitions.</violation>
</file>
<file name="packages/host-service/src/daemon/log-fd.ts">
<violation number="1" location="packages/host-service/src/daemon/log-fd.ts:25">
P2: Do not swallow this failure silently; log the error (including `logPath`) before returning `-1` so daemon stdio fallback failures are diagnosable.
(Based on your team's feedback about handling errors explicitly and avoiding silent catch blocks.) [FEEDBACK_USED]</violation>
</file>
<file name="apps/desktop/src/renderer/routes/_authenticated/settings/terminal/components/TerminalSettings/components/V2SessionsSection/V2SessionsSection.tsx">
<violation number="1" location="apps/desktop/src/renderer/routes/_authenticated/settings/terminal/components/TerminalSettings/components/V2SessionsSection/V2SessionsSection.tsx:199">
P2: The sessions toggle can get stuck disabled in the "Hide sessions" state when session data becomes empty/unavailable while expanded.</violation>
</file>
<file name="packages/host-service/src/terminal/terminal.ts">
<violation number="1" location="packages/host-service/src/terminal/terminal.ts:824">
P2: The fire-and-forget async session creation in `onOpen` has no rejection handler, which can produce unhandled promise rejections.
(Based on your team's feedback about handling async errors explicitly.) [FEEDBACK_USED]</violation>
<violation number="2" location="packages/host-service/src/terminal/terminal.ts:844">
P1: Replay-buffered daemon output is not sent to the first WebSocket in the async create-on-open path, so users can miss terminal output right after reconnect/adoption.</violation>
</file>
<file name="packages/pty-daemon/package.json">
<violation number="1" location="packages/pty-daemon/package.json:20">
P2: The declared Node engine range is too broad for the scripts in this package: `--experimental-strip-types` is not available on Node 20.</violation>
</file>
<file name="packages/host-service/src/terminal/DaemonClient/DaemonClient.ts">
<violation number="1" location="packages/host-service/src/terminal/DaemonClient/DaemonClient.ts:182">
P1: `subscribe()` mutates callback state before the replay guard, so the error path leaks callbacks and can deliver events to a failed subscriber.</violation>
<violation number="2" location="packages/host-service/src/terminal/DaemonClient/DaemonClient.ts:291">
P2: Synchronous `send()` errors bypass request cleanup, leaking adhoc listeners/timeouts in both request helper paths.</violation>
</file>
<file name="packages/host-service/src/daemon/singleton.ts">
<violation number="1" location="packages/host-service/src/daemon/singleton.ts:67">
P0: Because `startDaemonBootstrap` acts as a fire-and-forget function, throwing inside its `.catch()` block causes an unhandled promise rejection if `waitForDaemonReady` is not immediately called. This will crash the host service on daemon spawn failures, preventing it from staying up as intended.
Instead, store the original promise and attach a side-channel `.catch()` to handle the rejection. This prevents unhandled rejections while allowing `waitForDaemonReady` to throw when it explicitly awaits the promise.
(Based on your team's feedback about handling async calls and errors explicitly to prevent unhandled promise rejections that can crash processes.) [FEEDBACK_USED]</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```ts
bootstrapPromise = sup
  .ensure(organizationId)
  .then((inst) => {
    console.log(
      `[supervisor] bootstrap OK for org=${organizationId} pid=${inst.pid} version=${inst.runningVersion}${inst.updatePending ? " (update pending)" : ""}`,
    );
    return inst;
  })
  .catch((err) => {
    console.error(
      `[supervisor] bootstrap failed for org=${organizationId}:`,
      err,
    );
    // Reset so a future request can retry.
    bootstrapPromise = null;
    throw err;
  });
```
P0: Because startDaemonBootstrap acts as a fire-and-forget function, throwing inside its .catch() block causes an unhandled promise rejection if waitForDaemonReady is not immediately called. This will crash the host service on daemon spawn failures, preventing it from staying up as intended.
Instead, store the original promise and attach a side-channel .catch() to handle the rejection. This prevents unhandled rejections while allowing waitForDaemonReady to throw when it explicitly awaits the promise.
(Based on your team's feedback about handling async calls and errors explicitly to prevent unhandled promise rejections that can crash processes.)
Suggested change:

```diff
-bootstrapPromise = sup
-  .ensure(organizationId)
-  .then((inst) => {
-    console.log(
-      `[supervisor] bootstrap OK for org=${organizationId} pid=${inst.pid} version=${inst.runningVersion}${inst.updatePending ? " (update pending)" : ""}`,
-    );
-    return inst;
-  })
-  .catch((err) => {
-    console.error(
-      `[supervisor] bootstrap failed for org=${organizationId}:`,
-      err,
-    );
-    // Reset so a future request can retry.
-    bootstrapPromise = null;
-    throw err;
-  });
+const promise = sup
+  .ensure(organizationId)
+  .then((inst) => {
+    console.log(
+      `[supervisor] bootstrap OK for org=${organizationId} pid=${inst.pid} version=${inst.runningVersion}${inst.updatePending ? " (update pending)" : ""}`,
+    );
+    return inst;
+  });
+promise.catch((err) => {
+  console.error(
+    `[supervisor] bootstrap failed for org=${organizationId}:`,
+    err,
+  );
+  // Reset so a future request can retry.
+  if (bootstrapPromise === promise) {
+    bootstrapPromise = null;
+  }
+});
+bootstrapPromise = promise;
```
```ts
  return sockPath;
}

export async function getDaemonClient(): Promise<DaemonClient> {
```
P1: getDaemonClient() has a race window: concurrent calls can create multiple DaemonClient instances because connecting is assigned only after an await.
```ts
/** Append output to a session's ring buffer; evict oldest chunks past the cap. */
appendOutput(session: Session, chunk: Buffer): void {
  session.buffer.push(chunk);
```
P1: Oversized output chunks are fully discarded, so replay can lose all recent terminal output.
```diff
-result.sockets.add(ws);
-sendMessage(ws, { type: "title", title: result.title });
+result.sockets.add(ws);
+sendMessage(ws, { type: "title", title: result.title });
```
P1: Replay-buffered daemon output is not sent to the first WebSocket in the async create-on-open path, so users can miss terminal output right after reconnect/adoption.
Suggested change:

```diff
 sendMessage(ws, { type: "title", title: result.title });
+replayBuffer(result, ws);
+if (result.exited) {
+  sendMessage(ws, {
+    type: "exit",
+    exitCode: result.exitCode,
+    signal: result.exitSignal,
+  });
+}
```
```ts
  entry = { output: new Set(), exit: new Set() };
  this.callbacks.set(id, entry);
}
entry.output.add(cb.onOutput);
```
P1: subscribe() mutates callback state before the replay guard, so the error path leaks callbacks and can deliver events to a failed subscriber.
```tsx
<Button
  variant="ghost"
  size="sm"
  disabled={!sessions || sessions.length === 0}
```
P2: The sessions toggle can get stuck disabled in the "Hide sessions" state when session data becomes empty/unavailable while expanded.
```ts
});
// Daemon open is async; fire-and-forget while keeping the WS alive.
// On success: register the socket; on failure: surface and close.
void (async () => {
```
P2: The fire-and-forget async session creation in onOpen has no rejection handler, which can produce unhandled promise rejections.
(Based on your team's feedback about handling async errors explicitly.)
```json
  "pty-daemon": "./src/main.ts"
},
"engines": {
  "node": ">=20"
```
P2: The declared Node engine range is too broad for the scripts in this package: --experimental-strip-types is not available on Node 20.
```ts
  offDisc();
  clearTimeout(timer);
};
this.send(req);
```
P2: Synchronous send() errors bypass request cleanup, leaking adhoc listeners/timeouts in both request helper paths.
```ts
  args.socket = arg.slice("--socket=".length);
else if (arg.startsWith("--buffer-bytes=")) {
  const raw = arg.slice("--buffer-bytes=".length);
  const parsed = Number.parseInt(raw, 10);
```
P3: --buffer-bytes validation accepts malformed numeric strings because parseInt partially parses input.
…ration

Conflicts:
- apps/desktop/src/main/lib/host-service-coordinator.ts
- packages/host-service/package.json
- packages/host-service/src/trpc/router/workspace-creation/procedures/create.ts
- bunfig + test/setup-env.ts: populate env vars before @t3-oss/env-core validates at module load, so integration tests that boot via createApp (without serve.ts) don't crash importing the validated env module.
- Align apps/desktop semver caret to ^7.7.4 (Sherif: multiple-dependency-versions across the workspace).
- Drop pre-existing unused MinimalCtx interface and replace candidates[0]! non-null assertion with explicit guard (Biome lint).
- pty-daemon Server.pickProtocol: remove dead `?? (... ? null : null)` branch and the now-orphan CURRENT_PROTOCOL_VERSION import.
…ration

Conflicts:
- bun.lock
Summary
Wires up `@superset/pty-daemon` to the desktop app and routes all v2 terminal sessions through it. After this PR merges, killing host-service does not kill user shells — the daemon's session map and ring buffer survive the restart, and a fresh host-service connects to the existing daemon and reattaches.

Stacked on top of #3886 (the standalone daemon package). Three commits, each independently reviewable:

- feat(host-service): `DaemonClient` — Unix-socket client with handshake, typed protocol API, multi-local-subscriber fan-out from one wire subscription. 5 integration tests against a real Server.
- feat(desktop): pty-daemon coordinator + manifest + main entry — sibling of `HostServiceCoordinator`. Spawns/adopts the daemon (PID alive AND socket connectable), writes manifest, feeds socket path to host-service via `SUPERSET_PTY_DAEMON_SOCKET`. New `apps/desktop/src/main/pty-daemon/index.ts` registered as an electron-vite main entry so it bundles to `dist/main/pty-daemon.js` next to `host-service.js`.
- feat(host-service): route terminal sessions through pty-daemon — load-bearing. `terminal.ts` no longer calls `node-pty.spawn`. PTY ownership lives in pty-daemon; the host-service `pty` field becomes a thin facade. `createTerminalSessionInternal` becomes async (await `daemon.open`). All callers updated.

How shell-survival works after this

When host-service restarts: `terminal.ts` calls `daemon.subscribe(id, { replay: true })` → daemon sends buffered output → host fans out to the renderer's WebSocket.

When the daemon binary version changes (rare): Phase 2 (separate PR) handles fd-handoff via `child_process.spawn` `stdio` inheritance. Not in this PR.

Tests

- `packages/pty-daemon`: 24 bun unit + 28 node integration → all green
- `packages/host-service`: 37 bun unit + 5 node `DaemonClient` integration → all green
- `tsc`: clean across 27 packages

Test plan

- `DaemonClient` (the typed API surface that `terminal.ts` depends on)
- `PtyDaemonCoordinator` adopt-vs-spawn logic
- `terminal.ts` refactor — verify nothing got dropped that wasn't moved to the daemon
- `sleep 600` or similar, kill the host-service process (`pkill -f host-service.js`), observe: … (`ps aux | grep <pid>`)

Out of scope

- `buffer` per session for in-process replay (matches current behavior). The daemon's ring is the source of truth across restarts. A later cleanup can remove the host buffer if we go to per-WS daemon subscriptions.

Summary by cubic
Routes all v2 terminal sessions through `@superset/pty-daemon` and moves daemon supervision into host-service so shells survive host-service restarts. Adds a new daemon management API/UI, hardens lifecycle behavior, and bumps host-service to 0.5.0 to ensure the new stack is adopted on upgrade.

New Features

- `DaemonSupervisor`: spawn/adopt/restart with manifest, version probe, crash circuit, and liveness polling for adopted PIDs; bootstraps on startup without blocking; dev-mode pipes child stdio and stops the daemon on shutdown.
- `terminal.daemon` tRPC routes; Settings → Terminal now shows a v2 “Manage daemon” section (`V2SessionsSection`) wired via a `WorkspaceClientProvider`.
- pty-daemon main entry so `pty-daemon.js` sits next to `host-service.js`; host-service 0.5.0 is required and enforced by the desktop coordinator.

Bug Fixes

- `DaemonClient` hardened: request timeouts, proper teardown on handshake/decode errors, single replay subscriber, and clean disconnect.
- `/tmp` path to avoid Darwin `sun_path` limits; supervisor clears stale state when an adopted daemon dies; spawn failures no longer block host-service and an existing `SUPERSET_PTY_DAEMON_SOCKET` is honored.
- `semver` to `7.7.4`, and remove a dead branch in the daemon server’s protocol negotiation.

Written for commit 011b59d. Summary will update on new commits.