fix(host-service): make workspace delete reliably tear down terminal sessions#5168
Conversation
…d kill A failed daemon close left the sqlite row marked disposed while the PTY stayed alive, so workspace-scoped cleanup could never find it again. Mark the row disposed only once the kill is confirmed; failed kills stay active and reapable.
Delete the workspace's confirmed-dead terminal rows in the destroy saga so its session index dies with it instead of lingering as set-null orphans. Still-active rows (failed kills) are kept reachable for the reaper.
The pty-daemon has no workspace mapping, so a PTY orphaned past workspace deletion can only be recovered from the daemon's live-session list. Add a reaper that kills daemon sessions whose row is dead/null-workspace, with a two-pass guard so a being-born session is never killed mid-creation. Runs at startup (drains pre-existing orphans) and every 5 minutes.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughA background terminal session reaper periodically disposes daemon sessions not present in the host DB, DB row updates are applied only on successful daemon close, the reaper is started during app/server startup using the app DB, and workspace destroy now deletes non-active terminal session rows (best-effort). ChangesTerminal Session Reaper & Cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Ready to review this PR? Stage has broken it down into 4 individual chapters for you: Chapters generated by Stage for commit cdb5085 on Jun 7, 2026 12:07am UTC. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/host-service/src/terminal/reaper/reaper.ts`:
- Around line 15-33: The reaper can start overlapping async passes
(startTerminalReaper -> run -> reapOrphanedSessions) causing races on
rowlessSessionsPendingSecondPass and duplicate disposeSessionAndWait calls; fix
by adding a run guard or serialized scheduling: inside startTerminalReaper add a
boolean flag (e.g., isRunning) or a simple mutex around run so that run returns
immediately if already running, set isRunning = true before calling
reapOrphanedSessions and clear it in finally, or replace setInterval with a
recursive setTimeout that schedules the next run only after the previous
reapOrphanedSessions completes; keep interval.unref()/timeout.unref() behavior
and ensure the initial immediate run is preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90179731-790d-4ad5-abe1-837b05086a63
📒 Files selected for processing (5)
packages/host-service/src/app.tspackages/host-service/src/terminal/reaper/index.tspackages/host-service/src/terminal/reaper/reaper.tspackages/host-service/src/terminal/terminal.tspackages/host-service/src/trpc/router/workspace-cleanup/workspace-cleanup.ts
There was a problem hiding this comment.
2 issues found across 5 files
Architecture diagram
sequenceDiagram
participant CLI as CLI / API
participant DB as SQLite DB
participant Daemon as PTY Daemon
participant Reaper as Terminal Reaper
participant Host as Host Service
Note over CLI,Host: Workspace Delete Flow
CLI->>Host: delete_workspace(workspaceId)
Host->>DB: disposeSessionsByWorkspaceId(workspaceId)
Host->>Daemon: kill(terminalId)
alt Daemon kill succeeds
Daemon-->>Host: SUCCESS
Host->>DB: mark status="disposed", endedAt=now
else Daemon kill fails
Daemon-->>Host: FAILURE
Host->>Host: keep status="active" (retryable)
end
Host->>DB: delete non-active terminal rows for workspaceId
Host->>Host: remove worktree
Host-->>CLI: done
Note over Host,Daemon: Background Reaper (boot + every 5 min)
Reaper->>Daemon: list()
Daemon-->>Reaper: [SessionInfo(id,pid,alive),...]
Reaper->>DB: select id, status, originWorkspaceId from terminalSessions
DB-->>Reaper: rows[]
Reaper->>Reaper: identify orphans (disposed/exited/null-workspace/rowless x2)
loop for each orphanId
Reaper->>Daemon: kill(orphanId)
alt kill succeeds
Daemon-->>Reaper: SUCCESS
Reaper->>DB: mark status="disposed"
else kill fails
Daemon-->>Reaper: FAILURE
Reaper->>Reaper: keep status="active", log warning
end
end
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
Greptile SummaryThis PR fixes a reliability bug in workspace teardown where terminal PTY processes were orphaned after deletion — the cascade existed but the DB row was optimistically marked
Confidence Score: 4/5Safe to merge — the core fix to The terminal fix and cleanup pipeline are well-reasoned and address the root cause cleanly. The two items worth a follow-up are both in the reaper: the module-level packages/host-service/src/terminal/reaper/reaper.ts warrants a second look for the shared-state and retry-cadence concerns; the other four files are straightforward.
|
| Filename | Overview |
|---|---|
| packages/host-service/src/terminal/reaper/reaper.ts | New reaper module with module-level shared state (rowlessSessionsPendingSecondPass) that complicates isolation and causes suboptimal retry cadence for rowless orphans after a failed kill. |
| packages/host-service/src/terminal/terminal.ts | Core fix: disposeSessionAndWait correctly defers the disposed DB write until after closeResult.succeeded, so failed kills stay active and reapable. |
| packages/host-service/src/trpc/router/workspace-cleanup/workspace-cleanup.ts | New delete step correctly removes confirmed-dead terminal rows (ne(status, "active")) after the kill step, before the workspace FK triggers, preventing set null orphan accumulation. |
| packages/host-service/src/app.ts | Wires startTerminalReaper into the app lifecycle with correct best-effort teardown pattern matching the existing cleanup structure. |
| packages/host-service/src/terminal/reaper/index.ts | Thin barrel export for the reaper module; no issues. |
Sequence Diagram
sequenceDiagram
participant WC as workspace-cleanup
participant TT as terminal.ts
participant Daemon as PTY Daemon
participant DB as SQLite
participant Reaper as Terminal Reaper
Note over WC: Step 2a - kill active PTYs
WC->>TT: disposeSessionsByWorkspaceId(workspaceId)
TT->>DB: "SELECT sessions WHERE workspace=X AND status!=disposed"
DB-->>TT: [session rows]
loop for each session
TT->>Daemon: kill(sessionId)
alt kill succeeds
Daemon-->>TT: ok
TT->>DB: "UPDATE status=disposed (NEW: only on success)"
else kill fails
Daemon-->>TT: error
TT->>DB: no update - status stays active
end
end
Note over WC: NEW Step - prune dead rows
WC->>DB: "DELETE sessions WHERE workspace=X AND status!=active"
Note over WC: Step 5 - delete workspace row
WC->>DB: DELETE workspace row
DB-->>DB: "FK onDelete set null - active rows get originWorkspaceId=null"
Note over Reaper: Every 5 min and on startup
Reaper->>Daemon: list() alive sessions
Reaper->>DB: SELECT all terminal_sessions
loop for each live session
alt no DB row first pass
Reaper->>Reaper: add to pendingSecondPass
else no DB row second pass
Reaper->>Daemon: kill(id)
else "status=disposed OR exited OR originWorkspaceId=null"
Reaper->>Daemon: kill(id)
Daemon-->>Reaper: ok
Reaper->>DB: "UPDATE status=disposed"
end
end
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/host-service/src/terminal/reaper/reaper.ts:13
**Module-level mutable state leaks across reaper instances**
`rowlessSessionsPendingSecondPass` lives at module scope, so it persists across calls to `startTerminalReaper` (e.g., in unit tests or if the reaper is ever stopped and restarted). A second reaper started after a stop would inherit stale session IDs from the previous run: IDs that the new daemon knows nothing about will linger in the set until the cleanup loop at the top of the next `reapOrphanedSessions` call removes them. In tests that call `reapOrphanedSessions` directly, state from one test bleeds into the next, making tests order-dependent. Moving the Set inside `startTerminalReaper` (closing over it in `run`) would scope it to the instance lifetime.
### Issue 2 of 2
packages/host-service/src/terminal/reaper/reaper.ts:77-78
**Retry cadence for persistent rowless orphans doubles after a failed kill**
When a rowless orphan's kill fails in pass N (second-pass, so the ID was in `rowlessSessionsPendingSecondPass`), the ID is consumed by the `orphanIds` path and is **not** placed in `stillRowless`. The final `clear()` + re-population then leaves the ID out of the pending set entirely. On pass N+1 the session is seen as a brand-new rowless session and goes into `stillRowless`; the kill isn't retried until pass N+2 — doubling the effective retry interval to 10 minutes instead of 5. Re-adding the ID to `rowlessSessionsPendingSecondPass` after a failed kill (or keeping it in `stillRowless` when the kill failed) would restore the intended 5-minute cadence.
Reviews (1): Last reviewed commit: "feat(host-service): reap orphaned daemon..." | Re-trigger Greptile
|
|
||
| const REAP_INTERVAL_MS = 5 * 60 * 1000; | ||
|
|
||
| const rowlessSessionsPendingSecondPass = new Set<string>(); |
There was a problem hiding this comment.
Module-level mutable state leaks across reaper instances
rowlessSessionsPendingSecondPass lives at module scope, so it persists across calls to startTerminalReaper (e.g., in unit tests or if the reaper is ever stopped and restarted). A second reaper started after a stop would inherit stale session IDs from the previous run: IDs that the new daemon knows nothing about will linger in the set until the cleanup loop at the top of the next reapOrphanedSessions call removes them. In tests that call reapOrphanedSessions directly, state from one test bleeds into the next, making tests order-dependent. Moving the Set inside startTerminalReaper (closing over it in run) would scope it to the instance lifetime.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/terminal/reaper/reaper.ts
Line: 13
Comment:
**Module-level mutable state leaks across reaper instances**
`rowlessSessionsPendingSecondPass` lives at module scope, so it persists across calls to `startTerminalReaper` (e.g., in unit tests or if the reaper is ever stopped and restarted). A second reaper started after a stop would inherit stale session IDs from the previous run: IDs that the new daemon knows nothing about will linger in the set until the cleanup loop at the top of the next `reapOrphanedSessions` call removes them. In tests that call `reapOrphanedSessions` directly, state from one test bleeds into the next, making tests order-dependent. Moving the Set inside `startTerminalReaper` (closing over it in `run`) would scope it to the instance lifetime.
How can I resolve this? If you propose a fix, please make it concise.| rowlessSessionsPendingSecondPass.clear(); | ||
| for (const id of stillRowless) rowlessSessionsPendingSecondPass.add(id); |
There was a problem hiding this comment.
Retry cadence for persistent rowless orphans doubles after a failed kill
When a rowless orphan's kill fails in pass N (second-pass, so the ID was in rowlessSessionsPendingSecondPass), the ID is consumed by the orphanIds path and is not placed in stillRowless. The final clear() + re-population then leaves the ID out of the pending set entirely. On pass N+1 the session is seen as a brand-new rowless session and goes into stillRowless; the kill isn't retried until pass N+2 — doubling the effective retry interval to 10 minutes instead of 5. Re-adding the ID to rowlessSessionsPendingSecondPass after a failed kill (or keeping it in stillRowless when the kill failed) would restore the intended 5-minute cadence.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/terminal/reaper/reaper.ts
Line: 77-78
Comment:
**Retry cadence for persistent rowless orphans doubles after a failed kill**
When a rowless orphan's kill fails in pass N (second-pass, so the ID was in `rowlessSessionsPendingSecondPass`), the ID is consumed by the `orphanIds` path and is **not** placed in `stillRowless`. The final `clear()` + re-population then leaves the ID out of the pending set entirely. On pass N+1 the session is seen as a brand-new rowless session and goes into `stillRowless`; the kill isn't retried until pass N+2 — doubling the effective retry interval to 10 minutes instead of 5. Re-adding the ID to `rowlessSessionsPendingSecondPass` after a failed kill (or keeping it in `stillRowless` when the kill failed) would restore the intended 5-minute cadence.
How can I resolve this? If you propose a fix, please make it concise.
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
…eateApp Starting the reaper inside createApp made it eagerly connect the pty-daemon client at construction time, which raced integration tests that configure a custom daemon socket after building the app (teardown terminals never spawned -> timeouts). Move it to the serve listen callback, alongside connectRelay, so it only runs in the real host process and after daemon bootstrap. Expose db on CreateAppResult to wire it.
…state - Skip a pass when one is already in flight, so setInterval can't interleave two passes racing on the pending-second-pass set. - Move that set into the startTerminalReaper closure so it can't leak across instances/restarts. - Re-queue a failed second-pass rowless kill so it retries on the next pass instead of restarting its two-pass clock.
|
Pushed fixes for the CI failure and the review findings: CI failure (3 integration tests timing out) — root cause was the reaper being started inside Overlapping passes (CodeRabbit + cubic) — added an in-flight guard so Module-level pending state (greptile) — moved Retry cadence doubling (greptile) — a failed second-pass rowless kill is now re-queued into the pending set so it retries on the next 5-min pass instead of restarting its two-pass clock. Warning context (cubic) — included |
Problem
A host with automated stale-workspace deletion accumulated ~400 live terminal sessions across already-deleted workspaces. Reported as "delete just removes the git worktree and leaves the sessions running."
This turned out to be a reliability bug, not a missing feature. The cascade already exists: CLI
delete_workspace→ host-serviceworkspace.delete→destroyWorkspace, whose Step 2a callsdisposeSessionsByWorkspaceIdbefore removing the worktree. The problem is that the cascade could silently fail and orphan the PTY:disposeSessionAndWaitmarked the sqlite rowdisposedbefore confirming the daemon actually killed the PTY. A transient daemon failure left a live process behind adisposedrow — invisible to every future workspace-scoped cleanup.SessionInfois just{id,pid,cols,rows,alive}). The terminal→workspace link lives only in sqliteorigin_workspace_id, and theonDelete: "set null"FK severs it once the workspace row is deleted — so an already-orphaned PTY can't be found by workspace at all.Worktree removal reads git's durable on-disk registry so it always succeeds; terminal disposal trusted a mutable sqlite table and optimistically marked rows disposed — hence the asymmetry.
Changes
fix: mark disposed only after confirmed kill— failed kills now stayactiveand retryable/reapable instead of being lost behind adisposedrow.fix: clear dead terminal rows when destroying a workspace— the destroy saga deletes the workspace's confirmed-dead terminal rows (status ≠active), so its session index dies with it instead of lingering asset nullorphans. Failed kills (active) are deliberately kept reachable for the reaper.feat: reap orphaned daemon PTYs— a reaper lists the daemon's live sessions and kills any whose row isdisposed/exited/null-workspace (or row-less for two consecutive passes, so a being-born session — daemon PTY opened before the sqlite row insert — is never killed mid-creation). Runs at startup (drains the pre-existing ~400 on next host-service restart) and every 5 min; interval isunref()'d and cleared indispose().Why not
onDelete: cascade?An FK action fires at the DB level and can't know whether the PTY was actually killed.
cascadewould delete the row for a still-alive failed-kill PTY, turning it into a row-less orphan that's ambiguous with a being-born session (slower two-pass reap).set nullleaves an unambiguousactive+null row the reaper catches in one pass. Row lifecycle is therefore application-driven and conditional on a confirmed kill; the FK staysset nullas a dumb integrity backstop.Notes for deploy
Verification
bun run typecheck✓bun run lintexit 0 ✓Summary by cubic
Ensure workspace deletion reliably tears down terminal sessions by fixing disposal semantics and adding a background reaper in
host-service. The reaper now starts fromserveafter daemon bootstrap, avoids overlapping passes, and drains existing orphans on restart or within 5 minutes.Bug Fixes
disposedonly after the daemon kill succeeds; failed kills stayactiveand retryable.serveto avoid early daemon connections; skip overlapping passes, keep pending state per process, retry failed second‑pass rowless kills; include workspace id in the terminal-row cleanup warning.activeterminal rows to avoid orphaned records.New Features
Written for commit cdb5085. Summary will update on new commits.
Summary by CodeRabbit
New Features
Bug Fixes
Chores