[codex] Preserve PTYs on daemon auto-update failure#4793
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughThis PR adds structured auto-update failure tracking to the daemon supervisor and surfaces those failures via a new desktop dialog. Background daemon updates are now best-effort: on failure, the predecessor daemon remains running and the failure details are recorded and exposed for explicit user action instead of forcing a destructive restart. ChangesDaemon Auto-Update Failure Tracking and UI
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Ready to review this PR? Stage has broken it down into 5 individual chapters for you: Chapters generated by Stage for commit f2f1a51 on May 20, 2026 11:00pm UTC. |
Greptile SummaryThis PR changes the daemon auto-update fallback from "force-restart on smooth-handoff failure" to "leave the predecessor running and surface the failure through
Confidence Score: 3/5The supervisor-side changes are solid, but the new dialog has a practical defect: clicking Force update immediately dismisses the failure ID before the mutation resolves, so a restart failure leaves the user with a stale daemon and no dialog path to retry. The backend logic is well-structured and tested. The weak point is the dialog dismiss-before-mutate ordering: once the user clicks Force update and the restart fails, the dismissedFailureId guard prevents the dialog from ever reopening for that failure ID, silently trapping users without a retry path. The session-count staleness and in-memory-only dismiss state compound the UX fragility. DaemonAutoUpdateFailureDialog.tsx — the dismiss/mutate ordering and missing session-count polling are worth addressing before merge.
|
| Filename | Overview |
|---|---|
| apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx | New dialog component that polls update status and lets users keep or force-restart the daemon; has a bug where a failed force-restart permanently dismisses the dialog, plus concerns around stale session count and in-memory-only dismiss state. |
| packages/host-service/src/daemon/DaemonSupervisor.ts | Auto-update fallback logic replaced with best-effort: failed fd-handoff leaves the predecessor running and records the failure via recordAutoUpdateFailure; getUpdateStatus now returns autoUpdateFailure. Misleading leftPending: true logged for the coalesced case. |
| packages/host-service/src/daemon/DaemonSupervisor.test.ts | Tests updated to reflect best-effort behavior: force-restart mocks are now expected not to be called, and new assertions verify autoUpdateFailure state and leftPending log fields. |
| apps/desktop/src/renderer/routes/_authenticated/layout.tsx | Adds DaemonAutoUpdateFailureDialog to the authenticated layout global effects area; straightforward integration with no issues. |
| packages/host-service/src/trpc/router/terminal/terminal.daemon.test.ts | tRPC procedure tests updated to include autoUpdateFailure: null in getUpdateStatus mock return values; changes are purely additive and correct. |
| packages/host-service/src/daemon/expected-version.ts | Comment-only update reflecting the new best-effort policy; no logic changes. |
| apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/index.ts | Barrel re-export for the new dialog component; no issues. |
| packages/host-service/DAEMON_SUPERVISION.md | Documentation updated to accurately describe the new best-effort auto-update behavior and trimmed the removed log events; no issues. |
Sequence Diagram
sequenceDiagram
participant Supervisor as DaemonSupervisor
participant Daemon as PTY Daemon (stale)
participant Dialog as DaemonAutoUpdateFailureDialog
participant User
Supervisor->>Daemon: "adopt (version < expected)"
Supervisor->>Supervisor: kickoffAutoUpdate()
Supervisor->>Daemon: runUpdate() fd-handoff attempt
Daemon-->>Supervisor: ok: false / throws
Supervisor->>Supervisor: recordAutoUpdateFailure()
Note over Daemon: predecessor keeps running
loop every 5 s
Dialog->>Supervisor: getUpdateStatus()
Supervisor-->>Dialog: autoUpdateFailure id and reason
end
Dialog->>User: show dialog with reason and alive session count
alt User keeps current daemon
User->>Dialog: Keep current daemon
Dialog->>Dialog: setDismissedFailureId(id)
else User forces update
User->>Dialog: Force update
Dialog->>Supervisor: terminal.daemon.restart()
Supervisor->>Daemon: SIGTERM and spawn new daemon
Supervisor-->>Dialog: success
Dialog->>Dialog: refetch status autoUpdateFailure cleared
end
Comments Outside Diff (1)
-
packages/host-service/src/daemon/DaemonSupervisor.ts, line 591-601 (link)pty_daemon_auto_update_failedwithleftPending: truelogged even for the coalesced pathWhen
!update.started(the auto-update joined an in-flight manual update), the event is still logged withleftPending: true, butrecordAutoUpdateFailureis intentionally not called soautoUpdateFailurestays null. The log field therefore implies the daemon is left pending from the auto-update path, which is misleading — the manual update path owns the failure from here. A different field value (e.g.,leftPending: false) or a dedicated event (pty_daemon_auto_update_coalesced) would make the log semantics accurate.Prompt To Fix With AI
This is a comment left during a code review. Path: packages/host-service/src/daemon/DaemonSupervisor.ts Line: 591-601 Comment: **`pty_daemon_auto_update_failed` with `leftPending: true` logged even for the coalesced path** When `!update.started` (the auto-update joined an in-flight manual update), the event is still logged with `leftPending: true`, but `recordAutoUpdateFailure` is intentionally not called so `autoUpdateFailure` stays null. The log field therefore implies the daemon is left pending from the auto-update path, which is misleading — the manual update path owns the failure from here. A different field value (e.g., `leftPending: false`) or a dedicated event (`pty_daemon_auto_update_coalesced`) would make the log semantics accurate. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx:128-137
**Dialog permanently dismissed after a failed "Force update"**
`closeDialog()` runs before `restartDaemon.mutate()`, which immediately stores the current `activeFailureId` into `dismissedFailureId`. If the mutation fails, the `onError` toast fires but the `useEffect` guard `failure.id === dismissedFailureId` prevents the dialog from ever reopening — the user sees only the transient error toast and has no way to retry the forced update without restarting the app.
### Issue 2 of 4
apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx:51-57
**Session count shown can be stale**
`sessionsQuery` is only set up with `refetchOnWindowFocus: true` and no `refetchInterval`. While the dialog is open, the alive-session count will not update unless the user switches windows. If sessions are created or destroyed while the dialog is on screen, the displayed count may be inaccurate and could influence the user's "Force update" decision.
### Issue 3 of 4
apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx:40-84
**Dismissed failure only persisted in component state**
`dismissedFailureId` lives in React `useState`. If the `DaemonAutoUpdateFailureDialogInner` component unmounts (e.g., the authenticated layout re-renders, `activeHostUrl` changes, or the user signs out and back in), `dismissedFailureId` resets to `null`. On the next 5-second poll the same failure re-triggers the dialog, potentially surprising the user who already chose "Keep current daemon".
### Issue 4 of 4
packages/host-service/src/daemon/DaemonSupervisor.ts:591-601
**`pty_daemon_auto_update_failed` with `leftPending: true` logged even for the coalesced path**
When `!update.started` (the auto-update joined an in-flight manual update), the event is still logged with `leftPending: true`, but `recordAutoUpdateFailure` is intentionally not called so `autoUpdateFailure` stays null. The log field therefore implies the daemon is left pending from the auto-update path, which is misleading — the manual update path owns the failure from here. A different field value (e.g., `leftPending: false`) or a dedicated event (`pty_daemon_auto_update_coalesced`) would make the log semantics accurate.
Reviews (1): Last reviewed commit: "Preserve PTYs on daemon auto-update fail..." | Re-trigger Greptile
| onClick={() => { | ||
| closeDialog(); | ||
| restartDaemon.mutate(); | ||
| }} | ||
| > | ||
| Force update | ||
| </Button> | ||
| </AlertDialogFooter> | ||
| </AlertDialogContent> | ||
| </AlertDialog> |
There was a problem hiding this comment.
Dialog permanently dismissed after a failed "Force update"
closeDialog() runs before restartDaemon.mutate(), which immediately stores the current activeFailureId into dismissedFailureId. If the mutation fails, the onError toast fires but the useEffect guard failure.id === dismissedFailureId prevents the dialog from ever reopening — the user sees only the transient error toast and has no way to retry the forced update without restarting the app.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx
Line: 128-137
Comment:
**Dialog permanently dismissed after a failed "Force update"**
`closeDialog()` runs before `restartDaemon.mutate()`, which immediately stores the current `activeFailureId` into `dismissedFailureId`. If the mutation fails, the `onError` toast fires but the `useEffect` guard `failure.id === dismissedFailureId` prevents the dialog from ever reopening — the user sees only the transient error toast and has no way to retry the forced update without restarting the app.
How can I resolve this? If you propose a fix, please make it concise.| const sessionsQuery = workspaceTrpc.terminal.daemon.listSessions.useQuery( | ||
| undefined, | ||
| { | ||
| enabled: activeFailureId !== null, | ||
| refetchOnWindowFocus: true, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
Session count shown can be stale
sessionsQuery is only set up with refetchOnWindowFocus: true and no refetchInterval. While the dialog is open, the alive-session count will not update unless the user switches windows. If sessions are created or destroyed while the dialog is on screen, the displayed count may be inaccurate and could influence the user's "Force update" decision.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx
Line: 51-57
Comment:
**Session count shown can be stale**
`sessionsQuery` is only set up with `refetchOnWindowFocus: true` and no `refetchInterval`. While the dialog is open, the alive-session count will not update unless the user switches windows. If sessions are created or destroyed while the dialog is on screen, the displayed count may be inaccurate and could influence the user's "Force update" decision.
How can I resolve this? If you propose a fix, please make it concise.| function DaemonAutoUpdateFailureDialogInner() { | ||
| const [activeFailureId, setActiveFailureId] = useState<string | null>(null); | ||
| const [dismissedFailureId, setDismissedFailureId] = useState<string | null>( | ||
| null, | ||
| ); | ||
|
|
||
| const updateStatusQuery = | ||
| workspaceTrpc.terminal.daemon.getUpdateStatus.useQuery(undefined, { | ||
| refetchInterval: STATUS_REFETCH_MS, | ||
| refetchOnWindowFocus: true, | ||
| }); | ||
| const sessionsQuery = workspaceTrpc.terminal.daemon.listSessions.useQuery( | ||
| undefined, | ||
| { | ||
| enabled: activeFailureId !== null, | ||
| refetchOnWindowFocus: true, | ||
| }, | ||
| ); | ||
| const restartDaemon = workspaceTrpc.terminal.daemon.restart.useMutation({ | ||
| onSuccess: () => { | ||
| toast.success("Daemon restarted", { | ||
| description: "All sessions were closed and a fresh daemon is running.", | ||
| }); | ||
| void updateStatusQuery.refetch(); | ||
| }, | ||
| onError: (error) => { | ||
| toast.error("Failed to restart daemon", { description: error.message }); | ||
| }, | ||
| }); | ||
|
|
||
| const failure = updateStatusQuery.data?.autoUpdateFailure ?? null; | ||
| useEffect(() => { | ||
| if (!failure) { | ||
| setActiveFailureId(null); | ||
| return; | ||
| } | ||
| if (failure.id === dismissedFailureId) return; | ||
| setActiveFailureId(failure.id); | ||
| }, [failure, dismissedFailureId]); | ||
|
|
||
| const closeDialog = () => { | ||
| if (activeFailureId) setDismissedFailureId(activeFailureId); | ||
| setActiveFailureId(null); | ||
| }; | ||
|
|
There was a problem hiding this comment.
Dismissed failure only persisted in component state
dismissedFailureId lives in React useState. If the DaemonAutoUpdateFailureDialogInner component unmounts (e.g., the authenticated layout re-renders, activeHostUrl changes, or the user signs out and back in), dismissedFailureId resets to null. On the next 5-second poll the same failure re-triggers the dialog, potentially surprising the user who already chose "Keep current daemon".
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx
Line: 40-84
Comment:
**Dismissed failure only persisted in component state**
`dismissedFailureId` lives in React `useState`. If the `DaemonAutoUpdateFailureDialogInner` component unmounts (e.g., the authenticated layout re-renders, `activeHostUrl` changes, or the user signs out and back in), `dismissedFailureId` resets to `null`. On the next 5-second poll the same failure re-triggers the dialog, potentially surprising the user who already chose "Keep current daemon".
How can I resolve this? If you propose a fix, please make it concise.
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/host-service/src/daemon/DaemonSupervisor.ts (1)
60-71: 💤 Low valueConsider exporting these types for external consumers.
The desktop UI dialog and tRPC router consume
DaemonUpdateStatusviagetUpdateStatus(). While TypeScript can infer return types, explicitly exporting these interfaces improves developer experience and enables type reuse in the renderer layer.♻️ Suggested change
-interface DaemonAutoUpdateFailure { +export interface DaemonAutoUpdateFailure { id: string; reason: string; failedAt: number; } -interface DaemonUpdateStatus { +export interface DaemonUpdateStatus { pending: boolean; running: string; expected: string; autoUpdateFailure: DaemonAutoUpdateFailure | null; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/host-service/src/daemon/DaemonSupervisor.ts` around lines 60 - 71, Export the interfaces DaemonAutoUpdateFailure and DaemonUpdateStatus so external consumers can import them; update the declarations of DaemonAutoUpdateFailure and DaemonUpdateStatus to be exported (e.g., add the export keyword) to allow the desktop UI dialog and tRPC router to reuse these types from getUpdateStatus(), ensuring type-sharing and better developer DX.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsx`:
- Around line 109-110: The failure reason text is currently not explicitly
selectable; update the span rendering the error in DaemonAutoUpdateFailureDialog
(the span showing {failure?.reason ?? ""}) to include the classes "select-text
cursor-text" so users can copy the error text despite renderer's global
user-select: none; ensure the change targets the span element that contains the
failure?.reason output.
- Around line 128-131: The dialog is being closed immediately before the restart
completes which prevents retry when restartDaemon.mutate() fails; change the
flow so closeDialog() is called only in the mutation's onSuccess callback and
keep the dialog open while the mutation is pending (use restartDaemon.isPending
to disable the "Force update" button), and ensure the dismissedFailureId
comparison logic (the check that hides failures when failure.id ===
dismissedFailureId) can still allow the dialog to remain visible until onSuccess
calls closeDialog().
In `@packages/host-service/DAEMON_SUPERVISION.md`:
- Around line 68-72: The document contains an internal contradiction: the
section describing background auto-update behavior (mentions "fd-handoff",
"updatePending" and exposes failure via getUpdateStatus().autoUpdateFailure)
states it never force-restarts, but a later paragraph still documents a
destructive force-restart fallback; update the later paragraph to match the
non-destructive behavior by removing or replacing the destructive fallback text
and explicitly stating that background auto-updates will not force-restart and
instead surface failures via updatePending/getUpdateStatus().autoUpdateFailure
for an explicit user action, and include a short note that any destructive
fallback has been removed/deprecated so readers won't infer it still exists.
---
Nitpick comments:
In `@packages/host-service/src/daemon/DaemonSupervisor.ts`:
- Around line 60-71: Export the interfaces DaemonAutoUpdateFailure and
DaemonUpdateStatus so external consumers can import them; update the
declarations of DaemonAutoUpdateFailure and DaemonUpdateStatus to be exported
(e.g., add the export keyword) to allow the desktop UI dialog and tRPC router to
reuse these types from getUpdateStatus(), ensuring type-sharing and better
developer DX.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b4b2215d-95a1-422a-ab86-34843f74c699
📒 Files selected for processing (8)
apps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/DaemonAutoUpdateFailureDialog.tsxapps/desktop/src/renderer/routes/_authenticated/components/DaemonAutoUpdateFailureDialog/index.tsapps/desktop/src/renderer/routes/_authenticated/layout.tsxpackages/host-service/DAEMON_SUPERVISION.mdpackages/host-service/src/daemon/DaemonSupervisor.test.tspackages/host-service/src/daemon/DaemonSupervisor.tspackages/host-service/src/daemon/expected-version.tspackages/host-service/src/trpc/router/terminal/terminal.daemon.test.ts
There was a problem hiding this comment.
1 issue found across 8 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
|
Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews. |
* Preserve PTYs on daemon auto-update failure * Address daemon update review feedback
Summary
terminal.daemon.getUpdateStatus().autoUpdateFailure.Why
A daemon version bump should not silently hard-restart the daemon and close PTY sessions. The supervisor can safely attempt the smooth fd-handoff, but the destructive fallback should be user-confirmed.
Validation
bun test packages/host-service/src/daemon/DaemonSupervisor.test.ts packages/host-service/src/trpc/router/terminal/terminal.daemon.test.tsbun run lintbun run typecheckSummary by cubic
Make daemon auto-update best-effort: if fd-handoff fails, we keep the current daemon and preserve PTY sessions. A global desktop dialog now lets users review the failure and explicitly force an update.
updatePendingstays true, and the failure is logged and stored on the instance.terminal.daemon.getUpdateStatus()now includesautoUpdateFailure(id,reason,failedAt) to surface the last background failure.DaemonAutoUpdateFailureDialogthat polls status, shows the failure reason and live session count, and offers “Keep current daemon” or “Force update” (callsterminal.daemon.restart).DAEMON_SUPERVISION.mdand tests to reflect the best‑effort behavior and removal of the destructive auto‑update fallback.Written for commit f2f1a51. Summary will update on new commits. Review in cubic
Summary by CodeRabbit
New Features
Documentation
Tests