fix(desktop): stop gating v2 workspace on flaky host online status#4430
Conversation
The workspace details page previously rendered a full-screen "Host is
offline" state whenever `v2Hosts.isOnline` was false or the host.info
probe failed. That flag is written by the relay's setOnline path and
drifts from reality during relay crashes, redeploys, transient API
blips, and half-open TCP — which means users see the offline screen
on a host that's actually reachable.
Remove the offline gate entirely. The workspace renders optimistically;
downstream queries (branches, files, terminals) surface their own
errors when the host genuinely isn't reachable. The version-mismatch
("incompatible") gate stays — that's a real correctness check tied
to a successful probe, not a guess at reachability.
Also deletes the now-unused WorkspaceHostOfflineState component.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR removes support for the "offline" host status. The ChangesOffline Status Removal
🎯 2 (Simple) | ⏱️ ~8 minutes
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
Greptile SummaryThis PR removes the full-screen "Host is offline" gate from the v2 workspace layout, replacing flaky
Confidence Score: 4/5Safe to merge for the offline-flakiness fix; the incompatible-version path now shows a brief flash of workspace content before the gate fires. The removal of the
|
| Filename | Overview |
|---|---|
| apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/hooks/useRemoteHostStatus/useRemoteHostStatus.ts | Removes offline status and isOnline DB gate; probe now always fires for remote workspaces. Missing infoQuery.isPending → loading mapping causes the hook to return ready during the version probe, allowing the workspace shell to mount before an incompatible result can be detected. |
| apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/layout.tsx | Removes the offline branch and the WorkspaceHostOfflineState import cleanly; remaining incompatible and loading checks are untouched. |
| apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/components/WorkspaceHostOfflineState/WorkspaceHostOfflineState.tsx | Component deleted — confirmed no other consumers in the diff. |
| apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/components/WorkspaceHostOfflineState/index.ts | Barrel re-export deleted alongside the component — clean removal. |
Sequence Diagram
sequenceDiagram
participant L as layout.tsx
participant H as useRemoteHostStatus
participant DB as v2Hosts (LiveQuery)
participant P as host.info probe
L->>H: call useRemoteHostStatus(workspace)
H->>DB: query host row (name only, no isOnline)
H->>P: "fire probe (enabled: workspace != null && !isLocal)"
note over H: While probe is pending…
H-->>L: "{ status: "ready" }"
L-->>L: renders WorkspaceProvider + Outlet (child queries fire)
P-->>H: probe resolves
alt version OK
H-->>L: "{ status: "ready" }"
note over L: No change — workspace stays rendered
else version too old
H-->>L: "{ status: "incompatible", hostName, hostVersion, minVersion }"
L-->>L: tears down workspace, shows incompatible screen
end
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/hooks/useRemoteHostStatus/useRemoteHostStatus.ts:71-83
**Workspace renders before version check completes**
Previously, `infoQuery.isPending` returned `{ status: "loading" }`, which kept the layout in the blank-div state until the probe settled. That guard is gone. Now, while the version probe is in-flight, `infoQuery.isSuccess` is false so the hook skips the `incompatible` branch and falls straight through to `return { status: "ready" }`. The layout immediately mounts `<WorkspaceProvider><Outlet /></WorkspaceProvider>`, child components (terminals, branch list, file tree) start firing their own queries, and then — after one network round-trip — the layout tears all of that down and shows the incompatible screen. The PR description says the incompatible gate "stays", but it now fires with a visible flash of workspace content preceding it.
Reviews (1): Last reviewed commit: "fix(desktop): stop gating v2 workspace o..." | Re-trigger Greptile
| if (infoQuery.isSuccess) { | ||
| const hostVersion = infoQuery.data.version; | ||
| if (!semver.satisfies(hostVersion, `>=${MIN_HOST_SERVICE_VERSION}`)) { | ||
| return { | ||
| status: "incompatible", | ||
| hostName: hostRow?.name ?? "Unknown host", | ||
| hostVersion, | ||
| minVersion: MIN_HOST_SERVICE_VERSION, | ||
| }; | ||
| } | ||
| } | ||
|
|
||
| return { status: "ready" }; |
There was a problem hiding this comment.
Workspace renders before version check completes
Previously, infoQuery.isPending returned { status: "loading" }, which kept the layout in the blank-div state until the probe settled. That guard is gone. Now, while the version probe is in-flight, infoQuery.isSuccess is false so the hook skips the incompatible branch and falls straight through to return { status: "ready" }. The layout immediately mounts <WorkspaceProvider><Outlet /></WorkspaceProvider>, child components (terminals, branch list, file tree) start firing their own queries, and then — after one network round-trip — the layout tears all of that down and shows the incompatible screen. The PR description says the incompatible gate "stays", but it now fires with a visible flash of workspace content preceding it.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/hooks/useRemoteHostStatus/useRemoteHostStatus.ts
Line: 71-83
Comment:
**Workspace renders before version check completes**
Previously, `infoQuery.isPending` returned `{ status: "loading" }`, which kept the layout in the blank-div state until the probe settled. That guard is gone. Now, while the version probe is in-flight, `infoQuery.isSuccess` is false so the hook skips the `incompatible` branch and falls straight through to `return { status: "ready" }`. The layout immediately mounts `<WorkspaceProvider><Outlet /></WorkspaceProvider>`, child components (terminals, branch list, file tree) start firing their own queries, and then — after one network round-trip — the layout tears all of that down and shows the incompatible screen. The PR description says the incompatible gate "stays", but it now fires with a visible flash of workspace content preceding it.
How can I resolve this? If you propose a fix, please make it concise.#4430 just removed the equivalent full-screen "host offline" gate for remote hosts ("render optimistically; downstream queries surface their own errors"). Follow that direction for the local case too: remove WorkspaceLocalHostStoppedState, useLocalHostStatus, and the layout branch. What stays from PR2: - coordinator reset() + hostServiceCoordinator.reset tRPC mutation — useful for the planned Settings escape hatch and for support - LocalHostServiceProvider auto-retries start() for the active org on a 1s/4s/15s backoff (and on a post-running "stopped" status event), resetting the counter when status reaches "running" — transient spawn failures self-heal without any UI A non-blocking recovery surface (banner, not takeover) can be a separate PR if we still want one.
…n stopped state (#4395) * fix(desktop): heal stale host-service adoption; enable tray Restart in stopped state Adopt only host-services that respond to a health check (2s timeout). A live pid no longer serving on the recorded port was being adopted as "running", leaving every V2 surface stuck on "Host service not available" with no in-app recovery (#4299). Also enable tray "Restart" in stopped state — disabling it precisely when users need restart was backwards. Disabled now only while a start is in flight, to avoid racing the pending start. Adds a plan doc covering broader recovery work (banner, in-app reset, retry-with-backoff) for follow-up PRs. * docs(desktop): mark PR1 shipped in host-service recovery plan, prep PR2 handoff - Status badge updated; PR #4395 link added. - Items 1 and 5 marked shipped with implementation pointers. - Items 2-4 (PR2) tightened with concrete file paths, the existing useRemoteHostStatus + WorkspaceHostOfflineState pattern to mirror, router shape, and test cases. - Cross-linked #4396 (white-screen ioreg execFileSync) as the related but independently-fixable issue. - Added a "Handoff checklist for PR2" listing every file to touch. * fix(desktop): add in-app host-service reset + retry loop + recovery UI PR2 of the recovery plan (#4299): when host-service can't be reached the provider now auto-retries on a 1s/4s/15s backoff, and once retries exhaust the v2-workspace layout swaps in a recovery screen with Reset and Copy-diagnostics actions instead of a blank pane. - coordinator: new reset() forcibly SIGKILLs any pid in the manifest, optionally archives host.db to host.db.broken-<ts>, then re-spawns - tRPC: hostServiceCoordinator.reset mutation extends orgInput with wipeHostDb - LocalHostServiceProvider: wraps start with exponential-backoff retry for the active org, resets on running, exposes lastStartError / lastAttemptAt / retryAttempt / retryExhausted on context - new useLocalHostStatus hook + WorkspaceLocalHostStoppedState UI mirroring the existing useRemoteHostStatus / WorkspaceHostOfflineState pattern; layout branches on it above the offline branch * fix(desktop): route host-service-coordinator logs through electron-log The coordinator's narration (including PR1's adoption health-check failures: "Adopted pid=… did not respond …, killing and respawning") went to bare console.log, which never reaches main.log in a packaged build. Route it through electron-log like the auto-updater does so the next #4299-style report has breadcrumbs. * fix(desktop): drop local-host recovery screen; keep reset + auto-retry #4430 just removed the equivalent full-screen "host offline" gate for remote hosts ("render optimistically; downstream queries surface their own errors"). Follow that direction for the local case too: remove WorkspaceLocalHostStoppedState, useLocalHostStatus, and the layout branch. What stays from PR2: - coordinator reset() + hostServiceCoordinator.reset tRPC mutation — useful for the planned Settings escape hatch and for support - LocalHostServiceProvider auto-retries start() for the active org on a 1s/4s/15s backoff (and on a post-running "stopped" status event), resetting the counter when status reaches "running" — transient spawn failures self-heal without any UI A non-blocking recovery surface (banner, not takeover) can be a separate PR if we still want one. * fix(desktop): address review feedback on host-service reset + tests - reset(): read the manifest pid *before* stop() removes it, so a wedged tracked instance gets SIGTERM (from stop) then SIGKILL (escalation) instead of only SIGTERM. (cubic P1) - log on SIGKILL failure in reset() and the adoption health-check path — ESRCH (pid already gone) stays silent, EPERM/etc. now surface. (cubic P2) - tests: restore process.kill in afterEach (unconditional) and rm the mkdtemp dirs in afterEach; rename the "ENOENT" case to "ESRCH" (the errno actually simulated); add a reset test for the tracked-instance SIGTERM→SIGKILL escalation. (greptile/coderabbit) * refactor(desktop): trim host-service recovery PR to the load-bearing bits Drop the speculative pieces: - reset({ wipeHostDb }) + host.db archival — no caller; the Settings "clear local data" button that would use it is out of scope here - LocalHostServiceProvider retry-with-backoff loop — heavier than the #4299 fix needs and invisible without the recovery UI (already dropped); reverted to origin/main What remains: - adopt health-check (the actual #4299 fix) + tray Restart-in-stopped - coordinator reset() (manifest-only force-kill + respawn) + tRPC mutation — small, tested, ready for a support escape hatch - coordinator narration through electron-log * docs(desktop): replace host-service recovery plan with a short shipped summary, move to plans/done The original 220-line plan ranked six items and described a recovery screen + retry loop that didn't ship. Replace it with a tight summary of what landed (adopt health-check, reset(), tray fix, electron-log) and what was considered and deferred, and move it under plans/done per the repo convention. * test(desktop): fix stale 'no rename' in reset test title (wipeHostDb path removed)
…uperset-sh#4430) The workspace details page previously rendered a full-screen "Host is offline" state whenever `v2Hosts.isOnline` was false or the host.info probe failed. That flag is written by the relay's setOnline path and drifts from reality during relay crashes, redeploys, transient API blips, and half-open TCP — which means users see the offline screen on a host that's actually reachable. Remove the offline gate entirely. The workspace renders optimistically; downstream queries (branches, files, terminals) surface their own errors when the host genuinely isn't reachable. The version-mismatch ("incompatible") gate stays — that's a real correctness check tied to a successful probe, not a guess at reachability. Also deletes the now-unused WorkspaceHostOfflineState component.
…n stopped state (superset-sh#4395) * fix(desktop): heal stale host-service adoption; enable tray Restart in stopped state Adopt only host-services that respond to a health check (2s timeout). A live pid no longer serving on the recorded port was being adopted as "running", leaving every V2 surface stuck on "Host service not available" with no in-app recovery (superset-sh#4299). Also enable tray "Restart" in stopped state — disabling it precisely when users need restart was backwards. Disabled now only while a start is in flight, to avoid racing the pending start. Adds a plan doc covering broader recovery work (banner, in-app reset, retry-with-backoff) for follow-up PRs. * docs(desktop): mark PR1 shipped in host-service recovery plan, prep PR2 handoff - Status badge updated; PR superset-sh#4395 link added. - Items 1 and 5 marked shipped with implementation pointers. - Items 2-4 (PR2) tightened with concrete file paths, the existing useRemoteHostStatus + WorkspaceHostOfflineState pattern to mirror, router shape, and test cases. - Cross-linked superset-sh#4396 (white-screen ioreg execFileSync) as the related but independently-fixable issue. - Added a "Handoff checklist for PR2" listing every file to touch. * fix(desktop): add in-app host-service reset + retry loop + recovery UI PR2 of the recovery plan (superset-sh#4299): when host-service can't be reached the provider now auto-retries on a 1s/4s/15s backoff, and once retries exhaust the v2-workspace layout swaps in a recovery screen with Reset and Copy-diagnostics actions instead of a blank pane. - coordinator: new reset() forcibly SIGKILLs any pid in the manifest, optionally archives host.db to host.db.broken-<ts>, then re-spawns - tRPC: hostServiceCoordinator.reset mutation extends orgInput with wipeHostDb - LocalHostServiceProvider: wraps start with exponential-backoff retry for the active org, resets on running, exposes lastStartError / lastAttemptAt / retryAttempt / retryExhausted on context - new useLocalHostStatus hook + WorkspaceLocalHostStoppedState UI mirroring the existing useRemoteHostStatus / WorkspaceHostOfflineState pattern; layout branches on it above the offline branch * fix(desktop): route host-service-coordinator logs through electron-log The coordinator's narration (including PR1's adoption health-check failures: "Adopted pid=… did not respond …, killing and respawning") went to bare console.log, which never reaches main.log in a packaged build. Route it through electron-log like the auto-updater does so the next superset-sh#4299-style report has breadcrumbs. * fix(desktop): drop local-host recovery screen; keep reset + auto-retry superset-sh#4430 just removed the equivalent full-screen "host offline" gate for remote hosts ("render optimistically; downstream queries surface their own errors"). Follow that direction for the local case too: remove WorkspaceLocalHostStoppedState, useLocalHostStatus, and the layout branch. What stays from PR2: - coordinator reset() + hostServiceCoordinator.reset tRPC mutation — useful for the planned Settings escape hatch and for support - LocalHostServiceProvider auto-retries start() for the active org on a 1s/4s/15s backoff (and on a post-running "stopped" status event), resetting the counter when status reaches "running" — transient spawn failures self-heal without any UI A non-blocking recovery surface (banner, not takeover) can be a separate PR if we still want one. * fix(desktop): address review feedback on host-service reset + tests - reset(): read the manifest pid *before* stop() removes it, so a wedged tracked instance gets SIGTERM (from stop) then SIGKILL (escalation) instead of only SIGTERM. (cubic P1) - log on SIGKILL failure in reset() and the adoption health-check path — ESRCH (pid already gone) stays silent, EPERM/etc. now surface. (cubic P2) - tests: restore process.kill in afterEach (unconditional) and rm the mkdtemp dirs in afterEach; rename the "ENOENT" case to "ESRCH" (the errno actually simulated); add a reset test for the tracked-instance SIGTERM→SIGKILL escalation. (greptile/coderabbit) * refactor(desktop): trim host-service recovery PR to the load-bearing bits Drop the speculative pieces: - reset({ wipeHostDb }) + host.db archival — no caller; the Settings "clear local data" button that would use it is out of scope here - LocalHostServiceProvider retry-with-backoff loop — heavier than the superset-sh#4299 fix needs and invisible without the recovery UI (already dropped); reverted to origin/main What remains: - adopt health-check (the actual superset-sh#4299 fix) + tray Restart-in-stopped - coordinator reset() (manifest-only force-kill + respawn) + tRPC mutation — small, tested, ready for a support escape hatch - coordinator narration through electron-log * docs(desktop): replace host-service recovery plan with a short shipped summary, move to plans/done The original 220-line plan ranked six items and described a recovery screen + retry loop that didn't ship. Replace it with a tight summary of what landed (adopt health-check, reset(), tray fix, electron-log) and what was considered and deferred, and move it under plans/done per the repo convention. * test(desktop): fix stale 'no rename' in reset test title (wipeHostDb path removed)
Summary
Removes the full-screen "Host is offline" state from the v2 workspace details page.
The screen was gated on `v2Hosts.isOnline` (a DB flag written by the relay) and a `host.info` probe. Both drift from reality:
End result: users see "Host is offline" on hosts that are actually reachable.
What changes
Downstream queries (load branches, files, terminals) already surface their own errors when the host is genuinely unreachable. They're the correct place to communicate that, not a pre-emptive layout gate.
Test plan
Summary by cubic
Remove the flaky “Host is offline” gate on the v2 workspace page so the workspace renders by default; only a confirmed version mismatch still blocks with the “incompatible” screen. This prevents false offline screens caused by stale
v2Hosts.isOnlineor transient probe failures.offlinestatus fromuseRemoteHostStatusand removed theisOnline-gatedhost.infoprobe.WorkspaceHostOfflineState.host.infosucceeds and the version is too old.Written for commit c2cc393. Summary will update on new commits.
Summary by CodeRabbit