fix(desktop): don't nuke host services on app update#3620
Conversation
Squirrel.Mac's update flow SIGTERMs the old app's process group, which reached the host-service child and caused two nukes: the child's shutdown handler deleted its own manifest, and the child itself was in the parent's process group so it died with the app. On relaunch the new app found no manifests and spawned fresh host-services, losing all in-memory state. - Drop removeManifest from the child's SIGTERM handler. The coordinator already owns manifest lifecycle on intentional stops and observed exits. - Spawn with detached: true and file-backed stdio at <manifestDir>/host-service.log, so the child lives in its own process group and doesn't depend on parent-held pipes. Mirrors the existing terminal-daemon pattern. Tray Stop / Quit & Stop Services are unchanged — coordinator.stop() signals by pid, independent of process group.
📝 WalkthroughWalkthroughThe manifest lifecycle management is transferred from the child host-service process to the coordinator. The child process is now spawned as detached with output redirected to per-organization log files, while the coordinator owns manifest creation and cleanup. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes a macOS update regression where Squirrel.Mac's process-group SIGTERM would kill the host-service child alongside the old app, destroying in-memory state (PTYs, watchers, chat streams) and leaving no manifest for the new app to adopt. Key changes:
Confidence Score: 4/5Safe to merge; the fix is logically sound with two minor non-blocking P2 suggestions. The core mechanism — detached spawn + removing manifest teardown from the child's SIGTERM handler — is correct and mirrors the existing terminal-daemon pattern. The log fd is correctly closed in
|
| Filename | Overview |
|---|---|
| apps/desktop/src/main/host-service/index.ts | Removes removeManifest from the SIGTERM shutdown handler; manifest lifecycle now belongs exclusively to the coordinator. Clean and correct change. |
| apps/desktop/src/main/lib/host-service-coordinator.ts | Adds openLogFile helper, switches spawn to detached: true with file-backed stdio, removes pipe-based stdout/stderr listeners, adds child.unref(). Log fd is properly closed in finally. Minor: silent failure in openLogFile and no log rotation. |
Sequence Diagram
sequenceDiagram
participant Squirrel as Squirrel.Mac
participant OldApp as Old App (Electron)
participant Child as Host-Service Child
participant Manifest as manifest.json
participant NewApp as New App (Electron)
Note over OldApp,Child: Before PR — child in parent's process group
Squirrel->>OldApp: SIGTERM (to process group)
OldApp->>Child: SIGTERM propagates (same group)
Child->>Manifest: removeManifest() ❌
Child-->>Child: exits
Note over OldApp,NewApp: After PR — child in own process group
Squirrel->>OldApp: SIGTERM (to process group)
Note over Child: detached=true → own group, not reached
OldApp-->>OldApp: exits
Manifest-->>Manifest: persists ✅
NewApp->>Manifest: discoverAll() → readManifest()
Manifest-->>NewApp: pid, port, secret
NewApp->>Child: tryAdopt() → health check
Child-->>NewApp: 200 OK
NewApp-->>NewApp: Adopted pid=… (state preserved ✅)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/desktop/src/main/lib/host-service-coordinator.ts
Line: 58-68
Comment:
**Silent log-open failure drops all child output**
When `openLogFile` fails (e.g., permissions error, unexpected filesystem issue), it returns `-1` with no warning. The spawn call then falls back to `stdio: "ignore"`, silently discarding all stdout/stderr from the host-service child. Operators will have no indication that logs are missing.
Consider emitting at least a `console.warn` when the file cannot be opened:
```suggestion
function openLogFile(organizationId: string): number {
try {
const dir = manifestDir(organizationId);
if (!fs.existsSync(dir)) {
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
}
return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
} catch (err) {
console.warn(
`[host-service:${organizationId}] Failed to open log file, child output will be discarded:`,
err,
);
return -1;
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: apps/desktop/src/main/lib/host-service-coordinator.ts
Line: 64
Comment:
**Unbounded log file growth**
`host-service.log` is opened in append mode (`"a"`) with no size cap or rotation. Over time — especially if the host-service restarts frequently or is verbose — this file will grow without bound.
Consider either:
- Rotating at a fixed size (e.g., rename to `.log.1` when it exceeds a few MBs), or
- Truncating on each new spawn by opening with `"w"` instead of `"a"` (acceptable since each spawn is a fresh process whose history can be considered independent).
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(desktop): don't nuke host services o..." | Re-trigger Greptile
| function openLogFile(organizationId: string): number { | ||
| try { | ||
| const dir = manifestDir(organizationId); | ||
| if (!fs.existsSync(dir)) { | ||
| fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); | ||
| } | ||
| return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600); | ||
| } catch { | ||
| return -1; | ||
| } | ||
| } |
There was a problem hiding this comment.
Silent log-open failure drops all child output
When openLogFile fails (e.g., permissions error, unexpected filesystem issue), it returns -1 with no warning. The spawn call then falls back to stdio: "ignore", silently discarding all stdout/stderr from the host-service child. Operators will have no indication that logs are missing.
Consider emitting at least a console.warn when the file cannot be opened:
| function openLogFile(organizationId: string): number { | |
| try { | |
| const dir = manifestDir(organizationId); | |
| if (!fs.existsSync(dir)) { | |
| fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); | |
| } | |
| return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600); | |
| } catch { | |
| return -1; | |
| } | |
| } | |
| function openLogFile(organizationId: string): number { | |
| try { | |
| const dir = manifestDir(organizationId); | |
| if (!fs.existsSync(dir)) { | |
| fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); | |
| } | |
| return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600); | |
| } catch (err) { | |
| console.warn( | |
| `[host-service:${organizationId}] Failed to open log file, child output will be discarded:`, | |
| err, | |
| ); | |
| return -1; | |
| } | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/main/lib/host-service-coordinator.ts
Line: 58-68
Comment:
**Silent log-open failure drops all child output**
When `openLogFile` fails (e.g., permissions error, unexpected filesystem issue), it returns `-1` with no warning. The spawn call then falls back to `stdio: "ignore"`, silently discarding all stdout/stderr from the host-service child. Operators will have no indication that logs are missing.
Consider emitting at least a `console.warn` when the file cannot be opened:
```suggestion
function openLogFile(organizationId: string): number {
try {
const dir = manifestDir(organizationId);
if (!fs.existsSync(dir)) {
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
}
return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
} catch (err) {
console.warn(
`[host-service:${organizationId}] Failed to open log file, child output will be discarded:`,
err,
);
return -1;
}
}
```
How can I resolve this? If you propose a fix, please make it concise.| if (!fs.existsSync(dir)) { | ||
| fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); | ||
| } | ||
| return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600); |
There was a problem hiding this comment.
host-service.log is opened in append mode ("a") with no size cap or rotation. Over time — especially if the host-service restarts frequently or is verbose — this file will grow without bound.
Consider either:
- Rotating at a fixed size (e.g., rename to
.log.1when it exceeds a few MBs), or - Truncating on each new spawn by opening with
"w"instead of"a"(acceptable since each spawn is a fresh process whose history can be considered independent).
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/main/lib/host-service-coordinator.ts
Line: 64
Comment:
**Unbounded log file growth**
`host-service.log` is opened in append mode (`"a"`) with no size cap or rotation. Over time — especially if the host-service restarts frequently or is verbose — this file will grow without bound.
Consider either:
- Rotating at a fixed size (e.g., rename to `.log.1` when it exceeds a few MBs), or
- Truncating on each new spawn by opening with `"w"` instead of `"a"` (acceptable since each spawn is a fresh process whose history can be considered independent).
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/desktop/src/main/lib/host-service-coordinator.ts (1)
457-465:⚠️ Potential issue | 🟡 MinorHealthcheck-failure path can leave an orphan manifest.
If the child's
serve(...)listen callback runs (which writes the manifest) but/trpc/health.checknever answers OK within the timeout, we SIGTERM the child andinstances.delete(...)beforechild.on("exit")fires. When exit does fire, the handler's!currentbranch short-circuits and skipsremoveManifest. The nextstart()will try to adopt a manifest pointing at a pid that's either dying or already gone.This is self-healing —
readAndValidateManifestremoves manifests for dead pids viaisProcessAlive— but on a fast relaunch you could still race a short window where the pid is reused or briefly alive. Cheap fix:🛠️ Proposed fix
const endpoint = `http://127.0.0.1:${port}`; const healthy = await pollHealthCheck(endpoint, secret); if (!healthy) { child.kill("SIGTERM"); this.instances.delete(organizationId); + removeManifest(organizationId); throw new Error( `Host service failed to start within ${HEALTH_POLL_TIMEOUT}ms`, ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/main/lib/host-service-coordinator.ts` around lines 457 - 465, When pollHealthCheck(endpoint, secret) times out we currently child.kill("SIGTERM") and this.instances.delete(organizationId) which can leave the manifest orphaned because child.on("exit")'s handler skips removeManifest if instance isn't current; fix by explicitly removing the manifest for this organization on the healthcheck-failure path (call the same cleanup used in exit handling, e.g. invoke removeManifest/readAndValidateManifest logic or call removeManifest(...) for the organizationId/manifest) before or immediately after deleting the instance so a stale manifest cannot be reused by a fast relaunch; ensure the fix reuses the existing removeManifest/readAndValidateManifest/isProcessAlive helpers rather than duplicating manifest-deletion logic.
🧹 Nitpick comments (1)
apps/desktop/src/main/lib/host-service-coordinator.ts (1)
58-68: Consider boundinghost-service.loggrowth.With detached + file-backed stdio,
host-service.logis opened in append mode and now survives Electron quit/update cycles. For users who keep the service adopted across many updates (which is the whole point of this PR), the log can grow without bound and there's no size cap, truncation-on-rotate, or cleanup. Options, from least to most involved:
- Truncate (
"w") on fresh spawns instead of append — you lose prior-run context but bound size per-instance.- Rotate once on spawn: rename existing
host-service.log→host-service.log.1(drop older) before opening.- Emit a warning when
fs.statSync(log).sizeexceeds some threshold.Not a blocker — just flagging since this is the first time the file is persisted across app lifetimes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/main/lib/host-service-coordinator.ts` around lines 58 - 68, openLogFile currently opens host-service.log in append mode with no size bounds; implement log rotation/truncation before opening: inside openLogFile (use manifestDir and the "host-service.log" path) check fs.statSync(file).size and if it exceeds a chosen threshold (e.g. configurable constant) rename the existing file to host-service.log.1 (overwriting/dropping any previous .1) or truncate by renaming and creating a fresh file, then open the new file (preserving file mode 0o600); additionally consider emitting a warning via the app logger when size exceeds the threshold. Ensure all file ops are guarded in the try/catch already around openLogFile.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@apps/desktop/src/main/lib/host-service-coordinator.ts`:
- Around line 457-465: When pollHealthCheck(endpoint, secret) times out we
currently child.kill("SIGTERM") and this.instances.delete(organizationId) which
can leave the manifest orphaned because child.on("exit")'s handler skips
removeManifest if instance isn't current; fix by explicitly removing the
manifest for this organization on the healthcheck-failure path (call the same
cleanup used in exit handling, e.g. invoke
removeManifest/readAndValidateManifest logic or call removeManifest(...) for the
organizationId/manifest) before or immediately after deleting the instance so a
stale manifest cannot be reused by a fast relaunch; ensure the fix reuses the
existing removeManifest/readAndValidateManifest/isProcessAlive helpers rather
than duplicating manifest-deletion logic.
---
Nitpick comments:
In `@apps/desktop/src/main/lib/host-service-coordinator.ts`:
- Around line 58-68: openLogFile currently opens host-service.log in append mode
with no size bounds; implement log rotation/truncation before opening: inside
openLogFile (use manifestDir and the "host-service.log" path) check
fs.statSync(file).size and if it exceeds a chosen threshold (e.g. configurable
constant) rename the existing file to host-service.log.1 (overwriting/dropping
any previous .1) or truncate by renaming and creating a fresh file, then open
the new file (preserving file mode 0o600); additionally consider emitting a
warning via the app logger when size exceeds the threshold. Ensure all file ops
are guarded in the try/catch already around openLogFile.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 42bb6d13-fc58-4e6f-8253-1065edd2a357
📒 Files selected for processing (2)
apps/desktop/src/main/host-service/index.tsapps/desktop/src/main/lib/host-service-coordinator.ts
There was a problem hiding this comment.
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/desktop/src/main/lib/host-service-coordinator.ts">
<violation number="1" location="apps/desktop/src/main/lib/host-service-coordinator.ts:65">
P2: Empty `catch {}` silently swallows log-open failures — if this path is hit, all child stdout/stderr is discarded (`stdio: "ignore"` fallback) with zero diagnostic trace. Add at minimum a `console.warn` with the organization ID and error so operators can tell why logs are missing.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); | ||
| } | ||
| return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600); | ||
| } catch { |
There was a problem hiding this comment.
P2: Empty catch {} silently swallows log-open failures — if this path is hit, all child stdout/stderr is discarded (stdio: "ignore" fallback) with zero diagnostic trace. Add at minimum a console.warn with the organization ID and error so operators can tell why logs are missing.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/desktop/src/main/lib/host-service-coordinator.ts, line 65:
<comment>Empty `catch {}` silently swallows log-open failures — if this path is hit, all child stdout/stderr is discarded (`stdio: "ignore"` fallback) with zero diagnostic trace. Add at minimum a `console.warn` with the organization ID and error so operators can tell why logs are missing.</comment>
<file context>
@@ -55,6 +55,18 @@ const HEALTH_POLL_INTERVAL = 200;
+ fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
+ }
+ return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
+ } catch {
+ return -1;
+ }
</file context>
| } catch { | |
| } catch (err) { | |
| console.warn( | |
| `[host-service:${organizationId}] Failed to open log file, child output will be discarded:`, | |
| err, | |
| ); |
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
Reconcile with #3620 which landed a simpler version of the same fix on main. Keep our implementation (dev/prod split, log rotation, chmod for pre-existing files, windowsHide, utils extraction) and pick up the `host-service/index.ts` change: drop `removeManifest` from the child's SIGTERM handler — manifest lifecycle belongs to the coordinator. Align log filename to `host-service.log` to match what landed on main.
Summary
On macOS update, Squirrel.Mac SIGTERMs the old app's process group. That reached the host-service child in two destructive ways:
apps/desktop/src/main/host-service/index.ts) calledremoveManifest()on shutdown — wiping the file the new app needed to re-adopt the service.detached: true, so the child sat in the parent's process group and died alongside the app.On relaunch,
coordinator.discoverAll()found no manifests and spawned fresh host-services, losing all in-memory state (PTYs, watchers, chat streams, etc.).Fixes
removeManifestfrom the SIGTERM handler. The coordinator already owns manifest lifecycle:stop()removes it on intentional stops, andchild.on("exit")removes it on observed crashes.detached: true+ file-backed stdio at<manifestDir>/host-service.log. Child gets its own process group, so Squirrel's group-wide SIGTERM doesn't reach it, and it no longer depends on parent-held pipes. Mirrors the existing terminal-daemon pattern atterminal-host/client.ts:1191.Tray "Stop" and "Quit & Stop Services" are unchanged —
coordinator.stop()signals by pid, independent of process group.Test plan
psbefore and after, same pids).~/.superset/host/<orgId>/manifest.jsonis still present after update completes.discoverAll()adopts the surviving service (check logs for "Adopted pid=…" rather than "listening on port …").bun devstill hot-reloads viaenableDevReload(kills + respawns per org).Summary by cubic
Prevent host services from being killed and their manifests deleted during macOS app updates, so they survive and are re-adopted on relaunch. This preserves in-memory state (PTYs, watchers, chat streams) across updates.
detached: trueand file-backed stdio at<manifestDir>/host-service.logto isolate it from the app’s process group and parent-held pipes.Written for commit 87d8deb. Summary will update on new commits.
Summary by CodeRabbit
Release Notes