
fix(desktop): don't nuke host services on app update#3620

Merged
saddlepaddle merged 1 commit into
mainfrom
host-service-update-nukes
Apr 21, 2026

@saddlepaddle
Collaborator

@saddlepaddle saddlepaddle commented Apr 21, 2026

Summary

On macOS update, Squirrel.Mac SIGTERMs the old app's process group. That reached the host-service child in two destructive ways:

  1. The child's own SIGTERM handler (apps/desktop/src/main/host-service/index.ts) called removeManifest() on shutdown — wiping the file the new app needed to re-adopt the service.
  2. The coordinator spawned the child without detached: true, so the child sat in the parent's process group and died alongside the app.

On relaunch, coordinator.discoverAll() found no manifests and spawned fresh host-services, losing all in-memory state (PTYs, watchers, chat streams, etc.).

Fixes

  • Child shutdown: drop removeManifest from the SIGTERM handler. The coordinator already owns the manifest lifecycle: stop() removes it on intentional stops, and child.on("exit") removes it on observed crashes.
  • Spawn isolation: detached: true plus file-backed stdio at <manifestDir>/host-service.log. The child gets its own process group, so Squirrel's group-wide SIGTERM doesn't reach it, and it no longer depends on parent-held pipes. This mirrors the existing terminal-daemon pattern at terminal-host/client.ts:1191.
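
For readers less familiar with the mechanism, here is a minimal sketch of a detached spawn with file-backed stdio using Node's child_process API. It is not the coordinator's actual code: the helper names openLogFile and spawnDetached, and passing a plain directory instead of an organization ID, are assumptions for illustration. On non-Windows platforms, detached: true makes the child the leader of a new process group and session, so a signal sent to the parent's group does not reach it.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: open <dir>/host-service.log for appending, creating the
// directory (0o700) and the file (0o600) if needed. Returns -1 on failure.
function openLogFile(dir: string): number {
	try {
		fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
		return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
	} catch (err) {
		console.warn(`Failed to open log file in ${dir}, child output will be discarded:`, err);
		return -1;
	}
}

// Hypothetical helper: spawn `command` in its own process group, with
// stdout/stderr going to the log file rather than to pipes held by the parent.
function spawnDetached(command: string, args: string[], logDir: string): ChildProcess {
	const logFd = openLogFile(logDir);
	try {
		const child = spawn(command, args, {
			// POSIX: the child becomes leader of a new process group and session.
			detached: true,
			// stdin ignored; stdout and stderr share the log fd, or are discarded if it failed to open.
			stdio: logFd >= 0 ? ["ignore", logFd, logFd] : "ignore",
		});
		// Don't keep the parent's event loop alive on behalf of the child.
		child.unref();
		return child;
	} finally {
		// The child received its own copy of the descriptor during spawn.
		if (logFd >= 0) fs.closeSync(logFd);
	}
}
```

Closing the descriptor in the parent right after spawn is safe because the child already holds a duplicate; nothing in the parent keeps a pipe the child depends on.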

Tray "Stop" and "Quit & Stop Services" are unchanged — coordinator.stop() signals by pid, independent of process group.
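
The ownership split can be sketched with a hypothetical MiniCoordinator that reads the pid from <root>/<orgId>/manifest.json. The real coordinator tracks instances in memory and its method signatures may differ; the class, the Manifest shape, and the injectable kill function are all assumptions made so the sketch runs without real processes. The point it illustrates: stop() signals one pid with process.kill(pid, "SIGTERM") rather than a negative process-group id, and only coordinator paths delete the manifest, so the child's own SIGTERM handler can simply close its server and exit.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical manifest shape; the real file also carries a secret.
interface Manifest {
	pid: number;
	port: number;
}

type KillFn = (pid: number, signal: NodeJS.Signals) => void;

class MiniCoordinator {
	private readonly rootDir: string;
	private readonly kill: KillFn;

	constructor(rootDir: string, kill: KillFn = (pid, signal) => process.kill(pid, signal)) {
		this.rootDir = rootDir;
		this.kill = kill;
	}

	manifestPath(orgId: string): string {
		return path.join(this.rootDir, orgId, "manifest.json");
	}

	private removeManifest(orgId: string): void {
		fs.rmSync(this.manifestPath(orgId), { force: true });
	}

	// Intentional stop: signal the recorded pid directly (independent of any
	// process group), then delete the manifest so nothing re-adopts it.
	stop(orgId: string): void {
		const file = this.manifestPath(orgId);
		if (!fs.existsSync(file)) return;
		const { pid } = JSON.parse(fs.readFileSync(file, "utf8")) as Manifest;
		try {
			this.kill(pid, "SIGTERM");
		} catch {
			// e.g. the process already exited; the manifest is stale either way.
		}
		this.removeManifest(orgId);
	}

	// Observed crash: a tracked child's "exit" handler cleans up the manifest.
	onChildExit(orgId: string): void {
		this.removeManifest(orgId);
	}
}
```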

Test plan

  • Install a prior canary build, trigger an update, and verify host-service pids survive across the update (ps before and after, same pids).
  • Confirm ~/.superset/host/<orgId>/manifest.json is still present after update completes.
  • After new app launches, confirm discoverAll() adopts the surviving service (check logs for "Adopted pid=…" rather than "listening on port …").
  • Tray "Stop" still terminates a running service.
  • Tray "Quit & Stop Services" still terminates services before exit.
  • bun dev still hot-reloads via enableDevReload (kills + respawns per org).

Summary by cubic

Prevent host services from being killed and their manifests deleted during macOS app updates, so they survive and are re-adopted on relaunch. This preserves in-memory state (PTYs, watchers, chat streams) across updates.

  • Bug Fixes
    • Removed manifest deletion from the child’s SIGTERM handler; the coordinator now solely manages manifest lifecycle on stop/exit.
    • Spawn the host-service with detached: true and file-backed stdio at <manifestDir>/host-service.log to isolate it from the app’s process group and parent-held pipes.
    • Tray “Stop” and “Quit & Stop Services” are unchanged; dev hot-reload still works.

Written for commit 87d8deb. Summary will update on new commits.

Summary by CodeRabbit

Release Notes

  • Chores
    • Updated host service process lifecycle management including initialization, execution, and shutdown procedures
    • Service logs are now persisted to dedicated log files instead of console output for better long-term accessibility
    • Enhanced resource cleanup and process coordination mechanisms
    • Refined service startup and process handling

Squirrel.Mac's update flow SIGTERMs the old app's process group, which
reached the host-service child and caused two nukes: the child's shutdown
handler deleted its own manifest, and the child itself was in the parent's
process group so it died with the app. On relaunch the new app found no
manifests and spawned fresh host-services, losing all in-memory state.

- Drop removeManifest from the child's SIGTERM handler. The coordinator
  already owns manifest lifecycle on intentional stops and observed exits.
- Spawn with detached: true and file-backed stdio at
  <manifestDir>/host-service.log, so the child lives in its own process
  group and doesn't depend on parent-held pipes. Mirrors the existing
  terminal-daemon pattern.

Tray Stop / Quit & Stop Services are unchanged — coordinator.stop()
signals by pid, independent of process group.
@coderabbitai
Contributor

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

Manifest cleanup is transferred from the child host-service process to the coordinator. The child process is now spawned as detached with output redirected to per-organization log files; the child still writes its manifest when its server starts, while the coordinator owns manifest cleanup.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Manifest Lifecycle Transfer | apps/desktop/src/main/host-service/index.ts | Removed removeManifest import and SIGTERM/SIGINT cleanup logic; manifest lifecycle is now coordinator-owned. Manifest creation on server start remains unchanged. |
| Child Process Isolation & Logging | apps/desktop/src/main/lib/host-service-coordinator.ts | Added openLogFile() helper for per-organization log files with restricted permissions. Child process spawned as detached with stdout/stderr redirected to log file descriptor. Parent-process stream listeners removed; log file descriptor closed after spawning while child remains independent. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 With careful paws, we've redesigned the way,
The child now runs detached, throughout the day,
Logs flow to files, lifecycle now stays clear,
Coordinator guides where manifest appears,

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main fix: preventing host services from being terminated during macOS app updates, which aligns with the primary objective of the PR. |
| Description check | ✅ Passed | The description is comprehensive, covering the problem statement, implemented fixes, test plan, and additional context. It follows the template with clear sections for summary, fixes, and testing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@greptile-apps

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR fixes a macOS update regression where Squirrel.Mac's process-group SIGTERM would kill the host-service child alongside the old app, destroying in-memory state (PTYs, watchers, chat streams) and leaving no manifest for the new app to adopt.

Key changes:

  • host-service/index.ts: Drops removeManifest from the SIGTERM shutdown handler. Manifest lifecycle is now owned exclusively by the coordinator (stop() removes it on intentional stops; child.on("exit") removes it on observed crashes), preventing the child from wiping the manifest the new app needs to re-adopt the service.
  • host-service-coordinator.ts: Spawns the child with detached: true and file-backed stdio (<manifestDir>/host-service.log), placing the child in its own process group so Squirrel's group-wide SIGTERM no longer reaches it. Adds child.unref() so the parent's event loop is not held by the child. The log fd is correctly closed in a finally block after spawn. The stop() path is unchanged — it kills by PID directly, independent of process group.

Confidence Score: 4/5

Safe to merge; the fix is logically sound with two minor non-blocking P2 suggestions.

The core mechanism — detached spawn + removing manifest teardown from the child's SIGTERM handler — is correct and mirrors the existing terminal-daemon pattern. The log fd is correctly closed in finally. The stop() / stopAll() paths are unaffected. The two remaining comments (silent log-open failure, unbounded log growth) are quality-of-life issues that don't affect correctness or the primary fix.

host-service-coordinator.ts — the openLogFile fallback and log rotation are worth a follow-up, but not blocking.

Important Files Changed

| Filename | Overview |
|---|---|
| apps/desktop/src/main/host-service/index.ts | Removes removeManifest from the SIGTERM shutdown handler; manifest lifecycle now belongs exclusively to the coordinator. Clean and correct change. |
| apps/desktop/src/main/lib/host-service-coordinator.ts | Adds openLogFile helper, switches spawn to detached: true with file-backed stdio, removes pipe-based stdout/stderr listeners, adds child.unref(). Log fd is properly closed in finally. Minor: silent failure in openLogFile and no log rotation. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Squirrel as Squirrel.Mac
    participant OldApp as Old App (Electron)
    participant Child as Host-Service Child
    participant Manifest as manifest.json
    participant NewApp as New App (Electron)

    Note over OldApp,Child: Before PR (child in parent's process group)
    Squirrel->>OldApp: SIGTERM (to process group)
    OldApp->>Child: SIGTERM propagates (same group)
    Child->>Manifest: removeManifest() ❌
    Child-->>Child: exits

    Note over OldApp,NewApp: After PR (child in own process group)
    Squirrel->>OldApp: SIGTERM (to process group)
    Note over Child: detached=true → own group, not reached
    OldApp-->>OldApp: exits
    Manifest-->>Manifest: persists ✅

    NewApp->>Manifest: discoverAll() → readManifest()
    Manifest-->>NewApp: pid, port, secret
    NewApp->>Child: tryAdopt() → health check
    Child-->>NewApp: 200 OK
    NewApp-->>NewApp: Adopted pid=… (state preserved ✅)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/desktop/src/main/lib/host-service-coordinator.ts
Line: 58-68

Comment:
**Silent log-open failure drops all child output**

When `openLogFile` fails (e.g., permissions error, unexpected filesystem issue), it returns `-1` with no warning. The spawn call then falls back to `stdio: "ignore"`, silently discarding all stdout/stderr from the host-service child. Operators will have no indication that logs are missing.

Consider emitting at least a `console.warn` when the file cannot be opened:

```suggestion
function openLogFile(organizationId: string): number {
	try {
		const dir = manifestDir(organizationId);
		if (!fs.existsSync(dir)) {
			fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
		}
		return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
	} catch (err) {
		console.warn(
			`[host-service:${organizationId}] Failed to open log file, child output will be discarded:`,
			err,
		);
		return -1;
	}
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/desktop/src/main/lib/host-service-coordinator.ts
Line: 64

Comment:
**Unbounded log file growth**

`host-service.log` is opened in append mode (`"a"`) with no size cap or rotation. Over time — especially if the host-service restarts frequently or is verbose — this file will grow without bound.

Consider either:
- Rotating at a fixed size (e.g., rename to `.log.1` when it exceeds a few MBs), or
- Truncating on each new spawn by opening with `"w"` instead of `"a"` (acceptable since each spawn is a fresh process whose history can be considered independent).

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(desktop): don't nuke host services o..."

Comment on lines +58 to +68
```ts
function openLogFile(organizationId: string): number {
	try {
		const dir = manifestDir(organizationId);
		if (!fs.existsSync(dir)) {
			fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
		}
		return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
	} catch {
		return -1;
	}
}
```

P2 Silent log-open failure drops all child output

When openLogFile fails (e.g., permissions error, unexpected filesystem issue), it returns -1 with no warning. The spawn call then falls back to stdio: "ignore", silently discarding all stdout/stderr from the host-service child. Operators will have no indication that logs are missing.

Consider emitting at least a console.warn when the file cannot be opened:

Suggested change

```diff
 function openLogFile(organizationId: string): number {
 	try {
 		const dir = manifestDir(organizationId);
 		if (!fs.existsSync(dir)) {
 			fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
 		}
 		return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
-	} catch {
+	} catch (err) {
+		console.warn(
+			`[host-service:${organizationId}] Failed to open log file, child output will be discarded:`,
+			err,
+		);
 		return -1;
 	}
 }
```
```ts
		if (!fs.existsSync(dir)) {
			fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
		}
		return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
```

P2 Unbounded log file growth

host-service.log is opened in append mode ("a") with no size cap or rotation. Over time — especially if the host-service restarts frequently or is verbose — this file will grow without bound.

Consider either:

  • Rotating at a fixed size (e.g., rename to .log.1 when it exceeds a few MBs), or
  • Truncating on each new spawn by opening with "w" instead of "a" (acceptable since each spawn is a fresh process whose history can be considered independent).
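
A minimal sketch of the first option, rotating once at spawn time when the file exceeds a size threshold. The 5 MiB MAX_LOG_BYTES value and the openRotatedLogFile name are assumptions, not from the PR. It keeps exactly one previous generation (host-service.log.1) and creates the fresh file with the same 0o600 mode.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed threshold; the review only says "a few MBs".
const MAX_LOG_BYTES = 5 * 1024 * 1024;

// Open <dir>/host-service.log for appending. If the existing file is larger than
// maxBytes, first rename it to host-service.log.1, replacing any older .1 file,
// so at most one previous generation is kept. Returns -1 on failure.
function openRotatedLogFile(dir: string, maxBytes: number = MAX_LOG_BYTES): number {
	const logPath = path.join(dir, "host-service.log");
	try {
		fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
		if (fs.existsSync(logPath) && fs.statSync(logPath).size > maxBytes) {
			fs.renameSync(logPath, `${logPath}.1`);
		}
		// A freshly created file gets 0o600; an unrotated file keeps its existing mode.
		return fs.openSync(logPath, "a", 0o600);
	} catch (err) {
		console.warn(`Failed to open ${logPath}, child output will be discarded:`, err);
		return -1;
	}
}
```

Rotating only at spawn keeps this simple: a running detached child holds its descriptor for its whole lifetime, so renaming the file underneath it would not stop it from writing to the renamed inode.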
Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/desktop/src/main/lib/host-service-coordinator.ts (1)

457-465: ⚠️ Potential issue | 🟡 Minor

Healthcheck-failure path can leave an orphan manifest.

If the child's serve(...) listen callback runs (which writes the manifest) but /trpc/health.check never answers OK within the timeout, we SIGTERM the child and instances.delete(...) before child.on("exit") fires. When exit does fire, the handler's !current branch short-circuits and skips removeManifest. The next start() will try to adopt a manifest pointing at a pid that's either dying or already gone.

This is self-healing — readAndValidateManifest removes manifests for dead pids via isProcessAlive — but on a fast relaunch you could still race a short window where the pid is reused or briefly alive. Cheap fix:

🛠️ Proposed fix
```diff
 		const endpoint = `http://127.0.0.1:${port}`;
 		const healthy = await pollHealthCheck(endpoint, secret);
 		if (!healthy) {
 			child.kill("SIGTERM");
 			this.instances.delete(organizationId);
+			removeManifest(organizationId);
 			throw new Error(
 				`Host service failed to start within ${HEALTH_POLL_TIMEOUT}ms`,
 			);
 		}
```
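
The self-healing check referenced above (readAndValidateManifest dropping manifests whose pid is dead via isProcessAlive) might look roughly like this. The helper names come from the review; the bodies and the Manifest shape are assumptions. Signal 0 probes for a process without delivering a signal. As the review notes, a live pid can still belong to an unrelated process after pid reuse, which is why adoption also depends on the health check.

```typescript
import * as fs from "node:fs";

// Hypothetical manifest shape; the real file also carries a secret.
interface Manifest {
	pid: number;
	port: number;
}

// Signal 0 performs the existence/permission check without sending a signal.
function isProcessAlive(pid: number): boolean {
	try {
		process.kill(pid, 0);
		return true;
	} catch (err) {
		// EPERM: the process exists but we may not signal it.
		return (err as NodeJS.ErrnoException).code === "EPERM";
	}
}

// Return the manifest if its pid is alive; otherwise delete the stale file
// and return null so the caller spawns a fresh service.
function readAndValidateManifest(file: string): Manifest | null {
	let manifest: Manifest;
	try {
		manifest = JSON.parse(fs.readFileSync(file, "utf8")) as Manifest;
	} catch {
		return null; // missing or unreadable manifest
	}
	if (!Number.isInteger(manifest.pid) || !isProcessAlive(manifest.pid)) {
		fs.rmSync(file, { force: true });
		return null;
	}
	return manifest;
}
```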
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/src/main/lib/host-service-coordinator.ts` around lines 457 -
465, When pollHealthCheck(endpoint, secret) times out we currently
child.kill("SIGTERM") and this.instances.delete(organizationId) which can leave
the manifest orphaned because child.on("exit")'s handler skips removeManifest if
instance isn't current; fix by explicitly removing the manifest for this
organization on the healthcheck-failure path (call the same cleanup used in exit
handling, e.g. invoke removeManifest/readAndValidateManifest logic or call
removeManifest(...) for the organizationId/manifest) before or immediately after
deleting the instance so a stale manifest cannot be reused by a fast relaunch;
ensure the fix reuses the existing
removeManifest/readAndValidateManifest/isProcessAlive helpers rather than
duplicating manifest-deletion logic.
🧹 Nitpick comments (1)
apps/desktop/src/main/lib/host-service-coordinator.ts (1)

58-68: Consider bounding host-service.log growth.

With detached + file-backed stdio, host-service.log is opened in append mode and now survives Electron quit/update cycles. For users who keep the service adopted across many updates (which is the whole point of this PR), the log can grow without bound and there's no size cap, truncation-on-rotate, or cleanup. Options, from least to most involved:

  • Truncate ("w") on fresh spawns instead of append — you lose prior-run context but bound size per-instance.
  • Rotate once on spawn: rename existing host-service.log → host-service.log.1 (drop older) before opening.
  • Emit a warning when fs.statSync(log).size exceeds some threshold.

Not a blocker — just flagging since this is the first time the file is persisted across app lifetimes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/src/main/lib/host-service-coordinator.ts` around lines 58 - 68,
openLogFile currently opens host-service.log in append mode with no size bounds;
implement log rotation/truncation before opening: inside openLogFile (use
manifestDir and the "host-service.log" path) check fs.statSync(file).size and if
it exceeds a chosen threshold (e.g. configurable constant) rename the existing
file to host-service.log.1 (overwriting/dropping any previous .1) or truncate by
renaming and creating a fresh file, then open the new file (preserving file mode
0o600); additionally consider emitting a warning via the app logger when size
exceeds the threshold. Ensure all file ops are guarded in the try/catch already
around openLogFile.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42bb6d13-fc58-4e6f-8253-1065edd2a357

📥 Commits

Reviewing files that changed from the base of the PR and between 1e2302f and 87d8deb.

📒 Files selected for processing (2)
  • apps/desktop/src/main/host-service/index.ts
  • apps/desktop/src/main/lib/host-service-coordinator.ts

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/desktop/src/main/lib/host-service-coordinator.ts">

<violation number="1" location="apps/desktop/src/main/lib/host-service-coordinator.ts:65">
P2: Empty `catch {}` silently swallows log-open failures — if this path is hit, all child stdout/stderr is discarded (`stdio: "ignore"` fallback) with zero diagnostic trace. Add at minimum a `console.warn` with the organization ID and error so operators can tell why logs are missing.</violation>
</file>


```ts
			fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
		}
		return fs.openSync(path.join(dir, "host-service.log"), "a", 0o600);
	} catch {
```
Contributor

@cubic-dev-ai cubic-dev-ai Bot Apr 21, 2026


P2: Empty catch {} silently swallows log-open failures — if this path is hit, all child stdout/stderr is discarded (stdio: "ignore" fallback) with zero diagnostic trace. Add at minimum a console.warn with the organization ID and error so operators can tell why logs are missing.

Suggested change

```diff
-	} catch {
+	} catch (err) {
+		console.warn(
+			`[host-service:${organizationId}] Failed to open log file, child output will be discarded:`,
+			err,
+		);
```

@saddlepaddle saddlepaddle merged commit f175be4 into main Apr 21, 2026
6 of 7 checks passed
@github-actions
Contributor

🧹 Preview Cleanup Complete

The following preview resources have been cleaned up:

  • ✅ Neon database branch
  • ⚠️ Electric Fly.io app

Thank you for your contribution! 🎉

Kitenite added a commit that referenced this pull request Apr 21, 2026
Reconcile with #3620 which landed a simpler version of the same fix on
main. Keep our implementation (dev/prod split, log rotation, chmod for
pre-existing files, windowsHide, utils extraction) and pick up the
`host-service/index.ts` change: drop `removeManifest` from the child's
SIGTERM handler — manifest lifecycle belongs to the coordinator.

Align log filename to `host-service.log` to match what landed on main.
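
The "chmod for pre-existing files" item matters because the mode argument to fs.openSync only applies when the file is created; a log left over from an older build keeps whatever permissions it already had. One way to handle it is sketched below, assuming a hypothetical helper named openPrivateLog; this is not the implementation from that commit. Using fchmodSync on the already-open descriptor avoids a gap between opening and a separate path-based chmod.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: open <dir>/host-service.log for appending and force 0o600
// even if the file already existed with looser permissions.
function openPrivateLog(dir: string): number {
	fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
	const logPath = path.join(dir, "host-service.log");
	// The 0o600 here only takes effect when the file is newly created.
	const fd = fs.openSync(logPath, "a", 0o600);
	try {
		// Tighten pre-existing files too (POSIX; on Windows only the write bit is honored).
		fs.fchmodSync(fd, 0o600);
	} catch (err) {
		fs.closeSync(fd);
		throw err;
	}
	return fd;
}
```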
@Kitenite Kitenite deleted the host-service-update-nukes branch May 6, 2026 04:53


1 participant