77 changes: 39 additions & 38 deletions apps/desktop/docs/HOST_SERVICE_LIFECYCLE.md
@@ -2,22 +2,21 @@

## Architecture

Electron main owns app lifecycle, tray, and host-service management. Host-services run as child processes that can outlive the app via manifest-based adoption.
Electron main owns app lifecycle, tray, and host-service management. Host-service runs as a child process **coupled to Electron** — it starts and stops with the app. Terminal sessions (PTYs) survive Electron restarts via a separate `pty-daemon` that host-service supervises on its own detached lifecycle.

```
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify a language for the fenced diagram block.

Line 7 opens a fenced block without a language, which triggers markdownlint MD040.

Suggested fix
-```
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/desktop/docs/HOST_SERVICE_LIFECYCLE.md` at line 7, The fenced diagram
block in HOST_SERVICE_LIFECYCLE.md is missing a language tag which triggers
markdownlint MD040; update the opening fenced code block (the triple-backtick
diagram block) to include a language specifier such as "text" so it becomes a
fenced block with a language (e.g., change the opening ``` to ```text) to
satisfy the linter and preserve the diagram rendering.

┌─────────────────────────────────────────────────────┐
│ Electron Main Process │
│ │
│ ┌──────────┐ ┌──────────────────────┐ ┌───────┐ │
│ │ Tray │ │ HostServiceManager │ │Windows│ │
│ │ Tray │ │ HostServiceCoordinator│ │Windows│ │
│ │ (macOS) │ │ │ │ │ │
│ │ │◄─┤ status events │ │ hide/ │ │
│ │ restart │ │ start/stop/adopt │ │ show │ │
│ │ stop │ │ per org │ │ │ │
│ │ quit ────┼──┼──► requestQuit(mode) │ │ │ │
│ │ restart │◄─┤ status events │ │ hide/ │ │
│ │ stop │ │ start/stop per org │ │ show │ │
│ │ quit ────┼──┼──► app.quit() │ │ │ │
│ └──────────┘ └──────┬───────────────┘ └───────┘ │
└───────────────────────┼─────────────────────────────┘
IPC + stdio
spawn (attached, detached:false)
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
@@ -26,50 +25,52 @@ Electron main owns app lifecycle, tray, and host-service management. Host-servic
│ (org A) │ │ (org B) │ │ (org C) │
│ │ │ │ │ │
│ HTTP/tRPC │ │ HTTP/tRPC │ │ HTTP/tRPC │
│ port:rand │ │ port:rand │ │ port:rand │
│ │ │ │ │ │
│ writes │ │ writes │ │ writes │
│ manifest │ │ manifest │ │ manifest │
│ supervises │ │ supervises │ │ supervises │
│ pty-daemon │ │ pty-daemon │ │ pty-daemon │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ pty-daemon │ │ pty-daemon │ │ pty-daemon │
│ (detached) │ │ (detached) │ │ (detached) │
│ → PTYs │ │ → PTYs │ │ → PTYs │
└────────────┘ └────────────┘ └────────────┘
│ │ │
▼ ▼ ▼
~/.superset/host/{orgId}/manifest.json
```

### Quit modes
### Quit behavior

All quit paths use a single `QuitMode` (`"release" | "stop"`):
Electron `before-quit` always SIGTERMs every host-service via `coordinator.stopAll()`. There is no "release" mode — host-services no longer outlive the app.

- **release** — detach from services, they keep running for re-adoption on next launch
- **stop** — SIGTERM all services, then exit
- **implicit** (Cmd+Q with active services on macOS) — hide windows to tray
What survives a quit:
- **pty-daemon + open PTYs** — pty-daemon is spawned by host-service with `detached: true`. On the next launch, host-service adopts the existing pty-daemon via its socket/manifest. See `packages/host-service/src/daemon/DaemonSupervisor.ts`.

### Manifest adoption
What does **not** survive:
- In-flight chat completions, file watchers, durable-session reads. These are bound to host-service's process and tear down with it. The renderer handles reconnect on next launch.

Each host-service child writes `~/.superset/host/{orgId}/manifest.json` on startup (pid, endpoint, authToken, version). It's a pidfile extended with connection info.
### How host-service is reaped

- **Release quit** — children keep running, manifests stay on disk
- **Next launch** — `discoverAndAdoptAll()` scans manifests, health-checks each pid/endpoint, reconnects if healthy, removes and respawns if not
- **Stop quit** — SIGTERM children, they remove their own manifests on shutdown
| Quit path | Mechanism |
|---|---|
| Clean `before-quit` (Cmd+Q, tray quit, auto-update install) | `coordinator.stopAll()` SIGTERMs each child; child closes its HTTP server and exits within `SHUTDOWN_GRACE_MS` (3s) |
| Electron force-killed / crash | Parent-pid watchdog inside host-service (`apps/desktop/src/main/host-service/index.ts`) polls `process.ppid`. When Electron's pid is gone, the child shuts down voluntarily |
| Dev `bun dev` SIGTERM/SIGINT | Coordinator's `stopAll()` runs in the signal handler before `app.exit()` |
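
The clean-quit row can be illustrated in isolation. This is a minimal sketch, not the PR's code: `createShutdown` and its injected `close`, `forceClose`, and `exit` callbacks are hypothetical names, chosen so the idempotent grace-period logic can be exercised without a real HTTP server.

```typescript
// Sketch only: createShutdown and ShutdownDeps are hypothetical, not part of the PR.
interface ShutdownDeps {
	close: () => void; // stop accepting new connections (e.g. server.close())
	forceClose: () => void; // drop lingering sockets (e.g. closeAllConnections())
	exit: (code: number) => void; // e.g. process.exit
	graceMs: number; // how long in-flight requests get to finish
}

function createShutdown(deps: ShutdownDeps): (reason: string) => void {
	let shuttingDown = false;
	return (reason: string) => {
		if (shuttingDown) return; // repeated signals are ignored
		shuttingDown = true;
		console.log(`shutdown (${reason})`);
		deps.close();
		const timer = setTimeout(() => {
			deps.forceClose();
			deps.exit(0);
		}, deps.graceMs);
		// Do not let the grace timer alone keep the process alive.
		timer.unref();
	};
}

// Usage: the second call is a no-op, so `close` runs once.
let closeCalls = 0;
const shutdown = createShutdown({
	close: () => {
		closeCalls += 1;
	},
	forceClose: () => {},
	exit: () => {},
	graceMs: 3_000,
});
shutdown("SIGTERM");
shutdown("SIGINT");
console.log(closeCalls);
```

Injecting `exit` keeps the sketch testable; in a real child process it would presumably be `process.exit`.
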

```
App Launch App Quit (release) Next Launch
───────── ────────────────── ───────────
spawn child ──► child writes parent detaches scan manifests
manifest.json manifests stay on disk health-check pid/endpoint
{pid, endpoint, child keeps running ├─ healthy → reconnect
authToken, ...} └─ dead/bad → remove, respawn
```
The watchdog only runs when `HOST_PARENT_PID` is set in the child env — CLI-spawned host-services (`packages/cli`) explicitly skip coupling and use `detached: true` for their own deployment model.
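
A rough sketch of how the coupling might be wired on both sides. The coordinator's spawn code is not shown in this diff, so `spawnCoupledChild` is a hypothetical helper and the assumption that the parent passes its own pid as `HOST_PARENT_PID` is inferred from the paragraph above. `readParentPid` mirrors the guard visible in `host-service/index.ts`.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Parent side (assumed): pass our pid so the child can watch it.
// spawnCoupledChild is a hypothetical helper, not the coordinator's actual code.
function spawnCoupledChild(scriptPath: string): ChildProcess {
	return spawn(process.execPath, [scriptPath], {
		detached: false, // on POSIX the child stays in the parent's process group
		stdio: "inherit",
		env: { ...process.env, HOST_PARENT_PID: String(process.pid) },
	});
}

// Child side: same validation as the diff. Returns null when coupling is off,
// which is the case for CLI-spawned services that never set the variable.
function readParentPid(env: Record<string, string | undefined>): number | null {
	const pid = Number(env.HOST_PARENT_PID);
	return Number.isInteger(pid) && pid > 1 ? pid : null;
}

console.log(readParentPid({ HOST_PARENT_PID: "4242" })); // 4242
console.log(readParentPid({})); // null
```

The `pid > 1` check rejects both an unset variable (which parses to `NaN`) and pid 1, so a child that has already been reparented to init never treats init as its parent.
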

### Manifest

### v1 vs v2 terminal paths
Each host-service still writes `~/.superset/host/{orgId}/manifest.json` (pid, endpoint, authToken, app version). Electron's coordinator no longer reads it for adoption; the manifest is now consumed by:

v1 terminals run on a separate **terminal-host daemon** (`src/main/terminal-host/`) — a persistent background process that owns PTYs over a Unix domain socket. It has its own survival and reconnection model independent of host-service.
- **CLI** (`packages/cli`) — finds and talks to a running host-service for `status`/`stop`/`start` commands.
- **`coordinator.reset()`** — SIGKILLs whatever pid the manifest names as a recovery escape hatch when a wedged host-service has been left behind (superset-sh/superset#4299).

v2 terminals run through **host-service** child processes. The quit/adopt/tray lifecycle described here only applies to host-service instances.
Host-service writes the manifest on boot but does not remove it on exit; coordinator removes it on `stop()` and when the child exits.

### Design decisions

- **No supervisor process.** Electron main owns everything. Simpler while v1 and v2 coexist.
- **No tray on Windows/Linux.** Services still survive quit and are re-adopted, but there's no persistent UI to manage them.
- **Tray calls `requestQuit(mode)`.** One function, one codepath — no setter chains or flag mutation.
- **Manifest handling is single-sourced.** Both parent and child use `host-service-manifest.ts`. Files are written with 0o600 permissions.
- **Coupled to Electron.** PTY survival is owned by pty-daemon, not host-service. No reason for host-service itself to outlive the app — coupling deletes the adoption codepath and removes a class of "wedged adopted service" bugs.
- **CLI keeps its own spawn.** Standalone host-service deployments (CLI-driven) still use detached lifetime via `packages/cli/src/lib/host/spawn.ts`. The coordinator's coupling only applies to Electron-spawned children.
- **No supervisor process.** Electron main owns everything.
- **No tray on Windows/Linux.** Services stop with the app.
- **Manifest handling stays single-sourced.** Both desktop and CLI use the same `host-service-manifest.ts` API. Files are written with 0o600 permissions.
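
To make the 0o600 point concrete, here is a minimal sketch of writing a manifest with owner-only permissions. `ManifestSketch` and `writeManifestSketch` are hypothetical stand-ins; the real `host-service-manifest.ts` API and its exact field types are not shown in this diff and may differ. The permission check assumes a POSIX filesystem.

```typescript
import { chmodSync, mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed manifest shape, based on the fields the doc and diff mention.
interface ManifestSketch {
	pid: number;
	endpoint: string;
	authToken: string;
	startedAt: number;
	organizationId: string;
}

// Hypothetical stand-in for writeManifest(); the real API may differ.
function writeManifestSketch(dir: string, manifest: ManifestSketch): string {
	const file = join(dir, "manifest.json");
	// `mode` only applies when the file is created, so chmod afterwards
	// to cover the overwrite case as well.
	writeFileSync(file, JSON.stringify(manifest, null, 2), { mode: 0o600 });
	chmodSync(file, 0o600);
	return file;
}

// Usage against a throwaway temp directory.
const dir = mkdtempSync(join(tmpdir(), "manifest-sketch-"));
const file = writeManifestSketch(dir, {
	pid: process.pid,
	endpoint: "http://127.0.0.1:0",
	authToken: "example-token",
	startedAt: Date.now(),
	organizationId: "org-example",
});
const mode = statSync(file).mode & 0o777;
const roundTrip = JSON.parse(readFileSync(file, "utf8")) as ManifestSketch;
console.log(mode.toString(8), roundTrip.pid === process.pid);
```

Owner-only permissions matter here because the manifest carries the `authToken` that clients such as the CLI use to talk to the service.
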
4 changes: 2 additions & 2 deletions apps/desktop/src/lib/electron-app/factories/app/setup.ts
@@ -53,8 +53,8 @@ export async function makeAppSetup(
});

// macOS: keep the app alive (standard behavior) — tray/dock provide re-entry.
// Windows/Linux: quit the app UI. Host-services survive via releaseAll()
// and will be re-adopted on next launch.
// Windows/Linux: quit the app UI. Host-services are coupled to the app and
// stop with it; v1 pty-daemon survives separately.
app.on("window-all-closed", () => !PLATFORM.IS_MAC && app.quit());

return window;
1 change: 0 additions & 1 deletion apps/desktop/src/main/host-service/env.ts
@@ -12,7 +12,6 @@ export const env = createEnv({
ORGANIZATION_ID: z.string().min(1),
DESKTOP_VITE_PORT: z.coerce.number().int().positive(),
RELAY_URL: z.string().url().optional(),
SUPERSET_APP_VERSION: z.string().min(1),
},
runtimeEnv: process.env,
emptyStringAsUndefined: true,
66 changes: 57 additions & 9 deletions apps/desktop/src/main/host-service/index.ts
@@ -23,7 +23,55 @@ import { loadToken } from "lib/trpc/routers/auth/utils/auth-functions";
import { writeManifest } from "main/lib/host-service-manifest";
import { env } from "./env";

const SHUTDOWN_GRACE_MS = 3_000;
const WATCHDOG_INTERVAL_MS = 2_000;

type Server = ReturnType<typeof serve>;

async function main(): Promise<void> {
	// Install the parent watchdog before any awaits so a crash during
	// startup can still reap this child. `serverRef` is filled in once
	// serve() returns; shutdown handles both pre- and post-bind states.
	const serverRef: { current: Server | null } = { current: null };
	let shuttingDown = false;
	const shutdown = (reason: string) => {
		if (shuttingDown) return;
		shuttingDown = true;
		console.log(`[host-service] shutdown (${reason}), draining connections`);
		const server = serverRef.current;
		if (!server) {
			process.exit(0);
		}
		server.close();
		// SSE/WS streams (chat, watchers) ignore server.close(); give in-flight
		// HTTP a brief window, then forcibly tear sockets down.
		const forceExit = setTimeout(() => {
			const httpServer = server as unknown as {
				closeAllConnections?: () => void;
			};
			httpServer.closeAllConnections?.();
			process.exit(0);
		}, SHUTDOWN_GRACE_MS);
		forceExit.unref();
	};

	process.on("SIGTERM", () => shutdown("SIGTERM"));
	process.on("SIGINT", () => shutdown("SIGINT"));

	// Self-exit if our Electron parent dies without sending SIGTERM
	// (orphan reparenting to init/launchd). CLI-spawned host-services
	// don't set HOST_PARENT_PID and skip this.
	const parentPid = Number(process.env.HOST_PARENT_PID);
	if (Number.isInteger(parentPid) && parentPid > 1) {
		const interval = setInterval(() => {
			if (!isParentAlive(parentPid)) {
				clearInterval(interval);
				shutdown("parent-exit");
			}
		}, WATCHDOG_INTERVAL_MS);
		interval.unref();
	}

const terminalBaseEnv = await resolveTerminalBaseEnv();
initTerminalBaseEnv(terminalBaseEnv);

@@ -75,7 +123,6 @@ async function main(): Promise<void> {
authToken: env.HOST_SERVICE_SECRET,
startedAt,
organizationId: env.ORGANIZATION_ID,
spawnedByAppVersion: env.SUPERSET_APP_VERSION,
});
} catch (error) {
console.error("[host-service] Failed to write manifest:", error);
@@ -94,16 +141,17 @@ async function main(): Promise<void> {
}
},
);
serverRef.current = server;
injectWebSocket(server);
}

// Manifest lifecycle belongs to the coordinator, not the child.
const shutdown = () => {
server.close();
process.exit(0);
};

process.on("SIGTERM", shutdown);
process.on("SIGINT", shutdown);
function isParentAlive(parentPid: number): boolean {
	try {
		process.kill(parentPid, 0);
		return process.ppid === parentPid;
	} catch {
		return false;
	}
}
Comment on lines +148 to 155
P1 The catch block swallows all errors from process.kill(parentPid, 0), including EPERM. On POSIX, EPERM means the target process exists but the caller lacks permission to signal it (e.g. different uid after a privilege drop). Returning false in that case would cause the watchdog to call shutdown("parent-exit") even though Electron is still alive, killing the host-service spuriously. ESRCH (process not found) is the only error that genuinely means "dead parent".

Suggested change
```suggestion
function isParentAlive(parentPid: number): boolean {
	try {
		process.kill(parentPid, 0);
		return process.ppid === parentPid;
	} catch (err) {
		// EPERM means the process exists but we can't signal it — still alive.
		if ((err as NodeJS.ErrnoException).code === "EPERM") {
			return process.ppid === parentPid;
		}
		return false;
	}
}
```
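
The errno distinction in this comment can be checked with a small standalone probe. This is a sketch under stated assumptions: `probePid` and its result labels are hypothetical names, and pid 2,000,000,000 is assumed to be unused because it is far above typical pid limits.

```typescript
// Hypothetical helper: classify the outcome of a signal-0 probe.
type ProbeResult = "alive" | "alive-no-permission" | "gone";

function probePid(pid: number): ProbeResult {
	try {
		// Signal 0 performs the existence and permission checks without
		// delivering a signal.
		process.kill(pid, 0);
		return "alive";
	} catch (err) {
		const code = (err as { code?: string }).code;
		// EPERM: the process exists but we may not signal it.
		if (code === "EPERM") return "alive-no-permission";
		// ESRCH (no such process) and anything unexpected are treated as gone.
		return "gone";
	}
}

console.log(probePid(process.pid)); // our own pid is always signalable
console.log(probePid(2_000_000_000)); // assumed-unused pid
```

A watchdog built on this probe would treat both "alive" results as a live parent and only shut down on "gone", which is the behavior the suggestion above asks for.
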



void main().catch((error) => {
26 changes: 9 additions & 17 deletions apps/desktop/src/main/index.ts
@@ -176,14 +176,14 @@ export function quitApp(): void {
app.quit();
}

/** Nuclear quit: also kills host-service(s) and pty-daemon/terminal-host. */
/** Quit + also tear down the v1 terminal-host client. Tray "Quit Completely". */
export function quitAppCompletely(): void {
forceFullCleanup = true;
setSkipQuitConfirmation();
app.quit();
}

/** Bypasses before-quit — services are left running for re-adoption on next launch. */
/** Bypasses before-quit. Host-service children self-exit via the parent watchdog. */
export function exitImmediately(): void {
app.exit(0);
}
@@ -224,11 +224,9 @@ app.on("before-quit", async (event) => {

isQuitting = true;
try {
getHostServiceCoordinator().stopAll();
if (isDev || forceFullCleanup || isUpdateReadyToInstall()) {
await runFullQuitCleanup();
} else {
// Prod: leave services running so the next launch re-adopts via manifest.
getHostServiceCoordinator().releaseAll();
await teardownTerminalHost();
}
shutdownTanstackDbPersistence();
disposeTray();
@@ -241,13 +239,10 @@ app.on("before-quit", async (event) => {
});

/**
* Full cleanup — kill host-service + terminal-host children. Used in dev, on
* update installs, and on the tray's "Quit Superset Completely" path in prod.
* Tear down the v1 terminal-host client. Skipped on regular quit so v1
* PTY sessions reattach via socket on next launch.
*/
async function runFullQuitCleanup(): Promise<void> {
const coordinator = getHostServiceCoordinator();
await coordinator.teardownKnownManifests();
coordinator.stopAll();
async function teardownTerminalHost(): Promise<void> {
try {
await getTerminalHostClient().shutdownIfRunning({ killSessions: true });
} catch (err) {
@@ -273,8 +268,9 @@ if (process.env.NODE_ENV === "development") {
if (signalHandled) return;
signalHandled = true;
console.log(`[main] Received ${signal}, quitting...`);
getHostServiceCoordinator().stopAll();
void Promise.allSettled([
runFullQuitCleanup(),
teardownTerminalHost(),
stopNetworkLogger(),
]).finally(() => app.exit(0));
};
@@ -418,10 +414,6 @@ if (!gotTheLock) {
console.error("[main] Failed to install bundled CLI shim:", error);
}

// Discover and adopt host-services that survived a previous quit
// before the tray initializes, so it shows accurate status immediately.
await getHostServiceCoordinator().discoverAll();

if (IS_DEV) {
getHostServiceCoordinator().enableDevReload(async () => {
const { token } = await loadToken();