Recover pty daemon adoption from live socket#4781

Merged: Kitenite merged 2 commits into main from adopt-terminals-pty-resta on May 20, 2026

Collaborator

@Kitenite Kitenite commented May 20, 2026

Summary

  • Recover v2 daemon adoption from the deterministic org socket when the pty-daemon manifest is missing or stale.
  • Include the daemon process id in hello-ack so the host service can rebuild adoption state without host-side PID discovery.
  • Keep the fix scoped to the daemon handshake and supervisor adoption path so host restarts behave closer to v1 socket-first recovery.
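
A minimal sketch of what a host-side reader of the extended hello-ack might look like. Only `daemonVersion` and the optional `daemonPid` come from this PR's description; the JSON encoding, the `type: "hello-ack"` discriminator, and the helper names `isValidPid` and `parseHelloAck` are assumptions made for illustration, not the repository's actual code.

```typescript
// Hypothetical message shape: only daemonVersion and daemonPid are named in the PR.
interface HelloAckMessage {
  type: "hello-ack";
  daemonVersion: string;
  daemonPid?: number; // optional, so daemons that predate this change still parse
}

interface DaemonProbeResult {
  daemonVersion: string;
  daemonPid?: number;
}

// Accept only positive safe integers as process ids.
function isValidPid(value: unknown): value is number {
  return typeof value === "number" && Number.isSafeInteger(value) && value > 0;
}

// Turn a raw JSON payload into a probe result, or null when it is not a usable hello-ack.
// An invalid or missing daemonPid is dropped rather than treated as a parse failure.
function parseHelloAck(raw: string): DaemonProbeResult | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as Partial<HelloAckMessage>;
  if (msg.type !== "hello-ack" || typeof msg.daemonVersion !== "string") return null;
  return isValidPid(msg.daemonPid)
    ? { daemonVersion: msg.daemonVersion, daemonPid: msg.daemonPid }
    : { daemonVersion: msg.daemonVersion };
}
```

Keeping `daemonPid` optional means a caller can still learn the version from an older daemon and decide separately whether a missing PID should block adoption.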

Testing

  • bun test packages/host-service/src/daemon/DaemonSupervisor.test.ts packages/pty-daemon/src/protocol/wire-shape.test.ts
  • bun run --cwd packages/host-service typecheck
  • bun run --cwd packages/pty-daemon typecheck
  • bun run --cwd packages/host-service test:integration:daemon
  • bun run lint

Summary by cubic

Recover v2 pty-daemon adoption from the deterministic org socket when the manifest is missing or stale, rebuilding state using the daemon-reported PID. Also avoids stale adoption metadata so host restarts are reliable and closer to v1’s socket-first recovery.

  • Bug Fixes
    • Extended fallback in DaemonSupervisor.tryAdopt to also handle version-probe failures; adopts from the expected org socket and rewrites the manifest with the resolved PID and preserved startedAt.
    • Replaced version-only probe with a hello handshake that returns both daemonVersion and daemonPid (with retry).
    • Updated hello-ack and server reply in packages/pty-daemon; manifest writes now use CURRENT_PROTOCOL_VERSION.
    • Adds clear adoption logs and a test for manifest-missing socket recovery.
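
The "with retry" behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions: `probeWithRetry`, its parameters, and the fixed retry delay are hypothetical, and the real `probeDaemonHelloWithRetry` may schedule attempts differently.

```typescript
// Retry an async probe until it yields a result or the total time budget elapses.
// A thrown error from a single attempt is treated the same as "no result yet".
async function probeWithRetry<T>(
  probeOnce: () => Promise<T | null>,
  totalTimeoutMs: number,
  retryDelayMs = 50,
): Promise<T | null> {
  const deadline = Date.now() + totalTimeoutMs;
  for (;;) {
    const result = await probeOnce().catch(() => null);
    if (result !== null) return result;
    const remaining = deadline - Date.now();
    if (remaining <= 0) return null;
    // Never sleep past the deadline.
    await new Promise<void>((resolve) =>
      setTimeout(resolve, Math.min(retryDelayMs, remaining)),
    );
  }
}
```

The first attempt always runs, even with a zero budget, so a daemon that answers immediately is never rejected because of the timeout setting.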

Written for commit ea91066.

Summary by CodeRabbit

  • Bug Fixes

    • Improved daemon recovery when local configuration or socket state is missing or stale, automatically restoring expected runtime state for more reliable startup.
  • Improvements

    • Daemon handshake now reports process identity and version more reliably, improving detection of running daemons and update-pending status.
  • Tests

    • Added tests verifying recovery flow, manifest recreation, and telemetry when a daemon manifest is missing.

Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e73adb6-ed99-4c49-983f-40624e4b1721

📥 Commits

Reviewing files that changed from the base of the PR and between 861452b and ea91066.

📒 Files selected for processing (2)
  • packages/host-service/src/daemon/DaemonSupervisor.test.ts
  • packages/host-service/src/daemon/DaemonSupervisor.ts

📝 Walkthrough

Walkthrough

The PR adds daemonPid to the pty-daemon hello-ack handshake, refactors supervisor probe utilities to return structured hello results (version + optional pid) with retries, changes tryAdopt to recover missing/stale manifests via socket probing (tryAdoptFromSocket), exports ptyDaemonSocketPath, and adds tests that control socket location and returned PID to validate manifest recovery and telemetry.

Changes

Daemon Adoption Recovery with PID-based Manifest Restoration

| Layer | File(s) | Summary |
|---|---|---|
| Protocol: HelloAckMessage daemonPid field | packages/pty-daemon/src/protocol/messages.ts, packages/pty-daemon/src/Server/Server.ts | HelloAckMessage gains optional daemonPid?: number. Server hello-ack now includes daemonPid: process.pid. |
| Probe utilities & DaemonProbeResult | packages/host-service/src/daemon/DaemonSupervisor.ts | Adds DaemonProbeResult to carry {daemonVersion, daemonPid?}. Refactors probeDaemonHello/probeDaemonVersion to return/consume structured results and introduces probeDaemonHelloWithRetry/probeDaemonVersionWithRetry that retry across the total timeout window. |
| Adoption recovery and manifest handling | packages/host-service/src/daemon/DaemonSupervisor.ts | Refactors tryAdopt to compute expected per-org socket; adds tryAdoptFromSocket which checks socket reachability, probes hello with retry, validates/uses daemonPid, writes a recovered manifest (protocolVersions: [CURRENT_PROTOCOL_VERSION], startedAt), retries against expected socket path on mismatch, and derives runningVersion/updatePending from probe result. Also exports ptyDaemonSocketPath. |
| Test: fake daemon socket & manifest-missing recovery | packages/host-service/src/daemon/DaemonSupervisor.test.ts | Extends test fake daemon options with socketPath and daemonPid, makes startFakeDaemon accept caller-provided socket path, includes daemonPid in fake hello-ack, imports ptyDaemonSocketPath for deterministic expected-path testing, and adds a tryAdopt test covering manifest-missing recovery and telemetry verification. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • superset-sh/superset#4460: Modifies supervisor adoption/probing flow with retry-based daemon version probing and changed adoption rejection/telemetry behavior on probe failures.

Poem

🐰 I sniffed the socket in the night,

a PID glowed back in gentle light.
With hello-ack and tidy art,
I stitched the missing manifest part.
Recovery hummed — the daemon's heart.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Recover pty daemon adoption from live socket' directly and concisely summarizes the main change: enabling adoption recovery when manifest is missing by using the live socket path. |
| Description check | ✅ Passed | The PR description includes a clear summary, detailed testing instructions, related issue links, and type classification. While not perfectly matching template sections, it provides comprehensive context about the changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



stage-review Bot commented May 20, 2026

Ready to review this PR? Stage has broken it down into 4 individual chapters for you:

| # | Title |
|---|---|
| 1 | Include daemon PID in protocol handshake |
| 2 | Upgrade version probe to full hello handshake |
| 3 | Implement socket-based adoption recovery logic |
| 4 | Test daemon adoption and socket recovery |

Chapters generated by Stage for commit ea91066 on May 20, 2026 8:55pm UTC.

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/host-service/src/daemon/DaemonSupervisor.ts (1)

853-915: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mismatched manifest socket can still be adopted instead of retrying the deterministic socket.

Line 853 computes expectedSocketPath, but after a successful probe on manifest.socketPath (Lines 880-915), adoption proceeds even when paths differ. This can keep stale/legacy socket paths alive indefinitely and bypass the deterministic recovery path.

Suggested fix
 		const probe = await probeDaemonHelloWithRetry(
 			manifest.socketPath,
 			ADOPTION_PROBE_TOTAL_TIMEOUT_MS,
 		);
 		if (!probe) {
 			logEvent("pty_daemon_adopt_rejected", {
 				organizationId,
 				pid: manifest.pid,
 				socketPath: manifest.socketPath,
 				reason: "version_probe_failed",
 			});
 			await terminateProcessTreeAndGroups(manifest.pid, "SIGTERM");
 			removePtyDaemonManifest(organizationId);
 			if (manifest.socketPath !== expectedSocketPath) {
 				return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
 					reason: "manifest_version_probe_failed",
 					previousManifest: manifest,
 				});
 			}
 			return null;
 		}
+		if (manifest.socketPath !== expectedSocketPath) {
+			return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
+				reason: "manifest_socket_mismatch",
+				previousManifest: manifest,
+			});
+		}
 		const runningVersion = probe.daemonVersion;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/host-service/src/daemon/DaemonSupervisor.ts` around lines 853 - 915,
The code currently adopts a daemon after probeDaemonHelloWithRetry even when
manifest.socketPath differs from the deterministic
ptyDaemonSocketPath(organizationId); change the final adoption logic in the
function to detect when manifest.socketPath !== expectedSocketPath after a
successful probe and, instead of returning the manifest info, call
tryAdoptFromSocket(organizationId, expectedSocketPath, { reason:
"manifest_socket_mismatch", previousManifest: manifest }) (or otherwise defer to
the deterministic socket), so only daemons listening on the expected
ptyDaemonSocketPath are adopted and stale/legacy sockets are retried via
tryAdoptFromSocket; keep the existing flow for the case where the paths match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f75eded1-048f-419a-aace-30ba33b7fcb2

📥 Commits

Reviewing files that changed from the base of the PR and between 916b5a3 and 861452b.

📒 Files selected for processing (4)
  • packages/host-service/src/daemon/DaemonSupervisor.test.ts
  • packages/host-service/src/daemon/DaemonSupervisor.ts
  • packages/pty-daemon/src/Server/Server.ts
  • packages/pty-daemon/src/protocol/messages.ts

Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR adds socket-first recovery so the host-service supervisor can re-adopt a running v2 daemon when its manifest is missing or stale. The daemon now includes its PID in the hello-ack handshake, letting the supervisor rebuild adoption state purely from a live connection to the deterministic org socket.

  • DaemonSupervisor.tryAdopt gains four new fallback paths that delegate to tryAdoptFromSocket, which probes the expected socket, validates the returned PID, and writes a fresh manifest.
  • probeDaemonVersionWithRetry is renamed to probeDaemonHelloWithRetry (returning a DaemonProbeResult instead of a bare version string); a thin backward-compat wrapper is kept for the one remaining call-site in the upgrade/handoff flow.
  • The HelloAckMessage protocol type gains an optional daemonPid field, and Server.ts populates it with process.pid on every handshake.

Confidence Score: 4/5

The change is well-scoped and the new socket-adoption path is gated behind multiple live checks; no existing adoption flows are broken.

The new tryAdoptFromSocket method is carefully guarded (socket connectable, PID in hello-ack, process alive) and the protocol addition is backward-compatible. The manifest_pid_dead path passes the dead daemon's startedAt into the new manifest for a different process, but startedAt is advisory and does not drive correctness-critical logic. The test helper duplicates the unexported socket-path formula, creating silent drift risk if that formula ever changes.

packages/host-service/src/daemon/DaemonSupervisor.ts — specifically the startedAt forwarding in the manifest_pid_dead recovery branch

Important Files Changed

| Filename | Overview |
|---|---|
| packages/host-service/src/daemon/DaemonSupervisor.ts | Core supervisor logic: adds tryAdoptFromSocket fallback with four new entry points; renames probeDaemonVersionWithRetry to probeDaemonHelloWithRetry and keeps a backward-compat wrapper; startedAt from a dead daemon's manifest is forwarded to the newly-adopted daemon's manifest entry |
| packages/pty-daemon/src/protocol/messages.ts | Adds optional daemonPid field to HelloAckMessage; backward-compatible since older clients ignore unknown fields |
| packages/pty-daemon/src/Server/Server.ts | One-line change: includes process.pid in the hello-ack response; straightforward and correct |
| packages/host-service/src/daemon/DaemonSupervisor.test.ts | Adds a test for manifest-missing recovery; the expectedSocketPathForOrg helper duplicates the unexported production ptyDaemonSocketPath logic, creating silent drift risk if the path formula changes |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[tryAdopt] --> B{Read manifest}
    B -- missing --> C[tryAdoptFromSocket\nreason: manifest_missing]
    B -- exists --> D{isProcessAlive\nmanifest.pid}
    D -- dead --> E[removePtyDaemonManifest]
    E --> F[tryAdoptFromSocket\nreason: manifest_pid_dead]
    D -- alive --> G{isSocketConnectable\nmanifest.socketPath}
    G -- unreachable --> H[terminateProcess\nremoveManifest]
    H --> I{socketPath != expectedSocketPath?}
    I -- yes --> J[tryAdoptFromSocket\nreason: manifest_socket_unreachable]
    I -- no --> K[return null: respawn]
    G -- reachable --> L[probeDaemonHelloWithRetry]
    L -- failed --> M[terminateProcess\nremoveManifest]
    M --> N{socketPath != expectedSocketPath?}
    N -- yes --> O[tryAdoptFromSocket\nreason: manifest_version_probe_failed]
    N -- no --> K
    L -- success --> P[return DaemonInstance from manifest]

    subgraph tryAdoptFromSocket
        Q{isSocketConnectable} -- no --> R[return null]
        Q -- yes --> S[probeDaemonHelloWithRetry]
        S -- failed --> T[return null]
        S -- success --> U{valid daemonPid && isProcessAlive?}
        U -- no --> V[return null]
        U -- yes --> W[writePtyDaemonManifest]
        W --> X[return DaemonInstance]
    end

    C --> Q
    F --> Q
    J --> Q
    O --> Q
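
The branching in the flowchart above can be restated as a small pure function. This is only a sketch for reading the diagram: `decideAdoption`, `ManifestFacts`, and `AdoptDecision` are hypothetical names, the liveness and probe results are passed in as plain booleans, and side effects such as terminating the process or removing the manifest are left out.

```typescript
type AdoptReason =
  | "manifest_missing"
  | "manifest_pid_dead"
  | "manifest_socket_unreachable"
  | "manifest_version_probe_failed";

type AdoptDecision =
  | { kind: "adopt_from_manifest" }
  | { kind: "adopt_from_socket"; reason: AdoptReason }
  | { kind: "respawn" };

// Observations gathered about the manifest before deciding (hypothetical shape).
interface ManifestFacts {
  socketPath: string;
  pidAlive: boolean;
  socketConnectable: boolean;
  helloProbeOk: boolean;
}

function decideAdoption(
  manifest: ManifestFacts | null,
  expectedSocketPath: string,
): AdoptDecision {
  if (manifest === null) {
    return { kind: "adopt_from_socket", reason: "manifest_missing" };
  }
  if (!manifest.pidAlive) {
    return { kind: "adopt_from_socket", reason: "manifest_pid_dead" };
  }
  // The expected socket is retried only when the manifest pointed somewhere else.
  const mismatch = manifest.socketPath !== expectedSocketPath;
  if (!manifest.socketConnectable) {
    return mismatch
      ? { kind: "adopt_from_socket", reason: "manifest_socket_unreachable" }
      : { kind: "respawn" };
  }
  if (!manifest.helloProbeOk) {
    return mismatch
      ? { kind: "adopt_from_socket", reason: "manifest_version_probe_failed" }
      : { kind: "respawn" };
  }
  return { kind: "adopt_from_manifest" };
}
```

Separating the decision from the I/O in this way makes each of the four fallback entry points easy to cover with a table-driven test.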
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/host-service/src/daemon/DaemonSupervisor.ts:859-865
In the `manifest_pid_dead` path, `previousManifest.startedAt` belongs to the dead daemon, not to the newly-adopted daemon discovered at the expected socket. Writing that timestamp into the new manifest silently records the wrong start time for a different process. Using `Date.now()` (or omitting the `previousManifest` argument entirely) avoids this confusion.

```suggestion
		if (!isProcessAlive(manifest.pid)) {
			removePtyDaemonManifest(organizationId);
			return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
				reason: "manifest_pid_dead",
			});
		}
```

### Issue 2 of 2
packages/host-service/src/daemon/DaemonSupervisor.test.ts:1081-1089
**Test helper duplicates unexported production logic**

`expectedSocketPathForOrg` reimplements `ptyDaemonSocketPath` character-for-character. Because the production function is not exported, the test must inline it, but this means a future change to the hash prefix, hash length, or filename template in `DaemonSupervisor.ts` will silently leave this helper pointing at the old path — the fake daemon would be placed on the wrong socket, `tryAdoptFromSocket` would find nothing, and the test would still pass. Consider exporting `ptyDaemonSocketPath` (or a thin test-only re-export) so this helper can delegate to the real implementation.

Reviews (1): Last reviewed commit: "fix(host-service): recover pty daemon ad..."

Comment on lines 859 to 865
 		if (!isProcessAlive(manifest.pid)) {
 			removePtyDaemonManifest(organizationId);
-			return null;
+			return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
+				reason: "manifest_pid_dead",
+				previousManifest: manifest,
+			});
 		}

P2 In the manifest_pid_dead path, previousManifest.startedAt belongs to the dead daemon, not to the newly-adopted daemon discovered at the expected socket. Writing that timestamp into the new manifest silently records the wrong start time for a different process. Using Date.now() (or omitting the previousManifest argument entirely) avoids this confusion.

Suggested change
 		if (!isProcessAlive(manifest.pid)) {
 			removePtyDaemonManifest(organizationId);
 			return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
 				reason: "manifest_pid_dead",
-				previousManifest: manifest,
 			});
 		}
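
One way to picture the concern is a manifest builder that reuses the old `startedAt` only when the old manifest described the same process. This is a different variant from the suggestion above (which simply drops `previousManifest`), and it is a sketch under assumptions: the `PtyDaemonManifest` shape, the numeric `startedAt`, the placeholder value of `CURRENT_PROTOCOL_VERSION`, and the helper `buildRecoveredManifest` are all hypothetical.

```typescript
// Hypothetical manifest shape; field names follow those mentioned in the review.
interface PtyDaemonManifest {
  pid: number;
  socketPath: string;
  protocolVersions: number[];
  startedAt: number; // assumed to be epoch milliseconds
}

const CURRENT_PROTOCOL_VERSION = 2; // placeholder value for this sketch only

function buildRecoveredManifest(
  socketPath: string,
  daemonPid: number,
  previous: PtyDaemonManifest | undefined,
  now: number = Date.now(),
): PtyDaemonManifest {
  return {
    pid: daemonPid,
    socketPath,
    protocolVersions: [CURRENT_PROTOCOL_VERSION],
    // Keep the old timestamp only if it belonged to this same process;
    // otherwise record the moment of recovery.
    startedAt:
      previous !== undefined && previous.pid === daemonPid
        ? previous.startedAt
        : now,
  };
}
```

In the `manifest_pid_dead` path the adopted PID necessarily differs from the dead one, so this variant would fall back to the current time there, which matches the intent of the review comment.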

Comment on lines 1081 to 1089
function expectedSocketPathForOrg(organizationId: string): string {
	const shortId = createHash("sha256")
		.update(organizationId)
		.digest("hex")
		.slice(0, 12);
	return path.join(os.tmpdir(), `superset-ptyd-${shortId}.sock`);
}

async function waitForProcessExit(

P2 Test helper duplicates unexported production logic

expectedSocketPathForOrg reimplements ptyDaemonSocketPath character-for-character. Because the production function is not exported, the test must inline it, but this means a future change to the hash prefix, hash length, or filename template in DaemonSupervisor.ts will silently leave this helper pointing at the old path — the fake daemon would be placed on the wrong socket, tryAdoptFromSocket would find nothing, and the test would still pass. Consider exporting ptyDaemonSocketPath (or a thin test-only re-export) so this helper can delegate to the real implementation.
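
A rough sketch of the delegation this comment recommends. The function body is copied from the test helper quoted in this review, which the comment describes as identical to the production formula; treating it as the production implementation is therefore an assumption, as is keeping the test-side wrapper at all rather than importing `ptyDaemonSocketPath` directly.

```typescript
import { createHash } from "node:crypto";
import * as os from "node:os";
import * as path from "node:path";

// Production side: a deterministic per-organization socket path, exported so
// that tests can reuse it instead of re-deriving the formula.
export function ptyDaemonSocketPath(organizationId: string): string {
  const shortId = createHash("sha256")
    .update(organizationId)
    .digest("hex")
    .slice(0, 12);
  return path.join(os.tmpdir(), `superset-ptyd-${shortId}.sock`);
}

// Test side: the helper forwards to the real function, so a change to the
// hash length or filename template cannot drift out of sync.
const expectedSocketPathForOrg = (organizationId: string): string =>
  ptyDaemonSocketPath(organizationId);
```

Because the path depends only on the organization id and the temp directory, a restarted host can recompute it without any stored state, which is what makes socket-first recovery possible.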


Contributor

github-actions Bot commented May 20, 2026

🧹 Preview Cleanup Complete

The following preview resources have been cleaned up:

  • ✅ Neon database branch

Thank you for your contribution! 🎉

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files

Comment thread packages/host-service/src/daemon/DaemonSupervisor.ts Outdated
@Kitenite Kitenite merged commit 6dc5b88 into main May 20, 2026
17 checks passed
@Kitenite Kitenite deleted the adopt-terminals-pty-resta branch May 20, 2026 21:02
MocA-Love pushed a commit to MocA-Love/superset that referenced this pull request May 25, 2026
* fix(host-service): recover pty daemon adoption from live socket

* fix(host-service): avoid stale adoption metadata