Recover pty daemon adoption from live socket#4781
Conversation
|
Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR adds daemonPid to the pty-daemon hello-ack handshake, refactors supervisor probe utilities to return structured hello results (version + optional pid) with retries, changes tryAdopt to recover missing/stale manifests via socket probing (tryAdoptFromSocket), exports ptyDaemonSocketPath, and adds tests that control socket location and returned PID to validate manifest recovery and telemetry. ChangesDaemon Adoption Recovery with PID-based Manifest Restoration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Ready to review this PR? Stage has broken it down into 4 individual chapters for you:
Chapters generated by Stage for commit ea91066 on May 20, 2026 8:55pm UTC. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/host-service/src/daemon/DaemonSupervisor.ts (1)
853-915:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMismatched manifest socket can still be adopted instead of retrying the deterministic socket.
Line 853 computes
expectedSocketPath, but after a successful probe onmanifest.socketPath(Lines 880-915), adoption proceeds even when paths differ. This can keep stale/legacy socket paths alive indefinitely and bypass the deterministic recovery path.Suggested fix
const probe = await probeDaemonHelloWithRetry( manifest.socketPath, ADOPTION_PROBE_TOTAL_TIMEOUT_MS, ); if (!probe) { logEvent("pty_daemon_adopt_rejected", { organizationId, pid: manifest.pid, socketPath: manifest.socketPath, reason: "version_probe_failed", }); await terminateProcessTreeAndGroups(manifest.pid, "SIGTERM"); removePtyDaemonManifest(organizationId); if (manifest.socketPath !== expectedSocketPath) { return this.tryAdoptFromSocket(organizationId, expectedSocketPath, { reason: "manifest_version_probe_failed", previousManifest: manifest, }); } return null; } + if (manifest.socketPath !== expectedSocketPath) { + return this.tryAdoptFromSocket(organizationId, expectedSocketPath, { + reason: "manifest_socket_mismatch", + previousManifest: manifest, + }); + } const runningVersion = probe.daemonVersion;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/host-service/src/daemon/DaemonSupervisor.ts` around lines 853 - 915, The code currently adopts a daemon after probeDaemonHelloWithRetry even when manifest.socketPath differs from the deterministic ptyDaemonSocketPath(organizationId); change the final adoption logic in the function to detect when manifest.socketPath !== expectedSocketPath after a successful probe and, instead of returning the manifest info, call tryAdoptFromSocket(organizationId, expectedSocketPath, { reason: "manifest_socket_mismatch", previousManifest: manifest }) (or otherwise defer to the deterministic socket), so only daemons listening on the expected ptyDaemonSocketPath are adopted and stale/legacy sockets are retried via tryAdoptFromSocket; keep the existing flow for the case where the paths match.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/host-service/src/daemon/DaemonSupervisor.ts`:
- Around line 853-915: The code currently adopts a daemon after
probeDaemonHelloWithRetry even when manifest.socketPath differs from the
deterministic ptyDaemonSocketPath(organizationId); change the final adoption
logic in the function to detect when manifest.socketPath !== expectedSocketPath
after a successful probe and, instead of returning the manifest info, call
tryAdoptFromSocket(organizationId, expectedSocketPath, { reason:
"manifest_socket_mismatch", previousManifest: manifest }) (or otherwise defer to
the deterministic socket), so only daemons listening on the expected
ptyDaemonSocketPath are adopted and stale/legacy sockets are retried via
tryAdoptFromSocket; keep the existing flow for the case where the paths match.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f75eded1-048f-419a-aace-30ba33b7fcb2
📒 Files selected for processing (4)
packages/host-service/src/daemon/DaemonSupervisor.test.tspackages/host-service/src/daemon/DaemonSupervisor.tspackages/pty-daemon/src/Server/Server.tspackages/pty-daemon/src/protocol/messages.ts
Greptile SummaryThis PR adds socket-first recovery so the host-service supervisor can re-adopt a running v2 daemon when its manifest is missing or stale. The daemon now includes its PID in the
Confidence Score: 4/5The change is well-scoped and the new socket-adoption path is gated behind multiple live checks; no existing adoption flows are broken. The new tryAdoptFromSocket method is carefully guarded (socket connectable, PID in hello-ack, process alive) and the protocol addition is backward-compatible. The manifest_pid_dead path passes the dead daemon's startedAt into the new manifest for a different process, but startedAt is advisory and does not drive correctness-critical logic. The test helper duplicates the unexported socket-path formula, creating silent drift risk if that formula ever changes. packages/host-service/src/daemon/DaemonSupervisor.ts — specifically the startedAt forwarding in the manifest_pid_dead recovery branch
|
| Filename | Overview |
|---|---|
| packages/host-service/src/daemon/DaemonSupervisor.ts | Core supervisor logic: adds tryAdoptFromSocket fallback with four new entry points; renames probeDaemonVersionWithRetry to probeDaemonHelloWithRetry and keeps a backward-compat wrapper; startedAt from a dead daemon's manifest is forwarded to the newly-adopted daemon's manifest entry |
| packages/pty-daemon/src/protocol/messages.ts | Adds optional daemonPid field to HelloAckMessage; backward-compatible since older clients ignore unknown fields |
| packages/pty-daemon/src/Server/Server.ts | One-line change: includes process.pid in the hello-ack response; straightforward and correct |
| packages/host-service/src/daemon/DaemonSupervisor.test.ts | Adds a test for manifest-missing recovery; the expectedSocketPathForOrg helper duplicates the unexported production ptyDaemonSocketPath logic, creating silent drift risk if the path formula changes |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[tryAdopt] --> B{Read manifest}
B -- missing --> C[tryAdoptFromSocket\nreason: manifest_missing]
B -- exists --> D{isProcessAlive\nmanifest.pid}
D -- dead --> E[removePtyDaemonManifest]
E --> F[tryAdoptFromSocket\nreason: manifest_pid_dead]
D -- alive --> G{isSocketConnectable\nmanifest.socketPath}
G -- unreachable --> H[terminateProcess\nremoveManifest]
H --> I{socketPath != expectedSocketPath?}
I -- yes --> J[tryAdoptFromSocket\nreason: manifest_socket_unreachable]
I -- no --> K[return null: respawn]
G -- reachable --> L[probeDaemonHelloWithRetry]
L -- failed --> M[terminateProcess\nremoveManifest]
M --> N{socketPath != expectedSocketPath?}
N -- yes --> O[tryAdoptFromSocket\nreason: manifest_version_probe_failed]
N -- no --> K
L -- success --> P[return DaemonInstance from manifest]
subgraph tryAdoptFromSocket
Q{isSocketConnectable} -- no --> R[return null]
Q -- yes --> S[probeDaemonHelloWithRetry]
S -- failed --> T[return null]
S -- success --> U{valid daemonPid && isProcessAlive?}
U -- no --> V[return null]
U -- yes --> W[writePtyDaemonManifest]
W --> X[return DaemonInstance]
end
C --> Q
F --> Q
J --> Q
O --> Q
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/host-service/src/daemon/DaemonSupervisor.ts:859-865
In the `manifest_pid_dead` path, `previousManifest.startedAt` belongs to the dead daemon, not to the newly-adopted daemon discovered at the expected socket. Writing that timestamp into the new manifest silently records the wrong start time for a different process. Using `Date.now()` (or omitting the `previousManifest` argument entirely) avoids this confusion.
```suggestion
if (!isProcessAlive(manifest.pid)) {
removePtyDaemonManifest(organizationId);
return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
reason: "manifest_pid_dead",
});
}
```
### Issue 2 of 2
packages/host-service/src/daemon/DaemonSupervisor.test.ts:1081-1089
**Test helper duplicates unexported production logic**
`expectedSocketPathForOrg` reimplements `ptyDaemonSocketPath` character-for-character. Because the production function is not exported, the test must inline it, but this means a future change to the hash prefix, hash length, or filename template in `DaemonSupervisor.ts` will silently leave this helper pointing at the old path — the fake daemon would be placed on the wrong socket, `tryAdoptFromSocket` would find nothing, and the test would still pass. Consider exporting `ptyDaemonSocketPath` (or a thin test-only re-export) so this helper can delegate to the real implementation.
Reviews (1): Last reviewed commit: "fix(host-service): recover pty daemon ad..." | Re-trigger Greptile
| if (!isProcessAlive(manifest.pid)) { | ||
| removePtyDaemonManifest(organizationId); | ||
| return null; | ||
| return this.tryAdoptFromSocket(organizationId, expectedSocketPath, { | ||
| reason: "manifest_pid_dead", | ||
| previousManifest: manifest, | ||
| }); | ||
| } |
There was a problem hiding this comment.
In the
manifest_pid_dead path, previousManifest.startedAt belongs to the dead daemon, not to the newly-adopted daemon discovered at the expected socket. Writing that timestamp into the new manifest silently records the wrong start time for a different process. Using Date.now() (or omitting the previousManifest argument entirely) avoids this confusion.
| if (!isProcessAlive(manifest.pid)) { | |
| removePtyDaemonManifest(organizationId); | |
| return null; | |
| return this.tryAdoptFromSocket(organizationId, expectedSocketPath, { | |
| reason: "manifest_pid_dead", | |
| previousManifest: manifest, | |
| }); | |
| } | |
| if (!isProcessAlive(manifest.pid)) { | |
| removePtyDaemonManifest(organizationId); | |
| return this.tryAdoptFromSocket(organizationId, expectedSocketPath, { | |
| reason: "manifest_pid_dead", | |
| }); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/daemon/DaemonSupervisor.ts
Line: 859-865
Comment:
In the `manifest_pid_dead` path, `previousManifest.startedAt` belongs to the dead daemon, not to the newly-adopted daemon discovered at the expected socket. Writing that timestamp into the new manifest silently records the wrong start time for a different process. Using `Date.now()` (or omitting the `previousManifest` argument entirely) avoids this confusion.
```suggestion
if (!isProcessAlive(manifest.pid)) {
removePtyDaemonManifest(organizationId);
return this.tryAdoptFromSocket(organizationId, expectedSocketPath, {
reason: "manifest_pid_dead",
});
}
```
How can I resolve this? If you propose a fix, please make it concise.| function expectedSocketPathForOrg(organizationId: string): string { | ||
| const shortId = createHash("sha256") | ||
| .update(organizationId) | ||
| .digest("hex") | ||
| .slice(0, 12); | ||
| return path.join(os.tmpdir(), `superset-ptyd-${shortId}.sock`); | ||
| } | ||
|
|
||
| async function waitForProcessExit( |
There was a problem hiding this comment.
Test helper duplicates unexported production logic
expectedSocketPathForOrg reimplements ptyDaemonSocketPath character-for-character. Because the production function is not exported, the test must inline it, but this means a future change to the hash prefix, hash length, or filename template in DaemonSupervisor.ts will silently leave this helper pointing at the old path — the fake daemon would be placed on the wrong socket, tryAdoptFromSocket would find nothing, and the test would still pass. Consider exporting ptyDaemonSocketPath (or a thin test-only re-export) so this helper can delegate to the real implementation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/daemon/DaemonSupervisor.test.ts
Line: 1081-1089
Comment:
**Test helper duplicates unexported production logic**
`expectedSocketPathForOrg` reimplements `ptyDaemonSocketPath` character-for-character. Because the production function is not exported, the test must inline it, but this means a future change to the hash prefix, hash length, or filename template in `DaemonSupervisor.ts` will silently leave this helper pointing at the old path — the fake daemon would be placed on the wrong socket, `tryAdoptFromSocket` would find nothing, and the test would still pass. Consider exporting `ptyDaemonSocketPath` (or a thin test-only re-export) so this helper can delegate to the real implementation.
How can I resolve this? If you propose a fix, please make it concise.
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
There was a problem hiding this comment.
1 issue found across 4 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
* fix(host-service): recover pty daemon adoption from live socket * fix(host-service): avoid stale adoption metadata
Summary
hello-ackso the host service can rebuild adoption state without host-side PID discovery.Testing
bun test packages/host-service/src/daemon/DaemonSupervisor.test.ts packages/pty-daemon/src/protocol/wire-shape.test.tsbun run --cwd packages/host-service typecheckbun run --cwd packages/pty-daemon typecheckbun run --cwd packages/host-service test:integration:daemonbun run lintSummary by cubic
Recover v2 pty-daemon adoption from the deterministic org socket when the manifest is missing or stale, rebuilding state using the daemon-reported PID. Also avoids stale adoption metadata so host restarts are reliable and closer to v1’s socket-first recovery.
DaemonSupervisor.tryAdoptto also handle version-probe failures; adopts from the expected org socket and rewrites the manifest with the resolved PID and preservedstartedAt.daemonVersionanddaemonPid(with retry).hello-ackand server reply inpackages/pty-daemon; manifest writes now useCURRENT_PROTOCOL_VERSION.Written for commit ea91066. Summary will update on new commits. Review in cubic
Summary by CodeRabbit
Bug Fixes
Improvements
Tests