Skip to content

[codex] Fix canary host-service adoption#4499

Merged
Kitenite merged 2 commits into
mainfrom
terminal-canary-investiga
May 13, 2026
Merged

[codex] Fix canary host-service adoption#4499
Kitenite merged 2 commits into
mainfrom
terminal-canary-investiga

Conversation

@Kitenite
Copy link
Copy Markdown
Collaborator

@Kitenite Kitenite commented May 13, 2026

Summary

  • Stop treating desktop app-version/canary timestamp drift as an automatic reason to SIGTERM an otherwise healthy host-service.
  • Spawn packaged host-service children with NODE_ENV=production, and only run host-service/daemon dev teardown behavior when NODE_ENV === "development".
  • Update adoption tests and comments around manifest provenance and host-service version reporting.

Root Cause

Canary app version bumps changed app.getVersion() even when the host-service/daemon payload did not require a restart. The desktop coordinator killed the old host-service before health-checking it. In packaged builds the spawned host-service inherited an unset NODE_ENV, which made the old host-service take the dev SIGTERM path and stop the pty daemon, killing live terminal sessions before the new host-service could adopt.

Impact

Canary updates should now reuse a healthy existing host-service across app-version-only bumps, preserving the daemon and PTYs. Unhealthy adopted services still get killed and respawned after the health check.

Validation

  • bun test apps/desktop/src/main/lib/host-service-coordinator.test.ts
  • bun test packages/host-service/src/daemon/DaemonSupervisor.test.ts
  • bun run --cwd packages/pty-daemon build:daemon
  • node --experimental-strip-types --test packages/host-service/src/daemon/DaemonSupervisor.node-test.ts
  • bun run lint:fix
  • bun run lint
  • git diff --check

Summary by cubic

Prevents terminal disruption during canary updates by reusing a healthy host-service on app-version-only bumps and pre-upgrade manifests. Packaged builds now run the host-service in production mode to avoid dev teardown.

  • Bug Fixes
    • Adoption: treat app-version mismatch or empty spawnedByAppVersion as informational; run one health check, reuse if healthy, else SIGKILL and respawn.
    • Env: packaged Electron spawns the host-service with NODE_ENV=production; dev teardown and stdio only when NODE_ENV === "development".
    • Daemon/tests: kill stale daemons only in explicit dev; support adoption testing with SUPERSET_PTY_DAEMON_ADOPT_IN_DEV=1; added tests for pre-upgrade adoption and updated manifest/version comments.

Written for commit 69303d6. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes

    • Adoption now checks service health before reusing instances when app versions differ; healthy services are retained instead of forcibly restarted.
    • Dev-mode detection narrowed to explicit development to avoid unintended daemon restarts; spawned processes now reflect the stricter NODE_ENV semantics.
  • Documentation

    • Clarified adoption and version-compatibility guidance to emphasize health-check decision flow.
  • Tests

    • Updated and expanded tests to cover adoption and dev-mode behavior changes.

Review Change Stack

@capy-ai
Copy link
Copy Markdown

capy-ai Bot commented May 13, 2026

Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 23696be6-0a4b-450a-8f5d-bbec40a67563

📥 Commits

Reviewing files that changed from the base of the PR and between 08a719f and 69303d6.

📒 Files selected for processing (1)
  • apps/desktop/src/main/lib/host-service-coordinator.test.ts

📝 Walkthrough

Walkthrough

The PR changes host-service adoption and development-mode detection: app-version mismatches no longer force termination — the coordinator always health-checks before deciding to kill and respawn; NODE_ENV is explicitly propagated to spawned host-service processes; dev-mode detection is narrowed to NODE_ENV === "development".

Changes

Service adoption and development mode detection

Layer / File(s) Summary
App-version adoption deferred to health check
apps/desktop/src/main/lib/host-service-coordinator.ts, apps/desktop/src/main/lib/host-service-coordinator.test.ts, apps/desktop/src/main/lib/host-service-manifest.ts
Instead of immediately terminating manifests with mismatched spawnedByAppVersion, the coordinator now logs that health will be checked before reuse and proceeds to pollHealthCheck. If health fails, SIGKILL is issued and the manifest is removed. Tests verify adoption succeeds when health is good despite version mismatch, and that SIGKILL only occurs after a failed health check. Documentation clarifies that app version is diagnostic context, not a kill condition.
NODE_ENV explicit propagation to spawned processes
apps/desktop/src/main/lib/host-service-coordinator.ts
The spawned host-service process environment now explicitly sets NODE_ENV to "production" for packaged apps or process.env.NODE_ENV ?? "development" otherwise.
Development mode detection refining from non-production to exact match
packages/host-service/src/daemon/DaemonSupervisor.ts, packages/host-service/src/serve.ts, packages/host-service/src/daemon/DaemonSupervisor.test.ts
shouldKillStaleDaemonForDev now returns false when SUPERSET_PTY_DAEMON_ADOPT_IN_DEV === "1", otherwise checks if NODE_ENV === "development" exactly. Daemon isDev and serve isDev conditions change from NODE_ENV !== "production" to NODE_ENV === "development", affecting stdio attachment and shutdown-handler registration. Tests updated to verify false for unset/empty environment and true only for explicit "development".
Version reporting documentation update
packages/host-service/src/trpc/router/host/host.ts
Comment clarifies HOST_SERVICE_VERSION describes the currently served bundled build for caller reporting, not an adoption check requirement.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • superset-sh/superset#4448: Restructures and restores test mocks used by host-service-coordinator tests, directly supporting the adoption flow test revisions in this PR.
  • superset-sh/superset#4460: Related adjustments to dev-mode gating and daemon adoption behavior overlap with changes to shouldKillStaleDaemonForDev and NODE_ENV handling.

Poem

🐰 I hopped in to check the host, not to smite,

First probe the pulse, then choose to fight.
NODE_ENV set, clear as day or night,
Dev only when "development" is right,
A careful rabbit keeps services light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title '[codex] Fix canary host-service adoption' clearly and concisely summarizes the main change: fixing a bug where app-version drifts were incorrectly triggering host-service termination instead of health checking.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering root cause, impact, validation steps, and detailed explanation of the fix. However, it does not follow the provided template structure (Missing 'Type of Change', 'Testing', 'Related Issues', and 'Screenshots' sections).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch terminal-canary-investiga

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 7 files

You’re at about 91% of the monthly review limit. You may want to disable incremental reviews to conserve quota. Reviews will continue until that limit is exceeded. If you need help avoiding interruptions, please contact contact@cubic.dev.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

🧹 Preview Cleanup Complete

The following preview resources have been cleaned up:

  • ✅ Neon database branch

Thank you for your contribution! 🎉

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR fixes canary host-service adoption by removing the pre-health-check SIGTERM that fired on any app-version mismatch, and by explicitly setting NODE_ENV=production for host-service children spawned from packaged Electron builds so neither the host-service nor its pty-daemon take the dev teardown path during canary updates.

  • tryAdopt in host-service-coordinator.ts now logs a version mismatch and falls through to the existing pollHealthCheck gate, adopting the process if healthy and issuing SIGKILL only if it fails — replacing the unconditional pre-health-check SIGTERM.
  • shouldKillStaleDaemonForDev and the two isDev guards in DaemonSupervisor.ts / serve.ts are flipped from !== "production" to === "development", so an unset NODE_ENV (the root cause in pre-fix packaged builds) is now treated as production-safe instead of triggering daemon teardown.

Confidence Score: 4/5

The coordinator and supervisor changes are internally consistent and well-tested; the one gap is a missing test for the pre-upgrade empty-spawnedByAppVersion path whose behavior changed.

The core logic is correct — health-check gating replaces the blunt app-version kill, and explicit NODE_ENV propagation closes the packaged-build teardown gap. The pre-upgrade manifest path (spawnedByAppVersion coerced to "") silently switched from kill-immediately to health-check-first without a dedicated test, so a regression there would not be caught by the suite.

apps/desktop/src/main/lib/host-service-coordinator.test.ts — the pre-upgrade manifest (empty spawnedByAppVersion) adoption path is untested after the behavioral change.

Important Files Changed

Filename Overview
apps/desktop/src/main/lib/host-service-coordinator.ts App-version mismatch now falls through to the existing health check instead of issuing an early SIGTERM; packaged builds explicitly set NODE_ENV=production in the child environment.
apps/desktop/src/main/lib/host-service-coordinator.test.ts Old version-mismatch test replaced with two new tests (healthy adopt + SIGKILL after failed health check); the pre-upgrade empty-spawnedByAppVersion path is not covered.
packages/host-service/src/daemon/DaemonSupervisor.ts shouldKillStaleDaemonForDev and isDev checks flipped from 'not production' to 'explicitly development', so unset NODE_ENV is now treated as production-safe.
packages/host-service/src/serve.ts isDev guard for the SIGTERM/SIGINT daemon-teardown handler changed from !== 'production' to === 'development', preventing packaged builds from killing PTYs on host-service exit.
apps/desktop/src/main/lib/host-service-manifest.ts Comment-only changes clarifying that spawnedByAppVersion is now diagnostic metadata rather than a kill trigger.
packages/host-service/src/daemon/DaemonSupervisor.test.ts Test updated to reflect that shouldKillStaleDaemonForDev({}) now returns false instead of true.
packages/host-service/src/trpc/router/host/host.ts Comment-only change removing reference to the now-removed strict-equality adoption check.

Sequence Diagram

sequenceDiagram
    participant C as HostServiceCoordinator
    participant M as ManifestStore
    participant HS as Old Host-Service
    participant HC as Health Check

    Note over C: Canary app-version bump
    C->>M: readManifest(orgId)
    M-->>C: "manifest (spawnedByAppVersion != currentAppVersion)"
    Note over C: OLD: SIGTERM → removeManifest → return null
    Note over C: NEW: log version mismatch, fall through

    C->>HC: pollHealthCheck(endpoint, 2000ms)
    alt Healthy
        HC-->>C: true
        C->>C: adopt pid, port, secret
        Note over HS: PTYs survive ✓
    else Unhealthy
        HC-->>C: false
        C->>HS: SIGKILL(pid)
        C->>M: removeManifest(orgId)
        C->>C: "spawn fresh host-service (NODE_ENV=production)"
    end
Loading

Comments Outside Diff (1)

  1. apps/desktop/src/main/lib/host-service-coordinator.test.ts, line 191-224 (link)

    P2 Missing test for pre-upgrade manifest path

    Two new tests cover spawnedByAppVersion = "0.9.0" (explicit old version), but neither covers spawnedByAppVersion = "" (the value coerced for pre-upgrade manifests in readManifest). The behavior for that case changed: the old code hit the !== currentAppVersion branch and killed immediately; the new code health-checks the process instead. A test that uses spawnedByAppVersion: "" would confirm the adoption path (health-check proceeds, healthy → adopt, unhealthy → SIGKILL + respawn) and prevent the empty-string coercion path from silently regressing.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: apps/desktop/src/main/lib/host-service-coordinator.test.ts
    Line: 191-224
    
    Comment:
    **Missing test for pre-upgrade manifest path**
    
    Two new tests cover `spawnedByAppVersion = "0.9.0"` (explicit old version), but neither covers `spawnedByAppVersion = ""` (the value coerced for pre-upgrade manifests in `readManifest`). The behavior for that case changed: the old code hit the `!== currentAppVersion` branch and killed immediately; the new code health-checks the process instead. A test that uses `spawnedByAppVersion: ""` would confirm the adoption path (health-check proceeds, healthy → adopt, unhealthy → SIGKILL + respawn) and prevent the empty-string coercion path from silently regressing.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
apps/desktop/src/main/lib/host-service-coordinator.test.ts:191-224
**Missing test for pre-upgrade manifest path**

Two new tests cover `spawnedByAppVersion = "0.9.0"` (explicit old version), but neither covers `spawnedByAppVersion = ""` (the value coerced for pre-upgrade manifests in `readManifest`). The behavior for that case changed: the old code hit the `!== currentAppVersion` branch and killed immediately; the new code health-checks the process instead. A test that uses `spawnedByAppVersion: ""` would confirm the adoption path (health-check proceeds, healthy → adopt, unhealthy → SIGKILL + respawn) and prevent the empty-string coercion path from silently regressing.

Reviews (1): Last reviewed commit: "fix canary host-service adoption" | Re-trigger Greptile

@Kitenite Kitenite merged commit 9f10750 into main May 13, 2026
17 checks passed
@Kitenite Kitenite deleted the terminal-canary-investiga branch May 13, 2026 16:10
MocA-Love pushed a commit to MocA-Love/superset that referenced this pull request May 25, 2026
* fix canary host-service adoption

* test pre-upgrade host-service adoption
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant