Skip to content

fix(mcp): prevent orphan processes by handling stdin close/end and startup race condition#2049

Merged
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
jwcrystal:fix/mcp-stdin-eof-shutdown
Jun 5, 2026
Merged

fix(mcp): prevent orphan processes by handling stdin close/end and startup race condition#2049
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
jwcrystal:fix/mcp-stdin-eof-shutdown

Conversation

@jwcrystal

Copy link
Copy Markdown
Contributor

Problem

When the parent process (e.g. OpenCode, Claude Code) exits or closes the MCP child process stdin, the GitNexus MCP server can remain running as an orphan process. Over time these accumulate. In some cases orphan processes spin at 100% CPU because stdin EOF is not detected and the event loop stays alive.

Root cause

Three gaps in stdin EOF handling:

  1. Startup race condition: if the parent dies before process.stdin.on('end', ...) is registered (which happens during startMCPServer), the end event was already emitted and the listener never fires.
  2. No close event listener: process.stdin only listened for end and error, not close. On some platforms/Node versions, when a pipe is forcibly closed (parent SIGKILL), close fires without a preceding end.
  3. Transport layer doesn't propagate stdin termination: CompatibleStdioServerTransport only listened for data and error. It neither detected stdin EOF nor propagated it via its onclose callback.

Changes

compatible-stdio-transport.ts

  • Startup race guard: start() checks readableEnded/destroyed before registering listeners. If stdin is already closed, it immediately calls close().
  • stdin end/close listeners: start() registers this._boundClose on stdin end and close events, so the transport self-closes when the parent disconnects.
  • Idempotent close: close() is guarded by a _closed flag so double-invocation doesn't double-fire onclose.
  • Permanent close guard: start() after close() throws, preventing lifecycle bugs.
  • Stable bound reference: _boundClose = this.close.bind(this) avoids creating new function references on each on/off call.

server.ts

  • Added process.stdin.on('close', ...) handler alongside existing end/error handlers.
  • Added comment explaining defense-in-depth: both transport-level and process-level shutdown paths are idempotent.

Tests

Added 5 regression tests for the transport layer lifecycle covering: stdin end triggers close, stdin close triggers close, close() idempotency, startup race (stdin already ended), and start() after close() throws.

…artup race condition

Three gaps in stdin EOF handling:

1. Startup race: parent can die before `process.stdin.on("end", ...)` is
   registered, so the event is missed entirely.
2. Missing "close" event: when pipe is forcibly closed (parent SIGKILL),
   "close" fires without "end" on some platforms.
3. Transport layer did not propagate stdin termination to its onclose
   callback.

Fixes:
- Check readableEnded/destroyed in start() before registering listeners.
- Register stdin end+close listeners in CompatibleStdioServerTransport.
- Add _closed guard for idempotent close().
- Throw if start() is called after close().
- Add process.stdin.on("close") in server.ts alongside existing handlers.
- Add 5 regression tests.
@jwcrystal jwcrystal requested a review from magyargergo as a code owner June 5, 2026 06:19
@vercel

vercel Bot commented Jun 5, 2026

Copy link
Copy Markdown

@jwcrystal is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 16 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2049,"changed_lines":16,"head_sha":"ca51fa177aad51d39178b829e79b029316d12030","run_id":"26999866625","apply_command":"/autofix"}

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
10516 10500 0 16 513s

✅ All 10500 tests passed

16 test(s) skipped — expand for details
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 80.36% 36901/45918 N/A% 🟢 ████████████████░░░░
Branches 68.97% 23529/34110 N/A% 🟢 █████████████░░░░░░░
Functions 85.76% 3891/4537 N/A% 🟢 █████████████████░░░
Lines 84.05% 33211/39512 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review digest (PR #2049)

Methods: 2-engine review — Codex (live) + 5 Claude lanes (risk-architect, test-ci-verifier, correctness, adversarial, reliability). Strong corroboration on the startup-race / orphan gap (Codex + 3 Claude lanes). Adversarial lane partially refutes the narrow post-connect window (argues close may still deliver) but agrees on the backend.init() window.

MCP lifecycle context (why this PR matters)

GitNexus MCP is a stdio child process (gitnexus mcp). The parent (Cursor / Claude Code / OpenCode) owns stdin; when it exits, the pipe closes. Orphans happen when the child stays alive after stdin EOF because:

  • shutdown()backend.disconnect()process.exit() never runs
  • Handles (LadybugDB, pino worker, timers) keep the event loop alive

The MCP SDK Server.connect() wraps transport.onclose → protocol cleanup only — not process.exit. Actual termination depends on startMCPServer wiring process.stdin end/close to shutdown().

This PR adds transport-level end/close listeners + a startup readableEnded/destroyed guard, and process.stdin.on('close') in server.ts. That fixes the common steady-state case and close-without-end (e.g. broken pipe).

Headline finding

P1 — startup race still allows orphans. shutdown() and process.stdin listeners are registered after await server.connect(transport) (server.ts:340 vs :397-398). If stdin EOF happens earlier (especially during backend.init() in mcp.ts:67-84, where there are no listeners), the transport race guard may call close() but process.exit() is never reached — Node does not replay end/close to late listeners.

Inline findings

  1. P1 server.ts — listener registration ordering vs server.connect()
  2. P1 compatible-stdio-transport.ts — race guard closes transport, not process
  3. P2 compatible-stdio-transport.test.ts — race-guard test doesn't hit the guard branch

What's solid

  • Transport end/close propagation with idempotent _closed guard
  • process.stdin.on('close') defense-in-depth for close-without-end
  • 5 focused unit tests; npx vitest run test/unit/compatible-stdio-transport.test.ts16/16 pass (worktree)

Lower priority / CI

  • P1 CI: quality / format failed — Prettier wants the race-guard if wrapped (npx prettier --check src/mcp/compatible-stdio-transport.ts). Autofix available (/autofix).
  • P2: No integration test proving the child exits on stdin EOF without server-startup.test.ts's SIGKILL fallback.
  • Infra: tests / tree-sitter ABI (ubuntu-latest) failed on onnxruntime-node ETIMEDOUT (network); macOS/windows ABI passed — likely rerun, not PR regression.
  • Vercel deploy auth failure — unrelated.

Suggested fix (for P1)

Hoist shutdown before server.connect(), register process.stdin listeners before connect, wire server.onclose → shutdown, and after connect synchronously check process.stdin.readableEnded || process.stdin.destroyed.


Automated multi-tool digest — verify before acting.

Comment thread gitnexus/src/mcp/server.ts
Comment thread gitnexus/src/mcp/compatible-stdio-transport.ts Outdated

it('start() immediately closes if stdin is already ended/destroyed before listeners register', async () => {
const endedStdin = new PassThrough();
endedStdin.push(null);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 [code-read] push(null) on PassThrough leaves readableEnded=false until the stream is consumed — this test exercises the normal end listener path, not the readableEnded || destroyed race-guard branch at transport line 91. Use destroy() or a pre-ended real pipe to cover the guard.

Verified: node -e "const {PassThrough}=require('stream'); const s=new PassThrough(); s.push(null); console.log(s.readableEnded)"false.

@jwcrystal jwcrystal left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 fixed.

Moved process.stdin.on('end'/'close'/'error', shutdown) and a readableEnded/destroyed pre-connect check before await server.connect(transport). Added test asserting listener counts are > baseline during connect. Also fixed formatting CI failure.

@magyargergo magyargergo merged commit 22304cd into abhigyanpatwari:main Jun 5, 2026
44 of 47 checks passed
@magyargergo

Copy link
Copy Markdown
Collaborator

Thank you for fixing this! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants