fix(mcp): prevent orphan processes by handling stdin close/end and startup race condition #2049
…artup race condition
Three gaps in stdin EOF handling:
1. Startup race: parent can die before `process.stdin.on("end", ...)` is
registered, so the event is missed entirely.
2. Missing "close" event: when pipe is forcibly closed (parent SIGKILL),
"close" fires without "end" on some platforms.
3. Transport layer did not propagate stdin termination to its onclose
callback.
Fixes:
- Check readableEnded/destroyed in start() before registering listeners.
- Register stdin end+close listeners in CompatibleStdioServerTransport.
- Add _closed guard for idempotent close().
- Throw if start() is called after close().
- Add process.stdin.on("close") in server.ts alongside existing handlers.
- Add 5 regression tests.
@jwcrystal is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it.
✨ PR Autofix: Found fixable formatting / unused-import issues across 16 changed lines.
CI Report: ✅ All checks passed
Test Results: ✅ All 10500 tests passed, 16 test(s) skipped
magyargergo
left a comment
Tri-review digest (PR #2049)
Methods: 2-engine review — Codex (live) + 5 Claude lanes (risk-architect, test-ci-verifier, correctness, adversarial, reliability). Strong corroboration on the startup-race / orphan gap (Codex + 3 Claude lanes). Adversarial lane partially refutes the narrow post-connect window (argues close may still deliver) but agrees on the backend.init() window.
MCP lifecycle context (why this PR matters)
GitNexus MCP is a stdio child process (gitnexus mcp). The parent (Cursor / Claude Code / OpenCode) owns stdin; when it exits, the pipe closes. Orphans happen when the child stays alive after stdin EOF because:
- `shutdown()` → `backend.disconnect()` → `process.exit()` never runs
- Handles (LadybugDB, pino worker, timers) keep the event loop alive
The MCP SDK Server.connect() wraps transport.onclose → protocol cleanup only — not process.exit. Actual termination depends on startMCPServer wiring process.stdin end/close to shutdown().
This PR adds transport-level end/close listeners + a startup readableEnded/destroyed guard, and process.stdin.on('close') in server.ts. That fixes the common steady-state case and close-without-end (e.g. broken pipe).
Headline finding
P1 — startup race still allows orphans. shutdown() and process.stdin listeners are registered after await server.connect(transport) (server.ts:340 vs :397-398). If stdin EOF happens earlier (especially during backend.init() in mcp.ts:67-84, where there are no listeners), the transport race guard may call close() but process.exit() is never reached — Node does not replay end/close to late listeners.
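The "no replay" behavior behind this finding can be reproduced in isolation. The following is a minimal sketch that uses a plain `EventEmitter` as a stand-in for `process.stdin` (an assumption made for brevity; `fakeStdin` and `seen` are hypothetical names, not part of the PR):

```typescript
import { EventEmitter } from "node:events";

// Stand-in for process.stdin: only the event semantics matter here.
const fakeStdin = new EventEmitter();

const seen: string[] = [];

// An "early" listener, registered before EOF, observes the event.
fakeStdin.on("end", () => seen.push("early"));

// EOF arrives, for example while a slow init step is still awaiting.
fakeStdin.emit("end");

// A "late" listener registered afterwards is never called for that past event.
fakeStdin.on("end", () => seen.push("late"));

console.log(seen); // only the early listener ran
```

Because a past `end` or `close` is never delivered again, any code path that registers listeners after an `await` also needs a state check such as `readableEnded || destroyed`.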
Inline findings
- P1 `server.ts`: listener registration ordering vs `server.connect()`
- P1 `compatible-stdio-transport.ts`: race guard closes transport, not process
- P2 `compatible-stdio-transport.test.ts`: race-guard test doesn't hit the guard branch
What's solid
- Transport `end`/`close` propagation with idempotent `_closed` guard
- `process.stdin.on('close')` defense-in-depth for `close`-without-`end`
- 5 focused unit tests; `npx vitest run test/unit/compatible-stdio-transport.test.ts` → 16/16 pass (worktree)
Lower priority / CI
- P1 CI: `quality / format` failed. Prettier wants the race-guard `if` wrapped (`npx prettier --check src/mcp/compatible-stdio-transport.ts`). Autofix available (`/autofix`).
- P2: No integration test proving the child exits on stdin EOF without `server-startup.test.ts`'s SIGKILL fallback.
- Infra: `tests / tree-sitter ABI (ubuntu-latest)` failed on `onnxruntime-node` `ETIMEDOUT` (network); macOS/windows ABI passed. Likely needs a rerun, not a PR regression.
- Vercel deploy auth failure: unrelated.
Suggested fix (for P1)
Hoist shutdown before server.connect(), register process.stdin listeners before connect, wire server.onclose → shutdown, and after connect synchronously check process.stdin.readableEnded || process.stdin.destroyed.
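One possible shape for this ordering is sketched below. `wireStdinShutdown` and `StdinLike` are hypothetical names introduced for illustration, and the fake stdin objects are plain emitters rather than real pipes; the actual `server.ts` code may differ.

```typescript
import { EventEmitter } from "node:events";

// Minimal shape of the stream properties this sketch relies on.
interface StdinLike extends EventEmitter {
  readableEnded: boolean;
  destroyed: boolean;
}

// Hypothetical helper, meant to be called BEFORE `await server.connect(transport)`.
// Returns true if shutdown was triggered immediately by the state check.
function wireStdinShutdown(stdin: StdinLike, shutdown: () => void): boolean {
  let fired = false;
  const once = (): void => {
    if (fired) return; // end, close and error may all arrive; shut down once
    fired = true;
    shutdown();
  };
  stdin.on("end", once);
  stdin.on("close", once);
  stdin.on("error", once);
  // Events that already fired are not replayed, so inspect the state too.
  if (stdin.readableEnded || stdin.destroyed) once();
  return fired;
}

// Steady state: listeners are in place, EOF arrives later.
const calls: string[] = [];
const live = Object.assign(new EventEmitter(), { readableEnded: false, destroyed: false });
const firedNow = wireStdinShutdown(live, () => calls.push("shutdown"));
live.emit("end");
live.emit("close"); // ignored by the guard

// Startup race: stdin was already ended before wiring.
const dead = Object.assign(new EventEmitter(), { readableEnded: true, destroyed: false });
const firedAtStartup = wireStdinShutdown(dead, () => calls.push("shutdown-at-startup"));
```

In this sketch the `readableEnded || destroyed` check runs once at wiring time; the suggestion above also repeats it after connect, which would be a second call to the same condition.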
Automated multi-tool digest — verify before acting.
it('start() immediately closes if stdin is already ended/destroyed before listeners register', async () => {
  const endedStdin = new PassThrough();
  endedStdin.push(null);
P2 [code-read] push(null) on PassThrough leaves readableEnded=false until the stream is consumed — this test exercises the normal end listener path, not the readableEnded || destroyed race-guard branch at transport line 91. Use destroy() or a pre-ended real pipe to cover the guard.
Verified: node -e "const {PassThrough}=require('stream'); const s=new PassThrough(); s.push(null); console.log(s.readableEnded)" → false.
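The difference between the two setups can be shown side by side. This is a small sketch of the stream state only, not the project's actual test; the variable names are hypothetical.

```typescript
import { PassThrough } from "node:stream";

// push(null) queues EOF, but readableEnded stays false until the stream is consumed.
const pushed = new PassThrough();
pushed.push(null);
const pushedEnded = pushed.readableEnded;

// destroy() flips `destroyed` synchronously, so a guard of the form
// `readableEnded || destroyed` is already true when start() inspects the stream.
const destroyedStdin = new PassThrough();
destroyedStdin.destroy();
const guardHit = destroyedStdin.readableEnded || destroyedStdin.destroyed;

console.log({ pushedEnded, guardHit });
```

A test that wants to cover the guard branch could therefore pass a destroyed stream to the transport and assert that `onclose` fires without any event being emitted afterwards.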
jwcrystal
left a comment
P1 fixed.
Moved process.stdin.on('end'/'close'/'error', shutdown) and a readableEnded/destroyed pre-connect check before await server.connect(transport). Added test asserting listener counts are > baseline during connect. Also fixed formatting CI failure.
Thank you for fixing this! 🙏
Problem
When the parent process (e.g. OpenCode, Claude Code) exits or closes the MCP child process stdin, the GitNexus MCP server can remain running as an orphan process. Over time these accumulate. In some cases orphan processes spin at 100% CPU because stdin EOF is not detected and the event loop stays alive.
Root cause
Three gaps in stdin EOF handling:
1. Startup race: if the parent dies before `process.stdin.on('end', ...)` is registered (which happens during `startMCPServer`), the `end` event was already emitted and the listener never fires.
2. Missing `close` event listener: `process.stdin` only listened for `end` and `error`, not `close`. On some platforms/Node versions, when a pipe is forcibly closed (parent SIGKILL), `close` fires without a preceding `end`.
3. Transport layer did not propagate stdin termination: `CompatibleStdioServerTransport` only listened for `data` and `error`. It neither detected stdin EOF nor propagated it via its `onclose` callback.
Changes
compatible-stdio-transport.ts
- `start()` checks `readableEnded`/`destroyed` before registering listeners. If stdin is already closed, it immediately calls `close()`.
- `start()` registers `this._boundClose` on stdin `end` and `close` events, so the transport self-closes when the parent disconnects.
- `close()` is guarded by a `_closed` flag so double-invocation doesn't double-fire `onclose`.
- `start()` after `close()` throws, preventing lifecycle bugs.
- `_boundClose = this.close.bind(this)` avoids creating new function references on each `on`/`off` call.
server.ts
- Added a `process.stdin.on('close', ...)` handler alongside existing `end`/`error` handlers.
Tests
Added 5 regression tests for the transport layer lifecycle covering: stdin `end` triggers close, stdin `close` triggers close, `close()` idempotency, startup race (stdin already ended), and `start()` after `close()` throws.
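The transport lifecycle described above can be sketched roughly as follows. `SketchTransport` is a hypothetical, simplified stand-in for `CompatibleStdioServerTransport`: it omits `data`/`error` handling, keeps `start()` and `close()` synchronous for brevity, and uses an arrow function where the PR uses `bind`.

```typescript
import { EventEmitter } from "node:events";

// Minimal shape of the stream properties this sketch relies on.
interface StdinLike extends EventEmitter {
  readableEnded: boolean;
  destroyed: boolean;
}

class SketchTransport {
  onclose?: () => void;
  private stdin: StdinLike;
  private _closed = false;
  // Created once so the same reference is passed to on() and off().
  private _boundClose = (): void => this.close();

  constructor(stdin: StdinLike) {
    this.stdin = stdin;
  }

  start(): void {
    if (this._closed) throw new Error("start() called after close()");
    // Startup race: EOF may have happened before any listener existed.
    if (this.stdin.readableEnded || this.stdin.destroyed) {
      this.close();
      return;
    }
    this.stdin.on("end", this._boundClose);
    this.stdin.on("close", this._boundClose);
  }

  close(): void {
    if (this._closed) return; // idempotent: onclose fires at most once
    this._closed = true;
    this.stdin.off("end", this._boundClose);
    this.stdin.off("close", this._boundClose);
    this.onclose?.();
  }
}

// Usage: close-without-end still closes the transport exactly once.
const stdin = Object.assign(new EventEmitter(), { readableEnded: false, destroyed: false });
const transport = new SketchTransport(stdin);
let closeCount = 0;
transport.onclose = () => {
  closeCount += 1;
};
transport.start();
stdin.emit("close"); // close without a preceding end
stdin.emit("end"); // listeners were already removed
transport.close(); // no-op thanks to the _closed guard
```

Note that closing the transport only runs `onclose`; as the review points out, exiting the process still depends on the caller wiring that callback (or the stdin events) to its shutdown routine.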