fix: MCP server crashes under parallel tool calls (#326)#349
Conversation
|
@RyuzakiH is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
|
@claude ultra-think while you are reviewing this pr and you are tasked to find bugs, code quality issues, architectural fit and performance issues. |
|
Claude finished @magyargergo's task in 3m 24s —— View job PR Review: fix/mcp-parallel-stabilityTasks
The PR correctly identifies the root cause (lazy 🔴 Bug: FTS init races with concurrent checkouts
pool.set(repoId, { db, available, checkedOut: 0, waiters: [], lastUsed: Date.now(), dbPath });
ensureIdleTimer();
if (!shared.ftsLoaded) {
try {
await available[0].query('LOAD EXTENSION fts'); // ← await here
Fix: Either call 🔴 Bug:
|
| Severity | Issue | File |
|---|---|---|
| 🔴 Bug | pool.set before FTS load allows concurrent checkout of available[0] |
lbug-adapter.ts:227 |
| 🔴 Bug | No concurrency guard in initLbug — parallel inits for same repoId leak connections |
lbug-adapter.ts:162 |
| 🟠 Risk | Lazy createConnection path still live in checkout |
lbug-adapter.ts:255 |
| 🟠 Risk | 1-second watchdog can fire during legitimate slow createConnection |
lbug-adapter.ts:134 |
| 🟡 Issue | Two independent process.stdout.write captures can diverge |
lbug-adapter.ts:56 / server.ts:282 |
| 🟡 Issue | unhandledRejection swallowed; poor error serialization |
server.ts:315 |
| 🟡 Perf | Pre-creating all 8 connections is a 4× resource increase | lbug-adapter.ts:222 |
The two 🔴 bugs — the FTS/checkout race and the concurrent initLbug issue — should be addressed before merging, as they can reproduce the original segfault under parallel load, which is exactly the scenario this PR is meant to fix.
|
Could you please tackle with this? 🙏 |
- Add initPromises dedup map to prevent double-init race when parallel tool calls trigger initLbug for the same repoId simultaneously - Move pool.set() after FTS load so concurrent checkout can't grab a connection mid-async-init (FTS race on available[0]) - Replace lazy createConnection growth path with integrity error — pool is pre-warmed, lazy creation would silence stdout during active queries - Add preWarmActive flag so watchdog timer skips stdout restore during the synchronous pre-warm loop - Unify stdout capture: server.ts imports realStdoutWrite from lbug-adapter instead of capturing its own copy
|
@claude can you recommend integration tests for this PR? |
|
Claude finished @magyargergo's task in 2m 5s —— View job Tasks
Recommended Integration Tests for
|
|
Claude finished @magyargergo's task in 2m 48s —— View job Tasks
Would this PR fix #347, #314, #308?What this PR actually fixesThe PR addresses one specific failure mode: the stdout race condition that occurs when
Issue #347 (very recent — filed after #326 was open)Likely scenario: A follow-up crash or regression seen after #326 was filed but before the fix landed. Given the timing (between #328 and #349), this is probably another variant of the parallel-tool-call crash — possibly triggered by a different tool combination or a slightly different race. Would this PR fix it?
Issue #314 (between Phase 4 #310 and Phase 5 #315 type resolution)Likely scenario: Filed during the type resolution phase rollout. Could be a crash or correctness issue in the resolver/ingestion pipeline, or an early MCP stability report coinciding with the LadybugDB migration (#275 landed around this era). Would this PR fix it?
Issue #308 (filed around the LadybugDB migration era)Likely scenario: Filed just before the Phase 4 type resolution work began. If this is post-migration (#275), it could be a stability report about the new LadybugDB-backed MCP server. Would this PR fix it?
Pros and ConsPros
Cons
Summary verdict
To give a definitive answer, sharing the issue bodies for #347, #314, and #308 here would let me confirm whether their symptoms match the stdout race this PR fixes. |
|
Could you please have a look at the comments above? 🙏 |
7 integration tests covering concurrent query safety, waiter queue overflow, stdout.write restoration, connection leak detection, initLbug deduplication, atomic pool visibility, and mixed query types.
Vitest's projects feature splits test files into two groups: lbug-db (fileParallelism: false) and default (parallel). This prevents native mmap file-lock conflicts on Windows without requiring the CI shell loop locally.
/abhigyanpatwari#316 Verifies that 3 concurrent queries via Promise.all (the exact pattern from the impact command's enrichment phase at local-backend.ts:1415) complete without SIGSEGV on a pre-warmed connection pool.
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 2694 tests passed 20 test(s) skipped — expand for detailsIntegration:
Unit:
Code CoverageCombined (Unit + Integration)
Coverage breakdown by test suiteUnit Tests
Integration Tests
📋 View full run · Generated by CI |
… (abhigyanpatwari#349) * fix: MCP server crashes under parallel tool calls (abhigyanpatwari#326) * fix: ensure full connection pool is pre-created to avoid race conditions during query execution * fix: improve graceful shutdown handling with exit codes * fix: resolve critical concurrency bugs in connection pool init - Add initPromises dedup map to prevent double-init race when parallel tool calls trigger initLbug for the same repoId simultaneously - Move pool.set() after FTS load so concurrent checkout can't grab a connection mid-async-init (FTS race on available[0]) - Replace lazy createConnection growth path with integrity error — pool is pre-warmed, lazy creation would silence stdout during active queries - Add preWarmActive flag so watchdog timer skips stdout restore during the synchronous pre-warm loop - Unify stdout capture: server.ts imports realStdoutWrite from lbug-adapter instead of capturing its own copy * test: add connection pool parallel stability tests 7 integration tests covering concurrent query safety, waiter queue overflow, stdout.write restoration, connection leak detection, initLbug deduplication, atomic pool visibility, and mixed query types. * fix: run LadybugDB tests sequentially via vitest projects config Vitest's projects feature splits test files into two groups: lbug-db (fileParallelism: false) and default (parallel). This prevents native mmap file-lock conflicts on Windows without requiring the CI shell loop locally. * test: add enrichment Promise.all regression test for abhigyanpatwari#292/abhigyanpatwari#316 Verifies that 3 concurrent queries via Promise.all (the exact pattern from the impact command's enrichment phase at local-backend.ts:1415) complete without SIGSEGV on a pre-warmed connection pool.
Problem
The MCP server crashes silently when receiving 4 or more parallel
cypher()orquery()calls. The process exits without error output, leaving clients withConnection closederrors.Simpler tools such as
impact()andcontext()are unaffected.Root cause
The connection pool grows lazily. When all connections are busy,
createConnection()callssilenceStdout(), which replacesprocess.stdout.writewith a no-op to suppress native module debug output.Under parallel load, this races with in-flight MCP responses being written to stdout, causing responses to be dropped and corrupting the JSON-RPC stream.
Fix
1. Pre-warm the full connection pool (
lbug-adapter.ts)All connections are now created eagerly during
initLbug(), socreateConnection()is never called lazily during active query execution, eliminating the stdout race.Also removed the now-unused
INITIAL_CONNS_PER_REPOconstant.2. Safe stdout proxy (
server.ts)Capture the real
process.stdout.writeat startup and route it through a proxy to the MCP transport.This ensures MCP responses still reach the client even if
silenceStdout()is triggered later.3. Crash handlers (
server.ts)uncaughtExceptionhandler that logs to stderr and shuts the server downunhandledRejectionhandler that logs the error but remains non-fatalThis keeps the server availability-first: one missed
catchshould not terminate the process.4. Stdout silence watchdog (
lbug-adapter.ts)Added a 1-second watchdog that restores stdout if it becomes stuck in the silenced state, such as after a native crash during connection construction.
Validation
Tested all reproduction patterns from the issue:
Result
The server no longer drops MCP responses or corrupts the JSON-RPC stream under parallel
cypher()/query()load.Fixes #326