fix(server): move code-locator init off MCP stdio handshake (#380) #385
Merged
Pre-fix, serve_stdio awaited get_code_locator().initialize() inline before opening the MCP stdio transport. On a 150MB+ symbol-index DB the cold path took ~45s (sqlite-vec open + tree-sitter load + BM25 pickle load), blowing past Claude Code's 30s MCP initialize timeout on real-world repos: the server "started" but the JSON-RPC handshake never landed and the client gave up.

Fix:

- ``RealCodeLocatorAdapter.initialize_in_background()``: schedules ``_ensure_initialized`` in the default executor via an asyncio Task and returns immediately. A done-callback prints the bare error to stderr on failure so the operator still sees the actionable "Run: python -m code_locator index <repo_path>" hint that #243 wrote.
- ``_ensure_initialized`` now serializes its body via a threading.Lock. Sync callers from worker threads (the ``asyncio.to_thread(ctx.code_graph.<method>, ...)`` pattern every tool handler already uses) block on the lock until the background Task finishes, then see the post-init state and proceed. No callsite needs to know about the background Task.
- ``_run_init_body`` extracted from ``_ensure_initialized`` so tests can monkey-patch the slow body without bypassing the lock/state machine; the lock + Task glue is what's under test.
- ``wait_until_ready()``: optional async gate for callers that want to explicitly await readiness from an async context and surface a structured error to the MCP client on failure.
- ``server.py:serve_stdio``: replaces ``await get_code_locator().initialize()`` with ``get_code_locator().initialize_in_background()`` (synchronous, no await). Stderr message rewritten to reflect the new contract.

Trade-off: #243's "server refuses to boot when index is empty" becomes "first code-locator tool call fails loudly when index is empty." Operator still sees the failure on stderr at boot via the done-callback. The fail-loud contract from #243 phase-2 signoff Q3 is preserved, just relocated from boot-time to first-tool-call-time.

Measured: JSON-RPC ``initialize`` reply now lands in ~16ms on this repo's own 150MB code-graph.db (was ~45s).

Closes #380

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
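The lock-plus-background-Task arrangement described in this commit message can be sketched roughly as follows. This is an illustrative stand-in, not the project's code: the class name `BackgroundInitAdapter`, the `search` method, the `init_runs` counter, and the sleep-based init body are all assumptions made up for the example; only the method names `initialize_in_background`, `_ensure_initialized`, `_run_init_body`, and `wait_until_ready` come from the description above.

```python
import asyncio
import sys
import threading
import time
from typing import Optional, Tuple


class BackgroundInitAdapter:
    """Hypothetical stand-in for the adapter; not the project's implementation."""

    def __init__(self, init_seconds: float = 0.2, fail: bool = False) -> None:
        self._lock = threading.Lock()
        self._initialized = False
        self._task: Optional[asyncio.Task] = None
        self._init_seconds = init_seconds
        self._fail = fail
        self.init_runs = 0  # counts how often the slow body actually ran

    def _run_init_body(self) -> None:
        # Placeholder for the slow cold path (index open, parser load, ...).
        time.sleep(self._init_seconds)
        self.init_runs += 1
        if self._fail:
            raise RuntimeError("index is empty")

    def _ensure_initialized(self) -> None:
        # The lock serializes init: late callers block here, then see the flag.
        with self._lock:
            if self._initialized:
                return
            self._run_init_body()
            self._initialized = True

    def initialize_in_background(self) -> None:
        # Synchronous kickoff; must be called while an event loop is running.
        if self._initialized or (self._task is not None and not self._task.done()):
            return
        loop = asyncio.get_running_loop()
        self._task = loop.create_task(asyncio.to_thread(self._ensure_initialized))
        self._task.add_done_callback(self._report_failure)

    @staticmethod
    def _report_failure(task: asyncio.Task) -> None:
        # Print the bare error to stderr so an operator sees it right away.
        if not task.cancelled() and task.exception() is not None:
            print(f"code-locator init failed: {task.exception()}", file=sys.stderr)

    async def wait_until_ready(self) -> None:
        # Optional async gate; re-raises the init error if the Task failed.
        if self._task is None:
            self.initialize_in_background()
        await self._task

    def search(self, query: str) -> str:
        # Hypothetical sync tool method, meant to be run via asyncio.to_thread.
        self._ensure_initialized()
        return f"results for {query}"


async def demo() -> Tuple[float, str, int]:
    adapter = BackgroundInitAdapter(init_seconds=0.2)
    start = time.perf_counter()
    adapter.initialize_in_background()  # returns without waiting for init
    kickoff = time.perf_counter() - start
    await asyncio.sleep(0)  # give the background Task a chance to start
    result = await asyncio.to_thread(adapter.search, "foo")
    return kickoff, result, adapter.init_runs


if __name__ == "__main__":
    kickoff_s, answer, runs = asyncio.run(demo())
    print(f"kickoff took {kickoff_s * 1000:.2f} ms; {answer!r}; init ran {runs}x")
```

In this sketch the kickoff only creates a Task, so it costs almost nothing, and whichever thread takes the lock first runs the slow body exactly once; every other caller waits on the lock and then returns early.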
Summary
- ``RealCodeLocatorAdapter.initialize_in_background()`` schedules index init as a background asyncio Task and returns immediately; ``server.py:serve_stdio`` uses it instead of ``await initialize()``.
- ``_ensure_initialized`` now serializes via ``threading.Lock``: sync callers reaching the adapter through ``asyncio.to_thread(ctx.code_graph.<method>, ...)`` block on the lock until the background Task finishes, then see post-init state and proceed. No callsite needs to know about the background Task.
- ``initialize`` reply lands in ~16ms on this repo's own 150MB ``code-graph.db`` (was ~45s under the eager-init path of #243, "feat(preflight): eliminate silent graph-expansion fallbacks (#173 follow-up)").

Why
Pre-fix, ``serve_stdio`` did ``await get_code_locator().initialize()`` inline before opening the MCP stdio transport (server.py:1395). The cold path on a populated index opens sqlite-vec, loads tree-sitter, and reads a 19MB BM25 pickle: ~45s on this repo. Claude Code's MCP client gives up after 30s, so the server "started" but no client ever saw the handshake reply. Symptom: ``Failed to reconnect to bicameral: MCP server "bicameral" connection timed out after 30000ms.``

The fail-loud contract #243 phase-2 signoff Q3 wrote ("server refuses to boot when the index is broken") is preserved but relocated: a background-init failure logs the error to stderr via a ``done_callback`` so operators see it immediately, and the first code-locator tool call surfaces the same error to the MCP client because ``_ensure_initialized`` re-raises through the lock.

Linked issues
Closes #380
Linked decisions
(meta: per the just-merged #384 doctrine, org-member PRs cite at least one ``decision:<id>``. Bicameral MCP is currently disconnected on the dev box because of this bug, so the ingest is blocked; will back-link in a follow-up comment once the binary upgrade lets me reconnect.)

Plan / Audit / Seal
Test plan
- ``pytest tests/test_codelocator_background_init.py -v``: 5/5 pass. Covers: kickoff returns immediately, second sync caller blocks on lock, failure re-raises through ``wait_until_ready``, retry-after-failure works, idempotence under repeated kickoffs.
- ``pytest tests/test_phase1_code_locator.py tests/test_phase3_integration.py``: 15 pass, 1 pre-existing skip. No regression on existing code-locator integrations.
- ``ruff check`` + ``ruff format --check``: clean.
- ``initialize`` against the local server with this branch: ~16ms reply on the 150MB ``code-graph.db``.

Out of scope
- ``await wait_until_ready()`` at the dispatcher. The lock makes this unnecessary: the first ``asyncio.to_thread(ctx.code_graph.<method>, ...)`` blocks the worker thread, not the event loop, so the rest of the server stays responsive.

🤖 Generated with Claude Code
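The out-of-scope reasoning (a worker thread blocked on the lock does not stall the event loop) can be checked with a small standalone experiment. Everything here is hypothetical scaffolding: `blocking_tool_call` stands in for a sync tool method waiting on the init lock, and `heartbeat` stands in for other work the server keeps doing on the loop.

```python
import asyncio
import threading
from typing import List, Tuple


def blocking_tool_call(lock: threading.Lock) -> str:
    # Hypothetical sync tool method: waits on the init lock in a worker thread.
    with lock:
        return "tool result"


async def heartbeat(ticks: List[int], interval: float, count: int) -> None:
    # Stands in for unrelated event-loop work (other requests, pings, ...).
    for i in range(count):
        await asyncio.sleep(interval)
        ticks.append(i)


async def demo() -> Tuple[int, bool, str]:
    lock = threading.Lock()
    lock.acquire()  # simulate background init still holding the lock
    ticks: List[int] = []
    tool = asyncio.create_task(asyncio.to_thread(blocking_tool_call, lock))
    await heartbeat(ticks, 0.02, 5)  # the loop keeps running meanwhile
    still_blocked = not tool.done()  # the worker is parked on the lock
    lock.release()  # simulate init finishing
    result = await tool
    return len(ticks), still_blocked, result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the wait happens inside `asyncio.to_thread`, only the worker thread is parked; the heartbeat coroutine completes all of its ticks before the lock is released.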