
fix(server): move code-locator init off MCP stdio handshake (#380) #385

Merged
jinhongkuan merged 1 commit into dev from fix/380-codelocator-init-off-handshake
May 16, 2026

Conversation

@jinhongkuan

Contributor

Summary

  • RealCodeLocatorAdapter.initialize_in_background() schedules index init as a background asyncio Task and returns immediately; server.py:serve_stdio uses it instead of await initialize().
  • _ensure_initialized now serializes via threading.Lock — sync callers reaching the adapter through asyncio.to_thread(ctx.code_graph.<method>, ...) block on the lock until the background Task finishes, then see post-init state and proceed. No callsite needs to know about the background Task.
  • Measured: JSON-RPC initialize reply lands in ~16ms on this repo's own 150MB code-graph.db (was ~45s under the eager-init path from #243, "feat(preflight): eliminate silent graph-expansion fallbacks (#173 follow-up)").
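
The two mechanisms in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchAdapter`, `locate`, and the sleep standing in for the cold path are hypothetical, and only the method names `initialize_in_background`, `_ensure_initialized`, and `_run_init_body` come from the description.

```python
import asyncio
import threading
import time
from typing import Optional


class SketchAdapter:
    """Hypothetical stand-in for RealCodeLocatorAdapter; not the real class."""

    def __init__(self, init_seconds: float = 0.2) -> None:
        self._init_seconds = init_seconds
        self._lock = threading.Lock()
        self._initialized = False
        self._task: Optional[asyncio.Task] = None

    def _run_init_body(self) -> None:
        # Placeholder for the slow cold path (index open, parser load, pickle read).
        time.sleep(self._init_seconds)

    def _ensure_initialized(self) -> None:
        # Every caller funnels through the lock, so a late arrival waits for the
        # in-flight init and then observes the post-init state.
        with self._lock:
            if not self._initialized:
                self._run_init_body()
                self._initialized = True

    def initialize_in_background(self) -> None:
        # Synchronous: schedules the init on a worker thread and returns at once.
        # Must be called while an event loop is running.
        if self._task is None:
            loop = asyncio.get_running_loop()
            self._task = loop.create_task(asyncio.to_thread(self._ensure_initialized))

    def locate(self, symbol: str) -> str:
        # Hypothetical sync tool method, meant to be run via asyncio.to_thread.
        self._ensure_initialized()
        return f"located:{symbol}"


async def demo() -> tuple:
    adapter = SketchAdapter()
    start = time.perf_counter()
    adapter.initialize_in_background()
    kickoff_seconds = time.perf_counter() - start
    # The caller does not know about the background Task; the lock does the waiting.
    result = await asyncio.to_thread(adapter.locate, "serve_stdio")
    await adapter._task  # demo-only: let the background Task settle before exit
    return kickoff_seconds, result
```

In this sketch it does not matter whether the background Task or the tool call acquires the lock first: whichever arrives first runs the init body once, and the other sees `_initialized` already set.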

Why

Pre-fix, serve_stdio did await get_code_locator().initialize() inline before opening the MCP stdio transport (server.py:1395). The cold path on a populated index opens sqlite-vec, loads tree-sitter, and reads a 19MB BM25 pickle — ~45s on this repo. Claude Code's MCP client gives up after 30s, so the server "started" but no client ever saw the handshake reply. Symptom: Failed to reconnect to bicameral: MCP server "bicameral" connection timed out after 30000ms.

The fail-loud contract from #243's phase-2 signoff Q3 ("server refuses to boot when the index is broken") is preserved but relocated: a background-init failure is logged to stderr via a done_callback so operators see it immediately, and the first code-locator tool call surfaces the same error to the MCP client because _ensure_initialized re-raises through the lock.
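
A rough sketch of how that relocation might look, under stated assumptions: `FailingAdapter`, `IndexNotBuilt`, `_report_failure`, and the error text are hypothetical, and the choice to leave the adapter uninitialized after a failure (so the next caller retries and fails again) is inferred from the "retry-after-failure" item in the test plan rather than taken from the diff.

```python
import asyncio
import sys
import threading


class IndexNotBuilt(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def _report_failure(task: asyncio.Task) -> None:
    # Done-callback: runs on the event loop once the background Task finishes.
    if task.cancelled():
        return
    exc = task.exception()  # retrieving it also silences "never retrieved" warnings
    if exc is not None:
        print(f"code-locator init failed: {exc}", file=sys.stderr)


class FailingAdapter:
    """Hypothetical adapter whose init body always fails."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._initialized = False
        self.attempts = 0

    def _run_init_body(self) -> None:
        self.attempts += 1
        raise IndexNotBuilt("index is empty; rebuild it before retrying")

    def _ensure_initialized(self) -> None:
        with self._lock:
            if not self._initialized:
                # On failure the exception propagates out of the lock and the
                # flag stays False, so the next caller retries and re-raises.
                self._run_init_body()
                self._initialized = True

    def initialize_in_background(self) -> asyncio.Task:
        loop = asyncio.get_running_loop()
        task = loop.create_task(asyncio.to_thread(self._ensure_initialized))
        task.add_done_callback(_report_failure)
        return task

    def locate(self, symbol: str) -> str:
        self._ensure_initialized()
        return symbol


async def demo() -> tuple:
    adapter = FailingAdapter()
    task = adapter.initialize_in_background()
    await asyncio.wait([task])  # waits for completion without raising
    try:
        await asyncio.to_thread(adapter.locate, "serve_stdio")
    except IndexNotBuilt as exc:
        return adapter.attempts, str(exc)
    return adapter.attempts, ""
```

The operator sees the stderr line at boot, and the client sees the exception on its first tool call; in this sketch those are two separate init attempts that fail the same way.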

Linked issues

Closes #380

Linked decisions

(meta: per the just-merged #384 doctrine, org-member PRs cite at least one decision:<id>. Bicameral MCP is currently disconnected on the dev box because of this bug, so the ingest is blocked; will back-link in a follow-up comment once the binary upgrade lets me reconnect.)

Plan / Audit / Seal

  • Plan: trivial; risk:L2 — touches the boot path of every MCP session, but the change is mechanical (await → fire-and-forget + lock) and reversible.

Test plan

  • pytest tests/test_codelocator_background_init.py -v — 5/5 pass. Covers: kickoff returns immediately, second sync caller blocks on lock, failure re-raises through wait_until_ready, retry-after-failure works, idempotence under repeated kickoffs.
  • pytest tests/test_phase1_code_locator.py tests/test_phase3_integration.py — 15 pass, 1 pre-existing skip. No regression on existing code-locator integrations.
  • ruff check + ruff format --check — clean.
  • Manual smoke: timed JSON-RPC initialize against the local server with this branch — ~16ms reply on the 150MB code-graph.db.
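
For anyone repeating the manual smoke, a timing harness along these lines may help. It is a sketch under assumptions: it relies on the MCP stdio transport carrying one JSON-RPC message per line, the `protocolVersion` and `clientInfo` values are placeholders, and `STUB_SERVER` is a toy stand-in so the snippet runs without the real server. Substitute the actual launch command for this repo, which the PR does not state.

```python
import json
import subprocess
import sys
import time

INIT_REQUEST = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # assumed; use the version the client speaks
        "capabilities": {},
        "clientInfo": {"name": "smoke-timer", "version": "0.0.0"},
    },
}

# Toy server for illustration only: answers one request with an empty result.
STUB_SERVER = (
    "import json, sys\n"
    "req = json.loads(sys.stdin.readline())\n"
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'], 'result': {}}), flush=True)\n"
)


def time_initialize(command: list) -> tuple:
    """Spawn `command`, send one initialize request, and time the first reply line."""
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        start = time.perf_counter()
        proc.stdin.write(json.dumps(INIT_REQUEST) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
        elapsed = time.perf_counter() - start
        proc.terminate()  # no-op if the process has already exited
    return elapsed, reply


if __name__ == "__main__":
    seconds, reply = time_initialize([sys.executable, "-c", STUB_SERVER])
    print(f"initialize reply id={reply['id']} after {seconds * 1000:.1f} ms")
```

The measured interval includes process startup, which is also what an MCP client experiences when it launches the server.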

Out of scope

  • The eager-init contract from #243 ("feat(preflight): eliminate silent graph-expansion fallbacks (#173 follow-up)") itself. We keep "fail loud when the index is broken"; we just move the loud moment from boot to first tool call.
  • An explicit await wait_until_ready() at the dispatcher for streaming tool calls. The lock makes this unnecessary: the first asyncio.to_thread(ctx.code_graph.<method>, ...) blocks the worker thread, not the event loop, so the rest of the server stays responsive.
  • A separate "ready" probe tool exposed to the MCP client. Open as a follow-up if pilots want to see init progress before issuing the first real tool call.
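
The claim that a blocked worker thread leaves the event loop responsive can be checked with a small standalone experiment. Everything here is hypothetical scaffolding (`slow_init`, `tool_call`, `heartbeat`); it only mirrors the shape of the `asyncio.to_thread` pattern described above.

```python
import asyncio
import threading
import time

init_lock = threading.Lock()
lock_held = threading.Event()


def slow_init() -> None:
    with init_lock:
        lock_held.set()
        time.sleep(0.3)  # placeholder for the slow cold path


def tool_call() -> str:
    # Blocks this worker thread, not the event loop, until slow_init releases the lock.
    with init_lock:
        return "ready"


async def heartbeat(stop: asyncio.Event) -> int:
    # Counts how often the event loop gets to run while the tool call is blocked.
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def demo() -> tuple:
    stop = asyncio.Event()
    init = asyncio.create_task(asyncio.to_thread(slow_init))
    await asyncio.to_thread(lock_held.wait)  # make sure init owns the lock first
    beat = asyncio.create_task(heartbeat(stop))
    result = await asyncio.to_thread(tool_call)
    stop.set()
    ticks = await beat
    await init
    return result, ticks
```

If the tool call held the event loop instead of a worker thread, the heartbeat would tick at most once or twice; here it keeps ticking for the whole time the lock is held.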

🤖 Generated with Claude Code

Pre-fix, serve_stdio awaited get_code_locator().initialize() inline
before opening the MCP stdio transport. On a 150MB+ symbol-index DB
the cold path took ~45s (sqlite-vec open + tree-sitter load + BM25
pickle load), blowing past Claude Code's 30s MCP initialize timeout
on real-world repos — the server "started" but the JSON-RPC handshake
never landed and the client gave up.

Fix:

- ``RealCodeLocatorAdapter.initialize_in_background()`` — schedules
  ``_ensure_initialized`` in the default executor via an asyncio Task,
  returns immediately. A done-callback prints the bare error to stderr
  on failure so the operator still sees the actionable "Run: python -m
  code_locator index <repo_path>" hint that #243 wrote.
- ``_ensure_initialized`` now serializes its body via a
  threading.Lock. Sync callers from worker threads (the
  ``asyncio.to_thread(ctx.code_graph.<method>, ...)`` pattern every
  tool handler already uses) block on the lock until the background
  Task finishes, then see the post-init state and proceed. No callsite
  needs to know about the background Task.
- ``_run_init_body`` extracted from ``_ensure_initialized`` so tests
  can monkey-patch the slow body without bypassing the lock/state
  machine — the lock + Task glue is what's under test.
- ``wait_until_ready()`` — optional async gate for callers that want
  to explicitly await readiness from an async context and surface a
  structured error to the MCP client on failure.
- ``server.py:serve_stdio`` — replaces ``await
  get_code_locator().initialize()`` with
  ``get_code_locator().initialize_in_background()`` (synchronous, no
  await). Stderr message rewritten to reflect the new contract.
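
One possible shape for the optional async gate, as a sketch only: `GateAdapter`, `call_tool`, and the error dictionary are hypothetical, and shielding the background Task is a design choice of this example, not something the commit message states.

```python
import asyncio
from typing import Optional


class GateAdapter:
    """Hypothetical adapter showing only the async readiness gate."""

    def __init__(self, fail: bool) -> None:
        self._fail = fail
        self._task: Optional[asyncio.Task] = None

    def _ensure_initialized(self) -> None:
        if self._fail:
            raise RuntimeError("index is empty")

    def initialize_in_background(self) -> None:
        if self._task is None:
            loop = asyncio.get_running_loop()
            self._task = loop.create_task(asyncio.to_thread(self._ensure_initialized))
            # Retrieve the exception so asyncio does not warn if nobody awaits the gate.
            self._task.add_done_callback(lambda t: t.cancelled() or t.exception())

    async def wait_until_ready(self) -> None:
        if self._task is None:
            raise RuntimeError("initialize_in_background() was not called")
        # shield: cancelling one waiter must not cancel the shared init Task.
        await asyncio.shield(self._task)


async def call_tool(adapter: GateAdapter, name: str) -> dict:
    # Hypothetical dispatcher step that turns an init failure into a structured error.
    try:
        await adapter.wait_until_ready()
    except Exception as exc:
        return {"ok": False, "tool": name, "error": str(exc)}
    return {"ok": True, "tool": name}


async def demo() -> tuple:
    good, bad = GateAdapter(fail=False), GateAdapter(fail=True)
    good.initialize_in_background()
    bad.initialize_in_background()
    return await call_tool(good, "locate"), await call_tool(bad, "locate")
```

Awaiting a finished Task re-raises its stored exception every time, so each caller of the gate sees the same failure without re-running the init.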

Trade-off: #243's "server refuses to boot when index is empty"
becomes "first code-locator tool call fails loudly when index is
empty." Operator still sees the failure on stderr at boot via the
done-callback. The fail-loud contract from #243 phase-2 signoff Q3
is preserved, just relocated from boot-time to first-tool-call-time.

Measured: JSON-RPC ``initialize`` reply now lands in ~16ms on this
repo's own 150MB code-graph.db (was ~45s).

Closes #380

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jinhongkuan added labels on May 16, 2026: P2 (Medium: next milestone or two; default for new issues post-triage), fix (Bug fix or correctness repair), tool (MCP tool or handler surface), code-locator (Code locator, symbol index, or code graph surface)
@jinhongkuan had a problem deploying to recording-approval on May 16, 2026 03:46 with GitHub Actions (Failure)
@coderabbitai

coderabbitai Bot commented May 16, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@jinhongkuan enabled auto-merge on May 16, 2026 03:46
@jinhongkuan merged commit dac8f0f into dev on May 16, 2026
10 of 11 checks passed