fix: Stage 1 symbol ranking by fuzzy score — MRR@3 0.39→0.58 #25
Stage 1 file→symbol expansion was selecting symbols by start_line (insertion order) instead of fuzzy relevance score, dropping MRR@3 from ~0.54 to ~0.39. Preserve match_score from validate_symbols and use it as the within-file ranking key.

Aggregate MRR@3: 0.39 → 0.58.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
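The effect of the change can be illustrated with a minimal sketch. The row shape, the sample scores, and the helper name `top_symbols` are assumptions made for illustration; they are not taken from `adapters/code_locator.py`.

```python
from typing import TypedDict


class SymbolRow(TypedDict):
    # Hypothetical row shape; the real lookup rows may carry more fields.
    id: int
    name: str
    start_line: int


def top_symbols(rows: list[SymbolRow], match_scores: dict[int, float], k: int = 3) -> list[str]:
    """Return the names of the k best symbols in one file, highest fuzzy score first."""
    ranked = sorted(rows, key=lambda r: -match_scores.get(r["id"], 0.0))
    return [r["name"] for r in ranked[:k]]


# Hypothetical file with four symbols, listed in start_line order.
rows: list[SymbolRow] = [
    {"id": 1, "name": "load_config", "start_line": 10},
    {"id": 2, "name": "parse_args", "start_line": 40},
    {"id": 3, "name": "setup_logging", "start_line": 75},
    {"id": 4, "name": "rank_symbols", "start_line": 120},
]
# Hypothetical fuzzy scores keyed by symbol id.
match_scores = {4: 0.92, 2: 0.35}

# Old behavior: take the first three symbols by line number.
by_line = [r["name"] for r in sorted(rows, key=lambda r: r["start_line"])[:3]]
# New behavior: take the three symbols with the highest match score.
by_score = top_symbols(rows, match_scores)
```

In this toy file, ranking by `start_line` drops `rank_symbols` (the best fuzzy match) from the top 3 because it sits last in the file, while ranking by score puts it first.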
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 3 passed
🧹 Nitpick comments (1)
adapters/code_locator.py (1)
279-282: Tiebreak for non-relevant symbols relies on stable sort order.

For symbols not in `relevant_ids`, the secondary key collapses to `-0` across all rows, so the final order depends on the iteration order returned by `db.lookup_by_file(fp)`. That's almost certainly `start_line` (previous behavior) and fine in practice, but worth a one-liner comment here so a future refactor of `lookup_by_file` ordering doesn't silently regress ranking quality.

📝 Optional clarifying comment

     ranked = sorted(
         file_symbols,
    +    # Relevant (fuzzy/name-matched) symbols first, then by
    +    # descending match_score. Non-relevant rows tie at score 0
    +    # and fall back to db.lookup_by_file order (start_line) via
    +    # Python's stable sort.
         key=lambda r: (r["id"] not in relevant_ids, -matched_scores.get(r["id"], 0)),
     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapters/code_locator.py` around lines 279 - 282, The sort key for building `ranked` collapses to the same secondary value for all non-relevant symbols, so the final order depends on the input order from `file_symbols` (produced by `db.lookup_by_file(fp)`); add a one-line comment next to the `ranked = sorted(...)` call explaining that non-relevant ties rely on the stable iteration order from `db.lookup_by_file(fp)` (currently start_line order) to preserve expected ranking behavior and to warn future maintainers against changing `lookup_by_file` ordering without adjusting this tie-break logic (referencing `ranked`, `file_symbols`, `relevant_ids`, and `db.lookup_by_file(fp)`).
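The tie-break behavior described in this comment can be checked with a small sketch. It reuses the sort key quoted in the review; the sample rows, ids, and scores are hypothetical, and the list order merely stands in for whatever `db.lookup_by_file(fp)` returns.

```python
# Hypothetical rows, in the order db.lookup_by_file(fp) is assumed to return them.
file_symbols = [
    {"id": 10, "start_line": 5},
    {"id": 11, "start_line": 20},
    {"id": 12, "start_line": 42},
    {"id": 13, "start_line": 77},
]
relevant_ids = {12, 13}              # hypothetical fuzzy/name-matched symbols
matched_scores = {12: 0.4, 13: 0.9}  # hypothetical match scores


def rank(rows):
    # Same key as in the review: relevant symbols first, then descending score.
    return sorted(
        rows,
        key=lambda r: (r["id"] not in relevant_ids, -matched_scores.get(r["id"], 0)),
    )


ranked_ids = [r["id"] for r in rank(file_symbols)]

# Feed the same rows in reverse order: the relevant head is unchanged, but the
# non-relevant tail flips, because tied keys keep their input order.
reversed_ids = [r["id"] for r in rank(list(reversed(file_symbols)))]
```

Both calls rank ids 13 and 12 first, but the non-relevant tail comes out as 10, 11 in one case and 11, 10 in the other. That is the dependency on input order the comment asks to document.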
…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged advisory; manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction abandoned (no caller LLM in hook context, MCP sampling not viable in Claude Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync. The tooltip surfaces the pending state; the human in their session is the qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
- ✅ #2: SKILL.md tooltip note (delivered in IMPL, sealed here)
- 🟡 #1: PR description manual verification step (composed in /qor-document)
- 🟡 #3: #135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking): bicameral-mcp#125 scope should be widened. 7 skills under pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/ location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action): bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates exists when it does not; collision detection lives caller-side via bicameral-context-sentry skill, surfaces via bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/ runtime helpers absent (gate artifacts not written), tools/reliability/ validators absent (Steps 4.6–4.8 skipped), agent-teams not declared, codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Stage 1 file→symbol expansion was selecting symbols by `start_line` (insertion order) instead of fuzzy relevance score, causing the grounding pipeline to select wrong symbols from correct files
- Preserve `match_score` from `validate_symbols` as a `dict[int, float]` and use it as the within-file sort key instead of line number

Results
Rank overflow (relevant symbol exists but ranked beyond top-3): 0 across all repos.
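For readers unfamiliar with the metric, here is a minimal sketch of MRR@3 and of the rank-overflow condition as worded above. It uses the standard definition of mean reciprocal rank; the PR's evaluation harness is not shown here, so the function names and the sample queries are assumptions for illustration.

```python
from typing import Sequence


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: set[str], k: int = 3) -> float:
    """1/position of the first relevant item within the top k, else 0.0."""
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mrr_at_k(queries: Sequence[tuple[Sequence[str], set[str]]], k: int = 3) -> float:
    """Mean of the per-query reciprocal ranks; 0.0 for an empty query set."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank_at_k(r, rel, k) for r, rel in queries) / len(queries)


def is_rank_overflow(ranked: Sequence[str], relevant: set[str], k: int = 3) -> bool:
    """True when a relevant item is ranked, but only beyond position k."""
    in_top_k = any(item in relevant for item in ranked[:k])
    beyond_k = any(item in relevant for item in ranked[k:])
    return beyond_k and not in_top_k


# Hypothetical queries: (ranked symbols, set of relevant symbols).
queries = [
    (["a", "b", "c"], {"a"}),       # hit at rank 1, contributes 1.0
    (["x", "y", "z"], {"y"}),       # hit at rank 2, contributes 0.5
    (["p", "q", "r", "s"], {"s"}),  # hit at rank 4, contributes 0.0 (overflow)
]
mrr = mrr_at_k(queries)
overflow_count = sum(is_rank_overflow(r, rel) for r, rel in queries)
```

A rank-overflow count of 0 therefore means that, whenever a relevant symbol appeared in a ranking at all, it landed inside the top 3.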
Test plan