fix: Stage 1 symbol ranking by fuzzy score — MRR@3 0.39→0.58 by silongtan · Pull Request #25 · BicameralAI/bicameral-mcp

silongtan · 2026-04-17T16:36:15Z

Summary

- Stage 1 file→symbol expansion was ranking symbols by start_line (insertion order) instead of fuzzy relevance score, causing the grounding pipeline to select wrong symbols from correct files
- Preserve match_score from validate_symbols as a dict[int, float] and use it as the within-file sort key instead of line number
- Stage 2 fallback also fixed: sorts by score descending instead of rowid
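
The Stage 1 change described above can be illustrated with a small sketch. This is not the code from `adapters/code_locator.py`: the row shape, the helper names `rank_by_start_line` and `rank_by_match_score`, and the sample scores are all assumptions made for illustration. Only `match_score`, `start_line`, and the `dict[int, float]` shape come from the summary.

```python
from typing import TypedDict


class SymbolRow(TypedDict):
    # Hypothetical row shape; the real rows may carry more columns.
    id: int
    name: str
    start_line: int


def rank_by_start_line(rows: list[SymbolRow]) -> list[SymbolRow]:
    """Old behaviour as described in the summary: file (line) order."""
    return sorted(rows, key=lambda r: r["start_line"])


def rank_by_match_score(
    rows: list[SymbolRow], matched_scores: dict[int, float]
) -> list[SymbolRow]:
    """New behaviour: highest fuzzy score first; unmatched rows count as 0."""
    return sorted(rows, key=lambda r: -matched_scores.get(r["id"], 0.0))


# Three symbols from one file; only two of them matched the query (made-up scores).
rows: list[SymbolRow] = [
    {"id": 1, "name": "__init__", "start_line": 10},
    {"id": 2, "name": "list_orders", "start_line": 42},
    {"id": 3, "name": "cancel_order", "start_line": 97},
]
matched_scores: dict[int, float] = {3: 0.92, 2: 0.41}

old_top = rank_by_start_line(rows)[0]["name"]
new_top = rank_by_match_score(rows, matched_scores)[0]["name"]
print(old_top, new_top)  # __init__ cancel_order
```

With line-order ranking the first symbol in the file wins regardless of relevance, which is the "wrong symbol from the correct file" failure the summary describes.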

Results

Repo	MRR@3 Before	MRR@3 After
medusa	~0.39	0.458
saleor	~0.39	0.773
vendure	~0.39	0.500
Aggregate	0.392	0.577

Rank overflow (relevant symbol exists but ranked beyond top-3): 0 across all repos.
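
For readers unfamiliar with the metric, the following is a minimal sketch of how MRR@3 and the rank-overflow count are conventionally computed. The eval harness used in this PR is not shown here, so the function names and the sample queries are assumptions, and the harness may differ in detail.

```python
def reciprocal_rank_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 3) -> float:
    """1/rank of the first relevant item within the top k, else 0."""
    for rank, symbol_id in enumerate(ranked_ids[:k], start=1):
        if symbol_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mrr_at_k(queries: list[tuple[list[str], set[str]]], k: int = 3) -> float:
    """Mean of the per-query reciprocal ranks."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank_at_k(r, rel, k) for r, rel in queries) / len(queries)


def rank_overflow_count(queries: list[tuple[list[str], set[str]]], k: int = 3) -> int:
    """Queries where a relevant item was retrieved, but only below rank k."""
    return sum(
        1
        for ranked, relevant in queries
        if not any(s in relevant for s in ranked[:k])
        and any(s in relevant for s in ranked[k:])
    )


# Made-up queries: (ranked result ids, ids judged relevant).
queries = [
    (["a", "b", "c"], {"a"}),       # hit at rank 1 -> 1.0
    (["x", "y", "z"], {"z"}),       # hit at rank 3 -> 1/3
    (["p", "q", "r", "s"], {"s"}),  # relevant item at rank 4 -> 0, counts as overflow
]
mrr = mrr_at_k(queries)
overflow = rank_overflow_count(queries)
print(round(mrr, 3), overflow)  # 0.444 1
```

Under this definition, a rank overflow of 0 means no remaining miss is caused by a relevant symbol sitting just outside the top 3.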

Test plan

29/29 unit tests pass (phase1, phase3, coverage_loop)
Eval harness: aggregate MRR@3 0.392 → 0.577 (+47%)
No regressions in hit rate or grounding rate

Summary by CodeRabbit

Refactor
- Enhanced symbol relevance scoring and ranking mechanism within the code locator functionality to improve how matching symbols are prioritized and resolved.

Stage 1 file→symbol expansion was selecting symbols by start_line (insertion order) instead of fuzzy relevance score, dropping MRR@3 from ~0.54 to ~0.39. Preserve match_score from validate_symbols and use it as the within-file ranking key. Aggregate MRR@3: 0.39 → 0.58.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-17T16:36:30Z

📝 Walkthrough

Updated _ground_single method in code locator adapter to track symbol relevance scores using a dictionary instead of a set, enabling improved ranking and fallback resolution logic based on match quality scores rather than arbitrary ordering.

Changes

Cohort / File(s)	Summary
Symbol Scoring & Ranking `adapters/code_locator.py`	Replaced `matched_ids` set with `matched_scores` dictionary to track fuzzy and name-matched symbol scores. Updated per-file symbol ranking to sort by relevance and descending scores, and modified Stage 2 fallback ordering to use score-based precedence.
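
To make the set-to-dictionary change concrete, here is a hedged sketch of the Stage 2 fallback ordering. It is not the adapter's code: `order_fallback_candidates`, the sample ids, and the id-based tiebreak are assumptions. Only the names `matched_ids` and `matched_scores` and the "score descending instead of rowid" behaviour come from the PR.

```python
def order_fallback_candidates(matched_scores: dict[int, float]) -> list[int]:
    """Return symbol ids best score first; equal scores fall back to ascending id."""
    return sorted(matched_scores, key=lambda sid: (-matched_scores[sid], sid))


# Before: a set keeps membership only, so rowid is the only available order.
matched_ids = {11, 7, 23, 4}
by_rowid = sorted(matched_ids)

# After: a dict keeps the score next to each id, so ordering can use it.
matched_scores = {11: 0.40, 7: 0.85, 23: 0.85, 4: 0.10}
by_score = order_fallback_candidates(matched_scores)

print(by_rowid)  # [4, 7, 11, 23]
print(by_score)  # [7, 23, 11, 4]
```

The explicit id tiebreak is a choice made for this sketch so the output is deterministic; the PR text does not say how equal scores are ordered in Stage 2.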

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

fix: FC-2 follow-ups — symbol seeding, team-mode test unwrap, step-1 parser #24: Modifies the same RealCodeLocatorAdapter._ground_single method's symbol selection logic, introducing name-based ID seeding that complements the new scoring approach.

Poem

🐰 With scores now tracked instead of sets,
The symbols rank their best bets,
Fuzzy matching finds the way,
As relevance saves the day! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title precisely describes the main change: fixing Stage 1 symbol ranking by using fuzzy scores, and quantifies the improvement (MRR@3 0.39→0.58).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

🧹 Nitpick comments (1)

adapters/code_locator.py (1)

279-282: Tiebreak for non-relevant symbols relies on stable sort order.

For symbols not in relevant_ids, the secondary key collapses to -0 across all rows, so the final order depends on the iteration order returned by db.lookup_by_file(fp). That's almost certainly start_line (previous behavior) and fine in practice, but worth a one-liner comment here so a future refactor of lookup_by_file ordering doesn't silently regress ranking quality.

📝 Optional clarifying comment

                     ranked = sorted(
                         file_symbols,
+                        # Relevant (fuzzy/name-matched) symbols first, then by
+                        # descending match_score. Non-relevant rows tie at score 0
+                        # and fall back to db.lookup_by_file order (start_line) via
+                        # Python's stable sort.
                         key=lambda r: (r["id"] not in relevant_ids, -matched_scores.get(r["id"], 0)),
                     )
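
The tie-break behaviour this comment describes can be reproduced in isolation. The sketch below reuses the sort key from the diff, but the sample rows and the `rank` and `rank_explicit` helpers are made up for illustration, and `start_line` order from `db.lookup_by_file` is the reviewer's assumption rather than a verified fact.

```python
# Sample rows in the order the lookup is assumed to return them (start_line).
file_symbols = [
    {"id": 5, "start_line": 3},
    {"id": 9, "start_line": 20},
    {"id": 2, "start_line": 45},
    {"id": 7, "start_line": 80},
]
relevant_ids = {2, 7}
matched_scores = {2: 0.6, 7: 0.9}


def rank(rows):
    # Same key as the diff: relevant rows first, then descending score.
    return sorted(
        rows,
        key=lambda r: (r["id"] not in relevant_ids, -matched_scores.get(r["id"], 0)),
    )


def rank_explicit(rows):
    # Alternative (not what the PR does): make the tiebreak explicit.
    return sorted(
        rows,
        key=lambda r: (
            r["id"] not in relevant_ids,
            -matched_scores.get(r["id"], 0),
            r["start_line"],
        ),
    )


in_order = [r["id"] for r in rank(file_symbols)]
shuffled = [r["id"] for r in rank(list(reversed(file_symbols)))]
explicit = [r["id"] for r in rank_explicit(list(reversed(file_symbols)))]

print(in_order)  # [7, 2, 5, 9]
print(shuffled)  # [7, 2, 9, 5]  non-relevant rows follow input order
print(explicit)  # [7, 2, 5, 9]  independent of input order
```

Because Python's `sorted` is stable, rows 5 and 9 keep whatever relative order the input had. Adding `start_line` as a third key element would remove that dependency, at the cost of a slightly longer key; the review only asks for a comment.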

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 279 - 282, The sort key for building
`ranked` collapses to the same secondary value for all non-relevant symbols, so
the final order depends on the input order from `file_symbols` (produced by
`db.lookup_by_file(fp)`); add a one-line comment next to the `ranked =
sorted(...)` call explaining that non-relevant ties rely on the stable iteration
order from `db.lookup_by_file(fp)` (currently start_line order) to preserve
expected ranking behavior and to warn future maintainers against changing
`lookup_by_file` ordering without adjusting this tie-break logic (referencing
`ranked`, `file_symbols`, `relevant_ids`, and `db.lookup_by_file(fp)`).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a77c69f4-1e5c-4372-842b-a7d954923b45

📥 Commits

Reviewing files that changed from the base of the PR and between 8d8cf30 and 6adcfdf.

📒 Files selected for processing (1)

adapters/code_locator.py

…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged advisory — manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction abandoned (no caller LLM in hook context, MCP sampling not viable in Claude Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync. The tooltip surfaces the pending state; the human in their session is the qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
- ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here)
- 🟡 #1 — PR description manual verification step (composed in /qor-document)
- 🟡 #3 — #135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking): bicameral-mcp#125 scope should be widened — 7 skills under pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/ location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action): bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates exists when it does not; collision detection lives caller-side via bicameral-context-sentry skill, surfaces via bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/ runtime helpers absent (gate artifacts not written), tools/reliability/ validators absent (Steps 4.6–4.8 skipped), agent-teams not declared, codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Apr 17, 2026

silongtan merged commit 8c73e68 into main Apr 17, 2026
1 check passed

This was referenced Apr 18, 2026

fix: compound token extraction for grounding recall (9%→14%) #26

Merged

grounding accuracy MRR@3 0.59 → 0.79, recall 14% → 30% #28

Merged

This was referenced Apr 30, 2026

triage(#135): dashboard tooltip → /bicameral-sync (scope-cut from auto-resolve) #138

Merged

release: v0.13.6 (triage) #140

Merged

silongtan merged 1 commit into main from silong/p0-fix-stage1-symbol-ranking
