
fix: Stage 1 symbol ranking by fuzzy score — MRR@3 0.39→0.58 #25

Merged
silongtan merged 1 commit into main from silong/p0-fix-stage1-symbol-ranking
Apr 17, 2026


@silongtan (Collaborator) commented Apr 17, 2026

Summary

  • Stage 1 file→symbol expansion was ranking symbols by start_line (insertion order) instead of fuzzy relevance score, causing the grounding pipeline to select
    wrong symbols from correct files
  • Preserve match_score from validate_symbols as a dict[int, float] and use it as the within-file sort key instead of line number
  • Stage 2 fallback also fixed: sorts by score descending instead of rowid
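The within-file ranking change described above can be illustrated with a minimal sketch. The rows, symbol names, and score values below are hypothetical; only the `start_line` and `match_score` concepts and the `dict[int, float]` shape come from the PR description, and this is not the actual code in `adapters/code_locator.py`.

```python
# Hypothetical symbol rows for one file; the data is illustrative only.
file_symbols = [
    {"id": 1, "name": "create_cart", "start_line": 10},
    {"id": 2, "name": "add_line_item", "start_line": 42},
    {"id": 3, "name": "apply_discount", "start_line": 97},
]

# match_score per symbol id, kept as a dict[int, float] (values are made up).
matched_scores: dict[int, float] = {1: 0.31, 2: 0.55, 3: 0.92}

# Old behaviour: order within the file follows start_line.
by_line = sorted(file_symbols, key=lambda r: r["start_line"])

# New behaviour: order within the file follows descending match score.
by_score = sorted(file_symbols, key=lambda r: -matched_scores.get(r["id"], 0.0))

top_by_line = by_line[0]["name"]
top_by_score = by_score[0]["name"]
print(top_by_line, top_by_score)
```

With line-order ranking the first symbol in the file wins regardless of relevance; with score-order ranking the best fuzzy match wins, which is the behaviour the fix restores.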

Results

| Repo | MRR@3 Before | MRR@3 After |
|---|---|---|
| medusa | ~0.39 | 0.458 |
| saleor | ~0.39 | 0.773 |
| vendure | ~0.39 | 0.500 |
| Aggregate | 0.392 | 0.577 |

Rank overflow (relevant symbol exists but ranked beyond top-3): 0 across all repos.
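For readers unfamiliar with the metric, here is a minimal sketch of MRR@3 under the standard definition (reciprocal rank of the first relevant item, counted only if it appears in the top 3, averaged over queries). The eval harness's exact implementation is not shown in this PR, so the function name `mrr_at_k` and the toy data are assumptions.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=3):
    """Mean reciprocal rank, counting only hits within the top k results."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_lists)


# Toy data: hit at rank 1, hit at rank 2, and a miss within the top 3.
ranked_lists = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
relevant_sets = [{"a"}, {"y"}, {"not_returned"}]

score = mrr_at_k(ranked_lists, relevant_sets)
print(score)
```

A relevant symbol ranked fourth or lower contributes zero under this definition, which is why the "rank overflow" count above is reported alongside the metric.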

Test plan

  • 29/29 unit tests pass (phase1, phase3, coverage_loop)
  • Eval harness: aggregate MRR@3 0.392 → 0.577 (+47%)
  • No regressions in hit rate or grounding rate
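The reported figures can be sanity-checked with a few lines of arithmetic. The PR does not state how the aggregate is computed; the sketch below only shows that the numbers are consistent with an unweighted mean of the three per-repo scores, which is an assumption.

```python
# Per-repo MRR@3 after the fix, copied from the Results table.
per_repo_after = {"medusa": 0.458, "saleor": 0.773, "vendure": 0.500}

# Assumption: aggregate is the unweighted mean across repos.
macro_mean = sum(per_repo_after.values()) / len(per_repo_after)

# Relative improvement of the aggregate, as quoted in the test plan.
before, after = 0.392, 0.577
relative_gain = (after - before) / before

print(round(macro_mean, 3), round(relative_gain * 100))
```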

Summary by CodeRabbit

  • Refactor
    • Enhanced symbol relevance scoring and ranking mechanism within the code locator functionality to improve how matching symbols are prioritized and resolved.

Stage 1 file→symbol expansion was selecting symbols by start_line
(insertion order) instead of fuzzy relevance score, dropping MRR@3
from ~0.54 to ~0.39. Preserve match_score from validate_symbols and
use it as the within-file ranking key. Aggregate MRR@3: 0.39 → 0.58.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 17, 2026


Walkthrough

Updated _ground_single method in code locator adapter to track symbol relevance scores using a dictionary instead of a set, enabling improved ranking and fallback resolution logic based on match quality scores rather than arbitrary ordering.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Symbol Scoring & Ranking: adapters/code_locator.py | Replaced matched_ids set with matched_scores dictionary to track fuzzy and name-matched symbol scores. Updated per-file symbol ranking to sort by relevance and descending scores, and modified Stage 2 fallback ordering to use score-based precedence. |
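The set-to-dictionary change summarized above can be sketched as follows. The input pairs are hypothetical, and keeping the maximum score when an id is matched more than once is an assumption made for illustration, not something the walkthrough confirms.

```python
# Hypothetical validated matches as (symbol_id, match_score) pairs. A symbol
# may be reached through more than one candidate name.
validated = [(7, 0.62), (3, 0.88), (7, 0.91), (5, 0.40)]

# Before: a set remembers only membership, so the scores are discarded.
matched_ids = {sid for sid, _ in validated}

# After: a dict keeps one score per id. Keeping the best score on repeated
# matches is an assumption of this sketch.
matched_scores: dict[int, float] = {}
for sid, score in validated:
    matched_scores[sid] = max(score, matched_scores.get(sid, 0.0))

# Fallback ordering: by id (a stand-in for rowid) versus by descending score.
fallback_by_rowid = sorted(matched_ids)
fallback_by_score = sorted(matched_scores, key=matched_scores.get, reverse=True)

print(fallback_by_rowid, fallback_by_score)
```

Because the dictionary's keys carry the same membership information as the old set, any `in matched_ids` check can be expressed as `in matched_scores` without a second data structure.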

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

🐰 With scores now tracked instead of sets,
The symbols rank their best bets,
Fuzzy matching finds the way,
As relevance saves the day! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title precisely describes the main change: fixing Stage 1 symbol ranking by using fuzzy scores, and quantifies the improvement (MRR@3 0.39→0.58). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
adapters/code_locator.py (1)

279-282: Tiebreak for non-relevant symbols relies on stable sort order.

For symbols not in relevant_ids, the secondary key collapses to -0 across all rows, so the final order depends on the iteration order returned by db.lookup_by_file(fp). That's almost certainly start_line (previous behavior) and fine in practice, but worth a one-liner comment here so a future refactor of lookup_by_file ordering doesn't silently regress ranking quality.

📝 Optional clarifying comment
                     ranked = sorted(
                         file_symbols,
+                        # Relevant (fuzzy/name-matched) symbols first, then by
+                        # descending match_score. Non-relevant rows tie at score 0
+                        # and fall back to db.lookup_by_file order (start_line) via
+                        # Python's stable sort.
                         key=lambda r: (r["id"] not in relevant_ids, -matched_scores.get(r["id"], 0)),
                     )
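The tie-break behaviour raised in this nitpick can be demonstrated with a small self-contained sketch. The sort key is the one quoted in the suggestion; the rows, ids, and scores are hypothetical, and a plain list stands in for whatever `db.lookup_by_file(fp)` returns.

```python
# Hypothetical rows, listed in start_line order as lookup_by_file is assumed
# to return them.
file_symbols = [
    {"id": 10, "start_line": 5},
    {"id": 11, "start_line": 20},
    {"id": 12, "start_line": 33},
    {"id": 13, "start_line": 48},
]
relevant_ids = {11, 13}
matched_scores = {11: 0.4, 13: 0.9}


def sort_key(r):
    # Relevant symbols first (False sorts before True), then descending score.
    return (r["id"] not in relevant_ids, -matched_scores.get(r["id"], 0))


# Input in start_line order: non-relevant ties (10, 12) keep that order.
order = [r["id"] for r in sorted(file_symbols, key=sort_key)]

# Same rows in reversed input order: the relevant symbols rank identically,
# but the tied non-relevant symbols now come out as 12, 10.
order_rev = [r["id"] for r in sorted(reversed(file_symbols), key=sort_key)]

print(order, order_rev)
```

Python's `sorted` is guaranteed stable, so the relative order of the tied rows is entirely determined by the input order, which is the dependency the review comment asks to document.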
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 279 - 282, The sort key for building
`ranked` collapses to the same secondary value for all non-relevant symbols, so
the final order depends on the input order from `file_symbols` (produced by
`db.lookup_by_file(fp)`); add a one-line comment next to the `ranked =
sorted(...)` call explaining that non-relevant ties rely on the stable iteration
order from `db.lookup_by_file(fp)` (currently start_line order) to preserve
expected ranking behavior and to warn future maintainers against changing
`lookup_by_file` ordering without adjusting this tie-break logic (referencing
`ranked`, `file_symbols`, `relevant_ids`, and `db.lookup_by_file(fp)`).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a77c69f4-1e5c-4372-842b-a7d954923b45

📥 Commits

Reviewing files that changed from the base of the PR and between 8d8cf30 and 6adcfdf.

📒 Files selected for processing (1)
  • adapters/code_locator.py

@silongtan silongtan merged commit 8c73e68 into main Apr 17, 2026
1 check passed
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external
gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged
advisory — manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction
abandoned (no caller LLM in hook context, MCP sampling not viable in Claude
Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync.
The tooltip surfaces the pending state; the human in their session is the
qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
  ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here)
  🟡 #1 — PR description manual verification step (composed in /qor-document)
  🟡 #3 — #135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking):
  bicameral-mcp#125 scope should be widened — 7 skills under
  pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/
  location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action):
  bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates
  exists when it does not; collision detection lives caller-side via
  bicameral-context-sentry skill, surfaces via
  bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/
runtime helpers absent (gate artifacts not written), tools/reliability/
validators absent (Steps 4.6–4.8 skipped), agent-teams not declared,
codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>