grounding accuracy MRR@3 0.59 → 0.79, recall 14% → 30% by silongtan · Pull Request #28 · BicameralAI/bicameral-mcp

silongtan · 2026-04-19T04:13:16Z

Summary

- MRR@3: 0.592 → 0.786 (+32.8%), which measures whether the correct code is in the top-3 results
- Recall: 13.9% → 29.9% (+16pp), which measures whether the correct symbols are found
- Variance: 0.360 → 0.156, i.e. consistency across repos (Medusa/Saleor/Vendure)
- 49/49 tests pass
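For readers unfamiliar with the two headline metrics, here is a minimal sketch of how MRR@3 and symbol recall are conventionally computed. The function names and input shapes are illustrative assumptions, not the code in `tests/eval_code_locator.py`.

```python
from typing import Iterable, Sequence


def mrr_at_k(ranked_per_query: Sequence[Sequence[bool]], k: int = 3) -> float:
    """Mean reciprocal rank over queries, looking only at the top k results.

    Each inner sequence holds relevance flags in ranked order.
    """
    if not ranked_per_query:
        return 0.0
    total = 0.0
    for relevance in ranked_per_query:
        for rank, is_relevant in enumerate(relevance[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant hit contributes
    return total / len(ranked_per_query)


def symbol_recall(expected: Iterable[str], found: Iterable[str]) -> float:
    """Fraction of expected symbols that appear among the found symbols."""
    expected_set = set(expected)
    if not expected_set:
        return 0.0
    return len(expected_set & set(found)) / len(expected_set)
```

A query whose first relevant result sits at rank 2 contributes 0.5 to the mean, and a query with no relevant result in the top 3 contributes 0.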

Changes

Pipeline improvements (`adapters/code_locator.py`)

- Two-track token extraction: identifier tokens (camelCase/compound) go through fuzzy matching; raw NL words are filtered via a keyword blocklist. This prevents "void"→Void and "state"→State graph seed pollution while preserving "checkout"→Checkout
- Case-form bigrams: adjacent NL words generate PascalCase/snake_case candidates
- Coverage tiers widened: (2,80,5)→(3,75,5) at Tier 0
- Symbol type priority as Stage 2 tiebreaker
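The token extraction described above could look roughly like the sketch below. Everything here is an assumption for illustration: the blocklist contents, the regexes, the minimum word length, and the `extract_tokens` name are hypothetical, and "adjacent" is simplified to mean adjacent after filtering.

```python
import re

# Hypothetical blocklist; the PR's actual keyword list is not shown.
KEYWORD_BLOCKLIST = frozenset({"void", "state", "return", "class", "type"})

# Identifier-like tokens: a lower-to-upper transition (camelCase/PascalCase)
# or underscore-joined parts (snake_case).
_IDENTIFIER_RE = re.compile(
    r"\b(?:[A-Za-z]*[a-z][A-Z][A-Za-z]*|[A-Za-z]+(?:_[A-Za-z0-9]+)+)\b"
)
_WORD_RE = re.compile(r"[A-Za-z]+")


def extract_tokens(description: str) -> dict:
    # Track 1: identifier tokens, kept verbatim and de-duplicated in order.
    identifiers = list(dict.fromkeys(_IDENTIFIER_RE.findall(description)))

    # Track 2: plain NL words from the rest of the text, minus blocklisted
    # programming keywords and very short words (assumed cutoff of 4 chars).
    remainder = _IDENTIFIER_RE.sub(" ", description)
    nl_words = [
        w.lower()
        for w in _WORD_RE.findall(remainder)
        if len(w) >= 4 and w.lower() not in KEYWORD_BLOCKLIST
    ]

    # Case-form bigrams: neighbouring NL words become PascalCase and
    # snake_case symbol candidates.
    case_forms = []
    for first, second in zip(nl_words, nl_words[1:]):
        case_forms.append(first.capitalize() + second.capitalize())
        case_forms.append(f"{first}_{second}")

    return {"identifiers": identifiers, "nl_words": nl_words, "case_forms": case_forms}
```

With the description `"Void the checkout session via PaymentSession"`, this sketch keeps `PaymentSession` as an identifier, drops "Void" through the blocklist, and proposes `CheckoutSession` and `checkout_session` as bigram candidates.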

Eval metric improvements (`tests/eval_code_locator.py`)

- Case-insensitive symbol matching
- All-component extraction from qualified names
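As a rough illustration of the two eval changes, the sketch below splits qualified names into their dot-separated components and compares everything in lowercase. The helper names are hypothetical and the real `_is_relevant()` may take different arguments.

```python
from typing import Iterable, Set


def expand_found_symbols(qualified_names: Iterable[str]) -> Set[str]:
    """Collect every dot-separated component of each qualified name, lowercased."""
    found: Set[str] = set()
    for qualified_name in qualified_names:
        for part in qualified_name.split("."):
            if part:  # skip empty parts from stray dots
                found.add(part.lower())
    return found


def is_relevant(expected_symbol: str, found_lower: Set[str]) -> bool:
    """Case-insensitive membership test against the expanded set."""
    return expected_symbol.lower() in found_lower
```

Under this scheme `EventBusService.emit` contributes both `eventbusservice` and `emit`, so an expected symbol of either `EventBusService` or `emit` counts as found.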

Ground truth corrections (`tests/fixtures/expected/decisions.py`)

- Removed 5 phantom Medusa symbols
- Fixed PluginManager→PluginsManager, on_commit→fulfillment_created
- Fixed overly-specific Vendure file patterns

Config

- BM25 k1/b params now configurable
- Fuzzy scorer configurable (WRatio/token_set_ratio/partial_ratio)
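A minimal sketch of what such a config object might look like follows. The field names `bm25_k1`, `bm25_b`, `rrf_k` and `fuzzy_scorer` and their defaults come from the review walkthrough; the class name, the `from_env` helper, and the `CODE_LOCATOR_BM25_K1`, `CODE_LOCATOR_BM25_B` and `CODE_LOCATOR_RRF_K` variable names are assumptions (only `CODE_LOCATOR_FUZZY_SCORER` is mentioned in the review).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocatorConfig:
    """Hypothetical stand-in for the settings in code_locator/config.py."""

    bm25_k1: float = 1.5
    bm25_b: float = 0.75
    rrf_k: int = 40
    fuzzy_scorer: str = "WRatio"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "LocatorConfig":
        # Fall back to the process environment when no mapping is supplied.
        source = os.environ if env is None else env
        return cls(
            bm25_k1=float(source.get("CODE_LOCATOR_BM25_K1", "1.5")),
            bm25_b=float(source.get("CODE_LOCATOR_BM25_B", "0.75")),
            rrf_k=int(source.get("CODE_LOCATOR_RRF_K", "40")),
            fuzzy_scorer=source.get("CODE_LOCATOR_FUZZY_SCORER", "WRatio"),
        )
```

Passing an explicit mapping keeps the sketch easy to test, for example `LocatorConfig.from_env({"CODE_LOCATOR_BM25_K1": "1.2"})`.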

Test plan

- 49/49 unit tests pass
- Eval on 3 OSS repos (30 decisions)
- Saleor MRR stable at 0.864 (no regression)
- Variance under 0.25 CI gate

Summary by CodeRabbit

Release Notes

New Features
- Added configurable BM25 tuning parameters for search indexing optimization
- Introduced selectable fuzzy matching scorer strategies
- Enhanced symbol ranking with type-priority ordering
Bug Fixes
- Improved case-insensitive symbol matching and detection
- Refined tokenization for better identifier recognition and relevance filtering

Two-track token extraction prevents NL-word pollution in graph seeds (e.g. "void" → Void, "return" → Return) while preserving domain words (e.g. "checkout" → Checkout). Case-form bigrams bridge vocabulary gap between NL descriptions and code identifiers.

Pipeline changes:
- Three-track token extraction: identifiers, domain words, case-form bigrams
- Keyword blocklist filters programming words from fuzzy matching
- Coverage tiers widened: (2,80,5)→(3,75,5) at Tier 0
- RRF k lowered 60→40 for sharper ranking
- Symbol type priority as Stage 2 tiebreaker

Ground truth corrections:
- Remove 5 phantom Medusa symbols (CartCompletionStrategy, etc.)
- Fix PluginManager→PluginsManager for Saleor
- Fix on_commit→fulfillment_created (Django builtin → actual symbol)
- Fix overly-specific Vendure file patterns

Results: recall 13.9%→25.5%, variance 0.360→0.156, 49/49 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Eval improvements (25.5% → 29.9% recall):
- Case-insensitive symbol matching in recall computation
- All-component extraction from qualified names (EventBusService.emit now adds both EventBusService and emit to found_symbols)
- Case-insensitive _is_relevant() for MRR consistency

Pipeline preparation:
- Configurable fuzzy scorer via config (WRatio/token_set_ratio/partial_ratio)
- BM25 k1/b params wired through config → bm25s constructor
- Enables future grid search tuning without code changes

49/49 tests pass. MRR@3 stable at 0.786.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
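To make the role of the two BM25 knobs concrete, here is a sketch of the classic Okapi-style term-frequency component, with the IDF factor left out. This is the textbook formula rather than necessarily the exact variant bm25s applies, and the grid values at the end are illustrative, not ones proposed in the PR.

```python
from itertools import product


def bm25_term_weight(
    tf: float,
    doc_len: float,
    avg_doc_len: float,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Term-frequency part of an Okapi BM25 score (IDF omitted).

    k1 controls how quickly repeated terms saturate; b controls how strongly
    documents longer than average are penalised (b=0 disables length
    normalisation entirely).
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# A hypothetical grid that a later tuning run could sweep via config alone.
PARAM_GRID = list(product([1.2, 1.5, 2.0], [0.5, 0.75, 1.0]))
```

Because both values reach the index constructor through config, a sweep over such a grid would only need to rebuild the index per pair and rerun the eval.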

coderabbitai · 2026-04-19T04:13:29Z

📝 Walkthrough


Configuration and algorithm tuning across code locator's grounding and retrieval systems: adjusted coverage-tier limits (max_files increased 2→3, 4→5, 6→7) and fuzzy thresholds, added symbol-type-priority ranking, expanded tokenization with regex-based identifier detection, exposed BM25 hyperparameters (k1, b) and RRF parameter (rrf_k) as configurable options, added pluggable fuzzy scorer selection, and made test symbol matching case-insensitive with corresponding fixture and assertion updates.
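One plausible reading of the coverage-tier loop is sketched below: try the strictest tier first and relax only when it yields nothing. The tier values for `max_files` and the fuzzy threshold are taken from the walkthrough; the loop itself, the `Tier` type, and `ground_with_tiers` are assumptions, and the third element of each tier tuple is omitted because its meaning is not stated.

```python
from typing import List, NamedTuple, Tuple


class Tier(NamedTuple):
    max_files: int
    fuzzy_threshold: int


# New values per the walkthrough (previously 2/80, 4/70, 6/60).
TIERS = [Tier(3, 75), Tier(5, 65), Tier(7, 55)]


def ground_with_tiers(
    candidates: List[Tuple[str, int]],
) -> Tuple[List[Tuple[str, int]], List[int]]:
    """Return (hits, tiers_tried) for candidates given as (file_path, fuzzy_score)."""
    tiers_tried: List[int] = []
    for index, tier in enumerate(TIERS):
        tiers_tried.append(index)
        # Keep candidates above this tier's threshold, best first, capped at max_files.
        hits = sorted(
            (c for c in candidates if c[1] >= tier.fuzzy_threshold),
            key=lambda c: c[1],
            reverse=True,
        )[: tier.max_files]
        if hits:
            return hits, tiers_tried
    return [], tiers_tried
```

In this sketch a candidate scoring 70 is rejected at Tier 0 (threshold 75) and accepted at Tier 1 (threshold 65), so `tiers_tried` comes back as `[0, 1]`.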

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration & Runtime** `code_locator/config.py`, `code_locator_runtime.py` | Added BM25 tuning parameters `bm25_k1` (1.5) and `bm25_b` (0.75); adjusted RRF parameter `rrf_k` default from 60 to 40. Updated `rebuild_index()` to explicitly pass these hyperparameters to BM25 initialization. |
| **Symbol Grounding & Ranking** `adapters/code_locator.py` | Widened coverage-tier thresholds (max_files: 2→3, 4→5, 6→7; fuzzy scores: 80→75, 70→65, 60→55). Added `_SYMBOL_TYPE_PRIORITY` mapping with `_type_priority()` helper; extended per-file symbol ranking to include symbol type priority as third sort key. Expanded `_ground_single` tokenization with regex-based identifier detection, keyword filtering, and three-track token generation (compound/camelCase identifiers, domain words, and case-form bigrams). |
| **BM25 Retrieval** `code_locator/retrieval/bm25s_client.py` | Updated `index()` method signature to accept optional `k1` and `b` BM25 tuning parameters (defaults 1.5 and 0.75); passed parameters to `bm25s.BM25` constructor. |
| **Fuzzy Matching** `code_locator/tools/validate_symbols.py` | Refactored Stage 2 of `_fuzzy_match` to select scorer dynamically from config (`WRatio`, `token_set_ratio`, or `partial_ratio`), defaulting to `WRatio`. Score now computes maximum of selected scorer's outputs for both `name` and `qualified_name`. |
| **Test Evaluation & Fixtures** `tests/eval_code_locator.py`, `tests/fixtures/expected/decisions.py` | Made symbol matching case-insensitive in `_is_relevant()` and recall computation; expanded `found_symbols` collection to include all dot-separated symbol parts. Updated decision fixture expectations for symbol mappings (removed `CartCompletionStrategy`, `JobSchedulerService`, `PaymentSessionService`, `PluginManager`, `WebhookEndpoint`; renamed `PluginManager` → `PluginsManager`; adjusted file pattern matching for Vendure fixtures). |
| **Test Coverage Loop** `tests/test_coverage_loop.py` | Updated tier progression expectations: strict tier 0 now triggers at `max_f == 3` (was 2), tier 1 at `max_f == 5` (was 4), full sequence `[3, 5, 7]` (was `[2, 4, 6]`). Adjusted all related assertions for `tiers_tried` and tier stamping tests. |
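The `rrf_k` change is easier to see with the standard reciprocal rank fusion formula, where each ranked list contributes `1 / (k + rank)` per document. The sketch below is a generic implementation under that assumption, not the project's fusion code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 40) -> List[Hashable]:
    """Fuse several ranked lists with reciprocal rank fusion.

    A smaller k widens the score gap between adjacent top ranks, which is
    the sense in which lowering it from 60 to 40 sharpens the ranking.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by the document key for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], str(doc)))
```

For instance, the gap between rank 1 and rank 2 is `1/41 - 1/42` at `k=40` versus `1/61 - 1/62` at `k=60`, so top positions carry relatively more weight with the new default.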

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

fix: compound token extraction for grounding recall (9%→14%) #26 — Directly extends identifier-token extraction in _ground_single tokenization with expanded regex-based detection and multi-track token generation.
fix: Stage 1 symbol ranking by fuzzy score — MRR@3 0.39→0.58 #25 — Modifies symbol ranking logic in _ground_single to incorporate fuzzy match scores, complementing the new symbol-type-priority tiebreak.
decision grounding reuse + coverage loop #5 — Adjusts grounding/coverage loop tier thresholds and _ground_single behavior, overlapping with tier max_files and tokenization changes.

Suggested reviewers

jinhongkuan

Poem

🐰 Hops through the thresholds, tightening the knot,
Three token tracks dance where once was just thought,
BM25's parameters now tuned just right,
Symbols ranked by type—what a glorious sight!
Case-insensitive matching, fuzzy and fleet,
Config and testing complete the retreat. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly reflects the main achievement of the PR: significant improvements in grounding accuracy metrics (MRR@3 from 0.59 to 0.79, recall from 14% to 30%), which aligns with the PR objectives that demonstrate these exact metric improvements. |


coderabbitai

🧹 Nitpick comments (6)

tests/eval_code_locator.py (1)
129-133: Minor consistency nit: denominator could use lowercased set for symmetry.

If expected_symbols ever contains case-variant duplicates (e.g., {"Foo", "foo"}) — unlikely with the current fixtures — the numerator collapses after .lower() while the denominator uses the raw set cardinality, slightly underreporting recall. Using len(expected_lower) would make the metric consistent with the case-insensitive intent. Not a bug with the current fixture contents; flagging only for robustness.
♻️ Optional tweak
```diff
-        recall = matched_count / len(expected_symbols) if expected_symbols else 0
+        recall = matched_count / len(expected_lower) if expected_lower else 0
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/eval_code_locator.py` around lines 129 - 133, The recall calculation
uses len(expected_symbols) as the denominator while the numerator is derived
from expected_lower (case-folded), which miscounts when expected_symbols has
case-variant duplicates; update the recall computation to use
len(expected_lower) (i.e., recall = matched_count / len(expected_lower) if
expected_lower else 0) so the denominator matches the case-insensitive sets used
to compute matched_count (references: expected_lower, found_lower,
matched_count, recall).
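The denominator issue can be reproduced in a few lines. The sketch below reuses the variable names from the review comment (`expected_lower`, `found_lower`, `matched_count`); the two wrapper functions are hypothetical and exist only to contrast the two denominators.

```python
from typing import Iterable


def recall_raw_denominator(expected_symbols: Iterable[str], found_symbols: Iterable[str]) -> float:
    """Behaviour flagged in the review: case-folded numerator, raw denominator."""
    expected_symbols = set(expected_symbols)
    expected_lower = {s.lower() for s in expected_symbols}
    found_lower = {s.lower() for s in found_symbols}
    matched_count = len(expected_lower & found_lower)
    return matched_count / len(expected_symbols) if expected_symbols else 0.0


def recall_folded_denominator(expected_symbols: Iterable[str], found_symbols: Iterable[str]) -> float:
    """Suggested tweak: numerator and denominator both use the case-folded set."""
    expected_lower = {s.lower() for s in expected_symbols}
    found_lower = {s.lower() for s in found_symbols}
    matched_count = len(expected_lower & found_lower)
    return matched_count / len(expected_lower) if expected_lower else 0.0
```

With `{"Foo", "foo"}` expected and `{"FOO"}` found, the first version reports 0.5 while the second reports 1.0; on fixtures without case-variant duplicates the two agree.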
adapters/code_locator.py (3)
152-155: Annotate _SYMBOL_TYPE_PRIORITY as ClassVar (RUF012).

Ruff flags this as a mutable class attribute. The mapping is effectively constant; annotating with ClassVar documents that intent and silences the lint without behavior change.
♻️ Proposed annotation
```diff
+from typing import ClassVar
@@
-    _SYMBOL_TYPE_PRIORITY = {
+    _SYMBOL_TYPE_PRIORITY: ClassVar[dict[str, int]] = {
         "class": 0, "interface": 1, "type_alias": 2,
         "function": 3, "method": 4, "variable": 5,
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 152 - 155, Annotate the class-level
mapping _SYMBOL_TYPE_PRIORITY with ClassVar to mark it as a class-level constant
rather than a per-instance attribute: add "from typing import ClassVar" to imports (or include ClassVar in
the existing typing import) and change the declaration to use ClassVar[dict[str,
int]] (or ClassVar[Mapping[str, int]] if Mapping is preferred) for the variable
type; keep the existing mapping values and name _SYMBOL_TYPE_PRIORITY unchanged
so the lint (RUF012) is satisfied without behavioral change.
208-210: Dead helper — _type_priority is defined but never called.

The sort lambda at lines 351-355 inlines self._SYMBOL_TYPE_PRIORITY.get(r["type"], 3) directly (which is correct, since the row is already in hand — calling this helper would add an unnecessary db.lookup_by_id per symbol). Either remove the helper or route the sort key through it; leaving it as-is invites future callers to pick the DB-fetching version by mistake.
♻️ Suggested removal
```diff
-    def _type_priority(self, sid: int, db) -> int:
-        row = db.lookup_by_id(sid)
-        return self._SYMBOL_TYPE_PRIORITY.get(row["type"], 3) if row else 3
-
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 208 - 210, remove the dead helper
_type_priority which performs an unnecessary db lookup; delete the method
definition and any unused imports so callers use the inline expression
self._SYMBOL_TYPE_PRIORITY.get(r["type"], 3) (or, if you intended a reusable
helper, replace _type_priority(sid, db) with a non-DB version like
_type_priority_from_row(row) that reads row["type"]). Ensure no remaining
references to _type_priority exist and adjust tests/usages accordingly.
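A sketch of the non-DB alternative mentioned above, combined with the `ClassVar` annotation from the previous nitpick, might look like this. The priority mapping and the default of 3 come from the review; the `SymbolRanker` class, the `score` field, and the two-key sort are simplifying assumptions (the PR describes type priority as the third sort key, and the first two are not shown).

```python
from typing import Any, ClassVar, List, Mapping


class SymbolRanker:
    """Hypothetical holder for the ranking logic; not the PR's actual class."""

    _SYMBOL_TYPE_PRIORITY: ClassVar[Mapping[str, int]] = {
        "class": 0, "interface": 1, "type_alias": 2,
        "function": 3, "method": 4, "variable": 5,
    }
    _DEFAULT_PRIORITY: ClassVar[int] = 3

    def _type_priority_from_row(self, row: Mapping[str, Any]) -> int:
        # Reads the type straight from the row already in hand: no DB lookup.
        return self._SYMBOL_TYPE_PRIORITY.get(row["type"], self._DEFAULT_PRIORITY)

    def rank(self, rows: List[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
        # Higher score first; symbol type only breaks ties between equal scores.
        return sorted(rows, key=lambda r: (-r["score"], self._type_priority_from_row(r)))
```

Keeping the helper row-based means the sort key and any future caller share one code path, which addresses the concern about someone picking a DB-fetching variant by mistake.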
234-267: Tokenization looks correct — one small heads-up on acronym handling.

Three-track extraction with de-dup is clean. One heuristic wrinkle: str.capitalize() lowercases the tail, so acronyms from the description — "JWT permissions" — produce "JwtPermissions" / "jwt_permissions" rather than "JWTPermissions". That's still a plausible symbol candidate but won't match true all-caps acronym-prefixed class names like JWTHandler. If grounding acronyms becomes a recall gap in follow-up evals, consider preserving the original casing of already-upper tokens (e.g., w if w.isupper() else w.capitalize()). Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 234 - 267, The pascal-case
construction in the case_forms loop lowercases acronym tails via
str.capitalize(), causing all-caps tokens like "JWT" to become "JwtPermissions"
and miss true symbols like "JWTPermissions"; update the loop in
adapters/code_locator.py (the nl_words → case_forms section) to preserve
already-uppercase words when building pascal (e.g., use the original token if
token.isupper() else token.capitalize()) and similarly ensure snake uses
lowercased tokens for separation; keep the other heuristics and the length check
intact.
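The acronym-preserving variant suggested here is small enough to sketch in full. The function names are hypothetical; the `w if w.isupper() else w.capitalize()` rule is the one quoted in the review comment.

```python
from typing import Tuple


def pascal_part(word: str) -> str:
    """Capitalize a word, but leave already all-caps tokens (acronyms) untouched."""
    return word if word.isupper() else word.capitalize()


def case_forms(first: str, second: str) -> Tuple[str, str]:
    """Build the PascalCase and snake_case candidates for one NL bigram."""
    pascal = pascal_part(first) + pascal_part(second)
    snake = f"{first.lower()}_{second.lower()}"
    return pascal, snake
```

With this rule, `case_forms("JWT", "permissions")` yields `("JWTPermissions", "jwt_permissions")`, whereas plain `str.capitalize()` would have produced `JwtPermissions` for the first form.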
code_locator/retrieval/bm25s_client.py (1)
68-68: Protocol and concrete-class signatures now diverge — consider keeping them in sync.

BM25Search.index(self, repo_path, output_dir) (per code_locator/retrieval/bm25_protocol.py) no longer matches this concrete implementation, which now also accepts symbol_db, k1, and b. Because all extras have defaults, callers using the protocol type still work, so this is cosmetic — but documenting the tuning knobs at the protocol level would help future BM25 backends (e.g. the mentioned "zoekt") honor them.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_locator/retrieval/bm25s_client.py` at line 68, The BM25 protocol
signature and docs need to match the concrete implementation: update
BM25Search.index in bm25_protocol.py to accept the same optional parameters
(symbol_db=None, k1: float = 1.5, b: float = 0.75) and add a short docstring
describing each tuning knob so other backends (e.g., zoekt) can implement them
consistently; ensure the parameter names and default values exactly match the
concrete method index in bm25s_client.py.
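A protocol that documents the tuning knobs could be sketched as follows. The parameter names and defaults mirror those quoted in the review; the `None` return type, the docstring wording, and the `RecordingBackend` class are assumptions added for illustration.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class BM25Search(Protocol):
    def index(
        self,
        repo_path: str,
        output_dir: str,
        symbol_db: Optional[Any] = None,
        k1: float = 1.5,
        b: float = 0.75,
    ) -> None:
        """Build a search index for repo_path under output_dir.

        k1: term-frequency saturation; b: document-length normalisation.
        Backends that cannot honour a knob should say so in their own docs.
        """
        ...


class RecordingBackend:
    """Hypothetical backend that only records how it was called."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def index(self, repo_path, output_dir, symbol_db=None, k1=1.5, b=0.75) -> None:
        self.calls.append({"repo_path": repo_path, "k1": k1, "b": b})
```

Note that `runtime_checkable` only verifies that an `index` attribute exists, not its signature, so keeping the two signatures in sync still relies on review or a static type checker.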
code_locator/tools/validate_symbols.py (1)
80-85: Hoist _SCORERS to module or class scope.

The dict is rebuilt on every call to _fuzzy_match. Lifting it to a module-level constant (or class attribute) keeps the scorer resolution centralized and avoids the per-call dict allocation. Functionally fine as-is.
♻️ Proposed refactor
```diff
 from rapidfuzz import fuzz

+_SCORERS = {
+    "WRatio": fuzz.WRatio,
+    "token_set_ratio": fuzz.token_set_ratio,
+    "partial_ratio": fuzz.partial_ratio,
+}
+
 # JSON Schema for tool parameter validation
@@
-        _SCORERS = {
-            "WRatio": fuzz.WRatio,
-            "token_set_ratio": fuzz.token_set_ratio,
-            "partial_ratio": fuzz.partial_ratio,
-        }
-        scorer = _SCORERS.get(self.config.fuzzy_scorer, fuzz.WRatio)
+        scorer = _SCORERS.get(self.config.fuzzy_scorer, fuzz.WRatio)
```
One silent-fallback concern worth considering: an unknown/typo'd fuzzy_scorer value in config currently silently reverts to WRatio. Consider logging a warning so a misconfigured env var (CODE_LOCATOR_FUZZY_SCORER=token_sort_ratio) doesn't quietly run with the wrong scorer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code_locator/tools/validate_symbols.py` around lines 80 - 85, Hoist the
_SCORERS dict out of _fuzzy_match into module scope (e.g. a top-level _SCORERS
constant) or a class attribute so it isn’t recreated per call, then have
_fuzzy_match look up scorer = _SCORERS.get(self.config.fuzzy_scorer,
fuzz.WRatio); additionally, detect when the get() falls back (i.e.
self.config.fuzzy_scorer not a key) and emit a warning via the existing logger
(or logging.getLogger(__name__)) indicating the unknown fuzzy_scorer and that
WRatio is being used as a fallback; keep symbol names _SCORERS, _fuzzy_match,
and self.config.fuzzy_scorer exactly as in the diff so the changes are easy to
locate.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6bb0a65c-4ad5-4cb6-a0b7-7154001e85d6

📥 Commits

Reviewing files that changed from the base of the PR and between f06b5df and 6e466f1.

📒 Files selected for processing (8)

adapters/code_locator.py
code_locator/config.py
code_locator/retrieval/bm25s_client.py
code_locator/tools/validate_symbols.py
code_locator_runtime.py
tests/eval_code_locator.py
tests/fixtures/expected/decisions.py
tests/test_coverage_loop.py

Three-round audit cycle (VETO -> VETO -> PASS) for Notion ingest + cache contract migration. Plan ships across five phases:
- Phase 0: cache contract migration (schema v1->v2, schema_version table, callable migration dispatch, upsert_canonical_extraction)
- Phase 0.5: worker-task lifecycle pattern + Slack reference wiring (closes the v0 dormant-Slack-worker gap)
- Phase 1: Notion API client + property serializer (internal-integration auth, no OAuth router)
- Phase 2: Notion ingest worker (per-database watermark, peer-authored team_event)
- Phase 3: Notion task registration on lifespan

META_LEDGER entries #29-#33 capture: round-1 VETO (4 missing/undeclared symbols), round-2 VETO (1 wrong-call-shape for decrypt_token), round-3 PASS, IMPLEMENT, and SUBSTANTIATION.

SHADOW_GENOME #7 addendum extends the PARALLEL_STRUCTURE_ASSUMED detection heuristic with three new in-sketch checks: signature, type-boundary, helper-symmetry. The two VETOs in this session are the empirical justification.

SYSTEM_STATE.md adds the Priority C v1 section: schema state (v2), architectural properties achieved, audit cycle outcomes, implementation deviations from plan.

Merkle seal: SHA256(content_hash + previous_hash) = dcb619104e6d88b97a04689093b80b9f03825f9a24bac3c3b9ab3d0107ff24d7 (content_hash 9f003c40..., previous_hash 6f4f8f8f... = Priority C v0 SEAL at Entry #28).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

QorLogic SDLC governance trail for the Priority C v0 implementation that landed in commits 1-4 of this PR. Includes:
- docs/research-brief-priority-c-selective-ingest-2026-05-02.md (v3): research substrate. v1 was rejected for INVARIANT_FROM_IMPLEMENTATION (treating v0 agent-fetches-only code state as product principle); v2 added playbook substrate; v3 narrowed to Slack-first + team-server + CocoIndex-conditional after operator dialogue clarified "no managed backend" = "no human-ops-tax architecture," not "no backend."
- plan-priority-c-team-server-slack-v0.md (437 LOC): the L3 plan with five phases (Phase 5 deferred per "if we can manage it" feasibility caveat).
- docs/SHADOW_GENOME.md Failure Entry #6 + addendum: captures the framing-error pattern AND the "anti-goals must be parsed by their load-bearing keyword" lesson; symmetric to v0-code-as-principle.
- docs/META_LEDGER.md Entries #27 (IMPLEMENT) + #28 (SEAL). Predecessor: efd0304b (#135-triage seal on dev). Implement chain: 211ffb9e. Substantiation seal: 6f4f8f8f1d63ad82b952a3c6aff270d30584e08b0572077ff685e84ce453f6c2
- docs/SYSTEM_STATE.md: Priority C v0 section appended; documents schema additions, architectural properties achieved, audit advisory disposition, Phase 5 deferred state, and the qor-logic-internal steps skipped for downstream-project rationale.
- .agent/staging/AUDIT_REPORT.md: PASS verdict, three non-blocking advisories all addressed at implement-time.

Verdict: REALITY = PROMISE for Phases 1-4. Phase 5 (CocoIndex #136) explicitly deferred per plan slip-independence design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


silongtan and others added 2 commits April 18, 2026 21:42

coderabbitai Bot reviewed Apr 19, 2026


silongtan merged commit eca0c8d into main Apr 19, 2026
1 check passed

Knapp-Kevin mentioned this pull request May 2, 2026

Priority C v0 — self-managing team-server, Slack-first ingest #153

Closed

8 tasks

Knapp-Kevin mentioned this pull request May 7, 2026

seal(#218 LLM-06): META_LEDGER entry #44 — skills manifest signing substantiated #251

Merged

3 tasks

silongtan merged 2 commits into main from silong/grounding-accuracy-improvements
