Skip to content

fix: FC-2 follow-ups — symbol seeding, team-mode test unwrap, step-1 parser #24

Merged
silongtan merged 2 commits into
mainfrom
silong/fc2-followup-f1-f2-f3
Apr 17, 2026
Merged

fix: FC-2 follow-ups — symbol seeding, team-mode test unwrap, step-1 parser #24
silongtan merged 2 commits into
mainfrom
silong/fc2-followup-f1-f2-f3

Conversation

@silongtan

@silongtan silongtan commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • F1: _ground_single now unions LLM-extracted mapping["symbols"] IDs with regex-token IDs before graph channel seeding — fixes 0% multi-region recall@files by giving
    the graph channel correct seeds instead of NL-fuzzy false positives
  • F2: _dump_graph test helper unwraps TeamWriteAdapter via getattr(ledger, "_inner", ledger) so phase3 integration tests pass in team mode — fixes 6 pre-existing
    failures
  • F3: _extract_step1_excerpt targets ### 1. specifically instead of first ### N. header — fixes 1 pre-existing failure after SKILL.md added Step 0

Test plan

  • pytest tests/ — 435 passed, 0 failed (was 7 failures before)
  • Run eval_code_locator.py against Accountable to measure multi-region recall@files improvement from F1
  • Verify in CI (surrealdb + team mode)

Summary by CodeRabbit

  • Improvements

    • Enhanced symbol identification process to support direct name-based lookups, improving accuracy of code location matching.
  • Tests

    • Updated test infrastructure to ensure compatibility with enhanced parameter handling.
    • Improved test utilities to support wrapped implementation patterns.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@silongtan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 42 minutes and 13 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 42 minutes and 13 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69088c5b-efff-4929-a088-9c57e2bc7775

📥 Commits

Reviewing files that changed from the base of the PR and between e7cb1ae and 30d03e1.

📒 Files selected for processing (4)
  • adapters/code_locator.py
  • tests/_extract_headless.py
  • tests/test_coverage_loop.py
  • tests/test_phase3_integration.py
📝 Walkthrough

Walkthrough

The PR adds symbol name-based ID seeding to the code locator grounding process, refactors step extraction logic to use direct regex matching, updates test mocks to accept additional parameters, and improves ledger client handling in integration tests.

Changes

Cohort / File(s) Summary
Grounding Symbol Resolution
adapters/code_locator.py
Added optional mapping_symbol_names parameter to _ground_single() enabling name-based ID lookups via database; failures logged at debug level. Updated ground_mappings() to pass symbol names for early ID seeding during grounding tiers.
Test Mock Updates
tests/test_coverage_loop.py
Updated fake_ground_single() test doubles across multiple cases to accept **kwargs parameter, allowing them to tolerate extra parameters passed by production _ground_single() calls.
Test Utility Refactoring
tests/_extract_headless.py, tests/test_phase3_integration.py
Refactored _extract_step1_excerpt() to locate Step 1 via direct regex matching instead of header collection. Enhanced _dump_graph() to resolve inner ledger clients and ensure connectivity before accessing SurrealDB client.

Sequence Diagram

sequenceDiagram
    participant Ground as Ground<br/>Mappings
    participant GS as _ground_single<br/>(with mapping_symbol_names)
    participant DB as Database<br/>(lookup_by_name)
    participant Results as Matched<br/>IDs

    Ground->>GS: Invoke with optional symbol names
    alt When mapping_symbol_names provided
        GS->>DB: lookup_by_name(symbol_name)
        DB-->>GS: Row IDs or error
        GS->>Results: Add resolved IDs to matched_ids
        GS->>GS: Log failures at debug level
    else No symbol names
        GS->>Results: Continue with existing flow
    end
    GS-->>Ground: Return grounded results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A symbol by name is found with grace,
Through database lookups at steady pace,
Mocks bend to kwargs, extraction refined,
Ledger clients unwrapped, all aligned!
Code grows stronger with each careful bind.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes three distinct changes (F1, F2, F3) that address follow-ups to a prior issue, making it specific and informative for scanning history.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch silong/fc2-followup-f1-f2-f3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/_extract_headless.py (1)

144-151: LGTM — Step 1 targeting is correct.

Anchoring on ### 1. with _STEP_HEADER_RE as the terminator properly handles the new Step 0 in SKILL.md and preserves the fallback path. Optional: hoist step1_re to module scope next to _STEP_HEADER_RE to avoid recompiling on every call (minor; re.compile is internally cached anyway).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/_extract_headless.py` around lines 144 - 151, The regular expression
step1_re is being compiled inside the function each call; hoist its definition
to module scope next to the existing _STEP_HEADER_RE so it is compiled once and
reused, then update the function to reference the module-level step1_re
variable; keep the same pattern r"^###\s+1\.\s+" and the existing use of
step1_re.search(body) and _STEP_HEADER_RE.search(body, ...).
adapters/code_locator.py (1)

223-230: Seeding symbol IDs by name looks correct; minor robustness nits.

The union into matched_ids before graph seeding is the right fix for the 0% multi-region recall issue. A couple of small, optional improvements:

  • mapping_symbol_names is passed directly from mapping.get("symbols"), which may contain empty strings, None, or duplicates (e.g., from _build_payload_from_real_code when symbol_name is missing). Consider filtering to avoid unnecessary DB calls.
  • Broad except Exception matches the style elsewhere in this file (Ruff BLE001 is flagged but consistent); fine to keep.
♻️ Optional filter
 if mapping_symbol_names:
-    for name in mapping_symbol_names:
+    for name in {n for n in mapping_symbol_names if isinstance(n, str) and n}:
         try:
             rows = db.lookup_by_name(name)
             for row in rows:
                 matched_ids.add(row["id"])
         except Exception as exc:
             logger.debug("[ground] symbol name lookup failed for '%s': %s", name, exc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 223 - 230, mapping_symbol_names coming
from mapping.get("symbols") may include empty strings, None, or duplicates
causing unnecessary db.lookup_by_name calls; before iterating in the block that
updates matched_ids, sanitize the list by removing falsy values and
deduplicating (e.g., convert to a set after filtering) so you only call
db.lookup_by_name for unique, non-empty symbol names; keep the remaining
try/except and matched_ids.add(row["id"]) logic unchanged and reference
mapping_symbol_names, mapping.get("symbols"), db.lookup_by_name, and matched_ids
when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@adapters/code_locator.py`:
- Around line 223-230: mapping_symbol_names coming from mapping.get("symbols")
may include empty strings, None, or duplicates causing unnecessary
db.lookup_by_name calls; before iterating in the block that updates matched_ids,
sanitize the list by removing falsy values and deduplicating (e.g., convert to a
set after filtering) so you only call db.lookup_by_name for unique, non-empty
symbol names; keep the remaining try/except and matched_ids.add(row["id"]) logic
unchanged and reference mapping_symbol_names, mapping.get("symbols"),
db.lookup_by_name, and matched_ids when making the change.

In `@tests/_extract_headless.py`:
- Around line 144-151: The regular expression step1_re is being compiled inside
the function each call; hoist its definition to module scope next to the
existing _STEP_HEADER_RE so it is compiled once and reused, then update the
function to reference the module-level step1_re variable; keep the same pattern
r"^###\s+1\.\s+" and the existing use of step1_re.search(body) and
_STEP_HEADER_RE.search(body, ...).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6160d02-31bd-418e-99be-841b21feebc9

📥 Commits

Reviewing files that changed from the base of the PR and between d6018ef and e7cb1ae.

📒 Files selected for processing (4)
  • adapters/code_locator.py
  • tests/_extract_headless.py
  • tests/test_coverage_loop.py
  • tests/test_phase3_integration.py

silongtan and others added 2 commits April 16, 2026 20:15
F1: _ground_single now accepts mapping_symbol_names and unions
    LLM-extracted symbol IDs with regex-token IDs before graph
    seeding. Fixes 0% multi-region recall@files.

F2: _dump_graph unwraps TeamWriteAdapter via getattr(ledger, "_inner",
    ledger) so phase3 integration tests work in team mode.

F3: _extract_step1_excerpt targets "### 1." specifically instead of
    the first numbered step header (was grabbing Step 0 after SKILL.md
    added boundary detection).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fake_ground_single mocks in test_coverage_loop.py need **kwargs
to accept the new mapping_symbol_names kwarg passed by ground_mappings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@silongtan silongtan force-pushed the silong/fc2-followup-f1-f2-f3 branch from e7cb1ae to 30d03e1 Compare April 17, 2026 00:22
@silongtan silongtan merged commit 8d8cf30 into main Apr 17, 2026
1 check passed
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…cameral-sync

Scope-cut from #135's original L2 proposal (--auto-resolve-trivial flag on
link_commit). Design enumeration produced 7 options; all required either an
LLM in the deterministic core (violating the "selection over generation"
guardrail) or trivial-cases enumeration with non-zero false-positive risk.

Cut: accept the architectural limit. Post-commit hook stays sync-only.
Resolution path = dashboard tooltip on status === 'pending' rows → user
runs /bicameral-sync in their Claude Code session. No code is auto-resolved.

assets/dashboard.html:
  renderStateCell() ternary at line 455 → if/else if. New 'pending' branch
  attaches tooltip text "Pending compliance — run /bicameral-sync in your
  Claude Code session to resolve." Reuses existing data-tip CSS pattern
  (lines 187–198, hover transitions). Static string literal — no esc()
  needed (no HTML special chars).

skills/bicameral-dashboard/SKILL.md:
  One bullet under Notes documenting the tooltip nudge contract. Per
  pilot/mcp/CLAUDE.md "tool changes ship with skill updates" rule
  (UI behavior changed; tool response shape unchanged).

Section 4 razor: renderStateCell 19 LOC (cap 40), nesting 1 (cap 3),
nested ternaries 0. Replaced ternary with if/else if — improves razor
score, doesn't degrade it.

Verification: manual (no automated test added — dashboard.html has
zero existing test infrastructure; UI test harness absent; PR description
includes manual verification step). Acknowledged advisory in Entry #24
audit.

Refs #135 (close post-merge with scope-cut comment).
Refs BicameralAI/bicameral#108 (Flow 3 spec edit, post-merge gh action).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external
gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged
advisory — manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction
abandoned (no caller LLM in hook context, MCP sampling not viable in Claude
Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync.
The tooltip surfaces the pending state; the human in their session is the
qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
  ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here)
  🟡 #1 — PR description manual verification step (composed in /qor-document)
  🟡 #3#135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking):
  bicameral-mcp#125 scope should be widened — 7 skills under
  pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/
  location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action):
  bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates
  exists when it does not; collision detection lives caller-side via
  bicameral-context-sentry skill, surfaces via
  bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/
runtime helpers absent (gate artifacts not written), tools/reliability/
validators absent (Steps 4.6–4.8 skipped), agent-teams not declared,
codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…cameral-sync

Scope-cut from #135's original L2 proposal (--auto-resolve-trivial flag on
link_commit). Design enumeration produced 7 options; all required either an
LLM in the deterministic core (violating the "selection over generation"
guardrail) or trivial-cases enumeration with non-zero false-positive risk.

Cut: accept the architectural limit. Post-commit hook stays sync-only.
Resolution path = dashboard tooltip on status === 'pending' rows → user
runs /bicameral-sync in their Claude Code session. No code is auto-resolved.

assets/dashboard.html:
  renderStateCell() ternary at line 455 → if/else if. New 'pending' branch
  attaches tooltip text "Pending compliance — run /bicameral-sync in your
  Claude Code session to resolve." Reuses existing data-tip CSS pattern
  (lines 187–198, hover transitions). Static string literal — no esc()
  needed (no HTML special chars).

skills/bicameral-dashboard/SKILL.md:
  One bullet under Notes documenting the tooltip nudge contract. Per
  pilot/mcp/CLAUDE.md "tool changes ship with skill updates" rule
  (UI behavior changed; tool response shape unchanged).

Section 4 razor: renderStateCell 19 LOC (cap 40), nesting 1 (cap 3),
nested ternaries 0. Replaced ternary with if/else if — improves razor
score, doesn't degrade it.

Verification: manual (no automated test added — dashboard.html has
zero existing test infrastructure; UI test harness absent; PR description
includes manual verification step). Acknowledged advisory in Entry #24
audit.

Refs #135 (close post-merge with scope-cut comment).
Refs BicameralAI/bicameral#108 (Flow 3 spec edit, post-merge gh action).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit febb0aa)
jinhongkuan added a commit that referenced this pull request May 1, 2026
…cameral-sync

Scope-cut from #135's original L2 proposal (--auto-resolve-trivial flag on
link_commit). Design enumeration produced 7 options; all required either an
LLM in the deterministic core (violating the "selection over generation"
guardrail) or trivial-cases enumeration with non-zero false-positive risk.

Cut: accept the architectural limit. Post-commit hook stays sync-only.
Resolution path = dashboard tooltip on status === 'pending' rows → user
runs /bicameral-sync in their Claude Code session. No code is auto-resolved.

assets/dashboard.html:
  renderStateCell() ternary at line 455 → if/else if. New 'pending' branch
  attaches tooltip text "Pending compliance — run /bicameral-sync in your
  Claude Code session to resolve." Reuses existing data-tip CSS pattern
  (lines 187–198, hover transitions). Static string literal — no esc()
  needed (no HTML special chars).

skills/bicameral-dashboard/SKILL.md:
  One bullet under Notes documenting the tooltip nudge contract. Per
  pilot/mcp/CLAUDE.md "tool changes ship with skill updates" rule
  (UI behavior changed; tool response shape unchanged).

Section 4 razor: renderStateCell 19 LOC (cap 40), nesting 1 (cap 3),
nested ternaries 0. Replaced ternary with if/else if — improves razor
score, doesn't degrade it.

Verification: manual (no automated test added — dashboard.html has
zero existing test infrastructure; UI test harness absent; PR description
includes manual verification step). Acknowledged advisory in Entry #24
audit.

Refs #135 (close post-merge with scope-cut comment).
Refs BicameralAI/bicameral#108 (Flow 3 spec edit, post-merge gh action).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit febb0aa)
jinhongkuan added a commit that referenced this pull request May 1, 2026
…cameral-sync

Scope-cut from #135's original L2 proposal (--auto-resolve-trivial flag on
link_commit). Design enumeration produced 7 options; all required either an
LLM in the deterministic core (violating the "selection over generation"
guardrail) or trivial-cases enumeration with non-zero false-positive risk.

Cut: accept the architectural limit. Post-commit hook stays sync-only.
Resolution path = dashboard tooltip on status === 'pending' rows → user
runs /bicameral-sync in their Claude Code session. No code is auto-resolved.

assets/dashboard.html:
  renderStateCell() ternary at line 455 → if/else if. New 'pending' branch
  attaches tooltip text "Pending compliance — run /bicameral-sync in your
  Claude Code session to resolve." Reuses existing data-tip CSS pattern
  (lines 187–198, hover transitions). Static string literal — no esc()
  needed (no HTML special chars).

skills/bicameral-dashboard/SKILL.md:
  One bullet under Notes documenting the tooltip nudge contract. Per
  pilot/mcp/CLAUDE.md "tool changes ship with skill updates" rule
  (UI behavior changed; tool response shape unchanged).

Section 4 razor: renderStateCell 19 LOC (cap 40), nesting 1 (cap 3),
nested ternaries 0. Replaced ternary with if/else if — improves razor
score, doesn't degrade it.

Verification: manual (no automated test added — dashboard.html has
zero existing test infrastructure; UI test harness absent; PR description
includes manual verification step). Acknowledged advisory in Entry #24
audit.

Refs #135 (close post-merge with scope-cut comment).
Refs BicameralAI/bicameral#108 (Flow 3 spec edit, post-merge gh action).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit febb0aa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant