fix(preflight): refine render_source_attribution regex + flip default (#209) #239

Merged
Knapp-Kevin merged 4 commits into dev from 209-preflight-attribution-regex-refinement on May 7, 2026

@Knapp-Kevin

Collaborator

Summary

Closes #209. Refines the `render_source_attribution` redaction regex from the v1 broad `\b[A-Z][a-z]+\b` to four POSITIONAL-cue patterns, then flips the default from `full` to `redacted` (privacy-positive per #200 audit finding A4).

Plan / Audit

What ships

| Surface | Change |
|---|---|
| `handlers/preflight.py` | Replace broad `_NAME_PATTERN` with 4 positional-cue patterns: `(?<=· )...`, `(?<=, )...(?=,?\s+\d{4}-\d{2}-\d{2})`, `(?<=^Speaker:\s)...`, `(?<=^From:\s)...`. Multi-word continuation uses `[ \t]+` (not `\s+`) to avoid line-spanning |
| `context.py:28` | `_DEFAULT_RENDER_ATTRIBUTION_MODE = "redacted"` (was `"full"`) |
| `setup_wizard.py:1005` | YAML template flipped: `render_source_attribution: redacted` (was `full`) |
| `setup_wizard.py:972-974` | Banner print message updated to reflect new default and reverse the opt-in direction |
| `tests/test_preflight_attribution_redaction.py` | 10 new functional tests covering all 4 cues + preservation contracts + default-flip lock + fresh-install YAML render |
| `tests/test_preflight_render_source_attribution.py` | 1 pre-existing test updated for the new contract (bare names without positional cues are no longer redacted) |

Why no platform-token allowlist

A curated allowlist (Sprint, Linear, GitHub, etc.) was considered as defense-in-depth. Rejected per round-1 audit finding 1: the positional-cue patterns require explicit cues to fire by construction. Context tokens like "Sprint" / "Linear" / "GitHub" appearing in <context-words> position never follow these cues, so they survive without an allowlist. Tests directly verify this contract.
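The allowlist argument can be checked with a small sketch. It uses two of the four patterns quoted in the commit messages; the helper name `cued_names` and the sample `source_ref` strings are hypothetical, chosen only to illustrate tokens sitting in context position.

```python
import re

# Name shape quoted in the commit message: capitalized words joined by
# spaces or tabs (never newlines).
NAME = r"[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*"

# Two of the four positional cues from the PR.
AFTER_SEPARATOR = re.compile(r"(?<=· )" + NAME)
BEFORE_DATE = re.compile(r"(?<=, )" + NAME + r"(?=,?\s+\d{4}-\d{2}-\d{2})")


def cued_names(source_ref: str) -> list[str]:
    """Hypothetical helper: list every span a positional cue would redact."""
    return AFTER_SEPARATOR.findall(source_ref) + BEFORE_DATE.findall(source_ref)


# Illustrative strings (assumptions, not taken from the test suite).
print(cued_names("Sprint review · Brian, 2026-03-22"))      # ['Brian']
print(cued_names("Linear ticket · Dana"))                   # ['Dana']
print(cued_names("GitHub thread, Priya Nair, 2026-02-10"))  # ['Priya Nair']
```

In each case the platform token precedes the cue rather than following it, so only the name is selected.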

Test plan

  • 10 new functional tests pass (tests/test_preflight_attribution_redaction.py)
  • 96/96 preflight + setup_wizard regression tests pass (1 pre-existing test updated for new contract)
  • ruff check + ruff format --check clean
  • No new dependencies

Acceptance per #209

  • Refined regex matches names + dates without stripping platform/tool tokens
  • New unit tests against representative real-world source_ref strings confirm preserved-vs-redacted boundaries
  • Default flipped to redacted in both context.py default and setup_wizard fresh-install YAML
  • e2e Flow 3 still passes after the flip (CI will validate)

OWASP A04 positive contribution

Closes a fail-open privacy posture: the prior full default leaked names + dates verbatim to the agent's chat surface. The deterministic gate is in place; flipping the default is the privacy-positive move directed by #200 audit finding A4.

🤖 Generated with Claude Code

Knapp-Kevin and others added 3 commits May 6, 2026 19:11
…nt (#209)

Plan B addresses #209: the v1 broad redaction regex (`\b[A-Z][a-z]+\b`)
over-matches every capitalized lowercase token, including platform/tool
names (Sprint, Linear, GitHub, etc.), breaking the agent's structural
parsing of `source_ref`. Plan B replaces it with four POSITIONAL-cue
patterns that require explicit cues (`· `, `, ` adjacent to a date,
`^Speaker:\s`, `^From:\s`) to fire — context tokens never follow these
cues by construction, so no allowlist is needed.
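To make the over-matching concrete, here is a minimal sketch of the v1 pattern applied to a `source_ref` in the canonical attribution shape. The variable names and the `[redacted]` placeholder are assumptions for illustration; the actual replacement text in `handlers/preflight.py` is not shown in this PR.

```python
import re

# The v1 pattern quoted above: any capitalized word, regardless of position.
V1_NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")

source_ref = "Sprint review · Brian, 2026-03-22"

# Both the context token and the personal name are selected.
print(V1_NAME_PATTERN.findall(source_ref))  # ['Sprint', 'Brian']

# Substituting a (hypothetical) placeholder destroys the context token too.
print(V1_NAME_PATTERN.sub("[redacted]", source_ref))
# [redacted] review · [redacted], 2026-03-22
```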

After the refinement, flip the default in both:
- context._DEFAULT_RENDER_ATTRIBUTION_MODE: "full" → "redacted"
- setup_wizard YAML template: "render_source_attribution: full" → "redacted"

Audit: round 1 VETO (2 binding: spec-drift on `_PLATFORM_TOKEN_ALLOWLIST`
declared without consumer; test-failure on ambiguous test description)
→ round 2 PASS after dropping the allowlist (Path A: positional patterns
are precise by construction) and rewriting the test to be unambiguously
functional (invoke `_write_collaboration_config`, assert on rendered YAML).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…l cues (#209)

Replace the v1 broad name-redaction regex with four positional-cue patterns:

  - `(?<=· )[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*`  — name after `· ` separator
  - `(?<=, )[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*(?=,?\s+\d{4}-\d{2}-\d{2})` — name before date
  - `(?<=^Speaker:\s)[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*` (re.MULTILINE)
  - `(?<=^From:\s)[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*` (re.MULTILINE)

Names match only after explicit cues; context tokens (Sprint, Linear,
GitHub, etc.) never follow these cues so they survive without an allowlist.
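The four patterns can be exercised together in a short sketch. The regexes are the ones listed above, plus the date pattern `\b\d{4}-\d{2}-\d{2}\b` from the same commit message; the `redact` function, the placeholder strings, and the sample inputs are hypothetical and do not reproduce the code in `handlers/preflight.py`.

```python
import re

NAME = r"[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*"

POSITIONAL_NAME_PATTERNS = [
    re.compile(r"(?<=· )" + NAME),                                  # after `· `
    re.compile(r"(?<=, )" + NAME + r"(?=,?\s+\d{4}-\d{2}-\d{2})"),  # before a date
    re.compile(r"(?<=^Speaker:\s)" + NAME, re.MULTILINE),           # Speaker: line
    re.compile(r"(?<=^From:\s)" + NAME, re.MULTILINE),              # From: line
]
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Placeholders are assumptions; the PR does not show the real replacement text.
NAME_PLACEHOLDER = "[name]"
DATE_PLACEHOLDER = "[date]"


def redact(source_ref: str) -> str:
    """Redact cued names first, then dates.

    Names go first because the second pattern's lookahead needs the date
    to still be present.
    """
    for pattern in POSITIONAL_NAME_PATTERNS:
        source_ref = pattern.sub(NAME_PLACEHOLDER, source_ref)
    return DATE_PATTERN.sub(DATE_PLACEHOLDER, source_ref)


print(redact("Sprint review · Brian, 2026-03-22"))
# Sprint review · [name], [date]
print(redact("Design review, Carol Smith, 2026-04-01"))
# Design review, [name], [date]
print(repr(redact("Speaker: Alice Johnson\nLinear sync notes")))
# 'Speaker: [name]\nLinear sync notes'
print(redact("Brian mentioned Linear"))
# Brian mentioned Linear   (no cue, so nothing is redacted)
```

Python's `re` accepts `^` inside a lookbehind because the anchor is zero-width, so each lookbehind here still has a fixed width.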

Multi-word continuation uses `[ \t]+` (not `\s+`) to avoid swallowing
text on subsequent lines through `\n`.
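The difference between the two continuation classes shows up as soon as a capitalized word starts the next line. A minimal sketch, with hypothetical variable names and an illustrative input:

```python
import re

CUE = r"(?<=^Speaker:\s)"

# Continuation restricted to spaces and tabs, as in the refined patterns.
SAME_LINE = re.compile(CUE + r"[A-Z][a-z]+(?:[ \t]+[A-Z][a-z]+)*", re.MULTILINE)

# Continuation with \s+, which also matches "\n".
ANY_WHITESPACE = re.compile(CUE + r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", re.MULTILINE)

text = "Speaker: Alice\nLinear sync notes"

print(repr(SAME_LINE.search(text).group()))       # 'Alice'
print(repr(ANY_WHITESPACE.search(text).group()))  # 'Alice\nLinear'
```

With `\s+`, the match crosses the newline and would redact `Linear` along with the name.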

Date pattern unchanged (`\b\d{4}-\d{2}-\d{2}\b` is correct).

10 new functional tests in `tests/test_preflight_attribution_redaction.py`
covering the 4 positional cues, platform-token preservation, capitalized
context-word preservation, no-attribution-shape passthrough, full + hidden
mode regression locks, default-flip lock, and fresh-install YAML render.

Updated 1 pre-existing test in `tests/test_preflight_render_source_attribution.py`
to use the canonical attribution shape (`Sprint review · Brian, 2026-03-22`)
since bare names without positional cues are no longer redacted by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Privacy-positive default flip per #200 audit finding A4. The deterministic
gate (refined regex in handlers/preflight.py) is now precise enough to
preserve agent structural parsing while redacting names and dates. Two
sources-of-truth flip in lockstep:

- context._DEFAULT_RENDER_ATTRIBUTION_MODE: "full" → "redacted"
  (loaded-code default when YAML is missing/malformed)

- setup_wizard.py YAML template at line 1005: "render_source_attribution:
  full" → "render_source_attribution: redacted"
  (fresh-install default written to .bicameral/config.yaml)

Banner print message updated to reflect the new default and reverse the
opt-in direction (was "flip to redacted/hidden", now "flip to full/hidden").

Operators who relied on the verbatim default can opt back via
`render_source_attribution: full` in `.bicameral/config.yaml`.
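The fallback behaviour described for the loaded-code default might look roughly like the sketch below. Only the constant name and the three mode names come from this PR; `resolve_attribution_mode`, `_KNOWN_MODES`, and the validation logic are assumptions, since the loader in `context.py` is not shown here.

```python
from typing import Any

_DEFAULT_RENDER_ATTRIBUTION_MODE = "redacted"  # was "full" before this PR

# Modes mentioned in the PR; the tuple name is hypothetical.
_KNOWN_MODES = ("full", "redacted", "hidden")


def resolve_attribution_mode(config: Any) -> str:
    """Hypothetical resolver: fall back to the default when the parsed
    YAML is missing, malformed, or carries an unrecognized value."""
    if not isinstance(config, dict):
        return _DEFAULT_RENDER_ATTRIBUTION_MODE
    mode = config.get("render_source_attribution")
    if isinstance(mode, str) and mode in _KNOWN_MODES:
        return mode
    return _DEFAULT_RENDER_ATTRIBUTION_MODE


print(resolve_attribution_mode(None))                                   # redacted
print(resolve_attribution_mode({"render_source_attribution": "full"}))  # full
```

Under this sketch, an operator who sets `render_source_attribution: full` keeps verbatim rendering, while any missing or invalid setting resolves to the privacy-positive default.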

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

ruff format CI on PR #239 flagged context.py for reformatting. The
default-flip comment line `_DEFAULT_RENDER_ATTRIBUTION_MODE = "redacted"
# #209: ...` exceeded ruff's preferred line length and was wrapped.
Pure formatter pass — no semantic changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin had a problem deploying to recording-approval with GitHub Actions on May 6, 2026 23:51 (Failure)
Knapp-Kevin merged commit b470658 into dev on May 7, 2026 (7 of 8 checks passed)
Knapp-Kevin deleted the 209-preflight-attribution-regex-refinement branch on May 7, 2026 00:06
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
…icameralAI#225 + BicameralAI#226)

Plan for the three compliance-posture stance declarations:
- BicameralAI#220 / MCP-01: MCP host UX dependency (OWASP LLM-07)
- BicameralAI#225 / NIST-RMF-01 + AI-ACT-02: prohibited-uses declaration
- BicameralAI#226 / SOC2-02: availability stance (operator-run-only)

All three bundle naturally because they share docs/policies/ + a single
README cross-reference section. Pure-doc surface fully disjoint from
in-flight code PRs (BicameralAI#237, BicameralAI#238, BicameralAI#239) — safe as a parallel PR.

Audit: round 1 PASS (L1, doc-only). Doctrine interpretation locked:
for markdown policy artifacts, the unit IS the document content;
read_text() + assert "<commitment>" in content is genuine unit
invocation per qor/references/doctrine-test-functionality.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>