fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272)#273
Conversation
… attr (#272) Closes 2 of 3 dev-baseline regressions tracked in #272 (the third — Flow 1 e2e expectation post-#263 auto-bind — is deferred for product-level discussion since it touches the ingest skill choreography). ## test-summary/action SHA-pin The `test-summary/action@v2` mutable tag was repointed on 2026-05-07 22:09 UTC from a `dist`-targeted release (with bundled `index.js`) to the new v2.5 release that targets `main` (no bundled output). Every PR opened after that point fails the Test Summary step with `File not found: index.js`, marking MCP Regression Suite jobs red on both ubuntu and windows. Pin to v2.4 commit `31493c76ec9e7aa675f1585d3ed6f1da69269a86` (the last `dist`-targeted release with the bundled artifact). Aligns with OWASP-03 / `docs/policies/install-trust-model.md` discipline (do not trust mutable action tags for security-load-bearing CI). ## tests/eval_decision_relevance.py:166 attribute rename `IngestResponse` schema field is `pending_grounding_decisions` per `contracts.py:571` and `handlers/ingest.py:695`. The eval script was still reading the legacy internal name `ungrounded_decisions` directly off the response, raising `AttributeError` and failing the M1 adversarial corpus eval step (technically `continue-on-error: true`, but worth fixing independently since the eval result was lost on every run). Keeps the JSON-output key `ungrounded_decisions` unchanged so downstream consumers (M1 trend reports, etc.) see no change. Both fixes verified locally: yaml.safe_load on the workflow + IngestResponse field introspection confirms the schema. Refs #272.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…pe substantiated Layer 5 closes the #252 strategy: `bicameral-mcp diagnose` now surfaces the export → reset → import recipe (Layer 4's portable vehicle) as the recommended remediation when drift_status ∈ {drift, unavailable} or when the ledger is >100 MiB. Recipe is operator-facing display text; diagnose never executes it. Per gating-is-observability discipline. Reality matches Promise: - 8 functional tests + 1 content-contract test all pass - _compute_suggestions at 30 LOC (≤40), file at 248 LOC (≤250) - _remediation_recipe helper at 12 LOC - Layer 3 regression: 5/5 _compute_suggestions tests still pass Logged deviations: - Per-branch single-line append compaction beyond plan's example shapes (lands function at 30 LOC vs plan's 42-LOC projection) - bv truthy-guard preserves Layer 3's invariant: version-derivative suggestions skip when bicameral_version is unknown/missing Substantively closes #221 substrate (GDPR Art. 17 right-to-erasure operator workflow) + #222 (GDPR Art. 15 DSAR). Plan: plan-252-layer-5-diagnose-remediation.md (round 2 PASS; gitignored per c206ad4). Audit: PASS round 2 (round 1 razor-overage + spec-drift remediated). Branch caveat: based on plan/252-layer-4-export-import; will rebase onto fresh dev once #258 unblocks via #273 + Jin's Flow 1 fix (#272).
Fix 3 (deferred e2e Flow 1) addressed in #276Picked recommendation #1 from the body — "acceptable behavior change" — and made it canonical. The ingest skill no longer fires an Spec change recorded at BicameralAI/bicameral#108 (comment thread): Flow 1's Step 5 (bind) is now unconditional / auto-fired; Step 7 becomes advisory text; Step 8 (auto-call ratify) is removed from Flow 1's headline path. Ratification belongs to Flow 5 (PM Friday review) or to direct user requests in the same turn ("sign these off", "ratify all") — that shortcut is documented in the new Step 7. #276 is targeted at |
…Fix 3) Closes the third regression deferred from PR #273. Post-#263 (sync auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching the ingest skill's auto-fired ratify AskUserQuestion gate, so the agent never invoked ratify. Per #108 Flow 1 spec discussion: this is both a regression AND a canonical flow change. Auto-bind stays (deterministic, useful), but ratify drops out of the auto-prompt path and becomes advisory text. Ratification belongs to Flow 5 (PM Friday review) and to direct user requests like "sign these off" / "ratify all". Skill changes (skills/bicameral-ingest/SKILL.md Step 7): - Replace AskUserQuestion ratify gate with a one-block advisory: "○ N decisions captured as proposals — drift tracking activates after ratification. Run `bicameral.ratify` when ready, or revisit them in your next history review (Flow 5)." - Add explicit "Direct user request shortcut": if the prompt asked for sign-off in the same turn, ratify directly without the round-trip. - Update the example at the bottom to match. Test changes: - tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end" so the prompt exercises the new advisory path. Direct sign-off requests are covered by Flow 5 (history + ratify). - tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify requirement; accept [ingest, bind?] as the canonical Flow 1 signature. Update Flow 5's stale comment about Flow 1 pre-ratifying. - tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify → test_flow1_passes_without_ratify (advisory-only is now the expected behavior). Doesn't touch sync skill #263 — that auto-bind change is preserved intentionally; this PR fixes the test/spec contract that conflicted with it. Refs #272 #108 #263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Fix 3) Closes the third regression deferred from PR #273. Post-#263 (sync auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching the ingest skill's auto-fired ratify AskUserQuestion gate, so the agent never invoked ratify. Per #108 Flow 1 spec discussion: this is both a regression AND a canonical flow change. Auto-bind stays (deterministic, useful), but ratify drops out of the auto-prompt path and becomes advisory text. Ratification belongs to Flow 5 (PM Friday review) and to direct user requests like "sign these off" / "ratify all". Skill changes (skills/bicameral-ingest/SKILL.md Step 7): - Replace AskUserQuestion ratify gate with a one-block advisory: "○ N decisions captured as proposals — drift tracking activates after ratification. Run `bicameral.ratify` when ready, or revisit them in your next history review (Flow 5)." - Add explicit "Direct user request shortcut": if the prompt asked for sign-off in the same turn, ratify directly without the round-trip. - Update the example at the bottom to match. Test changes: - tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end" so the prompt exercises the new advisory path. Direct sign-off requests are covered by Flow 5 (history + ratify). - tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify requirement; accept [ingest, bind?] as the canonical Flow 1 signature. Update Flow 5's stale comment about Flow 1 pre-ratifying. - tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify → test_flow1_passes_without_ratify (advisory-only is now the expected behavior). Doesn't touch sync skill #263 — that auto-bind change is preserved intentionally; this PR fixes the test/spec contract that conflicted with it. Refs #272 #108 #263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…residual) Aligns preflight-eval.yml with the test-mcp-regression.yml pin landed in PR #273. Closes the last of #272's three CI-baseline regressions — preflight-eval was the only remaining consumer of the unpinned mutable @v2 tag whose published artifact silently broke between PR #257 and PR #258 (index.js missing from the action's bundled output). Same SHA (31493c76ec9e7aa675f1585d3ed6f1da69269a86, v2.4) used in test-mcp-regression.yml:213 so a future bump is one grep-and-replace. Per docs/policies/install-trust-model.md (OWASP A06 supply-chain discipline): no GitHub Action runs in our CI from a mutable tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes 2 of 3 dev-baseline CI regressions tracked in #272. The third (e2e Flow 1 stale expectation post-#263) is deferred to @jinhongkuan for product-level discussion since it touches ingest skill choreography.
cc @jinhongkuan — Layer 4 PR #258 was the original blocker that surfaced these dev-baseline regressions when it rebased. This PR unblocks the two mechanical fixes; the e2e Flow 1 question is yours.
Fix 1: SHA-pin
test-summary/actionThe
v2mutable tag was repointed on 2026-05-07 22:09 UTC from adist-targeted release (bundledindex.js) to v2.5 targetingmain(no bundled output). Every PR run after that fails the Test Summary step withFile not found: index.js, marking MCP Regression Suite jobs red on both ubuntu and windows.Pinned to v2.4 commit
31493c76ec9e7aa675f1585d3ed6f1da69269a86(lastdist-targeted release). Aligns with OWASP-03 /docs/policies/install-trust-model.mddiscipline (do not trust mutable action tags for security-load-bearing CI).Fix 2:
tests/eval_decision_relevance.py:166attribute renameIngestResponseschema field ispending_grounding_decisionspercontracts.py:571. Eval script was reading the legacy internal nameungrounded_decisionsoff the response, raisingAttributeErrorand failing the M1 adversarial corpus eval step. JSON-output key kept asungrounded_decisionsso downstream consumers see no change.Deferred to @jinhongkuan: e2e Flow 1 expectation
tests/e2e/run_e2e_flows.py:758assertsbicameral.ratifyafter ingest. Post-#263 (eabe7b5), the agent callsingest → bind(via the new auto-bind step 1.5) and exhausts its budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching ingest skill Step 7 (Ratify, positioned LAST). The agent never invokes ratify in Flow 1's run.Two design questions worth your read before patching:
Is this an acceptable behavior change (auto-bind pushes ratify out of Flow 1's headline path; Flow 5 already validates ratify in PM Friday review) — in which case relax Flow 1's assertion to require
[ingest, bind]and rely on Flow 5 for ratify validation?Or is it a regression (the user's prompt explicitly asks "please log these and sign them off") — in which case the ingest skill needs to be re-ordered so ratify happens before the budget-heavy bind path, OR the auto-bind step 1.5 needs to be deferred until after ratify?
Happy to ship whichever shape you decide; left it out of this PR so it gets the design review it deserves.
Test plan
index.js)IngestResponse.pending_grounding_decisionsaccess succeeds)🤖 Generated with Claude Code