fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272) by Knapp-Kevin · Pull Request #273 · BicameralAI/bicameral-mcp

Knapp-Kevin · 2026-05-08T02:11:42Z

Summary

Closes 2 of 3 dev-baseline CI regressions tracked in #272. The third (e2e Flow 1 stale expectation post-#263) is deferred to @jinhongkuan for product-level discussion since it touches ingest skill choreography.

cc @jinhongkuan — Layer 4 PR #258 was the original blocker that surfaced these dev-baseline regressions when it rebased. This PR unblocks the two mechanical fixes; the e2e Flow 1 question is yours.

Fix 1: SHA-pin `test-summary/action`

The v2 mutable tag was repointed on 2026-05-07 22:09 UTC from a dist-targeted release (bundled index.js) to v2.5 targeting main (no bundled output). Every PR run after that fails the Test Summary step with File not found: index.js, marking MCP Regression Suite jobs red on both ubuntu and windows.

Pinned to v2.4 commit 31493c76ec9e7aa675f1585d3ed6f1da69269a86 (last dist-targeted release). Aligns with OWASP-03 / docs/policies/install-trust-model.md discipline (do not trust mutable action tags for security-load-bearing CI).

-        uses: test-summary/action@v2
+        uses: test-summary/action@31493c76ec9e7aa675f1585d3ed6f1da69269a86  # v2.4

Fix 2: `tests/eval_decision_relevance.py:166` attribute rename

IngestResponse schema field is pending_grounding_decisions per contracts.py:571. Eval script was reading the legacy internal name ungrounded_decisions off the response, raising AttributeError and failing the M1 adversarial corpus eval step. JSON-output key kept as ungrounded_decisions so downstream consumers see no change.

-        "ungrounded_decisions": list(result.ungrounded_decisions),
+        "ungrounded_decisions": list(result.pending_grounding_decisions),

Deferred to @jinhongkuan: e2e Flow 1 expectation

tests/e2e/run_e2e_flows.py:758 asserts bicameral.ratify after ingest. Post-#263 (eabe7b5), the agent calls ingest → bind (via the new auto-bind step 1.5) and exhausts its budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching ingest skill Step 7 (Ratify, positioned LAST). The agent never invokes ratify in Flow 1's run.

Two design questions worth your read before patching:

Is this an acceptable behavior change (auto-bind pushes ratify out of Flow 1's headline path; Flow 5 already validates ratify in PM Friday review) — in which case relax Flow 1's assertion to require [ingest, bind] and rely on Flow 5 for ratify validation?
Or is it a regression (the user's prompt explicitly asks "please log these and sign them off") — in which case the ingest skill needs to be re-ordered so ratify happens before the budget-heavy bind path, OR the auto-bind step 1.5 needs to be deferred until after ratify?

Happy to ship whichever shape you decide; left it out of this PR so it gets the design review it deserves.

Test plan

CI: MCP Regression Suite (ubuntu + windows) green (Test Summary step now finds bundled index.js)
CI: M1 adversarial corpus eval step exits 0 (the IngestResponse.pending_grounding_decisions access succeeds)
CI: ruff + mypy + TruffleHog + Schema Persistence Smoke Test all green (unchanged scope)
Once merged, re-run feat(cli): #252 Layer 4 — portable JSON-Lines ledger export/import #258 CI — expect green except possibly e2e Flow 1 (deferred)

🤖 Generated with Claude Code

… attr (#272) Closes 2 of 3 dev-baseline regressions tracked in #272 (the third — Flow 1 e2e expectation post-#263 auto-bind — is deferred for product-level discussion since it touches the ingest skill choreography). ## test-summary/action SHA-pin The `test-summary/action@v2` mutable tag was repointed on 2026-05-07 22:09 UTC from a `dist`-targeted release (with bundled `index.js`) to the new v2.5 release that targets `main` (no bundled output). Every PR opened after that point fails the Test Summary step with `File not found: index.js`, marking MCP Regression Suite jobs red on both ubuntu and windows. Pin to v2.4 commit `31493c76ec9e7aa675f1585d3ed6f1da69269a86` (the last `dist`-targeted release with the bundled artifact). Aligns with OWASP-03 / `docs/policies/install-trust-model.md` discipline (do not trust mutable action tags for security-load-bearing CI). ## tests/eval_decision_relevance.py:166 attribute rename `IngestResponse` schema field is `pending_grounding_decisions` per `contracts.py:571` and `handlers/ingest.py:695`. The eval script was still reading the legacy internal name `ungrounded_decisions` directly off the response, raising `AttributeError` and failing the M1 adversarial corpus eval step (technically `continue-on-error: true`, but worth fixing independently since the eval result was lost on every run). Keeps the JSON-output key `ungrounded_decisions` unchanged so downstream consumers (M1 trend reports, etc.) see no change. Both fixes verified locally: yaml.safe_load on the workflow + IngestResponse field introspection confirms the schema. Refs #272.

coderabbitai · 2026-05-08T02:11:49Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f1e280e-a220-4eb9-afe7-c4d4ab136f64

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/272-ci-baseline-action-pin-and-eval-attribute

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…pe substantiated Layer 5 closes the #252 strategy: `bicameral-mcp diagnose` now surfaces the export → reset → import recipe (Layer 4's portable vehicle) as the recommended remediation when drift_status ∈ {drift, unavailable} or when the ledger is >100 MiB. Recipe is operator-facing display text; diagnose never executes it. Per gating-is-observability discipline. Reality matches Promise: - 8 functional tests + 1 content-contract test all pass - _compute_suggestions at 30 LOC (≤40), file at 248 LOC (≤250) - _remediation_recipe helper at 12 LOC - Layer 3 regression: 5/5 _compute_suggestions tests still pass Logged deviations: - Per-branch single-line append compaction beyond plan's example shapes (lands function at 30 LOC vs plan's 42-LOC projection) - bv truthy-guard preserves Layer 3's invariant: version-derivative suggestions skip when bicameral_version is unknown/missing Substantively closes #221 substrate (GDPR Art. 17 right-to-erasure operator workflow) + #222 (GDPR Art. 15 DSAR). Plan: plan-252-layer-5-diagnose-remediation.md (round 2 PASS; gitignored per c206ad4). Audit: PASS round 2 (round 1 razor-overage + spec-drift remediated). Branch caveat: based on plan/252-layer-4-export-import; will rebase onto fresh dev once #258 unblocks via #273 + Jin's Flow 1 fix (#272).

jinhongkuan · 2026-05-08T03:26:36Z

Fix 3 (deferred e2e Flow 1) addressed in #276

Picked recommendation #1 from the body — "acceptable behavior change" — and made it canonical. The ingest skill no longer fires an AskUserQuestion ratify gate; it prints a one-line advisory and stops. Auto-bind stays.

Spec change recorded at BicameralAI/bicameral#108 (comment thread): Flow 1's Step 5 (bind) is now unconditional / auto-fired; Step 7 becomes advisory text; Step 8 (auto-call ratify) is removed from Flow 1's headline path. Ratification belongs to Flow 5 (PM Friday review) or to direct user requests in the same turn ("sign these off", "ratify all") — that shortcut is documented in the new Step 7.

#276 is targeted at dev and stacks behind this PR. Once #273 merges, #276 should drop the third red light from PR #258's CI.

…Fix 3) Closes the third regression deferred from PR #273. Post-#263 (sync auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching the ingest skill's auto-fired ratify AskUserQuestion gate, so the agent never invoked ratify. Per #108 Flow 1 spec discussion: this is both a regression AND a canonical flow change. Auto-bind stays (deterministic, useful), but ratify drops out of the auto-prompt path and becomes advisory text. Ratification belongs to Flow 5 (PM Friday review) and to direct user requests like "sign these off" / "ratify all". Skill changes (skills/bicameral-ingest/SKILL.md Step 7): - Replace AskUserQuestion ratify gate with a one-block advisory: "○ N decisions captured as proposals — drift tracking activates after ratification. Run `bicameral.ratify` when ready, or revisit them in your next history review (Flow 5)." - Add explicit "Direct user request shortcut": if the prompt asked for sign-off in the same turn, ratify directly without the round-trip. - Update the example at the bottom to match. Test changes: - tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end" so the prompt exercises the new advisory path. Direct sign-off requests are covered by Flow 5 (history + ratify). - tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify requirement; accept [ingest, bind?] as the canonical Flow 1 signature. Update Flow 5's stale comment about Flow 1 pre-ratifying. - tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify → test_flow1_passes_without_ratify (advisory-only is now the expected behavior). Doesn't touch sync skill #263 — that auto-bind change is preserved intentionally; this PR fixes the test/spec contract that conflicted with it. Refs #272 #108 #263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@v2

…residual) Aligns preflight-eval.yml with the test-mcp-regression.yml pin landed in PR #273. Closes the last of #272's three CI-baseline regressions — preflight-eval was the only remaining consumer of the unpinned mutable @v2 tag whose published artifact silently broke between PR #257 and PR #258 (index.js missing from the action's bundled output). Same SHA (31493c76ec9e7aa675f1585d3ed6f1da69269a86, v2.4) used in test-mcp-regression.yml:213 so a future bump is one grep-and-replace. Per docs/policies/install-trust-model.md (OWASP A06 supply-chain discipline): no GitHub Action runs in our CI from a mutable tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Knapp-Kevin temporarily deployed to ci-test May 8, 2026 02:11 — with GitHub Actions Inactive

Knapp-Kevin added bug Something isn't working P1 High: ship this milestone; user-impacting bug or committed feature infra Infrastructure / build / CI / repo-admin work test Test infrastructure, fixtures, or coverage work labels May 8, 2026

This was referenced May 8, 2026

Dev CI baseline regressions blocking #258 merge: test-summary action + M1 eval AttributeError + e2e Flow 1 #272

Closed

[DRAFT] feat(diagnose): #252 Layer 5 — opt-in remediation recipe #274

Closed

Knapp-Kevin mentioned this pull request May 8, 2026

refactor(telemetry): consolidate BICAMERAL_TELEMETRY env-var namespace (#192) #250

Merged

6 tasks

jinhongkuan mentioned this pull request May 8, 2026

fix(skill): ingest advises on ratify instead of auto-prompting (#272 Fix 3) #276

Merged

3 tasks

jinhongkuan merged commit d7e7357 into dev May 8, 2026
6 checks passed

Knapp-Kevin mentioned this pull request May 14, 2026

fix(ci): SHA-pin test-summary/action in preflight-eval workflow (#272 residual) #317

Merged

2 tasks

devin-ai-integration Bot mentioned this pull request May 15, 2026

fix(infra): always-run schema CI, diagnose row probe, env-var truthy parity (#122, #301, #232) #353

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272)#273

fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272)#273
jinhongkuan merged 1 commit into
devfrom
fix/272-ci-baseline-action-pin-and-eval-attribute

Knapp-Kevin commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026

Review skipped

Uh oh!

jinhongkuan commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Knapp-Kevin commented May 8, 2026

Summary

Fix 1: SHA-pin test-summary/action

Fix 2: tests/eval_decision_relevance.py:166 attribute rename

Deferred to @jinhongkuan: e2e Flow 1 expectation

Test plan

Uh oh!

coderabbitai Bot commented May 8, 2026

Review skipped

Uh oh!

jinhongkuan commented May 8, 2026

Fix 3 (deferred e2e Flow 1) addressed in #276

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix 1: SHA-pin `test-summary/action`

Fix 2: `tests/eval_decision_relevance.py:166` attribute rename