Skip to content

fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272)#273

Merged
jinhongkuan merged 1 commit into
devfrom
fix/272-ci-baseline-action-pin-and-eval-attribute
May 8, 2026
Merged

fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272)#273
jinhongkuan merged 1 commit into
devfrom
fix/272-ci-baseline-action-pin-and-eval-attribute

Conversation

@Knapp-Kevin

Copy link
Copy Markdown
Collaborator

Summary

Closes 2 of 3 dev-baseline CI regressions tracked in #272. The third (e2e Flow 1 stale expectation post-#263) is deferred to @jinhongkuan for product-level discussion since it touches ingest skill choreography.

cc @jinhongkuan — Layer 4 PR #258 was the original blocker that surfaced these dev-baseline regressions when it rebased. This PR unblocks the two mechanical fixes; the e2e Flow 1 question is yours.

Fix 1: SHA-pin test-summary/action

The v2 mutable tag was repointed on 2026-05-07 22:09 UTC from a dist-targeted release (bundled index.js) to v2.5 targeting main (no bundled output). Every PR run after that fails the Test Summary step with File not found: index.js, marking MCP Regression Suite jobs red on both ubuntu and windows.

Pinned to v2.4 commit 31493c76ec9e7aa675f1585d3ed6f1da69269a86 (last dist-targeted release). Aligns with OWASP-03 / docs/policies/install-trust-model.md discipline (do not trust mutable action tags for security-load-bearing CI).

-        uses: test-summary/action@v2
+        uses: test-summary/action@31493c76ec9e7aa675f1585d3ed6f1da69269a86  # v2.4

Fix 2: tests/eval_decision_relevance.py:166 attribute rename

IngestResponse schema field is pending_grounding_decisions per contracts.py:571. Eval script was reading the legacy internal name ungrounded_decisions off the response, raising AttributeError and failing the M1 adversarial corpus eval step. JSON-output key kept as ungrounded_decisions so downstream consumers see no change.

-        "ungrounded_decisions": list(result.ungrounded_decisions),
+        "ungrounded_decisions": list(result.pending_grounding_decisions),

Deferred to @jinhongkuan: e2e Flow 1 expectation

tests/e2e/run_e2e_flows.py:758 asserts bicameral.ratify after ingest. Post-#263 (eabe7b5), the agent calls ingest → bind (via the new auto-bind step 1.5) and exhausts its budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching ingest skill Step 7 (Ratify, positioned LAST). The agent never invokes ratify in Flow 1's run.

Two design questions worth your read before patching:

  1. Is this an acceptable behavior change (auto-bind pushes ratify out of Flow 1's headline path; Flow 5 already validates ratify in PM Friday review) — in which case relax Flow 1's assertion to require [ingest, bind] and rely on Flow 5 for ratify validation?

  2. Or is it a regression (the user's prompt explicitly asks "please log these and sign them off") — in which case the ingest skill needs to be re-ordered so ratify happens before the budget-heavy bind path, OR the auto-bind step 1.5 needs to be deferred until after ratify?

Happy to ship whichever shape you decide; left it out of this PR so it gets the design review it deserves.

Test plan

  • CI: MCP Regression Suite (ubuntu + windows) green (Test Summary step now finds bundled index.js)
  • CI: M1 adversarial corpus eval step exits 0 (the IngestResponse.pending_grounding_decisions access succeeds)
  • CI: ruff + mypy + TruffleHog + Schema Persistence Smoke Test all green (unchanged scope)
  • Once merged, re-run feat(cli): #252 Layer 4 — portable JSON-Lines ledger export/import #258 CI — expect green except possibly e2e Flow 1 (deferred)

🤖 Generated with Claude Code

… attr (#272)

Closes 2 of 3 dev-baseline regressions tracked in #272 (the third — Flow 1
e2e expectation post-#263 auto-bind — is deferred for product-level
discussion since it touches the ingest skill choreography).

## test-summary/action SHA-pin

The `test-summary/action@v2` mutable tag was repointed on 2026-05-07
22:09 UTC from a `dist`-targeted release (with bundled `index.js`) to
the new v2.5 release that targets `main` (no bundled output). Every PR
opened after that point fails the Test Summary step with
`File not found: index.js`, marking MCP Regression Suite jobs red on
both ubuntu and windows.

Pin to v2.4 commit `31493c76ec9e7aa675f1585d3ed6f1da69269a86` (the
last `dist`-targeted release with the bundled artifact). Aligns with
OWASP-03 / `docs/policies/install-trust-model.md` discipline (do not
trust mutable action tags for security-load-bearing CI).

## tests/eval_decision_relevance.py:166 attribute rename

`IngestResponse` schema field is `pending_grounding_decisions` per
`contracts.py:571` and `handlers/ingest.py:695`. The eval script was
still reading the legacy internal name `ungrounded_decisions` directly
off the response, raising `AttributeError` and failing the M1
adversarial corpus eval step (technically `continue-on-error: true`,
but worth fixing independently since the eval result was lost on every
run).

Keeps the JSON-output key `ungrounded_decisions` unchanged so downstream
consumers (M1 trend reports, etc.) see no change.

Both fixes verified locally: yaml.safe_load on the workflow + IngestResponse
field introspection confirms the schema.

Refs #272.
@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f1e280e-a220-4eb9-afe7-c4d4ab136f64

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/272-ci-baseline-action-pin-and-eval-attribute

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Knapp-Kevin Knapp-Kevin added bug Something isn't working P1 High: ship this milestone; user-impacting bug or committed feature infra Infrastructure / build / CI / repo-admin work test Test infrastructure, fixtures, or coverage work labels May 8, 2026
Knapp-Kevin added a commit that referenced this pull request May 8, 2026
…pe substantiated

Layer 5 closes the #252 strategy: `bicameral-mcp diagnose` now surfaces
the export → reset → import recipe (Layer 4's portable vehicle) as the
recommended remediation when drift_status ∈ {drift, unavailable} or
when the ledger is >100 MiB. Recipe is operator-facing display text;
diagnose never executes it. Per gating-is-observability discipline.

Reality matches Promise:
- 8 functional tests + 1 content-contract test all pass
- _compute_suggestions at 30 LOC (≤40), file at 248 LOC (≤250)
- _remediation_recipe helper at 12 LOC
- Layer 3 regression: 5/5 _compute_suggestions tests still pass

Logged deviations:
- Per-branch single-line append compaction beyond plan's example
  shapes (lands function at 30 LOC vs plan's 42-LOC projection)
- bv truthy-guard preserves Layer 3's invariant: version-derivative
  suggestions skip when bicameral_version is unknown/missing

Substantively closes #221 substrate (GDPR Art. 17 right-to-erasure
operator workflow) + #222 (GDPR Art. 15 DSAR).

Plan: plan-252-layer-5-diagnose-remediation.md (round 2 PASS;
gitignored per c206ad4).
Audit: PASS round 2 (round 1 razor-overage + spec-drift remediated).

Branch caveat: based on plan/252-layer-4-export-import; will rebase
onto fresh dev once #258 unblocks via #273 + Jin's Flow 1 fix (#272).
@jinhongkuan

Copy link
Copy Markdown
Contributor

Fix 3 (deferred e2e Flow 1) addressed in #276

Picked recommendation #1 from the body — "acceptable behavior change" — and made it canonical. The ingest skill no longer fires an AskUserQuestion ratify gate; it prints a one-line advisory and stops. Auto-bind stays.

Spec change recorded at BicameralAI/bicameral#108 (comment thread): Flow 1's Step 5 (bind) is now unconditional / auto-fired; Step 7 becomes advisory text; Step 8 (auto-call ratify) is removed from Flow 1's headline path. Ratification belongs to Flow 5 (PM Friday review) or to direct user requests in the same turn ("sign these off", "ratify all") — that shortcut is documented in the new Step 7.

#276 is targeted at dev and stacks behind this PR. Once #273 merges, #276 should drop the third red light from PR #258's CI.

@jinhongkuan jinhongkuan merged commit d7e7357 into dev May 8, 2026
6 checks passed
jinhongkuan pushed a commit that referenced this pull request May 8, 2026
…Fix 3)

Closes the third regression deferred from PR #273. Post-#263 (sync
auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral
Read/Grep/validate_symbols calls before reaching the ingest skill's
auto-fired ratify AskUserQuestion gate, so the agent never invoked
ratify.

Per #108 Flow 1 spec discussion: this is both a regression AND a
canonical flow change. Auto-bind stays (deterministic, useful), but
ratify drops out of the auto-prompt path and becomes advisory text.
Ratification belongs to Flow 5 (PM Friday review) and to direct user
requests like "sign these off" / "ratify all".

Skill changes (skills/bicameral-ingest/SKILL.md Step 7):

- Replace AskUserQuestion ratify gate with a one-block advisory:
  "○ N decisions captured as proposals — drift tracking activates
   after ratification. Run `bicameral.ratify` when ready, or revisit
   them in your next history review (Flow 5)."
- Add explicit "Direct user request shortcut": if the prompt asked
  for sign-off in the same turn, ratify directly without the round-trip.
- Update the example at the bottom to match.

Test changes:

- tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end"
  so the prompt exercises the new advisory path. Direct sign-off
  requests are covered by Flow 5 (history + ratify).
- tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify
  requirement; accept [ingest, bind?] as the canonical Flow 1
  signature. Update Flow 5's stale comment about Flow 1 pre-ratifying.
- tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify
  → test_flow1_passes_without_ratify (advisory-only is now the
  expected behavior).

Doesn't touch sync skill #263 — that auto-bind change is preserved
intentionally; this PR fixes the test/spec contract that conflicted
with it.

Refs #272 #108 #263.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 8, 2026
…Fix 3)

Closes the third regression deferred from PR #273. Post-#263 (sync
auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral
Read/Grep/validate_symbols calls before reaching the ingest skill's
auto-fired ratify AskUserQuestion gate, so the agent never invoked
ratify.

Per #108 Flow 1 spec discussion: this is both a regression AND a
canonical flow change. Auto-bind stays (deterministic, useful), but
ratify drops out of the auto-prompt path and becomes advisory text.
Ratification belongs to Flow 5 (PM Friday review) and to direct user
requests like "sign these off" / "ratify all".

Skill changes (skills/bicameral-ingest/SKILL.md Step 7):

- Replace AskUserQuestion ratify gate with a one-block advisory:
  "○ N decisions captured as proposals — drift tracking activates
   after ratification. Run `bicameral.ratify` when ready, or revisit
   them in your next history review (Flow 5)."
- Add explicit "Direct user request shortcut": if the prompt asked
  for sign-off in the same turn, ratify directly without the round-trip.
- Update the example at the bottom to match.

Test changes:

- tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end"
  so the prompt exercises the new advisory path. Direct sign-off
  requests are covered by Flow 5 (history + ratify).
- tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify
  requirement; accept [ingest, bind?] as the canonical Flow 1
  signature. Update Flow 5's stale comment about Flow 1 pre-ratifying.
- tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify
  → test_flow1_passes_without_ratify (advisory-only is now the
  expected behavior).

Doesn't touch sync skill #263 — that auto-bind change is preserved
intentionally; this PR fixes the test/spec contract that conflicted
with it.

Refs #272 #108 #263.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 14, 2026
…residual)

Aligns preflight-eval.yml with the test-mcp-regression.yml pin landed in
PR #273. Closes the last of #272's three CI-baseline regressions —
preflight-eval was the only remaining consumer of the unpinned mutable
@v2 tag whose published artifact silently broke between PR #257 and
PR #258 (index.js missing from the action's bundled output).

Same SHA (31493c76ec9e7aa675f1585d3ed6f1da69269a86, v2.4) used in
test-mcp-regression.yml:213 so a future bump is one grep-and-replace.

Per docs/policies/install-trust-model.md (OWASP A06 supply-chain
discipline): no GitHub Action runs in our CI from a mutable tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working infra Infrastructure / build / CI / repo-admin work P1 High: ship this milestone; user-impacting bug or committed feature test Test infrastructure, fixtures, or coverage work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants