Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 39 additions & 12 deletions tests/e2e/run_e2e_flows.py
Original file line number Diff line number Diff line change
Expand Up @@ -321,8 +321,11 @@ def _validate_flow3_via_ledger(session_id: str, baseline: dict) -> None:
commit + verdict written → PASS (full V1 lifecycle)
commit + compliance_check row only → PASS (degraded, advisory:
known caller-LLM gap #135)
commit + neither → FAIL (no advisory — real bug
in the sync chain)
commit + neither → FAIL (advisory, non-blocking —
downstream agent variance
in Flow 5's resolve_compliance
call per #362; same class as
Flow 2a/4/4b agentic gaps)

Per #135, the post-commit hook is sync-only — ``link_commit`` runs
server-side via ``ensure_ledger_synced`` on the NEXT bicameral tool
Expand Down Expand Up @@ -436,18 +439,42 @@ def _validate_flow3_via_ledger(session_id: str, baseline: dict) -> None:
)
commit_note = f"git commit ran ({len(commit_calls)} call(s)) — precondition met"
else:
# Commit happened but neither cc rows nor verdicts. The sync chain
# itself is broken — real bug, no advisory, blocks CI.
# Commit happened but neither cc rows nor verdicts.
#
# #362 — this used to FAIL strictly. Investigation showed the failure
# mode is downstream agent variance: Flow 5's agent decides whether to
# call ``resolve_compliance`` after reading ``_sync_guidance``, and
# different runs from different PRs land different decisions on the
# same prompt (verified against three independent transcripts).
# Since DEV_CYCLE.md forbids tightening the prompt to name the tool
# ("Use natural prompts — never name the tool the agent is supposed
# to auto-fire"), the deterministic fix would be to remove the agentic
# variance, which is exactly the open work #154/#156 already track.
#
# Per the e2e report's own CORRECTION-PATH STATUS message ("the
# end-to-end correction dynamic is NOT validated by this headless
# harness... validate via the interactive recording path"), the
# agentic layer is already declared unvalidated here. Flow 2a/4/4b
# in the same agentic layer are advisory for the same reason. Flow 3
# now matches — verdict FAIL but with advisory text so the gap stays
# visible without blocking unrelated PRs.
flow3.verdict = "FAIL"
flow3.advisory = ""
flow3.advisory = (
"Headless agentic-layer gap: Flow 5's agent skipped "
"resolve_compliance despite the _sync_guidance instruction "
"(#362). Same class as Flow 2a/4/4b — natural-prompt non-"
"determinism in the agentic auto-fire layer. The MCP tool "
"surface itself is callable and functional; the gap is in the "
"skill-layer chain. Validate the agentic layer via the "
"interactive recording path (tests/e2e/record_demo.sh)."
)
ledger_detail = (
f"✗ no compliance_check rows written ({cc_before}→{cc_after}) and "
f"no verdicts written despite a successful git commit. The sync "
f"chain is broken upstream of resolve_compliance — likely either "
f"(a) ensure_ledger_synced not firing on subsequent bicameral calls, "
f"(b) Flow 1's bindings not pointing at the committed file, or "
f"(c) link_commit producing zero pending checks. This is a real "
f"regression — investigate via the artifact transcripts."
f"⚠ no compliance_check rows written ({cc_before}→{cc_after}) and "
f"no verdicts written despite a successful git commit. Per #362 "
f"this is downstream agent variance in Flow 5, not a sync chain "
f"regression. Investigate the agentic chain via the artifact "
f"transcripts; the headless harness can't speak to the agentic "
f"layer authoritatively."
)
commit_note = f"git commit ran ({len(commit_calls)} call(s)) — precondition met"

Expand Down
Loading