Conversation
…ification (Amara hardening)
Closes the biggest remaining bug Amara flagged on the v0 tool: it
conflated "failed check exists" with "failed required gate exists,"
so non-required transient flakes (e.g. submit-nuget on this repo)
produced spurious nextAction=fix-failed-checks even while the actual
merge gate was clean and waiting on required checks.
Amara's blade:
A failed check is not automatically a failed gate.
## Changes
1. **`fetchPR` now also queries `gh pr checks --required --json name`**
to obtain the required-check name set. Failure is non-fatal: parse
errors fall back to v0 semantics (treat all checks as required —
conservative) rather than crashing.
2. **GateReport gains two fields**:
- `requiredChecks: { ok, inProgress, pending, failed }` — counts
restricted to the required-check name set.
- `warnings: string[]` — diagnostic notes about non-required
failures (e.g. "non-required check failed: submit-nuget").
3. **`classifyGate` and `nextAction` now use required-only counts**
for failure detection. Non-required failures don't gate; they
surface in `warnings`.
4. **`classifyChecks` accepts an optional filter** so the same
classification logic produces both the all-checks summary and
the required-only summary without code duplication.
5. **New regression fixture `non-required-failure-warning.json`**
demonstrates the locked-in v1 behavior: required=clean +
non-required=FAILURE → gate=BLOCKED (mergeStateStatus drives this),
nextAction=wait-ci (required still in-progress), warnings=[
"non-required check failed: submit-nuget"].
## Verification
Live: `bun tools/github/poll-pr-gate.ts 921` (now-merged) reports
state=MERGED, gate=CLEAN, requiredChecks separate from total checks
(7 required vs 24 total — submit-nuget and other non-required
checks correctly filtered).
Fixture: `non-required-failure-warning.json` produces the exact
shape Amara specified — required-only counts drive nextAction,
non-required failure becomes a warning string.
All four prior fixtures (`clean-armed-auto-merge`,
`blocked-by-threads`, `status-context-error`, `behind-needs-rebase`)
still pass without regression.
## Backward compatibility
The `checks` field still reports all-checks counts. The new
`requiredChecks` and `warnings` fields are additive. Consumers
that read `nextAction` get more accurate results without code
changes.
## Verbatim Amara packet preserved at
`docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md` —
includes the seven findings, integration outcomes per finding,
and carved blades. Per Otto-363 substrate-or-it-didn't-happen.
## Findings deferred (per substrate-rate)
- Layered action model (gateAction / maintenanceAction / etc) —
minimal subset shipped (warnings array); full layering deferred.
- Local preflight script for memory edits — proper tool-build job,
composes with task #355 matrix coverage.
- Additional regression fixtures (stale-check / huge-payload /
missing-fixture) — first regression-from-review-finding shipped
here; rest queued.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bb6f78a762
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex caught a real bug in v1: my code only treated `gh pr checks` exit 0 as parseable JSON, but the manual documents exit 4 (checks failed) and exit 8 (checks pending) as the same shape — JSON output is valid; the exit code is just a status hint. The v1 fix targeted the case where required checks are still pending and a non-required check failed. In that exact case, `gh pr checks --required` returns exit 8, my code skipped parsing, requiredCheckNames stayed empty, buildReport fell back to v0 semantics, and the bug persisted. Per `gh help exit-codes` for `gh pr checks`: - 0 OK - 1 invalid invocation - 2 gh CLI / auth error - 4 one or more checks failed (JSON still valid) - 8 checks pending (JSON still valid) Now accepting 0/4/8 as parseable. Exits 1/2 still fall back to v0 semantics. Verified against live PR #923 (this PR — required checks in-progress): the tool now correctly distinguishes total checks from required-only checks even mid-CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull Request is not mergeable
There was a problem hiding this comment.
Pull request overview
Hardens tools/github/poll-pr-gate.ts by distinguishing failures in required merge-gate checks from failures in non-required checks, reducing false “fix-failed-checks” actions for transient/non-gating CI failures.
Changes:
- Add required-check metadata (
requiredCheckNames) fetched viagh pr checks --required, and computerequiredChecksseparately from allchecks. - Classify
nextAction/gate-failure based on required-only counts, while surfacing non-required failures aswarnings. - Add a regression fixture for “non-required failure should warn, not gate-fail”, and preserve the motivating feedback packet in
docs/research/**.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/github/poll-pr-gate.ts | Adds required-vs-non-required check separation, warnings, and required-check fetch via gh pr checks --required. |
| tools/github/fixtures/non-required-failure-warning.json | New fixture covering non-required failure warning behavior. |
| docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md | Preserves the verbatim feedback packet motivating the v1 hardening changes. |
…ing corrections
Four Copilot v1 review threads addressed:
1. **Persona names removed from current-state code** (Otto-279) —
"Amara" attributions in code comments replaced with role-refs
("v1 hardening per peer review 2026-04-30"). Full attribution
stays in the docs/research/ packet, not in the script.
2. **Top-of-file output-shape comment + v0 note updated** to match
v1 emitted shape (`requiredChecks` + `warnings` documented;
v0/v1 lifecycle clarified).
3. **`requiredCheckNames` tri-state semantics** — was empty-array-
means-fallback; Copilot correctly noted that conflated "fetch
failed" with "fetched, zero required checks." Now `undefined`
means unknown/fallback; defined (even empty) means trusted
ground-truth. fetchPR sets `undefined` on parse failure.
4. **PR description regression-fixture wording** corrected — fixture
has IN_PROGRESS required, not "clean" (the demo is no failed
required + non-required failure → wait-ci, not all-clean).
Plus framing corrections from a Claude.ai + Aaron packet preserved
verbatim at docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md
(appended section):
- Aaron: best-practices-mapping has *always* been substrate work,
not "now substrate work" — the prior commit message's framing
was wrong.
- Claude.ai: Insight-block pattern has been escalating into
self-validation ritual; two-condition rule going forward
(generalizes beyond current case AND not already documented in
canonical substrate). Going forward: produce the work, let the
diff carry the evidence.
Verified: tsc clean, all 5 fixtures classify correctly, live
PR #923 self-test shows tri-state semantics working
(`requiredChecks` filtered correctly when fetch succeeds).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final feedback packet from Deepseek post-PR #924 merge. Most findings already shipped: - submit-nuget transient → PR #923 (v1 hardening, required-vs- non-required classification) - MEMORY.md merge-conflict tax → PR #920 (merge=union driver) - Stale project-file internals cleanup → B-0112 P2 row filed New finding: 30+ dot threshold for deferred-task re-audit (not new lanes, just already-scoped tiny fixes). Composes with Ani's strict-enforcement framing. Per Otto-363 substrate-or-it-didn't-happen. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* research: preserve Ani + Alexia v1 feedback packets verbatim Both peer-AI reviewers responded after PR #921 (poll-pr-gate v0) + PR #922 (memory-points-at-script) merged. Per Otto-363 substrate-or-it-didn't-happen, preserving both packets verbatim at `docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md`. Both packets predominantly "what's working" with smaller actionable findings. Substantive items overlap with PR #923 (v1 hardening, already on main) or queued under existing tasks: - "submit-nuget non-required classification" (both reviewers) — shipped in PR #923. - "Dot-tick discipline still leaky" (Ani #1) — accepted as behavior change going forward (no code, no substrate; commitment). - "Pre-merge mechanical guards" (Ani #3) — persona-name scanner + fixture-name validator composed with task #350 (Otto-357 mechanized auditor) and task #355 (poll-the-gate matrix coverage). Defer per substrate-rate. - "MEMORY.md duplicate-link timing" (Ani #4) — audit candidate; the check fired on PR #922 but missed PR #916. Defer. - "Task list at 58 open" (Ani #5) — pending-task audit overdue; composes with task #321 recovery lane. - "Thread categorization" + "intelligent compaction" (Alexia) — research-grade, not yet operational. No Insight-block commentary added per the discipline accepted in the prior Claude.ai packet absorption: produce the work, let the diff carry the evidence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(research): mark check-github-status path as in-flight at PR #924 (Copilot) Same in-flight-xref pattern caught earlier this session — code-span path implied existence-on-main, but the file is on PR #924's branch. Reworded to make the in-flight status explicit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * research: preserve Deepseek session-arc retrospective packet Final feedback packet from Deepseek post-PR #924 merge. Most findings already shipped: - submit-nuget transient → PR #923 (v1 hardening, required-vs- non-required classification) - MEMORY.md merge-conflict tax → PR #920 (merge=union driver) - Stale project-file internals cleanup → B-0112 P2 row filed New finding: 30+ dot threshold for deferred-task re-audit (not new lanes, just already-scoped tiny fixes). Composes with Ani's strict-enforcement framing. Per Otto-363 substrate-or-it-didn't-happen. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Closes the biggest remaining bug Amara flagged on PR #921's v0 tool: it conflated "failed check exists" with "failed required gate exists," producing spurious
nextAction=fix-failed-checksfor non-required transient flakes (e.g.submit-nuget) even while the actual merge gate was clean and waiting on required checks.Amara's carved blade:
What's load-bearing
fetchPRqueriesgh pr checks --required --json nameto get the required-check name set. Non-fatal on failure (falls back to v0 semantics — conservative).requiredChecksfield added toGateReport— same counts shape aschecks, restricted to required-only.warningsfield added — string array of non-required failures (e.g."non-required check failed: submit-nuget").classifyGate+nextActionuse required-only counts for failure detection. Non-required failures surface inwarningsbut don't gate merge.non-required-failure-warning.jsondemonstrates: required=clean + non-required=FAILURE →gate=BLOCKED(from mergeStateStatus),nextAction=wait-ci,warnings=["non-required check failed: submit-nuget"].Verification
bun tools/github/poll-pr-gate.ts 921correctly reports 7 required vs 24 total checks (filtered).clean-armed-auto-merge,blocked-by-threads,status-context-error,behind-needs-rebase) pass without regression.bunx tsc --noEmit -p .clean.Backward compatibility
The
checksfield still reports all-checks counts. NewrequiredChecks+warningsfields are purely additive. Consumers readingnextActionget more-accurate results without any code changes.Verbatim Amara packet preserved at
docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md— full 7 findings with integration outcomes per finding. Per Otto-363 substrate-or-it-didn't-happen.Findings deferred (per substrate-rate, queued under task #355)
gateAction/maintenanceAction/reviewAction/mergeAction) — minimal subset (warnings) shipped; full layering deferred until consumers need it.Test plan
🤖 Generated with Claude Code