Skip to content

tools(github): poll-pr-gate v1 — required-vs-non-required check classification (Amara hardening)#923

Merged
AceHack merged 3 commits intomainfrom
ops/poll-pr-gate-v1-required-check-classification-2026-04-30
Apr 30, 2026
Merged

tools(github): poll-pr-gate v1 — required-vs-non-required check classification (Amara hardening)#923
AceHack merged 3 commits intomainfrom
ops/poll-pr-gate-v1-required-check-classification-2026-04-30

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented Apr 30, 2026

Summary

Closes the biggest remaining bug Amara flagged on PR #921's v0 tool: it conflated "failed check exists" with "failed required gate exists," producing spurious nextAction=fix-failed-checks for non-required transient flakes (e.g. submit-nuget) even while the actual merge gate was clean and waiting on required checks.

Amara's carved blade:

A failed check is not automatically a failed gate.

What's load-bearing

  1. fetchPR queries gh pr checks --required --json name to get the required-check name set. Non-fatal on failure (falls back to v0 semantics — conservative).
  2. requiredChecks field added to GateReport — same counts shape as checks, restricted to required-only.
  3. warnings field added — string array of non-required failures (e.g. "non-required check failed: submit-nuget").
  4. classifyGate + nextAction use required-only counts for failure detection. Non-required failures surface in warnings but don't gate merge.
  5. New regression fixture non-required-failure-warning.json demonstrates: required=clean + non-required=FAILURE → gate=BLOCKED (from mergeStateStatus), nextAction=wait-ci, warnings=["non-required check failed: submit-nuget"].

Verification

  • Live: bun tools/github/poll-pr-gate.ts 921 correctly reports 7 required vs 24 total checks (filtered).
  • New fixture matches Amara's specified shape.
  • All 4 prior fixtures (clean-armed-auto-merge, blocked-by-threads, status-context-error, behind-needs-rebase) pass without regression.
  • bunx tsc --noEmit -p . clean.

Backward compatibility

The checks field still reports all-checks counts. New requiredChecks + warnings fields are purely additive. Consumers reading nextAction get more-accurate results without any code changes.

Verbatim Amara packet preserved at

docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md — full 7 findings with integration outcomes per finding. Per Otto-363 substrate-or-it-didn't-happen.

Findings deferred (per substrate-rate, queued under task #355)

  • Layered action model (gateAction / maintenanceAction / reviewAction / mergeAction) — minimal subset (warnings) shipped; full layering deferred until consumers need it.
  • Local preflight script for memory edits — proper tool-build job.
  • Additional regression fixtures (stale-check / huge-payload / missing-fixture) — first one shipped; rest queued.

Test plan

  • markdownlint clean (research doc)
  • tsc clean (poll-pr-gate.ts)
  • All 5 fixtures classify correctly
  • Live PR mode works against merged + open PRs
  • CI green

🤖 Generated with Claude Code

…ification (Amara hardening)

Closes the biggest remaining bug Amara flagged on the v0 tool: it
conflated "failed check exists" with "failed required gate exists,"
so non-required transient flakes (e.g. submit-nuget on this repo)
produced spurious nextAction=fix-failed-checks even while the actual
merge gate was clean and waiting on required checks.

Amara's blade:

    A failed check is not automatically a failed gate.

## Changes

1. **`fetchPR` now also queries `gh pr checks --required --json name`**
   to obtain the required-check name set. Failure is non-fatal: parse
   errors fall back to v0 semantics (treat all checks as required —
   conservative) rather than crashing.

2. **GateReport gains two fields**:
   - `requiredChecks: { ok, inProgress, pending, failed }` — counts
     restricted to the required-check name set.
   - `warnings: string[]` — diagnostic notes about non-required
     failures (e.g. "non-required check failed: submit-nuget").

3. **`classifyGate` and `nextAction` now use required-only counts**
   for failure detection. Non-required failures don't gate; they
   surface in `warnings`.

4. **`classifyChecks` accepts an optional filter** so the same
   classification logic produces both the all-checks summary and
   the required-only summary without code duplication.

5. **New regression fixture `non-required-failure-warning.json`**
   demonstrates the locked-in v1 behavior: required=clean +
   non-required=FAILURE → gate=BLOCKED (mergeStateStatus drives this),
   nextAction=wait-ci (required still in-progress), warnings=[
   "non-required check failed: submit-nuget"].

## Verification

Live: `bun tools/github/poll-pr-gate.ts 921` (now-merged) reports
state=MERGED, gate=CLEAN, requiredChecks separate from total checks
(7 required vs 24 total — submit-nuget and other non-required
checks correctly filtered).

Fixture: `non-required-failure-warning.json` produces the exact
shape Amara specified — required-only counts drive nextAction,
non-required failure becomes a warning string.

All four prior fixtures (`clean-armed-auto-merge`,
`blocked-by-threads`, `status-context-error`, `behind-needs-rebase`)
still pass without regression.

## Backward compatibility

The `checks` field still reports all-checks counts. The new
`requiredChecks` and `warnings` fields are additive. Consumers
that read `nextAction` get more accurate results without code
changes.

## Verbatim Amara packet preserved at

`docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md` —
includes the seven findings, integration outcomes per finding,
and carved blades. Per Otto-363 substrate-or-it-didn't-happen.

## Findings deferred (per substrate-rate)

- Layered action model (gateAction / maintenanceAction / etc) —
  minimal subset shipped (warnings array); full layering deferred.
- Local preflight script for memory edits — proper tool-build job,
  composes with task #355 matrix coverage.
- Additional regression fixtures (stale-check / huge-payload /
  missing-fixture) — first regression-from-review-finding shipped
  here; rest queued.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 16:43
@AceHack AceHack enabled auto-merge (squash) April 30, 2026 16:43
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb6f78a762

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/github/poll-pr-gate.ts Outdated
Codex caught a real bug in v1: my code only treated `gh pr checks`
exit 0 as parseable JSON, but the manual documents exit 4 (checks
failed) and exit 8 (checks pending) as the same shape — JSON output
is valid; the exit code is just a status hint.

The v1 fix targeted the case where required checks are still
pending and a non-required check failed. In that exact case,
`gh pr checks --required` returns exit 8, my code skipped
parsing, requiredCheckNames stayed empty, buildReport fell back
to v0 semantics, and the bug persisted.

Per `gh help exit-codes` for `gh pr checks`:
- 0 OK
- 1 invalid invocation
- 2 gh CLI / auth error
- 4 one or more checks failed (JSON still valid)
- 8 checks pending (JSON still valid)

Now accepting 0/4/8 as parseable. Exits 1/2 still fall back to
v0 semantics.

Verified against live PR #923 (this PR — required checks
in-progress): the tool now correctly distinguishes total checks
from required-only checks even mid-CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
auto-merge was automatically disabled April 30, 2026 16:47

Pull Request is not mergeable

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens tools/github/poll-pr-gate.ts by distinguishing failures in required merge-gate checks from failures in non-required checks, reducing false “fix-failed-checks” actions for transient/non-gating CI failures.

Changes:

  • Add required-check metadata (requiredCheckNames) fetched via gh pr checks --required, and compute requiredChecks separately from all checks.
  • Classify nextAction/gate-failure based on required-only counts, while surfacing non-required failures as warnings.
  • Add a regression fixture for “non-required failure should warn, not gate-fail”, and preserve the motivating feedback packet in docs/research/**.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
tools/github/poll-pr-gate.ts Adds required-vs-non-required check separation, warnings, and required-check fetch via gh pr checks --required.
tools/github/fixtures/non-required-failure-warning.json New fixture covering non-required failure warning behavior.
docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md Preserves the verbatim feedback packet motivating the v1 hardening changes.

Comment thread tools/github/fixtures/non-required-failure-warning.json
Comment thread tools/github/poll-pr-gate.ts
Comment thread tools/github/poll-pr-gate.ts
Comment thread tools/github/poll-pr-gate.ts Outdated
@AceHack AceHack enabled auto-merge (squash) April 30, 2026 16:48
…ing corrections

Four Copilot v1 review threads addressed:

1. **Persona names removed from current-state code** (Otto-279) —
   "Amara" attributions in code comments replaced with role-refs
   ("v1 hardening per peer review 2026-04-30"). Full attribution
   stays in the docs/research/ packet, not in the script.

2. **Top-of-file output-shape comment + v0 note updated** to match
   v1 emitted shape (`requiredChecks` + `warnings` documented;
   v0/v1 lifecycle clarified).

3. **`requiredCheckNames` tri-state semantics** — was empty-array-
   means-fallback; Copilot correctly noted that conflated "fetch
   failed" with "fetched, zero required checks." Now `undefined`
   means unknown/fallback; defined (even empty) means trusted
   ground-truth. fetchPR sets `undefined` on parse failure.

4. **PR description regression-fixture wording** corrected — fixture
   has IN_PROGRESS required, not "clean" (the demo is no failed
   required + non-required failure → wait-ci, not all-clean).

Plus framing corrections from a Claude.ai + Aaron packet preserved
verbatim at docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md
(appended section):

- Aaron: best-practices-mapping has *always* been substrate work,
  not "now substrate work" — the prior commit message's framing
  was wrong.
- Claude.ai: Insight-block pattern has been escalating into
  self-validation ritual; two-condition rule going forward
  (generalizes beyond current case AND not already documented in
  canonical substrate). Going forward: produce the work, let the
  diff carry the evidence.

Verified: tsc clean, all 5 fixtures classify correctly, live
PR #923 self-test shows tri-state semantics working
(`requiredChecks` filtered correctly when fetch succeeds).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack merged commit 217afb8 into main Apr 30, 2026
24 checks passed
@AceHack AceHack deleted the ops/poll-pr-gate-v1-required-check-classification-2026-04-30 branch April 30, 2026 16:55
AceHack added a commit that referenced this pull request Apr 30, 2026
Final feedback packet from Deepseek post-PR #924 merge. Most
findings already shipped:

- submit-nuget transient → PR #923 (v1 hardening, required-vs-
  non-required classification)
- MEMORY.md merge-conflict tax → PR #920 (merge=union driver)
- Stale project-file internals cleanup → B-0112 P2 row filed

New finding: 30+ dot threshold for deferred-task re-audit (not
new lanes, just already-scoped tiny fixes). Composes with Ani's
strict-enforcement framing.

Per Otto-363 substrate-or-it-didn't-happen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 30, 2026
* research: preserve Ani + Alexia v1 feedback packets verbatim

Both peer-AI reviewers responded after PR #921 (poll-pr-gate v0)
+ PR #922 (memory-points-at-script) merged. Per Otto-363
substrate-or-it-didn't-happen, preserving both packets verbatim
at `docs/research/2026-04-30-amara-poll-pr-gate-v1-hardening.md`.

Both packets predominantly "what's working" with smaller actionable
findings. Substantive items overlap with PR #923 (v1 hardening,
already on main) or queued under existing tasks:

- "submit-nuget non-required classification" (both reviewers) —
  shipped in PR #923.
- "Dot-tick discipline still leaky" (Ani #1) — accepted as
  behavior change going forward (no code, no substrate; commitment).
- "Pre-merge mechanical guards" (Ani #3) — persona-name scanner +
  fixture-name validator composed with task #350 (Otto-357
  mechanized auditor) and task #355 (poll-the-gate matrix
  coverage). Defer per substrate-rate.
- "MEMORY.md duplicate-link timing" (Ani #4) — audit candidate;
  the check fired on PR #922 but missed PR #916. Defer.
- "Task list at 58 open" (Ani #5) — pending-task audit overdue;
  composes with task #321 recovery lane.
- "Thread categorization" + "intelligent compaction" (Alexia) —
  research-grade, not yet operational.

No Insight-block commentary added per the discipline accepted in
the prior Claude.ai packet absorption: produce the work, let the
diff carry the evidence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(research): mark check-github-status path as in-flight at PR #924 (Copilot)

Same in-flight-xref pattern caught earlier this session — code-span
path implied existence-on-main, but the file is on PR #924's branch.
Reworded to make the in-flight status explicit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* research: preserve Deepseek session-arc retrospective packet

Final feedback packet from Deepseek post-PR #924 merge. Most
findings already shipped:

- submit-nuget transient → PR #923 (v1 hardening, required-vs-
  non-required classification)
- MEMORY.md merge-conflict tax → PR #920 (merge=union driver)
- Stale project-file internals cleanup → B-0112 P2 row filed

New finding: 30+ dot threshold for deferred-task re-audit (not
new lanes, just already-scoped tiny fixes). Composes with Ani's
strict-enforcement framing.

Per Otto-363 substrate-or-it-didn't-happen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants