fix: integrate verdict_extract.py into verify-to-new-pr workflow#1434

Merged
stranske merged 4 commits into main from fix/verdict-policy-workflow-integration on Feb 10, 2026

stranske (Owner) commented on Feb 10, 2026

Source: Issue #1429

Automated Status Summary

Scope

PR #1419 addressed issue #1416, but verification raised concerns (CONCERNS) due to non-deterministic verdict selection and inconsistent split-verdict handling across the workflow and follow-up generation. This follow-up closes the remaining gaps by routing all verdict extraction/classification through the shared deterministic policy in verdict_policy.py, adding a machine-readable needs_human gate for low-confidence split verdicts, and validating end-to-end behavior with integration tests.

Context for Agent

Related Issues/PRs

Tasks

  • Update .github/workflows/agents-verify-to-new-pr.yml to run a Python entrypoint/module that imports verdict_policy.py and emits structured step outputs (at minimum: verdict, needs_human, and any required metadata) instead of using first-table-row regex match() parsing.
    • Create a Python entrypoint module that imports verdict_policy and parses provider markdown input (verify: confirm completion in repo)
    • Define the structured output format including verdict (verify: formatter passes)
    • Define the structured output format including needs_human (verify: formatter passes)
    • Define the structured output format including metadata fields (verify: formatter passes)
    • Define scope for: Emit the structured outputs in a format consumable by GitHub Actions step outputs (verify: formatter passes)
    • Implement focused slice for: Emit the structured outputs in a format consumable by GitHub Actions step outputs (verify: formatter passes)
    • Validate focused slice for: Emit the structured outputs in a format consumable by GitHub Actions step outputs (verify: formatter passes)
    • Document the workflow integration requirements for manual implementation by a maintainer (verify: confirm completion in repo)
  • Implement or update a single importable API in verdict_policy.py that deterministically selects the verdict and returns machine-readable metadata including needs_human, enforcing: for split verdicts (PASS+CONCERNS), set needs_human = True only when CONCERNS confidence is strictly < 0.85 and False otherwise.
    • Define the return type structure for the verdict policy API including verdict (verify: confirm completion in repo)
    • Define scope for: Define the return type structure for the verdict policy API including metadata fields (verify: confirm completion in repo)
    • Implement focused slice for: Define the return type structure for the verdict policy API including metadata fields (verify: confirm completion in repo)
    • Validate focused slice for: Define the return type structure for the verdict policy API including metadata fields (verify: confirm completion in repo)
    • Implement the deterministic verdict selection logic that handles split verdicts consistently (verify: confirm completion in repo)
    • Define scope for: Add the needs_human calculation logic based on CONCERNS confidence threshold of 0.85 (verify: confirm completion in repo)
    • Implement focused slice for: Add the needs_human calculation logic based on CONCERNS confidence threshold of 0.85 (verify: confirm completion in repo)
    • Validate focused slice for: Add the needs_human calculation logic based on CONCERNS confidence threshold of 0.85 (verify: confirm completion in repo)
    • Define scope for: Define the confidence threshold constant in verdict_policy.py as the single source of truth (verify: confirm completion in repo)
    • Implement focused slice for: Define the confidence threshold constant in verdict_policy.py as the single source of truth (verify: confirm completion in repo)
    • Validate focused slice for: Define the confidence threshold constant in verdict_policy.py as the single source of truth (verify: confirm completion in repo)
  • Update the verification-to-PR pipeline code path to consume the verdict_policy.py API output (verdict + metadata) for both (a) verdict extraction in the workflow step and (b) follow-up classification, removing any secondary parsing/decision logic after policy evaluation.
    • Define scope for: Identify all locations in the codebase that perform verdict extraction or classification logic (verify: confirm completion in repo)
    • Implement focused slice for: Identify all locations in the codebase that perform verdict extraction or classification logic (verify: confirm completion in repo)
    • Validate focused slice for: Identify all locations in the codebase that perform verdict extraction or classification logic (verify: confirm completion in repo)
    • Define scope for: Update follow-up classification code to consume verdict_policy.py API instead of local logic (verify: confirm completion in repo)
    • Implement focused slice for: Update follow-up classification code to consume verdict_policy.py API instead of local logic (verify: confirm completion in repo)
    • Validate focused slice for: Update follow-up classification code to consume verdict_policy.py API instead of local logic (verify: confirm completion in repo)
    • Remove secondary parsing/decision logic from all non-policy files (verify: confirm completion in repo)
    • Document the workflow step integration requirements for manual implementation (verify: confirm completion in repo)
  • Refactor followup_issue_generator.py to remove duplicated/parallel verdict policy logic (including split-verdict resolution and confidence gating) and instead import and use the shared verdict_policy.py API, propagating needs_human unchanged into follow-up output.
  • Add integration tests that feed provider markdown/table inputs yielding split verdicts (PASS+CONCERNS) and assert both the workflow-level extraction path and follow-up generation path return identical deterministic verdict and needs_human for: (1) CONCERNS confidence >= 0.85, (2) CONCERNS confidence < 0.85, and (3) swapped provider table row order.
    • Define scope for: Create test fixtures with provider markdown tables containing split verdicts at various confidence levels (verify: confirm completion in repo)
    • Implement focused slice for: Create test fixtures with provider markdown tables containing split verdicts at various confidence levels (verify: confirm completion in repo)
    • Validate focused slice for: Create test fixtures with provider markdown tables containing split verdicts at various confidence levels (verify: confirm completion in repo)
    • Define scope for: Write integration test for split verdict with CONCERNS confidence greater than or equal to 0.85 (verify: confirm completion in repo)
    • Implement focused slice for: Write integration test for split verdict with CONCERNS confidence greater than or equal to 0.85 (verify: confirm completion in repo)
    • Validate focused slice for: Write integration test for split verdict with CONCERNS confidence greater than or equal to 0.85 (verify: confirm completion in repo)
    • Define scope for: Write integration test for split verdict with CONCERNS confidence strictly less than 0.85 (verify: confirm completion in repo)
    • Implement focused slice for: Write integration test for split verdict with CONCERNS confidence strictly less than 0.85 (verify: confirm completion in repo)
    • Validate focused slice for: Write integration test for split verdict with CONCERNS confidence strictly less than 0.85 (verify: confirm completion in repo)
    • Write integration test verifying row-order invariance by swapping provider table rows (verify: confirm completion in repo)
    • Define scope for: Write integration test for the workflow-level extraction path using the verdict policy API (verify: confirm completion in repo)
    • Implement focused slice for: Write integration test for the workflow-level extraction path using the verdict policy API (verify: confirm completion in repo)
    • Validate focused slice for: Write integration test for the workflow-level extraction path using the verdict policy API (verify: confirm completion in repo)
    • Define scope for: Write integration test for the follow-up generation path using the verdict policy API (verify: confirm completion in repo)
    • Implement focused slice for: Write integration test for the follow-up generation path using the verdict policy API (verify: confirm completion in repo)
    • Validate focused slice for: Write integration test for the follow-up generation path using the verdict policy API (verify: confirm completion in repo)
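
The row-order invariance required by the tasks above can be sketched as a parser that normalizes provider rows before any policy is applied. This is a minimal illustration, not the repository's parser: the three-column table layout (`Provider | Verdict | Confidence`), the `ProviderRow` type, and the `parse_provider_table` name are all assumptions made for the example.

```python
from typing import List, NamedTuple


class ProviderRow(NamedTuple):
    """One provider's verdict (hypothetical structure for illustration)."""
    provider: str
    verdict: str
    confidence: float


def parse_provider_table(markdown: str) -> List[ProviderRow]:
    """Parse an assumed `| Provider | Verdict | Confidence |` table.

    Rows are returned sorted, so the result does not depend on the
    order in which providers appear in the markdown.
    """
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3:
            continue
        provider, verdict, confidence = cells
        try:
            value = float(confidence)
        except ValueError:
            continue  # header or separator row
        rows.append(ProviderRow(provider, verdict.upper(), value))
    return sorted(rows, key=lambda row: (row.provider, row.verdict))
```

Sorting at parse time means a first-row selection can no longer leak provider ordering into the verdict, which is the failure mode the regex match() approach had.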

Acceptance criteria

  • .github/workflows/agents-verify-to-new-pr.yml invokes the deterministic verdict extraction API from verdict_policy.py (not a regex match()/first-row selection) and passes verdict and metadata (including needs_human) to downstream steps via step outputs or environment variables.
  • verdict_policy.py exposes a single importable function (or small API surface) that returns a deterministic verdict and metadata including boolean needs_human, where needs_human == true only for split verdicts (PASS+CONCERNS) with CONCERNS confidence strictly < 0.85, and false otherwise.
  • The verification-to-PR pipeline uses the exact same verdict_policy.py API for both (a) extracting verdict/metadata from provider output and (b) driving follow-up classification logic, with no secondary parsing/decision logic applied after policy evaluation.
  • followup_issue_generator.py imports and uses the shared verdict policy from verdict_policy.py and contains no duplicated/parallel logic for split-verdict resolution or the 0.85 CONCERNS confidence gating; follow-up output includes the policy-provided needs_human flag (or equivalent field) unchanged.
  • Integration tests simulate provider markdown/table inputs that yield split verdicts (PASS + CONCERNS) and verify deterministic verdict extraction is invariant to provider table row order (swapping row order produces identical verdict and needs_human).
  • Integration test case A (PASS+CONCERNS with CONCERNS confidence >= 0.85) asserts needs_human == false and asserts workflow-level extraction and follow-up generation produce identical deterministic verdict and metadata (including identical needs_human).
  • Integration test case B (PASS+CONCERNS with CONCERNS confidence < 0.85) asserts needs_human == true and asserts workflow-level extraction and follow-up generation produce identical deterministic verdict and metadata (including identical needs_human).
  • Integration tests enforce the strict boundary: exactly 0.85 CONCERNS confidence in a split verdict yields needs_human == false, and any value strictly below 0.85 yields needs_human == true.
  • No file outside verdict_policy.py contains hard-coded references to the split-verdict confidence threshold (0.85) for gating decisions (excluding tests/fixtures); the threshold is defined/owned by verdict_policy.py and consumed via its API.
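
The needs_human rule in the acceptance criteria can be sketched as follows. This is an illustration under stated assumptions rather than the actual verdict_policy.py API: the `evaluate` function, the `VerdictResult` type, the severity ordering PASS < CONCERNS < FAIL, and the choice to use the highest CONCERNS confidence when several providers report CONCERNS are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Single source of truth for the split-verdict gate (value from the criteria).
CONCERNS_CONFIDENCE_THRESHOLD = 0.85

# Assumed severity ordering for a "worst verdict wins" policy.
_SEVERITY = {"PASS": 0, "CONCERNS": 1, "FAIL": 2}


@dataclass(frozen=True)
class VerdictResult:
    verdict: str
    needs_human: bool
    needs_human_reason: Optional[str] = None


def evaluate(rows: Iterable[Tuple[str, float]]) -> VerdictResult:
    """Select a verdict from (verdict, confidence) pairs, independent of order."""
    rows = list(rows)
    if not rows:
        raise ValueError("no provider verdicts supplied")
    verdicts = {verdict for verdict, _ in rows}
    worst = max(verdicts, key=lambda verdict: _SEVERITY[verdict])

    needs_human = False
    reason = None
    if verdicts == {"PASS", "CONCERNS"}:
        # Assumption: with several CONCERNS rows, use the highest confidence.
        concerns_conf = max(conf for verdict, conf in rows if verdict == "CONCERNS")
        if concerns_conf < CONCERNS_CONFIDENCE_THRESHOLD:  # strict boundary
            needs_human = True
            reason = "split verdict with low-confidence CONCERNS"
    return VerdictResult(worst, needs_human, reason)
```

Because the comparison is strict, a CONCERNS confidence of exactly 0.85 leaves needs_human false, matching the boundary case in the criteria.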

Head SHA: 7ab056d
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result |
| --- | --- |
| Agents PR meta manager | ❔ in progress |
| CI Autofix Loop | ✅ success |
| Gate | ✅ success |
| Health 40 Sweep | ✅ success |
| Health 44 Gate Branch Protection | ❔ in progress |
| Health 45 Agents Guard | ✅ success |
| Health 50 Security Scan | ✅ success |
| Health 72 Template Sync | ✅ success |
| Health 73 Template Completeness | ✅ success |
| Health 74 Template Drift | ✅ success |
| Maint 52 Validate Workflows | ✅ success |
| PR 11 - Minimal invariant CI | ✅ success |
| Selftest CI | ✅ success |
| Validate Sync Manifest | ✅ success |

…te.js

PR #1426 resolved custom glob removal in bot-comment-dismiss.js but
introduced minimatch-lite.js (a copy of the old custom code) used by
merge_manager.js and pr-context-graphql.js. This created inconsistency
(one script using vendored minimatch, two using custom lite version)
and a consumer repo breakage risk since minimatch-lite.js was not in
the templates directory or sync manifest.

Changes:
- merge_manager.js: require('./minimatch-lite.js') → require('minimatch')
- pr-context-graphql.js: require('./minimatch-lite.js') → require('minimatch')
- Delete .github/scripts/minimatch-lite.js (176-line custom reimplementation)
- Update template copies to match

All 767 JS tests pass. Addresses verify:compare FAIL from issue #1425.

Wire the deterministic verdict policy into the agents-verify-to-new-pr
workflow so the workflow and followup_issue_generator.py share the same
verdict_policy.py API.

Changes:
- Add 'extract-verdict' step that runs verdict_extract.py with --policy
  worst --emit github to produce structured step outputs (verdict,
  needs_human, needs_human_reason, etc.)
- Add 'verdict-needs-human' step that labels the PR and linked issue
  with needs-human when the policy flags a split verdict with low
  confidence concerns
- Update fallback step to prefer EXTRACTED_VERDICT over regex parsing
- Gate follow-up issue generation on verdict-needs-human not having
  triggered (avoids auto-creating issues for PRs that need human
  review)
- Sync template copy

Closes #1427
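
For readers unfamiliar with how a script produces step outputs, the sketch below shows the general mechanism: GitHub Actions reads `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable. It does not reproduce verdict_extract.py; the function names and the decision to reject multi-line values are assumptions for illustration.

```python
import os
from typing import List, Mapping, Optional


def format_output_lines(outputs: Mapping[str, object]) -> List[str]:
    """Render outputs as `name=value` lines; booleans become 'true'/'false'."""
    lines = []
    for key, value in outputs.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        text = str(value)
        if "\n" in text:
            # Multi-line values need the heredoc-style delimiter syntax,
            # which this sketch deliberately does not implement.
            raise ValueError(f"multi-line value for {key!r} is not supported")
        lines.append(f"{key}={text}")
    return lines


def emit_github_outputs(outputs: Mapping[str, object],
                        path: Optional[str] = None) -> None:
    """Append outputs to $GITHUB_OUTPUT, or print them when run locally."""
    target = path or os.environ.get("GITHUB_OUTPUT")
    lines = format_output_lines(outputs)
    if target:
        with open(target, "a", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")
    else:
        print("\n".join(lines))
```

Emitting booleans as the lowercase strings 'true' and 'false' matters because workflow expressions compare step outputs as strings, as in `outputs.needs_human != 'true'`.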
Copilot AI review requested due to automatic review settings February 10, 2026 07:11
@stranske stranske temporarily deployed to agent-high-privilege February 10, 2026 07:12 — with GitHub Actions Inactive
stranske-keepalive bot (Contributor) commented on Feb 10, 2026

Automated Status Summary

Head SHA: 8c3fc94
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / guard
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job Result Logs
(no jobs reported) ⏳ pending

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 93.12% |
| Baseline | 85.00% |
| Delta | +8.12% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| src/cli_parser.py | 81.8% | 4 |
| src/percentile_calculator.py | 95.0% | 1 |
| src/aggregator.py | 95.0% | 2 |
| src/__init__.py | 100.0% | 0 |
| src/ndjson_parser.py | 100.0% | 0 |

Updated automatically; will refresh on subsequent CI/Docker completions.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd372895a3

stranske-keepalive bot (Contributor) commented on Feb 10, 2026

🤖 Keepalive Loop Status

PR #1434 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 14/57 complete |
| Timeout | 45 min (default) |
| Timeout usage | 3m elapsed (8%, 42m remaining) |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

Copilot AI (Contributor) left a comment

Pull request overview

Integrates the deterministic verdict extraction/policy into the verify-to-new-PR workflow so verdict selection and needs_human gating are produced by scripts.langchain.verdict_extract rather than ad-hoc regex parsing, and keeps the consumer template workflow in sync.

Changes:

  • Adds a workflow step to run python -m scripts.langchain.verdict_extract ... --emit github and consume its structured outputs (verdict + needs_human metadata).
  • Adds a needs-human labeling/commenting step and gates follow-up issue creation paths based on that signal.
  • Replaces the lightweight minimatch implementation with the vendored minimatch package in a couple of shared scripts and removes the old implementation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| .github/workflows/agents-verify-to-new-pr.yml | Adds verdict extraction step + needs_human handling and updates fallback/gating logic. |
| templates/consumer-repo/.github/workflows/agents-verify-to-new-pr.yml | Mirrors the workflow changes for consumer repos. |
| .github/scripts/pr-context-graphql.js | Switches glob matching dependency to minimatch. |
| templates/consumer-repo/.github/scripts/pr-context-graphql.js | Same dependency switch in the consumer template copy. |
| .github/scripts/merge_manager.js | Switches glob matching dependency to minimatch. |
| templates/consumer-repo/.github/scripts/merge_manager.js | Same dependency switch in the consumer template copy. |
| .github/scripts/minimatch-lite.js | Removes the now-unused custom minimatch-lite implementation. |

Comment on lines +4 to 5
const { minimatch } = require('minimatch');

Copilot AI Feb 10, 2026

This change introduces a runtime dependency on the vendored minimatch package in consumer repos. However, the template’s vendored templates/consumer-repo/.github/scripts/node_modules/ currently contains minimatch/ but not its required transitive dependency brace-expansion/, so require('minimatch') will throw at runtime. Either vendor brace-expansion (and any other transitive deps) alongside minimatch in the template (matching .github/scripts/node_modules), or keep using the dependency-free implementation.

Suggested change
- const { minimatch } = require('minimatch');
+ // Optional dependency on `minimatch`. In consumer templates the vendored
+ // `minimatch` may be present without its transitive dependencies
+ // (e.g. `brace-expansion`), so requiring it can throw at runtime. We try to
+ // load it when available and fall back to a minimal, dependency-free matcher.
+ let _minimatchImpl;
+ try {
+   const maybeMinimatch = require('minimatch');
+   _minimatchImpl = maybeMinimatch.minimatch || maybeMinimatch;
+ } catch {
+   _minimatchImpl = null;
+ }
+ function minimatch(path, pattern, options) {
+   if (_minimatchImpl) {
+     return _minimatchImpl(path, pattern, options);
+   }
+   // Very small glob implementation that supports `*` wildcards.
+   const escapedPattern = String(pattern)
+     .replace(/[.+^${}()|[\]\\]/g, '\\$&')
+     .replace(/\*/g, '.*');
+   const regex = new RegExp('^' + escapedPattern + '$');
+   return regex.test(path);
+ }
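
The fallback matcher in the suggestion above translates `*` into the regex `.*`. A Python rendering of the same translation makes one consequence easy to check: unlike minimatch, where `*` stays within a single path segment, this fallback lets `*` match across `/`. The function name is hypothetical and the sketch mirrors only the fallback branch, not minimatch itself.

```python
import re


def star_glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob that only supports `*` into a compiled regex.

    Every character is escaped, then the escaped `*` is widened to `.*`,
    which is the same two-step translation the JavaScript fallback performs.
    """
    escaped = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(escaped)


# `*` matches within one directory level, as expected...
same_level = bool(star_glob_to_regex("src/*.js").fullmatch("src/a.js"))
# ...but it also crosses directory separators, which minimatch's `*` does not.
crosses_slash = bool(star_glob_to_regex("src/*.js").fullmatch("src/sub/a.js"))
```

If the fallback were ever adopted, patterns that rely on `*` stopping at `/` could match more files than they do under minimatch.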

(client || github).rest.issues.addLabels({
  owner: context.repo.owner,
  repo: context.repo.repo,
  issue_number: parseInt(linkedIssue),

Copilot AI Feb 10, 2026

parseInt(linkedIssue) should specify a radix (base 10) to avoid edge-case parsing differences and to match other parsing in this workflow (which uses parseInt(..., 10)).

Suggested change
- issue_number: parseInt(linkedIssue),
+ issue_number: parseInt(linkedIssue, 10),

  steps.check-merged.outputs.merged == 'true' &&
- steps.chain-check.outputs.exceeded != 'true'
+ steps.chain-check.outputs.exceeded != 'true' &&
+ steps.verdict-needs-human.outcome != 'success'

Copilot AI Feb 10, 2026

The follow-up generation is gated on steps.verdict-needs-human.outcome != 'success'. If the needs-human labeling/comment step runs (because needs_human == 'true') but fails, its outcome becomes failure, which will incorrectly allow follow-up issue generation to proceed. Gate on steps.extract-verdict.outputs.needs_human != 'true' (and/or set continue-on-error: true on the labeling step) so needs_human reliably suppresses automation even when the labeling step errors.

Suggested change
- steps.verdict-needs-human.outcome != 'success'
+ steps.extract-verdict.outputs.needs_human != 'true'

When verdict-needs-human triggers, the create-issue step is skipped but
downstream steps (persist metadata, upload artifact, comment on PR) were
still running with empty issue fields. This caused broken artifact JSON
and blank PR comments, plus the bridge workflow would fail on missing
issue_number.

Add steps.verdict-needs-human.outcome != 'success' condition to:
- Persist follow-up issue metadata
- Upload follow-up issue metadata
- Comment on original PR

Addresses review feedback on PR #1434.
@stranske stranske temporarily deployed to agent-high-privilege February 10, 2026 07:27 — with GitHub Actions Inactive
…tcome

Using steps.verdict-needs-human.outcome != 'success' is fragile: if the
needs-human labeling step runs but fails (API error, rate limit), outcome
becomes 'failure' not 'success', so the gate passes and follow-up
generation proceeds despite needs_human being true.

Switch all 6 gate conditions to steps.extract-verdict.outputs.needs_human
!= 'true' which is derived directly from the deterministic verdict policy
and is unaffected by whether the labeling/commenting step succeeded.

Also add radix 10 to both parseInt(linkedIssue) calls for consistency
with the rest of the workflow.

Addresses Copilot review feedback on PR #1434.
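
The difference between the two gates described in this commit message can be modeled with a small truth table. The sketch below is not workflow code; it models each `if:` condition as a Python predicate (hypothetical names) over the step outcome values GitHub Actions reports (`success`, `failure`, `skipped`).

```python
from typing import List, Tuple


def gate_on_outcome(labeling_outcome: str) -> bool:
    """Old gate: proceed unless the labeling step succeeded."""
    return labeling_outcome != "success"


def gate_on_output(needs_human: str) -> bool:
    """New gate: proceed unless the policy output says needs_human."""
    return needs_human != "true"


def compare_gates() -> List[Tuple[str, str, bool, bool]]:
    """Return (needs_human, labeling outcome, old gate, new gate) per scenario."""
    scenarios = [
        ("false", "skipped"),  # no human needed; labeling step never runs
        ("true", "success"),   # human needed; labeling worked
        ("true", "failure"),   # human needed; labeling hit an API error
    ]
    return [
        (needs_human, outcome, gate_on_outcome(outcome), gate_on_output(needs_human))
        for needs_human, outcome in scenarios
    ]
```

Only the third scenario differs: the outcome-based gate lets follow-up generation proceed even though needs_human is true, while the output-based gate keeps it suppressed.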
@stranske stranske temporarily deployed to agent-high-privilege February 10, 2026 07:33 — with GitHub Actions Inactive
@stranske stranske merged commit 3582350 into main Feb 10, 2026
137 checks passed
@stranske stranske deleted the fix/verdict-policy-workflow-integration branch February 10, 2026 07:37