Conversation
- Create reusable-agents-verifier.yml that consumer repos can call
- Add agents-verifier.yml thin caller to consumer template
- Uses dual checkout pattern to get scripts from Workflows repo

The verifier runs post-merge on PRs to check if acceptance criteria were met. If not, it opens a follow-up issue in the consumer repo.
Add new workflow to:
- EXPECTED_NAMES in test_workflow_naming.py
- Reusable workflow table in WORKFLOWS.md
- Prose documentation in WORKFLOWS.md
- Primary workflows list in WORKFLOW_SYSTEM.md
- Add continue-on-error: true to Run verifier step so subsequent parsing and issue creation steps run even if Codex crashes
- Detect Codex failures and set verdict='error' to create follow-up issues for infrastructure failures, not just criteria failures
- Fix checks_run counting to use checkbox syntax (- [x]) instead of counting all bullet points, matching the intended behavior
- Update count_checkboxes regex to match both dash (-) and asterisk (*) list markers for consistency with the JavaScript implementation

Addresses comments from copilot-pull-request-reviewer.
The verifier was running and potentially creating follow-up issues even when there were no acceptance criteria to verify. Now it skips with a notice when acceptanceCount == 0, avoiding unnecessary issue creation.
Pull request overview
This PR introduces a post-merge verification system that validates whether merged PRs meet their documented acceptance criteria. The verifier runs automatically when PRs are merged, waits for CI workflows to complete, uses Codex to verify acceptance criteria, and creates follow-up issues when criteria are not met.
Key changes:
- Added a reusable workflow for verifying acceptance criteria post-merge with CI wait logic and follow-up issue creation
- Implemented skip logic to prevent verifier runs when no acceptance criteria are found
- Integrated verifier metrics collection and reporting for tracking verification outcomes
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| .github/workflows/reusable-agents-verifier.yml | New reusable workflow implementing post-merge verification with CI polling, Codex execution, verdict parsing, and issue creation logic |
| .github/scripts/agents_verifier_context.js | Added skip logic to prevent verifier execution when acceptance criteria count is zero |
| templates/consumer-repo/.github/workflows/agents-verifier.yml | Template workflow for consumer repos to trigger the reusable verifier on merged PRs |
| tests/workflows/test_workflow_naming.py | Added expected workflow name mappings for the new verifier workflows |
| docs/ci/WORKFLOW_SYSTEM.md | Updated documentation to include reusable-agents-verifier.yml in the workflow topology |
| docs/ci/WORKFLOWS.md | Added documentation entries describing the verifier workflow's purpose and functionality |
```shell
if grep -qiE 'verdict:[[:space:]]*fail' "codex-output.md"; then
  verdict="fail"
elif grep -qiE 'verdict:[[:space:]]*pass' "codex-output.md"; then
```
The verdict-detection regex matches case-insensitively and allows whitespace after the colon, but none between 'verdict' and the colon. Consider allowing optional whitespace on both sides of the colon so the parsing is robust to formatting variations Codex might produce.
Suggested change:

```diff
-if grep -qiE 'verdict:[[:space:]]*fail' "codex-output.md"; then
-  verdict="fail"
-elif grep -qiE 'verdict:[[:space:]]*pass' "codex-output.md"; then
+if grep -qiE 'verdict[[:space:]]*:[[:space:]]*fail' "codex-output.md"; then
+  verdict="fail"
+elif grep -qiE 'verdict[[:space:]]*:[[:space:]]*pass' "codex-output.md"; then
```
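The relaxed matching the suggestion describes can be exercised outside the workflow as well. This is a minimal sketch (the `parseVerdict` helper is hypothetical, not part of the repo) of parsing that tolerates whitespace on both sides of the colon:

```javascript
// Hypothetical helper mirroring the relaxed pattern from the suggestion:
// optional whitespace is tolerated before and after the colon.
function parseVerdict(text) {
  const match = /verdict\s*:\s*(pass|fail)/i.exec(text);
  return match ? match[1].toLowerCase() : 'error';
}

console.log(parseVerdict('Verdict: PASS'));  // pass
console.log(parseVerdict('verdict : fail')); // fail
console.log(parseVerdict('no verdict'));     // error
```

Falling back to 'error' when no verdict line is found matches the commit above that treats Codex crashes as infrastructure failures rather than silent passes.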
```javascript
const rawPrNumber = Number('${{ steps.context.outputs.pr_number }}');
const prNumber = !Number.isNaN(rawPrNumber) && rawPrNumber > 0 ? rawPrNumber : null;
```
When converting the pr_number output to a number, the code leans on the downstream guard to absorb bad input: Number('') evaluates to 0 (not NaN) and non-numeric strings evaluate to NaN, and the !Number.isNaN check plus the > 0 comparison happen to map both to null. That works, but the intent is implicit; consider validating the raw string explicitly before conversion so empty or non-numeric values are handled deliberately.
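A defensive conversion along these lines (a sketch, not the repo's actual code; the `toPrNumber` name is hypothetical) normalizes empty strings, whitespace, and non-numeric input to null in one place:

```javascript
// Hypothetical helper: return a positive integer PR number, or null.
// Number('') is 0 (not NaN), so an explicit empty-string check plus
// integer/positivity guards cover empty, whitespace-only, non-numeric,
// and zero values together.
function toPrNumber(raw) {
  const trimmed = String(raw ?? '').trim();
  if (trimmed === '') return null;
  const n = Number(trimmed);
  return Number.isInteger(n) && n > 0 ? n : null;
}

console.log(toPrNumber('196')); // 196
console.log(toPrNumber(''));    // null
console.log(toPrNumber('abc')); // null
```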
```python
return sum(1 for line in text.splitlines() if re.match(r"^\s*[-*]\s+\[[ xX]\]", line))
```

```python
should_run = (os.environ.get("SHOULD_RUN") or "").lower() == "true"
pr_number = int(os.environ.get("PR_NUMBER") or 0)
```
The Python metrics script converts PR_NUMBER with int(). The `or 0` fallback covers an unset or empty variable, but a non-numeric value will still make int() raise ValueError. Consider a try-except block or explicit validation for safer conversion.
Suggested change:

```diff
-pr_number = int(os.environ.get("PR_NUMBER") or 0)
+pr_number_raw = os.environ.get("PR_NUMBER")
+try:
+    pr_number = int(pr_number_raw) if pr_number_raw else 0
+except (TypeError, ValueError):
+    pr_number = 0
```
```javascript
if (verdict === 'error') {
  // Check if formatErrorIssue exists, otherwise use fallback
  if (typeof formatErrorIssue === 'function') {
    const result = formatErrorIssue({ prNumber, prUrl, runUrl, issueNumbers });
    const { data: issue } = await github.rest.issues.create({
      ...context.repo,
      title: result.title,
      body: result.body,
      labels: ['agent:codex', 'bug'],
    });
    core.setOutput('issue_number', issue?.number ? String(issue.number) : '');
    core.info(`Created error follow-up issue #${issue.number}: ${result.title}`);
    return;
  }
```
The code checks if formatErrorIssue exists before calling it, but since formatErrorIssue is not exported from verifier_issue_formatter.js (only formatFollowUpIssue and formatSimpleFollowUpIssue are exported), this branch will never execute and will always fall through to the fallback. Consider either adding formatErrorIssue to the exports in verifier_issue_formatter.js or removing this conditional check and using only the fallback logic.
```javascript
// Skip verifier if there are no acceptance criteria to verify
if (acceptanceCount === 0) {
  const skipReason = 'No acceptance criteria found in PR or linked issues; skipping verifier.';
  core?.notice?.(skipReason);
  core?.setOutput?.('should_run', 'false');
  core?.setOutput?.('skip_reason', skipReason);
  core?.setOutput?.('pr_number', String(pull.number || ''));
  core?.setOutput?.('issue_numbers', JSON.stringify(issueNumbers));
  core?.setOutput?.('pr_html_url', pull.html_url || '');
  core?.setOutput?.('target_sha', targetSha);
  core?.setOutput?.('context_path', contextPath);
  core?.setOutput?.('acceptance_count', '0');
  core?.setOutput?.('ci_results', JSON.stringify(ciResults));
  return {
    shouldRun: false,
    reason: skipReason,
    markdown,
    contextPath,
    issueNumbers,
    targetSha,
    acceptanceCount,
    ciResults,
  };
}
```
The new skip logic for when acceptanceCount is 0 lacks test coverage. The test file .github/scripts/tests/agents-verifier-context.test.js should include a test case that verifies the verifier is skipped when no acceptance criteria are found in the PR or linked issues. This is important since it's a new behavioral change affecting when the verifier runs.
```yaml
verifier:
  # Only run on merged PRs
  if: github.event.pull_request.merged == true
  uses: stranske/Workflows/.github/workflows/reusable-agents-verifier.yml@main
```
The reference 'stranske/Workflows' appears to be a hardcoded repository reference. In a template file designed for consumer repos, this might need to be updated to reflect the actual workflows repository used by the organization. Consider using a placeholder or documenting that consumers need to update this reference to match their setup.
Suggested change:

```diff
-  uses: stranske/Workflows/.github/workflows/reusable-agents-verifier.yml@main
+  # NOTE: Update <ORG>/<WORKFLOWS_REPO> to the organization and repository that host your reusable workflows.
+  uses: <ORG>/<WORKFLOWS_REPO>/.github/workflows/reusable-agents-verifier.yml@main
```
```javascript
while (Date.now() - startTime < maxWaitMs) {
  let allComplete = true;
  let anyFound = false;

  for (const workflowFile of ciWorkflows) {
    try {
      const { data: runs } = await github.rest.actions.listWorkflowRuns({
        ...context.repo,
        workflow_id: workflowFile,
        head_sha: targetSha,
        per_page: 1,
      });

      if (runs.workflow_runs.length > 0) {
        anyFound = true;
        const run = runs.workflow_runs[0];
        core.info(`${workflowFile}: status=${run.status}, conclusion=${run.conclusion || 'pending'}`);
        if (run.status !== 'completed') {
          allComplete = false;
        }
      }
    } catch (err) {
      core.warning(`Failed to check ${workflowFile}: ${err.message}`);
    }
  }

  if (anyFound && allComplete) {
    core.info('All CI workflows have completed.');
    return;
  }

  core.info(`Waiting ${pollIntervalMs / 1000}s for CI to complete...`);
  await new Promise(resolve => setTimeout(resolve, pollIntervalMs));
}

core.warning(`CI wait timed out after ${maxWaitMs / 1000}s.`);
```
The CI wait logic does not check if workflows were found at all before timing out. If none of the specified workflows exist or match the SHA, the loop will wait the full timeout period before continuing. Consider tracking whether any workflows were found during polling and providing a more specific warning message if no matching workflows are discovered within a reasonable timeframe.
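One way to apply the suggestion is to carry a found-at-least-once flag across the whole polling window and branch the final message on it. A sketch with a synchronous stand-in for the polling pass (the real loop polls listWorkflowRuns; `waitForCi` and its input shape are illustrative assumptions):

```javascript
// Each element of `passes` stands in for one polling pass over the CI
// workflows: did any run match the target SHA, and were all matches done?
// Tracking `everFound` across passes lets the timeout message distinguish
// slow CI from workflows that never matched the SHA at all.
function waitForCi(passes) {
  let everFound = false;
  for (const { anyFound, allComplete } of passes) {
    everFound = everFound || anyFound;
    if (anyFound && allComplete) return 'completed';
  }
  return everFound
    ? 'timed out waiting for CI to complete'
    : 'no workflow runs matched the target SHA';
}

console.log(waitForCi([
  { anyFound: true, allComplete: false },
  { anyFound: true, allComplete: true },
])); // completed

console.log(waitForCi([
  { anyFound: false, allComplete: true },
])); // no workflow runs matched the target SHA
```

The distinct "never found" message could also be emitted early, after a few empty passes, rather than only after the full timeout.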
Closing as superseded by merged PRs #192, #193, #194. The bot review comments identified some valid issues that will be addressed in a separate PR:
Address code review feedback from PR #196:

1. Remove formatErrorIssue dead code - function was never exported from verifier_issue_formatter.js, so the import was undefined and the conditional branch never executed. Simplified to just use fallback.
2. Add test coverage for acceptance criteria skip logic - new tests verify verifier skips when no acceptance content exists and runs when acceptance is in a linked issue.
* Add Phase 3 integration plan with testing cycle for Manager-Database

  Phase 3: Pre-Agent Intelligence (4 capabilities)
  - 3A: Capability Check - supplements agents:optimize with feasibility gate
    - Runs BEFORE agent assignment on Issues (not after)
    - Adds needs-human label when agent cannot proceed
  - 3B: Task Decomposition - auto-split large issues
  - 3C: Duplicate Detection - comment-only mode, track false positives
  - 3D: Semantic Labeling - auto-suggest/apply labels

  Testing Plan:
  - Test repo: Manager-Database
  - ~11 test issues across 4 capabilities
  - False positive tracking for dedup (target: <5%)
  - Metrics dashboard for validation

  Also updates:
  - Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
  - All immediate tasks completed - Phase 3 ready to begin

* Add Phase 3 test issues for Manager-Database

  Created 3 test issues:
  - #193: Stripe integration (should FAIL capability check)
  - #194: Health monitoring (should trigger task decomposition)
  - #196: Manager list API (should detect as duplicate of #133)

  Updated testing metrics dashboard to track progress.

* Add Phase 4: Full Automation & Cleanup plan

  Phase 4 includes 5 initiatives:
  - 4A: Label Cleanup - Remove bloat labels, standardize across 7 repos
  - 4B: User Guide - Operational documentation for label system (sync to consumers)
  - 4C: Auto-Pilot Label - End-to-end issue-to-merged-PR automation
  - 4D: Conflict Resolution - Automated merge conflict handling in keepalive
  - 4E: Verify-to-Issue - Create follow-up issues from verification feedback

  Key decisions:
  - Auto-pilot uses workflow_dispatch between steps (not chained labels)
  - Conflict detection added to keepalive loop (not separate workflow)
  - Verify-to-issue is user-triggered (not automatic, avoids false positives)

  Also identifies 7 additional automation opportunities for future phases. Testing plan defined for Manager-Database.
* Correct label analysis after codebase search + expand Phase 5

  Label Analysis Corrections:
  - agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
  - agents:activated IS functional (agents_pr_meta_keepalive.js)
  - from:codex/copilot ARE functional (merge_manager.js)
  - automerge IS functional (merge_manager.js, agents_belt_scan.js)
  - agents (bare) IS functional (agent_task.yml template)
  - risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

  Only 5-6 labels confirmed as bloat:
  - codex (bare) - redundant with agent:codex
  - ai:agent - zero matches
  - auto-merge-audit - zero matches
  - automerge:ok - zero matches
  - architecture, backend, cli, etc. - repo-specific, not synced

  Phase 5 Analysis:
  - 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
  - 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
  - 5C: Stale PR cleanup - not needed
  - 5D: Dependabot - partial (auto-label exists, add auto-merge)
  - 5E: Issue lint - soft warning approach
  - 5F: Cross-repo linking - weekly scan with semantic_matcher.py
  - 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)

* Consolidate agents:pause to agents:paused + expand Phase 4-5 plans

  Label consolidation:
  - Replace agents:pause with agents:paused in all source files
  - Update keepalive_gate.js PAUSE_LABEL constant
  - Update keepalive_orchestrator_gate_runner.js hardcoded check
  - Update test to use agents:paused
  - Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md

  Phase 4 updates:
  - 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
  - 4B: Add optional issue creation feature to user guide (deferred)
  - 4D: Full conflict resolution implementation with code examples
  - 4E: Complete verify-to-issue workflow implementation

  Phase 5 updates:
  - 5F: Marked as SKIPPED (not needed per user decision)
  - 5G: Full LangSmith integration plan + custom metrics

  All keepalive tests pass (8/8).
* feat: Implement Phase 4-5 automation features

  Phase 4 implementations:
  - 4A: Add scripts/cleanup_labels.py for label auditing
    - Classifies labels as functional/bloat/idiosyncratic
    - Requires --confirm flag for actual deletion
    - Reports audit results with recommendations
  - 4D: Add conflict detection for keepalive pipeline
    - .github/scripts/conflict_detector.js module
    - Detects conflicts from GitHub API, CI logs, PR comments
    - templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md
  - 4E: Add agents-verify-to-issue.yml workflow
    - Creates follow-up issues from verification feedback
    - User-triggered via verify:create-issue label
    - Extracts concerns and low scores automatically

  Phase 5 implementations:
  - 5A: Add agents-auto-label.yml workflow
    - Semantic label matching for new issues
    - 90% threshold for auto-apply, 75% for suggestions
    - Uses existing label_matcher.py script
  - 5G: Add LangSmith tracing to tools/llm_provider.py
    - _setup_langsmith_tracing() function
    - Auto-configures when LANGSMITH_API_KEY present

  Also:
  - Update .github/sync-manifest.yml with new sync entries
  - Update docs/LABELS.md with new label documentation

* fix: Address Copilot review comments on PR #650

  Code quality improvements based on automated code review:
  1. tools/llm_provider.py:
     - Fix LangSmith API key env var (LANGSMITH_API_KEY vs LANGCHAIN_API_KEY)
     - Improve f-string formatting for logging
     - Add usage comment for LANGSMITH_ENABLED constant
  2. .github/scripts/conflict_detector.js:
     - Add debug logging in catch blocks instead of silent failures
     - Makes debugging easier when log downloads fail
  3. .github/workflows/agents-verify-to-issue.yml:
     - Replace /tmp file usage with GitHub Actions environment files
     - Use heredoc delimiter for multi-line output
     - Consolidate find and extract steps for cleaner flow
  4. .github/workflows/agents-auto-label.yml:
     - Make Workflows repo checkout configurable (not hardcoded)
     - Use github.paginate() for label retrieval (handles >100 labels)
  5. templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md:
     - Replace hardcoded 'main' with {{base_branch}} template variable
     - Make verification steps language-agnostic (not Python-specific)
     - Add note about checking project README for test commands

* fix: Fix CI test failures

  1. Fix test_integration_template_installs_and_tests
     - The test used --user pip install flag which fails in virtualenvs
     - Added _in_virtualenv() helper to detect virtualenv environment
     - Only use --user flag when NOT in a virtualenv
  2. Add new workflows to expected names mapping
     - agents-auto-label.yml: 'Auto-Label Issues'
     - agents-verify-to-issue.yml: 'Create Issue from Verification'
  3. Update workflow documentation
     - docs/ci/WORKFLOWS.md: Added bullet points for new workflows
     - docs/ci/WORKFLOW_SYSTEM.md: Added table rows for new workflows

  All 1120 tests now pass.

* fix: Remove duplicate env key in agents-auto-label.yml

  actionlint was failing because the Match labels step had two env blocks. Merged ISSUE_TITLE and ISSUE_BODY into the main env block.
…#653)

* feat: Add Phase 3 workflows and sync configuration

  Phase 3 Pre-Agent Intelligence workflows:
  - agents-capability-check.yml: Pre-flight agent feasibility gate
  - agents-decompose.yml: Task decomposition for large issues
  - agents-dedup.yml: Duplicate detection using embeddings
  - agents-auto-label.yml: Semantic label matching

  Also includes:
  - agents-verify-to-issue.yml: Create follow-up issues from verification (Phase 4E)
  - Updated sync-manifest.yml with all new workflow entries
  - pr_verifier.py: Auth error fallback for LLM provider resilience
  - Tests for fallback behavior

  All Phase 3 scripts have 129 tests passing.
* docs: Add comprehensive Phase 3 testing plan

  - Mark all Phase 3 implementation tasks as complete
  - Add detailed test suite with 12 specific test cases:
    - Suite A: Capability Check (3 tests)
    - Suite B: Task Decomposition (3 tests)
    - Suite C: Duplicate Detection (4 tests)
    - Suite D: Auto-Label (2 tests)
  - Include pre-testing checklist and execution tracking table
  - Add rollback plan and success criteria
  - Include sample issue bodies for reproducible tests

* docs: Add deployment verification plan for cross-repo testing

  Addresses known issue: verify:compare works on Travel-Plan-Permission but fails on Trend_Model_Project PR #4249.

  New deployment verification plan includes:
  - Phase 1: Sync deployment tracking across all 7 repos
  - Phase 2: Existing workflow verification (investigate failures)
  - Phase 3: New workflow verification with specific test cases
  - Phase 4: Troubleshooting guide for common issues
  - Cross-repo verification summary with minimum pass criteria

  Separates deployment verification from functional regression testing.

* docs: Resolve verify:compare investigation - PR not merged (expected behavior)

  Investigation findings for Trend_Model_Project PR #4249:
  - Root cause: PR is OPEN, not merged
  - Verifier correctly skipped (designed for merged PRs only)
  - verify:* labels missing in most repos (only Travel-Plan-Permission has them)
  - Added label prerequisite checklist to deployment plan
  - Updated verification summary with resolved status
No description provided.