
test: Add unit tests for verifier placeholder filtering helpers #317

Merged
stranske merged 4 commits into main from copilot/sub-pr-316
Dec 30, 2025

Conversation

Contributor

Copilot AI commented Dec 30, 2025

Automated Status Summary

Scope

  • Context / problem:
    - The Automated Status Summary in PR bodies currently only shows workflow run results
    - When the CLI-based Codex (via reusable-codex-run.yml) completes iterations, there's no visibility into:
      - What tasks Codex completed in each round
      - The final message/output from Codex
      - How many files were changed
      - Whether commits were pushed successfully
    - This makes it hard to track CLI Codex progress vs the UI version
    - The keepalive loop evaluation outputs (iteration count, tasks remaining, etc.) are logged but not surfaced to the PR summary
  • Goal:
    - Capture CLI Codex outputs and integrate them into the Automated Status Summary
    - Provide visibility into Codex iteration progress and outcomes
    - Show what changed in each round

Tasks

  • Update reusable-codex-run.yml to emit structured outputs:
    - Add output for final-message from Codex action
    - Add output for files-changed (count of modified files)
    - Add output for commits-pushed (boolean)
    - Write iteration summary to GITHUB_STEP_SUMMARY
  • Create new section in PR body for CLI Codex status:
    - Add <!-- codex-cli-status:start --> / <!-- codex-cli-status:end --> markers
    - Show last iteration number and outcome
    - Show tasks completed this round
    - Show link to workflow run logs
  • Update agents_pr_meta_update_body.js to populate the new section:
    - Fetch latest keepalive loop run results
    - Extract Codex outputs from workflow artifacts or step summaries
    - Format and insert into PR body
  • Update keepalive_loop.js to pass iteration context to the summary:
    - Include current iteration number in output
    - Include tasks remaining count
    - Include estimated rounds to completion
  • Add tests for the new integration:
    - Test output extraction from workflow runs
    - Test PR body section formatting
    - Test edge cases (no Codex runs, failed runs, etc.)
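
The marker-based PR body update described in the tasks above could work along these lines. The marker strings come from the task list itself; the function name and surrounding logic are hypothetical sketches, not the actual agents_pr_meta_update_body.js code:

```javascript
// Sketch: replace the text between the codex-cli-status markers in a PR
// body, appending the section if the markers are not yet present. The
// marker strings come from the task list above; upsertCodexCliSection is
// an illustrative name, not the real script's API.
const START = '<!-- codex-cli-status:start -->';
const END = '<!-- codex-cli-status:end -->';

function upsertCodexCliSection(prBody, sectionText) {
  const block = `${START}\n${sectionText}\n${END}`;
  // Non-greedy match so only the first marker pair is replaced.
  const pattern = new RegExp(`${START}[\\s\\S]*?${END}`);
  if (pattern.test(prBody)) {
    return prBody.replace(pattern, block);
  }
  return `${prBody.trimEnd()}\n\n${block}\n`;
}
```

Keeping the replacement idempotent (one marker pair, replaced in place) is what lets the summary refresh after every keepalive loop run without the PR body growing unboundedly.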

Acceptance criteria

  • CLI Codex iterations are visible in the PR body Automated Status Summary
  • Each iteration shows: round number, tasks attempted, outcome, and link to logs
  • The summary updates automatically after each keepalive loop run
  • Existing UI Codex tracking (if any) continues to work

Head SHA: 377ad77
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

Head SHA: 0774326
Latest Runs: ❔ in progress — Gate
Required: gate: ❔ in progress

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ❔ in progress | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ❔ in progress | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ❔ in progress | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ❔ in progress | View run |
| Validate Sync Manifest | ✅ success | View run |

Base automatically changed from fix/verifier-contentless-issues to main December 30, 2025 15:50
@stranske stranske marked this pull request as ready for review December 30, 2025 15:50
@stranske stranske self-requested a review as a code owner December 30, 2025 15:50
Copilot AI review requested due to automatic review settings December 30, 2025 15:50
@stranske
Owner

@codex resolve conflicts

Copilot AI changed the title [WIP] WIP address feedback on preventing contentless follow-up issues test: Add unit tests for verifier placeholder filtering helpers Dec 30, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR adds logic to prevent the creation of contentless follow-up issues in the verifier workflows. The changes introduce helper functions to identify placeholder content (like "section missing from source issue" or "N/A"), markdown section headers that were incorrectly captured, and PR/issue reference links. These helpers are used to filter out non-substantive items when determining whether a follow-up issue should be created.

Key changes:

  • Added helper functions (isPlaceholderContent, looksLikeSectionHeader, looksLikeReferenceLink) to identify non-actionable content
  • Modified formatFollowUpIssue and formatSimpleFollowUpIssue to include a hasSubstantiveContent flag
  • Updated both workflow files to check this flag and skip issue creation when there's no substantive content
  • Added comprehensive unit tests for the three new helper functions (148 new test cases)
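
For context, the three predicates described above might look roughly like the following. This is an illustrative sketch only: the actual implementations live in .github/scripts/verifier_issue_formatter.js, and the exact patterns below are assumptions based on this PR's description, not the repo's real regexes.

```javascript
// Sketch of the three helper predicates described above. Patterns are
// assumptions inferred from the PR description, not the repo's exact code.
function isPlaceholderContent(text) {
  if (!text) return true;
  const t = String(text).trim().toLowerCase();
  return (
    t === '' || t === 'n/a' || t === 'na' || t === 'none' ||
    t.includes('section missing from source issue')
  );
}

function looksLikeSectionHeader(line) {
  // A markdown header such as "## Tasks" that was captured as a list item.
  return /^\s*#{1,6}\s+\S/.test(String(line || ''));
}

function looksLikeReferenceLink(line) {
  // A bare PR/issue reference such as "- #123" or "* PR #45".
  return /^\s*[-*]?\s*(pr\s*)?#\d+\s*$/i.test(String(line || ''));
}
```

Filtering a candidate task list through all three predicates before counting items is what lets the workflow decide that a would-be follow-up issue has no substantive content.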

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| docs/workflows/SystemEvaluation.md | Updated documentation to reflect that parallel testing has been implemented with pytest-xdist |
| .github/workflows/reusable-agents-verifier.yml | Added logic to skip follow-up issue creation when hasSubstantiveContent is false, sets skipped output accordingly |
| .github/workflows/agents-verifier.yml | Added identical logic to skip follow-up issue creation when hasSubstantiveContent is false (consistency with reusable workflow) |
| .github/scripts/verifier_issue_formatter.js | Implemented three helper functions to detect non-substantive content, added hasSubstantiveContent calculation to both formatter functions, exported helpers for testing |
| .github/scripts/tests/verifier-issue-formatter.test.js | Added comprehensive test coverage for the three new helper functions with 148 test cases covering edge cases like empty input, null values, case insensitivity, and various placeholder patterns |
Comments suppressed due to low confidence (1)

.github/scripts/tests/verifier-issue-formatter.test.js:580

  • The test suite for formatFollowUpIssue doesn't verify the new hasSubstantiveContent field that was added to the return value. This field is critical for the new functionality that prevents contentless follow-up issues from being created. Consider adding test cases that verify:
  1. hasSubstantiveContent is true when there are real tasks/criteria
  2. hasSubstantiveContent is false when all sections contain only placeholder content
  3. hasSubstantiveContent is true when there are verifier gaps even without tasks/criteria
  describe('formatFollowUpIssue', () => {
    const verifierOutput = `Verdict: FAIL

The error handling is incomplete.

Blocking:
- Missing retry logic for rate limits
- No backoff delay implementation`;

    const prBody = `## Scope
Implement error handling.

## Tasks
- [x] Add error classifier
- [ ] Add retry logic
- [x] Add tests

## Acceptance Criteria
- [ ] Retry logic handles rate limits
- [ ] Tests cover all error paths`;

    const issue = {
      number: 100,
      title: 'Error handling',
      body: `## Why
We need better error handling.

## Non-Goals
- Changing existing behavior

## Scope
Error classification and recovery.

## Tasks
- [ ] Create error module
- [ ] Add retry wrapper

## Acceptance Criteria
- [ ] Errors are classified
- [ ] Retries use exponential backoff`,
    };

    it('generates title with PR number', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.title.includes('PR #123'));
      assert.ok(result.title.includes('Follow-up'));
    });

    it('includes source section with links', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
        prUrl: 'https://github.com/test/repo/pull/123',
        runUrl: 'https://github.com/test/repo/actions/runs/456',
      });
      assert.ok(result.body.includes('## Source'));
      assert.ok(result.body.includes('#123'));
      assert.ok(result.body.includes('#100'));
    });

    it('preserves Why section from parent issue', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Why'));
      assert.ok(result.body.includes('better error handling'));
    });

    it('preserves Non-Goals section', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Non-Goals'));
      assert.ok(result.body.includes('Changing existing behavior'));
    });

    it('includes unmet acceptance criteria', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Acceptance Criteria'));
      assert.ok(result.body.includes('Retry logic handles rate limits'));
    });

    it('copies incomplete tasks', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Tasks'));
      assert.ok(result.body.includes('Add retry logic'));
    });

    it('generates tasks from gaps when all tasks complete', () => {
      const allTasksComplete = `## Tasks
- [x] Task one
- [x] Task two

## Acceptance Criteria
- [ ] Criterion not met`;

      const result = formatFollowUpIssue({
        verifierOutput,
        prBody: allTasksComplete,
        issues: [],
        prNumber: 123,
      });
      // Should generate tasks from verifier gaps
      assert.ok(result.newTasks.length > 0);
    });

    it('includes implementation notes with summary', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Implementation Notes'));
      assert.ok(result.body.includes('error handling is incomplete'));
    });

    it('returns parsed findings', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.equal(result.findings.verdict, 'fail');
      assert.ok(result.findings.gaps.length > 0);
    });

    it('uses verifier unmet criteria to filter acceptance criteria', () => {
      // Verifier explicitly says which criteria are not met
      const structuredVerifierOutput = `Verdict: FAIL

## Criteria Status
- [x] Retry logic handles rate limits - VERIFIED (code exists)
- [ ] Tests cover all error paths - NOT MET (missing coverage)
- [x] Error messages are helpful - VERIFIED (messages include guidance)
`;

      const prBodyWithCriteria = `## Tasks
- [x] All tasks done

## Acceptance Criteria
- [ ] Retry logic handles rate limits
- [ ] Tests cover all error paths
- [ ] Error messages are helpful`;

      const result = formatFollowUpIssue({
        verifierOutput: structuredVerifierOutput,
        prBody: prBodyWithCriteria,
        issues: [],
        prNumber: 200,
      });

      // Should only include the criterion that was NOT MET in the refined list
      assert.deepEqual(result.unmetCriteria, ['Tests cover all error paths']);
      
      // The Acceptance Criteria section should only have the unmet criterion
      const acceptanceSection = result.body.split('## Acceptance Criteria')[1].split('## ')[0];
      assert.ok(acceptanceSection.includes('Tests cover all error paths'));
      assert.ok(!acceptanceSection.includes('- [ ] Retry logic handles rate limits'));
      assert.ok(!acceptanceSection.includes('- [ ] Error messages are helpful'));
      
      // Verified criteria should appear in Implementation Notes, not Acceptance Criteria
      const notesSection = result.body.split('## Implementation Notes')[1] || '';
      assert.ok(notesSection.includes('Retry logic handles rate limits'));
    });

    it('includes verified criteria in implementation notes', () => {
      const structuredVerifierOutput = `Verdict: FAIL

## Criteria Status
- [x] First criterion - VERIFIED (evidence)
- [ ] Second criterion - NOT MET (missing)
`;

      const prBodyWithCriteria = `## Tasks
- [x] Done

## Acceptance Criteria
- [ ] First criterion
- [ ] Second criterion`;

      const result = formatFollowUpIssue({
        verifierOutput: structuredVerifierOutput,
        prBody: prBodyWithCriteria,
        issues: [],
        prNumber: 201,
      });

      // Implementation notes should mention what was verified
      assert.ok(result.body.includes('Verifier confirmed these criteria were met'));
      assert.ok(result.body.includes('✓ First criterion'));
    });
  });


@chatgpt-codex-connector

Summary

Testing

  • node .github/scripts/__tests__/verifier-issue-formatter.test.js

View task →

Copilot AI and others added 3 commits December 30, 2025 16:18
Added comprehensive test coverage for three exported helper functions:
- isPlaceholderContent: 8 tests covering placeholder patterns, N/A variations, empty strings, and edge cases
- looksLikeSectionHeader: 7 tests covering markdown headers, whitespace handling, and non-header content
- looksLikeReferenceLink: 10 tests covering PR/Issue references, bullet styles, case sensitivity, and edge cases

All 444 tests in the JavaScript test suite pass.

Co-authored-by: stranske <23046322+stranske@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@agents-workflows-bot
Contributor

⚠️ Action Required: Unable to determine source issue for PR #317. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker).

@github-actions
Contributor

github-actions bot commented Dec 30, 2025

Automated Status Summary

Head SHA: 054e487
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|--------|-------|
| Current | 0.00% |
| Baseline | 85.00% |
| Delta | -85.00% |
| Minimum | 70.00% |
| Status | ❌ Below minimum |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
|------|----------|---------|
| src/trend_analysis/_autofix_trigger_sample.py | 0.0% | 10 |
| src/trend_analysis/_autofix_violation_case2.py | 0.0% | 15 |
| src/trend_analysis/_autofix_violation_case3.py | 0.0% | 13 |
| src/trend_analysis/_ci_probe_faults.py | 0.0% | 18 |
| src/trend_analysis/automation_multifailure.py | 0.0% | 4 |
| src/trend_analysis/constants.py | 0.0% | 1 |
| src/trend_analysis/selector.py | 0.0% | 12 |
| src/trend_analysis/weighting.py | 0.0% | 11 |
| src/trend_analysis/__init__.py | 100.0% | 0 |

Low Coverage Files (<50.0%)

| File | Coverage | Missing |
|------|----------|---------|
| src/trend_analysis/_autofix_trigger_sample.py | 0.0% | 10 |
| src/trend_analysis/_autofix_violation_case2.py | 0.0% | 15 |
| src/trend_analysis/_autofix_violation_case3.py | 0.0% | 13 |
| src/trend_analysis/_ci_probe_faults.py | 0.0% | 18 |
| src/trend_analysis/automation_multifailure.py | 0.0% | 4 |
| src/trend_analysis/constants.py | 0.0% | 1 |
| src/trend_analysis/selector.py | 0.0% | 12 |
| src/trend_analysis/weighting.py | 0.0% | 11 |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

(Scope, Tasks, and Acceptance criteria as listed above.)

@github-actions
Contributor

github-actions bot commented Dec 30, 2025

🤖 Keepalive Loop Status

PR #317 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
|--------|-------|
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Gate | success |
| Tasks | 0/41 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Field | Value |
|-------|-------|
| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske stranske merged commit fd234e0 into main Dec 30, 2025
37 checks passed
@stranske stranske deleted the copilot/sub-pr-316 branch December 30, 2025 16:28
