
test: Add unit tests for verifier placeholder filtering helpers #317

Merged
stranske merged 4 commits into main from copilot/sub-pr-316
Dec 30, 2025

Conversation

Contributor

Copilot AI commented Dec 30, 2025

Automated Status Summary

Scope

  • Context / problem:
    - The Automated Status Summary in PR bodies currently only shows workflow run results
    - When the CLI-based Codex (via reusable-codex-run.yml) completes iterations, there's no visibility into:
      - What tasks Codex completed in each round
      - The final message/output from Codex
      - How many files were changed
      - Whether commits were pushed successfully
    - This makes it hard to track CLI Codex progress vs the UI version
    - The keepalive loop evaluation outputs (iteration count, tasks remaining, etc.) are logged but not surfaced to the PR summary
  • Goal:
    - Capture CLI Codex outputs and integrate them into the Automated Status Summary
    - Provide visibility into Codex iteration progress and outcomes
    - Show what changed in each round

Tasks

  • Update reusable-codex-run.yml to emit structured outputs:
    - Add output for final-message from Codex action
    - Add output for files-changed (count of modified files)
    - Add output for commits-pushed (boolean)
    - Write iteration summary to GITHUB_STEP_SUMMARY
  • Create new section in PR body for CLI Codex status:
    - Add <!-- codex-cli-status:start --> / <!-- codex-cli-status:end --> markers
    - Show last iteration number and outcome
    - Show tasks completed this round
    - Show link to workflow run logs
  • Update agents_pr_meta_update_body.js to populate the new section:
    - Fetch latest keepalive loop run results
    - Extract Codex outputs from workflow artifacts or step summaries
    - Format and insert into PR body
  • Update keepalive_loop.js to pass iteration context to the summary:
    - Include current iteration number in output
    - Include tasks remaining count
    - Include estimated rounds to completion
  • Add tests for the new integration:
    - Test output extraction from workflow runs
    - Test PR body section formatting
    - Test edge cases (no Codex runs, failed runs, etc.)
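
The marker-based PR body update described in the tasks above could work along these lines. The marker strings come from the task list itself; the function name and surrounding logic are hypothetical sketches, not the actual agents_pr_meta_update_body.js code:

```javascript
// Sketch: replace the text between the codex-cli-status markers in a PR
// body, appending the section if the markers are not yet present. The
// marker strings come from the task list above; upsertCodexCliSection is
// an illustrative name, not the real script's API.
const START = '<!-- codex-cli-status:start -->';
const END = '<!-- codex-cli-status:end -->';

function upsertCodexCliSection(prBody, sectionText) {
  const block = `${START}\n${sectionText}\n${END}`;
  // Non-greedy match so only the first marker pair is replaced.
  const pattern = new RegExp(`${START}[\\s\\S]*?${END}`);
  if (pattern.test(prBody)) {
    return prBody.replace(pattern, block);
  }
  return `${prBody.trimEnd()}\n\n${block}\n`;
}
```

Keeping the replacement idempotent (one marker pair, replaced in place) is what lets the summary refresh after every keepalive loop run without the PR body growing unboundedly.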

Acceptance criteria

  • CLI Codex iterations are visible in the PR body Automated Status Summary
  • Each iteration shows: round number, tasks attempted, outcome, and link to logs
  • The summary updates automatically after each keepalive loop run
  • Existing UI Codex tracking (if any) continues to work

Head SHA: 377ad77
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

Head SHA: 0774326
Latest Runs: ❔ in progress — Gate
Required: gate: ❔ in progress

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ❔ in progress | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ❔ in progress | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ❔ in progress | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ❔ in progress | View run |
| Validate Sync Manifest | ✅ success | View run |

Base automatically changed from fix/verifier-contentless-issues to main December 30, 2025 15:50
@stranske stranske marked this pull request as ready for review December 30, 2025 15:50
@stranske stranske self-requested a review as a code owner December 30, 2025 15:50
Copilot AI review requested due to automatic review settings December 30, 2025 15:50
@stranske
Owner

@codex resolve conflicts

Copilot AI changed the title [WIP] WIP address feedback on preventing contentless follow-up issues test: Add unit tests for verifier placeholder filtering helpers Dec 30, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR adds logic to prevent the creation of contentless follow-up issues in the verifier workflows. The changes introduce helper functions to identify placeholder content (like "section missing from source issue" or "N/A"), markdown section headers that were incorrectly captured, and PR/issue reference links. These helpers are used to filter out non-substantive items when determining whether a follow-up issue should be created.

Key changes:

  • Added helper functions (isPlaceholderContent, looksLikeSectionHeader, looksLikeReferenceLink) to identify non-actionable content
  • Modified formatFollowUpIssue and formatSimpleFollowUpIssue to include a hasSubstantiveContent flag
  • Updated both workflow files to check this flag and skip issue creation when there's no substantive content
  • Added comprehensive unit tests for the three new helper functions (148 new test cases)
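
For context, the three predicates described above might look roughly like the following. This is an illustrative sketch only: the actual implementations live in .github/scripts/verifier_issue_formatter.js, and the exact patterns below are assumptions based on this PR's description, not the repo's real regexes.

```javascript
// Sketch of the three helper predicates described above. Patterns are
// assumptions inferred from the PR description, not the repo's exact code.
function isPlaceholderContent(text) {
  if (!text) return true;
  const t = String(text).trim().toLowerCase();
  return (
    t === '' || t === 'n/a' || t === 'na' || t === 'none' ||
    t.includes('section missing from source issue')
  );
}

function looksLikeSectionHeader(line) {
  // A markdown header such as "## Tasks" that was captured as a list item.
  return /^\s*#{1,6}\s+\S/.test(String(line || ''));
}

function looksLikeReferenceLink(line) {
  // A bare PR/issue reference such as "- #123" or "* PR #45".
  return /^\s*[-*]?\s*(pr\s*)?#\d+\s*$/i.test(String(line || ''));
}
```

Filtering a candidate task list through all three predicates before counting items is what lets the workflow decide that a would-be follow-up issue has no substantive content.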

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| docs/workflows/SystemEvaluation.md | Updated documentation to reflect that parallel testing has been implemented with pytest-xdist |
| .github/workflows/reusable-agents-verifier.yml | Added logic to skip follow-up issue creation when hasSubstantiveContent is false, sets skipped output accordingly |
| .github/workflows/agents-verifier.yml | Added identical logic to skip follow-up issue creation when hasSubstantiveContent is false (consistency with reusable workflow) |
| .github/scripts/verifier_issue_formatter.js | Implemented three helper functions to detect non-substantive content, added hasSubstantiveContent calculation to both formatter functions, exported helpers for testing |
| .github/scripts/tests/verifier-issue-formatter.test.js | Added comprehensive test coverage for the three new helper functions with 148 test cases covering edge cases like empty input, null values, case insensitivity, and various placeholder patterns |
Comments suppressed due to low confidence (1)

.github/scripts/tests/verifier-issue-formatter.test.js:580

  • The test suite for formatFollowUpIssue doesn't verify the new hasSubstantiveContent field that was added to the return value. This field is critical for the new functionality that prevents contentless follow-up issues from being created. Consider adding test cases that verify:
  1. hasSubstantiveContent is true when there are real tasks/criteria
  2. hasSubstantiveContent is false when all sections contain only placeholder content
  3. hasSubstantiveContent is true when there are verifier gaps even without tasks/criteria
  describe('formatFollowUpIssue', () => {
    const verifierOutput = `Verdict: FAIL

The error handling is incomplete.

Blocking:
- Missing retry logic for rate limits
- No backoff delay implementation`;

    const prBody = `## Scope
Implement error handling.

## Tasks
- [x] Add error classifier
- [ ] Add retry logic
- [x] Add tests

## Acceptance Criteria
- [ ] Retry logic handles rate limits
- [ ] Tests cover all error paths`;

    const issue = {
      number: 100,
      title: 'Error handling',
      body: `## Why
We need better error handling.

## Non-Goals
- Changing existing behavior

## Scope
Error classification and recovery.

## Tasks
- [ ] Create error module
- [ ] Add retry wrapper

## Acceptance Criteria
- [ ] Errors are classified
- [ ] Retries use exponential backoff`,
    };

    it('generates title with PR number', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.title.includes('PR #123'));
      assert.ok(result.title.includes('Follow-up'));
    });

    it('includes source section with links', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
        prUrl: 'https://github.com/test/repo/pull/123',
        runUrl: 'https://github.com/test/repo/actions/runs/456',
      });
      assert.ok(result.body.includes('## Source'));
      assert.ok(result.body.includes('#123'));
      assert.ok(result.body.includes('#100'));
    });

    it('preserves Why section from parent issue', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Why'));
      assert.ok(result.body.includes('better error handling'));
    });

    it('preserves Non-Goals section', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Non-Goals'));
      assert.ok(result.body.includes('Changing existing behavior'));
    });

    it('includes unmet acceptance criteria', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Acceptance Criteria'));
      assert.ok(result.body.includes('Retry logic handles rate limits'));
    });

    it('copies incomplete tasks', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Tasks'));
      assert.ok(result.body.includes('Add retry logic'));
    });

    it('generates tasks from gaps when all tasks complete', () => {
      const allTasksComplete = `## Tasks
- [x] Task one
- [x] Task two

## Acceptance Criteria
- [ ] Criterion not met`;

      const result = formatFollowUpIssue({
        verifierOutput,
        prBody: allTasksComplete,
        issues: [],
        prNumber: 123,
      });
      // Should generate tasks from verifier gaps
      assert.ok(result.newTasks.length > 0);
    });

    it('includes implementation notes with summary', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.ok(result.body.includes('## Implementation Notes'));
      assert.ok(result.body.includes('error handling is incomplete'));
    });

    it('returns parsed findings', () => {
      const result = formatFollowUpIssue({
        verifierOutput,
        prBody,
        issues: [issue],
        prNumber: 123,
      });
      assert.equal(result.findings.verdict, 'fail');
      assert.ok(result.findings.gaps.length > 0);
    });

    it('uses verifier unmet criteria to filter acceptance criteria', () => {
      // Verifier explicitly says which criteria are not met
      const structuredVerifierOutput = `Verdict: FAIL

## Criteria Status
- [x] Retry logic handles rate limits - VERIFIED (code exists)
- [ ] Tests cover all error paths - NOT MET (missing coverage)
- [x] Error messages are helpful - VERIFIED (messages include guidance)
`;

      const prBodyWithCriteria = `## Tasks
- [x] All tasks done

## Acceptance Criteria
- [ ] Retry logic handles rate limits
- [ ] Tests cover all error paths
- [ ] Error messages are helpful`;

      const result = formatFollowUpIssue({
        verifierOutput: structuredVerifierOutput,
        prBody: prBodyWithCriteria,
        issues: [],
        prNumber: 200,
      });

      // Should only include the criterion that was NOT MET in the refined list
      assert.deepEqual(result.unmetCriteria, ['Tests cover all error paths']);
      
      // The Acceptance Criteria section should only have the unmet criterion
      const acceptanceSection = result.body.split('## Acceptance Criteria')[1].split('## ')[0];
      assert.ok(acceptanceSection.includes('Tests cover all error paths'));
      assert.ok(!acceptanceSection.includes('- [ ] Retry logic handles rate limits'));
      assert.ok(!acceptanceSection.includes('- [ ] Error messages are helpful'));
      
      // Verified criteria should appear in Implementation Notes, not Acceptance Criteria
      const notesSection = result.body.split('## Implementation Notes')[1] || '';
      assert.ok(notesSection.includes('Retry logic handles rate limits'));
    });

    it('includes verified criteria in implementation notes', () => {
      const structuredVerifierOutput = `Verdict: FAIL

## Criteria Status
- [x] First criterion - VERIFIED (evidence)
- [ ] Second criterion - NOT MET (missing)
`;

      const prBodyWithCriteria = `## Tasks
- [x] Done

## Acceptance Criteria
- [ ] First criterion
- [ ] Second criterion`;

      const result = formatFollowUpIssue({
        verifierOutput: structuredVerifierOutput,
        prBody: prBodyWithCriteria,
        issues: [],
        prNumber: 201,
      });

      // Implementation notes should mention what was verified
      assert.ok(result.body.includes('Verifier confirmed these criteria were met'));
      assert.ok(result.body.includes('✓ First criterion'));
    });
  });


@chatgpt-codex-connector

Summary

Testing

  • node .github/scripts/__tests__/verifier-issue-formatter.test.js

View task →

Copilot AI and others added 3 commits December 30, 2025 16:18
Added comprehensive test coverage for three exported helper functions:
- isPlaceholderContent: 8 tests covering placeholder patterns, N/A variations, empty strings, and edge cases
- looksLikeSectionHeader: 7 tests covering markdown headers, whitespace handling, and non-header content
- looksLikeReferenceLink: 10 tests covering PR/Issue references, bullet styles, case sensitivity, and edge cases

All 444 tests in the JavaScript test suite pass.

Co-authored-by: stranske <23046322+stranske@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@agents-workflows-bot
Contributor

⚠️ Action Required: Unable to determine source issue for PR #317. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker).

@github-actions
Contributor

github-actions bot commented Dec 30, 2025

Automated Status Summary

Head SHA: 054e487
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|--------|-------|
| Current | 0.00% |
| Baseline | 85.00% |
| Delta | -85.00% |
| Minimum | 70.00% |
| Status | ❌ Below minimum |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
|------|----------|---------|
| src/trend_analysis/_autofix_trigger_sample.py | 0.0% | 10 |
| src/trend_analysis/_autofix_violation_case2.py | 0.0% | 15 |
| src/trend_analysis/_autofix_violation_case3.py | 0.0% | 13 |
| src/trend_analysis/_ci_probe_faults.py | 0.0% | 18 |
| src/trend_analysis/automation_multifailure.py | 0.0% | 4 |
| src/trend_analysis/constants.py | 0.0% | 1 |
| src/trend_analysis/selector.py | 0.0% | 12 |
| src/trend_analysis/weighting.py | 0.0% | 11 |
| src/trend_analysis/__init__.py | 100.0% | 0 |

Low Coverage Files (<50.0%)

| File | Coverage | Missing |
|------|----------|---------|
| src/trend_analysis/_autofix_trigger_sample.py | 0.0% | 10 |
| src/trend_analysis/_autofix_violation_case2.py | 0.0% | 15 |
| src/trend_analysis/_autofix_violation_case3.py | 0.0% | 13 |
| src/trend_analysis/_ci_probe_faults.py | 0.0% | 18 |
| src/trend_analysis/automation_multifailure.py | 0.0% | 4 |
| src/trend_analysis/constants.py | 0.0% | 1 |
| src/trend_analysis/selector.py | 0.0% | 12 |
| src/trend_analysis/weighting.py | 0.0% | 11 |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

(Scope, Tasks, and Acceptance criteria as listed above.)

@github-actions
Contributor

github-actions bot commented Dec 30, 2025

🤖 Keepalive Loop Status

PR #317 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
|--------|-------|
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Gate | success |
| Tasks | 0/41 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Field | Value |
|-------|-------|
| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske stranske merged commit fd234e0 into main Dec 30, 2025
37 checks passed
@stranske stranske deleted the copilot/sub-pr-316 branch December 30, 2025 16:28
