test: Add unit tests for verifier placeholder filtering helpers#317
Conversation
@codex resolve conflicts
Pull request overview
This PR adds logic to prevent the creation of contentless follow-up issues in the verifier workflows. The changes introduce helper functions to identify placeholder content (like "section missing from source issue" or "N/A"), markdown section headers that were incorrectly captured, and PR/issue reference links. These helpers are used to filter out non-substantive items when determining whether a follow-up issue should be created.
Key changes:
- Added helper functions (`isPlaceholderContent`, `looksLikeSectionHeader`, `looksLikeReferenceLink`) to identify non-actionable content
- Modified `formatFollowUpIssue` and `formatSimpleFollowUpIssue` to include a `hasSubstantiveContent` flag
- Updated both workflow files to check this flag and skip issue creation when there's no substantive content
- Added comprehensive unit tests for the three new helper functions (148 new test cases)
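To make the filtering idea concrete, the three helpers can be sketched roughly as follows. This is an illustrative sketch only: the function names come from the PR, but the bodies and regex patterns here are assumptions, not the actual implementation in `.github/scripts/verifier_issue_formatter.js`.

```javascript
// Illustrative sketch; the real implementations live in
// .github/scripts/verifier_issue_formatter.js and these bodies are assumptions.
const PLACEHOLDER_PATTERNS = [
  /^n\/?a$/i,                            // "N/A", "NA"
  /section missing from source issue/i,  // verifier placeholder text
  /^none$/i,
];

function isPlaceholderContent(text) {
  const trimmed = (text || '').trim();
  if (!trimmed) return true; // empty or whitespace-only counts as placeholder
  return PLACEHOLDER_PATTERNS.some((re) => re.test(trimmed));
}

function looksLikeSectionHeader(text) {
  // Catches markdown headers incorrectly captured as items, e.g. "## Tasks".
  return /^#{1,6}\s/.test((text || '').trim());
}

function looksLikeReferenceLink(text) {
  // Catches bare PR/issue references such as "- PR #123" or "Issue #42".
  return /^[-*]?\s*(PR|Issue)\s*#\d+/i.test((text || '').trim());
}
```

An item is kept for the follow-up issue only if all three checks come back false.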
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/workflows/SystemEvaluation.md | Updated documentation to reflect that parallel testing has been implemented with pytest-xdist |
| .github/workflows/reusable-agents-verifier.yml | Added logic to skip follow-up issue creation when hasSubstantiveContent is false, sets skipped output accordingly |
| .github/workflows/agents-verifier.yml | Added identical logic to skip follow-up issue creation when hasSubstantiveContent is false (consistency with reusable workflow) |
| .github/scripts/verifier_issue_formatter.js | Implemented three helper functions to detect non-substantive content, added hasSubstantiveContent calculation to both formatter functions, exported helpers for testing |
| .github/scripts/tests/verifier-issue-formatter.test.js | Added comprehensive test coverage for the three new helper functions with 148 test cases covering edge cases like empty input, null values, case insensitivity, and various placeholder patterns |
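Conceptually, the `hasSubstantiveContent` flag falls out of running every candidate item through the placeholder, header, and reference-link filters. The sketch below is a minimal illustration of that calculation under assumed names and input shapes, not the PR's exact code.

```javascript
// Illustrative only: one way a formatter could derive hasSubstantiveContent.
// The filters mirror the PR's helper checks; names and sections are assumptions.
function isSubstantiveItem(item) {
  const t = (item || '').trim();
  if (!t) return false;                                     // empty
  if (/^(n\/?a|none)$/i.test(t)) return false;              // placeholder
  if (/section missing from source issue/i.test(t)) return false;
  if (/^#{1,6}\s/.test(t)) return false;                    // stray section header
  if (/^[-*]?\s*(PR|Issue)\s*#\d+/i.test(t)) return false;  // reference link
  return true;
}

function hasSubstantiveContent({ tasks = [], criteria = [], gaps = [] }) {
  // The issue is worth creating only if at least one item survives filtering.
  return [...tasks, ...criteria, ...gaps].some(isSubstantiveItem);
}
```

The workflows then read this flag from the formatter's return value and set the `skipped` output instead of opening an issue when it is false.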
Comments suppressed due to low confidence (1)
.github/scripts/tests/verifier-issue-formatter.test.js:580
The test suite for `formatFollowUpIssue` doesn't verify the new `hasSubstantiveContent` field that was added to the return value. This field is critical for the new functionality that prevents contentless follow-up issues from being created. Consider adding test cases that verify:
- `hasSubstantiveContent` is true when there are real tasks/criteria
- `hasSubstantiveContent` is false when all sections contain only placeholder content
- `hasSubstantiveContent` is true when there are verifier gaps even without tasks/criteria
describe('formatFollowUpIssue', () => {
const verifierOutput = `Verdict: FAIL
The error handling is incomplete.
Blocking:
- Missing retry logic for rate limits
- No backoff delay implementation`;
const prBody = `## Scope
Implement error handling.
## Tasks
- [x] Add error classifier
- [ ] Add retry logic
- [x] Add tests
## Acceptance Criteria
- [ ] Retry logic handles rate limits
- [ ] Tests cover all error paths`;
const issue = {
number: 100,
title: 'Error handling',
body: `## Why
We need better error handling.
## Non-Goals
- Changing existing behavior
## Scope
Error classification and recovery.
## Tasks
- [ ] Create error module
- [ ] Add retry wrapper
## Acceptance Criteria
- [ ] Errors are classified
- [ ] Retries use exponential backoff`,
};
it('generates title with PR number', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.ok(result.title.includes('PR #123'));
assert.ok(result.title.includes('Follow-up'));
});
it('includes source section with links', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
prUrl: 'https://github.com/test/repo/pull/123',
runUrl: 'https://github.com/test/repo/actions/runs/456',
});
assert.ok(result.body.includes('## Source'));
assert.ok(result.body.includes('#123'));
assert.ok(result.body.includes('#100'));
});
it('preserves Why section from parent issue', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.ok(result.body.includes('## Why'));
assert.ok(result.body.includes('better error handling'));
});
it('preserves Non-Goals section', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.ok(result.body.includes('## Non-Goals'));
assert.ok(result.body.includes('Changing existing behavior'));
});
it('includes unmet acceptance criteria', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.ok(result.body.includes('## Acceptance Criteria'));
assert.ok(result.body.includes('Retry logic handles rate limits'));
});
it('copies incomplete tasks', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.ok(result.body.includes('## Tasks'));
assert.ok(result.body.includes('Add retry logic'));
});
it('generates tasks from gaps when all tasks complete', () => {
const allTasksComplete = `## Tasks
- [x] Task one
- [x] Task two
## Acceptance Criteria
- [ ] Criterion not met`;
const result = formatFollowUpIssue({
verifierOutput,
prBody: allTasksComplete,
issues: [],
prNumber: 123,
});
// Should generate tasks from verifier gaps
assert.ok(result.newTasks.length > 0);
});
it('includes implementation notes with summary', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.ok(result.body.includes('## Implementation Notes'));
assert.ok(result.body.includes('error handling is incomplete'));
});
it('returns parsed findings', () => {
const result = formatFollowUpIssue({
verifierOutput,
prBody,
issues: [issue],
prNumber: 123,
});
assert.equal(result.findings.verdict, 'fail');
assert.ok(result.findings.gaps.length > 0);
});
it('uses verifier unmet criteria to filter acceptance criteria', () => {
// Verifier explicitly says which criteria are not met
const structuredVerifierOutput = `Verdict: FAIL
## Criteria Status
- [x] Retry logic handles rate limits - VERIFIED (code exists)
- [ ] Tests cover all error paths - NOT MET (missing coverage)
- [x] Error messages are helpful - VERIFIED (messages include guidance)
`;
const prBodyWithCriteria = `## Tasks
- [x] All tasks done
## Acceptance Criteria
- [ ] Retry logic handles rate limits
- [ ] Tests cover all error paths
- [ ] Error messages are helpful`;
const result = formatFollowUpIssue({
verifierOutput: structuredVerifierOutput,
prBody: prBodyWithCriteria,
issues: [],
prNumber: 200,
});
// Should only include the criterion that was NOT MET in the refined list
assert.deepEqual(result.unmetCriteria, ['Tests cover all error paths']);
// The Acceptance Criteria section should only have the unmet criterion
const acceptanceSection = result.body.split('## Acceptance Criteria')[1].split('## ')[0];
assert.ok(acceptanceSection.includes('Tests cover all error paths'));
assert.ok(!acceptanceSection.includes('- [ ] Retry logic handles rate limits'));
assert.ok(!acceptanceSection.includes('- [ ] Error messages are helpful'));
// Verified criteria should appear in Implementation Notes, not Acceptance Criteria
const notesSection = result.body.split('## Implementation Notes')[1] || '';
assert.ok(notesSection.includes('Retry logic handles rate limits'));
});
it('includes verified criteria in implementation notes', () => {
const structuredVerifierOutput = `Verdict: FAIL
## Criteria Status
- [x] First criterion - VERIFIED (evidence)
- [ ] Second criterion - NOT MET (missing)
`;
const prBodyWithCriteria = `## Tasks
- [x] Done
## Acceptance Criteria
- [ ] First criterion
- [ ] Second criterion`;
const result = formatFollowUpIssue({
verifierOutput: structuredVerifierOutput,
prBody: prBodyWithCriteria,
issues: [],
prNumber: 201,
});
// Implementation notes should mention what was verified
assert.ok(result.body.includes('Verifier confirmed these criteria were met'));
assert.ok(result.body.includes('✓ First criterion'));
});
});
Summary
Testing
Added comprehensive test coverage for three exported helper functions:
- `isPlaceholderContent`: 8 tests covering placeholder patterns, N/A variations, empty strings, and edge cases
- `looksLikeSectionHeader`: 7 tests covering markdown headers, whitespace handling, and non-header content
- `looksLikeReferenceLink`: 10 tests covering PR/Issue references, bullet styles, case sensitivity, and edge cases

All 444 tests in the JavaScript test suite pass.

Co-authored-by: stranske <23046322+stranske@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Force-pushed from a4fbd0b to 0774326
Automated Status Summary
Head SHA: 054e487
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Low Coverage Files (<50.0%)
Updated automatically; will refresh on subsequent CI/Docker completions.
Keepalive checklist
Scope
Tasks
Acceptance criteria
🤖 Keepalive Loop Status
PR #317 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification
| Error type | infrastructure |
Automated Status Summary
Scope
`reusable-codex-run.yml`) completes iterations, there's no visibility into:
Tasks
- `reusable-codex-run.yml` to emit structured outputs:
  - `final-message` from Codex action
  - `files-changed` (count of modified files)
  - `commits-pushed` (boolean)
- `<!-- codex-cli-status:start -->` / `<!-- codex-cli-status:end -->` markers
- `agents_pr_meta_update_body.js` to populate the new section:
- `keepalive_loop.js` to pass iteration context to the summary:
Acceptance criteria
Head SHA: 0774326
Latest Runs: ❔ in progress — Gate
Required: gate: ❔ in progress