Fix/scope checkbox stripping #326

Merged
stranske merged 10 commits into main from fix/scope-checkbox-stripping
Dec 30, 2025

Conversation

@stranske
Owner

No description provided.

stranske and others added 9 commits December 30, 2025 16:07
Scope is informational context, not actionable items to check off.

1. Add stripCheckboxesFromScope() function that:
   - Converts '- [ ] text' to '- text' (plain bullets)
   - Filters out placeholder content entirely
   - Removes empty lines

2. Update PR_META_FALLBACK_PLACEHOLDERS.scope:
   - Change from '- [ ] Scope section missing...' to italicized text
   - Matches standard PLACEHOLDERS.scope pattern

3. Update consumer repo template with same placeholder change

4. Update test to match new placeholder format

This ensures scope sections in:
- PR Automated Status Summary (via issue_scope_parser.js)
- Verifier follow-up issues (via verifier_issue_formatter.js)

...display as informational text rather than checkbox items.
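
A minimal sketch of the checkbox-stripping behaviour described above, not the merged implementation. The `placeholders` parameter and its default phrase are assumptions made for illustration; the `\s*` after the bracket tolerates a missing space, as in the later review fix.

```javascript
// Hypothetical sketch; the real stripCheckboxesFromScope() may differ.
// Matches a bullet marker followed by a checkbox: "- [ ] ", "* [x] ", "- [X]text".
const CHECKBOX_RE = /^(\s*[-*]\s+)\[[ xX]\]\s*/;

function stripCheckboxesFromScope(text, placeholders = ['scope section missing']) {
  if (!text) return '';
  return String(text)
    .split('\n')
    // Keep the indentation and bullet marker, drop only the checkbox.
    .map(line => line.replace(CHECKBOX_RE, '$1'))
    // Remove empty lines.
    .filter(line => line.trim() !== '')
    // Drop placeholder lines entirely (assumed phrase list, lowercase).
    .filter(line => !placeholders.some(p => line.toLowerCase().includes(p)))
    .join('\n');
}

console.log(stripCheckboxesFromScope('- [ ] Add parser\n\n  - [x] Nested item'));
```

Capturing the bullet prefix in a group is one way to preserve indentation for nested items while removing only the checkbox itself.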
Add extractSourceSection() to parse Source section from PR body and
include it in buildTaskAppendix(). This gives Codex visibility into
linked parent issues and original PRs for additional context.

Changes:
- Add extractSourceSection() function to extract Source links from PR body
- Update buildTaskAppendix() to accept options.prBody parameter
- Add 'Source Context' section to agent prompt when Source links present
- Add CONTEXT TIP to keepalive-instruction.md template
- Add 5 new tests for source section extraction

When a PR has a Source section with links like:
- Original PR: #123
- Parent issue: #456

The agent prompt now includes these as 'Source Context' so Codex
knows to check linked issues/PRs for additional background.
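
One way the Source extraction could work is sketched below. It assumes the Source section is introduced by a Markdown heading such as `## Source` and ends at the next heading; the heading format and the return shape are assumptions, not details confirmed by the commit message.

```javascript
// Hypothetical sketch; the real extractSourceSection() may parse differently.
function extractSourceSection(prBody) {
  if (!prBody) return null;
  const lines = String(prBody).split('\n');
  // Assumed convention: a Markdown heading whose text is exactly "Source".
  const start = lines.findIndex(l => /^#{1,6}\s*Source\s*$/i.test(l.trim()));
  if (start === -1) return null;

  const collected = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s/.test(line.trim())) break; // the next heading ends the section
    if (line.trim()) collected.push(line.trim());
  }
  // Only report a section that actually links to an issue, PR, or URL.
  const hasLink = collected.some(l => /#\d+|https?:\/\//.test(l));
  return hasLink ? collected.join('\n') : null;
}

const body = '## Summary\nText\n\n## Source\n- Original PR: #123\n- Parent issue: #456\n\n## Tasks\n- [ ] x';
console.log(extractSourceSection(body));
```

Returning `null` when no links are present lets the caller skip the 'Source Context' section instead of emitting an empty one.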
The src/trend_analysis/ directory contains CI test fixtures used to
exercise the autofix pipeline workflows - not production code. These
files (e.g., _autofix_trigger_sample.py, _ci_probe_faults.py) were
showing 0% coverage and artificially dragging down overall metrics.

Add 'src/trend_analysis/*' to [tool.coverage.run].omit patterns.

These CI autofix test fixtures were at risk of being mistakenly deleted
or flagged as orphaned code. The README explains:
- Why these files exist (autofix pipeline testing)
- Why they're excluded from coverage
- Which tests use them

The Workflows repo was missing the caller workflow for the bot comment
handler - only the reusable version existed. This meant the
'autofix:bot-comments' label had no effect.

Copied from templates/consumer-repo/.github/workflows/ to enable:
- Manual trigger via 'autofix:bot-comments' label
- Automatic trigger after Gate completion for agent PRs
- Manual dispatch via workflow_dispatch
…r bug

Addresses 3 review comments from Copilot and Codex:

1. **Copilot (regex)**: Changed `\s+` to `\s*` in checkbox pattern to handle
   edge cases where space after bracket is missing (e.g., `- [ ]text`)

2. **Copilot (tests)**: Added 11 new tests for `stripCheckboxesFromScope()`:
   - Checkbox-to-bullet conversion (unchecked, x, X)
   - Preservation of indentation
   - Filtering of placeholder content
   - Handling of empty lines and null input
   - Edge case: no space after bracket

3. **Codex P1 (scope placeholder alignment)**: Fixed critical regression where
   `agents_pr_meta_update_body.js` was still emitting checkbox format for scope
   fallback while `issue_scope_parser.js` changed to italicized format.
   Scope is now treated as informational (plain text), not actionable (checkbox).

Tests: 471 passing (up from 424)

Add instruction #5 and CRITICAL section to keepalive-instruction.md
asking Codex to output '✅ Completed: [task text]' in final summary.

This enables reliable reconciliation parsing for completed tasks.
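
A sketch of how such markers might be parsed on the reconciliation side. The function name `parseCompletedTasks` and the exact regular expression are assumptions; only the '✅ Completed: [task text]' marker format comes from the commit message.

```javascript
// Hypothetical parser for the completion markers in an agent's final summary.
function parseCompletedTasks(summary) {
  const completed = [];
  for (const line of String(summary || '').split('\n')) {
    // Accept the marker anywhere on the line, e.g. after a bullet.
    const m = line.match(/✅\s*Completed:\s*(.+?)\s*$/);
    if (m) completed.push(m[1]);
  }
  return completed;
}

console.log(parseCompletedTasks('Done.\n✅ Completed: Add parser tests\n- ✅ Completed: Update docs'));
```

An explicit marker like this is easier to match reliably than inferring completion from keyword overlap between commits and task text.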
Copilot AI review requested due to automatic review settings December 30, 2025 17:27
@agents-workflows-bot
Contributor

⚠️ Action Required: Unable to determine source issue for PR #326. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

@github-actions
Contributor

github-actions bot commented Dec 30, 2025

Automated Status Summary

Head SHA: 5f81e8c
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job Result Logs
(no jobs reported) ⏳ pending

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 0.00% |
| Baseline | 85.00% |
| Delta | -85.00% |
| Minimum | 70.00% |
| Status | ❌ Below minimum |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@github-actions
Contributor

github-actions bot commented Dec 30, 2025

🤖 Keepalive Loop Status

PR #326 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Gate | success |
| Tasks | 0/0 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske stranske merged commit 10ab48c into main Dec 30, 2025
37 checks passed
@stranske stranske deleted the fix/scope-checkbox-stripping branch December 30, 2025 17:29
Contributor

Copilot AI left a comment

Pull request overview

This PR improves the automated task completion detection system by adding synonym matching, camelCase parsing, and lowering keyword matching thresholds to catch more legitimate task completions.

Key Changes:

  • Added synonym expansion for common action verbs (add/create/implement, fix/repair/resolve, etc.) to improve task-to-commit matching
  • Implemented camelCase/PascalCase word splitting to extract keywords from file names and commit messages
  • Lowered keyword matching thresholds from 50% to 35% (with file match) and added a 25% threshold tier to reduce false negatives

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| .github/templates/keepalive-instruction.md | Added instructions for explicit completion markers in final summaries to help automation track completed tasks |
| .github/scripts/keepalive_loop.js | Implemented synonym expansion, camelCase splitting, and adjusted matching thresholds in the analyzeTaskCompletion function |
| .github/scripts/__tests__/keepalive-loop.test.js | Updated existing tests with more realistic unrelated tasks and added three new tests for threshold changes and synonym matching |

Comment on lines +1580 to +1625
test('analyzeTaskCompletion uses synonym expansion for better matching', async () => {
  // Task says "implement", commit says "add" - synonyms should match
  const commits = [
    { sha: 'abc123', commit: { message: 'feat: add config validation logic' } },
  ];
  const files = [
    { filename: 'src/config/validator.py' },
  ];

  const github = {
    rest: {
      repos: {
        async compareCommits() {
          return { data: { commits } };
        },
      },
      pulls: {
        async listFiles() {
          return { data: files };
        },
      },
    },
  };

  const taskText = `
- [ ] Implement config validation with proper error handling
`;

  const result = await analyzeTaskCompletion({
    github,
    context: { repo: { owner: 'test', repo: 'repo' } },
    prNumber: 1,
    baseSha: 'base123',
    headSha: 'head456',
    taskText,
    core: buildCore(),
  });

  // "implement" in task should match "add" in commit via synonyms
  // plus "config" and "validation" match directly
  const configMatch = result.matches.find(m =>
    m.task.toLowerCase().includes('config validation')
  );
  assert.ok(configMatch, 'Should match config validation task');
  assert.equal(configMatch.confidence, 'high', 'Should be high confidence with synonym matching');
});

Copilot AI Dec 30, 2025

This test claims to test synonym expansion (task says "implement", commit says "add"), but it doesn't properly validate this feature. The commit message contains "add config validation logic" and the task is "Implement config validation with proper error handling". Both contain the words "config" and "validation" which match directly, so the test would pass with high confidence even if synonym expansion didn't work at all.

To properly test synonym expansion, you would need a test where the ONLY matching words are synonyms. For example:

  • Task: "implement feature"
  • Commit: "add feature"
  • Only "feature" matches directly; "implement" and "add" should match via synonyms.

Given the bidirectional synonym bug identified earlier (comment ID 001), this test might actually be passing for the wrong reasons.

Copilot uses AI. Check for mistakes.
Comment on lines +1333 to +1338
return str
  .replace(/([a-z])([A-Z])/g, '$1 $2')
  .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
  .toLowerCase()
  .split(/[\s_-]+/)
  .filter(w => w.length > 2);

Copilot AI Dec 30, 2025

The splitCamelCase function filters out words with length <= 2 on line 1338, which could remove meaningful short words. For example:

  • "getUserID" would split into ["get", "user", "id"], but "id" would be filtered out
  • "setIO" would split into ["set", "io"], but "io" would be filtered out
  • "addUIElement" would split into ["add", "ui", "element"], but "ui" would be filtered out

While filtering out very short words might reduce noise, it could also remove legitimate technical acronyms and abbreviations that are meaningful in the context of task matching (e.g., "io", "id", "ui", "db", "os", "vm").

Suggested change
- return str
-   .replace(/([a-z])([A-Z])/g, '$1 $2')
-   .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
-   .toLowerCase()
-   .split(/[\s_-]+/)
-   .filter(w => w.length > 2);
+ // Common short technical tokens that are still meaningful and should not be filtered out
+ const SHORT_TECH_TOKENS = new Set([
+   'id', 'io', 'ui', 'db', 'os', 'vm',
+   'ci', 'cd', 'ip', 'api'
+ ]);
+ return str
+   .replace(/([a-z])([A-Z])/g, '$1 $2')
+   .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
+   .toLowerCase()
+   .split(/[\s_-]+/)
+   .filter(w => w.length > 2 || SHORT_TECH_TOKENS.has(w));
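
The suggestion can be tried in isolation with a standalone version of the helper. This is a sketch assuming the function takes a single string; the allowlist contents are illustrative, and the surrounding code in keepalive_loop.js is not shown in the review.

```javascript
// Standalone sketch of splitCamelCase with a short-token allowlist (assumed contents).
const SHORT_TECH_TOKENS = new Set(['id', 'io', 'ui', 'db', 'os', 'vm', 'ci', 'cd', 'ip']);

function splitCamelCase(str) {
  return String(str)
    .replace(/([a-z])([A-Z])/g, '$1 $2')       // "getUser" -> "get User"
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2') // "UIElement" -> "UI Element"
    .toLowerCase()
    .split(/[\s_-]+/)
    // Keep words longer than two characters, plus allowlisted short tokens.
    .filter(w => w.length > 2 || SHORT_TECH_TOKENS.has(w));
}

console.log(splitCamelCase('getUserID'));    // [ 'get', 'user', 'id' ]
console.log(splitCamelCase('addUIElement')); // [ 'add', 'ui', 'element' ]
```

An allowlist keeps the noise filter for arbitrary one- and two-letter fragments while letting known acronyms through.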

Comment on lines +1369 to +1372
for (const keyword of commitKeywords) {
  const synonymList = SYNONYMS[keyword];
  if (synonymList) {
    synonymList.forEach(syn => expandedKeywords.add(syn));

Copilot AI Dec 30, 2025

The synonym expansion logic is not bidirectional. When a keyword from commits is found (e.g., "add"), the code adds its synonyms (e.g., "create", "implement") to expandedKeywords. However, if the task contains "implement" and the commit contains "add", this won't match because "implement" from the task is not in commitKeywords, so its synonyms (including "add") are never added to expandedKeywords.

The matching process filters taskWords against expandedKeywords, but expandedKeywords is only expanded based on what's in commitKeywords. This means the synonym mapping only works one way (commit keyword → task synonym) but not the other way (task keyword → commit synonym).

To fix this, when matching task words, you should also check if any task word has synonyms that exist in commitKeywords, or expand keywords bidirectionally.

Suggested change
- for (const keyword of commitKeywords) {
-   const synonymList = SYNONYMS[keyword];
-   if (synonymList) {
-     synonymList.forEach(syn => expandedKeywords.add(syn));
+ // Expand synonym groups bidirectionally: if any word in a synonym group
+ // appears in commitKeywords, add the entire group to expandedKeywords.
+ if (SYNONYMS && typeof SYNONYMS === 'object') {
+   for (const [base, synonymList] of Object.entries(SYNONYMS)) {
+     const list = Array.isArray(synonymList) ? synonymList : [];
+     const groupWords = [base, ...list];
+     const hasIntersection = groupWords.some(word => commitKeywords.has(word));
+     if (hasIntersection) {
+       groupWords.forEach(word => expandedKeywords.add(word));
+     }
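
The group-based expansion can be sketched as a self-contained function. The synonym table here is a hypothetical two-entry stand-in built from the verb groups named in the PR overview; the real `SYNONYMS` map and the name `expandKeywords` are assumptions.

```javascript
// Hypothetical synonym table; the real SYNONYMS map in keepalive_loop.js may differ.
const SYNONYMS = {
  add: ['create', 'implement'],
  fix: ['repair', 'resolve'],
};

// If any member of a synonym group appears in the commit keywords,
// add the whole group, so lookups work regardless of which member is the key.
function expandKeywords(commitKeywords, synonyms = SYNONYMS) {
  const expanded = new Set(commitKeywords);
  for (const [base, list] of Object.entries(synonyms)) {
    const group = [base, ...list];
    if (group.some(word => commitKeywords.has(word))) {
      group.forEach(word => expanded.add(word));
    }
  }
  return expanded;
}

const expanded = expandKeywords(new Set(['implement', 'config']));
console.log(expanded.has('add')); // true
```

With this shape, a commit that says "implement" and a task that says "add" match even though only `add` is a key in the table.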

const camelWords = commits
  .map(c => c.commit.message)
  .join(' ')
  .match(/[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g) || [];

Copilot AI Dec 30, 2025

The regex pattern /[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g for detecting camelCase words is too restrictive and will miss several valid camelCase patterns:

  • It requires at least one lowercase letter between the first character and the first interior uppercase letter, so it misses patterns like "aB", "iPhone", or "eBay"
  • It does match PascalCase words such as "UserService" or "HttpClient", but only because lowercase letters follow the leading capital
  • It misses names that consist only of capitals or begin with an acronym, such as "IO" or "HTTPSConnection"

A more permissive pattern allows a single leading lowercase letter while still accepting PascalCase, for example /\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g. That pattern still skips all-caps and acronym-led names such as "IO" and "HTTPSConnection".

Suggested change
- .match(/[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g) || [];
+ .match(/\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g) || [];
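
Running both patterns against a few sample identifiers is a quick way to check these claims. The helper name `findCamelWords` and the sample string are illustrative only; the two patterns are copied from the diff and the suggestion.

```javascript
// Patterns copied from the review thread: the original and the suggested replacement.
const ORIGINAL = /[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g;
const SUGGESTED = /\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g;

// Illustrative helper: return every match, or an empty array when there is none.
function findCamelWords(text, pattern) {
  return text.match(pattern) || [];
}

const sample = 'iPhone eBay UserService getUserID IO HTTPSConnection';
console.log(findCamelWords(sample, ORIGINAL));
// [ 'UserService', 'getUserID' ]
console.log(findCamelWords(sample, SUGGESTED));
// [ 'iPhone', 'eBay', 'UserService', 'getUserID' ]
```

Neither pattern picks up "IO" or "HTTPSConnection", so acronym-led names would need a separate alternative if they matter for task matching.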

Comment on lines +1460 to +1470
  } else if (score >= 0.35 && (fileMatch || commitMatch)) {
    // Lowered threshold from 0.5 to 0.35 to catch more legitimate completions
    confidence = 'high';
    reason = `${Math.round(score * 100)}% keyword match, ${fileMatch ? 'file match' : 'commit match'}`;
    matches.push({ task, reason, confidence });
- } else if (score >= 0.3 || fileMatch) {
  } else if (score >= 0.25 && fileMatch) {
    // File match with moderate keyword overlap is high confidence
    confidence = 'high';
    reason = `${Math.round(score * 100)}% keyword match with file match`;
    matches.push({ task, reason, confidence });
  } else if (score >= 0.2 || fileMatch) {

Copilot AI Dec 30, 2025

The condition on line 1465 (score >= 0.25 && fileMatch) overlaps with the previous condition on line 1460 (score >= 0.35 && (fileMatch || commitMatch)). Any score >= 0.35 with fileMatch matches line 1460 first, so line 1465 only catches scores in the range [0.25, 0.35) with fileMatch.

Additionally, line 1470's condition score >= 0.2 || fileMatch would catch fileMatch cases when score < 0.25, but this creates an inconsistency:

  • score >= 0.35 with fileMatch → high confidence (line 1460)
  • 0.25 <= score < 0.35 with fileMatch → high confidence (line 1465)
  • score < 0.25 with fileMatch → medium confidence (line 1470)

This means a fileMatch alone (with score < 0.2) gets medium confidence, which seems reasonable, but the threshold boundaries should be documented more clearly to avoid confusion.
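
The tier boundaries can be made explicit with a small standalone function. This is a sketch of only the three branches visible in the diff; the name `classifyMatch`, the `null` fallthrough, and the omission of any earlier branches are assumptions.

```javascript
// Hypothetical restatement of the three visible tiers; earlier branches are omitted.
function classifyMatch(score, fileMatch, commitMatch) {
  // Tier 1: at least 35% keyword overlap plus a file or commit match.
  if (score >= 0.35 && (fileMatch || commitMatch)) return 'high';
  // Tier 2: overlap in [0.25, 0.35) backed by a file match.
  if (score >= 0.25 && fileMatch) return 'high';
  // Tier 3: at least 20% overlap, or any file match regardless of score.
  if (score >= 0.2 || fileMatch) return 'medium';
  return null;
}

console.log(classifyMatch(0.29, true, false)); // high
console.log(classifyMatch(0.10, true, false)); // medium
console.log(classifyMatch(0.30, false, true)); // medium
```

Writing the tiers as early returns makes it easier to see that a file match with 25% to 35% overlap is promoted to high confidence, which is the case the reviewers are debating.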

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1465 to +1468
} else if (score >= 0.25 && fileMatch) {
  // File match with moderate keyword overlap is high confidence
  confidence = 'high';
  reason = `${Math.round(score * 100)}% keyword match with file match`;

[P1] Stop marking low-overlap file matches as high confidence

Lowering the high-confidence cutoff to 25% when fileMatch is true means tasks that only share a single token with a touched file now get marked high and will be auto-checked in autoReconcileTasks. For example, touching .github/scripts/__tests__/agents-guard.test.js while a task says Create keepalive-guard-utils.test.js covering pause label detection produces a ~29% keyword match (create/test) plus a substring file match on guard, so this branch would now flag it high even though the referenced file was never created; prior logic kept this as medium and avoided auto-checking unrelated tasks.

2 participants