Conversation
Scope is informational context, not actionable items to check off.

1. Add stripCheckboxesFromScope() function that:
   - Converts '- [ ] text' to '- text' (plain bullets)
   - Filters out placeholder content entirely
   - Removes empty lines
2. Update PR_META_FALLBACK_PLACEHOLDERS.scope:
   - Change from '- [ ] Scope section missing...' to italicized text
   - Matches standard PLACEHOLDERS.scope pattern
3. Update consumer repo template with same placeholder change
4. Update test to match new placeholder format

This ensures scope sections in:
- PR Automated Status Summary (via issue_scope_parser.js)
- Verifier follow-up issues (via verifier_issue_formatter.js)

...display as informational text rather than checkbox items.
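The behaviour described for stripCheckboxesFromScope() can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the placeholder pattern `SCOPE_PLACEHOLDER_PATTERN` is a hypothetical stand-in, and the checkbox regex assumes the `\s*` form mentioned in the later review follow-up.

```javascript
// Hypothetical placeholder detector; the real code may compare against
// PLACEHOLDERS.scope or PR_META_FALLBACK_PLACEHOLDERS.scope instead.
const SCOPE_PLACEHOLDER_PATTERN = /scope section missing/i;

// Sketch: turn checkbox items into plain bullets, drop placeholders and blank lines.
function stripCheckboxesFromScope(scopeText) {
  if (!scopeText) return '';
  return String(scopeText)
    .split(/\r?\n/)
    // '- [ ] text', '- [x] text', '- [X]text' all become '- text'; indentation is kept.
    .map(line => line.replace(/^(\s*)[-*]\s*\[[ xX]\]\s*/, '$1- '))
    .filter(line => line.trim() !== '')
    .filter(line => !SCOPE_PLACEHOLDER_PATTERN.test(line))
    .join('\n');
}

const scope = [
  '- [ ] First item',
  '',
  '  - [x] Nested item',
  '- [X]No space after bracket',
  '- [ ] Scope section missing from source issue.',
].join('\n');

console.log(stripCheckboxesFromScope(scope));
```

With the sample input above, the three real items survive as plain bullets and the placeholder line is dropped.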
Add extractSourceSection() to parse Source section from PR body and include it in buildTaskAppendix(). This gives Codex visibility into linked parent issues and original PRs for additional context.

Changes:
- Add extractSourceSection() function to extract Source links from PR body
- Update buildTaskAppendix() to accept options.prBody parameter
- Add 'Source Context' section to agent prompt when Source links present
- Add CONTEXT TIP to keepalive-instruction.md template
- Add 5 new tests for source section extraction

When a PR has a Source section with links like:
- Original PR: #123
- Parent issue: #456

The agent prompt now includes these as 'Source Context' so Codex knows to check linked issues/PRs for additional background.
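One way such an extraction could work is sketched below. It assumes the Source section is introduced by a markdown heading such as `## Source` and runs until the next heading; that heading format is an assumption, and the real extractSourceSection() may parse the body differently.

```javascript
// Sketch: return the non-empty lines under a "Source" heading, or null if absent.
// The '## Source' heading convention is assumed, not confirmed by the PR.
function extractSourceSection(prBody) {
  if (typeof prBody !== 'string') return null;
  const lines = prBody.split(/\r?\n/);
  const start = lines.findIndex(line => /^#{1,6}\s+Source\s*$/i.test(line.trim()));
  if (start === -1) return null;

  const collected = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s+/.test(line.trim())) break; // next heading ends the section
    if (line.trim()) collected.push(line.trim());
  }
  return collected.length ? collected.join('\n') : null;
}

const prBody = [
  '## Summary',
  'Some description.',
  '',
  '## Source',
  '- Original PR: #123',
  '- Parent issue: #456',
  '',
  '## Tasks',
  '- [ ] Do something',
].join('\n');

console.log(extractSourceSection(prBody));
```

A caller along the lines of buildTaskAppendix() could then emit a 'Source Context' block only when the result is non-null.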
The src/trend_analysis/ directory contains CI test fixtures used to exercise the autofix pipeline workflows - not production code. These files (e.g., _autofix_trigger_sample.py, _ci_probe_faults.py) were showing 0% coverage and artificially dragging down overall metrics. Add 'src/trend_analysis/*' to [tool.coverage.run].omit patterns.
These CI autofix test fixtures were at risk of being mistakenly deleted or flagged as orphaned code. The README explains:
- Why these files exist (autofix pipeline testing)
- Why they're excluded from coverage
- Which tests use them
The Workflows repo was missing the caller workflow for the bot comment handler - only the reusable version existed. This meant the 'autofix:bot-comments' label had no effect.

Copied from templates/consumer-repo/.github/workflows/ to enable:
- Manual trigger via 'autofix:bot-comments' label
- Automatic trigger after Gate completion for agent PRs
- Manual dispatch via workflow_dispatch
…r bug

Addresses 3 review comments from Copilot and Codex:

1. **Copilot (regex)**: Changed `\s+` to `\s*` in checkbox pattern to handle edge cases where space after bracket is missing (e.g., `- [ ]text`)
2. **Copilot (tests)**: Added 11 new tests for `stripCheckboxesFromScope()`:
   - Checkbox-to-bullet conversion (unchecked, x, X)
   - Preservation of indentation
   - Filtering of placeholder content
   - Handling of empty lines and null input
   - Edge case: no space after bracket
3. **Codex P1 (scope placeholder alignment)**: Fixed critical regression where `agents_pr_meta_update_body.js` was still emitting checkbox format for scope fallback while `issue_scope_parser.js` changed to italicized format. Scope is now treated as informational (plain text), not actionable (checkbox).

Tests: 471 passing (up from 424)
Add instruction #5 and CRITICAL section to keepalive-instruction.md asking Codex to output '✅ Completed: [task text]' in final summary. This enables reliable reconciliation parsing for completed tasks.
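To illustrate why an explicit marker makes reconciliation easier, here is a small sketch of how '✅ Completed: ...' lines might be pulled out of an agent's final summary. The function name `parseCompletedTasks` is hypothetical; the PR does not show the actual reconciliation parser.

```javascript
// Sketch: collect the task text from every '✅ Completed: <task>' line.
function parseCompletedTasks(summary) {
  const completed = [];
  for (const line of String(summary ?? '').split(/\r?\n/)) {
    const match = line.match(/✅\s*Completed:\s*(.+)$/);
    if (match) completed.push(match[1].trim());
  }
  return completed;
}

const finalSummary = [
  'Summary of this iteration:',
  '✅ Completed: Add config validation',
  '- ✅ Completed: Update keepalive template ',
  'Still pending: documentation',
].join('\n');

console.log(parseCompletedTasks(finalSummary));
```

Matching on a fixed marker avoids guessing completion from keyword overlap alone.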
Automated Status Summary
Head SHA: 5f81e8c
Coverage Overview
Coverage Trend
Updated automatically; will refresh on subsequent CI/Docker completions.
Keepalive checklist
Scope
No scope information available
Tasks
Acceptance criteria
🤖 Keepalive Loop Status
PR #326 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification
| Error type | infrastructure |
Pull request overview
This PR improves the automated task completion detection system by adding synonym matching, camelCase parsing, and lowering keyword matching thresholds to catch more legitimate task completions.
Key Changes:
- Added synonym expansion for common action verbs (add/create/implement, fix/repair/resolve, etc.) to improve task-to-commit matching
- Implemented camelCase/PascalCase word splitting to extract keywords from file names and commit messages
- Lowered keyword matching thresholds from 50% to 35% (with file match) and added a 25% threshold tier to reduce false negatives
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `.github/templates/keepalive-instruction.md` | Added instructions for explicit completion markers in final summaries to help automation track completed tasks |
| `.github/scripts/keepalive_loop.js` | Implemented synonym expansion, camelCase splitting, and adjusted matching thresholds in the analyzeTaskCompletion function |
| `.github/scripts/__tests__/keepalive-loop.test.js` | Updated existing tests with more realistic unrelated tasks and added three new tests for threshold changes and synonym matching |
    test('analyzeTaskCompletion uses synonym expansion for better matching', async () => {
      // Task says "implement", commit says "add" - synonyms should match
      const commits = [
        { sha: 'abc123', commit: { message: 'feat: add config validation logic' } },
      ];
      const files = [
        { filename: 'src/config/validator.py' },
      ];

      const github = {
        rest: {
          repos: {
            async compareCommits() {
              return { data: { commits } };
            },
          },
          pulls: {
            async listFiles() {
              return { data: files };
            },
          },
        },
      };

      const taskText = `
        - [ ] Implement config validation with proper error handling
      `;

      const result = await analyzeTaskCompletion({
        github,
        context: { repo: { owner: 'test', repo: 'repo' } },
        prNumber: 1,
        baseSha: 'base123',
        headSha: 'head456',
        taskText,
        core: buildCore(),
      });

      // "implement" in task should match "add" in commit via synonyms
      // plus "config" and "validation" match directly
      const configMatch = result.matches.find(m =>
        m.task.toLowerCase().includes('config validation')
      );
      assert.ok(configMatch, 'Should match config validation task');
      assert.equal(configMatch.confidence, 'high', 'Should be high confidence with synonym matching');
    });
This test claims to test synonym expansion (task says "implement", commit says "add"), but it doesn't properly validate this feature. The commit message contains "add config validation logic" and the task is "Implement config validation with proper error handling". Both contain the words "config" and "validation" which match directly, so the test would pass with high confidence even if synonym expansion didn't work at all.
To properly test synonym expansion, you would need a test where the ONLY matching words are synonyms. For example:
- Task: "implement feature"
- Commit: "add feature"
- Only "feature" matches directly; "implement" and "add" should match via synonyms.
Given the bidirectional synonym bug identified earlier (comment ID 001), this test might actually be passing for the wrong reasons.
    return str
      .replace(/([a-z])([A-Z])/g, '$1 $2')
      .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
      .toLowerCase()
      .split(/[\s_-]+/)
      .filter(w => w.length > 2);
The splitCamelCase function filters out words with length <= 2 on line 1338, which could remove meaningful short words. For example:
- "getUserID" would split into ["get", "user", "id"], but "id" would be filtered out
- "setIO" would split into ["set", "io"], but "io" would be filtered out
- "addUIElement" would split into ["add", "ui", "element"], but "ui" would be filtered out
While filtering out very short words might reduce noise, it could also remove legitimate technical acronyms and abbreviations that are meaningful in the context of task matching (e.g., "io", "id", "ui", "db", "os", "vm").
    - return str
    -   .replace(/([a-z])([A-Z])/g, '$1 $2')
    -   .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
    -   .toLowerCase()
    -   .split(/[\s_-]+/)
    -   .filter(w => w.length > 2);
    + // Common short technical tokens that are still meaningful and should not be filtered out
    + const SHORT_TECH_TOKENS = new Set([
    +   'id', 'io', 'ui', 'db', 'os', 'vm',
    +   'ci', 'cd', 'ip', 'api'
    + ]);
    + return str
    +   .replace(/([a-z])([A-Z])/g, '$1 $2')
    +   .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
    +   .toLowerCase()
    +   .split(/[\s_-]+/)
    +   .filter(w => w.length > 2 || SHORT_TECH_TOKENS.has(w));
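As a runnable illustration of the suggestion, the sketch below wraps the same chain in a standalone function and runs it on the identifiers from the review comment. The allowlist contents are illustrative only, and the real splitCamelCase in keepalive_loop.js may differ in its surrounding code.

```javascript
// Illustrative allowlist of two-letter tokens worth keeping (assumed, not exhaustive).
const SHORT_TECH_TOKENS = new Set(['id', 'io', 'ui', 'db', 'os', 'vm', 'ci', 'cd', 'ip']);

function splitCamelCase(str) {
  return String(str)
    .replace(/([a-z])([A-Z])/g, '$1 $2')       // fooBar    -> foo Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2') // UIElement -> UI Element
    .toLowerCase()
    .split(/[\s_-]+/)
    .filter(w => w.length > 2 || SHORT_TECH_TOKENS.has(w));
}

console.log(splitCamelCase('getUserID'));    // [ 'get', 'user', 'id' ]
console.log(splitCamelCase('addUIElement')); // [ 'add', 'ui', 'element' ]
console.log(splitCamelCase('setIO'));        // [ 'set', 'io' ]
```

Tokens of three or more letters, such as "api", already pass the length check, so only two-letter tokens need to be listed.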
    for (const keyword of commitKeywords) {
      const synonymList = SYNONYMS[keyword];
      if (synonymList) {
        synonymList.forEach(syn => expandedKeywords.add(syn));
The synonym expansion logic is not bidirectional. When a commit keyword is a key in SYNONYMS (e.g., "add"), the code adds its synonyms (e.g., "create", "implement") to expandedKeywords, so a task containing "implement" can match a commit containing "add". The reverse does not hold: if the commit contains "implement" and the task contains "add", and "implement" appears only as a value under "add" rather than as a key of its own, SYNONYMS["implement"] is undefined and "add" is never added to expandedKeywords.
The matching process filters taskWords against expandedKeywords, but expandedKeywords is only expanded by looking up each commit keyword as a key. This means the synonym mapping only works one way (key → listed synonym) but not the other way (listed synonym → key).
To fix this, when matching task words, you should also check if any task word has synonyms that exist in commitKeywords, or expand keywords bidirectionally.
    - for (const keyword of commitKeywords) {
    -   const synonymList = SYNONYMS[keyword];
    -   if (synonymList) {
    -     synonymList.forEach(syn => expandedKeywords.add(syn));
    + // Expand synonym groups bidirectionally: if any word in a synonym group
    + // appears in commitKeywords, add the entire group to expandedKeywords.
    + if (SYNONYMS && typeof SYNONYMS === 'object') {
    +   for (const [base, synonymList] of Object.entries(SYNONYMS)) {
    +     const list = Array.isArray(synonymList) ? synonymList : [];
    +     const groupWords = [base, ...list];
    +     const hasIntersection = groupWords.some(word => commitKeywords.has(word));
    +     if (hasIntersection) {
    +       groupWords.forEach(word => expandedKeywords.add(word));
    +     }
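The group-based expansion in the suggestion can be tried in isolation with a sketch like the one below. The synonym table here is a hypothetical two-entry example keyed by a base verb; the actual SYNONYMS map in keepalive_loop.js is not shown in this thread and may be structured differently.

```javascript
// Hypothetical synonym table: each key is a base verb, each value lists its synonyms.
const SYNONYMS = {
  add: ['create', 'implement'],
  fix: ['repair', 'resolve'],
};

// Sketch: if any member of a synonym group appears in commitKeywords,
// add every member of that group to the expanded set.
function expandKeywords(commitKeywords) {
  const expanded = new Set(commitKeywords);
  for (const [base, list] of Object.entries(SYNONYMS)) {
    const group = [base, ...list];
    if (group.some(word => commitKeywords.has(word))) {
      group.forEach(word => expanded.add(word));
    }
  }
  return expanded;
}

// Commit uses the base verb; task may use a listed synonym.
const fromBase = expandKeywords(new Set(['add', 'feature']));
console.log(fromBase.has('implement')); // true

// Commit uses a listed synonym; task may use the base verb.
const fromSynonym = expandKeywords(new Set(['implement', 'feature']));
console.log(fromSynonym.has('add')); // true
```

Treating each entry as an unordered group means the direction of the lookup no longer matters, at the cost of scanning every group once per call.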
    const camelWords = commits
      .map(c => c.commit.message)
      .join(' ')
      .match(/[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g) || [];
The regex pattern /[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g for detecting camelCase words is too restrictive and will miss several valid camelCase patterns:
- It requires at least one lowercase letter between the first character and the first interior uppercase letter, so it misses words with a single-letter prefix such as "aB", "iPhone", or "eBay"
- It does match simple PascalCase words such as "UserService" or "HttpClient", but it has no word-boundary anchors, so a match can begin partway through a longer token
- It cannot match identifiers that begin with a run of capitals, such as "IO" or "HTTPSConnection"
A more comprehensive pattern would be /[a-z]+[A-Z][a-zA-Z]*/g for camelCase alone, or /\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g to cover both camelCase and PascalCase with word boundaries.
    - .match(/[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g) || [];
    + .match(/\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g) || [];
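The difference between the two patterns is easiest to see by running both against the same text. The sketch below does that with a made-up commit message; the helper name `findCamelWords` is hypothetical.

```javascript
const ORIGINAL = /[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g;
const SUGGESTED = /\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g;

// Sketch: return every substring of text that the pattern treats as a camelCase word.
function findCamelWords(text, pattern) {
  return text.match(pattern) || [];
}

// Made-up commit message used only for this comparison.
const message = 'refactor UserService and parseConfig, add iPhone support for HTTPSConnection';

console.log(findCamelWords(message, ORIGINAL));  // [ 'UserService', 'parseConfig' ]
console.log(findCamelWords(message, SUGGESTED)); // [ 'UserService', 'parseConfig', 'iPhone' ]
```

Neither pattern picks up "HTTPSConnection", so identifiers that start with a run of capitals would need a separate rule if they matter for matching.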
      } else if (score >= 0.35 && (fileMatch || commitMatch)) {
        // Lowered threshold from 0.5 to 0.35 to catch more legitimate completions
        confidence = 'high';
        reason = `${Math.round(score * 100)}% keyword match, ${fileMatch ? 'file match' : 'commit match'}`;
        matches.push({ task, reason, confidence });
    - } else if (score >= 0.3 || fileMatch) {
    + } else if (score >= 0.25 && fileMatch) {
    +   // File match with moderate keyword overlap is high confidence
    +   confidence = 'high';
    +   reason = `${Math.round(score * 100)}% keyword match with file match`;
    +   matches.push({ task, reason, confidence });
    + } else if (score >= 0.2 || fileMatch) {
The condition on line 1465 (score >= 0.25 && fileMatch) overlaps with the previous condition on line 1460 (score >= 0.35 && (fileMatch || commitMatch)). Any score >= 0.35 with fileMatch matches line 1460 first, so line 1465 is only reached for scores in the range [0.25, 0.35) with fileMatch.
Additionally, line 1470's condition score >= 0.2 || fileMatch catches fileMatch cases when score < 0.25, which produces three tiers whose boundaries are easy to misread:
- score >= 0.35 with fileMatch → high confidence (line 1460)
- 0.25 <= score < 0.35 with fileMatch → high confidence (line 1465)
- score < 0.25 with fileMatch → medium confidence (line 1470)
This means a fileMatch alone (with score < 0.2) gets medium confidence, which seems reasonable, but the threshold boundaries should be documented more clearly to avoid confusion.
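The tiers discussed above can be written out as a small standalone function to make the boundaries explicit. This sketch covers only the three branches visible in the diff; any earlier branch in analyzeTaskCompletion is not shown in this thread, the 'medium' label for the last tier follows the review comment, and the name `classifyConfidence` is hypothetical.

```javascript
// Sketch of the visible threshold tiers. Returns null when no tier applies.
function classifyConfidence(score, fileMatch, commitMatch) {
  if (score >= 0.35 && (fileMatch || commitMatch)) return 'high';
  if (score >= 0.25 && fileMatch) return 'high';   // only reached for 0.25 <= score < 0.35
  if (score >= 0.2 || fileMatch) return 'medium';  // file match alone, or weak keyword overlap
  return null;
}

console.log(classifyConfidence(0.40, false, true));  // 'high'
console.log(classifyConfidence(0.29, true, false));  // 'high'
console.log(classifyConfidence(0.10, true, false));  // 'medium'
console.log(classifyConfidence(0.22, false, false)); // 'medium'
console.log(classifyConfidence(0.10, false, false)); // null
```

The second call shows the case that matters most for auto-checking: a 29% keyword overlap plus a file match lands in the high tier.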
💡 Codex Review
Here are some automated review suggestions for this pull request.
    } else if (score >= 0.25 && fileMatch) {
      // File match with moderate keyword overlap is high confidence
      confidence = 'high';
      reason = `${Math.round(score * 100)}% keyword match with file match`;
Stop marking low-overlap file matches as high confidence
Lowering the high-confidence cutoff to 25% when fileMatch is true means tasks that only share a single token with a touched file now get marked high and will be auto-checked in autoReconcileTasks. For example, touching .github/scripts/__tests__/agents-guard.test.js while a task says Create keepalive-guard-utils.test.js covering pause label detection produces a ~29% keyword match (create/test) plus a substring file match on guard, so this branch would now flag it high even though the referenced file was never created; prior logic kept this as medium and avoided auto-checking unrelated tasks.
No description provided.