Fix/scope checkbox stripping by stranske · Pull Request #326 · stranske/Workflows

stranske · 2025-12-30T17:27:09Z

No description provided.

Scope is informational context, not actionable items to check off.

1. Add stripCheckboxesFromScope() function that:
   - Converts '- [ ] text' to '- text' (plain bullets)
   - Filters out placeholder content entirely
   - Removes empty lines
2. Update PR_META_FALLBACK_PLACEHOLDERS.scope:
   - Change from '- [ ] Scope section missing...' to italicized text
   - Matches standard PLACEHOLDERS.scope pattern
3. Update consumer repo template with same placeholder change
4. Update test to match new placeholder format

This ensures scope sections in:
- PR Automated Status Summary (via issue_scope_parser.js)
- Verifier follow-up issues (via verifier_issue_formatter.js)

...display as informational text rather than checkbox items.
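The three behaviours listed for stripCheckboxesFromScope() can be sketched roughly as follows. This is not the code from the PR: the placeholder string is a hypothetical stand-in (the real PLACEHOLDERS.scope value is not shown here), and the regex uses the `\s*` form mentioned in the later review-fix commit.

```javascript
// Hypothetical placeholder text; the real PLACEHOLDERS.scope value is not shown in this PR.
const ASSUMED_SCOPE_PLACEHOLDER = '_Scope section missing from source issue._';

// Sketch only: convert checkbox items to plain bullets, drop placeholder and empty lines.
function stripCheckboxesFromScope(scope, placeholder = ASSUMED_SCOPE_PLACEHOLDER) {
  if (!scope) return '';
  return String(scope)
    .split('\n')
    // '- [ ] text', '- [x] text', '- [X] text' become '- text'.
    // \s* after the bracket also tolerates '- [ ]text'; indentation is preserved via $1.
    .map(line => line.replace(/^(\s*)-\s*\[[ xX]\]\s*/, '$1- '))
    .filter(line => {
      const text = line.trim().replace(/^- /, '');
      return text !== '' && text !== placeholder;
    })
    .join('\n');
}

console.log(stripCheckboxesFromScope('- [ ] First\n\n  - [x] Nested\n- [X]Tight'));
```

Lines that are not checkbox items pass through unchanged, so prose scope descriptions stay intact.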

Add extractSourceSection() to parse Source section from PR body and include it in buildTaskAppendix(). This gives Codex visibility into linked parent issues and original PRs for additional context.

Changes:
- Add extractSourceSection() function to extract Source links from PR body
- Update buildTaskAppendix() to accept options.prBody parameter
- Add 'Source Context' section to agent prompt when Source links present
- Add CONTEXT TIP to keepalive-instruction.md template
- Add 5 new tests for source section extraction

When a PR has a Source section with links like:
- Original PR: #123
- Parent issue: #456

The agent prompt now includes these as 'Source Context' so Codex knows to check linked issues/PRs for additional background.
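A minimal sketch of what such an extraction could look like, assuming the PR body marks the section with a markdown heading named "Source" and ends it at the next heading. The heading format, the `buildSourceContext` helper, and the output layout are assumptions for illustration; the PR's actual buildTaskAppendix() is not shown here.

```javascript
// Sketch only: return the non-empty lines under a "Source" heading, or null.
function extractSourceSection(prBody) {
  if (!prBody) return null;
  const lines = String(prBody).split(/\r?\n/);
  const start = lines.findIndex(line => /^#{1,6}\s+Source\s*$/i.test(line.trim()));
  if (start === -1) return null;

  const section = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s+\S/.test(line.trim())) break; // the next heading ends the section
    if (line.trim()) section.push(line.trim());
  }
  // Only report a section that actually links to an issue, PR, or URL.
  const hasLink = section.some(line => /#\d+|https?:\/\//.test(line));
  return hasLink ? section.join('\n') : null;
}

// Hypothetical helper showing how the result might be appended to an agent prompt.
function buildSourceContext(prBody) {
  const source = extractSourceSection(prBody);
  return source ? `## Source Context\n${source}` : '';
}

const body = '## Summary\nText\n\n## Source\n- Original PR: #123\n- Parent issue: #456\n\n## Tasks\n- [ ] x';
console.log(buildSourceContext(body));
```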

The src/trend_analysis/ directory contains CI test fixtures used to exercise the autofix pipeline workflows - not production code. These files (e.g., _autofix_trigger_sample.py, _ci_probe_faults.py) were showing 0% coverage and artificially dragging down overall metrics. Add 'src/trend_analysis/*' to [tool.coverage.run].omit patterns.

These CI autofix test fixtures were at risk of being mistakenly deleted or flagged as orphaned code. The README explains:
- Why these files exist (autofix pipeline testing)
- Why they're excluded from coverage
- Which tests use them

The Workflows repo was missing the caller workflow for the bot comment handler - only the reusable version existed. This meant the 'autofix:bot-comments' label had no effect.

Copied from templates/consumer-repo/.github/workflows/ to enable:
- Manual trigger via 'autofix:bot-comments' label
- Automatic trigger after Gate completion for agent PRs
- Manual dispatch via workflow_dispatch

…r bug

Addresses 3 review comments from Copilot and Codex:

1. **Copilot (regex)**: Changed `\s+` to `\s*` in checkbox pattern to handle edge cases where space after bracket is missing (e.g., `- [ ]text`)
2. **Copilot (tests)**: Added 11 new tests for `stripCheckboxesFromScope()`:
   - Checkbox-to-bullet conversion (unchecked, x, X)
   - Preservation of indentation
   - Filtering of placeholder content
   - Handling of empty lines and null input
   - Edge case: no space after bracket
3. **Codex P1 (scope placeholder alignment)**: Fixed critical regression where `agents_pr_meta_update_body.js` was still emitting checkbox format for scope fallback while `issue_scope_parser.js` changed to italicized format. Scope is now treated as informational (plain text), not actionable (checkbox).

Tests: 471 passing (up from 424)

Add instruction #5 and CRITICAL section to keepalive-instruction.md asking Codex to output '✅ Completed: [task text]' in final summary. This enables reliable reconciliation parsing for completed tasks.
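On the consuming side, reconciliation could pick these markers out of a final summary with a line-anchored pattern. The sketch below is an assumption about how such parsing might work, not the repository's parser; `extractCompletedTasks` is a hypothetical name, and allowing an optional leading bullet is a guess at how agents may format the lines.

```javascript
// Sketch only: collect the task text from lines like '✅ Completed: <task text>'.
function extractCompletedTasks(summary) {
  const completed = [];
  for (const line of String(summary || '').split(/\r?\n/)) {
    // Optional leading bullet, then the marker, then the task text (trimmed).
    const match = line.match(/^\s*(?:[-*]\s*)?✅\s*Completed:\s*(.+?)\s*$/);
    if (match) completed.push(match[1]);
  }
  return completed;
}

const summary = [
  'Summary of work:',
  '✅ Completed: Add tests',
  '- ✅ Completed:  Fix parser  ',
  'Still investigating the flaky job.',
].join('\n');

console.log(extractCompletedTasks(summary));
```

An explicit marker like this gives the reconciler an exact string to compare against the checklist, instead of relying only on keyword overlap with commits.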

agents-workflows-bot · 2025-12-30T17:27:29Z

⚠️ Action Required: Unable to determine source issue for PR #326. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

github-actions · 2025-12-30T17:27:38Z

Automated Status Summary

Head SHA: 5f81e8c
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	0.00%
Baseline	85.00%
Delta	-85.00%
Minimum	70.00%
Status	❌ Below minimum

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

github-actions · 2025-12-30T17:28:00Z

🤖 Keepalive Loop Status

PR #326 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Gate	success
Tasks	0/0 complete
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

Copilot

Pull request overview

This PR improves the automated task completion detection system by adding synonym matching, camelCase parsing, and lowering keyword matching thresholds to catch more legitimate task completions.

Key Changes:

Added synonym expansion for common action verbs (add/create/implement, fix/repair/resolve, etc.) to improve task-to-commit matching
Implemented camelCase/PascalCase word splitting to extract keywords from file names and commit messages
Lowered keyword matching thresholds from 50% to 35% (with file match) and added a 25% threshold tier to reduce false negatives

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
`.github/templates/keepalive-instruction.md`	Added instructions for explicit completion markers in final summaries to help automation track completed tasks
`.github/scripts/keepalive_loop.js`	Implemented synonym expansion, camelCase splitting, and adjusted matching thresholds in the `analyzeTaskCompletion` function
`.github/scripts/__tests__/keepalive-loop.test.js`	Updated existing tests with more realistic unrelated tasks and added three new tests for threshold changes and synonym matching


Copilot · 2025-12-30T17:30:02Z

.github/scripts/__tests__/keepalive-loop.test.js

+test('analyzeTaskCompletion uses synonym expansion for better matching', async () => {
+  // Task says "implement", commit says "add" - synonyms should match
+  const commits = [
+    { sha: 'abc123', commit: { message: 'feat: add config validation logic' } },
+  ];
+  const files = [
+    { filename: 'src/config/validator.py' },
+  ];
+
+  const github = {
+    rest: {
+      repos: {
+        async compareCommits() {
+          return { data: { commits } };
+        },
+      },
+      pulls: {
+        async listFiles() {
+          return { data: files };
+        },
+      },
+    },
+  };
+
+  const taskText = `
+- [ ] Implement config validation with proper error handling
+`;
+
+  const result = await analyzeTaskCompletion({
+    github,
+    context: { repo: { owner: 'test', repo: 'repo' } },
+    prNumber: 1,
+    baseSha: 'base123',
+    headSha: 'head456',
+    taskText,
+    core: buildCore(),
+  });
+
+  // "implement" in task should match "add" in commit via synonyms
+  // plus "config" and "validation" match directly
+  const configMatch = result.matches.find(m => 
+    m.task.toLowerCase().includes('config validation')
+  );
+  assert.ok(configMatch, 'Should match config validation task');
+  assert.equal(configMatch.confidence, 'high', 'Should be high confidence with synonym matching');
+});


This test claims to test synonym expansion (task says "implement", commit says "add"), but it doesn't properly validate this feature. The commit message contains "add config validation logic" and the task is "Implement config validation with proper error handling". Both contain the words "config" and "validation" which match directly, so the test would pass with high confidence even if synonym expansion didn't work at all.

To properly test synonym expansion, you would need a test where the ONLY matching words are synonyms. For example:

Task: "implement feature"

Commit: "add feature"

Only "feature" matches directly; "implement" and "add" should match via synonyms.

Given the bidirectional synonym bug identified earlier (comment ID 001), this test might actually be passing for the wrong reasons.

Copilot · 2025-12-30T17:30:03Z

.github/scripts/keepalive_loop.js

+    return str
+      .replace(/([a-z])([A-Z])/g, '$1 $2')
+      .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
+      .toLowerCase()
+      .split(/[\s_-]+/)
+      .filter(w => w.length > 2);


The splitCamelCase function filters out words with length <= 2 on line 1338, which could remove meaningful short words. For example:

"getUserID" would split into ["get", "user", "id"], but "id" would be filtered out

"setIO" would split into ["set", "io"], but "io" would be filtered out

"addUIElement" would split into ["add", "ui", "element"], but "ui" would be filtered out

While filtering out very short words might reduce noise, it could also remove legitimate technical acronyms and abbreviations that are meaningful in the context of task matching (e.g., "io", "id", "ui", "db", "os", "vm").

Suggested change

-    return str
-      .replace(/([a-z])([A-Z])/g, '$1 $2')
-      .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
-      .toLowerCase()
-      .split(/[\s_-]+/)
-      .filter(w => w.length > 2);
+    // Common short technical tokens that are still meaningful and should not be filtered out
+    const SHORT_TECH_TOKENS = new Set([
+      'id', 'io', 'ui', 'db', 'os', 'vm',
+      'ci', 'cd', 'ip', 'db', 'api'
+    ]);
+    return str
+      .replace(/([a-z])([A-Z])/g, '$1 $2')
+      .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')
+      .toLowerCase()
+      .split(/[\s_-]+/)
+      .filter(w => w.length > 2 || SHORT_TECH_TOKENS.has(w));
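To see the effect of the allowlist, the suggested filter can be wrapped in a standalone function and run on the three identifiers from the comment. The function signature and the `keepShort` parameter are assumptions added for this sketch; only the chained `replace`/`split`/`filter` calls come from the diff.

```javascript
// Allowlist from the suggestion, minus the duplicate 'db' and the 3-letter 'api'
// (which already passes the length check).
const SHORT_TECH_TOKENS = new Set(['id', 'io', 'ui', 'db', 'os', 'vm', 'ci', 'cd', 'ip']);

// Sketch only: split an identifier into lowercase words, keeping allowlisted short tokens.
function splitCamelCase(str, keepShort = SHORT_TECH_TOKENS) {
  return String(str)
    .replace(/([a-z])([A-Z])/g, '$1 $2')        // fooBar -> foo Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')  // UIElement -> UI Element
    .toLowerCase()
    .split(/[\s_-]+/)
    .filter(w => w.length > 2 || keepShort.has(w));
}

console.log(splitCamelCase('getUserID'));            // [ 'get', 'user', 'id' ]
console.log(splitCamelCase('getUserID', new Set())); // [ 'get', 'user' ] (original behaviour)
console.log(splitCamelCase('addUIElement'));         // [ 'add', 'ui', 'element' ]
```

Passing an empty set reproduces the length-only filter, which makes it easy to compare both behaviours in a test.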

Copilot · 2025-12-30T17:30:03Z

.github/scripts/keepalive_loop.js

+  for (const keyword of commitKeywords) {
+    const synonymList = SYNONYMS[keyword];
+    if (synonymList) {
+      synonymList.forEach(syn => expandedKeywords.add(syn));


The synonym expansion logic is not bidirectional. When a keyword from commits is found (e.g., "add"), the code adds its synonyms (e.g., "create", "implement") to expandedKeywords. However, if the task contains "implement" and the commit contains "add", this won't match because "implement" from the task is not in commitKeywords, so its synonyms (including "add") are never added to expandedKeywords.

The matching process filters taskWords against expandedKeywords, but expandedKeywords is only expanded based on what's in commitKeywords. This means the synonym mapping only works one way (commit keyword → task synonym) but not the other way (task keyword → commit synonym).

To fix this, when matching task words, you should also check if any task word has synonyms that exist in commitKeywords, or expand keywords bidirectionally.

Suggested change

-  for (const keyword of commitKeywords) {
-    const synonymList = SYNONYMS[keyword];
-    if (synonymList) {
-      synonymList.forEach(syn => expandedKeywords.add(syn));
+  // Expand synonym groups bidirectionally: if any word in a synonym group
+  // appears in commitKeywords, add the entire group to expandedKeywords.
+  if (SYNONYMS && typeof SYNONYMS === 'object') {
+    for (const [base, synonymList] of Object.entries(SYNONYMS)) {
+      const list = Array.isArray(synonymList) ? synonymList : [];
+      const groupWords = [base, ...list];
+      const hasIntersection = groupWords.some(word => commitKeywords.has(word));
+      if (hasIntersection) {
+        groupWords.forEach(word => expandedKeywords.add(word));
+      }
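The difference between the two expansion strategies shows up most clearly with a small table. The sketch below assumes SYNONYMS maps a base verb to a list of alternatives and that the expanded set starts as a copy of the commit keywords; neither detail is confirmed by the excerpt. Under that assumption, the one-way lookup fails when the commit uses a word that is only a value in the table, never a key.

```javascript
// Assumed table shape: base verb -> list of synonyms (the real table is not shown).
const SYNONYMS = {
  add: ['create', 'implement'],
  fix: ['repair', 'resolve'],
};

// Original approach: only look up commit keywords that happen to be table keys.
function expandOneWay(commitKeywords) {
  const expanded = new Set(commitKeywords);
  for (const keyword of commitKeywords) {
    (SYNONYMS[keyword] || []).forEach(syn => expanded.add(syn));
  }
  return expanded;
}

// Suggested approach: if any member of a group appears, add the whole group.
function expandBidirectional(commitKeywords) {
  const expanded = new Set(commitKeywords);
  for (const [base, list] of Object.entries(SYNONYMS)) {
    const group = [base, ...list];
    if (group.some(word => commitKeywords.has(word))) {
      group.forEach(word => expanded.add(word));
    }
  }
  return expanded;
}

// Commit says "implement", task says "add".
const commitKeywords = new Set(['implement', 'validation']);
console.log(expandOneWay(commitKeywords).has('add'));        // false
console.log(expandBidirectional(commitKeywords).has('add')); // true
```

With this table shape, a commit containing "add" already expands to "implement" in the one-way version, so whether the bug is visible depends on which side uses the key and which uses the synonym.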

Copilot · 2025-12-30T17:30:03Z

.github/scripts/keepalive_loop.js

+  const camelWords = commits
+    .map(c => c.commit.message)
+    .join(' ')
+    .match(/[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g) || [];


The regex pattern /[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g for detecting camelCase words is too restrictive and will miss several valid camelCase patterns:

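It requires at least two characters before the first interior uppercase letter (one letter followed by one or more lowercase letters), so it would miss patterns like "aB", "iPhone", or "eBay"

It does match PascalCase words such as "UserService" or "HttpClient", because the leading [a-zA-Z] accepts an uppercase letter, but only when that first letter is followed by at least one lowercase letter

It cannot match identifiers that begin with a run of capitals, so it would miss "IO" or "HTTPSConnection"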

A more comprehensive pattern would be /[a-z]+[A-Z][a-zA-Z]*/g or /\b[a-z]+[A-Z][a-z]*|[A-Z][a-z]+[A-Z]/gi to catch various camelCase and PascalCase patterns.

Suggested change

-    .match(/[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g) || [];
+    .match(/\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g) || [];
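Running both patterns over the same sample text is a quick way to check which identifiers each one picks up. The sample message and the `findCamelWords` helper are made up for this sketch; the two regular expressions are copied from the diff and the suggestion.

```javascript
const ORIGINAL = /[a-zA-Z][a-z]+[A-Z][a-zA-Z]*/g;
const SUGGESTED = /\b(?:[a-z]+[A-Z][a-zA-Z]*|[A-Z][a-z]+[A-Z][a-zA-Z]*)\b/g;

// Hypothetical helper mirroring the `.match(...) || []` idiom from the diff.
function findCamelWords(text, pattern) {
  return text.match(pattern) || [];
}

const message = 'fix iPhone sync in UserService and HTTPSConnection via splitCamelCase';

console.log(findCamelWords(message, ORIGINAL));
// [ 'UserService', 'splitCamelCase' ]
console.log(findCamelWords(message, SUGGESTED));
// [ 'iPhone', 'UserService', 'splitCamelCase' ]
```

The suggested pattern gains single-letter prefixes such as "iPhone", while identifiers that start with a run of capitals, such as "HTTPSConnection", are matched by neither pattern.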

Copilot · 2025-12-30T17:30:04Z

.github/scripts/keepalive_loop.js

+    } else if (score >= 0.35 && (fileMatch || commitMatch)) {
+      // Lowered threshold from 0.5 to 0.35 to catch more legitimate completions
      confidence = 'high';
      reason = `${Math.round(score * 100)}% keyword match, ${fileMatch ? 'file match' : 'commit match'}`;
      matches.push({ task, reason, confidence });
-    } else if (score >= 0.3 || fileMatch) {
+    } else if (score >= 0.25 && fileMatch) {
+      // File match with moderate keyword overlap is high confidence
+      confidence = 'high';
+      reason = `${Math.round(score * 100)}% keyword match with file match`;
+      matches.push({ task, reason, confidence });
+    } else if (score >= 0.2 || fileMatch) {


The condition on line 1465 (score >= 0.25 && fileMatch) partly overlaps with the previous condition on line 1460 (score >= 0.35 && (fileMatch || commitMatch)). Any score >= 0.35 with fileMatch matches line 1460 first, so line 1465 only catches scores in the range [0.25, 0.35) with fileMatch.

Additionally, line 1470's condition score >= 0.2 || fileMatch would catch fileMatch cases when score < 0.25, but this creates an inconsistency:

score >= 0.35 with fileMatch → high confidence (line 1460)

0.25 <= score < 0.35 with fileMatch → high confidence (line 1465)

score < 0.25 with fileMatch → medium confidence (line 1470)

This means a fileMatch alone (with score < 0.2) gets medium confidence, which seems reasonable, but the threshold boundaries should be documented more clearly to avoid confusion.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


chatgpt-codex-connector · 2025-12-30T17:31:39Z

.github/scripts/keepalive_loop.js

+    } else if (score >= 0.25 && fileMatch) {
+      // File match with moderate keyword overlap is high confidence
+      confidence = 'high';
+      reason = `${Math.round(score * 100)}% keyword match with file match`;


Stop marking low-overlap file matches as high confidence

Lowering the high-confidence cutoff to 25% when fileMatch is true means tasks that only share a single token with a touched file now get marked high and will be auto-checked in autoReconcileTasks. For example, touching .github/scripts/__tests__/agents-guard.test.js while a task says Create keepalive-guard-utils.test.js covering pause label detection produces a ~29% keyword match (create/test) plus a substring file match on guard, so this branch would now flag it high even though the referenced file was never created; prior logic kept this as medium and avoided auto-checking unrelated tasks.
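The behaviour change described here can be reproduced with two small classifiers that mirror the visible branches of the diff. This is a sketch under assumptions: the branch that precedes the first `else if` is not shown and is omitted, the old first tier is assumed to have the same shape with a 0.5 cutoff (per the diff comment), and the final branch is taken to yield 'medium' as the review comments describe.

```javascript
// Previous tiers, reconstructed from the removed line and the "0.5 to 0.35" comment.
function classifyOld(score, fileMatch, commitMatch) {
  if (score >= 0.5 && (fileMatch || commitMatch)) return 'high';
  if (score >= 0.3 || fileMatch) return 'medium';
  return null;
}

// Tiers after this PR, following the added lines in the diff.
function classifyNew(score, fileMatch, commitMatch) {
  if (score >= 0.35 && (fileMatch || commitMatch)) return 'high';
  if (score >= 0.25 && fileMatch) return 'high';
  if (score >= 0.2 || fileMatch) return 'medium';
  return null;
}

// The case from the comment: roughly 29% keyword overlap plus a substring file match.
console.log(classifyOld(0.29, true, false)); // medium
console.log(classifyNew(0.29, true, false)); // high

// A file match with almost no keyword overlap stays at medium in both versions.
console.log(classifyOld(0.1, true, false));  // medium
console.log(classifyNew(0.1, true, false));  // medium
```

Only results labelled 'high' are auto-checked according to the comment, so the [0.25, 0.35) band with a file match is where the two versions diverge in practice.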


stranske and others added 9 commits December 30, 2025 16:07

Merge branch 'main' into fix/scope-checkbox-stripping

4957a0e

chore(codex-autofix): apply updates (PR #322)

6c28afe

feat: ask Codex to post explicit task completion markers

b9228c3


Copilot AI review requested due to automatic review settings December 30, 2025 17:27

Merge branch 'main' into fix/scope-checkbox-stripping

609418d

Copilot started reviewing on behalf of stranske December 30, 2025 17:27

stranske temporarily deployed to agent-standard December 30, 2025 17:27 — with GitHub Actions Inactive

stranske merged commit 10ab48c into main Dec 30, 2025
37 checks passed

stranske deleted the fix/scope-checkbox-stripping branch December 30, 2025 17:29

stranske merged 10 commits into main from fix/scope-checkbox-stripping

2 participants

