fix: Prevent keyword matcher from applying all labels by stranske · Pull Request #733 · stranske/Workflows

stranske · 2026-01-10T03:41:52Z

Source: Issue #123

Automated Status Summary

Scope

After merging PR #103 (multi-agent routing infrastructure), we need to:

- Validate the CLI agent pipeline works end-to-end with the new task-focused prompts
- Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
- Streamline the Automated Status Summary to reduce clutter when using CLI agents
- Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments

Context for Agent

Design Decisions & Constraints

1. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments
The keepalive loop now:
|  | github-actions[bot] | NEW: CLI agent iteration tracking | ✅ Keep for CLI agents |
|  | agents-workflows-bot[bot] | State tracking | ⚠️ Multiple copies accumulate |
|  | stranske | OLD: Instruction comment | ❌ CLI agents don't need this |
The goal: For CLI agents (agent:* label), we should have exactly one updating comment () instead of accumulating 10+ comments per PR.
Requires PR #103 to be merged first
This round you MUST:
Review the Scope/Tasks/Acceptance below, identify the next incomplete task that requires code, implement it, then post a reply comment with the completed items using their exact original text.

Related Issues/PRs

References

https://github.com/stranske/Workflows/compare/main...codex/issue-123?expand=1

Blockers & Dependencies

After merging PR #103 (multi-agent routing infrastructure), we need to:
1. Mark a task checkbox complete ONLY after verifying the implementation works.

Tasks

Pipeline Validation

- After PR #103 (chore(codex): bootstrap PR for issue #101) merges, create a test PR with the agent:codex label
- Verify task appendix appears in Codex prompt (check workflow logs)
- Verify Codex works on actual tasks (not random infrastructure work)
- Verify keepalive comment updates with iteration progress

GITHUB_STEP_SUMMARY

- Add step summary output to agents-keepalive-loop.yml after agent run
- Include: iteration number, tasks completed, files changed, outcome
- Ensure summary is visible in workflow run UI
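As a rough illustration of the step-summary task above: GitHub Actions exposes a `GITHUB_STEP_SUMMARY` environment variable holding the path of a file, and Markdown appended to that file is rendered on the workflow run page. The sketch below is a minimal Python version under that assumption. The function name, the heading text, and the table layout are hypothetical and are not taken from agents-keepalive-loop.yml.

```python
import os


def append_step_summary(iteration, max_iterations, tasks_done, tasks_total,
                        files_changed, outcome):
    """Append a Markdown iteration summary to the GITHUB_STEP_SUMMARY file.

    Returns the Markdown text so callers can also log it. When the variable
    is unset (for example, outside GitHub Actions) nothing is written.
    """
    lines = [
        "## Keepalive iteration summary",  # heading text is an assumption
        "",
        "| Metric | Value |",
        "| --- | --- |",
        f"| Iteration | {iteration}/{max_iterations} |",
        f"| Tasks completed | {tasks_done}/{tasks_total} |",
        f"| Files changed | {files_changed} |",
        f"| Outcome | {outcome} |",
        "",
    ]
    markdown = "\n".join(lines)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        # Append rather than overwrite so earlier steps keep their output.
        with open(summary_path, "a", encoding="utf-8") as handle:
            handle.write(markdown + "\n")
    return markdown
```

Appending (mode `"a"`) matters here because several steps in one job may each contribute to the same summary file.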

Conditional Status Summary

- Modify buildStatusBlock() in agents_pr_meta_update_body.js to accept agentType parameter
- When agentType is set (CLI agent): hide workflow table, hide head SHA/required checks
- Keep Scope/Tasks/Acceptance checkboxes for all cases
- Pass agent type from workflow to the update_body job
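The branching described in these tasks can be sketched as follows. The real function is the JavaScript `buildStatusBlock()` named above; this is a Python illustration of the intended behavior only, and every parameter name and output line format here is an assumption.

```python
def build_status_block(checklist_lines, head_sha, required_checks,
                       workflow_rows, agent_type=None):
    """Return status summary text; CLI agents get the checklist only."""
    parts = ["Automated Status Summary", ""]
    # Scope/Tasks/Acceptance checkboxes are kept in all cases.
    parts.extend(checklist_lines)
    if agent_type:
        # CLI agent path: omit head SHA, required checks, and workflow table.
        return "\n".join(parts)
    parts.append("")
    parts.append(f"Head SHA: {head_sha}")
    parts.append("Required: " + ", ".join(required_checks))
    parts.append("")
    parts.append("Workflow / Job | Result")
    for name, result in workflow_rows:
        parts.append(f"{name} | {result}")
    return "\n".join(parts)
```

Defaulting `agent_type` to `None` keeps the existing UI Codex path unchanged, which matches the acceptance criterion that PRs without an agent label continue to show the full summary.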

Comment Pattern Cleanup

Acceptance criteria

- CLI agent receives explicit tasks in prompt and works on them
- Iteration results visible in Actions workflow run summary
- PR body shows checkboxes but not workflow clutter when using CLI agents
- UI Codex path (no agent label) continues to show full status summary
- CLI agent PRs have ≤3 bot comments total (summary, one per iteration update) instead of 10+
- State tracking is consolidated in the summary comment, not scattered
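One common way to keep a single updating comment instead of accumulating new ones is to tag the comment body with a hidden marker and update the tagged comment when it already exists. The sketch below shows that pattern on an in-memory list standing in for the PR's comments. The marker string and function name are hypothetical, and no GitHub API calls are made.

```python
# Hypothetical marker; the actual marker used by the workflow is not shown in the source.
SUMMARY_MARKER = "<!-- hypothetical-keepalive-summary -->"


def upsert_summary_comment(comments, body, marker=SUMMARY_MARKER):
    """Update the first comment carrying the marker, or append a new one.

    `comments` is a list of dicts with a "body" key. Returns "updated" or
    "created" so the caller can tell which path was taken.
    """
    tagged_body = f"{marker}\n{body}"
    for comment in comments:
        if marker in comment["body"]:
            comment["body"] = tagged_body
            return "updated"
    comments.append({"body": tagged_body})
    return "created"
```

With this pattern the comment count stays constant across iterations, since only the first run takes the "created" path.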

Dependencies

- Requires PR #103 (chore(codex): bootstrap PR for issue #101) to be merged first

Head SHA: c3e0699
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

The keyword matching in label_matcher.py was returning 0.95 for ANY token overlap between the query and label descriptions. This caused ALL labels to be applied to issues, since common words like 'issue', 'request', and 'new' appear in most label descriptions.

Changes:
- Add comprehensive stopword list to filter out common words
- Require label NAME tokens to appear in query for 0.95 score
- Use label NAME only (not description) for category matching
- Fix: 'duplicate' label no longer matches feature keywords (it previously matched because its description contains 'request')

Root cause: The original code at lines 250-251 returned 0.95 if ANY tokenized word from the query matched ANY word in the label text (name + description), which was far too permissive.

Test results before fix:
- All 5 test issues got 15+ labels each (bug, enhancement, duplicate, documentation, agents:*, verify:*, etc.)

Test results after fix:
- Bug report → only 'bug' label (0.91)
- Feature request → only 'enhancement' label (0.9)
- Other issues → rely on semantic matching only
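To make the difference between the old and new behavior concrete, here is a minimal sketch, not the code in label_matcher.py. The function names, the tiny stopword set, and the sample label description are assumptions for illustration. The sketch also assumes that all label name tokens must appear in the query, which the description above does not specify.

```python
import re

# Tiny illustrative stopword set; the PR describes the real list as comprehensive.
STOPWORDS = {"a", "an", "the", "is", "to", "for", "of", "issue", "request", "new"}


def raw_tokens(text):
    """Lowercase and split on non-alphanumeric characters."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def filtered_tokens(text):
    """Tokenize, then drop stopwords."""
    return raw_tokens(text) - STOPWORDS


def permissive_score(query, name, description):
    """Old behavior as described: any overlap with name + description scores 0.95."""
    label_tokens = raw_tokens(f"{name} {description}")
    return 0.95 if raw_tokens(query) & label_tokens else 0.0


def strict_score(query, name):
    """New behavior as described: label NAME tokens must appear in the query."""
    name_tokens = filtered_tokens(name)
    if name_tokens and name_tokens <= filtered_tokens(query):
        return 0.95
    return 0.0
```

For a query such as "Feature request: add a new export option", the permissive version scores a `duplicate` label at 0.95 whenever its description happens to contain "request", while the strict version scores it 0.0 because "duplicate" does not appear in the query.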

github-actions · 2026-01-10T03:42:53Z

github-actions · 2026-01-10T03:43:30Z

Automated Status Summary

Head SHA: c790e59
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| (no jobs reported) | ⏳ pending | — |

Coverage Overview

Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 92.21% |
| Baseline | 85.00% |
| Delta | +7.21% |
| Minimum | 70.00% |
| Status | ✅ Pass |
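The Delta row is the current coverage minus the baseline (92.21 − 85.00 = 7.21). A minimal sketch of that arithmetic follows. The function name is hypothetical, and the rule that Status passes when current coverage is at or above the minimum is an assumption, since the source does not state how Status is derived.

```python
def coverage_trend(current, baseline, minimum):
    """Return the formatted delta and a pass/fail status for a coverage run."""
    delta = round(current - baseline, 2)
    # Assumption: the status is judged against the minimum, not the baseline.
    status = "Pass" if current >= minimum else "Fail"
    return {"delta": f"{delta:+.2f}%", "status": status}
```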

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| `scripts/workflow_health_check.py` | 62.6% | 28 |
| `scripts/classify_test_failures.py` | 62.9% | 37 |
| `scripts/ledger_validate.py` | 65.3% | 63 |
| `scripts/mypy_return_autofix.py` | 82.6% | 11 |
| `scripts/ledger_migrate_base.py` | 85.5% | 13 |
| `scripts/fix_cosmetic_aggregate.py` | 92.3% | 1 |
| `scripts/coverage_history_append.py` | 92.8% | 2 |
| `scripts/workflow_validator.py` | 93.3% | 4 |
| `scripts/update_autofix_expectations.py` | 93.9% | 1 |
| `scripts/pr_metrics_tracker.py` | 95.7% | 3 |
| `scripts/generate_residual_trend.py` | 96.6% | 1 |
| `scripts/build_autofix_pr_comment.py` | 97.0% | 2 |
| `scripts/aggregate_agent_metrics.py` | 97.2% | 0 |
| `scripts/fix_numpy_asserts.py` | 98.1% | 0 |
| `scripts/sync_test_dependencies.py` | 98.3% | 1 |

Updated automatically; will refresh on subsequent CI/Docker completions.


github-actions · 2026-01-10T03:43:52Z

🤖 Keepalive Loop Status

PR #733 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/28 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |
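The iteration progress cell is a fixed-width text bar. A small sketch of how such a bar could be rendered is below. Only the empty state (`[----------] 0/5`) appears in the source, so the ten-character width is read from that example and the `#` fill character is an assumption.

```python
def progress_bar(done, total, width=10, fill="#", empty="-"):
    """Render a text bar such as '[----------] 0/5'."""
    # Integer division keeps the bar from overstating progress.
    filled = 0 if total <= 0 else min(width, (done * width) // total)
    return f"[{fill * filled}{empty * (width - filled)}] {done}/{total}"
```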

🔍 Failure Classification

Copilot

Pull request overview

This PR fixes a critical bug in the label keyword matching logic that caused all labels to be applied to every issue due to overly permissive token overlap detection. The fix introduces a stopwords list and requires label names (not just descriptions) to match query tokens for high-confidence scoring.

Changes:

- Added a comprehensive stopwords list to filter common words that shouldn't trigger keyword matches
- Modified keyword matching to require label NAME tokens to appear in the query (after stopword filtering) for the 0.95 confidence score
- Changed category matching (bug/feature/docs) to check only label NAMEs instead of names + descriptions to prevent false positives
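The name-only category check in the last item can be illustrated with a short sketch. This is not the code from `_keyword_match_score()`. The keyword tuple, the function name, and the sample label descriptions are assumptions chosen to reproduce the 'duplicate' false positive described in the PR.

```python
# Hypothetical keyword list for the feature category.
FEATURE_KEYWORDS = ("feature", "enhancement", "request")


def label_in_category(name, description, keywords, name_only=True):
    """Return True when any category keyword occurs in the label text.

    With name_only=False the description is searched too, which is the
    behavior the PR identifies as the source of false positives.
    """
    text = name if name_only else f"{name} {description}"
    text = text.lower()
    return any(keyword in text for keyword in keywords)
```

Searching only the name means a label such as `duplicate` is no longer pulled into the feature category just because its description mentions a "pull request".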

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `scripts/langchain/label_matcher.py` | Added `_COMMON_STOPWORDS` set and refactored `_keyword_match_score()` to require label name token matches and use name-only for category detection |
| `tests/scripts/test_label_matcher.py` | Updated test to use `type:documentation` instead of `quality` label to align with new name-only matching behavior |


scripts/langchain/label_matcher.py

- Document critical bug found: keyword matcher applying all labels
- Add PR #733 to fix list (open)
- Mark conflicted PRs as resolved (all 3 done)
- Update Week 2 progress with evening re-testing results
- Move Suite C/D fixes from 'needed' to 'applied' section

…results
- Document PRs #733, #735 (deep label matcher fixes)
- Record validation test results (issues #265-267)
- Mark auto-label validation as complete
- Key win: 2FA feature request now gets only 'enhancement' (was 3 labels)



* docs: Update SHORT_TERM_PLAN with label matcher fixes and validation results
  - Document PRs #733, #735 (deep label matcher fixes)
  - Record validation test results (issues #265-267)
  - Mark auto-label validation as complete
  - Key win: 2FA feature request now gets only 'enhancement' (was 3 labels)
* docs: Add LONG_TERM_PLAN for Phases 4-5
  - Phase 4: Auto-pilot workflow, user guide, conflict resolution
  - Phase 5: Learning from feedback, multi-model arbitration
  - Infrastructure: Performance, monitoring, cost optimization
  - Risk assessment and success metrics
  - Prioritized 8-week roadmap
* Expand cleanup_labels.py classifications
  - Add autofix:*, integration-*, agents:keepalive-nudge to functional
  - Add common component labels (app, engine, ui, backend, cli)
  - Add tech labels (javascript, python, github:actions)
  - Add domain labels (metrics, modeling, schema, etc.)
  - Reduces idiosyncratic labels from 150+ to 24
  - Remaining 24 are legitimate project-specific labels

Copilot AI review requested due to automatic review settings January 10, 2026 03:41

stranske temporarily deployed to agent-standard January 10, 2026 03:42 — with GitHub Actions Inactive

github-actions bot added the autofix Opt-in automated formatting & lint remediation label Jan 10, 2026

Copilot started reviewing on behalf of stranske January 10, 2026 03:42 View session

Merge branch 'main' into fix/label-matcher-keyword-false-positives

c3e0699

Copilot AI reviewed Jan 10, 2026

scripts/langchain/label_matcher.py (resolved)

stranske temporarily deployed to agent-standard January 10, 2026 03:45 — with GitHub Actions Inactive

stranske mentioned this pull request Jan 10, 2026

docs: Update SHORT_TERM_PLAN with re-test results #734

Merged

28 tasks

stranske merged commit a769c1c into main Jan 10, 2026
141 checks passed

stranske deleted the fix/label-matcher-keyword-false-positives branch January 10, 2026 03:47

stranske mentioned this pull request Jan 10, 2026

fix: Prevent short tokens from matching keywords via prefix #735

Merged

28 tasks

stranske mentioned this pull request Jan 10, 2026

docs: Update SHORT_TERM_PLAN with label matcher progress #737

Merged

43 tasks


stranske merged 2 commits into main from fix/label-matcher-keyword-false-positives

2 participants
