fix: Prevent short tokens from matching keywords via prefix #735
The _token_matches_keyword function was allowing single-character
tokens to match keywords via prefix matching. For example:
- 'd' (from 'Describe') matched 'defect'
- 'a' could match 'add'
This caused feature requests to get the 'bug' label (0.91), because
typical issue text contains short tokens that prefix-match bug
keywords.
Fix: Require token to be >= 4 chars before allowing prefix matching
in either direction.
Before: token='d', keyword='defect' → True (defect.startswith('d'))
After: token='d', keyword='defect' → False (len('d') < 4)
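
The before/after behaviour can be reproduced with a small sketch. The names matches_before and matches_after are hypothetical labels, not names from the repository, and the "before" body assumes the removed return line in this PR's diff was the whole of the old keyword-prefix check.

```python
def matches_before(token: str, keyword: str) -> bool:
    """Assumed old behaviour: only the keyword length was checked."""
    if token == keyword:
        return True
    if len(token) >= 4 and token.startswith(keyword):
        return True
    # A 1-character token passes here as long as the keyword is long enough.
    return len(keyword) >= 4 and keyword.startswith(token)


def matches_after(token: str, keyword: str) -> bool:
    """Behaviour after the fix: token and keyword must both be >= 4 chars."""
    if token == keyword:
        return True
    if len(token) >= 4 and token.startswith(keyword):
        return True
    return len(token) >= 4 and len(keyword) >= 4 and keyword.startswith(token)


if __name__ == "__main__":
    for token, keyword in [("d", "defect"), ("defe", "defect")]:
        print(token, keyword, matches_before(token, keyword), matches_after(token, keyword))
```

Only the short-token case changes: a 4-character prefix such as 'defe' still matches 'defect' under both versions.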
Status: ✅ no new diagnostics

Automated Status Summary
Head SHA: b413caf
Updated automatically; will refresh on subsequent CI/Docker completions.

🤖 Keepalive Loop Status
PR #735 | Agent: Codex | Iteration 0/5
🔍 Failure Classification
| Error type | infrastructure |
Pull request overview
This PR fixes a bug where short tokens (< 4 characters) from issue text were incorrectly matching keywords via prefix matching, causing issues to receive incorrect labels. The fix adds a length requirement (>= 4 chars) to prevent short tokens like "d" (from "Describe") from matching keywords like "defect".
Changes:
- Modified _token_matches_keyword to require tokens be at least 4 characters long before allowing prefix matching
- Added inline comments explaining the rationale for the 4-character minimum
  def _token_matches_keyword(token: str, keyword: str) -> bool:
      if token == keyword:
          return True
      # Only allow prefix matching for tokens >= 4 chars to avoid false positives
      # from short tokens like "d" matching "defect" or "a" matching "add"
      if len(token) >= 4 and token.startswith(keyword):
          return True
-     return bool(len(keyword) >= 4 and keyword.startswith(token))
+     # Check if keyword starts with token (both must be >= 4 chars)
+     return len(token) >= 4 and len(keyword) >= 4 and keyword.startswith(token)
The fix correctly prevents short tokens from matching keywords via prefix. However, there's no test coverage for this specific scenario. Consider adding a test case that verifies tokens shorter than 4 characters (e.g., "d" from "Describe") do not match keywords like "defect", which was the root cause of the bug described in the PR.
* docs: Update SHORT_TERM_PLAN with label matcher fixes and validation results
  - Document PRs #733, #735 (deep label matcher fixes)
  - Record validation test results (issues #265-267)
  - Mark auto-label validation as complete
  - Key win: 2FA feature request now gets only 'enhancement' (was 3 labels)
* docs: Add LONG_TERM_PLAN for Phases 4-5
  - Phase 4: Auto-pilot workflow, user guide, conflict resolution
  - Phase 5: Learning from feedback, multi-model arbitration
  - Infrastructure: Performance, monitoring, cost optimization
  - Risk assessment and success metrics
  - Prioritized 8-week roadmap
* Expand cleanup_labels.py classifications
  - Add autofix:*, integration-*, agents:keepalive-nudge to functional
  - Add common component labels (app, engine, ui, backend, cli)
  - Add tech labels (javascript, python, github:actions)
  - Add domain labels (metrics, modeling, schema, etc.)
  - Reduces idiosyncratic labels from 150+ to 24
  - Remaining 24 are legitimate project-specific labels
Automated Status Summary
Scope
After merging PR #103 (multi-agent routing infrastructure), we need to:
- GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI

Context for Agent
Design Decisions & Constraints
| <!-- keepalive-loop-summary --> | github-actions[bot] | NEW: CLI agent iteration tracking | ✅ Keep for CLI agents |
| <!-- keepalive-state:v1 --> | agents-workflows-bot[bot] | State tracking |
| <!-- keepalive-round: N --> | stranske | OLD: Instruction comment | ❌ CLI agents don't need this |
agent:* label), we should have exactly one updating comment (<!-- keepalive-loop-summary -->) instead of accumulating 10+ comments per PR.

Related Issues/PRs
References
Blockers & Dependencies

Tasks
Pipeline Validation
- agent:codex label
GITHUB_STEP_SUMMARY
- agents-keepalive-loop.yml after agent run
Conditional Status Summary
- buildStatusBlock() in agents_pr_meta_update_body.js to accept agentType parameter
- agentType is set (CLI agent): hide workflow table, hide head SHA/required checks
Comment Pattern Cleanup
- agent:* label):
  - <!-- gate-summary: --> comment posting (use step summary instead)
  - <!-- keepalive-round: N --> instruction comments (task appendix replaces this)
  - <!-- keepalive-loop-summary --> to be the single source of truth
- agent:* label):
  - <!-- gate-summary: --> comment
- agent_type output to detect job so downstream workflows know the mode
- agents-pr-meta.yml to conditionally skip gate summary for CLI agent PRs

Acceptance criteria
Dependencies
Head SHA: 9be18ab
Latest Runs: ❔ in progress — Gate
Required: gate: ❔ in progress