t1218: Improve semantic dedup to prevent duplicate task creation #1969
marcusquinn merged 1 commit into main
- Reduce stop words list: keep action verbs (investigate, fix, add, etc.) as signal keywords
- Scan recently completed tasks (today + yesterday) in addition to open tasks
- Strengthen AI semantic dedup prompt with stricter duplicate criteria
- Add keyword safety net: 4+ keyword matches overrides AI 'not duplicate' verdict
- Add duplicate prevention rule to AI reasoning system prompt
- Add tests for keyword pre-filter and recently-completed task scanning
Summary of Changes
Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the semantic deduplication system to prevent the creation of redundant tasks by the AI supervisor's reasoning engine. It addresses several gaps in the previous system, ensuring that new tasks are not generated for issues already being investigated or recently resolved, thereby improving efficiency and reducing token waste.
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Feb 19 15:53:16 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Walkthrough
The PR enhances the semantic deduplication workflow by expanding it to scan recently completed tasks (last 24h) alongside open tasks. It introduces a dedicated scoring helper for keyword matching and implements a safety-net mechanism that applies keyword-based duplicate detection when AI semantic dedup returns a not-duplicate verdict, with a configurable threshold. New tests validate both the keyword pre-filter and recently-completed task detection behaviors.
Sequence Diagram(s)
sequenceDiagram
participant Workflow as Task Workflow
participant PreFilter as Keyword Pre-filter
participant AISemantic as AI Semantic Check
participant SafetyNet as Safety Net
participant Decision as Dedup Decision
Workflow->>PreFilter: Score candidates from open + recent (24h) tasks
PreFilter->>PreFilter: _score_task_line for each task
PreFilter-->>AISemantic: best_id, best_count, candidates
AISemantic->>AISemantic: AI evaluates similarity
AISemantic-->>SafetyNet: verdict (duplicate/not-duplicate)
alt Not-Duplicate Verdict
SafetyNet->>SafetyNet: Check best_count >= 4 threshold
alt Threshold Met
SafetyNet-->>Decision: Force duplicate with best_id
else Threshold Not Met
SafetyNet-->>Decision: Not-duplicate confirmed
end
else Duplicate Verdict
AISemantic-->>Decision: Duplicate confirmed
end
Decision-->>Workflow: Final dedup decision
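The flow in the diagram can be sketched in bash roughly as follows. This is a minimal illustration under stated assumptions, not the code from `ai-actions.sh`: the function `dedup_decision`, its arguments, and the `SAFETY_NET_THRESHOLD` variable are hypothetical names. Only the `task_id|match_count` candidate format and the 4-match threshold are taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the dedup decision flow; all names are illustrative.

SAFETY_NET_THRESHOLD="${SAFETY_NET_THRESHOLD:-4}"

# dedup_decision <ai_verdict> <candidates>
#   ai_verdict: a task ID if the AI flagged a duplicate, empty otherwise
#   candidates: lines of "task_id|keyword_match_count", best match first
# Prints the duplicate task ID and returns 0, or returns 1 when not a duplicate.
dedup_decision() {
    local ai_verdict="$1" candidates="$2"
    local best_id best_count

    # Duplicate verdict from the AI: accept it as-is.
    if [[ -n "$ai_verdict" ]]; then
        printf '%s\n' "$ai_verdict"
        return 0
    fi

    # read consumes only the first line, i.e. the best-scoring candidate.
    IFS='|' read -r best_id best_count _ <<<"$candidates"

    # Safety net: high keyword overlap overrides the not-duplicate verdict.
    if [[ -n "$best_id" && "$best_count" =~ ^[0-9]+$ ]] &&
        ((10#$best_count >= SAFETY_NET_THRESHOLD)); then
        printf '%s\n' "$best_id"
        return 0
    fi

    return 1
}

# Example: the AI said "not duplicate", but the top candidate has 5 matches.
dedup_decision "" $'t101|5\nt087|2'
```

With these assumed inputs the example call prints `t101`, because the top candidate clears the threshold even though the AI verdict is empty.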
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Code Review
This is a comprehensive and well-executed improvement to the semantic deduplication system. The multi-pronged approach of refining stop words, scanning completed tasks, strengthening the AI prompt, and adding a keyword safety net is excellent. The addition of new tests provides good coverage for these changes. I have a couple of suggestions to improve the robustness of the grep commands to fully align with the repository's style guide for error handling.
    # Scan open tasks
    while IFS= read -r task_line; do
        _score_task_line "$task_line"
    done < <(grep -E '^\s*- \[ \] t[0-9]' "$todo_file" 2>/dev/null)
Under set -e, this grep command can cause the script to exit if $todo_file doesn't exist, as grep will return a non-zero exit code. The 2>/dev/null only suppresses the error message, not the exit code. The repository style guide recommends using || true to guard against such failures.
- done < <(grep -E '^\s*- \[ \] t[0-9]' "$todo_file" 2>/dev/null)
+ done < <(grep -E '^\s*- \[ \] t[0-9]' "$todo_file" 2>/dev/null || true)
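The effect of the `|| true` guard can be checked in isolation. The sketch below is only an experiment, with a placeholder path, and compares three cases under `set -e`: an unguarded plain `grep`, the same command with the guard, and an unguarded `grep` inside a process substitution. As far as bash's documented behavior goes, the exit status of a command inside `<(...)` is not propagated to the parent shell, so the guard there is mainly defensive. It matters directly when the same `grep` runs as a plain command.

```shell
#!/usr/bin/env bash
# Illustrative check of grep exit codes under `set -e`; the path is a placeholder.
missing="/nonexistent/TODO.md"

# 1. Plain command, unguarded: grep exits non-zero and errexit stops the inner shell.
unguarded=$(bash -c 'set -e; grep -E "^- \[ \] t[0-9]" "$1" 2>/dev/null; echo survived' _ "$missing" || true)

# 2. Same command guarded with `|| true`: execution continues.
guarded=$(bash -c 'set -e; grep -E "^- \[ \] t[0-9]" "$1" 2>/dev/null || true; echo survived' _ "$missing")

# 3. Unguarded grep inside a process substitution feeding a while loop.
procsub=$(bash -c 'set -e; while IFS= read -r l; do :; done < <(grep -E "^- \[ \] t[0-9]" "$1" 2>/dev/null); echo survived' _ "$missing")

printf 'unguarded=[%s] guarded=[%s] procsub=[%s]\n' "$unguarded" "$guarded" "$procsub"
```

On a typical bash 4 or 5 installation this is expected to print `unguarded=[] guarded=[survived] procsub=[survived]`. Adding `|| true` costs nothing and keeps the code consistent with the style guide if the `grep` is later moved out of the process substitution.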
        if [[ "$completed_date" == "$today" || "$completed_date" == "$yesterday" ]]; then
            _score_task_line "$task_line"
        fi
    done < <(grep -E '^\s*- \[x\] t[0-9]' "$todo_file" 2>/dev/null)
Similar to the grep for open tasks, this command can cause the script to exit under set -e if the file doesn't exist. Guarding with || true as per the repository style guide will make this more robust.
- done < <(grep -E '^\s*- \[x\] t[0-9]' "$todo_file" 2>/dev/null)
+ done < <(grep -E '^\s*- \[x\] t[0-9]' "$todo_file" 2>/dev/null || true)
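The excerpt above compares `completed_date` against `today` and `yesterday` but does not show how those values are derived. A minimal sketch, assuming tasks carry a `completed:YYYY-MM-DD` field as the test fixture suggests, could look like this. The helper names `yesterday_date`, `extract_completed_date`, and `is_recently_completed` are hypothetical and do not appear in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the real extraction logic in ai-actions.sh is not shown on this page.

# Print yesterday's date as YYYY-MM-DD, trying GNU date first, then BSD/macOS date.
yesterday_date() {
    date -d 'yesterday' +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d
}

# extract_completed_date <task_line>: print the date after "completed:", if present.
extract_completed_date() {
    local line="$1"
    local re='completed:([0-9]{4}-[0-9]{2}-[0-9]{2})'
    if [[ "$line" =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# is_recently_completed <task_line> <today> <yesterday>
# Returns 0 when the task's completed date equals either of the two dates.
is_recently_completed() {
    local completed_date
    completed_date=$(extract_completed_date "$1")
    [[ -n "$completed_date" && ( "$completed_date" == "$2" || "$completed_date" == "$3" ) ]]
}

# Example with an illustrative task line.
today=$(date +%Y-%m-%d)
yesterday=$(yesterday_date)
if is_recently_completed "- [x] t102 Example task completed:$today" "$today" "$yesterday"; then
    echo "t102 would be scanned as a dedup candidate"
fi
```

Comparing calendar dates rather than timestamps keeps the check simple, at the cost of a window that spans up to about two days instead of a rolling 24 hours.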
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/supervisor/ai-actions.sh:
- Around line 658-680: Guard the numeric comparisons of best_count to avoid
arithmetic errors when it's empty or non-numeric: where the safety-net uses
best_count (the variable set from cut of candidates) — e.g., the check if [[ -n
"$best_id" && "$best_count" -ge 4 ]] and the similar subsequent check — replace
the raw compare with a defaulted/numeric-safe form (use ${best_count:-0} or
validate/coerce to a number before comparing) so the -ge test never receives an
empty value; keep the existing log and return behavior for the duplicate branch
and ensure any other places referencing best_count use the same defensive
defaulting.
In `@tests/test-ai-actions.sh`:
- Around line 2163-2169: The comment in tests/test-ai-actions.sh incorrectly
states that `_keyword_prefilter_open_tasks` only scans open tasks and that
completed-task scanning is in `_check_similar_open_task`; update or remove this
misleading comment to reflect that `_keyword_prefilter_open_tasks` now directly
scans recently completed tasks (see the newly added completed-task scan in
`_keyword_prefilter_open_tasks`), or delete the comment entirely since Test 31
already covers the completed-task path—ensure references to
`_check_similar_open_task` are not left implying it handles the completed-task
scan.
    # Check the best keyword match score for safety-net logic below
    local best_id best_count
    best_id=$(printf '%s' "$candidates" | head -1 | cut -d'|' -f1)
    best_count=$(printf '%s' "$candidates" | head -1 | cut -d'|' -f2)

    # Step 2: AI semantic check (if enabled and CLI available)
    if [[ "${AI_SEMANTIC_DEDUP_USE_AI:-true}" == "true" ]]; then
        local ai_result
        if ai_result=$(_ai_semantic_dedup_check "$title" "$candidates"); then
            printf '%s' "$ai_result"
            return 0
        fi
        # AI said not a duplicate or was unavailable — trust the AI over keywords
        # AI said not a duplicate or was unavailable.
        # Safety net: if keyword overlap is very high (4+ matches), treat as
        # duplicate anyway. The AI can be wrong, but 4+ shared distinctive
        # keywords is strong evidence of the same topic. This prevents the
        # "9 tasks about stale evaluating" scenario where the AI kept saying
        # "not duplicate" despite obvious overlap.
        if [[ -n "$best_id" && "$best_count" -ge 4 ]]; then
            log_info "AI Actions: semantic dedup safety net: AI said not duplicate but $best_id has $best_count keyword matches — treating as duplicate"
            printf '%s' "$best_id"
            return 0
        fi
Safety net is a pragmatic override — consider guarding best_count against empty/non-numeric values.
The 4+ keyword threshold as a fallback when the AI disagrees is a sound heuristic. However, if best_count is ever empty (e.g., malformed candidate output), the -ge comparison on line 676 will throw an arithmetic error under set -e, aborting the function.
While the current code path makes this unlikely (candidates are well-formed by construction), a ${best_count:-0} default costs nothing and prevents a potential crash.
Defensive fix
- if [[ -n "$best_id" && "$best_count" -ge 4 ]]; then
+ if [[ -n "$best_id" && "${best_count:-0}" -ge 4 ]]; then
And similarly at line 686:
- if [[ -n "$best_id" && "$best_count" -ge 3 ]]; then
+ if [[ -n "$best_id" && "${best_count:-0}" -ge 3 ]]; then
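A slightly stricter variant of the suggested guard is sketched below. `${best_count:-0}` only covers the empty case. In bash, `[[ ... -ge ... ]]` evaluates its operands arithmetically, so an empty value is typically treated as 0, while a malformed value such as `3x` produces an error message. Validating with a regex first rejects both quietly. The helper name `count_at_least` is hypothetical and is not part of the PR.

```shell
#!/usr/bin/env bash
# count_at_least <value> <threshold>   (hypothetical helper)
# Returns 0 only if <value> is a non-negative integer that is >= <threshold>.
count_at_least() {
    local value="${1:-0}" threshold="$2"

    # Reject anything that is not a plain run of digits.
    [[ "$value" =~ ^[0-9]+$ ]] || return 1

    # Force base 10 so a value like "08" is not parsed as invalid octal.
    ((10#$value >= threshold))
}

# Example: how the safety-net condition might use it.
best_id="t101"
best_count="5"
if [[ -n "$best_id" ]] && count_at_least "$best_count" 4; then
    echo "safety net: treating as duplicate of $best_id"
fi
```

Keeping the validation in one function means both thresholds mentioned in the review (4 and 3) can share the same defensive check.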
    # Test 4: Recently completed tasks should also be found
    # t102 is [x] with completed:2026-02-19 — should appear as candidate
    # We need to test _check_similar_open_task which includes completed tasks
    # But _keyword_prefilter_open_tasks only scans open tasks by design
    # The completed task scanning is in _check_similar_open_task via the
    # recently-completed scan added in this fix
Misleading comment: completed-task scanning now lives in _keyword_prefilter_open_tasks itself.
The comment says _keyword_prefilter_open_tasks only scans open tasks and that completed-task scanning is in _check_similar_open_task. After this PR, _keyword_prefilter_open_tasks directly scans recently completed tasks (lines 475–489 of ai-actions.sh). The comment should be updated to reflect the new behavior, or simply removed since Test 31 covers the completed-task path explicitly.
Suggested fix
- # Test 4: Recently completed tasks should also be found
- # t102 is [x] with completed:2026-02-19 — should appear as candidate
- # We need to test _check_similar_open_task which includes completed tasks
- # But _keyword_prefilter_open_tasks only scans open tasks by design
- # The completed task scanning is in _check_similar_open_task via the
- # recently-completed scan added in this fix
+ # Test 4: Recently completed tasks are also scanned by _keyword_prefilter_open_tasks
+ # (added in t1218). Dedicated coverage is in Test 31 via _check_similar_open_task.
Auto-dismissed: bot review does not block autonomous pipeline



Summary
The AI supervisor's reasoning engine created 9+ duplicate tasks about the same root symptom ("stale evaluating recovery") because the semantic dedup system had several gaps. This PR fixes 5 issues:
Changes
1. Stop words list reduced (ai-actions.sh)
Action verbs like "investigate", "fix", "add", "implement" were being stripped as stop words in the keyword pre-filter. These carry critical semantic signal for dedup — "Investigate X" and "Investigate Y" share a pattern that should be detected.
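The keyword pre-filter idea can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: the stop-word list, the function names `extract_keywords` and `count_shared_keywords`, and the example titles are hypothetical. The point it shows is that leaving action verbs out of the stop-word list lets them count toward the overlap score.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the actual stop-word list in ai-actions.sh is not reproduced here.

# Function words only. Action verbs such as "investigate" or "fix" are deliberately absent.
STOP_WORDS=" a an the and or of to in on for with is are "

# extract_keywords <title>: print one lowercase keyword per line, sorted and unique.
extract_keywords() {
    local word
    # Lowercase the title, then turn every run of non-alphanumerics into a newline.
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' |
        while IFS= read -r word; do
            # Skip empty tokens and anything in the stop-word list.
            [[ -z "$word" || "$STOP_WORDS" == *" $word "* ]] && continue
            printf '%s\n' "$word"
        done | sort -u
}

# count_shared_keywords <title_a> <title_b>: print how many keywords both titles share.
count_shared_keywords() {
    comm -12 <(extract_keywords "$1") <(extract_keywords "$2") | wc -l | tr -d ' '
}

# Example with two illustrative titles about the same symptom.
count_shared_keywords \
    "Investigate stale evaluating recovery in supervisor" \
    "Investigate stale evaluating tasks after recovery"
```

For the two example titles the shared keywords are `evaluating`, `investigate`, `recovery`, and `stale`, so the call prints `4`. If `investigate` were still treated as a stop word, the count would drop to 3.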
2. Recently completed tasks scanned (ai-actions.sh)
The dedup only checked open `[ ]` tasks. A task completed 30 minutes ago about the same symptom didn't prevent creating a new one. Now scans `[x]` tasks with `completed:` timestamps from today or yesterday.
3. AI semantic dedup prompt strengthened (ai-actions.sh)
The sonnet dedup prompt was too lenient. Now uses "strict task deduplication checker" framing with 5 explicit duplicate criteria and "when in doubt, mark as duplicate" instruction.
4. Keyword safety net (ai-actions.sh)
When the AI says "not duplicate" but keyword overlap is 4+ matches, treat as duplicate anyway. Prevents the scenario where the AI is wrong but keywords clearly show overlap.
5. Reasoning prompt duplicate prevention (ai-reason.sh)
Added a CRITICAL DUPLICATE PREVENTION rule to the reasoning system prompt, instructing the AI to scan TODO for existing tasks before proposing `create_task` or `create_improvement`.
Testing
- `bash -n` passes on both files
Files Changed
- .agents/scripts/supervisor/ai-actions.sh
- .agents/scripts/supervisor/ai-reason.sh
- tests/test-ai-actions.sh