
t1218: Improve semantic dedup to prevent duplicate task creation #1969

Merged
marcusquinn merged 1 commit into main from bugfix/t1218-semantic-dedup-improvements
Feb 19, 2026

Conversation

@marcusquinn (Owner) commented Feb 19, 2026

Summary

The AI supervisor's reasoning engine created 9+ duplicate tasks about the same root symptom ("stale evaluating recovery") because the semantic dedup system had several gaps. This PR fixes 5 issues:

Changes

1. Stop words list reduced (ai-actions.sh)

Action verbs like "investigate", "fix", "add", "implement" were being stripped as stop words in the keyword pre-filter. These carry critical semantic signal for dedup — "Investigate X" and "Investigate Y" share a pattern that should be detected.
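
As a rough illustration of the idea (not the code in ai-actions.sh), a keyword pre-filter that keeps action verbs might look like the sketch below. The `STOP_WORDS` list and the `extract_keywords` name are assumptions made for this example only.

```bash
#!/usr/bin/env bash
# Hypothetical stop-word list: function words only, no action verbs.
STOP_WORDS=" a an the and or of to in on for with is are be "

# Print the distinct keywords of a task title, one per line.
extract_keywords() {
	local title="$1" word
	# Lowercase, split on non-alphanumerics, then drop stop words and
	# tokens shorter than 3 characters ("fix" and "add" are kept).
	printf '%s\n' "$title" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' |
		while IFS= read -r word; do
			[[ ${#word} -lt 3 ]] && continue
			[[ "$STOP_WORDS" == *" $word "* ]] && continue
			printf '%s\n' "$word"
		done | sort -u
}

extract_keywords "Investigate the stale evaluating recovery"
```

With a list like this, "investigate" survives as a keyword, so two "Investigate ..." titles about the same symptom share one more matching term.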

2. Recently completed tasks scanned (ai-actions.sh)

The dedup only checked open [ ] tasks. A task completed 30 minutes ago about the same symptom didn't prevent creating a new one. Now scans [x] tasks with completed: timestamps from today or yesterday.
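
A minimal sketch of the date filter, assuming completed tasks carry a `completed:YYYY-MM-DD` field as in the test fixtures quoted later in this thread. The function name `recent_completed_tasks` is hypothetical, and the dates are passed in as arguments so the behavior is easy to test.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print completed ([x]) task lines whose completed:
# date equals today or yesterday.
recent_completed_tasks() {
	local todo_file="$1" today="$2" yesterday="$3"
	local line completed_date
	local re='completed:([0-9]{4}-[0-9]{2}-[0-9]{2})'
	while IFS= read -r line; do
		# Skip completed tasks that have no completed: date at all.
		[[ "$line" =~ $re ]] || continue
		completed_date="${BASH_REMATCH[1]}"
		if [[ "$completed_date" == "$today" || "$completed_date" == "$yesterday" ]]; then
			printf '%s\n' "$line"
		fi
	done < <(grep -E '^[[:space:]]*- \[x\] t[0-9]' "$todo_file" 2>/dev/null || true)
}

# Example call (GNU date first, BSD date as a fallback):
#   today=$(date +%Y-%m-%d)
#   yesterday=$(date -d yesterday +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d)
#   recent_completed_tasks TODO.md "$today" "$yesterday"
```

Comparing calendar dates rather than elapsed hours means the window can cover up to roughly two days, which matches the "today or yesterday" wording above.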

3. AI semantic dedup prompt strengthened (ai-actions.sh)

The sonnet dedup prompt was too lenient. Now uses "strict task deduplication checker" framing with 5 explicit duplicate criteria and "when in doubt, mark as duplicate" instruction.

4. Keyword safety net (ai-actions.sh)

When the AI says "not duplicate" but keyword overlap is 4+ matches, treat as duplicate anyway. Prevents the scenario where the AI is wrong but keywords clearly show overlap.
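
The override can be sketched as a small decision function. The names `dedup_decision` and `SAFETY_NET_THRESHOLD` and the verdict strings are assumptions for illustration; only the threshold of 4 comes from the description above.

```bash
#!/usr/bin/env bash
# Hypothetical constant; the value 4 is taken from the PR description.
SAFETY_NET_THRESHOLD=4

# Args: AI verdict ("duplicate" or "not_duplicate"), best candidate task ID,
# and that candidate's keyword match count.
# Prints the duplicate's ID and returns 0 when the new task should be
# treated as a duplicate; returns 1 otherwise.
dedup_decision() {
	local verdict="$1" best_id="$2" best_count="${3:-0}"
	if [[ "$verdict" == "duplicate" ]]; then
		printf '%s\n' "$best_id"
		return 0
	fi
	# Safety net: strong keyword overlap overrides a "not duplicate" verdict.
	if [[ -n "$best_id" && "$best_count" -ge "$SAFETY_NET_THRESHOLD" ]]; then
		printf '%s\n' "$best_id"
		return 0
	fi
	return 1
}
```

The trade-off is deliberate: a false "duplicate" costs one skipped task, while a false "not duplicate" repeated across runs produces the pile-up this PR is meant to stop.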

5. Reasoning prompt duplicate prevention (ai-reason.sh)

Added a CRITICAL DUPLICATE PREVENTION rule to the reasoning system prompt, instructing the AI to scan TODO for existing tasks before proposing create_task or create_improvement.

Testing

  • ShellCheck: zero new violations (only pre-existing SC1091/SC2016 info notices)
  • Syntax: bash -n passes on both files
  • Test suite: 30/31 pass (1 pre-existing failure in Test 27, unrelated to this PR)
  • New tests: Test 30 (keyword pre-filter signal words) and Test 31 (recently completed task scanning)

Files Changed

| File | Lines | Description |
|---|---|---|
| .agents/scripts/supervisor/ai-actions.sh | +116/-30 | 4 fixes: stop words, completed tasks, prompt, safety net |
| .agents/scripts/supervisor/ai-reason.sh | +1 | Duplicate prevention rule in reasoning prompt |
| tests/test-ai-actions.sh | +170 | 2 new tests (Test 30, Test 31) |

Summary by CodeRabbit

Release Notes

  • New Features

    • Improved duplicate task detection now examines both open and recently completed tasks.
    • Enhanced safety mechanisms to prevent creation of near-duplicate tasks.
    • Strengthened AI reasoning with additional duplicate prevention safeguards.
  • Tests

    • Added comprehensive tests for task matching and duplicate detection scenarios.

- Reduce stop words list: keep action verbs (investigate, fix, add, etc.) as signal keywords
- Scan recently completed tasks (today + yesterday) in addition to open tasks
- Strengthen AI semantic dedup prompt with stricter duplicate criteria
- Add keyword safety net: 4+ keyword matches overrides AI 'not duplicate' verdict
- Add duplicate prevention rule to AI reasoning system prompt
- Add tests for keyword pre-filter and recently-completed task scanning
@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the semantic deduplication system to prevent the creation of redundant tasks by the AI supervisor's reasoning engine. It addresses several gaps in the previous system, ensuring that new tasks are not generated for issues already being investigated or recently resolved, thereby improving efficiency and reducing token waste.

Highlights

  • Improved Keyword Pre-filtering: Action verbs like 'investigate' and 'fix' are no longer stripped as stop words, enhancing the semantic signal for deduplication.
  • Scanning of Recently Completed Tasks: The deduplication system now considers recently completed tasks (from today or yesterday) to prevent the creation of new tasks for issues that were just addressed.
  • Strengthened AI Semantic Deduplication Prompt: The AI prompt for deduplication has been made stricter, including explicit criteria for duplicates and an instruction to mark as duplicate when uncertain.
  • Keyword Safety Net Implemented: A safety net was added to treat tasks as duplicates if there are 4 or more keyword matches, even if the AI initially indicates it's not a duplicate.
  • Duplicate Prevention in AI Reasoning Prompt: A critical rule was added to the AI reasoning system prompt, instructing it to scan for existing tasks before proposing new ones.
Changelog
  • .agents/scripts/supervisor/ai-actions.sh
    • Reduced the list of stop words to retain critical action verbs for improved keyword pre-filtering.
    • Extended task scanning to include recently completed tasks (from today or yesterday) in addition to open tasks for deduplication.
    • Updated the AI semantic deduplication prompt with stricter criteria and instructions for marking duplicates.
    • Implemented a keyword safety net to override AI's 'not duplicate' assessment if keyword overlap is high.
  • .agents/scripts/supervisor/ai-reason.sh
    • Added a critical duplicate prevention rule to the AI reasoning system prompt.
  • tests/test-ai-actions.sh
    • Added Test 30 to verify that action verbs are correctly handled in the keyword pre-filter.
    • Added Test 31 to confirm that recently completed tasks are properly detected for semantic deduplication.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 15:53:13 UTC 2026: Code review monitoring started
Thu Feb 19 15:53:14 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 30
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 15:53:16 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@coderabbitai (Contributor) commented Feb 19, 2026

Walkthrough

The PR extends the semantic deduplication workflow to scan recently completed tasks (completed today or yesterday) alongside open tasks. It introduces a dedicated scoring helper for keyword matching and a safety net that applies keyword-based duplicate detection when the AI semantic dedup returns a not-duplicate verdict and the best candidate reaches the 4-match threshold. New tests validate both the keyword pre-filter and the detection of recently completed tasks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Semantic Dedup Enhancement: .agents/scripts/supervisor/ai-actions.sh, .agents/scripts/supervisor/ai-reason.sh | Extended dedup workflow to consider recently completed tasks (completed today or yesterday) alongside open tasks. Added _score_task_line helper for consistent keyword matching. Implemented safety-net logic that enforces a duplicate decision when the keyword match threshold (4+) is met. Updated AI prompt to emphasize strict duplicate detection. Enhanced pre-filter API to surface best candidate metadata (best_id, best_count). |
| Test Coverage: tests/test-ai-actions.sh | Added Test 30 and Test 31. Test 30 validates keyword pre-filter behavior with action verbs and candidate matching. Test 31 verifies semantic dedup detection of recently completed tasks while filtering out older completed tasks, including interaction with configuration controls. |
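
For readers unfamiliar with the scoring step, here is a hedged sketch of what a per-line keyword scorer could look like. It is not the PR's `_score_task_line`; the function name, the substring matching, and the `id|count` output (modelled on the `cut -d'|'` parsing quoted later in this review) are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical scorer: first argument is a TODO line, remaining arguments
# are lowercase keywords. Prints "task_id|match_count".
score_task_line() {
	local line="$1" line_lower kw task_id="" count=0
	local re='(t[0-9]+)'
	shift
	# Take the first t<digits> token on the line as the task ID.
	[[ "$line" =~ $re ]] && task_id="${BASH_REMATCH[1]}"
	line_lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
	for kw in "$@"; do
		# Simplification: plain substring match, no word boundaries.
		if [[ "$line_lower" == *"$kw"* ]]; then
			count=$((count + 1))
		fi
	done
	printf '%s|%s\n' "$task_id" "$count"
}

score_task_line '- [ ] t1200 Investigate stale evaluating recovery' investigate stale recovery timeout
```

Sorting such lines by the second field in descending order would put the best candidate first, which is what the `head -1` extraction of best_id and best_count relies on.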

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as Task Workflow
    participant PreFilter as Keyword Pre-filter
    participant AISemantic as AI Semantic Check
    participant SafetyNet as Safety Net
    participant Decision as Dedup Decision
    
    Workflow->>PreFilter: Score candidates from open + recently completed (today or yesterday) tasks
    PreFilter->>PreFilter: _score_task_line for each task
    PreFilter-->>AISemantic: best_id, best_count, candidates
    AISemantic->>AISemantic: AI evaluates similarity
    AISemantic-->>SafetyNet: verdict (duplicate/not-duplicate)
    alt Not-Duplicate Verdict
        SafetyNet->>SafetyNet: Check best_count >= 4 threshold
        alt Threshold Met
            SafetyNet-->>Decision: Force duplicate with best_id
        else Threshold Not Met
            SafetyNet-->>Decision: Not-duplicate confirmed
        end
    else Duplicate Verdict
        AISemantic-->>Decision: Duplicate confirmed
    end
    Decision-->>Workflow: Final dedup decision

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🔄 Dedup'd and doubled-checked with care,
A safety net floats through the air—
Old tasks and new in harmony,
Keywords match at threshold three,
No ghost tasks here, just clarity! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.13%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 't1218: Improve semantic dedup to prevent duplicate task creation' directly and clearly summarizes the main objective of the PR, which is to enhance semantic deduplication to prevent duplicate tasks. |


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a comprehensive and well-executed improvement to the semantic deduplication system. The multi-pronged approach of refining stop words, scanning completed tasks, strengthening the AI prompt, and adding a keyword safety net is excellent. The addition of new tests provides good coverage for these changes. I have a couple of suggestions to improve the robustness of the grep commands to fully align with the repository's style guide for error handling.

# Scan open tasks
while IFS= read -r task_line; do
	_score_task_line "$task_line"
done < <(grep -E '^\s*- \[ \] t[0-9]' "$todo_file" 2>/dev/null)

medium

Under set -e, this grep command can cause the script to exit if $todo_file doesn't exist, as grep will return a non-zero exit code. The 2>/dev/null only suppresses the error message, not the exit code. The repository style guide recommends using || true to guard against such failures.

Suggested change
-done < <(grep -E '^\s*- \[ \] t[0-9]' "$todo_file" 2>/dev/null)
+done < <(grep -E '^\s*- \[ \] t[0-9]' "$todo_file" 2>/dev/null || true)
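
A small, self-contained sketch of the guarded pattern follows. One caveat worth checking against the Bash manual: the exit status of a command inside `<( ... )` is not reported to the calling shell, so `set -e` there does not act on it; the `|| true` guard is harmless in that position and is essential where grep's status is observed directly, such as a command-substitution assignment. The path and variable names below are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical path, chosen so that grep fails on purpose.
todo_file="/nonexistent/TODO.md"

# Guarded grep feeding a while loop: the loop simply sees no input.
open_count=0
while IFS= read -r task_line; do
	open_count=$((open_count + 1))
done < <(grep -E '^[[:space:]]*- \[ \] t[0-9]' "$todo_file" 2>/dev/null || true)

# Guarded grep in an assignment: without `|| true`, the non-zero grep
# status would stop the script here under `set -e`.
open_tasks=$(grep -E '^[[:space:]]*- \[ \] t[0-9]' "$todo_file" 2>/dev/null || true)

echo "open_count=${open_count} open_tasks='${open_tasks}'"
```

The sketch uses `[[:space:]]` instead of `\s` only because the POSIX class is portable across grep implementations.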

if [[ "$completed_date" == "$today" || "$completed_date" == "$yesterday" ]]; then
_score_task_line "$task_line"
fi
done < <(grep -E '^\s*- \[x\] t[0-9]' "$todo_file" 2>/dev/null)

medium

Similar to the grep for open tasks, this command can cause the script to exit under set -e if the file doesn't exist. Guarding with || true as per the repository style guide will make this more robust.

Suggested change
-done < <(grep -E '^\s*- \[x\] t[0-9]' "$todo_file" 2>/dev/null)
+done < <(grep -E '^\s*- \[x\] t[0-9]' "$todo_file" 2>/dev/null || true)

@coderabbitai (Contributor) previously requested changes Feb 19, 2026

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/ai-actions.sh:
- Around line 658-680: Guard the numeric comparisons of best_count to avoid
arithmetic errors when it's empty or non-numeric: where the safety-net uses
best_count (the variable set from cut of candidates) — e.g., the check if [[ -n
"$best_id" && "$best_count" -ge 4 ]] and the similar subsequent check — replace
the raw compare with a defaulted/numeric-safe form (use ${best_count:-0} or
validate/coerce to a number before comparing) so the -ge test never receives an
empty value; keep the existing log and return behavior for the duplicate branch
and ensure any other places referencing best_count use the same defensive
defaulting.

In `@tests/test-ai-actions.sh`:
- Around line 2163-2169: The comment in tests/test-ai-actions.sh incorrectly
states that `_keyword_prefilter_open_tasks` only scans open tasks and that
completed-task scanning is in `_check_similar_open_task`; update or remove this
misleading comment to reflect that `_keyword_prefilter_open_tasks` now directly
scans recently completed tasks (see the newly added completed-task scan in
`_keyword_prefilter_open_tasks`), or delete the comment entirely since Test 31
already covers the completed-task path—ensure references to
`_check_similar_open_task` are not left implying it handles the completed-task
scan.

Comment on lines +658 to +680
# Check the best keyword match score for safety-net logic below
local best_id best_count
best_id=$(printf '%s' "$candidates" | head -1 | cut -d'|' -f1)
best_count=$(printf '%s' "$candidates" | head -1 | cut -d'|' -f2)

# Step 2: AI semantic check (if enabled and CLI available)
if [[ "${AI_SEMANTIC_DEDUP_USE_AI:-true}" == "true" ]]; then
local ai_result
if ai_result=$(_ai_semantic_dedup_check "$title" "$candidates"); then
printf '%s' "$ai_result"
return 0
fi
# AI said not a duplicate or was unavailable — trust the AI over keywords
# AI said not a duplicate or was unavailable.
# Safety net: if keyword overlap is very high (4+ matches), treat as
# duplicate anyway. The AI can be wrong, but 4+ shared distinctive
# keywords is strong evidence of the same topic. This prevents the
# "9 tasks about stale evaluating" scenario where the AI kept saying
# "not duplicate" despite obvious overlap.
if [[ -n "$best_id" && "$best_count" -ge 4 ]]; then
log_info "AI Actions: semantic dedup safety net: AI said not duplicate but $best_id has $best_count keyword matches — treating as duplicate"
printf '%s' "$best_id"
return 0
fi

⚠️ Potential issue | 🟡 Minor

Safety net is a pragmatic override — consider guarding best_count against empty/non-numeric values.

The 4+ keyword threshold as a fallback when the AI disagrees is a sound heuristic. However, if best_count is ever empty (e.g., malformed candidate output), the -ge comparison on line 676 will throw an arithmetic error under set -e, aborting the function.

While the current code path makes this unlikely (candidates are well-formed by construction), a ${best_count:-0} default costs nothing and prevents a potential crash.

Defensive fix
-	if [[ -n "$best_id" && "$best_count" -ge 4 ]]; then
+	if [[ -n "$best_id" && "${best_count:-0}" -ge 4 ]]; then

And similarly at line 686:

-	if [[ -n "$best_id" && "$best_count" -ge 3 ]]; then
+	if [[ -n "$best_id" && "${best_count:-0}" -ge 3 ]]; then
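
The review also mentions validating or coercing the value to a number. One way to do that, sketched here with a hypothetical helper name `count_at_least`, is to accept only digit strings before comparing, which covers empty, non-numeric, and zero-padded input alike.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if $1 is a non-negative integer that is
# greater than or equal to $2, and 1 for anything else.
count_at_least() {
	local value="${1:-}" threshold="$2"
	local re='^[0-9]+$'
	# Reject empty or non-numeric input before any arithmetic happens.
	[[ "$value" =~ $re ]] || return 1
	# 10# forces base 10 so a value such as "08" is not parsed as octal.
	(( 10#$value >= threshold ))
}

best_id="t1200"
best_count=""   # simulates malformed candidate output
if [[ -n "$best_id" ]] && count_at_least "$best_count" 4; then
	echo "treat as duplicate of $best_id"
else
	echo "no keyword override"
fi
```

Compared with `${best_count:-0}`, this is a few more lines, but it also handles values that are non-empty yet not numeric.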

Comment on lines +2163 to +2169
# Test 4: Recently completed tasks should also be found
# t102 is [x] with completed:2026-02-19 — should appear as candidate
# We need to test _check_similar_open_task which includes completed tasks
# But _keyword_prefilter_open_tasks only scans open tasks by design
# The completed task scanning is in _check_similar_open_task via the
# recently-completed scan added in this fix

⚠️ Potential issue | 🟡 Minor

Misleading comment: completed-task scanning now lives in _keyword_prefilter_open_tasks itself.

The comment says _keyword_prefilter_open_tasks only scans open tasks and that completed-task scanning is in _check_similar_open_task. After this PR, _keyword_prefilter_open_tasks directly scans recently completed tasks (lines 475–489 of ai-actions.sh). The comment should be updated to reflect the new behavior, or simply removed since Test 31 covers the completed-task path explicitly.

Suggested fix
-		# Test 4: Recently completed tasks should also be found
-		# t102 is [x] with completed:2026-02-19 — should appear as candidate
-		# We need to test _check_similar_open_task which includes completed tasks
-		# But _keyword_prefilter_open_tasks only scans open tasks by design
-		# The completed task scanning is in _check_similar_open_task via the
-		# recently-completed scan added in this fix
+		# Test 4: Recently completed tasks are also scanned by _keyword_prefilter_open_tasks
+		# (added in t1218). Dedicated coverage is in Test 31 via _check_similar_open_task.

@marcusquinn marcusquinn dismissed coderabbitai[bot]’s stale review February 19, 2026 15:58

Auto-dismissed: bot review does not block autonomous pipeline

@marcusquinn marcusquinn merged commit 4196344 into main Feb 19, 2026
20 checks passed
@marcusquinn marcusquinn deleted the bugfix/t1218-semantic-dedup-improvements branch February 21, 2026 01:59