t2838: Fix Phase 0.9 sanity check that resets completed tasks to queued#2845
t2838: Fix Phase 0.9 sanity check that resets completed tasks to queued#2845alex-solovyev merged 1 commit intomainfrom
Conversation
… queued (t2838) Three-layer fix for the completed-task downgrade loop: 1. sanity-check.sh: Add deterministic guard in _execute_sanity_action() that blocks unclaim/reset actions when DB status is complete/verified/deployed/merged. This is the hard safety net — even if the AI proposes a downgrade, the harness refuses to execute it. 2. sanity-check.sh: Add 'trigger_update_todo' action type that syncs TODO.md to match the DB (marks [x]) instead of resetting DB to match TODO.md. Update the AI prompt with directional authority rule and new state snapshot section showing completed tasks with stale TODO.md entries. 3. deploy.sh: Accept 'verified_complete' as a valid pr_url in verify_task_deliverables(), allowing tasks without PRs (audit, documentation) to pass deliverable verification and get their TODO.md updated normally. Root cause: Workers completed tasks (DB=complete) but update_todo_on_complete failed because verify_task_deliverables rejected non-PR deliverables. TODO.md stayed [ ]. Phase 0.9 AI saw the contradiction and proposed 'reset' to queued, causing re-dispatch of already-completed work. Closes #2838
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue (t2838) where the Phase 0.9 AI sanity check was erroneously resetting completed tasks to a 'queued' state, causing redundant work. The problem stemmed from Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Wed Mar 4 17:48:34 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
Code Review
This pull request addresses a critical bug causing completed tasks to reset, implementing a robust multi-layered fix with a deterministic guard, AI prompt improvements, and enhanced deliverable verification. It also introduces security enhancements like a directional authority rule and a new trigger_update_todo action. However, two critical security vulnerabilities were identified in sanity-check.sh: a regex/sed injection due to incomplete sanitization of task IDs, and a potential prompt injection where untrusted database content is directly included in the AI prompt. These require immediate attention. Further improvements include enhancing error visibility by removing stderr suppression and strengthening the new tests for better functionality.
| completed_stale=$(db -separator '|' "$SUPERVISOR_DB" " | ||
| SELECT id, status, COALESCE(pr_url,'') | ||
| FROM tasks | ||
| WHERE repo = '$(sql_escape "$repo_path")' | ||
| AND status IN ('complete', 'verified', 'deployed', 'merged') | ||
| ORDER BY id; | ||
| " 2>/dev/null || echo "") |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null for the db command can mask critical issues like database connection failures, SQL syntax errors, or permission problems. Removing it will ensure these errors are visible for easier debugging. The || echo "" will still handle command failure gracefully.
| completed_stale=$(db -separator '|' "$SUPERVISOR_DB" " | |
| SELECT id, status, COALESCE(pr_url,'') | |
| FROM tasks | |
| WHERE repo = '$(sql_escape "$repo_path")' | |
| AND status IN ('complete', 'verified', 'deployed', 'merged') | |
| ORDER BY id; | |
| " 2>/dev/null || echo "") | |
| completed_stale=$(db -separator '|' "$SUPERVISOR_DB" " | |
| SELECT id, status, COALESCE(pr_url,'') | |
| FROM tasks | |
| WHERE repo = '$(sql_escape "$repo_path")' | |
| AND status IN ('complete', 'verified', 'deployed', 'merged') | |
| ORDER BY id; | |
| " || echo "") |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
| # than TODO.md. Completed tasks showing [ ] in TODO.md need their TODO.md | ||
| # updated (via trigger_update_todo), NOT their DB state reset to queued. | ||
| local _current_db_status="" | ||
| _current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "") |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null for the db command can mask critical issues like database connection failures or permission problems. Removing it will ensure these errors are visible for easier debugging. The || echo "" will still handle command failure gracefully.
| _current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "") | |
| _current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" || echo "") |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
| escaped_task_id=$(printf '%s' "$task_id" | sed 's/\./\\./g') | ||
| if grep -qE "^[[:space:]]*- \[ \] ${escaped_task_id}( |$)" "$todo_file" 2>/dev/null; then | ||
| sed_inplace -E "s/^([[:space:]]*- )\[ \] (${escaped_task_id} .*)$/\1[x] \2 verified:${today} completed:${today}/" "$todo_file" |
There was a problem hiding this comment.
The task_id variable is used to construct a regular expression for grep and a command for sed without sufficient sanitization. While dots are escaped, other special characters such as /, [, ], *, (, ), and & are not. An attacker who can control the task ID (e.g., by creating a task with a malicious name) could inject regex patterns to match unintended lines or inject sed commands to corrupt the TODO.md file. In some environments, this could potentially lead to arbitrary command execution if the sed implementation supports the e flag.
| @@ -501,6 +535,18 @@ You are a supervisor sanity checker for an automated task dispatch system. The q | |||
|
|
|||
| $state_snapshot | |||
There was a problem hiding this comment.
Untrusted data from the database (task IDs, statuses, PR URLs) is concatenated into the AI prompt via the $state_snapshot variable. An attacker could craft malicious metadata for a task that, when included in the snapshot, performs a prompt injection attack. This could trick the AI into proposing unauthorized or harmful actions, such as cancelling legitimate tasks or bypassing security rules.
| write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \ | ||
| --decision "verified:no_pr:verified_complete" \ | ||
| --evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \ | ||
| --maker "verify_task_deliverables" 2>/dev/null || true |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null can hide important errors from the write_proof_log command, such as issues with the log file path or permissions. It's better to let stderr be visible for debugging. The || true already prevents the script from exiting on failure.
| write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \ | |
| --decision "verified:no_pr:verified_complete" \ | |
| --evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \ | |
| --maker "verify_task_deliverables" 2>/dev/null || true | |
| write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \ | |
| --decision "verified:no_pr:verified_complete" \ | |
| --evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \ | |
| --maker "verify_task_deliverables" || true |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
| [[ -z "$csid" ]] && continue | ||
| local escaped_csid | ||
| escaped_csid=$(printf '%s' "$csid" | sed 's/\./\\./g') | ||
| if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file" 2>/dev/null; then |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null can hide errors, for instance if $todo_file doesn't exist. It's better to let potential errors be visible for debugging.
| if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file" 2>/dev/null; then | |
| if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file"; then |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
| if grep -q 'complete.*verified.*deployed.*merged' "$SANITY_CHECK_SCRIPT" && | ||
| grep -q 'BLOCKED downgrade' "$SANITY_CHECK_SCRIPT"; then | ||
| pass "t2838: sanity-check.sh has downgrade prevention guard" | ||
| else | ||
| fail "t2838: sanity-check.sh missing downgrade prevention guard" | ||
| fi |
There was a problem hiding this comment.
This test is structural, verifying the presence of keywords in the script using grep. It would be more robust to create a functional test that actually attempts to perform a downgrade and asserts that it is blocked by the guard. This would validate the behavior of the code, not just its text. For example, you could try to execute a reset action on the complete task t2838a and verify that the action is blocked and the task's status remains complete.
…eview (GH#2866) - Remove 2>/dev/null from db() calls to surface database errors (critical) - Replace dot-only regex escaping with full metacharacter _escape_regex() helper - Add task_id format validation to reject malformed IDs before regex/sed use - Replace grep 2>/dev/null with explicit file-existence checks - Add prompt injection mitigation: control char stripping, DATA boundary markers, and anti-injection instruction for AI prompt containing DB data Closes #2866
* fix: address critical quality-debt in sanity-check.sh from PR #2845 review (GH#2866) - Remove 2>/dev/null from db() calls to surface database errors (critical) - Replace dot-only regex escaping with full metacharacter _escape_regex() helper - Add task_id format validation to reject malformed IDs before regex/sed use - Replace grep 2>/dev/null with explicit file-existence checks - Add prompt injection mitigation: control char stripping, DATA boundary markers, and anti-injection instruction for AI prompt containing DB data Closes #2866 * fix: replace remaining grep 2>/dev/null on $todo_file with file-existence guards Address Gemini Code Assist review feedback on PR #2870: the grep at line 342 (orphan detection) still suppressed stderr. Apply the same [[ -f "$todo_file" ]] guard pattern consistently to all remaining grep calls on $todo_file (lines 260, 342, 996-999) so file-not-found and permission errors are visible. --------- Co-authored-by: marcusquinn <6428977+marcusquinn@users.noreply.github.com>
Remove 2>/dev/null from all 6 write_proof_log calls in deploy.sh. The || true already prevents script exit on failure, so suppressing stderr just hides debugging info (path errors, permission issues). Addresses review feedback from Gemini on PR #2845.
) Remove 2>/dev/null from all 6 write_proof_log calls in deploy.sh. The || true already prevents script exit on failure, so suppressing stderr just hides debugging info (path errors, permission issues). Addresses review feedback from Gemini on PR #2845.



Summary
_execute_sanity_action()that blocksunclaim/resetactions when DB status iscomplete/verified/deployed/merged— the hard safety nettrigger_update_todoaction: New sanity check action that syncs TODO.md to match DB (marks[x]) instead of resetting DB to match TODO.md, with fallback force-mark when deliverable verification failsverified_completedeliverables:verify_task_deliverables()now acceptsverified_completeas a validpr_urlfor tasks that don't produce PRs (audit, documentation, research)Root Cause
Workers completed tasks (DB status =
complete) butupdate_todo_on_completefailed becauseverify_task_deliverablesrejected non-PR deliverables likeverified_complete. TODO.md stayed[ ]. Phase 0.9 AI sanity check saw the contradiction (DB=complete, TODO=open) and proposedresettoqueued, causing re-dispatch of already-completed work. Observed on t025.3, t025.4, t025.7.Three-Layer Fix
_execute_sanity_action): Even if the AI proposes a downgrade, the shell code refuses to execute it for advanced states_build_sanity_prompt): Directional authority rule + newtrigger_update_todoaction + state snapshot section showing completed-but-stale tasksverify_task_deliverables): Acceptverified_completesignal soupdate_todo_on_completesucceeds for non-PR tasksTesting
test-supervisor-state-machine.shverifying all fix components are presentCloses #2838