t2838: Fix Phase 0.9 sanity check that resets completed tasks to queued by alex-solovyev · Pull Request #2845 · marcusquinn/aidevops

alex-solovyev · 2026-03-04T17:47:56Z

Summary

Prevent completed-task downgrade: Add deterministic guard in _execute_sanity_action() that blocks unclaim/reset actions when DB status is complete/verified/deployed/merged — the hard safety net
Add trigger_update_todo action: New sanity check action that syncs TODO.md to match DB (marks [x]) instead of resetting DB to match TODO.md, with fallback force-mark when deliverable verification fails
Accept verified_complete deliverables: verify_task_deliverables() now accepts verified_complete as a valid pr_url for tasks that don't produce PRs (audit, documentation, research)

Root Cause

Workers completed tasks (DB status = complete) but update_todo_on_complete failed because verify_task_deliverables rejected non-PR deliverables like verified_complete. TODO.md stayed [ ]. Phase 0.9 AI sanity check saw the contradiction (DB=complete, TODO=open) and proposed reset to queued, causing re-dispatch of already-completed work. Observed on t025.3, t025.4, t025.7.

Three-Layer Fix

Harness guard (deterministic, in _execute_sanity_action): Even if the AI proposes a downgrade, the shell code refuses to execute it for advanced states
AI prompt update (in _build_sanity_prompt): Directional authority rule + new trigger_update_todo action + state snapshot section showing completed-but-stale tasks
Deliverable verification (in verify_task_deliverables): Accept verified_complete signal so update_todo_on_complete succeeds for non-PR tasks

Testing

Structural tests added to test-supervisor-state-machine.sh verifying all fix components are present
ShellCheck passes on all modified files

Closes #2838

… queued (t2838) Three-layer fix for the completed-task downgrade loop: 1. sanity-check.sh: Add deterministic guard in _execute_sanity_action() that blocks unclaim/reset actions when DB status is complete/verified/deployed/merged. This is the hard safety net — even if the AI proposes a downgrade, the harness refuses to execute it. 2. sanity-check.sh: Add 'trigger_update_todo' action type that syncs TODO.md to match the DB (marks [x]) instead of resetting DB to match TODO.md. Update the AI prompt with directional authority rule and new state snapshot section showing completed tasks with stale TODO.md entries. 3. deploy.sh: Accept 'verified_complete' as a valid pr_url in verify_task_deliverables(), allowing tasks without PRs (audit, documentation) to pass deliverable verification and get their TODO.md updated normally. Root cause: Workers completed tasks (DB=complete) but update_todo_on_complete failed because verify_task_deliverables rejected non-PR deliverables. TODO.md stayed [ ]. Phase 0.9 AI saw the contradiction and proposed 'reset' to queued, causing re-dispatch of already-completed work. Closes #2838

coderabbitai · 2026-03-04T17:48:03Z

Warning

Rate limit exceeded

@alex-solovyev has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 55 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 29d54823-560e-41d9-bf91-f84e49d473a4

📥 Commits

Reviewing files that changed from the base of the PR and between 55b249c and dc021b6.

📒 Files selected for processing (3)

.agents/scripts/supervisor-archived/deploy.sh
.agents/scripts/supervisor-archived/sanity-check.sh
tests/test-supervisor-state-machine.sh

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bugfix/fix-sanity-check-downgrade

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-03-04T17:48:21Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue (t2838) where the Phase 0.9 AI sanity check was erroneously resetting completed tasks to a 'queued' state, causing redundant work. The problem stemmed from update_todo_on_complete failing for tasks without traditional PRs, leading to a discrepancy between the database's 'complete' status and TODO.md's 'open' status. The solution introduces a robust, three-pronged fix: a hardcoded safety mechanism to prevent downgrades of advanced-state tasks, an intelligent new action (trigger_update_todo) to synchronize TODO.md, and an update to the deliverable verification process to correctly handle non-PR task completions.

Highlights

Prevent completed-task downgrade: Added a deterministic guard in _execute_sanity_action() to block unclaim/reset actions for tasks in advanced DB states (complete, verified, deployed, merged).
Introduce trigger_update_todo action: Implemented a new sanity check action to synchronize TODO.md with the database for completed tasks, including a fallback to force-mark completion if standard update fails.
Accept verified_complete deliverables: Modified verify_task_deliverables() to recognize verified_complete as a valid PR URL for non-PR tasks (e.g., audit, documentation).
Updated AI prompt guidance: Enhanced the AI prompt with a "Directional Authority Rule" and a "Completed Tasks with Stale TODO.md" section to improve decision-making and prevent incorrect task resets.

Changelog

.agents/scripts/supervisor-archived/deploy.sh
- verify_task_deliverables function was updated to accept "verified_complete" as a valid pr_url for tasks that do not produce pull requests, ensuring proper completion tracking for non-PR deliverables.
.agents/scripts/supervisor-archived/sanity-check.sh
- The _build_sanity_state_snapshot function was enhanced to include a new "Completed Tasks with Stale TODO.md" section, providing the AI with crucial context about tasks completed in the DB but not yet marked in TODO.md.
- The AI prompt was augmented with a "CRITICAL: Directional Authority Rule" and updated guidance to explicitly prevent reset or unclaim actions for tasks in advanced DB states, instead recommending the new trigger_update_todo action.
- The _execute_sanity_action function was modified to include a deterministic guard that blocks unclaim or reset actions if the task's database status is complete, verified, deployed, or merged.
- A new trigger_update_todo action was added to _execute_sanity_action, which attempts to update TODO.md for completed tasks and includes a fallback mechanism to force-mark the task as complete if the initial update fails.
tests/test-supervisor-state-machine.sh
- A new test section was added to comprehensively verify the t2838 fixes, including tests for the verified_complete acceptance in deploy.sh, the downgrade prevention guard in sanity-check.sh, the presence and functionality of the trigger_update_todo action, and the updated AI prompt rules and state snapshot.

Activity

The pull request addresses and resolves a critical bug (t2838) related to incorrect task state management.
Comprehensive structural tests have been added to test-supervisor-state-machine.sh to ensure the robustness and correctness of the implemented fixes.
All modified shell scripts have passed ShellCheck, confirming adherence to scripting best practices.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

github-actions · 2026-03-04T17:48:35Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Wed Mar 4 17:48:31 UTC 2026: Code review monitoring started
Wed Mar 4 17:48:32 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 107
VULNERABILITIES: 0

Generated on: Wed Mar 4 17:48:34 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-03-04T17:49:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gemini-code-assist

Code Review

This pull request addresses a critical bug causing completed tasks to reset, implementing a robust multi-layered fix with a deterministic guard, AI prompt improvements, and enhanced deliverable verification. It also introduces security enhancements like a directional authority rule and a new trigger_update_todo action. However, two critical security vulnerabilities were identified in sanity-check.sh: a regex/sed injection due to incomplete sanitization of task IDs, and a potential prompt injection where untrusted database content is directly included in the AI prompt. These require immediate attention. Further improvements include enhancing error visibility by removing stderr suppression and strengthening the new tests for better functionality.

gemini-code-assist · 2026-03-04T17:55:20Z

.agents/scripts/supervisor-archived/sanity-check.sh

+	completed_stale=$(db -separator '|' "$SUPERVISOR_DB" "
+		SELECT id, status, COALESCE(pr_url,'')
+		FROM tasks
+		WHERE repo = '$(sql_escape "$repo_path")'
+		AND status IN ('complete', 'verified', 'deployed', 'merged')
+		ORDER BY id;
+	" 2>/dev/null || echo "")


Suppressing stderr with 2>/dev/null for the db command can mask critical issues like database connection failures, SQL syntax errors, or permission problems. Removing it will ensure these errors are visible for easier debugging. The || echo "" will still handle command failure gracefully.

Suggested change

completed_stale=$(db -separator '|' "$SUPERVISOR_DB" "

SELECT id, status, COALESCE(pr_url,'')

FROM tasks

WHERE repo = '$(sql_escape "$repo_path")'

AND status IN ('complete', 'verified', 'deployed', 'merged')

ORDER BY id;

" 2>/dev/null || echo "")

completed_stale=$(db -separator '|' "$SUPERVISOR_DB" "

SELECT id, status, COALESCE(pr_url,'')

FROM tasks

WHERE repo = '$(sql_escape "$repo_path")'

AND status IN ('complete', 'verified', 'deployed', 'merged')

ORDER BY id;

" || echo "")

References

Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

gemini-code-assist · 2026-03-04T17:55:20Z

.agents/scripts/supervisor-archived/sanity-check.sh

+	# than TODO.md. Completed tasks showing [ ] in TODO.md need their TODO.md
+	# updated (via trigger_update_todo), NOT their DB state reset to queued.
+	local _current_db_status=""
+	_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "")


Suppressing stderr with 2>/dev/null for the db command can mask critical issues like database connection failures or permission problems. Removing it will ensure these errors are visible for easier debugging. The || echo "" will still handle command failure gracefully.

Suggested change

_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "")

_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" || echo "")

References

Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

gemini-code-assist · 2026-03-04T17:55:20Z

.agents/scripts/supervisor-archived/sanity-check.sh

+			escaped_task_id=$(printf '%s' "$task_id" | sed 's/\./\\./g')
+			if grep -qE "^[[:space:]]*- \[ \] ${escaped_task_id}( |$)" "$todo_file" 2>/dev/null; then
+				sed_inplace -E "s/^([[:space:]]*- )\[ \] (${escaped_task_id} .*)$/\1[x] \2 verified:${today} completed:${today}/" "$todo_file"


The task_id variable is used to construct a regular expression for grep and a command for sed without sufficient sanitization. While dots are escaped, other special characters such as /, [, ], *, (, ), and & are not. An attacker who can control the task ID (e.g., by creating a task with a malicious name) could inject regex patterns to match unintended lines or inject sed commands to corrupt the TODO.md file. In some environments, this could potentially lead to arbitrary command execution if the sed implementation supports the e flag.

gemini-code-assist · 2026-03-04T17:55:20Z

.agents/scripts/supervisor-archived/sanity-check.sh

@@ -501,6 +535,18 @@ You are a supervisor sanity checker for an automated task dispatch system. The q

 $state_snapshot


Untrusted data from the database (task IDs, statuses, PR URLs) is concatenated into the AI prompt via the $state_snapshot variable. An attacker could craft malicious metadata for a task that, when included in the snapshot, performs a prompt injection attack. This could trick the AI into proposing unauthorized or harmful actions, such as cancelling legitimate tasks or bypassing security rules.

gemini-code-assist · 2026-03-04T17:55:20Z

.agents/scripts/supervisor-archived/deploy.sh

+		write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \
+			--decision "verified:no_pr:verified_complete" \
+			--evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \
+			--maker "verify_task_deliverables" 2>/dev/null || true


Suppressing stderr with 2>/dev/null can hide important errors from the write_proof_log command, such as issues with the log file path or permissions. It's better to let stderr be visible for debugging. The || true already prevents the script from exiting on failure.

Suggested change

write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \

--decision "verified:no_pr:verified_complete" \

--evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \

--maker "verify_task_deliverables" 2>/dev/null || true

write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \

--decision "verified:no_pr:verified_complete" \

--evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \

--maker "verify_task_deliverables" || true

References

Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

gemini-code-assist · 2026-03-04T17:55:20Z

.agents/scripts/supervisor-archived/sanity-check.sh

+			[[ -z "$csid" ]] && continue
+			local escaped_csid
+			escaped_csid=$(printf '%s' "$csid" | sed 's/\./\\./g')
+			if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file" 2>/dev/null; then


Suppressing stderr with 2>/dev/null can hide errors, for instance if $todo_file doesn't exist. It's better to let potential errors be visible for debugging.

Suggested change

if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file" 2>/dev/null; then

if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file"; then

References

Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

gemini-code-assist · 2026-03-04T17:55:20Z

tests/test-supervisor-state-machine.sh

+	if grep -q 'complete.*verified.*deployed.*merged' "$SANITY_CHECK_SCRIPT" &&
+		grep -q 'BLOCKED downgrade' "$SANITY_CHECK_SCRIPT"; then
+		pass "t2838: sanity-check.sh has downgrade prevention guard"
+	else
+		fail "t2838: sanity-check.sh missing downgrade prevention guard"
+	fi


This test is structural, verifying the presence of keywords in the script using grep. It would be more robust to create a functional test that actually attempts to perform a downgrade and asserts that it is blocked by the guard. This would validate the behavior of the code, not just its text. For example, you could try to execute a reset action on the complete task t2838a and verify that the action is blocked and the task's status remains complete.

…eview (GH#2866) - Remove 2>/dev/null from db() calls to surface database errors (critical) - Replace dot-only regex escaping with full metacharacter _escape_regex() helper - Add task_id format validation to reject malformed IDs before regex/sed use - Replace grep 2>/dev/null with explicit file-existence checks - Add prompt injection mitigation: control char stripping, DATA boundary markers, and anti-injection instruction for AI prompt containing DB data Closes #2866

* fix: address critical quality-debt in sanity-check.sh from PR #2845 review (GH#2866) - Remove 2>/dev/null from db() calls to surface database errors (critical) - Replace dot-only regex escaping with full metacharacter _escape_regex() helper - Add task_id format validation to reject malformed IDs before regex/sed use - Replace grep 2>/dev/null with explicit file-existence checks - Add prompt injection mitigation: control char stripping, DATA boundary markers, and anti-injection instruction for AI prompt containing DB data Closes #2866 * fix: replace remaining grep 2>/dev/null on $todo_file with file-existence guards Address Gemini Code Assist review feedback on PR #2870: the grep at line 342 (orphan detection) still suppressed stderr. Apply the same [[ -f "$todo_file" ]] guard pattern consistently to all remaining grep calls on $todo_file (lines 260, 342, 996-999) so file-not-found and permission errors are visible. --------- Co-authored-by: marcusquinn <6428977+marcusquinn@users.noreply.github.com>

Remove 2>/dev/null from all 6 write_proof_log calls in deploy.sh. The || true already prevents script exit on failure, so suppressing stderr just hides debugging info (path errors, permission issues). Addresses review feedback from Gemini on PR #2845.

) Remove 2>/dev/null from all 6 write_proof_log calls in deploy.sh. The || true already prevents script exit on failure, so suppressing stderr just hides debugging info (path errors, permission issues). Addresses review feedback from Gemini on PR #2845.

github-actions bot added the bug Auto-created from TODO.md tag label Mar 4, 2026

gemini-code-assist bot reviewed Mar 4, 2026

View reviewed changes

alex-solovyev mentioned this pull request Mar 4, 2026

[Supervisor:alex-solovyev] 13 PRs, 16 assigned, 6 workers at 06:15 UTC #2646

Closed

alex-solovyev merged commit b1015c0 into main Mar 4, 2026
27 of 28 checks passed

alex-solovyev deleted the bugfix/fix-sanity-check-downgrade branch March 4, 2026 19:21

alex-solovyev mentioned this pull request Mar 4, 2026

Bug: Phase 0.9 sanity check resets completed tasks to queued when TODO.md shows [ ] #2838

Closed

alex-solovyev mentioned this pull request Mar 5, 2026

fix: address critical quality-debt in sanity-check.sh (GH#2866) #2870

Merged

alex-solovyev mentioned this pull request Mar 5, 2026

t2865: Remove stderr suppression from write_proof_log calls in deploy.sh #2888

Merged

	_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null \|\| echo "")
	_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" \|\| echo "")

		@@ -501,6 +535,18 @@ You are a supervisor sanity checker for an automated task dispatch system. The q

		$state_snapshot

	if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( \|$)" "$todo_file" 2>/dev/null; then
	if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( \|$)" "$todo_file"; then

Conversation

alex-solovyev commented Mar 4, 2026

Summary

Root Cause

Three-Layer Fix

Testing

Uh oh!

coderabbitai bot commented Mar 4, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist bot commented Mar 4, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

github-actions bot commented Mar 4, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

sonarqubecloud bot commented Mar 4, 2026

Quality Gate passed

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant