Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions .agents/scripts/supervisor-archived/deploy.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2571,6 +2571,19 @@ verify_task_deliverables() {
return 0
fi

# Accept verified_complete as a valid completion signal (t2838)
# Tasks that don't produce PRs (audit, documentation, research) may be marked
# complete with FULL_LOOP_COMPLETE signal and pr_url=verified_complete.
# These are legitimate completions that should pass deliverable verification.
if [[ "$pr_url" == "verified_complete" ]]; then
log_info "Task $task_id completed without PR (verified_complete signal) — accepting"
write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \
--decision "verified:no_pr:verified_complete" \
--evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \
--maker "verify_task_deliverables" 2>/dev/null || true
Comment on lines +2580 to +2583

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing stderr with 2>/dev/null can hide important errors from the write_proof_log command, such as issues with the log file path or permissions. It's better to let stderr be visible for debugging. The || true already prevents the script from exiting on failure.

Suggested change
write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \
--decision "verified:no_pr:verified_complete" \
--evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \
--maker "verify_task_deliverables" 2>/dev/null || true
write_proof_log --task "$task_id" --event "deliverable_verified" --stage "complete" \
--decision "verified:no_pr:verified_complete" \
--evidence "pr_url=verified_complete,signal=FULL_LOOP_COMPLETE" \
--maker "verify_task_deliverables" || true
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

return 0
fi

# If no PR URL, task cannot be verified
if [[ -z "$pr_url" || "$pr_url" == "no_pr" || "$pr_url" == "task_only" ]]; then
log_warn "Task $task_id has no PR URL ($pr_url) - cannot verify deliverables"
Expand Down
113 changes: 110 additions & 3 deletions .agents/scripts/supervisor-archived/sanity-check.sh
Original file line number Diff line number Diff line change
Expand Up @@ -473,7 +473,41 @@ _build_sanity_state_snapshot() {
fi
snapshot+="\n"

# 9. Identity context
# 9. Completed tasks with stale TODO.md (t2838)
# Tasks that are complete/verified/deployed/merged in DB but still [ ] in TODO.md.
# These need trigger_update_todo, NOT reset/unclaim.
snapshot+="### Completed Tasks with Stale TODO.md\n"
local completed_stale
completed_stale=$(db -separator '|' "$SUPERVISOR_DB" "
SELECT id, status, COALESCE(pr_url,'')
FROM tasks
WHERE repo = '$(sql_escape "$repo_path")'
AND status IN ('complete', 'verified', 'deployed', 'merged')
ORDER BY id;
" 2>/dev/null || echo "")
Comment on lines +481 to +487

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Suppressing stderr with 2>/dev/null for the db command can mask critical issues like database connection failures, SQL syntax errors, or permission problems. Removing it will ensure these errors are visible for easier debugging. The || echo "" will still handle command failure gracefully.

Suggested change
completed_stale=$(db -separator '|' "$SUPERVISOR_DB" "
SELECT id, status, COALESCE(pr_url,'')
FROM tasks
WHERE repo = '$(sql_escape "$repo_path")'
AND status IN ('complete', 'verified', 'deployed', 'merged')
ORDER BY id;
" 2>/dev/null || echo "")
completed_stale=$(db -separator '|' "$SUPERVISOR_DB" "
SELECT id, status, COALESCE(pr_url,'')
FROM tasks
WHERE repo = '$(sql_escape "$repo_path")'
AND status IN ('complete', 'verified', 'deployed', 'merged')
ORDER BY id;
" || echo "")
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

if [[ -n "$completed_stale" ]]; then
local stale_found=false
while IFS='|' read -r csid csstatus cspr; do
[[ -z "$csid" ]] && continue
local escaped_csid
escaped_csid=$(printf '%s' "$csid" | sed 's/\./\\./g')
if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file" 2>/dev/null; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing stderr with 2>/dev/null can hide errors, for instance if $todo_file doesn't exist. It's better to let potential errors be visible for debugging.

Suggested change
if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file" 2>/dev/null; then
if grep -qE "^[[:space:]]*- \[ \] ${escaped_csid}( |$)" "$todo_file"; then
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

if [[ "$stale_found" == false ]]; then
snapshot+="id|db_status|pr_url (these need trigger_update_todo, NOT reset)\n"
stale_found=true
fi
snapshot+="$csid|$csstatus|$cspr\n"
fi
done <<<"$completed_stale"
if [[ "$stale_found" == false ]]; then
snapshot+="(none — all completed tasks are properly marked [x] in TODO.md)\n"
fi
else
snapshot+="(no completed tasks in DB)\n"
fi
snapshot+="\n"

# 10. Identity context
local identity
identity=$(get_aidevops_identity 2>/dev/null || whoami)
snapshot+="### Context\n"
Expand Down Expand Up @@ -501,6 +535,18 @@ You are a supervisor sanity checker for an automated task dispatch system. The q

$state_snapshot

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Untrusted data from the database (task IDs, statuses, PR URLs) is concatenated into the AI prompt via the $state_snapshot variable. An attacker could craft malicious metadata for a task that, when included in the snapshot, performs a prompt injection attack. This could trick the AI into proposing unauthorized or harmful actions, such as cancelling legitimate tasks or bypassing security rules.


## CRITICAL: Directional Authority Rule (t2838)

The DB is authoritative for task state when it shows a MORE ADVANCED state than TODO.md.
State progression: queued < dispatched < running < complete < deployed < verified.

If the DB says a task is "complete", "verified", "deployed", or "merged" but TODO.md
still shows [ ] (open), this means update-todo failed (e.g., deliverable verification
rejected the PR URL). The correct fix is "trigger_update_todo" — NOT "reset" or "unclaim".

NEVER propose "reset" or "unclaim" for tasks where the DB status is complete, verified,
deployed, or merged. Doing so causes completed work to be re-dispatched as duplicate work.

## What to Look For

1. **DB-failed tasks with TODO.md claims**: If the DB says a task failed/blocked but TODO.md still has assignee:/started: fields, the claim is stale. Strip it so the task can be re-assessed.
Expand All @@ -519,14 +565,16 @@ $state_snapshot

8. **Issue tag drift**: Check the "Cross-Repo Issue Tag Truthfulness" section. If GitHub issue labels don't match DB state, recommend "log_only" with the specific drifts so the label sync can be triggered.

9. **Any other contradiction** you can identify between the state sources.
9. **Completed tasks with stale TODO.md**: Check the "Completed Tasks with Stale TODO.md" section. If a task is complete/verified/deployed/merged in the DB but still [ ] in TODO.md, use "trigger_update_todo" to sync TODO.md to match the DB. NEVER use "reset" or "unclaim" for these.

10. **Any other contradiction** you can identify between the state sources.

## Output Format

Respond with ONLY a JSON array of fix actions. No markdown fencing, no explanation.

Each action object must have:
- "action": one of "unclaim", "reset", "cancel_orphan", "remove_blocker", "add_auto_dispatch", "log_only"
- "action": one of "unclaim", "reset", "cancel_orphan", "remove_blocker", "add_auto_dispatch", "trigger_update_todo", "log_only"
- "task_id": the task ID to act on
- "reasoning": why this fix is needed (one sentence)
- For "remove_blocker": include "blocker_id" (the failed blocker to remove)
Expand All @@ -538,6 +586,7 @@ Example:
{"action": "cancel_orphan", "task_id": "t456", "reasoning": "In DB as queued but missing from TODO.md"},
{"action": "remove_blocker", "task_id": "t789", "blocker_id": "t456", "reasoning": "Blocker t456 permanently failed (3/3 retries)"},
{"action": "add_auto_dispatch", "task_id": "t101", "reasoning": "Has model:sonnet, ~2h estimate, no blockers — eligible for dispatch"},
{"action": "trigger_update_todo", "task_id": "t025.3", "reasoning": "DB shows complete but TODO.md still [ ] — sync TODO.md to match DB"},
{"action": "log_only", "task_id": "pipeline", "message": "Phase 3 evaluated 0 tasks for 5 consecutive runs — possible schema drift in gather_task_state query"}
]

Expand Down Expand Up @@ -643,6 +692,25 @@ _execute_sanity_action() {

log_verbose " Sanity AI action: $action_type on $task_id — $reasoning"

# Guard: NEVER downgrade tasks in advanced DB states (t2838)
# The DB is authoritative for task state when it shows a more advanced state
# than TODO.md. Completed tasks showing [ ] in TODO.md need their TODO.md
# updated (via trigger_update_todo), NOT their DB state reset to queued.
local _current_db_status=""
_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Suppressing stderr with 2>/dev/null for the db command can mask critical issues like database connection failures or permission problems. Removing it will ensure these errors are visible for easier debugging. The || echo "" will still handle command failure gracefully.

Suggested change
_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "")
_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" || echo "")
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

case "$action_type" in
unclaim | reset)
case "$_current_db_status" in
complete | verified | deployed | merged)
log_warn " Phase 0.9: BLOCKED downgrade of $task_id from '$_current_db_status' to queued (t2838)"
log_warn " Phase 0.9: DB is authoritative — trigger update-todo instead of resetting"
echo "blocked: refusing to downgrade $_current_db_status task to queued (t2838)"
return 1
;;
esac
;;
esac

case "$action_type" in
unclaim)
# Strip assignee:/started: from the task in TODO.md
Expand Down Expand Up @@ -789,6 +857,45 @@ _execute_sanity_action() {
return 0
;;

trigger_update_todo)
# Trigger TODO.md update for a completed task whose [ ] was not yet marked [x] (t2838)
# This is the correct fix when DB shows complete/verified/deployed but TODO.md shows [ ].
# The DB is authoritative — we update TODO.md to match, NOT reset DB to queued.
if [[ -z "$_current_db_status" ]]; then
_current_db_status=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$task_id")';" 2>/dev/null || echo "")
fi
case "$_current_db_status" in
complete | verified | deployed | merged)
log_info " Phase 0.9: Triggering update-todo for $task_id (DB=$_current_db_status, TODO.md=[ ])"
if update_todo_on_complete "$task_id" 2>>"${SUPERVISOR_LOG:-/dev/null}"; then
echo "TODO.md updated to [x] for $_current_db_status task"
return 0
fi
# update_todo_on_complete may fail due to deliverable verification.
# In that case, force-mark with verified:date as proof-log (t2838).
log_warn " Phase 0.9: update_todo_on_complete failed for $task_id — force-marking with verified:date"
local today
today=$(date +%Y-%m-%d)
local escaped_task_id
escaped_task_id=$(printf '%s' "$task_id" | sed 's/\./\\./g')
if grep -qE "^[[:space:]]*- \[ \] ${escaped_task_id}( |$)" "$todo_file" 2>/dev/null; then
sed_inplace -E "s/^([[:space:]]*- )\[ \] (${escaped_task_id} .*)$/\1[x] \2 verified:${today} completed:${today}/" "$todo_file"
Comment on lines +880 to +882

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The task_id variable is used to construct a regular expression for grep and a command for sed without sufficient sanitization. While dots are escaped, other special characters such as /, [, ], *, (, ), and & are not. An attacker who can control the task ID (e.g., by creating a task with a malicious name) could inject regex patterns to match unintended lines or inject sed commands to corrupt the TODO.md file. In some environments, this could potentially lead to arbitrary command execution if the sed implementation supports the e flag.

if grep -qE "^[[:space:]]*- \[x\] ${escaped_task_id} " "$todo_file" 2>/dev/null; then
commit_and_push_todo "$repo_path" "chore: force-mark $task_id complete in TODO.md (DB=$_current_db_status, t2838)" >>"${SUPERVISOR_LOG:-/dev/null}" 2>&1 || true
echo "TODO.md force-marked [x] for $_current_db_status task (deliverable verification bypassed)"
return 0
fi
fi
echo "trigger_update_todo failed for $task_id"
return 1
;;
*)
echo "task $task_id is not in a completed state (DB=$_current_db_status) — cannot trigger update-todo"
return 1
;;
esac
;;

*)
echo "unknown action type: $action_type"
return 1
Expand Down
91 changes: 91 additions & 0 deletions tests/test-supervisor-state-machine.sh
Original file line number Diff line number Diff line change
Expand Up @@ -1897,6 +1897,97 @@ else
"$(grep 't9007' "$UNBLOCK_REPO/TODO.md")"
fi

# ============================================================
# t2838: Sanity check must not downgrade completed tasks
# ============================================================
section "t2838: Sanity check downgrade prevention"

# The sanity check's _execute_sanity_action must refuse to reset/unclaim
# tasks that are in advanced DB states (complete, verified, deployed, merged).
# This test verifies the guard by directly calling the action executor.

# Source the sanity-check module (it expects supervisor-helper.sh globals)
SANITY_CHECK_SCRIPT="$SCRIPTS_DIR/supervisor-archived/sanity-check.sh"
if [[ -f "$SANITY_CHECK_SCRIPT" ]]; then
# Create a task in 'complete' state
sup add t2838a --repo "$TEST_DIR" 2>/dev/null || true
sup transition t2838a running 2>/dev/null || true
sup transition t2838a complete 2>/dev/null || true

t2838a_status=$(get_status "t2838a")
if [[ "$t2838a_status" == "complete" ]]; then
pass "t2838: task t2838a set to complete state"
else
fail "t2838: task t2838a not in complete state" "status=$t2838a_status"
fi

# Verify that cmd_reset on a complete task would succeed (baseline — this is the bug)
# The fix is in _execute_sanity_action, not in cmd_reset itself.
# We test the guard by checking that the sanity check module's state snapshot
# includes completed-but-stale tasks.

# Create a TODO.md with the task still open (simulating the stale state)
mkdir -p "$TEST_DIR"
cat >"$TEST_DIR/TODO.md" <<'TODOEOF'
- [ ] t2838a Test task for sanity check downgrade prevention #auto-dispatch
TODOEOF

# Verify the task is complete in DB but [ ] in TODO.md
if [[ "$(get_status 't2838a')" == "complete" ]] &&
grep -qE '^\s*- \[ \] t2838a' "$TEST_DIR/TODO.md"; then
pass "t2838: precondition met — DB=complete, TODO.md=[ ]"
else
fail "t2838: precondition not met" \
"DB=$(get_status 't2838a'), TODO=$(grep 't2838a' "$TEST_DIR/TODO.md" 2>/dev/null || echo 'not found')"
fi

# Test: verify_task_deliverables accepts verified_complete (deploy.sh fix)
DEPLOY_SCRIPT="$SCRIPTS_DIR/supervisor-archived/deploy.sh"
if [[ -f "$DEPLOY_SCRIPT" ]]; then
# Source deploy.sh in a subshell to test verify_task_deliverables
# We can't easily source the full supervisor stack, so we test the
# string matching logic directly
if grep -q 'verified_complete' "$DEPLOY_SCRIPT"; then
pass "t2838: deploy.sh accepts verified_complete as valid pr_url"
else
fail "t2838: deploy.sh does not handle verified_complete pr_url"
fi
else
skip "t2838: deploy.sh not found at $DEPLOY_SCRIPT"
fi

# Test: sanity-check.sh has the downgrade guard
if grep -q 'complete.*verified.*deployed.*merged' "$SANITY_CHECK_SCRIPT" &&
grep -q 'BLOCKED downgrade' "$SANITY_CHECK_SCRIPT"; then
pass "t2838: sanity-check.sh has downgrade prevention guard"
else
fail "t2838: sanity-check.sh missing downgrade prevention guard"
fi
Comment on lines +1960 to +1965

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test is structural, verifying the presence of keywords in the script using grep. It would be more robust to create a functional test that actually attempts to perform a downgrade and asserts that it is blocked by the guard. This would validate the behavior of the code, not just its text. For example, you could try to execute a reset action on the complete task t2838a and verify that the action is blocked and the task's status remains complete.


# Test: sanity-check.sh has trigger_update_todo action
if grep -q 'trigger_update_todo' "$SANITY_CHECK_SCRIPT"; then
pass "t2838: sanity-check.sh has trigger_update_todo action"
else
fail "t2838: sanity-check.sh missing trigger_update_todo action"
fi

# Test: sanity-check.sh AI prompt includes directional authority rule
if grep -q 'Directional Authority' "$SANITY_CHECK_SCRIPT"; then
pass "t2838: sanity-check.sh AI prompt includes directional authority rule"
else
fail "t2838: sanity-check.sh AI prompt missing directional authority rule"
fi

# Test: state snapshot includes completed-stale section
if grep -q 'Completed Tasks with Stale TODO' "$SANITY_CHECK_SCRIPT"; then
pass "t2838: state snapshot includes completed-stale section"
else
fail "t2838: state snapshot missing completed-stale section"
fi
else
skip "t2838: sanity-check.sh not found at $SANITY_CHECK_SCRIPT"
fi

# ============================================================
# SUMMARY
# ============================================================
Expand Down
Loading