Skip to content

Conversation

@marcusquinn
Copy link
Owner

@marcusquinn marcusquinn commented Feb 8, 2026

Summary

  • Add backup_db() / restore_db() helpers and backup / restore commands for supervisor DB safety and manual recovery
  • Backup automatically before destructive schema migrations (t128.8 table rename/recreate, t148 table rename/recreate)
  • Fix t128.8 migration: replace INSERT INTO tasks SELECT * FROM tasks_old with explicit column list to prevent column count mismatch when migrations run out of order

Root Cause

The t128.8 migration used SELECT * to copy data from the old table to the new one. If a prior ALTER TABLE ADD COLUMN migration had already added columns (e.g., issue_url, diagnostic_of, triage_result), the column count would mismatch and the migration would fail. Since the old table was already renamed at that point, a failed transaction could leave the DB in an inconsistent state with no recovery path.

Additionally, no backup was taken before any destructive migration, so there was no way to recover from a failed migration.

Changes

File Change
supervisor-helper.sh Add backup_db() with SQLite .backup, timestamped copies, auto-prune to 5
supervisor-helper.sh Add restore_db() with validation and pre-restore safety backup
supervisor-helper.sh Add cmd_backup / cmd_restore commands
supervisor-helper.sh Backup before t128.8 and t148 destructive migrations
supervisor-helper.sh Fix t128.8 SELECT * → explicit column list
TODO.md Add t162 task entry

Testing

  • bash -n syntax check: pass
  • shellcheck -S error: zero violations
  • State machine tests: 69/69 pass (1 skip expected)
  • Functional test: backup/restore cycle verified with test DB

Summary by CodeRabbit

  • New Features
    • Added backup and restore commands for database management
    • Automatic backups now created before database schema updates
    • Updated help documentation to reflect new commands

…lumn migrations (t162)

- Add backup_db() helper using SQLite .backup with auto-prune to 5 copies
- Add restore_db() with backup validation and pre-restore safety backup
- Add backup/restore commands for manual DB recovery
- Backup before destructive t128.8 and t148 migrations (table rename/recreate)
- Fix t128.8 migration: replace SELECT * with explicit column list to prevent
  column count mismatch when migrations run out of order
- Add t162 to TODO.md
@gemini-code-assist
Copy link

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 8, 2026

Walkthrough

This PR introduces a backup and restore subsystem for the supervisor SQLite database in the shell automation framework. It adds backup_db and restore_db functions with CLI commands, integrates pre-migration backups into specific schema migrations, and updates help documentation to expose these capabilities.

Changes

Cohort / File(s) Summary
Backup/Restore Subsystem
.agents/scripts/supervisor-helper.sh
Added backup_db() function with timestamped backups, WAL-aware backup with file copy fallback, backup pruning (last 5 retained), and backup path return. Added restore_db() function with SQLite database validation and WAL/shm file preservation. Exposed as public CLI commands cmd_backup and cmd_restore wired into main dispatcher. Updated usage/help text. Integrated pre-migration backup calls into migration hooks (pre-migrate-t128.8 and pre-migrate-t148) with non-fatal fallback.
Backlog Update
TODO.md
Replaced t160 backlog item with new t162 entry describing supervisor DB safety enhancement covering backup-before-migrate strategy and explicit column migration fixes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Possibly related PRs

Poem

🛡️ Backups bloom before migrations dance,
WAL files waltz in safety's romance,
Five versions kept, the oldest fade,
Database dreams will never be swayed!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: database safety enhancements through backup-before-migrate functionality and explicit column migrations, with reference to the tracking task t162.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/t162

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Feb 8, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 36 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sun Feb 8 05:54:22 UTC 2026: Code review monitoring started
Sun Feb 8 05:54:22 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 36

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 36
  • VULNERABILITIES: 0

Generated on: Sun Feb 8 05:54:25 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 8, 2026

@marcusquinn marcusquinn merged commit 9de36e4 into main Feb 8, 2026
9 of 11 checks passed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.agents/scripts/supervisor-helper.sh:
- Around line 454-503: The backup_db function embeds user-supplied reason into
filenames, allowing path traversal or unsafe characters; before composing
backup_file sanitize reason (used in variable reason and backup_file) by
stripping any path separators and leading dots, replacing non-alphanumeric
characters with a safe separator (e.g. hyphen), collapsing repeated separators,
trimming to a reasonable length, and defaulting to "manual" if empty; perform
this normalization immediately after local reason="${1:-manual}" and use the
sanitized value when creating backup_file and when pruning/cleanup so filenames
cannot escape $SUPERVISOR_DIR or contain unsafe characters.
- Around line 510-554: In restore_db(), before copying the backup into
SUPERVISOR_DB, ensure any existing stale SQLite journal files for the target are
removed (delete "${SUPERVISOR_DB}-wal" and "${SUPERVISOR_DB}-shm" if present) so
SQLite won't replay them against the restored file; then perform the cp of
"$backup_file" to "$SUPERVISOR_DB" and immediately check the command exit
status, logging an error via log_error and returning non‑zero on failure; keep
the existing logic that conditionally copies backup-file WAL/SHM after a
successful cp and retain the pre-restore backup_db call.

Comment on lines +454 to +503
#######################################
# Backup supervisor database before destructive operations (t162)
# Creates timestamped copy in supervisor dir. Keeps last 5 backups.
# Usage: backup_db [reason]
#######################################
backup_db() {
local reason="${1:-manual}"

if [[ ! -f "$SUPERVISOR_DB" ]]; then
log_warn "No database to backup at: $SUPERVISOR_DB"
return 1
fi

local timestamp
timestamp=$(date -u +%Y%m%dT%H%M%SZ)
local backup_file="$SUPERVISOR_DIR/supervisor-backup-${timestamp}-${reason}.db"

# Use SQLite .backup for consistency (handles WAL correctly)
if sqlite3 "$SUPERVISOR_DB" ".backup '$backup_file'" 2>/dev/null; then
log_success "Database backed up: $backup_file"
else
# Fallback to file copy if .backup fails
if cp "$SUPERVISOR_DB" "$backup_file" 2>/dev/null; then
# Also copy WAL/SHM if present
[[ -f "${SUPERVISOR_DB}-wal" ]] && cp "${SUPERVISOR_DB}-wal" "${backup_file}-wal" 2>/dev/null || true
[[ -f "${SUPERVISOR_DB}-shm" ]] && cp "${SUPERVISOR_DB}-shm" "${backup_file}-shm" 2>/dev/null || true
log_success "Database backed up (file copy): $backup_file"
else
log_error "Failed to backup database"
return 1
fi
fi

# Prune old backups: keep last 5
local backup_count
# shellcheck disable=SC2012
backup_count=$(ls -1 "$SUPERVISOR_DIR"/supervisor-backup-*.db 2>/dev/null | wc -l | tr -d ' ')
if [[ "$backup_count" -gt 5 ]]; then
local to_remove
to_remove=$((backup_count - 5))
# shellcheck disable=SC2012
ls -1t "$SUPERVISOR_DIR"/supervisor-backup-*.db 2>/dev/null | tail -n "$to_remove" | while IFS= read -r old_backup; do
rm -f "$old_backup" "${old_backup}-wal" "${old_backup}-shm" 2>/dev/null || true
done
log_info "Pruned $to_remove old backup(s)"
fi

echo "$backup_file"
return 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Sanitize the backup reason before using it in filenames.

reason is user-supplied and is embedded directly in the backup filename. A value containing path separators or .. can escape $SUPERVISOR_DIR, and unsafe characters can break filenames. Normalize it to a safe slug before composing the path.

✅ Suggested fix
 backup_db() {
     local reason="${1:-manual}"
+    local safe_reason
+    safe_reason=$(echo "$reason" | tr -cs 'A-Za-z0-9._-' '_' | sed -E 's/^_+|_+$//g')
+    [[ -z "$safe_reason" ]] && safe_reason="manual"
 
     if [[ ! -f "$SUPERVISOR_DB" ]]; then
         log_warn "No database to backup at: $SUPERVISOR_DB"
         return 1
     fi
@@
-    local backup_file="$SUPERVISOR_DIR/supervisor-backup-${timestamp}-${reason}.db"
+    local backup_file="$SUPERVISOR_DIR/supervisor-backup-${timestamp}-${safe_reason}.db"

As per coding guidelines, "Reliability and robustness".

🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 454 - 503, The backup_db
function embeds user-supplied reason into filenames, allowing path traversal or
unsafe characters; before composing backup_file sanitize reason (used in
variable reason and backup_file) by stripping any path separators and leading
dots, replacing non-alphanumeric characters with a safe separator (e.g. hyphen),
collapsing repeated separators, trimming to a reasonable length, and defaulting
to "manual" if empty; perform this normalization immediately after local
reason="${1:-manual}" and use the sanitized value when creating backup_file and
when pruning/cleanup so filenames cannot escape $SUPERVISOR_DIR or contain
unsafe characters.

Comment on lines +510 to +554
restore_db() {
local backup_file="${1:-}"

if [[ -z "$backup_file" ]]; then
log_info "Available backups:"
# shellcheck disable=SC2012
ls -1t "$SUPERVISOR_DIR"/supervisor-backup-*.db 2>/dev/null | while IFS= read -r f; do
local size
size=$(du -h "$f" 2>/dev/null | cut -f1)
local task_count
task_count=$(sqlite3 "$f" "SELECT count(*) FROM tasks;" 2>/dev/null || echo "?")
echo " $f ($size, $task_count tasks)"
done
return 0
fi

if [[ ! -f "$backup_file" ]]; then
log_error "Backup file not found: $backup_file"
return 1
fi

# Verify backup is valid SQLite
if ! sqlite3 "$backup_file" "SELECT count(*) FROM tasks;" >/dev/null 2>&1; then
log_error "Backup file is not a valid supervisor database: $backup_file"
return 1
fi

# Backup current DB before overwriting
if [[ -f "$SUPERVISOR_DB" ]]; then
backup_db "pre-restore" >/dev/null 2>&1 || true
fi

cp "$backup_file" "$SUPERVISOR_DB"
[[ -f "${backup_file}-wal" ]] && cp "${backup_file}-wal" "${SUPERVISOR_DB}-wal" 2>/dev/null || true
[[ -f "${backup_file}-shm" ]] && cp "${backup_file}-shm" "${SUPERVISOR_DB}-shm" 2>/dev/null || true

local task_count
task_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks;")
local batch_count
batch_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM batches;")

log_success "Database restored from: $backup_file"
log_info "Tasks: $task_count | Batches: $batch_count"
return 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

head -20 .agents/scripts/supervisor-helper.sh && \
wc -l .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1554


🏁 Script executed:

sed -n '500,560p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 2086


🏁 Script executed:

# Search for backup_db and db functions to understand their behavior
grep -n "^backup_db()" .agents/scripts/supervisor-helper.sh
grep -n "^db()" .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 92


🏁 Script executed:

# Check if WAL/SHM handling exists elsewhere in the script
rg -n "wal|shm" .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 775


🏁 Script executed:

sed -n '450,460p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 399


🏁 Script executed:

sed -n '459,500p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1767


🌐 Web query:

SQLite WAL file replay behavior when database file replaced restored

💡 Result:

In WAL mode, the on-disk “database state” is not just db.sqlite — it’s effectively the combination of:

  • db.sqlite (main database file)
  • db.sqlite-wal (write-ahead log containing committed transactions not yet written back)
  • db.sqlite-shm (wal-index/shared-memory coordination file; rebuildable)

SQLite’s docs explicitly warn that if the database file is separated from its WAL file, committed transactions can be lost and the database can become corrupted; therefore the WAL “is part of the persistent state of the database and should be kept with the database if the database is copied or moved.” [1] They also list “Overwriting a database file with another without also deleting any hot journal (including -wal)” as an action “likely to lead to corruption.” [2]

What happens if you restore/replace only db.sqlite but leave an old db.sqlite-wal?

On the next open, SQLite will see the -wal and treat it like crash-recovery state; readers consult the WAL for the latest committed pages, and SQLite may run recovery/checkpoint activity to reconcile WAL content with the database file [1]. If the leftover -wal does not actually belong to the restored db.sqlite, you’ve “mispaird” files; SQLite can then present inconsistent data, fail with errors, or end up corrupting the restored database as it tries to interpret/replay WAL content that was for a different database incarnation [1], [2].

Practical rules

  • If you restore db.sqlite, restore the matching db.sqlite-wal too (and you may restore db.sqlite-shm, but it’s not required for recovery content). [1]
  • If you intentionally want the restored db.sqlite to stand alone, then delete db.sqlite-wal (and db.sqlite-shm) before opening the restored database, so SQLite cannot try to use stale WAL state. This aligns with SQLite’s corruption guidance about not leaving a “hot” -wal behind when swapping the main db file. [2]

Backup/restore best practice (to avoid this entirely)

Before taking a backup, ensure the DB is in a quiescent/clean state (e.g., checkpoint and close connections) so there is no meaningful -wal to carry around; otherwise you must treat db.sqlite + db.sqlite-wal as an inseparable pair. [1], [2]

Sources: SQLite WAL mode documentation [1], “How To Corrupt An SQLite Database File” (hot journals / mispairing WAL) [2].

[1] https://sqlite.org/wal.html
[2] https://www.sqlite.org/howtocorrupt.html


Remove stale WAL/SHM before restoring the database.

If ${SUPERVISOR_DB}-wal and -shm from a previous database remain, SQLite will attempt to replay them against the restored file, causing corruption or data inconsistency. These must be cleared before copying the backup. Additionally, add error handling to fail fast if the copy operation fails.

Suggested fix
     # Backup current DB before overwriting
     if [[ -f "$SUPERVISOR_DB" ]]; then
         backup_db "pre-restore" >/dev/null 2>&1 || true
     fi
 
-    cp "$backup_file" "$SUPERVISOR_DB"
+    # Remove stale WAL/SHM to avoid replaying old frames
+    rm -f "${SUPERVISOR_DB}-wal" "${SUPERVISOR_DB}-shm" 2>/dev/null || true
+
+    if ! cp "$backup_file" "$SUPERVISOR_DB"; then
+        log_error "Failed to restore database from: $backup_file"
+        return 1
+    fi
     [[ -f "${backup_file}-wal" ]] && cp "${backup_file}-wal" "${SUPERVISOR_DB}-wal" 2>/dev/null || true
     [[ -f "${backup_file}-shm" ]] && cp "${backup_file}-shm" "${SUPERVISOR_DB}-shm" 2>/dev/null || true

Per coding guidelines: Reliability and robustness.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
restore_db() {
local backup_file="${1:-}"
if [[ -z "$backup_file" ]]; then
log_info "Available backups:"
# shellcheck disable=SC2012
ls -1t "$SUPERVISOR_DIR"/supervisor-backup-*.db 2>/dev/null | while IFS= read -r f; do
local size
size=$(du -h "$f" 2>/dev/null | cut -f1)
local task_count
task_count=$(sqlite3 "$f" "SELECT count(*) FROM tasks;" 2>/dev/null || echo "?")
echo " $f ($size, $task_count tasks)"
done
return 0
fi
if [[ ! -f "$backup_file" ]]; then
log_error "Backup file not found: $backup_file"
return 1
fi
# Verify backup is valid SQLite
if ! sqlite3 "$backup_file" "SELECT count(*) FROM tasks;" >/dev/null 2>&1; then
log_error "Backup file is not a valid supervisor database: $backup_file"
return 1
fi
# Backup current DB before overwriting
if [[ -f "$SUPERVISOR_DB" ]]; then
backup_db "pre-restore" >/dev/null 2>&1 || true
fi
cp "$backup_file" "$SUPERVISOR_DB"
[[ -f "${backup_file}-wal" ]] && cp "${backup_file}-wal" "${SUPERVISOR_DB}-wal" 2>/dev/null || true
[[ -f "${backup_file}-shm" ]] && cp "${backup_file}-shm" "${SUPERVISOR_DB}-shm" 2>/dev/null || true
local task_count
task_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks;")
local batch_count
batch_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM batches;")
log_success "Database restored from: $backup_file"
log_info "Tasks: $task_count | Batches: $batch_count"
return 0
}
restore_db() {
local backup_file="${1:-}"
if [[ -z "$backup_file" ]]; then
log_info "Available backups:"
# shellcheck disable=SC2012
ls -1t "$SUPERVISOR_DIR"/supervisor-backup-*.db 2>/dev/null | while IFS= read -r f; do
local size
size=$(du -h "$f" 2>/dev/null | cut -f1)
local task_count
task_count=$(sqlite3 "$f" "SELECT count(*) FROM tasks;" 2>/dev/null || echo "?")
echo " $f ($size, $task_count tasks)"
done
return 0
fi
if [[ ! -f "$backup_file" ]]; then
log_error "Backup file not found: $backup_file"
return 1
fi
# Verify backup is valid SQLite
if ! sqlite3 "$backup_file" "SELECT count(*) FROM tasks;" >/dev/null 2>&1; then
log_error "Backup file is not a valid supervisor database: $backup_file"
return 1
fi
# Backup current DB before overwriting
if [[ -f "$SUPERVISOR_DB" ]]; then
backup_db "pre-restore" >/dev/null 2>&1 || true
fi
# Remove stale WAL/SHM to avoid replaying old frames
rm -f "${SUPERVISOR_DB}-wal" "${SUPERVISOR_DB}-shm" 2>/dev/null || true
if ! cp "$backup_file" "$SUPERVISOR_DB"; then
log_error "Failed to restore database from: $backup_file"
return 1
fi
[[ -f "${backup_file}-wal" ]] && cp "${backup_file}-wal" "${SUPERVISOR_DB}-wal" 2>/dev/null || true
[[ -f "${backup_file}-shm" ]] && cp "${backup_file}-shm" "${SUPERVISOR_DB}-shm" 2>/dev/null || true
local task_count
task_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks;")
local batch_count
batch_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM batches;")
log_success "Database restored from: $backup_file"
log_info "Tasks: $task_count | Batches: $batch_count"
return 0
}
🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 510 - 554, In
restore_db(), before copying the backup into SUPERVISOR_DB, ensure any existing
stale SQLite journal files for the target are removed (delete
"${SUPERVISOR_DB}-wal" and "${SUPERVISOR_DB}-shm" if present) so SQLite won't
replay them against the restored file; then perform the cp of "$backup_file" to
"$SUPERVISOR_DB" and immediately check the command exit status, logging an error
via log_error and returning non‑zero on failure; keep the existing logic that
conditionally copies backup-file WAL/SHM after a successful cp and retain the
pre-restore backup_db call.

@marcusquinn marcusquinn mentioned this pull request Feb 10, 2026
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant