feat(supervisor): add self-healing diagnostic subtasks on failure/block (t150) #462

marcusquinn · 2026-02-07T19:17:28Z

Summary

When a task fails or gets blocked, the supervisor now auto-creates a diagnostic subtask ({task}-diag-N) that analyzes the failure log and attempts a fix
On diagnostic completion, the original task is automatically re-queued for retry
Adds self-heal command for manual diagnostic creation, --no-self-heal flag for pulse, and SUPERVISOR_SELF_HEAL env toggle

Changes

supervisor-helper.sh (+325 lines)

Schema:

Added diagnostic_of TEXT column to tasks table (links diagnostic to parent)
Added migration for existing databases (ALTER TABLE ADD COLUMN)
Added idx_tasks_diagnostic index
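
The migration step can be sketched as an idempotent check, since SQLite's ALTER TABLE ADD COLUMN fails when the column already exists. This is a minimal sketch, not the PR's code: the function name `diagnostic_migration_sql` is hypothetical, it prints SQL instead of running it, and it assumes the index covers the `diagnostic_of` column (the PR only names the index).

```bash
# Hypothetical helper: prints the SQL needed to bring a tasks table up to date.
# Input (stdin): rows as printed by `PRAGMA table_info(tasks);` in sqlite3's
# default list mode, i.e. cid|name|type|notnull|dflt_value|pk
diagnostic_migration_sql() {
    local has_column
    # Field 2 of each row is the column name.
    has_column=$(awk -F'|' '$2 == "diagnostic_of" { found = 1 } END { print found + 0 }')
    if [ "$has_column" -eq 0 ]; then
        echo "ALTER TABLE tasks ADD COLUMN diagnostic_of TEXT;"
    fi
    # Assumption: the index is on diagnostic_of; the PR only gives its name.
    echo "CREATE INDEX IF NOT EXISTS idx_tasks_diagnostic ON tasks(diagnostic_of);"
}
```

Under these assumptions the output could be piped back into the database, for example `sqlite3 "$SUPERVISOR_DB" 'PRAGMA table_info(tasks);' | diagnostic_migration_sql | sqlite3 "$SUPERVISOR_DB"`. Running it twice is harmless because the second run emits only the `IF NOT EXISTS` statement.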

Core functions:

is_self_heal_eligible() - checks guards: env toggle, human-only failures (auth/OOM/conflict), recursive prevention, max 1 per task
create_diagnostic_subtask() - creates {task}-diag-N with failure context from log tail, adds to same batch
attempt_self_heal() - eligibility check + creation, called from pulse cycle
handle_diagnostic_completion() - on diagnostic complete, re-queues parent if still blocked/failed
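
The re-queue rule in handle_diagnostic_completion() comes down to one decision: touch the parent only if it is still stuck. The sketch below isolates that decision as a pure function. The name `parent_status_after_diagnostic` and the `queued` status value are assumptions for illustration; the real function works against the tasks table.

```bash
# Decide the parent's next status once its diagnostic subtask completes.
# Prints the status the parent should have afterwards.
# Assumption: "queued" is the status name used for re-queued tasks.
parent_status_after_diagnostic() {
    local parent_status="$1"
    case "$parent_status" in
        blocked|failed) echo "queued" ;;          # still stuck: retry it
        *)              echo "$parent_status" ;;  # already moved on: leave it
    esac
}
```

Leaving any other status untouched means a parent that was fixed or cancelled by hand while the diagnostic was running is not pushed back into the queue.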

Pulse integration:

complete handler: calls handle_diagnostic_completion() to re-queue parent
blocked handler: calls attempt_self_heal() to create diagnostic
failed handler: calls attempt_self_heal() to create diagnostic
retry->blocked (max retries): calls attempt_self_heal()
retry->failed (re-prompt failed): calls attempt_self_heal()

Command & config:

self-heal <task_id> - manual diagnostic creation for blocked/failed tasks
--no-self-heal flag on pulse command
SUPERVISOR_SELF_HEAL=false env var to disable globally
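
How the two switches combine can be sketched as a single check. This is an illustration under assumptions: the function name `self_heal_enabled` is hypothetical, the flag is assumed to have been parsed into a `true`/`false` string, and only the literal value `false` is treated as disabling the env toggle.

```bash
# Returns 0 when self-healing should run for this pulse, 1 otherwise.
# $1: "true" when --no-self-heal was passed to pulse (hypothetical parsed flag).
self_heal_enabled() {
    local no_self_heal_flag="${1:-false}"
    # Global switch: SUPERVISOR_SELF_HEAL=false disables everywhere.
    if [ "${SUPERVISOR_SELF_HEAL:-true}" = "false" ]; then
        return 1
    fi
    # Per-pulse switch: --no-self-heal disables for this run only.
    if [ "$no_self_heal_flag" = "true" ]; then
        return 1
    fi
    return 0
}
```

Either switch alone is enough to disable self-healing; neither can re-enable it when the other is off.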

TODO.md

Added t150 task definition with 5 subtasks

Guards

| Guard | Purpose |
| --- | --- |
| `SUPERVISOR_SELF_HEAL=false` | Global disable |
| `--no-self-heal` on pulse | Per-pulse disable |
| Skip `auth_error` | Human must fix credentials |
| Skip `merge_conflict` | Human must resolve |
| Skip `out_of_memory` | Infrastructure issue |
| Skip `no_log_file` | Nothing to diagnose |
| Skip `max_retries` | Already exhausted retries |
| Skip if task is itself a diagnostic | Prevent recursive healing |
| Max 1 auto-diagnostic per task | Prevent diagnostic spam |
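
The guards above could be combined into one eligibility check along these lines. This is a sketch, not the body of is_self_heal_eligible(): the function name `is_eligible_sketch` and its three-argument signature are assumptions, and it recognises diagnostics by their `-diag-N` ID suffix, whereas the PR also records the link in the `diagnostic_of` column.

```bash
# Returns 0 if a diagnostic subtask may be auto-created, 1 otherwise.
# $1 task id, $2 failure reason, $3 number of existing diagnostics for the task
is_eligible_sketch() {
    local task_id="$1" failure_reason="$2" existing_diags="${3:-0}"

    # Global toggle.
    if [ "${SUPERVISOR_SELF_HEAL:-true}" = "false" ]; then return 1; fi

    # Failure reasons listed in the Guards table as skipped.
    case "$failure_reason" in
        auth_error|merge_conflict|out_of_memory|no_log_file|max_retries) return 1 ;;
    esac

    # Assumption: a diagnostic is recognised by its "-diag-N" ID suffix.
    case "$task_id" in
        *-diag-[0-9]*) return 1 ;;
    esac

    # Max 1 auto-diagnostic per task.
    if [ "$existing_diags" -ge 1 ]; then return 1; fi
    return 0
}
```

The cheap string checks run first, so a caller only needs to query the database for the existing-diagnostic count when the earlier guards would have passed.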

Testing

Syntax: bash -n passes
ShellCheck: zero errors at -S error level
Functional: tested init, add, transition, self-heal, batch integration, eligibility guards, duplicate prevention, diagnostic completion re-queue

coderabbitai · 2026-02-07T19:17:34Z

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 28 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.


gemini-code-assist · 2026-02-07T19:17:47Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust self-healing capability to the supervisor system, designed to enhance its resilience and reduce the need for manual intervention. When tasks encounter failures or become blocked, the system now intelligently generates specialized diagnostic subtasks. These subtasks are equipped to analyze the root cause of the issue from logs and propose or apply fixes. Upon successful completion of a diagnostic subtask, the original problematic task is automatically re-queued, allowing for automated recovery and continuous operation. This feature significantly improves the supervisor's ability to manage and recover from unexpected task states.

Highlights

Automated Diagnostic Subtasks: The supervisor now automatically creates diagnostic subtasks (e.g., {task}-diag-N) when a task fails or becomes blocked. These subtasks analyze failure logs and attempt to fix the underlying issues.
Automated Task Re-queuing: Upon successful completion of a diagnostic subtask, the original parent task is automatically re-queued for retry, reducing manual intervention.
Manual Self-Healing Command: A new self-heal command has been added, allowing users to manually trigger the creation of a diagnostic subtask for any failed or blocked task.
Configurable Self-Healing: Self-healing can be controlled via a --no-self-heal flag for the pulse command and a global SUPERVISOR_SELF_HEAL environment variable.
Self-Healing Guards: Several guards are in place to prevent unwanted self-healing, such as skipping human-fixable errors (e.g., authentication, out-of-memory, merge conflicts), preventing recursive diagnostics, and limiting to one auto-diagnostic per task.

Changelog

.agents/scripts/supervisor-helper.sh
- Added diagnostic_of TEXT column to the tasks table and a corresponding migration for existing databases, along with an index idx_tasks_diagnostic.
- Implemented is_self_heal_eligible() to determine if a task qualifies for self-healing, considering global toggles, specific failure types (e.g., auth_error, out_of_memory), and preventing recursive diagnostics or duplicates.
- Created create_diagnostic_subtask() to generate a new diagnostic task, populating its description with failure context from the parent task's log.
- Introduced attempt_self_heal() to conditionally create diagnostic subtasks during the pulse cycle for blocked or failed tasks.
- Developed handle_diagnostic_completion() to re-queue the parent task if it's still blocked or failed after its diagnostic subtask finishes.
- Integrated self-healing calls into the cmd_pulse function's state transition handlers for complete, blocked, failed, and retry outcomes.
- Added a new cmd_self_heal command for manual diagnostic task creation.
- Updated the command-line help and environment variable documentation to include the new self-heal command, --no-self-heal flag, and SUPERVISOR_SELF_HEAL environment variable.
TODO.md
- Added a new entry t150 for the supervisor self-healing feature, outlining its purpose and breaking it down into five subtasks.

Activity

The pull request introduces a new feature for supervisor self-healing.
No specific review comments or discussions have been provided in the context.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a valuable self-healing mechanism for failed or blocked tasks, which is a great enhancement for the supervisor's robustness. The implementation is well-structured, with clear separation of concerns in the new functions and good integration into the existing pulse cycle. I've identified a couple of areas for improvement: a high-severity issue related to SQL query safety that could lead to incorrect behavior, and a medium-severity issue regarding an unused function parameter that affects code clarity. Overall, this is a solid contribution.

gemini-code-assist · 2026-02-07T19:19:31Z

.agents/scripts/supervisor-helper.sh

+
+    # Generate diagnostic task ID: {parent}-diag-{N}
+    local diag_count
+    diag_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks WHERE id LIKE '$(sql_escape "$task_id")-diag-%';" 2>/dev/null || echo "0")


The use of LIKE with sql_escape is not safe because sql_escape only handles single quotes (') and does not escape SQL LIKE wildcards (% and _). If a task_id were to contain one of these characters, this query would produce incorrect results.

To fix this, you should also escape these wildcards and add the ESCAPE clause to your SQL query. A self-contained way to do this is to perform the escaping inline.

Suggested change

-    diag_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks WHERE id LIKE '$(sql_escape "$task_id")-diag-%';" 2>/dev/null || echo "0")
+    diag_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks WHERE id LIKE '$(echo "$task_id" | sed -e "s/'/''/g" -e "s/%/\\%/g" -e "s/_/\\_/g")-diag-%' ESCAPE '\\';" 2>/dev/null || echo "0")
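
The escaping this review comment asks for can be isolated in a small helper, which makes it easier to test than an inline pipeline. The sketch below is one possible shape, not the code that was merged: `like_escape` and `diag_count_query` are hypothetical names, and the query is only built as a string instead of being passed to `db`. One thing to check in any sed-based variant is how many backslashes survive both the shell's double-quote processing and sed's own replacement parsing. The sketch sidesteps the first layer by single-quoting the sed scripts.

```bash
# Escape a value for use inside a single-quoted SQL LIKE pattern that is
# paired with ESCAPE '\'. Order matters: backslashes are doubled first so the
# ones added in front of % and _ are not escaped a second time.
like_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/%/\\%/g' -e 's/_/\\_/g' -e "s/'/''/g"
}

# Build the query that counts existing {task}-diag-N rows for a task id.
diag_count_query() {
    printf "SELECT count(*) FROM tasks WHERE id LIKE '%s-diag-%%' ESCAPE '%s';\n" \
        "$(like_escape "$1")" '\'
}
```

With this in place, a task id such as `t_1` produces the pattern `t\_1-diag-%`, so the underscore matches only itself and not an arbitrary character.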

gemini-code-assist · 2026-02-07T19:19:31Z

.agents/scripts/supervisor-helper.sh

+#######################################
+attempt_self_heal() {
+    local task_id="$1"
+    local outcome_type="$2"


The outcome_type parameter is declared but never used within the attempt_self_heal function. It should be removed to simplify the function signature and improve code clarity.

To fix this, you should:

Remove this line.

Update the following lines to use $2 and $3 for failure_reason and batch_id respectively.

Update the four call sites for attempt_self_heal in cmd_pulse to pass only 3 arguments instead of 4.
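
After that change the function might look roughly like the following. This is a sketch under assumptions: the two helper functions are stubbed here, and their argument lists are guesses made only so the example runs on its own.

```bash
# Stubs standing in for the real functions (signatures are assumptions).
is_self_heal_eligible() { [ "$2" != "auth_error" ]; }
create_diagnostic_subtask() { echo "$1-diag-1"; }

# Three-argument form: outcome_type is gone, so failure_reason and batch_id
# move up to $2 and $3.
attempt_self_heal() {
    local task_id="$1"
    local failure_reason="$2"
    local batch_id="${3:-}"

    # Bail out early when any guard rejects the task.
    is_self_heal_eligible "$task_id" "$failure_reason" || return 1
    create_diagnostic_subtask "$task_id" "$failure_reason" "$batch_id"
}
```

A call site in cmd_pulse would then pass three values, for example `attempt_self_heal "$task_id" "$failure_reason" "$batch_id"`, where the variable names are again illustrative.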

…ck (t150)

When a task fails or gets blocked, the supervisor now auto-creates a diagnostic subtask ({task}-diag-N) that analyzes the failure log and attempts a fix. On diagnostic completion, the original task is re-queued.

Changes:
- Add diagnostic_of column to tasks schema with migration
- Add create_diagnostic_subtask() for generating diagnostic tasks
- Add attempt_self_heal() called from pulse blocked/failed handlers
- Add handle_diagnostic_completion() to re-queue parent on diag success
- Add self-heal command for manual diagnostic creation
- Add --no-self-heal flag to pulse and SUPERVISOR_SELF_HEAL env toggle
- Guards: max 1 auto-diagnostic per task, skip auth/OOM/conflict/recursive

github-actions · 2026-02-07T19:21:46Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 26 code smells

[INFO] Recent monitoring activity:
Sat Feb 7 19:21:41 UTC 2026: Code review monitoring started
Sat Feb 7 19:21:42 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 26
Sat Feb 7 19:21:42 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sat Feb 7 19:21:44 UTC 2026: Codacy analysis completed with auto-fixes

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 26
VULNERABILITIES: 0

Generated on: Sat Feb 7 19:21:45 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-07T19:22:06Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud


gemini-code-assist bot reviewed Feb 7, 2026


marcusquinn force-pushed the feature/t150 branch from e1263b0 to 52e7403 Compare February 7, 2026 19:21

marcusquinn merged commit 1e59eb2 into main Feb 7, 2026
1 of 3 checks passed

marcusquinn mentioned this pull request Feb 7, 2026

feat(supervisor): self-healing diagnostic subtasks on failure/block (t147.5, t150) #464

Closed

marcusquinn added a commit that referenced this pull request Feb 7, 2026

chore: mark t144, t147.3, t149, t150 complete in TODO.md (PRs #462, #463, #468, #469)

7498f0a

marcusquinn mentioned this pull request Feb 7, 2026

t150: Supervisor self-healing: auto-create diagnostic subtask on failure #456

Closed
