feat(supervisor): add self-healing diagnostic subtasks on failure/block (t150) #462
Summary of Changes
This pull request adds a self-healing capability to the supervisor to reduce the need for manual intervention. When a task fails or becomes blocked, the supervisor now generates a diagnostic subtask that analyzes the root cause from the logs and proposes or applies a fix. When the diagnostic subtask completes successfully, the original task is automatically re-queued, so the supervisor can recover from these states without an operator stepping in.
Code Review
This pull request introduces a valuable self-healing mechanism for failed or blocked tasks, which is a great enhancement for the supervisor's robustness. The implementation is well-structured, with clear separation of concerns in the new functions and good integration into the existing pulse cycle. I've identified a couple of areas for improvement: a high-severity issue related to SQL query safety that could lead to incorrect behavior, and a medium-severity issue regarding an unused function parameter that affects code clarity. Overall, this is a solid contribution.
# Generate diagnostic task ID: {parent}-diag-{N}
local diag_count
diag_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks WHERE id LIKE '$(sql_escape "$task_id")-diag-%';" 2>/dev/null || echo "0")
The use of LIKE with sql_escape is not safe because sql_escape only handles single quotes (') and does not escape the SQL LIKE wildcards (% and _). If a task_id contained one of these characters, this query would count the wrong rows.
To fix this, escape these wildcards (and the escape character itself) and add an ESCAPE clause to the query. A self-contained way to do this is to perform the escaping inline:
Suggested change:
diag_count=$(db "$SUPERVISOR_DB" "SELECT count(*) FROM tasks WHERE id LIKE '$(printf '%s' "$task_id" | sed -e "s/'/''/g" -e 's/[\\%_]/\\&/g')-diag-%' ESCAPE '\\';" 2>/dev/null || echo "0")
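The same escaping can be pulled into a small helper so other LIKE queries can reuse it. This is a minimal sketch assuming SQLite and a backslash as the ESCAPE character; the names sql_like_escape and build_diag_count_query are hypothetical and not part of the PR. It escapes the backslash along with % and _, because with ESCAPE '\' an unescaped backslash in a task ID would otherwise be read as an escape.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): escape a value for use inside a
# single-quoted SQLite LIKE pattern that declares ESCAPE '\'.
sql_like_escape() {
  # 1) Double single quotes so the value cannot close the SQL string literal.
  # 2) Prefix \, % and _ with a backslash so LIKE matches them literally;
  #    [\\%_] matches any of the three, and \\& emits "\" plus the match.
  printf '%s' "$1" | sed -e "s/'/''/g" -e 's/[\\%_]/\\&/g'
}

# Hypothetical: builds the count query for existing diagnostics of one task.
build_diag_count_query() {
  local pattern
  pattern="$(sql_like_escape "$1")-diag-%"
  printf '%s\n' "SELECT count(*) FROM tasks WHERE id LIKE '${pattern}' ESCAPE '\\';"
}
```

The resulting string would be handed to the PR's db helper in place of the original query text.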
#######################################
attempt_self_heal() {
  local task_id="$1"
  local outcome_type="$2"
The outcome_type parameter is declared but never used within the attempt_self_heal function. It should be removed to simplify the function signature and improve code clarity.
To fix this, you should:
- Remove this line.
- Update the following lines to use $2 and $3 for failure_reason and batch_id respectively.
- Update the four call sites for attempt_self_heal in cmd_pulse to pass only 3 arguments instead of 4.
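With outcome_type removed, the signature becomes three positional arguments. The sketch below shows only that shape; is_self_heal_eligible and create_diagnostic_subtask are replaced by placeholder stubs (hypothetical bodies, not the PR's code) so the snippet runs on its own.

```bash
#!/usr/bin/env bash
# Placeholder stubs for the PR's helpers so the sketch runs on its own
# (hypothetical bodies, not the real implementations).
is_self_heal_eligible() { [[ "$2" != "auth_error" ]]; }
create_diagnostic_subtask() { printf '%s-diag-1\n' "$1"; }

# attempt_self_heal <task_id> <failure_reason> <batch_id>
# With outcome_type gone, failure_reason and batch_id move to $2 and $3.
attempt_self_heal() {
  local task_id="$1"
  local failure_reason="$2"
  local batch_id="$3"
  is_self_heal_eligible "$task_id" "$failure_reason" || return 1
  create_diagnostic_subtask "$task_id" "$failure_reason" "$batch_id"
}
```

Each cmd_pulse call site would then pass the task ID, failure reason and batch ID in that order, dropping the outcome-type argument.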
…ck (t150)
When a task fails or gets blocked, the supervisor now auto-creates a
diagnostic subtask ({task}-diag-N) that analyzes the failure log and
attempts a fix. On diagnostic completion, the original task is re-queued.
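The two rules in this paragraph (how a diagnostic ID is formed and when the parent goes back to the queue) can be sketched as below. The function names are hypothetical. The sketch assumes N is one more than the number of existing {task}-diag-% rows, which is what the PR's count query measures, and that the parent is re-queued only while it is still blocked or failed, as the PR description states.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the naming and re-queue rules.

# next_diag_id <parent_id> <existing_diag_count>
# Builds {parent}-diag-{N}, with N one more than the diagnostics already found.
next_diag_id() {
  local parent="$1" count="${2:-0}"
  [[ "$count" =~ ^[0-9]+$ ]] || count=0  # treat empty or garbled db output as 0
  # 10# forces base 10 so a value like "08" is not parsed as octal.
  printf '%s-diag-%d\n' "$parent" "$((10#$count + 1))"
}

# parent_requeue_action <parent_status>
# Only a parent that is still blocked or failed goes back to the queue.
parent_requeue_action() {
  case "$1" in
    blocked|failed) echo "requeue" ;;
    *) echo "leave" ;;
  esac
}
```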
Changes:
- Add diagnostic_of column to tasks schema with migration
- Add create_diagnostic_subtask() for generating diagnostic tasks
- Add attempt_self_heal() called from pulse blocked/failed handlers
- Add handle_diagnostic_completion() to re-queue parent on diag success
- Add self-heal command for manual diagnostic creation
- Add --no-self-heal flag to pulse and SUPERVISOR_SELF_HEAL env toggle
- Guards: max 1 auto-diagnostic per task, skip auth/OOM/conflict/recursive
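How these pieces hook into the pulse cycle can be sketched as a dispatch on each task's outcome. This is an illustration under assumptions, not the PR's cmd_pulse: pulse_dispatch is a hypothetical name, the two called functions are stubs that only echo, and the stub attempt_self_heal takes the three arguments task, reason and batch. The source does not say whether the toggles also suppress re-queueing, so here they gate only diagnostic creation.

```bash
#!/usr/bin/env bash
# Stubs (hypothetical) for the PR's functions; they only echo what they'd do.
handle_diagnostic_completion() { echo "requeue-parent-of:$1"; }
attempt_self_heal() { echo "diagnose:$1:$2"; }

# pulse_dispatch <task_id> <outcome> <reason> <batch_id> [--no-self-heal]
pulse_dispatch() {
  local task_id="$1" outcome="$2" reason="$3" batch_id="$4" flag="${5:-}"
  case "$outcome" in
    complete)
      handle_diagnostic_completion "$task_id" ;;
    blocked|failed)
      # Either toggle turns off automatic diagnostic creation.
      [[ "${SUPERVISOR_SELF_HEAL:-true}" == "false" || "$flag" == "--no-self-heal" ]] && return 0
      attempt_self_heal "$task_id" "$reason" "$batch_id" ;;
  esac
}
```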
e1263b0 to 52e7403
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sat Feb 7 19:21:45 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Summary
- …{task}-diag-N) that analyzes the failure log and attempts a fix
- self-heal command for manual diagnostic creation, --no-self-heal flag for pulse, and SUPERVISOR_SELF_HEAL env toggle
Changes
supervisor-helper.sh (+325 lines)
Schema:
- diagnostic_of TEXT column on the tasks table (links a diagnostic to its parent)
- Migration (ALTER TABLE ADD COLUMN)
- idx_tasks_diagnostic index
Core functions:
- is_self_heal_eligible() - checks guards: env toggle, human-only failures (auth/OOM/conflict), recursive prevention, max 1 per task
- create_diagnostic_subtask() - creates {task}-diag-N with failure context from the log tail, adds it to the same batch
- attempt_self_heal() - eligibility check + creation, called from the pulse cycle
- handle_diagnostic_completion() - on diagnostic completion, re-queues the parent if it is still blocked/failed
Pulse integration:
- complete handler: calls handle_diagnostic_completion() to re-queue the parent
- blocked handler: calls attempt_self_heal() to create a diagnostic
- failed handler: calls attempt_self_heal() to create a diagnostic
- retry -> blocked (max retries): calls attempt_self_heal()
- retry -> failed (re-prompt failed): calls attempt_self_heal()
Command & config:
- self-heal <task_id> - manual diagnostic creation for blocked/failed tasks
- --no-self-heal flag on the pulse command
- SUPERVISOR_SELF_HEAL=false env var to disable globally
TODO.md
Guards
- SUPERVISOR_SELF_HEAL=false
- --no-self-heal on pulse
- auth_error
- merge_conflict
- out_of_memory
- no_log_file
- max_retries
Testing
- bash -n passes
- … -S error level
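The guards that this PR describes elsewhere (human-only failure reasons, no recursion, at most one automatic diagnostic per task) can be restated as a small predicate. This is a hypothetical sketch, not the PR's is_self_heal_eligible(): the recursion check uses the -diag- ID suffix, whereas the PR presumably relies on its diagnostic_of column, and the env/flag toggles are left out to keep it short. The guards table's behavior column did not survive, so no_log_file and max_retries are not modeled.

```bash
#!/usr/bin/env bash
# is_eligible_sketch <task_id> <failure_reason> <existing_diag_count>
# Hypothetical restatement of the guards; not the PR's is_self_heal_eligible().
is_eligible_sketch() {
  local task_id="$1" reason="$2" diag_count="${3:-0}"
  case "$reason" in
    auth_error|merge_conflict|out_of_memory) return 1 ;;  # needs a human
  esac
  [[ "$task_id" == *-diag-* ]] && return 1  # never diagnose a diagnostic
  (( diag_count < 1 )) || return 1          # max 1 auto-diagnostic per task
  return 0
}
```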