@alex-solovyev
Collaborator

Summary

  • Adds post-completion quality gate that evaluates worker output before accepting completion
  • If quality is insufficient and escalation depth allows, re-queues the task with a higher-tier model (haiku->sonnet->opus, flash->pro)
  • Configurable per-task (max_escalation, default 2) and per-batch (--skip-quality-gate)
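
The accept-or-escalate decision summarized above can be sketched as a small pure function. This is a minimal illustration, not the PR's code: the function name decide_gate_action, its argument order, and the result strings are hypothetical, and the "exhausted" branch assumes the gate simply stops escalating once escalation_depth reaches max_escalation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the escalation decision; not the PR's actual code.
# Arguments:
#   $1 quality result ("pass" or "fail:<reason>")
#   $2 current escalation_depth
#   $3 max_escalation (the PR's default is 2)
#   $4 skip_quality_gate (1 = batch was created with --skip-quality-gate)
# Prints one of: complete | escalate | exhausted
decide_gate_action() {
    local quality="$1" depth="$2" max="$3" skip="${4:-0}"

    # Batches that opted out bypass the gate entirely.
    if [ "$skip" -eq 1 ]; then
        echo "complete"
        return 0
    fi

    case "$quality" in
        pass*)
            echo "complete"
            ;;
        *)
            if [ "$depth" -lt "$max" ]; then
                echo "escalate"   # re-queue with the next model tier
            else
                echo "exhausted"  # assumption: no escalations left
            fi
            ;;
    esac
}
```

For example, `decide_gate_action "fail:trivial_output" 0 2 0` prints `escalate`, while the same failure at depth 2 prints `exhausted`.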

Quality Checks (check_output_quality())

  1. Trivial output: Log < 2KB without completion signals
  2. Error patterns: Panics, fatal errors, segfaults, OOM (>2 occurrences)
  3. Token-to-substance ratio: Log > 500KB with < 3 substance markers
  4. No file changes: Empty git diff on branch
  5. Syntax errors: bash -n failures on changed .sh files (>5 lines of error output in total)
  6. PR signal: PR existence is a strong positive signal (auto-pass)
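
As a rough illustration of checks 1 and 2, the sketch below applies the stated thresholds to a log file. The function name check_log_quality, the grep patterns for completion signals and errors, and the result strings are assumptions made for the example; only the 2KB and ">2" thresholds come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of checks 1 and 2 only; thresholds are from the PR
# description, while the grep patterns and result strings are assumptions.
check_log_quality() {
    local log_file="$1"
    local size_bytes error_hits

    # Check 1: a log under 2KB with no completion signal is treated as trivial.
    size_bytes=$(wc -c < "$log_file" | tr -d ' ')
    if [ "$size_bytes" -lt 2048 ] &&
        ! grep -qiE 'task complete|pull request created' "$log_file"; then
        echo "fail:trivial_output"
        return 0
    fi

    # Check 2: more than 2 log lines matching a fatal-error pattern.
    error_hits=$(grep -ciE 'panic|fatal error|segmentation fault|out of memory' "$log_file" || true)
    if [ "$error_hits" -gt 2 ]; then
        echo "fail:error_patterns_${error_hits}"
        return 0
    fi

    echo "pass"
}
```

Returning 0 and reporting the verdict on stdout mirrors the `echo "fail:..."; return 0` convention visible in the reviewed code further down this thread.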

New Functions

  • get_next_tier() - Escalation chain mapping
  • check_output_quality() - Heuristic quality checks
  • run_quality_gate() - Orchestrates check + escalation decision
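
A minimal sketch of the escalation chain mapping, assuming tiers are identified by the short names used in this description (haiku, sonnet, opus, flash, pro). The non-zero return for a tier with no successor is an assumption; the PR does not state how get_next_tier() signals the top of a chain.

```shell
#!/usr/bin/env bash
# Sketch of the escalation chain: haiku->sonnet->opus, flash->pro.
# Assumption: returns 1 and prints nothing when no higher tier exists.
get_next_tier() {
    case "$1" in
        haiku)  echo "sonnet" ;;
        sonnet) echo "opus" ;;
        flash)  echo "pro" ;;
        *)      return 1 ;;  # opus, pro, or an unknown tier
    esac
}
```

A caller could then write `if next=$(get_next_tier "$current"); then ...; fi` and fall through to accepting the result when no higher tier is available.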

Schema Changes

  • tasks.escalation_depth (INTEGER, default 0) - Current escalation count
  • tasks.max_escalation (INTEGER, default 2) - Max allowed escalations
  • batches.skip_quality_gate (INTEGER, default 0) - Skip gate for batch
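
The additive migration could look roughly like the statements printed below. The table names, column names, types, and defaults are taken from the list above; the helper name quality_gate_migration_sql and the choice to emit SQL on stdout are assumptions made so the sketch does not depend on a particular database client.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prints additive migration statements for the three
# new columns. The function name is an assumption, not part of the PR.
quality_gate_migration_sql() {
    cat <<'SQL'
ALTER TABLE tasks ADD COLUMN escalation_depth INTEGER DEFAULT 0;
ALTER TABLE tasks ADD COLUMN max_escalation INTEGER DEFAULT 2;
ALTER TABLE batches ADD COLUMN skip_quality_gate INTEGER DEFAULT 0;
SQL
}
```

If the supervisor database is SQLite (an assumption), the output could be piped into `sqlite3 "$db_path"`, where `$db_path` is a placeholder. Note that ADD COLUMN fails when the column already exists, so a real migration would need to check for the column first.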

State Machine

  • Added evaluating:queued transition for quality gate escalation
  • Quality gate runs in pulse Phase 1 before cmd_transition to complete
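
One way to picture the new transition is a lookup against a list of allowed `from:to` pairs. In this sketch only evaluating:queued is confirmed by the PR; the other pairs, the VALID_TRANSITIONS variable, and the is_valid_transition name are hypothetical stand-ins for the supervisor's real state table.

```shell
#!/usr/bin/env bash
# Hypothetical transition table. Only evaluating:queued is confirmed by the
# PR; the remaining pairs are placeholders for illustration.
VALID_TRANSITIONS="running:evaluating evaluating:complete evaluating:queued"

# Returns 0 if "$1 -> $2" is an allowed transition, 1 otherwise.
is_valid_transition() {
    local from="$1" to="$2" pair
    for pair in $VALID_TRANSITIONS; do
        if [ "$pair" = "${from}:${to}" ]; then
            return 0
        fi
    done
    return 1
}
```

With this table, `is_valid_transition evaluating queued` succeeds, which is what lets the gate send a task back to the queue instead of on to complete.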

Testing

  • bash -n syntax check: pass
  • ShellCheck -S error: zero violations
  • Existing migrations preserved (additive ALTER TABLE only)

Fixes: t132.6 ref:GH#738

Post-completion quality checks that can re-dispatch to a higher-tier model
if output quality is insufficient. Escalation chain: haiku->sonnet->opus,
flash->pro. Configurable max_escalation per task and skip_quality_gate per batch.

- get_next_tier(): maps current model to next tier in escalation chain
- check_output_quality(): heuristic checks (log size, error patterns, file
  changes, syntax errors, token-to-substance ratio)
- run_quality_gate(): orchestrates quality check + escalation decision
- Schema: escalation_depth/max_escalation on tasks, skip_quality_gate on batches
- Wired into pulse complete handler before accepting completion
- --skip-quality-gate flag on batch command for trusted/trivial tasks

Fixes: t132.6 ref:GH#738

alex-solovyev merged commit 8fdd553 into main on Feb 9, 2026
4 checks passed
@gemini-code-assist

Summary of Changes

Hello @alex-solovyev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a quality gate for worker tasks. After a worker reports completion, the gate assesses its output using several heuristics. If the output fails those checks, the task is re-queued for a higher-tier model, with configurable limits on how far escalation can go.

Highlights

  • Post-Completion Quality Gate: Implements a new quality gate that evaluates worker output after task completion to ensure satisfactory results.
  • Model Escalation Mechanism: Automatically re-queues tasks with higher-tier models (e.g., haiku->sonnet->opus, flash->pro) if the initial output quality is insufficient and escalation depth permits.
  • Configurable Behavior: Allows configuration of maximum escalation attempts per task (max_escalation, default 2) and the ability to skip the quality gate for an entire batch (--skip-quality-gate).
  • Heuristic Quality Checks: Defines several checks including log size, error patterns (panics, OOM), token-to-substance ratio, absence of file changes, and shell script syntax errors.
  • New Core Functions: Introduces get_next_tier(), check_output_quality(), and run_quality_gate() to manage the escalation logic and quality assessment.
  • Database Schema Updates: Adds escalation_depth and max_escalation columns to the tasks table and skip_quality_gate to the batches table.
  • State Machine Integration: Incorporates an evaluating:queued transition to handle tasks re-queued by the quality gate, ensuring proper state management.

Changelog

  • .agents/scripts/supervisor-helper.sh
    • Added comprehensive documentation for the new quality gate and model escalation feature.
    • Introduced the evaluating:queued state transition to support re-queueing tasks after quality gate failures.
    • Implemented database schema migrations to add escalation_depth and max_escalation columns to the tasks table, and skip_quality_gate to the batches table, ensuring backward compatibility.
    • Updated the CREATE TABLE statements for tasks and batches to include the new escalation and quality gate columns.
    • Modified the cmd_batch function to support a new --skip-quality-gate command-line option, allowing users to bypass quality checks for specific batches.
    • Enhanced the cmd_batch output to reflect whether the quality gate is skipped for a newly created batch.
    • Added the get_next_tier() function, which maps current models to their next higher-tier counterparts for escalation.
    • Implemented the check_output_quality() function, a core component that performs heuristic checks on worker logs and worktrees (e.g., log size, error patterns, file changes, shell script syntax) to determine output quality.
    • Introduced the run_quality_gate() function, which orchestrates the quality assessment, decides on task escalation, updates task metadata, and triggers re-queueing if necessary.
    • Integrated the run_quality_gate() logic into the cmd_pulse function, ensuring that quality checks are performed immediately after a worker reports completion and before the task is finalized.
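
The --skip-quality-gate option described for cmd_batch could be parsed along these lines. Only the flag name comes from the PR; the parse_batch_flags helper and its behavior of echoing 0 or 1 are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option parsing; only the flag name is from the PR.
# Prints 1 if --skip-quality-gate is present among the arguments, else 0.
parse_batch_flags() {
    local skip_quality_gate=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --skip-quality-gate) skip_quality_gate=1 ;;
            *) ;;  # other batch options would be handled here
        esac
        shift
    done
    echo "$skip_quality_gate"
}
```

The printed value maps directly onto the batches.skip_quality_gate column (INTEGER, default 0).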

Activity

  • A new feature is introduced to add a quality gate with model escalation to the supervisor script.
  • The pull request includes detailed explanations of the feature's summary, quality checks, new functions, schema changes, state machine updates, and testing notes.
  • The changes are additive, including new database columns and functions, and integrate into existing task management flows.

@github-actions

github-actions bot commented Feb 9, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 54 code smells

[INFO] Recent monitoring activity:
Mon Feb 9 17:20:59 UTC 2026: Code review monitoring started
Mon Feb 9 17:20:59 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 54

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 54
  • VULNERABILITIES: 0

Generated on: Mon Feb 9 17:21:01 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

alex-solovyev added a commit that referenced this pull request Feb 9, 2026

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a quality gate with model escalation for the supervisor, a significant feature enhancement. The changes include database schema updates, new command-line options, and the core logic for quality checking and task escalation. The implementation is solid and follows the project's shell scripting conventions. I have one suggestion to improve code clarity by aligning comments and variable names with the actual implementation in the check_output_quality function.

Comment on lines +3023 to +3040
# Check 5: ShellCheck for .sh files (quick heuristic)
local changed_sh_files
changed_sh_files=$(git -C "$tworktree" diff --name-only "main..HEAD" 2>/dev/null | grep '\.sh$' || true)
if [[ -n "$changed_sh_files" ]]; then
    local shellcheck_errors=0
    while IFS= read -r sh_file; do
        [[ -z "$sh_file" ]] && continue
        local full_path="${tworktree}/${sh_file}"
        [[ -f "$full_path" ]] || continue
        local sc_count
        sc_count=$(bash -n "$full_path" 2>&1 | wc -l | tr -d ' ')
        shellcheck_errors=$((shellcheck_errors + sc_count))
    done <<< "$changed_sh_files"
    if [[ "$shellcheck_errors" -gt 5 ]]; then
        echo "fail:syntax_errors_${shellcheck_errors}"
        return 0
    fi
fi

medium

The comments and variable names in this block refer to "ShellCheck", but the implementation uses bash -n, which only performs a syntax check. This is misleading for future maintenance.

To improve clarity, I suggest updating the comments and variable names to accurately reflect that a syntax check is being performed. This also applies to the function's header comment on line 2953.

Suggested change
# Check 5: Syntax errors in .sh files (quick heuristic)
local changed_sh_files
changed_sh_files=$(git -C "$tworktree" diff --name-only "main..HEAD" 2>/dev/null | grep '\.sh$' || true)
if [[ -n "$changed_sh_files" ]]; then
    local syntax_errors=0
    while IFS= read -r sh_file; do
        [[ -z "$sh_file" ]] && continue
        local full_path="${tworktree}/${sh_file}"
        [[ -f "$full_path" ]] || continue
        local error_line_count
        error_line_count=$(bash -n "$full_path" 2>&1 | wc -l | tr -d ' ')
        syntax_errors=$((syntax_errors + error_line_count))
    done <<< "$changed_sh_files"
    if [[ "$syntax_errors" -gt 5 ]]; then
        echo "fail:syntax_errors_${syntax_errors}"
        return 0
    fi
fi
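
To make the distinction in this review comment concrete, the sketch below isolates the counting step. `bash -n` parses a script without executing it, so it reports syntax errors but not runtime problems such as a call to a command that does not exist. The helper name count_syntax_error_lines is hypothetical; the pipeline inside it is the one used in the reviewed code.

```shell
#!/usr/bin/env bash
# Counts lines of output from a parse-only run of the given script.
# Hypothetical helper name; the pipeline matches the reviewed block.
count_syntax_error_lines() {
    bash -n "$1" 2>&1 | wc -l | tr -d ' '
}

# Illustration (not executed here):
#   a file containing only "no_such_command --flag" parses cleanly, so the
#   count is 0 even though running the file would fail;
#   a file containing only "if true; then" has an unterminated block, so
#   bash -n prints at least one error line.
```

Because the count is of output lines rather than distinct errors, the ">5" threshold in the gate is a coarse heuristic, which is consistent with the "quick heuristic" wording in the code comment.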

@sonarqubecloud

sonarqubecloud bot commented Feb 9, 2026

alex-solovyev added a commit that referenced this pull request Feb 9, 2026
All 8 subtasks of t132 (Cross-Provider Model Routing) are now complete:
- t132.1: Model-specific subagents (PR #758)
- t132.2: Provider/model registry (PR #761)
- t132.3: Model availability checker (PR #770)
- t132.4: Fallback chain config (PR #781)
- t132.5: Supervisor model resolution (PR #787)
- t132.6: Quality gate with escalation (PR #788)
- t132.7: Multi-provider runner/cron support (PR #789)
- t132.8: Cross-model review workflow (PR #791)

Also fixed stale git conflict markers in TODO.md.