feat: add quality gate with model escalation to supervisor (t132.6) #788

alex-solovyev · 2026-02-09T17:20:30Z

Summary

Adds post-completion quality gate that evaluates worker output before accepting completion
If quality is insufficient and escalation depth allows, re-queues the task with a higher-tier model (haiku->sonnet->opus, flash->pro)
Configurable per-task (max_escalation, default 2) and per-batch (--skip-quality-gate)
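
The decision flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `decide_gate_action_sketch`, its argument order, and the `exhausted` outcome are hypothetical, and the PR does not state what the supervisor does once the escalation budget runs out.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): decide what to do with a finished task.
# Args: quality result ("pass" or "fail:<reason>"), current escalation depth,
#       max escalation, next tier (empty if none), skip flag (0 or 1).
decide_gate_action_sketch() {
    local quality="$1" depth="$2" max="$3" next_tier="$4" skip="${5:-0}"

    # Batch-level opt-out (--skip-quality-gate): accept without evaluating.
    if [[ "$skip" -eq 1 ]]; then
        echo "complete"
        return 0
    fi

    # Output looked acceptable: accept the completion.
    if [[ "$quality" == "pass" ]]; then
        echo "complete"
        return 0
    fi

    # Escalate only while depth budget remains and a higher tier exists.
    if [[ "$depth" -lt "$max" && -n "$next_tier" ]]; then
        echo "requeue:${next_tier}"
        return 0
    fi

    # Assumption: placeholder outcome; the PR does not describe this branch.
    echo "exhausted"
}
```

For example, `decide_gate_action_sketch "fail:no_file_changes" 0 2 sonnet 0` would print `requeue:sonnet`, while the same call with a depth of 2 would print `exhausted`.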

Quality Checks (`check_output_quality()`)

Trivial output: Log < 2KB without completion signals
Error patterns: Panics, fatal errors, segfaults, OOM (>2 occurrences)
Token-to-substance ratio: Log > 500KB with < 3 substance markers
No file changes: Empty git diff on branch
Syntax errors: bash -n failures on changed .sh files (>5 errors)
PR signal: PR existence is a strong positive signal (auto-pass)
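
Two of the log-based checks listed above might look something like the sketch below. The thresholds (2KB, more than 2 occurrences) come from the list; everything else is an assumption: the function name, the `TASK_COMPLETE` marker (the PR does not list the actual completion signals), the exact error regex, and the reason strings after `fail:`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "trivial output" and "error patterns" heuristics.
# Prints "pass" or "fail:<reason>" and always returns 0.
check_log_quality_sketch() {
    local log_file="$1"
    local size_bytes error_count

    size_bytes=$(wc -c < "$log_file" | tr -d ' ')

    # Trivial output: log under 2KB with no completion signal.
    # Assumption: "TASK_COMPLETE" stands in for the real completion signals.
    if [[ "$size_bytes" -lt 2048 ]] && ! grep -q 'TASK_COMPLETE' "$log_file"; then
        echo "fail:trivial_output"
        return 0
    fi

    # Error patterns: count lines that mention a crash-like condition.
    # grep -c exits non-zero on zero matches, hence the "|| true".
    error_count=$(grep -ciE 'panic|fatal error|segmentation fault|out of memory' "$log_file" || true)
    if [[ "$error_count" -gt 2 ]]; then
        echo "fail:error_patterns_${error_count}"
        return 0
    fi

    echo "pass"
}
```

Printing a result string and returning 0 in every case mirrors the `echo "fail:..."; return 0` pattern visible in the reviewed diff, so a caller would compare the printed value rather than the exit status.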

New Functions

get_next_tier() - Escalation chain mapping
check_output_quality() - Heuristic quality checks
run_quality_gate() - Orchestrates check + escalation decision
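
As an illustration of the escalation chain mapping, a function along these lines would cover the two chains named in the summary (haiku->sonnet->opus, flash->pro). The name `get_next_tier_sketch` is hypothetical, and returning an empty string for top-tier or unknown models is an assumption, not something the PR confirms.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an escalation chain lookup.
# Prints the next tier, or nothing when there is no higher tier.
get_next_tier_sketch() {
    local current="$1"
    case "$current" in
        haiku)  echo "sonnet" ;;
        sonnet) echo "opus" ;;
        flash)  echo "pro" ;;
        # Assumption: opus, pro, and unknown models have nowhere to escalate.
        *)      echo "" ;;
    esac
}
```

A caller could then treat an empty result as "cannot escalate", for example `next=$(get_next_tier_sketch "$model"); [[ -n "$next" ]] || echo "no higher tier"`.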

Schema Changes

tasks.escalation_depth (INTEGER, default 0) - Current escalation count
tasks.max_escalation (INTEGER, default 2) - Max allowed escalations
batches.skip_quality_gate (INTEGER, default 0) - Skip gate for batch
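
The three columns above could be added with additive statements like the ones this sketch prints. The column names, types, and defaults are taken from the list; the function name is hypothetical and the SQL dialect is assumed to be SQLite-compatible, which the PR text does not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: emit additive migration SQL for the new columns.
# Nothing is executed here; the statements are only printed.
escalation_migration_sql_sketch() {
    cat <<'SQL'
ALTER TABLE tasks ADD COLUMN escalation_depth INTEGER DEFAULT 0;
ALTER TABLE tasks ADD COLUMN max_escalation INTEGER DEFAULT 2;
ALTER TABLE batches ADD COLUMN skip_quality_gate INTEGER DEFAULT 0;
SQL
}
```

Because each statement only adds a column with a default, existing rows would keep working without a backfill, which is consistent with the "additive ALTER TABLE only" note under Testing.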

State Machine

Added evaluating:queued transition for quality gate escalation
Quality gate runs in pulse Phase 1 before cmd_transition to complete
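
One way to picture the new transition is a lookup keyed on `from:to` pairs, matching the `evaluating:queued` notation used above. This is only a sketch: `is_valid_transition_sketch` is a hypothetical name, the `evaluating:complete` entry is an assumption inferred from the gate running before the transition to complete, and the supervisor's real transition table is not shown in this PR description.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a state transition check.
# Returns 0 when the from->to transition is allowed, 1 otherwise.
is_valid_transition_sketch() {
    local from="$1" to="$2"
    case "${from}:${to}" in
        # Assumption: accepting a completion goes evaluating -> complete.
        evaluating:complete) return 0 ;;
        # Added by this PR: quality gate escalation re-queues the task.
        evaluating:queued)   return 0 ;;
        # Every other pair is rejected in this simplified sketch.
        *)                   return 1 ;;
    esac
}
```

With this in place, `is_valid_transition_sketch evaluating queued` succeeds, while a pair such as `complete queued` is rejected.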

Testing

bash -n syntax check: pass
ShellCheck -S error: zero violations
Existing migrations preserved (additive ALTER TABLE only)

Fixes: t132.6 ref:GH#738

Post-completion quality checks that can re-dispatch to a higher-tier model if output quality is insufficient. Escalation chain: haiku->sonnet->opus, flash->pro. Configurable max_escalation per task and skip_quality_gate per batch.

- get_next_tier(): maps current model to next tier in escalation chain
- check_output_quality(): heuristic checks (log size, error patterns, file changes, syntax errors, token-to-substance ratio)
- run_quality_gate(): orchestrates quality check + escalation decision
- Schema: escalation_depth/max_escalation on tasks, skip_quality_gate on batches
- Wired into pulse complete handler before accepting completion
- --skip-quality-gate flag on batch command for trusted/trivial tasks

Fixes: t132.6 ref:GH#738

gemini-code-assist · 2026-02-09T17:20:51Z

Summary of Changes

Hello @alex-solovyev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust quality assurance mechanism for worker tasks. It establishes a post-completion quality gate that intelligently assesses the output of a task using various heuristics. If the output fails to meet quality standards, the system can automatically escalate the task by re-queuing it to be processed by a more advanced model, ensuring higher quality outcomes while providing configurable control over the escalation process.

Highlights

Post-Completion Quality Gate: Implements a new quality gate that evaluates worker output after task completion to ensure satisfactory results.
Model Escalation Mechanism: Automatically re-queues tasks with higher-tier models (e.g., haiku->sonnet->opus, flash->pro) if the initial output quality is insufficient and escalation depth permits.
Configurable Behavior: Allows configuration of maximum escalation attempts per task (max_escalation, default 2) and the ability to skip the quality gate for an entire batch (--skip-quality-gate).
Heuristic Quality Checks: Defines several checks including log size, error patterns (panics, OOM), token-to-substance ratio, absence of file changes, and shell script syntax errors.
New Core Functions: Introduces get_next_tier(), check_output_quality(), and run_quality_gate() to manage the escalation logic and quality assessment.
Database Schema Updates: Adds escalation_depth and max_escalation columns to the tasks table and skip_quality_gate to the batches table.
State Machine Integration: Incorporates an evaluating:queued transition to handle tasks re-queued by the quality gate, ensuring proper state management.

Changelog

.agents/scripts/supervisor-helper.sh
- Added comprehensive documentation for the new quality gate and model escalation feature.
- Introduced the evaluating:queued state transition to support re-queueing tasks after quality gate failures.
- Implemented database schema migrations to add escalation_depth and max_escalation columns to the tasks table, and skip_quality_gate to the batches table, ensuring backward compatibility.
- Updated the CREATE TABLE statements for tasks and batches to include the new escalation and quality gate columns.
- Modified the cmd_batch function to support a new --skip-quality-gate command-line option, allowing users to bypass quality checks for specific batches.
- Enhanced the cmd_batch output to reflect whether the quality gate is skipped for a newly created batch.
- Added the get_next_tier() function, which maps current models to their next higher-tier counterparts for escalation.
- Implemented the check_output_quality() function, a core component that performs heuristic checks on worker logs and worktrees (e.g., log size, error patterns, file changes, shell script syntax) to determine output quality.
- Introduced the run_quality_gate() function, which orchestrates the quality assessment, decides on task escalation, updates task metadata, and triggers re-queueing if necessary.
- Integrated the run_quality_gate() logic into the cmd_pulse function, ensuring that quality checks are performed immediately after a worker reports completion and before the task is finalized.

Activity

A new feature is introduced to add a quality gate with model escalation to the supervisor script.
The pull request includes detailed explanations of the feature's summary, quality checks, new functions, schema changes, state machine updates, and testing notes.
The changes are additive, including new database columns and functions, and integrate into existing task management flows.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2026-02-09T17:21:02Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 54 code smells

[INFO] Recent monitoring activity:
Mon Feb 9 17:20:59 UTC 2026: Code review monitoring started
Mon Feb 9 17:20:59 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 54

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 54
VULNERABILITIES: 0

Generated on: Mon Feb 9 17:21:01 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist

Code Review

This pull request introduces a quality gate with model escalation for the supervisor, a significant feature enhancement. The changes include database schema updates, new command-line options, and the core logic for quality checking and task escalation. The implementation is solid and follows the project's shell scripting conventions. I have one suggestion to improve code clarity by aligning comments and variable names with the actual implementation in the check_output_quality function.

gemini-code-assist · 2026-02-09T17:22:48Z

.agents/scripts/supervisor-helper.sh

+        # Check 5: ShellCheck for .sh files (quick heuristic)
+        local changed_sh_files
+        changed_sh_files=$(git -C "$tworktree" diff --name-only "main..HEAD" 2>/dev/null | grep '\.sh$' || true)
+        if [[ -n "$changed_sh_files" ]]; then
+            local shellcheck_errors=0
+            while IFS= read -r sh_file; do
+                [[ -z "$sh_file" ]] && continue
+                local full_path="${tworktree}/${sh_file}"
+                [[ -f "$full_path" ]] || continue
+                local sc_count
+                sc_count=$(bash -n "$full_path" 2>&1 | wc -l | tr -d ' ')
+                shellcheck_errors=$((shellcheck_errors + sc_count))
+            done <<< "$changed_sh_files"
+            if [[ "$shellcheck_errors" -gt 5 ]]; then
+                echo "fail:syntax_errors_${shellcheck_errors}"
+                return 0
+            fi
+        fi


The comments and variable names in this block refer to "ShellCheck", but the implementation uses bash -n, which only performs a syntax check. This is misleading for future maintenance.

To improve clarity, I suggest updating the comments and variable names to accurately reflect that a syntax check is being performed. This also applies to the function's header comment on line 2953.

Suggested change

-        # Check 5: ShellCheck for .sh files (quick heuristic)
+        # Check 5: Syntax errors in .sh files (quick heuristic)
         local changed_sh_files
         changed_sh_files=$(git -C "$tworktree" diff --name-only "main..HEAD" 2>/dev/null | grep '\.sh$' || true)
         if [[ -n "$changed_sh_files" ]]; then
-            local shellcheck_errors=0
+            local syntax_errors=0
             while IFS= read -r sh_file; do
                 [[ -z "$sh_file" ]] && continue
                 local full_path="${tworktree}/${sh_file}"
                 [[ -f "$full_path" ]] || continue
-                local sc_count
-                sc_count=$(bash -n "$full_path" 2>&1 | wc -l | tr -d ' ')
-                shellcheck_errors=$((shellcheck_errors + sc_count))
+                local error_line_count
+                error_line_count=$(bash -n "$full_path" 2>&1 | wc -l | tr -d ' ')
+                syntax_errors=$((syntax_errors + error_line_count))
             done <<< "$changed_sh_files"
-            if [[ "$shellcheck_errors" -gt 5 ]]; then
-                echo "fail:syntax_errors_${shellcheck_errors}"
+            if [[ "$syntax_errors" -gt 5 ]]; then
+                echo "fail:syntax_errors_${syntax_errors}"
                 return 0
             fi
         fi
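
To make the reviewer's point concrete, the small sketch below isolates what the counter in this block measures. It is not part of the PR, and `count_syntax_error_lines_sketch` is a hypothetical name. The function runs `bash -n`, which parses a script without executing it, and counts the diagnostic lines it prints; ShellCheck is never invoked.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: count the diagnostic lines that "bash -n" prints
# for one script. A script that parses cleanly yields 0.
count_syntax_error_lines_sketch() {
    local script_path="$1"
    # bash -n reads and parses the file but does not run any commands.
    bash -n "$script_path" 2>&1 | wc -l | tr -d ' '
}
```

Note that the value is a count of output lines rather than of distinct problems, so the `-gt 5` threshold applies to how much diagnostic output the changed scripts produce in total.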

sonarqubecloud · 2026-02-09T17:23:22Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

All 8 subtasks of t132 (Cross-Provider Model Routing) are now complete:

- t132.1: Model-specific subagents (PR #758)
- t132.2: Provider/model registry (PR #761)
- t132.3: Model availability checker (PR #770)
- t132.4: Fallback chain config (PR #781)
- t132.5: Supervisor model resolution (PR #787)
- t132.6: Quality gate with escalation (PR #788)
- t132.7: Multi-provider runner/cron support (PR #789)
- t132.8: Cross-model review workflow (PR #791)

Also fixed stale git conflict markers in TODO.md.

alex-solovyev merged commit 8fdd553 into main Feb 9, 2026
4 checks passed

alex-solovyev added a commit that referenced this pull request Feb 9, 2026

chore: mark t132.6 complete (PR #788)

c88e4f8

gemini-code-assist bot reviewed Feb 9, 2026


marcusquinn mentioned this pull request Feb 10, 2026

t132.6: Quality gate with model escalation #738

Closed
