
Conversation

@alex-solovyev (Collaborator)

Summary

  • Adds cross-review command to compare-models-helper.sh for second-opinion review patterns
  • Dispatches the same prompt to multiple AI models in parallel via runner-helper.sh
  • Collects results, generates word count comparison and unified diff

Usage

# Review code with Sonnet and Opus (default)
compare-models-helper.sh cross-review \
  --prompt "Review this code for security issues: $(cat src/auth.ts)"

# Custom model set with timeout
compare-models-helper.sh cross-review \
  --prompt "Audit the architecture" \
  --models "sonnet,opus,pro" \
  --timeout 900

# Specify output directory
compare-models-helper.sh cross-review \
  --prompt "Review PR changes" \
  --models "opus,pro" \
  --output /tmp/my-review
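
Each model's raw JSON response and extracted text are saved into the output directory as <model>.json and <model>.txt (per the implementation quoted in the review comments below), so a quick manual comparison after the last example might look like this (assumed file layout):

ls /tmp/my-review
# opus.json  opus.txt  pro.json  pro.txt
diff --unified=3 /tmp/my-review/opus.txt /tmp/my-review/pro.txt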

How It Works

  1. Creates temporary runners for each model (via runner-helper.sh)
  2. Dispatches the same prompt to all models in parallel (background processes; see the sketch after this list)
  3. Waits for all to complete (with per-model timeout)
  4. Extracts text responses from JSON output
  5. Displays each model's response
  6. Generates diff summary: word counts + unified diff (for 2-model comparisons)
  7. Cleans up temporary runners
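
The fan-out/fan-in shape of steps 2-3 looks roughly like the sketch below; run_one_model, models, and pids are illustrative names, not the script's actual identifiers (the real subshell body is quoted in the review comments further down).

# Illustrative sketch only: dispatch each model in the background, then wait.
pids=()
for model in "${models[@]}"; do
    run_one_model "$model" &   # hypothetical worker: create runner, run prompt, clean up
    pids+=("$!")
done
for pid in "${pids[@]}"; do
    wait "$pid" || echo "a model run failed (pid $pid)" >&2
done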

Use Cases

  • Code review: Get a second opinion from a different model
  • Security audit: Cross-validate findings across providers
  • Architecture review: Compare reasoning approaches
  • Quality assurance: Verify model agreement on critical decisions

Testing

  • bash -n: Syntax check passes
  • ShellCheck -S error: Zero violations
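
Both checks can be reproduced locally; the script path is taken from the changelog below, and the ShellCheck call uses its standard -S/--severity flag:

bash -n .agents/scripts/compare-models-helper.sh
shellcheck -S error .agents/scripts/compare-models-helper.sh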

Fixes: t132.8 ref:GH#740

Dispatch the same review prompt to multiple AI models in parallel, collect
results, and produce a diff summary. Enables second-opinion patterns for
code review, security audit, and architecture review.

- cmd_cross_review(): dispatches via runner-helper.sh to N models in parallel
- Creates temporary runners per model, cleans up after completion
- Collects JSON responses, extracts text, saves to output directory
- Generates word count comparison and unified diff for 2-model comparisons
- Default models: sonnet,opus (Anthropic second opinion)
- Configurable: --models, --timeout, --workdir, --output

Fixes: t132.8 ref:GH#740
@alex-solovyev merged commit 26c0033 into main Feb 9, 2026
6 checks passed
@gemini-code-assist

Summary of Changes

Hello @alex-solovyev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the compare-models-helper.sh script by adding a cross-review command. This new functionality allows users to simultaneously query multiple AI models with the same prompt, facilitating a comparative analysis of their responses. The script handles parallel execution, result collection, and provides tools like word count comparisons and unified diffs to highlight differences between model outputs, thereby supporting more robust and multi-faceted AI-driven reviews.

Highlights

  • New cross-review command: A new command, cross-review, has been added to compare-models-helper.sh to facilitate second-opinion review patterns across multiple AI models.
  • Parallel AI Model Dispatch: The script now dispatches the same prompt to multiple AI models in parallel using runner-helper.sh, allowing for concurrent processing of review requests.
  • Result Aggregation and Comparison: It collects responses from all models, generates a word count comparison for each, and creates a unified diff when exactly two models are compared, highlighting differences in their outputs.
  • Configurable Options: Users can specify the prompt, a comma-separated list of models, a working directory, a timeout for each model, and an output directory for results.
Changelog
  • .agents/scripts/compare-models-helper.sh
    • Added the cross-review command to the script's main command dispatch logic.
    • Implemented the cmd_cross_review function, which orchestrates the parallel execution of AI model prompts, result collection, and comparison.
    • Updated the help documentation (cmd_help) to list the new cross-review command and provide detailed usage examples.
    • Included logic for argument parsing (--prompt, --models, --workdir, --timeout, --output), default model selection, output directory management, and error handling for missing runner-helper.sh or insufficient models (a generic parsing sketch follows this list).
    • Integrated parallel execution using background processes (&) and wait for model responses, including error handling for failed dispatches.
    • Added functionality to extract text responses from JSON output using jq and display individual model results.
    • Implemented a diff summary section, including word count comparison for all successful model responses and a unified diff for exactly two models.
    • Ensured cleanup of temporary runners after execution.
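
For readers unfamiliar with the pattern, option handling like that described above is usually a while/case loop; this is an illustrative sketch, not the script's actual parsing code (variable names follow those visible in the quoted snippets):

while [[ $# -gt 0 ]]; do
    case "$1" in
        --prompt)  prompt="$2";         shift 2 ;;
        --models)  models_csv="$2";     shift 2 ;;
        --workdir) workdir="$2";        shift 2 ;;
        --timeout) review_timeout="$2"; shift 2 ;;
        --output)  output_dir="$2";     shift 2 ;;
        *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
done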
Activity
  • The pull request introduces a new feature (feat) related to cross-model review workflows.
  • It addresses task t132.8 and references GH#740.
  • Basic shell syntax check (bash -n) and static analysis (ShellCheck -S error) passed with zero violations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions bot commented Feb 9, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 59 code smells

[INFO] Recent monitoring activity:
Mon Feb 9 17:30:23 UTC 2026: Code review monitoring started
Mon Feb 9 17:30:23 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 59

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 59
  • VULNERABILITIES: 0

Generated on: Mon Feb 9 17:30:25 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud

sonarqubecloud bot commented Feb 9, 2026

alex-solovyev added a commit that referenced this pull request Feb 9, 2026
All 8 subtasks of t132 (Cross-Provider Model Routing) are now complete:
- t132.1: Model-specific subagents (PR #758)
- t132.2: Provider/model registry (PR #761)
- t132.3: Model availability checker (PR #770)
- t132.4: Fallback chain config (PR #781)
- t132.5: Supervisor model resolution (PR #787)
- t132.6: Quality gate with escalation (PR #788)
- t132.7: Multi-provider runner/cron support (PR #789)
- t132.8: Cross-model review workflow (PR #791)

Also fixed stale git conflict markers in TODO.md.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable cross-review feature for comparing model outputs in parallel. The overall implementation is solid, but there are a few important issues regarding robustness and error handling that should be addressed. My review includes feedback on adding a crucial cleanup mechanism to prevent orphaned resources, fixing a logic bug in the parallel job monitoring by correctly handling command exit statuses, and improving the clarity of the final diff output by applying robust error handling. These changes will make the new feature more reliable and user-friendly and align with shell scripting best practices.

# --models "sonnet,opus,pro" [--workdir path] [--timeout N] [--output dir]
# Dispatches via runner-helper.sh in parallel, collects outputs, produces summary.
#######################################
cmd_cross_review() {


Severity: high

The function launches background processes to create temporary runners but lacks a trap to ensure these runners are cleaned up if the script is interrupted (e.g., with Ctrl+C). This can lead to orphaned resources and violates the repository style guide (rule #33), which requires trap cleanup for temporary resources.

To fix this, please add a trap at the beginning of the function to handle INT and TERM signals. This trap should iterate through the runner_names array and call runner-helper.sh destroy on each. You should then clear the trap using trap - INT TERM before the function returns normally.
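
A minimal sketch of that suggestion, assuming runner_names and runner_helper are the variables already in scope in cmd_cross_review (cleanup_cross_review_runners is an illustrative name):

cleanup_cross_review_runners() {
    local name
    for name in "${runner_names[@]}"; do
        "$runner_helper" destroy "$name" --force 2>/dev/null || true
    done
}
trap cleanup_cross_review_runners INT TERM

# ... dispatch models, wait, collect results ...

trap - INT TERM   # clear the trap before the function returns normally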

References
  1. Repository style guide rule #33 requires that temporary files (and by extension, temporary resources like runners) must have a trap for cleanup on RETURN or EXIT to prevent orphaned resources. (link)

Comment on lines +562 to +582
        (
            "$runner_helper" create "$runner_name" \
                --model "$model_tier" \
                --description "Cross-review: $model_tier" \
                --workdir "$workdir" 2>/dev/null || true

            local result_file="${output_dir}/${model_tier}.txt"
            "$runner_helper" run "$runner_name" "$prompt" \
                --model "$model_tier" \
                --timeout "$review_timeout" \
                --format json 2>/dev/null > "${output_dir}/${model_tier}.json" || true

            # Extract text response from JSON
            if [[ -f "${output_dir}/${model_tier}.json" ]]; then
                jq -r '.parts[]? | select(.type == "text") | .text' \
                    "${output_dir}/${model_tier}.json" 2>/dev/null > "$result_file" || true
            fi

            # Clean up runner
            "$runner_helper" destroy "$runner_name" --force 2>/dev/null || true
        ) &


Severity: high

The subshell for each model will always exit with status 0 because all commands within it are guarded with || true. This makes the wait check for failures in the main loop ineffective, as it won't detect when a model run fails. The script will incorrectly report 'done' even for failed runs.

This suggestion modifies the subshell to capture the exit status of the critical runner-helper run command and use it as the exit status for the entire subshell. This allows the wait command to correctly detect and report failures. This approach aligns with best practices for shell scripting, specifically by capturing command exit codes in variables for clearer error handling, as recommended by ShellCheck SC2181.

        (
            local run_status=0
            "$runner_helper" create "$runner_name" \
                --model "$model_tier" \
                --description "Cross-review: $model_tier" \
                --workdir "$workdir" 2>/dev/null || true

            local result_file="${output_dir}/${model_tier}.txt"
            "$runner_helper" run "$runner_name" "$prompt" \
                --model "$model_tier" \
                --timeout "$review_timeout" \
                --format json 2>/dev/null > "${output_dir}/${model_tier}.json" || run_status=$?

            # Extract text response from JSON
            if [[ -f "${output_dir}/${model_tier}.json" ]]; then
                jq -r '.parts[]? | select(.type == "text") | .text' \
                    "${output_dir}/${model_tier}.json" 2>/dev/null > "$result_file" || true
            fi

            # Clean up runner
            "$runner_helper" destroy "$runner_name" --force 2>/dev/null || true

            exit "$run_status"
        ) &
References
  1. In shell scripts, capture a command's exit code in a variable instead of using $? directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.

Comment on lines +646 to +648
echo "Diff (${model_names[0]} vs ${model_names[1]}):"
diff --unified=3 "$file_a" "$file_b" 2>/dev/null | head -100 || echo " (files are identical or diff unavailable)"
echo ""


Severity: medium

The current logic for displaying a diff is flawed. If files differ, diff exits with 1, but its output is piped to head, which usually exits 0, preventing the || echo ... fallback from running. If files are identical, diff exits 0 and produces no output, but no message is shown to the user. This makes it unclear whether files are identical or if an error occurred.

The suggested code explicitly checks the exit status of diff to correctly handle all three cases: identical files, different files, and an error during the diff operation. This approach aligns with best practices for shell scripting, specifically by capturing command exit codes in variables for clearer error handling and multi-way branching, as recommended by ShellCheck SC2181.

Suggested change

echo "Diff (${model_names[0]} vs ${model_names[1]}):"
diff_status=0
diff_output=$(diff --unified=3 "$file_a" "$file_b" 2>/dev/null) || diff_status=$?
if [[ $diff_status -eq 0 ]]; then
    echo " (files are identical)"
elif [[ $diff_status -eq 1 ]]; then
    echo "$diff_output" | head -n 100
else
    echo " (diff command failed)"
fi
echo ""
References
  1. In shell scripts, capture a command's exit code in a variable instead of using $? directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.
