Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
266 changes: 266 additions & 0 deletions .agents/scripts/compare-models-helper.sh
Original file line number Diff line number Diff line change
Expand Up @@ -484,6 +484,8 @@
echo " capabilities Show capability matrix"
echo " providers List supported providers and their models"
echo " discover Detect available providers from local config"
echo " score Record model comparison scores (from evaluation)"
echo " results View past comparison results and rankings"
echo " help Show this help"
echo ""
echo "Examples:"
Expand All @@ -497,6 +499,14 @@
echo " compare-models-helper.sh discover --list-models"
echo " compare-models-helper.sh discover --json"
echo ""
echo "Scoring examples:"
echo " compare-models-helper.sh score --task 'fix React bug' --type code \\"
echo " --model claude-sonnet-4 --correctness 9 --completeness 8 --quality 8 --clarity 9 --adherence 9 \\"
echo " --model gpt-4.1 --correctness 8 --completeness 7 --quality 7 --clarity 8 --adherence 8 \\"
echo " --winner claude-sonnet-4"
echo " compare-models-helper.sh results"
echo " compare-models-helper.sh results --model sonnet --limit 5"
echo ""
echo "Discover options:"
echo " --probe Verify API keys by calling provider endpoints"
echo " --list-models List live models from each verified provider"
Expand Down Expand Up @@ -861,6 +871,256 @@
return 0
}

# =============================================================================
# Comparison Scoring Framework
# =============================================================================
# Stores and retrieves model comparison results for cross-session insights.
# Results are stored in SQLite alongside the model registry.

RESULTS_DB="${AIDEVOPS_WORKSPACE_DIR:-$HOME/.aidevops/.agent-workspace}/memory/model-comparisons.db"

init_results_db() {
local db_dir
db_dir="$(dirname "$RESULTS_DB")"
mkdir -p "$db_dir"

sqlite3 "$RESULTS_DB" <<'SQL'
CREATE TABLE IF NOT EXISTS comparisons (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_description TEXT NOT NULL,
task_type TEXT DEFAULT 'general',
created_at TEXT DEFAULT (datetime('now')),
evaluator_model TEXT,
winner_model TEXT
);

CREATE TABLE IF NOT EXISTS comparison_scores (
id INTEGER PRIMARY KEY AUTOINCREMENT,
comparison_id INTEGER NOT NULL,
model_id TEXT NOT NULL,
correctness INTEGER DEFAULT 0,
completeness INTEGER DEFAULT 0,
code_quality INTEGER DEFAULT 0,
clarity INTEGER DEFAULT 0,
adherence INTEGER DEFAULT 0,
overall INTEGER DEFAULT 0,
latency_ms INTEGER DEFAULT 0,
tokens_used INTEGER DEFAULT 0,
strengths TEXT DEFAULT '',
weaknesses TEXT DEFAULT '',
response_file TEXT DEFAULT '',
FOREIGN KEY (comparison_id) REFERENCES comparisons(id)
);

CREATE INDEX IF NOT EXISTS idx_comparisons_task ON comparisons(task_type);
CREATE INDEX IF NOT EXISTS idx_comparisons_winner ON comparisons(winner_model);
CREATE INDEX IF NOT EXISTS idx_scores_model ON comparison_scores(model_id);
SQL
return 0
}
Comment on lines +882 to +920
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add a sqlite3 availability check for clear error feedback.

If sqlite3 isn't installed, the script will fail with a cryptic "command not found" error under set -e. Per the coding guidelines on reliability and clear logging, a pre-check with an actionable error message would be much friendlier.

🛡️ Proposed fix
 init_results_db() {
+    if ! command -v sqlite3 &>/dev/null; then
+        print_error "sqlite3 is required for comparison scoring but not found in PATH"
+        return 1
+    fi
     local db_dir
     db_dir="$(dirname "$RESULTS_DB")"
     mkdir -p "$db_dir"

As per coding guidelines, "Automation scripts - focus on: Clear logging and feedback, Error recovery mechanisms".

🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 882 - 920, The
init_results_db function currently assumes sqlite3 is present; add a pre-check
using command availability (e.g., verify sqlite3 with command -v or type) before
attempting to create the DB so you can print a clear actionable error and exit
if missing; reference init_results_db, RESULTS_DB, and sqlite3 when implementing
the check and ensure the script emits a readable error message (including that
sqlite3 is required) and returns a non-zero status instead of failing with a
cryptic "command not found".


# Record a comparison result
# Usage: cmd_score --task "description" --type "code" --evaluator "claude-opus-4" \
# --model "claude-sonnet-4" --correctness 9 --completeness 8 --quality 7 \
# --clarity 8 --adherence 9 --latency 1200 --tokens 500 \
# --strengths "Fast, accurate" --weaknesses "Verbose" \
# [--model "gpt-4.1" --correctness 8 ...]
cmd_score() {
init_results_db || return 1

local task="" task_type="general" evaluator="" winner=""
local -a model_entries=()
local current_model="" current_correct=0 current_complete=0 current_quality=0
local current_clarity=0 current_adherence=0 current_latency=0 current_tokens=0
local current_strengths="" current_weaknesses="" current_response=""

flush_model() {
if [[ -n "$current_model" ]]; then
local overall=$(( (current_correct + current_complete + current_quality + current_clarity + current_adherence) / 5 ))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This line has two issues:

  1. Style Guide Violation: It combines local with an assignment from a command substitution (arithmetic expansion). The repository style guide (line 11) requires separating declaration and assignment for exit code safety.
  2. Precision Loss: It uses integer division, which truncates the average score (e.g., 8.8 becomes 8), leading to inaccurate overall scores.

To fix both, you can separate the declaration and use bc for floating-point math. This would also require changing the overall column type to REAL in the init_results_db function.

Suggested change
local overall=$(( (current_correct + current_complete + current_quality + current_clarity + current_adherence) / 5 ))
local overall
overall=$(echo "scale=2; ($current_correct + $current_complete + $current_quality + $current_clarity + $current_adherence) / 5" | bc)
References
  1. The style guide requires declaring local variables and assigning them from command substitutions in separate steps to ensure exit code safety. (link)
  2. In shell scripts, capture a command's exit code in a variable instead of using $? directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.

model_entries+=("${current_model}|${current_correct}|${current_complete}|${current_quality}|${current_clarity}|${current_adherence}|${overall}|${current_latency}|${current_tokens}|${current_strengths}|${current_weaknesses}|${current_response}")
fi
current_model="" current_correct=0 current_complete=0 current_quality=0
current_clarity=0 current_adherence=0 current_latency=0 current_tokens=0
current_strengths="" current_weaknesses="" current_response=""
}
Comment on lines +937 to +945
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

flush_model missing explicit return and has SC2155.

Per coding guidelines, all functions should include explicit returns. Also, line 939 triggers ShellCheck SC2155 (declare and assign separately to avoid masking return values).

♻️ Proposed fix
     flush_model() {
         if [[ -n "$current_model" ]]; then
-            local overall=$(( (current_correct + current_complete + current_quality + current_clarity + current_adherence) / 5 ))
+            local overall
+            overall=$(( (current_correct + current_complete + current_quality + current_clarity + current_adherence) / 5 ))
             model_entries+=("${current_model}|${current_correct}|${current_complete}|${current_quality}|${current_clarity}|${current_adherence}|${overall}|${current_latency}|${current_tokens}|${current_strengths}|${current_weaknesses}|${current_response}")
         fi
         current_model="" current_correct=0 current_complete=0 current_quality=0
         current_clarity=0 current_adherence=0 current_latency=0 current_tokens=0
         current_strengths="" current_weaknesses="" current_response=""
+        return 0
     }

As per coding guidelines, "Run ShellCheck with zero violations on all scripts" and "Include explicit returns in shell scripts".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
flush_model() {
if [[ -n "$current_model" ]]; then
local overall=$(( (current_correct + current_complete + current_quality + current_clarity + current_adherence) / 5 ))
model_entries+=("${current_model}|${current_correct}|${current_complete}|${current_quality}|${current_clarity}|${current_adherence}|${overall}|${current_latency}|${current_tokens}|${current_strengths}|${current_weaknesses}|${current_response}")
fi
current_model="" current_correct=0 current_complete=0 current_quality=0
current_clarity=0 current_adherence=0 current_latency=0 current_tokens=0
current_strengths="" current_weaknesses="" current_response=""
}
flush_model() {
if [[ -n "$current_model" ]]; then
local overall
overall=$(( (current_correct + current_complete + current_quality + current_clarity + current_adherence) / 5 ))
model_entries+=("${current_model}|${current_correct}|${current_complete}|${current_quality}|${current_clarity}|${current_adherence}|${overall}|${current_latency}|${current_tokens}|${current_strengths}|${current_weaknesses}|${current_response}")
fi
current_model="" current_correct=0 current_complete=0 current_quality=0
current_clarity=0 current_adherence=0 current_latency=0 current_tokens=0
current_strengths="" current_weaknesses="" current_response=""
return 0
}
🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 937 - 945, The
flush_model function should declare and assign the local variable separately and
end with an explicit return; change "local overall=$((...))" to "local overall"
followed by "overall=$(( (current_correct + current_complete + current_quality +
current_clarity + current_adherence) / 5 ))", and break the combined post-if
assignments (current_model="", current_correct=0 ...) into separate statements
for clarity, then append an explicit "return 0" at the end of flush_model; keep
references to model_entries and current_* variables intact.


while [[ $# -gt 0 ]]; do
case "$1" in
--task) task="$2"; shift 2 ;;
--type) task_type="$2"; shift 2 ;;
--evaluator) evaluator="$2"; shift 2 ;;
--winner) winner="$2"; shift 2 ;;
--model) flush_model; current_model="$2"; shift 2 ;;
--correctness) current_correct="$2"; shift 2 ;;
--completeness) current_complete="$2"; shift 2 ;;
--quality) current_quality="$2"; shift 2 ;;
--clarity) current_clarity="$2"; shift 2 ;;
--adherence) current_adherence="$2"; shift 2 ;;
--latency) current_latency="$2"; shift 2 ;;
--tokens) current_tokens="$2"; shift 2 ;;
--strengths) current_strengths="$2"; shift 2 ;;
--weaknesses) current_weaknesses="$2"; shift 2 ;;
--response) current_response="$2"; shift 2 ;;
*) shift ;;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Silent discard of unrecognized flags can mask typos.

*) shift ;; silently ignores unknown arguments. A typo like --corectness 9 would be eaten without warning, resulting in a default score of 0 for that criterion — a data-integrity trap that's hard to debug.

🛡️ Proposed fix — warn on unknown flags
-            *) shift ;;
+            *) print_warning "Unknown option ignored: $1"; shift ;;

As per coding guidelines, "Automation scripts - focus on: Clear logging and feedback".

🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh at line 964, The wildcard case in
the argument parsing (the pattern '*) shift ;;') silently swallows unknown
flags; replace it with a handler that prints a clear error to stderr including
the offending token (e.g., "Unknown option: $1") and exits non-zero so typos
fail fast; update the case block that contains '*) shift ;;' to use a
warning/error message directed to stderr (using >&2) and call exit 1 (or, if
intended to continue, at minimum print a warning and shift) so unrecognized
arguments are not silently discarded.

esac
done
Comment on lines +947 to +966
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate numeric inputs to prevent arithmetic failures and corrupt data.

Score values (--correctness, --completeness, etc.) are accepted raw and used directly in integer arithmetic (line 939) and SQL inserts. Non-numeric or out-of-range input will cause a silent arithmetic error or insert garbage into the database. A simple validation helper would catch this early with a clear message.

🛡️ Proposed validation helper
+    # Validate integer in range
+    validate_score() {
+        local name="$1" val="$2"
+        if ! [[ "$val" =~ ^[0-9]+$ ]] || [[ "$val" -lt 1 || "$val" -gt 10 ]]; then
+            print_error "--$name must be an integer between 1 and 10 (got: $val)"
+            return 1
+        fi
+        return 0
+    }
+
     while [[ $# -gt 0 ]]; do
         case "$1" in
             --task) task="$2"; shift 2 ;;
             --type) task_type="$2"; shift 2 ;;
             --evaluator) evaluator="$2"; shift 2 ;;
             --winner) winner="$2"; shift 2 ;;
             --model) flush_model; current_model="$2"; shift 2 ;;
-            --correctness) current_correct="$2"; shift 2 ;;
-            --completeness) current_complete="$2"; shift 2 ;;
-            --quality) current_quality="$2"; shift 2 ;;
-            --clarity) current_clarity="$2"; shift 2 ;;
-            --adherence) current_adherence="$2"; shift 2 ;;
+            --correctness) validate_score correctness "$2" || return 1; current_correct="$2"; shift 2 ;;
+            --completeness) validate_score completeness "$2" || return 1; current_complete="$2"; shift 2 ;;
+            --quality) validate_score quality "$2" || return 1; current_quality="$2"; shift 2 ;;
+            --clarity) validate_score clarity "$2" || return 1; current_clarity="$2"; shift 2 ;;
+            --adherence) validate_score adherence "$2" || return 1; current_adherence="$2"; shift 2 ;;

As per coding guidelines, "Automation scripts - focus on: Reliability and robustness, Clear logging and feedback".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while [[ $# -gt 0 ]]; do
case "$1" in
--task) task="$2"; shift 2 ;;
--type) task_type="$2"; shift 2 ;;
--evaluator) evaluator="$2"; shift 2 ;;
--winner) winner="$2"; shift 2 ;;
--model) flush_model; current_model="$2"; shift 2 ;;
--correctness) current_correct="$2"; shift 2 ;;
--completeness) current_complete="$2"; shift 2 ;;
--quality) current_quality="$2"; shift 2 ;;
--clarity) current_clarity="$2"; shift 2 ;;
--adherence) current_adherence="$2"; shift 2 ;;
--latency) current_latency="$2"; shift 2 ;;
--tokens) current_tokens="$2"; shift 2 ;;
--strengths) current_strengths="$2"; shift 2 ;;
--weaknesses) current_weaknesses="$2"; shift 2 ;;
--response) current_response="$2"; shift 2 ;;
*) shift ;;
esac
done
# Validate integer in range
validate_score() {
local name="$1" val="$2"
if ! [[ "$val" =~ ^[0-9]+$ ]] || [[ "$val" -lt 1 || "$val" -gt 10 ]]; then
print_error "--$name must be an integer between 1 and 10 (got: $val)"
return 1
fi
return 0
}
while [[ $# -gt 0 ]]; do
case "$1" in
--task) task="$2"; shift 2 ;;
--type) task_type="$2"; shift 2 ;;
--evaluator) evaluator="$2"; shift 2 ;;
--winner) winner="$2"; shift 2 ;;
--model) flush_model; current_model="$2"; shift 2 ;;
--correctness) validate_score correctness "$2" || return 1; current_correct="$2"; shift 2 ;;
--completeness) validate_score completeness "$2" || return 1; current_complete="$2"; shift 2 ;;
--quality) validate_score quality "$2" || return 1; current_quality="$2"; shift 2 ;;
--clarity) validate_score clarity "$2" || return 1; current_clarity="$2"; shift 2 ;;
--adherence) validate_score adherence "$2" || return 1; current_adherence="$2"; shift 2 ;;
--latency) current_latency="$2"; shift 2 ;;
--tokens) current_tokens="$2"; shift 2 ;;
--strengths) current_strengths="$2"; shift 2 ;;
--weaknesses) current_weaknesses="$2"; shift 2 ;;
--response) current_response="$2"; shift 2 ;;
*) shift ;;
esac
done
🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 947 - 966, Add
validation for the numeric score flags (current_correct, current_complete,
current_quality, current_clarity, current_adherence, current_latency,
current_tokens) right after argument parsing: implement a helper function (e.g.,
validate_score) that checks each variable is an integer (or numeric as
appropriate) and within expected ranges, logs a clear error and exits on
failure; call this helper before any arithmetic or SQL insertion (used later
around the arithmetic and DB insert logic) and ensure flush_model/current_model
parsing still occurs unaffected.

flush_model

if [[ -z "$task" ]]; then
echo "Usage: compare-models-helper.sh score --task 'description' --model 'model-id' --correctness N ..."
echo ""
echo "Score criteria (1-10 scale):"
echo " --correctness Factual accuracy and correctness"
echo " --completeness Coverage of all requirements"
echo " --quality Code quality (if code task)"
echo " --clarity Response clarity and readability"
echo " --adherence Following instructions precisely"
echo ""
echo "Metadata:"
echo " --task <desc> Task description (required)"
echo " --type <type> Task type: code, text, analysis, design (default: general)"
echo " --evaluator <model> Model that performed the evaluation"
echo " --winner <model> Overall winner model"
echo " --model <id> Start scoring for a model (repeat for each model)"
echo " --latency <ms> Response latency in milliseconds"
echo " --tokens <n> Tokens used"
echo " --strengths <text> Model strengths for this task"
echo " --weaknesses <text> Model weaknesses for this task"
echo " --response <file> Path to response file"
return 1
fi

if [[ ${#model_entries[@]} -eq 0 ]]; then
print_error "No model scores provided. Use --model <id> --correctness N ..."
return 1
fi

# Insert comparison record
local comp_id
comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('$(echo "$task" | sed "s/'/''/g")', '$task_type', '$evaluator', '$winner'); SELECT last_insert_rowid();")

Check warning on line 1000 in .agents/scripts/compare-models-helper.sh

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

.agents/scripts/compare-models-helper.sh#L1000

See if you can use ${variable//search/replace} instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

This SQL query is vulnerable to SQL injection. The variables $task_type, $evaluator, and $winner are inserted directly into the query string without escaping. An attacker could provide malicious input to these parameters, compromising the database. To fix this, you should escape single quotes in these variables, similar to how you've handled the $task variable.

Suggested change
comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('$(echo "$task" | sed "s/'/''/g")', '$task_type', '$evaluator', '$winner'); SELECT last_insert_rowid();")
comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('$(echo "$task" | sed "s/'/''/g")', '$(echo "$task_type" | sed "s/'/''/g")', '$(echo "$evaluator" | sed "s/'/''/g")', '$(echo "$winner" | sed "s/'/''/g")'); SELECT last_insert_rowid();")
References
  1. The style guide recommends using parameterized queries where possible to prevent SQL injection. When not possible, all user-supplied data must be properly escaped. (link)
  2. For standalone shell scripts, it is acceptable to duplicate simple, self-contained helper functions (e.g., a cross-platform sed wrapper) instead of introducing source dependencies. This maintains script independence and avoids risks like path resolution issues, which is particularly important in focused bugfix pull requests.


# Insert scores for each model
for entry in "${model_entries[@]}"; do
IFS='|' read -r m_id m_cor m_com m_qua m_cla m_adh m_ove m_lat m_tok m_str m_wea m_res <<< "$entry"
sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '$m_id', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '$(echo "$m_str" | sed "s/'/''/g")', '$(echo "$m_wea" | sed "s/'/''/g")', '$(echo "$m_res" | sed "s/'/''/g")');"

Check warning on line 1005 in .agents/scripts/compare-models-helper.sh

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

.agents/scripts/compare-models-helper.sh#L1005

See if you can use ${variable//search/replace} instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

This query is also vulnerable to SQL injection. The variable $m_id is not escaped before being inserted into the SQL string. While other text variables in this query are escaped, this one was missed. All user-controllable string literals in an SQL query must be properly escaped to prevent security vulnerabilities.

Suggested change
sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '$m_id', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '$(echo "$m_str" | sed "s/'/''/g")', '$(echo "$m_wea" | sed "s/'/''/g")', '$(echo "$m_res" | sed "s/'/''/g")');"
sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '$(echo "$m_id" | sed "s/'/''/g")', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '$(echo "$m_str" | sed "s/'/''/g")', '$(echo "$m_wea" | sed "s/'/''/g")', '$(echo "$m_res" | sed "s/'/''/g")');"
References
  1. The style guide recommends using parameterized queries where possible to prevent SQL injection. When not possible, all user-supplied data must be properly escaped. (link)
  2. For standalone shell scripts, it is acceptable to duplicate simple, self-contained helper functions (e.g., a cross-platform sed wrapper) instead of introducing source dependencies. This maintains script independence and avoids risks like path resolution issues, which is particularly important in focused bugfix pull requests.

done
Comment on lines +998 to +1006
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

SQL injection — user-controlled values interpolated without escaping.

On line 1000, only $task is escaped via sed "s/'/''/g", but $task_type, $evaluator, and $winner are interpolated raw into the SQL string. On line 1005, $m_id is also unescaped. A malicious or simply apostrophe-containing value (e.g., --winner "it's a tie") breaks the query or allows injection.

Apply the same sed "s/'/''/g" escaping consistently to all string-interpolated values, or better yet, use a parameterized approach.

🔒 Proposed fix — consistent escaping
     # Insert comparison record
     local comp_id
-    comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('$(echo "$task" | sed "s/'/''/g")', '$task_type', '$evaluator', '$winner'); SELECT last_insert_rowid();")
+    local esc_task esc_type esc_eval esc_winner
+    esc_task="${task//\'/\'\'}"
+    esc_type="${task_type//\'/\'\'}"
+    esc_eval="${evaluator//\'/\'\'}"
+    esc_winner="${winner//\'/\'\'}"
+    comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons (task_description, task_type, evaluator_model, winner_model) VALUES ('$esc_task', '$esc_type', '$esc_eval', '$esc_winner'); SELECT last_insert_rowid();")
 
     # Insert scores for each model
     for entry in "${model_entries[@]}"; do
         IFS='|' read -r m_id m_cor m_com m_qua m_cla m_adh m_ove m_lat m_tok m_str m_wea m_res <<< "$entry"
-        sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '$m_id', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '$(echo "$m_str" | sed "s/'/''/g")', '$(echo "$m_wea" | sed "s/'/''/g")', '$(echo "$m_res" | sed "s/'/''/g")');"
+        local esc_mid="${m_id//\'/\'\'}"
+        local esc_str="${m_str//\'/\'\'}"
+        local esc_wea="${m_wea//\'/\'\'}"
+        local esc_res="${m_res//\'/\'\'}"
+        sqlite3 "$RESULTS_DB" "INSERT INTO comparison_scores (comparison_id, model_id, correctness, completeness, code_quality, clarity, adherence, overall, latency_ms, tokens_used, strengths, weaknesses, response_file) VALUES ($comp_id, '$esc_mid', $m_cor, $m_com, $m_qua, $m_cla, $m_adh, $m_ove, $m_lat, $m_tok, '$esc_str', '$esc_wea', '$esc_res');"
     done
🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 998 - 1006, The SQL
insertion uses unescaped, user-controlled variables (comp_id creation and the
comparison_scores INSERT inside the model_entries loop); fix by consistently
escaping all string interpolations (not just task) before embedding into
SQL—e.g., sanitize task_type, evaluator, winner, and m_id and any string columns
(m_str, m_wea, m_res) with the same sed "s/'/''/g" pattern or, better, switch
the INSERT logic in the comp_id and comparison_scores blocks to a parameterized/
prepared statement approach; update the code around comp_id assignment and the
for-loop that reads model_entries to use the sanitized variables (or bound
parameters) when calling sqlite3 so apostrophes and injection vectors are
handled safely.

Comment on lines +999 to +1006
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

No error handling on SQLite insert operations.

If the sqlite3 call on line 1000 fails (permissions, disk full, corrupted DB), comp_id will be empty and the subsequent inserts on line 1005 will either fail or insert with an invalid comparison_id. Under set -e the script exits abruptly with no user-facing diagnostic.

🛡️ Proposed fix
     local comp_id
-    comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons ...")
+    comp_id=$(sqlite3 "$RESULTS_DB" "INSERT INTO comparisons ...") || {
+        print_error "Failed to write comparison to database: $RESULTS_DB"
+        return 1
+    }
+    if [[ -z "$comp_id" ]]; then
+        print_error "Failed to obtain comparison ID from database"
+        return 1
+    fi

As per coding guidelines, "Automation scripts - focus on: Error recovery mechanisms, Clear logging and feedback".

🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 999 - 1006, The
sqlite3 INSERT that sets comp_id can fail and leave comp_id empty causing
invalid downstream inserts into comparison_scores; update the block around
comp_id and the for-loop to check the sqlite3 command exit status and validate
comp_id is a non-empty numeric value before proceeding: after running the
sqlite3 INSERT into comparisons capture both stdout and stderr (or check $?),
verify comp_id is not empty and consists only of digits, log a clear error to
stderr including $RESULTS_DB and the sqlite3 error output if it failed, and exit
or skip the subsequent loop if validation fails so comparison_scores inserts
only run when comp_id is valid; reference symbols: comp_id, RESULTS_DB,
model_entries, comparison_scores, and the sqlite3 INSERT commands.


print_success "Comparison #$comp_id recorded ($task_type: ${#model_entries[@]} models scored)"

# Display summary table
echo ""
printf "%-22s %5s %5s %5s %5s %5s %7s %8s %6s\n" \
"Model" "Corr" "Comp" "Qual" "Clar" "Adhr" "Overall" "Latency" "Tokens"
printf "%-22s %5s %5s %5s %5s %5s %7s %8s %6s\n" \
"-----" "----" "----" "----" "----" "----" "-------" "-------" "------"

for entry in "${model_entries[@]}"; do
IFS='|' read -r m_id m_cor m_com m_qua m_cla m_adh m_ove m_lat m_tok _ _ _ <<< "$entry"
local lat_fmt="${m_lat}ms"
[[ "$m_lat" -eq 0 ]] && lat_fmt="-"
[[ "$m_tok" -eq 0 ]] && m_tok="-"
printf "%-22s %5d %5d %5d %5d %5d %7d %8s %6s\n" \
"$m_id" "$m_cor" "$m_com" "$m_qua" "$m_cla" "$m_adh" "$m_ove" "$lat_fmt" "$m_tok"
done

if [[ -n "$winner" ]]; then
echo ""
echo " Winner: $winner"
fi
echo ""

return 0
}

# View past comparison results
cmd_results() {
init_results_db || return 1

local limit=10
local model_filter="" type_filter=""

while [[ $# -gt 0 ]]; do
case "$1" in
--limit) limit="$2"; shift 2 ;;
--model) model_filter="$2"; shift 2 ;;
--type) type_filter="$2"; shift 2 ;;
*) shift ;;
esac
done

local where_clause=""
if [[ -n "$model_filter" ]]; then
where_clause="WHERE cs.model_id LIKE '%${model_filter}%'"
fi
if [[ -n "$type_filter" ]]; then
if [[ -n "$where_clause" ]]; then
where_clause="$where_clause AND c.task_type = '$type_filter'"
else
where_clause="WHERE c.task_type = '$type_filter'"
fi
fi
Comment on lines +1052 to +1061

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The construction of where_clause is vulnerable to SQL injection. The variables $model_filter and $type_filter are taken from user input and inserted directly into the SQL query without escaping. This could allow an attacker to manipulate the query. You should sanitize these variables before using them in the query.

Suggested change
if [[ -n "$model_filter" ]]; then
where_clause="WHERE cs.model_id LIKE '%${model_filter}%'"
fi
if [[ -n "$type_filter" ]]; then
if [[ -n "$where_clause" ]]; then
where_clause="$where_clause AND c.task_type = '$type_filter'"
else
where_clause="WHERE c.task_type = '$type_filter'"
fi
fi
if [[ -n "$model_filter" ]]; then
local sanitized_model_filter
sanitized_model_filter=$(echo "$model_filter" | sed "s/'/''/g")
where_clause="WHERE cs.model_id LIKE '%${sanitized_model_filter}%'"
fi
if [[ -n "$type_filter" ]]; then
local sanitized_type_filter
sanitized_type_filter=$(echo "$type_filter" | sed "s/'/''/g")
if [[ -n "$where_clause" ]]; then
where_clause="$where_clause AND c.task_type = '$sanitized_type_filter'"
else
where_clause="WHERE c.task_type = '$sanitized_type_filter'"
fi
fi
References
  1. The style guide recommends using parameterized queries where possible to prevent SQL injection. When not possible, all user-supplied data must be properly escaped. (link)
  2. For standalone shell scripts, it is acceptable to duplicate simple, self-contained helper functions (e.g., a cross-platform sed wrapper) instead of introducing source dependencies. This maintains script independence and avoids risks like path resolution issues, which is particularly important in focused bugfix pull requests.
  3. In shell scripts, capture a command's exit code in a variable instead of using $? directly in conditionals. This aligns with ShellCheck SC2181 and improves clarity for multi-way branches.

Comment on lines +1051 to +1061
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

SQL injection in filter construction — model_filter, type_filter, and limit interpolated raw.

model_filter and type_filter are placed directly into the SQL WHERE clause without escaping. limit on line 1083 is also unvalidated. All three accept arbitrary user input via --model, --type, and --limit.

🔒 Proposed fix
+    # Validate limit is a positive integer
+    if ! [[ "$limit" =~ ^[0-9]+$ ]]; then
+        print_error "--limit must be a positive integer"
+        return 1
+    fi
+
     local where_clause=""
     if [[ -n "$model_filter" ]]; then
-        where_clause="WHERE cs.model_id LIKE '%${model_filter}%'"
+        local esc_model="${model_filter//\'/\'\'}"
+        where_clause="WHERE cs.model_id LIKE '%${esc_model}%'"
     fi
     if [[ -n "$type_filter" ]]; then
+        local esc_type="${type_filter//\'/\'\'}"
         if [[ -n "$where_clause" ]]; then
-            where_clause="$where_clause AND c.task_type = '$type_filter'"
+            where_clause="$where_clause AND c.task_type = '$esc_type'"
         else
-            where_clause="WHERE c.task_type = '$type_filter'"
+            where_clause="WHERE c.task_type = '$esc_type'"
         fi
     fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
local where_clause=""
if [[ -n "$model_filter" ]]; then
where_clause="WHERE cs.model_id LIKE '%${model_filter}%'"
fi
if [[ -n "$type_filter" ]]; then
if [[ -n "$where_clause" ]]; then
where_clause="$where_clause AND c.task_type = '$type_filter'"
else
where_clause="WHERE c.task_type = '$type_filter'"
fi
fi
# Validate limit is a positive integer
if ! [[ "$limit" =~ ^[0-9]+$ ]]; then
print_error "--limit must be a positive integer"
return 1
fi
local where_clause=""
if [[ -n "$model_filter" ]]; then
local esc_model="${model_filter//\'/\'\'}"
where_clause="WHERE cs.model_id LIKE '%${esc_model}%'"
fi
if [[ -n "$type_filter" ]]; then
local esc_type="${type_filter//\'/\'\'}"
if [[ -n "$where_clause" ]]; then
where_clause="$where_clause AND c.task_type = '$esc_type'"
else
where_clause="WHERE c.task_type = '$esc_type'"
fi
fi
🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 1051 - 1061, The WHERE
construction interpolates untrusted vars (model_filter, type_filter) and limit
directly into SQL, enabling injection; fix by sanitizing/validating inputs
before building where_clause and limit usage: escape single quotes and
percent/wildcard characters in model_filter (used in LIKE) and escape single
quotes in type_filter, or better use parameterized queries if the DB client
supports them, and validate limit is an integer (reject or default if not);
update the code paths that set where_clause (variables model_filter,
type_filter, where_clause) and wherever limit is used to perform these
checks/escapes before concatenation.


echo ""
echo "Model Comparison Results (last $limit)"
echo "======================================="
echo ""

local count
count=$(sqlite3 "$RESULTS_DB" "SELECT COUNT(DISTINCT c.id) FROM comparisons c LEFT JOIN comparison_scores cs ON c.id = cs.comparison_id $where_clause;" 2>/dev/null || echo "0")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This sqlite3 command suppresses errors using 2>/dev/null. This pattern is repeated on lines 1084, 1096, and 1116, and it violates the repository style guide (line 50), which disallows blanket error suppression. Please remove 2>/dev/null from these calls to allow database errors to be visible for debugging. The existing || guards and pipes will still handle command failures gracefully.

Suggested change
count=$(sqlite3 "$RESULTS_DB" "SELECT COUNT(DISTINCT c.id) FROM comparisons c LEFT JOIN comparison_scores cs ON c.id = cs.comparison_id $where_clause;" 2>/dev/null || echo "0")
count=$(sqlite3 "$RESULTS_DB" "SELECT COUNT(DISTINCT c.id) FROM comparisons c LEFT JOIN comparison_scores cs ON c.id = cs.comparison_id $where_clause;" || echo "0")
References
  1. The style guide prohibits blanket error suppression with 2>/dev/null. Errors should be visible for debugging or redirected to a log file. (link)


if [[ "$count" -eq 0 ]]; then
echo "No comparison results found."
echo "Run a comparison first: compare-models-helper.sh score --task '...' --model '...' ..."
echo ""
return 0
fi

# Show recent comparisons
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT c.id, c.created_at, c.task_type, c.task_description, c.winner_model
FROM comparisons c
ORDER BY c.created_at DESC
LIMIT $limit;
" 2>/dev/null | while IFS='|' read -r cid cdate ctype cdesc cwinner; do
echo " #$cid [$ctype] $(echo "$cdesc" | head -c 60) ($cdate)"
if [[ -n "$cwinner" ]]; then
echo " Winner: $cwinner"
fi

# Show scores for this comparison
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT model_id, overall, correctness, completeness, code_quality, clarity, adherence
FROM comparison_scores
WHERE comparison_id = $cid
ORDER BY overall DESC;
" 2>/dev/null | while IFS='|' read -r mid ov co cm cq cl ca; do
printf " %-20s overall:%d (corr:%d comp:%d qual:%d clar:%d adhr:%d)\n" \
"$mid" "$ov" "$co" "$cm" "$cq" "$cl" "$ca"
done
echo ""
done
Comment on lines +1079 to +1101
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: --model / --type filters not applied to the recent comparisons listing.

The where_clause is applied to the count query (line 1069) and the aggregate rankings query (line 1106), but the recent comparisons listing here queries comparisons without any filter. Running results --model sonnet will show all comparisons in this section while only showing filtered aggregates below — inconsistent and confusing.

🐛 Proposed fix — apply filter to recent comparisons query
     # Show recent comparisons
     sqlite3 -separator '|' "$RESULTS_DB" "
-        SELECT c.id, c.created_at, c.task_type, c.task_description, c.winner_model
-        FROM comparisons c
-        ORDER BY c.created_at DESC
+        SELECT DISTINCT c.id, c.created_at, c.task_type, c.task_description, c.winner_model
+        FROM comparisons c
+        LEFT JOIN comparison_scores cs ON c.id = cs.comparison_id
+        $where_clause
+        ORDER BY c.created_at DESC
         LIMIT $limit;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT c.id, c.created_at, c.task_type, c.task_description, c.winner_model
FROM comparisons c
ORDER BY c.created_at DESC
LIMIT $limit;
" 2>/dev/null | while IFS='|' read -r cid cdate ctype cdesc cwinner; do
echo " #$cid [$ctype] $(echo "$cdesc" | head -c 60) ($cdate)"
if [[ -n "$cwinner" ]]; then
echo " Winner: $cwinner"
fi
# Show scores for this comparison
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT model_id, overall, correctness, completeness, code_quality, clarity, adherence
FROM comparison_scores
WHERE comparison_id = $cid
ORDER BY overall DESC;
" 2>/dev/null | while IFS='|' read -r mid ov co cm cq cl ca; do
printf " %-20s overall:%d (corr:%d comp:%d qual:%d clar:%d adhr:%d)\n" \
"$mid" "$ov" "$co" "$cm" "$cq" "$cl" "$ca"
done
echo ""
done
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT DISTINCT c.id, c.created_at, c.task_type, c.task_description, c.winner_model
FROM comparisons c
LEFT JOIN comparison_scores cs ON c.id = cs.comparison_id
$where_clause
ORDER BY c.created_at DESC
LIMIT $limit;
" 2>/dev/null | while IFS='|' read -r cid cdate ctype cdesc cwinner; do
echo " #$cid [$ctype] $(echo "$cdesc" | head -c 60) ($cdate)"
if [[ -n "$cwinner" ]]; then
echo " Winner: $cwinner"
fi
# Show scores for this comparison
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT model_id, overall, correctness, completeness, code_quality, clarity, adherence
FROM comparison_scores
WHERE comparison_id = $cid
ORDER BY overall DESC;
" 2>/dev/null | while IFS='|' read -r mid ov co cm cq cl ca; do
printf " %-20s overall:%d (corr:%d comp:%d qual:%d clar:%d adhr:%d)\n" \
"$mid" "$ov" "$co" "$cm" "$cq" "$cl" "$ca"
done
echo ""
done
🤖 Prompt for AI Agents
In @.agents/scripts/compare-models-helper.sh around lines 1079 - 1101, The
recent comparisons query is missing the filter stored in where_clause so
--model/--type aren't applied; modify the sqlite3 SELECT that reads from the
comparisons table (the block that SELECTs c.id, c.created_at, c.task_type,
c.task_description, c.winner_model) to include the same where_clause used
elsewhere by adding the variable (e.g. append "$where_clause" or
${where_clause}) after "FROM comparisons c" and before "ORDER BY", ensuring
proper spacing/quoting so the shell expands it into the SQL statement.


# Show aggregate model rankings
echo "Aggregate Model Rankings"
echo "------------------------"
sqlite3 -separator '|' "$RESULTS_DB" "
SELECT model_id,
COUNT(*) as comparisons,
ROUND(AVG(overall), 1) as avg_overall,
SUM(CASE WHEN c.winner_model = cs.model_id THEN 1 ELSE 0 END) as wins
FROM comparison_scores cs
JOIN comparisons c ON c.id = cs.comparison_id
$where_clause
GROUP BY model_id
ORDER BY avg_overall DESC;
" 2>/dev/null | while IFS='|' read -r mid cnt avg wins; do
printf " %-22s avg:%s wins:%s/%s\n" "$mid" "$avg" "$wins" "$cnt"
done
echo ""

return 0
}

# =============================================================================
# Main
# =============================================================================
Expand Down Expand Up @@ -894,6 +1154,12 @@
discover)
cmd_discover "$@"
;;
score)
cmd_score "$@"
;;
results)
cmd_results "$@"
;;
help|--help|-h)
cmd_help
;;
Expand Down
Loading