t1012: MODELS.md — live model leaderboard with success rates by marcusquinn · Pull Request #1305 · marcusquinn/aidevops

marcusquinn · 2026-02-12T21:54:29Z

Summary

Adds generate-models-md.sh — queries three SQLite databases (model-registry, pattern-tracker, response-scoring) to produce a live Markdown leaderboard
Generates MODELS.md in repo root showing: all 17 available models across 6 providers, success rates by model tier and task type, quality scores from response-scoring evaluations, and head-to-head contest results
Adds supervisor pulse Phase 12 to auto-regenerate MODELS.md hourly when pattern data changes (throttled, auto-commits if content changed)
Registers script in subagent-index.toon

Output Format

Available Models: full catalog with provider, tier, context window, pricing
Routing Tiers: active model assignments per dispatch tier
Performance Leaderboard: success rates from autonomous task execution
By Task Type: breakdown across feature/bugfix/refactor/etc.
Contest Results: quality scores and head-to-head wins from response-scoring DB

Summary by CodeRabbit

New Features
- Introduced an auto-generated MODELS.md leaderboard displaying model performance metrics, costs, and routing tiers.
- Added periodic leaderboard regeneration (hourly throttle) that aggregates performance data, task-type breakdowns, and contest results.
- Leaderboard includes success rates, response scoring, and model availability information across all dispatched tasks.

…1012) Queries three SQLite databases (model-registry, pattern-tracker, response-scoring) to produce a Markdown leaderboard showing all available models, success rates by tier and task type, quality scores, and head-to-head contest results.

Generated from live data: 17 models across 6 providers, 487 pattern data points, 18 scored responses. Shows success rates by model tier and task type.

gemini-code-assist · 2026-02-12T21:54:32Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-02-12T21:54:38Z

Walkthrough

Introduces a new Bash script that auto-generates a MODELS.md leaderboard by aggregating model registry, pattern-tracker, and response-scoring database data. Integrates periodic regeneration into the supervisor pulse cycle with hourly throttling. Updates manifest to export the new script.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Models Leaderboard Generation (`.agents/scripts/generate-models-md.sh`) | New script with strict error handling, 10 functions for catalog, routing tiers, performance leaderboard, task-type breakdown, contest results, and stats aggregation from three SQLite databases; includes logging helpers, repo-root detection, and command-line option parsing (--output, --quiet). |
| Supervisor Integration (`.agents/scripts/supervisor-helper.sh`) | Adds Phase 12 (t1012) to cmd_pulse for time-throttled MODELS.md regeneration (max once per hour); invokes generate-models-md.sh per repo root discovered from supervisor DB, updates models-md-last-regen timestamp on success. |
| Manifest Updates (`.agents/subagent-index.toon`) | Bumps TOON:scripts index from [77] to [78]; registers generate-models-md.sh as new exported public script with help flag support. |
| Generated Output (`MODELS.md`) | Auto-generated leaderboard document with metadata, model catalog table, routing tiers, performance metrics by success rate and task type, contest quality scores, and timestamps; marked for non-manual editing. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

t1012: MODELS.md — live model leaderboard with success rates from pattern tracker #1302 — Directly implements the models leaderboard generation feature with SQLite aggregation from pattern-tracker and response-scoring systems.

Possibly related PRs

feat: response comparison and scoring framework for model evaluation (t168.3) #773 — Introduces response-scoring SQLite database that the new generate-models-md.sh script queries and aggregates for contest results.
feat: add supervisor-helper.sh with SQLite schema and state machine (t128.1) #376 — Implements supervisor-helper.sh itself; this PR adds a new orchestration phase to that same script.
feat: Claude-Flow inspired features - model routing, semantic memory, pattern tracking (t102) #341 — Implements pattern-tracker system that serves as a data source for the leaderboard aggregation.

Poem

📊 From three databases deep, a leaderboard awakes,
SQLite treasures mined by script's precise hand—
Each model ranked with contests won, and patterns traced,
Supervisor's pulse beats steady, hourly regenerates.
A MODELS.md manifest of champion designs! 🏆

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (5 files): ⚔️ `.agents/scripts/issue-sync-helper.sh` (content) ⚔️ `.agents/scripts/supervisor-helper.sh` (content) ⚔️ `.agents/scripts/supervisor/issue-sync.sh` (content) ⚔️ `.agents/subagent-index.toon` (content) ⚔️ `TODO.md` (content). These conflicts must be resolved before merging into `main`. | Resolve conflicts locally and push changes to this branch. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly summarizes the main change: introduction of a live, auto-generated MODELS.md leaderboard featuring model information and success rate metrics. |


github-actions · 2026-02-12T21:54:59Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 0 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 21:54:55 UTC 2026: Code review monitoring started
Thu Feb 12 21:54:55 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 0

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 0
VULNERABILITIES: 0

Generated on: Thu Feb 12 21:54:58 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring


github-actions · 2026-02-12T22:03:48Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 0 code smells

[INFO] Recent monitoring activity:
Thu Feb 12 22:03:44 UTC 2026: Code review monitoring started
Thu Feb 12 22:03:44 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 0

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 0
VULNERABILITIES: 0

Generated on: Thu Feb 12 22:03:47 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist · 2026-02-12T22:04:07Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

sonarqubecloud · 2026-02-12T22:04:38Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 4

🤖 Fix all issues with AI agents

In @.agents/scripts/generate-models-md.sh:
- Around lines 336-358: The current AVG(...) * (1.0/0.25) calculation in the sqlite query assumes each response has all four criteria and miscomputes when a criterion is missing; change the query to compute a per-response weighted sum in a subquery (e.g., alias it ws with columns response_id and weighted_score computed by SUM(CASE WHEN ... THEN score * weight END) GROUP BY response_id), JOIN that ws back to responses r on r.response_id, then use AVG(ws.weighted_score) (formatted with printf) and ORDER BY AVG(ws.weighted_score) DESC; update the variable reads (model, responses, avg_score, avg_time) accordingly so avg_score comes from the averaged ws.weighted_score.
- Around lines 492-497: The --output case in the argument-parsing loop uses "$2" without guarding that a value exists, which causes an unbound variable error under set -u; update the case branch that assigns OUTPUT_PATH to first verify that a next argument exists and is not another option (e.g., check that $# -ge 2 and that "$2" does not start with '-'), and if the value is missing print a clear error to stderr and exit with a non-zero code; adjust the case for "--output" (and any similar flags) to perform this validation before assigning OUTPUT_PATH and shifting.

In @.agents/scripts/supervisor-helper.sh:
- Around lines 11008-11045: The stamp update is currently unconditional; change this so the stamp (models_md_stamp) is only written when at least one repository was successfully regenerated. Introduce a local flag (e.g., models_md_succeeded=0) before iterating models_repos and set it to 1 inside the branch where "$generate_script" succeeds, whether or not git -C "$models_repo_root" diff then reports a change to MODELS.md; after the loop, only echo "$models_md_now" > "$models_md_stamp" if models_md_succeeded is 1. Keep all existing logging/commit/push behavior, but do not touch the stamp when every regeneration failed.
- Around lines 11049-11057: Remove the orphaned else/fi block and the stray record_throttle call that follow the Phase 12 MODELS.md logic so the throttle path is a single coherent if/else branch; specifically, edit the Phase 12 section around the log_warn/log_verbose outputs and remove the dangling else/fi and the subsequent record_throttle "$models_md_throttle_key" call so that only the intended throttle handling remains and the script parses correctly (ensure functions/commands referenced like record_throttle, log_warn, and log_verbose remain in their proper conditional blocks).

🧹 Nitpick comments (4)

.agents/scripts/generate-models-md.sh (4)
71-80: SQL injection risk in db_has_data — $table_name is interpolated directly into the query.

While currently called only with internal hardcoded table names, this is a fragile pattern. If this utility function is ever reused with user-supplied input, it becomes exploitable. Consider validating the table name against [a-zA-Z_] or using sqlite3's .tables to confirm existence first.
🛡️ Suggested hardening
```diff
 db_has_data() {
 	local db_path="$1"
 	local table_name="$2"
 
+	# Validate table name contains only safe characters
+	if [[ ! "$table_name" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]; then
+		return 1
+	fi
 	[[ -f "$db_path" ]] || return 1
 	local count
 	count=$(sqlite3 "$db_path" "SELECT COUNT(*) FROM $table_name;" 2>/dev/null) || return 1
```
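The regex in the suggested hardening can be exercised on its own. The sketch below lifts it into a hypothetical helper, is_safe_sql_identifier (not a function in the PR), and shows which inputs it accepts. It only validates the shape of the name; it does not check that the table exists.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: accepts names made of letters, digits and underscores
# that do not start with a digit (same pattern as the suggested db_has_data guard).
is_safe_sql_identifier() {
	[[ "$1" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]
}

for name in learnings response_scores 1table "learnings; DROP TABLE learnings" ""; do
	if is_safe_sql_identifier "$name"; then
		echo "ok:       '$name'"
	else
		echo "rejected: '$name'"
	fi
done
```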
211-245: Leaderboard queries fire N×3 separate sqlite3 invocations per tier — consider consolidating.

Each tier iteration (line 213) spawns 3 separate sqlite3 processes (successes, failures, last_used). With 5 tiers that's 15 process spawns. A single query could return all tiers at once, improving both performance and readability.
♻️ Sketch: single-query approach
```diff
-	local tiers="opus sonnet pro flash haiku"
-	for tier in $tiers; do
-		local successes failures last_used
-		successes=$(sqlite3 "$MEMORY_DB" "..." 2>/dev/null) || successes=0
-		failures=$(sqlite3 "$MEMORY_DB" "..." 2>/dev/null) || failures=0
-		...
-	done
+	sqlite3 -separator '|' "$MEMORY_DB" "
+		SELECT
+			CASE
+				WHEN tags LIKE '%model:opus%' OR content LIKE '%[model:opus]%' THEN 'opus'
+				WHEN tags LIKE '%model:sonnet%' OR content LIKE '%[model:sonnet]%' THEN 'sonnet'
+				WHEN tags LIKE '%model:pro%' OR content LIKE '%[model:pro]%' THEN 'pro'
+				WHEN tags LIKE '%model:flash%' OR content LIKE '%[model:flash]%' THEN 'flash'
+				WHEN tags LIKE '%model:haiku%' OR content LIKE '%[model:haiku]%' THEN 'haiku'
+			END AS tier,
+			SUM(CASE WHEN type IN ($SUCCESS_TYPES) THEN 1 ELSE 0 END),
+			SUM(CASE WHEN type IN ($FAILURE_TYPES) THEN 1 ELSE 0 END),
+			SUBSTR(MAX(created_at), 1, 10)
+		FROM learnings
+		WHERE type IN ($PATTERN_TYPES)
+		GROUP BY tier
+		HAVING tier IS NOT NULL
+		ORDER BY ...
+	" 2>/dev/null | while IFS='|' read -r tier successes failures last_used; do
+		local tasks_total=$((successes + failures))
+		local rate=$(( (successes * 100) / tasks_total ))
+		echo "| $tier | $tasks_total | $successes | $failures | ${rate}% | $last_used |"
+	done
```
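One edge case in the single-query sketch: if a tier's rows match PATTERN_TYPES but neither SUCCESS_TYPES nor FAILURE_TYPES, tasks_total is 0 and the rate expansion fails with a division-by-zero error. The row-formatting loop can guard against that; format_tier_rows below is a hypothetical helper and its input rows are illustrative, not real leaderboard data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: formats "tier|successes|failures|last_used" rows, as the
# single-query sketch would emit them, into Markdown table rows.
format_tier_rows() {
	local tier successes failures last_used total rate
	while IFS='|' read -r tier successes failures last_used; do
		total=$((successes + failures))
		if ((total == 0)); then
			rate="n/a" # avoid dividing by zero
		else
			rate="$((successes * 100 / total))%"
		fi
		echo "| $tier | $total | $successes | $failures | $rate | $last_used |"
	done
}

# Illustrative input only.
printf '%s\n' 'opus|9|1|2026-02-12' 'haiku|0|0|2026-02-10' | format_tier_rows
```

Integer division truncates, matching the `$(( ... ))` arithmetic in the sketch above.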
255-303: Same N×2 query pattern in task type breakdown — 13 task types × 2 queries = 26 sqlite3 spawns.

Same consolidation opportunity as the leaderboard. Lower priority since this runs hourly, but worth noting for consistency.

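433-468: generate_models_md: clean assembly of sections via a grouped-command redirect.

The grouped command block { ... } > "$output" is a good pattern: braces run in the current shell (unlike parentheses, they do not fork a subshell), and because every sub-generator writes to stdout, a single redirect captures the whole document. Explicit return 0 present.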

One minor note: if the output directory doesn't exist, the redirect will fail. Consider adding a mkdir -p for $(dirname "$output") before the write.
🛡️ Optional: ensure output directory exists
```diff
 generate_models_md() {
 	local output="$1"
+	mkdir -p "$(dirname "$output")" 2>/dev/null || true
 	local timestamp
```
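A minimal, self-contained sketch of the same pattern with the suggested `mkdir -p` in place. write_sections and the section contents are hypothetical stand-ins for generate_models_md and its sub-generators.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for generate_models_md: every section writes to stdout,
# and one grouped-command redirect captures them all into the output file.
write_sections() {
	local output="$1"
	mkdir -p "$(dirname "$output")" # create the parent directory if missing
	{
		echo "# Model Leaderboard"
		echo
		echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
	} >"$output"
	return 0
}

tmp_root=$(mktemp -d)
write_sections "$tmp_root/nested/dir/MODELS.md"
head -n 1 "$tmp_root/nested/dir/MODELS.md"
rm -rf "$tmp_root"
```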
As per coding guidelines: "Error recovery mechanisms" for .agents/scripts/*.sh.

coderabbitai · 2026-02-12T22:08:03Z

.agents/scripts/generate-models-md.sh

```diff
+		sqlite3 -separator '|' "$SCORING_DB" "
+            SELECT
+                r.model_id,
+                COUNT(DISTINCT r.response_id),
+                printf('%.2f',
+                    AVG(CASE WHEN s.criterion='correctness' THEN s.score * 0.30
+                             WHEN s.criterion='completeness' THEN s.score * 0.25
+                             WHEN s.criterion='code_quality' THEN s.score * 0.25
+                             WHEN s.criterion='clarity' THEN s.score * 0.20
+                             ELSE 0 END) * (1.0 / 0.25)
+                ),
+                printf('%.1f', AVG(r.response_time))
+            FROM responses r
+            JOIN scores s ON r.response_id = s.response_id
+            GROUP BY r.model_id
+            ORDER BY AVG(CASE WHEN s.criterion='correctness' THEN s.score * 0.30
+                              WHEN s.criterion='completeness' THEN s.score * 0.25
+                              WHEN s.criterion='code_quality' THEN s.score * 0.25
+                              WHEN s.criterion='clarity' THEN s.score * 0.20
+                              ELSE 0 END) DESC;
+        " 2>/dev/null | while IFS='|' read -r model responses avg_score avg_time; do
+			echo "| $model | $responses | $avg_score/5.0 | $avg_time |"
+		done
```


⚠️ Potential issue | 🟡 Minor

Weighted average calculation is fragile — assumes exactly 4 score criteria per response.

The math AVG(...weighted_component...) * (1.0 / 0.25) (i.e., * 4) only yields a correct weighted score if every response has exactly one row per criterion (correctness, completeness, code_quality, clarity). If any criterion is missing for a response, the divisor in AVG changes and the result inflates or deflates silently.

Consider using a SUM/COUNT(DISTINCT) approach or a subquery that computes the weighted sum per response first, then averages across responses:

```sql
SELECT
    r.model_id,
    COUNT(DISTINCT r.response_id),
    printf('%.2f', AVG(ws.weighted_score)),
    printf('%.1f', AVG(r.response_time))
FROM responses r
JOIN (
    SELECT
        response_id,
        SUM(CASE criterion
                WHEN 'correctness'  THEN score * 0.30
                WHEN 'completeness' THEN score * 0.25
                WHEN 'code_quality' THEN score * 0.25
                WHEN 'clarity'      THEN score * 0.20
                ELSE 0
            END) AS weighted_score
    FROM scores
    GROUP BY response_id
) ws ON r.response_id = ws.response_id
GROUP BY r.model_id
ORDER BY AVG(ws.weighted_score) DESC;
```

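Because each response now contributes exactly one weighted sum, a missing criterion no longer changes the AVG divisor, and AVG(r.response_time) is no longer weighted by how many score rows each response has. A missing criterion still counts as 0 in that response's sum, so renormalizing would additionally require dividing by the sum of the weights actually present.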
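To make the arithmetic concrete, the awk sketch below scores a single hypothetical response rated 4/5 on correctness, completeness and code_quality, with no clarity row, using the weights from the query. It prints `avg_times_4=4.27` (the original AVG(...) * 4, inflated), `sum=3.20` (the per-response SUM subquery, missing criterion counted as 0) and `renormalized=4.00` (SUM divided by the weights present). weighted_scores and its `criterion|score` input format are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads "criterion|score" rows for ONE response and prints
# three aggregates. The weights mirror the review's query.
weighted_scores() {
	awk -F'|' '
	BEGIN {
		w["correctness"] = 0.30; w["completeness"] = 0.25
		w["code_quality"] = 0.25; w["clarity"] = 0.20
	}
	($1 in w) { sum += $2 * w[$1]; wsum += w[$1]; n++ }
	END {
		if (n == 0) exit 1
		printf "avg_times_4=%.2f\n", (sum / n) * 4   # original AVG(...) * (1.0 / 0.25)
		printf "sum=%.2f\n", sum                     # per-response SUM subquery
		printf "renormalized=%.2f\n", sum / wsum     # SUM divided by weights present
	}'
}

# Illustrative response: clarity row missing.
printf '%s\n' 'correctness|4' 'completeness|4' 'code_quality|4' | weighted_scores
```

With all four criteria present at 4/5, all three lines print 4.00, which is why the original formula only looks correct on complete data.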


coderabbitai · 2026-02-12T22:08:03Z

.agents/scripts/generate-models-md.sh

```diff
+while [[ $# -gt 0 ]]; do
+	case "$1" in
+	--output)
+		OUTPUT_PATH="$2"
+		shift 2
+		;;
```


⚠️ Potential issue | 🟠 Major

Missing guard on --output — $2 may be unbound if it's the last argument.

With set -u, passing --output without a value will cause an unbound variable error. Add a check before accessing $2.

🐛 Proposed fix

```diff
 	--output)
+		if [[ $# -lt 2 ]]; then
+			log_error "--output requires a value"
+			exit 1
+		fi
 		OUTPUT_PATH="$2"
 		shift 2
 		;;
```

As per coding guidelines: "Reliability and robustness" and "Proper exit codes" for .agents/scripts/*.sh.
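The AI-agent prompt for this issue also asks to reject a value that starts with `-`, which the proposed fix does not check. A minimal sketch covering both cases, assuming a hypothetical parse_args function (not the script's actual parser) and plain `echo >&2` in place of the script's log_error:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parser: validates that --output has a value before reading $2,
# so `set -u` never trips on an unbound positional parameter.
parse_args() {
	OUTPUT_PATH=""
	QUIET=false
	while [[ $# -gt 0 ]]; do
		case "$1" in
		--output)
			# Missing value, or the next token is another option.
			if [[ $# -lt 2 || "${2:-}" == -* ]]; then
				echo "error: --output requires a value" >&2
				return 1
			fi
			OUTPUT_PATH="$2"
			shift 2
			;;
		--quiet)
			QUIET=true
			shift
			;;
		*)
			echo "error: unknown option: $1" >&2
			return 1
			;;
		esac
	done
	return 0
}

parse_args --output /tmp/MODELS.md --quiet
echo "output=$OUTPUT_PATH quiet=$QUIET"
```

The `-*` check trades away output paths that genuinely begin with a dash; callers can pass `./-name` instead.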


coderabbitai · 2026-02-12T22:08:03Z

.agents/scripts/supervisor-helper.sh

```diff
+	local models_md_interval=3600 # seconds (1 hour)
+	local models_md_stamp="$SUPERVISOR_DIR/models-md-last-regen"
+	local models_md_now
+	models_md_now=$(date +%s)
+	local models_md_last=0
+	if [[ -f "$models_md_stamp" ]]; then
+		models_md_last=$(cat "$models_md_stamp" 2>/dev/null || echo 0)
+	fi
+	local models_md_elapsed=$((models_md_now - models_md_last))
+	if [[ "$models_md_elapsed" -ge "$models_md_interval" ]]; then
+		local generate_script="${SCRIPT_DIR}/generate-models-md.sh"
+		if [[ -x "$generate_script" ]]; then
+			local models_repos
+			models_repos=$(db "$SUPERVISOR_DB" "SELECT DISTINCT repo FROM tasks;" 2>/dev/null || true)
+			if [[ -n "$models_repos" ]]; then
+				while IFS= read -r models_repo_path; do
+					[[ -n "$models_repo_path" && -d "$models_repo_path" ]] || continue
+					local models_repo_root
+					models_repo_root=$(git -C "$models_repo_path" rev-parse --show-toplevel 2>/dev/null) || continue
+					log_verbose "  Phase 12: Regenerating MODELS.md in $models_repo_root"
+					if "$generate_script" --output "${models_repo_root}/MODELS.md" --quiet 2>/dev/null; then
+						if git -C "$models_repo_root" diff --quiet -- MODELS.md 2>/dev/null; then
+							log_verbose "  Phase 12: MODELS.md unchanged in $models_repo_root"
+						else
+							git -C "$models_repo_root" add MODELS.md 2>/dev/null \
+								&& git -C "$models_repo_root" commit -m "chore: regenerate MODELS.md leaderboard (t1012)" --no-verify 2>/dev/null \
+								&& git -C "$models_repo_root" push 2>/dev/null \
+								&& log_info "  Phase 12: MODELS.md updated and pushed ($models_repo_root)" \
+								|| log_warn "  Phase 12: MODELS.md regenerated but commit/push failed ($models_repo_root)"
+						fi
+					else
+						log_warn "  Phase 12: MODELS.md generation failed for $models_repo_root"
+					fi
+				done <<<"$models_repos"
+			fi
+		fi
+		echo "$models_md_now" > "$models_md_stamp" 2>/dev/null || true
+	else
```


⚠️ Potential issue | 🟠 Major

Don’t advance the regen stamp on failure — preserve throttle state.
Right now the stamp is updated unconditionally, so a failed generation (or missing script) suppresses retries for an hour. The PR objective says failures should preserve throttling state. Gate the stamp update on success (or at least when at least one repo regenerated).

✅ Suggested reliability guard

```diff
 	if [[ "$models_md_elapsed" -ge "$models_md_interval" ]]; then
 		local generate_script="${SCRIPT_DIR}/generate-models-md.sh"
 		if [[ -x "$generate_script" ]]; then
+			local models_md_success=false
 			local models_repos
 			models_repos=$(db "$SUPERVISOR_DB" "SELECT DISTINCT repo FROM tasks;" 2>/dev/null || true)
 			if [[ -n "$models_repos" ]]; then
 				while IFS= read -r models_repo_path; do
 					[[ -n "$models_repo_path" && -d "$models_repo_path" ]] || continue
 					local models_repo_root
 					models_repo_root=$(git -C "$models_repo_path" rev-parse --show-toplevel 2>/dev/null) || continue
 					log_verbose "  Phase 12: Regenerating MODELS.md in $models_repo_root"
 					if "$generate_script" --output "${models_repo_root}/MODELS.md" --quiet 2>/dev/null; then
+						models_md_success=true
 						if git -C "$models_repo_root" diff --quiet -- MODELS.md 2>/dev/null; then
 							log_verbose "  Phase 12: MODELS.md unchanged in $models_repo_root"
 						else
 							git -C "$models_repo_root" add MODELS.md 2>/dev/null \
 								&& git -C "$models_repo_root" commit -m "chore: regenerate MODELS.md leaderboard (t1012)" --no-verify 2>/dev/null \
 								&& git -C "$models_repo_root" push 2>/dev/null \
 								&& log_info "  Phase 12: MODELS.md updated and pushed ($models_repo_root)" \
 								|| log_warn "  Phase 12: MODELS.md regenerated but commit/push failed ($models_repo_root)"
 						fi
 					else
 						log_warn "  Phase 12: MODELS.md generation failed for $models_repo_root"
 					fi
 				done <<<"$models_repos"
 			fi
-		fi
-		echo "$models_md_now" > "$models_md_stamp" 2>/dev/null || true
+			if [[ "$models_md_success" == "true" ]]; then
+				echo "$models_md_now" >"$models_md_stamp" 2>/dev/null || true
+			else
+				log_warn "  Phase 12: MODELS.md regen failed — keeping throttle stamp unchanged"
+			fi
+		fi
 	else
```
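The same success-gated stamp generalizes to any throttled job. The sketch below uses a hypothetical helper, throttled_run (not part of supervisor-helper.sh), that runs a command at most once per interval and writes the stamp only when the command succeeds, so a failed run is retried on the next pulse. It also treats an empty or non-numeric stamp as "never ran", an assumption added here for robustness.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run "$@" at most once per <interval> seconds, advancing
# the stamp file only when the command succeeds, so failures retry next pulse.
throttled_run() {
	local stamp="$1" interval="$2"
	shift 2
	local now last=0
	now=$(date +%s)
	if [[ -f "$stamp" ]]; then
		last=$(cat "$stamp" 2>/dev/null || echo 0)
	fi
	# Treat an empty or corrupt stamp as "never ran".
	[[ "$last" =~ ^[0-9]+$ ]] || last=0
	if ((now - last < interval)); then
		return 0 # throttled: skip the job and leave the stamp alone
	fi
	if "$@"; then
		echo "$now" >"$stamp"
	else
		echo "job failed; keeping throttle stamp unchanged" >&2
		return 1
	fi
}

stamp_dir=$(mktemp -d)
stamp_file="$stamp_dir/models-md-last-regen"
throttled_run "$stamp_file" 3600 echo "regenerating" # runs and writes the stamp
throttled_run "$stamp_file" 3600 echo "regenerating" # throttled: prints nothing
rm -rf "$stamp_dir"
```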


coderabbitai · 2026-02-12T22:08:03Z

.agents/scripts/supervisor-helper.sh

```diff
+				else
+					log_warn "  Phase 12: MODELS.md generation failed"
+				fi
+			fi
+		fi
+		record_throttle "$models_md_throttle_key" 2>/dev/null || true
+	else
+		log_verbose "  Phase 12: MODELS.md regen skipped (throttled)"
+	fi
```


⚠️ Potential issue | 🔴 Critical

Fix the stray Phase‑12 throttle block — it breaks bash parsing.
There’s a dangling else/fi block plus record_throttle calls after the Phase 12 logic, which makes the script invalid and matches the ShellCheck parse errors reported in CI (SC1073/SC1056/SC1072). Remove the orphaned block to restore a single coherent throttle path.

🧩 Proposed fix (remove orphaned block)

```diff
-				else
-					log_warn "  Phase 12: MODELS.md generation failed"
-				fi
-			fi
-		fi
-		record_throttle "$models_md_throttle_key" 2>/dev/null || true
-	else
-		log_verbose "  Phase 12: MODELS.md regen skipped (throttled)"
-	fi
```

As per coding guidelines, “Run ShellCheck with zero violations on all scripts in .agents/scripts/”.
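This class of breakage can be caught locally before CI: `bash -n` parses a script without executing it. The sketch below uses a hypothetical helper, syntax_ok, on a reduced reproduction of a dangling else/fi after an already-closed if; the fragments are illustrative, not the PR's code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns 0 if the given Bash source parses.
# `bash -n` reads the code without executing it.
syntax_ok() {
	local tmp rc=0
	tmp=$(mktemp)
	printf '%s\n' "$1" >"$tmp"
	bash -n "$tmp" 2>/dev/null || rc=$?
	rm -f "$tmp"
	return "$rc"
}

# Reduced reproduction: an else/fi left behind after the if already closed.
broken='if true; then
	echo regenerated
fi
else
	echo throttled
fi'

fixed='if true; then
	echo regenerated
else
	echo throttled
fi'

syntax_ok "$fixed" && echo "fixed: parses"
syntax_ok "$broken" || echo "broken: syntax error"
```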


marcusquinn · 2026-02-13T00:34:35Z

Closing: merge conflicts and framework validation failure. Task t1012 will be re-dispatched fresh. Also blocked-by:t1011 which needs re-implementation.

marcusquinn added 2 commits February 12, 2026 21:53

feat: add MODELS.md — initial leaderboard snapshot (t1012)

3fc960a

Generated from live data: 17 models across 6 providers, 487 pattern data points, 18 scored responses. Shows success rates by model tier and task type.

feat: add supervisor Phase 12 for MODELS.md auto-regeneration (t1012)

b970e25

Hourly throttled pulse phase iterates over known repos and regenerates MODELS.md when pattern data changes. Registered in subagent-index.toon.

marcusquinn marked this pull request as ready for review February 12, 2026 22:04

coderabbitai bot requested changes Feb 12, 2026


marcusquinn closed this Feb 13, 2026

marcusquinn mentioned this pull request Feb 13, 2026

t1012: MODELS.md — live model leaderboard with success rates #1341

Merged

PR #1305: marcusquinn wants to merge 3 commits into main from feature/t1012.
