feat: add model name to comparison report output by stranske · Pull Request #643 · stranske/Workflows

stranske · 2026-01-07T13:22:11Z

Enhancement

Display the model name used by each provider in verify:compare reports. This makes it clear which models were used for evaluation, especially when using different models via the model1/model2 parameters.

Changes

1. Data Model

Added model: str | None field to EvaluationResult to track the model used
Updated _get_llm_clients to return list[tuple[object, str, str]] (client, provider, model)
Updated ComparisonRunner.clients type annotation

2. Report Display

Provider Summary Table:

| Provider | Model | Verdict | Confidence | Summary |
| --- | --- | --- | --- | --- |
| github-models | gpt-4o | PASS | 85% | ... |
| openai | gpt-5.2 | CONCERNS | 72% | ... |

Expandable Details:

#### github-models
- **Model:** gpt-4o
- **Verdict:** PASS
- **Confidence:** 85%
...

Testing

✅ All 8 existing tests pass
✅ Pre-commit hooks pass (syntax, formatting, type check, lint)

Use Cases

Comparing different models: gpt-4o (GitHub) vs gpt-5.2 (OpenAI)
Understanding which model version produced each evaluation
Debugging model-specific differences in verdicts
Tracking model usage for cost/performance analysis

agents-workflows-bot · 2026-01-07T13:22:21Z

⚠️ Action Required: Unable to determine source issue for PR #643. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

github-actions · 2026-01-07T13:23:26Z

github-actions · 2026-01-07T13:23:42Z

Automated Status Summary

Head SHA: 11b5c01
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	92.21%
Baseline	85.00%
Delta	+7.21%
Minimum	70.00%
Status	✅ Pass

Top Coverage Hotspots (lowest coverage)

File	Coverage	Missing
`scripts/workflow_health_check.py`	62.6%	28
`scripts/classify_test_failures.py`	62.9%	37
`scripts/ledger_validate.py`	65.3%	63
`scripts/mypy_return_autofix.py`	82.6%	11
`scripts/ledger_migrate_base.py`	85.5%	13
`scripts/fix_cosmetic_aggregate.py`	92.3%	1
`scripts/coverage_history_append.py`	92.8%	2
`scripts/workflow_validator.py`	93.3%	4
`scripts/update_autofix_expectations.py`	93.9%	1
`scripts/pr_metrics_tracker.py`	95.7%	3
`scripts/generate_residual_trend.py`	96.6%	1
`scripts/build_autofix_pr_comment.py`	97.0%	2
`scripts/aggregate_agent_metrics.py`	97.2%	0
`scripts/fix_numpy_asserts.py`	98.1%	0
`scripts/sync_test_dependencies.py`	98.3%	1

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

github-actions · 2026-01-07T13:24:03Z

🤖 Keepalive Loop Status

PR #643 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Disposition	skipped (transient)
Gate	success
Tasks	0/6 complete
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

Copilot

Pull request overview

This PR enhances the comparison report feature by displaying the model name used by each provider, making it easier to identify which specific models (e.g., gpt-4o vs gpt-4-turbo) were used during evaluation. This is particularly useful when comparing results from different models via the model1/model2 parameters.

Key Changes:

Added model field to EvaluationResult data model to track the model used
Updated _get_llm_clients to return model names alongside clients and providers
Enhanced comparison report tables to include a "Model" column

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

scripts/langchain/pr_verifier.py

- Add model parameter to test client tuples in test_pr_verifier_compare.py - Update table header assertion to include Model column in test_pr_verifier_comparison_report.py - Addresses failing checks and bot review comments in PR #643

* feat: add model name to comparison report output Display the model name used by each provider in verify:compare reports. Changes: - Add 'model' field to EvaluationResult to track which model was used - Update _get_llm_clients to return tuples with (client, provider, model) - Add 'Model' column to Provider Summary table - Display model name in expandable Full Provider Details section - Update _fallback_evaluation to accept and store model parameter Example output: | Provider | Model | Verdict | Confidence | Summary | | github-models | gpt-4o | PASS | 85% | ... | | openai | gpt-5.2 | CONCERNS | 72% | ... | This helps users understand which models were used for evaluation, especially when using model1/model2 parameters in compare mode. * fix: update tests to match new 3-tuple format with model name - Add model parameter to test client tuples in test_pr_verifier_compare.py - Update table header assertion to include Model column in test_pr_verifier_comparison_report.py - Addresses failing checks and bot review comments in PR #643 * feat: disable automatic follow-up issue creation by agent verifier The agent verifier will no longer automatically create issues after PR evaluations, regardless of CONCERNS or FAIL verdicts. This addresses user feedback that automatic issue creation was creating unwanted noise. Changes: - Modified _should_create_issue() to always return False - Updated test to verify that issue creation is disabled - Workflow will still have --create-issue flag but it will have no effect * fix: add models permission to verifier workflows for GitHub Models API access - Add 'models: read' permission to reusable-agents-verifier.yml - Add 'models: read' permission to agents-verifier.yml - Fixes 401 authentication errors when using GitHub Models provider - Templates already have this permission configured Resolves GitHub Models authentication issue identified in Travel-Plan-Permission PR #318 test

Copilot AI review requested due to automatic review settings January 7, 2026 13:22

stranske enabled auto-merge (squash) January 7, 2026 13:22

stranske temporarily deployed to agent-standard January 7, 2026 13:22 — with GitHub Actions Inactive

Copilot started reviewing on behalf of stranske January 7, 2026 13:22 View session

github-actions bot added the autofix Opt-in automated formatting & lint remediation label Jan 7, 2026

Copilot AI reviewed Jan 7, 2026

View reviewed changes

scripts/langchain/pr_verifier.py Show resolved Hide resolved

scripts/langchain/pr_verifier.py Show resolved Hide resolved

stranske temporarily deployed to agent-standard January 7, 2026 15:55 — with GitHub Actions Inactive

stranske merged commit 6121bea into main Jan 7, 2026
36 checks passed

stranske deleted the add-model-name-to-comparison branch January 7, 2026 15:57

Copilot AI mentioned this pull request Jan 7, 2026

Add model name to comparison #644

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add model name to comparison report output#643

feat: add model name to comparison report output#643
stranske merged 2 commits intomainfrom
add-model-name-to-comparison

stranske commented Jan 7, 2026

Uh oh!

agents-workflows-bot bot commented Jan 7, 2026

Uh oh!

github-actions bot commented Jan 7, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Jan 7, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Jan 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

stranske commented Jan 7, 2026

Enhancement

Changes

1. Data Model

2. Report Display

Testing

Use Cases

Uh oh!

agents-workflows-bot bot commented Jan 7, 2026

Uh oh!

github-actions bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Automated Status Summary

Coverage Overview

Coverage Trend

Top Coverage Hotspots (lowest coverage)

Keepalive checklist

Scope

Tasks

Acceptance criteria

Uh oh!

github-actions bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Keepalive Loop Status

Current State

🔍 Failure Classification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Jan 7, 2026 •

edited

Loading

github-actions bot commented Jan 7, 2026 •

edited

Loading

github-actions bot commented Jan 7, 2026 •

edited

Loading