feat: add model name to comparison report output (#643)
Display the model name used by each provider in verify:compare reports.

Changes:
- Add 'model' field to EvaluationResult to track which model was used
- Update _get_llm_clients to return tuples with (client, provider, model)
- Add 'Model' column to Provider Summary table
- Display model name in expandable Full Provider Details section
- Update _fallback_evaluation to accept and store model parameter

Example output:

| Provider | Model | Verdict | Confidence | Summary |
| --- | --- | --- | --- | --- |
| github-models | gpt-4o | PASS | 85% | ... |
| openai | gpt-5.2 | CONCERNS | 72% | ... |

This helps users understand which models were used for evaluation, especially when using model1/model2 parameters in compare mode.
Pull request overview
This PR enhances the comparison report feature by displaying the model name used by each provider, making it easier to identify which specific models (e.g., gpt-4o vs gpt-4-turbo) were used during evaluation. This is particularly useful when comparing results from different models via the model1/model2 parameters.
Key Changes:
- Added `model` field to `EvaluationResult` data model to track the model used
- Updated `_get_llm_clients` to return model names alongside clients and providers
- Enhanced comparison report tables to include a "Model" column
fix: update tests to match new 3-tuple format with model name

- Add model parameter to test client tuples in test_pr_verifier_compare.py
- Update table header assertion to include Model column in test_pr_verifier_comparison_report.py
- Addresses failing checks and bot review comments in PR #643
Commits:

* feat: add model name to comparison report output

* fix: update tests to match new 3-tuple format with model name

* feat: disable automatic follow-up issue creation by agent verifier

  The agent verifier will no longer automatically create issues after PR evaluations, regardless of CONCERNS or FAIL verdicts. This addresses user feedback that automatic issue creation was creating unwanted noise.

  - Modified _should_create_issue() to always return False
  - Updated test to verify that issue creation is disabled
  - Workflow will still have the --create-issue flag, but it will have no effect

* fix: add models permission to verifier workflows for GitHub Models API access

  - Add 'models: read' permission to reusable-agents-verifier.yml
  - Add 'models: read' permission to agents-verifier.yml
  - Fixes 401 authentication errors when using GitHub Models provider
  - Templates already have this permission configured

  Resolves the GitHub Models authentication issue identified in Travel-Plan-Permission PR #318
Enhancement
Display the model name used by each provider in verify:compare reports. This makes it clear which models were used for evaluation, especially when using different models via the `model1`/`model2` parameters.

Changes
1. Data Model

- Added `model: str | None` field to `EvaluationResult` to track the model used
- Updated `_get_llm_clients` to return `list[tuple[object, str, str]]` (client, provider, model)
- Updated the `ComparisonRunner.clients` type annotation

2. Report Display
Provider Summary Table:
Expandable Details:
Testing
Use Cases
- `gpt-4o` (GitHub) vs `gpt-5.2` (OpenAI)