Conversation
Display the model name used by each provider in verify:compare reports.

Changes:
- Add `model` field to `EvaluationResult` to track which model was used
- Update `_get_llm_clients` to return tuples with `(client, provider, model)`
- Add a Model column to the Provider Summary table
- Display the model name in the expandable Full Provider Details section
- Update `_fallback_evaluation` to accept and store a `model` parameter

Example output:

| Provider | Model | Verdict | Confidence | Summary |
|---|---|---|---|---|
| github-models | gpt-4o | PASS | 85% | ... |
| openai | gpt-5.2 | CONCERNS | 72% | ... |

This helps users understand which models were used for evaluation, especially when using the `model1`/`model2` parameters in compare mode.
- Add model parameter to test client tuples in `test_pr_verifier_compare.py`
- Update the table header assertion to include the Model column in `test_pr_verifier_comparison_report.py`
- Addresses failing checks and bot review comments in PR #643
The agent verifier will no longer automatically create issues after PR evaluations, regardless of CONCERNS or FAIL verdicts. This addresses user feedback that automatic issue creation was generating unwanted noise.

Changes:
- Modified `_should_create_issue()` to always return `False`
- Updated the test to verify that issue creation is disabled
- The workflow still accepts the `--create-issue` flag, but it has no effect
…I access

- Add `models: read` permission to `reusable-agents-verifier.yml`
- Add `models: read` permission to `agents-verifier.yml`
- Fixes 401 authentication errors when using the GitHub Models provider
- Templates already have this permission configured

Resolves the GitHub Models authentication issue identified in the Travel-Plan-Permission PR #318 test
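In workflow terms, the fix is a one-line addition to the `permissions` block of each affected file. A sketch of the relevant fragment, assuming the workflows already declare a `permissions` block (any sibling permissions shown here are illustrative):

```yaml
# .github/workflows/reusable-agents-verifier.yml and agents-verifier.yml
permissions:
  contents: read   # illustrative sibling permission
  models: read     # grants the workflow token access to the GitHub Models API;
                   # without it, calls to the provider fail with HTTP 401
```

Because the default `GITHUB_TOKEN` only carries the permissions a workflow explicitly requests, omitting `models: read` is enough to break the GitHub Models provider even when everything else is configured correctly.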
Automated Status Summary

Head SHA: 0bc7819
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks
Acceptance criteria
🤖 Keepalive Loop Status

PR #644 | Agent: Codex | Iteration 0/5

Current State
🔍 Failure Classification

| Field | Value |
|---|---|
| Error type | infrastructure |
| Status | ✅ no new diagnostics |
Pull request overview
This PR disables automatic follow-up issue creation for LLM evaluations and adds support for displaying model names in comparison reports, along with adding the necessary GitHub Models API permissions to workflows.
Key Changes
- Disabled automatic issue creation for PR evaluation failures/concerns by modifying `_should_create_issue` to always return `False`
- Added `models: read` permission to workflow files to enable GitHub Models API access
- Updated documentation to reflect completed sync operations and the disabled issue creation feature
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `tests/scripts/test_pr_verifier_issue_creation.py` | Updated test to verify that issue creation is now disabled, removing mock HTTP requests and asserting a `None` return value |
| `scripts/langchain/pr_verifier.py` | Modified `_should_create_issue` function to always return `False` with an explanatory comment |
| `docs/plans/langchain-post-code-rollout.md` | Documented PR #643 merge, updated sync status for consumer repos, added test results from Travel-Plan-Permission #318, and marked issue creation as disabled |
| `.github/workflows/reusable-agents-verifier.yml` | Added `models: read` permission for GitHub Models API access |
| `.github/workflows/agents-verifier.yml` | Added `models: read` permission and cleaned up trailing whitespace in JavaScript code |
```diff
 def test_create_followup_issue_posts(monkeypatch) -> None:
+    # Since automatic issue creation is disabled, this test verifies
+    # that _create_followup_issue returns None without creating an issue
     result = pr_verifier.EvaluationResult(verdict="FAIL", concerns=["Issue found."])
     monkeypatch.setenv("GITHUB_TOKEN", "token")
     monkeypatch.setenv("GITHUB_REPOSITORY", "org/repo")

-    captured = {}
-
-    class FakeResponse:
-        def __init__(self):
-            self._data = json.dumps({"number": 99}).encode("utf-8")
-
-        def read(self):
-            return self._data
-
-        def __enter__(self):
-            return self
-
-        def __exit__(self, exc_type, exc, tb):
-            return False
-
-    def fake_urlopen(request):
-        captured["url"] = request.full_url
-        captured["body"] = request.data
-        return FakeResponse()
-
-    monkeypatch.setattr(pr_verifier.urllib.request, "urlopen", fake_urlopen)
-
     issue_number = pr_verifier._create_followup_issue(
         result,
         "- Pull request: [#99](https://example.com/pr/99)",
         labels=["agent:codex"],
         run_url="https://example.com/run/99",
     )

-    assert issue_number == 99
-    assert captured["url"] == "https://api.github.com/repos/org/repo/issues"
-    payload = json.loads(captured["body"].decode("utf-8"))
-    assert payload["title"] == "LLM evaluation concerns for PR #99"
-    assert payload["labels"] == ["agent:codex"]
+    # Automatic issue creation is disabled, so this should return None
+    assert issue_number is None
```
There is no test coverage for the _should_create_issue function itself. While the existing tests verify that _create_followup_issue returns None when issue creation is disabled, there should be a direct test for _should_create_issue to ensure it returns False for all verdict types. This would make the disabled behavior more explicit and prevent accidental re-enabling without proper testing.