Add model name to comparison #644

Merged
stranske merged 5 commits into main from add-model-name-to-comparison
Jan 7, 2026

Conversation

@stranske
Owner

@stranske stranske commented Jan 7, 2026

No description provided.

Display the model name used by each provider in verify:compare reports.

Changes:
- Add 'model' field to EvaluationResult to track which model was used
- Update _get_llm_clients to return tuples with (client, provider, model)
- Add 'Model' column to Provider Summary table
- Display model name in expandable Full Provider Details section
- Update _fallback_evaluation to accept and store model parameter

Example output:
| Provider | Model | Verdict | Confidence | Summary |
| github-models | gpt-4o | PASS | 85% | ... |
| openai | gpt-5.2 | CONCERNS | 72% | ... |

This helps users understand which models were used for evaluation,
especially when using model1/model2 parameters in compare mode.
- Add model parameter to test client tuples in test_pr_verifier_compare.py
- Update table header assertion to include Model column in test_pr_verifier_comparison_report.py
- Addresses failing checks and bot review comments in PR #643

The agent verifier will no longer automatically create issues after PR evaluations,
regardless of CONCERNS or FAIL verdicts. This addresses user feedback that
automatic issue creation was creating unwanted noise.

Changes:
- Modified _should_create_issue() to always return False
- Updated test to verify that issue creation is disabled
- Workflow will still have --create-issue flag but it will have no effect
…I access

- Add 'models: read' permission to reusable-agents-verifier.yml
- Add 'models: read' permission to agents-verifier.yml
- Fixes 401 authentication errors when using GitHub Models provider
- Templates already have this permission configured

Resolves GitHub Models authentication issue identified in Travel-Plan-Permission PR #318 test
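Concretely, the fix amounts to adding one entry to each workflow's permissions block. The surrounding entries below are illustrative, not the actual workflow contents:

```yaml
permissions:
  contents: read   # illustrative; existing permissions vary per workflow
  models: read     # grants GitHub Models API access; avoids 401 errors
```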
Copilot AI review requested due to automatic review settings January 7, 2026 16:57
@agents-workflows-bot
Contributor

⚠️ Action Required: Unable to determine source issue for PR #644. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

@github-actions github-actions bot added the autofix Opt-in automated formatting & lint remediation label Jan 7, 2026
@github-actions
Contributor

github-actions bot commented Jan 7, 2026

Automated Status Summary

Head SHA: 0bc7819
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job Result Logs
(no jobs reported) ⏳ pending

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
| Current | 92.21% |
| Baseline | 85.00% |
| Delta | +7.21% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| scripts/workflow_health_check.py | 62.6% | 28 |
| scripts/classify_test_failures.py | 62.9% | 37 |
| scripts/ledger_validate.py | 65.3% | 63 |
| scripts/mypy_return_autofix.py | 82.6% | 11 |
| scripts/ledger_migrate_base.py | 85.5% | 13 |
| scripts/fix_cosmetic_aggregate.py | 92.3% | 1 |
| scripts/coverage_history_append.py | 92.8% | 2 |
| scripts/workflow_validator.py | 93.3% | 4 |
| scripts/update_autofix_expectations.py | 93.9% | 1 |
| scripts/pr_metrics_tracker.py | 95.7% | 3 |
| scripts/generate_residual_trend.py | 96.6% | 1 |
| scripts/build_autofix_pr_comment.py | 97.0% | 2 |
| scripts/aggregate_agent_metrics.py | 97.2% | 0 |
| scripts/fix_numpy_asserts.py | 98.1% | 0 |
| scripts/sync_test_dependencies.py | 98.3% | 1 |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@github-actions
Contributor

github-actions bot commented Jan 7, 2026

🤖 Keepalive Loop Status

PR #644 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/0 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@github-actions
Contributor

github-actions bot commented Jan 7, 2026

| Status | ✅ no new diagnostics |
| History points | 1 |
| Timestamp | 2026-01-07 16:59:56 UTC |
| Report artifact | autofix-report-pr-644 |
| Remaining | 0 |
| New | 0 |
No additional artifacts

@stranske stranske merged commit a1254d1 into main Jan 7, 2026
37 checks passed
@stranske stranske deleted the add-model-name-to-comparison branch January 7, 2026 17:00
Contributor

Copilot AI left a comment


Pull request overview

This PR disables automatic follow-up issue creation for LLM evaluations and adds support for displaying model names in comparison reports, along with adding the necessary GitHub Models API permissions to workflows.

Key Changes

  • Disabled automatic issue creation for PR evaluation failures/concerns by modifying _should_create_issue to always return False
  • Added models: read permission to workflow files to enable GitHub Models API access
  • Updated documentation to reflect completed sync operations and the disabled issue creation feature

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| tests/scripts/test_pr_verifier_issue_creation.py | Updated test to verify that issue creation is now disabled, removing mock HTTP requests and asserting None return value |
| scripts/langchain/pr_verifier.py | Modified _should_create_issue function to always return False with explanatory comment |
| docs/plans/langchain-post-code-rollout.md | Documented PR #643 merge, updated sync status for consumer repos, added test results from Travel-Plan-Permission #318, and marked issue creation as disabled |
| .github/workflows/reusable-agents-verifier.yml | Added models: read permission for GitHub Models API access |
| .github/workflows/agents-verifier.yml | Added models: read permission and cleaned up trailing whitespace in JavaScript code |


Comment on lines 51 to +66
```python
def test_create_followup_issue_posts(monkeypatch) -> None:
    # Since automatic issue creation is disabled, this test verifies
    # that _create_followup_issue returns None without creating an issue
    # (the previous mock HTTP request plumbing has been removed).
    result = pr_verifier.EvaluationResult(verdict="FAIL", concerns=["Issue found."])
    monkeypatch.setenv("GITHUB_TOKEN", "token")
    monkeypatch.setenv("GITHUB_REPOSITORY", "org/repo")

    issue_number = pr_verifier._create_followup_issue(
        result,
        "- Pull request: [#99](https://example.com/pr/99)",
        labels=["agent:codex"],
        run_url="https://example.com/run/99",
    )

    # Automatic issue creation is disabled, so this should return None
    assert issue_number is None
```

Copilot AI Jan 7, 2026


There is no test coverage for the _should_create_issue function itself. While the existing tests verify that _create_followup_issue returns None when issue creation is disabled, there should be a direct test for _should_create_issue to ensure it returns False for all verdict types. This would make the disabled behavior more explicit and prevent accidental re-enabling without proper testing.


Labels

autofix Opt-in automated formatting & lint remediation
