feat(phase3): Add Phase 3 agent workflows and deployment verification#653

Merged
stranske merged 15 commits into main from phase3-planning
Jan 8, 2026
Conversation

@stranske
Owner

@stranske stranske commented Jan 8, 2026

Summary

This PR implements Phase 3 of the LangChain post-code rollout with:

New Workflows Created

Workflow | Purpose | Trigger
agents-capability-check.yml | Pre-flight agent feasibility gate | agent:codex label on issue
agents-decompose.yml | Task decomposition for large issues | agents:decompose label
agents-dedup.yml | Duplicate detection using embeddings | Auto on issue creation
agents-auto-label.yml | Semantic label matching | Auto on issue create/edit
agents-verify-to-issue.yml | Create follow-up issues from verification | verify:create-issue on merged PR

Investigation Completed

Investigated verify:compare on Trend_Model_Project PR #4249:

  • Root cause: PR is OPEN, not merged
  • Status: ✅ NOT A BUG - Verifier correctly skipped (designed for merged PRs only)
  • Finding: verify:* labels missing in most repos (only Travel-Plan-Permission has them)
  • Action needed: Run scripts/create_verifier_labels.py --execute on consumer repos

Deployment Verification Plan

Added comprehensive cross-repo verification plan:

  1. Sync deployment tracking for all 7 consumer repos
  2. Label prerequisite checklist
  3. Workflow-by-workflow test plan
  4. Troubleshooting guide

Files Changed

  • templates/consumer-repo/.github/workflows/agents-*.yml - 5 new workflows
  • .github/sync-manifest.yml - Added new workflow entries
  • docs/plans/langchain-post-code-rollout.md - Updated status, added deployment plan
  • tests/scripts/test_pr_verifier_fallback.py - Test for PR verifier fallback behavior

Next Steps

After merge:

  1. Sync workflow triggers to create PRs in consumer repos
  2. Create verify:* labels in repos that need them
  3. Test each new workflow per the deployment plan

Phase 3: Pre-Agent Intelligence (4 capabilities)
- 3A: Capability Check - supplements agents:optimize with feasibility gate
  - Runs BEFORE agent assignment on Issues (not after)
  - Adds needs-human label when agent cannot proceed
- 3B: Task Decomposition - auto-split large issues
- 3C: Duplicate Detection - comment-only mode, track false positives
- 3D: Semantic Labeling - auto-suggest/apply labels

Testing Plan:
- Test repo: Manager-Database
- ~11 test issues across 4 capabilities
- False positive tracking for dedup (target: <5%)
- Metrics dashboard for validation
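The dedup false-positive target (<5%) can be tracked as the fraction of flagged issues that turn out not to be duplicates. A minimal sketch of that metric; the function name and inputs are illustrative, not taken from the PR:

```python
def false_positive_rate(flagged: int, confirmed_duplicates: int) -> float:
    """Fraction of dedup flags that were wrong (target: < 5%).

    flagged: total issues the dedup workflow flagged as duplicates
    confirmed_duplicates: how many of those a human confirmed
    """
    if flagged == 0:
        return 0.0
    return (flagged - confirmed_duplicates) / flagged
```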

Also updates:
- Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
- All immediate tasks completed
- Phase 3 ready to begin

Created 3 test issues:
- #193: Stripe integration (should FAIL capability check)
- #194: Health monitoring (should trigger task decomposition)
- #196: Manager list API (should detect as duplicate of #133)

Updated testing metrics dashboard to track progress.

Phase 4 includes 5 initiatives:
- 4A: Label Cleanup - Remove bloat labels, standardize across 7 repos
- 4B: User Guide - Operational documentation for label system (sync to consumers)
- 4C: Auto-Pilot Label - End-to-end issue-to-merged-PR automation
- 4D: Conflict Resolution - Automated merge conflict handling in keepalive
- 4E: Verify-to-Issue - Create follow-up issues from verification feedback

Key decisions:
- Auto-pilot uses workflow_dispatch between steps (not chained labels)
- Conflict detection added to keepalive loop (not separate workflow)
- Verify-to-issue is user-triggered (not automatic, avoids false positives)

Also identifies 7 additional automation opportunities for future phases.

Testing plan defined for Manager-Database.

Label Analysis Corrections:
- agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
- agents:activated IS functional (agents_pr_meta_keepalive.js)
- from:codex/copilot ARE functional (merge_manager.js)
- automerge IS functional (merge_manager.js, agents_belt_scan.js)
- agents (bare) IS functional (agent_task.yml template)
- risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

Only 5-6 labels confirmed as bloat:
- codex (bare) - redundant with agent:codex
- ai:agent - zero matches
- auto-merge-audit - zero matches
- automerge:ok - zero matches
- architecture, backend, cli, etc. - repo-specific, not synced

Phase 5 Analysis:
- 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
- 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
- 5C: Stale PR cleanup - not needed
- 5D: Dependabot - partial (auto-label exists, add auto-merge)
- 5E: Issue lint - soft warning approach
- 5F: Cross-repo linking - weekly scan with semantic_matcher.py
- 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)

Label consolidation:
- Replace agents:pause with agents:paused in all source files
- Update keepalive_gate.js PAUSE_LABEL constant
- Update keepalive_orchestrator_gate_runner.js hardcoded check
- Update test to use agents:paused
- Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md

Phase 4 updates:
- 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
- 4B: Add optional issue creation feature to user guide (deferred)
- 4D: Full conflict resolution implementation with code examples
- 4E: Complete verify-to-issue workflow implementation

Phase 5 updates:
- 5F: Marked as SKIPPED (not needed per user decision)
- 5G: Full LangSmith integration plan + custom metrics

All keepalive tests pass (8/8).

Phase 4 implementations:
- 4A: Add scripts/cleanup_labels.py for label auditing
  - Classifies labels as functional/bloat/idiosyncratic
  - Requires --confirm flag for actual deletion
  - Reports audit results with recommendations
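The --confirm gate described above can be sketched as a small argparse guard that keeps deletion a dry run by default. This is an assumption about the script's shape, not its actual code; the hardcoded bloat list and the commented-out delete call are placeholders:

```python
import argparse


def main(argv=None) -> int:
    """Audit labels; delete bloat only when --confirm is passed."""
    parser = argparse.ArgumentParser(description="Audit repository labels")
    parser.add_argument(
        "--confirm",
        action="store_true",
        help="actually delete labels classified as bloat",
    )
    args = parser.parse_args(argv)

    # Placeholder audit result; the real script would classify live labels.
    bloat = ["codex", "ai:agent", "auto-merge-audit"]
    if args.confirm:
        print(f"Deleting {len(bloat)} bloat labels")
        # delete_labels(bloat)  # real deletion would call the GitHub API here
    else:
        print(f"Dry run: would delete {len(bloat)} labels (pass --confirm to apply)")
    return 0
```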

- 4D: Add conflict detection for keepalive pipeline
  - .github/scripts/conflict_detector.js module
  - Detects conflicts from GitHub API, CI logs, PR comments
  - templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md

- 4E: Add agents-verify-to-issue.yml workflow
  - Creates follow-up issues from verification feedback
  - User-triggered via verify:create-issue label
  - Extracts concerns and low scores automatically
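The concern/low-score extraction could look roughly like the following; the dictionary field names (`concerns`, `scores`) and the score floor are assumptions about the verification payload, not confirmed by the PR:

```python
def extract_followups(evaluation: dict, score_floor: int = 3) -> list:
    """Collect follow-up items from a verification result.

    Pulls explicit concerns plus any scoring dimension below the floor.
    """
    items = list(evaluation.get("concerns", []))
    for dimension, score in evaluation.get("scores", {}).items():
        if score < score_floor:
            items.append(f"Low score ({score}) on {dimension}")
    return items
```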

Phase 5 implementations:
- 5A: Add agents-auto-label.yml workflow
  - Semantic label matching for new issues
  - 90% threshold for auto-apply, 75% for suggestions
  - Uses existing label_matcher.py script
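The two-threshold policy maps a similarity score to one of three actions. A minimal sketch under the assumption that label_matcher.py yields a 0-1 score per candidate label (the function name here is illustrative):

```python
def classify_label_match(score: float,
                         apply_threshold: float = 0.90,
                         suggest_threshold: float = 0.75) -> str:
    """Map a semantic similarity score to a labeling action.

    >= 90%: apply the label automatically
    >= 75%: suggest the label in a comment
    below:  ignore the candidate
    """
    if score >= apply_threshold:
        return "apply"
    if score >= suggest_threshold:
        return "suggest"
    return "ignore"
```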

- 5G: Add LangSmith tracing to tools/llm_provider.py
  - _setup_langsmith_tracing() function
  - Auto-configures when LANGSMITH_API_KEY present

Also:
- Update .github/sync-manifest.yml with new sync entries
- Update docs/LABELS.md with new label documentation

Code quality improvements based on automated code review:

1. tools/llm_provider.py:
   - Fix LangSmith API key env var (LANGSMITH_API_KEY vs LANGCHAIN_API_KEY)
   - Improve f-string formatting for logging
   - Add usage comment for LANGSMITH_ENABLED constant

2. .github/scripts/conflict_detector.js:
   - Add debug logging in catch blocks instead of silent failures
   - Makes debugging easier when log downloads fail

3. .github/workflows/agents-verify-to-issue.yml:
   - Replace /tmp file usage with GitHub Actions environment files
   - Use heredoc delimiter for multi-line output
   - Consolidate find and extract steps for cleaner flow

4. .github/workflows/agents-auto-label.yml:
   - Make Workflows repo checkout configurable (not hardcoded)
   - Use github.paginate() for label retrieval (handles >100 labels)

5. templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md:
   - Replace hardcoded 'main' with {{base_branch}} template variable
   - Make verification steps language-agnostic (not Python-specific)
   - Add note about checking project README for test commands

1. Fix test_integration_template_installs_and_tests
   - The test used --user pip install flag which fails in virtualenvs
   - Added _in_virtualenv() helper to detect virtualenv environment
   - Only use --user flag when NOT in a virtualenv

2. Add new workflows to expected names mapping
   - agents-auto-label.yml: 'Auto-Label Issues'
   - agents-verify-to-issue.yml: 'Create Issue from Verification'

3. Update workflow documentation
   - docs/ci/WORKFLOWS.md: Added bullet points for new workflows
   - docs/ci/WORKFLOW_SYSTEM.md: Added table rows for new workflows

All 1120 tests now pass.

actionlint was failing because the Match labels step had two env blocks.
Merged ISSUE_TITLE and ISSUE_BODY into the main env block.
Phase 3 Pre-Agent Intelligence workflows:
- agents-capability-check.yml: Pre-flight agent feasibility gate
- agents-decompose.yml: Task decomposition for large issues
- agents-dedup.yml: Duplicate detection using embeddings
- agents-auto-label.yml: Semantic label matching

Also includes:
- agents-verify-to-issue.yml: Create follow-up issues from verification (Phase 4E)
- Updated sync-manifest.yml with all new workflow entries
- pr_verifier.py: Auth error fallback for LLM provider resilience
- Tests for fallback behavior

All Phase 3 scripts have 129 tests passing.

- Mark all Phase 3 implementation tasks as complete
- Add detailed test suite with 12 specific test cases:
  - Suite A: Capability Check (3 tests)
  - Suite B: Task Decomposition (3 tests)
  - Suite C: Duplicate Detection (4 tests)
  - Suite D: Auto-Label (2 tests)
- Include pre-testing checklist and execution tracking table
- Add rollback plan and success criteria
- Include sample issue bodies for reproducible tests

Addresses known issue: verify:compare works on Travel-Plan-Permission
but fails on Trend_Model_Project PR #4249.

New deployment verification plan includes:
- Phase 1: Sync deployment tracking across all 7 repos
- Phase 2: Existing workflow verification (investigate failures)
- Phase 3: New workflow verification with specific test cases
- Phase 4: Troubleshooting guide for common issues
- Cross-repo verification summary with minimum pass criteria

Separates deployment verification from functional regression testing.
…behavior)

Investigation findings for Trend_Model_Project PR #4249:
- Root cause: PR is OPEN, not merged
- Verifier correctly skipped (designed for merged PRs only)
- verify:* labels missing in most repos (only Travel-Plan-Permission has them)
- Added label prerequisite checklist to deployment plan
- Updated verification summary with resolved status
Copilot AI review requested due to automatic review settings January 8, 2026 03:28
Merge main's Phase 5A auto-label description with Phase 3 workflow additions:
- agents-capability-check.yml (Phase 3A)
- agents-decompose.yml (Phase 3B)
- agents-dedup.yml (Phase 3C)
@github-actions github-actions bot added the autofix Opt-in automated formatting & lint remediation label Jan 8, 2026
@github-actions
Contributor

github-actions bot commented Jan 8, 2026

Status | ✅ no new diagnostics
History points | 1
Timestamp | 2026-01-08 03:33:22 UTC
Report artifact | autofix-report-pr-653
Remaining | 0
New | 0
No additional artifacts

Contributor

Copilot AI left a comment


Pull request overview

This PR implements Phase 3 of the LangChain post-code rollout, adding five new agent workflows for pre-agent intelligence capabilities, deployment verification infrastructure, and Phase 4 groundwork.

Key changes:

  • 5 new Phase 3 workflows: capability-check, task decomposition, duplicate detection, auto-labeling, and verify-to-issue
  • LangSmith tracing integration for LLM observability and debugging
  • PR verifier fallback logic to handle authentication errors gracefully
  • Comprehensive deployment verification plan with cross-repo testing strategy

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

File | Description
tools/llm_provider.py | Adds LangSmith tracing setup for LLM observability
scripts/langchain/pr_verifier.py | Implements auth error fallback between GitHub Models and OpenAI
tests/scripts/test_pr_verifier_fallback.py | Tests for new fallback behavior
tests/test_integration_repo_template.py | Fixes virtualenv detection for pip install
tests/workflows/test_workflow_naming.py | Adds expected names for 2 new workflows
templates/consumer-repo/.github/workflows/agents-capability-check.yml | Pre-flight check to identify tasks agents cannot complete
templates/consumer-repo/.github/workflows/agents-decompose.yml | Breaks large issues into actionable sub-tasks
templates/consumer-repo/.github/workflows/agents-dedup.yml | Detects duplicate issues using semantic similarity
templates/consumer-repo/.github/workflows/agents-auto-label.yml | Auto-applies/suggests labels based on content
templates/consumer-repo/.github/workflows/agents-verify-to-issue.yml | Creates follow-up issues from verification feedback
.github/workflows/agents-verify-to-issue.yml | Workflows repo copy of verify-to-issue
.github/workflows/agents-auto-label.yml | Workflows repo copy of auto-label
scripts/cleanup_labels.py | Script for auditing and removing bloat labels
.github/scripts/conflict_detector.js | Detects merge conflicts for targeted resolution
templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md | Agent prompt for resolving merge conflicts
.github/sync-manifest.yml | Adds 5 new workflows + conflict detector to sync
docs/plans/langchain-post-code-rollout.md | Major update: Phase 3 status, deployment plan, test strategy
docs/ci/WORKFLOW_SYSTEM.md | Documents 2 new workflows in system table
docs/ci/WORKFLOWS.md | Lists 2 new workflows
docs/LABELS.md | Documents verify:create-issue and keepalive control labels


Comment on lines +497 to +522
```python
fallback_provider = "openai" if "github-models" in provider_name else "github-models"
fallback_resolved = _get_llm_client(model=model, provider=fallback_provider)
if fallback_resolved is not None:
    fallback_client, fallback_provider_name = fallback_resolved
    try:
        response = fallback_client.invoke(prompt)
        content = getattr(response, "content", None) or str(response)
        result = _parse_llm_response(content, fallback_provider_name)
        # Add note about fallback
        if result.summary:
            result = EvaluationResult(
                verdict=result.verdict,
                scores=result.scores,
                concerns=result.concerns,
                summary=result.summary,
                provider_used=fallback_provider_name,
                model=result.model,
                used_llm=result.used_llm,
                error=f"Primary provider ({provider_name}) failed, used fallback",
                raw_content=result.raw_content,
            )
        return result
    except Exception as fallback_exc:
        return _fallback_evaluation(
            f"Primary ({provider_name}): {exc}; Fallback ({fallback_provider_name}): {fallback_exc}"
        )
```

Copilot AI Jan 8, 2026


The fallback logic has a bug: when provider is None (auto mode), the code checks if "github-models" is in provider_name to determine the fallback provider. However, if provider_name is "openai/gpt-4o", the fallback would be "github-models", which could fail with the same auth error. The logic should check the actual provider_name value to avoid circular fallback attempts. Consider: fallback_provider = "openai" if "github-models" in provider_name else "github-models" might fail if OpenAI was the primary and GitHub Models lacks permission.

Suggested change (derive the base provider explicitly to avoid a circular fallback):

```python
# Determine the base provider from the resolved provider name and choose the opposite
base_provider = provider_name.split("/", 1)[0] if provider_name else ""
if base_provider == "github-models":
    fallback_provider = "openai"
elif base_provider == "openai":
    fallback_provider = "github-models"
else:
    fallback_provider = None
if fallback_provider is not None and fallback_provider != base_provider:
    fallback_resolved = _get_llm_client(model=model, provider=fallback_provider)
    if fallback_resolved is not None:
        fallback_client, fallback_provider_name = fallback_resolved
        try:
            response = fallback_client.invoke(prompt)
            content = getattr(response, "content", None) or str(response)
            result = _parse_llm_response(content, fallback_provider_name)
            # Add note about fallback
            if result.summary:
                result = EvaluationResult(
                    verdict=result.verdict,
                    scores=result.scores,
                    concerns=result.concerns,
                    summary=result.summary,
                    provider_used=fallback_provider_name,
                    model=result.model,
                    used_llm=result.used_llm,
                    error=f"Primary provider ({provider_name}) failed, used fallback",
                    raw_content=result.raw_content,
                )
            return result
        except Exception as fallback_exc:
            return _fallback_evaluation(
                f"Primary ({provider_name}): {exc}; Fallback ({fallback_provider_name}): {fallback_exc}"
            )
```

Comment on lines +33 to +35
```yaml
repository: stranske/Workflows
path: workflows-repo
```


Copilot AI Jan 8, 2026


The repository value uses a conditional that will always resolve to 'stranske/Workflows'. When github.repository == 'stranske/Workflows', it returns github.repository (which is 'stranske/Workflows'). When the condition is false, it returns 'stranske/Workflows'. This makes the conditional redundant. Simply use repository: stranske/Workflows for clarity.

Comment on lines +75 to +83
```yaml
- source: .github/workflows/agents-capability-check.yml
  description: "Capability check - pre-flight agent feasibility gate (Phase 3A)"

- source: .github/workflows/agents-decompose.yml
  description: "Task decomposition - breaks large issues into sub-tasks (Phase 3B)"

- source: .github/workflows/agents-dedup.yml
  description: "Duplicate detection - flags similar open issues (Phase 3C)"
```


Copilot AI Jan 8, 2026


The sync manifest references workflow files that don't exist in the source repository. The manifest lists agents-capability-check.yml, agents-decompose.yml, and agents-dedup.yml at lines 75, 78, and 81, but these files only exist in templates/consumer-repo/.github/workflows/ and not in .github/workflows/. The sync workflow will fail when trying to sync these files to consumer repos. Either create these workflows in .github/workflows/ or remove them from the sync manifest until they're ready.

Suggested change

```yaml
- source: .github/workflows/agents-capability-check.yml
  description: "Capability check - pre-flight agent feasibility gate (Phase 3A)"
- source: .github/workflows/agents-decompose.yml
  description: "Task decomposition - breaks large issues into sub-tasks (Phase 3B)"
- source: .github/workflows/agents-dedup.yml
  description: "Duplicate detection - flags similar open issues (Phase 3C)"
```

@github-actions
Contributor

github-actions bot commented Jan 8, 2026

Automated Status Summary

Head SHA: c86af03
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job Result Logs
(no jobs reported) ⏳ pending

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

Metric | Value
Current | 92.21%
Baseline | 85.00%
Delta | +7.21%
Minimum | 70.00%
Status | ✅ Pass

Top Coverage Hotspots (lowest coverage)

File | Coverage | Missing
scripts/workflow_health_check.py | 62.6% | 28
scripts/classify_test_failures.py | 62.9% | 37
scripts/ledger_validate.py | 65.3% | 63
scripts/mypy_return_autofix.py | 82.6% | 11
scripts/ledger_migrate_base.py | 85.5% | 13
scripts/fix_cosmetic_aggregate.py | 92.3% | 1
scripts/coverage_history_append.py | 92.8% | 2
scripts/workflow_validator.py | 93.3% | 4
scripts/update_autofix_expectations.py | 93.9% | 1
scripts/pr_metrics_tracker.py | 95.7% | 3
scripts/generate_residual_trend.py | 96.6% | 1
scripts/build_autofix_pr_comment.py | 97.0% | 2
scripts/aggregate_agent_metrics.py | 97.2% | 0
scripts/fix_numpy_asserts.py | 98.1% | 0
scripts/sync_test_dependencies.py | 98.3% | 1

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@github-actions
Contributor

github-actions bot commented Jan 8, 2026

🤖 Keepalive Loop Status

PR #653 | Agent: Codex | Iteration 0/5

Current State

Metric | Value
Iteration progress | [----------] 0/5
Action | wait (missing-agent-label)
Disposition | skipped (transient)
Gate | success
Tasks | 0/8 complete
Keepalive | ❌ disabled
Autofix | ❌ disabled

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske stranske merged commit 5e6336d into main Jan 8, 2026
36 of 37 checks passed
@stranske stranske deleted the phase3-planning branch January 8, 2026 03:35
stranske added a commit that referenced this pull request Jan 8, 2026
- PR #653 merged, sync workflow completed
- All 7 consumer repos have sync PRs open (need manual merge)
- All 7 repos have required labels (verify:*, agents:decompose, needs-human)
- No substantive bot code review comments on sync PRs
- Added sync PR links for manual merging

Labels

autofix Opt-in automated formatting & lint remediation
