feat(phase3): Add Phase 3 agent workflows and deployment verification#653
Conversation
Phase 3: Pre-Agent Intelligence (4 capabilities)
- 3A: Capability Check - supplements agents:optimize with a feasibility gate
  - Runs BEFORE agent assignment on issues (not after)
  - Adds the needs-human label when an agent cannot proceed
- 3B: Task Decomposition - auto-split large issues
- 3C: Duplicate Detection - comment-only mode; track false positives
- 3D: Semantic Labeling - auto-suggest/apply labels

Testing plan:
- Test repo: Manager-Database
- ~11 test issues across 4 capabilities
- False-positive tracking for dedup (target: <5%)
- Metrics dashboard for validation

Also updates:
- Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
- All immediate tasks completed
- Phase 3 ready to begin
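The feasibility gate described above (run the check, then either proceed or hand off to a human) could look like the following sketch. The helper name `apply_feasibility_gate` and the `verdict` shape are hypothetical; the real logic lives in the `agents-capability-check.yml` workflow. The `issue` object is assumed to expose `add_labels()` and `create_comment()` as in PyGithub's Issue API.

```python
def apply_feasibility_gate(issue, verdict):
    """Label an issue based on a pre-flight feasibility verdict.

    Hypothetical sketch: `verdict` is assumed to be a dict with a boolean
    "feasible" key and a "reasons" list explaining any blockers.
    """
    if verdict["feasible"]:
        return "proceed"
    # The agent cannot proceed: route the issue to a human instead.
    issue.add_labels("needs-human")
    issue.create_comment(
        "Capability check failed:\n"
        + "\n".join(f"- {r}" for r in verdict["reasons"])
    )
    return "blocked"
```

The key design point from the commit message is ordering: this runs before agent assignment, so an infeasible task never reaches an agent at all.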
Phase 4 includes 5 initiatives:
- 4A: Label Cleanup - remove bloat labels, standardize across 7 repos
- 4B: User Guide - operational documentation for the label system (sync to consumers)
- 4C: Auto-Pilot Label - end-to-end issue-to-merged-PR automation
- 4D: Conflict Resolution - automated merge conflict handling in keepalive
- 4E: Verify-to-Issue - create follow-up issues from verification feedback

Key decisions:
- Auto-pilot uses workflow_dispatch between steps (not chained labels)
- Conflict detection added to the keepalive loop (not a separate workflow)
- Verify-to-issue is user-triggered (not automatic; avoids false positives)

Also identifies 7 additional automation opportunities for future phases. Testing plan defined for Manager-Database.
Label analysis corrections:
- agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
- agents:activated IS functional (agents_pr_meta_keepalive.js)
- from:codex/copilot ARE functional (merge_manager.js)
- automerge IS functional (merge_manager.js, agents_belt_scan.js)
- agents (bare) IS functional (agent_task.yml template)
- risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

Only 5-6 labels confirmed as bloat:
- codex (bare) - redundant with agent:codex
- ai:agent - zero matches
- auto-merge-audit - zero matches
- automerge:ok - zero matches
- architecture, backend, cli, etc. - repo-specific, not synced

Phase 5 analysis:
- 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
- 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
- 5C: Stale PR cleanup - not needed
- 5D: Dependabot - partial (auto-label exists, add auto-merge)
- 5E: Issue lint - soft warning approach
- 5F: Cross-repo linking - weekly scan with semantic_matcher.py
- 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)
Label consolidation:
- Replace agents:pause with agents:paused in all source files
- Update the keepalive_gate.js PAUSE_LABEL constant
- Update the keepalive_orchestrator_gate_runner.js hardcoded check
- Update the test to use agents:paused
- Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md

Phase 4 updates:
- 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
- 4B: Add optional issue creation feature to user guide (deferred)
- 4D: Full conflict resolution implementation with code examples
- 4E: Complete verify-to-issue workflow implementation

Phase 5 updates:
- 5F: Marked as SKIPPED (not needed per user decision)
- 5G: Full LangSmith integration plan + custom metrics

All keepalive tests pass (8/8).
Phase 4 implementations:
- 4A: Add scripts/cleanup_labels.py for label auditing
  - Classifies labels as functional/bloat/idiosyncratic
  - Requires the --confirm flag for actual deletion
  - Reports audit results with recommendations
- 4D: Add conflict detection for the keepalive pipeline
  - .github/scripts/conflict_detector.js module
  - Detects conflicts from the GitHub API, CI logs, and PR comments
  - templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md
- 4E: Add the agents-verify-to-issue.yml workflow
  - Creates follow-up issues from verification feedback
  - User-triggered via the verify:create-issue label
  - Extracts concerns and low scores automatically

Phase 5 implementations:
- 5A: Add the agents-auto-label.yml workflow
  - Semantic label matching for new issues
  - 90% threshold for auto-apply, 75% for suggestions
  - Uses the existing label_matcher.py script
- 5G: Add LangSmith tracing to tools/llm_provider.py
  - _setup_langsmith_tracing() function
  - Auto-configures when LANGSMITH_API_KEY is present

Also:
- Update .github/sync-manifest.yml with new sync entries
- Update docs/LABELS.md with new label documentation
Code quality improvements based on automated code review:
1. tools/llm_provider.py:
- Fix LangSmith API key env var (LANGSMITH_API_KEY vs LANGCHAIN_API_KEY)
- Improve f-string formatting for logging
- Add usage comment for LANGSMITH_ENABLED constant
2. .github/scripts/conflict_detector.js:
- Add debug logging in catch blocks instead of silent failures
- Makes debugging easier when log downloads fail
3. .github/workflows/agents-verify-to-issue.yml:
- Replace /tmp file usage with GitHub Actions environment files
- Use heredoc delimiter for multi-line output
- Consolidate find and extract steps for cleaner flow
4. .github/workflows/agents-auto-label.yml:
- Make Workflows repo checkout configurable (not hardcoded)
- Use github.paginate() for label retrieval (handles >100 labels)
5. templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md:
- Replace hardcoded 'main' with {{base_branch}} template variable
- Make verification steps language-agnostic (not Python-specific)
- Add note about checking project README for test commands
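The auto-label workflow touched in item 4 uses the two-tier thresholds described earlier (90% for auto-apply, 75% for suggestions). A minimal sketch of that decision step, with a hypothetical `scores` input standing in for the output of `label_matcher.py`:

```python
def triage_labels(scores, apply_at=0.90, suggest_at=0.75):
    """Split semantic match scores into labels to apply vs. merely suggest.

    `scores` maps label name -> similarity in [0, 1]. Thresholds mirror the
    workflow's 90% auto-apply / 75% suggestion cutoffs; anything below the
    suggestion threshold is ignored.
    """
    apply = sorted(label for label, s in scores.items() if s >= apply_at)
    suggest = sorted(
        label for label, s in scores.items() if suggest_at <= s < apply_at
    )
    return apply, suggest
```

The workflow would then add the `apply` labels directly and post the `suggest` labels in a comment for human confirmation.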
1. Fix test_integration_template_installs_and_tests
   - The test used the --user pip install flag, which fails in virtualenvs
   - Added an _in_virtualenv() helper to detect virtualenv environments
   - Only use the --user flag when NOT in a virtualenv
2. Add new workflows to the expected-names mapping
   - agents-auto-label.yml: 'Auto-Label Issues'
   - agents-verify-to-issue.yml: 'Create Issue from Verification'
3. Update workflow documentation
   - docs/ci/WORKFLOWS.md: added bullet points for the new workflows
   - docs/ci/WORKFLOW_SYSTEM.md: added table rows for the new workflows

All 1120 tests now pass.
actionlint was failing because the Match labels step had two env blocks. Merged ISSUE_TITLE and ISSUE_BODY into the main env block.
Phase 3 Pre-Agent Intelligence workflows:
- agents-capability-check.yml: pre-flight agent feasibility gate
- agents-decompose.yml: task decomposition for large issues
- agents-dedup.yml: duplicate detection using embeddings
- agents-auto-label.yml: semantic label matching

Also includes:
- agents-verify-to-issue.yml: create follow-up issues from verification (Phase 4E)
- Updated sync-manifest.yml with all new workflow entries
- pr_verifier.py: auth error fallback for LLM provider resilience
- Tests for the fallback behavior

All Phase 3 scripts have 129 tests passing.
- Mark all Phase 3 implementation tasks as complete
- Add a detailed test suite with 12 specific test cases:
  - Suite A: Capability Check (3 tests)
  - Suite B: Task Decomposition (3 tests)
  - Suite C: Duplicate Detection (4 tests)
  - Suite D: Auto-Label (2 tests)
- Include a pre-testing checklist and execution tracking table
- Add a rollback plan and success criteria
- Include sample issue bodies for reproducible tests
Addresses a known issue: verify:compare works on Travel-Plan-Permission but fails on Trend_Model_Project PR #4249.

The new deployment verification plan includes:
- Phase 1: Sync deployment tracking across all 7 repos
- Phase 2: Existing workflow verification (investigate failures)
- Phase 3: New workflow verification with specific test cases
- Phase 4: Troubleshooting guide for common issues
- Cross-repo verification summary with minimum pass criteria

Separates deployment verification from functional regression testing.
…behavior)

Investigation findings for Trend_Model_Project PR #4249:
- Root cause: the PR is OPEN, not merged
- The verifier correctly skipped it (designed for merged PRs only)
- verify:* labels are missing in most repos (only Travel-Plan-Permission has them)
- Added a label prerequisite checklist to the deployment plan
- Updated the verification summary with resolved status
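The "correctly skipped" behavior above amounts to a merged-PR guard at the front of the verifier. A minimal sketch, assuming the PR object exposes a boolean `merged` attribute (as PyGithub's PullRequest does); the function name is hypothetical:

```python
def should_verify(pr):
    """Decide whether the verifier should run on a PR.

    The verifier is designed for merged PRs only, so open PRs are skipped
    with an explanatory reason rather than reported as failures.
    """
    if not pr.merged:
        return False, "skipped: PR is not merged (verifier runs on merged PRs only)"
    return True, "ok"
```

Returning a reason string alongside the decision is what lets the investigation above distinguish "correctly skipped" from "failed".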
Merge main's Phase 5A auto-label description with the Phase 3 workflow additions:
- agents-capability-check.yml (Phase 3A)
- agents-decompose.yml (Phase 3B)
- agents-dedup.yml (Phase 3C)
Status: ✅ no new diagnostics
Pull request overview
This PR implements Phase 3 of the LangChain post-code rollout, adding five new agent workflows for pre-agent intelligence capabilities, deployment verification infrastructure, and Phase 4 groundwork.
Key changes:
- 5 new Phase 3 workflows: capability-check, task decomposition, duplicate detection, auto-labeling, and verify-to-issue
- LangSmith tracing integration for LLM observability and debugging
- PR verifier fallback logic to handle authentication errors gracefully
- Comprehensive deployment verification plan with cross-repo testing strategy
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tools/llm_provider.py` | Adds LangSmith tracing setup for LLM observability |
| `scripts/langchain/pr_verifier.py` | Implements auth error fallback between GitHub Models and OpenAI |
| `tests/scripts/test_pr_verifier_fallback.py` | Tests for new fallback behavior |
| `tests/test_integration_repo_template.py` | Fixes virtualenv detection for pip install |
| `tests/workflows/test_workflow_naming.py` | Adds expected names for 2 new workflows |
| `templates/consumer-repo/.github/workflows/agents-capability-check.yml` | Pre-flight check to identify tasks agents cannot complete |
| `templates/consumer-repo/.github/workflows/agents-decompose.yml` | Breaks large issues into actionable sub-tasks |
| `templates/consumer-repo/.github/workflows/agents-dedup.yml` | Detects duplicate issues using semantic similarity |
| `templates/consumer-repo/.github/workflows/agents-auto-label.yml` | Auto-applies/suggests labels based on content |
| `templates/consumer-repo/.github/workflows/agents-verify-to-issue.yml` | Creates follow-up issues from verification feedback |
| `.github/workflows/agents-verify-to-issue.yml` | Workflows repo copy of verify-to-issue |
| `.github/workflows/agents-auto-label.yml` | Workflows repo copy of auto-label |
| `scripts/cleanup_labels.py` | Script for auditing and removing bloat labels |
| `.github/scripts/conflict_detector.js` | Detects merge conflicts for targeted resolution |
| `templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md` | Agent prompt for resolving merge conflicts |
| `.github/sync-manifest.yml` | Adds 5 new workflows + conflict detector to sync |
| `docs/plans/langchain-post-code-rollout.md` | Major update: Phase 3 status, deployment plan, test strategy |
| `docs/ci/WORKFLOW_SYSTEM.md` | Documents 2 new workflows in system table |
| `docs/ci/WORKFLOWS.md` | Lists 2 new workflows |
| `docs/LABELS.md` | Documents verify:create-issue and keepalive control labels |
```python
fallback_provider = "openai" if "github-models" in provider_name else "github-models"
fallback_resolved = _get_llm_client(model=model, provider=fallback_provider)
if fallback_resolved is not None:
    fallback_client, fallback_provider_name = fallback_resolved
    try:
        response = fallback_client.invoke(prompt)
        content = getattr(response, "content", None) or str(response)
        result = _parse_llm_response(content, fallback_provider_name)
        # Add note about fallback
        if result.summary:
            result = EvaluationResult(
                verdict=result.verdict,
                scores=result.scores,
                concerns=result.concerns,
                summary=result.summary,
                provider_used=fallback_provider_name,
                model=result.model,
                used_llm=result.used_llm,
                error=f"Primary provider ({provider_name}) failed, used fallback",
                raw_content=result.raw_content,
            )
        return result
    except Exception as fallback_exc:
        return _fallback_evaluation(
            f"Primary ({provider_name}): {exc}; Fallback ({fallback_provider_name}): {fallback_exc}"
        )
```
The fallback logic has a bug: in auto mode (provider is None), the code checks whether "github-models" appears in provider_name to pick the fallback. If provider_name is "openai/gpt-4o", the fallback becomes "github-models", which can fail with the same auth error when GitHub Models lacks permission. Derive the base provider from the resolved provider_name instead, and skip the fallback when no distinct alternative exists, to avoid circular fallback attempts.
Suggested change:

```python
# Determine the base provider from the resolved provider name and choose the opposite
base_provider = provider_name.split("/", 1)[0] if provider_name else ""
if base_provider == "github-models":
    fallback_provider = "openai"
elif base_provider == "openai":
    fallback_provider = "github-models"
else:
    fallback_provider = None
if fallback_provider is not None and fallback_provider != base_provider:
    fallback_resolved = _get_llm_client(model=model, provider=fallback_provider)
    if fallback_resolved is not None:
        fallback_client, fallback_provider_name = fallback_resolved
        try:
            response = fallback_client.invoke(prompt)
            content = getattr(response, "content", None) or str(response)
            result = _parse_llm_response(content, fallback_provider_name)
            # Add note about fallback
            if result.summary:
                result = EvaluationResult(
                    verdict=result.verdict,
                    scores=result.scores,
                    concerns=result.concerns,
                    summary=result.summary,
                    provider_used=fallback_provider_name,
                    model=result.model,
                    used_llm=result.used_llm,
                    error=f"Primary provider ({provider_name}) failed, used fallback",
                    raw_content=result.raw_content,
                )
            return result
        except Exception as fallback_exc:
            return _fallback_evaluation(
                f"Primary ({provider_name}): {exc}; Fallback ({fallback_provider_name}): {fallback_exc}"
            )
```
```yaml
repository: stranske/Workflows
path: workflows-repo
```
The repository value uses a conditional that always resolves to 'stranske/Workflows': when github.repository == 'stranske/Workflows' it returns github.repository (which is 'stranske/Workflows'), and otherwise it falls back to 'stranske/Workflows'. The conditional is redundant; simply use `repository: stranske/Workflows` for clarity.
```yaml
- source: .github/workflows/agents-capability-check.yml
  description: "Capability check - pre-flight agent feasibility gate (Phase 3A)"
- source: .github/workflows/agents-decompose.yml
  description: "Task decomposition - breaks large issues into sub-tasks (Phase 3B)"
- source: .github/workflows/agents-dedup.yml
  description: "Duplicate detection - flags similar open issues (Phase 3C)"
```
The sync manifest references workflow files that don't exist in the source repository. The manifest lists agents-capability-check.yml, agents-decompose.yml, and agents-dedup.yml at lines 75, 78, and 81, but these files only exist in templates/consumer-repo/.github/workflows/ and not in .github/workflows/. The sync workflow will fail when trying to sync these files to consumer repos. Either create these workflows in .github/workflows/ or remove them from the sync manifest until they're ready.
Suggested change: delete the three manifest entries shown above.
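The class of bug the reviewer describes (manifest entries pointing at files that don't exist in the source repo) is mechanically checkable. A sketch of such a guard, not part of this PR; it assumes the manifest has already been parsed into a list of dicts with a "source" key, matching the shape of `.github/sync-manifest.yml` entries:

```python
from pathlib import Path


def missing_manifest_sources(entries, repo_root="."):
    """Return the source paths of manifest entries whose file does not exist.

    `entries` is a parsed sync manifest: a list of dicts each carrying a
    "source" key with a repo-relative path. Running this before syncing
    would have flagged workflows that exist only under
    templates/consumer-repo/.github/workflows/.
    """
    root = Path(repo_root)
    return [e["source"] for e in entries if not (root / e["source"]).is_file()]
```

Wiring this into CI as a pre-sync check would turn a broken sync run into a fast, explicit failure.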
Automated Status Summary - Head SHA: c86af03

(Coverage overview, coverage trend, and top coverage hotspot tables not reproduced here.) Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist - Scope: no scope information available. (Tasks and acceptance criteria not reproduced.)
🤖 Keepalive Loop Status - PR #653 | Agent: Codex | Iteration 0/5

🔍 Failure Classification - Error type: infrastructure
- PR #653 merged; sync workflow completed
- All 7 consumer repos have sync PRs open (need manual merge)
- All 7 repos have the required labels (verify:*, agents:decompose, needs-human)
- No substantive bot code review comments on sync PRs
- Added sync PR links for manual merging
Summary
This PR implements Phase 3 of the LangChain post-code rollout with:
New Workflows Created
- `agents-capability-check.yml` - triggered by the `agent:codex` label on an issue
- `agents-decompose.yml` - triggered by the `agents:decompose` label
- `agents-dedup.yml`
- `agents-auto-label.yml`
- `agents-verify-to-issue.yml` - triggered by `verify:create-issue` on a merged PR
Investigated `verify:compare` on Trend_Model_Project PR #4249:
- `verify:*` labels are missing in most repos (only Travel-Plan-Permission has them)
- Fix: run `scripts/create_verifier_labels.py --execute` on consumer repos
Added comprehensive cross-repo verification plan:
Files Changed
- `templates/consumer-repo/.github/workflows/agents-*.yml` - 5 new workflows
- `.github/sync-manifest.yml` - added new workflow entries
- `docs/plans/langchain-post-code-rollout.md` - updated status, added deployment plan
- `tests/scripts/test_pr_verifier_fallback.py` - test for PR verifier fallback behavior
After merge:
- Create `verify:*` labels in repos that need them