
feat: Implement Phase 4-5 automation features #650

Merged
stranske merged 10 commits into main from phase3-planning
Jan 7, 2026


@stranske
Owner

@stranske stranske commented Jan 7, 2026

Source: Issue #645

Automated Status Summary

Scope

After merging PR #103 (multi-agent routing infrastructure), we need to:

  1. Validate the CLI agent pipeline works end-to-end with the new task-focused prompts
  2. Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
  3. Streamline the Automated Status Summary to reduce clutter when using CLI agents
  4. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments

Context for Agent

Design Decisions & Constraints

    1. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments
  • The keepalive loop now:
  • | <!-- keepalive-loop-summary --> | github-actions[bot] | NEW: CLI agent iteration tracking | ✅ Keep for CLI agents |
  • | <!-- keepalive-state:v1 --> | agents-workflows-bot[bot] | State tracking | ⚠️ Multiple copies accumulate |
  • | <!-- keepalive-round: N --> | stranske | OLD: Instruction comment | ❌ CLI agents don't need this |
  • The goal: For CLI agents (agent:* label), we should have exactly one updating comment (<!-- keepalive-loop-summary -->) instead of accumulating 10+ comments per PR.
  • Requires PR #103 to be merged first
  • This round you MUST:
  • Review the Scope/Tasks/Acceptance below, identify the next incomplete task that requires code, implement it, then post a reply comment with the completed items using their exact original text.
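A minimal Python sketch of the one-updating-comment goal above, assuming comments arrive as dicts with `id` and `body` and that the state is serialized as JSON inside the `<!-- keepalive-state:v1 ... -->` marker. The JSON payload format and all function names are hypothetical; this is not the loop's actual code.

```python
import json
import re
from typing import Optional

SUMMARY_MARKER = "<!-- keepalive-loop-summary -->"
# Hypothetical encoding: state JSON embedded inside the keepalive-state:v1 marker.
STATE_RE = re.compile(r"<!-- keepalive-state:v1 (\{.*?\}) -->", re.DOTALL)


def render_summary(iteration: int, max_iterations: int, state: dict) -> str:
    """Build the single summary comment body with the state marker embedded in it."""
    state_marker = f"<!-- keepalive-state:v1 {json.dumps(state, sort_keys=True)} -->"
    return "\n".join([
        SUMMARY_MARKER,
        state_marker,
        f"Iteration {iteration}/{max_iterations}",
    ])


def read_state(body: str) -> dict:
    """Recover the embedded state from an existing summary comment ({} if absent)."""
    match = STATE_RE.search(body)
    return json.loads(match.group(1)) if match else {}


def plan_summary_update(comments: list[dict]) -> tuple[str, Optional[int]]:
    """Return ("update", id) when a summary comment already exists, else ("create", None)."""
    for comment in comments:
        if SUMMARY_MARKER in comment.get("body", ""):
            return "update", comment["id"]
    return "create", None
```

Because the state travels inside the summary body, each iteration edits one comment instead of posting another `keepalive-state` copy.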

Related Issues/PRs

References

Blockers & Dependencies

  • After merging PR #103 (multi-agent routing infrastructure), we need to:
    1. Mark a task checkbox complete ONLY after verifying the implementation works.

Tasks

Pipeline Validation

  • After PR #103 (chore(codex): bootstrap PR for issue #101) merges, create a test PR with agent:codex label
  • Verify task appendix appears in Codex prompt (check workflow logs)
  • Verify Codex works on actual tasks (not random infrastructure work)
  • Verify keepalive comment updates with iteration progress

GITHUB_STEP_SUMMARY

  • Add step summary output to agents-keepalive-loop.yml after agent run
  • Include: iteration number, tasks completed, files changed, outcome
  • Ensure summary is visible in workflow run UI
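GitHub Actions exposes a file path in the `GITHUB_STEP_SUMMARY` environment variable, and Markdown appended to that file is rendered on the workflow run's summary page. A minimal Python sketch of the step-summary task, assuming the iteration data is already available; `format_step_summary` and `append_step_summary` are hypothetical names, and the real step in agents-keepalive-loop.yml may be written differently.

```python
import os
from pathlib import Path


def format_step_summary(iteration: int, tasks_completed: list[str],
                        files_changed: list[str], outcome: str) -> str:
    """Render one iteration's results as Markdown (hypothetical helper)."""
    lines = [
        f"## Keepalive iteration {iteration}",
        "",
        "| Field | Value |",
        "|-------|-------|",
        f"| Iteration | {iteration} |",
        f"| Tasks completed | {len(tasks_completed)} |",
        f"| Files changed | {len(files_changed)} |",
        f"| Outcome | {outcome} |",
    ]
    if tasks_completed:
        lines += ["", "### Completed tasks", *[f"- {task}" for task in tasks_completed]]
    return "\n".join(lines) + "\n"


def append_step_summary(markdown: str) -> bool:
    """Append to the GITHUB_STEP_SUMMARY file; return False outside Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True
```

Appending (not overwriting) matters because several steps in one job can contribute to the same summary file.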

Conditional Status Summary

  • Modify buildStatusBlock() in agents_pr_meta_update_body.js to accept agentType parameter
  • When agentType is set (CLI agent): hide workflow table, hide head SHA/required checks
  • Keep Scope/Tasks/Acceptance checkboxes for all cases
  • Pass agent type from workflow to the update_body job
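The real `buildStatusBlock()` is JavaScript in agents_pr_meta_update_body.js; the Python sketch below only models the branching these tasks describe. The parameter names and the way sections are joined are hypothetical.

```python
from typing import Optional


def build_status_block(checklist: str, workflow_table: str, head_sha: str,
                       required_checks: str, agent_type: Optional[str] = None) -> str:
    """Assemble the PR-body status block (hypothetical Python analogue).

    CLI agents (agent_type set) get only the Scope/Tasks/Acceptance checklist;
    the UI Codex path (agent_type empty or None) keeps the full summary.
    """
    parts = []
    if not agent_type:
        parts.append(f"Head SHA: {head_sha}")
        parts.append(f"Required: {required_checks}")
        parts.append(workflow_table)
    parts.append(checklist)  # checkboxes are kept in every mode
    return "\n\n".join(parts)
```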

Comment Pattern Cleanup

  • For CLI agents (agent:* label):
  • Suppress <!-- gate-summary: --> comment posting (use step summary instead)
  • Suppress <!-- keepalive-round: N --> instruction comments (task appendix replaces this)
  • Update <!-- keepalive-loop-summary --> to be the single source of truth
  • Ensure state marker is embedded in the summary comment (not separate)
  • For UI Codex (no agent:* label):
  • Keep existing comment patterns (instruction comments, connector bot reports)
  • Keep <!-- gate-summary: --> comment
  • Add agent_type output to detect job so downstream workflows know the mode
  • Update agents-pr-meta.yml to conditionally skip gate summary for CLI agent PRs
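A minimal sketch of the mode detection and suppression rules above, assuming the detect job sees the PR's label names as plain strings. `detect_agent_type` and `comment_policy` are hypothetical names. Note that `agents:*` labels such as `agents:paused` do not match the `agent:` prefix.

```python
from typing import Iterable


def detect_agent_type(labels: Iterable[str]) -> str:
    """Return the suffix of the first agent:* label (e.g. "codex"), or "" for UI Codex."""
    for label in labels:
        if label.startswith("agent:") and len(label) > len("agent:"):
            return label.split(":", 1)[1]
    return ""


def comment_policy(agent_type: str) -> dict[str, bool]:
    """Which legacy comment patterns to post (hypothetical keys)."""
    cli = bool(agent_type)
    return {
        "gate_summary": not cli,                 # CLI agents use the step summary instead
        "keepalive_round_instruction": not cli,  # the task appendix replaces these for CLI agents
    }
```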

Acceptance criteria

  • CLI agent receives explicit tasks in prompt and works on them
  • Iteration results visible in Actions workflow run summary
  • PR body shows checkboxes but not workflow clutter when using CLI agents
  • UI Codex path (no agent label) continues to show full status summary
  • CLI agent PRs have ≤3 bot comments total (summary, one per iteration update) instead of 10+
  • State tracking is consolidated in the summary comment, not scattered

Dependencies

  • Requires PR #103 (chore(codex): bootstrap PR for issue #101) to be merged first

Head SHA: 6e5c4c2
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Keepalive E2E | ❔ startup failure | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |
| Validate Sync Manifest | ✅ success | View run |

Head SHA: 0d33c42
Latest Runs: ❔ in progress — Agents PR meta manager
Required: gate: ⏸️ not started

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |

Phase 3: Pre-Agent Intelligence (4 capabilities)
- 3A: Capability Check - supplements agents:optimize with feasibility gate
  - Runs BEFORE agent assignment on Issues (not after)
  - Adds needs-human label when agent cannot proceed
- 3B: Task Decomposition - auto-split large issues
- 3C: Duplicate Detection - comment-only mode, track false positives
- 3D: Semantic Labeling - auto-suggest/apply labels

Testing Plan:
- Test repo: Manager-Database
- ~11 test issues across 4 capabilities
- False positive tracking for dedup (target: <5%)
- Metrics dashboard for validation
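The plan does not say what the denominator of the <5% false-positive target is. A minimal sketch, assuming it means "share of all duplicate flags that a human judged wrong"; both function names are hypothetical.

```python
def false_positive_rate(flags: list[bool]) -> float:
    """Share of duplicate flags marked wrong; each entry is True for a false positive."""
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


def meets_dedup_target(flags: list[bool], target: float = 0.05) -> bool:
    """True when the observed rate is strictly below the target (<5%)."""
    return false_positive_rate(flags) < target
```

Under this definition, one false positive among 11 flags is already 1/11 ≈ 9.1%, so the target only becomes meaningful once enough dedup flags have been collected.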

Also updates:
- Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
- All immediate tasks completed
- Phase 3 ready to begin

Created 3 test issues:
- #193: Stripe integration (should FAIL capability check)
- #194: Health monitoring (should trigger task decomposition)
- #196: Manager list API (should detect as duplicate of #133)

Updated testing metrics dashboard to track progress.

Phase 4 includes 5 initiatives:
- 4A: Label Cleanup - Remove bloat labels, standardize across 7 repos
- 4B: User Guide - Operational documentation for label system (sync to consumers)
- 4C: Auto-Pilot Label - End-to-end issue-to-merged-PR automation
- 4D: Conflict Resolution - Automated merge conflict handling in keepalive
- 4E: Verify-to-Issue - Create follow-up issues from verification feedback

Key decisions:
- Auto-pilot uses workflow_dispatch between steps (not chained labels)
- Conflict detection added to keepalive loop (not separate workflow)
- Verify-to-issue is user-triggered (not automatic, avoids false positives)

Also identifies 7 additional automation opportunities for future phases.

Testing plan defined for Manager-Database.

Label Analysis Corrections:
- agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
- agents:activated IS functional (agents_pr_meta_keepalive.js)
- from:codex/copilot ARE functional (merge_manager.js)
- automerge IS functional (merge_manager.js, agents_belt_scan.js)
- agents (bare) IS functional (agent_task.yml template)
- risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

Only 5-6 labels confirmed as bloat:
- codex (bare) - redundant with agent:codex
- ai:agent - zero matches
- auto-merge-audit - zero matches
- automerge:ok - zero matches
- architecture, backend, cli, etc. - repo-specific, not synced
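The functional/bloat/idiosyncratic split above, together with the `--confirm` guard that scripts/cleanup_labels.py is described as using, can be sketched like this. The label sets are copied from the analysis above; the data structures and function names are hypothetical, not the script's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical sets taken from the label analysis above.
FUNCTIONAL = {"agents:paused", "agents:activated", "automerge", "agents",
              "risk:low", "ci:green", "codex-ready", "from:codex", "from:copilot"}
BLOAT = {"codex", "ai:agent", "auto-merge-audit", "automerge:ok"}


@dataclass
class LabelAudit:
    functional: list[str] = field(default_factory=list)
    bloat: list[str] = field(default_factory=list)
    idiosyncratic: list[str] = field(default_factory=list)


def audit_labels(repo_labels: list[str]) -> LabelAudit:
    """Sort a repo's labels into functional / bloat / idiosyncratic buckets."""
    audit = LabelAudit()
    for name in sorted(repo_labels):
        if name in FUNCTIONAL or name.startswith("agent:"):
            audit.functional.append(name)
        elif name in BLOAT:
            audit.bloat.append(name)
        else:
            # Repo-specific labels (architecture, backend, cli, ...) need per-repo review.
            audit.idiosyncratic.append(name)
    return audit


def labels_to_delete(audit: LabelAudit, confirm: bool) -> list[str]:
    """Without confirm this is a dry run and nothing is scheduled for deletion."""
    return list(audit.bloat) if confirm else []
```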

Phase 5 Analysis:
- 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
- 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
- 5C: Stale PR cleanup - not needed
- 5D: Dependabot - partial (auto-label exists, add auto-merge)
- 5E: Issue lint - soft warning approach
- 5F: Cross-repo linking - weekly scan with semantic_matcher.py
- 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)

Label consolidation:
- Replace agents:pause with agents:paused in all source files
- Update keepalive_gate.js PAUSE_LABEL constant
- Update keepalive_orchestrator_gate_runner.js hardcoded check
- Update test to use agents:paused
- Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md

Phase 4 updates:
- 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
- 4B: Add optional issue creation feature to user guide (deferred)
- 4D: Full conflict resolution implementation with code examples
- 4E: Complete verify-to-issue workflow implementation

Phase 5 updates:
- 5F: Marked as SKIPPED (not needed per user decision)
- 5G: Full LangSmith integration plan + custom metrics

All keepalive tests pass (8/8).

Phase 4 implementations:
- 4A: Add scripts/cleanup_labels.py for label auditing
  - Classifies labels as functional/bloat/idiosyncratic
  - Requires --confirm flag for actual deletion
  - Reports audit results with recommendations

- 4D: Add conflict detection for keepalive pipeline
  - .github/scripts/conflict_detector.js module
  - Detects conflicts from GitHub API, CI logs, PR comments
  - templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md

- 4E: Add agents-verify-to-issue.yml workflow
  - Creates follow-up issues from verification feedback
  - User-triggered via verify:create-issue label
  - Extracts concerns and low scores automatically

Phase 5 implementations:
- 5A: Add agents-auto-label.yml workflow
  - Semantic label matching for new issues
  - 90% threshold for auto-apply, 75% for suggestions
  - Uses existing label_matcher.py script

- 5G: Add LangSmith tracing to tools/llm_provider.py
  - _setup_langsmith_tracing() function
  - Auto-configures when LANGSMITH_API_KEY present

Also:
- Update .github/sync-manifest.yml with new sync entries
- Update docs/LABELS.md with new label documentation
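The 90%/75% thresholds for agents-auto-label.yml amount to a two-tier split of match scores. A minimal sketch, assuming label_matcher.py yields a similarity in [0, 1] per label and that both thresholds are inclusive (neither detail is stated above); `split_by_confidence` is a hypothetical name.

```python
def split_by_confidence(scores: dict[str, float], apply_at: float = 0.90,
                        suggest_at: float = 0.75) -> tuple[list[str], list[str]]:
    """Split label scores into (auto-apply, suggest); scores below suggest_at are dropped."""
    to_apply, to_suggest = [], []
    # Highest-scoring labels first so suggestions read in order of confidence.
    for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score >= apply_at:
            to_apply.append(label)
        elif score >= suggest_at:
            to_suggest.append(label)
    return to_apply, to_suggest
```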
Copilot AI review requested due to automatic review settings January 7, 2026 19:27
@stranske stranske temporarily deployed to agent-high-privilege January 7, 2026 19:27 — with GitHub Actions Inactive
@github-actions github-actions bot added the autofix Opt-in automated formatting & lint remediation label Jan 7, 2026
@github-actions
Contributor

github-actions bot commented Jan 7, 2026

Status | ✅ no new diagnostics
History points | 1
Timestamp | 2026-01-07 21:33:48 UTC
Report artifact | autofix-report-pr-650
Remaining | 0
New | 0
No additional artifacts

@github-actions
Contributor

github-actions bot commented Jan 7, 2026

Automated Status Summary

Head SHA: c4f5ff9
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|--------|-------|
| Current | 92.21% |
| Baseline | 85.00% |
| Delta | +7.21% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
|------|----------|---------|
| scripts/workflow_health_check.py | 62.6% | 28 |
| scripts/classify_test_failures.py | 62.9% | 37 |
| scripts/ledger_validate.py | 65.3% | 63 |
| scripts/mypy_return_autofix.py | 82.6% | 11 |
| scripts/ledger_migrate_base.py | 85.5% | 13 |
| scripts/fix_cosmetic_aggregate.py | 92.3% | 1 |
| scripts/coverage_history_append.py | 92.8% | 2 |
| scripts/workflow_validator.py | 93.3% | 4 |
| scripts/update_autofix_expectations.py | 93.9% | 1 |
| scripts/pr_metrics_tracker.py | 95.7% | 3 |
| scripts/generate_residual_trend.py | 96.6% | 1 |
| scripts/build_autofix_pr_comment.py | 97.0% | 2 |
| scripts/aggregate_agent_metrics.py | 97.2% | 0 |
| scripts/fix_numpy_asserts.py | 98.1% | 0 |
| scripts/sync_test_dependencies.py | 98.3% | 1 |

Updated automatically; will refresh on subsequent CI/Docker completions.



@github-actions
Contributor

github-actions bot commented Jan 7, 2026

🤖 Keepalive Loop Status

PR #650 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
|--------|-------|
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/45 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

Contributor

Copilot AI left a comment


Pull request overview

This PR implements Phase 4-5 automation features for the workflow agent system, focusing on label management, conflict detection, and automated issue creation from verification feedback. The changes introduce several new scripts and workflows while consolidating the pause label naming convention from agents:pause to agents:paused.

Key Changes:

  • New LangSmith tracing integration for LLM operation monitoring
  • Label cleanup utility to remove bloat labels across consumer repositories
  • Conflict detection module and resolution prompt for automated merge conflict handling
  • User-triggered workflow to create follow-up issues from verification feedback
  • Semantic auto-labeling workflow for new issues
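The conflict detection module (.github/scripts/conflict_detector.js) is JavaScript; the Python sketch below only illustrates combining the three signals it is described as checking. It assumes the GitHub API signal is the pull request's `mergeable_state` (where `"dirty"` indicates merge conflicts) and that CI logs and comments are scanned for git's own conflict messages. The function names and exact patterns are assumptions.

```python
import re
from typing import Optional

# Messages git prints when a merge or rebase stops on conflicts.
FILE_CONFLICT_RE = re.compile(r"CONFLICT \([^)]*\): Merge conflict in (\S+)")
MERGE_FAILED_RE = re.compile(r"Automatic merge failed; fix conflicts and then commit the result")


def conflicted_files(text: str) -> list[str]:
    """Files named in 'CONFLICT (...): Merge conflict in <file>' lines (no spaces in paths)."""
    return FILE_CONFLICT_RE.findall(text)


def has_conflict(mergeable_state: Optional[str], ci_log: str = "",
                 comments: tuple[str, ...] = ()) -> bool:
    """Hypothetical combination of API state, CI log text and PR comment text."""
    if mergeable_state == "dirty":
        return True
    texts = [ci_log, *comments]
    return any(p.search(t) for p in (FILE_CONFLICT_RE, MERGE_FAILED_RE) for t in texts)
```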

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.

| File | Description |
|------|-------------|
| tools/llm_provider.py | Adds LangSmith tracing configuration with environment variable setup |
| scripts/cleanup_labels.py | Label audit and cleanup utility identifying functional, informational, bloat, and idiosyncratic labels |
| .github/scripts/conflict_detector.js | Conflict detection module checking GitHub API, CI logs, and PR comments for merge conflicts |
| .github/workflows/agents-verify-to-issue.yml | Workflow to create follow-up issues from verification feedback when user adds trigger label |
| .github/workflows/agents-auto-label.yml | Semantic label matching workflow that auto-applies high-confidence labels and suggests others |
| templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md | Comprehensive prompt template guiding agents through merge conflict resolution |
| templates/consumer-repo/README.md | Updates all references from agents:pause to agents:paused |
| .github/scripts/keepalive_gate.js | Updates pause label constant to use consolidated agents:paused |
| .github/scripts/keepalive_orchestrator_gate_runner.js | Updates pause label check to use agents:paused |
| .github/scripts/__tests__/keepalive-orchestrator-gate-runner.test.js | Updates test to use agents:paused label |
| docs/LABELS.md | Documents new labels: verify:create-issue, agents:paused, agents:keepalive, follow-up, needs-formatting |
| docs/keepalive/GoalsAndPlumbing.md | Updates documentation to reference agents:paused |
| CLAUDE.md | Updates agent instructions to check for agents:paused label |
| .github/sync-manifest.yml | Adds new workflows, prompts, and scripts to sync manifest for consumer repos |
| docs/plans/langchain-post-code-rollout.md | Extensive planning updates documenting Phase 4-5 implementation details |


Code quality improvements based on automated code review:

1. tools/llm_provider.py:
   - Fix LangSmith API key env var (LANGSMITH_API_KEY vs LANGCHAIN_API_KEY)
   - Improve f-string formatting for logging
   - Add usage comment for LANGSMITH_ENABLED constant

2. .github/scripts/conflict_detector.js:
   - Add debug logging in catch blocks instead of silent failures
   - Makes debugging easier when log downloads fail

3. .github/workflows/agents-verify-to-issue.yml:
   - Replace /tmp file usage with GitHub Actions environment files
   - Use heredoc delimiter for multi-line output
   - Consolidate find and extract steps for cleaner flow

4. .github/workflows/agents-auto-label.yml:
   - Make Workflows repo checkout configurable (not hardcoded)
   - Use github.paginate() for label retrieval (handles >100 labels)

5. templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md:
   - Replace hardcoded 'main' with {{base_branch}} template variable
   - Make verification steps language-agnostic (not Python-specific)
   - Add note about checking project README for test commands
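Item 1 concerns which LangSmith key variable `_setup_langsmith_tracing()` reads. A minimal sketch of that kind of setup, assuming the function enables tracing only when `LANGSMITH_API_KEY` is present and maps it onto the `LANGCHAIN_*` variables that LangChain's tracer has historically read. Which variables the real function sets, and the default project name, are assumptions; newer LangChain releases also read `LANGSMITH_*` names directly.

```python
import logging
import os
from typing import MutableMapping, Optional

logger = logging.getLogger(__name__)


def setup_langsmith_tracing(env: Optional[MutableMapping[str, str]] = None,
                            project: str = "workflows") -> bool:
    """Enable tracing only when LANGSMITH_API_KEY is set (hypothetical sketch).

    Existing LANGCHAIN_* values are left untouched (setdefault).
    """
    env = os.environ if env is None else env
    api_key = env.get("LANGSMITH_API_KEY")
    if not api_key:
        logger.debug("LANGSMITH_API_KEY not set; LangSmith tracing disabled")
        return False
    env.setdefault("LANGCHAIN_API_KEY", api_key)
    env.setdefault("LANGCHAIN_TRACING_V2", "true")
    env.setdefault("LANGCHAIN_PROJECT", project)  # "workflows" is a placeholder name
    logger.info("LangSmith tracing enabled for project %s", env["LANGCHAIN_PROJECT"])
    return True
```

Passing an explicit mapping keeps the function testable without mutating the real process environment.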
@stranske stranske temporarily deployed to agent-high-privilege January 7, 2026 20:37 — with GitHub Actions Inactive
@stranske stranske temporarily deployed to agent-high-privilege January 7, 2026 20:39 — with GitHub Actions Inactive
1. Fix test_integration_template_installs_and_tests
   - The test used --user pip install flag which fails in virtualenvs
   - Added _in_virtualenv() helper to detect virtualenv environment
   - Only use --user flag when NOT in a virtualenv

2. Add new workflows to expected names mapping
   - agents-auto-label.yml: 'Auto-Label Issues'
   - agents-verify-to-issue.yml: 'Create Issue from Verification'

3. Update workflow documentation
   - docs/ci/WORKFLOWS.md: Added bullet points for new workflows
   - docs/ci/WORKFLOW_SYSTEM.md: Added table rows for new workflows

All 1120 tests now pass.
@stranske stranske temporarily deployed to agent-high-privilege January 7, 2026 20:55 — with GitHub Actions Inactive
actionlint was failing because the Match labels step had two env blocks.
Merged ISSUE_TITLE and ISSUE_BODY into the main env block.
@stranske stranske temporarily deployed to agent-high-privilege January 7, 2026 21:32 — with GitHub Actions Inactive
@stranske stranske merged commit 1a15b48 into main Jan 7, 2026
940 checks passed
@stranske stranske deleted the phase3-planning branch January 7, 2026 22:03
stranske added a commit that referenced this pull request Jan 8, 2026
…#653)

* Add Phase 3 integration plan with testing cycle for Manager-Database

Phase 3: Pre-Agent Intelligence (4 capabilities)
- 3A: Capability Check - supplements agents:optimize with feasibility gate
  - Runs BEFORE agent assignment on Issues (not after)
  - Adds needs-human label when agent cannot proceed
- 3B: Task Decomposition - auto-split large issues
- 3C: Duplicate Detection - comment-only mode, track false positives
- 3D: Semantic Labeling - auto-suggest/apply labels

Testing Plan:
- Test repo: Manager-Database
- ~11 test issues across 4 capabilities
- False positive tracking for dedup (target: <5%)
- Metrics dashboard for validation

Also updates:
- Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
- All immediate tasks completed
- Phase 3 ready to begin

* Add Phase 3 test issues for Manager-Database

Created 3 test issues:
- #193: Stripe integration (should FAIL capability check)
- #194: Health monitoring (should trigger task decomposition)
- #196: Manager list API (should detect as duplicate of #133)

Updated testing metrics dashboard to track progress.

* Add Phase 4: Full Automation & Cleanup plan

Phase 4 includes 5 initiatives:
- 4A: Label Cleanup - Remove bloat labels, standardize across 7 repos
- 4B: User Guide - Operational documentation for label system (sync to consumers)
- 4C: Auto-Pilot Label - End-to-end issue-to-merged-PR automation
- 4D: Conflict Resolution - Automated merge conflict handling in keepalive
- 4E: Verify-to-Issue - Create follow-up issues from verification feedback

Key decisions:
- Auto-pilot uses workflow_dispatch between steps (not chained labels)
- Conflict detection added to keepalive loop (not separate workflow)
- Verify-to-issue is user-triggered (not automatic, avoids false positives)

Also identifies 7 additional automation opportunities for future phases.

Testing plan defined for Manager-Database.

* Correct label analysis after codebase search + expand Phase 5

Label Analysis Corrections:
- agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
- agents:activated IS functional (agents_pr_meta_keepalive.js)
- from:codex/copilot ARE functional (merge_manager.js)
- automerge IS functional (merge_manager.js, agents_belt_scan.js)
- agents (bare) IS functional (agent_task.yml template)
- risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

Only 5-6 labels confirmed as bloat:
- codex (bare) - redundant with agent:codex
- ai:agent - zero matches
- auto-merge-audit - zero matches
- automerge:ok - zero matches
- architecture, backend, cli, etc. - repo-specific, not synced

Phase 5 Analysis:
- 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
- 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
- 5C: Stale PR cleanup - not needed
- 5D: Dependabot - partial (auto-label exists, add auto-merge)
- 5E: Issue lint - soft warning approach
- 5F: Cross-repo linking - weekly scan with semantic_matcher.py
- 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)

* Consolidate agents:pause to agents:paused + expand Phase 4-5 plans

Label consolidation:
- Replace agents:pause with agents:paused in all source files
- Update keepalive_gate.js PAUSE_LABEL constant
- Update keepalive_orchestrator_gate_runner.js hardcoded check
- Update test to use agents:paused
- Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md

Phase 4 updates:
- 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
- 4B: Add optional issue creation feature to user guide (deferred)
- 4D: Full conflict resolution implementation with code examples
- 4E: Complete verify-to-issue workflow implementation

Phase 5 updates:
- 5F: Marked as SKIPPED (not needed per user decision)
- 5G: Full LangSmith integration plan + custom metrics

All keepalive tests pass (8/8).

* feat: Implement Phase 4-5 automation features

Phase 4 implementations:
- 4A: Add scripts/cleanup_labels.py for label auditing
  - Classifies labels as functional/bloat/idiosyncratic
  - Requires --confirm flag for actual deletion
  - Reports audit results with recommendations

- 4D: Add conflict detection for keepalive pipeline
  - .github/scripts/conflict_detector.js module
  - Detects conflicts from GitHub API, CI logs, PR comments
  - templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md

- 4E: Add agents-verify-to-issue.yml workflow
  - Creates follow-up issues from verification feedback
  - User-triggered via verify:create-issue label
  - Extracts concerns and low scores automatically

Phase 5 implementations:
- 5A: Add agents-auto-label.yml workflow
  - Semantic label matching for new issues
  - 90% threshold for auto-apply, 75% for suggestions
  - Uses existing label_matcher.py script

- 5G: Add LangSmith tracing to tools/llm_provider.py
  - _setup_langsmith_tracing() function
  - Auto-configures when LANGSMITH_API_KEY present

Also:
- Update .github/sync-manifest.yml with new sync entries
- Update docs/LABELS.md with new label documentation

* fix: Address Copilot review comments on PR #650

Code quality improvements based on automated code review:

1. tools/llm_provider.py:
   - Fix LangSmith API key env var (LANGSMITH_API_KEY vs LANGCHAIN_API_KEY)
   - Improve f-string formatting for logging
   - Add usage comment for LANGSMITH_ENABLED constant

2. .github/scripts/conflict_detector.js:
   - Add debug logging in catch blocks instead of silent failures
   - Makes debugging easier when log downloads fail

3. .github/workflows/agents-verify-to-issue.yml:
   - Replace /tmp file usage with GitHub Actions environment files
   - Use heredoc delimiter for multi-line output
   - Consolidate find and extract steps for cleaner flow

4. .github/workflows/agents-auto-label.yml:
   - Make Workflows repo checkout configurable (not hardcoded)
   - Use github.paginate() for label retrieval (handles >100 labels)

5. templates/consumer-repo/.github/codex/prompts/fix_merge_conflicts.md:
   - Replace hardcoded 'main' with {{base_branch}} template variable
   - Make verification steps language-agnostic (not Python-specific)
   - Add note about checking project README for test commands

* fix: Fix CI test failures

1. Fix test_integration_template_installs_and_tests
   - The test used --user pip install flag which fails in virtualenvs
   - Added _in_virtualenv() helper to detect virtualenv environment
   - Only use --user flag when NOT in a virtualenv

2. Add new workflows to expected names mapping
   - agents-auto-label.yml: 'Auto-Label Issues'
   - agents-verify-to-issue.yml: 'Create Issue from Verification'

3. Update workflow documentation
   - docs/ci/WORKFLOWS.md: Added bullet points for new workflows
   - docs/ci/WORKFLOW_SYSTEM.md: Added table rows for new workflows

All 1120 tests now pass.

* fix: Remove duplicate env key in agents-auto-label.yml

actionlint was failing because the Match labels step had two env blocks.
Merged ISSUE_TITLE and ISSUE_BODY into the main env block.

* feat: Add Phase 3 workflows and sync configuration

Phase 3 Pre-Agent Intelligence workflows:
- agents-capability-check.yml: Pre-flight agent feasibility gate
- agents-decompose.yml: Task decomposition for large issues
- agents-dedup.yml: Duplicate detection using embeddings
- agents-auto-label.yml: Semantic label matching

Also includes:
- agents-verify-to-issue.yml: Create follow-up issues from verification (Phase 4E)
- Updated sync-manifest.yml with all new workflow entries
- pr_verifier.py: Auth error fallback for LLM provider resilience
- Tests for fallback behavior

All Phase 3 scripts have 129 tests passing.

* docs: Add comprehensive Phase 3 testing plan

- Mark all Phase 3 implementation tasks as complete
- Add detailed test suite with 12 specific test cases:
  - Suite A: Capability Check (3 tests)
  - Suite B: Task Decomposition (3 tests)
  - Suite C: Duplicate Detection (4 tests)
  - Suite D: Auto-Label (2 tests)
- Include pre-testing checklist and execution tracking table
- Add rollback plan and success criteria
- Include sample issue bodies for reproducible tests

* docs: Add deployment verification plan for cross-repo testing

Addresses known issue: verify:compare works on Travel-Plan-Permission
but fails on Trend_Model_Project PR #4249.

New deployment verification plan includes:
- Phase 1: Sync deployment tracking across all 7 repos
- Phase 2: Existing workflow verification (investigate failures)
- Phase 3: New workflow verification with specific test cases
- Phase 4: Troubleshooting guide for common issues
- Cross-repo verification summary with minimum pass criteria

Separates deployment verification from functional regression testing.

* docs: Resolve verify:compare investigation - PR not merged (expected behavior)

Investigation findings for Trend_Model_Project PR #4249:
- Root cause: PR is OPEN, not merged
- Verifier correctly skipped (designed for merged PRs only)
- verify:* labels missing in most repos (only Travel-Plan-Permission has them)
- Added label prerequisite checklist to deployment plan
- Updated verification summary with resolved status

Labels

autofix Opt-in automated formatting & lint remediation


2 participants