Add Phase 3 integration plan with testing cycle #645

Merged
stranske merged 6 commits into main from phase3-planning
Jan 7, 2026

stranske commented Jan 7, 2026

Source: Issue #123

Automated Status Summary

Scope

After merging PR #103 (multi-agent routing infrastructure), we need to:

  1. Validate the CLI agent pipeline works end-to-end with the new task-focused prompts
  2. Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
  3. Streamline the Automated Status Summary to reduce clutter when using CLI agents
  4. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments

Context for Agent

Design Decisions & Constraints

    1. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments
  • The keepalive loop now:
  • | <!-- keepalive-loop-summary --> | github-actions[bot] | NEW: CLI agent iteration tracking | ✅ Keep for CLI agents |
  • | <!-- keepalive-state:v1 --> | agents-workflows-bot[bot] | State tracking | ⚠️ Multiple copies accumulate |
  • | <!-- keepalive-round: N --> | stranske | OLD: Instruction comment | ❌ CLI agents don't need this |
  • The goal: For CLI agents (agent:* label), we should have exactly one updating comment (<!-- keepalive-loop-summary -->) instead of accumulating 10+ comments per PR.
  • Requires PR #103 to be merged first
  • This round you MUST:
  • Review the Scope/Tasks/Acceptance below, identify the next incomplete task that requires code, implement it, then post a reply comment with the completed items using their exact original text.

Related Issues/PRs

References

Blockers & Dependencies

  • After merging PR #103 (multi-agent routing infrastructure), we need to:
    1. Mark a task checkbox complete ONLY after verifying the implementation works.

Tasks

Pipeline Validation

  • After PR #103 (chore(codex): bootstrap PR for issue #101) merges, create a test PR with the agent:codex label
  • Verify task appendix appears in Codex prompt (check workflow logs)
  • Verify Codex works on actual tasks (not random infrastructure work)
  • Verify keepalive comment updates with iteration progress

GITHUB_STEP_SUMMARY

  • Add step summary output to agents-keepalive-loop.yml after agent run
  • Include: iteration number, tasks completed, files changed, outcome
  • Ensure summary is visible in workflow run UI
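
As a rough illustration of the step summary task above: GitHub Actions exposes a file path in the `GITHUB_STEP_SUMMARY` environment variable, and Markdown appended to that file is shown on the workflow run page. The sketch below is an assumption about how the four listed fields could be rendered; the function name `write_step_summary`, its parameters, and the table layout are hypothetical and not taken from agents-keepalive-loop.yml.

```python
import os


def write_step_summary(iteration, max_iterations, tasks_done, tasks_total,
                       files_changed, outcome):
    """Build a Markdown summary and append it to $GITHUB_STEP_SUMMARY if set."""
    lines = [
        "## Keepalive iteration summary",  # hypothetical heading
        "",
        "| Field | Value |",
        "| --- | --- |",
        f"| Iteration | {iteration}/{max_iterations} |",
        f"| Tasks completed | {tasks_done}/{tasks_total} |",
        f"| Files changed | {files_changed} |",
        f"| Outcome | {outcome} |",
        "",
    ]
    markdown = "\n".join(lines)
    # Outside of Actions the variable is unset, so only the string is returned.
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as handle:
            handle.write(markdown)
    return markdown
```

Appending (rather than overwriting) keeps any summary content that earlier commands in the same step have already written.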

Conditional Status Summary

  • Modify buildStatusBlock() in agents_pr_meta_update_body.js to accept agentType parameter
  • When agentType is set (CLI agent): hide workflow table, hide head SHA/required checks
  • Keep Scope/Tasks/Acceptance checkboxes for all cases
  • Pass agent type from workflow to the update_body job
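
The conditional behaviour described in these tasks might look roughly like the following. This is a Python sketch of the logic only; the real `buildStatusBlock()` lives in agents_pr_meta_update_body.js, and every parameter name and output line format here (`sections`, `workflow_rows`, the checkbox rendering) is an assumption made for illustration.

```python
def build_status_block(sections, head_sha, required_checks, workflow_rows,
                       agent_type=None):
    """Render a status block; CI details are hidden when agent_type is set."""
    lines = ["Automated Status Summary", ""]

    # Scope/Tasks/Acceptance checkboxes are kept in all cases.
    for title, items in sections.items():
        lines.append(title)
        for text, done in items:
            mark = "x" if done else " "
            lines.append(f"- [{mark}] {text}")
        lines.append("")

    if not agent_type:
        # UI Codex path (no agent:* label): keep the full status details.
        lines.append(f"Head SHA: {head_sha}")
        required = ", ".join(f"{name}: {result}"
                             for name, result in required_checks.items())
        lines.append(f"Required: {required}")
        lines.append("")
        lines.append("| Workflow / Job | Result |")
        lines.append("| --- | --- |")
        for name, result in workflow_rows:
            lines.append(f"| {name} | {result} |")

    return "\n".join(lines).rstrip() + "\n"
```

Defaulting `agent_type` to `None` means existing callers that do not pass it would keep the full summary, which matches the acceptance criterion for the UI Codex path.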

Comment Pattern Cleanup

  • For CLI agents (agent:* label):
      • Suppress <!-- gate-summary: --> comment posting (use step summary instead)
      • Suppress <!-- keepalive-round: N --> instruction comments (task appendix replaces this)
      • Update <!-- keepalive-loop-summary --> to be the single source of truth
      • Ensure state marker is embedded in the summary comment (not separate)
  • For UI Codex (no agent:* label):
      • Keep existing comment patterns (instruction comments, connector bot reports)
      • Keep <!-- gate-summary: --> comment
  • Add agent_type output to detect job so downstream workflows know the mode
  • Update agents-pr-meta.yml to conditionally skip gate summary for CLI agent PRs
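
One possible shape for the `agent_type` detection and the resulting comment policy is sketched below. It assumes the agent type is simply the suffix of the first `agent:*` label and that an empty string means the UI Codex path; the function names and dictionary keys are hypothetical.

```python
AGENT_LABEL_PREFIX = "agent:"


def detect_agent_type(labels):
    """Return the suffix of the first agent:* label, or "" if there is none."""
    for label in labels:
        suffix = label[len(AGENT_LABEL_PREFIX):]
        if label.startswith(AGENT_LABEL_PREFIX) and suffix:
            return suffix
    return ""


def comment_policy(agent_type):
    """Decide which legacy comments to post for a given agent type."""
    is_cli_agent = bool(agent_type)
    return {
        # CLI agents rely on the step summary instead of a gate-summary comment.
        "post_gate_summary": not is_cli_agent,
        # The task appendix replaces keepalive-round instruction comments.
        "post_round_instruction": not is_cli_agent,
    }
```

Note that labels such as `agents:paused` do not match the `agent:` prefix, so they would not be mistaken for an agent selection.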

Acceptance criteria

  • CLI agent receives explicit tasks in prompt and works on them
  • Iteration results visible in Actions workflow run summary
  • PR body shows checkboxes but not workflow clutter when using CLI agents
  • UI Codex path (no agent label) continues to show full status summary
  • CLI agent PRs have ≤3 bot comments total (summary, one per iteration update) instead of 10+
  • State tracking is consolidated in the summary comment, not scattered
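
To make the "single updating comment" criterion concrete, here is a minimal sketch of an upsert keyed on the `<!-- keepalive-loop-summary -->` marker, with the state carried inside the same comment. The in-memory list stands in for the PR's comments, and the way the state is serialized after `keepalive-state:v1` (JSON inside the HTML comment) is an assumption, not the format the workflows actually use.

```python
import json

SUMMARY_MARKER = "<!-- keepalive-loop-summary -->"


def render_summary(body_text, state):
    """Embed the state marker in the summary comment instead of a separate one."""
    # Assumed encoding: JSON payload inside the keepalive-state:v1 marker.
    state_marker = f"<!-- keepalive-state:v1 {json.dumps(state, sort_keys=True)} -->"
    return "\n".join([SUMMARY_MARKER, state_marker, body_text])


def upsert_summary(comments, body_text, state):
    """Update the existing summary comment in place, or append one if missing."""
    new_body = render_summary(body_text, state)
    for comment in comments:
        if SUMMARY_MARKER in comment["body"]:
            comment["body"] = new_body
            return comments
    comments.append({"body": new_body})
    return comments
```

Calling `upsert_summary` once per iteration leaves exactly one summary comment, however many iterations run.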

Dependencies

Head SHA: 6e5c4c2
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Keepalive E2E | ❔ startup failure | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |
| Validate Sync Manifest | ✅ success | View run |

Phase 3: Pre-Agent Intelligence (4 capabilities)
- 3A: Capability Check - supplements agents:optimize with feasibility gate
  - Runs BEFORE agent assignment on Issues (not after)
  - Adds needs-human label when agent cannot proceed
- 3B: Task Decomposition - auto-split large issues
- 3C: Duplicate Detection - comment-only mode, track false positives
- 3D: Semantic Labeling - auto-suggest/apply labels

Testing Plan:
- Test repo: Manager-Database
- ~11 test issues across 4 capabilities
- False positive tracking for dedup (target: <5%)
- Metrics dashboard for validation
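
For the dedup target above, the check could be as simple as the sketch below. It assumes the false positive rate is defined as wrongly flagged issues divided by all issues flagged as duplicates; the plan does not state the exact definition, and the function names are hypothetical.

```python
def false_positive_rate(flagged, false_positives):
    """Share of duplicate flags that turned out to be wrong."""
    if flagged == 0:
        return 0.0
    return false_positives / flagged


def meets_dedup_target(flagged, false_positives, target=0.05):
    """True when the rate is strictly below the target (<5% by default)."""
    return false_positive_rate(flagged, false_positives) < target
```

With only a handful of dedup test issues, a single false positive already exceeds the target, so the rate is more meaningful once it is tracked over a larger sample.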

Also updates:
- Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
- All immediate tasks completed
- Phase 3 ready to begin

Copilot AI review requested due to automatic review settings January 7, 2026 17:20

github-actions bot commented Jan 7, 2026

Automated Status Summary

Head SHA: 16dd05a
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job Result Logs
(no jobs reported) ⏳ pending

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 92.21% |
| Baseline | 85.00% |
| Delta | +7.21% |
| Minimum | 70.00% |
| Status | ✅ Pass |
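
The figures in the trend table are consistent with a delta of current minus baseline (92.21 - 85.00 = 7.21). The sketch below reproduces that arithmetic; the rule that the status passes when current coverage is at or above the minimum is an assumption, since the summary does not say how the status is derived.

```python
def coverage_trend(current, baseline, minimum):
    """Return (delta, status) for a coverage report, values in percent."""
    delta = round(current - baseline, 2)
    # Assumed rule: pass when current coverage is at least the minimum.
    status = "Pass" if current >= minimum else "Fail"
    return delta, status


delta, status = coverage_trend(92.21, 85.00, 70.00)
print(f"Delta {delta:+.2f}%, Status {status}")
```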

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| scripts/workflow_health_check.py | 62.6% | 28 |
| scripts/classify_test_failures.py | 62.9% | 37 |
| scripts/ledger_validate.py | 65.3% | 63 |
| scripts/mypy_return_autofix.py | 82.6% | 11 |
| scripts/ledger_migrate_base.py | 85.5% | 13 |
| scripts/fix_cosmetic_aggregate.py | 92.3% | 1 |
| scripts/coverage_history_append.py | 92.8% | 2 |
| scripts/workflow_validator.py | 93.3% | 4 |
| scripts/update_autofix_expectations.py | 93.9% | 1 |
| scripts/pr_metrics_tracker.py | 95.7% | 3 |
| scripts/generate_residual_trend.py | 96.6% | 1 |
| scripts/build_autofix_pr_comment.py | 97.0% | 2 |
| scripts/aggregate_agent_metrics.py | 97.2% | 0 |
| scripts/fix_numpy_asserts.py | 98.1% | 0 |
| scripts/sync_test_dependencies.py | 98.3% | 1 |

Updated automatically; will refresh on subsequent CI/Docker completions.


github-actions bot commented Jan 7, 2026

🤖 Keepalive Loop Status

PR #645 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/28 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |
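
The iteration progress cell above can be reproduced with a small text progress bar. This is only a sketch: the ten-character width and the `-` placeholder match the status comment, but the `#` fill character and the rounding rule are assumptions, since the comment only shows the empty 0/5 state.

```python
def progress_bar(done, total, width=10, fill="#", empty="-"):
    """Render a fixed-width text bar such as "[----------] 0/5"."""
    if total <= 0:
        filled = 0
    else:
        # Integer division rounds down; clamp so the bar never overflows.
        filled = min(width, max(0, done) * width // total)
    return f"[{fill * filled}{empty * (width - filled)}] {done}/{total}"


print(progress_bar(0, 5))
```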

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |


Copilot AI left a comment

Pull request overview

This PR adds a comprehensive Phase 3 rollout plan for integrating 5 unused LangChain scripts into the production workflow, focusing on pre-agent intelligence capabilities. It also updates the status of all consumer repository syncs, marking Collab-Admin PR #113 as merged to complete the 7/7 repo synchronization milestone.

Key changes:

  • Defines Phase 3 with 4 pre-agent intelligence capabilities (capability check, task decomposition, duplicate detection, and semantic labeling)
  • Establishes a detailed testing plan with ~11 test issues on Manager-Database repository, including metrics for false positive tracking
  • Updates deployment status from "5/6 repos synced" to "7/7 repos synced" throughout the document, marking Collab-Admin PR #113 as merged


Created 3 test issues:
- #193: Stripe integration (should FAIL capability check)
- #194: Health monitoring (should trigger task decomposition)
- #196: Manager list API (should detect as duplicate of #133)

Updated testing metrics dashboard to track progress.

Phase 4 includes 5 initiatives:
- 4A: Label Cleanup - Remove bloat labels, standardize across 7 repos
- 4B: User Guide - Operational documentation for label system (sync to consumers)
- 4C: Auto-Pilot Label - End-to-end issue-to-merged-PR automation
- 4D: Conflict Resolution - Automated merge conflict handling in keepalive
- 4E: Verify-to-Issue - Create follow-up issues from verification feedback

Key decisions:
- Auto-pilot uses workflow_dispatch between steps (not chained labels)
- Conflict detection added to keepalive loop (not separate workflow)
- Verify-to-issue is user-triggered (not automatic, avoids false positives)

Also identifies 7 additional automation opportunities for future phases.

Testing plan defined for Manager-Database.

Label Analysis Corrections:
- agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
- agents:activated IS functional (agents_pr_meta_keepalive.js)
- from:codex/copilot ARE functional (merge_manager.js)
- automerge IS functional (merge_manager.js, agents_belt_scan.js)
- agents (bare) IS functional (agent_task.yml template)
- risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

Only 5-6 labels confirmed as bloat:
- codex (bare) - redundant with agent:codex
- ai:agent - zero matches
- auto-merge-audit - zero matches
- automerge:ok - zero matches
- architecture, backend, cli, etc. - repo-specific, not synced

Phase 5 Analysis:
- 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
- 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
- 5C: Stale PR cleanup - not needed
- 5D: Dependabot - partial (auto-label exists, add auto-merge)
- 5E: Issue lint - soft warning approach
- 5F: Cross-repo linking - weekly scan with semantic_matcher.py
- 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)

Label consolidation:
- Replace agents:pause with agents:paused in all source files
- Update keepalive_gate.js PAUSE_LABEL constant
- Update keepalive_orchestrator_gate_runner.js hardcoded check
- Update test to use agents:paused
- Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md
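
A bulk rename like the one listed above needs to avoid rewriting strings that already say `agents:paused`. The sketch below shows one way to do that with a word boundary; it is an illustration of the substitution only, not the script used for this PR, and the function name is hypothetical.

```python
import re

# \b after "pause" fails when the next character is "d", so existing
# "agents:paused" occurrences are left untouched.
OLD_PAUSE_LABEL = re.compile(r"agents:pause\b")


def consolidate_pause_label(text):
    """Replace the old agents:pause label with agents:paused."""
    return OLD_PAUSE_LABEL.sub("agents:paused", text)


print(consolidate_pause_label("const PAUSE_LABEL = 'agents:pause';"))
```

Because the substitution is idempotent, running it twice over the same file yields the same result as running it once.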

Phase 4 updates:
- 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
- 4B: Add optional issue creation feature to user guide (deferred)
- 4D: Full conflict resolution implementation with code examples
- 4E: Complete verify-to-issue workflow implementation

Phase 5 updates:
- 5F: Marked as SKIPPED (not needed per user decision)
- 5G: Full LangSmith integration plan + custom metrics

All keepalive tests pass (8/8).
@stranske stranske merged commit d4e89a0 into main Jan 7, 2026
310 checks passed
@stranske stranske deleted the phase3-planning branch January 7, 2026 18:33