Add Phase 3 integration plan with testing cycle by stranske · Pull Request #645 · stranske/Workflows

stranske · 2026-01-07T17:20:27Z

Source: Issue #123

Automated Status Summary

Scope

After merging PR #103 (multi-agent routing infrastructure), we need to:

Validate the CLI agent pipeline works end-to-end with the new task-focused prompts
Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
Streamline the Automated Status Summary to reduce clutter when using CLI agents
Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments

Context for Agent

Design Decisions & Constraints

1. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments
The keepalive loop now:
| Author | Comment | Status |
| --- | --- | --- |
| github-actions[bot] | NEW: CLI agent iteration tracking | ✅ Keep for CLI agents |
| agents-workflows-bot[bot] | State tracking | ⚠️ Multiple copies accumulate |
| stranske | OLD: Instruction comment | ❌ CLI agents don't need this |
The goal: For CLI agents (agent:* label), we should have exactly one updating comment () instead of accumulating 10+ comments per PR.
Requires PR #103 to be merged first
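One way to get "exactly one updating comment" is to tag the summary with a hidden marker, then on each run either edit the marked comment or create it, and treat any later marked copies as duplicates. The sketch below shows only that decision step. The marker string, the `Comment` type, and both function names are hypothetical; the marker the keepalive loop actually uses is not shown in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical hidden marker; the real keepalive marker is not shown in this PR.
SUMMARY_MARKER = "<!-- keepalive-summary -->"


@dataclass
class Comment:
    id: int
    author: str
    body: str


def plan_summary_upsert(
    comments: List[Comment], bot: str = "github-actions[bot]"
) -> Tuple[str, Optional[int], List[int]]:
    """Decide how to keep exactly one summary comment on a PR.

    Returns (action, comment_id_to_update, duplicate_ids_to_delete).
    Comments are assumed to be in creation order, so the oldest marked
    comment is edited in place and later marked copies are duplicates.
    """
    marked = [c for c in comments if c.author == bot and SUMMARY_MARKER in c.body]
    if not marked:
        return "create", None, []
    keep, *duplicates = marked
    return "update", keep.id, [c.id for c in duplicates]


def render_summary(text: str) -> str:
    """Prefix the body with the marker so the next run can find it."""
    return f"{SUMMARY_MARKER}\n{text}"
```

A caller would then create or edit that single comment through the GitHub API and delete the returned duplicates, which keeps the per-PR bot comment count flat across iterations.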
This round you MUST:
Review the Scope/Tasks/Acceptance below, identify the next incomplete task that requires code, implement it, then post a reply comment with the completed items using their exact original text.

Related Issues/PRs

References

https://github.com/stranske/Workflows/compare/main...codex/issue-123?expand=1

Blockers & Dependencies

After merging PR #103 (multi-agent routing infrastructure), we need to:
1. Mark a task checkbox complete ONLY after verifying the implementation works.

Tasks

Pipeline Validation

After PR #103 (chore(codex): bootstrap PR for issue #101) merges, create a test PR with the agent:codex label
Verify task appendix appears in Codex prompt (check workflow logs)
Verify Codex works on actual tasks (not random infrastructure work)
Verify keepalive comment updates with iteration progress

GITHUB_STEP_SUMMARY

Add step summary output to agents-keepalive-loop.yml after agent run
Include: iteration number, tasks completed, files changed, outcome
Ensure summary is visible in workflow run UI
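GitHub Actions exposes `GITHUB_STEP_SUMMARY` as the path to a file; Markdown appended to that file is rendered on the workflow run's summary page. A minimal sketch of the step, assuming the iteration number, task counts, files changed, and outcome are already available to a script run after the agent step (the function names here are hypothetical):

```python
import os


def render_step_summary(
    iteration: int,
    max_iterations: int,
    tasks_done: int,
    tasks_total: int,
    files_changed: int,
    outcome: str,
) -> str:
    """Build a small Markdown table with the fields the task asks for."""
    rows = [
        ("Iteration", f"{iteration}/{max_iterations}"),
        ("Tasks completed", f"{tasks_done}/{tasks_total}"),
        ("Files changed", str(files_changed)),
        ("Outcome", outcome),
    ]
    lines = ["### Keepalive iteration summary", "", "| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value} |" for name, value in rows]
    return "\n".join(lines) + "\n"


def append_step_summary(markdown: str) -> bool:
    """Append Markdown to the job summary file; no-op outside GitHub Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown)
    return True
```

Appending rather than overwriting lets several steps in the same job contribute sections to one summary.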

Conditional Status Summary

Modify buildStatusBlock() in agents_pr_meta_update_body.js to accept an agentType parameter
When agentType is set (CLI agent): hide workflow table, hide head SHA/required checks
Keep Scope/Tasks/Acceptance checkboxes for all cases
Pass agent type from workflow to the update_body job
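The conditional behavior described above can be sketched as follows. This is a Python analog for illustration only; the real `buildStatusBlock()` is JavaScript and its signature is not shown here, so `build_status_block` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def build_status_block(
    checklist_md: str,
    head_sha: str,
    required: str,
    workflow_rows: List[Tuple[str, str]],
    agent_type: Optional[str] = None,
) -> str:
    parts = ["Automated Status Summary", ""]
    if not agent_type:
        # UI Codex path (no agent label): keep head SHA, required checks,
        # and the workflow table.
        parts += [f"Head SHA: {head_sha}", f"Required: {required}", ""]
        parts += ["| Workflow / Job | Result |", "| --- | --- |"]
        parts += [f"| {name} | {result} |" for name, result in workflow_rows]
        parts.append("")
    # Scope/Tasks/Acceptance checkboxes are kept in every case.
    parts.append(checklist_md)
    return "\n".join(parts)
```

Making `agent_type` optional with a default of `None` keeps existing callers on the full summary, so only the workflow that passes an agent type changes behavior.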

Comment Pattern Cleanup

Acceptance criteria

CLI agent receives explicit tasks in prompt and works on them
Iteration results visible in Actions workflow run summary
PR body shows checkboxes but not workflow clutter when using CLI agents
UI Codex path (no agent label) continues to show full status summary
CLI agent PRs have ≤3 bot comments total (summary, one per iteration update) instead of 10+
State tracking is consolidated in the summary comment, not scattered

Dependencies

- Requires PR #103 (chore(codex): bootstrap PR for issue #101) to be merged first

Head SHA: 6e5c4c2
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Keepalive E2E | ❔ startup failure | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |
| Validate Sync Manifest | ✅ success | View run |

Phase 3: Pre-Agent Intelligence (4 capabilities)

- 3A: Capability Check
  - Supplements agents:optimize with a feasibility gate
  - Runs BEFORE agent assignment on Issues (not after)
  - Adds the needs-human label when the agent cannot proceed
- 3B: Task Decomposition - auto-split large issues
- 3C: Duplicate Detection - comment-only mode, track false positives
- 3D: Semantic Labeling - auto-suggest/apply labels

Testing Plan:

- Test repo: Manager-Database
- ~11 test issues across 4 capabilities
- False positive tracking for dedup (target: <5%)
- Metrics dashboard for validation

Also updates:

- Mark Collab-Admin PR #113 as merged (7/7 repos now synced)
- All immediate tasks completed
- Phase 3 ready to begin
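The <5% false positive target for duplicate detection needs a concrete definition. The sketch below assumes the rate is measured over issues the detector flagged, so it is strictly a false discovery rate; the `DedupTracker` name and that definition are assumptions, not taken from the plan.

```python
from dataclasses import dataclass


@dataclass
class DedupTracker:
    """Counts duplicate-detection comments and how many were judged wrong."""

    flagged: int = 0
    false_positives: int = 0

    def record(self, was_false_positive: bool) -> None:
        self.flagged += 1
        if was_false_positive:
            self.false_positives += 1

    def rate(self) -> float:
        # Share of flagged issues that were not real duplicates; 0.0 before any flags.
        return self.false_positives / self.flagged if self.flagged else 0.0

    def meets_target(self, target: float = 0.05) -> bool:
        # Strict comparison to match "target: <5%".
        return self.rate() < target
```

With only ~11 test issues, a single false positive already pushes the rate well past 5%, so the target becomes meaningful only once more dedup comments have accumulated.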

github-actions · 2026-01-07T17:21:07Z

Automated Status Summary

Head SHA: 16dd05a
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| (no jobs reported) | ⏳ pending | — |

Coverage Overview

Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 92.21% |
| Baseline | 85.00% |
| Delta | +7.21% |
| Minimum | 70.00% |
| Status | ✅ Pass |
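The trend rows follow from simple arithmetic: Delta is Current minus Baseline (92.21 − 85.00 = +7.21). The summary does not state which threshold decides Status; the sketch below assumes it compares Current against Minimum. The function name is hypothetical.

```python
def coverage_trend(current: float, baseline: float, minimum: float) -> dict:
    """Reproduce the Coverage Trend rows (Status threshold is an assumption)."""
    delta = round(current - baseline, 2)
    status = "✅ Pass" if current >= minimum else "❌ Fail"
    return {
        "Current": f"{current:.2f}%",
        "Baseline": f"{baseline:.2f}%",
        "Delta": f"{delta:+.2f}%",
        "Minimum": f"{minimum:.2f}%",
        "Status": status,
    }
```

Rounding the delta before formatting avoids floating-point residue such as 7.209999999999994 leaking into the table.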

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| `scripts/workflow_health_check.py` | 62.6% | 28 |
| `scripts/classify_test_failures.py` | 62.9% | 37 |
| `scripts/ledger_validate.py` | 65.3% | 63 |
| `scripts/mypy_return_autofix.py` | 82.6% | 11 |
| `scripts/ledger_migrate_base.py` | 85.5% | 13 |
| `scripts/fix_cosmetic_aggregate.py` | 92.3% | 1 |
| `scripts/coverage_history_append.py` | 92.8% | 2 |
| `scripts/workflow_validator.py` | 93.3% | 4 |
| `scripts/update_autofix_expectations.py` | 93.9% | 1 |
| `scripts/pr_metrics_tracker.py` | 95.7% | 3 |
| `scripts/generate_residual_trend.py` | 96.6% | 1 |
| `scripts/build_autofix_pr_comment.py` | 97.0% | 2 |
| `scripts/aggregate_agent_metrics.py` | 97.2% | 0 |
| `scripts/fix_numpy_asserts.py` | 98.1% | 0 |
| `scripts/sync_test_dependencies.py` | 98.3% | 1 |

Updated automatically; will refresh on subsequent CI/Docker completions.

github-actions · 2026-01-07T17:21:32Z

🤖 Keepalive Loop Status

PR #645 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/28 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |
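The "Iteration progress" cell is a fixed-width text bar followed by a count. A sketch of how such a bar can be rendered, assuming a width of 10 and `#` as the fill character (the workflow's actual fill character is not visible at 0/5):

```python
def progress_bar(done: int, total: int, width: int = 10) -> str:
    """Render e.g. "[####------] 2/5"; the fill character is an assumption."""
    if total <= 0:
        raise ValueError("total must be positive")
    done = max(0, min(done, total))  # clamp out-of-range counts
    filled = (done * width) // total
    return f"[{'#' * filled}{'-' * (width - filled)}] {done}/{total}"
```

Integer division keeps the bar from showing full before the last iteration completes.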

🔍 Failure Classification

Copilot

Pull request overview

This PR adds a comprehensive Phase 3 rollout plan for integrating 5 unused LangChain scripts into the production workflow, focusing on pre-agent intelligence capabilities. It also updates the status of all consumer repository syncs, marking Collab-Admin PR #113 as merged to complete the 7/7 repo synchronization milestone.

Key changes:

- Defines Phase 3 with 4 pre-agent intelligence capabilities (capability check, task decomposition, duplicate detection, and semantic labeling)
- Establishes a detailed testing plan with ~11 test issues on the Manager-Database repository, including metrics for false positive tracking
- Updates deployment status from "5/6 repos synced" to "7/7 repos synced" throughout the document, marking Collab-Admin PR #113 as merged


Created 3 test issues:

- #193: Stripe integration (should FAIL capability check)
- #194: Health monitoring (should trigger task decomposition)
- #196: Manager list API (should detect as duplicate of #133)

Updated testing metrics dashboard to track progress.

Phase 4 includes 5 initiatives:

- 4A: Label Cleanup - Remove bloat labels, standardize across 7 repos
- 4B: User Guide - Operational documentation for label system (sync to consumers)
- 4C: Auto-Pilot Label - End-to-end issue-to-merged-PR automation
- 4D: Conflict Resolution - Automated merge conflict handling in keepalive
- 4E: Verify-to-Issue - Create follow-up issues from verification feedback

Key decisions:

- Auto-pilot uses workflow_dispatch between steps (not chained labels)
- Conflict detection added to keepalive loop (not separate workflow)
- Verify-to-issue is user-triggered (not automatic, avoids false positives)

Also identifies 7 additional automation opportunities for future phases. Testing plan defined for Manager-Database.
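The auto-pilot decision to chain steps with `workflow_dispatch` means each step triggers the next through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, whose JSON body carries a `ref` and optional `inputs`. The sketch below only builds the request; the workflow file name and the `issue_number` input are hypothetical.

```python
import json
import urllib.request
from typing import Dict


def build_dispatch_request(
    owner: str,
    repo: str,
    workflow_file: str,
    ref: str,
    inputs: Dict[str, str],
    token: str,
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the next step."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Passing `urllib.request.urlopen(req)` the result would send it. Explicit dispatch keeps each hop visible as a separate workflow run, which is harder to trace when the chain is driven by label events.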

Label Analysis Corrections:

- agents:pause/paused ARE functional (keepalive_gate.js, keepalive-runner.js)
- agents:activated IS functional (agents_pr_meta_keepalive.js)
- from:codex/copilot ARE functional (merge_manager.js)
- automerge IS functional (merge_manager.js, agents_belt_scan.js)
- agents (bare) IS functional (agent_task.yml template)
- risk:low, ci:green, codex-ready ARE functional (merge_manager.js, issue templates)

Only 5-6 labels confirmed as bloat:

- codex (bare) - redundant with agent:codex
- ai:agent - zero matches
- auto-merge-audit - zero matches
- automerge:ok - zero matches
- architecture, backend, cli, etc. - repo-specific, not synced

Phase 5 Analysis:

- 5A: Auto-labeling - label_matcher.py EXISTS, ready for workflow
- 5B: Coverage check - maint-coverage-guard.yml EXISTS, add soft PR check
- 5C: Stale PR cleanup - not needed
- 5D: Dependabot - partial (auto-label exists, add auto-merge)
- 5E: Issue lint - soft warning approach
- 5F: Cross-repo linking - weekly scan with semantic_matcher.py
- 5G: Metrics - hybrid LangSmith (LLM) + custom (workflow)
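Several labels above are classified by whether their name appears anywhere in the source ("zero matches"). A plain substring search would miscount: bare `codex` matches inside `agent:codex`, and `automerge` matches inside `automerge:ok`. The sketch below assumes a label only counts when it is not directly adjacent to a word character, `:`, or `-`; the function names and file suffixes are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List


def label_pattern(label: str) -> "re.Pattern[str]":
    # Reject matches embedded in a longer label such as agent:codex or automerge:ok.
    return re.compile(rf"(?<![\w:-]){re.escape(label)}(?![\w:-])")


def zero_match_labels(labels: Iterable[str], texts: Iterable[str]) -> List[str]:
    """Return the labels that never appear as a standalone token in any text."""
    texts = list(texts)
    return sorted(
        label for label in labels
        if not any(label_pattern(label).search(text) for text in texts)
    )


def read_sources(root: str, suffixes=(".js", ".py", ".yml", ".yaml", ".md")) -> List[str]:
    """Collect the text of candidate source files under root."""
    return [
        p.read_text(encoding="utf-8", errors="ignore")
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    ]
```

A zero-match result only means the name is not referenced in code or docs; labels applied by hand in the GitHub UI still need the per-repo audit described in 4A.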

Label consolidation:

- Replace agents:pause with agents:paused in all source files
- Update keepalive_gate.js PAUSE_LABEL constant
- Update keepalive_orchestrator_gate_runner.js hardcoded check
- Update test to use agents:paused
- Update documentation in README, CLAUDE.md, GoalsAndPlumbing.md

Phase 4 updates:

- 4A: Add idiosyncratic repo bloat cleanup strategy (per-repo audit)
- 4B: Add optional issue creation feature to user guide (deferred)
- 4D: Full conflict resolution implementation with code examples
- 4E: Complete verify-to-issue workflow implementation

Phase 5 updates:

- 5F: Marked as SKIPPED (not needed per user decision)
- 5G: Full LangSmith integration plan + custom metrics

All keepalive tests pass (8/8).
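Replacing `agents:pause` with `agents:paused` by plain string substitution would also rewrite existing `agents:paused` into `agents:pausedd`. A minimal sketch of a safe, idempotent rewrite, using a negative lookahead so only the bare `agents:pause` token changes (the helper names are hypothetical):

```python
import re
from pathlib import Path
from typing import Tuple

# Match agents:pause only when it is not already part of agents:paused (or agents:pause-x).
PAUSE_RE = re.compile(r"agents:pause(?![\w-])")


def migrate_pause_label(text: str) -> Tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return PAUSE_RE.subn("agents:paused", text)


def migrate_file(path: Path) -> int:
    """Rewrite one file in place, only if something changed."""
    text = path.read_text(encoding="utf-8")
    new_text, count = migrate_pause_label(text)
    if count:
        path.write_text(new_text, encoding="utf-8")
    return count
```

Because a second pass finds nothing to change, the migration can be re-run safely across all consumer repos.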

Copilot AI review requested due to automatic review settings January 7, 2026 17:20

stranske temporarily deployed to agent-standard January 7, 2026 17:20 with GitHub Actions

Copilot started reviewing on behalf of stranske January 7, 2026 17:21

Copilot AI reviewed Jan 7, 2026

stranske temporarily deployed to agent-standard January 7, 2026 17:28 with GitHub Actions

stranske temporarily deployed to agent-standard January 7, 2026 17:44 with GitHub Actions

stranske temporarily deployed to agent-standard January 7, 2026 18:02 with GitHub Actions

stranske temporarily deployed to agent-standard January 7, 2026 18:18 with GitHub Actions

Merge branch 'main' into phase3-planning (6e5c4c2)

stranske temporarily deployed to agent-standard January 7, 2026 18:24 with GitHub Actions

stranske merged commit d4e89a0 into main Jan 7, 2026 (310 checks passed)

stranske deleted the phase3-planning branch January 7, 2026 18:33

stranske mentioned this pull request Jan 7, 2026 in feat: Implement Phase 4-5 automation features #650 (Merged, 45 tasks)

stranske merged 6 commits into main from phase3-planning
