Merged
207 changes: 119 additions & 88 deletions docs/plans/langchain-post-code-rollout.md
@@ -1,8 +1,8 @@
# LangChain Post-Code Production Capabilities - Evaluation & Rollout Plan

> **Date:** January 8, 2026
> **Status:** Phase 3 Workflows Created, Pending Consumer Sync
> **Last Validation:** 2026-01-08 (Thorough audit - all workflows created)
> **Date:** January 9, 2026
> **Status:** Phase 3 Deployed to All Consumer Repos + Conflict Resolution Active
> **Last Validation:** 2026-01-09 (Full sync verified, conflict resolution working)

---

@@ -347,34 +347,31 @@

---

### Phase 1: Sync Deployment (All 7 Repos)

After merging to main, verify sync creates PRs in all consumer repos.

| Repo | Sync PR # | Sync PR Status | New Workflows Present | Notes |
|------|-----------|----------------|----------------------|-------|
| Manager-Database | - | ⏳ | - | Primary test repo |
| Template | - | ⏳ | - | |
| trip-planner | - | ⏳ | - | |
| Travel-Plan-Permission | - | ⏳ | - | Verify workflows working, has labels ✅ |
| Portable-Alpha-Extension-Model | - | ⏳ | - | |
| Trend_Model_Project | - | ⏳ | - | Needs verify labels created |
| Collab-Admin | - | ⏳ | - | |

**Checklist:**
- [ ] Merge Phase 3 PR to main in Workflows repo
- [ ] Verify sync workflow triggered (Actions tab)
- [ ] Check each consumer repo for sync PR
- [ ] Review sync PR for correct workflow files:
  - `agents-capability-check.yml`
  - `agents-decompose.yml`
  - `agents-dedup.yml`
  - `agents-auto-label.yml`
  - `agents-verify-to-issue.yml`
- [ ] Review bot comments on sync PRs for code issues
- [ ] Merge sync PRs to each consumer repo
- [ ] Create `verify:*` labels in repos that need them:
  - Run: `python scripts/create_verifier_labels.py --execute`
### Phase 1: Sync Deployment (All 7 Repos) ✅ COMPLETED

All consumer repos synced as of 2026-01-09. Latest sync PRs merged:

| Repo | Last Sync | Status |
|------|-----------|--------|
| Manager-Database | 2026-01-09T04:47 | ✅ Synced |
| Template | 2026-01-09T05:21 | ✅ Synced (+ isort fix) |
| trip-planner | 2026-01-09T05:02 | ✅ Synced |
| Travel-Plan-Permission | 2026-01-09T04:46 | ✅ Synced |
| Portable-Alpha-Extension-Model | 2026-01-09T04:47 | ✅ Synced |
| Trend_Model_Project | 2026-01-09T04:51 | ✅ Synced |
| Collab-Admin | 2026-01-09T04:47 | ✅ Synced |

**Phase 3 Workflows Deployed (verified in all repos):**
- ✅ `agents-capability-check.yml`
- ✅ `agents-decompose.yml`
- ✅ `agents-dedup.yml`
- ✅ `agents-auto-label.yml`

**Conflict Resolution Pipeline Deployed:**
- ✅ `keepalive_loop.js` with conflict detection
- ✅ `keepalive_prompt_routing.js` with conflict prompts
- ✅ `resolve-trivial-conflicts` job in keepalive workflow
- ✅ All 7 consumer repos have conflict resolution active
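
The "verified in all repos" check above can be expressed as a set difference between the expected workflow files and what a repo actually contains. The following is a minimal sketch, not the project's verification tooling: the function names `missing_workflows` and `sync_report` are hypothetical, and the per-repo file listings are assumed to be gathered separately (for example from a checkout of each repo's `.github/workflows/` directory).

```python
# Expected Phase 3 workflow files, copied from the list above.
EXPECTED_WORKFLOWS = frozenset({
    "agents-capability-check.yml",
    "agents-decompose.yml",
    "agents-dedup.yml",
    "agents-auto-label.yml",
})


def missing_workflows(present_files):
    """Return the expected workflow files absent from one repo's listing, sorted."""
    return sorted(EXPECTED_WORKFLOWS - set(present_files))


def sync_report(repo_listings):
    """Map each repo name to its missing workflows (hypothetical helper).

    An empty list for a repo means every expected workflow file is present.
    """
    return {repo: missing_workflows(files) for repo, files in repo_listings.items()}
```

A repo counts as synced when its entry in the report is an empty list; any non-empty entry names the files still to be delivered by the sync PR.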

---

@@ -554,23 +551,23 @@ If workflow works in Repo A but not Repo B:

---

### Verification Summary
### Verification Summary (Updated 2026-01-09)

| Workflow | Repos Tested | Repos Passing | Status |
|----------|--------------|---------------|--------|
| `verify:evaluate` | 1/7 | 1 | ✅ Travel-Plan-Permission working |
| `verify:compare` | 1/7 | 1 | ✅ NOT A BUG - PR #4249 not merged (expected skip) |
| `agents:optimize` | 1/7 | 1 | ✅ Manager-Database working |
| `agents-capability-check` | 0/7 | - | ⏳ Pending sync |
| `agents-decompose` | 0/7 | - | ⏳ Pending sync |
| `agents-dedup` | 0/7 | - | ⏳ Pending sync |
| `agents-auto-label` | 0/7 | - | ⏳ Pending sync |
| `agents-verify-to-issue` | 0/7 | - | ⏳ Pending sync |
| Workflow | Repos Deployed | Status | Notes |
|----------|----------------|--------|-------|
| `verify:evaluate` | 7/7 | ✅ Deployed | Travel-Plan-Permission tested working |
| `verify:compare` | 7/7 | ✅ Deployed | Multi-provider comparison working |
| `agents:optimize` | 7/7 | ✅ Deployed | Manager-Database tested working |
| `agents-capability-check` | 7/7 | ✅ Deployed | Pending functional testing |
| `agents-decompose` | 7/7 | ✅ Deployed | Pending functional testing |
| `agents-dedup` | 7/7 | ✅ Deployed | Pending functional testing |
| `agents-auto-label` | 7/7 | ✅ Deployed | Pending functional testing |
| `resolve-trivial-conflicts` | 7/7 | ✅ Deployed | `pr_body.md` auto-resolution working |

**Investigation completed (2026-01-08):**
- Trend_Model_Project PR #4249 is OPEN, not merged
- Verifier correctly skipped (designed for merged PRs only)
- Labels need to be created in repos (only Travel-Plan-Permission has them)
**Current Conflicting PRs (as of 2026-01-09):**
- Manager-Database: 2 PRs (#134, #135) - real code conflicts, need Codex
- Portable-Alpha-Extension-Model: 1 PR (#1049) - real code conflict, needs Codex
- All other repos: no conflicts

**Minimum for Phase 3 Completion:** Each workflow tested in ≥2 repos, passing in ≥2 repos
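
The completion rule above is a simple threshold on two counts per workflow. As a minimal sketch (the names `workflow_complete` and `phase_complete` are hypothetical, and the counts are assumed to be tallied by hand from test runs):

```python
MIN_REPOS = 2  # threshold stated in the completion criterion above


def workflow_complete(tested, passing, minimum=MIN_REPOS):
    """True when a workflow was tested in >= minimum repos and passes in >= minimum."""
    return tested >= minimum and passing >= minimum


def phase_complete(results):
    """results maps workflow name -> (repos_tested, repos_passing).

    The phase is complete only when every workflow meets the threshold.
    """
    return all(workflow_complete(tested, passing) for tested, passing in results.values())
```

Note that deployment to 7/7 repos does not satisfy this rule by itself: a workflow that is deployed everywhere but functionally tested in only one repo still fails the check.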

@@ -579,7 +576,7 @@ If workflow works in Repo A but not Repo B:
## Phase 3 Functional Testing (Manager-Database)

> **Purpose:** Validate workflows produce correct results (after deployment verified)
> **Status:** Blocked on deployment verification
> **Status:** Ready for functional testing - workflows deployed
> **Test Repository:** Manager-Database (primary), Travel-Plan-Permission (secondary)

### Test Suite A: Capability Check (3 issues)
@@ -752,23 +749,30 @@ If any workflow causes issues in consumer repos:
|-------|-------|-------|-----------|--------|
| 1 | PR Verification | 2 | Manager-Database | ✅ Deployed, 7/7 repos synced |
| 2 | Issue Formatting | 1 | Manager-Database | ✅ Deployed & tested - Quality: 7.5/10 |
| 3 | Pre-Agent Intelligence | 4 | Manager-Database | ✅ All 4 workflows created, in sync manifest |
| 4 | Full Automation & Cleanup | 5 | Manager-Database | 🔄 Implementation started |
| 3 | Pre-Agent Intelligence | 4 | Manager-Database | ✅ All 4 workflows deployed to all 7 repos (2026-01-09) |
| 4 | Full Automation & Cleanup | 5 | Manager-Database | ✅ 4D Conflict Resolution deployed, 4E implemented |

**Phase 3 Components:**
- **3A:** Capability Check - Pre-agent feasibility gate - ✅ Script + Workflow created (`agents-capability-check.yml`)
- **3B:** Task Decomposition - Auto-split large issues - ✅ Script + Workflow created (`agents-decompose.yml`)
- **3C:** Duplicate Detection - Comment-only mode - ✅ Script + Workflow created (`agents-dedup.yml`)
- **3D:** Semantic Labeling - Auto-suggest/apply labels - ✅ Script + Workflow created (`agents-auto-label.yml`)
**Phase 3 Components (ALL DEPLOYED 2026-01-09):**
- **3A:** Capability Check - Pre-agent feasibility gate - ✅ Deployed to 7/7 repos
- **3B:** Task Decomposition - Auto-split large issues - ✅ Deployed to 7/7 repos
- **3C:** Duplicate Detection - Comment-only mode - ✅ Deployed to 7/7 repos
- **3D:** Semantic Labeling - Auto-suggest/apply labels - ✅ Deployed to 7/7 repos

**Phase 4 Components:**
- **4A:** Label Cleanup - ✅ Script created (`scripts/cleanup_labels.py`)
- **4B:** User Guide - Operational documentation for label system - 📋 Deferred
- **4C:** Auto-Pilot Label - End-to-end issue-to-merge automation - 📋 Planning
- **4D:** Conflict Resolution - ✅ **FULLY INTEGRATED** - `conflict_detector.js` + `keepalive_loop.js` + `keepalive_prompt_routing.js` wired together
- **4D:** Conflict Resolution - ✅ **DEPLOYED TO ALL 7 REPOS** - `resolve-trivial-conflicts` job + conflict detection + prompts
- **4E:** Verify-to-Issue - ✅ Workflow created (`agents-verify-to-issue.yml`), in sync manifest

**Total: 12 deployment actions** - Phases 1-2 deployed. Phase 3 scripts ready. Phase 4 partially implemented.
**Conflict Resolution Pipeline (DEPLOYED 2026-01-09):**
- ✅ `conflict_detector.js` - Detects merge conflicts via GitHub API
- ✅ `keepalive_loop.js` - Calls conflict detection, routes to appropriate action
- ✅ `keepalive_prompt_routing.js` - Routes conflict action to `fix_merge_conflicts.md` prompt
- ✅ `resolve-trivial-conflicts` job - Auto-resolves `pr_body.md` conflicts in keepalive workflow
- ✅ All 7 consumer repos have complete conflict resolution pipeline
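
The split between trivial and real conflicts described above can be illustrated as a routing decision on the set of conflicted files. This is a sketch only: the actual logic lives in the JavaScript files listed above and is not shown in this plan, so the function name `route_conflict`, the action strings, and the rule that a mixed set of files goes to Codex are all assumptions.

```python
# Files whose conflicts are treated as trivially auto-resolvable.
# Only pr_body.md is named in the plan; anything else is an assumption.
TRIVIAL_FILES = frozenset({"pr_body.md"})


def route_conflict(conflicted_files):
    """Pick an action for a PR given its conflicted file paths (hypothetical helper).

    Returns "none" when there is no conflict, "auto_resolve" when only trivial
    files conflict, and "codex" when any real code file is involved.
    """
    files = set(conflicted_files)
    if not files:
        return "none"
    if files <= TRIVIAL_FILES:
        return "auto_resolve"
    # Assumption: a mix of trivial and code conflicts is handled as a code conflict.
    return "codex"
```

Under this sketch, the PRs listed as "real code conflict" would take the `"codex"` branch, while a PR that conflicts only on `pr_body.md` would be handled by the `resolve-trivial-conflicts` job.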

**Total: 12 deployment actions** - Phases 1-3 deployed. Phase 4 partially implemented.

**Substantive Quality Assessment:**
- **agents:optimize:** 8.6/10 - Provides valuable, actionable analysis
@@ -786,6 +790,21 @@ If any workflow causes issues in consumer repos:
4. ~~**Resolve Collab-Admin sync**~~ ✅ PR #113 merged 2026-01-07
5. ~~**Live test `agents:optimize`**~~ ✅ Tested on Manager-Database #184 - Quality: 8.6/10
6. ~~**Live test `agents:apply-suggestions`**~~ ✅ Tested on Manager-Database #184 - Quality: 6/10
7. ~~**Deploy Phase 3 workflows**~~ ✅ All 4 workflows deployed to 7/7 repos (2026-01-09)
8. ~~**Deploy conflict resolution**~~ ✅ Complete pipeline deployed to 7/7 repos (2026-01-09)
9. ~~**Fix Template sync PR**~~ ✅ Added `tools` to `known-first-party` for ruff isort (2026-01-09)
10. ~~**Resolve Trend_Model_Project #4301**~~ ✅ Manually resolved code conflicts (2026-01-09)

### High Priority - Functional Testing (Next)
1. **Test `agents-capability-check.yml`** - Create issue with external dependency, verify BLOCKED response
2. **Test `agents-decompose.yml`** - Create large issue, verify sub-task breakdown
3. **Test `agents-dedup.yml`** - Create similar issue, verify duplicate warning
4. **Test `agents-auto-label.yml`** - Create unlabeled issue, verify label suggestions

### High Priority - Resolve Remaining Conflicts
1. **Manager-Database #134** - Add UK Filing Parser Implementation (real code conflict)
2. **Manager-Database #135** - Implement production rate limiter with Redis backend (real code conflict)
3. **Portable-Alpha-Extension-Model #1049** - Codex bootstrap for #1048 (real code conflict)

### High Priority Enhancements
1. ~~**Enable LLM for apply_suggestions**~~ ✅ Changed `use_llm=False` to `use_llm=True` in workflow
@@ -1463,12 +1482,12 @@ os.environ["LANGCHAIN_PROJECT"] = "workflows-agents"

| Initiative | Effort | Value | Priority | Status |
|------------|--------|-------|----------|--------|
| 4A. Label Cleanup | Low | Medium | Ready | ❌ Not started |
| 4B. User Guide | Medium | High | Defer | 📋 After other features stable |
| 4C. Auto-Pilot | High | High | Test carefully | ❌ Not started |
| 4D. Conflict Resolution | Medium | High | In Progress | ✅ Script done, integration pending |
| 4E. Verify-to-Issue | Low | Medium | Ready | ✅ **Implemented & synced** |
| 5A. Auto-labeling | Low | Medium | Ready | ✅ **Workflow created** |
| 4A. Label Cleanup | Low | Medium | Ready | ⏳ Short term |
| 4B. User Guide | Medium | High | Defer | ⏳ Medium term |
| 4C. Auto-Pilot | High | High | Test carefully | ⏳ Medium term |
| 4D. Conflict Resolution | Medium | High | In Progress | ✅ **Deployed to all repos** |
| 4E. Verify-to-Issue | Low | Medium | Ready | ✅ **Deployed, needs live test** |
| 5A. Auto-labeling | Low | Medium | Ready | ✅ **Deployed, needs live test** |
| 5B. Coverage PR Check | Low | Medium | Ready | ⚠️ Existing workflow, enhance |
| 5D. Dependabot Auto-merge | Low | Medium | Ready | ⚠️ Extend existing |
| 5E. Issue Lint | Low | Low | Later | ❌ Not started |
@@ -1481,50 +1500,49 @@ os.environ["LANGCHAIN_PROJECT"] = "workflows-agents"

### Immediate (Can Do Now)

1. **Create Phase 3 Workflows** - Scripts ready, just need workflow files:
   - `agents-capability-check.yml` - Gate before agent assignment
   - `agents-decompose.yml` - Split large issues automatically
   - `agents-dedup.yml` - Detect duplicate issues
1. ~~**Create Phase 3 Workflows**~~ ✅ DONE - All 4 workflows deployed to 7/7 repos (2026-01-09)

2. ~~**Integrate Conflict Detector**~~ ✅ DONE - Complete pipeline deployed:
   - `resolve-trivial-conflicts` job auto-resolves `pr_body.md`
   - Full conflict detection routes to Codex for resolution

2. **Integrate Conflict Detector** - Script exists, add to keepalive pipeline:
   - Update `keepalive_gate.js` to call `conflict_detector.js`
   - Add conflict prompt routing in `keepalive_prompt_routing.js`
3. **Functional Test Phase 3 Workflows** - PRIORITY (Issues created):
   - [#227](https://github.com/stranske/Manager-Database/issues/227) - Test Suite A: Capability Check
   - [#228](https://github.com/stranske/Manager-Database/issues/228) - Test Suite B: Decomposition
   - [#229](https://github.com/stranske/Manager-Database/issues/229) - Test Suite C: Deduplication
   - [#230](https://github.com/stranske/Manager-Database/issues/230) - Test Suite D: Auto-Label

3. **Test 4E Verify-to-Issue** - Workflow deployed, needs live test:
4. ~~**Resolve Remaining Sync PR Code Conflicts**~~ ✅ DONE - All sync PRs merged (2026-01-09)

5. **Test 4E Verify-to-Issue** - Workflow deployed, needs live test:
   - Find merged PR with verification feedback
   - Add `verify:create-issue` label
   - Validate issue creation and linking

4. **Test Auto-Label Workflow** - Deployed to consumer repos:
   - Create test issue with clear topic (e.g., "bug" or "documentation")
   - Verify label suggestions appear

### Short Term (1-2 weeks)

5. **Label Cleanup Audit** - Per-repo idiosyncratic labels:
   - Create `scripts/cleanup_labels.py`
6. **Label Cleanup Audit** - Per-repo idiosyncratic labels:
   - Script exists: `scripts/cleanup_labels.py`
   - Audit Manager-Database first
   - Generate cleanup PRs with human approval

6. **GitHub Models Authentication Fix**:
   - Investigate 401 "models permission required" in consumer repos
   - Either fix token permissions or document OpenAI-only mode
7. ~~**GitHub Models Authentication Fix**~~ ✅ DONE - Token permissions resolved

### Medium Term (2-4 weeks)

7. **Auto-Pilot Design & Testing** - High risk, careful rollout:
8. **Auto-Pilot Design & Testing** - High risk, careful rollout:
   - Design state machine for sequential workflow triggers
   - Test on Manager-Database with controlled simple issues
   - Add safety limits (max iterations, token budgets)

8. **User Guide Documentation** - After Phase 4 features stable:
9. **User Guide Documentation** - After Phase 4 features stable:
   - Create `docs/WORKFLOW_USER_GUIDE.md`
   - Add to sync manifest
   - Include label decision tree
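
For the Auto-Pilot item above, the idea of sequential workflow triggers with safety limits might look roughly like the sketch below. Nothing here is a design from this plan: the `Limits` fields, the default values, the `run_autopilot` function, and the `run_step` callback (assumed to trigger one workflow and return the tokens it consumed) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Safety limits for one auto-pilot run (hypothetical defaults)."""
    max_iterations: int = 5
    token_budget: int = 50_000


def run_autopilot(steps, run_step, limits):
    """Run steps in order, halting as soon as a safety limit is exceeded.

    run_step(step) is assumed to execute one workflow step and return the
    number of tokens it used. Returns (completed_steps, status).
    """
    tokens_used = 0
    completed = []
    for iteration, step in enumerate(steps):
        if iteration >= limits.max_iterations:
            return completed, "halted: max iterations"
        tokens_used += run_step(step)
        if tokens_used > limits.token_budget:
            # The step that blew the budget is not counted as completed.
            return completed, "halted: token budget"
        completed.append(step)
    return completed, "done"
```

Checking the limits inside the loop, rather than once up front, means a runaway sequence stops at the first step that crosses a limit instead of running to the end.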

---

## Test Results Summary (2026-01-08)
## Test Results Summary (2026-01-09)

### Phase 3 Script Test Coverage

@@ -1538,12 +1556,25 @@ os.environ["LANGCHAIN_PROJECT"] = "workflows-agents"

### Deployed Workflows

| Workflow | Phase | Consumer Sync |
|----------|-------|---------------|
| `agents-issue-optimizer.yml` | 2 | ✅ Synced |
| `agents-verifier.yml` | 1 | ✅ Synced |
| `agents-auto-label.yml` | 3D | ✅ In manifest |
| `agents-verify-to-issue.yml` | 4E | ✅ In manifest |
| Workflow | Phase | Consumer Sync | Functional Test |
|----------|-------|---------------|-----------------|
| `agents-issue-optimizer.yml` | 2 | ✅ All 7 repos | ✅ Production use |
| `agents-verifier.yml` | 1 | ✅ All 7 repos | ✅ Production use |
| `agents-capability-check.yml` | 3A | ✅ All 7 repos | ⏳ Pending |
| `agents-decompose.yml` | 3B | ✅ All 7 repos | ⏳ Pending |
| `agents-dedup.yml` | 3C | ✅ All 7 repos | ⏳ Pending |
| `agents-auto-label.yml` | 3D | ✅ All 7 repos | ⏳ Pending |
| `agents-verify-to-issue.yml` | 4E | ✅ All 7 repos | ⏳ Pending |

### Conflict Resolution Pipeline

| Component | Status | Location |
|-----------|--------|----------|
| `keepalive_loop.js` | ✅ Deployed | All 7 repos |
| `keepalive_prompt_routing.js` | ✅ Deployed | All 7 repos |
| `conflict_detector.js` | ✅ Deployed | All 7 repos |
| `fix_merge_conflicts.md` prompt | ✅ Deployed | All 7 repos |
| `resolve-trivial-conflicts` job | ✅ Deployed | Keepalive workflow |

### Implemented but Not Workflow-Integrated
