diff --git a/.agents/AGENTS.md b/.agents/AGENTS.md index f69d8a3585..f55abeaf9b 100644 --- a/.agents/AGENTS.md +++ b/.agents/AGENTS.md @@ -191,7 +191,7 @@ Read subagents on-demand. Full index: `subagent-index.toon`. | OpenAPI exploration | `tools/context/openapi-search.md` | | Local models | `tools/local-models/local-models.md`, `tools/local-models/huggingface.md`, `scripts/local-model-helper.sh` | | Model routing | `tools/context/model-routing.md`, `reference/orchestration.md` | -| Orchestration | `reference/orchestration.md`, `tools/ai-assistants/headless-dispatch.md`, `scripts/commands/pulse.md` | +| Orchestration | `reference/orchestration.md`, `tools/ai-assistants/headless-dispatch.md`, `scripts/commands/pulse.md`, `workflows/mission-orchestrator.md` | | Agent/MCP dev | `tools/build-agent/build-agent.md`, `tools/build-mcp/build-mcp.md`, `tools/mcp-toolkit/mcporter.md` | | Framework | `aidevops/architecture.md`, `scripts/commands/skills.md` | diff --git a/.agents/subagent-index.toon b/.agents/subagent-index.toon index 1d8b601a38..2f4857ba05 100644 --- a/.agents/subagent-index.toon +++ b/.agents/subagent-index.toon @@ -99,9 +99,10 @@ google-search-console,npx mcp-server-gsc@latest,service-account,seo/google-searc mcporter,npx mcporter,none,tools/mcp-toolkit/mcporter.md,Discover call compose and generate CLIs for any MCP server --> - + +## Quick Reference + +- **Purpose**: Drive a mission from scoped milestones to verified completion over hours or days +- **Input**: A populated mission state file at `todo/missions/{mission_id}-mission.md` +- **Output**: Completed milestones, merged PRs, budget reconciliation, retrospective +- **Created by**: `/mission` command (scoping + decomposition) +- **Invoked by**: Pulse supervisor or manual `/mission run {mission_id}` + +**Key files**: + +| File | Purpose | +|------|---------| +| `todo/missions/{id}-mission.md` | Mission state — the single source of truth | +| `templates/mission-template.md` | Template for new missions | +| `scripts/commands/mission.md` | `/mission` command (scoping phase) | +| `scripts/commands/pulse.md` | Supervisor that dispatches mission features | +| `workflows/ralph-loop.md` | Iterative development pattern used by workers | + +**Execution model**: Sequential milestones, parallel features within each milestone. One orchestrator layer — no recursive orchestration. The orchestrator dispatches workers; workers do not spawn sub-orchestrators. + +**State lives in git.** The mission file is the database. Update it after every significant event (milestone start, feature completion, validation result, re-plan). Commit and push after updates so the pulse supervisor and other sessions can see current state. + + + +## How the Orchestrator Thinks + +You are an intelligent project manager, not a script executor. The mission file tells you the goal, milestones, and current state. Your job is to drive progress by dispatching workers, validating results, handling failures, and re-planning when reality diverges from the plan. + +**Judgment over rules.** The guidance below describes what to check and why. When you encounter something unexpected — a worker that produced something different from what was specified, a milestone that turns out to be unnecessary, a dependency that wasn't anticipated — handle it the way a competent project manager would: assess the situation, decide, act, record the decision, move on. + +**Speed matters.** A mission that ships 80% of scope on time beats one that achieves 100% scope three weeks late. When a feature is blocked or proving harder than estimated, consider: skip it, simplify it, or defer it to a follow-up. Record the decision in the mission's Decision Log. + +## Execution Loop + +```text +For each milestone (sequential): + 1. Check prerequisites — are blockers resolved? Resources available? + 2. Dispatch features as workers (parallel, up to max_parallel_workers) + 3. Monitor progress — track PRs, handle failures, re-dispatch + 4. Validate milestone — run validation criteria + - Pass → update state, advance to next milestone + - Fail → diagnose, create fix tasks, re-validate + 5. Update budget tracking + +After all milestones: + 6. Final validation + 7. Budget reconciliation + 8. Retrospective + 9. Offer improvements back to aidevops +``` + +### Step 1: Read Mission State + +```bash +# Find and read the mission file +cat todo/missions/{mission_id}-mission.md + +# Determine current milestone (first with status != passed/skipped) +# Check which features are pending, in-progress, or done +``` + +Parse the YAML frontmatter for configuration: mode (poc/full), budget limits, model routing preferences, max parallel workers. + +### Step 2: Check Prerequisites + +Before starting a milestone: + +1. **Blockers**: Are all `blocked-by:` references resolved? Check GitHub issue/PR state. +2. **Resources**: Does the Resources section list anything with status `needed`? If so, pause and report — the orchestrator cannot provision accounts or make purchases autonomously (unless a payment agent is available, see t1358). +3. **Budget**: Has any budget category exceeded the alert threshold? If so, pause and report with a spend summary and projection. + +If prerequisites are not met, update the milestone status to `blocked` with a reason, and check if a later milestone can proceed independently (rare — milestones are sequential by design, but some may be independent). + +### Step 3: Dispatch Features + +For each feature in the current milestone with status `pending`: + +```bash +# Full mode — standard worktree + PR workflow +opencode run --dir --title "Mission {id} / {feature_id}: {title}" \ + "/full-loop Implement {feature_description} -- Context: part of mission {id}, milestone {N}" & +sleep 2 + +# POC mode — skip ceremony, commit to branch/main +opencode run --dir --title "Mission {id} / {feature_id}: {title}" \ + "/full-loop --poc Implement {feature_description}" & +sleep 2 +``` + +**Dispatch rules:** + +- Respect `max_parallel_workers` from mission config +- Use model tier from `model_routing.workers` (default: sonnet) +- Include mission context in the dispatch prompt so workers understand the bigger picture +- Update the feature's Status column to `dispatched` and Worker column to the PID or session ID +- ALWAYS use `opencode run` — never `claude` or `claude -p` + +### Step 4: Monitor Progress + +After dispatching, monitor via GitHub state (same pattern as pulse): + +```bash +# Check for PRs created by mission workers +gh pr list --repo --search "Mission {id}" --json number,title,state,mergeable,statusCheckRollup + +# Check worker processes +ps axo pid,etime,command | grep "Mission {id}" | grep '\.opencode' +``` + +**Handle outcomes:** + +| Outcome | Action | +|---------|--------| +| PR merged | Update feature status to `done`, record in budget log | +| PR with green CI | Merge it (same as pulse Step 3) | +| PR with failing CI | Dispatch a fix worker, or fix inline if simple | +| Worker died (no PR, process gone) | Re-dispatch the feature | +| Worker stuck (3+ hours, no PR) | Kill process, re-dispatch | +| Feature turns out unnecessary | Mark `skipped` with reason in Decision Log | + +Update the mission file after each state change. Commit and push so state is visible. + +### Step 5: Validate Milestone + +When all features in a milestone are `done` or `skipped`: + +1. Read the milestone's Validation Criteria +2. Run each criterion — these may be: + - **Automated**: `npm test`, `pytest`, build commands, API endpoint checks + - **Browser-based**: Use Playwright/Stagehand to verify UI renders correctly + - **Manual inspection**: Read generated files, check configurations +3. Record results against each validation checkbox + +```bash +# Example: run tests +cd && npm test 2>&1 + +# Example: check API endpoint +curl -s http://localhost:3000/api/health | jq .status + +# Example: browser verification (if Playwright available) +# Dispatch a validation worker with browser tools enabled +``` + +**Validation outcomes:** + +- **All criteria pass** → Set milestone status to `passed`, update TOON metadata, advance +- **Some criteria fail** → Diagnose the failure: + 1. Is it a bug in a completed feature? Create a fix task, dispatch a worker + 2. Is it a missing feature that wasn't in the plan? Add it, dispatch + 3. Is it an integration issue between features? Create an integration fix task + 4. After fixes, re-validate +- **Repeated validation failure** (3+ attempts) → Pause, report to user with diagnosis + +### Step 6: Budget Tracking + +After each worker session completes, update the Budget Tracking section: + +1. Estimate token usage from worker session duration and model tier +2. Update the Spend Log table with date, category, amount, description, milestone +3. Recalculate the Summary table (spent, remaining, % used) +4. If any category exceeds `alert_threshold_pct`, pause and report + +**Budget projection**: After milestone 1 completes, compare actual vs estimated. If actual exceeds estimate by >50%, recalculate projections for remaining milestones and report: "At current burn rate, the mission will cost ${X} / {Y}h vs budgeted ${A} / {B}h. Continue, reduce scope, or pause?" + +### Step 7: Completion + +When all milestones are `passed`: + +1. Run final validation (if defined in the mission file) +2. Update mission status to `completed` +3. Fill in the Retrospective section: + - Outcomes: what was delivered vs the original goal + - Budget accuracy: budgeted vs actual for time, money, tokens + - Lessons learned: what worked, what didn't +4. Review mission agents for promotion (see below) +5. Offer improvements back to aidevops (see below) + +## Self-Organisation + +Missions often require capabilities that don't exist yet — domain-specific knowledge, project-specific conventions, or tooling that the framework doesn't have. The orchestrator should self-organise by creating what it needs. + +### File and Folder Management + +Each mission gets a workspace for its artifacts: + +```text +todo/missions/ + {mission_id}-mission.md # Mission state file (the source of truth) + {mission_id}/ # Mission workspace folder + research/ # Research artifacts, comparisons, notes + decisions/ # Detailed decision records (if too long for the log) + artifacts/ # Generated configs, schemas, mockups +``` + +**Rules:** + +- The mission state file is ALWAYS at `todo/missions/{id}-mission.md` — never nested +- The workspace folder is optional — create it when the mission generates artifacts +- Research artifacts go in `{id}/research/` — PDFs, screenshots, comparison tables +- Keep the workspace lean — it's for mission-specific artifacts, not code (code lives in the repo) + +### Temporary Agent Creation (Draft Tier) + +When the orchestrator discovers a need for reusable instructions during mission execution — a domain-specific validation pattern, a project-specific coding convention, a research methodology — create a draft agent. + +**When to create a draft agent:** + +- The same instructions would be repeated across 3+ worker dispatches +- A domain requires specific knowledge that workers keep getting wrong +- A validation pattern is complex enough to warrant its own agent doc +- Research in an unfamiliar domain has produced findings worth capturing + +**How to create:** + +```bash +mkdir -p ~/.aidevops/agents/draft/{mission_id}/ + +cat > ~/.aidevops/agents/draft/{mission_id}/{name}.md << 'AGENT' +--- +description: {Purpose — what this agent knows/does} +mode: subagent +status: draft +created: {YYYY-MM-DD} +source: mission {mission_id} +tools: + read: true + # ... appropriate permissions +--- + +# {Agent Name} + +{Instructions captured from mission context...} +AGENT +``` + +**After creating a draft:** + +1. Record it in the mission file's Mission Agents table +2. Reference it in subsequent worker dispatches instead of repeating instructions +3. At mission completion, review each draft for promotion: + - **Promote to custom/** if useful for this user but not generally applicable + - **Promote to shared/** (via PR to `.agents/`) if generally useful + - **Discard** if it was truly mission-specific and won't be reused + +See `tools/build-agent/build-agent.md` "Agent Lifecycle Tiers" for full promotion workflow. + +### Improvement Feedback to aidevops + +Every mission is an opportunity to improve the framework. During and after execution, look for: + +**Patterns to capture:** + +| Signal | Action | +|--------|--------| +| A worker repeatedly misunderstands an instruction | Improve the relevant agent doc | +| A manual step could be automated | File a GitHub issue proposing a helper script | +| A missing capability blocked progress | File a feature request issue | +| A draft agent proved generally useful | Propose promotion via PR | +| Budget estimates were consistently off | Improve estimation guidance | +| A validation pattern would help other missions | Extract to a shared workflow | + +**How to feed back:** + +1. **During mission**: Record observations in the mission's Decision Log or Lessons Learned +2. **At completion**: Review the Retrospective section for actionable improvements +3. **File issues**: `gh issue create --repo --title "Improvement: {description}"` for anything that would benefit future missions +4. **Create PRs**: For draft agents worth promoting, follow the standard worktree + PR workflow + +The goal is a natural feedback loop: missions discover patterns, patterns become agents/scripts, agents/scripts improve future missions. + +## Research Guidance for Unknown Domains + +Missions often venture into domains where the orchestrator and workers lack expertise. When encountering an unfamiliar domain: + +### Research-First Pattern + +Before dispatching implementation workers for an unfamiliar domain: + +1. **Dispatch a research worker** using the cheapest appropriate model: + + ```bash + opencode run --dir --agent Research --title "Mission {id}: Research {topic}" \ + "Research {topic} for mission {id}. Deliverable: a summary document at \ + todo/missions/{id}/research/{topic}.md covering: key concepts, recommended \ + approach, libraries/tools to use, common pitfalls, and example patterns. \ + Use context7 MCP for library docs. Use webfetch only for URLs from search results." & + ``` + +2. **Review research output** before dispatching implementation workers +3. **Create a draft agent** if the research reveals domain-specific patterns workers will need +4. **Include research context** in implementation worker dispatch prompts + +### Using Existing aidevops Capabilities + +Before building something new, check if aidevops already has it: + +```bash +# Search the subagent index for relevant capabilities +rg "{keyword}" ~/.aidevops/agents/subagent-index.toon + +# Search agent files for relevant patterns +rg -l "{keyword}" ~/.aidevops/agents/**/*.md + +# Check available skills +aidevops skills search "{keyword}" + +# Check available MCP servers +rg "{keyword}" ~/.aidevops/agents/subagent-index.toon | grep mcp_servers +``` + +**Common capabilities missions need:** + +| Need | Existing Capability | Reference | +|------|---------------------|-----------| +| Browser testing | Playwright, Stagehand, Playwriter | `tools/browser/browser-automation.md` | +| API integration | Better Auth, Hono, Drizzle | `tools/api/` | +| Database setup | PGlite, Postgres+Drizzle | `tools/database/`, `services/database/` | +| Deployment | Coolify, Vercel, Cloudron | `tools/deployment/` | +| Email sending | SES, email testing suite | `services/email/` | +| Payment processing | Stripe, RevenueCat | `services/payments/` | +| Code quality | ShellCheck, ESLint, SonarCloud | `tools/code-review/` | +| Documentation | Pandoc, document creation | `tools/document/`, `tools/conversion/` | +| Secret management | gopass, SOPS, gocryptfs | `tools/credentials/` | +| Container orchestration | OrbStack, remote dispatch | `tools/containers/` | +| Library documentation | Context7 MCP | `tools/context/context7.md` | + +### Model Routing for Mission Phases + +Different mission phases have different intelligence requirements: + +| Phase | Default Tier | Rationale | +|-------|-------------|-----------| +| Scoping & decomposition | opus | Strategic reasoning, trade-off analysis | +| Research | haiku/flash | Information gathering, summarisation — cheapest tier | +| Implementation (workers) | sonnet | Code generation, standard dev tasks | +| Validation | sonnet | Test execution, integration checking | +| Re-planning after failure | opus | Requires understanding what went wrong and why | +| Budget analysis | haiku | Arithmetic, table updates | + +Override defaults via the mission file's `model_routing` section. Pattern data from `/patterns recommend` overrides static rules when evidence is strong (>75% success, 3+ samples). + +## POC Mode Specifics + +POC mode trades ceremony for speed. The orchestrator adjusts its behaviour: + +| Aspect | Full Mode | POC Mode | +|--------|-----------|----------| +| Git workflow | Worktree + PR per feature | Single branch or direct to main | +| Task briefs | Required per feature | Skipped | +| Code review | PR review before merge | Self-merge, no review | +| Quality gates | Preflight + postflight | Basic: does it run? | +| Parallel workers | Up to `max_parallel_workers` | Single worker per milestone | +| Validation | Full criteria | "Does the demo work?" | +| Documentation | Required | Minimal — README only | + +**When to suggest POC mode:** + +- Research missions where the deliverable is a prototype +- Time-boxed spikes (< 8 hours) +- Exploring feasibility before committing to full implementation +- The user explicitly asks for "quick", "prototype", or "spike" + +**POC to Full transition:** If a POC proves viable and the user wants to productionise it, create a new mission in Full mode that references the POC as prior art. Don't try to retrofit ceremony onto a POC — start fresh with proper structure. + +## Failure Recovery + +Missions run over hours or days. Things will go wrong. The orchestrator must be crash-resilient. + +### Cold Start Recovery + +If the orchestrator session dies and restarts (context compaction, machine restart, OOM): + +1. **Read the mission file** — it's the source of truth, not conversation history +2. **Check GitHub state** — open PRs, merged PRs, issue status for mission features +3. **Check running workers** — `ps axo pid,etime,command | grep "Mission {id}"` +4. **Determine current state** — which milestone is active, which features are done/pending +5. **Resume from where you are** — don't re-dispatch completed features + +The mission file must always reflect reality. If you're unsure whether a feature completed, check GitHub for a merged PR before re-dispatching. + +### Common Failure Patterns + +| Failure | Recovery | +|---------|----------| +| Worker produces wrong output | Review the dispatch prompt — was it clear enough? Improve and re-dispatch | +| Worker can't find a library/tool | Check if it needs installing. Add to Resources section. Re-dispatch with install instructions | +| Milestone validation fails repeatedly | Step back — is the milestone decomposition wrong? Re-plan if needed | +| Budget exceeded | Pause, report remaining scope vs remaining budget, ask user to decide | +| External dependency unavailable | Mark milestone blocked, try next independent milestone if any | +| Merge conflicts between parallel features | Dispatch a conflict resolution worker, or serialize remaining features | + +### Re-Planning + +When reality diverges significantly from the plan (>50% budget overrun, critical feature proves infeasible, new requirement discovered): + +1. Pause execution — don't dispatch more workers +2. Assess the gap between plan and reality +3. Propose a revised plan: + - Which milestones/features to keep, modify, or drop + - Updated budget projection + - Updated timeline +4. Record the re-plan decision in the Decision Log +5. Update the mission file with revised milestones +6. Resume execution + +Re-planning is a judgment call, not a formula. A 10% overrun on milestone 1 doesn't warrant re-planning. A fundamental assumption proving wrong does. + +## Integration with Pulse + +The mission orchestrator integrates with the existing pulse supervisor: + +- **Pulse dispatches mission features** as regular workers — they appear as GitHub issues with mission context in the title/body +- **The orchestrator monitors** mission-specific state that pulse doesn't track: milestone sequencing, validation gates, budget limits +- **No conflict**: Pulse handles individual task dispatch; the orchestrator handles mission-level coordination. They operate at different abstraction levels. + +**How it works in practice:** + +1. `/mission` creates the mission file + TODO entries + GitHub issues for all features +2. The orchestrator (this doc) guides the session that monitors and drives the mission +3. Pulse may independently dispatch mission features if they appear as available issues +4. The orchestrator tracks which features are done and triggers milestone validation +5. After validation, the orchestrator creates issues for the next milestone's features + +This means missions can progress even without a dedicated orchestrator session — pulse will dispatch features as regular tasks. The orchestrator adds: milestone sequencing, validation gates, budget tracking, and re-planning.