diff --git a/.agents/AGENTS.md b/.agents/AGENTS.md
index d610e2fb5..7004303b1 100644
--- a/.agents/AGENTS.md
+++ b/.agents/AGENTS.md
@@ -235,7 +235,7 @@ Orchestration agents can create drafts in `draft/` for reusable parallel process
 | Security | `tools/security/tirith.md` (terminal guard), `tools/security/shannon.md` (pentesting) |
 | Cloud GPU | `tools/infrastructure/cloud-gpu.md` |
 | Parallel agents | `tools/ai-assistants/headless-dispatch.md`, `tools/ai-assistants/runners/` |
-| Orchestration | `supervisor-helper.sh` (batch dispatch, cron pulse, self-healing) |
+| Orchestration | `supervisor-helper.sh` (batch dispatch, cron pulse, self-healing), `/runners-check` (quick queue status) |
 | MCP dev | `tools/build-mcp/build-mcp.md` |
 | Agent design | `tools/build-agent/build-agent.md` |
 | Framework | `aidevops/architecture.md` |
diff --git a/.agents/scripts/commands/runners-check.md b/.agents/scripts/commands/runners-check.md
new file mode 100644
index 000000000..b1571cdef
--- /dev/null
+++ b/.agents/scripts/commands/runners-check.md
@@ -0,0 +1,56 @@
+---
+description: Quick health check of supervisor batch queue, workers, PRs, and system resources
+agent: Build+
+mode: subagent
+---
+
+Quick diagnostic of the supervisor queue. Shows batch status, stuck tasks, open PRs, and issues.
+
+Arguments: $ARGUMENTS
+
+## Steps
+
+Run these commands in parallel and present a unified report:
+
+```bash
+# 1. Active batch status
+~/.aidevops/agents/scripts/supervisor-helper.sh status 2>&1
+
+# 2. Open PRs from workers (need merge/review)
+gh pr list --state open --json number,title,headRefName,createdAt,statusCheckRollup \
+  --jq '.[] | "\(.number) [\(.headRefName)] \(.title) checks:\(.statusCheckRollup | map(.conclusion // .state) | join(","))"' 2>/dev/null
+
+# 3. Active worktrees (worker sessions)
+git worktree list 2>/dev/null
+
+# 4. System resources
+~/.aidevops/agents/scripts/supervisor-helper.sh db \
+  "SELECT id, state, retries FROM tasks WHERE state NOT IN ('deployed','cancelled','failed') ORDER BY state;" 2>/dev/null
+```
+
+## Report Format
+
+Present results as a concise dashboard:
+
+### Batch Status
+- Batch name, total/completed/queued/running/failed counts
+- Any tasks stuck in retrying or evaluating for >10 minutes
+
+### Action Items
+Flag these for the user (most important first):
+1. **PRs ready to merge** — all CI green, no review comments
+2. **PRs with CI failures** — need investigation
+3. **Tasks stuck** — in retrying/evaluating too long
+4. **Tasks at max retries** — need manual intervention or re-queue
+5. **Stale worktrees** — for tasks already deployed/merged
+
+### System Health
+- Load, memory, worker count
+- Cron pulse status: `~/.aidevops/agents/scripts/supervisor-helper.sh cron status 2>&1`
+
+## Arguments
+
+- No arguments: check the most recent active batch
+- `--batch `: check a specific batch
+- `--all`: show all batches including completed
+- `--fix`: auto-fix simple issues (merge green PRs, clean stale worktrees, reset stuck tasks)
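
Action items 3 and 4 in the report format (stuck tasks, tasks at max retries) could be derived from the rows returned by the step 4 task query. The following is a minimal sketch, not part of this diff. It assumes the `db` subcommand prints sqlite3-style pipe-separated rows (`id|state|retries`); `flag_tasks` and `MAX_RETRIES` are hypothetical names introduced here for illustration. Because the query selects no timestamps, the sketch can only list candidates for the ">10 minutes" check rather than confirm that a task is stuck.

```bash
#!/usr/bin/env bash
# Sketch only: assumes rows arrive as id|state|retries (sqlite3 list mode).
# MAX_RETRIES is a hypothetical threshold, not a value read from the
# supervisor configuration.
MAX_RETRIES="${MAX_RETRIES:-3}"

flag_tasks() {
  # Reads id|state|retries rows on stdin and prints one action item per hit.
  awk -F'|' -v max="$MAX_RETRIES" '
    NF < 3 { next }                                   # skip malformed rows
    $3 + 0 >= max { print "max-retries: " $1; next }  # needs manual intervention
    $2 == "retrying" || $2 == "evaluating" { print "check-if-stuck: " $1 }
  '
}

# Possible usage (pipe the step 4 query output into the function):
#   ~/.aidevops/agents/scripts/supervisor-helper.sh db "SELECT id, state, retries FROM tasks;" | flag_tasks
```

Keeping the classification in a function that reads stdin makes it possible to check it against canned rows without a running supervisor.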