diff --git a/.agents/scripts/commands/full-loop.md b/.agents/scripts/commands/full-loop.md
index 576ac9aca..45fd96c03 100644
--- a/.agents/scripts/commands/full-loop.md
+++ b/.agents/scripts/commands/full-loop.md
@@ -122,9 +122,69 @@ fi
| `status:done` | PR merged | sync-on-pr-merge workflow (automated) |
| `status:verify-failed` | Post-merge verification failed | Worker (contextual) |
| `status:needs-testing` | Code merged, needs manual testing | Worker (contextual) |
+| `dispatched:{model}` | Worker started on task | **Worker (Step 0.7)** |
Only `status:available`, `status:claimed`, and `status:done` are fully automated. All other transitions are set contextually by the agent that best understands the current state. When setting a new status label, always remove the prior status labels to keep exactly one active.
+### Step 0.7: Label Dispatch Model — `dispatched:{model}`
+
+After setting `status:in-progress`, tag the issue with the model running this worker. This provides observability into which model solved each task — essential for cost/quality analysis.
+
+**Detect the current model** from the system prompt or environment. The model name appears in the system prompt as "You are powered by the model named X" or via `ANTHROPIC_MODEL` / `CLAUDE_MODEL` environment variables. Map to a short label:
+
+| Model contains | Label |
+|----------------|-------|
+| `opus` | `dispatched:opus` |
+| `sonnet` | `dispatched:sonnet` |
+| `haiku` | `dispatched:haiku` |
+| unknown | skip labeling |
+
+```bash
+# Detect model — check env vars first, fall back to known model identity
+MODEL_SHORT=""
+# :- defaults guard against unset vars (e.g. under set -u shells)
+for VAR in "${ANTHROPIC_MODEL:-}" "${CLAUDE_MODEL:-}"; do
+ case "$VAR" in
+ *opus*) MODEL_SHORT="opus" ;;
+ *sonnet*) MODEL_SHORT="sonnet" ;;
+ *haiku*) MODEL_SHORT="haiku" ;;
+ esac
+ [[ -n "$MODEL_SHORT" ]] && break
+done
+
+# Fallback: the agent knows its own model from the system prompt.
+# If env vars are empty, set MODEL_SHORT based on your model identity.
+# Example: if you are claude-opus-4-6, set MODEL_SHORT="opus"
+
+if [[ -n "$MODEL_SHORT" && -n "$ISSUE_NUM" && "$ISSUE_NUM" != "null" ]]; then
+ REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner)
+
+ # Remove stale dispatched:* labels so attribution is unambiguous
+ for OLD in "dispatched:opus" "dispatched:sonnet" "dispatched:haiku"; do
+ if [[ "$OLD" != "dispatched:${MODEL_SHORT}" ]]; then
+ if ! gh issue edit "$ISSUE_NUM" --repo "$REPO" --remove-label "$OLD" 2>/dev/null; then
+ : # Label not present — expected, not an error
+ fi
+ fi
+ done
+
+ # Create the label if it doesn't exist yet
+ if ! LABEL_ERR=$(gh label create "dispatched:${MODEL_SHORT}" --repo "$REPO" \
+ --description "Task dispatched to ${MODEL_SHORT} model" --color "1D76DB" 2>&1); then
+ # "already exists" is expected — only warn on other failures
+ if [[ "$LABEL_ERR" != *"already exists"* ]]; then
+ echo "[dispatch-label] Warning: label create failed for dispatched:${MODEL_SHORT} on ${REPO}: ${LABEL_ERR}" >&2
+ fi
+ fi
+
+ if ! EDIT_ERR=$(gh issue edit "$ISSUE_NUM" --repo "$REPO" \
+ --add-label "dispatched:${MODEL_SHORT}" 2>&1); then
+ echo "[dispatch-label] Warning: could not add dispatched:${MODEL_SHORT} to issue #${ISSUE_NUM} on ${REPO}: ${EDIT_ERR}" >&2
+ fi
+fi
+```
+
+**For interactive sessions** (not headless dispatch): If you are working on a task interactively and the issue exists, apply the label based on your own model identity. This ensures all task work is attributed, not just headless dispatches.
+
### Step 1: Auto-Branch Setup
The loop automatically handles branch setup when on main/master:
diff --git a/.agents/scripts/commands/pulse.md b/.agents/scripts/commands/pulse.md
index ce7c1b228..8e8db998e 100644
--- a/.agents/scripts/commands/pulse.md
+++ b/.agents/scripts/commands/pulse.md
@@ -8,12 +8,12 @@ You are the supervisor pulse. You run every 2 minutes via launchd — **there is
**AUTONOMOUS EXECUTION REQUIRED:** You MUST execute every step including dispatching workers. NEVER present a summary and stop. NEVER ask "what would you like to action/do/work on?" — there is nobody to answer. Your output is a log of actions you ALREADY TOOK (past tense), not a menu of options. If you finish without having run `opencode run` or `gh pr merge` commands, you have failed.
-**TARGET: 6 concurrent workers at all times.** If slots are available and work exists, dispatch workers to fill them. An idle slot is wasted capacity.
+**TARGET: fill all available worker slots.** The max worker count is calculated dynamically by `pulse-wrapper.sh` based on available RAM (1 GB per worker, 8 GB reserved for OS + user apps, capped at 8). Read it at the start of each pulse — do not hardcode a number.
Your job is simple:
1. Check the circuit breaker. If tripped, exit immediately.
-2. Count running workers. If all 6 slots are full, continue to Step 2 (you can still merge ready PRs and observe outcomes).
+2. Read the dynamic max workers limit and count running workers. If all slots are full, continue to Step 2 (you can still merge ready PRs and observe outcomes).
3. Fetch open issues and PRs from the managed repos.
4. **Observe outcomes** — check for stuck or failed work and file improvement issues.
5. Pick the highest-value items to fill available worker slots.
@@ -22,7 +22,7 @@ Your job is simple:
That's it. Minimal state (circuit breaker only). No databases. GitHub is the state DB.
-**Max concurrency: 6 workers.**
+**Max concurrency is dynamic** — determined by available RAM at pulse time. See Step 1.
## Step 0: Circuit Breaker Check (t1331)
@@ -36,16 +36,28 @@ That's it. Minimal state (circuit breaker only). No databases. GitHub is the sta
The circuit breaker trips after 3 consecutive task failures (configurable via `SUPERVISOR_CIRCUIT_BREAKER_THRESHOLD`). It auto-resets after 30 minutes or on manual reset (`circuit-breaker-helper.sh reset`). Any task success resets the counter to 0.
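The trip/reset rules can be sketched with a plain counter file. This is an illustrative model, not the real `circuit-breaker-helper.sh` internals; the counter-file path and function names are assumptions:

```bash
# Sketch: consecutive-failure counter with threshold trip and success reset.
# COUNTER_FILE location is illustrative, not the helper's real state file.
THRESHOLD="${SUPERVISOR_CIRCUIT_BREAKER_THRESHOLD:-3}"
COUNTER_FILE="${TMPDIR:-/tmp}/cb-failures-demo"

record_failure() {
  local n
  n=$(cat "$COUNTER_FILE" 2>/dev/null || echo 0)
  echo $((n + 1)) >"$COUNTER_FILE"
}

record_success() {
  # Any task success resets the consecutive-failure count
  echo 0 >"$COUNTER_FILE"
}

is_tripped() {
  local n
  n=$(cat "$COUNTER_FILE" 2>/dev/null || echo 0)
  [[ "$n" -ge "$THRESHOLD" ]]
}

rm -f "$COUNTER_FILE"
record_failure; record_failure; record_failure
is_tripped && echo "breaker tripped"
```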
-## Step 1: Count Running Workers
+## Step 1: Read Max Workers and Count Running Workers
```bash
-# Count running full-loop workers (macOS pgrep has no -c flag)
-WORKER_COUNT=$(pgrep -f '/full-loop' 2>/dev/null | wc -l | tr -d ' ')
-echo "Running workers: $WORKER_COUNT / 6"
+# Read dynamic max workers (calculated by pulse-wrapper.sh from available RAM).
+# Falls back to 4 if the file doesn't exist (conservative default).
+MAX_WORKERS=$(cat ~/.aidevops/logs/pulse-max-workers 2>/dev/null || echo 4)
+
+# Count running full-loop workers — IMPORTANT: each opencode run spawns TWO
+# processes (a node launcher + the .opencode binary). Only count the .opencode
+# binaries to avoid 2x inflation. Filter by the binary path, not just '/full-loop'.
+WORKER_COUNT=0
+while IFS= read -r pid; do
+ cmd=$(ps -p "$pid" -o command= 2>/dev/null || true)
+ if echo "$cmd" | grep -q '\.opencode'; then
+ WORKER_COUNT=$((WORKER_COUNT + 1))
+ fi
+done < <(pgrep -f '/full-loop' 2>/dev/null || true)
+echo "Running workers: $WORKER_COUNT / $MAX_WORKERS"
```
-- If `WORKER_COUNT >= 6`: set `AVAILABLE=0` — no new workers, but continue to Step 2 (merges and outcome observation don't need slots).
-- Otherwise: calculate `AVAILABLE=$((6 - WORKER_COUNT))` — this is how many workers you can dispatch.
+- If `WORKER_COUNT >= MAX_WORKERS`: set `AVAILABLE=0` — no new workers, but continue to Step 2 (merges and outcome observation don't need slots).
+- Otherwise: calculate `AVAILABLE=$((MAX_WORKERS - WORKER_COUNT))` — this is how many workers you can dispatch.
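The two bullets collapse into one clamped expression; the counts below are illustrative:

```bash
# AVAILABLE is the number of free dispatch slots, clamped at zero.
MAX_WORKERS=4

WORKER_COUNT=6
AVAILABLE=$(( WORKER_COUNT >= MAX_WORKERS ? 0 : MAX_WORKERS - WORKER_COUNT ))
echo "AVAILABLE=$AVAILABLE"   # over the limit: 0

WORKER_COUNT=1
AVAILABLE=$(( WORKER_COUNT >= MAX_WORKERS ? 0 : MAX_WORKERS - WORKER_COUNT ))
echo "AVAILABLE=$AVAILABLE"   # 3 slots free
```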
## Step 2: Fetch GitHub State
@@ -87,7 +99,7 @@ If you see a pattern (same type of failure, same error), create an improvement i
**Duplicate work:** If two open PRs target the same issue or have very similar titles, flag it by commenting on the newer one.
-**Long-running workers:** Check the runtime of each running worker process with `ps axo pid,etime,command | grep '/full-loop'`. The `etime` column shows elapsed time (format: `HH:MM` or `D-HH:MM:SS`). Parse it to get hours.
+**Long-running workers:** Check the runtime of each running worker process with `ps axo pid,etime,command | grep '/full-loop' | grep '\.opencode'` (filter to `.opencode` binaries only — each worker has a `node` launcher + `.opencode` binary; only check the binary to avoid double-counting). The `etime` column shows elapsed time (format: `MM:SS`, `HH:MM:SS`, or `D-HH:MM:SS`). Parse it to get hours.
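The parse step can be sketched as follows, assuming only the etime formats BSD ps emits (the function name is illustrative):

```bash
# Convert a ps etime value (MM:SS, HH:MM:SS, or D-HH:MM:SS) to whole hours.
etime_to_hours() {
  local etime="$1" days=0 hours=0 minutes=0 seconds=0
  if [[ "$etime" == *-* ]]; then
    days="${etime%%-*}"
    etime="${etime#*-}"
  fi
  local IFS=':'
  read -r -a parts <<<"$etime"
  case "${#parts[@]}" in
    3) hours="${parts[0]}" minutes="${parts[1]}" seconds="${parts[2]}" ;;
    2) minutes="${parts[0]}" seconds="${parts[1]}" ;;
    1) seconds="${parts[0]}" ;;
  esac
  # 10# forces base-10 so zero-padded fields like "08" are not read as octal
  echo $(( 10#$days * 24 + 10#$hours ))
}

etime_to_hours "2-03:15:42"   # 51
etime_to_hours "45:12"        # 0
```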
Workers now have a self-imposed 2-hour time budget (see full-loop.md rule 8), but the supervisor enforces a safety net. For any worker running 2+ hours, **assess whether it's making progress** before deciding to kill:
@@ -233,7 +245,7 @@ This turns blocked issues from a dead end into an actively managed queue.
**Skip issues that already have an open PR:** If an issue number appears in the title or branch name of an open PR, a worker has already produced output for it. Do not dispatch another worker for the same issue. Check the PR list you already fetched — if any PR's `headRefName` or `title` contains the issue number, skip that issue.
-**Deduplication — check running processes:** Before dispatching, check `ps axo command | grep '/full-loop'` for any running worker whose command line contains the issue/PR number you're about to dispatch. Different pulse runs may have used different title formats for the same work (e.g., "issue-2300-simplify-infra-scripts" vs "Issue #2300: t1337 Simplify Tier 3"). Extract the canonical number (e.g., `2300`, `t1337`) and check if ANY running worker references it. If so, skip — do not dispatch a duplicate.
+**Deduplication — check running processes:** Before dispatching, check `ps axo command | grep '/full-loop' | grep '\.opencode'` for any running worker whose command line contains the issue/PR number you're about to dispatch (filter to `.opencode` binaries to avoid double-counting node launchers). Different pulse runs may have used different title formats for the same work (e.g., "issue-2300-simplify-infra-scripts" vs "Issue #2300: t1337 Simplify Tier 3"). Extract the canonical number (e.g., `2300`, `t1337`) and check if ANY running worker references it. If so, skip — do not dispatch a duplicate.
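The extraction-and-check step can be sketched with the captured listing passed in as text; the function names and sample listing are illustrative:

```bash
# Pull the canonical work-item number (e.g. 2300 or t1337) out of a title
# or branch name, then look for it in a captured process listing.
canonical_number() {
  # First tNNN task id or multi-digit number wins
  echo "$1" | grep -oE 't[0-9]+|[0-9]{2,}' | head -n1
}

is_duplicate() {
  local num="$1" listing="$2"
  [[ -n "$num" ]] && echo "$listing" | grep -q -- "$num"
}

LISTING="opencode run /full-loop issue-2300-simplify-infra-scripts"
NUM=$(canonical_number "Issue #2300: t1337 Simplify Tier 3")
echo "$NUM"   # 2300
if is_duplicate "$NUM" "$LISTING"; then echo "skip: already running"; fi
```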
**Blocker-chain validation (MANDATORY before dispatch):** Before dispatching a worker for any issue, validate that its entire dependency chain is resolved — not just the immediate `status:blocked` label. This prevents the #1 cause of workers running 3-9 hours without producing PRs: they start work on tasks whose prerequisites aren't merged yet, then spin trying to work around missing schemas, APIs, or migrations.
@@ -276,7 +288,7 @@ If you're unsure whether it needs decomposition, dispatch the worker — but pre
## Step 4: Execute Dispatches NOW
-**CRITICAL: Do not stop after Step 3. Do not present a summary and wait. Execute the commands below for every item you selected in Step 3. The goal is 6 concurrent workers at all times — if you have available slots, fill them. An idle slot is wasted capacity.**
+**CRITICAL: Do not stop after Step 3. Do not present a summary and wait. Execute the commands below for every item you selected in Step 3. The goal is MAX_WORKERS concurrent workers at all times — if you have available slots, fill them. An idle slot is wasted capacity.**
### For PRs that just need merging (CI green, approved):
@@ -496,7 +508,7 @@ fi
The strategic review does what sonnet cannot: meta-reasoning about queue health, resource utilisation, stuck chains, stale state, and systemic issues. It can take corrective actions (merge ready PRs, file issues, clean worktrees, dispatch high-value work).
-This does NOT count against the 6-worker concurrency limit — it's a supervisor function, not a task worker.
+This does NOT count against the MAX_WORKERS concurrency limit — it's a supervisor function, not a task worker.
See `scripts/commands/strategic-review.md` for the full review prompt.
@@ -504,7 +516,7 @@ See `scripts/commands/strategic-review.md` for the full review prompt.
- Do NOT maintain state files, databases, or logs (the circuit breaker, stuck detection, and opus review helpers manage their own state files — those are the only exceptions)
- Do NOT auto-kill workers based on stuck detection alone — stuck detection (Step 2b) is advisory only. The kill decision is separate (Step 2a) and requires your judgment
-- Do NOT dispatch more workers than available slots (max 6 total)
+- Do NOT dispatch more workers than available slots (max MAX_WORKERS total, read from Step 1)
- Do NOT try to implement anything yourself — you are the supervisor, not a worker
- Do NOT read source code, run tests, or do any task work
- Do NOT retry failed workers — the next pulse will pick up where things left off
diff --git a/.agents/scripts/pulse-wrapper.sh b/.agents/scripts/pulse-wrapper.sh
new file mode 100755
index 000000000..4e0249bfc
--- /dev/null
+++ b/.agents/scripts/pulse-wrapper.sh
@@ -0,0 +1,351 @@
+#!/usr/bin/env bash
+# pulse-wrapper.sh - Wrapper for supervisor pulse with timeout and dedup
+#
+# Solves: opencode run enters idle state after completing the pulse prompt
+# but never exits, blocking all future pulses via the pgrep dedup guard.
+#
+# This wrapper:
+# 1. Uses a PID file with staleness check (not pgrep) for dedup
+# 2. Runs opencode run with a hard timeout (default: 10 min)
+# 3. Guarantees the process is killed if it hangs after completion
+#
+# Called by launchd every 120s via the supervisor-pulse plist.
+
+set -euo pipefail
+
+#######################################
+# Configuration
+#######################################
+PULSE_TIMEOUT="${PULSE_TIMEOUT:-600}" # 10 minutes max per pulse
+PULSE_STALE_THRESHOLD="${PULSE_STALE_THRESHOLD:-900}" # 15 min = definitely stuck
+
+# Validate numeric configuration
+if ! [[ "$PULSE_TIMEOUT" =~ ^[0-9]+$ ]]; then
+ echo "[pulse-wrapper] Invalid PULSE_TIMEOUT: $PULSE_TIMEOUT — using default 600" >&2
+ PULSE_TIMEOUT=600
+fi
+if ! [[ "$PULSE_STALE_THRESHOLD" =~ ^[0-9]+$ ]]; then
+ echo "[pulse-wrapper] Invalid PULSE_STALE_THRESHOLD: $PULSE_STALE_THRESHOLD — using default 900" >&2
+ PULSE_STALE_THRESHOLD=900
+fi
+PIDFILE="${HOME}/.aidevops/logs/pulse.pid"
+LOGFILE="${HOME}/.aidevops/logs/pulse.log"
+OPENCODE_BIN="${OPENCODE_BIN:-/opt/homebrew/bin/opencode}"
+PULSE_DIR="${PULSE_DIR:-${HOME}/Git/aidevops}"
+PULSE_MODEL="${PULSE_MODEL:-anthropic/claude-sonnet-4-6}"
+ORPHAN_MAX_AGE="${ORPHAN_MAX_AGE:-7200}" # 2 hours — kill orphans older than this
+RAM_PER_WORKER_MB="${RAM_PER_WORKER_MB:-1024}" # 1 GB per worker
+RAM_RESERVE_MB="${RAM_RESERVE_MB:-8192}" # 8 GB reserved for OS + user apps
+MAX_WORKERS_CAP="${MAX_WORKERS_CAP:-8}" # Hard ceiling regardless of RAM
+
+#######################################
+# Ensure log directory exists
+#######################################
+mkdir -p "$(dirname "$PIDFILE")"
+
+#######################################
+# Check for stale PID file and clean up
+# Returns: 0 if safe to proceed, 1 if another pulse is genuinely running
+#######################################
+check_dedup() {
+ if [[ ! -f "$PIDFILE" ]]; then
+ return 0
+ fi
+
+ local old_pid
+ old_pid=$(cat "$PIDFILE" 2>/dev/null || echo "")
+
+ if [[ -z "$old_pid" ]]; then
+ rm -f "$PIDFILE"
+ return 0
+ fi
+
+ # Check if the process is still running
+ if ! kill -0 "$old_pid" 2>/dev/null; then
+ # Process is dead, clean up stale PID file
+ rm -f "$PIDFILE"
+ return 0
+ fi
+
+ # Process is running — check how long
+ local elapsed_seconds
+ elapsed_seconds=$(_get_process_age "$old_pid")
+
+ if [[ "$elapsed_seconds" -gt "$PULSE_STALE_THRESHOLD" ]]; then
+ # Process has been running too long — it's stuck
+ echo "[pulse-wrapper] Killing stale pulse process $old_pid (running ${elapsed_seconds}s, threshold ${PULSE_STALE_THRESHOLD}s)" >>"$LOGFILE"
+ _kill_tree "$old_pid"
+ sleep 2
+ # Force kill if still alive
+ if kill -0 "$old_pid" 2>/dev/null; then
+ _force_kill_tree "$old_pid"
+ fi
+ rm -f "$PIDFILE"
+ return 0
+ fi
+
+ # Process is running and within time limit — genuine dedup
+ echo "[pulse-wrapper] Pulse already running (PID $old_pid, ${elapsed_seconds}s elapsed). Skipping." >>"$LOGFILE"
+ return 1
+}
+
+#######################################
+# Kill a process and all its children (macOS-compatible)
+# Arguments:
+# $1 - PID to kill
+#######################################
+_kill_tree() {
+ local pid="$1"
+ # Find all child processes recursively (bash 3.2 compatible — no mapfile)
+ local child
+ while IFS= read -r child; do
+ [[ -n "$child" ]] && _kill_tree "$child"
+ done < <(pgrep -P "$pid" 2>/dev/null || true)
+ kill "$pid" 2>/dev/null || true
+ return 0
+}
+
+#######################################
+# Force kill a process and all its children
+# Arguments:
+# $1 - PID to kill
+#######################################
+_force_kill_tree() {
+ local pid="$1"
+ local child
+ while IFS= read -r child; do
+ [[ -n "$child" ]] && _force_kill_tree "$child"
+ done < <(pgrep -P "$pid" 2>/dev/null || true)
+ kill -9 "$pid" 2>/dev/null || true
+ return 0
+}
+
+#######################################
+# Get process age in seconds
+# Arguments:
+# $1 - PID
+# Returns: elapsed seconds via stdout
+#######################################
+_get_process_age() {
+ local pid="$1"
+ local etime
+ # macOS ps etime format: MM:SS or HH:MM:SS or D-HH:MM:SS
+ etime=$(ps -p "$pid" -o etime= 2>/dev/null | tr -d ' ') || etime=""
+
+ if [[ -z "$etime" ]]; then
+ echo "0"
+ return 0
+ fi
+
+ local days=0 hours=0 minutes=0 seconds=0
+
+ # Parse D-HH:MM:SS format
+ if [[ "$etime" == *-* ]]; then
+ days="${etime%%-*}"
+ etime="${etime#*-}"
+ fi
+
+ # Count colons to determine format
+ local colon_count
+ colon_count=$(echo "$etime" | tr -cd ':' | wc -c | tr -d ' ')
+
+ if [[ "$colon_count" -eq 2 ]]; then
+ # HH:MM:SS
+ IFS=':' read -r hours minutes seconds <<<"$etime"
+ elif [[ "$colon_count" -eq 1 ]]; then
+ # MM:SS
+ IFS=':' read -r minutes seconds <<<"$etime"
+ else
+ seconds="$etime"
+ fi
+
+    # Force base-10 arithmetic so zero-padded fields are not read as octal
+ days=$((10#${days}))
+ hours=$((10#${hours}))
+ minutes=$((10#${minutes}))
+ seconds=$((10#${seconds}))
+
+ echo $((days * 86400 + hours * 3600 + minutes * 60 + seconds))
+ return 0
+}
+
+#######################################
+# Run the pulse with timeout
+#######################################
+run_pulse() {
+ echo "[pulse-wrapper] Starting pulse at $(date -u +%Y-%m-%dT%H:%M:%SZ)" >>"$LOGFILE"
+
+ # Start opencode run in background
+ "$OPENCODE_BIN" run "/pulse" \
+ --dir "$PULSE_DIR" \
+ -m "$PULSE_MODEL" \
+ --title "Supervisor Pulse" \
+ >>"$LOGFILE" 2>&1 &
+
+ local opencode_pid=$!
+ echo "$opencode_pid" >"$PIDFILE"
+
+ echo "[pulse-wrapper] opencode PID: $opencode_pid, timeout: ${PULSE_TIMEOUT}s" >>"$LOGFILE"
+
+ # Wait for completion OR timeout
+ local waited=0
+ local check_interval=5
+
+ while kill -0 "$opencode_pid" 2>/dev/null; do
+ if [[ "$waited" -ge "$PULSE_TIMEOUT" ]]; then
+ echo "[pulse-wrapper] Timeout after ${PULSE_TIMEOUT}s — killing opencode PID $opencode_pid and children" >>"$LOGFILE"
+ _kill_tree "$opencode_pid"
+ sleep 2
+ # Force kill if graceful shutdown didn't work
+ if kill -0 "$opencode_pid" 2>/dev/null; then
+ echo "[pulse-wrapper] Force killing PID $opencode_pid" >>"$LOGFILE"
+ _force_kill_tree "$opencode_pid"
+ fi
+ break
+ fi
+ sleep "$check_interval"
+ waited=$((waited + check_interval))
+ done
+
+ # Clean up PID file
+ rm -f "$PIDFILE"
+
+ # Wait to collect exit status (avoid zombie)
+ wait "$opencode_pid" 2>/dev/null || true
+
+ echo "[pulse-wrapper] Pulse completed at $(date -u +%Y-%m-%dT%H:%M:%SZ) (ran ${waited}s)" >>"$LOGFILE"
+ return 0
+}
+
+#######################################
+# Main
+#######################################
+main() {
+ if ! check_dedup; then
+ return 0
+ fi
+
+ cleanup_orphans
+ calculate_max_workers
+ run_pulse
+ return 0
+}
+
+#######################################
+# Kill orphaned opencode processes
+#
+# Criteria (ALL must be true):
+# - No TTY (headless — not a user's terminal tab)
+# - Not a current worker (/full-loop not in command)
+# - Not the supervisor pulse (Supervisor Pulse not in command)
+# - Not a strategic review (Strategic Review not in command)
+# - Older than ORPHAN_MAX_AGE seconds
+#
+# These are completed headless sessions where opencode entered idle
+# state with a file watcher and never exited.
+#######################################
+cleanup_orphans() {
+ local killed=0
+ local total_mb=0
+
+ while IFS= read -r line; do
+        local pid tty etime rss cmd
+        # read splits on whitespace runs and cmd soaks up the rest of the
+        # line; cut -d' ' -f5- breaks on ps's multi-space column padding.
+        read -r pid tty etime rss cmd <<<"$line"
+
+ # Skip interactive sessions (has a real TTY)
+ if [[ "$tty" != "??" ]]; then
+ continue
+ fi
+
+ # Skip active workers, pulse, strategic reviews, and language servers
+ if echo "$cmd" | grep -qE '/full-loop|Supervisor Pulse|Strategic Review|language-server|eslintServer'; then
+ continue
+ fi
+
+ # Skip young processes
+ local age_seconds
+ age_seconds=$(_get_process_age "$pid")
+ if [[ "$age_seconds" -lt "$ORPHAN_MAX_AGE" ]]; then
+ continue
+ fi
+
+ # This is an orphan — kill it
+ local mb=$((rss / 1024))
+ kill "$pid" 2>/dev/null || true
+ killed=$((killed + 1))
+ total_mb=$((total_mb + mb))
+ done < <(ps axo pid,tty,etime,rss,command | grep '\.opencode' | grep -v grep | grep -v 'bash-language-server')
+
+ # Also kill orphaned node launchers (parent of .opencode processes)
+ while IFS= read -r line; do
+        local pid tty etime rss cmd
+        # Same field parsing as above — read handles ps's multi-space padding
+        read -r pid tty etime rss cmd <<<"$line"
+
+ [[ "$tty" != "??" ]] && continue
+ echo "$cmd" | grep -qE '/full-loop|Supervisor Pulse|Strategic Review|language-server|eslintServer' && continue
+
+ local age_seconds
+ age_seconds=$(_get_process_age "$pid")
+ [[ "$age_seconds" -lt "$ORPHAN_MAX_AGE" ]] && continue
+
+ kill "$pid" 2>/dev/null || true
+ local mb=$((rss / 1024))
+ killed=$((killed + 1))
+ total_mb=$((total_mb + mb))
+ done < <(ps axo pid,tty,etime,rss,command | grep 'node.*opencode' | grep -v grep | grep -v '\.opencode')
+
+ if [[ "$killed" -gt 0 ]]; then
+ echo "[pulse-wrapper] Cleaned up $killed orphaned opencode processes (freed ~${total_mb}MB)" >>"$LOGFILE"
+ fi
+ return 0
+}
+
+#######################################
+# Calculate max workers from available RAM
+#
+# Formula: (free_ram - RAM_RESERVE_MB) / RAM_PER_WORKER_MB
+# Clamped to [1, MAX_WORKERS_CAP]
+#
+# Writes MAX_WORKERS to a file that pulse.md reads via bash.
+#######################################
+calculate_max_workers() {
+ local free_mb
+ if [[ "$(uname)" == "Darwin" ]]; then
+ # macOS: use vm_stat for free + inactive (reclaimable) pages
+ local page_size free_pages inactive_pages
+ page_size=$(sysctl -n hw.pagesize 2>/dev/null || echo 16384)
+        free_pages=$(vm_stat 2>/dev/null | awk '/Pages free/ {gsub(/\./,"",$3); print $3}')
+        inactive_pages=$(vm_stat 2>/dev/null | awk '/Pages inactive/ {gsub(/\./,"",$3); print $3}')
+        # Default to 0 so empty vm_stat output doesn't break the arithmetic
+        free_mb=$(( (${free_pages:-0} + ${inactive_pages:-0}) * page_size / 1024 / 1024 ))
+ else
+ # Linux: use MemAvailable from /proc/meminfo
+ free_mb=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo 2>/dev/null || echo 8192)
+ fi
+
+ local available_mb=$((free_mb - RAM_RESERVE_MB))
+ local max_workers=$((available_mb / RAM_PER_WORKER_MB))
+
+ # Clamp to [1, MAX_WORKERS_CAP]
+ if [[ "$max_workers" -lt 1 ]]; then
+ max_workers=1
+ elif [[ "$max_workers" -gt "$MAX_WORKERS_CAP" ]]; then
+ max_workers="$MAX_WORKERS_CAP"
+ fi
+
+ # Write to a file that pulse.md can read
+ local max_workers_file="${HOME}/.aidevops/logs/pulse-max-workers"
+ echo "$max_workers" >"$max_workers_file"
+
+ echo "[pulse-wrapper] Available RAM: ${free_mb}MB, reserve: ${RAM_RESERVE_MB}MB, max workers: ${max_workers}" >>"$LOGFILE"
+ return 0
+}
+
+main "$@"
diff --git a/.agents/scripts/supervisor-archived/launchd.sh b/.agents/scripts/supervisor-archived/launchd.sh
index 92664b90c..17e01e9fb 100755
--- a/.agents/scripts/supervisor-archived/launchd.sh
+++ b/.agents/scripts/supervisor-archived/launchd.sh
@@ -131,33 +131,33 @@ _generate_supervisor_pulse_plist() {
local label="com.aidevops.aidevops-supervisor-pulse"
- # Build ProgramArguments array
- local prog_args
-    prog_args="<string>${script_path}</string>
-        <string>pulse</string>"
- if [[ -n "$batch_arg" ]]; then
- # batch_arg is "--batch id" — split into two strings
- local batch_id
- batch_id="${batch_arg#--batch }"
-        prog_args="${prog_args}
-        <string>--batch</string>
-        <string>${batch_id}</string>"
+ # Use pulse-wrapper.sh which handles dedup (PID file with staleness),
+ # timeout (kills opencode if it hangs after completing), and cleanup.
+ # This replaces the old pgrep-based dedup guard that failed when
+ # opencode run entered idle state without exiting (t1345).
+ local wrapper_path
+ wrapper_path="$(dirname "$script_path")/../scripts/pulse-wrapper.sh"
+ # Resolve to absolute path
+ if [[ -f "$HOME/.aidevops/agents/scripts/pulse-wrapper.sh" ]]; then
+ wrapper_path="$HOME/.aidevops/agents/scripts/pulse-wrapper.sh"
fi
- # Build EnvironmentVariables dict
+ # Validate wrapper exists before generating plist
+ if [[ ! -f "$wrapper_path" ]]; then
+ echo "ERROR: pulse-wrapper.sh not found at $wrapper_path" >&2
+ return 1
+ fi
+
+ # Build EnvironmentVariables dict — wrapper reads these
local env_dict
    env_dict="<key>PATH</key>
        <string>${env_path}</string>
-        <key>SUPERVISOR_PULSE_LOCK_TIMEOUT</key>
-        <string>1800</string>
-        <key>SUPERVISOR_AI_TIMEOUT</key>
-        <string>600</string>"
- # SUPERVISOR_PULSE_LOCK_TIMEOUT=1800s (30 min): prevents concurrent pulses when
- # Phase 14 AI reasoning runs long (108KB+ context can take 6+ min). The default
- # 600s was shorter than the total pulse duration, causing concurrent pulses that
- # interfered with each other via Phase 4e orphan killing (t1301).
- # SUPERVISOR_AI_TIMEOUT=600s (10 min): gives the AI CLI more time to respond to
- # large context prompts before portable_timeout fires (t1301).
+        <key>HOME</key>
+        <string>${HOME}</string>
+        <key>PULSE_TIMEOUT</key>
+        <string>600</string>
+        <key>PULSE_STALE_THRESHOLD</key>
+        <string>900</string>"
if [[ -n "$gh_token" ]]; then
env_dict="${env_dict}
        <key>GH_TOKEN</key>
@@ -173,7 +173,8 @@ _generate_supervisor_pulse_plist() {
    <string>${label}</string>
    <key>ProgramArguments</key>
    <array>
-        ${prog_args}
+        <string>/bin/bash</string>
+        <string>${wrapper_path}</string>
    </array>
    <key>StartInterval</key>
    <integer>${interval_seconds}</integer>
diff --git a/setup.sh b/setup.sh
index a9eb48f11..02624bcb0 100755
--- a/setup.sh
+++ b/setup.sh
@@ -662,32 +662,41 @@ main() {
fi
fi
- # Enable supervisor pulse scheduler if not already installed
- # The pulse dispatches AI workers via opencode run "/pulse" every 2 minutes.
- # macOS: launchd plist | Linux: cron entry
- local opencode_path
- opencode_path=$(command -v opencode 2>/dev/null) || opencode_path=""
- if [[ -n "$opencode_path" ]] && [[ "${AIDEVOPS_SUPERVISOR_PULSE:-true}" != "false" ]]; then
- local _pulse_installed=false
+ # Enable supervisor pulse scheduler
+ # Uses pulse-wrapper.sh which handles timeout, dedup, and process cleanup.
+ # macOS: launchd plist invoking wrapper | Linux: cron entry invoking wrapper
+ # The wrapper invokes `opencode run "/pulse"` (AI-guided supervisor) with a
+ # hard timeout to prevent hangs when opencode enters idle state without exiting.
+ if [[ "${AIDEVOPS_SUPERVISOR_PULSE:-true}" != "false" ]]; then
+ local wrapper_script="$HOME/.aidevops/agents/scripts/pulse-wrapper.sh"
+ local pulse_label="com.aidevops.aidevops-supervisor-pulse"
local _aidevops_dir
_aidevops_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ # Detect if pulse is already installed (any platform)
+ local _pulse_installed=false
if [[ "$(uname -s)" == "Darwin" ]]; then
- # macOS: check launchd
- if _launchd_has_agent "com.aidevops.aidevops-supervisor-pulse"; then
- _pulse_installed=true
+ local pulse_plist="$HOME/Library/LaunchAgents/${pulse_label}.plist"
+ if _launchd_has_agent "$pulse_label"; then
+                # Treat as installed only if the plist already uses the wrapper
+ if [[ -f "$pulse_plist" ]] && grep -q "pulse-wrapper" "$pulse_plist" 2>/dev/null; then
+ _pulse_installed=true
+ fi
+ # Old format will fall through to upgrade path
fi
fi
# Both platforms: check cron
- if [[ "$_pulse_installed" == "false" ]] && crontab -l 2>/dev/null | grep -qF "aidevops: supervisor-pulse"; then
+ if [[ "$_pulse_installed" == "false" ]] && crontab -l 2>/dev/null | grep -qF "pulse-wrapper" 2>/dev/null; then
_pulse_installed=true
fi
- if [[ "$_pulse_installed" == "false" ]]; then
- local _install_pulse=false
- if [[ "$NON_INTERACTIVE" == "true" ]]; then
- _install_pulse=true
- else
+ if [[ "$_pulse_installed" == "false" && -f "$wrapper_script" ]]; then
+ # Detect opencode binary location
+ local opencode_bin
+ opencode_bin=$(command -v opencode 2>/dev/null || echo "/opt/homebrew/bin/opencode")
+
+ local _do_install=true
+ if [[ "$NON_INTERACTIVE" != "true" ]]; then
echo ""
echo "The supervisor pulse enables autonomous orchestration:"
echo " - Dispatches AI workers to implement tasks from GitHub issues"
@@ -696,27 +705,95 @@ main() {
echo " - Circuit breaker pauses dispatch on consecutive failures"
echo ""
read -r -p "Enable supervisor pulse? [Y/n]: " enable_pulse
- if [[ "$enable_pulse" =~ ^[Yy]?$ || -z "$enable_pulse" ]]; then
- _install_pulse=true
- else
- print_info "Skipped. Enable later — see: runners.md 'Pulse Scheduler Setup'"
+ if [[ ! "$enable_pulse" =~ ^[Yy]?$ && -n "$enable_pulse" ]]; then
+ _do_install=false
+ print_info "Skipped. Enable later: ./setup.sh (re-run)"
fi
fi
- if [[ "$_install_pulse" == "true" ]]; then
+ if [[ "$_do_install" == "true" ]]; then
mkdir -p "$HOME/.aidevops/logs"
- local _pulse_cmd="$opencode_path run \"/pulse\" --dir $_aidevops_dir -m anthropic/claude-sonnet-4-6 --title \"Supervisor Pulse\""
- # Install as cron entry (works on both macOS and Linux)
- (
- crontab -l 2>/dev/null | grep -v 'aidevops: supervisor-pulse'
- echo "*/2 * * * * pgrep -f 'Supervisor Pulse' >/dev/null || $_pulse_cmd >> $HOME/.aidevops/logs/pulse.log 2>&1 # aidevops: supervisor-pulse"
- ) | crontab - 2>/dev/null || true
- if crontab -l 2>/dev/null | grep -qF "aidevops: supervisor-pulse"; then
- print_info "Supervisor pulse enabled (cron, every 2 min). Disable: crontab -e and remove the supervisor-pulse line"
+
+ if [[ "$(uname -s)" == "Darwin" ]]; then
+ # macOS: use launchd plist with wrapper
+ local pulse_plist="$HOME/Library/LaunchAgents/${pulse_label}.plist"
+
+ # Unload old plist if upgrading
+ if _launchd_has_agent "$pulse_label"; then
+ launchctl unload "$pulse_plist" 2>/dev/null || true
+ pkill -f 'Supervisor Pulse' 2>/dev/null || true
+ fi
+
+ # Also clean up old label if present
+ local old_plist="$HOME/Library/LaunchAgents/com.aidevops.supervisor-pulse.plist"
+ if [[ -f "$old_plist" ]]; then
+ launchctl unload "$old_plist" 2>/dev/null || true
+ rm -f "$old_plist"
+ fi
+
+ # Write the plist
+            cat >"$pulse_plist" <<PLIST
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>${pulse_label}</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>/bin/bash</string>
+        <string>${wrapper_script}</string>
+    </array>
+    <key>StartInterval</key>
+    <integer>120</integer>
+    <key>StandardOutPath</key>
+    <string>${HOME}/.aidevops/logs/pulse-wrapper.log</string>
+    <key>StandardErrorPath</key>
+    <string>${HOME}/.aidevops/logs/pulse-wrapper.log</string>
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>PATH</key>
+        <string>${PATH}</string>
+        <key>HOME</key>
+        <string>${HOME}</string>
+        <key>OPENCODE_BIN</key>
+        <string>${opencode_bin}</string>
+        <key>PULSE_DIR</key>
+        <string>${_aidevops_dir}</string>
+        <key>PULSE_TIMEOUT</key>
+        <string>600</string>
+        <key>PULSE_STALE_THRESHOLD</key>
+        <string>900</string>
+    </dict>
+    <key>RunAtLoad</key>
+    <false/>
+    <key>KeepAlive</key>
+    <false/>
+</dict>
+</plist>
+PLIST
+
+ if launchctl load "$pulse_plist" 2>/dev/null; then
+ print_info "Supervisor pulse enabled (launchd, every 2 min)"
+ else
+ print_warning "Failed to load supervisor pulse LaunchAgent"
+ fi
else
- print_warning "Failed to install supervisor pulse cron entry. See runners.md for manual setup."
+ # Linux: use cron entry with wrapper
+ # Remove old-style cron entries (direct opencode invocation)
+ (
+ crontab -l 2>/dev/null | grep -v 'aidevops: supervisor-pulse'
+ echo "*/2 * * * * OPENCODE_BIN=${opencode_bin} PULSE_DIR=${_aidevops_dir} /bin/bash ${wrapper_script} >> $HOME/.aidevops/logs/pulse-wrapper.log 2>&1 # aidevops: supervisor-pulse"
+ ) | crontab - 2>/dev/null || true
+ if crontab -l 2>/dev/null | grep -qF "aidevops: supervisor-pulse"; then
+ print_info "Supervisor pulse enabled (cron, every 2 min). Disable: crontab -e and remove the supervisor-pulse line"
+ else
+ print_warning "Failed to install supervisor pulse cron entry. See runners.md for manual setup."
+ fi
fi
fi
+ elif [[ ! -f "$wrapper_script" ]]; then
+ print_warning "pulse-wrapper.sh not found — supervisor pulse not installed"
fi
fi