diff --git a/.claude/skills/archon/SKILL.md b/.claude/skills/archon/SKILL.md
index 7f126c9bac..9a9a2f7c0b 100644
--- a/.claude/skills/archon/SKILL.md
+++ b/.claude/skills/archon/SKILL.md
@@ -42,12 +42,54 @@ Determine the user's intent and dispatch to the appropriate guide:
 | **Variable substitution reference** | Read `references/variables.md` |
 | **CLI command reference** | Read `references/cli-commands.md` |
 | **Run an interactive workflow** | Read `references/interactive-workflows.md` — transparent relay protocol |
+| **Workflow good practices / anti-patterns** | Read `references/good-practices.md` — read before designing a non-trivial workflow |
+| **Troubleshoot a failing / stuck workflow** | Read `references/troubleshooting.md` — log locations, common failure modes |
 | **Run a workflow (default)** | Continue with "Running Workflows" below |
 
 If the intent is ambiguous, ask the user to clarify.
 
 ---
 
+## Richer Context: [archon.diy](https://archon.diy)
+
+The references in this skill are a distilled subset. The full, canonical docs live at **[archon.diy](https://archon.diy)** (Starlight site from `packages/docs-web/`). If the skill's reference pages don't cover what you need — an edge case, a worked example, a diagram, a deeper section on a feature — fetch the matching page from archon.diy.
+
+### When to reach for the live docs
+
+- You need an end-to-end example that's longer than what the skill shows (e.g. full patterns for hooks, MCP config, sandbox schema, approval flows)
+- You're explaining a concept to the user and want the most readable framing (the `book/` series is written as a tutorial, not a reference)
+- You hit a feature the skill only mentions in passing (e.g. `agents:` inline sub-agents, advanced Codex options, the full SyncHookJSONOutput schema)
+- The user asks "where is this documented?" — point them at the archon.diy URL, not a skill file path
+
+### URL map
+
+| Topic | URL |
+|-------|-----|
+| Landing + install | [archon.diy](https://archon.diy) |
+| Getting started (installation, quick start, concepts) | [archon.diy/getting-started/](https://archon.diy/getting-started/overview/) |
+| The book (tutorial-style walkthrough) | [archon.diy/book/](https://archon.diy/book/) |
+| Workflow authoring guide | [archon.diy/guides/authoring-workflows/](https://archon.diy/guides/authoring-workflows/) |
+| Command authoring guide | [archon.diy/guides/authoring-commands/](https://archon.diy/guides/authoring-commands/) |
+| Node type guides | [archon.diy/guides/loop-nodes/](https://archon.diy/guides/loop-nodes/), [/approval-nodes/](https://archon.diy/guides/approval-nodes/), [/script-nodes/](https://archon.diy/guides/script-nodes/) |
+| Per-node features (Claude only) | [/hooks/](https://archon.diy/guides/hooks/), [/mcp-servers/](https://archon.diy/guides/mcp-servers/), [/skills/](https://archon.diy/guides/skills/) |
+| Global workflows/commands/scripts | [archon.diy/guides/global-workflows/](https://archon.diy/guides/global-workflows/) |
+| Variables reference | [archon.diy/reference/variables/](https://archon.diy/reference/variables/) |
+| CLI reference | [archon.diy/reference/cli/](https://archon.diy/reference/cli/) |
+| Security model (env, sandbox, target-repo `.env` stripping) | [archon.diy/reference/security/](https://archon.diy/reference/security/) |
+| Architecture | [archon.diy/reference/architecture/](https://archon.diy/reference/architecture/) |
+| Configuration (`.archon/config.yaml` full schema) | [archon.diy/reference/configuration/](https://archon.diy/reference/configuration/) |
+| Troubleshooting | [archon.diy/reference/troubleshooting/](https://archon.diy/reference/troubleshooting/) |
+| Adapter setup (Slack/Telegram/GitHub/Web/Discord/Gitea/GitLab) | [archon.diy/adapters/](https://archon.diy/adapters/) |
+| Deployment (Docker, cloud, Windows) | [archon.diy/deployment/](https://archon.diy/deployment/) |
+
+URL shape is `archon.diy/<section>/<page>/` — the paths mirror the filenames under `packages/docs-web/src/content/docs/`.
+
+### Precedence
+
+This skill's reference pages are the primary source for routine workflow authoring, CLI use, and setup. Reach for archon.diy when the skill is incomplete for your case — don't go to the live docs first by default (skill refs load into context faster and are tuned for agents).
+
+---
+
 ## Running Workflows
 
 ### Core Command
@@ -204,6 +246,29 @@ Each node has exactly ONE of: `command`, `prompt`, `bash`, `script`, `loop`, `ap
     until_bash: "bun run test"    # Optional: exit 0 = done
 ```
 
+**Approval node** — pauses the workflow for human review. Requires `interactive: true` at the workflow level for Web UI delivery:
+```yaml
+interactive: true   # workflow level — required for web UI
+
+nodes:
+  - id: review-gate
+    approval:
+      message: "Review the plan above before proceeding."
+      capture_response: true      # Optional: user's comment → $review-gate.output
+      on_reject:                  # Optional: AI rework on rejection instead of cancel
+        prompt: "Revise based on feedback: $REJECTION_REASON"
+        max_attempts: 3           # Range 1-10, default 3
+    depends_on: [plan]
+```
+
+**Cancel node** — terminates the workflow with a reason. Typically gated with `when:`:
+```yaml
+- id: stop-if-unsafe
+  cancel: "Refusing to proceed: input flagged UNSAFE."
+  depends_on: [classify]
+  when: "$classify.output != 'SAFE'"
+```
+
 For the full authoring guide with all fields, conditions, trigger rules, and patterns: Read `references/workflow-dag.md`
 
 ### Creating a Command File
diff --git a/.claude/skills/archon/references/authoring-commands.md b/.claude/skills/archon/references/authoring-commands.md
index 0b1240da6b..603dd3e4a3 100644
--- a/.claude/skills/archon/references/authoring-commands.md
+++ b/.claude/skills/archon/references/authoring-commands.md
@@ -4,14 +4,29 @@ Commands are plain Markdown files containing AI prompt templates. They are the a
 
 ## File Location
 
+Commands are discovered from three scopes, highest-precedence first:
+
 ```
-.archon/commands/
-├── my-command.md           # Custom command
-├── review-code.md          # Another custom command
-└── defaults/               # Optional: override bundled defaults
-    └── archon-assist.md    # Overrides the bundled archon-assist
+<repoRoot>/.archon/commands/     # 1. Repo-scoped (wins)
+├── my-command.md                #    Custom command for this repo
+├── archon-assist.md             #    Overrides the bundled archon-assist
+└── triage/                      #    Subfolders allowed, 1 level deep
+    └── review.md                #    Resolves as 'review', not 'triage/review'
+
+~/.archon/commands/              # 2. Home-scoped (user-level, shared across all repos)
+├── review-checklist.md          #    Personal helper available in every repo
+└── pr-style-guide.md
+
+<bundled defaults>                # 3. Shipped with Archon (archon-assist, etc.)
 ```
 
+**Resolution rules:**
+
+- Filename-without-extension is the command name (e.g. `my-command.md` → `my-command`).
+- 1-level subfolders are supported for grouping; resolution is still by filename (`triage/review.md` → `review`).
+- Repo scope overrides home scope overrides bundled, by name.
+- Duplicate basenames **within a scope** (e.g. two different `review.md` files in `triage/` and `security/`) are a user error — keep names unique within each scope.
+
 Commands are referenced by name (without `.md`) in workflow YAML files.
 
 ## File Format
@@ -78,11 +93,14 @@ Command names must:
 ## Discovery and Priority
 
 When a workflow references `command: my-command`, Archon searches in this order:
-1. `.archon/commands/my-command.md` (repo custom)
-2. `.archon/commands/defaults/my-command.md` (repo default overrides)
+
+1. `<repoRoot>/.archon/commands/my-command.md` (repo scope)
+2. `~/.archon/commands/my-command.md` (home scope — shared across every repo on the machine)
 3. Bundled defaults (shipped with Archon)
 
-First match wins. To override a bundled command, create a file with the same name in your repo.
+First match wins. To override a bundled command, drop a file with the same name at either scope. To override a home-scoped command for a specific repo, drop a file with the same name in that repo's `.archon/commands/`.
+
+> **Web UI note**: Home-scoped commands appear in the workflow builder's node palette under a dedicated "Global (~/.archon/commands/)" section, distinct from project and bundled entries.
 
 ## Referencing Commands from Workflows
 
diff --git a/.claude/skills/archon/references/cli-commands.md b/.claude/skills/archon/references/cli-commands.md
index 157eacb713..0cc1a0ee06 100644
--- a/.claude/skills/archon/references/cli-commands.md
+++ b/.claude/skills/archon/references/cli-commands.md
@@ -32,7 +32,7 @@ archon workflow run archon-fix-github-issue --resume
 | `--branch <name>` / `-b` | Branch name for worktree. Reuses existing worktree if healthy |
 | `--from <name>` / `--from-branch <name>` | Start-point branch for new worktree (default: repo default branch) |
 | `--no-worktree` | Skip isolation — run in the live checkout |
-| `--resume` | Resume the last failed run of this workflow (skips completed steps/nodes) |
+| `--resume` | Resume the last failed run of this workflow at this cwd (skips completed nodes) |
 | `--cwd <path>` | Working directory override |
 
 **Flag conflicts** (errors):
@@ -42,6 +42,87 @@ archon workflow run archon-fix-github-issue --resume
 
 **Default behavior** (no flags): Auto-creates a worktree with branch name `{workflow-name}-{timestamp}`.
 
+**Auto-resume without `--resume`**: If a prior invocation of the same workflow at the same cwd failed, the next invocation automatically skips completed nodes. `--resume` is only needed when you want to force resume a specific failed run or to reuse the worktree from that run.
+
+### `archon workflow status`
+
+Show the currently running workflow (if any) with its run ID, state, and last activity.
+
+```bash
+archon workflow status
+archon workflow status --json       # Machine-readable output
+```
+
+### `archon workflow approve <run-id> [comment]`
+
+Approve a paused approval-node workflow. Auto-resumes the workflow.
+
+```bash
+archon workflow approve abc123
+archon workflow approve abc123 --comment "Plan looks good"
+archon workflow approve abc123 "Plan looks good"   # positional form
+```
+
+For interactive loop nodes, the comment becomes `$LOOP_USER_INPUT` on the next iteration. For approval nodes with `capture_response: true`, the comment becomes `$<gate-id>.output` for downstream nodes.
+
+### `archon workflow reject <run-id> [reason]`
+
+Reject a paused approval gate. Without `on_reject` on the node, cancels the workflow. With `on_reject`, runs the rework prompt with `$REJECTION_REASON` substituted and re-pauses.
+
+```bash
+archon workflow reject abc123
+archon workflow reject abc123 --reason "Plan misses test coverage"
+archon workflow reject abc123 "Plan misses test coverage"
+```
+
+### `archon workflow abandon <run-id>`
+
+Mark a non-terminal workflow run as cancelled. Use when a `running` row is stuck after a server crash or when you want to discard a paused run without rejecting. This does NOT kill an in-flight subprocess — it only transitions the DB row.
+
+```bash
+archon workflow abandon abc123
+```
+
+> **There is no `archon workflow cancel` CLI subcommand.** To actively cancel a running workflow (terminate its subprocess), use the chat slash command `/workflow cancel <run-id>` on the platform that started it (Web UI, Slack, Telegram, etc.), or the Cancel button on the Web UI dashboard. The CLI only offers `abandon`, which is the right tool for orphan cleanup but does not interrupt a live subprocess.
+
+### `archon workflow resume <run-id> [message]`
+
+Explicitly re-run a failed run. Most workflows auto-resume without this — use it when you want to force a specific run ID.
+
+```bash
+archon workflow resume abc123
+archon workflow resume abc123 "continue with the plan"
+```
+
+### `archon workflow cleanup [days]`
+
+**Deletes** old terminal workflow runs (`completed`/`failed`/`cancelled`) from the database for disk hygiene. Does NOT transition `running` rows — use `abandon`/`cancel` for those.
+
+```bash
+archon workflow cleanup             # Default: 7 days
+archon workflow cleanup 30          # Custom: 30 days
+```
+
+### `archon workflow event emit --run-id <uuid> --type <event-type> [--data <json>]`
+
+Emit a workflow event to a running workflow. Used inside loop prompts to signal state (e.g. "checkpoint written") for observability. Rarely invoked from the shell directly.
+
+```bash
+archon workflow event emit --run-id abc123 --type checkpoint --data '{"step":"plan"}'
+```
+
+### `archon continue <branch> [flags] [message]`
+
+Continue work on a branch with prior context. Defaults to `archon-assist`; use `--workflow` to pick a different workflow. Useful for iterative sessions on the same worktree without typing the full `workflow run` incantation.
+
+```bash
+archon continue feat/auth "Add password reset"
+archon continue feat/auth --workflow archon-feature-development "Continue from step 3"
+archon continue feat/auth --no-context "Start fresh without loading prior artifacts"
+```
+
+Flags: `--workflow <name>`, `--no-context`.
+
 ## Isolation Commands
 
 ### `archon isolation list`
@@ -59,11 +140,20 @@ Outputs: branch name, path, workflow type, platform, last activity age. Ghost en
 Remove stale worktree environments.
 
 ```bash
-archon isolation cleanup          # Default: 7 days
-archon isolation cleanup 14       # Custom: 14 days
-archon isolation cleanup --merged # Remove branches merged into main (+ remote branches)
+archon isolation cleanup                             # Default: 7 days
+archon isolation cleanup 14                          # Custom: 14 days
+archon isolation cleanup --merged                    # Also remove worktrees whose branches merged into main (deletes remote branches too)
+archon isolation cleanup --merged --include-closed   # Also remove worktrees whose PRs were closed without merging
 ```
 
+**Flags:**
+
+| Flag | Description |
+|------|-------------|
+| `[days]` | Positional — age threshold in days. Environments untouched for longer than this are removed. Default: 7 |
+| `--merged` | Union of three signals — ancestry (`git branch --merged`), patch equivalence (`git cherry`), and PR state (`gh`) — safely catches squash-merges |
+| `--include-closed` | With `--merged`, also remove worktrees whose PRs were closed (abandoned, not merged) |
+
 ## Validate Commands
 
 ### `archon validate workflows [name]`
diff --git a/.claude/skills/archon/references/good-practices.md b/.claude/skills/archon/references/good-practices.md
new file mode 100644
index 0000000000..e731a2583d
--- /dev/null
+++ b/.claude/skills/archon/references/good-practices.md
@@ -0,0 +1,241 @@
+# Workflow Good Practices and Anti-Patterns
+
+Guidance for authoring workflows that survive first contact with a real codebase. Written for an agent or human writing their first non-trivial workflow.
+
+## Good Practices
+
+### 1. Use deterministic nodes for deterministic work
+
+AI nodes are expensive, non-reproducible, and can hallucinate. Use `bash:` or `script:` for anything that has a right answer a computer can produce.
+
+- **Run tests** with `bash: "bun run test"`, not `prompt: "run the tests and tell me if they passed"`.
+- **Parse JSON** with `script:` (bun/uv), not a `prompt:` that re-derives structure from free text.
+- **Read files with known paths** via `bash: "cat path/to/file"` or `Read` in an AI node where the agent actually needs to reason about the content.
+- **Git state checks** (current branch, uncommitted changes, merge-base) → `bash:`.
+
+### 2. Use `output_format` for every node whose output downstream `when:` reads
+
+`when:` conditions do best-effort JSON parsing on `$nodeId.output` for `.field` access. If the upstream node doesn't enforce a shape, you're pattern-matching free-form AI text — fragile.
+
+```yaml
+# GOOD
+- id: classify
+  prompt: "Classify as BUG or FEATURE"
+  output_format:                          # enforces the JSON shape
+    type: object
+    properties:
+      type: { type: string, enum: [BUG, FEATURE] }
+    required: [type]
+
+- id: investigate
+  command: investigate-bug
+  depends_on: [classify]
+  when: "$classify.output.type == 'BUG'"  # safe field access
+
+# BAD
+- id: classify
+  prompt: "Is this a bug or a feature?"
+  # no output_format; AI might reply "it looks like a bug", "BUG", or "This is a bug.\n\n..."
+
+- id: investigate
+  command: investigate-bug
+  depends_on: [classify]
+  when: "$classify.output == 'BUG'"       # fragile string match
+```
+
+### 3. `trigger_rule: none_failed_min_one_success` after conditional branches
+
+After `when:`-gated branches, the downstream merge node will see one or more **skipped** dependencies. Skipped ≠ success. Default `all_success` fails.
+
+```yaml
+- id: investigate
+  command: investigate-bug
+  depends_on: [classify]
+  when: "$classify.output.type == 'BUG'"
+
+- id: plan
+  command: plan-feature
+  depends_on: [classify]
+  when: "$classify.output.type == 'FEATURE'"
+
+- id: implement
+  command: implement
+  depends_on: [investigate, plan]
+  trigger_rule: none_failed_min_one_success   # CORRECT — exactly one ran
+  # trigger_rule: all_success               ← would fail here (one dep skipped)
+```
+
+Use `one_success` when any dep succeeding is enough; `none_failed_min_one_success` when no dep should have failed AND at least one must have succeeded; `all_done` for "run cleanup regardless" patterns with `cancel:` or notification nodes.
+
+### 4. `context: fresh` requires artifacts for state passing
+
+A node with `context: fresh` starts with no memory of prior nodes in the same workflow. The only way state moves is via files. Default is `fresh` for parallel layers and `shared` for sequential — explicit `context: fresh` is common when you want cost isolation.
+
+```yaml
+- id: investigate
+  command: investigate-bug
+  # Investigator WRITES to $ARTIFACTS_DIR/investigation.md
+
+- id: implement
+  command: implement-fix
+  depends_on: [investigate]
+  context: fresh
+  # Implementer MUST read $ARTIFACTS_DIR/investigation.md — it has no memory
+  # of what the investigator found.
+```
+
+Command files should lead with "read artifacts from `$ARTIFACTS_DIR/...`" when they're downstream of a fresh node. This is the single biggest quality lever on multi-node workflows.
+
+### 5. Cheap models for glue, strong models for substance
+
+Classification, routing, formatting, and short summaries don't need Opus. Use `model: haiku` for these and reserve `sonnet`/`opus` for the nodes that actually produce code or long-form analysis. Combined with `allowed_tools: []` on pure-text nodes, this cuts cost dramatically.
+
+```yaml
+- id: classify
+  prompt: "Classify this issue"
+  model: haiku              # fast + cheap
+  allowed_tools: []         # no tool overhead
+  output_format: { ... }
+
+- id: implement
+  command: implement-fix
+  model: sonnet             # where the thinking happens
+```
+
+### 6. Write the workflow description for routing
+
+Archon's orchestrator routes user intent to workflows by description. Write descriptions that make routing obvious.
+
+- Start with the imperative action: "Fix a GitHub issue end-to-end", "Generate a Remotion video composition".
+- Mention triggers: "Use when the user asks to review a PR", "Use when there's a failing test run".
+- Mention what it does NOT do: "Does not create a PR — use `archon-plan-to-pr` for that".
+
+### 7. Validate before shipping
+
+Never declare a workflow "done" without:
+
+```bash
+archon validate workflows <name>     # YAML + DAG structure + resource refs
+```
+
+This checks: YAML syntax, node ID uniqueness, no cycles, all `depends_on` exist, all `$nodeId.output` refs point to known nodes, all `command:` files exist, all `mcp:` configs parse, all `skills:` directories exist, provider/model compatibility, named script existence, runtime availability. Fix everything it reports before first run.
+
+For brand-new workflows, also:
+1. Run once against a trivial input (`archon workflow run my-workflow --branch test/sanity "hello"`)
+2. Check the run log at `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl`
+3. Check artifacts at `~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/`
+
+See `references/troubleshooting.md` for how to read those.
+
+### 8. Design the artifact chain before writing command files
+
+In a multi-node workflow, each node's artifact IS the specification for the next node. Before writing any command body, map out:
+
+| Node | Reads | Writes |
+|------|-------|--------|
+| `investigate-issue` | GitHub issue via `gh` | `$ARTIFACTS_DIR/issues/issue-{n}.md` |
+| `implement-issue` | Artifact from `investigate-issue` | Code files, tests |
+| `create-pr` | Git diff | GitHub PR, `$ARTIFACTS_DIR/pr-body.md` |
+
+If a downstream agent can't execute from just its artifact, the artifact is incomplete. This is the single most common failure mode in multi-node workflows.
+
+### 9. Keep workflows reversible
+
+Use `worktree.enabled: true` at the workflow level for anything that modifies the codebase. The CLI `--no-worktree` flag will hard-error, forcing users into isolation. The cost is a one-time cp of the worktree; the benefit is never having a failed workflow corrupt a live checkout.
+
+For read-only workflows (triage, reporting, code analysis), pin `worktree.enabled: false` instead — saves the worktree setup cost.
+
+---
+
+## Anti-Patterns
+
+### ❌ Asking AI to run deterministic checks
+
+```yaml
+# BAD
+- id: test
+  prompt: "Run bun run test and tell me if it passed"
+
+# GOOD
+- id: test
+  bash: "bun run test 2>&1"
+
+- id: react-to-tests
+  prompt: "Fix any failures: $test.output"
+  depends_on: [test]
+  trigger_rule: all_done            # run even if tests failed
+```
+
+### ❌ Pattern-matching free-form AI output in `when:`
+
+```yaml
+# BAD — brittle
+- id: decide
+  prompt: "Should we proceed? Answer yes or no."
+- id: do-thing
+  depends_on: [decide]
+  when: "$decide.output == 'yes'"    # AI says "Yes!" or "Yes, because..." — no match
+
+# GOOD
+- id: decide
+  prompt: "Should we proceed?"
+  output_format:
+    type: object
+    properties: { proceed: { type: boolean } }
+    required: [proceed]
+- id: do-thing
+  depends_on: [decide]
+  when: "$decide.output.proceed == 'true'"
+```
+
+### ❌ Commands that assume prior-node memory in a `context: fresh` chain
+
+```markdown
+<!-- BAD — implement.md -->
+Fix the bug we discussed in the investigation phase.
+
+<!-- GOOD — implement.md -->
+Read the investigation at `$ARTIFACTS_DIR/issues/issue-{n}.md`.
+Extract the root cause, affected files, and implementation plan.
+Implement the changes exactly as specified in the plan.
+```
+
+### ❌ Long flat layers of AI nodes
+
+Ten sibling `prompt:` nodes in one layer all depending on one upstream is a $N/run cost bomb and a latency trap. If the work is parallel and similar, use the `agents:` inline sub-agent map-reduce pattern with a cheap model per item and a single stronger reducer. See `references/dag-advanced.md` and the [Inline sub-agents section on archon.diy](https://archon.diy/guides/authoring-workflows/#inline-sub-agents) for a worked example.
+
+### ❌ Hardcoding secrets in YAML or MCP configs
+
+Use `$ENV_VAR` expansion in MCP configs and the `env:` block in `.archon/config.yaml` (or Web UI Settings → Projects → Env Vars). See `references/repo-init.md` §Per-Project Env Injection.
+
+### ❌ `retry` on a loop node
+
+Loop nodes manage their own iteration via `max_iterations`. Setting `retry:` on a loop is a **hard parse error** — the workflow fails to load. If a loop iteration is flaky, handle it inside the loop prompt (the AI can retry tool calls) or use `until_bash` to gate completion on a deterministic check.
+
+### ❌ Tiny `max_iterations` on open-ended loops
+
+A loop with `max_iterations: 3` that's supposed to implement N stories from a PRD will silently stop after 3 iterations and leave the work half-done. Think about the worst case — multi-story PRDs need 10–20, fix-iterate cycles need 5–8, refinement loops need 3–5.
+
+### ❌ Missing `interactive: true` at workflow level for approval/loop gates on web
+
+Web UI dispatches non-interactive workflows to a background worker that cannot deliver chat messages. Approval-gate messages and loop `gate_message` prompts will never reach the user. If the workflow has `approval:` nodes OR `loop.interactive: true`, set workflow-level `interactive: true`.
+
+### ❌ Tool-restricted nodes without the MCP wildcard
+
+```yaml
+# BAD — no tools available, including MCP
+- id: analyze
+  prompt: "Use the Postgres MCP to query users"
+  mcp: .archon/mcp/postgres.json
+  allowed_tools: []          # OOPS — disables EVERYTHING, including MCP tools
+
+# FIXED — Archon auto-adds mcp__<server>__* wildcards when mcp: is set,
+# so this actually works out of the box. The anti-pattern is forgetting
+# and manually adding Read/Write/Bash/etc. when you only want MCP.
+- id: analyze
+  prompt: "Use Postgres MCP to query users"
+  mcp: .archon/mcp/postgres.json
+  allowed_tools: []          # correct — MCP tools auto-attached
+```
+
+Caveat: this only helps Claude. Codex gets MCP config from `~/.codex/config.toml` globally, not per-node.
diff --git a/.claude/skills/archon/references/interactive-workflows.md b/.claude/skills/archon/references/interactive-workflows.md
index 243cfdb7b0..856d50afd1 100644
--- a/.claude/skills/archon/references/interactive-workflows.md
+++ b/.claude/skills/archon/references/interactive-workflows.md
@@ -103,4 +103,4 @@ archon workflow reject <run-id> "reason for rejection"
 
 - **Workflow shows `running` for a long time**: The AI is doing research/implementation. Be patient — check again in a few minutes.
 - **Log file not found**: The log is at `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl`
-- **User wants to cancel**: Run `archon workflow reject <run-id>` or `archon workflow cancel <run-id>`
+- **User wants to cancel**: Run `archon workflow reject <run-id>` to stop at an approval gate, or `archon workflow abandon <run-id>` to mark the run cancelled without killing any subprocess. To actively terminate a still-live subprocess, use the chat slash command `/workflow cancel <run-id>` on the platform that started it — there is no `archon workflow cancel` CLI subcommand
diff --git a/.claude/skills/archon/references/repo-init.md b/.claude/skills/archon/references/repo-init.md
index 66be6375f5..e44907fd2e 100644
--- a/.claude/skills/archon/references/repo-init.md
+++ b/.claude/skills/archon/references/repo-init.md
@@ -10,14 +10,27 @@ Create the following in your repository root:
 .archon/
 ├── commands/         # Custom command files (.md)
 ├── workflows/        # Workflow definitions (.yaml)
+├── scripts/          # Named scripts for script: nodes (.ts/.js for bun, .py for uv) — optional
 ├── mcp/              # MCP server config files (.json) — optional
-└── config.yaml       # Repo-specific configuration — optional
+├── state/            # Cross-run workflow state — gitignored, never committed
+├── config.yaml       # Repo-specific configuration — optional
+└── .env              # Repo-scoped Archon env (optional; do NOT commit)
 ```
 
 ```bash
-mkdir -p .archon/commands .archon/workflows
+mkdir -p .archon/commands .archon/workflows .archon/scripts
 ```
 
+**What each directory is for:**
+
+- `commands/` — Reusable prompt templates used by `command:` workflow nodes. Committed to git.
+- `workflows/` — YAML workflow definitions. Committed to git.
+- `scripts/` — Named TypeScript/JavaScript (bun) or Python (uv) scripts referenced by `script:` nodes. Extension determines runtime: `.ts`/`.js` → bun, `.py` → uv. Committed to git.
+- `mcp/` — MCP server JSON configs. Usually checked in with `$ENV_VAR` references; avoid hardcoding secrets. Some teams gitignore this and rely entirely on env expansion.
+- `state/` — Workflow-written cross-run state (e.g. the `repo-triage` dedup log). **Always gitignore** — these are runtime artifacts, not source.
+- `config.yaml` — Repo-specific defaults (assistant, worktree settings, etc.). Committed to git.
+- `.env` — Repo-scoped Archon env (loaded with `override: true` at boot). **Do NOT commit.** This is different from the target repo's top-level `.env` — that file belongs to the target project, and Archon strips its auto-loaded keys from subprocess env before spawning AI to prevent leakage. See **Three-Path Env Model** below.
+
 ## Minimal config.yaml
 
 Create `.archon/config.yaml` only if you need to override defaults:
@@ -52,11 +65,59 @@ Archon ships with built-in commands and workflows (like `archon-assist`, `archon
 Add to your `.gitignore`:
 
 ```gitignore
-# Archon runtime artifacts (never commit)
-.archon/mcp/          # May contain env var references
+# Archon runtime artifacts — NEVER commit
+.archon/state/        # Cross-run workflow state, runtime-only
+.archon/.env          # Repo-scoped Archon env (secrets)
+
+# Optional — gitignore if your MCP configs hardcode secrets
+.archon/mcp/
+```
+
+`.archon/commands/`, `.archon/workflows/`, and `.archon/scripts/` **should be committed** — they are part of your project's workflow definitions. `.archon/config.yaml` should be committed unless it contains secrets (use `.archon/.env` for those instead).
+
+## Three-Path Env Model
+
+Archon loads env from three distinct paths at boot, with different trust levels and precedence:
+
+| Path | Scope | Trust | Loaded? |
+|------|-------|-------|---------|
+| `~/.archon/.env` | User (home) | Trusted — user owns it | Yes, with `override: true` |
+| `<cwd>/.archon/.env` | Repo (per-project, Archon-owned) | Trusted — user owns it | Yes, with `override: true` (overrides home) |
+| `<cwd>/.env` | Target repo | **Untrusted** — belongs to the project being worked on | **Stripped from `process.env`** before subprocess spawn to prevent secret leakage (see [archon.diy/reference/security/](https://archon.diy/reference/security/#target-repo-env-isolation) for the full trust model) |
+
+Boot behavior emits observable log lines:
+
+```
+[archon] loaded N keys from ~/.archon/.env
+[archon] loaded M keys from /path/to/repo/.archon/.env
+[archon] stripped K keys from /path/to/repo (ANTHROPIC_API_KEY, OPENAI_API_KEY, ...)
 ```
 
-The `.archon/commands/` and `.archon/workflows/` directories should be committed — they are part of your project's workflow definitions.
+**Where should you put what?**
+
+- **API keys for Archon itself** (`ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, `DATABASE_URL`, `SLACK_BOT_TOKEN`, etc.) → `~/.archon/.env` (shared across all repos) or `<cwd>/.archon/.env` (per-repo override).
+- **Target-project env that a workflow needs** (`GH_TOKEN`, `DOTENV_PRIVATE_KEY`, etc.) → see [Per-Project Env Injection](#per-project-env-injection) below.
+- **Target-project env that Archon should NOT touch** → leave it in `<cwd>/.env` where the project already expects it. Archon strips it from subprocess env but doesn't delete the file.
+
+The `archon setup --scope home|project [--force]` wizard writes to the right file for you and produces a timestamped backup on every rewrite.
+
+## Per-Project Env Injection
+
+For env vars a workflow's `bash:` and `script:` subprocesses need (`GH_TOKEN` for `gh` calls, `DATABASE_URL` for a migration script, etc.), use one of the two **managed injection** surfaces — both inject into subprocess env at workflow execution time, after the target-repo `.env` strip:
+
+**Option 1: `.archon/config.yaml` `env:` block** (checked into git; values can be `$REF_NAME` expansions from Archon env):
+
+```yaml
+env:
+  GH_TOKEN: $GH_TOKEN             # expanded from ~/.archon/.env at runtime
+  BUILD_TARGET: production        # literal value
+```
+
+**Option 2: Web UI Settings → Projects → Env Vars** — per-codebase, stored in the Archon DB, values never returned over the API (only keys are listed). Use this for values that should NOT appear in git.
+
+Both surfaces inject into: Claude/Codex/Pi subprocess env, `bash:` node subprocess env, `script:` node subprocess env, and direct chat messages that run against the codebase. The worktree isolation layer propagates them as well.
+
+> **About keys in the target repo's `<cwd>/.env`**: Archon unconditionally strips the keys auto-loaded from `<cwd>/.env` out of `process.env` at boot (see the Three-Path Env Model above) and the Bun subprocess is invoked with `--no-env-file`, so those values do NOT reach AI / bash / script subprocesses. If a workflow needs a value that currently lives in the target repo's `.env`, surface it through one of the two managed injection options above — don't expect the target `.env` to leak through.
 
 ## Global Configuration
 
diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
new file mode 100644
index 0000000000..099cccd928
--- /dev/null
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -0,0 +1,162 @@
+# Troubleshooting Workflows
+
+Where to look when a workflow fails, hangs, or does the wrong thing.
+
+## Log Locations
+
+Workflow run logs are written as JSONL per run:
+
+```
+~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl
+```
+
+Each line is a structured event. The discriminator is the `type` field. Values (see `packages/workflows/src/logger.ts` for the canonical list):
+
+| `type` | Meaning |
+|--------|---------|
+| `workflow_start` / `workflow_complete` / `workflow_error` | Run lifecycle |
+| `node_start` / `node_complete` / `node_error` / `node_skipped` | Node lifecycle |
+| `assistant` | AI assistant message — has `content` field with the full AI output |
+| `tool` | SDK tool invocation — has `tool_name`, `tool_input`, `duration_ms`, and optionally `tokens` |
+| `validation` | Workflow-level validation event — has `check` and `result` (`pass` / `fail` / `warn` / `unknown`) |
+
+> **Loop iterations and per-attempt retry events are NOT in the JSONL file.** They go through the workflow event emitter (WebSocket / `workflow_events` DB table) under `loop_iteration_started` / `loop_iteration_completed` etc. To see them, query the DB or the Web UI dashboard — not the JSONL log.
+
+Find the run ID from `archon workflow status` (most recent run). Then:
+
+```bash
+# Last assistant message (what the AI said before failure)
+jq 'select(.type == "assistant") | .content' <log-file> | tail -1
+
+# All error events (node failures + workflow-level failures)
+jq 'select(.type == "node_error" or .type == "workflow_error")' <log-file>
+
+# Full event stream
+cat <log-file> | jq .
+```
+
+Adapter logs (Slack / Telegram / Web / GitHub) are emitted to stderr when `LOG_LEVEL=debug` is set on the server.
+
+## Artifact Locations
+
+```
+~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/
+```
+
+Inspect artifacts when a multi-node workflow produces wrong output. The failing node's upstream artifact is usually where the problem originated.
+
+```bash
+ls ~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/
+cat ~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/issues/issue-42.md
+```
+
+Artifacts are **external** to the repo on purpose — they don't pollute git.
+
+## Common Failure Modes
+
+### "No base branch could be resolved"
+
+A node references `$BASE_BRANCH` in its prompt, but neither git auto-detection nor `worktree.baseBranch` in `.archon/config.yaml` produced a branch.
+
+**Fix:**
+1. Set `worktree.baseBranch: main` (or `dev`, or whatever) in `.archon/config.yaml`.
+2. Or pass `--from <branch>` on `archon workflow run`.
+3. Or remove the `$BASE_BRANCH` reference if the node doesn't actually need it.
+
+### "Claude Code not found" / "Codex CLI binary not found"
+
+Compiled-binary builds of Archon no longer embed Claude Code / Codex — you install them separately and Archon resolves the binary via env var or config.
+
+**Fix (Claude):**
+- Install: `curl -fsSL https://claude.ai/install.sh | bash` (or `npm install -g @anthropic-ai/claude-code`)
+- Set `CLAUDE_BIN_PATH=/path/to/claude` in `~/.archon/.env`, OR
+- Set `assistants.claude.claudeBinaryPath: /absolute/path` in `.archon/config.yaml`
+- Autodetect covers `$HOME/.local/bin/claude` (native installer) — no config needed if you used that path
+
+**Fix (Codex):**
+- Install: `npm install -g @openai/codex` (or platform-specific instructions)
+- Set `CODEX_CLI_PATH=/path/to/codex` or `assistants.codex.codexBinaryPath` in config
+- Autodetect covers the standard npm / Homebrew locations per platform
+
+See [archon.diy/getting-started/installation/](https://archon.diy/getting-started/installation/) for full platform-specific install paths.
+
+### Workflow shows `running` for a long time but nothing happens
+
+Three possibilities:
+
+1. **The AI is actually working.** Check `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl` — if you see recent `tool` or `assistant` events in the tail, it's fine. Wait.
+2. **The server crashed and left an orphan row.** Server startup no longer auto-fails orphaned `running` rows (per the "No Autonomous Lifecycle Mutation" rule — `CLAUDE.md`). Transition it manually:
+   - Web UI: Dashboard → Abandon or Cancel button on the run card
+   - CLI: `archon workflow abandon <run-id>` — marks the DB row cancelled without killing any subprocess. Right tool for orphans since the subprocess is already gone
+   - Chat (Slack / Telegram / Web): `/workflow cancel <run-id>` — actively terminates the subprocess. Use for a still-live run that needs to be interrupted (there is no `archon workflow cancel` CLI subcommand)
+3. **A node is past its `idle_timeout`.** The default is 5 minutes. Override with per-node `idle_timeout: 600000` (10 min) for long-running nodes.
+
+### Workflow fails mid-way; how do I resume?
+
+Auto-resume is default — just re-invoke the same workflow at the same cwd:
+
+```bash
+archon workflow run my-workflow "original message"
+# → "Resuming workflow — skipping N already-completed node(s)"
+```
+
+Use `--resume` only when you want to force-reuse the same worktree from a specific failed run. Use `archon workflow resume <run-id>` to force a specific run ID.
+
+**Caveat:** AI session context from prior nodes is NOT restored on resume. If a `context: shared` node depended on in-session memory, re-running it will have fresh context. Artifact-based handoff survives; in-context memory does not.
+
+### Approval gate not appearing on web UI
+
+You set `interactive: true` on the approval node but the workflow still runs in the background and no chat message appears.
+
+**Fix:** Set `interactive: true` at the **workflow level** too. Node-level `interactive` is ignored on web without workflow-level `interactive`. See `references/workflow-dag.md` §Approval Nodes and §Interactive Loops.
+
+### `MCP server connection failed: <plugin>` noise in chat
+
+User-level Claude plugin MCPs (e.g. `telegram`, `notion`) inherited from `~/.claude/` fail to connect in the headless subprocess. This is normal — they're not configured for Archon's worktree context. Archon filters these to debug logs (`dag.mcp_plugin_connection_suppressed`) and surfaces only workflow-configured MCP failures.
+
+If you see a failure for an MCP you DID configure via `mcp:` in the workflow: check the config JSON path, the MCP server's `command`/`args`, and any referenced env vars.
+
+### Node output is empty / `$nodeId.output.field` resolves to empty string
+
+Common causes:
+
+1. Upstream node is an AI node without `output_format` — the output is free-form text, JSON parsing fails, field access returns empty.
+2. Upstream node was **skipped** (its `when:` evaluated false). Downstream `when:` with `==` comparisons against a specific value will fail-closed.
+3. Bash/script node printed to stderr, not stdout. Only stdout is captured.
+4. For script nodes, non-zero exit on a non-existent file / missing import silently drops the output. Check the run log for `node_error` entries.
+
+## Useful Diagnostic Commands
+
+```bash
+# All active runs as JSON (running / paused / recently finished, depending on retention)
+archon workflow status --json | jq '.runs[]'
+
+# Human-readable status of any active runs
+archon workflow status
+
+# Active worktrees and their last activity
+archon isolation list
+
+# Validate a specific workflow before running
+archon validate workflows my-workflow
+
+# Validate a specific command
+archon validate commands my-command
+
+# Dump the last 50 lines of a workflow's log
+tail -n 50 ~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl | jq .
+
+# Increase log verbosity (workflow run)
+archon workflow run my-workflow --verbose "..."
+
+# Increase server log verbosity
+LOG_LEVEL=debug bun run start
+```
+
+## Escalation: when nothing makes sense
+
+1. Run `archon version` and note the version.
+2. Run `archon validate workflows <name>` and capture the output.
+3. Grab the last ~50 lines of the run's JSONL log.
+4. Check the `CHANGELOG.md` for known issues / recent changes to the subsystem you're hitting.
+5. File an issue at https://github.com/coleam00/Archon/issues with version, validate output, log tail, and the YAML.
diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md
index 5132e0dab6..817d7e9db0 100644
--- a/.claude/skills/archon/references/workflow-dag.md
+++ b/.claude/skills/archon/references/workflow-dag.md
@@ -20,6 +20,88 @@ nodes:
     depends_on: [other-node]        # Node IDs that must complete first
 ```
 
+## Workflow-Level Fields
+
+Top-level YAML fields on a workflow object. Per-node overrides (same name under a node) win over workflow-level defaults.
+
+### Core
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `name` | string (required) | Workflow identifier (used in `archon workflow run <name>`) |
+| `description` | string (required) | Human-readable summary. Used for routing; see [Workflow Description Best Practices](https://archon.diy/guides/authoring-workflows/#workflow-description-best-practices) |
+| `provider` | string | AI provider (e.g. `claude`, `codex`, `pi`). Default: from `.archon/config.yaml` |
+| `model` | string | Model override. Claude: `sonnet` \| `opus` \| `haiku` \| `claude-*` \| `inherit`. Codex: any non-Claude model ID |
+| `interactive` | boolean | **Required for web UI** when the workflow has approval gates or `loop.interactive` nodes. Forces foreground execution so gate messages reach the user's chat. Default: `false` (background on web) |
+
+### Isolation
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `worktree.enabled` | boolean | Pin isolation regardless of caller. `false` = always live checkout (CLI `--branch`/`--from` hard-error). `true` = always worktree (CLI `--no-worktree` hard-errors). Omit = caller decides. Use `false` for read-only workflows (triage, reporting) |
+
+Other worktree config (`baseBranch`, `copyFiles`, `initSubmodules`, `path`) lives in `.archon/config.yaml`, not the workflow YAML — see `references/repo-init.md`.
+
+### Claude SDK Advanced Options
+
+These fields apply to Claude nodes workflow-wide; each can be overridden per-node. Codex nodes ignore them with a warning.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `effort` | `'low'` \| `'medium'` \| `'high'` \| `'max'` | Claude Agent SDK reasoning depth. Different from Codex `modelReasoningEffort` below |
+| `thinking` | string \| object | Extended thinking. String shorthand: `'adaptive'` \| `'enabled'` \| `'disabled'`. Object form: `{ type: 'enabled', budgetTokens: 8000 }` |
+| `fallbackModel` | string | Model to use if the primary model fails (e.g. `claude-haiku-4-5-20251001`) |
+| `betas` | string[] | SDK beta feature flags (non-empty array). Example: `['context-1m-2025-08-07']` for 1M-context Claude |
+| `sandbox` | object | OS-level filesystem/network restrictions. Nested `network` / `filesystem` sub-objects — see [archon.diy/guides/authoring-workflows/#claude-sdk-advanced-options](https://archon.diy/guides/authoring-workflows/#claude-sdk-advanced-options) for the full schema. Layers on top of worktree isolation |
+
+Per-node-only (NOT valid at workflow level): `maxBudgetUsd`, `systemPrompt`.
+
+### Codex-Specific Options
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `modelReasoningEffort` | `'minimal'` \| `'low'` \| `'medium'` \| `'high'` \| `'xhigh'` | Codex reasoning depth. Separate field from Claude's `effort` |
+| `webSearchMode` | `'disabled'` \| `'cached'` \| `'live'` | Codex web search behavior. Default: `disabled` |
+| `additionalDirectories` | string[] | Absolute paths Codex can read outside the codebase (shared libraries, docs repos) |
+
+### Complete workflow-level example
+
+```yaml
+name: careful-migration
+description: |
+  Plan a migration, get explicit approval, then implement under strict
+  sandbox and cost limits. Used by the ops team before destructive work.
+provider: claude
+model: sonnet
+interactive: true                   # required — this workflow has an approval gate
+
+worktree:
+  enabled: true                     # always isolate; reject --no-worktree
+
+effort: high
+thinking: adaptive
+fallbackModel: claude-haiku-4-5-20251001
+betas: ['context-1m-2025-08-07']
+sandbox:
+  enabled: true
+  network:
+    allowedDomains: ['api.github.com']
+    allowManagedDomainsOnly: true
+  filesystem:
+    denyWrite: ['/etc', '/usr']
+
+nodes:
+  - id: plan
+    command: plan-migration
+  - id: review
+    approval:
+      message: "Review the migration plan above."
+    depends_on: [plan]
+  - id: implement
+    command: implement-migration
+    depends_on: [review]
+```
+
 ## Node Types (Mutually Exclusive)
 
 Each node must have exactly ONE of these fields: `command`, `prompt`, `bash`, `script`, `loop`, `approval`, or `cancel`.
@@ -177,14 +259,53 @@ nodes:
 
 ## Conditions (`when:`)
 
+Gate whether a node runs based on upstream output. A condition that evaluates to `false` skips the node (fail-closed — skipped nodes propagate their skipped state to dependants).
+
+### Operators
+
+**String comparison** (literal string equality):
 ```yaml
-- id: investigate
-  command: investigate-bug
-  depends_on: [classify]
-  when: "$classify.output.issue_type == 'bug'"
+when: "$nodeId.output == 'VALUE'"
+when: "$nodeId.output != 'VALUE'"
+when: "$nodeId.output.field == 'VALUE'"       # JSON dot notation (requires output_format)
+```
+
+**Numeric comparison** (both sides auto-parsed as numbers; fail-closed if either side is not finite):
+```yaml
+when: "$score.output > '80'"
+when: "$score.output >= '0.9'"
+when: "$score.output < '100'"
+when: "$score.output <= '5'"
+when: "$score.output.confidence >= '0.9'"
 ```
 
-**Syntax**: `$nodeId.output OPERATOR 'value'` — operators: `==`, `!=` only. Values single-quoted. Invalid expressions skip the node (fail-closed).
+All six operators — `==`, `!=`, `<`, `>`, `<=`, `>=` — are supported. Values are single-quoted strings (even for numeric comparisons).
+
+### Compound Expressions
+
+Combine conditions with `&&` (AND) and `||` (OR). **`&&` binds tighter than `||`.** No parentheses supported — structure expressions with that precedence in mind.
+
+```yaml
+when: "$a.output == 'X' && $b.output != 'Y'"
+when: "$a.output == 'X' || $b.output == 'Y'"
+when: "$score.output > '80' && $flag.output == 'true'"
+
+# Precedence: (A && B) || C
+when: "$a.output == 'X' && $b.output == 'Y' || $c.output == 'Z'"
+```
+
+Short-circuit evaluation: `&&` stops at the first false, `||` stops at the first true.
+
+### Dot Notation (JSON Field Access)
+
+`$nodeId.output.field` parses the upstream output as JSON and extracts the named field. Returns empty string if parsing fails or the field is absent — which then fails-closed against any literal value. Requires the upstream node to have `output_format` set (for AI nodes) or to print valid JSON (for bash/script nodes).
+
+### Fail-Closed Rules
+
+- Invalid or unparseable expression → node skipped, warning logged
+- Numeric operator with a non-numeric side → node skipped
+- `$nodeId.output.field` on non-JSON output → field is empty → comparison fails
+- Referenced node did not run (skipped upstream) → substitution is empty → comparison fails
 
 ## Node Output Substitution
 
@@ -259,15 +380,53 @@ Loop nodes iterate an AI prompt until a completion condition is met. Use them fo
     max_iterations: 10         # Required. Integer >= 1. Fails if exceeded
     fresh_context: true        # Optional. Default: false
     until_bash: "..."          # Optional. Exit 0 = complete
+    interactive: true          # Optional. Pauses between iterations for user input
+    gate_message: "..."        # Required when interactive: true
 ```
 
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
-| `prompt` | string | Yes | Prompt template. Supports all variable substitution (`$ARGUMENTS`, `$nodeId.output`, etc.) |
+| `prompt` | string | Yes | Prompt template. Supports all variable substitution (`$ARGUMENTS`, `$nodeId.output`, `$LOOP_USER_INPUT`, etc.) |
 | `until` | string | Yes | Completion signal to detect in AI output |
 | `max_iterations` | number | Yes | Hard limit. Node **fails** if exceeded |
 | `fresh_context` | boolean | No | Default `false`. `true` = fresh AI session each iteration |
-| `until_bash` | string | No | Shell script run after each iteration. Exit 0 = complete |
+| `until_bash` | string | No | Shell script run after each iteration. Exit 0 = complete. Variable substitution applies; `$nodeId.output` IS shell-quoted here |
+| `interactive` | boolean | No | Default `false`. `true` = pause after each non-completing iteration for user feedback via `/workflow approve <id> <text>` |
+| `gate_message` | string | **Required when `interactive: true`** | Message shown to the user at each pause. Validated at parse time — a loop with `interactive: true` and no `gate_message` fails to load |
+
+### Interactive Loops
+
+Interactive loops pause between iterations so a human can provide feedback that feeds the next iteration. Use them for guided writing/refinement (e.g. PRD co-authoring, iterative design).
+
+```yaml
+name: guided-refine
+description: Refine an output with human feedback between iterations
+interactive: true                # REQUIRED at the workflow level for web UI
+
+nodes:
+  - id: refine
+    loop:
+      prompt: |
+        Review the current draft and improve it based on this feedback:
+        $LOOP_USER_INPUT
+
+        When the output is satisfactory, output: <promise>DONE</promise>
+      until: DONE
+      max_iterations: 5
+      interactive: true          # node level — enables the pause
+      gate_message: |
+        Review the output above. Reply with feedback, or type DONE to finish.
+```
+
+The flow:
+1. Iteration N runs. AI produces output.
+2. If AI signalled completion (`<promise>DONE</promise>`) or `until_bash` exited 0, loop ends.
+3. Otherwise: `gate_message` is sent to the user, workflow pauses (status = `paused`).
+4. User runs `archon workflow approve <run-id> "<their feedback>"` (or replies naturally in chat platforms).
+5. Iteration N+1 runs with `$LOOP_USER_INPUT` substituted to the user's feedback — but **only on that first resumed iteration**. Subsequent iterations in the same resumed session see `$LOOP_USER_INPUT` as empty string.
+6. Repeat.
+
+**Workflow-level `interactive: true` is required** for the gate message to reach the user on the web UI (otherwise the workflow dispatches to a background worker that can't deliver chat messages). The loader emits a warning if a node has `interactive: true` without workflow-level `interactive: true`.
 
 ### Completion Detection
 
@@ -327,6 +486,148 @@ First iteration is always fresh regardless.
 
 ---
 
+## Approval Nodes
+
+Approval nodes **pause the workflow** until a human approves or rejects the gate. Use them to insert review steps between AI-driven nodes — for example, reviewing a generated plan before committing to expensive implementation work.
+
+### Configuration
+
+```yaml
+- id: review-gate
+  approval:
+    message: "Review the plan above before proceeding with implementation."
+    capture_response: false        # Optional. true = user's comment stored as $review-gate.output
+    on_reject:                     # Optional. AI rework on rejection instead of cancel
+      prompt: "Revise based on feedback: $REJECTION_REASON"
+      max_attempts: 3              # Range 1–10, default 3. After max, workflow is cancelled.
+  depends_on: [plan]
+```
+
+### Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `approval.message` | **Yes** | The message shown to the user when the workflow pauses |
+| `approval.capture_response` | No | `true` = user's approval comment stored as `$<node-id>.output` for downstream nodes. Default: `false` (downstream `$<node-id>.output` is empty string) |
+| `approval.on_reject.prompt` | No | Prompt run via AI when the user rejects. `$REJECTION_REASON` is substituted with the reject reason. After running, the workflow re-pauses at the same gate |
+| `approval.on_reject.max_attempts` | No | Max times the on_reject prompt runs before the workflow is cancelled. Range: 1–10. Default: 3 |
+
+### Web UI Requirement
+
+Approval gates delivered on the Web UI require `interactive: true` at the **workflow level** — otherwise the workflow dispatches to a background worker and the gate message never reaches the user's chat window.
+
+```yaml
+name: plan-approve-implement
+interactive: true   # REQUIRED for approval gates on web UI
+nodes:
+  - id: plan
+    command: plan-feature
+  - id: review-gate
+    approval:
+      message: "Approve the plan to proceed."
+    depends_on: [plan]
+  - id: implement
+    command: implement
+    depends_on: [review-gate]
+```
+
+### Approve and Reject Commands
+
+```bash
+# From the CLI
+archon workflow approve <run-id>
+archon workflow approve <run-id> --comment "looks good"
+archon workflow reject <run-id>
+archon workflow reject <run-id> --reason "plan needs more test coverage"
+
+# Cross-platform (Slack / Telegram / Web / GitHub chat)
+/workflow approve <run-id> <optional comment>
+/workflow reject <run-id> <optional reason>
+
+# Natural language (all platforms except CLI — auto-detects paused workflow)
+User: "Looks good, proceed"
+# → auto-approves. With capture_response: true, the message becomes $review-gate.output
+```
+
+### What Does NOT Work on Approval Nodes
+
+AI-specific fields (`model`, `provider`, `hooks`, `mcp`, `skills`, `output_format`, `allowed_tools`, `denied_tools`, `context`, `effort`, `thinking`, etc.) are accepted by the parser but emit a loader warning and are ignored — no AI runs during the pause. (Note: `on_reject.prompt` DOES run AI, using the workflow's default provider/model.)
+
+`retry`, `when`, `trigger_rule`, `depends_on`, `idle_timeout` all work.
+
+---
+
+## Cancel Nodes
+
+Cancel nodes **terminate the workflow run** with a reason string. Useful for guarded exits — a `cancel:` node with a `when:` condition stops the workflow cleanly when preconditions aren't met.
+
+### Configuration
+
+```yaml
+- id: gate-branch
+  cancel: "Refusing to run on main — this workflow modifies files."
+  when: "$check-branch.output == 'main'"
+  depends_on: [check-branch]
+```
+
+When a cancel node runs, Archon:
+- Marks the workflow run as `cancelled` (not `failed`)
+- Stops in-flight parallel nodes via the existing cancellation plumbing
+- Records the reason string in the run's metadata
+- Emits a `node_completed` event for the cancel node itself
+
+### Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `cancel` | **Yes** | Non-empty reason string shown to the user and recorded in metadata |
+
+Standard DAG fields (`id`, `depends_on`, `when`, `trigger_rule`, `idle_timeout`) all work. AI-specific fields emit a loader warning and are ignored — cancel nodes don't invoke AI.
+
+### When to use `cancel` vs failing a `bash:` check
+
+- **Use `cancel:`** when the precondition failure is **expected** (e.g., wrong branch, required file missing, feature flag disabled). The run shows as `cancelled`, which doesn't trigger the DAG auto-resume path.
+- **Use a `bash:` node that exits non-zero** when the check itself fails (e.g., network error, tool missing). The run shows as `failed`, which auto-resumes on the next invocation.
+
+### Typical Patterns
+
+**Gate on upstream classification:**
+```yaml
+- id: classify
+  prompt: "Is the input safe to proceed? Output 'SAFE' or 'UNSAFE'."
+  allowed_tools: []
+
+- id: stop-if-unsafe
+  cancel: "Refusing to proceed: input flagged UNSAFE by classifier."
+  depends_on: [classify]
+  when: "$classify.output != 'SAFE'"
+
+- id: do-work
+  command: the-work
+  depends_on: [classify]
+  when: "$classify.output == 'SAFE'"
+```
+
+**Stop before expensive step unless precondition met:**
+```yaml
+- id: check-budget
+  bash: |
+    spent=$(gh api /meta --jq '.rate.used // 0')
+    echo "$spent"
+
+- id: abort-if-over
+  cancel: "Aborting — GH API quota exhausted."
+  depends_on: [check-budget]
+  when: "$check-budget.output > '4500'"
+
+- id: run-api-heavy-work
+  command: heavy-work
+  depends_on: [check-budget]
+  when: "$check-budget.output <= '4500'"
+```
+
+---
+
 ## Validate Before Finishing
 
 Before declaring a workflow complete, validate it:
@@ -354,6 +655,9 @@ Use `--json` for machine-readable output. Use `archon validate commands <name>`
 - Script nodes require `runtime: bun` or `runtime: uv`
 - Named scripts must exist in `.archon/scripts/` or `~/.archon/scripts/` with extension matching declared runtime
 - `retry` on loop node = hard error
+- `approval.message` required and non-empty
+- `cancel` reason required and non-empty
+- Approval `on_reject.max_attempts` must be 1–10 if set
 - `steps:` format rejected (deprecated — use `nodes:` only)
 
 ## Complete Example
diff --git a/packages/docs-web/src/content/docs/book/dag-workflows.md b/packages/docs-web/src/content/docs/book/dag-workflows.md
index 93bf766872..558df2590f 100644
--- a/packages/docs-web/src/content/docs/book/dag-workflows.md
+++ b/packages/docs-web/src/content/docs/book/dag-workflows.md
@@ -230,23 +230,23 @@ The classify-and-route example uses `none_failed_min_one_success` on `implement`
 
 ## Node Types
 
-Archon supports seven node types:
+Archon supports seven node types. Exactly one mode field is required per node:
 
 | Type | Syntax | When to use |
 |------|--------|-------------|
 | **Command** | `command: my-command` | Load a command from `.archon/commands/my-command.md`. The standard choice. |
 | **Prompt** | `prompt: "inline instructions..."` | Quick, one-off instructions that don't need a reusable command file. |
 | **Bash** | `bash: "shell command"` | Run a shell script without AI. Stdout is captured as `$nodeId.output`. Deterministic operations only. |
-| **Script** | `script: "..." runtime: bun\|uv` | TypeScript (via bun) or Python (via uv) — deterministic typed transforms where bash would need fragile quoting. Stdout is captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/). |
+| **Script** | `script: "..." ` + `runtime: bun \| uv` | Run TypeScript/JavaScript (bun) or Python (uv) without AI. Inline code or named reference to `.archon/scripts/`. Stdout captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/). |
 | **Loop** | `loop: { prompt: "...", until: SIGNAL }` | Repeat an AI prompt until a completion signal appears in the output. See [Loop Nodes](/guides/loop-nodes/). |
-| **Approval** | `approval: { message: "..." }` | Pause the run for human review before continuing. See [Approval Nodes](/guides/approval-nodes/). |
-| **Cancel** | `cancel: "reason string"` | Terminate the run with a reason (useful as a `when:`-gated branch for safety checks). |
+| **Approval** | `approval: { message: "..." }` | Pause the workflow for a human approve/reject decision. See [Approval Nodes](/guides/approval-nodes/). |
+| **Cancel** | `cancel: "reason string"` | Terminate the workflow run (status: cancelled, not failed). Usually gated with `when:`. |
 
 **Command** is the most common. Use it for anything you'll reuse across workflows.
 
 **Prompt** is convenient for glue nodes — summarizing outputs, formatting data — where the logic is simple and workflow-specific.
 
-**Bash** is powerful for deterministic operations: running tests, checking git status, reading a file, fetching an API. The AI doesn't run the bash command; your shell does. The output becomes a variable for downstream nodes:
+**Bash** is powerful for deterministic shell operations: running tests, checking git status, reading a file, fetching an API. The AI doesn't run the bash command; your shell does. The output becomes a variable for downstream nodes:
 
 ```yaml
 - id: check-tests
@@ -258,6 +258,22 @@ Archon supports seven node types:
   prompt: "Test output: $check-tests.output\n\nFix any failures."
 ```
 
+**Script** is for deterministic work that needs a real programming language — parsing JSON, transforming data between AI nodes, calling typed HTTP clients. Use `runtime: bun` for TypeScript/JavaScript and `runtime: uv` for Python:
+
+```yaml
+- id: transform
+  script: |
+    const raw = process.env.UPSTREAM ?? '{}';
+    const items = JSON.parse(raw).items ?? [];
+    console.log(JSON.stringify({ count: items.length }));
+  runtime: bun
+
+- id: analyze
+  script: analyze-metrics        # Named script: .archon/scripts/analyze-metrics.py
+  runtime: uv
+  deps: ["pandas>=2.0"]          # uv-only; bun auto-installs imports
+```
+
 **Loop** is for iterative tasks where you don't know how many steps it will take. The AI runs until it emits a completion signal:
 
 ```yaml
@@ -272,6 +288,32 @@ Archon supports seven node types:
     fresh_context: true
 ```
 
+**Approval** pauses the workflow for human review. The downstream nodes don't run until the user approves in chat, CLI, or web UI:
+
+```yaml
+interactive: true                 # required at workflow level for web UI delivery
+
+nodes:
+  - id: plan
+    command: plan-feature
+  - id: review-gate
+    approval:
+      message: "Review the plan above."
+    depends_on: [plan]
+  - id: implement
+    command: implement
+    depends_on: [review-gate]
+```
+
+**Cancel** terminates the workflow with a reason string. Pair with `when:` for guarded exits — the run shows as `cancelled` rather than `failed`:
+
+```yaml
+- id: gate-branch
+  cancel: "Refusing to run on main — this workflow modifies files."
+  when: "$check-branch.output == 'main'"
+  depends_on: [check-branch]
+```
+
 ---
 
 ## Best Practices
diff --git a/packages/docs-web/src/content/docs/book/quick-reference.md b/packages/docs-web/src/content/docs/book/quick-reference.md
index 2c3123acdd..a0c34643c3 100644
--- a/packages/docs-web/src/content/docs/book/quick-reference.md
+++ b/packages/docs-web/src/content/docs/book/quick-reference.md
@@ -124,10 +124,10 @@ All nodes share these base fields:
 | `command` | One of | string | Name of a command file in `.archon/commands/` |
 | `prompt` | One of | string | Inline AI instructions |
 | `bash` | One of | string | Shell script (runs without AI; stdout captured as `$nodeId.output`) |
-| `script` | One of | string | TypeScript/JS (via bun) or Python (via uv); requires `runtime:` (`bun` or `uv`); optional `deps:` (uv only) and `timeout:` (ms). Stdout captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/) |
+| `script` | One of | string | TypeScript/JavaScript (bun) or Python (uv) — inline or named ref to `.archon/scripts/`. Requires `runtime`. See [Script Nodes](/guides/script-nodes/) |
 | `loop` | One of | object | Loop configuration (see Loop Options below) |
-| `approval` | One of | object | Human-review gate; pauses the run until approved or rejected. See [Approval Nodes](/guides/approval-nodes/) |
-| `cancel` | One of | string | Terminates the run with the given reason string |
+| `approval` | One of | object | Pause for human review; see [Approval Nodes](/guides/approval-nodes/) |
+| `cancel` | One of | string | Reason string; terminates the run with `cancelled` status (not `failed`). Usually gated with `when:` |
 | `depends_on` | No | string[] | Node IDs that must complete before this node runs |
 | `when` | No | string | Condition expression; node is skipped if false |
 | `trigger_rule` | No | string | Join semantics when multiple upstreams exist (see Trigger Rules) |
@@ -138,12 +138,30 @@ All nodes share these base fields:
 | `allowed_tools` | No | string[] | Restrict available tools to this list (Claude only) |
 | `denied_tools` | No | string[] | Remove specific tools from this node's context (Claude only) |
 | `idle_timeout` | No | number | Per-node idle timeout in milliseconds (default: 5 minutes) |
-| `retry` | No | object | Retry configuration for transient failures (see Retry Options) |
+| `retry` | No | object | Retry configuration for transient failures (see Retry Options). **Hard error on loop nodes** |
 | `hooks` | No | object | SDK hook callbacks (Claude only; see Hook Schema) |
 | `mcp` | No | string | Path to MCP server config JSON file (Claude only) |
 | `skills` | No | string[] | Skill names to preload into this node's context (Claude only) |
+| `agents` | No | object | Inline sub-agent definitions keyed by kebab-case ID. Claude only |
 
-> **bash node timeout**: The `timeout` field on bash nodes is in **milliseconds** (default: 120000). This differs from hook `timeout`, which is in seconds.
+**Script-specific fields** (required when `script:` is set):
+
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `runtime` | Yes | `'bun'` \| `'uv'` | Which runtime executes the script. Must match file extension for named scripts (`.ts`/`.js` → bun, `.py` → uv) |
+| `deps` | No | string[] | Python dependencies for `uv run --with`. Ignored for bun (bun auto-installs) |
+| `timeout` | No | number | Hard kill in ms. Default: 120000 (2 min). Same semantics as `bash` timeout |
+
+**Approval-specific fields** (required when `approval:` is set):
+
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `approval.message` | Yes | string | The message shown to the user when the workflow pauses |
+| `approval.capture_response` | No | boolean | `true` = user's comment becomes `$<node-id>.output`. Default: `false` |
+| `approval.on_reject.prompt` | No | string | AI rework prompt when the user rejects. `$REJECTION_REASON` substituted |
+| `approval.on_reject.max_attempts` | No | number | Max rework iterations before cancel. Range 1-10, default 3 |
+
+> **bash and script node timeout**: The `timeout` field is in **milliseconds** (default: 120000). This differs from hook `timeout`, which is in seconds.
 
 ### Trigger Rules