From afd9a47e6eaeb695aeef90205bd0ed6e879a950a Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 16:58:22 +0300
Subject: [PATCH 01/14] fix(skill/when): document the full `when:` operator set
 and compound expressions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The skill reference previously stated "operators: ==, != only", which is
materially wrong: the condition evaluator supports ==, !=, <, >, <=, >=
plus && / || compound expressions with && binding tighter than ||, plus
dot-notation JSON field access. An agent authoring a workflow from the
skill would not know that four of the six operators, or compound
expressions, exist.

Replaces the single-sentence section with a structured reference covering:
- All six comparison operators (string and numeric modes)
- Compound expressions with precedence rules and short-circuit eval
- JSON dot notation semantics and failure modes
- The fail-closed rules in full (invalid expression, non-numeric side,
  missing field, skipped upstream)

Grounded in packages/workflows/src/condition-evaluator.ts.
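
For reviewers, a minimal sketch of the semantics the new section describes.
It is not the code in condition-evaluator.ts: the names evalWhen, evalAtom,
resolveRef and toFinite, the regex grammar, and the naive split on || and &&
(which would break on a literal containing those tokens) are illustrative
assumptions.

```typescript
type Outputs = Record<string, string>;

// One comparison: $node.output[.field] OP 'literal' (assumed grammar).
const ATOM = /^\$([\w-]+)\.output(?:\.(\w+))?\s*(==|!=|<=|>=|<|>)\s*'([^']*)'$/;

// Resolve a reference; missing nodes, non-JSON output and absent fields
// all collapse to the empty string.
function resolveRef(outputs: Outputs, nodeId: string, field?: string): string {
  const raw = outputs[nodeId] ?? '';
  if (field === undefined) return raw;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed !== null && typeof parsed === 'object') {
      const value = (parsed as Record<string, unknown>)[field];
      return value === undefined || value === null ? '' : String(value);
    }
  } catch {
    // Not JSON: fall through to the empty string.
  }
  return '';
}

// Assumption: blank strings count as non-numeric (Number('') would be 0).
function toFinite(text: string): number | undefined {
  if (text.trim() === '') return undefined;
  const n = Number(text);
  return Number.isFinite(n) ? n : undefined;
}

function evalAtom(atom: string, outputs: Outputs): boolean {
  const m = ATOM.exec(atom.trim());
  if (m === null) return false; // unparseable expression: fail closed
  const left = resolveRef(outputs, m[1] ?? '', m[2]);
  const op = m[3] ?? '';
  const right = m[4] ?? '';
  if (op === '==') return left === right;
  if (op === '!=') return left !== right;
  const a = toFinite(left);
  const b = toFinite(right);
  if (a === undefined || b === undefined) return false; // non-numeric side
  if (op === '<') return a < b;
  if (op === '>') return a > b;
  if (op === '<=') return a <= b;
  return a >= b;
}

// || is the outer split and && the inner one, so && binds tighter.
// some() and every() short-circuit like the operators they model.
function evalWhen(expr: string, outputs: Outputs): boolean {
  return expr
    .split('||')
    .some((group) => group.split('&&').every((atom) => evalAtom(atom, outputs)));
}
```

With outputs { a: 'X', b: 'N', c: 'Z' }, the documented precedence example
"$a.output == 'X' && $b.output == 'Y' || $c.output == 'Z'" groups as
(A && B) || C and evaluates to true in this sketch.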
---
 .../skills/archon/references/workflow-dag.md  | 49 +++++++++++++++++--
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md
index 5132e0dab6..5ad4dcb0ab 100644
--- a/.claude/skills/archon/references/workflow-dag.md
+++ b/.claude/skills/archon/references/workflow-dag.md
@@ -177,14 +177,53 @@ nodes:
 
 ## Conditions (`when:`)
 
+Gate whether a node runs based on upstream output. A condition that evaluates to `false` skips the node (fail-closed — skipped nodes propagate their skipped state to dependants).
+
+### Operators
+
+**String comparison** (literal string equality):
 ```yaml
-- id: investigate
-  command: investigate-bug
-  depends_on: [classify]
-  when: "$classify.output.issue_type == 'bug'"
+when: "$nodeId.output == 'VALUE'"
+when: "$nodeId.output != 'VALUE'"
+when: "$nodeId.output.field == 'VALUE'"       # JSON dot notation (requires output_format)
 ```
 
-**Syntax**: `$nodeId.output OPERATOR 'value'` — operators: `==`, `!=` only. Values single-quoted. Invalid expressions skip the node (fail-closed).
+**Numeric comparison** (both sides auto-parsed as numbers; fail-closed if either side is not finite):
+```yaml
+when: "$score.output > '80'"
+when: "$score.output >= '0.9'"
+when: "$score.output < '100'"
+when: "$score.output <= '5'"
+when: "$score.output.confidence >= '0.9'"
+```
+
+All six operators — `==`, `!=`, `<`, `>`, `<=`, `>=` — are supported. Values are single-quoted strings (even for numeric comparisons).
+
+### Compound Expressions
+
+Combine conditions with `&&` (AND) and `||` (OR). **`&&` binds tighter than `||`.** No parentheses supported — structure expressions with that precedence in mind.
+
+```yaml
+when: "$a.output == 'X' && $b.output != 'Y'"
+when: "$a.output == 'X' || $b.output == 'Y'"
+when: "$score.output > '80' && $flag.output == 'true'"
+
+# Precedence: (A && B) || C
+when: "$a.output == 'X' && $b.output == 'Y' || $c.output == 'Z'"
+```
+
+Short-circuit evaluation: `&&` stops at the first false, `||` stops at the first true.
+
+### Dot Notation (JSON Field Access)
+
+`$nodeId.output.field` parses the upstream output as JSON and extracts the named field. It returns an empty string if parsing fails or the field is absent, which then fails closed against any literal value. Requires the upstream node to have `output_format` set (for AI nodes) or to print valid JSON (for bash/script nodes).
+
+### Fail-Closed Rules
+
+- Invalid or unparseable expression → node skipped, warning logged
+- Numeric operator with a non-numeric side → node skipped
+- `$nodeId.output.field` on non-JSON output → field is empty → comparison fails
+- Referenced node did not run (skipped upstream) → substitution is empty → comparison fails
 
 ## Node Output Substitution
 

From edb9eb8122d5205e0643612d7c4ea54ed51f9daf Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 16:59:51 +0300
Subject: [PATCH 02/14] feat(skill): document Approval and Cancel node types
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Approval and cancel nodes are first-class DAG node types (approval since the
workflow lifecycle work in #871, cancel as a guarded-exit primitive) but the
skill never described either one. An agent reading the skill and asked to
"add a review gate before implementation" or "stop the workflow if the input
is unsafe" would fall back to bash + exit 1, losing the proper semantics
(cancelled vs. failed, on_reject AI rework, web UI auto-resume).

Approval node coverage (references/workflow-dag.md, SKILL.md):
- Full configuration block with message, capture_response, on_reject
- The interactive: true workflow-level requirement for web UI delivery
- Approve/reject commands across all platforms (CLI, slash, natural
  language) and the capture_response → $node-id.output flow
- Ignored-fields list + the on_reject.prompt AI sub-node exception

Cancel node coverage (references/workflow-dag.md, SKILL.md):
- Single-field schema (cancel: "<reason>")
- Lifecycle: cancelled (not failed); in-flight parallel nodes stopped;
  no DAG auto-resume path
- The "cancel: vs bash-exit-1" decision rule (expected precondition miss
  vs. check itself failing)
- Two canonical patterns — upstream-classification gate, pre-expensive-step
  gate

Validation-rules list updated to enumerate approval/cancel constraints
(message non-empty, on_reject.max_attempts range 1-10, cancel reason
non-empty), plus a forward note that script: joins the mutually-exclusive
set once PR #1362 lands.
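
A minimal sketch of the three constraints listed above, for reviewers only.
The types and the validateGateNode name are hypothetical, not the loader's
schema; treating whitespace-only strings as empty and requiring an integer
for max_attempts are assumptions of this sketch.

```typescript
interface OnReject {
  prompt: string;
  max_attempts?: number;
}

interface ApprovalConfig {
  message: string;
  capture_response?: boolean;
  on_reject?: OnReject;
}

interface GateNode {
  id: string;
  approval?: ApprovalConfig;
  cancel?: string;
}

// Returns one message per violated constraint; an empty array means valid.
function validateGateNode(node: GateNode): string[] {
  const errors: string[] = [];
  if (node.approval !== undefined) {
    if (node.approval.message.trim() === '') {
      errors.push(`${node.id}: approval.message must be non-empty`);
    }
    const max = node.approval.on_reject?.max_attempts;
    if (max !== undefined && (!Number.isInteger(max) || max < 1 || max > 10)) {
      errors.push(`${node.id}: on_reject.max_attempts must be 1-10`);
    }
  }
  if (node.cancel !== undefined && node.cancel.trim() === '') {
    errors.push(`${node.id}: cancel reason must be non-empty`);
  }
  return errors;
}
```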

Placement in both files is after the Loop section and before the validation
section, so this commit stays additive with respect to PR #1362's Script
node insertion between Bash and Loop — rebase is clean.
---
 .claude/skills/archon/SKILL.md                |  23 +++
 .../skills/archon/references/workflow-dag.md  | 145 ++++++++++++++++++
 2 files changed, 168 insertions(+)

diff --git a/.claude/skills/archon/SKILL.md b/.claude/skills/archon/SKILL.md
index 7f126c9bac..18dda373dc 100644
--- a/.claude/skills/archon/SKILL.md
+++ b/.claude/skills/archon/SKILL.md
@@ -204,6 +204,29 @@ Each node has exactly ONE of: `command`, `prompt`, `bash`, `script`, `loop`, `ap
     until_bash: "bun run test"    # Optional: exit 0 = done
 ```
 
+**Approval node** — pauses the workflow for human review. Requires `interactive: true` at the workflow level for Web UI delivery:
+```yaml
+interactive: true   # workflow level — required for web UI
+
+nodes:
+  - id: review-gate
+    approval:
+      message: "Review the plan above before proceeding."
+      capture_response: true      # Optional: user's comment → $review-gate.output
+      on_reject:                  # Optional: AI rework on rejection instead of cancel
+        prompt: "Revise based on feedback: $REJECTION_REASON"
+        max_attempts: 3           # Range 1-10, default 3
+    depends_on: [plan]
+```
+
+**Cancel node** — terminates the workflow with a reason. Typically gated with `when:`:
+```yaml
+- id: stop-if-unsafe
+  cancel: "Refusing to proceed: input flagged UNSAFE."
+  depends_on: [classify]
+  when: "$classify.output != 'SAFE'"
+```
+
 For the full authoring guide with all fields, conditions, trigger rules, and patterns: Read `references/workflow-dag.md`
 
 ### Creating a Command File
diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md
index 5ad4dcb0ab..6474460804 100644
--- a/.claude/skills/archon/references/workflow-dag.md
+++ b/.claude/skills/archon/references/workflow-dag.md
@@ -366,6 +366,148 @@ First iteration is always fresh regardless.
 
 ---
 
+## Approval Nodes
+
+Approval nodes **pause the workflow** until a human approves or rejects the gate. Use them to insert review steps between AI-driven nodes — for example, reviewing a generated plan before committing to expensive implementation work.
+
+### Configuration
+
+```yaml
+- id: review-gate
+  approval:
+    message: "Review the plan above before proceeding with implementation."
+    capture_response: false        # Optional. true = user's comment stored as $review-gate.output
+    on_reject:                     # Optional. AI rework on rejection instead of cancel
+      prompt: "Revise based on feedback: $REJECTION_REASON"
+      max_attempts: 3              # Range 1–10, default 3. After max, workflow is cancelled.
+  depends_on: [plan]
+```
+
+### Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `approval.message` | **Yes** | The message shown to the user when the workflow pauses |
+| `approval.capture_response` | No | `true` = user's approval comment stored as `$<node-id>.output` for downstream nodes. Default: `false` (downstream `$<node-id>.output` is an empty string) |
+| `approval.on_reject.prompt` | No | Prompt run via AI when the user rejects. `$REJECTION_REASON` is substituted with the reject reason. After running, the workflow re-pauses at the same gate |
+| `approval.on_reject.max_attempts` | No | Max times the on_reject prompt runs before the workflow is cancelled. Range: 1–10. Default: 3 |
+
+### Web UI Requirement
+
+Approval gates delivered on the Web UI require `interactive: true` at the **workflow level** — otherwise the workflow dispatches to a background worker and the gate message never reaches the user's chat window.
+
+```yaml
+name: plan-approve-implement
+interactive: true   # REQUIRED for approval gates on web UI
+nodes:
+  - id: plan
+    command: plan-feature
+  - id: review-gate
+    approval:
+      message: "Approve the plan to proceed."
+    depends_on: [plan]
+  - id: implement
+    command: implement
+    depends_on: [review-gate]
+```
+
+### Approve and Reject Commands
+
+```bash
+# From the CLI
+archon workflow approve <run-id>
+archon workflow approve <run-id> --comment "looks good"
+archon workflow reject <run-id>
+archon workflow reject <run-id> --reason "plan needs more test coverage"
+
+# Cross-platform (Slack / Telegram / Web / GitHub chat)
+/workflow approve <run-id> <optional comment>
+/workflow reject <run-id> <optional reason>
+
+# Natural language (all platforms except CLI — auto-detects paused workflow)
+User: "Looks good, proceed"
+# → auto-approves. With capture_response: true, the message becomes $review-gate.output
+```
+
+### What Does NOT Work on Approval Nodes
+
+AI-specific fields (`model`, `provider`, `hooks`, `mcp`, `skills`, `output_format`, `allowed_tools`, `denied_tools`, `context`, `effort`, `thinking`, etc.) are accepted by the parser but emit a loader warning and are ignored — no AI runs during the pause. (Note: `on_reject.prompt` DOES run AI, using the workflow's default provider/model.)
+
+`retry`, `when`, `trigger_rule`, `depends_on`, `idle_timeout` all work.
+
+---
+
+## Cancel Nodes
+
+Cancel nodes **terminate the workflow run** with a reason string. Useful for guarded exits — a `cancel:` node with a `when:` condition stops the workflow cleanly when preconditions aren't met.
+
+### Configuration
+
+```yaml
+- id: gate-branch
+  cancel: "Refusing to run on main — this workflow modifies files."
+  when: "$check-branch.output == 'main'"
+  depends_on: [check-branch]
+```
+
+When a cancel node runs, Archon:
+- Marks the workflow run as `cancelled` (not `failed`)
+- Stops in-flight parallel nodes via the existing cancellation plumbing
+- Records the reason string in the run's metadata
+- Emits a `node_completed` event for the cancel node itself
+
+### Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `cancel` | **Yes** | Non-empty reason string shown to the user and recorded in metadata |
+
+Standard DAG fields (`id`, `depends_on`, `when`, `trigger_rule`, `idle_timeout`) all work. AI-specific fields emit a loader warning and are ignored — cancel nodes don't invoke AI.
+
+### When to use `cancel` vs failing a `bash:` check
+
+- **Use `cancel:`** when the precondition failure is **expected** (e.g., wrong branch, required file missing, feature flag disabled). The run shows as `cancelled`, which doesn't trigger the DAG auto-resume path.
+- **Use a `bash:` node that exits non-zero** when the check itself fails (e.g., network error, tool missing). The run shows as `failed`, which auto-resumes on the next invocation.
+
+### Typical Patterns
+
+**Gate on upstream classification:**
+```yaml
+- id: classify
+  prompt: "Is the input safe to proceed? Output 'SAFE' or 'UNSAFE'."
+  allowed_tools: []
+
+- id: stop-if-unsafe
+  cancel: "Refusing to proceed: input flagged UNSAFE by classifier."
+  depends_on: [classify]
+  when: "$classify.output != 'SAFE'"
+
+- id: do-work
+  command: the-work
+  depends_on: [classify]
+  when: "$classify.output == 'SAFE'"
+```
+
+**Stop before expensive step unless precondition met:**
+```yaml
+- id: check-budget
+  bash: |
+    spent=$(gh api /rate_limit --jq '.rate.used // 0')
+    echo "$spent"
+
+- id: abort-if-over
+  cancel: "Aborting — GH API quota exhausted."
+  depends_on: [check-budget]
+  when: "$check-budget.output > '4500'"
+
+- id: run-api-heavy-work
+  command: heavy-work
+  depends_on: [check-budget]
+  when: "$check-budget.output <= '4500'"
+```
+
+---
+
 ## Validate Before Finishing
 
 Before declaring a workflow complete, validate it:
@@ -393,6 +535,9 @@ Use `--json` for machine-readable output. Use `archon validate commands <name>`
 - Script nodes require `runtime: bun` or `runtime: uv`
 - Named scripts must exist in `.archon/scripts/` or `~/.archon/scripts/` with extension matching declared runtime
 - `retry` on loop node = hard error
+- `approval.message` required and non-empty
+- `cancel` reason required and non-empty
+- Approval `on_reject.max_attempts` must be 1–10 if set
 - `steps:` format rejected (deprecated — use `nodes:` only)
 
 ## Complete Example

From 5ccf88889b31918904f5d8010877bf0d995c5f0c Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:00:44 +0300
Subject: [PATCH 03/14] feat(skill): document workflow-level fields beyond
 name/provider/model
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The skill's Schema section previously showed only name, description, provider,
and model at the workflow level, which is little more than a stub. Agents asked to
"use the 1M-context Claude beta" or "run this under a network sandbox" or
"add a fallback model in case Opus rate-limits" had no way to discover
that any of these fields existed at the workflow level.

Adds a comprehensive Workflow-Level Fields section covering:
- Core: name, description, provider, model, interactive (with explicit
  callout that interactive: true is REQUIRED for approval/loop gates on
  web UI — a common footgun)
- Isolation: worktree.enabled for pin-on/pin-off (the only worktree field
  at workflow level; baseBranch/copyFiles/path/initSubmodules are
  config.yaml only, so a cross-reference points there)
- Claude SDK advanced: effort, thinking, fallbackModel, betas, sandbox,
  with explicit per-node-only exceptions (maxBudgetUsd, systemPrompt)
- Codex-specific: modelReasoningEffort (with note that it's NOT the same
  as Claude's effort — this has confused users), webSearchMode,
  additionalDirectories
- A complete worked example combining sandbox + approval + interactive

All fields cross-referenced against packages/workflows/src/schemas/workflow.ts
and packages/workflows/src/schemas/dag-node.ts.
---
 .../skills/archon/references/workflow-dag.md  | 82 +++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md
index 6474460804..3cde30fec0 100644
--- a/.claude/skills/archon/references/workflow-dag.md
+++ b/.claude/skills/archon/references/workflow-dag.md
@@ -20,6 +20,88 @@ nodes:
     depends_on: [other-node]        # Node IDs that must complete first
 ```
 
+## Workflow-Level Fields
+
+Top-level YAML fields on a workflow object. Per-node overrides (same name under a node) win over workflow-level defaults.
+
+### Core
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `name` | string (required) | Workflow identifier (used in `archon workflow run <name>`) |
+| `description` | string (required) | Human-readable summary. Used for routing; see **Workflow Description Best Practices** in `docs-web/.../authoring-workflows.md` |
+| `provider` | string | AI provider (e.g. `claude`, `codex`, `pi`). Default: from `.archon/config.yaml` |
+| `model` | string | Model override. Claude: `sonnet` \| `opus` \| `haiku` \| `claude-*` \| `inherit`. Codex: any non-Claude model ID |
+| `interactive` | boolean | **Required for web UI** when the workflow has approval gates or `loop.interactive` nodes. Forces foreground execution so gate messages reach the user's chat. Default: `false` (background on web) |
+
+### Isolation
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `worktree.enabled` | boolean | Pin isolation regardless of caller. `false` = always live checkout (CLI `--branch`/`--from` hard-error). `true` = always worktree (CLI `--no-worktree` hard-errors). Omit = caller decides. Use `false` for read-only workflows (triage, reporting) |
+
+Other worktree config (`baseBranch`, `copyFiles`, `initSubmodules`, `path`) lives in `.archon/config.yaml`, not the workflow YAML — see `references/repo-init.md`.
+
+### Claude SDK Advanced Options
+
+These fields apply to Claude nodes workflow-wide; each can be overridden per-node. Codex nodes ignore them with a warning.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `effort` | `'low'` \| `'medium'` \| `'high'` \| `'max'` | Claude Agent SDK reasoning depth. Different from Codex `modelReasoningEffort` below |
+| `thinking` | string \| object | Extended thinking. String shorthand: `'adaptive'` \| `'enabled'` \| `'disabled'`. Object form: `{ type: 'enabled', budgetTokens: 8000 }` |
+| `fallbackModel` | string | Model to use if the primary model fails (e.g. `claude-haiku-4-5-20251001`) |
+| `betas` | string[] | SDK beta feature flags (non-empty array). Example: `['context-1m-2025-08-07']` for 1M-context Claude |
+| `sandbox` | object | OS-level filesystem/network restrictions. Nested `network` / `filesystem` sub-objects — see the docs site for the full schema. Layers on top of worktree isolation |
+
+Per-node-only (NOT valid at workflow level): `maxBudgetUsd`, `systemPrompt`.
+
+### Codex-Specific Options
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `modelReasoningEffort` | `'minimal'` \| `'low'` \| `'medium'` \| `'high'` \| `'xhigh'` | Codex reasoning depth. Separate field from Claude's `effort` |
+| `webSearchMode` | `'disabled'` \| `'cached'` \| `'live'` | Codex web search behavior. Default: `disabled` |
+| `additionalDirectories` | string[] | Absolute paths Codex can read outside the codebase (shared libraries, docs repos) |
+
+### Complete workflow-level example
+
+```yaml
+name: careful-migration
+description: |
+  Plan a migration, get explicit approval, then implement under a strict
+  sandbox. Used by the ops team before destructive work.
+provider: claude
+model: sonnet
+interactive: true                   # required — this workflow has an approval gate
+
+worktree:
+  enabled: true                     # always isolate; reject --no-worktree
+
+effort: high
+thinking: adaptive
+fallbackModel: claude-haiku-4-5-20251001
+betas: ['context-1m-2025-08-07']
+sandbox:
+  enabled: true
+  network:
+    allowedDomains: ['api.github.com']
+    allowManagedDomainsOnly: true
+  filesystem:
+    denyWrite: ['/etc', '/usr']
+
+nodes:
+  - id: plan
+    command: plan-migration
+  - id: review
+    approval:
+      message: "Review the migration plan above."
+    depends_on: [plan]
+  - id: implement
+    command: implement-migration
+    depends_on: [review]
+```
+
 ## Node Types (Mutually Exclusive)
 
 Each node must have exactly ONE of these fields: `command`, `prompt`, `bash`, `script`, `loop`, `approval`, or `cancel`.

From 3d216d418177bdddc808c781728d08c8fc32bd37 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:01:24 +0300
Subject: [PATCH 04/14] feat(skill/loop): document interactive loops and
 gate_message
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Interactive loop nodes pause between iterations for human feedback via
/workflow approve — used by archon-piv-loop and archon-interactive-prd.
The skill's Loop Nodes section previously omitted both interactive: true
and gate_message entirely, so an agent writing a guided-refinement
workflow wouldn't know the feature exists or that gate_message is
required at parse time.

Adds:
- interactive and gate_message rows to the config table (marking
  gate_message as required when interactive: true — enforced by the
  loader's superRefine)
- A dedicated "Interactive Loops" subsection explaining the 6-step
  iterate-pause-approve-resume flow
- Explicit call-out that $LOOP_USER_INPUT populates ONLY on the first
  iteration of a resumed session — easy to miss and a common surprise
- Workflow-level interactive: true requirement for web UI delivery
  (loader warning otherwise) so the full-flow example is complete
- Note that until_bash substitution DOES shell-quote $nodeId.output
  (unlike script bodies) — called out since the audit surfaced this
  inconsistency
---
 .../skills/archon/references/workflow-dag.md  | 42 ++++++++++++++++++-
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md
index 3cde30fec0..e13047704e 100644
--- a/.claude/skills/archon/references/workflow-dag.md
+++ b/.claude/skills/archon/references/workflow-dag.md
@@ -380,15 +380,53 @@ Loop nodes iterate an AI prompt until a completion condition is met. Use them fo
     max_iterations: 10         # Required. Integer >= 1. Fails if exceeded
     fresh_context: true        # Optional. Default: false
     until_bash: "..."          # Optional. Exit 0 = complete
+    interactive: true          # Optional. Pauses between iterations for user input
+    gate_message: "..."        # Required when interactive: true
 ```
 
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
-| `prompt` | string | Yes | Prompt template. Supports all variable substitution (`$ARGUMENTS`, `$nodeId.output`, etc.) |
+| `prompt` | string | Yes | Prompt template. Supports all variable substitution (`$ARGUMENTS`, `$nodeId.output`, `$LOOP_USER_INPUT`, etc.) |
 | `until` | string | Yes | Completion signal to detect in AI output |
 | `max_iterations` | number | Yes | Hard limit. Node **fails** if exceeded |
 | `fresh_context` | boolean | No | Default `false`. `true` = fresh AI session each iteration |
-| `until_bash` | string | No | Shell script run after each iteration. Exit 0 = complete |
+| `until_bash` | string | No | Shell script run after each iteration. Exit 0 = complete. Variable substitution applies; `$nodeId.output` IS shell-quoted here |
+| `interactive` | boolean | No | Default `false`. `true` = pause after each non-completing iteration for user feedback via `/workflow approve <id> <text>` |
+| `gate_message` | string | **Required when `interactive: true`** | Message shown to the user at each pause. Validated at parse time — a loop with `interactive: true` and no `gate_message` fails to load |
+
+### Interactive Loops
+
+Interactive loops pause between iterations so a human can provide feedback that feeds the next iteration. Use them for guided writing/refinement (e.g. PRD co-authoring, iterative design).
+
+```yaml
+name: guided-refine
+description: Refine an output with human feedback between iterations
+interactive: true                # REQUIRED at the workflow level for web UI
+
+nodes:
+  - id: refine
+    loop:
+      prompt: |
+        Review the current draft and improve it based on this feedback:
+        $LOOP_USER_INPUT
+
+        When the output is satisfactory, output: <promise>DONE</promise>
+      until: DONE
+      max_iterations: 5
+      interactive: true          # node level — enables the pause
+      gate_message: |
+        Review the output above. Reply with feedback, or type DONE to finish.
+```
+
+The flow:
+1. Iteration N runs. AI produces output.
+2. If AI signalled completion (`<promise>DONE</promise>`) or `until_bash` exited 0, loop ends.
+3. Otherwise: `gate_message` is sent to the user, workflow pauses (status = `paused`).
+4. User runs `archon workflow approve <run-id> "<their feedback>"` (or replies naturally in chat platforms).
+5. Iteration N+1 runs with `$LOOP_USER_INPUT` replaced by the user's feedback, but **only on that first resumed iteration**. Subsequent iterations in the same resumed session see `$LOOP_USER_INPUT` as an empty string.
+6. Repeat.
+
+**Workflow-level `interactive: true` is required** for the gate message to reach the user on the web UI (otherwise the workflow dispatches to a background worker that can't deliver chat messages). The loader emits a warning if a node has `interactive: true` without workflow-level `interactive: true`.
 
 ### Completion Detection
 

From f10b989ecc59889779d0d9dc5d20c0b55f491b15 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:02:41 +0300
Subject: [PATCH 05/14] fix(skill/cli): complete the CLI command reference with
 missing lifecycle commands
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The CLI reference previously documented only list, run, cleanup, validate,
complete, version, setup, and chat — missing nearly every workflow
lifecycle command an agent needs to operate a paused, failed, or stuck
run. The interactive-workflows reference assumed these commands existed
without actually documenting them.

Adds full documentation for:
- archon workflow status — show running workflow(s)
- archon workflow approve <run-id> [comment] — resume approval gate
  (also populates $LOOP_USER_INPUT on interactive loops and the gate
  node's output when capture_response: true)
- archon workflow reject <run-id> [reason] — reject gate; cancels or
  triggers on_reject rework depending on node config
- archon workflow cancel <run-id> — terminate running/paused with
  in-flight subprocess kill
- archon workflow abandon <run-id> — mark stuck row cancelled without
  subprocess kill (for orphan-cleanup after server crashes — matches
  the #1216 precedent)
- archon workflow resume <run-id> [message] — force-resume specific
  run (auto-resume is default; this is for explicit override)
- archon workflow cleanup [days] — disk hygiene for old terminal runs
  (with explicit callout that it does NOT transition 'running' rows,
  a common confusion)
- archon workflow event emit — used inside loop prompts for state
  signalling; documented so agents don't invent their own mechanism
- archon continue <branch> [flags] [msg] — iterative-session entry
  point with --workflow and --no-context flags

Also:
- Adds --allow-env-keys flag to the `workflow run` flag table with
  audit-log context and the env-leak-gate remediation use case
- Adds an "Auto-resume without --resume" note disambiguating when
  --resume is needed vs. when auto-resume handles it
- Adds --include-closed flag to `isolation cleanup`, which was
  previously missing; converts the flag list to a structured table
- Explains the cancel/abandon distinction (live subprocess vs. orphan)

All grounded in packages/cli/src/commands/workflow.ts, continue.ts,
and isolation.ts.
---
 .../skills/archon/references/cli-commands.md  | 107 +++++++++++++++++-
 1 file changed, 103 insertions(+), 4 deletions(-)

diff --git a/.claude/skills/archon/references/cli-commands.md b/.claude/skills/archon/references/cli-commands.md
index 157eacb713..5a10c01aa3 100644
--- a/.claude/skills/archon/references/cli-commands.md
+++ b/.claude/skills/archon/references/cli-commands.md
@@ -32,7 +32,8 @@ archon workflow run archon-fix-github-issue --resume
 | `--branch <name>` / `-b` | Branch name for worktree. Reuses existing worktree if healthy |
 | `--from <name>` / `--from-branch <name>` | Start-point branch for new worktree (default: repo default branch) |
 | `--no-worktree` | Skip isolation — run in the live checkout |
-| `--resume` | Resume the last failed run of this workflow (skips completed steps/nodes) |
+| `--resume` | Resume the last failed run of this workflow at this cwd (skips completed nodes) |
+| `--allow-env-keys` | Grant env-leak gate consent during auto-registration. Use when the repo's `.env` has sensitive keys (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) and you've confirmed they should be allowed for this codebase. Audit-logged as `env_leak_consent_granted` |
 | `--cwd <path>` | Working directory override |
 
 **Flag conflicts** (errors):
@@ -42,6 +43,95 @@ archon workflow run archon-fix-github-issue --resume
 
 **Default behavior** (no flags): Auto-creates a worktree with branch name `{workflow-name}-{timestamp}`.
 
+**Auto-resume without `--resume`**: If a prior invocation of the same workflow at the same cwd failed, the next invocation automatically skips completed nodes. `--resume` is only needed when you want to force resumption of the last failed run or to reuse the worktree from that run; to target a specific run ID, use `archon workflow resume <run-id>`.
+
+### `archon workflow status`
+
+Show the currently running workflow (if any) with its run ID, state, and last activity.
+
+```bash
+archon workflow status
+archon workflow status --json       # Machine-readable output
+```
+
+### `archon workflow approve <run-id> [comment]`
+
+Approve a paused approval-node workflow. Auto-resumes the workflow.
+
+```bash
+archon workflow approve abc123
+archon workflow approve abc123 --comment "Plan looks good"
+archon workflow approve abc123 "Plan looks good"   # positional form
+```
+
+For interactive loop nodes, the comment becomes `$LOOP_USER_INPUT` on the next iteration. For approval nodes with `capture_response: true`, the comment becomes `$<gate-id>.output` for downstream nodes.
+
+### `archon workflow reject <run-id> [reason]`
+
+Reject a paused approval gate. Without `on_reject` on the node, cancels the workflow. With `on_reject`, runs the rework prompt with `$REJECTION_REASON` substituted and re-pauses.
+
+```bash
+archon workflow reject abc123
+archon workflow reject abc123 --reason "Plan misses test coverage"
+archon workflow reject abc123 "Plan misses test coverage"
+```
+
+### `archon workflow cancel <run-id>`
+
+Cancel a running or paused workflow. Terminates in-flight subprocesses.
+
+```bash
+archon workflow cancel abc123
+```
+
+Different from `abandon`: `cancel` actively terminates; `abandon` marks a row as cancelled without killing any subprocess (use when the subprocess is already gone, e.g. server crash).
+
+### `archon workflow abandon <run-id>`
+
+Mark a non-terminal workflow run as cancelled without terminating a subprocess. Use when a `running` row is stuck after a server crash or when you want to discard a paused run without rejecting.
+
+```bash
+archon workflow abandon abc123
+```
+
+### `archon workflow resume <run-id> [message]`
+
+Explicitly re-run a failed run. Most workflows auto-resume without this — use it when you want to force a specific run ID.
+
+```bash
+archon workflow resume abc123
+archon workflow resume abc123 "continue with the plan"
+```
+
+### `archon workflow cleanup [days]`
+
+**Deletes** old terminal workflow runs (`completed`/`failed`/`cancelled`) from the database for disk hygiene. Does NOT transition `running` rows — use `abandon`/`cancel` for those.
+
+```bash
+archon workflow cleanup             # Default: 7 days
+archon workflow cleanup 30          # Custom: 30 days
+```
+
+### `archon workflow event emit --run-id <uuid> --type <event-type> [--data <json>]`
+
+Emit a workflow event to a running workflow. Used inside loop prompts to signal state (e.g. "checkpoint written") for observability. Rarely invoked from the shell directly.
+
+```bash
+archon workflow event emit --run-id abc123 --type checkpoint --data '{"step":"plan"}'
+```
+
+### `archon continue <branch> [flags] [message]`
+
+Continue work on a branch with prior context. Defaults to `archon-assist`; use `--workflow` to pick a different workflow. Useful for iterative sessions on the same worktree without typing the full `workflow run` incantation.
+
+```bash
+archon continue feat/auth "Add password reset"
+archon continue feat/auth --workflow archon-feature-development "Continue from step 3"
+archon continue feat/auth --no-context "Start fresh without loading prior artifacts"
+```
+
+Flags: `--workflow <name>`, `--no-context`.
+
 ## Isolation Commands
 
 ### `archon isolation list`
@@ -59,11 +149,20 @@ Outputs: branch name, path, workflow type, platform, last activity age. Ghost en
 Remove stale worktree environments.
 
 ```bash
-archon isolation cleanup          # Default: 7 days
-archon isolation cleanup 14       # Custom: 14 days
-archon isolation cleanup --merged # Remove branches merged into main (+ remote branches)
+archon isolation cleanup                             # Default: 7 days
+archon isolation cleanup 14                          # Custom: 14 days
+archon isolation cleanup --merged                    # Also remove worktrees whose branches merged into main (deletes remote branches too)
+archon isolation cleanup --merged --include-closed   # Also remove worktrees whose PRs were closed without merging
 ```
 
+**Flags:**
+
+| Flag | Description |
+|------|-------------|
+| `[days]` | Positional — age threshold in days. Environments untouched for longer than this are removed. Default: 7 |
+| `--merged` | Union of three signals — ancestry (`git branch --merged`), patch equivalence (`git cherry`), and PR state (`gh`) — safely catches squash-merges |
+| `--include-closed` | With `--merged`, also remove worktrees whose PRs were closed (abandoned, not merged) |
+
 ## Validate Commands
 
 ### `archon validate workflows [name]`

From 02898ec6c87f0a542448e2bbc3d19b4d10d6c81e Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:03:52 +0300
Subject: [PATCH 06/14] feat(skill/repo-init): add scripts/ and state/,
 three-path env model, per-project env injection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The repo-init reference was missing two first-class .archon/ directories
(scripts/ since v0.3.3, state/ since the workflow-state feature) and had
nothing to say about env — the #1 thing a user hits on first-run when
their repo has a .env file with API keys.

Directory tree updates:
- Adds .archon/scripts/ with the extension->runtime rule (.ts/.js -> bun,
  .py -> uv) so agents know where to put named scripts referenced by
  script: nodes.
- Adds .archon/state/ with explicit "always gitignore" callout — these
  are runtime artifacts, not source. Previously undocumented in the skill.
- Adds .archon/.env (repo-scoped Archon env) and distinguishes it from
  the target repo's top-level .env.
- Adds a "What each directory is for" list so the structure isn't just
  a tree with no narrative.

.gitignore guidance:
- state/ and .env added as must-gitignore (state/ matches CLAUDE.md and
  reference/archon-directories.md — skill was lagging).
- mcp/ demoted to conditional — gitignore only if you hardcode secrets.

New "Three-Path Env Model" section:
- ~/.archon/.env (trusted, user), <cwd>/.archon/.env (trusted, repo),
  <cwd>/.env (UNTRUSTED, target project — stripped from subprocess env).
- Precedence (override: true across archon-owned paths) and the
  observable [archon] loaded N keys / stripped K keys log lines so
  operators can verify what actually happened.
- Decision tree for where to put API keys vs. target-project env vs.
  things Archon shouldn't touch.
- Links to archon setup --scope home|project with --force for writing
  to the right file with timestamped backups.

New "Per-Project Env Injection" section:
- Documents both managed surfaces: .archon/config.yaml env: block
  (git-committed, $REF expansion) and Web UI Settings → Projects →
  Env Vars (DB-stored, never returned over API).
- Names every execution surface that receives the injected vars:
  Claude/Codex/Pi subprocess, bash: nodes, script: nodes, and direct
  codebase-scoped chat.
- Documents the env-leak gate with all 5 remediation paths so an agent
  hitting "Cannot register: env has sensitive keys" knows the options.

Grounded in CHANGELOG v0.3.7 (three-path env + setup flags), v0.3.0
(env-leak gate), and reference/security.md on the docs site.
---
 .claude/skills/archon/references/repo-init.md | 79 +++++++++++++++++--
 1 file changed, 74 insertions(+), 5 deletions(-)

diff --git a/.claude/skills/archon/references/repo-init.md b/.claude/skills/archon/references/repo-init.md
index 66be6375f5..14005c68b6 100644
--- a/.claude/skills/archon/references/repo-init.md
+++ b/.claude/skills/archon/references/repo-init.md
@@ -10,14 +10,27 @@ Create the following in your repository root:
 .archon/
 ├── commands/         # Custom command files (.md)
 ├── workflows/        # Workflow definitions (.yaml)
+├── scripts/          # Named scripts for script: nodes (.ts/.js for bun, .py for uv) — optional
 ├── mcp/              # MCP server config files (.json) — optional
-└── config.yaml       # Repo-specific configuration — optional
+├── state/            # Cross-run workflow state — gitignored, never committed
+├── config.yaml       # Repo-specific configuration — optional
+└── .env              # Repo-scoped Archon env (optional; do NOT commit)
 ```
 
 ```bash
-mkdir -p .archon/commands .archon/workflows
+mkdir -p .archon/commands .archon/workflows .archon/scripts
 ```
 
+**What each directory is for:**
+
+- `commands/` — Reusable prompt templates used by `command:` workflow nodes. Committed to git.
+- `workflows/` — YAML workflow definitions. Committed to git.
+- `scripts/` — Named TypeScript/JavaScript (bun) or Python (uv) scripts referenced by `script:` nodes. Extension determines runtime: `.ts`/`.js` → bun, `.py` → uv. Committed to git.
+- `mcp/` — MCP server JSON configs. Usually checked in with `$ENV_VAR` references; avoid hardcoding secrets. Some teams gitignore this and rely entirely on env expansion.
+- `state/` — Workflow-written cross-run state (e.g. the `repo-triage` dedup log). **Always gitignore** — these are runtime artifacts, not source.
+- `config.yaml` — Repo-specific defaults (assistant, worktree settings, etc.). Committed to git.
+- `.env` — Repo-scoped Archon env (loaded with `override: true` at boot). **Do NOT commit.** This is different from the target repo's top-level `.env` — that file belongs to the target project, and Archon strips its auto-loaded keys from subprocess env before spawning AI to prevent leakage. See **Three-Path Env Model** below.
+
 ## Minimal config.yaml
 
 Create `.archon/config.yaml` only if you need to override defaults:
@@ -52,11 +65,67 @@ Archon ships with built-in commands and workflows (like `archon-assist`, `archon
 Add to your `.gitignore`:
 
 ```gitignore
-# Archon runtime artifacts (never commit)
-.archon/mcp/          # May contain env var references
+# Archon runtime artifacts — NEVER commit
+.archon/state/        # Cross-run workflow state, runtime-only
+.archon/.env          # Repo-scoped Archon env (secrets)
+
+# Optional — gitignore if your MCP configs hardcode secrets
+.archon/mcp/
+```
+
+`.archon/commands/`, `.archon/workflows/`, and `.archon/scripts/` **should be committed** — they are part of your project's workflow definitions. `.archon/config.yaml` should be committed unless it contains secrets (use `.archon/.env` for those instead).
+
+## Three-Path Env Model
+
+Archon loads env from three distinct paths at boot, with different trust levels and precedence:
+
+| Path | Scope | Trust | Loaded? |
+|------|-------|-------|---------|
+| `~/.archon/.env` | User (home) | Trusted — user owns it | Yes, with `override: true` |
+| `<cwd>/.archon/.env` | Repo (per-project, Archon-owned) | Trusted — user owns it | Yes, with `override: true` (overrides home) |
+| `<cwd>/.env` | Target repo | **Untrusted** — belongs to the project being worked on | **Stripped from `process.env`** before subprocess spawn to prevent secret leakage (see [Security Model](../../../packages/docs-web/src/content/docs/reference/security.md) on the docs site) |
+
+Boot behavior emits observable log lines:
+
+```
+[archon] loaded N keys from ~/.archon/.env
+[archon] loaded M keys from /path/to/repo/.archon/.env
+[archon] stripped K keys from /path/to/repo (ANTHROPIC_API_KEY, OPENAI_API_KEY, ...)
+```
+
+**Where should you put what?**
+
+- **API keys for Archon itself** (`ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, `DATABASE_URL`, `SLACK_BOT_TOKEN`, etc.) → `~/.archon/.env` (shared across all repos) or `<cwd>/.archon/.env` (per-repo override).
+- **Target-project env that a workflow needs** (`GH_TOKEN`, `DOTENV_PRIVATE_KEY`, etc.) → see [Per-Project Env Injection](#per-project-env-injection) below.
+- **Target-project env that Archon should NOT touch** → leave it in `<cwd>/.env` where the project already expects it. Archon strips it from subprocess env but doesn't delete the file.
+
+The `archon setup --scope home|project [--force]` wizard writes to the right file for you and produces a timestamped backup on every rewrite.
+
+## Per-Project Env Injection
+
+For env vars a workflow's `bash:` and `script:` subprocesses need (`GH_TOKEN` for `gh` calls, `DATABASE_URL` for a migration script, etc.), use one of the two **managed injection** surfaces — both inject into subprocess env at workflow execution time, after the target-repo `.env` strip:
+
+**Option 1: `.archon/config.yaml` `env:` block** (checked into git; values can be `$REF_NAME` expansions from Archon env):
+
+```yaml
+env:
+  GH_TOKEN: $GH_TOKEN             # expanded from ~/.archon/.env at runtime
+  BUILD_TARGET: production        # literal value
 ```
 
-The `.archon/commands/` and `.archon/workflows/` directories should be committed — they are part of your project's workflow definitions.
+**Option 2: Web UI Settings → Projects → Env Vars** — per-codebase, stored in the Archon DB, values never returned over the API (only keys are listed). Use this for values that should NOT appear in git.
+
+Both surfaces inject into: Claude/Codex/Pi subprocess env, `bash:` node subprocess env, `script:` node subprocess env, and direct chat messages that run against the codebase. The worktree isolation layer propagates them as well.
+
+**Env-leak gate** — when a codebase's auto-loaded `<cwd>/.env` contains sensitive keys (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, and 5 others), Archon refuses to register or spawn into that codebase. Remediations (any one):
+
+1. Remove the key from the target `.env`
+2. Rename `.env` → `.env.secrets` (changes the auto-load behavior)
+3. Web UI: Settings → Projects → flip "Allow env keys" to on
+4. CLI: `archon workflow run --allow-env-keys ...`
+5. Global bypass: `allow_target_repo_keys: true` in `~/.archon/config.yaml`
+
+Full details in `reference/security.md` on the docs site.
 
 ## Global Configuration
 

From eaf2af66025c1000c2410d3faa7bf4086e42b7df Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:04:40 +0300
Subject: [PATCH 07/14] fix(skill/authoring-commands): correct override paths
 and add home-scoped commands
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The file-location and discovery sections described an override layout that
does not match the actual resolver. It showed:

  .archon/commands/defaults/archon-assist.md  # Overrides the bundled

and claimed `.archon/commands/defaults/` was where repo-level overrides
lived. In fact the resolver (executor-shared.ts:152-200 + command-
validation.ts) walks `.archon/commands/` 1 level deep and uses basename
matching — putting `archon-assist.md` at the top of `.archon/commands/`
is the canonical way to override the bundled version. The `defaults/`
subfolder is an Archon-internal convention for shipping bundled defaults,
not a user-facing override pattern.

Also, home-scoped commands (`~/.archon/commands/`, shipped in v0.3.7)
were completely absent — agents authoring personal helpers wouldn't
know they could live at the user level and be shared across every repo.

Changes:
- File Location section now shows all three discovery scopes (repo,
  home, bundled) with precedence ordering and 1-level subfolder rules
- Duplicate-basename rule documented as a user error surface
- Discovery and Priority section rewritten with accurate 3-step lookup
  order — no more references to the nonexistent defaults/ override path
- Adds the Web UI "Global (~/.archon/commands/)" palette label note so
  users authoring helpers for the builder know what to expect

No code changes — this is a pure fix of stale/incorrect skill reference
material.
---
 .../archon/references/authoring-commands.md   | 34 ++++++++++++++-----
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/.claude/skills/archon/references/authoring-commands.md b/.claude/skills/archon/references/authoring-commands.md
index 0b1240da6b..603dd3e4a3 100644
--- a/.claude/skills/archon/references/authoring-commands.md
+++ b/.claude/skills/archon/references/authoring-commands.md
@@ -4,14 +4,29 @@ Commands are plain Markdown files containing AI prompt templates. They are the a
 
 ## File Location
 
+Commands are discovered from three scopes, highest-precedence first:
+
 ```
-.archon/commands/
-├── my-command.md           # Custom command
-├── review-code.md          # Another custom command
-└── defaults/               # Optional: override bundled defaults
-    └── archon-assist.md    # Overrides the bundled archon-assist
+<repoRoot>/.archon/commands/     # 1. Repo-scoped (wins)
+├── my-command.md                #    Custom command for this repo
+├── archon-assist.md             #    Overrides the bundled archon-assist
+└── triage/                      #    Subfolders allowed, 1 level deep
+    └── review.md                #    Resolves as 'review', not 'triage/review'
+
+~/.archon/commands/              # 2. Home-scoped (user-level, shared across all repos)
+├── review-checklist.md          #    Personal helper available in every repo
+└── pr-style-guide.md
+
+<bundled defaults>                # 3. Shipped with Archon (archon-assist, etc.)
 ```
 
+**Resolution rules:**
+
+- Filename-without-extension is the command name (e.g. `my-command.md` → `my-command`).
+- 1-level subfolders are supported for grouping; resolution is still by filename (`triage/review.md` → `review`).
+- Repo scope overrides home scope overrides bundled, by name.
+- Duplicate basenames **within a scope** (e.g. two different `review.md` files in `triage/` and `security/`) are a user error — keep names unique within each scope.
+
 Commands are referenced by name (without `.md`) in workflow YAML files.
 
 ## File Format
@@ -78,11 +93,14 @@ Command names must:
 ## Discovery and Priority
 
 When a workflow references `command: my-command`, Archon searches in this order:
-1. `.archon/commands/my-command.md` (repo custom)
-2. `.archon/commands/defaults/my-command.md` (repo default overrides)
+
+1. `<repoRoot>/.archon/commands/my-command.md` (repo scope)
+2. `~/.archon/commands/my-command.md` (home scope — shared across every repo on the machine)
 3. Bundled defaults (shipped with Archon)
 
-First match wins. To override a bundled command, create a file with the same name in your repo.
+First match wins. To override a bundled command, drop a file with the same name at either scope. To override a home-scoped command for a specific repo, drop a file with the same name in that repo's `.archon/commands/`.
+
+> **Web UI note**: Home-scoped commands appear in the workflow builder's node palette under a dedicated "Global (~/.archon/commands/)" section, distinct from project and bundled entries.
 
 ## Referencing Commands from Workflows
 

From 66d2b86e21b6beeb8d4fb7636af457e6c3e60b5a Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:07:34 +0300
Subject: [PATCH 08/14] feat(skill): add workflow good-practices and
 troubleshooting reference pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes two gaps from the audit. The skill previously had zero guidance on
designing multi-node workflows (what to avoid, what to reach for first,
how to structure artifact chains) and zero guidance on where to look
when things go wrong (log paths, env-leak gate remediations, orphan-row
cleanup, resume semantics).

New references/good-practices.md (9 Good Practices + 9 Anti-Patterns):

- Use deterministic nodes (bash:/script:) for deterministic work, AI for
  reasoning — the single biggest quality lever
- output_format required whenever downstream when: reads a field — the
  most common source of "workflow silently routes wrong"
- trigger_rule: none_failed_min_one_success after conditional branches —
  the classic bug where all_success fails because a skipped when:-gated
  branch doesn't count as a success
- context: fresh requires artifacts for state passing — commands must
  explicitly "read $ARTIFACTS_DIR/..." when downstream of fresh
- Cheap models (haiku) for glue, strong for substance
- Workflow descriptions as routing affordances
- Validate (archon validate workflows) + smoke-run before shipping
- Artifact-chain-first design
- worktree.enabled: true for code-changing workflows (reversibility)
- Anti-patterns with before/after YAML examples for each (AI-for-tests,
  free-form when: matching, context: fresh without artifacts, long flat
  AI-node layers, secrets in YAML, retry on loop nodes, tiny
  max_iterations, missing workflow-level interactive:, tool-restricted
  MCP nodes)

New references/troubleshooting.md:

- Log location (~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl)
  with jq recipes for common queries (last assistant message, failed
  events, full stream)
- Artifact location for cross-node handoff debugging
- 9 Common Failure Modes, each with root cause + concrete fix:
  - $BASE_BRANCH unresolvable
  - Env-leak gate (5 remediations)
  - Claude/Codex binary not found (compiled-binary-only)
  - "running" forever (AI working / orphan / idle_timeout)
  - Mid-workflow failure and auto-resume semantics
  - Approval gate missing on web UI (workflow-level interactive:)
  - MCP plugin connection noise (filtered by design)
  - Empty $nodeId.output / field access (4 causes)
- Diagnostic command cheat sheet (list, status, isolation list, validate,
  tail-log, --verbose, LOG_LEVEL=debug)
- Escalation protocol (version + validate + log tail + CHANGELOG + issue)

SKILL.md routing table now dispatches "Workflow good practices /
anti-patterns" and "Troubleshoot a failing / stuck workflow" to the new
references so an agent can find them without having to know they exist.
---
 .claude/skills/archon/SKILL.md                |   2 +
 .../archon/references/good-practices.md       | 241 ++++++++++++++++++
 .../archon/references/troubleshooting.md      | 171 +++++++++++++
 3 files changed, 414 insertions(+)
 create mode 100644 .claude/skills/archon/references/good-practices.md
 create mode 100644 .claude/skills/archon/references/troubleshooting.md

diff --git a/.claude/skills/archon/SKILL.md b/.claude/skills/archon/SKILL.md
index 18dda373dc..e60e170d56 100644
--- a/.claude/skills/archon/SKILL.md
+++ b/.claude/skills/archon/SKILL.md
@@ -42,6 +42,8 @@ Determine the user's intent and dispatch to the appropriate guide:
 | **Variable substitution reference** | Read `references/variables.md` |
 | **CLI command reference** | Read `references/cli-commands.md` |
 | **Run an interactive workflow** | Read `references/interactive-workflows.md` — transparent relay protocol |
+| **Workflow good practices / anti-patterns** | Read `references/good-practices.md` — read before designing a non-trivial workflow |
+| **Troubleshoot a failing / stuck workflow** | Read `references/troubleshooting.md` — log locations, common failure modes |
 | **Run a workflow (default)** | Continue with "Running Workflows" below |
 
 If the intent is ambiguous, ask the user to clarify.
diff --git a/.claude/skills/archon/references/good-practices.md b/.claude/skills/archon/references/good-practices.md
new file mode 100644
index 0000000000..34b2d4e403
--- /dev/null
+++ b/.claude/skills/archon/references/good-practices.md
@@ -0,0 +1,241 @@
+# Workflow Good Practices and Anti-Patterns
+
+Guidance for authoring workflows that survive first contact with a real codebase. Written for an agent or human writing their first non-trivial workflow.
+
+## Good Practices
+
+### 1. Use deterministic nodes for deterministic work
+
+AI nodes are expensive, non-reproducible, and can hallucinate. Use `bash:` or `script:` for anything that has a right answer a computer can produce.
+
+- **Run tests** with `bash: "bun run test"`, not `prompt: "run the tests and tell me if they passed"`.
+- **Parse JSON** with `script:` (bun/uv), not a `prompt:` that re-derives structure from free text.
+- **Read files with known paths** via `bash: "cat path/to/file"`; use `Read` in an AI node only where the agent actually needs to reason about the content.
+- **Git state checks** (current branch, uncommitted changes, merge-base) → `bash:`.
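+
+For example, a git state check can run as a `bash:` node and hand its output to an AI node only for the part that needs judgment. This is a minimal sketch; the node ids are made up for illustration:
+
+```yaml
+- id: dirty
+  bash: "git status --porcelain"          # deterministic: lists uncommitted changes
+
+- id: summarize
+  prompt: "Summarize these uncommitted changes: $dirty.output"
+  depends_on: [dirty]
+```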
+
+### 2. Use `output_format` for every node whose output downstream `when:` reads
+
+`when:` conditions do best-effort JSON parsing on `$nodeId.output` for `.field` access. If the upstream node doesn't enforce a shape, you're pattern-matching free-form AI text — fragile.
+
+```yaml
+# GOOD
+- id: classify
+  prompt: "Classify as BUG or FEATURE"
+  output_format:                          # enforces the JSON shape
+    type: object
+    properties:
+      type: { type: string, enum: [BUG, FEATURE] }
+    required: [type]
+
+- id: investigate
+  command: investigate-bug
+  depends_on: [classify]
+  when: "$classify.output.type == 'BUG'"  # safe field access
+
+# BAD
+- id: classify
+  prompt: "Is this a bug or a feature?"
+  # no output_format; AI might reply "it looks like a bug", "BUG", or "This is a bug.\n\n..."
+
+- id: investigate
+  command: investigate-bug
+  depends_on: [classify]
+  when: "$classify.output == 'BUG'"       # fragile string match
+```
+
+### 3. `trigger_rule: none_failed_min_one_success` after conditional branches
+
+After `when:`-gated branches, the downstream merge node will see one or more **skipped** dependencies. A skipped dependency does not count as a success, so the default `all_success` rule fails.
+
+```yaml
+- id: investigate
+  command: investigate-bug
+  depends_on: [classify]
+  when: "$classify.output.type == 'BUG'"
+
+- id: plan
+  command: plan-feature
+  depends_on: [classify]
+  when: "$classify.output.type == 'FEATURE'"
+
+- id: implement
+  command: implement
+  depends_on: [investigate, plan]
+  trigger_rule: none_failed_min_one_success   # CORRECT — exactly one ran
+  # trigger_rule: all_success               ← would fail here (one dep skipped)
+```
+
+Use `one_success` when any dep succeeding is enough; `none_failed_min_one_success` when no dep should have failed AND at least one must have succeeded; `all_done` for "run cleanup regardless" patterns with `cancel:` or notification nodes.
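+
+As a minimal sketch of the `all_done` case, a notification node can hang off the merge node so that it runs whether `implement` succeeded or failed. The `notify` id and the `echo` command are placeholders for illustration, not part of any shipped workflow:
+
+```yaml
+- id: notify
+  bash: "echo 'workflow finished'"    # placeholder for a real notification command
+  depends_on: [implement]
+  trigger_rule: all_done              # run regardless of how implement ended
+```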
+
+### 4. `context: fresh` requires artifacts for state passing
+
+A node with `context: fresh` starts with no memory of prior nodes in the same workflow. The only way state moves is via files. Default is `fresh` for parallel layers and `shared` for sequential — explicit `context: fresh` is common when you want cost isolation.
+
+```yaml
+- id: investigate
+  command: investigate-bug
+  # Investigator WRITES to $ARTIFACTS_DIR/investigation.md
+
+- id: implement
+  command: implement-fix
+  depends_on: [investigate]
+  context: fresh
+  # Implementer MUST read $ARTIFACTS_DIR/investigation.md — it has no memory
+  # of what the investigator found.
+```
+
+Command files that run in a `context: fresh` node should lead with "read artifacts from `$ARTIFACTS_DIR/...`". This is the single biggest quality lever on multi-node workflows.
+
+### 5. Cheap models for glue, strong models for substance
+
+Classification, routing, formatting, and short summaries don't need Opus. Use `model: haiku` for these and reserve `sonnet`/`opus` for the nodes that actually produce code or long-form analysis. Combined with `allowed_tools: []` on pure-text nodes, this cuts cost dramatically.
+
+```yaml
+- id: classify
+  prompt: "Classify this issue"
+  model: haiku              # fast + cheap
+  allowed_tools: []         # no tool overhead
+  output_format: { ... }
+
+- id: implement
+  command: implement-fix
+  model: sonnet             # where the thinking happens
+```
+
+### 6. Write the workflow description for routing
+
+Archon's orchestrator routes user intent to workflows by description. Write descriptions that make routing obvious.
+
+- Start with the imperative action: "Fix a GitHub issue end-to-end", "Generate a Remotion video composition".
+- Mention triggers: "Use when the user asks to review a PR", "Use when there's a failing test run".
+- Mention what it does NOT do: "Does not create a PR — use `archon-plan-to-pr` for that".
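+
+Put together, a description that follows these three points might read as below. This is only a sketch: the top-level `description` key name and the wording are assumptions made for illustration.
+
+```yaml
+description: >-
+  Investigate and fix a GitHub issue on an isolated branch.
+  Use when the user asks to fix a reported bug.
+  Does not create a PR; use archon-plan-to-pr for that.
+```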
+
+### 7. Validate before shipping
+
+Never declare a workflow "done" without:
+
+```bash
+archon validate workflows <name>     # YAML + DAG structure + resource refs
+```
+
+This checks: YAML syntax, node ID uniqueness, no cycles, all `depends_on` exist, all `$nodeId.output` refs point to known nodes, all `command:` files exist, all `mcp:` configs parse, all `skills:` directories exist, provider/model compatibility, named script existence, runtime availability. Fix everything it reports before first run.
+
+For brand-new workflows, also:
+1. Run once against a trivial input (`archon workflow run my-workflow --branch test/sanity "hello"`)
+2. Check the run log at `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl`
+3. Check artifacts at `~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/`
+
+See `references/troubleshooting.md` for how to read those.
+
+### 8. Design the artifact chain before writing command files
+
+In a multi-node workflow, each node's artifact IS the specification for the next node. Before writing any command body, map out:
+
+| Node | Reads | Writes |
+|------|-------|--------|
+| `investigate-issue` | GitHub issue via `gh` | `$ARTIFACTS_DIR/issues/issue-{n}.md` |
+| `implement-issue` | Artifact from `investigate-issue` | Code files, tests |
+| `create-pr` | Git diff | GitHub PR, `$ARTIFACTS_DIR/pr-body.md` |
+
+If a downstream agent can't execute from just its artifact, the artifact is incomplete. This is the single most common failure mode in multi-node workflows.
+
+### 9. Keep workflows reversible
+
+Use `worktree.enabled: true` at the workflow level for anything that modifies the codebase. The CLI `--no-worktree` flag will then hard-error, forcing users into isolation. The cost is a one-time worktree setup; the benefit is that a failed workflow never corrupts a live checkout.
+
+For read-only workflows (triage, reporting, code analysis), pin `worktree.enabled: false` instead — saves the worktree setup cost.
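+
+A sketch of the code-changing case, assuming the dotted `worktree.enabled` setting is written as a nested `worktree:` mapping at the top of the workflow file (the node shown is a placeholder). A read-only workflow would set `enabled: false` in the same place.
+
+```yaml
+# Workflow-level setting: every run gets its own worktree
+worktree:
+  enabled: true
+
+nodes:
+  - id: implement
+    command: implement-fix
+```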
+
+---
+
+## Anti-Patterns
+
+### ❌ Asking AI to run deterministic checks
+
+```yaml
+# BAD
+- id: test
+  prompt: "Run bun run test and tell me if it passed"
+
+# GOOD
+- id: test
+  bash: "bun run test 2>&1"
+
+- id: react-to-tests
+  prompt: "Fix any failures: $test.output"
+  depends_on: [test]
+  trigger_rule: all_done            # run even if tests failed
+```
+
+### ❌ Pattern-matching free-form AI output in `when:`
+
+```yaml
+# BAD — brittle
+- id: decide
+  prompt: "Should we proceed? Answer yes or no."
+- id: do-thing
+  depends_on: [decide]
+  when: "$decide.output == 'yes'"    # AI says "Yes!" or "Yes, because..." — no match
+
+# GOOD
+- id: decide
+  prompt: "Should we proceed?"
+  output_format:
+    type: object
+    properties: { proceed: { type: boolean } }
+    required: [proceed]
+- id: do-thing
+  depends_on: [decide]
+  when: "$decide.output.proceed == 'true'"
+```
+
+### ❌ Commands that assume prior-node memory in a `context: fresh` chain
+
+```markdown
+<!-- BAD — implement.md -->
+Fix the bug we discussed in the investigation phase.
+
+<!-- GOOD — implement.md -->
+Read the investigation at `$ARTIFACTS_DIR/issues/issue-{n}.md`.
+Extract the root cause, affected files, and implementation plan.
+Implement the changes exactly as specified in the plan.
+```
+
+### ❌ Long flat layers of AI nodes
+
+Ten sibling `prompt:` nodes in one layer all depending on one upstream is a $N/run cost bomb and a latency trap. If the work is parallel and similar, use the `agents:` inline sub-agent map-reduce pattern with a cheap model per item and a single stronger reducer. See `references/dag-advanced.md` and the docs site's Inline sub-agents section.
+
+### ❌ Hardcoding secrets in YAML or MCP configs
+
+Use `$ENV_VAR` expansion in MCP configs and the `env:` block in `.archon/config.yaml` (or Web UI Settings → Projects → Env Vars). See `references/repo-init.md` §Per-Project Env Injection.
+
+### ❌ `retry` on a loop node
+
+Loop nodes manage their own iteration via `max_iterations`. Setting `retry:` on a loop is a **hard parse error** — the workflow fails to load. If a loop iteration is flaky, handle it inside the loop prompt (the AI can retry tool calls) or use `until_bash` to gate completion on a deterministic check.
+
+### ❌ Tiny `max_iterations` on open-ended loops
+
+A loop with `max_iterations: 3` that's supposed to implement N stories from a PRD will silently stop after 3 iterations and leave the work half-done. Think about the worst case — multi-story PRDs need 10–20, fix-iterate cycles need 5–8, refinement loops need 3–5.
+
+### ❌ Missing `interactive: true` at workflow level for approval/loop gates on web
+
+Web UI dispatches non-interactive workflows to a background worker that cannot deliver chat messages. Approval-gate messages and loop `gate_message` prompts will never reach the user. If the workflow has `approval:` nodes OR `loop.interactive: true`, set workflow-level `interactive: true`.
+
+### ❌ Tool-restricted nodes without the MCP wildcard
+
+```yaml
+# BAD: built-in tools added by hand when the node only needs MCP
+- id: analyze
+  prompt: "Use the Postgres MCP to query users"
+  mcp: .archon/mcp/postgres.json
+  allowed_tools: [Read, Write, Bash]   # unnecessary; widens the tool surface
+
+# GOOD: Archon auto-adds mcp__<server>__* wildcards when mcp: is set,
+# so an empty allowed_tools list disables the built-in tools while the
+# MCP tools stay available.
+- id: analyze
+  prompt: "Use Postgres MCP to query users"
+  mcp: .archon/mcp/postgres.json
+  allowed_tools: []          # correct: MCP tools auto-attached
+```
+
+Caveat: this only helps Claude. Codex gets MCP config from `~/.codex/config.toml` globally, not per-node.
diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
new file mode 100644
index 0000000000..7405f57a9b
--- /dev/null
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -0,0 +1,171 @@
+# Troubleshooting Workflows
+
+Where to look when a workflow fails, hangs, or does the wrong thing.
+
+## Log Locations
+
+Workflow run logs are written as JSONL per run:
+
+```
+~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl
+```
+
+Each line is a structured event. Common event types:
+
+| Event | Meaning |
+|-------|---------|
+| `workflow_started` / `workflow_completed` / `workflow_failed` | Run lifecycle |
+| `node_started` / `node_completed` / `node_failed` / `node_skipped` | Node lifecycle |
+| `assistant` | AI assistant message (has `content` field with the full AI output) |
+| `tool_use` / `tool_result` | SDK tool call + result |
+| `retry_attempt` | Node retry with attempt number and reason |
+| `loop_iteration_started` / `loop_iteration_completed` | Loop bookkeeping |
+
+Find the run ID from `archon workflow status` or `archon workflow list` (most recent run). Then:
+
+```bash
+# Last assistant message (what the AI said before failure)
+jq 'select(.type == "assistant") | .content' <log-file> | tail -1
+
+# All failed events
+jq 'select(.event == "node_failed" or .event == "workflow_failed")' <log-file>
+
+# Full event stream
+cat <log-file> | jq .
+```
+
+Adapter logs (Slack / Telegram / Web / GitHub) are emitted to stderr when `LOG_LEVEL=debug` is set on the server.
+
+## Artifact Locations
+
+```
+~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/
+```
+
+Inspect artifacts when a multi-node workflow produces wrong output. The failing node's upstream artifact is usually where the problem originated.
+
+```bash
+ls ~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/
+cat ~/.archon/workspaces/<owner>/<repo>/artifacts/runs/<run-id>/issues/issue-42.md
+```
+
+Artifacts are **external** to the repo on purpose — they don't pollute git.
+
+## Common Failure Modes
+
+### "No base branch could be resolved"
+
+A node references `$BASE_BRANCH` in its prompt, but neither git auto-detection nor `worktree.baseBranch` in `.archon/config.yaml` produced a branch.
+
+**Fix:**
+1. Set `worktree.baseBranch: main` (or `dev`, or whatever) in `.archon/config.yaml`.
+2. Or pass `--from <branch>` on `archon workflow run`.
+3. Or remove the `$BASE_BRANCH` reference if the node doesn't actually need it.
+
+### "Cannot register: codebase has sensitive env keys"
+
+The env-leak gate blocked the workflow because `<cwd>/.env` contains keys like `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`.
+
+**Remediations (any one):**
+1. Remove the key from `<cwd>/.env`.
+2. Rename the file to `.env.secrets` — auto-loading no longer applies.
+3. Web UI: Settings → Projects → flip "Allow env keys" on.
+4. CLI: rerun with `archon workflow run --allow-env-keys ...`.
+5. Machine-wide bypass: `allow_target_repo_keys: true` in `~/.archon/config.yaml`.
+
+### "Claude Code not found" / "Codex CLI binary not found"
+
+Compiled-binary builds of Archon no longer embed Claude Code / Codex — you install them separately and Archon resolves the binary via env var or config.
+
+**Fix (Claude):**
+- Install: `curl -fsSL https://claude.ai/install.sh | bash` (or `npm install -g @anthropic-ai/claude-code`)
+- Set `CLAUDE_BIN_PATH=/path/to/claude` in `~/.archon/.env`, OR
+- Set `assistants.claude.claudeBinaryPath: /absolute/path` in `.archon/config.yaml`
+- Autodetect covers `$HOME/.local/bin/claude` (native installer) — no config needed if you used that path
+
+**Fix (Codex):**
+- Install: `npm install -g @openai/codex` (or platform-specific instructions)
+- Set `CODEX_CLI_PATH=/path/to/codex` or `assistants.codex.codexBinaryPath` in config
+- Autodetect covers the standard npm / Homebrew locations per platform
+
+See the Install page on the docs site for full platform-specific install paths.
+
+### Workflow shows `running` for a long time but nothing happens
+
+Three possibilities:
+
+1. **The AI is actually working.** Check `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl` — if you see recent `tool_use` events, it's fine. Wait.
+2. **The server crashed and left an orphan row.** Server startup no longer auto-fails orphaned `running` rows (per the "No Autonomous Lifecycle Mutation" rule — `CLAUDE.md`). Transition it manually:
+   - Web UI: Dashboard → Abandon or Cancel button on the run card
+   - CLI: `archon workflow abandon <run-id>` (no subprocess kill, for orphans) or `archon workflow cancel <run-id>` (with subprocess kill, for stuck live runs)
+3. **A node is past its `idle_timeout`.** The default is 5 minutes. Override with per-node `idle_timeout: 600000` (10 min) for long-running nodes.
+
+### Workflow fails mid-way; how do I resume?
+
+Auto-resume is default — just re-invoke the same workflow at the same cwd:
+
+```bash
+archon workflow run my-workflow "original message"
+# → "Resuming workflow — skipping N already-completed node(s)"
+```
+
+Use `--resume` only when you want to force-reuse the same worktree from a specific failed run. Use `archon workflow resume <run-id>` to force a specific run ID.
+
+**Caveat:** AI session context from prior nodes is NOT restored on resume. If a `context: shared` node depended on in-session memory, re-running it will have fresh context. Artifact-based handoff survives; in-context memory does not.
+
+### Approval gate not appearing on web UI
+
+You set `interactive: true` on the approval node but the workflow still runs in the background and no chat message appears.
+
+**Fix:** Set `interactive: true` at the **workflow level** too. Node-level `interactive` is ignored on web without workflow-level `interactive`. See `references/workflow-dag.md` §Approval Nodes and §Interactive Loops.
+
+### `MCP server connection failed: <plugin>` noise in chat
+
+User-level Claude plugin MCPs (e.g. `telegram`, `notion`) inherited from `~/.claude/` fail to connect in the headless subprocess. This is normal — they're not configured for Archon's worktree context. Archon filters these to debug logs (`dag.mcp_plugin_connection_suppressed`) and surfaces only workflow-configured MCP failures.
+
+If you see a failure for an MCP you DID configure via `mcp:` in the workflow: check the config JSON path, the MCP server's `command`/`args`, and any referenced env vars.
+
+### Node output is empty / `$nodeId.output.field` resolves to empty string
+
+Common causes:
+
+1. Upstream node is an AI node without `output_format` — the output is free-form text, JSON parsing fails, field access returns empty.
+2. Upstream node was **skipped** (its `when:` evaluated false). Downstream `when:` with `==` comparisons against a specific value will fail-closed.
+3. Bash/script node printed to stderr, not stdout. Only stdout is captured.
+4. For script nodes, non-zero exit on a non-existent file / missing import silently drops the output. Check the run log for `node_failed`.
+
+## Useful Diagnostic Commands
+
+```bash
+# What ran recently and how did each run end
+archon workflow list --json | jq '.workflows[] | select(.runs)'
+
+# Current status of any active runs
+archon workflow status
+
+# Active worktrees and their last activity
+archon isolation list
+
+# Validate a specific workflow before running
+archon validate workflows my-workflow
+
+# Validate a specific command
+archon validate commands my-command
+
+# Dump the last 50 lines of a workflow's log
+tail -n 50 ~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl | jq .
+
+# Increase log verbosity (workflow run)
+archon workflow run my-workflow --verbose "..."
+
+# Increase server log verbosity
+LOG_LEVEL=debug bun run start
+```
+
+## Escalation: when nothing makes sense
+
+1. Run `archon version` and note the version.
+2. Run `archon validate workflows <name>` and capture the output.
+3. Grab the last ~50 lines of the run's JSONL log.
+4. Check `CHANGELOG.md` for known issues / recent changes to the subsystem you're hitting.
+5. File an issue at https://github.com/coleam00/Archon/issues with version, validate output, log tail, and the YAML.

From 1380ffc60e26797c22a9f30ac1e8a2ca3c977292 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Wed, 22 Apr 2026 17:24:28 +0300
Subject: [PATCH 09/14] docs(book): update node-types coverage from four to all
 seven
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The book is the curated first-contact reading path (landing page → "Get
Started" → /book/). Both dag-workflows.md and quick-reference.md were
stuck on "four node types" — missing script, approval, and cancel. A user
reading the book as their first introduction would form an incomplete
mental model, then find three more node types in the reference section
later with no explanation of when they arrived.

book/dag-workflows.md:
- "four node types" → "seven node types. Exactly one mode field is
  required per node"
- Table now lists Command, Prompt, Bash, Script, Loop, Approval, Cancel
  with one-line "when to use" for each, and cross-links to the dedicated
  guide pages for Script / Loop / Approval
- New sections below the table for Script (inline + named examples with
  runtime and deps), Approval (with the interactive: true workflow-level
  note that's easy to miss), and Cancel (guarded-exit pattern) — keeping
  the existing narrative shape for Bash and Loop

book/quick-reference.md:
- Node Options table now includes script, approval, cancel rows
- agents row added (inline sub-agents, Claude-only)
- New "Script-specific fields" and "Approval-specific fields" subsections
  so the cheat-sheet is actually complete rather than pointing users
  elsewhere for the required constraints
- Retry row callout that loop nodes hard-error on retry — previously
  omitted
- bash timeout note widened to cover script timeout (same semantics)

Both files are docs-web content; the CI build on the docs-script-nodes
PR (#1362) previously validated the Starlight build path with a similar
table addition, so this should render cleanly.
---
 .../src/content/docs/book/dag-workflows.md    | 52 +++++++++++++++++--
 .../src/content/docs/book/quick-reference.md  | 28 ++++++++--
 2 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/packages/docs-web/src/content/docs/book/dag-workflows.md b/packages/docs-web/src/content/docs/book/dag-workflows.md
index 93bf766872..558df2590f 100644
--- a/packages/docs-web/src/content/docs/book/dag-workflows.md
+++ b/packages/docs-web/src/content/docs/book/dag-workflows.md
@@ -230,23 +230,23 @@ The classify-and-route example uses `none_failed_min_one_success` on `implement`
 
 ## Node Types
 
-Archon supports seven node types:
+Archon supports seven node types. Exactly one mode field is required per node:
 
 | Type | Syntax | When to use |
 |------|--------|-------------|
 | **Command** | `command: my-command` | Load a command from `.archon/commands/my-command.md`. The standard choice. |
 | **Prompt** | `prompt: "inline instructions..."` | Quick, one-off instructions that don't need a reusable command file. |
 | **Bash** | `bash: "shell command"` | Run a shell script without AI. Stdout is captured as `$nodeId.output`. Deterministic operations only. |
-| **Script** | `script: "..." runtime: bun\|uv` | TypeScript (via bun) or Python (via uv) — deterministic typed transforms where bash would need fragile quoting. Stdout is captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/). |
+| **Script** | `script: "..."` + `runtime: bun \| uv` | Run TypeScript/JavaScript (bun) or Python (uv) without AI. Inline code or named reference to `.archon/scripts/`. Stdout captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/). |
 | **Loop** | `loop: { prompt: "...", until: SIGNAL }` | Repeat an AI prompt until a completion signal appears in the output. See [Loop Nodes](/guides/loop-nodes/). |
-| **Approval** | `approval: { message: "..." }` | Pause the run for human review before continuing. See [Approval Nodes](/guides/approval-nodes/). |
-| **Cancel** | `cancel: "reason string"` | Terminate the run with a reason (useful as a `when:`-gated branch for safety checks). |
+| **Approval** | `approval: { message: "..." }` | Pause the workflow for a human approve/reject decision. See [Approval Nodes](/guides/approval-nodes/). |
+| **Cancel** | `cancel: "reason string"` | Terminate the workflow run (status: cancelled, not failed). Usually gated with `when:`. |
 
 **Command** is the most common. Use it for anything you'll reuse across workflows.
 
 **Prompt** is convenient for glue nodes — summarizing outputs, formatting data — where the logic is simple and workflow-specific.
 
-**Bash** is powerful for deterministic operations: running tests, checking git status, reading a file, fetching an API. The AI doesn't run the bash command; your shell does. The output becomes a variable for downstream nodes:
+**Bash** is powerful for deterministic shell operations: running tests, checking git status, reading a file, fetching an API. The AI doesn't run the bash command; your shell does. The output becomes a variable for downstream nodes:
 
 ```yaml
 - id: check-tests
@@ -258,6 +258,22 @@ Archon supports seven node types:
   prompt: "Test output: $check-tests.output\n\nFix any failures."
 ```
 
+**Script** is for deterministic work that needs a real programming language — parsing JSON, transforming data between AI nodes, calling typed HTTP clients. Use `runtime: bun` for TypeScript/JavaScript and `runtime: uv` for Python:
+
+```yaml
+- id: transform
+  script: |
+    const raw = process.env.UPSTREAM ?? '{}';
+    const items = JSON.parse(raw).items ?? [];
+    console.log(JSON.stringify({ count: items.length }));
+  runtime: bun
+
+- id: analyze
+  script: analyze-metrics        # Named script: .archon/scripts/analyze-metrics.py
+  runtime: uv
+  deps: ["pandas>=2.0"]          # uv-only; bun auto-installs imports
+```
+
 **Loop** is for iterative tasks where you don't know how many steps it will take. The AI runs until it emits a completion signal:
 
 ```yaml
@@ -272,6 +288,32 @@ Archon supports seven node types:
     fresh_context: true
 ```
 
+**Approval** pauses the workflow for human review. The downstream nodes don't run until the user approves in chat, CLI, or web UI:
+
+```yaml
+interactive: true                 # required at workflow level for web UI delivery
+
+nodes:
+  - id: plan
+    command: plan-feature
+  - id: review-gate
+    approval:
+      message: "Review the plan above."
+    depends_on: [plan]
+  - id: implement
+    command: implement
+    depends_on: [review-gate]
+```
+
+**Cancel** terminates the workflow with a reason string. Pair with `when:` for guarded exits — the run shows as `cancelled` rather than `failed`:
+
+```yaml
+- id: gate-branch
+  cancel: "Refusing to run on main — this workflow modifies files."
+  when: "$check-branch.output == 'main'"
+  depends_on: [check-branch]
+```
+
 ---
 
 ## Best Practices
diff --git a/packages/docs-web/src/content/docs/book/quick-reference.md b/packages/docs-web/src/content/docs/book/quick-reference.md
index 2c3123acdd..a0c34643c3 100644
--- a/packages/docs-web/src/content/docs/book/quick-reference.md
+++ b/packages/docs-web/src/content/docs/book/quick-reference.md
@@ -124,10 +124,10 @@ All nodes share these base fields:
 | `command` | One of | string | Name of a command file in `.archon/commands/` |
 | `prompt` | One of | string | Inline AI instructions |
 | `bash` | One of | string | Shell script (runs without AI; stdout captured as `$nodeId.output`) |
-| `script` | One of | string | TypeScript/JS (via bun) or Python (via uv); requires `runtime:` (`bun` or `uv`); optional `deps:` (uv only) and `timeout:` (ms). Stdout captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/) |
+| `script` | One of | string | TypeScript/JavaScript (bun) or Python (uv) — inline or named ref to `.archon/scripts/`. Requires `runtime`. See [Script Nodes](/guides/script-nodes/) |
 | `loop` | One of | object | Loop configuration (see Loop Options below) |
-| `approval` | One of | object | Human-review gate; pauses the run until approved or rejected. See [Approval Nodes](/guides/approval-nodes/) |
-| `cancel` | One of | string | Terminates the run with the given reason string |
+| `approval` | One of | object | Pause for human review; see [Approval Nodes](/guides/approval-nodes/) |
+| `cancel` | One of | string | Reason string; terminates the run with `cancelled` status (not `failed`). Usually gated with `when:` |
 | `depends_on` | No | string[] | Node IDs that must complete before this node runs |
 | `when` | No | string | Condition expression; node is skipped if false |
 | `trigger_rule` | No | string | Join semantics when multiple upstreams exist (see Trigger Rules) |
@@ -138,12 +138,30 @@ All nodes share these base fields:
 | `allowed_tools` | No | string[] | Restrict available tools to this list (Claude only) |
 | `denied_tools` | No | string[] | Remove specific tools from this node's context (Claude only) |
 | `idle_timeout` | No | number | Per-node idle timeout in milliseconds (default: 5 minutes) |
-| `retry` | No | object | Retry configuration for transient failures (see Retry Options) |
+| `retry` | No | object | Retry configuration for transient failures (see Retry Options). **Hard error on loop nodes** |
 | `hooks` | No | object | SDK hook callbacks (Claude only; see Hook Schema) |
 | `mcp` | No | string | Path to MCP server config JSON file (Claude only) |
 | `skills` | No | string[] | Skill names to preload into this node's context (Claude only) |
+| `agents` | No | object | Inline sub-agent definitions keyed by kebab-case ID. Claude only |
 
-> **bash node timeout**: The `timeout` field on bash nodes is in **milliseconds** (default: 120000). This differs from hook `timeout`, which is in seconds.
+**Script-specific fields** (apply when `script:` is set):
+
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `runtime` | Yes | `'bun'` \| `'uv'` | Which runtime executes the script. Must match file extension for named scripts (`.ts`/`.js` → bun, `.py` → uv) |
+| `deps` | No | string[] | Python dependencies for `uv run --with`. Ignored for bun (bun auto-installs) |
+| `timeout` | No | number | Hard kill in ms. Default: 120000 (2 min). Same semantics as `bash` timeout |
+
+**Approval-specific fields** (apply when `approval:` is set):
+
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `approval.message` | Yes | string | The message shown to the user when the workflow pauses |
+| `approval.capture_response` | No | boolean | `true` = user's comment becomes `$<node-id>.output`. Default: `false` |
+| `approval.on_reject.prompt` | No | string | AI rework prompt when the user rejects. `$REJECTION_REASON` substituted |
+| `approval.on_reject.max_attempts` | No | number | Max rework iterations before cancel. Range 1-10, default 3 |
+
+> **bash and script node timeout**: The `timeout` field is in **milliseconds** (default: 120000). This differs from hook `timeout`, which is in seconds.
 
 ### Trigger Rules
 

From cdf6fa2343f121a67f26e1a453c7da016282d179 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Fri, 24 Apr 2026 11:07:40 +0300
Subject: [PATCH 10/14] fix(skill/cli): remove nonexistent `archon workflow
 cancel`, fix workflow status jq recipe
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two accuracy issues from the PR code-reviewer (comment 4311243858).

C1: `archon workflow cancel <run-id>` does NOT exist as a CLI subcommand.
The switch at packages/cli/src/cli.ts:318-485 dispatches on list / run /
status / resume / abandon / approve / reject / cleanup / event; running
`archon workflow cancel` hits the default case and exits with "Unknown
workflow subcommand: cancel" (cli.ts:478-484). Active cancellation is
only available via:
  - /workflow cancel <run-id> chat slash command (all platforms)
  - Cancel button on the Web UI dashboard
  - POST /api/workflows/runs/{runId}/cancel REST endpoint

cli-commands.md: removed the `### archon workflow cancel <run-id>`
subsection; kept the `abandon` subsection but made it explicit that
abandon does NOT kill a subprocess. Added a call-out box at the bottom
of the abandon section explaining where to go for actual cancellation.

troubleshooting.md "running forever" section: split the original
cancel-vs-abandon advice into three bullets: Web UI / CLI abandon (for
orphans, no subprocess kill) / chat `/workflow cancel` (for live runs
that need interruption). Added an explicit "there is no archon workflow
cancel CLI subcommand" parenthetical since the wrong command was being
suggested inline.

I1: the `archon workflow list --json` diagnostic used an incorrect jq
filter. workflow list's --json output (workflow.ts:185-219) has shape
{ workflows: [{ name, description, provider?, model?, ... }], errors: [...] }
with no `runs` field, so `jq '.workflows[] | select(.runs)'` returns empty
unconditionally. Replaced with `archon workflow status --json | jq '.runs[]'`,
which matches the actual shape of workflowStatusCommand at
workflow.ts:852+ ({ runs: WorkflowRun[] }). Also tightened the narration
to distinguish JSON from human-readable status output.

No change to the commit history in this PR — these are follow-up fixes
to claims I introduced in earlier commits of this branch (f10b989e for
C1, 66d2b86e for I1).
---
 .claude/skills/archon/references/cli-commands.md   | 14 +++-----------
 .../skills/archon/references/troubleshooting.md    |  9 +++++----
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/.claude/skills/archon/references/cli-commands.md b/.claude/skills/archon/references/cli-commands.md
index 5a10c01aa3..4555ee0a0a 100644
--- a/.claude/skills/archon/references/cli-commands.md
+++ b/.claude/skills/archon/references/cli-commands.md
@@ -76,24 +76,16 @@ archon workflow reject abc123 --reason "Plan misses test coverage"
 archon workflow reject abc123 "Plan misses test coverage"
 ```
 
-### `archon workflow cancel <run-id>`
-
-Cancel a running or paused workflow. Terminates in-flight subprocesses.
-
-```bash
-archon workflow cancel abc123
-```
-
-Different from `abandon`: `cancel` actively terminates; `abandon` marks a row as cancelled without killing any subprocess (use when the subprocess is already gone, e.g. server crash).
-
 ### `archon workflow abandon <run-id>`
 
-Mark a non-terminal workflow run as cancelled without terminating a subprocess. Use when a `running` row is stuck after a server crash or when you want to discard a paused run without rejecting.
+Mark a non-terminal workflow run as cancelled. Use when a `running` row is stuck after a server crash or when you want to discard a paused run without rejecting. This does NOT kill an in-flight subprocess — it only transitions the DB row.
 
 ```bash
 archon workflow abandon abc123
 ```
 
+> **There is no `archon workflow cancel` CLI subcommand.** To actively cancel a running workflow (terminate its subprocess), use the chat slash command `/workflow cancel <run-id>` on the platform that started it (Web UI, Slack, Telegram, etc.), or the Cancel button on the Web UI dashboard. The CLI only offers `abandon`, which is the right tool for orphan cleanup but does not interrupt a live subprocess.
+
 ### `archon workflow resume <run-id> [message]`
 
 Explicitly re-run a failed run. Most workflows auto-resume without this — use it when you want to force a specific run ID.
diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
index 7405f57a9b..23c00f1c9e 100644
--- a/.claude/skills/archon/references/troubleshooting.md
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -97,7 +97,8 @@ Three possibilities:
 1. **The AI is actually working.** Check `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl` — if you see recent `tool_use` events, it's fine. Wait.
 2. **The server crashed and left an orphan row.** Server startup no longer auto-fails orphaned `running` rows (per the "No Autonomous Lifecycle Mutation" rule — `CLAUDE.md`). Transition it manually:
    - Web UI: Dashboard → Abandon or Cancel button on the run card
-   - CLI: `archon workflow abandon <run-id>` (no subprocess kill, for orphans) or `archon workflow cancel <run-id>` (with subprocess kill, for stuck live runs)
+   - CLI: `archon workflow abandon <run-id>` — marks the DB row cancelled without killing any subprocess. Right tool for orphans since the subprocess is already gone
+   - Chat (Slack / Telegram / Web): `/workflow cancel <run-id>` — actively terminates the subprocess. Use for a still-live run that needs to be interrupted (there is no `archon workflow cancel` CLI subcommand)
 3. **A node is past its `idle_timeout`.** The default is 5 minutes. Override with per-node `idle_timeout: 600000` (10 min) for long-running nodes.
 
 ### Workflow fails mid-way; how do I resume?
@@ -137,10 +138,10 @@ Common causes:
 ## Useful Diagnostic Commands
 
 ```bash
-# What ran recently and how did each run end
-archon workflow list --json | jq '.workflows[] | select(.runs)'
+# All active runs as JSON (running / paused / recently finished, depending on retention)
+archon workflow status --json | jq '.runs[]'
 
-# Current status of any active runs
+# Human-readable status of any active runs
 archon workflow status
 
 # Active worktrees and their last activity

From 59b149f99e68c78665d4568d1e43a59f782061f4 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Fri, 24 Apr 2026 11:09:13 +0300
Subject: [PATCH 11/14] fix(skill): remove env-leak gate references (feature
 was removed in provider extraction)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

C2 from the PR code-reviewer (comment 4311243858). The pre-spawn env-leak
gate was removed from the codebase during the provider-extraction refactor
— see TODO(#1135) at packages/providers/src/claude/provider.ts:908. Zero
hits for --allow-env-keys / allowEnvKeys / allow_env_keys / allow_target_repo_keys
across packages/. The CLI's parseArgs (cli.ts:182-208) has no
--allow-env-keys option, and because parseArgs uses strict: false, an
unknown --allow-env-keys would be silently ignored rather than rejected.

What remains accurate and is NOT touched:
- Three-Path Env Model section (user/repo archon-owned envs are loaded;
  target repo <cwd>/.env keys are stripped from process.env at boot)
  still correctly describes current behavior, grounded in
  packages/paths/src/strip-cwd-env.ts + env-integration.test.ts
- Per-Project Env Injection section (Option 1: .archon/config.yaml env:
  block; Option 2: Web UI Settings → Projects → Env Vars) is unchanged —
  both remain the sanctioned way to get env vars into subprocesses

Removed claims (all three files):
- cli-commands.md: --allow-env-keys flag row in the workflow run flags
  table
- repo-init.md: the "Env-leak gate" subsection at the end of Per-Project
  Env Injection listing 5 remediations (all of which reference UI/CLI/
  config surfaces that don't exist). Replaced with a succinct callout
  that explains the actual current behavior — target repo .env keys are
  stripped, workflows that need those values should use managed
  injection — so the reader still gets the "where to put my env vars"
  answer
- troubleshooting.md: the "Cannot register: codebase has sensitive env
  keys" section (error message that can no longer be emitted)

If the env-leak gate is ever resurrected per TODO(#1135), the docs can be
re-added then. The CHANGELOG v0.3.0 entry describing the gate is a
historical record of past behavior and does not need to be rewritten.
---
 .claude/skills/archon/references/cli-commands.md    |  1 -
 .claude/skills/archon/references/repo-init.md       | 10 +---------
 .claude/skills/archon/references/troubleshooting.md | 11 -----------
 3 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/.claude/skills/archon/references/cli-commands.md b/.claude/skills/archon/references/cli-commands.md
index 4555ee0a0a..0cc1a0ee06 100644
--- a/.claude/skills/archon/references/cli-commands.md
+++ b/.claude/skills/archon/references/cli-commands.md
@@ -33,7 +33,6 @@ archon workflow run archon-fix-github-issue --resume
 | `--from <name>` / `--from-branch <name>` | Start-point branch for new worktree (default: repo default branch) |
 | `--no-worktree` | Skip isolation — run in the live checkout |
 | `--resume` | Resume the last failed run of this workflow at this cwd (skips completed nodes) |
-| `--allow-env-keys` | Grant env-leak gate consent during auto-registration. Use when the repo's `.env` has sensitive keys (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) and you've confirmed they should be allowed for this codebase. Audit-logged as `env_leak_consent_granted` |
 | `--cwd <path>` | Working directory override |
 
 **Flag conflicts** (errors):
diff --git a/.claude/skills/archon/references/repo-init.md b/.claude/skills/archon/references/repo-init.md
index 14005c68b6..6923147fb9 100644
--- a/.claude/skills/archon/references/repo-init.md
+++ b/.claude/skills/archon/references/repo-init.md
@@ -117,15 +117,7 @@ env:
 
 Both surfaces inject into: Claude/Codex/Pi subprocess env, `bash:` node subprocess env, `script:` node subprocess env, and direct chat messages that run against the codebase. The worktree isolation layer propagates them as well.
 
-**Env-leak gate** — when a codebase's auto-loaded `<cwd>/.env` contains sensitive keys (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, and 5 others), Archon refuses to register or spawn into that codebase. Remediations (any one):
-
-1. Remove the key from the target `.env`
-2. Rename `.env` → `.env.secrets` (changes the auto-load behavior)
-3. Web UI: Settings → Projects → flip "Allow env keys" to on
-4. CLI: `archon workflow run --allow-env-keys ...`
-5. Global bypass: `allow_target_repo_keys: true` in `~/.archon/config.yaml`
-
-Full details in `reference/security.md` on the docs site.
+> **About keys in the target repo's `<cwd>/.env`**: Archon unconditionally strips the keys auto-loaded from `<cwd>/.env` out of `process.env` at boot (see the Three-Path Env Model above) and the Bun subprocess is invoked with `--no-env-file`, so those values do NOT reach AI / bash / script subprocesses. If a workflow needs a value that currently lives in the target repo's `.env`, surface it through one of the two managed injection options above — don't expect the target `.env` to leak through.
 
 ## Global Configuration
 
diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
index 23c00f1c9e..66e32eeb07 100644
--- a/.claude/skills/archon/references/troubleshooting.md
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -62,17 +62,6 @@ A node references `$BASE_BRANCH` in its prompt, but neither git auto-detection n
 2. Or pass `--from <branch>` on `archon workflow run`.
 3. Or remove the `$BASE_BRANCH` reference if the node doesn't actually need it.
 
-### "Cannot register: codebase has sensitive env keys"
-
-The env-leak gate blocked the workflow because `<cwd>/.env` contains keys like `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`.
-
-**Remediations (any one):**
-1. Remove the key from `<cwd>/.env`.
-2. Rename the file to `.env.secrets` — auto-loading no longer applies.
-3. Web UI: Settings → Projects → flip "Allow env keys" on.
-4. CLI: rerun with `archon workflow run --allow-env-keys ...`.
-5. Machine-wide bypass: `allow_target_repo_keys: true` in `~/.archon/config.yaml`.
-
 ### "Claude Code not found" / "Codex CLI binary not found"
 
 Compiled-binary builds of Archon no longer embed Claude Code / Codex — you install them separately and Archon resolves the binary via env var or config.

From 2c1a9c7bebd68b5b86993331aae3d1d5332773d8 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Fri, 24 Apr 2026 11:10:06 +0300
Subject: [PATCH 12/14] fix(skill/troubleshooting): correct JSONL event type
 names and field name
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

C3 from the PR code-reviewer (comment 4311243858). The troubleshooting
reference's event-types table used _started / _completed / _failed
suffixes, but packages/workflows/src/logger.ts:19-30 shows the actual
WorkflowEvent.type enum is:

  workflow_start | workflow_complete | workflow_error |
  assistant | tool | validation |
  node_start | node_complete | node_skipped | node_error

The second jq recipe also queried `.event` but the discriminator is `.type`.

Fixes:
- Event table: renamed the event types (_started → _start, _completed →
  _complete, _failed → _error). Explicitly called out the field name as
  `type` so the reader knows what jq selector to use
- Replaced the "tool_use / tool_result" row with a single `tool` row and
  listed its actual payload fields (tool_name, tool_input, duration_ms,
  tokens) — tool_use/tool_result are SDK message kinds that appear within
  the AI stream, not top-level log event types
- Added a `validation` row (was missing; it's emitted by workflow-level
  validation calls with `check` and `result` fields)
- Removed `retry_attempt` row — this event type is not emitted to the
  JSONL file. Retry bookkeeping goes through pino logs, not the workflow
  log file
- Added an explicit callout that loop_iteration_started /
  loop_iteration_completed (and other emitter-only events) go through
  the workflow event emitter + DB workflow_events table, NOT the JSONL
  file. Pointed readers to the DB or Web UI for loop-level detail. This
  distinguishes the two parallel event systems — easy to conflate
  (store.ts:11-17 uses _started/_completed/_failed for the DB side,
  logger.ts uses _start/_complete/_error for JSONL)
- Fixed the "all failed events" jq recipe: .event → .type and _failed → _error
- Minor cleanup: the inline "tool_use events" mention in the "running
  forever" section said the wrong event name — updated to "tool or
  assistant events in the tail"

Grounded in packages/workflows/src/logger.ts (canonical JSONL event
shape) and packages/workflows/src/store.ts (the parallel DB event
naming, which the reviewer correctly flagged as different and worth
keeping distinct).
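
For illustration, the corrected selector logic can be sketched in
TypeScript. This is a minimal sketch, not the logger's code: the two
error type names come from the enum quoted above, while
`selectErrorEvents`, the `node` field, and the sample lines are
hypothetical.

```typescript
// Hypothetical event shape: only the `type` discriminator is taken from
// this commit message; any other field is illustrative.
type JsonlEvent = { type: string; [key: string]: unknown };

// JSONL-side error names (logger.ts naming, not the DB-side _failed names).
const ERROR_TYPES = new Set(["node_error", "workflow_error"]);

// Parse a JSONL string and keep only the error events, mirroring the
// corrected jq recipe that selects on `.type` rather than `.event`.
function selectErrorEvents(jsonl: string): JsonlEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as JsonlEvent)
    .filter((event) => ERROR_TYPES.has(event.type));
}

// Hypothetical sample log; the `node` field is an assumption.
const sampleLog = [
  '{"type":"workflow_start"}',
  '{"type":"node_start","node":"classify"}',
  '{"type":"node_error","node":"classify"}',
  '{"type":"workflow_error"}',
].join("\n");

console.log(selectErrorEvents(sampleLog).map((event) => event.type));
```

A filter written against the DB-side names (`node_failed`,
`workflow_failed`) would return nothing for this sample, which is the
failure mode the old recipe had.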
---
 .../archon/references/troubleshooting.md      | 27 ++++++++++---------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
index 66e32eeb07..3c4b46ac72 100644
--- a/.claude/skills/archon/references/troubleshooting.md
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -10,25 +10,26 @@ Workflow run logs are written as JSONL per run:
 ~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl
 ```
 
-Each line is a structured event. Common event types:
+Each line is a structured event. The discriminator is the `type` field. Values (see `packages/workflows/src/logger.ts` for the canonical list):
 
-| Event | Meaning |
-|-------|---------|
-| `workflow_started` / `workflow_completed` / `workflow_failed` | Run lifecycle |
-| `node_started` / `node_completed` / `node_failed` / `node_skipped` | Node lifecycle |
-| `assistant` | AI assistant message (has `content` field with the full AI output) |
-| `tool_use` / `tool_result` | SDK tool call + result |
-| `retry_attempt` | Node retry with attempt number and reason |
-| `loop_iteration_started` / `loop_iteration_completed` | Loop bookkeeping |
+| `type` | Meaning |
+|--------|---------|
+| `workflow_start` / `workflow_complete` / `workflow_error` | Run lifecycle |
+| `node_start` / `node_complete` / `node_error` / `node_skipped` | Node lifecycle |
+| `assistant` | AI assistant message — has `content` field with the full AI output |
+| `tool` | SDK tool invocation — has `tool_name`, `tool_input`, `duration_ms`, and optionally `tokens` |
+| `validation` | Workflow-level validation event — has `check` and `result` (`pass` / `fail` / `warn` / `unknown`) |
 
-Find the run ID from `archon workflow status` or `archon workflow list` (most recent run). Then:
+> **Loop iterations and per-attempt retry events are NOT in the JSONL file.** They go through the workflow event emitter (WebSocket / `workflow_events` DB table) under `loop_iteration_started` / `loop_iteration_completed` etc. To see them, query the DB or the Web UI dashboard — not the JSONL log.
+
+Find the run ID from `archon workflow status` (most recent run). Then:
 
 ```bash
 # Last assistant message (what the AI said before failure)
 jq 'select(.type == "assistant") | .content' <log-file> | tail -1
 
-# All failed events
-jq 'select(.event == "node_failed" or .event == "workflow_failed")' <log-file>
+# All error events (node failures + workflow-level failures)
+jq 'select(.type == "node_error" or .type == "workflow_error")' <log-file>
 
 # Full event stream
 cat <log-file> | jq .
@@ -83,7 +84,7 @@ See the Install page on the docs site for full platform-specific install paths.
 
 Three possibilities:
 
-1. **The AI is actually working.** Check `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl` — if you see recent `tool_use` events, it's fine. Wait.
+1. **The AI is actually working.** Check `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl` — if you see recent `tool` or `assistant` events in the tail, it's fine. Wait.
 2. **The server crashed and left an orphan row.** Server startup no longer auto-fails orphaned `running` rows (per the "No Autonomous Lifecycle Mutation" rule — `CLAUDE.md`). Transition it manually:
    - Web UI: Dashboard → Abandon or Cancel button on the run card
    - CLI: `archon workflow abandon <run-id>` — marks the DB row cancelled without killing any subprocess. Right tool for orphans since the subprocess is already gone

From 4c0f268666d5ac8e66c9b6bce0c3d4d4c45a7b25 Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Fri, 24 Apr 2026 11:11:13 +0300
Subject: [PATCH 13/14] fix(skill): two stragglers from the code-reviewer audit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cleanup of two references that slipped through the earlier C1 and C3 fixes:

- references/troubleshooting.md:126: `node_failed` → `node_error`
  (the "Node output is empty" diagnostics section references the JSONL
  log, which uses the logger.ts enum, not the DB workflow_events table,
  which does use `node_failed`). The C3 fix corrected the event table
  and one jq recipe but missed this inline mention.

- references/interactive-workflows.md:106: removed `archon workflow
  cancel <run-id>` (nonexistent CLI subcommand) from the
  troubleshooting bullet. This predates the hardening
  PR but fell within the C1 remediation scope. Replaced with the
  correct triage: reject (approval gate only) vs abandon (orphan
  cleanup, no subprocess kill) vs chat /workflow cancel (actual
  subprocess termination).

Grounded in the same sources as the earlier C1/C3 commits:
packages/cli/src/cli.ts:318-485 (no cancel case) and
packages/workflows/src/logger.ts:19-30 (JSONL type enum).
---
 .claude/skills/archon/references/interactive-workflows.md | 2 +-
 .claude/skills/archon/references/troubleshooting.md       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/skills/archon/references/interactive-workflows.md b/.claude/skills/archon/references/interactive-workflows.md
index 243cfdb7b0..856d50afd1 100644
--- a/.claude/skills/archon/references/interactive-workflows.md
+++ b/.claude/skills/archon/references/interactive-workflows.md
@@ -103,4 +103,4 @@ archon workflow reject <run-id> "reason for rejection"
 
 - **Workflow shows `running` for a long time**: The AI is doing research/implementation. Be patient — check again in a few minutes.
 - **Log file not found**: The log is at `~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl`
-- **User wants to cancel**: Run `archon workflow reject <run-id>` or `archon workflow cancel <run-id>`
+- **User wants to cancel**: Run `archon workflow reject <run-id>` to stop at an approval gate, or `archon workflow abandon <run-id>` to mark the run cancelled without killing any subprocess. To actively terminate a still-live subprocess, use the chat slash command `/workflow cancel <run-id>` on the platform that started it — there is no `archon workflow cancel` CLI subcommand
diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
index 3c4b46ac72..33d08cde77 100644
--- a/.claude/skills/archon/references/troubleshooting.md
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -123,7 +123,7 @@ Common causes:
 1. Upstream node is an AI node without `output_format` — the output is free-form text, JSON parsing fails, field access returns empty.
 2. Upstream node was **skipped** (its `when:` evaluated false). Downstream `when:` with `==` comparisons against a specific value will fail-closed.
 3. Bash/script node printed to stderr, not stdout. Only stdout is captured.
-4. For script nodes, non-zero exit on a non-existent file / missing import silently drops the output. Check the run log for `node_failed`.
+4. For script nodes, non-zero exit on a non-existent file / missing import silently drops the output. Check the run log for `node_error` entries.
 
 ## Useful Diagnostic Commands
 

From dbf05e3b244de71746dee3254367723e99ac02bc Mon Sep 17 00:00:00 2001
From: Rasmus Widing <rasmus.widing@gmail.com>
Date: Fri, 24 Apr 2026 11:22:10 +0300
Subject: [PATCH 14/14] feat(skill): point to archon.diy as the canonical docs
 source
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The skill had no reference to archon.diy (the live docs site built from
packages/docs-web/). Several reference files said "see the docs site"
without naming the URL, leaving the agent to guess or grep the repo for
the hostname. An agent with the skill loaded should know that when the
distilled reference pages don't cover a case, the full canonical docs
are one WebFetch away.

SKILL.md: new "Richer Context: archon.diy" section between Routing and
Running Workflows. Covers:
- When to reach for the live docs (longer examples, tutorial framing,
  features the skill only mentions in passing, "where's that
  documented?" user questions)
- URL map: 16 starting points covering getting-started, book (tutorial
  series), guides/ (authoring + per-node-type + per-node-feature),
  reference/ (variables, CLI, security, architecture, configuration,
  troubleshooting), adapters/, deployment/
- Precedence: skill refs first (context-cheap, tuned for agents), docs
  site as escalation. Prevents agents defaulting to WebFetch when a
  local skill ref already covers the answer

Also upgrades the 5 existing generic "docs site" mentions across
reference files to concrete archon.diy URLs with anchor fragments where
helpful:
- good-practices.md: Inline sub-agents pattern → archon.diy/guides/
  authoring-workflows/#inline-sub-agents
- troubleshooting.md: "Install page on the docs site" → archon.diy/
  getting-started/installation/
- workflow-dag.md: "Workflow Description Best Practices" → anchor link;
  sandbox schema reference → archon.diy/guides/authoring-workflows/
  #claude-sdk-advanced-options
- repo-init.md: Security Model reference → archon.diy/reference/
  security/#target-repo-env-isolation (deep-link into the section that
  covers the <cwd>/.env strip behavior)

URL source of truth: astro.config.mjs:5 (site: 'https://archon.diy').
URL structure mirrors packages/docs-web/src/content/docs/<section>/
<page>.md — verified by the 62 pages the docs build produces.
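
A rough sketch of that path-to-URL mapping, assuming every content file
maps to `<section>/<page>/` with a trailing slash. Index pages are not
handled, and `docsPathToUrl` is a hypothetical helper, not part of the
repo:

```typescript
// Site origin and content root as stated in this commit message.
const SITE = "https://archon.diy";
const CONTENT_ROOT = "packages/docs-web/src/content/docs/";

// Hypothetical helper: map a docs content file path to its public URL.
// Assumption: the extension is dropped and a trailing slash is added.
function docsPathToUrl(filePath: string): string {
  if (!filePath.startsWith(CONTENT_ROOT)) {
    throw new Error(`not a docs content file: ${filePath}`);
  }
  const relative = filePath.slice(CONTENT_ROOT.length).replace(/\.mdx?$/, "");
  return `${SITE}/${relative}/`;
}

// prints "https://archon.diy/reference/security/"
console.log(docsPathToUrl("packages/docs-web/src/content/docs/reference/security.md"));
```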
---
 .claude/skills/archon/SKILL.md                | 40 +++++++++++++++++++
 .../archon/references/good-practices.md       |  2 +-
 .claude/skills/archon/references/repo-init.md |  2 +-
 .../archon/references/troubleshooting.md      |  2 +-
 .../skills/archon/references/workflow-dag.md  |  4 +-
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/.claude/skills/archon/SKILL.md b/.claude/skills/archon/SKILL.md
index e60e170d56..9a9a2f7c0b 100644
--- a/.claude/skills/archon/SKILL.md
+++ b/.claude/skills/archon/SKILL.md
@@ -50,6 +50,46 @@ If the intent is ambiguous, ask the user to clarify.
 
 ---
 
+## Richer Context: [archon.diy](https://archon.diy)
+
+The references in this skill are a distilled subset. The full, canonical docs live at **[archon.diy](https://archon.diy)** (Starlight site from `packages/docs-web/`). If the skill's reference pages don't cover what you need — an edge case, a worked example, a diagram, a deeper section on a feature — fetch the matching page from archon.diy.
+
+### When to reach for the live docs
+
+- You need an end-to-end example that's longer than what the skill shows (e.g. full patterns for hooks, MCP config, sandbox schema, approval flows)
+- You're explaining a concept to the user and want the most readable framing (the `book/` series is written as a tutorial, not a reference)
+- You hit a feature the skill only mentions in passing (e.g. `agents:` inline sub-agents, advanced Codex options, the full SyncHookJSONOutput schema)
+- The user asks "where is this documented?" — point them at the archon.diy URL, not a skill file path
+
+### URL map
+
+| Topic | URL |
+|-------|-----|
+| Landing + install | [archon.diy](https://archon.diy) |
+| Getting started (installation, quick start, concepts) | [archon.diy/getting-started/](https://archon.diy/getting-started/overview/) |
+| The book (tutorial-style walkthrough) | [archon.diy/book/](https://archon.diy/book/) |
+| Workflow authoring guide | [archon.diy/guides/authoring-workflows/](https://archon.diy/guides/authoring-workflows/) |
+| Command authoring guide | [archon.diy/guides/authoring-commands/](https://archon.diy/guides/authoring-commands/) |
+| Node type guides | [archon.diy/guides/loop-nodes/](https://archon.diy/guides/loop-nodes/), [/approval-nodes/](https://archon.diy/guides/approval-nodes/), [/script-nodes/](https://archon.diy/guides/script-nodes/) |
+| Per-node features (Claude only) | [/hooks/](https://archon.diy/guides/hooks/), [/mcp-servers/](https://archon.diy/guides/mcp-servers/), [/skills/](https://archon.diy/guides/skills/) |
+| Global workflows/commands/scripts | [archon.diy/guides/global-workflows/](https://archon.diy/guides/global-workflows/) |
+| Variables reference | [archon.diy/reference/variables/](https://archon.diy/reference/variables/) |
+| CLI reference | [archon.diy/reference/cli/](https://archon.diy/reference/cli/) |
+| Security model (env, sandbox, target-repo `.env` stripping) | [archon.diy/reference/security/](https://archon.diy/reference/security/) |
+| Architecture | [archon.diy/reference/architecture/](https://archon.diy/reference/architecture/) |
+| Configuration (`.archon/config.yaml` full schema) | [archon.diy/reference/configuration/](https://archon.diy/reference/configuration/) |
+| Troubleshooting | [archon.diy/reference/troubleshooting/](https://archon.diy/reference/troubleshooting/) |
+| Adapter setup (Slack/Telegram/GitHub/Web/Discord/Gitea/GitLab) | [archon.diy/adapters/](https://archon.diy/adapters/) |
+| Deployment (Docker, cloud, Windows) | [archon.diy/deployment/](https://archon.diy/deployment/) |
+
+URL shape is `archon.diy/<section>/<page>/` — the paths mirror the filenames under `packages/docs-web/src/content/docs/`.
+
+### Precedence
+
+This skill's reference pages are the primary source for routine workflow authoring, CLI use, and setup. Reach for archon.diy when the skill is incomplete for your case — don't go to the live docs first by default (skill refs load into context faster and are tuned for agents).
+
+---
+
 ## Running Workflows
 
 ### Core Command
diff --git a/.claude/skills/archon/references/good-practices.md b/.claude/skills/archon/references/good-practices.md
index 34b2d4e403..e731a2583d 100644
--- a/.claude/skills/archon/references/good-practices.md
+++ b/.claude/skills/archon/references/good-practices.md
@@ -202,7 +202,7 @@ Implement the changes exactly as specified in the plan.
 
 ### ❌ Long flat layers of AI nodes
 
-Ten sibling `prompt:` nodes in one layer all depending on one upstream is a $N/run cost bomb and a latency trap. If the work is parallel and similar, use the `agents:` inline sub-agent map-reduce pattern with a cheap model per item and a single stronger reducer. See `references/dag-advanced.md` and the docs site's Inline sub-agents section.
+Ten sibling `prompt:` nodes in one layer all depending on one upstream is a $N/run cost bomb and a latency trap. If the work is parallel and similar, use the `agents:` inline sub-agent map-reduce pattern with a cheap model per item and a single stronger reducer. See `references/dag-advanced.md` and the [Inline sub-agents section on archon.diy](https://archon.diy/guides/authoring-workflows/#inline-sub-agents) for a worked example.
 
 ### ❌ Hardcoding secrets in YAML or MCP configs
 
diff --git a/.claude/skills/archon/references/repo-init.md b/.claude/skills/archon/references/repo-init.md
index 6923147fb9..e44907fd2e 100644
--- a/.claude/skills/archon/references/repo-init.md
+++ b/.claude/skills/archon/references/repo-init.md
@@ -83,7 +83,7 @@ Archon loads env from three distinct paths at boot, with different trust levels
 |------|-------|-------|---------|
 | `~/.archon/.env` | User (home) | Trusted — user owns it | Yes, with `override: true` |
 | `<cwd>/.archon/.env` | Repo (per-project, Archon-owned) | Trusted — user owns it | Yes, with `override: true` (overrides home) |
-| `<cwd>/.env` | Target repo | **Untrusted** — belongs to the project being worked on | **Stripped from `process.env`** before subprocess spawn to prevent secret leakage (see [Security Model](../../../packages/docs-web/src/content/docs/reference/security.md) on the docs site) |
+| `<cwd>/.env` | Target repo | **Untrusted** — belongs to the project being worked on | **Stripped from `process.env`** before subprocess spawn to prevent secret leakage (see [archon.diy/reference/security/](https://archon.diy/reference/security/#target-repo-env-isolation) for the full trust model) |
 
 Boot behavior emits observable log lines:
 
diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md
index 33d08cde77..099cccd928 100644
--- a/.claude/skills/archon/references/troubleshooting.md
+++ b/.claude/skills/archon/references/troubleshooting.md
@@ -78,7 +78,7 @@ Compiled-binary builds of Archon no longer embed Claude Code / Codex — you ins
 - Set `CODEX_CLI_PATH=/path/to/codex` or `assistants.codex.codexBinaryPath` in config
 - Autodetect covers the standard npm / Homebrew locations per platform
 
-See the Install page on the docs site for full platform-specific install paths.
+See [archon.diy/getting-started/installation/](https://archon.diy/getting-started/installation/) for full platform-specific install paths.
 
 ### Workflow shows `running` for a long time but nothing happens
 
diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md
index e13047704e..817d7e9db0 100644
--- a/.claude/skills/archon/references/workflow-dag.md
+++ b/.claude/skills/archon/references/workflow-dag.md
@@ -29,7 +29,7 @@ Top-level YAML fields on a workflow object. Per-node overrides (same name under
 | Field | Type | Description |
 |-------|------|-------------|
 | `name` | string (required) | Workflow identifier (used in `archon workflow run <name>`) |
-| `description` | string (required) | Human-readable summary. Used for routing; see **Workflow Description Best Practices** in `docs-web/.../authoring-workflows.md` |
+| `description` | string (required) | Human-readable summary. Used for routing; see [Workflow Description Best Practices](https://archon.diy/guides/authoring-workflows/#workflow-description-best-practices) |
 | `provider` | string | AI provider (e.g. `claude`, `codex`, `pi`). Default: from `.archon/config.yaml` |
 | `model` | string | Model override. Claude: `sonnet` \| `opus` \| `haiku` \| `claude-*` \| `inherit`. Codex: any non-Claude model ID |
 | `interactive` | boolean | **Required for web UI** when the workflow has approval gates or `loop.interactive` nodes. Forces foreground execution so gate messages reach the user's chat. Default: `false` (background on web) |
@@ -52,7 +52,7 @@ These fields apply to Claude nodes workflow-wide; each can be overridden per-nod
 | `thinking` | string \| object | Extended thinking. String shorthand: `'adaptive'` \| `'enabled'` \| `'disabled'`. Object form: `{ type: 'enabled', budgetTokens: 8000 }` |
 | `fallbackModel` | string | Model to use if the primary model fails (e.g. `claude-haiku-4-5-20251001`) |
 | `betas` | string[] | SDK beta feature flags (non-empty array). Example: `['context-1m-2025-08-07']` for 1M-context Claude |
-| `sandbox` | object | OS-level filesystem/network restrictions. Nested `network` / `filesystem` sub-objects — see the docs site for the full schema. Layers on top of worktree isolation |
+| `sandbox` | object | OS-level filesystem/network restrictions. Nested `network` / `filesystem` sub-objects — see [archon.diy/guides/authoring-workflows/#claude-sdk-advanced-options](https://archon.diy/guides/authoring-workflows/#claude-sdk-advanced-options) for the full schema. Layers on top of worktree isolation |
 
 Per-node-only (NOT valid at workflow level): `maxBudgetUsd`, `systemPrompt`.