diff --git a/.archon/commands/maintainer-review-code-review.md b/.archon/commands/maintainer-review-code-review.md new file mode 100644 index 0000000000..eca2c2cfed --- /dev/null +++ b/.archon/commands/maintainer-review-code-review.md @@ -0,0 +1,125 @@ +--- +description: Review the PR for code quality, CLAUDE.md compliance, project conventions, and bugs (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Code Review + +You are a focused code reviewer for one GitHub PR. **Always run** for every PR that passes the gate. Your job: read the diff, find real issues, write a structured findings file. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +### Read the PR number + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +``` + +### Read the project's rules + +Read the repo's `CLAUDE.md` (project-level). It's the source of truth for engineering principles, type-safety rules, eslint policy, error-handling conventions, and forbidden patterns. + +### Read the gate decision + +```bash +cat $ARTIFACTS_DIR/gate-decision.md +``` + +The gate already classified direction/scope. Don't re-litigate that here. Focus on **code quality** within the scope the gate accepted. + +### Read the PR diff + +```bash +gh pr diff $PR_NUMBER +``` + +If the diff is too large to reason about cleanly, sample: read the diff against each changed file individually with `gh pr diff $PR_NUMBER -- `. + +--- + +## Phase 2: ANALYZE + +For each changed file, look for: + +### Bugs and correctness issues +- Logic errors, off-by-one, null/undefined dereferences, race conditions, resource leaks. +- Incorrect or missing error handling. Silent catches that swallow errors. +- API misuse (wrong types, wrong arguments, deprecated calls). +- Concurrency bugs in async code. + +### CLAUDE.md compliance +- TypeScript: explicit return types? No `any` without justification? +- Imports: typed imports for types? Namespace imports for submodules? 
+- Logging: structured Pino with `{domain}.{action}_{state}` event names? +- Error handling: errors surfaced, not swallowed? `classifyIsolationError` used where appropriate? +- Database: rowCount checks on UPDATEs? Errors logged with context? +- Workflow: schema rules followed? `output_format` for `when:` consumers? + +### Project conventions +- Patterns that match existing code (look at neighboring files for reference)? +- Naming, structure, and organization aligned with the rest of the package? +- Cross-package boundaries respected (no `import * from '@archon/core'`, etc.)? + +### Bug-likelihood signals +- New conditional branches without tests? +- Hardcoded values that should be configurable? +- TODO / FIXME / HACK / XXX comments left in? + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/code-review-findings.md` with this structure: + +```markdown +# Code Review — PR # + +## Summary +<1-2 sentences. State the overall verdict: ready-to-merge / minor-fixes-needed / blocking-issues.> + +## Findings + +### CRITICAL +- ****: + - **Why it matters**: + - **Suggested fix**: + +### HIGH +- (same format) + +### MEDIUM +- (same format) + +### LOW / NITPICK +- (same format — combine if many) + +## CLAUDE.md compliance + + +## Notes for synthesizer + +``` + +If you find nothing to flag, write the file with `## Findings\n\nNone — code looks clean.` and stop. Don't manufacture issues. + +--- + +## Phase 4: RETURN + +Return a single line summary as your response: + +``` +Code review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Verdict: . +``` + +Don't return the full findings — those live in the artifact. Synthesizer reads the file. + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/code-review-findings.md` written. +- [ ] Each finding has a file path, line number when applicable, and a concrete fix. +- [ ] No invented issues. If clean, say "None." +- [ ] Single-line summary returned. 
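
One of the bug-likelihood signals listed above — leftover task markers — can be pre-screened mechanically before the human pass. A hedged sketch (pipe the real diff through it instead of the sample input):

```shell
# Sketch: flag TODO/FIXME/HACK/XXX markers on added lines only ('+' prefix).
scan_markers() {
  grep -nE '^\+.*\b(TODO|FIXME|HACK|XXX)\b' || echo "no markers in added lines"
}
# Real flow would be: gh pr diff "$PR_NUMBER" | scan_markers
printf '%s\n' '+  // TODO: drop this shim' '   unchanged line' | scan_markers
```

This only narrows the search — a clean scan doesn't mean the other signals (untested branches, hardcoded values) are absent.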
diff --git a/.archon/commands/maintainer-review-comment-quality.md b/.archon/commands/maintainer-review-comment-quality.md new file mode 100644 index 0000000000..17861fed64 --- /dev/null +++ b/.archon/commands/maintainer-review-comment-quality.md @@ -0,0 +1,95 @@ +--- +description: Review the PR's added/modified comments and docstrings for accuracy, value, and long-term maintainability (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Comment Quality + +You are a comment / docstring reviewer. Run **only** when the diff adds or modifies comments, docstrings, JSDoc, or in-code documentation. Your job: keep the code's comments truthful, valuable, and unlikely to rot. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Read the project's comment policy in `CLAUDE.md`: +- Default to writing **no comments**. +- Only add when the **WHY** is non-obvious (hidden constraint, subtle invariant, workaround). +- Don't explain WHAT (well-named identifiers do that). +- Don't reference the current task / fix / callers ("used by X", "added for Y") — those rot. +- Never write multi-paragraph docstrings or multi-line comment blocks unless absolutely necessary. + +--- + +## Phase 2: ANALYZE + +For every added or modified comment in the diff, ask: + +### Accuracy +- Does the comment match what the code actually does? +- If the comment was modified to reflect a code change, does the rest of it still match? + +### Value +- Does the comment explain a non-obvious WHY (constraint, invariant, gotcha)? +- Or does it restate WHAT the code does? (Restating WHAT = comment rot risk.) +- Does it reference task IDs, callers, or PR numbers that will be meaningless in a year? + +### Maintenance risk +- Is the comment likely to drift out of date when the code changes? +- Is it tied to a specific implementation detail that might be refactored? 
+ +### Style +- One short line preferred. Multi-line blocks only when truly necessary. +- No trailing summaries that just describe the next line. + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/comment-quality-findings.md`: + +```markdown +# Comment Quality Review — PR # + +## Summary +<1-2 sentences. Comment quality: good / minor-issues / significant-rot-risk.> + +## Findings + +### HIGH — inaccurate comments (don't match the code) +- ****: + - **Suggested fix**: + +### MEDIUM — comment rot risk +- (same format — references that will rot, restated-what-not-why, multi-paragraph fluff) + +### LOW — style / consistency +- (same format) + +## Comments that are actually valuable + + +## Notes for synthesizer + +``` + +If comments are clean, write `## Findings\n\nComments are accurate and capture non-obvious WHY where present.` and stop. + +--- + +## Phase 4: RETURN + +``` +Comment-quality review complete. HIGH, MEDIUM, LOW findings. Quality: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/comment-quality-findings.md` written. +- [ ] Each HIGH cites the exact comment text and the code it disagrees with. +- [ ] Don't flag every short comment — many are intentionally brief. diff --git a/.archon/commands/maintainer-review-docs-impact.md b/.archon/commands/maintainer-review-docs-impact.md new file mode 100644 index 0000000000..4ed5b64085 --- /dev/null +++ b/.archon/commands/maintainer-review-docs-impact.md @@ -0,0 +1,118 @@ +--- +description: Review whether the PR's user-facing changes (APIs, CLI flags, env vars, behavior) are reflected in documentation (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Docs Impact + +You are a docs-impact reviewer. Run **only** when the diff adds, removes, or renames public APIs, CLI flags, environment variables, or other user-facing behavior. Your job: catch missing or stale documentation. 
+ +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Find docs locations: + +```bash +ls packages/docs-web/src/content/docs/ 2>/dev/null +ls docs/ 2>/dev/null +ls README.md CONTRIBUTING.md CLAUDE.md 2>/dev/null +``` + +The project's docs site is at `packages/docs-web/` (Starlight). User-facing docs published to archon.diy. Repo-level docs include `CLAUDE.md`, `CONTRIBUTING.md`, and any `docs/` content. + +--- + +## Phase 2: ANALYZE + +For each user-facing change in the diff, identify the docs that should be updated: + +### What counts as user-facing +- New CLI command or flag (in `packages/cli/`). +- New environment variable. +- New / removed / renamed API route (in `packages/server/src/routes/`). +- New workflow node type, command file, or workflow YAML field. +- New configuration field in `.archon/config.yaml`. +- Change in default behavior that an existing user would notice. + +### What doesn't +- Internal refactors with no API change. +- Test-only changes. +- Bug fixes that restore documented behavior. + +### For each user-facing change + +- **New surface**: is there a docs page describing it? Is it linked from a landing page or the relevant section? +- **Changed surface**: are existing docs pages still accurate? Do they need updates? +- **Removed surface**: are existing references stale? `grep` the docs site for old name. +- **Migration**: does a breaking change need a migration note in CHANGELOG.md or docs? + +### Specific places to check +- `packages/docs-web/src/content/docs/getting-started/` — quickstart, install, concepts. +- `packages/docs-web/src/content/docs/guides/` — workflow authoring, hooks, MCP, scripts. +- `packages/docs-web/src/content/docs/reference/` — CLI, variables, configuration. +- `packages/docs-web/src/content/docs/adapters/` — Slack, Telegram, GitHub, Discord, Web. +- `packages/docs-web/src/content/docs/deployment/` — Docker, cloud. 
+- `CHANGELOG.md` — Keep-a-Changelog entry for user-visible changes. +- `CLAUDE.md` — only if the change affects how *agents* working in this repo should behave. + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/docs-impact-findings.md`: + +```markdown +# Docs Impact Review — PR # + +## Summary +<1-2 sentences. Docs status: in-sync / minor-gaps / significant-gaps.> + +## User-facing changes detected +- (file:line) +- + +## Findings + +### CRITICAL — missing docs for new public surface +- ****: + - **Where to add**: + - **What to write**: + +### HIGH — stale docs from changed/removed surface +- (same format) + +### MEDIUM — minor gaps (changelog entry, examples) +- (same format) + +### LOW — nice-to-have polish +- (same format) + +## Pages that look in-sync + + +## Notes for synthesizer + +``` + +If no user-facing changes, write `## Findings\n\nNo user-facing changes — no docs updates needed.` and stop. + +--- + +## Phase 4: RETURN + +``` +Docs-impact review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Status: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/docs-impact-findings.md` written. +- [ ] Each CRITICAL/HIGH names a specific doc file path and what's missing. +- [ ] Internal-only changes don't generate findings. diff --git a/.archon/commands/maintainer-review-error-handling.md b/.archon/commands/maintainer-review-error-handling.md new file mode 100644 index 0000000000..b45b82a0af --- /dev/null +++ b/.archon/commands/maintainer-review-error-handling.md @@ -0,0 +1,94 @@ +--- +description: Review the PR for error-handling correctness — surfaced errors, no silent swallows, consistent error patterns (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Error Handling + +You are an error-handling-focused reviewer. Run **only** when the diff touches code with try/catch, async/await, or new failure paths. 
Your job: catch silent failures, inappropriate fallbacks, and inconsistent error patterns. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Read the project's error-handling principles in `CLAUDE.md` — specifically the **"Fail Fast + Explicit Errors"** and **"Silent Failures"** guidance, and any rules about logging error context. + +--- + +## Phase 2: ANALYZE + +For every `try/catch`, `async/await`, error path, or fallback in the diff, ask: + +### Silent-failure risks +- Is an error caught and ignored without logging? +- Is a fallback returned that hides the actual problem from the caller? +- Is a `try` block too broad, catching errors that should propagate? +- Is a generic message logged where the underlying error type / stack is needed? + +### Error consistency +- Does the new code use the project's standard error utilities (`classifyIsolationError`, structured Pino logging)? +- Are error events named per the `{domain}.{action}_{state}` convention? +- Are errors thrown with enough context (id, operation, parameters)? + +### Promise / async correctness +- Unhandled promise rejections? Missing `await`? +- `Promise.all` vs `Promise.allSettled` — is the choice intentional? +- Cancellation / timeout handling correct? + +### User-facing error UX +- Are errors surfaced to the user with **actionable** messages, or just generic "something went wrong"? +- For platform adapters: does the error reach the chat / web UI? + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/error-handling-findings.md`: + +```markdown +# Error Handling Review — PR # + +## Summary +<1-2 sentences. 
Overall risk level: low / medium / high.> + +## Findings + +### CRITICAL — silent failures +- ****: + - **Why it matters**: + - **Suggested fix**: + +### HIGH — inconsistent error patterns +- (same format) + +### MEDIUM — context / actionability +- (same format) + +### LOW / NITPICK +- (same format) + +## Notes for synthesizer + +``` + +If no error-handling concerns, write `## Findings\n\nNone — error handling is consistent and surfaces failures appropriately.` and stop. + +--- + +## Phase 4: RETURN + +``` +Error-handling review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Risk: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/error-handling-findings.md` written. +- [ ] Every CRITICAL/HIGH finding cites a real catch / try / promise / fallback in the diff. +- [ ] No invented issues. If clean, say "None." diff --git a/.archon/commands/maintainer-review-gate.md b/.archon/commands/maintainer-review-gate.md new file mode 100644 index 0000000000..92e97a4691 --- /dev/null +++ b/.archon/commands/maintainer-review-gate.md @@ -0,0 +1,255 @@ +--- +description: Gate a single PR on direction alignment, scope focus, and PR-template fill quality before any deep review +argument-hint: (no arguments — reads upstream node outputs and writes artifacts) +--- + +# Maintainer Review — Gate + +You are the **gatekeeper** for a single GitHub PR. Your job is to decide whether the PR is worth a comprehensive review or whether the maintainer should politely decline / request a split. You do **not** review code quality here — that happens downstream if you say "review." + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD INPUTS + +Three sources of upstream context, all gathered for you below. **You may also `cat .github/PULL_REQUEST_TEMPLATE.md` if you need to compare the PR body's structure against the project's template** — that's the one allowed extra read; everything else lives in the inputs below. 
+ +### PR data (gh pr view JSON) + +```json +$fetch-pr.output +``` + +### PR diff (truncated to 2500 lines) + +```text +$fetch-diff.output +``` + +### Maintainer context (direction.md, profile.md, prior state, recent briefs, clock) + +```json +$read-context.output +``` + +Inside `read-context.output`: +- `direction` — the project's committed direction.md (what Archon IS / IS NOT, open questions) +- `profile` — the running maintainer's profile.md (role, scope, current focus) +- `prior_state` — last morning-standup state.json (carry_over may already mention this PR) +- `recent_briefs` — last 3 daily briefs (look here if this PR was previously flagged) +- `today` — today's local date as `YYYY-MM-DD` (deterministic, set by the gather script) +- `deadline_3d` — today + 3 calendar days, `YYYY-MM-DD` (precomputed for the decline comment's reply window) + +--- + +## Phase 2: EVALUATE THREE GATES + +You're checking three gates. **All three** inform the verdict. + +### Gate A — Direction alignment + +Does the PR align with `direction.md`? + +- **aligned**: PR clearly fits one of the "What Archon IS" clauses, or extends an existing pattern. +- **conflict**: PR clearly violates a "What Archon is NOT" clause. Cite the specific clause (e.g. `direction.md §single-developer-tool`). +- **unclear**: PR raises a question `direction.md` doesn't answer (touches an "Open question" or a new concern). Note it for later direction-doc evolution. + +### Gate B — Scope focus + +Does the PR do **one thing**? + +- **focused**: PR has a single feature, single fix, or single coherent refactor. Size is fine — a 2000-line PR can be focused if it's all one feature. +- **multiple_concerns**: PR mixes 2+ unrelated changes (e.g. "fix the bug + add new feature + bump deps + reformat"). The right action is to ask the contributor to split it. +- **too_broad**: One ostensibly-coherent change but with sprawling collateral edits across unrelated subsystems. 
Fixable by tighter scope, but currently too much to review. + +To assess scope, look at: +- Diff structure: do the changed files cluster around a single concern, or sprawl? +- Title + body: does the contributor describe one change, or several "while I was here" changes? +- Commit history if visible in `gh pr view`: is the PR a single coherent story, or accreted fixes? + +### Gate C — Template quality + +Was `.github/PULL_REQUEST_TEMPLATE.md` filled in? + +- **good**: All template sections completed thoughtfully (Summary, Validation, Security, Rollback, etc.). +- **partial**: Template structure present but several sections empty or perfunctory ("N/A", "TBD", or single-word answers where prose is expected). +- **empty**: No template, or template skeleton with all sections blank. + +The PR body is in `pr_data.body`. If you need the template's expected structure for comparison, that's the one allowed extra read: `cat .github/PULL_REQUEST_TEMPLATE.md`. + +--- + +## Phase 3: DECIDE VERDICT + +Combine the three gates into a single verdict. + +| Direction | Scope | Template | → Verdict | +|-----------|-------|----------|-----------| +| aligned | focused | good or partial | **review** — proceed to deep review | +| aligned | focused | empty | **review** with note in synthesis to nudge template | +| aligned | multiple_concerns | * | **needs_split** — draft "split this up" comment | +| aligned | too_broad | * | **needs_split** — same | +| conflict | * | * | **decline** — draft polite-decline citing direction clause | +| unclear | * | * | **unclear** — surface to maintainer for manual call | + +When the gate is `unclear`, do NOT draft a decline comment. The maintainer needs to decide. + +When the verdict is `decline` or `needs_split`, draft the comment per Phase 4. 
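
The decision table above can be sketched as shell logic (illustrative only — the real call is yours, made with judgment, not a script):

```shell
# Illustrative sketch of the verdict table (not part of the workflow).
decide_verdict() {
  local direction="$1" scope="$2"   # template quality never changes the verdict
  case "$direction" in
    conflict) echo "decline"; return ;;
    unclear)  echo "unclear"; return ;;
  esac
  case "$scope" in
    multiple_concerns|too_broad) echo "needs_split" ;;
    *) echo "review" ;;   # an empty template only adds a nudge note downstream
  esac
}
```

Note the precedence the table encodes: direction conflicts and unclear direction short-circuit everything else; scope is only consulted for aligned PRs.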
+ +--- + +## Phase 4: DRAFT THE DECLINE COMMENT (only if verdict in [decline, needs_split]) + +The drafted comment is the **bot's voice** — polite, specific, citing direction.md when relevant, and giving the contributor a clear path forward. + +### Tone rules + +- Open with thanks for the contribution. Always. +- Be **specific** about why — cite the direction.md clause, name the multiple concerns, list the empty template sections. Vague "this isn't a fit" is not acceptable. +- Offer a concrete path forward when one exists (split into PRs A + B + C; pick a different scope; fill in template sections X/Y/Z). +- Include a **3-day reply window**: state the date 3 days from today. If the contributor doesn't reply by then with reasoning to keep the PR open, it will be closed. Don't say "automatically" — the maintainer will close manually. +- No corporate-speak, no emoji, no AI-attribution. + +### Templates by category + +**For `decline` (direction conflict)**: + +```markdown +Thanks for putting this together, @! + +Unfortunately this isn't a direction we're taking with Archon. Specifically, this conflicts with `direction.md §`: . + +If you disagree with that direction call, reply here by **** and we'll discuss. Otherwise this PR will be closed after that date so the queue stays focused. + +For context, the project's stated scope lives at [`.archon/maintainer-standup/direction.md`](../blob/dev/.archon/maintainer-standup/direction.md). Open questions there are fair game for proposals — feel free to raise an issue if you'd like to push for a direction change. +``` + +**For `needs_split` (multiple concerns)**: + +```markdown +Thanks for the work here, @! + +This PR bundles several independent changes: . Each is potentially valuable but reviewing them together makes regressions hard to isolate and reverts hard to scope. + +Could you split this into focused PRs, one per concern? Suggested split: +1. +2. +3. 
+ +If you'd rather discuss the split approach first, reply here by ****. Otherwise this PR will be closed in favor of the split versions after that date. +``` + +**For `needs_split` (too broad / sprawling)**: + +```markdown +Thanks for the contribution, @! + +The change touches a wide range of subsystems () which makes it hard to review as a single unit. Could you tighten the scope — focus on first and split the collateral edits into a follow-up PR? + +If you think the current scope is necessary, reply here by **** with reasoning. Otherwise this PR will be closed after that date so a tighter version can land. +``` + +Adapt the wording. Don't paste the templates verbatim if the situation is more nuanced — they're starting points. + +### Compute DATE-3-DAYS-OUT + +Use `read-context.output.deadline_3d` directly — it's already today-plus-three-calendar-days in `YYYY-MM-DD` form, computed deterministically by the gather script (sv-SE locale → ISO date in local time). Do **not** anchor to `prior_state.last_run_at`; that field can be days or weeks stale and would produce a deadline already in the past. + +If for any reason `deadline_3d` is missing or empty, abort the comment draft and surface this to the maintainer in the gate-decision artifact rather than guessing. 
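
The deterministic computation described above could look like this inside a gather script — a sketch assuming GNU `date`; the actual script's implementation may differ:

```shell
# Sketch: deterministic today + 3 calendar days, ISO YYYY-MM-DD (GNU date).
today=$(date +%Y-%m-%d)                             # local date
deadline_3d=$(date -d "$today + 3 days" +%Y-%m-%d)  # calendar arithmetic, month/year rollover handled
echo "$deadline_3d"
```

Anchoring to a freshly computed `today` (rather than any stored timestamp) is what keeps the deadline from ever being in the past.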
+ +--- + +## Phase 5: WRITE ARTIFACTS + +You **must** write two files using the Write tool before returning your structured output: + +### `$ARTIFACTS_DIR/gate-decision.md` + +Full reasoning for the maintainer's review: + +```markdown +# Gate Decision — PR # + +## Verdict + + +## Direction alignment + + + +## Scope assessment + + + +## Template quality + + + +## Cited direction clauses +- direction.md § +- direction.md § + +## Reasoning +<2-3 sentence summary> + +## Drafted decline comment (if applicable) + + +``` + +### `$ARTIFACTS_DIR/decline-comment.md` + +Only the decline comment body (used directly by the `post-decline` bash node as `--body-file`): + +If verdict is `review` or `unclear`, write a single line: `(no decline comment — verdict was )`. + +If verdict is `decline` or `needs_split`, write the drafted comment in markdown — exactly as it should appear on the PR. + +--- + +## Phase 6: RETURN STRUCTURED OUTPUT + +**This is the final step. After the artifacts are written, your entire response must be ONE JSON object — nothing else.** + +Allowed output shapes (Pi's parser handles either): + +1. **Bare JSON** — preferred: + ```json + {"verdict":"review","direction_alignment":"aligned",...} + ``` + +2. **Fenced JSON** — also fine: + ````markdown + ```json + {"verdict":"review","direction_alignment":"aligned",...} + ``` + ```` + +**NOT ALLOWED:** +- Prose before the JSON ("Looking at this PR..." / "Here is my analysis..."). +- Prose after the JSON ("This concludes the gate decision."). +- Bullet-point summaries restating fields. +- Markdown headers like `**Gate A**`. +- Any text outside the single JSON object or its fences. + +If you find yourself wanting to explain — that explanation belongs in `$ARTIFACTS_DIR/gate-decision.md`, NOT in your response. 
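
For illustration, a complete well-formed instance of the output (all field values invented for this example):

```json
{
  "verdict": "needs_split",
  "direction_alignment": "aligned",
  "scope_assessment": "multiple_concerns",
  "template_quality": "partial",
  "decline_categories": ["scope"],
  "cited_direction_clauses": [],
  "reasoning": "PR bundles a bug fix with an unrelated dependency bump; each change is reviewable on its own."
}
```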
+ +### Required fields + +- `verdict`: one of `review` / `decline` / `needs_split` / `unclear` +- `direction_alignment`: `aligned` / `conflict` / `unclear` +- `scope_assessment`: `focused` / `multiple_concerns` / `too_broad` +- `template_quality`: `good` / `partial` / `empty` +- `decline_categories`: array of strings, e.g. `["direction"]` or `["scope", "template"]`. Empty array `[]` when verdict is `review` or `unclear`. +- `cited_direction_clauses`: array of strings, e.g. `["direction.md §single-developer-tool"]`. Empty `[]` if none. +- `reasoning`: 1-3 sentence summary (string). + +### CHECKPOINT — before returning + +- [ ] Direction.md was actually read (not assumed). +- [ ] Decline comment cites a specific direction clause OR specific scope concerns OR specific empty template sections — never vague. +- [ ] Decline comment has a concrete `YYYY-MM-DD` 3-day deadline. +- [ ] `$ARTIFACTS_DIR/gate-decision.md` written. +- [ ] `$ARTIFACTS_DIR/decline-comment.md` written (placeholder line if not declining). +- [ ] **Final response is ONE JSON object — no prose, no headers, no bullet summary. Bare JSON or fenced JSON only.** diff --git a/.archon/commands/maintainer-review-report.md b/.archon/commands/maintainer-review-report.md new file mode 100644 index 0000000000..646510a105 --- /dev/null +++ b/.archon/commands/maintainer-review-report.md @@ -0,0 +1,86 @@ +--- +description: Produce the final summary across all branches of maintainer-review-pr (review / decline / unclear) for the workflow log +argument-hint: (no arguments — reads upstream artifacts) +--- + +# Maintainer Review — Final Report + +You are the final reporter. The workflow has finished one of three branches (review / decline / unclear). Your job: produce a one-screen summary that tells the maintainer what just happened and what's pending. 
+ +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: DETECT WHICH BRANCH RAN + +Check what artifacts exist: + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number 2>/dev/null) +ls $ARTIFACTS_DIR/ +ls $ARTIFACTS_DIR/review/ 2>/dev/null +cat $ARTIFACTS_DIR/gate-decision.md 2>/dev/null | head -30 +``` + +Three possibilities: + +1. **Review branch ran**: `$ARTIFACTS_DIR/review/synthesis.md` exists. +2. **Decline branch ran**: `$ARTIFACTS_DIR/decline-comment.md` exists with non-placeholder content; the post-decline bash node already posted to GitHub. +3. **Unclear branch ran**: gate verdict was `unclear` and the maintainer was prompted to decide manually. + +--- + +## Phase 2: WRITE THE FINAL REPORT + +Write `$ARTIFACTS_DIR/final-report.md`: + +```markdown +# Maintainer Review — PR # — Final + +## Branch taken + + +## Gate decision + + +## Outcome + +### If review branch: +- Synthesized verdict: +- Findings: +- Aspects run: +- **Draft comment**: $ARTIFACTS_DIR/review/review-comment.md (copy-paste or edit before posting to PR) +- **Full synthesis**: $ARTIFACTS_DIR/review/synthesis.md + +### If decline branch: +- Decline categories: +- Cited direction clauses: +- Comment posted to PR: yes +- Reply window: +- Awaiting-author label added: read `$ARTIFACTS_DIR/.label-applied` — value is `applied` or `skipped`. If `skipped`, surface why by reading `$ARTIFACTS_DIR/.label-error` (gh stderr) and include a one-line explanation. **Do not say `yes` if the file says `skipped`** — say `no, label add failed: ` so the maintainer can decide whether to add it manually. + +### If unclear branch: +- Gate could not classify confidently. +- Maintainer prompted manually — outcome recorded in approval-gate response. + +## Next steps for the maintainer +<2-3 short bullets. e.g.: +- "Read $ARTIFACTS_DIR/review/review-comment.md and post to PR." +- "Wait for contributor reply by ; if no reply, close PR." 
+- "Update direction.md to address the open question this PR raised: ".> +``` + +--- + +## Phase 3: RETURN + +Return a single-line outcome: + +``` +PR # — branch=, verdict=, action=. +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/final-report.md` written. +- [ ] Correctly identifies which branch ran (don't pretend the review branch ran when it didn't). +- [ ] Lists concrete next steps for the maintainer. diff --git a/.archon/commands/maintainer-review-synthesize.md b/.archon/commands/maintainer-review-synthesize.md new file mode 100644 index 0000000000..bfdd3abb28 --- /dev/null +++ b/.archon/commands/maintainer-review-synthesize.md @@ -0,0 +1,156 @@ +--- +description: Synthesize findings from all review aspects into a single maintainer-ready review report (Pi-tuned) +argument-hint: (no arguments — reads review/*.md artifacts and writes synthesis) +--- + +# Maintainer Review — Synthesize + +You are the synthesizer. Read all available review-aspect findings, deduplicate overlap, prioritize, and produce a single maintainer-ready review summary plus a draft GitHub comment. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +### PR number +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +``` + +### Read every available review findings file +```bash +ls $ARTIFACTS_DIR/review/ +``` + +Then read each one: +- `code-review-findings.md` (always present if review branch ran) +- `error-handling-findings.md` (present if classifier said yes) +- `test-coverage-findings.md` (present if classifier said yes) +- `comment-quality-findings.md` (present if classifier said yes) +- `docs-impact-findings.md` (present if classifier said yes) + +Some files may be missing — that's expected. Don't error. + +### Read the gate decision (for context) +```bash +cat $ARTIFACTS_DIR/gate-decision.md +``` + +The gate may have noted things ("template was empty — nudge in synthesis"). Carry those notes forward. 
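
A hedged sketch of that tolerant read — absent aspect files are expected, not errors:

```shell
# Sketch: read whichever aspect findings exist; missing files are normal.
read_findings() {
  local dir="$1" found=0
  for f in "$dir"/*-findings.md; do
    [ -e "$f" ] || continue            # unmatched glob stays literal — skip it
    echo "--- ${f##*/} ---"
    cat "$f"
    found=1
  done
  [ "$found" -eq 1 ] || echo "(no findings files)"
}
read_findings "$ARTIFACTS_DIR/review"
```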
+ +--- + +## Phase 2: AGGREGATE + DEDUPLICATE + +Issues often surface in multiple aspects (e.g. a missing test for an error path shows up in error-handling AND test-coverage). Don't list the same finding twice. Pick the most actionable wording and merge. + +Group findings by **severity** across all aspects, not by aspect: + +- **CRITICAL** (across aspects): merge / blocking / data-loss / silent-failure issues. +- **HIGH**: real bugs, missing test for a fix, missing docs for a new public surface, CLAUDE.md violation. +- **MEDIUM**: edge cases, comment rot risks, minor docs polish. +- **LOW / NITPICK**: style, naming, optional improvements. + +Within each tier, order by file path so the maintainer can scan top-to-bottom. + +--- + +## Phase 3: WRITE THE SYNTHESIS + +Write `$ARTIFACTS_DIR/review/synthesis.md`: + +```markdown +# Maintainer Review — PR # + +## Verdict + + +## Summary +<2-3 sentence overview. What the PR does, what's good, what's blocking.> + +## Findings + +### CRITICAL (N) +- ****: + - From: + - **Suggested fix**: + +### HIGH (N) +- (same format) + +### MEDIUM (N) +- (same format) + +### LOW / NITPICK (N) +- (consolidated) + +## CLAUDE.md compliance + + +## Gate-decision notes + + +## Aspects run +- code-review: +- error-handling: +- test-coverage: +- comment-quality: +- docs-impact: + +## Aspects skipped + +``` + +--- + +## Phase 4: WRITE THE DRAFT PR COMMENT + +Write `$ARTIFACTS_DIR/review/review-comment.md` — this is the markdown body that would be posted to the PR. The maintainer can copy-paste it or hand-edit before posting. + +Format: + +```markdown +## Review Summary + +**Verdict**: + +<2-3 sentence overview written for the PR author, not for the maintainer.> + +### Blocking issues +- (list CRITICAL findings, file:line, fix suggestion) + +### Suggested fixes +- (list HIGH findings) + +### Minor / nice-to-have +- (list MEDIUM + LOW combined) + +### Compliments + + +--- +*Reviewed via maintainer-review-pr workflow (Pi/Minimax). 
Aspects run: .* +``` + +Tone for the PR comment: +- Address the contributor directly ("you", "your change"). +- Be **specific** — file:line + concrete fix. +- No corporate-speak, no excessive praise, no AI-attribution-by-name (the footer line is enough). + +--- + +## Phase 5: RETURN + +Return a single-line summary: + +``` +Synthesized: . CRITICAL / HIGH / MEDIUM / LOW findings across aspects. Comment drafted at $ARTIFACTS_DIR/review/review-comment.md. +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/synthesis.md` written. +- [ ] `$ARTIFACTS_DIR/review/review-comment.md` written. +- [ ] Findings deduplicated across aspects. +- [ ] Severity ordering correct. +- [ ] Skipped aspects listed with reason. diff --git a/.archon/commands/maintainer-review-test-coverage.md b/.archon/commands/maintainer-review-test-coverage.md new file mode 100644 index 0000000000..5b91c4ef9b --- /dev/null +++ b/.archon/commands/maintainer-review-test-coverage.md @@ -0,0 +1,101 @@ +--- +description: Review the PR for test coverage — does new behavior have tests, are critical paths exercised, do existing tests still cover what they should (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Test Coverage + +You are a test-focused reviewer. Run **only** when the diff touches source code (not pure docs / config / tests). Your job: assess whether the new behavior is properly tested. 
+
+**Workflow ID**: $WORKFLOW_ID
+
+---
+
+## Phase 1: LOAD
+
+```bash
+PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number)
+gh pr diff $PR_NUMBER
+```
+
+Read the project's testing conventions in `CLAUDE.md`:
+- Mock isolation rules (Bun `mock.module` is process-global; spyOn is preferred for internal modules)
+- Per-package test isolation (split bun test invocations to avoid mock pollution)
+- `bun run test` (not `bun test` from repo root)
+
+---
+
+## Phase 2: ANALYZE
+
+For each non-trivial code change, ask:
+
+### Behavioral coverage
+- Is the **happy path** covered?
+- Are **edge cases** covered? (Empty input, oversized input, malformed input, concurrent calls, etc.)
+- Are **error paths** covered? (Throws when expected, returns null when expected.)
+- Is the test asserting on the **right thing**? (Output value? Side effect? Both?)
+
+### Test quality
+- Are tests deterministic? No timing, no real network, no real filesystem unless intentional?
+- Mock pollution: does the file use `mock.module()` in a way that conflicts with other test files in the same package?
+- Test isolation: does each test set up and tear down its own state?
+
+### Coverage gaps to flag
+- New public function with no test → flag.
+- New conditional branch with no test → flag.
+- Bug fix without a regression test → flag (the test should fail before the fix).
+- New error path with no test → flag.
+
+### Don't flag
+- Trivial getters/setters with no logic.
+- Internal helpers tested transitively through public API tests.
+- Documentation-only or formatting-only changes.
+
+---
+
+## Phase 3: WRITE FINDINGS
+
+Write `$ARTIFACTS_DIR/review/test-coverage-findings.md`:
+
+```markdown
+# Test Coverage Review — PR #<number>
+
+## Summary
+<1-2 sentences.
Coverage: adequate / minor-gaps / significant-gaps.>
+
+## Findings
+
+### CRITICAL — bug fix without regression test
+- **<file:line>**: <missing coverage>
+  - **Suggested test**: <concrete test to add>
+
+### HIGH — new behavior without coverage
+- (same format)
+
+### MEDIUM — edge cases / error paths missing
+- (same format)
+
+### LOW — improvements
+- (same format)
+
+## Mock isolation concerns
+<concerns, or "none">
+
+## Notes for synthesizer
+<notes, or "none">
+```
+
+If coverage is adequate, write `## Findings\n\nAdequate coverage for the changed behavior.` and stop.
+
+---
+
+## Phase 4: RETURN
+
+```
+Test-coverage review complete. <N> CRITICAL, <N> HIGH, <N> MEDIUM, <N> LOW findings. Coverage: <adequate / minor-gaps / significant-gaps>.
+```
+
+### CHECKPOINT
+- [ ] `$ARTIFACTS_DIR/review/test-coverage-findings.md` written.
+- [ ] Each CRITICAL/HIGH cites a specific function / branch and proposes a concrete test.
+- [ ] No invented gaps. If coverage is good, say so.
diff --git a/.archon/commands/maintainer-standup.md b/.archon/commands/maintainer-standup.md
new file mode 100644
index 0000000000..2e549fb9a1
--- /dev/null
+++ b/.archon/commands/maintainer-standup.md
@@ -0,0 +1,161 @@
+---
+description: Synthesize the maintainer's morning standup brief from gathered git/PR/issue/state data
+argument-hint: (no arguments — all context provided via upstream nodes)
+---
+
+# Maintainer Standup Synthesis
+
+You are producing a daily maintainer briefing for the Archon project. The user is the maintainer running this workflow. Your job is to read the gathered facts, cross-reference against the project's direction document and the maintainer's profile, and produce a prioritized brief plus state to persist for tomorrow's run.
+
+**Workflow ID**: $WORKFLOW_ID
+
+---
+
+## Phase 1: LOAD INPUTS
+
+You have three sources of upstream context, all already gathered. Each is a JSON string that you should parse.
+
+### Git status (origin/dev movement since last run)
+
+```
+$git-status.output
+```
+
+Fields: `current_dev_sha`, `prior_dev_sha`, `current_branch`, `is_dirty`, `pull_status`, `new_commits`, `diff_stat`.
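As a sketch, that payload can be typed like this — the field names come from the list above; the TypeScript types are inferred assumptions, not the engine's canonical schema:

```typescript
// Illustrative shape for the parsed $git-status.output JSON string.
interface GitStatusOutput {
  current_dev_sha: string;
  prior_dev_sha: string; // '' on a first run
  current_branch: string;
  is_dirty: boolean;
  pull_status: 'pulled' | 'fetch_only' | 'pull_failed' | 'not_on_dev' | 'dirty';
  new_commits: string; // newline-separated `%h %an: %s` lines
  diff_stat: string;
}

// The upstream value arrives as a JSON string, so parse before use.
function parseGitStatus(raw: string): GitStatusOutput {
  return JSON.parse(raw) as GitStatusOutput;
}
```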
+ +### GitHub data (PRs, issues, review requests, recently closed) + +``` +$gh-data.output +``` + +Fields: `gh_handle`, `since_date`, `all_open_prs`, `review_requested`, `authored_by_me`, `issues_assigned`, `recent_unlabeled_issues`, `recently_closed_prs`, `recently_closed_issues`, `my_recent_commits`. + +### Local context (direction doc, maintainer profile, prior state, recent briefs) + +``` +$read-context.output +``` + +Fields: `direction` (markdown string), `profile` (markdown string), `prior_state` (object or null), `recent_briefs` (array of `{date, content}`). + +--- + +## Phase 2: ANALYZE + +### 2a. Detect first-run vs ongoing + +If `prior_state` is `null` and `recent_briefs` is empty, this is a **first run**. Skip "Since last run" comparisons; produce a baseline triage and state snapshot the next run can diff against. + +### 2b. Compare prior state to current reality (progress detection) + +When `prior_state` exists: + +- **Resolved since last run**: PRs in `prior_state.observed_prs` whose numbers do NOT appear in current `gh-data.output.all_open_prs` — they were closed or merged. Cross-reference against `gh-data.output.recently_closed_prs` to know whether they merged or were closed without merging. Same for issues. +- **Carry-over revisited**: each item in `prior_state.carry_over` — is it still open? Did its status change? If resolved, mention briefly under "Resolved since last run" and DROP from `next_state.carry_over`. If still pending, keep with original `first_seen` date (so age is preserved). +- **What you shipped**: `gh-data.output.my_recent_commits` lists the maintainer's commits since the last run. Summarize meaningfully — group by area, highlight notable ones. Don't just list shas. +- **New since last run**: PRs in current `all_open_prs` whose numbers are NOT in `prior_state.observed_prs` are new this run. Same for issues. + +### 2c. Read the direction doc and profile + +The `direction` markdown defines what Archon IS / IS NOT. 
The `profile` markdown describes the maintainer's role, scope, and current focus. Both inform the triage:
+
+- **Profile scope** drives breadth of coverage. `scope: everything` (main maintainer) means classify all open PRs, not just ones touching the maintainer's focus areas.
+- **Direction clauses** drive the polite-decline classification. PRs adding multi-tenancy, hosted-service features, or anything contradicting the IS-NOT list go to P4 with a citation.
+- **Profile focus areas** weight prioritization within P1-P3 — items aligned with current focus rank higher.
+
+### 2d. Triage all open PRs into P1-P4
+
+For each PR in `all_open_prs`:
+
+- **P1 (Do today)**: ready-to-merge PRs awaiting your review (`reviewDecision: APPROVED` or null AND `mergeStateStatus: clean`), security fixes, items breaking dev, blockers for an in-flight release. **Note**: `mergeStateStatus` is the only CI/merge signal in the gathered payload (values: `clean`, `unstable`, `dirty`, `blocked`, `behind`, `unknown`). For ambiguous cases run `gh pr checks <number>` to verify CI before classifying as P1.
+- **P2 (This week)**: in-flight PRs needing review or maintainer feedback, PRs with merge conflicts that can be unblocked, PRs from the maintainer's current focus areas that are progressing.
+- **P3 (Whenever)**: low-urgency items, drafts you authored, exploratory PRs, items outside current focus that aren't time-sensitive.
+- **P4 (Polite-decline candidates)**: PRs that conflict with `direction.md`. Each P4 entry MUST cite a specific clause (e.g., `direction.md §single-developer-tool`).
+
+You may use `gh pr view <number>`, `gh pr diff <number>`, or `gh pr checks <number>` to drill into PRs whose triage classification cannot be determined from the metadata alone. Be selective — drilling into all 60+ PRs is wasteful. Drill into 5-10 of the most ambiguous or interesting cases.
+
+### 2e. Triage issues
+
+Issues in `issues_assigned` and `recent_unlabeled_issues` follow the same P1-P4 classification.
Use `gh issue view <number>` to drill into ambiguous ones. Recently-filed unlabeled issues are likely candidates for first-pass labeling.
+
+### 2f. Surface direction questions
+
+If any PR raises a "we don't have a stance on this" question that `direction.md` doesn't answer, surface it under **Direction questions raised**. These go into `next_state.direction_questions` so the maintainer can absorb them into `direction.md` over time.
+
+### 2g. Carry-over aging
+
+Items that have been in `prior_state.carry_over` for multiple runs (check `first_seen` dates) are higher priority — surface them prominently and consider escalating their P-level.
+
+---
+
+## Phase 3: GENERATE OUTPUT
+
+Return a JSON object matching the workflow's `output_format` schema. Do not write any files yourself — the workflow's `persist` node handles disk writes from your structured response.
+
+### `brief_markdown` (string)
+
+A maintainer-ready markdown brief. Adapt sections — omit empty ones, add others if useful. Keep entries to one line each. The brief should be readable on a single screen.
+
+```markdown
+# Maintainer Standup — YYYY-MM-DD
+
+## Since last run
+- (Summary of new commits on dev with notable highlights, or "first run — baseline snapshot")
+- (Mention pull_status if not 'pulled': dirty/not_on_dev/pull_failed)
+
+## What you shipped
+- (One-line summary grouped by area, derived from `my_recent_commits`. Omit if empty.)
+
+## Resolved since last run
+- **PR #N** — [title] — merged ✓ / closed
+- **Issue #N** — [title] — closed
+- (Omit section if nothing resolved.)
+
+## P1 — Do today
+- **PR #N** — [title] ([+X/-Y]) — [why P1, e.g. "ready to merge, awaiting your review"]
+- **Issue #N** — [title] — [why P1]
+
+## P2 — This week
+- (Same format)
+
+## P3 — Whenever
+- (Same format)
+
+## P4 — Polite-decline candidates
+- **PR #N** — [title] by @[author] — Conflicts with `direction.md §[clause]`. [One-line reason.]
+
+## Direction questions raised
+- (PR #N raises: should Archon support [Y]? Add a stance to direction.md.)
+- (Or omit if none.)
+
+## Carry-over still pending
+- **PR #N** — [title] — first seen YYYY-MM-DD ([N] runs ago) — [current status]
+- (Omit section if nothing carried over.)
+```
+
+### `next_state` (object)
+
+Carry-over state for tomorrow's run. Schema:
+
+- `last_run_at`: current ISO-8601 timestamp (use the actual timestamp at synthesis time).
+- `last_dev_sha`: value from `git-status.output.current_dev_sha`.
+- `carry_over`: items the next run should remember as "still pending." For items already in `prior_state.carry_over` that are still pending, **preserve the original `first_seen` date** so age is tracked correctly.
+- `observed_prs`: snapshot of ALL currently-open PRs (number + title only) — used to detect new PRs and resolved PRs next run. This must include every entry in `all_open_prs`, not just ones you classified.
+- `observed_issues`: same for assigned + unlabeled issues.
+- `direction_questions`: new direction questions surfaced this run (string array).
+
+### PHASE_3_CHECKPOINT
+
+- [ ] Every PR in `all_open_prs` is either classified into P1-P4 OR included in `observed_prs` (no PR silently dropped).
+- [ ] All P4 entries cite a specific `direction.md §clause`.
+- [ ] Carry-over items still pending have their original `first_seen` preserved.
+- [ ] Resolved-since-last-run items are surfaced in the brief AND removed from `next_state.carry_over`.
+- [ ] `next_state.last_dev_sha` is set from `git-status.output.current_dev_sha`.
+- [ ] `next_state.observed_prs` includes ALL currently-open PRs.
+
+---
+
+## Phase 4: REPORT
+
+Return the JSON object only. The workflow's `persist` node writes `brief_markdown` to `.archon/maintainer-standup/briefs/<date>.md` and `next_state` to `.archon/maintainer-standup/state.json`. Do not write files yourself.
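The Phase 2b progress detection boils down to a set difference over PR numbers. A minimal sketch — the `PrRef` shape is assumed from the `observed_prs` description (number + title):

```typescript
interface PrRef {
  number: number;
  title: string;
}

// Phase 2b as code: PRs observed last run but no longer open were merged or
// closed; open PRs absent from the prior snapshot are new since last run.
function diffPrState(observedLastRun: PrRef[], currentOpen: PrRef[]) {
  const open = new Set(currentOpen.map((pr) => pr.number));
  const prior = new Set(observedLastRun.map((pr) => pr.number));
  return {
    resolved: observedLastRun.filter((pr) => !open.has(pr.number)),
    newSinceLastRun: currentOpen.filter((pr) => !prior.has(pr.number)),
  };
}
```

This is also why `observed_prs` must snapshot every open PR: a PR missing from the snapshot is misreported as new on the next run, and its closure is never surfaced.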
diff --git a/.archon/maintainer-standup/README.md b/.archon/maintainer-standup/README.md new file mode 100644 index 0000000000..3395682999 --- /dev/null +++ b/.archon/maintainer-standup/README.md @@ -0,0 +1,53 @@ +# Maintainer Standup + +Daily morning briefing for Archon maintainers. Pulls latest `dev`, fetches all open PRs and assigned issues, classifies them **P1–P4** against `direction.md`, and surfaces progress versus the previous run (merged, closed, what you shipped). + +## Files in this folder + +| File | Committed? | Purpose | +|------|:---:|---------| +| `direction.md` | ✓ | Project north-star — what Archon IS / IS NOT. **Shared by all maintainers.** Drives PR triage and polite-decline classification. | +| `README.md` | ✓ | This file. | +| `profile.md.example` | ✓ | Template for new maintainers to copy. | +| `profile.md` | gitignored | Your personal config (gh handle, role, focus areas). | +| `state.json` | gitignored | Auto-written carry-over for the next run. | +| `briefs/YYYY-MM-DD.md` | gitignored | Daily prose briefs. Last 3 are read into the next run. | + +`direction.md` is committed because triage decisions should be consistent across maintainers and across runs. `profile.md`, `state.json`, and `briefs/` are personal — your focus, your daily notes, your reading material — so each maintainer manages their own. + +## Setup for a new maintainer + +1. Copy the template: + ```bash + cp .archon/maintainer-standup/profile.md.example .archon/maintainer-standup/profile.md + ``` +2. Edit `profile.md`: + - Set `gh_handle` to your GitHub login. + - Set `role` and `scope` to match your maintainer focus (`main_maintainer` / `everything` for full coverage; narrower for sub-maintainers). + - Optionally fill in **Currently focused on** — the synthesizer weights items toward what you list there. +3. Run it: + ```bash + archon workflow run maintainer-standup "" + ``` +4. The first run is a baseline (no prior state to diff). 
Subsequent runs compare against `state.json` and surface "Resolved since last run" / "What you shipped" / aged carry-over items. + +## How it works (engine view) + +1. **Three gather scripts** run in parallel (`bun`, no AI): + - `maintainer-standup-git-status.ts` — fetches `origin/dev`, fast-forwards if safe, captures new commits + diff stat since the last recorded SHA. + - `maintainer-standup-gh-data.ts` — pulls open PRs (full metadata), review-requested PRs, authored-by-me PRs, assigned issues, recently-filed unlabeled issues, and recently-closed PRs/issues since the last run. + - `maintainer-standup-read-context.ts` — reads `direction.md`, `profile.md`, `state.json`, and the last 3 briefs. +2. **Synthesis node** (`command: maintainer-standup`, Claude Sonnet, structured output) reads everything, optionally drills into specific PRs/issues with `gh pr view` / `gh issue view`, classifies P1–P4 against `direction.md`, and returns `{ brief_markdown, next_state }`. +3. **Persist node** writes `brief_markdown` to `briefs/YYYY-MM-DD.md` and `next_state` to `state.json`. + +The workflow runs **in the live checkout** (`worktree.enabled: false`) — it has to read this folder and pull `dev`. `--branch` and `--no-worktree` flags are rejected. + +## Editing direction.md + +`direction.md` is the source of truth for "what Archon is / isn't" during PR triage. Add a clause when a triage decision needs justification (so the next maintainer can reach the same conclusion). When declining a PR, cite the clause inline (e.g., `direction.md §single-developer-tool`). + +The synthesizer also surfaces **Direction questions raised** — PRs that touch areas where `direction.md` has no stance yet. Use those to evolve the doc deliberately rather than deciding case-by-case. + +## Customizing the brief format + +The output structure is defined in `.archon/commands/maintainer-standup.md`. Adjust the Phase 3 template if you want different sections or a different P-tier scheme. 
The synthesizer's `output_format` schema lives in `.archon/workflows/maintainer-standup.yaml`. diff --git a/.archon/maintainer-standup/direction.md b/.archon/maintainer-standup/direction.md new file mode 100644 index 0000000000..07cd83ab79 --- /dev/null +++ b/.archon/maintainer-standup/direction.md @@ -0,0 +1,41 @@ +# Archon Direction + +The maintainer-standup workflow consults this document when triaging PRs and issues to suggest which contributions align with the project and which are likely polite-decline candidates. + +This file is **committed and shared by all maintainers**. Edit deliberately — direction calls live here so that PR triage stays consistent across runs and across maintainers. When declining a PR, cite the specific clause (e.g., `direction.md §single-developer-tool`). + +--- + +## What Archon IS + +- **A remote agentic coding platform.** Control AI coding assistants (Claude Code SDK, Codex SDK, Pi community provider) remotely from Slack, Telegram, GitHub, Discord, CLI, and Web UI. +- **A single-developer tool.** No multi-tenant complexity. Built for one practitioner running their own instance. +- **Platform-agnostic at the conversation layer.** Unified interface across adapters via `IPlatformAdapter`. Stream/batch AI responses in real time. +- **Workflow-driven.** Reproducible AI execution chains defined as YAML DAGs in `.archon/workflows/`. Workflows run in isolated git worktrees by default. +- **Type-safe.** Strict TypeScript everywhere. No `any` without justification. +- **Composable.** Scripts in `.archon/scripts/`, commands in `.archon/commands/`, workflows compose them. +- **Self-hostable.** Bun + TypeScript runtime. SQLite by default; PostgreSQL optional. Zero external service dependencies for core operation. + +## What Archon is NOT + +- **Not multi-tenant.** No user accounts, role management, billing, or SaaS scaffolding. PRs adding these conflict with the single-developer thesis. 
+- **Not a hosted service.** No proprietary backend dependencies. Self-hosted by design. +- **Not a general-purpose chat UI.** Adapters are conversation surfaces for *workflow execution*, not standalone chat experiences. +- **Not a replacement for the AI coding agent itself.** Archon orchestrates Claude Code / Codex / Pi — it doesn't reimplement them. +- **Not opinionated about the dev environment.** No mandatory editor integrations, framework lock-in, or Docker requirement beyond what users opt into. +- **Not a workflow marketplace.** Bundled workflows are reference patterns; Archon is not aiming to be a hub for third-party workflow distribution. + +## Open questions (no stance yet) + +These are direction calls we haven't made. PRs that touch these areas should surface the question for explicit decision rather than be silently accepted or rejected. The workflow may add to this list as new questions appear. + +- (No open questions yet — populated over time.) + +--- + +## How to evolve this doc + +- Add a "What Archon IS" or "is NOT" line when a PR triage forces a direction call. +- Move "Open questions" entries to the IS / IS NOT sections once decided. +- Reference the relevant clause in PR comments when declining: `direction.md §single-developer-tool`. +- Keep entries short — one or two lines each. The point is fast lookup during triage, not a manifesto. diff --git a/.archon/maintainer-standup/profile.md.example b/.archon/maintainer-standup/profile.md.example new file mode 100644 index 0000000000..220f7a26c6 --- /dev/null +++ b/.archon/maintainer-standup/profile.md.example @@ -0,0 +1,28 @@ +--- +# Required: your GitHub login (used by gh queries for review-requested / assigned filters). +gh_handle: your-github-login + +# Suggested: drives how broadly the synthesizer classifies the queue. 
+# - main_maintainer / everything → triage all open PRs, not just yours +# - reviewer / focus-area → narrower coverage +role: main_maintainer +scope: everything +--- + +# Maintainer Profile — Your Name + +One paragraph on how you want the brief tuned. The synthesizer reads this verbatim, so write what you actually want it to do. + +Example: + +> I'm a sub-maintainer focused on the workflow engine. Show me PRs that touch packages/workflows/ first; deprioritize adapter-only PRs unless they're P1. + +## What I want from the brief + +- (Whatever level of full-repo coverage you want) +- (How aggressively to flag polite-decline candidates) +- (Whether to surface drafts, third-party PRs, etc.) + +## Currently focused on + +- (Update as priorities shift. Items here rank higher within their P-tier.) diff --git a/.archon/scripts/maintainer-standup-gh-data.ts b/.archon/scripts/maintainer-standup-gh-data.ts new file mode 100644 index 0000000000..eb0d03964b --- /dev/null +++ b/.archon/scripts/maintainer-standup-gh-data.ts @@ -0,0 +1,195 @@ +#!/usr/bin/env bun +/** + * Fetches GitHub data for the maintainer-standup synthesis: all open PRs + * (light metadata), review-requested PRs, authored-by-me PRs, assigned issues, + * recent unlabeled issues, and recently-closed PRs/issues since the last run. + * + * Reads gh_handle from .archon/maintainer-standup/profile.md frontmatter. + * + * Output: JSON to stdout. + */ +import { execFileSync } from 'node:child_process'; +import { existsSync, readFileSync } from 'node:fs'; +import { resolve } from 'node:path'; + +// execFileSync with argv arrays — avoids shell-string interpolation and the +// associated quoting hazards (esp. for handles loaded from profile.md). 
+function exec(file: string, args: string[]): string {
+  try {
+    return execFileSync(file, args, { stdio: ['ignore', 'pipe', 'pipe'] }).toString();
+  } catch (e) {
+    process.stderr.write(`${file} command failed: ${file} ${args.join(' ')}\n${(e as Error).message}\n`);
+    return '[]';
+  }
+}
+
+function parseJson<T>(s: string, fallback: T): T {
+  try {
+    return JSON.parse(s) as T;
+  } catch {
+    return fallback;
+  }
+}
+
+// ── Load gh_handle from profile.md frontmatter ──
+let ghHandle = '';
+const profilePath = resolve(process.cwd(), '.archon/maintainer-standup/profile.md');
+if (existsSync(profilePath)) {
+  const profile = readFileSync(profilePath, 'utf8');
+  const match = profile.match(/^gh_handle:\s*(\S+)\s*$/m);
+  if (match) ghHandle = match[1];
+}
+if (!ghHandle) {
+  process.stderr.write('Warning: no gh_handle found in profile.md frontmatter\n');
+}
+
+// ── Load prior state to scope "recently closed" lookups ──
+let lastRunAt = '';
+const statePath = resolve(process.cwd(), '.archon/maintainer-standup/state.json');
+if (existsSync(statePath)) {
+  try {
+    const state = JSON.parse(readFileSync(statePath, 'utf8')) as { last_run_at?: string };
+    lastRunAt = state.last_run_at ?? '';
+  } catch {
+    // ignore corrupt state
+  }
+}
+
+// ── Open PRs (full metadata for triage) ──
+const prFields = [
+  'number',
+  'title',
+  'author',
+  'labels',
+  'createdAt',
+  'updatedAt',
+  'isDraft',
+  'mergeable',
+  'mergeStateStatus',
+  'reviewDecision',
+  'headRefName',
+  'baseRefName',
+  'additions',
+  'deletions',
+  'changedFiles',
+  'reviewRequests',
+].join(',');
+
+// `gh pr list --json` does NOT auto-paginate beyond `--limit`. 1000 is the
+// practical ceiling for a single GraphQL call and gives ~15× headroom over
+// today's open-PR count. The next-run-diff invariant in the synthesis
+// command (observed_prs must include every entry in all_open_prs) requires
+// completeness here, so we warn loudly if we ever hit the cap.
+const PR_LIMIT = 1000; +const allOpenPrs = parseJson( + exec('gh', ['pr', 'list', '--state', 'open', '--limit', String(PR_LIMIT), '--json', prFields]), + [], +); +if (allOpenPrs.length === PR_LIMIT) { + process.stderr.write( + `Warning: hit --limit ${PR_LIMIT} on all_open_prs. Some PRs may be silently truncated; ` + + `next-run "resolved since last run" detection will misclassify the dropped tail. ` + + `Switch to gh api graphql --paginate when this becomes a persistent issue.\n`, + ); +} + +let reviewRequested: unknown[] = []; +let authoredByMe: unknown[] = []; +let issuesAssigned: unknown[] = []; + +if (ghHandle) { + reviewRequested = parseJson( + exec('gh', [ + 'pr', 'list', + '--search', `is:open is:pr review-requested:${ghHandle}`, + '--json', 'number,title,author,createdAt,updatedAt', + ]), + [], + ); + authoredByMe = parseJson( + exec('gh', [ + 'pr', 'list', + '--author', ghHandle, + '--state', 'open', + '--json', 'number,title,createdAt,updatedAt,reviewDecision,mergeStateStatus', + ]), + [], + ); + issuesAssigned = parseJson( + exec('gh', [ + 'issue', 'list', + '--assignee', ghHandle, + '--state', 'open', + '--json', 'number,title,labels,createdAt,updatedAt,author', + ]), + [], + ); +} + +// ── Recent unlabeled issues (last 7 days) ── +const sevenDaysAgo = new Date(); +sevenDaysAgo.setDate(sevenDaysAgo.getDate() - 7); +const sevenDaysAgoStr = sevenDaysAgo.toISOString().slice(0, 10); +const recentUnlabeledIssues = parseJson( + exec('gh', [ + 'issue', 'list', + '--state', 'open', + '--search', `no:label created:>${sevenDaysAgoStr}`, + '--json', 'number,title,createdAt,author', + '--limit', '30', + ]), + [], +); + +// ── Recently closed/merged since last run (or last 7 days as fallback) ── +const sinceDate = lastRunAt ? 
lastRunAt.slice(0, 10) : sevenDaysAgoStr; +const recentlyClosedPrs = parseJson( + exec('gh', [ + 'pr', 'list', + '--state', 'closed', + '--search', `closed:>${sinceDate}`, + '--json', 'number,title,author,closedAt,mergedAt,state', + '--limit', '50', + ]), + [], +); +const recentlyClosedIssues = parseJson( + exec('gh', [ + 'issue', 'list', + '--state', 'closed', + '--search', `closed:>${sinceDate}`, + '--json', 'number,title,author,closedAt,state', + '--limit', '50', + ]), + [], +); + +// ── Maintainer's recent commits on dev (what you shipped) ── +let myRecentCommits = ''; +if (ghHandle) { + const since = lastRunAt || '7 days ago'; + try { + myRecentCommits = execFileSync( + 'git', + ['log', 'origin/dev', `--since=${since}`, `--author=${ghHandle}`, '--no-decorate', '--format=%h %s'], + { stdio: ['ignore', 'pipe', 'pipe'] }, + ).toString(); + } catch { + myRecentCommits = ''; + } +} + +console.log( + JSON.stringify({ + gh_handle: ghHandle, + since_date: sinceDate, + all_open_prs: allOpenPrs, + review_requested: reviewRequested, + authored_by_me: authoredByMe, + issues_assigned: issuesAssigned, + recent_unlabeled_issues: recentUnlabeledIssues, + recently_closed_prs: recentlyClosedPrs, + recently_closed_issues: recentlyClosedIssues, + my_recent_commits: myRecentCommits, + }), +); diff --git a/.archon/scripts/maintainer-standup-git-status.ts b/.archon/scripts/maintainer-standup-git-status.ts new file mode 100644 index 0000000000..9076c0eb0a --- /dev/null +++ b/.archon/scripts/maintainer-standup-git-status.ts @@ -0,0 +1,80 @@ +#!/usr/bin/env bun +/** + * Fetches origin/dev, optionally fast-forwards local dev, and reports new + * commits + diff stat since the last run's recorded SHA. 
+ * + * Output: JSON to stdout with shape: + * { + * current_dev_sha, prior_dev_sha, current_branch, is_dirty, + * pull_status: 'pulled' | 'fetch_only' | 'pull_failed' | 'not_on_dev' | 'dirty', + * new_commits, diff_stat + * } + */ +import { execFileSync } from 'node:child_process'; +import { existsSync, readFileSync } from 'node:fs'; +import { resolve } from 'node:path'; + +// execFileSync (argv array, no shell) — defense-in-depth for git invocations. +// All args are hardcoded literals or values from `git` output (SHAs); using +// execFileSync removes any need to reason about shell metacharacters. +function git(args: string[]): { stdout: string; ok: boolean } { + try { + const out = execFileSync('git', args, { stdio: ['ignore', 'pipe', 'pipe'] }).toString(); + return { stdout: out, ok: true }; + } catch { + return { stdout: '', ok: false }; + } +} + +let priorSha = ''; +const stateFile = resolve(process.cwd(), '.archon/maintainer-standup/state.json'); +if (existsSync(stateFile)) { + try { + const state = JSON.parse(readFileSync(stateFile, 'utf8')) as { last_dev_sha?: string }; + priorSha = state.last_dev_sha ?? ''; + } catch { + // ignore corrupt state — first-run-like behavior + } +} + +git(['fetch', 'origin', 'dev']); + +const currentBranch = git(['rev-parse', '--abbrev-ref', 'HEAD']).stdout.trim(); +const isDirty = git(['status', '--porcelain']).stdout.trim().length > 0; + +let pullStatus: 'pulled' | 'fetch_only' | 'pull_failed' | 'not_on_dev' | 'dirty'; +if (currentBranch !== 'dev') { + pullStatus = 'not_on_dev'; +} else if (isDirty) { + pullStatus = 'dirty'; +} else { + const result = git(['pull', '--ff-only', 'origin', 'dev']); + pullStatus = result.ok ? 
'pulled' : 'pull_failed'; +} + +const currentDevSha = git(['rev-parse', 'origin/dev']).stdout.trim(); + +let newCommits = ''; +let diffStat = ''; +if (priorSha && priorSha !== currentDevSha) { + // %h short SHA, %an author name, %s subject + const log = git(['log', `${priorSha}..origin/dev`, '--no-decorate', '--format=%h %an: %s']); + if (log.ok) { + newCommits = log.stdout; + diffStat = git(['diff', '--stat', `${priorSha}..origin/dev`]).stdout; + } else { + newCommits = '(prior SHA not found locally — full diff unavailable)'; + } +} + +console.log( + JSON.stringify({ + current_dev_sha: currentDevSha, + prior_dev_sha: priorSha, + current_branch: currentBranch, + is_dirty: isDirty, + pull_status: pullStatus, + new_commits: newCommits, + diff_stat: diffStat, + }), +); diff --git a/.archon/scripts/maintainer-standup-read-context.ts b/.archon/scripts/maintainer-standup-read-context.ts new file mode 100644 index 0000000000..0c4614f053 --- /dev/null +++ b/.archon/scripts/maintainer-standup-read-context.ts @@ -0,0 +1,67 @@ +#!/usr/bin/env bun +/** + * Loads local context for the maintainer-standup synthesis: direction.md + * (committed), profile.md (per-maintainer), prior state.json, and the most + * recent N briefs. + * + * Output: JSON to stdout. + */ +import { existsSync, readFileSync, readdirSync } from 'node:fs'; +import { resolve } from 'node:path'; + +const RECENT_BRIEFS_LIMIT = 3; + +const baseDir = resolve(process.cwd(), '.archon/maintainer-standup'); + +const directionPath = resolve(baseDir, 'direction.md'); +const direction = existsSync(directionPath) ? readFileSync(directionPath, 'utf8') : ''; + +const profilePath = resolve(baseDir, 'profile.md'); +const profile = existsSync(profilePath) ? 
readFileSync(profilePath, 'utf8') : ''; + +const statePath = resolve(baseDir, 'state.json'); +let priorState: unknown = null; +if (existsSync(statePath)) { + try { + priorState = JSON.parse(readFileSync(statePath, 'utf8')); + } catch { + priorState = null; + } +} + +const briefsDir = resolve(baseDir, 'briefs'); +const recentBriefs: { date: string; content: string }[] = []; +if (existsSync(briefsDir)) { + const files = readdirSync(briefsDir) + .filter((f) => f.endsWith('.md')) + .sort() + .reverse() + .slice(0, RECENT_BRIEFS_LIMIT); + for (const f of files) { + recentBriefs.push({ + date: f.replace(/\.md$/, ''), + content: readFileSync(resolve(briefsDir, f), 'utf8'), + }); + } +} + +// Deterministic clock — emit today's local date + a precomputed 3-day-out +// deadline so downstream prompts don't have to do calendar arithmetic +// (LLMs are unreliable at it) and don't anchor to stale prior_state.last_run_at +// (which can produce past deadlines on long gaps between runs). +const todayDate = new Date(); +const today = todayDate.toLocaleDateString('sv-SE'); // YYYY-MM-DD local +const deadlineDate = new Date(todayDate); +deadlineDate.setDate(deadlineDate.getDate() + 3); +const deadline_3d = deadlineDate.toLocaleDateString('sv-SE'); + +console.log( + JSON.stringify({ + direction, + profile, + prior_state: priorState, + recent_briefs: recentBriefs, + today, + deadline_3d, + }), +); diff --git a/.archon/workflows/defaults/archon-adversarial-dev.yaml b/.archon/workflows/defaults/archon-adversarial-dev.yaml index 68722c8b1a..bea7117f4a 100644 --- a/.archon/workflows/defaults/archon-adversarial-dev.yaml +++ b/.archon/workflows/defaults/archon-adversarial-dev.yaml @@ -117,7 +117,7 @@ nodes: - id: adversarial-sprint depends_on: [init-workspace] idle_timeout: 600000 - model: claude-opus-4-6[1m] + model: opus[1m] loop: prompt: | # Adversarial Development — Sprint Loop diff --git a/.archon/workflows/defaults/archon-feature-development.yaml 
b/.archon/workflows/defaults/archon-feature-development.yaml index 6d0747700d..a2ab7da87d 100644 --- a/.archon/workflows/defaults/archon-feature-development.yaml +++ b/.archon/workflows/defaults/archon-feature-development.yaml @@ -8,7 +8,7 @@ description: | nodes: - id: implement command: archon-implement - model: claude-opus-4-6[1m] + model: opus[1m] - id: create-pr command: archon-create-pr diff --git a/.archon/workflows/defaults/archon-fix-github-issue.yaml b/.archon/workflows/defaults/archon-fix-github-issue.yaml index 12ad675de9..a6fd0d235c 100644 --- a/.archon/workflows/defaults/archon-fix-github-issue.yaml +++ b/.archon/workflows/defaults/archon-fix-github-issue.yaml @@ -133,7 +133,7 @@ nodes: command: archon-fix-issue depends_on: [bridge-artifacts] context: fresh - model: claude-opus-4-6[1m] + model: opus[1m] # ═══════════════════════════════════════════════════════════════ # PHASE 5: VALIDATE diff --git a/.archon/workflows/defaults/archon-idea-to-pr.yaml b/.archon/workflows/defaults/archon-idea-to-pr.yaml index 9329c55021..1c2fe738d3 100644 --- a/.archon/workflows/defaults/archon-idea-to-pr.yaml +++ b/.archon/workflows/defaults/archon-idea-to-pr.yaml @@ -52,7 +52,7 @@ nodes: command: archon-implement-tasks depends_on: [confirm-plan] context: fresh - model: claude-opus-4-6[1m] + model: opus[1m] # ═══════════════════════════════════════════════════════════════════ # PHASE 4: VALIDATE diff --git a/.archon/workflows/defaults/archon-piv-loop.yaml b/.archon/workflows/defaults/archon-piv-loop.yaml index 7227900c2f..b544814e6b 100644 --- a/.archon/workflows/defaults/archon-piv-loop.yaml +++ b/.archon/workflows/defaults/archon-piv-loop.yaml @@ -198,14 +198,10 @@ nodes: 3. **Read example test files** — understand testing patterns 4. **Check for any recent changes** — `git log --oneline -10` - ## Step 2: Determine Plan Location + ## Step 2: Plan File Location - Generate a kebab-case slug from the feature name. - Save to `.claude/archon/plans/{slug}.plan.md`. 
- - ```bash - mkdir -p .claude/archon/plans - ``` + Save the plan to `$ARTIFACTS_DIR/plan.md`. + The directory already exists (pre-created by the workflow executor). ## Step 3: Write the Plan @@ -282,7 +278,7 @@ nodes: ``` ## Plan Created - **File**: `.claude/archon/plans/{slug}.plan.md` + **File**: `$ARTIFACTS_DIR/plan.md` **Tasks**: {count} **Files to change**: {count} @@ -310,13 +306,9 @@ nodes: --- - ## Step 1: Find and Read the Plan + ## Step 1: Read the Plan - ```bash - ls -t .claude/archon/plans/*.plan.md 2>/dev/null | head -1 - ``` - - Read the entire plan file. Also read CLAUDE.md for conventions. + Read `$ARTIFACTS_DIR/plan.md` and CLAUDE.md for conventions. ## Step 2: Process Feedback @@ -375,10 +367,10 @@ nodes: bash: | set -e - PLAN_FILE=$(ls -t .claude/archon/plans/*.plan.md 2>/dev/null | head -1) + PLAN_FILE="$ARTIFACTS_DIR/plan.md" - if [ -z "$PLAN_FILE" ]; then - echo "ERROR: No plan file found in .claude/archon/plans/" + if [ ! -f "$PLAN_FILE" ]; then + echo "ERROR: No plan file found at $ARTIFACTS_DIR/plan.md" exit 1 fi @@ -403,8 +395,12 @@ nodes: echo "" echo "=== PLAN_END ===" - TASK_COUNT=$(grep -c "^### Task [0-9]" "$PLAN_FILE" || true) - echo "TASK_COUNT=${TASK_COUNT:-0}" + TASK_COUNT=$(grep -c "^### Task [0-9]" "$PLAN_FILE" 2>/dev/null || echo "0") + if [ "$TASK_COUNT" -eq 0 ]; then + echo "ERROR: No '### Task N:' sections found in $PLAN_FILE. Plan may be malformed." + exit 1 + fi + echo "TASK_COUNT=${TASK_COUNT}" # ═══════════════════════════════════════════════════════════════ # PHASE 3b: IMPLEMENT — Task-by-Task Loop (Ralph pattern) @@ -447,7 +443,7 @@ nodes: may have changed things. **You MUST re-read from disk:** 1. **Read the plan file** — your implementation guide - 2. **Read progress tracking** — check if `.claude/archon/plans/progress.txt` exists + 2. **Read progress tracking** — check if `$ARTIFACTS_DIR/progress.txt` exists 3. 
**Read CLAUDE.md** — project conventions and constraints ### 0.3 Check Git State @@ -511,7 +507,7 @@ nodes: )" ``` - Track progress in `.claude/archon/plans/progress.txt`: + Track progress in `$ARTIFACTS_DIR/progress.txt`: ``` ## Task {N}: {title} — COMPLETED Date: {ISO date} @@ -552,11 +548,9 @@ nodes: --- - ## Step 1: Find and Read the Plan + ## Step 1: Read the Plan - ```bash - ls -t .claude/archon/plans/*.plan.md 2>/dev/null | head -1 - ``` + Read `$ARTIFACTS_DIR/plan.md` to understand the intended implementation. ## Step 2: Review All Changes @@ -581,7 +575,7 @@ nodes: Fix type errors, lint warnings, missing imports, formatting. Commit any fixes: ```bash - git add -A && git commit -m "fix: address code review findings" 2>/dev/null || true + git add -A && git commit -m "fix: address code review findings" || true ``` ## Step 6: Present Review @@ -627,11 +621,7 @@ nodes: ## Step 1: Read Context - ```bash - ls -t .claude/archon/plans/*.plan.md 2>/dev/null | head -1 - ``` - - Read the plan file and CLAUDE.md for conventions. + Read `$ARTIFACTS_DIR/plan.md` and CLAUDE.md for conventions. ## Step 2: Process Feedback @@ -710,7 +700,7 @@ nodes: ## Step 1: Push Changes ```bash - git push -u origin HEAD 2>&1 || true + git push -u origin HEAD 2>&1 || echo "WARNING: Push failed — verify remote authentication and branch state before creating the PR." ``` ## Step 2: Generate Summary @@ -720,7 +710,7 @@ nodes: git diff --stat $(git merge-base HEAD $BASE_BRANCH)..HEAD ``` - Read the plan file and progress tracking for context. + Read `$ARTIFACTS_DIR/plan.md` and `$ARTIFACTS_DIR/progress.txt` for context. 
## Step 3: Create PR (if not already created) diff --git a/.archon/workflows/defaults/archon-plan-to-pr.yaml b/.archon/workflows/defaults/archon-plan-to-pr.yaml index 067c1a818e..83dbbebd88 100644 --- a/.archon/workflows/defaults/archon-plan-to-pr.yaml +++ b/.archon/workflows/defaults/archon-plan-to-pr.yaml @@ -42,7 +42,7 @@ nodes: command: archon-implement-tasks depends_on: [confirm-plan] context: fresh - model: claude-opus-4-6[1m] + model: opus[1m] # ═══════════════════════════════════════════════════════════════════ # PHASE 4: VALIDATE diff --git a/.archon/workflows/defaults/archon-ralph-dag.yaml b/.archon/workflows/defaults/archon-ralph-dag.yaml index 5c0d7c9099..5482fd5a15 100644 --- a/.archon/workflows/defaults/archon-ralph-dag.yaml +++ b/.archon/workflows/defaults/archon-ralph-dag.yaml @@ -189,7 +189,7 @@ nodes: - id: implement depends_on: [validate-prd] idle_timeout: 600000 - model: claude-opus-4-6[1m] + model: opus[1m] loop: prompt: | # Ralph Agent — Autonomous Story Implementation diff --git a/.archon/workflows/defaults/archon-refactor-safely.yaml b/.archon/workflows/defaults/archon-refactor-safely.yaml index 56bc96ac36..81e4cb5f09 100644 --- a/.archon/workflows/defaults/archon-refactor-safely.yaml +++ b/.archon/workflows/defaults/archon-refactor-safely.yaml @@ -207,7 +207,7 @@ nodes: # ═══════════════════════════════════════════════════════════════ - id: execute-refactor - model: claude-opus-4-6[1m] + model: opus[1m] prompt: | You are executing a refactoring plan with strict safety guardrails. diff --git a/.archon/workflows/defaults/archon-workflow-builder.yaml b/.archon/workflows/defaults/archon-workflow-builder.yaml index a311b8d970..a12758b0ec 100644 --- a/.archon/workflows/defaults/archon-workflow-builder.yaml +++ b/.archon/workflows/defaults/archon-workflow-builder.yaml @@ -61,7 +61,8 @@ nodes: 5. Whether this should be a simple DAG or include a loop node Be specific and concrete. 
Each proposed node should have a clear type - (bash, prompt, command, or loop) and a one-line description of what it does. + (bash, prompt, command, script, loop, or approval) and a one-line + description of what it does. model: haiku allowed_tools: [] output_format: @@ -115,7 +116,7 @@ nodes: nodes: - id: node-id-kebab-case - # Choose ONE of: prompt, bash, command, loop + # Choose ONE of: prompt, bash, command, script, loop, approval # --- prompt node (AI-executed) --- prompt: | @@ -131,6 +132,17 @@ nodes: # --- command node (references a .archon/commands/ file) --- command: command-name + # --- script node (TypeScript via bun, or Python via uv — no AI, stdout = $.output) --- + # Use for deterministic data transforms the shell would mangle (JSON parsing, etc.) + script: | + // JSON is valid JS expression syntax — assign directly (String.raw breaks on backticks) + const data = $other-node.output; + console.log(JSON.stringify({ count: data.items.length })); + runtime: bun # required: 'bun' (.ts/.js) or 'uv' (.py) + # deps: [requests] # uv only + # Or reference a named script in .archon/scripts/: + # script: extract-labels # no extension; bun resolves .ts/.js, uv resolves .py + # --- loop node (iterative AI execution) --- loop: prompt: | @@ -139,17 +151,22 @@ nodes: max_iterations: 10 fresh_context: true # optional: reset context each iteration + # --- approval node (human gate — pauses workflow) --- + approval: + message: "Review the plan above. Approve to continue." 
+ # capture_response: true # store reviewer comment as $.output + # Common options for all node types: depends_on: [other-node-id] # DAG edges when: "$.output == 'value'" # conditional execution trigger_rule: all_success # all_success | one_success | all_done - timeout: 120000 # ms, for bash nodes + timeout: 120000 # ms, for bash and script nodes ``` ## Variable Reference - `$ARGUMENTS` — user's input text - `$ARTIFACTS_DIR` — pre-created directory for workflow artifacts - - `$.output` — stdout from a bash node or AI response from a prompt node + - `$.output` — stdout from a bash/script node or AI response from a prompt node - `$.output.field` — JSON field from a node with output_format - `$BASE_BRANCH` — base git branch @@ -158,12 +175,20 @@ nodes: 2. The `description:` MUST follow the "Use when / Triggers / Does / NOT for" pattern 3. Every node MUST have a unique kebab-case `id` 4. Use `depends_on` to define execution order - 5. Use `bash` nodes for deterministic operations (file checks, git commands, installs) - 6. Use `prompt` nodes for AI reasoning tasks - 7. Use `output_format` on prompt nodes when downstream nodes need structured data - 8. Use `allowed_tools: []` on classification/analysis nodes that don't need tools - 9. Use `denied_tools: [Edit, Bash]` when a node should only use Write (not edit existing files) - 10. Prefer `model: haiku` for simple classification tasks to save cost + 5. Use `bash` nodes for deterministic shell operations (file checks, git commands, installs) + 6. Use `script` nodes for typed data transforms (TypeScript JSON parsing, Python with deps) + — stdout is captured as output, stderr is forwarded as a warning. + $nodeId.output is NOT shell-quoted in script bodies. 
+ - **TypeScript/bun**: assign directly — `const data = $nodeId.output;` + (JSON is valid JS expression syntax; avoid String.raw — it breaks on backticks) + - **Python/uv**: use json.loads — `import json; data = json.loads("""$nodeId.output""")` + Never interpolate into shell syntax. + 7. Use `prompt` nodes for AI reasoning tasks + 8. Use `approval` nodes to pause for human review at risky gates (plan→execute boundary, destructive actions) + 9. Use `output_format` on prompt nodes when downstream nodes need structured data + 10. Use `allowed_tools: []` on classification/analysis nodes that don't need tools + 11. Use `denied_tools: [Edit, Bash]` when a node should only use Write (not edit existing files) + 12. Prefer `model: haiku` for simple classification tasks to save cost ## Output diff --git a/.archon/workflows/experimental/archon-fix-github-issue-experimental.yaml b/.archon/workflows/experimental/archon-fix-github-issue-experimental.yaml new file mode 100644 index 0000000000..f94d496d46 --- /dev/null +++ b/.archon/workflows/experimental/archon-fix-github-issue-experimental.yaml @@ -0,0 +1,440 @@ +name: archon-fix-github-issue-experimental +description: | + EXPERIMENTAL: Path A variant of archon-fix-github-issue. Same DAG shape — same nodes, + same dependencies, same command files. Additions: + - Two extra classifier fields: `scope` (small/medium/large) and `needs_external_research`. + - A new `smoke-validate` node that checks the issue's concrete claims (file paths, + line numbers, symbols, repro commands) against the current codebase before any + skip gate fires. Every skip gate has a `claims_accurate == 'false'` override so an + inaccurate issue cannot cause a skip. + - `when:` gates on web-research and 4 reviewers so small, claim-verified issues + skip them. For medium/large issues or when the issue claims don't match the code, + behavior is identical to the full workflow. 
+ + Skip gates (all overridden when smoke-validate flags the issue as inaccurate): + - web-research → runs when needs_external_research=='true' OR smoke=='false' + - error-handling → runs when review-classify says yes AND (scope!='small' OR smoke=='false') + - test-coverage → same as error-handling + - comment-quality → same as error-handling + - docs-impact → same as error-handling + + Always runs (same as full): classify, smoke-validate, investigate/plan, bridge-artifacts, + implement, validate, create-pr, review-scope, review-classify, code-review, synthesize, + self-fix, simplify, report. + + Use when: User wants to FIX, RESOLVE, or IMPLEMENT a solution for a GitHub issue. + Triggers: "fix this issue", "implement issue #123", "resolve this bug", "fix it", + "fix issue", "resolve issue", "fix #123". + NOT for: Comprehensive multi-agent reviews (use archon-issue-review-full), + questions about issues, CI failures, PR reviews, general exploration. + + DAG workflow that: + 1. Classifies the issue (bug/feature/enhancement/etc) + 2. Researches context (web research + codebase exploration via investigate/plan) + 3. Routes to investigate (bugs) or plan (features) based on classification + 4. Implements the fix/feature with validation + 5. Creates a draft PR using the repo's PR template + 6. Runs smart review (always code review + CLAUDE.md check, conditional additional agents) + 7. Aggressively self-fixes all findings (tests, docs, error handling) + 8. Simplifies changed code (implements fixes directly, not just reports) + 9. Reports results back to the GitHub issue with follow-up suggestions + +provider: claude +model: sonnet + +nodes: + # ═══════════════════════════════════════════════════════════════ + # PHASE 1: FETCH & CLASSIFY + # ═══════════════════════════════════════════════════════════════ + + - id: extract-issue-number + prompt: | + Find the GitHub issue number for this request. 
+ + Request: $ARGUMENTS + + Rules: + - If the message contains an explicit issue number (e.g., "#709", "issue 709", "709"), extract that number. + - If the message is ambiguous (e.g., "fix the SQLite timestamp bug"), use `gh issue list` to search for matching issues and pick the best match. + + CRITICAL: Your final output must be ONLY the bare number with no quotes, no markdown, no explanation. Example correct output: 709 + + - id: fetch-issue + bash: | + # Strip quotes, whitespace, markdown backticks from AI output + ISSUE_NUM=$(echo "$extract-issue-number.output" | tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -1) + if [ -z "$ISSUE_NUM" ]; then + echo "Failed to extract issue number from: $extract-issue-number.output" >&2 + exit 1 + fi + gh issue view "$ISSUE_NUM" --json title,body,labels,comments,state,url,author + depends_on: [extract-issue-number] + + - id: classify + prompt: | + You are an issue classifier. Analyze the GitHub issue below and determine: + (1) its type, (2) its scope, and (3) whether external web research is needed. + + ## Issue Content + + $fetch-issue.output + + ## Type + + | Type | Indicators | + |------|------------| + | bug | "broken", "error", "crash", "doesn't work", stack traces, regression | + | feature | "add", "new", "support", "would be nice", net-new capability | + | enhancement | "improve", "better", "update existing", "extend", incremental improvement | + | refactor | "clean up", "simplify", "reorganize", "restructure" | + | chore | "update deps", "upgrade", "maintenance", "CI/CD" | + | documentation | "docs", "readme", "clarify", "examples" | + + ## Scope + + Estimate how much code the fix is likely to touch. The issue body is your best + signal — reporter-pointed file paths, length of the reproducer, how specific the + request is. When uncertain, round UP (pick the larger scope). + + | Scope | Indicators | + |-------|------------| + | small | 1-3 files, single subsystem, clear from the body. 
Typos, one-line bugs, isolated refactors, doc fixes, small enhancements pointing at specific code. | + | medium | 3-10 files, one or two subsystems, some investigation needed. Most features, non-trivial bugs, refactors that cross a few files. | + | large | 10+ files, cross-subsystem, vague/exploratory, or requires real codebase discovery before a fix direction is clear. | + + ## External Research + + Does this issue need external (web) research to fix correctly? Say "true" only if + the fix depends on specifics of an external library, API, protocol, or standard + that are NOT already apparent from the codebase. Internal plumbing, refactoring, + obvious bug fixes, and issues where the reporter already cited the relevant docs + → "false". + + Provide reasoning that covers all three decisions. + depends_on: [fetch-issue] + model: haiku + allowed_tools: [] + output_format: + type: object + properties: + issue_type: + type: string + enum: ["bug", "feature", "enhancement", "refactor", "chore", "documentation"] + title: + type: string + scope: + type: string + enum: ["small", "medium", "large"] + needs_external_research: + type: string + enum: ["true", "false"] + reasoning: + type: string + required: [issue_type, title, scope, needs_external_research, reasoning] + + # ═══════════════════════════════════════════════════════════════ + # PHASE 1.5: SMOKE-VALIDATE + # Verifies that the issue's concrete claims (file paths, line numbers, + # symbols, repro commands) match the current codebase. Its `claims_accurate` + # verdict gates every skip decision downstream — if the issue body is + # inaccurate, the workflow falls back to the full pipeline. + # ═══════════════════════════════════════════════════════════════ + + - id: smoke-validate + prompt: | + You are a smoke validator. Your job: verify that the issue's claims about the + code are ACCURATE, so downstream skip decisions rest on a reliable foundation. 
+ + ## Context + + ### Issue content + $fetch-issue.output + + ### Classifier verdict + $classify.output + + ## Your Task + + Extract the concrete, verifiable claims from the issue body and comments: + - File paths mentioned (e.g. "packages/core/src/foo.ts") + - Line numbers or specific code snippets quoted + - Function, class, type, or symbol names referenced + - Reproduction commands (e.g. "run bun test X") + + Then verify each concrete claim against the current codebase — TARGETED checks, + no Explore sub-agent: + - Use the Read tool on cited file paths. Confirm the file exists. + - If a line or region is cited, Read it and check the described code is there. + - If a symbol is cited, `grep -rn "" packages/` to confirm it exists. + - If a repro command is cited, check `package.json` / the referenced file to + confirm the command is plausible. Do NOT execute it. + + ## Budget + + Spend at most ~30 seconds on this. Check the 2-3 most concrete claims — the + ones the fix most likely hinges on. Don't exhaustively verify every mention. + Err toward false negatives (flagging the issue as inaccurate when uncertain) + rather than false positives (granting a skip on shaky evidence). + + If the issue has NO concrete claims (purely descriptive — "feature X is broken", + no file paths, no line numbers, no symbols), default to `claims_accurate: "false"`. + Vibes aren't a reliable foundation for skipping work. + + ## Output + + Set `claims_accurate`: + - "true": The concrete claims you checked match the current code. The issue body + is a reliable spec — downstream gates can trust the classifier's skip verdict. + - "false": One or more claims don't match reality — cited file doesn't exist, the + line doesn't contain the described code, the symbol was renamed/removed, the + repro command doesn't fit the project. The issue body is NOT a reliable + foundation for skipping. Downstream gates will fall back to the full pipeline + (research + all review agents).
+ + In `reasoning`, list exactly what you checked and what you found. + depends_on: [classify] + context: fresh + output_format: + type: object + properties: + claims_accurate: + type: string + enum: ["true", "false"] + reasoning: + type: string + required: [claims_accurate, reasoning] + + # ═══════════════════════════════════════════════════════════════ + # PHASE 2: RESEARCH (parallel with PR template fetch) + # ═══════════════════════════════════════════════════════════════ + + - id: web-research + command: archon-web-research + depends_on: [classify, smoke-validate] + # Runs when research is flagged OR smoke-validate finds the issue unreliable (fallback) + when: "$classify.output.needs_external_research == 'true' || $smoke-validate.output.claims_accurate == 'false'" + context: fresh + + # ═══════════════════════════════════════════════════════════════ + # PHASE 3: INVESTIGATE (bugs) / PLAN (features) + # ═══════════════════════════════════════════════════════════════ + + - id: investigate + command: archon-investigate-issue + depends_on: [classify, web-research] + when: "$classify.output.issue_type == 'bug'" + # Allow web-research to be skipped (needs_external_research == 'false') without blocking + trigger_rule: none_failed_min_one_success + context: fresh + + - id: plan + command: archon-create-plan + depends_on: [classify, web-research] + when: "$classify.output.issue_type != 'bug'" + # Allow web-research to be skipped (needs_external_research == 'false') without blocking + trigger_rule: none_failed_min_one_success + context: fresh + + # Bridge: ensure investigation.md exists for the implement step + # archon-fix-issue reads from $ARTIFACTS_DIR/investigation.md + # archon-create-plan writes to $ARTIFACTS_DIR/plan.md + # This node copies plan.md → investigation.md when the plan path was taken + - id: bridge-artifacts + bash: | + if [ -f "$ARTIFACTS_DIR/plan.md" ] && [ ! 
-f "$ARTIFACTS_DIR/investigation.md" ]; then + cp "$ARTIFACTS_DIR/plan.md" "$ARTIFACTS_DIR/investigation.md" + echo "Bridged plan.md to investigation.md for implement step" + elif [ -f "$ARTIFACTS_DIR/investigation.md" ]; then + echo "investigation.md exists from investigate step" + else + echo "WARNING: No investigation.md or plan.md found — implement may fail" + fi + depends_on: [investigate, plan] + trigger_rule: one_success + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4: IMPLEMENT + # ═══════════════════════════════════════════════════════════════ + + - id: implement + command: archon-fix-issue + depends_on: [bridge-artifacts] + context: fresh + model: opus[1m] + + # ═══════════════════════════════════════════════════════════════ + # PHASE 5: VALIDATE + # ═══════════════════════════════════════════════════════════════ + + - id: validate + command: archon-validate + depends_on: [implement] + context: fresh + + # ═══════════════════════════════════════════════════════════════ + # PHASE 6: CREATE DRAFT PR + # ═══════════════════════════════════════════════════════════════ + + - id: create-pr + prompt: | + Create a draft pull request for the current branch. + + ## Context + + - **Issue**: $ARGUMENTS + - **Classification**: $classify.output + - **Issue title**: $classify.output.title + + ## Instructions + + 1. Check git status — ensure all changes are committed. If uncommitted changes exist, stage and commit them. + 2. Push the branch: `git push -u origin HEAD` + 3. Read implementation artifacts from `$ARTIFACTS_DIR/` for context: + - `$ARTIFACTS_DIR/investigation.md` or `$ARTIFACTS_DIR/plan.md` + - `$ARTIFACTS_DIR/implementation.md` + - `$ARTIFACTS_DIR/validation.md` + 4. Check if a PR already exists for this branch: `gh pr list --head $(git branch --show-current)` + - If PR exists, skip creation and capture its number + 5. 
Look for the project's PR template at `.github/pull_request_template.md`, `.github/PULL_REQUEST_TEMPLATE.md`, or `docs/PULL_REQUEST_TEMPLATE.md`. Read whichever one exists. + 6. Create a DRAFT PR: `gh pr create --draft --base $BASE_BRANCH` + - Title: concise, imperative mood, under 70 chars + - Body: if a PR template was found, fill in **every section** with details from the artifacts. Don't skip sections or leave placeholders. If no template, write a body with summary, changes, validation evidence, and `Fixes #...`. + - Link to issue: include `Fixes #...` or `Closes #...` + 7. Capture PR identifiers: + ```bash + PR_NUMBER=$(gh pr view --json number -q '.number') + echo "$PR_NUMBER" > "$ARTIFACTS_DIR/.pr-number" + PR_URL=$(gh pr view --json url -q '.url') + echo "$PR_URL" > "$ARTIFACTS_DIR/.pr-url" + ``` + depends_on: [validate] + context: fresh + + # ═══════════════════════════════════════════════════════════════ + # PHASE 7: REVIEW + # ═══════════════════════════════════════════════════════════════ + + - id: review-scope + command: archon-pr-review-scope + depends_on: [create-pr] + context: fresh + + - id: review-classify + prompt: | + You are a PR review classifier. Analyze the PR scope and determine + which review agents should run. + + ## PR Scope + + $review-scope.output + + ## Rules + + - **Code review**: ALWAYS run. This is mandatory for every PR. It also checks + the PR against CLAUDE.md rules and project conventions. + - **Error handling**: Run if the diff touches code with try/catch, error handling, + async/await, or adds new failure paths. + - **Test coverage**: Run if the diff touches source code (not just tests, docs, or config). + - **Comment quality**: Run if the diff adds or modifies comments, docstrings, JSDoc, + or significant documentation within code files. + - **Docs impact**: Run if the diff adds/removes/renames public APIs, commands, CLI flags, + environment variables, or user-facing features. + + Provide your reasoning for each decision. 
+ depends_on: [review-scope] + model: haiku + allowed_tools: [] + context: fresh + output_format: + type: object + properties: + run_code_review: + type: string + enum: ["true", "false"] + run_error_handling: + type: string + enum: ["true", "false"] + run_test_coverage: + type: string + enum: ["true", "false"] + run_comment_quality: + type: string + enum: ["true", "false"] + run_docs_impact: + type: string + enum: ["true", "false"] + reasoning: + type: string + required: + - run_code_review + - run_error_handling + - run_test_coverage + - run_comment_quality + - run_docs_impact + - reasoning + + # Code review always runs — mandatory + - id: code-review + command: archon-code-review-agent + depends_on: [review-classify] + context: fresh + + # Reviewer gates: run when review-classify flags them AND the scope is non-small, + # OR when smoke-validate found the issue claims unreliable (fallback to full review). + # Expression form: A && B || A && C (the condition evaluator has no parens; && binds tighter than ||) + - id: error-handling + command: archon-error-handling-agent + depends_on: [review-classify] + when: "$review-classify.output.run_error_handling == 'true' && $classify.output.scope != 'small' || $review-classify.output.run_error_handling == 'true' && $smoke-validate.output.claims_accurate == 'false'" + context: fresh + + - id: test-coverage + command: archon-test-coverage-agent + depends_on: [review-classify] + when: "$review-classify.output.run_test_coverage == 'true' && $classify.output.scope != 'small' || $review-classify.output.run_test_coverage == 'true' && $smoke-validate.output.claims_accurate == 'false'" + context: fresh + + - id: comment-quality + command: archon-comment-quality-agent + depends_on: [review-classify] + when: "$review-classify.output.run_comment_quality == 'true' && $classify.output.scope != 'small' || $review-classify.output.run_comment_quality == 'true' && $smoke-validate.output.claims_accurate == 'false'" + context: fresh + + - id: 
docs-impact + command: archon-docs-impact-agent + depends_on: [review-classify] + when: "$review-classify.output.run_docs_impact == 'true' && $classify.output.scope != 'small' || $review-classify.output.run_docs_impact == 'true' && $smoke-validate.output.claims_accurate == 'false'" + context: fresh + + # ═══════════════════════════════════════════════════════════════ + # PHASE 8: SYNTHESIZE + SELF-FIX + # ═══════════════════════════════════════════════════════════════ + + - id: synthesize + command: archon-synthesize-review + depends_on: [code-review, error-handling, test-coverage, comment-quality, docs-impact] + trigger_rule: one_success + context: fresh + + - id: self-fix + command: archon-self-fix-all + depends_on: [synthesize] + context: fresh + + # ═══════════════════════════════════════════════════════════════ + # PHASE 9: SIMPLIFY + # ═══════════════════════════════════════════════════════════════ + + - id: simplify + command: archon-simplify-changes + depends_on: [self-fix] + context: fresh + + # ═══════════════════════════════════════════════════════════════ + # PHASE 10: REPORT + # ═══════════════════════════════════════════════════════════════ + + - id: report + command: archon-issue-completion-report + depends_on: [simplify] + context: fresh diff --git a/.archon/workflows/experimental/archon-release.yaml b/.archon/workflows/experimental/archon-release.yaml new file mode 100644 index 0000000000..afd8681f79 --- /dev/null +++ b/.archon/workflows/experimental/archon-release.yaml @@ -0,0 +1,946 @@ +name: archon-release +description: | + Use when: User says "/release", "release", "cut a release", "ship it", + "release to main", or asks to release the project. + Triggers: "/release", "/release minor", "/release major", "ship it", + "release patch", "release minor", "release major". + Does: Cuts a release from the dev branch end-to-end. 
Validates state, + smoke-tests the compiled binary, bumps version, drafts a changelog + from commits via AI, gets human approval, then commits, opens a PR, + tags after merge, creates the GitHub release, and updates the + Homebrew formula and tap. + NOT for: Hotfix recovery from a broken release CI run (manual recovery + path), publishing release notes only, retroactive tagging. + + Pass `--dry-run` (or `dry-run`) anywhere in the message to preview every + step without touching git, GitHub, the filesystem, or any remote state. + Bump type defaults to `patch`. Accepts: `patch`, `minor`, `major`. + + Examples: + archon workflow run archon-release "" # patch release + archon workflow run archon-release "minor" # minor release + archon workflow run archon-release "patch --dry-run" # dry-run patch + archon workflow run archon-release "--dry-run" # dry-run patch (implicit) + +provider: claude +model: sonnet +interactive: true # required: has approval gates + +worktree: + enabled: false # operates on the live dev branch — never use a worktree + +nodes: + # ═══════════════════════════════════════════════════════════════════ + # PHASE 1 — Parse args and validate preconditions (always run) + # ═══════════════════════════════════════════════════════════════════ + + - id: parse-args + script: | + const raw = String.raw`$ARGUMENTS`.trim().toLowerCase(); + const tokens = raw.split(/\s+/).filter(Boolean); + const dryRun = tokens.includes("--dry-run") || tokens.includes("dry-run"); + const bumpToken = tokens.find((t) => ["patch", "minor", "major"].includes(t)); + const bump = bumpToken ?? 
"patch"; + // dryRun is stringified as "true"/"false" so `when:` can compare against quoted strings + console.log(JSON.stringify({ bump, dryRun: String(dryRun) })); + runtime: bun + timeout: 5000 + + - id: validate-state + bash: | + set -euo pipefail + + echo "::: Validating release preconditions :::" + echo "Bump: $parse-args.output.bump" + echo "Dry run: $parse-args.output.dryRun" + echo + + git fetch origin --quiet + git checkout dev + git pull origin dev --ff-only --quiet + + # Only check TRACKED files for modifications. Untracked files don't + # affect a release because `git add -u` in commit-and-push won't pick + # them up. Being strict about untracked files would block this very + # workflow on the first run (the workflow YAML is itself untracked). + if ! git diff --quiet || ! git diff --cached --quiet; then + echo "ERROR: tracked files have uncommitted changes. Commit or stash before releasing." + git status --short + exit 1 + fi + + untracked=$(git ls-files --others --exclude-standard) + if [ -n "$untracked" ]; then + echo "WARNING: untracked files present (will NOT be included in release commit):" + echo "$untracked" | sed 's/^/ /' + echo + fi + + echo "OK: on dev, tracked files clean, fast-forwarded to origin/dev" + timeout: 60000 + depends_on: [parse-args] + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 2 — Pre-flight compiled-binary smoke test (always run) + # ─────────────────────────────────────────────────────────────────── + # Mirrors release skill Step 1.5. Catches bundler regressions before + # the tag is pushed. If this fails, abort immediately. 
+ # ═══════════════════════════════════════════════════════════════════ + + - id: preflight-smoke + script: | + import { spawnSync } from "node:child_process"; + import { mkdtempSync, existsSync, rmSync } from "node:fs"; + import { tmpdir } from "node:os"; + import { join } from "node:path"; + + const result = { passed: "true", skipped: "false", reason: "" }; + + if (!existsSync("scripts/build-binaries.sh") || !existsSync("packages/cli/src/cli.ts")) { + result.skipped = "true"; + result.reason = "Not a Bun CLI project — pre-flight smoke skipped."; + console.log(JSON.stringify(result)); + process.exit(0); + } + + const dir = mkdtempSync(join(tmpdir(), "release-smoke-")); + const binaryPath = join(dir, "archon-smoke"); + + try { + const build = spawnSync( + "bun", + ["build", "--compile", "--minify", "--target=bun", `--outfile=${binaryPath}`, "packages/cli/src/cli.ts"], + { encoding: "utf-8", stdio: "pipe" }, + ); + + if (build.status !== 0) { + result.passed = "false"; + result.reason = `bun build --compile failed (exit ${build.status}):\n${build.stderr || build.stdout}`; + console.log(JSON.stringify(result)); + process.exit(0); + } + + // --help instead of `version` because version's compiled-binary branch + // requires BUNDLED_IS_BINARY=true, which scripts/build-binaries.sh sets + // but a bare `bun build --compile` does not. 
+ const run = spawnSync(binaryPath, ["--help"], { encoding: "utf-8", timeout: 30000 }); + const out = `${run.stdout || ""}${run.stderr || ""}`; + + if (run.status !== 0) { + result.passed = "false"; + result.reason = `compiled binary crashed at startup (exit ${run.status}):\n${out}`; + console.log(JSON.stringify(result)); + process.exit(0); + } + + if (/Expected CommonJS module|TypeError:|ReferenceError:|SyntaxError:/.test(out)) { + result.passed = "false"; + result.reason = `compiled binary emitted runtime error despite exit 0:\n${out}`; + console.log(JSON.stringify(result)); + process.exit(0); + } + + result.reason = "Pre-flight binary smoke: PASSED"; + console.log(JSON.stringify(result)); + } finally { + rmSync(dir, { recursive: true, force: true }); + } + runtime: bun + timeout: 180000 + depends_on: [validate-state] + + - id: abort-if-smoke-failed + cancel: | + Pre-flight compiled-binary smoke test FAILED. The release is aborted + before any version bump, commit, tag, or PR is created. + + Common causes: + - Bun --bytecode producing invalid output for the current module graph + - A dependency reading package.json or other files at module top level + - Circular imports that break under minification + - A new package shipping CJS with an unusual wrapper shape + + Fix the underlying issue on a feature branch, merge to dev, then re-run /release. + The smoke test output is in the run log — check the preflight-smoke node. 
+ when: "$preflight-smoke.output.passed == 'false'" + depends_on: [preflight-smoke] + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 3 — Detect stack, compute next version, collect commits + # ═══════════════════════════════════════════════════════════════════ + + - id: detect-stack + script: | + import { readFileSync, existsSync } from "node:fs"; + + const candidates = [ + { file: "package.json", stack: "node", extract: (s) => JSON.parse(s).version }, + { file: "pyproject.toml", stack: "python", extract: (s) => s.match(/^version\s*=\s*"([^"]+)"/m)?.[1] }, + { file: "Cargo.toml", stack: "rust", extract: (s) => s.match(/^version\s*=\s*"([^"]+)"/m)?.[1] }, + ]; + + for (const { file, stack, extract } of candidates) { + if (!existsSync(file)) continue; + const contents = readFileSync(file, "utf-8"); + const version = extract(contents); + if (!version) { + console.error(`Found ${file} but could not parse version field.`); + process.exit(1); + } + console.log(JSON.stringify({ stack, versionFile: file, currentVersion: version })); + process.exit(0); + } + + console.error("No supported version file found (package.json, pyproject.toml, Cargo.toml)."); + process.exit(1); + runtime: bun + timeout: 10000 + depends_on: [preflight-smoke] + + - id: bump-version + script: | + const stack = JSON.parse(String.raw`$detect-stack.output`); + const args = JSON.parse(String.raw`$parse-args.output`); + + const m = stack.currentVersion.match(/^(\d+)\.(\d+)\.(\d+)/); + if (!m) { + console.error(`Cannot parse semver from current version: ${stack.currentVersion}`); + process.exit(1); + } + let [, major, minor, patch] = m.map(Number); + + switch (args.bump) { + case "major": major += 1; minor = 0; patch = 0; break; + case "minor": minor += 1; patch = 0; break; + case "patch": patch += 1; break; + default: + console.error(`Unknown bump type: ${args.bump}`); + process.exit(1); + } + + const newVersion = `${major}.${minor}.${patch}`; + 
console.log(JSON.stringify({ + oldVersion: stack.currentVersion, + newVersion, + bump: args.bump, + stack: stack.stack, + versionFile: stack.versionFile, + })); + runtime: bun + timeout: 5000 + depends_on: [detect-stack, parse-args] + + - id: collect-commits + bash: | + set -euo pipefail + commits=$(git log main..dev --oneline --no-merges) + if [ -z "$commits" ]; then + echo "NO_COMMITS" + exit 0 + fi + echo "$commits" + timeout: 15000 + depends_on: [validate-state] + + - id: abort-if-no-commits + cancel: "Nothing to release — dev has no commits ahead of main." + when: "$collect-commits.output == 'NO_COMMITS'" + depends_on: [collect-commits] + + - id: collect-diff-stat + bash: | + git diff --stat main..dev | tail -60 + timeout: 15000 + depends_on: [validate-state] + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 4 — AI drafts the changelog from commits + diff (always runs) + # ═══════════════════════════════════════════════════════════════════ + + - id: draft-changelog + prompt: | + You are drafting a CHANGELOG entry for the upcoming release. + + Bumping `$bump-version.output.oldVersion` -> `$bump-version.output.newVersion` + (bump type: $bump-version.output.bump). + + Commits being shipped (oneline, no merges): + + ``` + $collect-commits.output + ``` + + Diff stat: + + ``` + $collect-diff-stat.output + ``` + + Categorize commits into Keep a Changelog sections: Added, Changed, Fixed, + Removed. Rules: + + - Rewrite commit subjects into clear user-facing changelog entries. Do NOT + copy commit messages verbatim. + - Group related commits into single entries where it makes sense. + - Each entry starts with a noun or gerund describing WHAT changed. + - Skip internal-only changes (CI tweaks, typo fixes) unless they affect + user-visible behavior. + - Include PR numbers in parentheses when visible: `(#12)`. + - Write a one-line summary that captures the release theme. + - No emoji. No AI attribution. No "Co-Authored-By". 
+ - Empty arrays are fine if a category has no entries. + + Return strictly valid JSON matching the schema. + depends_on: [bump-version, collect-commits, collect-diff-stat] + allowed_tools: [] + output_format: + type: object + properties: + summary: + type: string + description: One-line summary of the release theme + added: + type: array + items: { type: string } + changed: + type: array + items: { type: string } + fixed: + type: array + items: { type: string } + removed: + type: array + items: { type: string } + required: [summary, added, changed, fixed, removed] + + # Bridge: persist draft-changelog's AI output to disk via auto-shell-quoted + # bash, so downstream SCRIPT nodes can read the JSON via fs instead of + # String.raw template substitution. Necessary because AI-generated content + # routinely contains backticks (markdown code spans) that would terminate + # a JS template literal mid-string. + # + # CRITICAL: do NOT wrap $draft-changelog.output in your own quotes. Archon + # already wraps it in single quotes via shellQuote(). Adding your own quotes + # like '$node.output' produces '''' which collapses to bare unquoted + # JSON, and bash brace-expands the {...} into separate words. + - id: save-draft-json + bash: | + mkdir -p "$ARTIFACTS_DIR" + printf '%s' $draft-changelog.output > "$ARTIFACTS_DIR/draft-changelog.json" + echo "wrote $ARTIFACTS_DIR/draft-changelog.json ($(wc -c < $ARTIFACTS_DIR/draft-changelog.json) bytes)" + timeout: 10000 + depends_on: [draft-changelog] + + - id: format-changelog + script: | + import { mkdirSync, writeFileSync, readFileSync } from "node:fs"; + import { join } from "node:path"; + + const artifactsDir = String.raw`$ARTIFACTS_DIR`; + // Read AI output from disk (file bridge) — see save-draft-json + // for why we don't use String.raw on $draft-changelog.output directly. 
+ const cl = JSON.parse(readFileSync(join(artifactsDir, "draft-changelog.json"), "utf-8")); + // bump-version and parse-args produce safe deterministic JSON; substitution OK. + const ver = JSON.parse(String.raw`$bump-version.output`); + const args = JSON.parse(String.raw`$parse-args.output`); + + const today = new Date().toISOString().slice(0, 10); + + const sections = [ + ["Added", cl.added], + ["Changed", cl.changed], + ["Fixed", cl.fixed], + ["Removed", cl.removed], + ]; + + let md = `## [${ver.newVersion}] - ${today}\n\n${cl.summary}\n`; + for (const [name, items] of sections) { + if (!items?.length) continue; + md += `\n### ${name}\n\n`; + for (const it of items) md += `- ${it}\n`; + } + + // Persist a copy to the run's artifacts so the user has a record. + // Reuses `artifactsDir` declared above for reading draft-changelog.json. + try { + mkdirSync(artifactsDir, { recursive: true }); + writeFileSync(join(artifactsDir, "changelog-section.md"), md); + } catch (e) { + console.error(`(non-fatal) could not write artifact: ${e.message}`); + } + + console.log(JSON.stringify({ + rendered: md, + oldVersion: ver.oldVersion, + newVersion: ver.newVersion, + bump: ver.bump, + dryRun: args.dryRun, + stack: ver.stack, + versionFile: ver.versionFile, + })); + runtime: bun + timeout: 10000 + depends_on: [save-draft-json, draft-changelog, bump-version, parse-args] + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 5 — Human approval gate (always runs) + # ─────────────────────────────────────────────────────────────────── + # In dry-run mode the workflow stops here cleanly; in full mode it + # proceeds to write files and create the PR. + # ═══════════════════════════════════════════════════════════════════ + + # ── Pre-approval summary ── + # Approval messages don't get variable substitution today, so we emit + # the dynamic summary as a Haiku prompt-node output (which DOES get + # substituted and streams to chat). 
Cheap pass-through, ~200 tokens. + - id: review-summary + prompt: | + Reply to the user with EXACTLY this text, verbatim, no elaboration, + no markdown rendering, no commentary, no questions. Just print it. + + ══════════════════════════════════════════════════════════════════ + RELEASE REVIEW + ══════════════════════════════════════════════════════════════════ + + Version : $bump-version.output.oldVersion → $bump-version.output.newVersion + Bump : $bump-version.output.bump + Dry run : $parse-args.output.dryRun + + ── Proposed CHANGELOG ─────────────────────────────────────────── + + $format-changelog.output.rendered + + ── Commits being shipped ──────────────────────────────────────── + + $collect-commits.output + + ══════════════════════════════════════════════════════════════════ + + Reply with `/workflow approve <workflow-id>` to continue, or + `/workflow reject <workflow-id>` to abort. + depends_on: [format-changelog, collect-commits, bump-version, parse-args] + model: haiku + allowed_tools: [] + + - id: review-changelog + approval: + message: | + Approve the release review above. + + - In dry-run mode the workflow ends here without modifying any files. + - In full mode approval triggers: write files, commit + push to dev, + open a PR dev → main, then pause again before tag/release. + depends_on: [review-summary] + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 6 — Apply local file changes (skipped in --dry-run) + # ═══════════════════════════════════════════════════════════════════ + + - id: write-files + script: | + import { readFileSync, writeFileSync, existsSync } from "node:fs"; + import { execSync } from "node:child_process"; + import { join } from "node:path"; + + // bump-version is safe to substitute (deterministic JSON, no backticks). + const ver = JSON.parse(String.raw`$bump-version.output`); + // format-changelog.output.rendered contains AI-authored markdown with + // backticks → unsafe via String.raw. Read the .md file from disk instead.
+ const artifactsDir = String.raw`$ARTIFACTS_DIR`; + const renderedMd = readFileSync(join(artifactsDir, "changelog-section.md"), "utf-8"); + const fmt = { rendered: renderedMd }; + + const written = []; + + // 1. Bump the version file + switch (ver.stack) { + case "node": { + const pkg = JSON.parse(readFileSync(ver.versionFile, "utf-8")); + pkg.version = ver.newVersion; + writeFileSync(ver.versionFile, JSON.stringify(pkg, null, 2) + "\n"); + break; + } + case "python": + case "rust": { + const original = readFileSync(ver.versionFile, "utf-8"); + const updated = original.replace(/^(version\s*=\s*")[^"]+(")/m, `$1${ver.newVersion}$2`); + if (updated === original) throw new Error(`Failed to update version in ${ver.versionFile}`); + writeFileSync(ver.versionFile, updated); + break; + } + default: + throw new Error(`Unknown stack: ${ver.stack}`); + } + written.push(ver.versionFile); + + // 2. Workspace version sync (monorepo only) + if (existsSync("scripts/sync-versions.sh")) { + execSync("bash scripts/sync-versions.sh", { stdio: "inherit" }); + // Stage workspace package.json files explicitly downstream + written.push("packages/*/package.json"); + } + + // 3. Lockfile refresh + const lockfileCommands = { + node: existsSync("bun.lock") ? ["bun", "install"] : + existsSync("package-lock.json") ? ["npm", "install", "--package-lock-only"] : null, + python: existsSync("uv.lock") ? ["uv", "lock", "--quiet"] : null, + rust: ["cargo", "update", "--workspace"], + }[ver.stack]; + + if (lockfileCommands) { + execSync(lockfileCommands.join(" "), { stdio: "inherit" }); + const lockFile = { + node: existsSync("bun.lock") ? "bun.lock" : "package-lock.json", + python: "uv.lock", + rust: "Cargo.lock", + }[ver.stack]; + if (lockFile && existsSync(lockFile)) written.push(lockFile); + } + + // 4. Update CHANGELOG.md — prepend the new section under [Unreleased] + const changelogPath = "CHANGELOG.md"; + let changelog = existsSync(changelogPath) + ? 
readFileSync(changelogPath, "utf-8") + : "# Changelog\n\nAll notable changes to this project will be documented in this file.\n\n## [Unreleased]\n\n"; + + // Insert the new section right after the [Unreleased] header (and any blank lines beneath it) + const unreleasedMatch = changelog.match(/(## \[Unreleased\]\s*\n+)/); + if (unreleasedMatch) { + const insertAt = unreleasedMatch.index + unreleasedMatch[0].length; + changelog = changelog.slice(0, insertAt) + fmt.rendered + "\n" + changelog.slice(insertAt); + } else { + // No [Unreleased] header — prepend at the top below the title + const titleMatch = changelog.match(/^# .+\n+/); + const insertAt = titleMatch ? titleMatch[0].length : 0; + changelog = changelog.slice(0, insertAt) + "## [Unreleased]\n\n" + fmt.rendered + "\n" + changelog.slice(insertAt); + } + writeFileSync(changelogPath, changelog); + written.push(changelogPath); + + console.log(JSON.stringify({ filesModified: written, newVersion: ver.newVersion })); + runtime: bun + timeout: 120000 + depends_on: [review-changelog, bump-version, format-changelog] + when: "$parse-args.output.dryRun == 'false'" + + - id: commit-and-push + bash: | + set -euo pipefail + + # Working tree was clean at validate-state; only write-files modified it, + # so `git add -A` stages exactly what the release should ship. + git add -A + git status --short + + git commit -m "Release $bump-version.output.newVersion" + git push origin dev + timeout: 60000 + depends_on: [write-files, bump-version] + when: "$parse-args.output.dryRun == 'false'" + + - id: create-pr + bash: | + set -euo pipefail + + ver=$bump-version.output.newVersion + body=$format-changelog.output.rendered + + # If a PR already exists for this branch, just print its URL. 
+ existing=$(gh pr list --head dev --base main --state open --json url --jq '.[0].url' 2>/dev/null || true) + if [ -n "$existing" ]; then + echo "PR already open: $existing" + echo "$existing" + exit 0 + fi + + # Build the PR body in a way that doesn't put a literal "---" at YAML column 1 + pr_body=$(printf '%s\n\n---\n\nMerging this PR releases %s to main.\n' "$body" "$ver") + + url=$(gh pr create --base main --head dev --title "Release $ver" --body "$pr_body") + echo "$url" + timeout: 60000 + depends_on: [commit-and-push, format-changelog, bump-version] + when: "$parse-args.output.dryRun == 'false'" + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 7 — Wait for the PR to merge (skipped in --dry-run) + # ─────────────────────────────────────────────────────────────────── + # The user (or a reviewer) merges the PR however they prefer: + # gh pr merge --squash --delete-branch=false + # then approves here. We don't auto-merge — keeps reviewer in control. + # ═══════════════════════════════════════════════════════════════════ + + # Pre-merge-gate summary (same pass-through pattern as review-summary) + - id: merge-summary + prompt: | + Reply to the user with EXACTLY this text, verbatim, no elaboration: + + ══════════════════════════════════════════════════════════════════ + PR OPENED — waiting for merge + ══════════════════════════════════════════════════════════════════ + + $create-pr.output + + Merge the PR however you prefer: + gh pr merge --squash --delete-branch=false + (or use the GitHub web UI) + + Then approve here to continue with tag, GitHub release, dev sync, + binary wait, and Homebrew formula update. + depends_on: [create-pr] + model: haiku + allowed_tools: [] + when: "$parse-args.output.dryRun == 'false'" + + - id: wait-for-merge + approval: + message: | + Approve once the PR above has been merged into main. + Reject to stop — the PR will remain open and reviewable. 
+ depends_on: [merge-summary] + when: "$parse-args.output.dryRun == 'false'" + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 8 — Tag, GitHub release, sync dev with main + # ═══════════════════════════════════════════════════════════════════ + + - id: tag-and-release + bash: | + set -euo pipefail + + ver=$bump-version.output.newVersion + body=$format-changelog.output.rendered + + git fetch origin main --quiet + + # Tag the merge commit on main, push the tag. + git tag "v$ver" origin/main + git push origin "v$ver" + + # Strip the leading "## [x.y.z] - YYYY-MM-DD" header line for the release body. + notes=$(printf '%s\n' "$body" | sed '1{/^## /d;}; 2{/^$/d;}') + + gh release create "v$ver" --title "v$ver" --notes "$notes" + + echo "Tagged and released v$ver" + timeout: 90000 + depends_on: [wait-for-merge, bump-version, format-changelog] + when: "$parse-args.output.dryRun == 'false'" + + - id: sync-dev-with-main + bash: | + set -euo pipefail + + # Ensure dev contains the merge commit from main so they don't diverge. + git checkout dev + git pull origin main --ff-only --quiet + git push origin dev + + echo "dev fast-forwarded to include main's merge commit" + timeout: 60000 + depends_on: [tag-and-release] + when: "$parse-args.output.dryRun == 'false'" + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 9 — Wait for release CI to finish building binaries + # ─────────────────────────────────────────────────────────────────── + # Poll the release until all 7 expected assets exist (5 binaries + + # archon-web.tar.gz + checksums.txt). Bail out early if the release + # workflow fails — no point waiting if CI is broken. 
+ # ═══════════════════════════════════════════════════════════════════ + + - id: check-homebrew + bash: | + if [ -f homebrew/archon.rb ]; then + echo "true" + else + echo "false" + fi + timeout: 5000 + depends_on: [validate-state] + + - id: wait-for-binaries + bash: | + set -uo pipefail + + ver=$bump-version.output.newVersion + repo=$(gh repo view --json nameWithOwner -q .nameWithOwner) + + echo "Waiting for release workflow to finish uploading binaries to v$ver..." + + for i in $(seq 1 30); do + asset_count=$(gh release view "v$ver" --repo "$repo" --json assets --jq '.assets | length' 2>/dev/null || echo "0") + + if [ "$asset_count" -ge 7 ]; then + echo "All $asset_count assets uploaded" + exit 0 + fi + + # Short-circuit: if the release workflow itself failed, stop waiting. + workflow_status=$(gh run list --workflow release.yml --event push --limit 1 --json conclusion,status --jq '.[0] | "\(.status)|\(.conclusion)"' 2>/dev/null || echo "unknown|unknown") + if [ "$workflow_status" = "completed|failure" ]; then + echo "Release workflow FAILED — see: gh run view --log-failed" + exit 1 + fi + + echo " Assets so far: $asset_count/7 — waiting 30s (attempt $i/30)..." + sleep 30 + done + + echo "Timed out waiting for binaries after 15 minutes" + exit 1 + timeout: 1000000 # 16 minutes — outer bound on the polling loop + depends_on: [sync-dev-with-main, check-homebrew, bump-version] + when: "$parse-args.output.dryRun == 'false' && $check-homebrew.output == 'true'" + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 10 — Update Homebrew formula and sync the tap repo + # ─────────────────────────────────────────────────────────────────── + # Only runs if homebrew/archon.rb exists in the repo. The formula + # version and SHAs MUST move atomically (per the release skill's + # critical warning) — we regenerate the entire file from a template. 
+ # ═══════════════════════════════════════════════════════════════════ + + - id: fetch-and-update-formula + script: | + import { spawnSync } from "node:child_process"; + import { writeFileSync, mkdtempSync, readFileSync, rmSync } from "node:fs"; + import { tmpdir } from "node:os"; + import { join } from "node:path"; + + const ver = JSON.parse(String.raw`$bump-version.output`).newVersion; + + const repoOwnerName = spawnSync("gh", ["repo", "view", "--json", "nameWithOwner", "-q", ".nameWithOwner"], { encoding: "utf-8" }).stdout.trim(); + + const dir = mkdtempSync(join(tmpdir(), "release-shas-")); + try { + const dl = spawnSync( + "gh", + ["release", "download", `v${ver}`, "--repo", repoOwnerName, "--pattern", "checksums.txt", "--dir", dir], + { encoding: "utf-8", stdio: "pipe" }, + ); + if (dl.status !== 0) { + console.error(`Failed to download checksums.txt: ${dl.stderr}`); + process.exit(1); + } + + const checksums = readFileSync(join(dir, "checksums.txt"), "utf-8"); + + const sha = (asset) => { + const m = checksums.match(new RegExp(`^([a-f0-9]{64})\\s+\\*?${asset}$`, "m")); + if (!m) throw new Error(`Missing SHA for ${asset} in checksums.txt:\n${checksums}`); + return m[1]; + }; + + const shas = { + darwinArm64: sha("archon-darwin-arm64"), + darwinX64: sha("archon-darwin-x64"), + linuxArm64: sha("archon-linux-arm64"), + linuxX64: sha("archon-linux-x64"), + }; + + // Regenerate the entire formula from the canonical template. + // Editing in place is forbidden — version + SHAs MUST move atomically. + // Built as a line array so the YAML block scalar's indentation rules don't fight us. 
+ const formula = [ + "# Homebrew formula for Archon CLI", + "# To install: brew install coleam00/archon/archon", + "#", + "# This formula downloads pre-built binaries from GitHub releases.", + "# For development, see: https://github.com/coleam00/Archon", + "", + "class Archon < Formula", + ' desc "Remote agentic coding platform - control AI assistants from anywhere"', + ' homepage "https://github.com/coleam00/Archon"', + ` version "${ver}"`, + ' license "MIT"', + "", + " on_macos do", + " on_arm do", + ' url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-darwin-arm64"', + ` sha256 "${shas.darwinArm64}"`, + " end", + " on_intel do", + ' url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-darwin-x64"', + ` sha256 "${shas.darwinX64}"`, + " end", + " end", + "", + " on_linux do", + " on_arm do", + ' url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-linux-arm64"', + ` sha256 "${shas.linuxArm64}"`, + " end", + " on_intel do", + ' url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-linux-x64"', + ` sha256 "${shas.linuxX64}"`, + " end", + " end", + "", + " def install", + " binary_name = case", + " when OS.mac? && Hardware::CPU.arm?", + ' "archon-darwin-arm64"', + " when OS.mac? && Hardware::CPU.intel?", + ' "archon-darwin-x64"', + " when OS.linux? && Hardware::CPU.arm?", + ' "archon-linux-arm64"', + " when OS.linux? 
&& Hardware::CPU.intel?", + ' "archon-linux-x64"', + " end", + "", + ' bin.install binary_name => "archon"', + " end", + "", + " test do", + ' assert_match version.to_s, shell_output("#{bin}/archon version")', + " end", + "end", + "", + ].join("\n"); + + writeFileSync("homebrew/archon.rb", formula); + console.log(JSON.stringify({ updatedTo: ver, shas })); + } finally { + rmSync(dir, { recursive: true, force: true }); + } + runtime: bun + timeout: 120000 + depends_on: [wait-for-binaries, bump-version] + when: "$parse-args.output.dryRun == 'false' && $check-homebrew.output == 'true'" + + - id: commit-formula + bash: | + set -euo pipefail + + ver=$bump-version.output.newVersion + + git checkout main + git pull origin main --ff-only --quiet + git add homebrew/archon.rb + git commit -m "chore(homebrew): update formula to v$ver" + git push origin main + + # Sync dev with main so the formula update is on both branches + git checkout dev + git pull origin main --ff-only --quiet + git push origin dev + + echo "Formula committed to main and synced to dev" + timeout: 90000 + depends_on: [fetch-and-update-formula, bump-version] + when: "$parse-args.output.dryRun == 'false' && $check-homebrew.output == 'true'" + + - id: sync-tap + script: | + import { spawnSync } from "node:child_process"; + import { mkdtempSync, copyFileSync, rmSync } from "node:fs"; + import { tmpdir } from "node:os"; + import { join } from "node:path"; + + const ver = JSON.parse(String.raw`$bump-version.output`).newVersion; + const tapRepo = "git@github.com:coleam00/homebrew-archon.git"; + + const dir = mkdtempSync(join(tmpdir(), "tap-sync-")); + try { + const clone = spawnSync("git", ["clone", "--depth=1", tapRepo, dir], { encoding: "utf-8", stdio: "pipe" }); + if (clone.status !== 0) { + console.error("Failed to clone tap repo. 
You may need push access to coleam00/homebrew-archon."); + console.error("Run this manually after the release:"); + console.error(`  git clone ${tapRepo} <dir> && cp homebrew/archon.rb <dir>/Formula/archon.rb && git -C <dir> commit -am 'chore: sync formula to v${ver}' && git -C <dir> push`); + process.exit(1); + } + + copyFileSync("homebrew/archon.rb", join(dir, "Formula", "archon.rb")); + + const diff = spawnSync("git", ["-C", dir, "diff", "--quiet"], { encoding: "utf-8" }); + if (diff.status === 0) { + console.log("Tap formula already in sync — no changes needed"); + process.exit(0); + } + + for (const args of [ + ["-C", dir, "add", "Formula/archon.rb"], + ["-C", dir, "commit", "-m", `chore: sync formula to v${ver}`], + ["-C", dir, "push", "origin", "main"], + ]) { + const r = spawnSync("git", args, { encoding: "utf-8", stdio: "inherit" }); + if (r.status !== 0) { + console.error(`git ${args.slice(2).join(" ")} failed`); + process.exit(1); + } + } + + console.log(`Tap synced to v${ver}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } + runtime: bun + timeout: 120000 + depends_on: [commit-formula, bump-version] + when: "$parse-args.output.dryRun == 'false' && $check-homebrew.output == 'true'" + + # ═══════════════════════════════════════════════════════════════════ + # PHASE 11 — Final summary (always runs in both modes) + # ─────────────────────────────────────────────────────────────────── + # `trigger_rule: all_done` lets this run regardless of which downstream + # nodes were skipped (dry-run path or no-homebrew path). + # ═══════════════════════════════════════════════════════════════════ + + - id: final-summary + script: | + // Defensive: this node runs with trigger_rule: all_done, so any upstream + // node may have been skipped or failed. Empty $node.output substitutions + // resolve to "" and would break JSON.parse if not guarded.
+ const safeJson = (raw) => { + const s = raw.trim(); + if (!s) return null; + try { return JSON.parse(s); } catch { return null; } + }; + + const args = safeJson(String.raw`$parse-args.output`); + const ver = safeJson(String.raw`$bump-version.output`); + + const lines = []; + lines.push("══════════════════════════════════════════════════════════════════"); + + if (!args || !ver) { + lines.push("WORKFLOW ENDED EARLY — see prior node failures or skips."); + lines.push(""); + lines.push(`parse-args : ${args ? "ok" : "missing/skipped"}`); + lines.push(`bump-version: ${ver ? "ok" : "missing/skipped"}`); + lines.push(""); + lines.push("Check the run log for the first failed node and address it."); + } else if (args.dryRun === "true") { + lines.push(`DRY RUN COMPLETE — would have released v${ver.newVersion} (from v${ver.oldVersion})`); + lines.push(""); + lines.push("No files were written. No commits were made. No PR was created."); + lines.push("Re-run without --dry-run to actually cut the release."); + } else { + lines.push(`RELEASE COMPLETE — v${ver.newVersion} (from v${ver.oldVersion}, ${ver.bump})`); + lines.push(""); + lines.push("Verify the release end-to-end with the test-release skill:"); + lines.push(` /test-release brew ${ver.newVersion}`); + lines.push(` /test-release curl-mac ${ver.newVersion}`); + lines.push(""); + lines.push("If verification fails, file a hotfix and cut the next patch."); + lines.push("DO NOT announce the release until /test-release passes."); + } + lines.push("══════════════════════════════════════════════════════════════════"); + console.log(lines.join("\n")); + runtime: bun + timeout: 5000 + depends_on: + - review-changelog + - write-files + - commit-and-push + - create-pr + - wait-for-merge + - tag-and-release + - sync-dev-with-main + - wait-for-binaries + - fetch-and-update-formula + - commit-formula + - sync-tap + trigger_rule: all_done diff --git a/.archon/workflows/maintainer/maintainer-review-pr.yaml 
b/.archon/workflows/maintainer/maintainer-review-pr.yaml new file mode 100644 index 0000000000..d436118f95 --- /dev/null +++ b/.archon/workflows/maintainer/maintainer-review-pr.yaml @@ -0,0 +1,333 @@ +name: maintainer-review-pr +description: | + Use when: Maintainer wants to review a SINGLE PR with direction-and-scope + gating before any deep review. Skips deep review entirely when the PR is + off-direction, too broad, or has multiple concerns; instead drafts a polite- + decline comment for human approval. + Triggers: "maintainer review", "maintainer review pr <number>", "review and gate", + "should i review this PR", "gate this PR", "review pr as maintainer". + Does: Loads maintainer direction + profile + state -> gates the PR on + direction alignment, scope focus, and PR-template fill -> if review- + worthy, runs comprehensive review (5 parallel review aspects); if + decline-worthy, drafts a polite-decline comment that you approve before + it posts. + Provider: Pi (Minimax M2.7) — runs cheaper than Claude. Each review aspect + is its own Archon node, so Pi handles them as independent calls. + NOT for: Comprehensive review of a PR you've already decided to merge + (use archon-comprehensive-pr-review). Quick triage of all open PRs + (use maintainer-standup). + +provider: pi +model: minimax/MiniMax-M2.7 + +interactive: true # Required for the decline-approval gate + +worktree: + enabled: false # Live checkout — needs to read .archon/maintainer-standup/ +mutates_checkout: false # Read-only + per-run artifact writes; concurrent runs safe + +nodes: + # ═══════════════════════════════════════════════════════════════ + # PHASE 1: EXTRACT PR NUMBER FROM ARGUMENTS + # ═══════════════════════════════════════════════════════════════ + + - id: extract-pr-number + prompt: | + Find the GitHub PR number for this request. + + Request: $ARGUMENTS + + Rules: + - If the message contains an explicit PR number (e.g., "#1428", "PR 1428", "1428"), extract that number.
+ - If the message contains a PR URL (https://github.com/.../pull/N), extract N. + - If you cannot determine a single PR number, output ERROR. + + CRITICAL: Output ONLY the bare number with no quotes, markdown, or explanation. + Example correct output: 1428 + allowed_tools: [] + idle_timeout: 30000 + + # ═══════════════════════════════════════════════════════════════ + # PHASE 2: GATHER PR DATA + MAINTAINER CONTEXT (parallel) + # ═══════════════════════════════════════════════════════════════ + + - id: fetch-pr + bash: | + PR_NUM=$(echo "$extract-pr-number.output" | tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -1) + if [ -z "$PR_NUM" ]; then + echo "Failed to extract PR number from: $extract-pr-number.output" >&2 + exit 1 + fi + echo "$PR_NUM" > "$ARTIFACTS_DIR/.pr-number" + gh pr view "$PR_NUM" --json number,title,body,labels,comments,reviews,state,mergeable,mergeStateStatus,additions,deletions,changedFiles,files,author,createdAt,updatedAt,baseRefName,headRefName,reviewDecision,reviewRequests,isDraft + depends_on: [extract-pr-number] + timeout: 30000 + + - id: fetch-diff + bash: | + PR_NUM=$(cat "$ARTIFACTS_DIR/.pr-number") + # Don't redirect stderr — let auth / network / deleted-PR failures surface + # as a node failure rather than feeding an empty diff to the gate (which + # would produce a confident verdict on no evidence). + if ! diff_output=$(gh pr diff "$PR_NUM"); then + echo "ERROR: gh pr diff failed for PR #$PR_NUM" >&2 + exit 1 + fi + # Cap at 2500 lines to keep prompt size bounded; gate cares about shape, not every line. + if [ -z "$diff_output" ]; then + echo "(empty diff — PR has no changes)" + else + echo "$diff_output" | head -2500 + fi + depends_on: [fetch-pr] + timeout: 30000 + + - id: read-context + # Reuses the maintainer-standup script — same direction.md / profile.md / + # state.json / recent briefs we want for gate decisions. 
+ script: maintainer-standup-read-context + runtime: bun + timeout: 10000 + depends_on: [extract-pr-number] + + # ═══════════════════════════════════════════════════════════════ + # PHASE 3: GATE — direction + scope + template check + # ═══════════════════════════════════════════════════════════════ + + - id: gate + command: maintainer-review-gate + depends_on: [fetch-pr, fetch-diff, read-context] + context: fresh + output_format: + type: object + properties: + verdict: + type: string + enum: [review, decline, needs_split, unclear] + description: | + 'review' = passes gates, proceed to deep review. + 'decline' = wrong direction; draft polite-decline comment. + 'needs_split' = scope is multiple concerns; draft split-up request. + 'unclear' = gate cannot decide confidently; ask maintainer manually. + direction_alignment: + type: string + enum: [aligned, conflict, unclear] + scope_assessment: + type: string + enum: [focused, multiple_concerns, too_broad] + template_quality: + type: string + enum: [good, partial, empty] + decline_categories: + type: array + items: + type: string + description: e.g. ['direction', 'scope', 'template']. Empty when verdict == 'review'. + cited_direction_clauses: + type: array + items: + type: string + description: | + Specific direction.md clauses cited (e.g., 'direction.md §single-developer-tool'). + Empty when verdict == 'review'. + reasoning: + type: string + description: 1-3 sentences summarizing why this verdict. + required: + - verdict + - direction_alignment + - scope_assessment + - template_quality + - decline_categories + - cited_direction_clauses + - reasoning + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4a: REVIEW BRANCH (verdict == 'review') + # ═══════════════════════════════════════════════════════════════ + + - id: review-classify + prompt: | + Determine which review aspects to run for this PR. 
+ + ## PR Metadata + $fetch-pr.output + + ## Diff (truncated) + $fetch-diff.output + + ## Rules + - **Code review**: ALWAYS run. Mandatory for every PR. + - **Error handling**: Run if diff touches code with try/catch, async/await, or new failure paths. + - **Test coverage**: Run if diff touches source code (not just tests, docs, or config). + - **Comment quality**: Run if diff adds/modifies comments, docstrings, JSDoc, or in-code documentation. + - **Docs impact**: Run if diff adds/removes/renames public APIs, CLI flags, env vars, or user-facing features. + + Provide reasoning for each decision. Output JSON only. + depends_on: [gate] + when: "$gate.output.verdict == 'review'" + allowed_tools: [] + context: fresh + idle_timeout: 60000 + output_format: + type: object + properties: + run_code_review: + type: string + enum: ['true', 'false'] + run_error_handling: + type: string + enum: ['true', 'false'] + run_test_coverage: + type: string + enum: ['true', 'false'] + run_comment_quality: + type: string + enum: ['true', 'false'] + run_docs_impact: + type: string + enum: ['true', 'false'] + reasoning: + type: string + required: + - run_code_review + - run_error_handling + - run_test_coverage + - run_comment_quality + - run_docs_impact + - reasoning + + - id: code-review + command: maintainer-review-code-review + depends_on: [review-classify] + when: "$review-classify.output.run_code_review == 'true'" + context: fresh + + - id: error-handling + command: maintainer-review-error-handling + depends_on: [review-classify] + when: "$review-classify.output.run_error_handling == 'true'" + context: fresh + + - id: test-coverage + command: maintainer-review-test-coverage + depends_on: [review-classify] + when: "$review-classify.output.run_test_coverage == 'true'" + context: fresh + + - id: comment-quality + command: maintainer-review-comment-quality + depends_on: [review-classify] + when: "$review-classify.output.run_comment_quality == 'true'" + context: fresh + + - id: docs-impact 
+ command: maintainer-review-docs-impact + depends_on: [review-classify] + when: "$review-classify.output.run_docs_impact == 'true'" + context: fresh + + - id: synthesize-review + command: maintainer-review-synthesize + depends_on: [code-review, error-handling, test-coverage, comment-quality, docs-impact] + trigger_rule: one_success + context: fresh + + # Auto-post — once the gate said 'review', the deep review is feedback worth + # delivering. No approval required; the maintainer can always edit/delete on + # GitHub. (Approval gates are reserved for the higher-stakes decline branch + # where the comment closes the door on the contribution.) + - id: post-review + bash: | + PR_NUM=$(cat "$ARTIFACTS_DIR/.pr-number") + if [ ! -f "$ARTIFACTS_DIR/review/review-comment.md" ]; then + echo "ERROR: review-comment.md missing — synthesize did not write it" >&2 + exit 1 + fi + gh pr comment "$PR_NUM" --body-file "$ARTIFACTS_DIR/review/review-comment.md" + echo "Posted review comment to PR #$PR_NUM" + depends_on: [synthesize-review] + timeout: 30000 + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4b: DECLINE BRANCH (verdict in ['decline', 'needs_split']) + # ═══════════════════════════════════════════════════════════════ + + - id: approve-decline + approval: + message: | + Gate flagged this PR for polite-decline. Review the gate decision and the + drafted decline comment in the workflow output above (and in + $ARTIFACTS_DIR/gate-decision.md). + + Approve to post the drafted comment to the PR. + Reject with a reason to redraft (max 3 attempts). + capture_response: true + on_reject: + prompt: | + Reviewer feedback on the previous decline draft: + $REJECTION_REASON + + Re-read the gate decision at `$ARTIFACTS_DIR/gate-decision.md` and the + current drafted comment at `$ARTIFACTS_DIR/decline-comment.md`. Revise + the decline comment based on the feedback, then OVERWRITE + `$ARTIFACTS_DIR/decline-comment.md` with the new version. 
+ + Output the revised decline comment as raw markdown — no JSON wrapper. + max_attempts: 3 + depends_on: [gate] + when: "$gate.output.verdict == 'decline' || $gate.output.verdict == 'needs_split'" + + - id: post-decline + bash: | + PR_NUM=$(cat "$ARTIFACTS_DIR/.pr-number") + if [ ! -f "$ARTIFACTS_DIR/decline-comment.md" ]; then + echo "ERROR: decline-comment.md missing — gate command did not write it" >&2 + exit 1 + fi + gh pr comment "$PR_NUM" --body-file "$ARTIFACTS_DIR/decline-comment.md" + + # Tag the PR so the morning brief can surface "awaiting author". + # Failure (label not present in repo, permissions, etc.) is non-fatal, + # but record the actual outcome so the report node doesn't claim the + # label was applied when it wasn't. + if gh pr edit "$PR_NUM" --add-label awaiting-author 2>"$ARTIFACTS_DIR/.label-error"; then + echo "applied" > "$ARTIFACTS_DIR/.label-applied" + rm -f "$ARTIFACTS_DIR/.label-error" + else + echo "skipped" > "$ARTIFACTS_DIR/.label-applied" + echo "WARN: gh pr edit --add-label failed; see $ARTIFACTS_DIR/.label-error" >&2 + fi + + echo "Posted decline comment to PR #$PR_NUM" + depends_on: [approve-decline] + timeout: 30000 + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4c: UNCLEAR BRANCH (verdict == 'unclear') + # ═══════════════════════════════════════════════════════════════ + + - id: approve-unclear + approval: + message: | + Gate could not classify this PR confidently. Read the raw gate output + and any artifacts in $ARTIFACTS_DIR/, then decide manually. + + Approve (with optional comment) = workflow ends here (no comment posted, + no review run). Your comment is captured as $approve-unclear.output and + the report node will include it. + Reject (with reason) = workflow is cancelled; reasoning is recorded in + the run. 
+ capture_response: true + depends_on: [gate] + when: "$gate.output.verdict == 'unclear'" + + # ═══════════════════════════════════════════════════════════════ + # PHASE 5: FINAL REPORT (whichever branch ran) + # ═══════════════════════════════════════════════════════════════ + + - id: report + command: maintainer-review-report + depends_on: [post-review, post-decline, approve-unclear] + trigger_rule: one_success + context: fresh diff --git a/.archon/workflows/maintainer/maintainer-standup.yaml b/.archon/workflows/maintainer/maintainer-standup.yaml new file mode 100644 index 0000000000..9382ce0887 --- /dev/null +++ b/.archon/workflows/maintainer/maintainer-standup.yaml @@ -0,0 +1,162 @@ +name: maintainer-standup +description: | + Use when: Maintainer wants their morning briefing — what changed on dev, + what's in the review queue, what to focus on today across PRs and issues. + Triggers: "morning standup", "maintainer standup", "what's new today", + "daily brief", "morning brief", "what should i work on today", + "start my day". + Does: Pulls latest dev, fetches all open PRs and assigned issues, cross- + references against direction.md to flag polite-decline candidates, + compares against prior run state to surface progress (merged, closed, + what you shipped), produces a prioritized P1-P4 brief. Saves dated + brief + state for next-run continuity. + NOT for: Fixing issues (use archon-fix-github-issue), reviewing a specific + PR (use archon-comprehensive-pr-review), repo-wide triage automation + (use repo-triage). 
+ +provider: claude +model: sonnet + +worktree: + enabled: false # Live checkout — needs to git pull and read .archon/maintainer-standup/ + +nodes: + # ── Layer 0: gather facts in parallel ── + + - id: git-status + script: maintainer-standup-git-status + runtime: bun + timeout: 60000 + + - id: gh-data + script: maintainer-standup-gh-data + runtime: bun + timeout: 180000 + + - id: read-context + script: maintainer-standup-read-context + runtime: bun + timeout: 10000 + + # ── Layer 1: synthesize the brief ── + + - id: synthesize + command: maintainer-standup + depends_on: [git-status, gh-data, read-context] + output_format: + type: object + properties: + brief_markdown: + type: string + description: Human-readable maintainer brief in markdown, with P1-P4 sections. + next_state: + type: object + description: Carry-over state for tomorrow's run. + properties: + last_run_at: + type: string + description: ISO-8601 timestamp of this run. + last_dev_sha: + type: string + description: origin/dev SHA at the end of this run. + carry_over: + type: array + description: Items still pending from previous runs (or surfaced this run). + items: + type: object + properties: + kind: + type: string + enum: [pr, issue, task, direction_question] + id: + type: string + description: PR/issue number as string, or task identifier. + note: + type: string + description: Why this is being carried over. + first_seen: + type: string + description: ISO-8601 date when this item first appeared in carry_over (preserved across runs). + required: [kind, id, note, first_seen] + observed_prs: + type: array + description: Snapshot of ALL currently-open PRs, used to detect resolved/new PRs next run. + items: + type: object + properties: + number: + type: number + title: + type: string + required: [number, title] + observed_issues: + type: array + description: Snapshot of currently-tracked issues (assigned + recent unlabeled). 
+ items: + type: object + properties: + number: + type: number + title: + type: string + required: [number, title] + direction_questions: + type: array + description: New "we don't have a stance on this" questions surfaced this run. + items: + type: string + required: [last_run_at, last_dev_sha, carry_over, observed_prs, observed_issues, direction_questions] + required: [brief_markdown, next_state] + + # ── Layer 2: persist state and dated brief ── + + - id: persist + depends_on: [synthesize] + runtime: bun + timeout: 15000 + script: | + import { writeFileSync, mkdirSync, existsSync } from 'node:fs'; + import { resolve } from 'node:path'; + + // JSON is valid JS expression syntax — substitute directly without a + // template literal. Wrapping in String.raw breaks if the output contains + // backticks (e.g. markdown code spans inside brief_markdown). + const data = $synthesize.output; + + // Local YYYY-MM-DD (sv-SE locale gives ISO format in local time) so a + // late-night run doesn't write tomorrow's UTC date and confuse next-run + // recent_briefs lookups. + const date = new Date().toLocaleDateString('sv-SE'); + + try { + const baseDir = resolve(process.cwd(), '.archon/maintainer-standup'); + if (!existsSync(baseDir)) mkdirSync(baseDir, { recursive: true }); + + writeFileSync( + resolve(baseDir, 'state.json'), + JSON.stringify(data.next_state, null, 2) + '\n', + ); + + const briefsDir = resolve(baseDir, 'briefs'); + if (!existsSync(briefsDir)) mkdirSync(briefsDir, { recursive: true }); + const briefPath = resolve(briefsDir, `${date}.md`); + writeFileSync(briefPath, data.brief_markdown); + + console.log(JSON.stringify({ + date, + state_path: '.archon/maintainer-standup/state.json', + brief_path: `.archon/maintainer-standup/briefs/${date}.md`, + })); + } catch (err) { + // Synthesis (Sonnet, ~5 min) is the expensive part. 
If persist fails + // (disk full, read-only fs, permission), dump the brief + state to + // stderr so the run isn't a total loss — they're recoverable from logs. + process.stderr.write(`PERSIST FAILED: ${err.message}\n`); + process.stderr.write('--- BEGIN brief_markdown (recoverable from logs) ---\n'); + process.stderr.write(data.brief_markdown + '\n'); + process.stderr.write('--- END brief_markdown ---\n'); + process.stderr.write('--- BEGIN next_state (recoverable from logs) ---\n'); + process.stderr.write(JSON.stringify(data.next_state, null, 2) + '\n'); + process.stderr.write('--- END next_state ---\n'); + process.exit(1); + } diff --git a/.archon/workflows/repo-triage.yaml b/.archon/workflows/maintainer/repo-triage.yaml similarity index 100% rename from .archon/workflows/repo-triage.yaml rename to .archon/workflows/maintainer/repo-triage.yaml diff --git a/.archon/workflows/e2e-claude-smoke.yaml b/.archon/workflows/test-workflows/e2e-claude-smoke.yaml similarity index 100% rename from .archon/workflows/e2e-claude-smoke.yaml rename to .archon/workflows/test-workflows/e2e-claude-smoke.yaml diff --git a/.archon/workflows/e2e-codex-smoke.yaml b/.archon/workflows/test-workflows/e2e-codex-smoke.yaml similarity index 100% rename from .archon/workflows/e2e-codex-smoke.yaml rename to .archon/workflows/test-workflows/e2e-codex-smoke.yaml diff --git a/.archon/workflows/e2e-deterministic.yaml b/.archon/workflows/test-workflows/e2e-deterministic.yaml similarity index 100% rename from .archon/workflows/e2e-deterministic.yaml rename to .archon/workflows/test-workflows/e2e-deterministic.yaml diff --git a/.archon/workflows/test-workflows/e2e-minimax-smoke.yaml b/.archon/workflows/test-workflows/e2e-minimax-smoke.yaml new file mode 100644 index 0000000000..eefae0d35a --- /dev/null +++ b/.archon/workflows/test-workflows/e2e-minimax-smoke.yaml @@ -0,0 +1,126 @@ +# E2E smoke test — Minimax M2.7 via the Pi community provider +# Verifies: Pi can resolve and call Minimax M2.7 using the 
user's local +# `pi /login` credentials (api_key entry in ~/.pi/agent/auth.json). +# Design: mirrors e2e-pi-smoke.yaml structure. Three nodes verify +# (1) the model responds at all, (2) it can self-identify as Minimax, +# (3) it can produce parseable JSON via output_format (best-effort on Pi). +# The final bash node fails fast if any signal is missing. +# Auth: requires a `minimax` entry in ~/.pi/agent/auth.json. No env vars. +name: e2e-minimax-smoke +description: | + Use when: Verifying that Minimax M2.7 loads via the Pi provider with the + user's local Pi auth (api_key in ~/.pi/agent/auth.json). + Triggers: "minimax smoke", "test minimax", "verify minimax", "minimax test". + Does: Sends three tiny prompts to Minimax M2.7 (math, self-identification, + structured JSON), asserts non-empty output and basic plausibility. + NOT for: Production work — connectivity / capability sanity check only. + +provider: pi +model: minimax/MiniMax-M2.7 + +worktree: + enabled: false # Smoke test — no need to isolate + +nodes: + # 1. Connectivity — does Pi resolve the model and stream a response? + - id: hello + prompt: 'What is 2+2? Answer with just the number, nothing else.' + allowed_tools: [] + effort: low + idle_timeout: 60000 + + # 2. Self-identification — INFORMATIONAL ONLY. Do not assert on the result. + # LLMs are unreliable narrators about their own identity, and Pi's system + # prompt mentions OpenAI-codex defaults, which causes Minimax (and likely + # other models) to pattern-match and claim that identity. The real proof + # of routing is in Pi's session jsonl (provider=minimax, real billing). + - id: identify + prompt: 'Without using any tools, on a single short line, tell me which model and provider you are.' + allowed_tools: [] + idle_timeout: 60000 + depends_on: [hello] + + # 3. Structured output — exercises Pi's best-effort output_format path + # (schema appended to prompt + JSON extracted from result text). 
+ # This is the same machinery the maintainer-standup synthesis relies on. + - id: json + prompt: | + Return a JSON object with two fields, no fences and no prose: + - "name": your model name (string) + - "ok": always true (boolean) + allowed_tools: [] + idle_timeout: 60000 + depends_on: [hello] + output_format: + type: object + properties: + name: + type: string + ok: + type: boolean + required: [name, ok] + + # 4. Assertions — fail loudly if any node returned empty / unparseable. + - id: assert + depends_on: [hello, identify, json] + bash: | + math="$hello.output" + ident="$identify.output" + jname="$json.output.name" + jok="$json.output.ok" + + echo "── results ──" + echo "math = $math" + echo "identify = $ident" + echo "json.name = $jname" + echo "json.ok = $jok" + echo "──────────────" + + if [ -z "$math" ] || [ -z "$ident" ]; then + echo "FAIL: empty output from hello or identify node" + exit 1 + fi + if [ -z "$jname" ] || [ -z "$jok" ]; then + echo "FAIL: structured-output fields missing — Pi best-effort JSON parse failed" + exit 1 + fi + + # Real proof of routing: Pi writes a session jsonl per call. Find ALL + # session jsonls modified in the last 10 minutes (generous window — + # smoke's three Pi nodes + assert can collectively take several + # minutes on a slow network; capped at 10 to avoid matching old runs). + # Check each for the minimax routing signal — any one matching is + # sufficient evidence. This avoids: + # - brittle path-encoding assumptions about Pi's per-cwd session dir, + # - non-deterministic `head -1` over `find` output (find doesn't + # guarantee any order), + # - JSON field-order brittleness in a single combined regex + # (`provider` may appear before or after `modelId` in the jsonl). 
+ recent_sessions=$(find "$HOME/.pi/agent/sessions" -name '*.jsonl' -mmin -10 -print 2>/dev/null) + if [ -z "$recent_sessions" ]; then + echo "FAIL: no Pi session jsonl modified in the last 10 minutes" + exit 1 + fi + + matched="" + while IFS= read -r session; do + # Two separate greps for order-independence — JSON field ordering + # isn't part of Pi's contract, so a single regex with `.*` between + # the two fields would silently false-FAIL if Pi ever reorders. + if grep -q '"provider":"minimax"' "$session" \ + && grep -q '"modelId":"MiniMax-M2.7"' "$session"; then + matched="$session" + break + fi + done <<< "$recent_sessions" + + if [ -n "$matched" ]; then + echo "PASS: Pi session log confirms provider=minimax, modelId=MiniMax-M2.7" + echo " session: $matched" + else + echo "FAIL: no recent Pi session log confirmed minimax routing — possible misroute" + echo " checked sessions:" + echo "$recent_sessions" | sed 's/^/ /' + exit 1 + fi + echo "PASS: smoke complete" diff --git a/.archon/workflows/e2e-mixed-providers.yaml b/.archon/workflows/test-workflows/e2e-mixed-providers.yaml similarity index 100% rename from .archon/workflows/e2e-mixed-providers.yaml rename to .archon/workflows/test-workflows/e2e-mixed-providers.yaml diff --git a/.archon/workflows/e2e-pi-all-nodes-smoke.yaml b/.archon/workflows/test-workflows/e2e-pi-all-nodes-smoke.yaml similarity index 100% rename from .archon/workflows/e2e-pi-all-nodes-smoke.yaml rename to .archon/workflows/test-workflows/e2e-pi-all-nodes-smoke.yaml diff --git a/.archon/workflows/e2e-pi-smoke.yaml b/.archon/workflows/test-workflows/e2e-pi-smoke.yaml similarity index 100% rename from .archon/workflows/e2e-pi-smoke.yaml rename to .archon/workflows/test-workflows/e2e-pi-smoke.yaml diff --git a/.archon/workflows/e2e-worktree-disabled.yaml b/.archon/workflows/test-workflows/e2e-worktree-disabled.yaml similarity index 100% rename from .archon/workflows/e2e-worktree-disabled.yaml rename to 
.archon/workflows/test-workflows/e2e-worktree-disabled.yaml diff --git a/.claude/skills/archon/SKILL.md b/.claude/skills/archon/SKILL.md index f36e7391b8..c844ad0eb9 100644 --- a/.claude/skills/archon/SKILL.md +++ b/.claude/skills/archon/SKILL.md @@ -37,17 +37,60 @@ Determine the user's intent and dispatch to the appropriate guide: | **Config / settings** | Read `guides/config.md` — interactive config editor | | **Initialize .archon/ in a repo** | Read `references/repo-init.md` | | **Create a workflow** | Read `references/workflow-dag.md` — the complete workflow authoring guide | +| **Quick parameter lookup — which field works on which node type** | Read `references/parameter-matrix.md` — master matrix, intent-based lookup, silent-failure catalog | | **Advanced features (hooks/MCP/skills)** | Read `references/dag-advanced.md` | | **Create a command file** | Read `references/authoring-commands.md` | | **Variable substitution reference** | Read `references/variables.md` | | **CLI command reference** | Read `references/cli-commands.md` | | **Run an interactive workflow** | Read `references/interactive-workflows.md` — transparent relay protocol | +| **Workflow good practices / anti-patterns** | Read `references/good-practices.md` — read before designing a non-trivial workflow | +| **Troubleshoot a failing / stuck workflow** | Read `references/troubleshooting.md` — log locations, common failure modes | | **Run a workflow (default)** | Continue with "Running Workflows" below | If the intent is ambiguous, ask the user to clarify. --- +## Richer Context: [archon.diy](https://archon.diy) + +The references in this skill are a distilled subset. The full, canonical docs live at **[archon.diy](https://archon.diy)** (Starlight site from `packages/docs-web/`). If the skill's reference pages don't cover what you need — an edge case, a worked example, a diagram, a deeper section on a feature — fetch the matching page from archon.diy. 
+ +### When to reach for the live docs + +- You need an end-to-end example that's longer than what the skill shows (e.g. full patterns for hooks, MCP config, sandbox schema, approval flows) +- You're explaining a concept to the user and want the most readable framing (the `book/` series is written as a tutorial, not a reference) +- You hit a feature the skill only mentions in passing (e.g. `agents:` inline sub-agents, advanced Codex options, the full SyncHookJSONOutput schema) +- The user asks "where is this documented?" — point them at the archon.diy URL, not a skill file path + +### URL map + +| Topic | URL | +|-------|-----| +| Landing + install | [archon.diy](https://archon.diy) | +| Getting started (installation, quick start, concepts) | [archon.diy/getting-started/](https://archon.diy/getting-started/overview/) | +| The book (tutorial-style walkthrough) | [archon.diy/book/](https://archon.diy/book/) | +| Workflow authoring guide | [archon.diy/guides/authoring-workflows/](https://archon.diy/guides/authoring-workflows/) | +| Command authoring guide | [archon.diy/guides/authoring-commands/](https://archon.diy/guides/authoring-commands/) | +| Node type guides | [archon.diy/guides/loop-nodes/](https://archon.diy/guides/loop-nodes/), [/approval-nodes/](https://archon.diy/guides/approval-nodes/), [/script-nodes/](https://archon.diy/guides/script-nodes/) | +| Per-node features (Claude only) | [/hooks/](https://archon.diy/guides/hooks/), [/mcp-servers/](https://archon.diy/guides/mcp-servers/), [/skills/](https://archon.diy/guides/skills/) | +| Global workflows/commands/scripts | [archon.diy/guides/global-workflows/](https://archon.diy/guides/global-workflows/) | +| Variables reference | [archon.diy/reference/variables/](https://archon.diy/reference/variables/) | +| CLI reference | [archon.diy/reference/cli/](https://archon.diy/reference/cli/) | +| Security model (env, sandbox, target-repo `.env` stripping) | 
[archon.diy/reference/security/](https://archon.diy/reference/security/) | +| Architecture | [archon.diy/reference/architecture/](https://archon.diy/reference/architecture/) | +| Configuration (`.archon/config.yaml` full schema) | [archon.diy/reference/configuration/](https://archon.diy/reference/configuration/) | +| Troubleshooting | [archon.diy/reference/troubleshooting/](https://archon.diy/reference/troubleshooting/) | +| Adapter setup (Slack/Telegram/GitHub/Web/Discord/Gitea/GitLab) | [archon.diy/adapters/](https://archon.diy/adapters/) | +| Deployment (Docker, cloud, Windows) | [archon.diy/deployment/](https://archon.diy/deployment/) | + +URL shape is `archon.diy/
//` — the paths mirror the filenames under `packages/docs-web/src/content/docs/`. + +### Precedence + +This skill's reference pages are the primary source for routine workflow authoring, CLI use, and setup. Reach for archon.diy when the skill is incomplete for your case — don't go to the live docs first by default (skill refs load into context faster and are tuned for agents). + +--- + ## Running Workflows ### Core Command @@ -152,9 +195,9 @@ nodes: depends_on: [first-node] ``` -### Four Node Types +### Node Types -Each node has exactly ONE of: `command`, `prompt`, `bash`, or `loop`. +Each node has exactly ONE of: `command`, `prompt`, `bash`, `script`, `loop`, `approval`, or `cancel`. **Command node** — runs a `.archon/commands/*.md` file: ```yaml @@ -177,6 +220,22 @@ Each node has exactly ONE of: `command`, `prompt`, `bash`, or `loop`. timeout: 15000 ``` +**Script node** — TypeScript/JavaScript (via `bun`) or Python (via `uv`), no AI, stdout captured as output: +```yaml +- id: transform + script: | + const raw = process.argv.slice(2).join(' ') || '{}'; + console.log(JSON.stringify({ parsed: JSON.parse(raw) })); + runtime: bun # 'bun' (.ts/.js) or 'uv' (.py) — REQUIRED + timeout: 30000 # Optional, ms, default 120000 + +# Or reference a named script from .archon/scripts/ or ~/.archon/scripts/ +- id: analyze + script: analyze-metrics # loads .archon/scripts/analyze-metrics.py + runtime: uv + deps: ["pandas>=2.0"] # Optional, uv only — 'uv run --with ' +``` + **Loop node** — iterates AI prompt until completion: ```yaml - id: implement @@ -188,6 +247,29 @@ Each node has exactly ONE of: `command`, `prompt`, `bash`, or `loop`. until_bash: "bun run test" # Optional: exit 0 = done ``` +**Approval node** — pauses the workflow for human review. Requires `interactive: true` at the workflow level for Web UI delivery: +```yaml +interactive: true # workflow level — required for web UI + +nodes: + - id: review-gate + approval: + message: "Review the plan above before proceeding." 
+ capture_response: true # Optional: user's comment → $review-gate.output + on_reject: # Optional: AI rework on rejection instead of cancel + prompt: "Revise based on feedback: $REJECTION_REASON" + max_attempts: 3 # Range 1-10, default 3 + depends_on: [plan] +``` + +**Cancel node** — terminates the workflow with a reason. Typically gated with `when:`: +```yaml +- id: stop-if-unsafe + cancel: "Refusing to proceed: input flagged UNSAFE." + depends_on: [classify] + when: "$classify.output != 'SAFE'" +``` + For the full authoring guide with all fields, conditions, trigger rules, and patterns: Read `references/workflow-dag.md` ### Creating a Command File @@ -230,7 +312,7 @@ For details: Read `references/dag-advanced.md` ### Example Files -- `examples/dag-workflow.yaml` — workflow with conditions, bash nodes, structured output +- `examples/dag-workflow.yaml` — workflow with conditions, bash + script + loop nodes, structured output - `examples/command-template.md` — Command file skeleton with all variables --- diff --git a/.claude/skills/archon/examples/dag-workflow.yaml b/.claude/skills/archon/examples/dag-workflow.yaml index 5e15f4c77c..50fcbdada1 100644 --- a/.claude/skills/archon/examples/dag-workflow.yaml +++ b/.claude/skills/archon/examples/dag-workflow.yaml @@ -1,7 +1,8 @@ -# Example: Workflow with all four node types +# Example: Workflow demonstrating multiple node types # -# Demonstrates: bash nodes, structured output, when: conditions, -# trigger_rule, per-node model, context: fresh, loop nodes, and output substitution. +# Demonstrates: bash nodes, script nodes (TypeScript via bun), structured output, +# when: conditions, trigger_rule, per-node model, context: fresh, loop nodes, +# and output substitution. # # IMPORTANT: This is a reference example. 
Design your actual workflow # around the user's specific needs — the number of nodes, their types, @@ -42,6 +43,26 @@ nodes: fi timeout: 5000 + # ── SCRIPT NODE: TypeScript (bun runtime), no AI, stdout captured as output ── + # Deterministic parsing the shell would mangle — extracts labels cleanly as JSON. + # + # NOTE: `$fetch-issue.output` is substituted *raw* into the script body (no shell + # quoting — see reference/variables.md). JSON is valid JS expression syntax — + # assign directly without String.raw or JSON.parse. String.raw breaks if the + # output contains backticks (e.g. markdown code spans in AI-generated content). + - id: extract-labels + script: | + try { + const issue = $fetch-issue.output; + const labels = (issue.labels ?? []).map((l) => l.name); + console.log(JSON.stringify({ labels, count: labels.length })); + } catch { + console.log(JSON.stringify({ labels: [], count: 0 })); + } + runtime: bun + depends_on: [fetch-issue] + timeout: 10000 + # ── PROMPT NODE: Inline AI prompt with structured output ── - id: classify prompt: | diff --git a/.claude/skills/archon/references/authoring-commands.md b/.claude/skills/archon/references/authoring-commands.md index 0b1240da6b..603dd3e4a3 100644 --- a/.claude/skills/archon/references/authoring-commands.md +++ b/.claude/skills/archon/references/authoring-commands.md @@ -4,14 +4,29 @@ Commands are plain Markdown files containing AI prompt templates. They are the a ## File Location +Commands are discovered from three scopes, highest-precedence first: + ``` -.archon/commands/ -├── my-command.md # Custom command -├── review-code.md # Another custom command -└── defaults/ # Optional: override bundled defaults - └── archon-assist.md # Overrides the bundled archon-assist +/.archon/commands/ # 1. 
Repo-scoped (wins) +├── my-command.md # Custom command for this repo +├── archon-assist.md # Overrides the bundled archon-assist +└── triage/ # Subfolders allowed, 1 level deep + └── review.md # Resolves as 'review', not 'triage/review' + +~/.archon/commands/ # 2. Home-scoped (user-level, shared across all repos) +├── review-checklist.md # Personal helper available in every repo +└── pr-style-guide.md + + # 3. Shipped with Archon (archon-assist, etc.) ``` +**Resolution rules:** + +- Filename-without-extension is the command name (e.g. `my-command.md` → `my-command`). +- 1-level subfolders are supported for grouping; resolution is still by filename (`triage/review.md` → `review`). +- Repo scope overrides home scope overrides bundled, by name. +- Duplicate basenames **within a scope** (e.g. two different `review.md` files in `triage/` and `security/`) are a user error — keep names unique within each scope. + Commands are referenced by name (without `.md`) in workflow YAML files. ## File Format @@ -78,11 +93,14 @@ Command names must: ## Discovery and Priority When a workflow references `command: my-command`, Archon searches in this order: -1. `.archon/commands/my-command.md` (repo custom) -2. `.archon/commands/defaults/my-command.md` (repo default overrides) + +1. `/.archon/commands/my-command.md` (repo scope) +2. `~/.archon/commands/my-command.md` (home scope — shared across every repo on the machine) 3. Bundled defaults (shipped with Archon) -First match wins. To override a bundled command, create a file with the same name in your repo. +First match wins. To override a bundled command, drop a file with the same name at either scope. To override a home-scoped command for a specific repo, drop a file with the same name in that repo's `.archon/commands/`. + +> **Web UI note**: Home-scoped commands appear in the workflow builder's node palette under a dedicated "Global (~/.archon/commands/)" section, distinct from project and bundled entries. 
## Referencing Commands from Workflows

diff --git a/.claude/skills/archon/references/cli-commands.md b/.claude/skills/archon/references/cli-commands.md
index 157eacb713..0cc1a0ee06 100644
--- a/.claude/skills/archon/references/cli-commands.md
+++ b/.claude/skills/archon/references/cli-commands.md
@@ -32,7 +32,7 @@ archon workflow run archon-fix-github-issue --resume
 | `--branch <name>` / `-b` | Branch name for worktree. Reuses existing worktree if healthy |
 | `--from <branch>` / `--from-branch <branch>` | Start-point branch for new worktree (default: repo default branch) |
 | `--no-worktree` | Skip isolation — run in the live checkout |
-| `--resume` | Resume the last failed run of this workflow (skips completed steps/nodes) |
+| `--resume` | Resume the last failed run of this workflow at this cwd (skips completed nodes) |
 | `--cwd <path>` | Working directory override |
 
 **Flag conflicts** (errors):
@@ -42,6 +42,87 @@ archon workflow run archon-fix-github-issue --resume
 
 **Default behavior** (no flags): Auto-creates a worktree with branch name `{workflow-name}-{timestamp}`.
 
+**Auto-resume without `--resume`**: If a prior invocation of the same workflow at the same cwd failed, the next invocation automatically skips completed nodes. `--resume` is only needed to force-resume a specific failed run or to reuse that run's worktree.
+
+### `archon workflow status`
+
+Show the currently running workflow (if any) with its run ID, state, and last activity.
+
+```bash
+archon workflow status
+archon workflow status --json   # Machine-readable output
+```
+
+### `archon workflow approve <run-id> [comment]`
+
+Approve a paused approval-node workflow. Auto-resumes the workflow.
+
+```bash
+archon workflow approve abc123
+archon workflow approve abc123 --comment "Plan looks good"
+archon workflow approve abc123 "Plan looks good"   # positional form
+```
+
+For interactive loop nodes, the comment becomes `$LOOP_USER_INPUT` on the next iteration.
For approval nodes with `capture_response: true`, the comment becomes that node's `$nodeId.output` value for downstream nodes.
+
+### `archon workflow reject <run-id> [reason]`
+
+Reject a paused approval gate. Without `on_reject` on the node, rejection cancels the workflow. With `on_reject`, it runs the rework prompt with `$REJECTION_REASON` substituted and re-pauses.
+
+```bash
+archon workflow reject abc123
+archon workflow reject abc123 --reason "Plan misses test coverage"
+archon workflow reject abc123 "Plan misses test coverage"
+```
+
+### `archon workflow abandon <run-id>`
+
+Mark a non-terminal workflow run as cancelled. Use when a `running` row is stuck after a server crash or when you want to discard a paused run without rejecting. This does NOT kill an in-flight subprocess — it only transitions the DB row.
+
+```bash
+archon workflow abandon abc123
+```
+
+> **There is no `archon workflow cancel` CLI subcommand.** To actively cancel a running workflow (terminate its subprocess), use the chat slash command `/workflow cancel <run-id>` on the platform that started it (Web UI, Slack, Telegram, etc.), or the Cancel button on the Web UI dashboard. The CLI only offers `abandon`, which is the right tool for orphan cleanup but does not interrupt a live subprocess.
+
+### `archon workflow resume <run-id> [message]`
+
+Explicitly re-run a failed run. Most workflows auto-resume without this — use it when you need to force a specific run ID.
+
+```bash
+archon workflow resume abc123
+archon workflow resume abc123 "continue with the plan"
+```
+
+### `archon workflow cleanup [days]`
+
+**Deletes** old terminal workflow runs (`completed`/`failed`/`cancelled`) from the database for disk hygiene. Does NOT transition `running` rows — use `abandon` (or the chat `/workflow cancel`) for those.
+
+```bash
+archon workflow cleanup      # Default: 7 days
+archon workflow cleanup 30   # Custom: 30 days
+```
+
+### `archon workflow event emit --run-id <id> --type <type> [--data <json>]`
+
+Emit a workflow event to a running workflow. Used inside loop prompts to signal state (e.g.
"checkpoint written") for observability. Rarely invoked from the shell directly.
+
+```bash
+archon workflow event emit --run-id abc123 --type checkpoint --data '{"step":"plan"}'
+```
+
+### `archon continue <branch> [flags] [message]`
+
+Continue work on a branch with prior context. Defaults to `archon-assist`; use `--workflow` to pick a different workflow. Useful for iterative sessions on the same worktree without typing the full `workflow run` incantation.
+
+```bash
+archon continue feat/auth "Add password reset"
+archon continue feat/auth --workflow archon-feature-development "Continue from step 3"
+archon continue feat/auth --no-context "Start fresh without loading prior artifacts"
+```
+
+Flags: `--workflow <name>`, `--no-context`.
+
 ## Isolation Commands
 
 ### `archon isolation list`
@@ -59,11 +140,20 @@ Outputs: branch name, path, workflow type, platform, last activity age. Ghost en
 Remove stale worktree environments.
 
 ```bash
-archon isolation cleanup            # Default: 7 days
-archon isolation cleanup 14         # Custom: 14 days
-archon isolation cleanup --merged   # Remove branches merged into main (+ remote branches)
+archon isolation cleanup                             # Default: 7 days
+archon isolation cleanup 14                          # Custom: 14 days
+archon isolation cleanup --merged                    # Also remove worktrees whose branches merged into main (deletes remote branches too)
+archon isolation cleanup --merged --include-closed   # Also remove worktrees whose PRs were closed without merging
 ```
+
+**Flags:**
+
+| Flag | Description |
+|------|-------------|
+| `[days]` | Positional — age threshold in days. Environments untouched for longer than this are removed.
Default: 7 | +| `--merged` | Union of three signals — ancestry (`git branch --merged`), patch equivalence (`git cherry`), and PR state (`gh`) — safely catches squash-merges | +| `--include-closed` | With `--merged`, also remove worktrees whose PRs were closed (abandoned, not merged) | + ## Validate Commands ### `archon validate workflows [name]` diff --git a/.claude/skills/archon/references/dag-advanced.md b/.claude/skills/archon/references/dag-advanced.md index 4add35d8f7..63a83e9101 100644 --- a/.claude/skills/archon/references/dag-advanced.md +++ b/.claude/skills/archon/references/dag-advanced.md @@ -1,6 +1,6 @@ # Advanced Features: Hooks, MCP, Skills, Retry -These features are available on **command and prompt nodes** (hooks, MCP, skills, tool restrictions) and **command, prompt, and bash nodes** (retry, output_format). Loop nodes do not support these features (`retry` on loop nodes is a hard error; others are silently ignored). +These features are available on **command and prompt nodes** (hooks, MCP, skills, tool restrictions, `output_format`, `agents`, Claude SDK options) and **command, prompt, bash, and script nodes** (retry). Loop nodes do not support these features (`retry` on loop nodes is a hard error; others are silently ignored). Bash and script nodes silently ignore AI-specific fields (a loader warning lists the ignored fields). ## Provider Compatibility diff --git a/.claude/skills/archon/references/good-practices.md b/.claude/skills/archon/references/good-practices.md new file mode 100644 index 0000000000..e731a2583d --- /dev/null +++ b/.claude/skills/archon/references/good-practices.md @@ -0,0 +1,241 @@ +# Workflow Good Practices and Anti-Patterns + +Guidance for authoring workflows that survive first contact with a real codebase. Written for an agent or human writing their first non-trivial workflow. + +## Good Practices + +### 1. Use deterministic nodes for deterministic work + +AI nodes are expensive, non-reproducible, and can hallucinate. 
Use `bash:` or `script:` for anything that has a right answer a computer can produce. + +- **Run tests** with `bash: "bun run test"`, not `prompt: "run the tests and tell me if they passed"`. +- **Parse JSON** with `script:` (bun/uv), not a `prompt:` that re-derives structure from free text. +- **Read files with known paths** via `bash: "cat path/to/file"` or `Read` in an AI node where the agent actually needs to reason about the content. +- **Git state checks** (current branch, uncommitted changes, merge-base) → `bash:`. + +### 2. Use `output_format` for every node whose output downstream `when:` reads + +`when:` conditions do best-effort JSON parsing on `$nodeId.output` for `.field` access. If the upstream node doesn't enforce a shape, you're pattern-matching free-form AI text — fragile. + +```yaml +# GOOD +- id: classify + prompt: "Classify as BUG or FEATURE" + output_format: # enforces the JSON shape + type: object + properties: + type: { type: string, enum: [BUG, FEATURE] } + required: [type] + +- id: investigate + command: investigate-bug + depends_on: [classify] + when: "$classify.output.type == 'BUG'" # safe field access + +# BAD +- id: classify + prompt: "Is this a bug or a feature?" + # no output_format; AI might reply "it looks like a bug", "BUG", or "This is a bug.\n\n..." + +- id: investigate + command: investigate-bug + depends_on: [classify] + when: "$classify.output == 'BUG'" # fragile string match +``` + +### 3. `trigger_rule: none_failed_min_one_success` after conditional branches + +After `when:`-gated branches, the downstream merge node will see one or more **skipped** dependencies. Skipped ≠ success. Default `all_success` fails. 
+ +```yaml +- id: investigate + command: investigate-bug + depends_on: [classify] + when: "$classify.output.type == 'BUG'" + +- id: plan + command: plan-feature + depends_on: [classify] + when: "$classify.output.type == 'FEATURE'" + +- id: implement + command: implement + depends_on: [investigate, plan] + trigger_rule: none_failed_min_one_success # CORRECT — exactly one ran + # trigger_rule: all_success ← would fail here (one dep skipped) +``` + +Use `one_success` when any dep succeeding is enough; `none_failed_min_one_success` when no dep should have failed AND at least one must have succeeded; `all_done` for "run cleanup regardless" patterns with `cancel:` or notification nodes. + +### 4. `context: fresh` requires artifacts for state passing + +A node with `context: fresh` starts with no memory of prior nodes in the same workflow. The only way state moves is via files. Default is `fresh` for parallel layers and `shared` for sequential — explicit `context: fresh` is common when you want cost isolation. + +```yaml +- id: investigate + command: investigate-bug + # Investigator WRITES to $ARTIFACTS_DIR/investigation.md + +- id: implement + command: implement-fix + depends_on: [investigate] + context: fresh + # Implementer MUST read $ARTIFACTS_DIR/investigation.md — it has no memory + # of what the investigator found. +``` + +Command files should lead with "read artifacts from `$ARTIFACTS_DIR/...`" when they're downstream of a fresh node. This is the single biggest quality lever on multi-node workflows. + +### 5. Cheap models for glue, strong models for substance + +Classification, routing, formatting, and short summaries don't need Opus. Use `model: haiku` for these and reserve `sonnet`/`opus` for the nodes that actually produce code or long-form analysis. Combined with `allowed_tools: []` on pure-text nodes, this cuts cost dramatically. 
+ +```yaml +- id: classify + prompt: "Classify this issue" + model: haiku # fast + cheap + allowed_tools: [] # no tool overhead + output_format: { ... } + +- id: implement + command: implement-fix + model: sonnet # where the thinking happens +``` + +### 6. Write the workflow description for routing + +Archon's orchestrator routes user intent to workflows by description. Write descriptions that make routing obvious. + +- Start with the imperative action: "Fix a GitHub issue end-to-end", "Generate a Remotion video composition". +- Mention triggers: "Use when the user asks to review a PR", "Use when there's a failing test run". +- Mention what it does NOT do: "Does not create a PR — use `archon-plan-to-pr` for that". + +### 7. Validate before shipping + +Never declare a workflow "done" without: + +```bash +archon validate workflows # YAML + DAG structure + resource refs +``` + +This checks: YAML syntax, node ID uniqueness, no cycles, all `depends_on` exist, all `$nodeId.output` refs point to known nodes, all `command:` files exist, all `mcp:` configs parse, all `skills:` directories exist, provider/model compatibility, named script existence, runtime availability. Fix everything it reports before first run. + +For brand-new workflows, also: +1. Run once against a trivial input (`archon workflow run my-workflow --branch test/sanity "hello"`) +2. Check the run log at `~/.archon/workspaces///logs/.jsonl` +3. Check artifacts at `~/.archon/workspaces///artifacts/runs//` + +See `references/troubleshooting.md` for how to read those. + +### 8. Design the artifact chain before writing command files + +In a multi-node workflow, each node's artifact IS the specification for the next node. 
Before writing any command body, map out: + +| Node | Reads | Writes | +|------|-------|--------| +| `investigate-issue` | GitHub issue via `gh` | `$ARTIFACTS_DIR/issues/issue-{n}.md` | +| `implement-issue` | Artifact from `investigate-issue` | Code files, tests | +| `create-pr` | Git diff | GitHub PR, `$ARTIFACTS_DIR/pr-body.md` | + +If a downstream agent can't execute from just its artifact, the artifact is incomplete. This is the single most common failure mode in multi-node workflows. + +### 9. Keep workflows reversible + +Use `worktree.enabled: true` at the workflow level for anything that modifies the codebase. The CLI `--no-worktree` flag will hard-error, forcing users into isolation. The cost is a one-time cp of the worktree; the benefit is never having a failed workflow corrupt a live checkout. + +For read-only workflows (triage, reporting, code analysis), pin `worktree.enabled: false` instead — saves the worktree setup cost. + +--- + +## Anti-Patterns + +### ❌ Asking AI to run deterministic checks + +```yaml +# BAD +- id: test + prompt: "Run bun run test and tell me if it passed" + +# GOOD +- id: test + bash: "bun run test 2>&1" + +- id: react-to-tests + prompt: "Fix any failures: $test.output" + depends_on: [test] + trigger_rule: all_done # run even if tests failed +``` + +### ❌ Pattern-matching free-form AI output in `when:` + +```yaml +# BAD — brittle +- id: decide + prompt: "Should we proceed? Answer yes or no." +- id: do-thing + depends_on: [decide] + when: "$decide.output == 'yes'" # AI says "Yes!" or "Yes, because..." — no match + +# GOOD +- id: decide + prompt: "Should we proceed?" + output_format: + type: object + properties: { proceed: { type: boolean } } + required: [proceed] +- id: do-thing + depends_on: [decide] + when: "$decide.output.proceed == 'true'" +``` + +### ❌ Commands that assume prior-node memory in a `context: fresh` chain + +```markdown + +Fix the bug we discussed in the investigation phase. 
+ + +Read the investigation at `$ARTIFACTS_DIR/issues/issue-{n}.md`. +Extract the root cause, affected files, and implementation plan. +Implement the changes exactly as specified in the plan. +``` + +### ❌ Long flat layers of AI nodes + +Ten sibling `prompt:` nodes in one layer all depending on one upstream is a $N/run cost bomb and a latency trap. If the work is parallel and similar, use the `agents:` inline sub-agent map-reduce pattern with a cheap model per item and a single stronger reducer. See `references/dag-advanced.md` and the [Inline sub-agents section on archon.diy](https://archon.diy/guides/authoring-workflows/#inline-sub-agents) for a worked example. + +### ❌ Hardcoding secrets in YAML or MCP configs + +Use `$ENV_VAR` expansion in MCP configs and the `env:` block in `.archon/config.yaml` (or Web UI Settings → Projects → Env Vars). See `references/repo-init.md` §Per-Project Env Injection. + +### ❌ `retry` on a loop node + +Loop nodes manage their own iteration via `max_iterations`. Setting `retry:` on a loop is a **hard parse error** — the workflow fails to load. If a loop iteration is flaky, handle it inside the loop prompt (the AI can retry tool calls) or use `until_bash` to gate completion on a deterministic check. + +### ❌ Tiny `max_iterations` on open-ended loops + +A loop with `max_iterations: 3` that's supposed to implement N stories from a PRD will silently stop after 3 iterations and leave the work half-done. Think about the worst case — multi-story PRDs need 10–20, fix-iterate cycles need 5–8, refinement loops need 3–5. + +### ❌ Missing `interactive: true` at workflow level for approval/loop gates on web + +Web UI dispatches non-interactive workflows to a background worker that cannot deliver chat messages. Approval-gate messages and loop `gate_message` prompts will never reach the user. If the workflow has `approval:` nodes OR `loop.interactive: true`, set workflow-level `interactive: true`. 
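+
+A minimal sketch of the fix (the `approval:` message field and node names are illustrative — see `workflow-dag.md` § Approval Nodes for the exact schema):
+
+```yaml
+interactive: true          # workflow level — required for web-delivered gate messages
+nodes:
+  - id: plan
+    command: make-plan
+  - id: gate
+    approval:
+      message: "Approve the plan before implementation?"
+    depends_on: [plan]
+```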
+
+### ❌ Tool-restricted nodes without the MCP wildcard
+
+```yaml
+# BAD — "fixing" the empty whitelist by re-adding general-purpose tools the
+# node doesn't need. When `mcp:` is set, Archon auto-adds mcp__<server>__*
+# wildcards, so MCP tools survive an empty allowed_tools list.
+- id: analyze
+  prompt: "Use the Postgres MCP to query users"
+  mcp: .archon/mcp/postgres.json
+  allowed_tools: [Read, Write, Bash]   # unnecessary — widens the node's surface
+
+# GOOD — works out of the box
+- id: analyze
+  prompt: "Use Postgres MCP to query users"
+  mcp: .archon/mcp/postgres.json
+  allowed_tools: []   # correct — MCP tools auto-attached
+```
+
+Caveat: this only helps Claude. Codex gets MCP config from `~/.codex/config.toml` globally, not per-node.
diff --git a/.claude/skills/archon/references/interactive-workflows.md b/.claude/skills/archon/references/interactive-workflows.md
index 243cfdb7b0..856d50afd1 100644
--- a/.claude/skills/archon/references/interactive-workflows.md
+++ b/.claude/skills/archon/references/interactive-workflows.md
@@ -103,4 +103,4 @@ archon workflow reject <run-id> "reason for rejection"
 
 - **Workflow shows `running` for a long time**: The AI is doing research/implementation. Be patient — check again in a few minutes.
 - **Log file not found**: The log is at `~/.archon/workspaces///logs/.jsonl`
-- **User wants to cancel**: Run `archon workflow reject <run-id>` or `archon workflow cancel <run-id>`
+- **User wants to cancel**: Run `archon workflow reject <run-id>` to stop at an approval gate, or `archon workflow abandon <run-id>` to mark the run cancelled without killing any subprocess.
To actively terminate a still-live subprocess, use the chat slash command `/workflow cancel <run-id>` on the platform that started it — there is no `archon workflow cancel` CLI subcommand.
diff --git a/.claude/skills/archon/references/parameter-matrix.md b/.claude/skills/archon/references/parameter-matrix.md
new file mode 100644
index 0000000000..2d7fec80ce
--- /dev/null
+++ b/.claude/skills/archon/references/parameter-matrix.md
@@ -0,0 +1,193 @@
+# Parameter Matrix (Quick Reference)
+
+One-page lookup for Archon workflow parameters: which field works on which node type, how to pick the right parameter for a given intent, and the gotchas that don't fail loudly.
+
+This is a **lookup reference**. For the full explanation of any field, follow the cross-references at the bottom to the detailed guides.
+
+## Master Matrix: Parameters × Node Types
+
+There are seven node types. Exactly one of `command`, `prompt`, `bash`, `script`, `loop`, `approval`, or `cancel` must appear per node.
+
+| Parameter | command | prompt | bash | script | loop | approval | cancel |
+| -------------------------------------------- | :-----: | :-----: | :-----: | :-----: | :--------------------------: | :------------: | :-----: |
+| `id` | yes | yes | yes | yes | yes | yes | yes |
+| `depends_on` | yes | yes | yes | yes | yes | yes | yes |
+| `when` | yes | yes | yes | yes | yes | yes | yes |
+| `trigger_rule` | yes | yes | yes | yes | yes | yes | yes |
+| `idle_timeout` | yes | yes | ignored (use `timeout`) | ignored (use `timeout`) | yes (per-iter) | yes | yes |
+| `timeout` (total, not idle) | — | — | yes | yes | — | — | — |
+| `model` / `provider` | yes | yes | ignored | ignored | **ignored at runtime** | ignored | ignored |
+| `context: fresh` \| `shared` | yes | yes | ignored | ignored | ignored (use `loop.fresh_context`) | ignored | ignored |
+| `output_format` | yes | yes | ignored | ignored | ignored | ignored | ignored |
+| `allowed_tools` / `denied_tools` | yes | yes | ignored | ignored |
ignored | ignored | ignored | +| `hooks` | yes | yes | ignored | ignored | ignored | ignored | ignored | +| `mcp` | yes | yes | ignored | ignored | ignored | ignored | ignored | +| `skills` | yes | yes | ignored | ignored | ignored | ignored | ignored | +| `agents` | yes | yes | ignored | ignored | ignored | ignored | ignored | +| `retry` | yes | yes | yes | yes | **hard error** | yes (`on_reject`) | yes | +| `effort` / `thinking` / `fallbackModel` / `betas` / `sandbox` / `maxBudgetUsd` / `systemPrompt` | yes | yes | ignored | ignored | ignored | ignored | ignored | +| `bash` / `script` / `runtime` / `deps` | — | — | `bash` required | `script` + `runtime` required | — | — | — | +| `loop` (nested config) | — | — | — | — | **required** | — | — | +| `approval` (nested config) | — | — | — | — | — | **required** | — | +| `cancel` (reason string) | — | — | — | — | — | — | **required** | + +**Reading the matrix:** +- **yes** — field works as expected on this node type. +- **ignored** — field is accepted by the parser but has no effect at runtime. Loader emits a warning (`_node_ai_fields_ignored`). +- **hard error** — workflow fails to load. Only `retry` on a loop node does this. + +Most AI features work on `command` and `prompt` nodes. Loop nodes are thin controllers — the AI fields inside `loop.prompt` are what actually run. `bash` and `script` nodes silently ignore AI fields. `approval` and `cancel` nodes don't invoke AI at all. + +## Parameter Selection by Intent + +Organized by what you're trying to do, not by field name. Useful when you know the outcome you want but aren't sure which parameter gets you there. + +| You want to... 
| Use | +| ------------------------------------------------ | ------------------------------------------------------------ | +| Control cost per node | `model: haiku`, `maxBudgetUsd: 0.50`, `effort: low` | +| Force pure reasoning (no tools) | `allowed_tools: []` | +| Read-only analysis phase | `denied_tools: [Write, Edit, Bash]` | +| Route based on upstream output | Upstream `output_format: {...}` + downstream `when:` | +| Join after mutually-exclusive routes | `trigger_rule: none_failed_min_one_success` or `one_success` | +| Run two independent branches in parallel | Two nodes with no shared `depends_on` | +| Iterate until tests pass | `loop: {until_bash: "bun run test", max_iterations: N}` | +| Iterate through a backlog without memory bleed | `loop: {fresh_context: true}`, state written to `$ARTIFACTS_DIR` | +| Iterate with human feedback between iterations | `loop: {interactive: true, gate_message: "..."}` + workflow `interactive: true` | +| Single human approval gate | `approval:` node with `on_reject: {prompt, max_attempts}` | +| Fail fast if upstream output is wrong | `cancel:` node with `when:` | +| Enforce a rule on every file edit | `hooks.PostToolUse` with `matcher: "Write\|Edit"` | +| Deny dangerous commands | `hooks.PreToolUse` with `permissionDecision: deny` | +| Give a node domain knowledge | `skills: [skill-name]` | +| Give a node external tools | `mcp: .archon/mcp/server.json` | +| Retry flaky API calls | `retry: {max_attempts: 3, delay_ms: 2000}` | +| Run Python in a node | `script:` node with `runtime: uv`, `deps: [...]` | +| Run TypeScript in a node | `script:` node with `runtime: bun` | +| Mix providers in one workflow | Workflow-level `provider: claude`, per-node `provider: codex` | +| Use a non-default model for one node | Node-level `model:` override | +| Run on a 1M context window | `model: opus[1m]` + `betas: ['context-1m-2025-08-07']` | +| Increase per-iteration timeout on a long loop | `idle_timeout: 600000` on the loop node | +| Pass 
large artifacts between nodes | Write to `$ARTIFACTS_DIR/...`, read in downstream node | +| Pass small structured data | `output_format` + `$nodeId.output.field` access | +| Block workflow on an external condition | `bash:` polling loop or `approval:` node | +| Spawn parallel sub-tasks inside one node | Inline `agents:` map (see below) | +| Force isolation regardless of CLI flags | Workflow-level `worktree: {enabled: true}` | +| Force live checkout for read-only workflows | Workflow-level `worktree: {enabled: false}` | + +## Silent Failures (what gets ignored without erroring) + +Things that don't fail parsing but don't do what you'd expect: + +1. **`model` / `provider` on a loop node** → silently ignored. Logged as `loop_node_ai_fields_ignored`. The loop is a controller; set model at workflow level or inside the loop prompt body. +2. **`hooks` / `mcp` / `skills` / `output_format` / `allowed_tools` / `denied_tools` on a loop, bash, script, approval, or cancel node** → silently ignored. +3. **`context: fresh` on a loop** → ignored. Use `loop.fresh_context: true` instead. +4. **`output_format` on a bash or script node** → schema is accepted but bash/script output is whatever stdout says; no JSON coercion. +5. **Unknown `$nodeId.output` reference** → resolves to empty string + warning; does not fail the workflow. +6. **Invalid `when:` expression** → node silently skipped (fail-closed). +7. **`allowed_tools` / `denied_tools` on Codex nodes** → ignored. Use Codex CLI config (`~/.codex/config.toml`). +8. **`hooks` on Codex nodes** → ignored + warning logged. +9. **`mcp` or `skills` per-node on Codex** → ignored. Configure globally in `~/.codex/config.toml` or `~/.agents/skills/`. +10. **`trigger_rule: all_success` after `when:`-gated fan-out** → branches that didn't run count as "not succeeded"; the join node will never fire. Use `none_failed_min_one_success` or `one_success`. +11. 
**Node-level `interactive: true` on an approval node or loop, without workflow-level `interactive: true`** → on the Web UI, gate messages never reach the user. The workflow dispatches to a background worker that can't deliver chat messages. +12. **Missing env var in MCP config** → warning logged, node continues with empty string substitution. +13. **`retry` on a loop node** → this one is a **hard parse error** (not silent). Use the loop's own `max_iterations` and `until_bash` for finish-line detection. +14. **`String.raw\`$nodeId.output\`` in a `script:` body** → silently corrupts when the substituted value contains a backtick (e.g. markdown code spans in AI output or `output_format` payloads). The template literal terminates early, producing a cryptic `Expected ";"` parse error. Use direct assignment instead: `const data = $nodeId.output;` — JSON is valid JS expression syntax and needs no wrapper. + +The pattern across these: if you set an AI feature on a non-AI node, it's silently ignored. Watch loader logs for `_ignored` warnings when debugging. + +## Inline `agents:` (Task-tool sub-agents) + +A node can define named sub-agents that Claude invokes via the `Task` tool. Useful for map-reduce patterns: one node spawns N parallel sub-tasks with a cheap model, then a reducer summarizes. + +```yaml +- id: analysis + prompt: | + For each area of the codebase, delegate to the appropriate sub-agent + via the Task tool. Summarize all findings into a single report. 
+ agents: + security-scanner: # kebab-case id + description: "Scan for common web vulnerabilities" + prompt: "Run OWASP top-10 style checks on the given files" + model: haiku + tools: [Read, Grep, Glob] # tool whitelist for this sub-agent + disallowedTools: [Write, Edit, Bash] + maxTurns: 5 + test-coverage-auditor: + description: "Report untested or weakly-tested surfaces" + prompt: "Identify code paths without corresponding tests" + model: haiku + tools: [Read, Grep, Glob] + skills: [test-coverage-patterns] # skill injection per sub-agent + maxTurns: 5 +``` + +**Fields per agent:** + +| Field | Required | Description | +| ------------------ | :------: | --------------------------------------------------------- | +| `description` | yes | Shown when Claude decides which agent to delegate to | +| `prompt` | yes | System prompt the sub-agent runs under | +| `model` | no | Per-agent model override | +| `tools` | no | Tool whitelist for the sub-agent | +| `disallowedTools` | no | Tool blacklist | +| `skills` | no | Skills to inject into the sub-agent | +| `maxTurns` | no | Max conversation turns for the sub-agent | + +**Naming rule:** lowercase kebab-case. No leading or trailing hyphens, no double hyphens, no digits-only ids. + +**When to use `agents:` vs fan-out at the workflow level:** +- Use `agents:` when the number of sub-tasks is dynamic or decided by the orchestrator node at runtime. +- Use workflow-level fan-out (parallel nodes with `depends_on: [setup]`) when the sub-tasks are known ahead of time and each needs its own artifact. + +See [archon.diy/guides/authoring-workflows/#inline-sub-agents](https://archon.diy/guides/authoring-workflows/#inline-sub-agents) for a worked end-to-end example. + +## Cross-References to Detailed Guides + +Use this matrix to find the right parameter. Use these references for the full explanation of how it works. 
+ +| Topic | Detailed reference | +| ------------------------------------------------ | ----------------------------------------------------------------------- | +| Workflow authoring overview, node base fields | `workflow-dag.md` | +| Loop nodes in depth (completion, session patterns) | `workflow-dag.md` § Loop Nodes | +| Approval / cancel nodes | `workflow-dag.md` § Approval Nodes, § Cancel Nodes | +| Hooks (events, matchers, response shapes) | `dag-advanced.md` § Hooks | +| MCP (transports, env expansion, wildcards) | `dag-advanced.md` § MCP | +| Skills (injection, discovery, combining with MCP) | `dag-advanced.md` § Skills | +| Retry classification (FATAL / TRANSIENT / UNKNOWN) | `dag-advanced.md` § Retry Configuration | +| Variable reference (`$ARGUMENTS`, `$ARTIFACTS_DIR`, etc) | `variables.md` | +| CLI flags and commands | `cli-commands.md` | +| Command file authoring | `authoring-commands.md` | +| Repo initialization, `.archon/config.yaml` schema | `repo-init.md` | +| Good practices and anti-patterns | `good-practices.md` | +| Interactive workflow relay protocol | `interactive-workflows.md` | +| Debugging and log locations | `troubleshooting.md` | +| Full schema reference | [archon.diy/reference/configuration/](https://archon.diy/reference/configuration/) | + +## Providers at a Glance + +| Feature | Claude | Codex | Pi (community) | +| ------------------------------- | :-----------: | :-------------------------------------: | :----------------------------------: | +| `command` / `prompt` / `loop` | yes | yes | yes | +| `bash` / `script` | yes | yes | yes | +| `output_format` | reliable | reliable | best-effort | +| `allowed_tools` / `denied_tools` | yes | ignored (use Codex CLI config) | ignored | +| `hooks` | yes | **ignored + warn** | not available | +| `mcp` (per-node) | yes | global `~/.codex/config.toml` only | not available | +| `skills` (per-node) | yes | global `~/.agents/skills/` only | not available | +| Model naming | `haiku`, `sonnet`, `opus`, 
`opus[1m]` | Codex model ID (e.g. `gpt-5.2`) | `<provider>/<model>` (e.g. `anthropic/claude-opus-4-5`, `openai/gpt-4o`, `groq/llama-3-70b`) |
+| `effort` / `thinking` | yes | use `modelReasoningEffort` for reasoning models | via `effort:` (maps to thinking level) |
+| Session resume / `--resume` | yes | yes | yes |
+
+Mixing providers in one workflow: set workflow-level `provider: claude`, then override per-node with `provider: codex` or `provider: pi`. Cross-provider `$nodeId.output` substitution works as expected.
+
+## Ten Principles for Safe Workflow Design
+
+1. Always use `--branch <name>` (or `worktree: {enabled: true}`) for workflows that modify the codebase.
+2. Validate before running: `archon validate workflows <name>`.
+3. Tier your models. Haiku for routing and glue; Sonnet for reasoning and review; Opus only where the context is deep.
+4. Use `output_format` for every node whose output a downstream `when:` reads. Never pattern-match free-form AI text.
+5. On Ralph-style loops, use `loop.fresh_context: true` and treat `$ARTIFACTS_DIR` as the source of truth. Command bodies should re-read state at the top of every iteration.
+6. Use interactive loops for iterative refinement with the human. Use `approval:` nodes for single-point checkpoints.
+7. Read-only analysis phases use `denied_tools: [Write, Edit, Bash]`. Separation of concerns.
+8. Use `hooks.PostToolUse` to enforce post-change validation (type-check, lint). Tighter feedback loop than end-of-workflow review.
+9. Large artifacts go through `$ARTIFACTS_DIR`. Small structured data goes through `$nodeId.output.field`.
+10. AI can scaffold a workflow. Only a human can verify it. Read the YAML before running.
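+
+A closing sketch of the mixed-provider pattern described above — workflow-level default plus per-node override (node names and model IDs are illustrative):
+
+```yaml
+provider: claude            # workflow default
+nodes:
+  - id: triage
+    prompt: "Classify the issue as BUG or FEATURE"
+    model: haiku
+  - id: implement
+    command: implement-fix
+    provider: codex         # per-node override
+    model: gpt-5.2
+    depends_on: [triage]    # cross-provider $triage.output substitution works
+```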
diff --git a/.claude/skills/archon/references/repo-init.md b/.claude/skills/archon/references/repo-init.md index 66be6375f5..e44907fd2e 100644 --- a/.claude/skills/archon/references/repo-init.md +++ b/.claude/skills/archon/references/repo-init.md @@ -10,14 +10,27 @@ Create the following in your repository root: .archon/ ├── commands/ # Custom command files (.md) ├── workflows/ # Workflow definitions (.yaml) +├── scripts/ # Named scripts for script: nodes (.ts/.js for bun, .py for uv) — optional ├── mcp/ # MCP server config files (.json) — optional -└── config.yaml # Repo-specific configuration — optional +├── state/ # Cross-run workflow state — gitignored, never committed +├── config.yaml # Repo-specific configuration — optional +└── .env # Repo-scoped Archon env (optional; do NOT commit) ``` ```bash -mkdir -p .archon/commands .archon/workflows +mkdir -p .archon/commands .archon/workflows .archon/scripts ``` +**What each directory is for:** + +- `commands/` — Reusable prompt templates used by `command:` workflow nodes. Committed to git. +- `workflows/` — YAML workflow definitions. Committed to git. +- `scripts/` — Named TypeScript/JavaScript (bun) or Python (uv) scripts referenced by `script:` nodes. Extension determines runtime: `.ts`/`.js` → bun, `.py` → uv. Committed to git. +- `mcp/` — MCP server JSON configs. Usually checked in with `$ENV_VAR` references; avoid hardcoding secrets. Some teams gitignore this and rely entirely on env expansion. +- `state/` — Workflow-written cross-run state (e.g. the `repo-triage` dedup log). **Always gitignore** — these are runtime artifacts, not source. +- `config.yaml` — Repo-specific defaults (assistant, worktree settings, etc.). Committed to git. +- `.env` — Repo-scoped Archon env (loaded with `override: true` at boot). 
**Do NOT commit.** This is different from the target repo's top-level `.env` — that file belongs to the target project, and Archon strips its auto-loaded keys from subprocess env before spawning AI to prevent leakage. See **Three-Path Env Model** below. + ## Minimal config.yaml Create `.archon/config.yaml` only if you need to override defaults: @@ -52,11 +65,59 @@ Archon ships with built-in commands and workflows (like `archon-assist`, `archon Add to your `.gitignore`: ```gitignore -# Archon runtime artifacts (never commit) -.archon/mcp/ # May contain env var references +# Archon runtime artifacts — NEVER commit +.archon/state/ # Cross-run workflow state, runtime-only +.archon/.env # Repo-scoped Archon env (secrets) + +# Optional — gitignore if your MCP configs hardcode secrets +.archon/mcp/ +``` + +`.archon/commands/`, `.archon/workflows/`, and `.archon/scripts/` **should be committed** — they are part of your project's workflow definitions. `.archon/config.yaml` should be committed unless it contains secrets (use `.archon/.env` for those instead). + +## Three-Path Env Model + +Archon loads env from three distinct paths at boot, with different trust levels and precedence: + +| Path | Scope | Trust | Loaded? 
| +|------|-------|-------|---------| +| `~/.archon/.env` | User (home) | Trusted — user owns it | Yes, with `override: true` | +| `<repo>/.archon/.env` | Repo (per-project, Archon-owned) | Trusted — user owns it | Yes, with `override: true` (overrides home) | +| `<repo>/.env` | Target repo | **Untrusted** — belongs to the project being worked on | **Stripped from `process.env`** before subprocess spawn to prevent secret leakage (see [archon.diy/reference/security/](https://archon.diy/reference/security/#target-repo-env-isolation) for the full trust model) | + +Boot behavior emits observable log lines: + +``` +[archon] loaded N keys from ~/.archon/.env +[archon] loaded M keys from /path/to/repo/.archon/.env +[archon] stripped K keys from /path/to/repo (ANTHROPIC_API_KEY, OPENAI_API_KEY, ...) ``` -The `.archon/commands/` and `.archon/workflows/` directories should be committed — they are part of your project's workflow definitions. +**Where should you put what?** + +- **API keys for Archon itself** (`ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, `DATABASE_URL`, `SLACK_BOT_TOKEN`, etc.) → `~/.archon/.env` (shared across all repos) or `<repo>/.archon/.env` (per-repo override). +- **Target-project env that a workflow needs** (`GH_TOKEN`, `DOTENV_PRIVATE_KEY`, etc.) → see [Per-Project Env Injection](#per-project-env-injection) below. +- **Target-project env that Archon should NOT touch** → leave it in `<repo>/.env` where the project already expects it. Archon strips it from subprocess env but doesn't delete the file. + +The `archon setup --scope home|project [--force]` wizard writes to the right file for you and produces a timestamped backup on every rewrite. 
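The precedence just described can be condensed into a short sketch (illustrative pseudologic of the documented behavior, not Archon's actual loader code):

```python
# Sketch of the three-path env model described above (illustrative only,
# not Archon's loader). Precedence: base process env < ~/.archon/.env
# < <repo>/.archon/.env; keys auto-loaded from <repo>/.env are stripped.

def build_subprocess_env(base: dict, home_env: dict, repo_archon_env: dict,
                         target_repo_keys: set) -> dict:
    env = dict(base)
    env.update(home_env)          # ~/.archon/.env, loaded with override: true
    env.update(repo_archon_env)   # <repo>/.archon/.env, overrides home
    for key in target_repo_keys:  # keys from the target repo's .env:
        env.pop(key, None)        # stripped before any subprocess spawns
    return env
```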
+ +## Per-Project Env Injection + +For env vars a workflow's `bash:` and `script:` subprocesses need (`GH_TOKEN` for `gh` calls, `DATABASE_URL` for a migration script, etc.), use one of the two **managed injection** surfaces — both inject into subprocess env at workflow execution time, after the target-repo `.env` strip: + +**Option 1: `.archon/config.yaml` `env:` block** (checked into git; values can be `$REF_NAME` expansions from Archon env): + +```yaml +env: + GH_TOKEN: $GH_TOKEN # expanded from ~/.archon/.env at runtime + BUILD_TARGET: production # literal value +``` + +**Option 2: Web UI Settings → Projects → Env Vars** — per-codebase, stored in the Archon DB, values never returned over the API (only keys are listed). Use this for values that should NOT appear in git. + +Both surfaces inject into: Claude/Codex/Pi subprocess env, `bash:` node subprocess env, `script:` node subprocess env, and direct chat messages that run against the codebase. The worktree isolation layer propagates them as well. + +> **About keys in the target repo's `<repo>/.env`**: Archon unconditionally strips the keys auto-loaded from `<repo>/.env` out of `process.env` at boot (see the Three-Path Env Model above) and the Bun subprocess is invoked with `--no-env-file`, so those values do NOT reach AI / bash / script subprocesses. If a workflow needs a value that currently lives in the target repo's `.env`, surface it through one of the two managed injection options above — don't expect the target `.env` to leak through. ## Global Configuration diff --git a/.claude/skills/archon/references/troubleshooting.md b/.claude/skills/archon/references/troubleshooting.md new file mode 100644 index 0000000000..099cccd928 --- /dev/null +++ b/.claude/skills/archon/references/troubleshooting.md @@ -0,0 +1,162 @@ +# Troubleshooting Workflows + +Where to look when a workflow fails, hangs, or does the wrong thing. 
+ +## Log Locations + +Workflow run logs are written as JSONL per run: + +``` +~/.archon/workspaces/<project>/<workflow>/logs/<run-id>.jsonl +``` + +Each line is a structured event. The discriminator is the `type` field. Values (see `packages/workflows/src/logger.ts` for the canonical list): + +| `type` | Meaning | +|--------|---------| +| `workflow_start` / `workflow_complete` / `workflow_error` | Run lifecycle | +| `node_start` / `node_complete` / `node_error` / `node_skipped` | Node lifecycle | +| `assistant` | AI assistant message — has `content` field with the full AI output | +| `tool` | SDK tool invocation — has `tool_name`, `tool_input`, `duration_ms`, and optionally `tokens` | +| `validation` | Workflow-level validation event — has `check` and `result` (`pass` / `fail` / `warn` / `unknown`) | + +> **Loop iterations and per-attempt retry events are NOT in the JSONL file.** They go through the workflow event emitter (WebSocket / `workflow_events` DB table) under `loop_iteration_started` / `loop_iteration_completed` etc. To see them, query the DB or the Web UI dashboard — not the JSONL log. + +Find the run ID from `archon workflow status` (most recent run). Then: + +```bash +# Last assistant message (what the AI said before failure) +jq 'select(.type == "assistant") | .content' <log-file> | tail -1 + +# All error events (node failures + workflow-level failures) +jq 'select(.type == "node_error" or .type == "workflow_error")' <log-file> + +# Full event stream +cat <log-file> | jq . +``` + +Adapter logs (Slack / Telegram / Web / GitHub) are emitted to stderr when `LOG_LEVEL=debug` is set on the server. + +## Artifact Locations + +``` +~/.archon/workspaces/<project>/<workflow>/artifacts/runs/<run-id>/ +``` + +Inspect artifacts when a multi-node workflow produces wrong output. The failing node's upstream artifact is usually where the problem originated. 
+ +```bash +ls ~/.archon/workspaces/<project>/<workflow>/artifacts/runs/<run-id>/ +cat ~/.archon/workspaces/<project>/<workflow>/artifacts/runs/<run-id>/issues/issue-42.md +``` + +Artifacts are **external** to the repo on purpose — they don't pollute git. + +## Common Failure Modes + +### "No base branch could be resolved" + +A node references `$BASE_BRANCH` in its prompt, but neither git auto-detection nor `worktree.baseBranch` in `.archon/config.yaml` produced a branch. + +**Fix:** +1. Set `worktree.baseBranch: main` (or `dev`, or whatever) in `.archon/config.yaml`. +2. Or pass `--from <branch>` on `archon workflow run`. +3. Or remove the `$BASE_BRANCH` reference if the node doesn't actually need it. + +### "Claude Code not found" / "Codex CLI binary not found" + +Compiled-binary builds of Archon no longer embed Claude Code / Codex — you install them separately and Archon resolves the binary via env var or config. + +**Fix (Claude):** +- Install: `curl -fsSL https://claude.ai/install.sh | bash` (or `npm install -g @anthropic-ai/claude-code`) +- Set `CLAUDE_BIN_PATH=/path/to/claude` in `~/.archon/.env`, OR +- Set `assistants.claude.claudeBinaryPath: /absolute/path` in `.archon/config.yaml` +- Autodetect covers `$HOME/.local/bin/claude` (native installer) — no config needed if you used that path + +**Fix (Codex):** +- Install: `npm install -g @openai/codex` (or platform-specific instructions) +- Set `CODEX_CLI_PATH=/path/to/codex` or `assistants.codex.codexBinaryPath` in config +- Autodetect covers the standard npm / Homebrew locations per platform + +See [archon.diy/getting-started/installation/](https://archon.diy/getting-started/installation/) for full platform-specific install paths. + +### Workflow shows `running` for a long time but nothing happens + +Three possibilities: + +1. **The AI is actually working.** Check `~/.archon/workspaces/<project>/<workflow>/logs/<run-id>.jsonl` — if you see recent `tool` or `assistant` events in the tail, it's fine. Wait. +2. 
**The server crashed and left an orphan row.** Server startup no longer auto-fails orphaned `running` rows (per the "No Autonomous Lifecycle Mutation" rule — `CLAUDE.md`). Transition it manually: + - Web UI: Dashboard → Abandon or Cancel button on the run card + - CLI: `archon workflow abandon <run-id>` — marks the DB row cancelled without killing any subprocess. Right tool for orphans since the subprocess is already gone + - Chat (Slack / Telegram / Web): `/workflow cancel <run-id>` — actively terminates the subprocess. Use for a still-live run that needs to be interrupted (there is no `archon workflow cancel` CLI subcommand) +3. **A node is past its `idle_timeout`.** The default is 5 minutes. Override with per-node `idle_timeout: 600000` (10 min) for long-running nodes. + +### Workflow fails mid-way; how do I resume? + +Auto-resume is default — just re-invoke the same workflow at the same cwd: + +```bash +archon workflow run my-workflow "original message" +# → "Resuming workflow — skipping N already-completed node(s)" +``` + +Use `--resume` only when you want to force-reuse the same worktree from a specific failed run. Use `archon workflow resume <run-id>` to force a specific run ID. + +**Caveat:** AI session context from prior nodes is NOT restored on resume. If a `context: shared` node depended on in-session memory, re-running it will have fresh context. Artifact-based handoff survives; in-context memory does not. + +### Approval gate not appearing on web UI + +You set `interactive: true` on the approval node but the workflow still runs in the background and no chat message appears. + +**Fix:** Set `interactive: true` at the **workflow level** too. Node-level `interactive` is ignored on web without workflow-level `interactive`. See `references/workflow-dag.md` §Approval Nodes and §Interactive Loops. + +### `MCP server connection failed: <name>` noise in chat + +User-level Claude plugin MCPs (e.g. `telegram`, `notion`) inherited from `~/.claude/` fail to connect in the headless subprocess. 
This is normal — they're not configured for Archon's worktree context. Archon filters these to debug logs (`dag.mcp_plugin_connection_suppressed`) and surfaces only workflow-configured MCP failures. + +If you see a failure for an MCP you DID configure via `mcp:` in the workflow: check the config JSON path, the MCP server's `command`/`args`, and any referenced env vars. + +### Node output is empty / `$nodeId.output.field` resolves to empty string + +Common causes: + +1. Upstream node is an AI node without `output_format` — the output is free-form text, JSON parsing fails, field access returns empty. +2. Upstream node was **skipped** (its `when:` evaluated false). Downstream `when:` with `==` comparisons against a specific value will fail-closed. +3. Bash/script node printed to stderr, not stdout. Only stdout is captured. +4. For script nodes, non-zero exit on a non-existent file / missing import silently drops the output. Check the run log for `node_error` entries. + +## Useful Diagnostic Commands + +```bash +# All active runs as JSON (running / paused / recently finished, depending on retention) +archon workflow status --json | jq '.runs[]' + +# Human-readable status of any active runs +archon workflow status + +# Active worktrees and their last activity +archon isolation list + +# Validate a specific workflow before running +archon validate workflows my-workflow + +# Validate a specific command +archon validate commands my-command + +# Dump the last 50 lines of a workflow's log +tail -n 50 ~/.archon/workspaces/<project>/<workflow>/logs/<run-id>.jsonl | jq . + +# Increase log verbosity (workflow run) +archon workflow run my-workflow --verbose "..." + +# Increase server log verbosity +LOG_LEVEL=debug bun run start +``` + +## Escalation: when nothing makes sense + +1. Run `archon version` and note the version. +2. Run `archon validate workflows <workflow>` and capture the output. +3. Grab the last ~50 lines of the run's JSONL log. +4. 
Check the `CHANGELOG.md` for known issues / recent changes to the subsystem you're hitting. +5. File an issue at https://github.com/coleam00/Archon/issues with version, validate output, log tail, and the YAML. diff --git a/.claude/skills/archon/references/variables.md b/.claude/skills/archon/references/variables.md index 8f3d2dc57f..a02b546b3a 100644 --- a/.claude/skills/archon/references/variables.md +++ b/.claude/skills/archon/references/variables.md @@ -26,6 +26,7 @@ All variables are available in all workflows. The only exception is `$nodeId.out - **Command files** (`.archon/commands/*.md`) — all variables except `$nodeId.output` - **Inline `prompt:` fields** — in DAG prompt nodes and loop node prompts - **`bash:` scripts in DAG nodes** — `$nodeId.output` references are automatically shell-quoted (single-quoted with `'` escaped) +- **`script:` bodies in DAG nodes** — same substitution as bash, but `$nodeId.output` values are **NOT** shell-quoted. For TypeScript/bun scripts, assign directly (`const data = $nodeId.output;`) — JSON is valid JS expression syntax. **Avoid `String.raw\`$nodeId.output\``** — it silently breaks when the output contains a backtick (common in AI-generated markdown and `output_format` payloads). ## Substitution Order diff --git a/.claude/skills/archon/references/workflow-dag.md b/.claude/skills/archon/references/workflow-dag.md index eefb380646..93d2d0b2d0 100644 --- a/.claude/skills/archon/references/workflow-dag.md +++ b/.claude/skills/archon/references/workflow-dag.md @@ -20,9 +20,91 @@ nodes: depends_on: [other-node] # Node IDs that must complete first ``` -## Four Node Types (Mutually Exclusive) +## Workflow-Level Fields -Each node must have exactly ONE of these fields: +Top-level YAML fields on a workflow object. Per-node overrides (same name under a node) win over workflow-level defaults. 
+ +### Core + +| Field | Type | Description | +|-------|------|-------------| +| `name` | string (required) | Workflow identifier (used in `archon workflow run <name>`) | +| `description` | string (required) | Human-readable summary. Used for routing; see [Workflow Description Best Practices](https://archon.diy/guides/authoring-workflows/#workflow-description-best-practices) | +| `provider` | string | AI provider (e.g. `claude`, `codex`, `pi`). Default: from `.archon/config.yaml` | +| `model` | string | Model override. Claude: `sonnet` \| `opus` \| `haiku` \| `claude-*` \| `inherit`. Codex: any non-Claude model ID | +| `interactive` | boolean | **Required for web UI** when the workflow has approval gates or `loop.interactive` nodes. Forces foreground execution so gate messages reach the user's chat. Default: `false` (background on web) | + +### Isolation + +| Field | Type | Description | +|-------|------|-------------| +| `worktree.enabled` | boolean | Pin isolation regardless of caller. `false` = always live checkout (CLI `--branch`/`--from` hard-error). `true` = always worktree (CLI `--no-worktree` hard-errors). Omit = caller decides. Use `false` for read-only workflows (triage, reporting) | + +Other worktree config (`baseBranch`, `copyFiles`, `initSubmodules`, `path`) lives in `.archon/config.yaml`, not the workflow YAML — see `references/repo-init.md`. + +### Claude SDK Advanced Options + +These fields apply to Claude nodes workflow-wide; each can be overridden per-node. Codex nodes ignore them with a warning. + +| Field | Type | Description | +|-------|------|-------------| +| `effort` | `'low'` \| `'medium'` \| `'high'` \| `'max'` | Claude Agent SDK reasoning depth. Different from Codex `modelReasoningEffort` below | +| `thinking` | string \| object | Extended thinking. String shorthand: `'adaptive'` \| `'enabled'` \| `'disabled'`. Object form: `{ type: 'enabled', budgetTokens: 8000 }` | +| `fallbackModel` | string | Model to use if the primary model fails (e.g. 
`claude-haiku-4-5-20251001`) | +| `betas` | string[] | SDK beta feature flags (non-empty array). Example: `['context-1m-2025-08-07']` for 1M-context Claude | +| `sandbox` | object | OS-level filesystem/network restrictions. Nested `network` / `filesystem` sub-objects — see [archon.diy/guides/authoring-workflows/#claude-sdk-advanced-options](https://archon.diy/guides/authoring-workflows/#claude-sdk-advanced-options) for the full schema. Layers on top of worktree isolation | + +Per-node-only (NOT valid at workflow level): `maxBudgetUsd`, `systemPrompt`. + +### Codex-Specific Options + +| Field | Type | Description | +|-------|------|-------------| +| `modelReasoningEffort` | `'minimal'` \| `'low'` \| `'medium'` \| `'high'` \| `'xhigh'` | Codex reasoning depth. Separate field from Claude's `effort` | +| `webSearchMode` | `'disabled'` \| `'cached'` \| `'live'` | Codex web search behavior. Default: `disabled` | +| `additionalDirectories` | string[] | Absolute paths Codex can read outside the codebase (shared libraries, docs repos) | + +### Complete workflow-level example + +```yaml +name: careful-migration +description: | + Plan a migration, get explicit approval, then implement under strict + sandbox and cost limits. Used by the ops team before destructive work. +provider: claude +model: sonnet +interactive: true # required — this workflow has an approval gate + +worktree: + enabled: true # always isolate; reject --no-worktree + +effort: high +thinking: adaptive +fallbackModel: claude-haiku-4-5-20251001 +betas: ['context-1m-2025-08-07'] +sandbox: + enabled: true + network: + allowedDomains: ['api.github.com'] + allowManagedDomainsOnly: true + filesystem: + denyWrite: ['/etc', '/usr'] + +nodes: + - id: plan + command: plan-migration + - id: review + approval: + message: "Review the migration plan above." 
+ depends_on: [plan] + - id: implement + command: implement-migration + depends_on: [review] +``` + +## Node Types (Mutually Exclusive) + +Each node must have exactly ONE of these fields: `command`, `prompt`, `bash`, `script`, `loop`, `approval`, or `cancel`. ### Command Node Runs a command file from `.archon/commands/`: @@ -54,6 +136,55 @@ Runs a shell script without AI: - **stderr** forwarded as warning, does not fail the node - No AI invoked — AI-specific fields are ignored - Use `timeout:` (milliseconds) for execution time limit +- `$nodeId.output` substitutions are **auto shell-quoted** (safe to embed) + +### Script Node +Runs TypeScript/JavaScript (via `bun`) or Python (via `uv`) without AI. Same stdout/stderr contract as bash nodes. + +**Inline script (TypeScript):** +```yaml +- id: parse + script: | + const raw = process.argv.slice(2).join(' ') || '{}'; + const data = JSON.parse(raw); + console.log(JSON.stringify({ items: data.items?.length ?? 0 })); + runtime: bun # REQUIRED: 'bun' or 'uv' + timeout: 30000 # ms, default: 120000 +``` + +**Inline script (Python) with uv dependencies:** +```yaml +- id: fetch + script: | + import httpx, json + r = httpx.get("https://api.github.com/repos/anthropics/anthropic-cookbook") + print(json.dumps({ "stars": r.json()["stargazers_count"] })) + runtime: uv + deps: ["httpx>=0.27"] # Optional — 'uv run --with <dep>'. Ignored for bun. +``` + +**Named script from `.archon/scripts/`:** +```yaml +- id: analyze + script: analyze-metrics # Resolves .archon/scripts/analyze-metrics.py + runtime: uv # Must match file extension (.ts/.js → bun, .py → uv) + deps: ["pandas>=2.0"] +``` + +- **Inline vs named**: a `script` value is treated as inline code if it contains a newline or any shell metacharacter (space, or any of: `;` `(` `)` `{` `}` `&` `|` `<` `>` `$` `` ` `` `"` `'`). Otherwise it's a named-script lookup (bare identifier). +- **Named script resolution**: `<repo>/.archon/scripts/` (wins) → `~/.archon/scripts/`. 
1-level subfolder grouping allowed. Extension determines runtime (`.ts`/`.js` → `bun`, `.py` → `uv`) and MUST match the declared `runtime:` +- **Dispatch**: + - `bun` + inline → `bun --no-env-file -e '<code>'` + - `bun` + named → `bun --no-env-file run <path>` + - `uv` + inline → `uv run [--with dep ...] python -c '<code>'` + - `uv` + named → `uv run [--with dep ...] <path>` +- **`deps`** is uv-only. Bun auto-installs on import; `deps` with `runtime: bun` emits a validator warning +- **stdout** captured as `$nodeId.output` (trailing newline trimmed) +- **stderr** forwarded as warning, does NOT fail the node. Non-zero exit DOES fail it. +- **`bun --no-env-file`** prevents target repo `.env` from leaking into the subprocess +- `$nodeId.output` substitutions are **NOT shell-quoted** in script bodies — assign directly (`const data = $nodeId.output;`) or parse with `JSON.parse` / `json.loads`; don't interpolate into shell syntax +- **CAUTION — `String.raw\`$nodeId.output\`` is fragile**: if the substituted value contains a backtick (common in AI-generated markdown, `output_format` payloads, or any content with code spans), the template literal terminates early and produces a cryptic `Expected ";"` parse error. Use direct assignment instead — JSON is valid JS expression syntax and needs no wrapper. +- AI-specific fields (`model`, `provider`, `hooks`, `mcp`, `skills`, `output_format`, `allowed_tools`, `denied_tools`, `agents`, `effort`, `thinking`, `maxBudgetUsd`, `systemPrompt`, `fallbackModel`, `betas`, `sandbox`) emit a loader warning and are ignored ### Loop Node Iterates an AI prompt until a completion signal or max iterations: @@ -83,7 +214,7 @@ All node types share these fields: | `depends_on` | string[] | `[]` | Node IDs that must settle before this node runs | | `when` | string | — | Condition expression. Node **skipped** when false | | `trigger_rule` | string | `all_success` | Join semantics for multiple dependencies | -| `idle_timeout` | number (ms) | 300000 | Per-node idle timeout. 
On loop nodes, applies per-iteration | +| `idle_timeout` | number (ms) | 300000 | Idle timeout for AI streaming (`command`, `prompt`) and per-iteration idle for `loop`. Accepted but ignored on `bash` and `script` — use `timeout` there | **Command, prompt, and bash nodes** (silently ignored on loop nodes, except `retry` which is a hard error): @@ -129,14 +260,53 @@ nodes: ## Conditions (`when:`) +Gate whether a node runs based on upstream output. A condition that evaluates to `false` skips the node (fail-closed — skipped nodes propagate their skipped state to dependants). + +### Operators + +**String comparison** (literal string equality): ```yaml -- id: investigate - command: investigate-bug - depends_on: [classify] - when: "$classify.output.issue_type == 'bug'" +when: "$nodeId.output == 'VALUE'" +when: "$nodeId.output != 'VALUE'" +when: "$nodeId.output.field == 'VALUE'" # JSON dot notation (requires output_format) ``` -**Syntax**: `$nodeId.output OPERATOR 'value'` — operators: `==`, `!=` only. Values single-quoted. Invalid expressions skip the node (fail-closed). +**Numeric comparison** (both sides auto-parsed as numbers; fail-closed if either side is not finite): +```yaml +when: "$score.output > '80'" +when: "$score.output >= '0.9'" +when: "$score.output < '100'" +when: "$score.output <= '5'" +when: "$score.output.confidence >= '0.9'" +``` + +All six operators — `==`, `!=`, `<`, `>`, `<=`, `>=` — are supported. Values are single-quoted strings (even for numeric comparisons). + +### Compound Expressions + +Combine conditions with `&&` (AND) and `||` (OR). **`&&` binds tighter than `||`.** No parentheses supported — structure expressions with that precedence in mind. 
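As a mental model, the documented evaluation rules (fail-closed atoms, `&&` binding tighter than `||`, short-circuiting, JSON dot access) behave like this sketch; it is illustrative only, not Archon's parser:

```python
# Illustrative model of `when:` evaluation, not Archon's implementation.
import json

def _resolve(ref: str, outputs: dict) -> str:
    # "$node.output" or "$node.output.field" (JSON dot access)
    parts = ref.lstrip("$").split(".")
    val = outputs.get(parts[0], "")          # skipped/unknown node: empty string
    if len(parts) == 3:                      # dot notation: parse output as JSON
        try:
            val = str(json.loads(val).get(parts[2], ""))
        except (json.JSONDecodeError, AttributeError):
            val = ""                         # non-JSON output: empty, fail-closed
    return val

def _atom(cond: str, outputs: dict) -> bool:
    for op in ("==", "!=", "<=", ">=", "<", ">"):
        if op in cond:
            left, right = cond.split(op, 1)
            lhs = _resolve(left.strip(), outputs)
            rhs = right.strip().strip("'")   # values are single-quoted strings
            if op == "==":
                return lhs == rhs
            if op == "!=":
                return lhs != rhs
            try:                             # numeric operators parse both sides
                a, b = float(lhs), float(rhs)
            except ValueError:
                return False                 # non-numeric side: fail-closed
            return {"<": a < b, ">": a > b, "<=": a <= b, ">=": a >= b}[op]
    return False                             # unparseable expression: fail-closed

def eval_when(expr: str, outputs: dict) -> bool:
    # && binds tighter than ||: the expression is an OR of AND-groups.
    # any()/all() short-circuit exactly as documented.
    return any(all(_atom(c, outputs) for c in clause.split("&&"))
               for clause in expr.split("||"))
```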
+ +```yaml +when: "$a.output == 'X' && $b.output != 'Y'" +when: "$a.output == 'X' || $b.output == 'Y'" +when: "$score.output > '80' && $flag.output == 'true'" + +# Precedence: (A && B) || C +when: "$a.output == 'X' && $b.output == 'Y' || $c.output == 'Z'" +``` + +Short-circuit evaluation: `&&` stops at the first false, `||` stops at the first true. + +### Dot Notation (JSON Field Access) + +`$nodeId.output.field` parses the upstream output as JSON and extracts the named field. Returns empty string if parsing fails or the field is absent — which then fails-closed against any literal value. Requires the upstream node to have `output_format` set (for AI nodes) or to print valid JSON (for bash/script nodes). + +### Fail-Closed Rules + +- Invalid or unparseable expression → node skipped, warning logged +- Numeric operator with a non-numeric side → node skipped +- `$nodeId.output.field` on non-JSON output → field is empty → comparison fails +- Referenced node did not run (skipped upstream) → substitution is empty → comparison fails ## Node Output Substitution @@ -211,15 +381,53 @@ Loop nodes iterate an AI prompt until a completion condition is met. Use them fo max_iterations: 10 # Required. Integer >= 1. Fails if exceeded fresh_context: true # Optional. Default: false until_bash: "..." # Optional. Exit 0 = complete + interactive: true # Optional. Pauses between iterations for user input + gate_message: "..." # Required when interactive: true ``` | Field | Type | Required | Description | |-------|------|----------|-------------| -| `prompt` | string | Yes | Prompt template. Supports all variable substitution (`$ARGUMENTS`, `$nodeId.output`, etc.) | +| `prompt` | string | Yes | Prompt template. Supports all variable substitution (`$ARGUMENTS`, `$nodeId.output`, `$LOOP_USER_INPUT`, etc.) | | `until` | string | Yes | Completion signal to detect in AI output | | `max_iterations` | number | Yes | Hard limit. 
Node **fails** if exceeded | +| `fresh_context` | boolean | No | Default `false`. `true` = fresh AI session each iteration | -| `until_bash` | string | No | Shell script run after each iteration. Exit 0 = complete | +| `until_bash` | string | No | Shell script run after each iteration. Exit 0 = complete. Variable substitution applies; `$nodeId.output` IS shell-quoted here | +| `interactive` | boolean | No | Default `false`. `true` = pause after each non-completing iteration for user feedback via `/workflow approve <run-id>` | +| `gate_message` | string | **Required when `interactive: true`** | Message shown to the user at each pause. Validated at parse time — a loop with `interactive: true` and no `gate_message` fails to load | + +### Interactive Loops + +Interactive loops pause between iterations so a human can provide feedback that feeds the next iteration. Use them for guided writing/refinement (e.g. PRD co-authoring, iterative design). + +```yaml +name: guided-refine +description: Refine an output with human feedback between iterations +interactive: true # REQUIRED at the workflow level for web UI + +nodes: + - id: refine + loop: + prompt: | + Review the current draft and improve it based on this feedback: + $LOOP_USER_INPUT + + When the output is satisfactory, output: DONE + until: DONE + max_iterations: 5 + interactive: true # node level — enables the pause + gate_message: | + Review the output above. Reply with feedback, or type DONE to finish. +``` + +The flow: +1. Iteration N runs. AI produces output. +2. If AI signalled completion (`DONE`) or `until_bash` exited 0, loop ends. +3. Otherwise: `gate_message` is sent to the user, workflow pauses (status = `paused`). +4. User runs `archon workflow approve "<feedback>"` (or replies naturally in chat platforms). +5. Iteration N+1 runs with `$LOOP_USER_INPUT` substituted to the user's feedback — but **only on that first resumed iteration**. Subsequent iterations in the same resumed session see `$LOOP_USER_INPUT` as empty string. 
+6. Repeat. + +**Workflow-level `interactive: true` is required** for the gate message to reach the user on the web UI (otherwise the workflow dispatches to a background worker that can't deliver chat messages). The loader emits a warning if a node has `interactive: true` without workflow-level `interactive: true`. ### Completion Detection @@ -279,6 +487,148 @@ First iteration is always fresh regardless. --- +## Approval Nodes + +Approval nodes **pause the workflow** until a human approves or rejects the gate. Use them to insert review steps between AI-driven nodes — for example, reviewing a generated plan before committing to expensive implementation work. + +### Configuration + +```yaml +- id: review-gate + approval: + message: "Review the plan above before proceeding with implementation." + capture_response: false # Optional. true = user's comment stored as $review-gate.output + on_reject: # Optional. AI rework on rejection instead of cancel + prompt: "Revise based on feedback: $REJECTION_REASON" + max_attempts: 3 # Range 1–10, default 3. After max, workflow is cancelled. + depends_on: [plan] +``` + +### Fields + +| Field | Required | Description | +|-------|----------|-------------| +| `approval.message` | **Yes** | The message shown to the user when the workflow pauses | +| `approval.capture_response` | No | `true` = user's approval comment stored as `$<nodeId>.output` for downstream nodes. Default: `false` (downstream `$<nodeId>.output` is empty string) | +| `approval.on_reject.prompt` | No | Prompt run via AI when the user rejects. `$REJECTION_REASON` is substituted with the reject reason. After running, the workflow re-pauses at the same gate | +| `approval.on_reject.max_attempts` | No | Max times the on_reject prompt runs before the workflow is cancelled. Range: 1–10. 
Default: 3 | + +### Web UI Requirement + +Approval gates delivered on the Web UI require `interactive: true` at the **workflow level** — otherwise the workflow dispatches to a background worker and the gate message never reaches the user's chat window. + +```yaml +name: plan-approve-implement +interactive: true # REQUIRED for approval gates on web UI +nodes: + - id: plan + command: plan-feature + - id: review-gate + approval: + message: "Approve the plan to proceed." + depends_on: [plan] + - id: implement + command: implement + depends_on: [review-gate] +``` + +### Approve and Reject Commands + +```bash +# From the CLI +archon workflow approve <run-id> +archon workflow approve <run-id> --comment "looks good" +archon workflow reject <run-id> +archon workflow reject <run-id> --reason "plan needs more test coverage" + +# Cross-platform (Slack / Telegram / Web / GitHub chat) +/workflow approve <run-id> +/workflow reject <run-id> + +# Natural language (all platforms except CLI — auto-detects paused workflow) +User: "Looks good, proceed" +# → auto-approves. With capture_response: true, the message becomes $review-gate.output +``` + +### What Does NOT Work on Approval Nodes + +AI-specific fields (`model`, `provider`, `hooks`, `mcp`, `skills`, `output_format`, `allowed_tools`, `denied_tools`, `context`, `effort`, `thinking`, etc.) are accepted by the parser but emit a loader warning and are ignored — no AI runs during the pause. (Note: `on_reject.prompt` DOES run AI, using the workflow's default provider/model.) + +`retry`, `when`, `trigger_rule`, `depends_on`, `idle_timeout` all work. + +--- + +## Cancel Nodes + +Cancel nodes **terminate the workflow run** with a reason string. Useful for guarded exits — a `cancel:` node with a `when:` condition stops the workflow cleanly when preconditions aren't met. + +### Configuration + +```yaml +- id: gate-branch + cancel: "Refusing to run on main — this workflow modifies files." 
+  when: "$check-branch.output == 'main'"
+  depends_on: [check-branch]
+```
+
+When a cancel node runs, Archon:
+- Marks the workflow run as `cancelled` (not `failed`)
+- Stops in-flight parallel nodes via the existing cancellation plumbing
+- Records the reason string in the run's metadata
+- Emits a `node_completed` event for the cancel node itself
+
+### Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `cancel` | **Yes** | Non-empty reason string shown to the user and recorded in metadata |
+
+Standard DAG fields (`id`, `depends_on`, `when`, `trigger_rule`, `idle_timeout`) all work. AI-specific fields emit a loader warning and are ignored — cancel nodes don't invoke AI.
+
+### When to use `cancel` vs failing a `bash:` check
+
+- **Use `cancel:`** when the precondition failure is **expected** (e.g., wrong branch, required file missing, feature flag disabled). The run shows as `cancelled`, which doesn't trigger the DAG auto-resume path.
+- **Use a `bash:` node that exits non-zero** when the check itself fails (e.g., network error, tool missing). The run shows as `failed`, which auto-resumes on the next invocation.
+
+### Typical Patterns
+
+**Gate on upstream classification:**
+```yaml
+- id: classify
+  prompt: "Is the input safe to proceed? Output 'SAFE' or 'UNSAFE'."
+  allowed_tools: []
+
+- id: stop-if-unsafe
+  cancel: "Refusing to proceed: input flagged UNSAFE by classifier."
+  depends_on: [classify]
+  when: "$classify.output != 'SAFE'"
+
+- id: do-work
+  command: the-work
+  depends_on: [classify]
+  when: "$classify.output == 'SAFE'"
+```
+
+**Stop before expensive step unless precondition met:**
+```yaml
+- id: check-budget
+  bash: |
+    spent=$(gh api rate_limit --jq '.rate.used // 0')
+    echo "$spent"
+
+- id: abort-if-over
+  cancel: "Aborting — GH API quota exhausted."
+ depends_on: [check-budget] + when: "$check-budget.output > '4500'" + +- id: run-api-heavy-work + command: heavy-work + depends_on: [check-budget] + when: "$check-budget.output <= '4500'" +``` + +--- + ## Validate Before Finishing Before declaring a workflow complete, validate it: @@ -302,8 +652,13 @@ Use `--json` for machine-readable output. Use `archon validate commands ` - All `depends_on` reference existing IDs - No cycles - `$nodeId.output` refs in `when:`, `prompt:`, `loop.prompt:` must point to known IDs -- Exactly one of `command`, `prompt`, `bash`, `loop` per node +- Exactly one of `command`, `prompt`, `bash`, `script`, `loop`, `approval`, `cancel` per node +- Script nodes require `runtime: bun` or `runtime: uv` +- Named scripts must exist in `.archon/scripts/` or `~/.archon/scripts/` with extension matching declared runtime - `retry` on loop node = hard error +- `approval.message` required and non-empty +- `cancel` reason required and non-empty +- Approval `on_reject.max_attempts` must be 1–10 if set - `steps:` format rejected (deprecated — use `nodes:` only) ## Complete Example diff --git a/.claude/skills/release/SKILL.md b/.claude/skills/release/SKILL.md index 4f90f70978..1844336f2f 100644 --- a/.claude/skills/release/SKILL.md +++ b/.claude/skills/release/SKILL.md @@ -64,9 +64,15 @@ if [ -f scripts/build-binaries.sh ] && [ -f packages/cli/src/cli.ts ]; then packages/cli/src/cli.ts # Smoke test: the binary must start and exit 0 on a safe, non-interactive command. - # `version` or `--help` are both acceptable — pick one that does NOT touch the - # network, database, or require env vars. - if ! "$TMP_BINARY" version > /tmp/archon-preflight.log 2>&1; then + # Use `--help` (NOT `version`). The `version` command's compiled-binary code + # path depends on BUNDLED_IS_BINARY=true, which is set by scripts/build-binaries.sh + # — but we're doing a bare `bun build --compile` here to keep the smoke fast, + # so BUNDLED_IS_BINARY is still `false`. 
That sends `version` down the dev + # branch of version.ts which tries to read package.json from a path that only + # exists in node_modules, producing a false-positive ENOENT. `--help` has no + # such dev/binary branch and exercises the same module-init graph we're + # actually testing. Must NOT touch network, database, or require env vars. + if ! "$TMP_BINARY" --help > /tmp/archon-preflight.log 2>&1; then echo "ERROR: compiled binary crashed at startup" cat /tmp/archon-preflight.log echo "" diff --git a/.claude/skills/test-release/SKILL.md b/.claude/skills/test-release/SKILL.md index 31029014ea..c93d0c5bee 100644 --- a/.claude/skills/test-release/SKILL.md +++ b/.claude/skills/test-release/SKILL.md @@ -79,6 +79,8 @@ About to test: Path: brew (Homebrew tap on macOS) Version: 0.3.1 (expected) Cleanup: will uninstall after tests (brew uninstall + untap) + If `archon-stable` symlink is detected in Phase 2, it will be + restored at the end of Phase 5 by reinstalling the tap formula. Proceed? (y/N) ``` @@ -112,6 +114,18 @@ gh release view v --repo coleam00/Archon --json tagName,assets --jq '{t If the release does not exist or has no assets, abort with a clear message. Do not proceed to install a non-existent release. +4. **Detect persistent `archon-stable` install (brew path only).** If the user has renamed a prior brew install to `archon-stable` (the dual-homebrew pattern — see `~/.config/fish/functions/brew-upgrade-archon.fish`), Phase 5's `brew uninstall` will wipe it. Capture the state so Phase 5b can restore it: + +```bash +ARCHON_STABLE_WAS_INSTALLED="" +if [ -L /opt/homebrew/bin/archon-stable ] || [ -L /usr/local/bin/archon-stable ]; then + ARCHON_STABLE_WAS_INSTALLED="yes" + echo "Detected persistent archon-stable — will restore after Phase 5 uninstall." +fi +``` + +Export `ARCHON_STABLE_WAS_INSTALLED` into the environment used by Phase 5b. Only applies to the `brew` path — `curl-mac` and `curl-vps` don't go through brew and don't disturb `archon-stable`. 
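+If the test phases run as separate shell invocations (common when an operator steps through them one at a time), a plain `export` won't survive between Phase 2 and Phase 5b. A sketch of a more durable hand-off — the state-file path is an assumption, any stable location works:
+
+```shell
+# Phase 2: persist the flag instead of relying on the process environment.
+ARCHON_STABLE_WAS_INSTALLED="yes"
+echo "$ARCHON_STABLE_WAS_INSTALLED" > /tmp/archon-test-stable-flag
+
+# Phase 5b: read it back (empty string if the file was never written).
+ARCHON_STABLE_WAS_INSTALLED="$(cat /tmp/archon-test-stable-flag 2>/dev/null || true)"
+echo "restore needed: ${ARCHON_STABLE_WAS_INSTALLED:-no}"
+```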
+ ## Phase 3 — Install ### Path: brew @@ -352,6 +366,25 @@ archon version | head -1 # should match the dev version captured in Phase 2 ``` +**Restore `archon-stable` if it existed before the test** (dual-homebrew pattern — see Phase 2 item 4): + +```bash +if [ -n "$ARCHON_STABLE_WAS_INSTALLED" ]; then + echo "Restoring archon-stable (detected before test)..." + brew tap coleam00/archon + brew install coleam00/archon/archon + BREW_BIN="$(brew --prefix)/bin" + if [ -e "$BREW_BIN/archon" ]; then + mv "$BREW_BIN/archon" "$BREW_BIN/archon-stable" + echo "archon-stable restored: $(archon-stable version 2>/dev/null | head -1)" + else + echo "WARNING: brew install succeeded but $BREW_BIN/archon missing — check formula" + fi +fi +``` + +> **Note on the restored version**: this reinstalls from whatever the tap currently ships, which is typically the release you just tested (so `archon-stable` ends up at the newly-tested version). That's usually what the operator wants — you just verified the new release works, and you want `archon-stable` pointed at it. If you were testing an older version for back-version QA, the restored `archon-stable` will be the *current* tap formula, not the pre-test version. For that rare case, the operator should re-run `brew-upgrade-archon` manually after the test. + ### Path: curl-mac ```bash diff --git a/.gitignore b/.gitignore index 4b225843ea..1f8415a4f8 100644 --- a/.gitignore +++ b/.gitignore @@ -48,6 +48,11 @@ e2e-screenshots/ # Cross-run workflow state (e.g. 
issue-triage memory) .archon/state/ +# Maintainer standup — per-maintainer state and briefs (direction.md is committed) +.archon/maintainer-standup/profile.md +.archon/maintainer-standup/state.json +.archon/maintainer-standup/briefs/ + # Agent artifacts (generated, local only) .agents/ .agents/rca-reports/ diff --git a/CHANGELOG.md b/CHANGELOG.md index 63d98f8264..9dabeac1d0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added + +- **`$LOOP_PREV_OUTPUT` workflow variable (loop nodes only)** — exposes the previous iteration's cleaned output (after `` tag stripping) to the current iteration's prompt. Empty on the first iteration and on the first iteration after resuming from an interactive approval gate. Enables `fresh_context: true` loops to reference what the prior pass said or did without carrying full session history. (#1367) + ## [0.3.9] - 2026-04-22 First release with working compiled binaries since v0.3.6. Both v0.3.7 and v0.3.8 were tagged but neither shipped release assets — v0.3.7 was blocked by two genuine binary-runtime bugs (Pi SDK's module-init crash + Bun `--bytecode` producing broken output), and v0.3.8 was blocked by an unrelated CI smoke-test regression where `release.yml`'s Claude resolver test required an `origin` remote that the fresh `git init` test repo didn't have. Both superseded tags remain for history; their GitHub Releases were deleted at the time of tagging so `releases/latest` fell back to v0.3.6 throughout, keeping `install.sh` and Homebrew safe. v0.3.9 is what users actually install. diff --git a/CLAUDE.md b/CLAUDE.md index f2afd41e9c..28d337c44e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -32,6 +32,8 @@ **Git Workflow and Releases** - `main` is the release branch. Never commit directly to `main`. - `dev` is the working branch. All feature work branches off `dev` and merges back into `dev`. 
+- All PRs must use the template at `.github/PULL_REQUEST_TEMPLATE.md` — fill in every section. When opening a PR via `gh pr create`, copy the template into the body explicitly; GitHub only auto-applies it through the web UI. +- Link the issue with `Closes #` (or `Fixes` / `Resolves`) in the PR description so it auto-closes on merge. - To release, use the `/release` skill. It compares `dev` to `main`, generates changelog entries, bumps the version, and creates a PR to merge `dev` into `main`. - Releases follow Semantic Versioning: `/release` (patch), `/release minor`, `/release major`. - Changelog lives in `CHANGELOG.md` and follows Keep a Changelog format. @@ -689,6 +691,7 @@ async function createSession(conversationId: string, codebaseId: string) { - `$DOCS_DIR` - Documentation directory path; configured via `docs.path` in `.archon/config.yaml`. Defaults to `docs/`. Never throws. - `$LOOP_USER_INPUT` - User feedback provided via `/workflow approve ` at an interactive loop gate. Only populated on the first iteration of a resumed interactive loop; empty string on all other iterations. - `$REJECTION_REASON` - Reviewer feedback provided via `/workflow reject ` at an approval gate. Only populated in `on_reject` prompts; empty string elsewhere. +- `$LOOP_PREV_OUTPUT` - Cleaned output of the previous loop iteration (loop nodes only). Empty string on the first iteration (no prior output exists). Useful for `fresh_context: true` loops that need to reference what the previous pass produced or why it failed without carrying full session history. **Command Types:** diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c0120a16bd..314ab1e5f7 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -44,7 +44,8 @@ bun run validate 1. Create a feature branch from `dev` 2. Make your changes 3. Ensure all checks pass -4. Submit a PR with a clear description +4. Submit a PR using the template at [`.github/PULL_REQUEST_TEMPLATE.md`](./.github/PULL_REQUEST_TEMPLATE.md). 
GitHub fills it in automatically when you open a PR through the web UI. If you use `gh pr create`, copy the template into the body — leaving it empty or partially filled slows review. +5. Link the issue your PR addresses with `Closes #` (or `Fixes #` / `Resolves #`) in the description so it auto-closes on merge. ## Code Style diff --git a/eslint.config.mjs b/eslint.config.mjs index 152c4245dd..6e926f7bc0 100644 --- a/eslint.config.mjs +++ b/eslint.config.mjs @@ -17,6 +17,7 @@ export default tseslint.config( 'worktrees/**', '.claude/worktrees/**', '.claude/skills/**', + '.archon/**', // User workflow/script/command content — not in any tsconfig project '**/*.generated.ts', // Auto-generated source files (content inlined via JSON.stringify) '**/*.js', '*.mjs', diff --git a/homebrew/archon.rb b/homebrew/archon.rb index 0bac58a339..d8f4c45c18 100644 --- a/homebrew/archon.rb +++ b/homebrew/archon.rb @@ -7,28 +7,28 @@ class Archon < Formula desc "Remote agentic coding platform - control AI assistants from anywhere" homepage "https://github.com/coleam00/Archon" - version "0.3.6" + version "0.3.9" license "MIT" on_macos do on_arm do url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-darwin-arm64" - sha256 "96b6dac50b046eece9eddbb988a0c39b4f9a0e2faac66e49b977ba6360069e86" + sha256 "b617f85a2181938b793b25ad816a9f6b3149d184f64b2e9e2ea2430f27778d64" end on_intel do url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-darwin-x64" - sha256 "09f1dbe12417b4300b7b07b531eb7391a286305f8d4eafc11e7f61f5d26eb8eb" + sha256 "5a928af5e0e67ffe084159161a9ea3994a9304cc39bd06132719cd89cc715e86" end end on_linux do on_arm do url "https://github.com/coleam00/Archon/releases/download/v#{version}/archon-linux-arm64" - sha256 "80b06a6ff699ec57cd4a3e49cfe7b899a3e8212688d70285f5a887bf10086731" + sha256 "567bfca9175e10d9b4fd748e3862bbd34141a234766a7ecf0a714d9c27b8c92e" end on_intel do url 
"https://github.com/coleam00/Archon/releases/download/v#{version}/archon-linux-x64" - sha256 "09f5dac6db8037ed6f3e5b7e9c5eb8e37f19822a4ed2bf4cd7e654780f9d00de" + sha256 "c918218df2f0f853d107e6b1727dcd9accc034b183ffbccea93a331d8d376ed8" end end diff --git a/packages/docs-web/src/content/docs/adapters/community/discord.md b/packages/docs-web/src/content/docs/adapters/community/discord.md index 0f3e59082c..b719d719ce 100644 --- a/packages/docs-web/src/content/docs/adapters/community/discord.md +++ b/packages/docs-web/src/content/docs/adapters/community/discord.md @@ -40,6 +40,14 @@ Connect Archon to Discord so you can interact with your AI coding assistant from 2. Enable **"Message Content Intent"** (required for the bot to read messages) 3. Save changes +:::caution +Skipping this step causes Discord to reject the bot's connection with +`Used disallowed intents`. Archon will log +`discord.start_failed_continuing_without_adapter` and keep the rest of +the server running, but the Discord adapter will be unavailable until +the intent is enabled and the server is restarted. +::: + ## Invite Bot to Your Server 1. Go to "OAuth2" > "URL Generator" in the left sidebar diff --git a/packages/docs-web/src/content/docs/adapters/web.md b/packages/docs-web/src/content/docs/adapters/web.md index 0025ca0219..bb5e43ba91 100644 --- a/packages/docs-web/src/content/docs/adapters/web.md +++ b/packages/docs-web/src/content/docs/adapters/web.md @@ -166,7 +166,7 @@ Click on a workflow run (from the dashboard or progress card) to open the execut The Workflow Builder at `/workflows/builder` provides a visual editor for creating and modifying workflow YAML files. Features include: - **DAG canvas** -- Drag-and-drop nodes to build your workflow graph visually -- **Node palette** -- Add command, prompt, bash, and loop nodes from a sidebar library +- **Node palette** -- Drag command, prompt, and bash nodes from a sidebar library. 
Additional node types (`script`, `loop`, `approval`, `cancel`) are editable via the Code / Split view - **Node inspector** -- Click a node to configure its properties (command, prompt text, dependencies, model overrides, hooks, MCP servers, etc.) in a tabbed panel - **View modes** -- Toggle between Visual, Split, and Code views. Split mode shows the canvas and YAML side by side. - **Command picker** -- Browse available commands when configuring command nodes diff --git a/packages/docs-web/src/content/docs/book/dag-workflows.md b/packages/docs-web/src/content/docs/book/dag-workflows.md index 2a66702584..558df2590f 100644 --- a/packages/docs-web/src/content/docs/book/dag-workflows.md +++ b/packages/docs-web/src/content/docs/book/dag-workflows.md @@ -230,20 +230,23 @@ The classify-and-route example uses `none_failed_min_one_success` on `implement` ## Node Types -Archon supports four node types: +Archon supports seven node types. Exactly one mode field is required per node: | Type | Syntax | When to use | |------|--------|-------------| | **Command** | `command: my-command` | Load a command from `.archon/commands/my-command.md`. The standard choice. | | **Prompt** | `prompt: "inline instructions..."` | Quick, one-off instructions that don't need a reusable command file. | | **Bash** | `bash: "shell command"` | Run a shell script without AI. Stdout is captured as `$nodeId.output`. Deterministic operations only. | +| **Script** | `script: "..." ` + `runtime: bun \| uv` | Run TypeScript/JavaScript (bun) or Python (uv) without AI. Inline code or named reference to `.archon/scripts/`. Stdout captured as `$nodeId.output`. See [Script Nodes](/guides/script-nodes/). | | **Loop** | `loop: { prompt: "...", until: SIGNAL }` | Repeat an AI prompt until a completion signal appears in the output. See [Loop Nodes](/guides/loop-nodes/). | +| **Approval** | `approval: { message: "..." }` | Pause the workflow for a human approve/reject decision. 
See [Approval Nodes](/guides/approval-nodes/). | +| **Cancel** | `cancel: "reason string"` | Terminate the workflow run (status: cancelled, not failed). Usually gated with `when:`. | **Command** is the most common. Use it for anything you'll reuse across workflows. **Prompt** is convenient for glue nodes — summarizing outputs, formatting data — where the logic is simple and workflow-specific. -**Bash** is powerful for deterministic operations: running tests, checking git status, reading a file, fetching an API. The AI doesn't run the bash command; your shell does. The output becomes a variable for downstream nodes: +**Bash** is powerful for deterministic shell operations: running tests, checking git status, reading a file, fetching an API. The AI doesn't run the bash command; your shell does. The output becomes a variable for downstream nodes: ```yaml - id: check-tests @@ -255,6 +258,22 @@ Archon supports four node types: prompt: "Test output: $check-tests.output\n\nFix any failures." ``` +**Script** is for deterministic work that needs a real programming language — parsing JSON, transforming data between AI nodes, calling typed HTTP clients. Use `runtime: bun` for TypeScript/JavaScript and `runtime: uv` for Python: + +```yaml +- id: transform + script: | + const raw = process.env.UPSTREAM ?? '{}'; + const items = JSON.parse(raw).items ?? []; + console.log(JSON.stringify({ count: items.length })); + runtime: bun + +- id: analyze + script: analyze-metrics # Named script: .archon/scripts/analyze-metrics.py + runtime: uv + deps: ["pandas>=2.0"] # uv-only; bun auto-installs imports +``` + **Loop** is for iterative tasks where you don't know how many steps it will take. The AI runs until it emits a completion signal: ```yaml @@ -269,6 +288,32 @@ Archon supports four node types: fresh_context: true ``` +**Approval** pauses the workflow for human review. 
The downstream nodes don't run until the user approves in chat, CLI, or web UI: + +```yaml +interactive: true # required at workflow level for web UI delivery + +nodes: + - id: plan + command: plan-feature + - id: review-gate + approval: + message: "Review the plan above." + depends_on: [plan] + - id: implement + command: implement + depends_on: [review-gate] +``` + +**Cancel** terminates the workflow with a reason string. Pair with `when:` for guarded exits — the run shows as `cancelled` rather than `failed`: + +```yaml +- id: gate-branch + cancel: "Refusing to run on main — this workflow modifies files." + when: "$check-branch.output == 'main'" + depends_on: [check-branch] +``` + --- ## Best Practices diff --git a/packages/docs-web/src/content/docs/book/quick-reference.md b/packages/docs-web/src/content/docs/book/quick-reference.md index ae37659f7a..a0c34643c3 100644 --- a/packages/docs-web/src/content/docs/book/quick-reference.md +++ b/packages/docs-web/src/content/docs/book/quick-reference.md @@ -124,7 +124,10 @@ All nodes share these base fields: | `command` | One of | string | Name of a command file in `.archon/commands/` | | `prompt` | One of | string | Inline AI instructions | | `bash` | One of | string | Shell script (runs without AI; stdout captured as `$nodeId.output`) | +| `script` | One of | string | TypeScript/JavaScript (bun) or Python (uv) — inline or named ref to `.archon/scripts/`. Requires `runtime`. See [Script Nodes](/guides/script-nodes/) | | `loop` | One of | object | Loop configuration (see Loop Options below) | +| `approval` | One of | object | Pause for human review; see [Approval Nodes](/guides/approval-nodes/) | +| `cancel` | One of | string | Reason string; terminates the run with `cancelled` status (not `failed`). 
Usually gated with `when:` | | `depends_on` | No | string[] | Node IDs that must complete before this node runs | | `when` | No | string | Condition expression; node is skipped if false | | `trigger_rule` | No | string | Join semantics when multiple upstreams exist (see Trigger Rules) | @@ -135,12 +138,30 @@ All nodes share these base fields: | `allowed_tools` | No | string[] | Restrict available tools to this list (Claude only) | | `denied_tools` | No | string[] | Remove specific tools from this node's context (Claude only) | | `idle_timeout` | No | number | Per-node idle timeout in milliseconds (default: 5 minutes) | -| `retry` | No | object | Retry configuration for transient failures (see Retry Options) | +| `retry` | No | object | Retry configuration for transient failures (see Retry Options). **Hard error on loop nodes** | | `hooks` | No | object | SDK hook callbacks (Claude only; see Hook Schema) | | `mcp` | No | string | Path to MCP server config JSON file (Claude only) | | `skills` | No | string[] | Skill names to preload into this node's context (Claude only) | +| `agents` | No | object | Inline sub-agent definitions keyed by kebab-case ID. Claude only | -> **bash node timeout**: The `timeout` field on bash nodes is in **milliseconds** (default: 120000). This differs from hook `timeout`, which is in seconds. +**Script-specific fields** (required when `script:` is set): + +| Field | Required | Type | Description | +|-------|----------|------|-------------| +| `runtime` | Yes | `'bun'` \| `'uv'` | Which runtime executes the script. Must match file extension for named scripts (`.ts`/`.js` → bun, `.py` → uv) | +| `deps` | No | string[] | Python dependencies for `uv run --with`. Ignored for bun (bun auto-installs) | +| `timeout` | No | number | Hard kill in ms. Default: 120000 (2 min). 
Same semantics as `bash` timeout | + +**Approval-specific fields** (required when `approval:` is set): + +| Field | Required | Type | Description | +|-------|----------|------|-------------| +| `approval.message` | Yes | string | The message shown to the user when the workflow pauses | +| `approval.capture_response` | No | boolean | `true` = user's comment becomes `$.output`. Default: `false` | +| `approval.on_reject.prompt` | No | string | AI rework prompt when the user rejects. `$REJECTION_REASON` substituted | +| `approval.on_reject.max_attempts` | No | number | Max rework iterations before cancel. Range 1-10, default 3 | + +> **bash and script node timeout**: The `timeout` field is in **milliseconds** (default: 120000). This differs from hook `timeout`, which is in seconds. ### Trigger Rules diff --git a/packages/docs-web/src/content/docs/getting-started/ai-assistants.md b/packages/docs-web/src/content/docs/getting-started/ai-assistants.md index ff4f8e6533..7a65b97adf 100644 --- a/packages/docs-web/src/content/docs/getting-started/ai-assistants.md +++ b/packages/docs-web/src/content/docs/getting-started/ai-assistants.md @@ -229,7 +229,7 @@ DEFAULT_AI_ASSISTANT=codex ## Pi (Community Provider) -**One adapter, ~20 LLM backends.** Pi (`@mariozechner/pi-coding-agent`) is a community-maintained coding-agent harness that Archon integrates as the first community provider. It unlocks Anthropic, OpenAI, Google (Gemini + Vertex), Groq, Mistral, Cerebras, xAI, OpenRouter, Hugging Face, and more under a single `provider: pi` entry. +**One adapter, ~20 LLM backends.** Pi (`@mariozechner/pi-coding-agent`) is a community-maintained coding-agent harness that Archon integrates as the first community provider. 
It unlocks Anthropic, OpenAI, Google (Gemini + Vertex), Groq, Mistral, Cerebras, xAI, OpenRouter, Hugging Face, and local inference (LM Studio, ollama, llamacpp, custom OpenAI-compatible endpoints registered in `~/.pi/agent/models.json`) under a single `provider: pi` entry.
 
 Pi is registered as `builtIn: false` — it validates the community-provider seam rather than being a core-team-maintained option. If it proves stable and valuable it may be promoted to `builtIn: true` later.
 
@@ -262,7 +262,20 @@ Pi supports both OAuth subscriptions and API keys. Archon's adapter reads your e
 | `openrouter` | `OPENROUTER_API_KEY` |
 | `huggingface` | `HUGGINGFACE_API_KEY` |
 
-Additional Pi backends exist (Azure, Bedrock, Vertex, etc.) — file an issue if you need them wired.
+Additional cloud backends exist (Azure, Bedrock, Vertex, etc.) — file an issue if you need an env-var shortcut wired for them.
+
+**Local / custom providers (no credentials needed):**
+
+Providers that aren't in the env-var table above (LM Studio, ollama, llamacpp, custom OpenAI-compatible endpoints) work without any Archon-side configuration. Register them in `~/.pi/agent/models.json` per Pi's own docs and reference them as `<provider>/<model-id>`:
+
+```yaml
+# .archon/config.yaml
+assistants:
+  pi:
+    model: lm-studio/qwen2.5-coder-14b  # whatever ID you registered with Pi
+```
+
+Archon logs an info-level `pi.auth_missing` event when no credentials are found and continues — Pi's SDK then connects directly to the local endpoint defined in `models.json`. If the provider does require auth (a less-common cloud backend not in the env-var table) the SDK call fails downstream; the `pi.auth_missing` breadcrumb in the log lets you trace it back to a missing env-var mapping.
### Extensions (on by default) diff --git a/packages/docs-web/src/content/docs/guides/authoring-workflows.md b/packages/docs-web/src/content/docs/guides/authoring-workflows.md index 0fbc282640..2e3f4f9e37 100644 --- a/packages/docs-web/src/content/docs/guides/authoring-workflows.md +++ b/packages/docs-web/src/content/docs/guides/authoring-workflows.md @@ -126,6 +126,10 @@ worktree: # Optional: pin isolation behavior regardless o # like triage/reporting. true = must use a worktree; # CLI --no-worktree hard-errors. Omit to let the # caller decide (current default = worktree). +tags: [GitLab, Review] # Optional: explicit Web UI filter tags. Overrides the + # keyword-based tag inference. An empty list (`tags: []`) + # suppresses inference and shows no tags. Omit to fall + # back to inferred tags (the default). # Required for DAG-based nodes: @@ -174,6 +178,7 @@ nodes: | `command` | string | Command name to load from `.archon/commands/` | | `prompt` | string | Inline prompt string | | `bash` | string | Shell script (no AI). Stdout captured as `$nodeId.output`. Optional `timeout` (ms, default 120000) | +| `script` | string | TypeScript/JavaScript (via `bun`) or Python (via `uv`) — inline code or named reference to `.archon/scripts/`. Stdout captured as `$nodeId.output`. Requires `runtime: bun` or `runtime: uv`. Optional `deps` (uv only) and `timeout` (ms, default 120000). See [Script Nodes](/guides/script-nodes/) | | `loop` | object | Iterative AI prompt until completion signal. See [Loop Nodes](/guides/loop-nodes/) | | `approval` | object | Pauses workflow for human review. See [Approval Nodes](/guides/approval-nodes/) | | `cancel` | string | Terminates the workflow run with a reason string. 
Uses existing cancellation plumbing — in-flight parallel nodes are stopped | diff --git a/packages/docs-web/src/content/docs/guides/global-workflows.md b/packages/docs-web/src/content/docs/guides/global-workflows.md index 282881e312..a4651ba0ec 100644 --- a/packages/docs-web/src/content/docs/guides/global-workflows.md +++ b/packages/docs-web/src/content/docs/guides/global-workflows.md @@ -6,7 +6,7 @@ area: workflows audience: [user] status: current sidebar: - order: 8 + order: 9 --- Workflows placed in `~/.archon/workflows/`, commands in `~/.archon/commands/`, and scripts in `~/.archon/scripts/` are loaded globally -- they appear in every project and can be invoked from any repository. Workflows and commands carry the `source: 'global'` label in the Web UI node palette; scripts resolve under the same repo-wins-over-home precedence. diff --git a/packages/docs-web/src/content/docs/guides/hooks.md b/packages/docs-web/src/content/docs/guides/hooks.md index 3e6928ae21..201e60c3cb 100644 --- a/packages/docs-web/src/content/docs/guides/hooks.md +++ b/packages/docs-web/src/content/docs/guides/hooks.md @@ -6,7 +6,7 @@ area: workflows audience: [user] status: current sidebar: - order: 5 + order: 6 --- DAG workflow nodes support a `hooks` field that attaches Claude Agent SDK hooks diff --git a/packages/docs-web/src/content/docs/guides/index.md b/packages/docs-web/src/content/docs/guides/index.md index 0d53209fb6..f3cce0d69e 100644 --- a/packages/docs-web/src/content/docs/guides/index.md +++ b/packages/docs-web/src/content/docs/guides/index.md @@ -20,6 +20,7 @@ How-to guides for building and running AI coding workflows with Archon. 
- [Loop Nodes](/guides/loop-nodes/) — Iterative AI execution with completion conditions and deterministic exit checks - [Approval Nodes](/guides/approval-nodes/) — Human review gates with optional AI rework on rejection +- [Script Nodes](/guides/script-nodes/) — TypeScript/JavaScript (bun) or Python (uv) as a deterministic DAG node, without AI ## Node Features (Claude only) diff --git a/packages/docs-web/src/content/docs/guides/loop-nodes.md b/packages/docs-web/src/content/docs/guides/loop-nodes.md index 0e9e3eebc3..1420c9670a 100644 --- a/packages/docs-web/src/content/docs/guides/loop-nodes.md +++ b/packages/docs-web/src/content/docs/guides/loop-nodes.md @@ -90,10 +90,13 @@ substitution: | `$WORKFLOW_ID` | Current workflow run ID | | `$nodeId.output` | Output from upstream nodes | | `$LOOP_USER_INPUT` | User feedback provided via `/workflow approve ` at an interactive loop gate. Only populated on the first iteration of a resumed interactive loop; empty string on all other iterations. | +| `$LOOP_PREV_OUTPUT` | Cleaned output of the previous loop iteration. Empty string on the first iteration. Useful for `fresh_context: true` loops that need to reference what the previous pass produced or why it failed. | `$USER_MESSAGE` is particularly important for `fresh_context: true` loops — the agent has no memory of prior iterations, so the prompt must include all -context needed to continue the work. +context needed to continue the work. `$LOOP_PREV_OUTPUT` complements this by +exposing the previous iteration's own output without forcing the engine to +thread the session. ### `until` @@ -177,6 +180,39 @@ The prompt tells the agent it has no memory and must bootstrap from files. window exhaustion is a risk. The agent reads `.archon/ralph/*/prd.json` or similar tracking files to know what's done and what's next. 
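+In practice the bootstrap step of each fresh-context iteration boils down to "read the tracking file, pick the first unfinished item". A minimal sketch — the `prd.json` shape below is hypothetical; the real schema is whatever your bootstrap prompt defines:
+
+```shell
+# Hypothetical tracking-file shape, for illustration only.
+printf '%s' '{"stories":[{"id":"S1","done":true},{"id":"S2","done":false}]}' > /tmp/prd.json
+
+# What the agent effectively does at the top of each iteration:
+python3 - <<'EOF'
+import json
+stories = json.load(open("/tmp/prd.json"))["stories"]
+todo = [s["id"] for s in stories if not s["done"]]
+print("next:", todo[0] if todo else "ALL DONE")
+EOF
+```
+
+Once every story is `done` the query prints `next: ALL DONE` — the natural moment for the agent to emit the loop's completion signal.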
+### Retry-on-failure with `$LOOP_PREV_OUTPUT` + +When `fresh_context: true` is needed (to keep each iteration's context window +small) but the agent still benefits from knowing what the previous pass said — +typical of implement→validate or generate→review loops — inject the previous +iteration's output via `$LOOP_PREV_OUTPUT`: + +```yaml +- id: implement-and-qa + loop: + prompt: | + Implement the plan, then run `bun run validate`. + If checks fail, fix the failures. + + Previous iteration output (empty on first pass): + $LOOP_PREV_OUTPUT + + Use the above to focus your fixes. When all checks pass output: + QA_PASS + until: QA_PASS + fresh_context: true + max_iterations: 3 +``` + +In a continuous run, the first iteration sees `$LOOP_PREV_OUTPUT` substituted +to an empty string; iterations 2+ see the previous iteration's cleaned output +(after `` tags are stripped). + +When a loop resumes from an interactive approval gate, the first executed +iteration after the resume also receives an empty `$LOOP_PREV_OUTPUT` even if +its numeric iteration is 2+ — the prior output lived in a different run and is +not carried across the gate. + ### Accumulating context The agent builds on its own prior work across iterations. 
Good for iterative diff --git a/packages/docs-web/src/content/docs/guides/mcp-servers.md b/packages/docs-web/src/content/docs/guides/mcp-servers.md index 46474477e2..c777964d75 100644 --- a/packages/docs-web/src/content/docs/guides/mcp-servers.md +++ b/packages/docs-web/src/content/docs/guides/mcp-servers.md @@ -6,7 +6,7 @@ area: workflows audience: [user] status: current sidebar: - order: 6 + order: 7 --- DAG workflow nodes support a `mcp` field that attaches MCP (Model Context Protocol) diff --git a/packages/docs-web/src/content/docs/guides/remotion-workflow.md b/packages/docs-web/src/content/docs/guides/remotion-workflow.md index d68831be91..666b1ad916 100644 --- a/packages/docs-web/src/content/docs/guides/remotion-workflow.md +++ b/packages/docs-web/src/content/docs/guides/remotion-workflow.md @@ -6,7 +6,7 @@ area: workflows audience: [user] status: current sidebar: - order: 9 + order: 10 --- The `archon-remotion-generate` workflow uses AI to create Remotion video compositions. diff --git a/packages/docs-web/src/content/docs/guides/script-nodes.md b/packages/docs-web/src/content/docs/guides/script-nodes.md new file mode 100644 index 0000000000..dcf2b985f6 --- /dev/null +++ b/packages/docs-web/src/content/docs/guides/script-nodes.md @@ -0,0 +1,352 @@ +--- +title: Script Nodes +description: Run TypeScript, JavaScript, or Python code as a DAG node without invoking an AI agent. +category: guides +area: workflows +audience: [user] +status: current +sidebar: + order: 5 +--- + +DAG workflow nodes support a `script` field that runs a TypeScript, JavaScript, +or Python snippet as part of the workflow. No AI agent is invoked — the script +runs via the `bun` or `uv` runtime, `stdout` is captured as the node's output, +and the result is available downstream as `$nodeId.output`. 
+ +Use script nodes for deterministic work that needs a real programming language: +parsing JSON, transforming data between upstream AI nodes, calling HTTP APIs +with typed clients, or computing values that a shell one-liner would mangle. +If a plain shell command is enough, use a [`bash:` node](/guides/authoring-workflows/#node-fields) +instead. + +## Quick Start + +### Inline TypeScript (bun) + +```yaml +nodes: + - id: parse + script: | + const data = { count: 42, label: "ok" }; + console.log(JSON.stringify(data)); + runtime: bun +``` + +### Inline Python (uv) + +```yaml +nodes: + - id: compute + script: | + import json, statistics + values = [1, 2, 3, 4, 5] + print(json.dumps({ "mean": statistics.mean(values) })) + runtime: uv +``` + +### Named script from `.archon/scripts/` + +```yaml +nodes: + - id: fetch-pages + script: fetch-github-pages # resolves .archon/scripts/fetch-github-pages.ts + runtime: bun + timeout: 60000 +``` + +The file `.archon/scripts/fetch-github-pages.ts` is loaded and executed with +`bun --no-env-file run `. + +## How It Works + +1. **Substitute variables.** `$ARGUMENTS`, `$WORKFLOW_ID`, `$ARTIFACTS_DIR`, + `$BASE_BRANCH`, `$DOCS_DIR`, and upstream `$nodeId.output` references are + substituted into the `script` text before execution. +2. **Detect inline vs named.** If the `script` value contains a newline or any + shell metacharacter (see [Inline vs Named Scripts](#inline-vs-named-scripts) + below), it's treated as inline code. Otherwise it's treated as a named-script + reference. +3. **Dispatch.** + - `runtime: bun` + inline → `bun --no-env-file -e ''` + - `runtime: bun` + named → `bun --no-env-file run ` + - `runtime: uv` + inline → `uv run [--with dep ...] python -c ''` + - `runtime: uv` + named → `uv run [--with dep ...] ` +4. **Capture.** `stdout` (with the trailing newline stripped) becomes + `$nodeId.output`. `stderr` is logged as a warning and posted to the + conversation but does **not** fail the node. 
A non-zero exit code fails it. + +## YAML Schema + +```yaml +- id: node-name + script: # required, non-empty + runtime: bun | uv # required + deps: ["httpx", "pydantic>=2"] # optional, uv-only (see below) + timeout: 60000 # optional ms, default 120000 + depends_on: [upstream] # optional + when: "$upstream.output != ''" # optional + trigger_rule: all_success # optional (default) + retry: # optional; same shape as bash/AI nodes + max_attempts: 3 + on_error: transient +``` + +### Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `script` | string | Yes | Inline code, or the basename (no extension) of a file in `.archon/scripts/` or `~/.archon/scripts/` | +| `runtime` | `'bun'` \| `'uv'` | Yes | Which runtime executes the script. Must match the file extension for named scripts | +| `deps` | string[] | No | Python dependencies to install for this run. **uv only** — ignored with a warning for `bun` | +| `timeout` | number (ms) | No | Hard kill after this many milliseconds. Default: `120000` (2 min) | + +Standard DAG fields (`id`, `depends_on`, `when`, `trigger_rule`, `retry`) all +work. AI-specific fields (`model`, `provider`, `context`, `output_format`, +`allowed_tools`, `denied_tools`, `hooks`, `mcp`, `skills`, `agents`, `effort`, +`thinking`, `maxBudgetUsd`, `systemPrompt`, `fallbackModel`, `betas`, `sandbox`) +are accepted by the parser but emit a loader warning and are ignored at runtime +— no AI is invoked. `idle_timeout` is also accepted but ignored: script nodes +run as one-shot subprocesses, so use `timeout` (hard kill after N ms) instead. + +## Inline vs Named Scripts + +The executor decides mode from the `script` string itself. A value is treated +as **inline code** if it contains a newline or any shell metacharacter; otherwise +it's a **named script** lookup. 
+ +- **Metacharacters that trigger inline mode:** space, `;` `(` `)` `{` `}` `&` + `|` `<` `>` `$` `` ` `` `"` `'` +- **Inline examples:** `"const x = 1; console.log(x)"`, multi-line blocks, any + snippet with a space +- **Named examples:** `fetch-pages`, `analyze_metrics`, `triage-fmt` — bare + identifiers with no whitespace or shell syntax + +If you want an inline snippet that happens to be syntactically a single +identifier, add a trailing comment or newline to force inline mode. + +### Named Script Resolution + +Named scripts are discovered from, in precedence order: + +1. `/.archon/scripts/` — repo-local +2. `~/.archon/scripts/` — home-scoped (shared across every repo) + +Each directory is walked one subfolder deep (e.g. `.archon/scripts/triage/foo.ts` +resolves as `foo`). Deeper nesting is ignored. On a same-name collision the +repo-local entry wins silently — see [Global Workflows](/guides/global-workflows/) +for the shared precedence rules. + +### Extension ↔ Runtime Mapping + +Named scripts derive their runtime from the file extension: + +| Extension | Runtime | +|-----------|---------| +| `.ts`, `.js` | `bun` | +| `.py` | `uv` | + +The `runtime:` declared on the node **must match the file's extension** — the +validator rejects `runtime: uv` pointing at a `.ts` file, and vice versa. For +inline scripts, you can use any language that the chosen runtime supports. + +## Dependencies (uv only) + +`deps` is a pass-through to `uv run --with `, which installs packages into +a per-run ephemeral environment: + +```yaml +- id: scrape + script: | + import httpx + r = httpx.get("https://api.github.com/repos/anthropics/anthropic-cookbook") + print(r.text) + runtime: uv + deps: ["httpx>=0.27"] +``` + +- **Version pinning** — any PEP 508 specifier works (`pkg==1.2.3`, `pkg>=2,<3`). +- **Bun ignores `deps`** — Bun auto-installs imported packages on first run, so + the validator emits a warning if you set `deps` with `runtime: bun`. 
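The inline-vs-named detection rule above reduces to a single predicate. A minimal sketch, assuming the metacharacter set documented above — `classifyScript` is a hypothetical helper name, not the engine's actual function:

```typescript
// Sketch of the inline-vs-named decision, assuming the metacharacter
// set documented above. Hypothetical helper — not the engine's real code.
const SHELL_META = /[\s;(){}&|<>$`"']/; // \s covers space and newline

function classifyScript(script: string): 'inline' | 'named' {
  // Any whitespace or shell metacharacter forces inline mode;
  // a bare identifier is treated as a named-script lookup.
  return SHELL_META.test(script) ? 'inline' : 'named';
}

classifyScript('fetch-pages');                  // → 'named'
classifyScript('analyze_metrics');              // → 'named'
classifyScript('const x = 1; console.log(x)'); // → 'inline'
```

Under this rule a value like `triage-fmt` resolves as a named-script lookup, while anything containing a space, quote, or `$` is executed inline.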
Remove + the field, or switch to `uv` if you need explicit dependency management. +- **No persistent environment** — each run is isolated; there is no `requirements.txt` + or lockfile to maintain. + +## Output and Data Flow + +`stdout` (trimmed of its trailing newline) becomes `$nodeId.output`. Print JSON +if you want downstream nodes to access structured fields with +`$nodeId.output.field` — the workflow engine tries to parse the output as JSON +for field access in `when:` conditions and prompt substitution. + +```yaml +- id: classify + script: | + const input = process.argv.slice(2).join(' '); + const severity = input.includes('crash') ? 'high' : 'low'; + console.log(JSON.stringify({ severity, length: input.length })); + runtime: bun + +- id: investigate + command: investigate-bug + depends_on: [classify] + when: "$classify.output.severity == 'high'" +``` + +### Variable Substitution in Scripts + +Variables are substituted into the `script` text **as raw strings, without +shell quoting** — unlike `bash:` nodes, where `$nodeId.output` values are +auto-quoted. Treat substituted values as untrusted input and parse them with +language features, not by interpolating into shell syntax. + +:::caution[Avoid String.raw with `$nodeId.output`] +The pattern `` String.raw`$nodeId.output` `` looks safe but fails silently when +the substituted value contains a backtick — common in AI-generated markdown, +`output_format` payloads, or any output with inline code spans. The backtick +terminates the template literal early, producing a cryptic `Expected ";"` parse +error at runtime. 
+ +**Use direct assignment instead.** JSON is a strict subset of JavaScript +expression syntax, so the substituted value is always a valid JS literal: + +```typescript +// Safe — works for any valid JSON, including content with backticks +const data = $fetch-issue.output; + +// Fragile — breaks if output contains a backtick +const data = JSON.parse(String.raw`$fetch-issue.output`); // DON'T +``` +::: + +For **named scripts**, variables are not passed automatically. Read them from +the environment (`process.env.USER_MESSAGE`, `os.environ['USER_MESSAGE']`) +or accept them via stdin. For **inline scripts**, substituted variables are +literally embedded into the code string at execution time. + +## Environment and Isolation + +Script subprocesses receive `process.env` merged with any codebase-scoped env +vars you've configured via the Web UI (Settings → Projects → Env Vars) or the +`env:` block in `.archon/config.yaml`. This is the same injection surface used +by Claude, Codex, and bash nodes. + +**Target repo `.env` isolation:** the Bun subprocess is invoked with +`--no-env-file`, so variables in the target repo's `.env` do **not** leak into +the script. Archon-managed env (from `~/.archon/.env` and `/.archon/.env`) +passes through normally. `uv`-launched Python subprocesses do not auto-load +`.env` at all. See [Security Model](/reference/security/#target-repo-env-isolation) +for the full story. + +## Validation + +`archon validate workflows ` checks script nodes for: + +- **Script file exists** — for named scripts, the basename must exist in + `.archon/scripts/` or `~/.archon/scripts/` with a matching extension for + the declared runtime. Missing files fail validation with a hint showing + the expected path. +- **Runtime available on PATH** — `bun` or `uv` must be installed. 
Missing + runtimes emit a warning with the official install command: + - `curl -fsSL https://bun.sh/install | bash` + - `curl -LsSf https://astral.sh/uv/install.sh | sh` +- **`deps` with `runtime: bun`** — warns that `deps` is a no-op under Bun. + +Runtime availability is cached per-process — the check spawns `which bun` / +`which uv` once and memoizes the result. + +## Patterns + +### Transform AI output before the next node + +Use a script node as a deterministic adapter between two AI nodes. The script +parses the upstream classifier's JSON, filters, and forwards a clean payload: + +```yaml +- id: classify + prompt: "Classify: $ARGUMENTS" + allowed_tools: [] + output_format: + type: object + properties: + items: + type: array + items: { type: object } + +- id: filter + script: | + const upstream = JSON.parse(process.env.UPSTREAM ?? '{}'); + const high = (upstream.items ?? []).filter(i => i.severity === 'high'); + console.log(JSON.stringify(high)); + runtime: bun + depends_on: [classify] + +- id: triage + command: triage-high-severity + depends_on: [filter] + when: "$filter.output != '[]'" +``` + +*(Note: to actually populate `UPSTREAM` you'd inline-substitute +`$classify.output` into the script body. The example above illustrates the +shape.)* + +### Reusable helper in `~/.archon/scripts/` + +A helper you want available in every repo — say, a triage summary formatter — +lives at `~/.archon/scripts/triage-fmt.ts`: + +```typescript +// ~/.archon/scripts/triage-fmt.ts +const raw = process.argv.slice(2).join(' ') || '{}'; +const data = JSON.parse(raw); +const lines = data.issues?.map((i: { id: string; title: string }) => + `- [${i.id}] ${i.title}` +).join('\n') ?? 
''; +console.log(lines || 'no issues'); +``` + +Then reference it by name from any repo's workflow: + +```yaml +- id: format + script: triage-fmt + runtime: bun + depends_on: [gather] +``` + +### Python with scientific dependencies + +```yaml +- id: analyze + script: | + import json, sys + import pandas as pd + data = json.loads(sys.argv[1]) if len(sys.argv) > 1 else [] + df = pd.DataFrame(data) + print(df.describe().to_json()) + runtime: uv + deps: ["pandas>=2.0"] + depends_on: [collect] +``` + +## What Does NOT Work + +- **AI-only features** — `hooks`, `mcp`, `skills`, `allowed_tools`, + `denied_tools`, `agents`, `model`, `provider`, `output_format`, `effort`, + `thinking`, `maxBudgetUsd`, `systemPrompt`, `fallbackModel`, `betas`, and + `sandbox` are all ignored at runtime. The loader emits a warning listing + the ignored fields. +- **Interactive prompts** — the script runs headlessly; any `stdin` read will + see EOF immediately. +- **Runtimes other than `bun` and `uv`** — rejected at parse time. +- **Cancelling mid-execution** — script subprocesses are killed on workflow + cancel, but there's no cooperative cancellation signal. Design scripts to + complete quickly or fail fast. 
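The per-process runtime-availability cache described under Validation can be sketched like this — a minimal illustration assuming a POSIX `which`, with `isRuntimeAvailable` as a hypothetical helper name:

```typescript
import { spawnSync } from 'node:child_process';

// Memoize per process so `which` is spawned at most once per runtime.
const runtimeCache = new Map<string, boolean>();

function isRuntimeAvailable(runtime: 'bun' | 'uv'): boolean {
  const cached = runtimeCache.get(runtime);
  if (cached !== undefined) return cached;
  // POSIX `which`; a Windows port would probe `where` instead.
  const probe = spawnSync('which', [runtime], { stdio: 'ignore' });
  const available = probe.status === 0;
  runtimeCache.set(runtime, available);
  return available;
}
```

A validator built on this shape warns once per missing runtime and answers every subsequent script-node check from the cache.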
+ +## See Also + +- [Authoring Workflows](/guides/authoring-workflows/) — full workflow reference +- [Global Workflows, Commands, and Scripts](/guides/global-workflows/) — home-scoped `~/.archon/scripts/` +- [Security Model](/reference/security/#target-repo-env-isolation) — env isolation details +- [Variables Reference](/reference/variables/) — substitution rules diff --git a/packages/docs-web/src/content/docs/guides/skills.md b/packages/docs-web/src/content/docs/guides/skills.md index d27262ffac..f64b6def3d 100644 --- a/packages/docs-web/src/content/docs/guides/skills.md +++ b/packages/docs-web/src/content/docs/guides/skills.md @@ -6,7 +6,7 @@ area: workflows audience: [user] status: current sidebar: - order: 7 + order: 8 --- DAG workflow nodes support a `skills` field that preloads named skills into the diff --git a/packages/docs-web/src/content/docs/reference/variables.md b/packages/docs-web/src/content/docs/reference/variables.md index f32779cb6c..ecbc626d6c 100644 --- a/packages/docs-web/src/content/docs/reference/variables.md +++ b/packages/docs-web/src/content/docs/reference/variables.md @@ -8,11 +8,11 @@ sidebar: order: 5 --- -Archon substitutes variables in command files, inline prompts, and bash scripts before execution. There are three categories of variables: workflow variables (substituted by the workflow engine), positional arguments (substituted by the command handler), and node output references (DAG workflows only). +Archon substitutes variables in command files, inline prompts, bash scripts, and `script:` node bodies before execution. There are three categories of variables: workflow variables (substituted by the workflow engine), positional arguments (substituted by the command handler), and node output references (DAG workflows only). ## Workflow Variables -These variables are substituted by the workflow executor in all node types (`command:`, `prompt:`, `bash:`, `loop:`). 
+These variables are substituted by the workflow executor in all node types (`command:`, `prompt:`, `bash:`, `script:`, `loop:`). | Variable | Resolves to | Notes | |----------|-------------|-------| @@ -27,6 +27,7 @@ These variables are substituted by the workflow executor in all node types (`com | `$ISSUE_CONTEXT` | Same as `$CONTEXT` | Alias | | `$LOOP_USER_INPUT` | User feedback from an interactive loop approval gate | Only populated on the first iteration of a resumed interactive loop. Empty string on all other iterations | | `$REJECTION_REASON` | Reviewer feedback from an approval node rejection | Only available in `on_reject` prompts. Empty string elsewhere | +| `$LOOP_PREV_OUTPUT` | Cleaned output of the previous loop iteration (loop nodes only) | Empty string on the first iteration. Useful for `fresh_context: true` loops that need to reference the prior pass without carrying the full session history | ### Context Variable Behavior @@ -64,6 +65,10 @@ In DAG workflows, nodes can reference the output of any completed upstream node. | `$nodeId.output` | Full output string of the referenced node | The node must be a declared dependency (in `depends_on`) | | `$nodeId.output.field` | A specific JSON field from the node's output | Requires the upstream node to use `output_format` for structured JSON | +### Shell Quoting in `bash:` vs `script:` + +`$nodeId.output` values are **auto shell-quoted** (single-quoted, with embedded `'` escaped) when substituted into `bash:` scripts, so the value is always safe to embed in a shell command. They are **not** shell-quoted when substituted into `script:` bodies — the raw value is embedded as-is. For script nodes, treat substituted values as untrusted input and parse them with language features (e.g. `JSON.parse`), not by interpolating into shell syntax. + ### Example ```yaml @@ -88,7 +93,7 @@ nodes: Variables are substituted in a defined order: -1. 
**Workflow variables** -- `$WORKFLOW_ID`, `$USER_MESSAGE`, `$ARGUMENTS`, `$ARTIFACTS_DIR`, `$BASE_BRANCH`, `$DOCS_DIR`, `$LOOP_USER_INPUT`, `$REJECTION_REASON` +1. **Workflow variables** -- `$WORKFLOW_ID`, `$USER_MESSAGE`, `$ARGUMENTS`, `$ARTIFACTS_DIR`, `$BASE_BRANCH`, `$DOCS_DIR`, `$LOOP_USER_INPUT`, `$REJECTION_REASON`, `$LOOP_PREV_OUTPUT` 2. **Context variables** -- `$CONTEXT`, `$EXTERNAL_CONTEXT`, `$ISSUE_CONTEXT` 3. **Node output references** -- `$nodeId.output`, `$nodeId.output.field` @@ -107,4 +112,5 @@ Positional arguments (`$1` through `$9`) are substituted separately by the comma | `$CONTEXT` / aliases | Yes | No | No | | `$LOOP_USER_INPUT` | Yes (loop nodes) | No | No | | `$REJECTION_REASON` | Yes (`on_reject` only) | No | No | +| `$LOOP_PREV_OUTPUT` | Yes (loop nodes) | No | No | | `$nodeId.output` | Yes (DAG nodes) | No | Yes | diff --git a/packages/providers/src/claude/binary-resolver.test.ts b/packages/providers/src/claude/binary-resolver.test.ts index f87e78f36d..c5c407a531 100644 --- a/packages/providers/src/claude/binary-resolver.test.ts +++ b/packages/providers/src/claude/binary-resolver.test.ts @@ -5,6 +5,8 @@ * with BUNDLED_IS_BINARY=true, which conflicts with other test files. */ import { describe, test, expect, mock, beforeEach, afterAll, spyOn } from 'bun:test'; +import { homedir } from 'node:os'; +import { join } from 'node:path'; import { createMockLogger } from '../test/mocks/logger'; const mockLogger = createMockLogger(); @@ -76,7 +78,55 @@ describe('resolveClaudeBinaryPath (binary mode)', () => { expect(result).toBe('/env/cli.js'); }); - test('throws with install instructions when nothing configured', async () => { + test('autodetects native installer path when env and config are unset', async () => { + // Mirror the implementation: use os.homedir() + node:path.join so the + // expected path matches the platform's actual home dir and separator. + const expected = join( + homedir(), + '.local', + 'bin', + process.platform === 'win32' ? 
'claude.exe' : 'claude' + ); + // File exists only at the native-installer path. + fileExistsSpy = spyOn(resolver, 'fileExists').mockImplementation( + (path: string) => path === expected + ); + + const result = await resolver.resolveClaudeBinaryPath(); + expect(result).toBe(expected); + // Log must mark this as autodetect, not 'env' or 'config' — the source + // string is load-bearing for debug triage. + expect(mockLogger.info).toHaveBeenCalledWith( + { binaryPath: expected, source: 'autodetect' }, + 'claude.binary_resolved' + ); + }); + + test('env var takes precedence over autodetect when both would match', async () => { + process.env.CLAUDE_BIN_PATH = '/custom/env/claude'; + fileExistsSpy = spyOn(resolver, 'fileExists').mockReturnValue(true); + + const result = await resolver.resolveClaudeBinaryPath(); + expect(result).toBe('/custom/env/claude'); + expect(mockLogger.info).toHaveBeenCalledWith( + { binaryPath: '/custom/env/claude', source: 'env' }, + 'claude.binary_resolved' + ); + }); + + test('config takes precedence over autodetect when both would match', async () => { + fileExistsSpy = spyOn(resolver, 'fileExists').mockReturnValue(true); + + const result = await resolver.resolveClaudeBinaryPath('/custom/config/claude'); + expect(result).toBe('/custom/config/claude'); + expect(mockLogger.info).toHaveBeenCalledWith( + { binaryPath: '/custom/config/claude', source: 'config' }, + 'claude.binary_resolved' + ); + }); + + test('throws with install instructions when nothing is configured and autodetect misses', async () => { + // Every probe returns false — env unset, config unset, native path absent. 
fileExistsSpy = spyOn(resolver, 'fileExists').mockReturnValue(false); const promise = resolver.resolveClaudeBinaryPath(); diff --git a/packages/providers/src/claude/binary-resolver.ts b/packages/providers/src/claude/binary-resolver.ts index f236acb277..c2273d85d2 100644 --- a/packages/providers/src/claude/binary-resolver.ts +++ b/packages/providers/src/claude/binary-resolver.ts @@ -9,13 +9,16 @@ * Resolution order (binary mode only): * 1. `CLAUDE_BIN_PATH` environment variable * 2. `assistants.claude.claudeBinaryPath` in config - * 3. Throw with install instructions + * 3. Autodetect canonical install path (native installer default) + * 4. Throw with install instructions * * In dev mode (BUNDLED_IS_BINARY=false), returns undefined so the caller * omits `pathToClaudeCodeExecutable` entirely and the SDK resolves via its * normal node_modules lookup. */ import { existsSync as _existsSync } from 'node:fs'; +import { homedir } from 'node:os'; +import { join } from 'node:path'; import { BUNDLED_IS_BINARY, createLogger } from '@archon/paths'; /** Wrapper for existsSync — enables spyOn in tests (direct imports can't be spied on). */ @@ -89,6 +92,25 @@ export async function resolveClaudeBinaryPath( return configClaudeBinaryPath; } - // 3. Not found — throw with install instructions + // 3. Autodetect — the Anthropic native installer + // (`curl -fsSL https://claude.ai/install.sh | bash` on macOS/Linux, + // `irm https://claude.ai/install.ps1 | iex` on Windows) writes the + // executable to a fixed location relative to $HOME. Users who follow + // the recommended install path don't need any env var or config entry; + // users who deviate (npm global, custom path, etc.) still set one of + // the higher-priority sources above. + const nativeInstallerPath = + process.platform === 'win32' + ? 
join(homedir(), '.local', 'bin', 'claude.exe') + : join(homedir(), '.local', 'bin', 'claude'); + if (fileExists(nativeInstallerPath)) { + getLog().info( + { binaryPath: nativeInstallerPath, source: 'autodetect' }, + 'claude.binary_resolved' + ); + return nativeInstallerPath; + } + + // 4. Not found — throw with install instructions throw new Error(INSTALL_INSTRUCTIONS); } diff --git a/packages/providers/src/codex/binary-resolver.test.ts b/packages/providers/src/codex/binary-resolver.test.ts index 1df4e7c6f6..a121e4c204 100644 --- a/packages/providers/src/codex/binary-resolver.test.ts +++ b/packages/providers/src/codex/binary-resolver.test.ts @@ -87,7 +87,70 @@ describe('resolveCodexBinaryPath (binary mode)', () => { expect(normalized).toContain('/tmp/test-archon-home/vendor/codex/'); }); + test('autodetects npm global install at ~/.npm-global/bin/codex (POSIX)', async () => { + if (process.platform === 'win32') return; // POSIX-only probe + const home = process.env.HOME ?? '/Users/test'; + const expected = `${home}/.npm-global/bin/codex`; + fileExistsSpy = spyOn(resolver, 'fileExists').mockImplementation( + (path: string) => path === expected + ); + + const result = await resolver.resolveCodexBinaryPath(); + expect(result).toBe(expected); + expect(mockLogger.info).toHaveBeenCalledWith( + { binaryPath: expected, source: 'autodetect' }, + 'codex.binary_resolved' + ); + }); + + test('autodetects homebrew install on Apple Silicon', async () => { + if (process.platform !== 'darwin' || process.arch !== 'arm64') { + // `/opt/homebrew/bin/codex` is only probed on darwin-arm64; on other + // hosts this test has nothing to assert (the probe list excludes it). 
+ return; + } + fileExistsSpy = spyOn(resolver, 'fileExists').mockImplementation( + (path: string) => path === '/opt/homebrew/bin/codex' + ); + + const result = await resolver.resolveCodexBinaryPath(); + expect(result).toBe('/opt/homebrew/bin/codex'); + expect(mockLogger.info).toHaveBeenCalledWith( + { binaryPath: '/opt/homebrew/bin/codex', source: 'autodetect' }, + 'codex.binary_resolved' + ); + }); + + test('autodetects system install at /usr/local/bin/codex', async () => { + if (process.platform === 'win32') { + // /usr/local/bin is not probed on Windows. + return; + } + fileExistsSpy = spyOn(resolver, 'fileExists').mockImplementation( + (path: string) => path === '/usr/local/bin/codex' + ); + + const result = await resolver.resolveCodexBinaryPath(); + expect(result).toBe('/usr/local/bin/codex'); + }); + + test('vendor directory takes precedence over autodetect', async () => { + // Both vendor and npm-global would match; vendor must win (lower tier #). + fileExistsSpy = spyOn(resolver, 'fileExists').mockImplementation((path: string) => { + const normalized = path.replace(/\\/g, '/'); + return normalized.includes('vendor/codex') || normalized.includes('.npm-global'); + }); + + const result = await resolver.resolveCodexBinaryPath(); + expect(result!.replace(/\\/g, '/')).toContain('/vendor/codex/'); + expect(mockLogger.info).toHaveBeenCalledWith( + expect.objectContaining({ source: 'vendor' }), + 'codex.binary_resolved' + ); + }); + test('throws with install instructions when binary not found anywhere', async () => { + // Env unset, config unset, vendor dir empty, every autodetect path missing. 
fileExistsSpy = spyOn(resolver, 'fileExists').mockReturnValue(false); await expect(resolver.resolveCodexBinaryPath()).rejects.toThrow('Codex CLI binary not found'); diff --git a/packages/providers/src/codex/binary-resolver.ts b/packages/providers/src/codex/binary-resolver.ts index a1e0f01a5b..1ac8e57cfb 100644 --- a/packages/providers/src/codex/binary-resolver.ts +++ b/packages/providers/src/codex/binary-resolver.ts @@ -9,12 +9,14 @@ * 1. `CODEX_BIN_PATH` environment variable * 2. `assistants.codex.codexBinaryPath` in config * 3. `~/.archon/vendor/codex/` (user-placed) - * 4. Throw with install instructions + * 4. Autodetect canonical install paths (npm prefix defaults per platform) + * 5. Throw with install instructions * * In dev mode (BUNDLED_IS_BINARY=false), returns undefined so the SDK * uses its normal node_modules-based resolution. */ import { existsSync as _existsSync } from 'node:fs'; +import { homedir } from 'node:os'; import { join } from 'node:path'; import { BUNDLED_IS_BINARY, getArchonHome, createLogger } from '@archon/paths'; @@ -89,7 +91,19 @@ export async function resolveCodexBinaryPath( } } - // 4. Not found — throw with install instructions + // 4. Autodetect — probe the handful of paths Codex typically lands at + // when installed via the documented package managers. Users who install + // somewhere else (custom npm prefix, etc.) still set one of the higher- + // priority sources above. Order: most specific → least specific. + const autodetectPaths = getAutodetectPaths(); + for (const probePath of autodetectPaths) { + if (fileExists(probePath)) { + getLog().info({ binaryPath: probePath, source: 'autodetect' }, 'codex.binary_resolved'); + return probePath; + } + } + + // 5. Not found — throw with install instructions const vendorPath = `~/.archon/${CODEX_VENDOR_DIR}/`; throw new Error( 'Codex CLI binary not found. 
The Codex provider requires a native binary\n' + @@ -105,3 +119,47 @@ export async function resolveCodexBinaryPath( ' codexBinaryPath: /path/to/codex\n' ); } + +/** + * Canonical install locations probed by tier 4 autodetect. Grounded in + * the official @openai/codex README and the npm global-install contract + * (npm writes the binary to `{npm_prefix}/bin/` on POSIX and + * `{npm_prefix}\.cmd` on Windows). The probes cover the npm prefix + * a default install lands at on each platform: + * + * - `$HOME/.npm-global/bin/codex` — common when the user ran + * `npm config set prefix ~/.npm-global` to avoid root writes + * - `/opt/homebrew/bin/codex` — mac Apple Silicon with homebrew-node + * (homebrew sets npm prefix to /opt/homebrew) + * - `/usr/local/bin/codex` — mac Intel with homebrew-node, or linux + * with system-installed node (npm prefix defaults to /usr/local) + * - `%AppData%\npm\codex.cmd` — Windows npm global default + * + * Not covered (explicit override required via CODEX_BIN_PATH or config): + * - users with other custom npm prefixes — `npm root -g` would spawn + * a subprocess per resolve, too heavy for a probe helper + * - Homebrew cask install (`brew install --cask codex`) — cask layout + * isn't a PATH binary; users should symlink or set the path + * - manual GitHub Releases extract — placement is user-determined + */ +function getAutodetectPaths(): string[] { + const paths: string[] = []; + + if (process.platform === 'win32') { + const appData = process.env.APPDATA; + if (appData) paths.push(join(appData, 'npm', 'codex.cmd')); + paths.push(join(homedir(), '.npm-global', 'codex.cmd')); + return paths; + } + + // POSIX (macOS + Linux) + paths.push(join(homedir(), '.npm-global', 'bin', 'codex')); + + if (process.platform === 'darwin' && process.arch === 'arm64') { + paths.push('/opt/homebrew/bin/codex'); + } + + paths.push('/usr/local/bin/codex'); + + return paths; +} diff --git a/packages/providers/src/community/pi/provider.test.ts 
b/packages/providers/src/community/pi/provider.test.ts index 17e6de417d..4de4314147 100644 --- a/packages/providers/src/community/pi/provider.test.ts +++ b/packages/providers/src/community/pi/provider.test.ts @@ -81,7 +81,14 @@ const mockAuthCreate = mock(() => ({ setRuntimeApiKey: mockSetRuntimeApiKey, getApiKey: mockGetApiKey, })); -const mockModelRegistryInMemory = mock(() => ({})); + +const mockModelRegistryFind = mock((provider: string, modelId: string) => { + if (provider === 'nonexistent') return undefined; + return { id: modelId, provider, name: `${provider}/${modelId}` }; +}); +const mockModelRegistryCreate = mock(() => ({ + find: mockModelRegistryFind, +})); // SessionManager mocks. Each returns a tagged session-manager stub so tests // can assert whether resume resolved to an existing session or fell through @@ -115,7 +122,7 @@ const mockCreateLsTool = mock((_cwd: string) => ({ __piTool: 'ls' })); mock.module('@mariozechner/pi-coding-agent', () => ({ createAgentSession: mockCreateAgentSession, AuthStorage: { create: mockAuthCreate }, - ModelRegistry: { inMemory: mockModelRegistryInMemory }, + ModelRegistry: { create: mockModelRegistryCreate }, SessionManager: { create: mockSessionCreate, open: mockSessionOpen, @@ -132,16 +139,6 @@ mock.module('@mariozechner/pi-coding-agent', () => ({ createLsTool: mockCreateLsTool, })); -// getModel is imported from pi-ai. Return a fake model for known refs and -// undefined for unknown refs so the provider's not-found branch is testable. -const mockGetModel = mock((provider: string, modelId: string) => { - if (provider === 'nonexistent') return undefined; - return { id: modelId, provider, name: `${provider}/${modelId}` }; -}); -mock.module('@mariozechner/pi-ai', () => ({ - getModel: mockGetModel, -})); - // Import AFTER mocks are set — module resolution freezes the mocks. 
import { PiProvider } from './provider'; import { PI_CAPABILITIES } from './capabilities'; @@ -169,6 +166,12 @@ function resetScript(events: FakeEvent[]): void { describe('PiProvider', () => { beforeEach(() => { + mockLogger.fatal.mockClear(); + mockLogger.error.mockClear(); + mockLogger.warn.mockClear(); + mockLogger.info.mockClear(); + mockLogger.debug.mockClear(); + mockLogger.trace.mockClear(); mockPrompt.mockClear(); mockAbort.mockClear(); mockDispose.mockClear(); @@ -177,8 +180,9 @@ describe('PiProvider', () => { mockSetFlagValue.mockClear(); mockResourceLoaderReload.mockClear(); mockCreateAgentSession.mockClear(); - mockGetModel.mockClear(); mockAuthCreate.mockClear(); + mockModelRegistryCreate.mockClear(); + mockModelRegistryFind.mockClear(); mockSetRuntimeApiKey.mockClear(); mockGetApiKey.mockClear(); MockDefaultResourceLoader.mockClear(); @@ -209,6 +213,21 @@ describe('PiProvider', () => { expect(new PiProvider().getCapabilities()).toEqual(PI_CAPABILITIES); }); + test('sendQuery installs PI_PACKAGE_DIR shim before Pi SDK loads', async () => { + // Runtime-safety regression: Pi's config.js reads `getPackageJsonPath()` at + // its module init, which resolves to a non-existent path inside compiled + // archon binaries. The shim writes a stub package.json to tmpdir and sets + // PI_PACKAGE_DIR so Pi's short-circuit kicks in. Must run BEFORE the + // dynamic imports in sendQuery — we verify by calling the fast-fail "no + // model" path (which returns before any Pi SDK logic executes) and + // asserting the env var was set regardless. 
+ delete process.env.PI_PACKAGE_DIR; + expect(process.env.PI_PACKAGE_DIR).toBeUndefined(); + await consume(new PiProvider().sendQuery('hi', '/tmp')); + expect(process.env.PI_PACKAGE_DIR).toBeDefined(); + expect(process.env.PI_PACKAGE_DIR).toContain('archon-pi-shim'); + }); + test('throws when no model is configured', async () => { const { error } = await consume(new PiProvider().sendQuery('hi', '/tmp')); expect(error?.message).toContain('Pi provider requires a model'); @@ -221,15 +240,102 @@ describe('PiProvider', () => { expect(error?.message).toContain('Invalid Pi model ref'); }); - test('throws when Pi provider id is unknown AND no creds available', async () => { - // No env var, no auth.json entry → fail-fast with hint about env-var table + test('logs credential hint when Pi provider id is unknown AND no creds available', async () => { + // No env var, no auth.json entry → log a hint but continue, so custom providers that don't use credentials (or supply them by non-Pi means) still work.
+ resetScript(scriptedAgentEnd()); const { error } = await consume( new PiProvider().sendQuery('hi', '/tmp', undefined, { model: 'unknownprovider/some-model', }) ); - expect(error?.message).toContain("no credentials for provider 'unknownprovider'"); - expect(error?.message).toContain("not in the Archon adapter's env-var table"); + + expect(error).toBeUndefined(); + expect(mockLogger.info).toHaveBeenCalledWith( + { + piProvider: 'unknownprovider', + envHint: expect.stringContaining("not in the Archon adapter's env-var table"), + loginHint: expect.stringContaining('/login'), + }, + 'pi.auth_missing' + ); + expect(mockCreateAgentSession).toHaveBeenCalledTimes(1); + }); + + test('ModelRegistry.create receives the AuthStorage instance', async () => { + // Headline-fix wiring: ModelRegistry.create must receive the same + // AuthStorage instance returned by AuthStorage.create(), so registry + // lookups can resolve user-configured custom models from + // ~/.pi/agent/models.json (LM Studio, ollama, llamacpp, etc.). Without + // this wiring the registry only sees the static built-in catalog. + process.env.GEMINI_API_KEY = 'sk-test'; + resetScript(scriptedAgentEnd()); + + await consume( + new PiProvider().sendQuery('hi', '/tmp', undefined, { + model: 'google/gemini-2.5-pro', + }) + ); + + expect(mockAuthCreate).toHaveBeenCalledTimes(1); + expect(mockModelRegistryCreate).toHaveBeenCalledTimes(1); + const authInstance = mockAuthCreate.mock.results[0]?.value; + expect(mockModelRegistryCreate).toHaveBeenCalledWith(authInstance); + }); + + test('AuthStorage.create() throwing surfaces a contextualized error', async () => { + // Both AuthStorage.create() and ModelRegistry.create() read from disk + // and can throw on malformed JSON or filesystem errors. Wrap with + // try/catch and surface a Pi-framed error so operators see the cause + // rather than a raw SDK stack trace. 
+ mockAuthCreate.mockImplementationOnce(() => { + throw new Error('Unexpected token } in JSON at position 42'); + }); + + const { error } = await consume( + new PiProvider().sendQuery('hi', '/tmp', undefined, { + model: 'google/gemini-2.5-pro', + }) + ); + + expect(error).toBeDefined(); + expect(error?.message).toContain('Pi auth storage init failed'); + expect(error?.message).toContain('Unexpected token'); + expect(error?.message).toContain('~/.pi/agent/auth.json'); + expect(mockLogger.error).toHaveBeenCalledWith( + expect.objectContaining({ piProvider: 'google' }), + 'pi.auth_storage_init_failed' + ); + }); + + test('Pi model not found includes models.json load error when registry reports one', async () => { + // ModelRegistry swallows models.json parse/validation errors into an + // internal loadError. When find() returns undefined we surface that + // error in both the structured log and the throw message so users + // debugging a custom-provider config see the actual reason. + process.env.GEMINI_API_KEY = 'sk-test'; + mockModelRegistryFind.mockImplementationOnce(() => undefined); + mockModelRegistryCreate.mockImplementationOnce(() => ({ + find: mockModelRegistryFind, + getError: () => 'Provider lm-studio: "baseUrl" is required when defining custom models.', + })); + + const { error } = await consume( + new PiProvider().sendQuery('hi', '/tmp', undefined, { + model: 'lm-studio/some-model', + }) + ); + + expect(error?.message).toContain('Pi model not found'); + expect(error?.message).toContain('models.json failed to load'); + expect(error?.message).toContain('"baseUrl" is required'); + expect(mockLogger.error).toHaveBeenCalledWith( + expect.objectContaining({ + piProvider: 'lm-studio', + modelId: 'some-model', + loadError: expect.stringContaining('"baseUrl" is required'), + }), + 'pi.model_not_found' + ); }); test('throws when env var missing AND auth.json has no entry', async () => { @@ -280,13 +386,13 @@ describe('PiProvider', () => { 
expect(mockGetApiKey).toHaveBeenCalledWith('anthropic'); }); - test('throws when getModel returns undefined', async () => { + test('throws when ModelRegistry.find returns undefined', async () => { process.env.GEMINI_API_KEY = 'sk-test'; - // 'nonexistent' is handled in mockGetModel to return undefined, but - // the adapter rejects unknown providers before getModel. To exercise + // 'nonexistent' is handled in mockModelRegistryFind to return undefined, but + // unknown providers now log-and-continue instead of throwing. To exercise // the not-found branch, use a known provider but unknown modelId by - // temporarily swapping mockGetModel to always return undefined. - mockGetModel.mockImplementationOnce(() => undefined); + // temporarily swapping mockModelRegistryFind to always return undefined. + mockModelRegistryFind.mockImplementationOnce(() => undefined); const { error } = await consume( new PiProvider().sendQuery('hi', '/tmp', undefined, { model: 'google/unknown-model-id', diff --git a/packages/providers/src/community/pi/provider.ts b/packages/providers/src/community/pi/provider.ts index e4b6804762..5a14ed6166 100644 --- a/packages/providers/src/community/pi/provider.ts +++ b/packages/providers/src/community/pi/provider.ts @@ -1,5 +1,8 @@ +import { existsSync, mkdirSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + import { createLogger } from '@archon/paths'; -import type { Api, Model } from '@mariozechner/pi-ai'; import type { IAgentProvider, @@ -24,6 +27,44 @@ import { parsePiModelRef } from './model-ref'; // All Pi SDK value bindings and Pi-dependent helper modules are dynamically // imported inside `sendQuery()` below, which runs only when a Pi workflow is // actually invoked. Type-only imports above are fine — TS erases them. +// +// Lazy-loading defers the crash from boot-time to sendQuery-time — but the +// crash still happens when Pi is actually used.
`ensurePiPackageDirShim()` +// (see below) fixes the *runtime* half: before any dynamic Pi import in +// sendQuery, write a stub package.json to tmpdir and point Pi at it via +// its own documented `PI_PACKAGE_DIR` escape hatch. + +/** + * Write a minimal package.json to a stable tmpdir and set `PI_PACKAGE_DIR` + * so Pi's `config.js` short-circuits its `dirname(process.execPath)` walk + * (which fails inside a compiled archon binary). Pi only reads three + * optional fields from that package.json — `piConfig.name`, `piConfig.configDir`, + * and `version` — so the stub is genuinely minimal. Idempotent: the file is + * only written once per host (existsSync check), and the env var is set on + * every call so multiple PiProvider instances stay consistent. + * + * Done on each sendQuery rather than at module load so (a) the file write + * is paid only when Pi is actually used, and (b) the env var can't get + * clobbered between registration and invocation. + */ +function ensurePiPackageDirShim(): void { + const shimDir = join(tmpdir(), 'archon-pi-shim'); + const shimPkgJson = join(shimDir, 'package.json'); + if (!existsSync(shimPkgJson)) { + mkdirSync(shimDir, { recursive: true }); + // `piConfig: {}` is explicit so Pi's defaults (`name: 'pi'`, + // `configDir: '.pi'`) kick in — matches Pi's standalone behavior. + writeFileSync( + shimPkgJson, + JSON.stringify({ + name: 'archon-pi-shim', + version: '0.0.0', + piConfig: {}, + }) + ); + } + process.env.PI_PACKAGE_DIR = shimDir; +} /** * Map Pi provider id → env var name used by pi-ai's getEnvApiKey(). @@ -53,24 +94,6 @@ function getLog(): ReturnType { return cachedLog; } -/** - * Typed wrapper around Pi's `getModel` for a runtime-string provider/model - * pair. Pi's getModel signature constrains `TModelId` to - * `keyof MODELS[TProvider]`, which isn't knowable from a runtime string — - * the local `GetModelFn` alias is the narrowest shape that still lets us - * bypass that constraint. 
Isolating the escape hatch behind one searchable - * name keeps it auditable. Takes `getModel` as a parameter because the Pi - * SDK is loaded dynamically (see the header comment on this file for why). - */ -type GetModelFn = (provider: string, modelId: string) => Model | undefined; -function lookupPiModel( - getModel: GetModelFn, - provider: string, - modelId: string -): Model | undefined { - return getModel(provider, modelId); -} - /** * Append a "respond with JSON matching this schema" instruction to the user * prompt so Pi-backed models produce parseable structured output. Pi's SDK @@ -98,15 +121,7 @@ ${JSON.stringify(schema, null, 2)}`; /** * Pi community provider — wraps `@mariozechner/pi-coding-agent`'s full * coding-agent harness. Each `sendQuery()` call creates a fresh session - * (no reuse) with in-memory auth/session/settings, so the server never - * touches `~/.pi/` and concurrent calls don't collide. - * - * Capabilities (see `capabilities.ts` for the canonical list): Pi declares - * `sessionResume`, `skills`, `toolRestrictions`, `structuredOutput`, - * `envInjection`, `effortControl`, and `thinkingControl`. Features Pi does - * not currently support through Archon (`mcp`, `hooks`, `agents`, - * `costControl`, `fallbackModel`, `sandbox`) stay off; the dag-executor - * surfaces a warning for any unsupported nodeConfig field. + * (no reuse) so concurrent calls don't collide. */ export class PiProvider implements IAgentProvider { async *sendQuery( @@ -115,6 +130,13 @@ export class PiProvider implements IAgentProvider { resumeSessionId?: string, requestOptions?: SendQueryOptions ): AsyncGenerator { + // Install the PI_PACKAGE_DIR shim BEFORE the dynamic imports below: Pi's + // config.js runs `readFileSync(getPackageJsonPath())` at its own module + // init, and getPackageJsonPath() checks process.env.PI_PACKAGE_DIR first. 
+ // Without this, the dynamic import below would crash with ENOENT on + // `dirname(process.execPath)/package.json` inside a compiled binary. + ensurePiPackageDirShim(); + // Lazy-load Pi SDK and all Pi-dependent helper modules here. Must not move // these imports to module scope — see the header comment for the failure // mode (archon compiled binary crashes at startup when Pi's config.js @@ -125,7 +147,6 @@ export class PiProvider implements IAgentProvider { // destructured PascalCase bindings trip eslint's naming-convention rule. const [ piCodingAgent, - piAi, { bridgeSession }, { resolvePiSkills, resolvePiThinkingLevel, resolvePiTools }, { createNoopResourceLoader }, @@ -133,7 +154,6 @@ export class PiProvider implements IAgentProvider { { createArchonUIBridge, createArchonUIContext }, ] = await Promise.all([ import('@mariozechner/pi-coding-agent'), - import('@mariozechner/pi-ai'), import('./event-bridge'), import('./options-translator'), import('./resource-loader'), @@ -178,39 +198,74 @@ export class PiProvider implements IAgentProvider { ); } - // 2. Look up the Model via Pi's static catalog. `lookupPiModel` returns - // undefined when not found; we guard explicitly below. - // Cast to the runtime-string-friendly shape — see `lookupPiModel`'s docblock. - const model = lookupPiModel(piAi.getModel as GetModelFn, parsed.provider, parsed.modelId); + // 2. Build AuthStorage + ModelRegistry. Both `create()` calls read from + // disk: AuthStorage reads ~/.pi/agent/auth.json (or + // $PI_CODING_AGENT_DIR/auth.json), and ModelRegistry reads + // ~/.pi/agent/models.json — the user's per-host config including + // custom models for local providers (LM Studio, ollama, llamacpp, + // custom OpenAI-compatible endpoints). Reads are synchronous and + // happen on every sendQuery; we don't cache because the user can + // edit either file between calls and expects pickup without restart + // (Pi's `/login` flow rewrites auth.json under a file lock). 
+ // ModelRegistry captures any models.json load/parse error in its + // internal loadError rather than throwing — surfaced below if the + // requested model is then not found. + let authStorage: ReturnType<typeof piCodingAgent.AuthStorage.create>; + let modelRegistry: ReturnType<typeof piCodingAgent.ModelRegistry.create>; + try { + authStorage = piCodingAgent.AuthStorage.create(); + modelRegistry = piCodingAgent.ModelRegistry.create(authStorage); + } catch (err) { + const e = err as Error; + getLog().error({ err: e, piProvider: parsed.provider }, 'pi.auth_storage_init_failed'); + throw new Error( + `Pi auth storage init failed: ${e.message}. Check that ~/.pi/agent/auth.json ` + + '(or $PI_CODING_AGENT_DIR/auth.json) is valid JSON and readable.' + ); + } + + // 3. Look up the model. find() returns undefined when not found; if + // models.json itself failed to load (e.g. a custom provider entry + // missing baseUrl/apiKey), surface the load error so users debugging + // custom-provider configs see the actual reason. + const model = modelRegistry.find(parsed.provider, parsed.modelId); if (!model) { + const loadError = modelRegistry.getError?.(); + const loadErrorHint = loadError + ? ` ~/.pi/agent/models.json failed to load: ${loadError}` + : ''; + getLog().error( + { + piProvider: parsed.provider, + modelId: parsed.modelId, + loadError: loadError ?? null, + }, + 'pi.model_not_found' + ); throw new Error( - `Pi model not found: provider='${parsed.provider}' model='${parsed.modelId}'. ` + + `Pi model not found: provider='${parsed.provider}' model='${parsed.modelId}'.${loadErrorHint} ` + 'See https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/models.generated.ts for the Pi model catalog.' ); } - // 3. Build AuthStorage.
`AuthStorage.create()` reads ~/.pi/agent/auth.json - // (or $PI_CODING_AGENT_DIR/auth.json), so any credential the user has - // populated via `pi` → `/login` (OAuth subscriptions: Claude Pro/Max, - // ChatGPT Plus, GitHub Copilot, Gemini CLI, Antigravity) or by editing - // the file directly (api_key entries) is picked up transparently. - // - // Per-request env vars override the file via setRuntimeApiKey — this - // mirrors Claude's process-env + request-env merge pattern and - // ensures codebase-scoped env vars (from .archon/config.yaml `env:`) - // win over the user's global Pi login. + // 4. Resolve credentials. authStorage already loaded ~/.pi/agent/auth.json + // so any creds populated via `pi` → `/login` (OAuth subscriptions: + // Claude Pro/Max, ChatGPT Plus, GitHub Copilot, Gemini CLI, + // Antigravity) or by hand-edited api_key entries are picked up + // transparently. Per-request env vars override via setRuntimeApiKey — + // mirrors Claude's process-env + request-env merge so codebase-scoped + // env vars (.archon/config.yaml `env:`) win over the user's global + // Pi login. // // Pi's internal resolution order: // 1. runtime override (our setRuntimeApiKey below) // 2. auth.json api_key entry // 3. auth.json oauth entry (auto-refreshes expired tokens) - // 4. env var fallback (Pi's getEnvApiKey, e.g. ANTHROPIC_API_KEY) + // 4. env var fallback (Pi's getEnvApiKey, e.g. ANTHROPIC_API_KEY) // // OAuth refresh note: Pi refreshes expired access tokens against the // provider's OAuth server and rewrites ~/.pi/agent/auth.json under a // file lock (same mechanism pi CLI uses — safe for concurrent access). - const authStorage = piCodingAgent.AuthStorage.create(); - const envVarName = PI_PROVIDER_ENV_VARS[parsed.provider]; const envOverride = envVarName ? (requestOptions?.env?.[envVarName] ?? 
process.env[envVarName]) @@ -219,16 +274,28 @@ export class PiProvider implements IAgentProvider { authStorage.setRuntimeApiKey(parsed.provider, envOverride); } - // Fail-fast: resolve creds synchronously before spinning up a session. - // Matches Claude's auth-error fast-fail pattern (no retry on auth failures). const resolvedKey = await authStorage.getApiKey(parsed.provider); if (!resolvedKey) { - const envHint = envVarName - ? `Set ${envVarName} in the environment or codebase env vars (.archon/config.yaml env: section).` - : `Provider '${parsed.provider}' is not in the Archon adapter's env-var table — file an issue if you want a shortcut env var for it.`; - const loginHint = `Or run \`pi\` and type \`/login\` locally to authenticate '${parsed.provider}' via OAuth; credentials land in ~/.pi/agent/auth.json and are picked up automatically.`; - throw new Error( - `Pi auth: no credentials for provider '${parsed.provider}'. ${envHint} ${loginHint}` + if (envVarName) { + const envHint = `Set ${envVarName} in the environment or codebase env vars (.archon/config.yaml env: section).`; + const loginHint = `Or run \`pi\` and type \`/login\` locally to authenticate '${parsed.provider}' via OAuth; credentials land in ~/.pi/agent/auth.json and are picked up automatically.`; + throw new Error( + `Pi auth: no credentials for provider '${parsed.provider}'. ${envHint} ${loginHint}` + ); + } + + // Unmapped providers (LM Studio, ollama, llamacpp, custom + // OpenAI-compatible endpoints) often don't need credentials at all — + // log + continue rather than failing fast so local models work without + // ceremony. If the SDK call later fails for a provider that *does* + // need creds, the auth_missing breadcrumb is searchable in the log. 
+ getLog().info( + { + piProvider: parsed.provider, + envHint: `Provider '${parsed.provider}' is not in the Archon adapter's env-var table — file an issue if you want a shortcut env var for it.`, + loginHint: `Or run \`pi\` and type \`/login\` locally to authenticate '${parsed.provider}' via OAuth; credentials land in ~/.pi/agent/auth.json and are picked up automatically.`, + }, + 'pi.auth_missing' ); } @@ -294,13 +361,11 @@ export class PiProvider implements IAgentProvider { }; } - // ModelRegistry + settings stay in-memory — only sessions persist, to - // match Claude/Codex. Resource loader still suppresses filesystem - // discovery by default, except for explicitly-passed skill paths and — - // when piConfig.enableExtensions is true — Pi's community extension - // ecosystem (tools + lifecycle hooks from ~/.pi/agent/extensions/ and - // packages installed via `pi install npm:`). - const modelRegistry = piCodingAgent.ModelRegistry.inMemory(authStorage); + // Settings stay in-memory — only sessions persist, to match Claude/Codex. + // Resource loader still suppresses filesystem discovery except for explicitly-passed + // skill paths and — when piConfig.enableExtensions is true — Pi's community + // extension ecosystem (tools + lifecycle hooks from ~/.pi/agent/extensions/ + // and packages installed via `pi install npm:`). const settingsManager = piCodingAgent.SettingsManager.inMemory(); // Default ON: extensions (community packages like @plannotator/pi-extension // or your own local ones) are a core reason users run Pi.
Opt out with diff --git a/packages/server/src/index.ts b/packages/server/src/index.ts index c1c76cf549..ee14cfef5b 100644 --- a/packages/server/src/index.ts +++ b/packages/server/src/index.ts @@ -385,8 +385,24 @@ export async function startServer(opts: ServerOptions = {}): Promise { .catch(createMessageErrorHandler('Discord', discordAdapter, conversationId)); }); - await discord.start(); - activePlatforms.push('Discord'); + // Don't let a Discord login failure (bad token, missing privileged + // intents, etc.) bring down the whole server — users running + // `archon serve` for the web UI shouldn't lose it because of an + // unrelated bot misconfiguration. See #1365. + try { + await discord.start(); + activePlatforms.push('Discord'); + } catch (error) { + const err = error as Error; + const isPrivilegedIntentError = err.message?.includes('disallowed intents'); + const hint = isPrivilegedIntentError + ? 'Enable "Message Content Intent" in the Discord Developer Portal ' + + '(your application > Bot > Privileged Gateway Intents) and restart, ' + + 'or unset DISCORD_BOT_TOKEN if you do not want the Discord adapter.' + : 'Verify DISCORD_BOT_TOKEN is valid, or unset it to disable the Discord adapter.'; + getLog().error({ err, hint }, 'discord.start_failed_continuing_without_adapter'); + discord = null; + } } else { getLog().info('discord_adapter_skipped'); } diff --git a/packages/web/src/components/workflows/WorkflowCard.tsx b/packages/web/src/components/workflows/WorkflowCard.tsx index 10ed0cd23e..b2a6fc8218 100644 --- a/packages/web/src/components/workflows/WorkflowCard.tsx +++ b/packages/web/src/components/workflows/WorkflowCard.tsx @@ -55,7 +55,7 @@ export function WorkflowCard({ const parsed = parseWorkflowDescription(workflow.description ?? ''); const displayName = getWorkflowDisplayName(workflow.name); const category = getWorkflowCategory(workflow.name, workflow.description ?? 
''); - const tags = getWorkflowTags(workflow.name, parsed); + const tags = getWorkflowTags(workflow.name, parsed, workflow.tags); const iconName = getWorkflowIconName(workflow.name, category); const CARD_ICON = ICON_MAP[iconName]; diff --git a/packages/web/src/lib/api.generated.d.ts b/packages/web/src/lib/api.generated.d.ts index 68b4d0a02f..2abcd56361 100644 --- a/packages/web/src/lib/api.generated.d.ts +++ b/packages/web/src/lib/api.generated.d.ts @@ -2345,6 +2345,10 @@ export interface components { args?: string[]; }; }; + worktree?: { + enabled?: boolean; + }; + tags?: string[]; nodes: components['schemas']['DagNode'][]; }; /** @enum {string} */ @@ -2561,6 +2565,7 @@ export interface components { runningWorkflows: number; version?: string; is_docker: boolean; + activePlatforms?: string[]; }; UpdateCheckResponse: { updateAvailable: boolean; diff --git a/packages/web/src/lib/workflow-metadata.test.ts b/packages/web/src/lib/workflow-metadata.test.ts index 18af743267..87fd8bb2c9 100644 --- a/packages/web/src/lib/workflow-metadata.test.ts +++ b/packages/web/src/lib/workflow-metadata.test.ts @@ -200,6 +200,31 @@ describe('getWorkflowTags', () => { const githubCount = tags.filter(t => t === 'GitHub').length; expect(githubCount).toBeLessThanOrEqual(1); }); + + test('uses explicit tags when provided', () => { + const parsed = parseWorkflowDescription('A GitLab workflow'); + const tags = getWorkflowTags('review-gitlab-mr', parsed, ['GitLab', 'Review']); + expect(tags).toEqual(['GitLab', 'Review']); + }); + + test('falls back to inference when no explicit tags', () => { + const parsed = parseWorkflowDescription('Does: review PR on GitHub'); + const tags = getWorkflowTags('archon-pr-review', parsed, undefined); + expect(tags).toContain('GitHub'); + expect(tags).toContain('Review'); + }); + + test('deduplicates explicit tags', () => { + const parsed = parseWorkflowDescription('anything'); + const tags = getWorkflowTags('test', parsed, ['GitLab', 'GitLab', 'Review']); + 
expect(tags).toEqual(['GitLab', 'Review']); + }); + + test('explicit empty array suppresses inference', () => { + const parsed = parseWorkflowDescription('Does: review PR on GitHub'); + const tags = getWorkflowTags('archon-pr-review', parsed, []); + expect(tags).toEqual([]); + }); }); describe('getWorkflowIconName', () => { diff --git a/packages/web/src/lib/workflow-metadata.ts b/packages/web/src/lib/workflow-metadata.ts index e3ab01191d..14ccb43e3e 100644 --- a/packages/web/src/lib/workflow-metadata.ts +++ b/packages/web/src/lib/workflow-metadata.ts @@ -163,8 +163,18 @@ export function getWorkflowCategory(name: string, description: string): Workflow /** * Derive tags from the workflow name and parsed description. + * If `explicitTags` is provided (including an empty array), those are used + * verbatim (deduplicated) and inference is skipped. */ -export function getWorkflowTags(name: string, parsed: ParsedDescription): string[] { +export function getWorkflowTags( + name: string, + parsed: ParsedDescription, + explicitTags?: string[] +): string[] { + if (explicitTags !== undefined) { + return [...new Set(explicitTags)]; + } + const tags: string[] = []; const text = `${name} ${parsed.raw}`.toLowerCase(); diff --git a/packages/workflows/src/dag-executor.test.ts b/packages/workflows/src/dag-executor.test.ts index b4717e9565..81f258132f 100644 --- a/packages/workflows/src/dag-executor.test.ts +++ b/packages/workflows/src/dag-executor.test.ts @@ -1243,6 +1243,51 @@ describe('executeDagWorkflow -- bash nodes', () => { execSpy.mockRestore(); }); + it('does not start or complete a workflow that is already paused before a bash node executes', async () => { + const store = createMockStore(); + (store.getWorkflowRunStatus as Mock<() => Promise<string>>).mockResolvedValue('paused'); + const mockDeps = createMockDeps(store); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun('bash-paused-run-id'); + const execSpy = spyOn(git,
'execFileAsync').mockResolvedValue({ + stdout: 'should-not-run\n', + stderr: '', + }); + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-bash-paused', + testDir, + { name: 'bash-paused-test', nodes: [{ id: 'stats', bash: 'echo blocked' }] }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + expect(execSpy).not.toHaveBeenCalled(); + interface EventTypeOnly { + event_type: string; + } + const eventTypes = ( + store.createWorkflowEvent as Mock<(event: EventTypeOnly) => Promise<void>> + ).mock.calls.map((call: [EventTypeOnly]) => call[0].event_type); + expect(eventTypes).not.toContain('node_started'); + expect(eventTypes).not.toContain('node_failed'); + expect(eventTypes).not.toContain('run_failed'); + expect(eventTypes).not.toContain('run_cancelled'); + expect((store.completeWorkflowRun as Mock<() => Promise<void>>).mock.calls.length).toBe(0); + expect((store.failWorkflowRun as Mock<() => Promise<void>>).mock.calls.length).toBe(0); + expect((store.cancelWorkflowRun as Mock<() => Promise<void>>).mock.calls.length).toBe(0); + + execSpy.mockRestore(); + }); + it('bash node output with shell metacharacters does not inject into downstream bash script', async () => { + const mockDeps = createMockDeps(); + const platform = createMockPlatform(); @@ -3140,6 +3185,266 @@ describe('executeDagWorkflow -- resume with priorCompletedNodes', () => { expect(mockSendQueryDag.mock.calls.length).toBe(3); }); + it('substitutes $LOOP_PREV_OUTPUT with previous iteration output (empty on iter 1)', async () => { + // Iteration 1 emits a distinctive output, iteration 2 emits the completion signal. + // We then assert the prompt sent to the AI: iteration 1 strips $LOOP_PREV_OUTPUT + // to empty, iteration 2 receives iteration 1's cleaned output.
+ let callCount = 0; + mockSendQueryDag.mockImplementation(function* () { + callCount++; + if (callCount === 1) { + yield { type: 'assistant', content: 'Iter1 output: 2 type errors in users.ts' }; + yield { type: 'result', sessionId: 'loop-session-1' }; + } else { + yield { type: 'assistant', content: 'All fixed. COMPLETE' }; + yield { type: 'result', sessionId: 'loop-session-2' }; + } + }); + + const mockDeps = createMockDeps(); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun(); + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-dag', + testDir, + { + name: 'dag-loop-prev-output', + nodes: [ + { + id: 'fix-loop', + loop: { + prompt: 'Previous output: <<$LOOP_PREV_OUTPUT>>. Fix and emit COMPLETE.', + until: 'COMPLETE', + max_iterations: 5, + fresh_context: true, + }, + }, + ], + }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + expect(mockSendQueryDag.mock.calls.length).toBe(2); + const promptIter1 = mockSendQueryDag.mock.calls[0][0] as string; + const promptIter2 = mockSendQueryDag.mock.calls[1][0] as string; + // Iteration 1: $LOOP_PREV_OUTPUT substitutes to empty string. + expect(promptIter1).toContain('Previous output: <<>>.'); + // Iteration 2: receives iteration 1's cleaned output. + expect(promptIter2).toContain( + 'Previous output: <<Iter1 output: 2 type errors in users.ts>>.' + ); + }); + + it('strips tags from $LOOP_PREV_OUTPUT (uses cleaned output)', async () => { + let callCount = 0; + mockSendQueryDag.mockImplementation(function* () { + callCount++; + if (callCount === 1) { + // Iteration 1 includes a non-completion XML tag in its output. The cleaned + // output (after stripCompletionTags) drops ... blocks. + // We use a non-matching signal here so iteration 1 does NOT complete. + yield { + type: 'assistant', + content: 'Real work output.
NOT_DONE_YET', + }; + yield { type: 'result', sessionId: 'loop-session-1' }; + } else { + yield { type: 'assistant', content: 'Done. COMPLETE' }; + yield { type: 'result', sessionId: 'loop-session-2' }; + } + }); + + const mockDeps = createMockDeps(); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun(); + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-dag', + testDir, + { + name: 'dag-loop-prev-clean', + nodes: [ + { + id: 'fix-loop', + loop: { + prompt: 'PREV=[$LOOP_PREV_OUTPUT]', + until: 'COMPLETE', + max_iterations: 5, + fresh_context: true, + }, + }, + ], + }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + expect(mockSendQueryDag.mock.calls.length).toBe(2); + const promptIter2 = mockSendQueryDag.mock.calls[1][0] as string; + // The previous-output payload must be the *cleaned* output — no tags. + expect(promptIter2).toContain('PREV=[Real work output.'); + expect(promptIter2).not.toContain(''); + }); + + it('$LOOP_PREV_OUTPUT is empty on the first iteration after interactive resume', async () => { + // Regression guard for the resume-from-approval path: when an interactive + // loop pauses at the approval gate, the prior `lastIterationOutput` lives + // in a separate process and is not persisted. On resume, the executor must + // substitute $LOOP_PREV_OUTPUT to '' on the first resumed iteration — + // never to whatever the paused run produced. + // + // Wirasm-suggested shape (PR #1367 review): two executeDagWorkflow calls. + // The first call pauses at the gate after iteration 1; the second call + // resumes with metadata.approval populated and runs iteration 2. 
+ + // ---- Call 1: fresh run, iteration 1 emits no completion → pauses at gate + mockSendQueryDag.mockImplementationOnce(function* () { + yield { type: 'assistant', content: 'Iter1 output: 2 type errors in users.ts' }; + yield { type: 'result', sessionId: 'loop-session-1' }; + }); + const mockDeps1 = createMockDeps(); + const platform1 = createMockPlatform(); + const freshRun = makeWorkflowRun('resume-prev-fresh-run'); + + await executeDagWorkflow( + mockDeps1, + platform1, + 'conv-dag', + testDir, + { + name: 'interactive-loop-resume-prev-output', + nodes: [ + { + id: 'refine', + loop: { + prompt: + 'User: $LOOP_USER_INPUT. PREV=<<$LOOP_PREV_OUTPUT>>. Continue or emit COMPLETE.', + until: 'COMPLETE', + max_iterations: 10, + interactive: true, + gate_message: 'Review and provide feedback.', + }, + }, + ], + }, + freshRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + // First iteration of a fresh interactive loop: $LOOP_PREV_OUTPUT empty; + // $LOOP_USER_INPUT empty (no user has spoken yet). + expect(mockSendQueryDag.mock.calls.length).toBe(1); + const promptIter1 = mockSendQueryDag.mock.calls[0][0] as string; + expect(promptIter1).toContain('PREV=<<>>.'); + expect(promptIter1).toContain('User: .'); + // Fresh interactive loop must pause at the gate, not return early. + const pauseCalls1 = ( + mockDeps1.store.pauseWorkflowRun as Mock< + (id: string, ctx: Record<string, unknown>) => Promise<void> + > + ).mock.calls; + expect(pauseCalls1.length).toBe(1); + expect(pauseCalls1[0][1]).toMatchObject({ + type: 'interactive_loop', + nodeId: 'refine', + iteration: 1, + }); + + // ---- Call 2: resumed run — metadata carries iter 1 + user input. + // iter 2 emits the completion signal so the loop exits cleanly. + mockSendQueryDag.mockImplementationOnce(function* () { + yield { type: 'assistant', content: 'All clear. 
COMPLETE' }; + yield { type: 'result', sessionId: 'loop-session-2' }; + }); + const mockDeps2 = createMockDeps(); + const platform2 = createMockPlatform(); + const resumedRun = makeWorkflowRun('resume-prev-resume-run', { + metadata: { + approval: { + type: 'interactive_loop', + nodeId: 'refine', + iteration: 1, + sessionId: 'loop-session-1', + message: 'Review and provide feedback.', + }, + loop_user_input: 'looks good, ship it', + }, + }); + + await executeDagWorkflow( + mockDeps2, + platform2, + 'conv-dag', + testDir, + { + name: 'interactive-loop-resume-prev-output', + nodes: [ + { + id: 'refine', + loop: { + prompt: + 'User: $LOOP_USER_INPUT. PREV=<<$LOOP_PREV_OUTPUT>>. Continue or emit COMPLETE.', + until: 'COMPLETE', + max_iterations: 10, + interactive: true, + gate_message: 'Review and provide feedback.', + }, + }, + ], + }, + resumedRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + // Second executeDagWorkflow call started a fresh sendQuery generator (mock + // call index 1 across the two runs). The resumed iteration must NOT carry + // the prior process's iter-1 output through $LOOP_PREV_OUTPUT — it must + // substitute to ''. + expect(mockSendQueryDag.mock.calls.length).toBe(2); + const promptResumeIter = mockSendQueryDag.mock.calls[1][0] as string; + expect(promptResumeIter).toContain('PREV=<<>>.'); + expect(promptResumeIter).not.toContain('Iter1 output: 2 type errors'); + // The resume's user input flows through on the first resumed iteration. + expect(promptResumeIter).toContain('User: looks good, ship it.'); + // Resume call exits via completion, not via a second pause at the gate. 
+ const pauseCalls2 = ( + mockDeps2.store.pauseWorkflowRun as Mock< + (id: string, ctx: Record<string, unknown>) => Promise<void> + > + ).mock.calls; + expect(pauseCalls2.length).toBe(0); + }); + it('fails when max_iterations exceeded', async () => { mockSendQueryDag.mockImplementation(function* () { yield { type: 'assistant', content: 'Still working...' }; @@ -4674,6 +4979,76 @@ describe('executeDagWorkflow -- approval node', () => { expect(pauseCalls.length).toBe(1); }); + it('on_reject does not write node_completed for the approval gate node ID', async () => { + mockSendQueryDag.mockImplementation(function* () { + yield { type: 'assistant', content: 'Fixed based on feedback' }; + yield { type: 'result', sessionId: 'reject-no-poison-session' }; + }); + + const store = createMockStore(); + const mockDeps = createMockDeps(store); + const platform = createMockPlatform(); + + const workflowRun = makeWorkflowRun('reject-no-poison-run', { + metadata: { + approval: { + type: 'approval', + nodeId: 'review', + message: 'Approve this plan?', + onRejectPrompt: 'Fix based on: $REJECTION_REASON', + onRejectMaxAttempts: 3, + }, + rejection_reason: 'Missing edge case handling', + rejection_count: 1, + }, + }); + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-approval', + testDir, + { + name: 'approval-no-poison', + nodes: [ + { + id: 'review', + approval: { + message: 'Approve this plan?', + on_reject: { prompt: 'Fix based on: $REJECTION_REASON', max_attempts: 3 }, + }, + }, + ], + }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + // The on_reject synthetic node must NOT produce a node_completed event with + // step_name equal to the approval gate's own ID ('review'). If it did, a + // subsequent resume would find the event via getCompletedDagNodeOutputs and + // skip the approval gate entirely, bypassing the human gate. 
+ const eventCalls = (store.createWorkflowEvent as ReturnType<typeof vi.fn>).mock.calls; + const nodeCompletedEvents = eventCalls.filter( + (call: unknown[]) => (call[0] as Record<string, unknown>).event_type === 'node_completed' + ); + const completedStepNames = nodeCompletedEvents.map( + (call: unknown[]) => (call[0] as Record<string, unknown>).step_name + ); + expect(completedStepNames).not.toContain('review'); + + // The synthetic on_reject node MUST produce a node_completed event with the + // distinct ID 'review:on_reject'. This ensures the synthetic node itself is + // recorded as completed so it is not re-run on a subsequent resume. + expect(completedStepNames.filter((n: unknown) => n === 'review:on_reject').length).toBe(1); + }); + it('on_reject cancels when max_attempts exhausted', async () => { const store = createMockStore(); const mockDeps = createMockDeps(store); @@ -4787,6 +5162,112 @@ describe('executeDagWorkflow -- approval node', () => { 1 ); }); + + it('approval message substitutes $nodeId.output.field references from upstream structured output', async () => { + // Repro for: approval gates were rendering literal "$gather-context.output.repo_name" + // instead of resolved values, breaking interactive workflows like atlas-onboard. + // Parity: prompt/bash/loop/cancel nodes already get substituteNodeOutputRefs; + // approval.message must too so the human sees concrete values. 
+ const structuredJson = { + repo_name: 'hcr-els', + app_code: 'CCELS', + frontend_port: 3012, + }; + + const commandsDir = join(testDir, '.archon', 'commands'); + await mkdir(commandsDir, { recursive: true }); + await writeFile(join(commandsDir, 'gather-context.md'), 'Gather context: $USER_MESSAGE'); + + mockSendQueryDag.mockImplementation(function* () { + yield { type: 'assistant', content: JSON.stringify(structuredJson) }; + yield { type: 'result', sessionId: 'sid-approval-sub', structuredOutput: structuredJson }; + }); + + const store = createMockStore(); + const mockDeps = createMockDeps(store); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun('approval-sub-run'); + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-approval-sub', + testDir, + { + name: 'approval-sub-test', + nodes: [ + { + id: 'gather-context', + command: 'gather-context', + output_format: { + type: 'object', + properties: { + repo_name: { type: 'string' }, + app_code: { type: 'string' }, + frontend_port: { type: 'number' }, + }, + }, + }, + { + id: 'confirm', + depends_on: ['gather-context'], + approval: { + message: + 'Repo: $gather-context.output.repo_name | App: $gather-context.output.app_code | Port: $gather-context.output.frontend_port', + }, + }, + ], + }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + // gather-context AI call ran once; approval node does NOT call AI + expect(mockSendQueryDag.mock.calls.length).toBe(1); + + // pauseWorkflowRun should receive the SUBSTITUTED message, not the literal placeholders + const pauseCalls = ( + store.pauseWorkflowRun as Mock<(id: string, ctx: Record<string, unknown>) => Promise<void>> + ).mock.calls; + expect(pauseCalls.length).toBe(1); + expect(pauseCalls[0][1]).toMatchObject({ + type: 'approval', + nodeId: 'confirm', + message: 'Repo: hcr-els | App: CCELS | Port: 3012', + }); + + // The fix touches FOUR emission sites 
(safeSendMessage / createWorkflowEvent / + // pauseWorkflowRun / event-emitter). Assert the other two reachable surfaces too — + // a future regression at any one of them would otherwise pass this test silently. + // (Per CodeRabbit review of PR coleam00/Archon#1426.) + + // (a) The chat-surface prompt emitted via platform.sendMessage must contain the + // substituted message and must NOT contain literal $gather-context.output refs. + const sentMessages = ( + platform.sendMessage as Mock<(...args: unknown[]) => Promise<void>> + ).mock.calls.map((c: unknown[]) => c[1] as string); + expect(sentMessages.some(m => m.includes('Repo: hcr-els | App: CCELS | Port: 3012'))).toBe( + true + ); + expect(sentMessages.some(m => m.includes('$gather-context.output'))).toBe(false); + + // (b) The persisted approval_requested workflow event's data.message must be substituted. + const approvalRequestedEvents = ( + store.createWorkflowEvent as Mock<() => Promise<void>> + ).mock.calls.filter( + (c: unknown[]) => (c[0] as { event_type: string }).event_type === 'approval_requested' + ); + expect(approvalRequestedEvents.length).toBe(1); + expect((approvalRequestedEvents[0][0] as { data: { message: string } }).data.message).toBe( + 'Repo: hcr-els | App: CCELS | Port: 3012' + ); + }); }); describe('executeDagWorkflow -- env var injection', () => { let testDir: string; @@ -6079,3 +6560,161 @@ describe('shouldContinueStreamingForStatus', () => { expect(shouldContinueStreamingForStatus('invalid-status')).toBe(false); }); }); + +describe('executeDagWorkflow -- final status derivation', () => { + // Invariant: if ANY non-skipped node has failed status, the run must be + // marked 'failed' — never 'completed' — regardless of how many other nodes + // succeeded. This covers the anyFailed branch in executeDagWorkflow + // (dag-executor.ts ~line 2956), which had no direct test coverage. 
+ let testDir: string; + + beforeEach(async () => { + testDir = join( + tmpdir(), + `dag-status-test-${Date.now()}-${Math.random().toString(36).slice(2)}` + ); + await mkdir(testDir, { recursive: true }); + + mockSendQueryDag.mockClear(); + mockGetAgentProviderDag.mockClear(); + mockSendQueryDag.mockImplementation(function* () { + yield { type: 'assistant', content: 'DAG AI response' }; + yield { type: 'result', sessionId: 'dag-session-id' }; + }); + mockGetAgentProviderDag.mockImplementation(() => ({ + sendQuery: mockSendQueryDag, + getType: () => 'claude', + getCapabilities: mockClaudeCapabilities, + })); + }); + + afterEach(async () => { + try { + await rm(testDir, { recursive: true, force: true }); + } catch { + // ignore cleanup errors + } + }); + + it('one success + one independent failure -> failWorkflowRun, not completeWorkflowRun', async () => { + const mockStore = createMockStore(); + const mockDeps = createMockDeps(mockStore); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun('dag-status-run-1'); + + const nodes: DagNode[] = [ + { id: 'pass', bash: 'echo ok' } as BashNode, + { id: 'fail', bash: 'exit 1' } as BashNode, + ]; + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-status', + testDir, + { name: 'status-test', nodes }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + expect((mockStore.failWorkflowRun as ReturnType<typeof vi.fn>).mock.calls.length).toBe(1); + expect((mockStore.completeWorkflowRun as ReturnType<typeof vi.fn>).mock.calls.length).toBe(0); + expect(mockStore.failWorkflowRun).toHaveBeenCalledWith( + expect.anything(), + expect.stringContaining('fail') + ); + + // Confirm the failure message names the failing node + const sendMessage = platform.sendMessage as ReturnType<typeof vi.fn>; + const messages = sendMessage.mock.calls.map((call: unknown[]) => call[1] as string); + const failMsg = messages.find((m: string) => m.includes('completed with 
failures')); + expect(failMsg).toBeDefined(); + }); + + it('multiple successes + one failure -> failWorkflowRun, not completeWorkflowRun', async () => { + const mockStore = createMockStore(); + const mockDeps = createMockDeps(mockStore); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun('dag-status-run-2'); + + const nodes: DagNode[] = [ + { id: 'a', bash: 'echo a' } as BashNode, + { id: 'b', bash: 'echo b' } as BashNode, + { id: 'c', bash: 'echo c' } as BashNode, + { id: 'fail', bash: 'exit 1' } as BashNode, + ]; + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-status', + testDir, + { name: 'status-test-multi', nodes }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + expect((mockStore.failWorkflowRun as ReturnType<typeof vi.fn>).mock.calls.length).toBe(1); + expect((mockStore.completeWorkflowRun as ReturnType<typeof vi.fn>).mock.calls.length).toBe(0); + expect(mockStore.failWorkflowRun).toHaveBeenCalledWith( + expect.anything(), + expect.stringContaining('fail') + ); + + const sendMessage = platform.sendMessage as ReturnType<typeof vi.fn>; + const messages = sendMessage.mock.calls.map((call: unknown[]) => call[1] as string); + const failMsg = messages.find((m: string) => m.includes('completed with failures')); + expect(failMsg).toBeDefined(); + }); + + it('trigger_rule: none_failed skips dependent node + anyFailed still marks run failed', async () => { + const mockStore = createMockStore(); + const mockDeps = createMockDeps(mockStore); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun('dag-status-run-3'); + + // Layer 1: A and B run in parallel. B fails. + // Layer 2: C depends on B with trigger_rule: none_failed — so C is skipped. + // Expected: anyFailed=true (from B), so run must be marked failed even though C is only skipped. 
+ const nodes: DagNode[] = [ + { id: 'a', bash: 'echo a' } as BashNode, + { id: 'b', bash: 'exit 1' } as BashNode, + { id: 'c', bash: 'echo c', depends_on: ['b'], trigger_rule: 'none_failed' } as BashNode, + ]; + + await executeDagWorkflow( + mockDeps, + platform, + 'conv-status', + testDir, + { name: 'status-test-skip', nodes }, + workflowRun, + 'claude', + undefined, + join(testDir, 'artifacts'), + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ); + + expect((mockStore.failWorkflowRun as ReturnType<typeof vi.fn>).mock.calls.length).toBe(1); + expect((mockStore.completeWorkflowRun as ReturnType<typeof vi.fn>).mock.calls.length).toBe(0); + expect(mockStore.failWorkflowRun).toHaveBeenCalledWith( + expect.anything(), + expect.stringContaining('b') + ); + }); +}); diff --git a/packages/workflows/src/dag-executor.ts b/packages/workflows/src/dag-executor.ts index 419a9066f6..0f6f4b8063 100644 --- a/packages/workflows/src/dag-executor.ts +++ b/packages/workflows/src/dag-executor.ts @@ -1223,6 +1223,66 @@ async function executeBashNode( const nodeStartTime = Date.now(); const nodeContext: SendMessageContext = { workflowId: workflowRun.id, nodeName: node.id }; + // Pre-execution cancel check — avoid starting work if workflow is already + // cancelled. Fail closed: if the status probe itself errors, fail the node + // rather than spawning a subprocess we can't tear down on cancel. + try { + const preStatus = await deps.store.getWorkflowRunStatus(workflowRun.id); + if (preStatus !== 'running') { + const effectiveStatus = preStatus ?? 'deleted'; + getLog().info( + { nodeId: node.id, status: effectiveStatus }, + 'bash_node.cancelled_before_exec' + ); + // Return `skipped`, not `completed`: a sibling that paused the run (or + // an external cancel) shouldn't make this node satisfy `all_success` + // for downstream nodes. `skipped` propagates correctly through + // checkTriggerRule and avoids handing downstream a synthetic output. 
+ return { + state: 'skipped', + output: `Bash node '${node.id}' skipped: workflow ${effectiveStatus}`, + }; + } + } catch (err) { + const probeErr = err as Error; + getLog().error( + { err: probeErr, workflowRunId: workflowRun.id, nodeId: node.id }, + 'bash_node.status_check_failed' + ); + // Fail closed but mark the node as `failed`, not `completed` — a transient + // status-probe failure must not silently turn skipped work into a synthetic + // success that downstream nodes can consume. Emit/persist a node_failed + // event before returning so consumers that derive DAG state from + // node_started → node_completed/failed transitions can attribute the + // workflow failure to this node instead of seeing it disappear. + const probeFailMsg = `Bash node '${node.id}' skipped: workflow status probe failed (${probeErr.message})`; + deps.store + .createWorkflowEvent({ + workflow_run_id: workflowRun.id, + event_type: 'node_failed', + step_name: node.id, + data: { error: probeFailMsg, type: 'bash', reason: 'status_probe_failed' }, + }) + .catch((dbErr: Error) => { + getLog().error( + { err: dbErr, workflowRunId: workflowRun.id, eventType: 'node_failed' }, + 'workflow_event_persist_failed' + ); + }); + getWorkflowEventEmitter().emit({ + type: 'node_failed', + runId: workflowRun.id, + nodeId: node.id, + nodeName: node.id, + error: probeFailMsg, + }); + return { + state: 'failed', + output: '', + error: probeFailMsg, + }; + } + getLog().info({ nodeId: node.id, type: 'bash' }, 'dag_node_started'); await logNodeStart(logDir, workflowRun.id, node.id, ''); @@ -1261,8 +1321,13 @@ async function executeBashNode( const finalScript = substituteNodeOutputRefs(substitutedScript, nodeOutputs, true); const timeout = node.timeout ?? SUBPROCESS_DEFAULT_TIMEOUT; - const subprocessEnv = - envVars && Object.keys(envVars).length > 0 ? 
{ ...process.env, ...envVars } : undefined; + const subprocessEnv: NodeJS.ProcessEnv = { + ...process.env, + ARTIFACTS_DIR: artifactsDir, + LOG_DIR: logDir, + BASE_BRANCH: baseBranch, + ...(envVars ?? {}), + }; try { const { stdout, stderr } = await execFileAsync('bash', ['-c', finalScript], { @@ -1377,6 +1442,63 @@ async function executeScriptNode( const nodeStartTime = Date.now(); const nodeContext: SendMessageContext = { workflowId: workflowRun.id, nodeName: node.id }; + // Pre-execution cancel check — avoid starting work if workflow is already + // cancelled. Fail closed: if the status probe itself errors, fail the node + // rather than spawning a subprocess we can't tear down on cancel. + try { + const preStatus = await deps.store.getWorkflowRunStatus(workflowRun.id); + if (preStatus !== 'running') { + const effectiveStatus = preStatus ?? 'deleted'; + getLog().info( + { nodeId: node.id, status: effectiveStatus }, + 'script_node.cancelled_before_exec' + ); + // Return `skipped`, not `completed`: see executeBashNode for rationale. + return { + state: 'skipped', + output: `Script node '${node.id}' skipped: workflow ${effectiveStatus}`, + }; + } + } catch (err) { + const probeErr = err as Error; + getLog().error( + { err: probeErr, workflowRunId: workflowRun.id, nodeId: node.id }, + 'script_node.status_check_failed' + ); + // Fail closed but mark the node as `failed`, not `completed` — a transient + // status-probe failure must not silently turn skipped work into a synthetic + // success that downstream nodes can consume. Emit/persist a node_failed + // event before returning so consumers that derive DAG state from + // node_started → node_completed/failed transitions can attribute the + // workflow failure to this node instead of seeing it disappear. 
+ const probeFailMsg = `Script node '${node.id}' skipped: workflow status probe failed (${probeErr.message})`; + deps.store + .createWorkflowEvent({ + workflow_run_id: workflowRun.id, + event_type: 'node_failed', + step_name: node.id, + data: { error: probeFailMsg, type: 'script', reason: 'status_probe_failed' }, + }) + .catch((dbErr: Error) => { + getLog().error( + { err: dbErr, workflowRunId: workflowRun.id, eventType: 'node_failed' }, + 'workflow_event_persist_failed' + ); + }); + getWorkflowEventEmitter().emit({ + type: 'node_failed', + runId: workflowRun.id, + nodeId: node.id, + nodeName: node.id, + error: probeFailMsg, + }); + return { + state: 'failed', + output: '', + error: probeFailMsg, + }; + } + getLog().info({ nodeId: node.id, type: 'script', runtime: node.runtime }, 'dag_node_started'); await logNodeStart(logDir, workflowRun.id, node.id, '