Release 0.4.0 by prospapledge88 · Pull Request #2 · prospapledge88/Archon

prospapledge88 · 2026-04-14T03:40:04Z

Release 0.4.0

Six harness-engineering improvements inspired by Cole Medin's "Full Archon Guide"
livestream — prompt injection defense, cost analytics, scheduled workflow triggers,
cross-run project knowledge, a dark-factory reference workflow, and workflow health
metrics. Includes three rounds of peer-review fixes from independent code reviews.

Added

Prompt injection defense for workflow inputs: two-layer defense for untrusted external content flowing into workflow prompts via $CONTEXT, $ISSUE_CONTEXT, and $EXTERNAL_CONTEXT. Layer 1 strips known injection patterns (LLM role markers, Anthropic turn delimiters, instruction overrides, trust-boundary breakers). Layer 2 wraps the sanitized content in an XML trust boundary.
Cost analytics API and dashboard: new GET /api/analytics/costs endpoint returning total spend, per-workflow cost breakdown, daily buckets, and success/failure cost splits. CostSummaryCard on the dashboard shows total spend, top 3 workflows by cost, and success vs. failure cost.
Scheduled workflow triggers: new schedules: configuration in per-repo .archon/config.yaml with standard 5-field cron expressions. The scheduler runs on a 60-second tick and dispatches workflows via a dedicated worktree per run. Lightweight cron parser — no external dependencies.
Cross-run project knowledge: every workflow run contributes a deterministic summary entry to .archon/knowledge/run-history.md (capped at 50 entries). Workflow prompts can inject prior run history via the new $PROJECT_KNOWLEDGE variable.
Dark-factory reference workflow: new bundled archon-dark-factory YAML demonstrating autonomous GitHub issue processing with label gating (archon:auto), plan/implement/validate/PR flow, and explicit success/failure handling.
Workflow health metrics on the dashboard: new WorkflowHealthCard shows success rate, average run duration, and top 3 failing workflows. Shares a TanStack Query cache entry with CostSummaryCard — one network call feeds both widgets.

Changed

substituteWorkflowVariables() accepts an optional projectKnowledge parameter; threaded through buildPromptWithContext() and all call sites.
Scheduled workflow dispatch now creates a dedicated worktree per run instead of executing against the codebase's live checkout.
CostAnalytics response shape extended with health fields; schema name preserved for compatibility.
api.generated.d.ts regenerated from the OpenAPI spec.

Fixed

Dark-factory plan→implement handoff via bridge-artifacts node + archon-fix-issue command (was broken for scheduler dispatch).
Dark-factory success handler swaps archon:auto → archon:done to prevent infinite re-processing; reads PR URL from $ARTIFACTS_DIR/.pr-url sentinel instead of grepping stdout.
Dark-factory failure handler uses .pr-url sentinel to distinguish streamed-text-then-failed from genuine success.
Scheduler overlap check switched from path-based to codebase+workflow-name (since scheduled runs now use worktree paths).
getAvgDuration guards against clock-skew (negative durations) and NaN from PostgreSQL NUMERIC edge cases.
Dashboard cards share an identical queryKey: ['cost-analytics', { days: 30 }] for single-fetch behaviour.
WorkflowHealthCard uses the existing formatDurationMs helper so duration renders consistently across all dashboard cards.

Merging this PR releases 0.4.0 to main.

Introduces stripInjectionPatterns() in sanitize-external.ts with four pattern categories: LLM role markers, Anthropic turn delimiters, instruction overrides, and trust boundary breakers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sanitizeExternalContent() combines pattern stripping with an XML wrapper that instructs the AI to treat the content as data, not instructions. Logs stripped patterns at warn level. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…stitution substituteWorkflowVariables() and buildPromptWithContext() now sanitize issueContext through sanitizeExternalContent() before substitution. Untrusted content from GitHub issues is stripped of injection patterns and wrapped in XML trust boundaries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two-layer defense for untrusted external content (, , ): deterministic pattern stripping (LLM role markers, Anthropic turn delimiters, instruction overrides, trust boundary breakers) followed by XML trust boundary wrapping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Dialect-aware SQL queries for per-workflow cost breakdown and daily cost totals. Reads existing total_cost_usd from workflow_runs metadata. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

OpenAPI route returning aggregated workflow cost analytics: total spend, success/failure breakdown, per-workflow costs, and daily cost buckets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CostSummaryCard shows total spend, success/failure breakdown, and top 3 workflows by cost. Uses TanStack Query with 30s stale time. Hidden when no cost data is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds GET /api/analytics/costs endpoint with per-workflow cost breakdown, success/failure split, and daily buckets. Dashboard CostSummaryCard shows total spend, top 3 workflows, and success/failure cost comparison. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

5-field cron parser supporting wildcards, ranges, steps, and lists. Used by the workflow scheduler to evaluate schedule triggers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Minimal IWorkflowPlatform that logs workflow messages via Pino instead of sending to a chat platform. Used for scheduled runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New ScheduleEntry type with workflow, cron, and enabled fields. Parsed from per-repo .archon/config.yaml schedules: array. Invalid entries (missing workflow or cron) are filtered out. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

60-second tick loop evaluates cron schedules from per-repo config. Dispatches workflows via executeWorkflow() with a logging-only adapter. Skips if a run is already active for the same workflow+path. Rescans codebase configs every 5 minutes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds cron-based workflow scheduling to Archon. Configured per-repo in .archon/config.yaml with schedules: entries. 60-second tick loop evaluates cron expressions and dispatches via executeWorkflow() with a logging-only adapter. Skips overlapping runs. Rescans configs every 5 minutes. Includes: cron parser, schedule adapter, config types, scheduler service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extracts deterministic run summaries into .archon/knowledge/run-history.md. Supports formatting, prepending (newest first), and capping at 50 entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New optional variable for injecting cross-run project knowledge from .archon/knowledge/run-history.md into workflow prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reads .archon/knowledge/run-history.md at workflow start for $PROJECT_KNOWLEDGE substitution. Records run summaries after workflow completion via knowledge-writer. Respects package boundary (@archon/workflows reads filesystem, @archon/core writes via DB).

Accumulates deterministic run summaries in .archon/knowledge/run-history.md (newest first, capped at 50 entries). Future runs access prior knowledge via the new $PROJECT_KNOWLEDGE workflow variable. Respects the @archon/workflows → @archon/core package boundary: workflows reads the file via fs/promises; core writes via the knowledge-writer service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New bundled workflow demonstrating autonomous GitHub issue processing. Fetches issues labeled archon:auto, plans using $PROJECT_KNOWLEDGE, implements in a fresh session, validates with a fix loop, creates a draft PR, and handles success/failure outcomes via issue comments and label management. Designed to run on a cron schedule (see description for setup).

Adds archon-dark-factory to BUNDLED_WORKFLOWS so it ships with binary distributions alongside the other bundled workflows.

Bundled YAML workflow demonstrating autonomous GitHub issue processing. Fetches issues labeled 'archon:auto', plans with $PROJECT_KNOWLEDGE, implements in a fresh session, validates with a 5-iteration fix loop, creates a draft PR, and handles success/failure via issue comments and label management. Designed to run on a cron schedule. Composes the four prior improvements: prompt injection defense (automatic via sanitized $CONTEXT), cost analytics (runs appear in dashboard), scheduled triggers (via .archon/config.yaml schedules), and cross-run project knowledge ($PROJECT_KNOWLEDGE in plan node). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three independent peer reviews converged on Critical/Important issues in the bundled dark factory workflow. This commit addresses all of them: - Wrap $fetch-issue.output in XML trust boundary in plan prompt (Layer-1 pattern stripping only fires for $CONTEXT-family variables, not node outputs; issue bodies were flowing raw into the plan prompt) - Replace command: archon-implement with bridge-artifacts + archon-fix-issue (archon-implement expects $ARGUMENTS to be a file path; scheduler sets $ARGUMENTS to 'Scheduled run (cron)' which would crash the implement) - Swap archon:auto -> archon:done on success (current workflow kept the label, causing infinite re-processing on every scheduler tick) - Read PR URL from $ARTIFACTS_DIR/.pr-url instead of grepping stdout (archon-create-pr writes the canonical URL there; grep could match any URL in the command's prose output) - Use .pr-url sentinel file for failure-handler guard (previous guard treated 'create-pr streamed text then failed' as success, suppressing both comments) - Idempotent label setup commands in workflow description - Sync spec and plan docs to match shipped YAML (trigger_rule: all_done)

Three independent peer reviews converged on Critical/Important issues in the dark factory workflow. This branch addresses all of them: - Wrap $fetch-issue.output in XML trust boundary (injection defense) - Bridge artifacts pattern for plan → implement handoff (was broken for scheduled dispatch where $ARGUMENTS is not a file path) - Swap archon:auto → archon:done on success (prevents infinite loop) - Read PR URL from $ARTIFACTS_DIR/.pr-url sentinel (not fragile grep) - Use .pr-url sentinel for failure-handler guard - Idempotent label setup commands - Sync spec and plan docs to shipped behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The workflow scheduler was dispatching workflows directly against the codebase's live checkout (schedule.cwd), which meant every scheduled run would commit and push from the user's main working directory, potentially stomping on in-progress work. The scheduler now creates a dedicated worktree per run using the same pattern as the CLI (workflow.ts:467-499): getIsolationProvider().create() with workflowType='task' and a time-based identifier. Each tick gets its own isolated environment; old worktrees are reaped by the existing cleanup service. Also replaces the path-based overlap check (getActiveWorkflowRunByPath) with a codebase + workflow-name check, since scheduled runs now use worktree paths rather than schedule.cwd. Without this change, concurrent scheduler ticks would never detect each other. The recordWorkflowRun call still uses schedule.cwd (the canonical codebase path) so the knowledge file .archon/knowledge/run-history.md persists across runs in the repo itself, not in ephemeral worktrees. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Design specs and implementation plans written during brainstorming but never committed as part of their feature branches: - Prompt injection defense (#1) - Cost analytics aggregation (#2) - Scheduled workflow triggers (#3) - Cross-run project knowledge (#4) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Dialect-aware query for average workflow run duration in seconds. Powers the Workflow Health dashboard card.

Adds successRate, avgDurationSeconds, and topFailingWorkflows to the CostAnalyticsResponse schema. Response name unchanged to preserve compatibility with existing CostSummaryCard.

Adds successRate (aggregate), avgDurationSeconds, and topFailingWorkflows to the response. Tracks per-workflow success/failure counts during aggregation. Noise filter: workflows with fewer than 3 total runs are excluded from topFailingWorkflows.

New card showing success rate, average duration, and top 3 failing workflows. Reuses the CostSummaryCard's TanStack Query cache entry (queryKey: 'cost-analytics') — one API call feeds both cards.

Placed immediately after CostSummaryCard so both analytics widgets appear together between the status bar and active workflows.

Extends the existing cost analytics endpoint with health data (success rate, average duration, top failing workflows) and adds a WorkflowHealthCard dashboard widget. Both cards share a TanStack Query cache entry — single network request feeds both. Health data: - successRate: aggregate across all terminal runs in period - avgDurationSeconds: average completed_at - started_at - topFailingWorkflows: ranked by failure rate, capped at 3, filters workflows with <3 runs to avoid misleading rankings Completes the 6-improvement arc inspired by Cole Medin's harness engineering patterns. The harness-elevates-model thesis is now empirically verifiable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three independent peer reviews converged on the same concrete issues: - Deduplicate formatDuration: use existing formatDurationMs from @/lib/format. The local helper had different output format (2m 30s vs 2.5m) which would render inconsistently beside other dashboard cards using the canonical formatter. - Add clock-skew guard to getAvgDuration: AND completed_at >= started_at prevents negative durations from corrupting the average if a row has bad timestamp order (clock adjustment, manual edit). - NaN defense in Number(raw) coercion: PostgreSQL NUMERIC can theoretically return 'NaN' as a string; Number.isFinite filter falls back to 0. - Add days parameter to queryKey: prevents latent cache collision when a future card wants a different time window. - Regenerate api.generated.d.ts: the OpenAPI-derived types were stale since Improvement #2 landed (neither feature regenerated the file). Now includes CostAnalyticsResponse with all health fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three independent reviews converged on concrete issues: - Deduplicate formatDuration: use existing formatDurationMs from @/lib/format (dashboard now renders consistent duration format across all cards instead of the inconsistent '2m 30s' vs '2.5m') - Clock-skew guard in getAvgDuration: AND completed_at >= started_at prevents negative durations from corrupting the average - NaN defense in Number coercion: guard against 'NaN' strings from PostgreSQL NUMERIC edge cases - days parameter in queryKey: forward-compatibility for future cards that want different time windows - Regenerated api.generated.d.ts: OpenAPI-derived types were stale from Improvement #2; now includes full analytics response shape Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Design spec and implementation plan for Improvement #6 that were written during brainstorming but not committed with the feature branch.

…coleam00#1263) * fix(bundled-defaults): auto-generate import list, emit inline strings Root-cause fix for bundle drift (15 commands + 7 workflows previously missing from binary distributions) and a prerequisite for packaging @archon/workflows as a Node-loadable SDK. The hand-maintained `bundled-defaults.ts` import list is replaced by `scripts/generate-bundled-defaults.ts`, which walks `.archon/{commands,workflows}/defaults/` and emits a generated source file with inline string literals. `bundled-defaults.ts` becomes a thin facade that re-exports the generated records and keeps the `isBinaryBuild()` helper. Inline strings (via JSON.stringify) replace Bun's `import X from '...' with { type: 'text' }` attributes. The binary build still embeds the data at compile time, but the module now loads under Node too — removing SDK blocker #2. - Generator: `scripts/generate-bundled-defaults.ts` (+ `--check` mode for CI) - `package.json`: `generate:bundled`, `check:bundled`; wired into `validate` - `build-binaries.sh`: regenerates defaults before compile - Test: `bundle completeness` now derives expected set from on-disk files - All 56 defaults (36 commands + 20 workflows) now in the bundle * fix(bundled-defaults): address PR review feedback Review: coleam00#1263 (comment) Generator: - Guard against .yaml/.yml name collisions (previously silent overwrite) - Add early access() check with actionable error when run from wrong cwd - Type top-level catch as unknown; print only message for Error instances - Drop redundant /* eslint-disable */ emission (global ignore covers it) - Fix misleading CI-mechanism claim in header comment - Collapse dead `if (!ext) continue` guard into a single typed pass Scripts get real type-checking + linting: - New scripts/tsconfig.json extending root config - type-check now includes scripts/ via `tsc --noEmit -p scripts/tsconfig.json` - Drop `scripts/**` from eslint ignores; add to projectService file scope Tests: - Inline listNames helper (Rule of Three) - Drop redundant toBeDefined/typeof assertions; the Record<string, string> type plus length > 50 already cover them - Add content-fidelity round-trip assertion (defense against generator content bugs, not just key-set drift) Facade comment: drop dead reference to .claude/rules/dx-quirks.md. CI: wire `bun run check:bundled` into .github/workflows/test.yml so the header's CI-verification claim is truthful. Docs: CLAUDE.md step count four→five; add contributor bullet about `bun run generate:bundled` in the Defaults section and CONTRIBUTING.md. * chore(e2e): bump Codex model to gpt-5.2 gpt-5.1-codex-mini is deprecated and unavailable on ChatGPT-account Codex auth. Plain gpt-5.2 works. Verified end-to-end: - e2e-codex-smoke: structured output returns {category:'math'} - e2e-mixed-providers: claude+codex both return expected tokens

cjnprospa and others added 30 commits April 13, 2026 14:49

feat(core): add cost analytics query functions

36fb271

Dialect-aware SQL queries for per-workflow cost breakdown and daily cost totals. Reads existing total_cost_usd from workflow_runs metadata. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(server): add GET /api/analytics/costs endpoint

74dde82

OpenAPI route returning aggregated workflow cost analytics: total spend, success/failure breakdown, per-workflow costs, and daily cost buckets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(web): add cost analytics dashboard widget

3deb0ad

CostSummaryCard shows total spend, success/failure breakdown, and top 3 workflows by cost. Uses TanStack Query with 30s stale time. Hidden when no cost data is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(core): add lightweight cron expression parser

2fa62f6

5-field cron parser supporting wildcards, ranges, steps, and lists. Used by the workflow scheduler to evaluate schedule triggers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(core): add schedule platform adapter

38dccf0

Minimal IWorkflowPlatform that logs workflow messages via Pino instead of sending to a chat platform. Used for scheduled runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(core): add knowledge writer for cross-run project context

422f596

Extracts deterministic run summaries into .archon/knowledge/run-history.md. Supports formatting, prepending (newest first), and capping at 50 entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(workflows): add $PROJECT_KNOWLEDGE variable substitution

13d699b

New optional variable for injecting cross-run project knowledge from .archon/knowledge/run-history.md into workflow prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore(workflows): register dark-factory workflow in bundle

b0c98cd

Adds archon-dark-factory to BUNDLED_WORKFLOWS so it ships with binary distributions alongside the other bundled workflows.

feat(core): add getAvgDuration analytics query

01822c9

Dialect-aware query for average workflow run duration in seconds. Powers the Workflow Health dashboard card.

feat(server): extend cost analytics schema with health fields

74019b5

Adds successRate, avgDurationSeconds, and topFailingWorkflows to the CostAnalyticsResponse schema. Response name unchanged to preserve compatibility with existing CostSummaryCard.

feat(web): add WorkflowHealthCard dashboard widget

254da47

New card showing success rate, average duration, and top 3 failing workflows. Reuses the CostSummaryCard's TanStack Query cache entry (queryKey: 'cost-analytics') — one API call feeds both cards.

feat(web): render WorkflowHealthCard on dashboard

2481942

Placed immediately after CostSummaryCard so both analytics widgets appear together between the status bar and active workflows.

cjnprospa and others added 4 commits April 14, 2026 13:30

docs(superpowers): add spec and plan for workflow health metrics (#6)

b0c63e2

Design spec and implementation plan for Improvement #6 that were written during brainstorming but not committed with the feature branch.

Release 0.4.0

5f72b58

prospapledge88 merged commit 2f8816e into main Apr 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release 0.4.0#2

Release 0.4.0#2
prospapledge88 merged 34 commits intomainfrom
dev

prospapledge88 commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

prospapledge88 commented Apr 14, 2026

Release 0.4.0

Added

Changed

Fixed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants