Release 0.4.0#2
Merged
prospapledge88 merged 34 commits intomainfrom Apr 14, 2026
Merged
Conversation
Introduces stripInjectionPatterns() in sanitize-external.ts with four pattern categories: LLM role markers, Anthropic turn delimiters, instruction overrides, and trust boundary breakers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sanitizeExternalContent() combines pattern stripping with an XML wrapper that instructs the AI to treat the content as data, not instructions. Logs stripped patterns at warn level. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stitution substituteWorkflowVariables() and buildPromptWithContext() now sanitize issueContext through sanitizeExternalContent() before substitution. Untrusted content from GitHub issues is stripped of injection patterns and wrapped in XML trust boundaries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-layer defense for untrusted external content (, , ): deterministic pattern stripping (LLM role markers, Anthropic turn delimiters, instruction overrides, trust boundary breakers) followed by XML trust boundary wrapping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dialect-aware SQL queries for per-workflow cost breakdown and daily cost totals. Reads existing total_cost_usd from workflow_runs metadata. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenAPI route returning aggregated workflow cost analytics: total spend, success/failure breakdown, per-workflow costs, and daily cost buckets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CostSummaryCard shows total spend, success/failure breakdown, and top 3 workflows by cost. Uses TanStack Query with 30s stale time. Hidden when no cost data is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds GET /api/analytics/costs endpoint with per-workflow cost breakdown, success/failure split, and daily buckets. Dashboard CostSummaryCard shows total spend, top 3 workflows, and success/failure cost comparison. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5-field cron parser supporting wildcards, ranges, steps, and lists. Used by the workflow scheduler to evaluate schedule triggers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minimal IWorkflowPlatform that logs workflow messages via Pino instead of sending to a chat platform. Used for scheduled runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New ScheduleEntry type with workflow, cron, and enabled fields. Parsed from per-repo .archon/config.yaml schedules: array. Invalid entries (missing workflow or cron) are filtered out. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
60-second tick loop evaluates cron schedules from per-repo config. Dispatches workflows via executeWorkflow() with a logging-only adapter. Skips if a run is already active for the same workflow+path. Rescans codebase configs every 5 minutes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds cron-based workflow scheduling to Archon. Configured per-repo in .archon/config.yaml with schedules: entries. 60-second tick loop evaluates cron expressions and dispatches via executeWorkflow() with a logging-only adapter. Skips overlapping runs. Rescans configs every 5 minutes. Includes: cron parser, schedule adapter, config types, scheduler service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracts deterministic run summaries into .archon/knowledge/run-history.md. Supports formatting, prepending (newest first), and capping at 50 entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New optional variable for injecting cross-run project knowledge from .archon/knowledge/run-history.md into workflow prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reads .archon/knowledge/run-history.md at workflow start for $PROJECT_KNOWLEDGE substitution. Records run summaries after workflow completion via knowledge-writer. Respects package boundary (@archon/workflows reads filesystem, @archon/core writes via DB).
Accumulates deterministic run summaries in .archon/knowledge/run-history.md (newest first, capped at 50 entries). Future runs access prior knowledge via the new $PROJECT_KNOWLEDGE workflow variable. Respects the @archon/workflows → @archon/core package boundary: workflows reads the file via fs/promises; core writes via the knowledge-writer service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New bundled workflow demonstrating autonomous GitHub issue processing. Fetches issues labeled archon:auto, plans using $PROJECT_KNOWLEDGE, implements in a fresh session, validates with a fix loop, creates a draft PR, and handles success/failure outcomes via issue comments and label management. Designed to run on a cron schedule (see description for setup).
Adds archon-dark-factory to BUNDLED_WORKFLOWS so it ships with binary distributions alongside the other bundled workflows.
Bundled YAML workflow demonstrating autonomous GitHub issue processing. Fetches issues labeled 'archon:auto', plans with $PROJECT_KNOWLEDGE, implements in a fresh session, validates with a 5-iteration fix loop, creates a draft PR, and handles success/failure via issue comments and label management. Designed to run on a cron schedule. Composes the four prior improvements: prompt injection defense (automatic via sanitized $CONTEXT), cost analytics (runs appear in dashboard), scheduled triggers (via .archon/config.yaml schedules), and cross-run project knowledge ($PROJECT_KNOWLEDGE in plan node). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent peer reviews converged on Critical/Important issues in the bundled dark factory workflow. This commit addresses all of them: - Wrap $fetch-issue.output in XML trust boundary in plan prompt (Layer-1 pattern stripping only fires for $CONTEXT-family variables, not node outputs; issue bodies were flowing raw into the plan prompt) - Replace command: archon-implement with bridge-artifacts + archon-fix-issue (archon-implement expects $ARGUMENTS to be a file path; scheduler sets $ARGUMENTS to 'Scheduled run (cron)' which would crash the implement) - Swap archon:auto -> archon:done on success (current workflow kept the label, causing infinite re-processing on every scheduler tick) - Read PR URL from $ARTIFACTS_DIR/.pr-url instead of grepping stdout (archon-create-pr writes the canonical URL there; grep could match any URL in the command's prose output) - Use .pr-url sentinel file for failure-handler guard (previous guard treated 'create-pr streamed text then failed' as success, suppressing both comments) - Idempotent label setup commands in workflow description - Sync spec and plan docs to match shipped YAML (trigger_rule: all_done)
Three independent peer reviews converged on Critical/Important issues in the dark factory workflow. This branch addresses all of them: - Wrap $fetch-issue.output in XML trust boundary (injection defense) - Bridge artifacts pattern for plan → implement handoff (was broken for scheduled dispatch where $ARGUMENTS is not a file path) - Swap archon:auto → archon:done on success (prevents infinite loop) - Read PR URL from $ARTIFACTS_DIR/.pr-url sentinel (not fragile grep) - Use .pr-url sentinel for failure-handler guard - Idempotent label setup commands - Sync spec and plan docs to shipped behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workflow scheduler was dispatching workflows directly against the codebase's live checkout (schedule.cwd), which meant every scheduled run would commit and push from the user's main working directory, potentially stomping on in-progress work. The scheduler now creates a dedicated worktree per run using the same pattern as the CLI (workflow.ts:467-499): getIsolationProvider().create() with workflowType='task' and a time-based identifier. Each tick gets its own isolated environment; old worktrees are reaped by the existing cleanup service. Also replaces the path-based overlap check (getActiveWorkflowRunByPath) with a codebase + workflow-name check, since scheduled runs now use worktree paths rather than schedule.cwd. Without this change, concurrent scheduler ticks would never detect each other. The recordWorkflowRun call still uses schedule.cwd (the canonical codebase path) so the knowledge file .archon/knowledge/run-history.md persists across runs in the repo itself, not in ephemeral worktrees. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design specs and implementation plans written during brainstorming but never committed as part of their feature branches: - Prompt injection defense (#1) - Cost analytics aggregation (#2) - Scheduled workflow triggers (#3) - Cross-run project knowledge (#4) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dialect-aware query for average workflow run duration in seconds. Powers the Workflow Health dashboard card.
Adds successRate, avgDurationSeconds, and topFailingWorkflows to the CostAnalyticsResponse schema. Response name unchanged to preserve compatibility with existing CostSummaryCard.
Adds successRate (aggregate), avgDurationSeconds, and topFailingWorkflows to the response. Tracks per-workflow success/failure counts during aggregation. Noise filter: workflows with fewer than 3 total runs are excluded from topFailingWorkflows.
New card showing success rate, average duration, and top 3 failing workflows. Reuses the CostSummaryCard's TanStack Query cache entry (queryKey: 'cost-analytics') — one API call feeds both cards.
Placed immediately after CostSummaryCard so both analytics widgets appear together between the status bar and active workflows.
Extends the existing cost analytics endpoint with health data (success rate, average duration, top failing workflows) and adds a WorkflowHealthCard dashboard widget. Both cards share a TanStack Query cache entry — single network request feeds both. Health data: - successRate: aggregate across all terminal runs in period - avgDurationSeconds: average completed_at - started_at - topFailingWorkflows: ranked by failure rate, capped at 3, filters workflows with <3 runs to avoid misleading rankings Completes the 6-improvement arc inspired by Cole Medin's harness engineering patterns. The harness-elevates-model thesis is now empirically verifiable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent peer reviews converged on the same concrete issues: - Deduplicate formatDuration: use existing formatDurationMs from @/lib/format. The local helper had different output format (2m 30s vs 2.5m) which would render inconsistently beside other dashboard cards using the canonical formatter. - Add clock-skew guard to getAvgDuration: AND completed_at >= started_at prevents negative durations from corrupting the average if a row has bad timestamp order (clock adjustment, manual edit). - NaN defense in Number(raw) coercion: PostgreSQL NUMERIC can theoretically return 'NaN' as a string; Number.isFinite filter falls back to 0. - Add days parameter to queryKey: prevents latent cache collision when a future card wants a different time window. - Regenerate api.generated.d.ts: the OpenAPI-derived types were stale since Improvement #2 landed (neither feature regenerated the file). Now includes CostAnalyticsResponse with all health fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent reviews converged on concrete issues: - Deduplicate formatDuration: use existing formatDurationMs from @/lib/format (dashboard now renders consistent duration format across all cards instead of the inconsistent '2m 30s' vs '2.5m') - Clock-skew guard in getAvgDuration: AND completed_at >= started_at prevents negative durations from corrupting the average - NaN defense in Number coercion: guard against 'NaN' strings from PostgreSQL NUMERIC edge cases - days parameter in queryKey: forward-compatibility for future cards that want different time windows - Regenerated api.generated.d.ts: OpenAPI-derived types were stale from Improvement #2; now includes full analytics response shape Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design spec and implementation plan for Improvement #6 that were written during brainstorming but not committed with the feature branch.
prospapledge88
pushed a commit
that referenced
this pull request
Apr 17, 2026
…coleam00#1263) * fix(bundled-defaults): auto-generate import list, emit inline strings Root-cause fix for bundle drift (15 commands + 7 workflows previously missing from binary distributions) and a prerequisite for packaging @archon/workflows as a Node-loadable SDK. The hand-maintained `bundled-defaults.ts` import list is replaced by `scripts/generate-bundled-defaults.ts`, which walks `.archon/{commands,workflows}/defaults/` and emits a generated source file with inline string literals. `bundled-defaults.ts` becomes a thin facade that re-exports the generated records and keeps the `isBinaryBuild()` helper. Inline strings (via JSON.stringify) replace Bun's `import X from '...' with { type: 'text' }` attributes. The binary build still embeds the data at compile time, but the module now loads under Node too — removing SDK blocker #2. - Generator: `scripts/generate-bundled-defaults.ts` (+ `--check` mode for CI) - `package.json`: `generate:bundled`, `check:bundled`; wired into `validate` - `build-binaries.sh`: regenerates defaults before compile - Test: `bundle completeness` now derives expected set from on-disk files - All 56 defaults (36 commands + 20 workflows) now in the bundle * fix(bundled-defaults): address PR review feedback Review: coleam00#1263 (comment) Generator: - Guard against .yaml/.yml name collisions (previously silent overwrite) - Add early access() check with actionable error when run from wrong cwd - Type top-level catch as unknown; print only message for Error instances - Drop redundant /* eslint-disable */ emission (global ignore covers it) - Fix misleading CI-mechanism claim in header comment - Collapse dead `if (!ext) continue` guard into a single typed pass Scripts get real type-checking + linting: - New scripts/tsconfig.json extending root config - type-check now includes scripts/ via `tsc --noEmit -p scripts/tsconfig.json` - Drop `scripts/**` from eslint ignores; add to projectService file scope Tests: - Inline listNames helper (Rule of Three) - Drop redundant toBeDefined/typeof assertions; the Record<string, string> type plus length > 50 already cover them - Add content-fidelity round-trip assertion (defense against generator content bugs, not just key-set drift) Facade comment: drop dead reference to .claude/rules/dx-quirks.md. CI: wire `bun run check:bundled` into .github/workflows/test.yml so the header's CI-verification claim is truthful. Docs: CLAUDE.md step count four→five; add contributor bullet about `bun run generate:bundled` in the Defaults section and CONTRIBUTING.md. * chore(e2e): bump Codex model to gpt-5.2 gpt-5.1-codex-mini is deprecated and unavailable on ChatGPT-account Codex auth. Plain gpt-5.2 works. Verified end-to-end: - e2e-codex-smoke: structured output returns {category:'math'} - e2e-mixed-providers: claude+codex both return expected tokens
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Release 0.4.0
Six harness-engineering improvements inspired by Cole Medin's "Full Archon Guide"
livestream — prompt injection defense, cost analytics, scheduled workflow triggers,
cross-run project knowledge, a dark-factory reference workflow, and workflow health
metrics. Includes three rounds of peer-review fixes from independent code reviews.
Added
$CONTEXT,$ISSUE_CONTEXT, and$EXTERNAL_CONTEXT. Layer 1 strips known injection patterns (LLM role markers, Anthropic turn delimiters, instruction overrides, trust-boundary breakers). Layer 2 wraps the sanitized content in an XML trust boundary.GET /api/analytics/costsendpoint returning total spend, per-workflow cost breakdown, daily buckets, and success/failure cost splits.CostSummaryCardon the dashboard shows total spend, top 3 workflows by cost, and success vs. failure cost.schedules:configuration in per-repo.archon/config.yamlwith standard 5-field cron expressions. The scheduler runs on a 60-second tick and dispatches workflows via a dedicated worktree per run. Lightweight cron parser — no external dependencies..archon/knowledge/run-history.md(capped at 50 entries). Workflow prompts can inject prior run history via the new$PROJECT_KNOWLEDGEvariable.archon-dark-factoryYAML demonstrating autonomous GitHub issue processing with label gating (archon:auto), plan/implement/validate/PR flow, and explicit success/failure handling.WorkflowHealthCardshows success rate, average run duration, and top 3 failing workflows. Shares a TanStack Query cache entry withCostSummaryCard— one network call feeds both widgets.Changed
substituteWorkflowVariables()accepts an optionalprojectKnowledgeparameter; threaded throughbuildPromptWithContext()and all call sites.CostAnalyticsresponse shape extended with health fields; schema name preserved for compatibility.api.generated.d.tsregenerated from the OpenAPI spec.Fixed
bridge-artifactsnode +archon-fix-issuecommand (was broken for scheduler dispatch).archon:auto→archon:doneto prevent infinite re-processing; reads PR URL from$ARTIFACTS_DIR/.pr-urlsentinel instead of grepping stdout..pr-urlsentinel to distinguish streamed-text-then-failed from genuine success.getAvgDurationguards against clock-skew (negative durations) and NaN from PostgreSQL NUMERIC edge cases.queryKey: ['cost-analytics', { days: 30 }]for single-fetch behaviour.WorkflowHealthCarduses the existingformatDurationMshelper so duration renders consistently across all dashboard cards.Merging this PR releases 0.4.0 to main.