Skip to content

Release 0.4.0#2

Merged
prospapledge88 merged 34 commits intomainfrom
dev
Apr 14, 2026
Merged

Release 0.4.0#2
prospapledge88 merged 34 commits intomainfrom
dev

Conversation

@prospapledge88
Copy link
Copy Markdown
Owner

Release 0.4.0

Six harness-engineering improvements inspired by Cole Medin's "Full Archon Guide"
livestream — prompt injection defense, cost analytics, scheduled workflow triggers,
cross-run project knowledge, a dark-factory reference workflow, and workflow health
metrics. Includes three rounds of peer-review fixes from independent code reviews.

Added

  • Prompt injection defense for workflow inputs: two-layer defense for untrusted external content flowing into workflow prompts via $CONTEXT, $ISSUE_CONTEXT, and $EXTERNAL_CONTEXT. Layer 1 strips known injection patterns (LLM role markers, Anthropic turn delimiters, instruction overrides, trust-boundary breakers). Layer 2 wraps the sanitized content in an XML trust boundary.
  • Cost analytics API and dashboard: new GET /api/analytics/costs endpoint returning total spend, per-workflow cost breakdown, daily buckets, and success/failure cost splits. CostSummaryCard on the dashboard shows total spend, top 3 workflows by cost, and success vs. failure cost.
  • Scheduled workflow triggers: new schedules: configuration in per-repo .archon/config.yaml with standard 5-field cron expressions. The scheduler runs on a 60-second tick and dispatches workflows via a dedicated worktree per run. Lightweight cron parser — no external dependencies.
  • Cross-run project knowledge: every workflow run contributes a deterministic summary entry to .archon/knowledge/run-history.md (capped at 50 entries). Workflow prompts can inject prior run history via the new $PROJECT_KNOWLEDGE variable.
  • Dark-factory reference workflow: new bundled archon-dark-factory YAML demonstrating autonomous GitHub issue processing with label gating (archon:auto), plan/implement/validate/PR flow, and explicit success/failure handling.
  • Workflow health metrics on the dashboard: new WorkflowHealthCard shows success rate, average run duration, and top 3 failing workflows. Shares a TanStack Query cache entry with CostSummaryCard — one network call feeds both widgets.

Changed

  • substituteWorkflowVariables() accepts an optional projectKnowledge parameter; threaded through buildPromptWithContext() and all call sites.
  • Scheduled workflow dispatch now creates a dedicated worktree per run instead of executing against the codebase's live checkout.
  • CostAnalytics response shape extended with health fields; schema name preserved for compatibility.
  • api.generated.d.ts regenerated from the OpenAPI spec.

Fixed

  • Dark-factory plan→implement handoff via bridge-artifacts node + archon-fix-issue command (was broken for scheduler dispatch).
  • Dark-factory success handler swaps archon:autoarchon:done to prevent infinite re-processing; reads PR URL from $ARTIFACTS_DIR/.pr-url sentinel instead of grepping stdout.
  • Dark-factory failure handler uses .pr-url sentinel to distinguish streamed-text-then-failed from genuine success.
  • Scheduler overlap check switched from path-based to codebase+workflow-name (since scheduled runs now use worktree paths).
  • getAvgDuration guards against clock-skew (negative durations) and NaN from PostgreSQL NUMERIC edge cases.
  • Dashboard cards share an identical queryKey: ['cost-analytics', { days: 30 }] for single-fetch behaviour.
  • WorkflowHealthCard uses the existing formatDurationMs helper so duration renders consistently across all dashboard cards.

Merging this PR releases 0.4.0 to main.

cjnprospa and others added 30 commits April 13, 2026 14:49
Introduces stripInjectionPatterns() in sanitize-external.ts with four
pattern categories: LLM role markers, Anthropic turn delimiters,
instruction overrides, and trust boundary breakers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sanitizeExternalContent() combines pattern stripping with an XML
wrapper that instructs the AI to treat the content as data, not
instructions. Logs stripped patterns at warn level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stitution

substituteWorkflowVariables() and buildPromptWithContext() now sanitize
issueContext through sanitizeExternalContent() before substitution.
Untrusted content from GitHub issues is stripped of injection patterns
and wrapped in XML trust boundaries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-layer defense for untrusted external content (, ,
): deterministic pattern stripping (LLM role markers,
Anthropic turn delimiters, instruction overrides, trust boundary breakers)
followed by XML trust boundary wrapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dialect-aware SQL queries for per-workflow cost breakdown and daily
cost totals. Reads existing total_cost_usd from workflow_runs metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenAPI route returning aggregated workflow cost analytics:
total spend, success/failure breakdown, per-workflow costs,
and daily cost buckets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CostSummaryCard shows total spend, success/failure breakdown, and
top 3 workflows by cost. Uses TanStack Query with 30s stale time.
Hidden when no cost data is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds GET /api/analytics/costs endpoint with per-workflow cost breakdown,
success/failure split, and daily buckets. Dashboard CostSummaryCard shows
total spend, top 3 workflows, and success/failure cost comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5-field cron parser supporting wildcards, ranges, steps, and lists.
Used by the workflow scheduler to evaluate schedule triggers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minimal IWorkflowPlatform that logs workflow messages via Pino
instead of sending to a chat platform. Used for scheduled runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New ScheduleEntry type with workflow, cron, and enabled fields.
Parsed from per-repo .archon/config.yaml schedules: array.
Invalid entries (missing workflow or cron) are filtered out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
60-second tick loop evaluates cron schedules from per-repo config.
Dispatches workflows via executeWorkflow() with a logging-only adapter.
Skips if a run is already active for the same workflow+path.
Rescans codebase configs every 5 minutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds cron-based workflow scheduling to Archon. Configured per-repo in
.archon/config.yaml with schedules: entries. 60-second tick loop evaluates
cron expressions and dispatches via executeWorkflow() with a logging-only
adapter. Skips overlapping runs. Rescans configs every 5 minutes.

Includes: cron parser, schedule adapter, config types, scheduler service.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracts deterministic run summaries into .archon/knowledge/run-history.md.
Supports formatting, prepending (newest first), and capping at 50 entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New optional variable for injecting cross-run project knowledge
from .archon/knowledge/run-history.md into workflow prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reads .archon/knowledge/run-history.md at workflow start for
$PROJECT_KNOWLEDGE substitution. Records run summaries after
workflow completion via knowledge-writer. Respects package boundary
(@archon/workflows reads filesystem, @archon/core writes via DB).
Accumulates deterministic run summaries in .archon/knowledge/run-history.md
(newest first, capped at 50 entries). Future runs access prior knowledge
via the new $PROJECT_KNOWLEDGE workflow variable. Respects the
@archon/workflows → @archon/core package boundary: workflows reads the
file via fs/promises; core writes via the knowledge-writer service.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New bundled workflow demonstrating autonomous GitHub issue processing.
Fetches issues labeled archon:auto, plans using $PROJECT_KNOWLEDGE,
implements in a fresh session, validates with a fix loop, creates a
draft PR, and handles success/failure outcomes via issue comments
and label management.

Designed to run on a cron schedule (see description for setup).
Adds archon-dark-factory to BUNDLED_WORKFLOWS so it ships with
binary distributions alongside the other bundled workflows.
Bundled YAML workflow demonstrating autonomous GitHub issue processing.
Fetches issues labeled 'archon:auto', plans with $PROJECT_KNOWLEDGE,
implements in a fresh session, validates with a 5-iteration fix loop,
creates a draft PR, and handles success/failure via issue comments
and label management. Designed to run on a cron schedule.

Composes the four prior improvements: prompt injection defense
(automatic via sanitized $CONTEXT), cost analytics (runs appear in
dashboard), scheduled triggers (via .archon/config.yaml schedules),
and cross-run project knowledge ($PROJECT_KNOWLEDGE in plan node).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent peer reviews converged on Critical/Important issues
in the bundled dark factory workflow. This commit addresses all of them:

- Wrap $fetch-issue.output in XML trust boundary in plan prompt
  (Layer-1 pattern stripping only fires for $CONTEXT-family variables,
  not node outputs; issue bodies were flowing raw into the plan prompt)
- Replace command: archon-implement with bridge-artifacts + archon-fix-issue
  (archon-implement expects $ARGUMENTS to be a file path; scheduler sets
  $ARGUMENTS to 'Scheduled run (cron)' which would crash the implement)
- Swap archon:auto -> archon:done on success
  (current workflow kept the label, causing infinite re-processing on
  every scheduler tick)
- Read PR URL from $ARTIFACTS_DIR/.pr-url instead of grepping stdout
  (archon-create-pr writes the canonical URL there; grep could match
  any URL in the command's prose output)
- Use .pr-url sentinel file for failure-handler guard
  (previous guard treated 'create-pr streamed text then failed' as
  success, suppressing both comments)
- Idempotent label setup commands in workflow description
- Sync spec and plan docs to match shipped YAML (trigger_rule: all_done)
Three independent peer reviews converged on Critical/Important issues
in the dark factory workflow. This branch addresses all of them:

- Wrap $fetch-issue.output in XML trust boundary (injection defense)
- Bridge artifacts pattern for plan → implement handoff (was broken
  for scheduled dispatch where $ARGUMENTS is not a file path)
- Swap archon:auto → archon:done on success (prevents infinite loop)
- Read PR URL from $ARTIFACTS_DIR/.pr-url sentinel (not fragile grep)
- Use .pr-url sentinel for failure-handler guard
- Idempotent label setup commands
- Sync spec and plan docs to shipped behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workflow scheduler was dispatching workflows directly against the
codebase's live checkout (schedule.cwd), which meant every scheduled
run would commit and push from the user's main working directory,
potentially stomping on in-progress work.

The scheduler now creates a dedicated worktree per run using the same
pattern as the CLI (workflow.ts:467-499): getIsolationProvider().create()
with workflowType='task' and a time-based identifier. Each tick gets
its own isolated environment; old worktrees are reaped by the existing
cleanup service.

Also replaces the path-based overlap check (getActiveWorkflowRunByPath)
with a codebase + workflow-name check, since scheduled runs now use
worktree paths rather than schedule.cwd. Without this change, concurrent
scheduler ticks would never detect each other.

The recordWorkflowRun call still uses schedule.cwd (the canonical
codebase path) so the knowledge file .archon/knowledge/run-history.md
persists across runs in the repo itself, not in ephemeral worktrees.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design specs and implementation plans written during brainstorming
but never committed as part of their feature branches:

- Prompt injection defense (#1)
- Cost analytics aggregation (#2)
- Scheduled workflow triggers (#3)
- Cross-run project knowledge (#4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dialect-aware query for average workflow run duration in seconds.
Powers the Workflow Health dashboard card.
Adds successRate, avgDurationSeconds, and topFailingWorkflows to
the CostAnalyticsResponse schema. Response name unchanged to
preserve compatibility with existing CostSummaryCard.
Adds successRate (aggregate), avgDurationSeconds, and topFailingWorkflows
to the response. Tracks per-workflow success/failure counts during
aggregation. Noise filter: workflows with fewer than 3 total runs
are excluded from topFailingWorkflows.
New card showing success rate, average duration, and top 3 failing
workflows. Reuses the CostSummaryCard's TanStack Query cache entry
(queryKey: 'cost-analytics') — one API call feeds both cards.
Placed immediately after CostSummaryCard so both analytics widgets
appear together between the status bar and active workflows.
Extends the existing cost analytics endpoint with health data (success
rate, average duration, top failing workflows) and adds a
WorkflowHealthCard dashboard widget. Both cards share a TanStack
Query cache entry — single network request feeds both.

Health data:
- successRate: aggregate across all terminal runs in period
- avgDurationSeconds: average completed_at - started_at
- topFailingWorkflows: ranked by failure rate, capped at 3, filters
  workflows with <3 runs to avoid misleading rankings

Completes the 6-improvement arc inspired by Cole Medin's harness
engineering patterns. The harness-elevates-model thesis is now
empirically verifiable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cjnprospa and others added 4 commits April 14, 2026 13:30
Three independent peer reviews converged on the same concrete issues:

- Deduplicate formatDuration: use existing formatDurationMs from
  @/lib/format. The local helper had different output format (2m 30s
  vs 2.5m) which would render inconsistently beside other dashboard
  cards using the canonical formatter.
- Add clock-skew guard to getAvgDuration: AND completed_at >=
  started_at prevents negative durations from corrupting the average
  if a row has bad timestamp order (clock adjustment, manual edit).
- NaN defense in Number(raw) coercion: PostgreSQL NUMERIC can
  theoretically return 'NaN' as a string; Number.isFinite filter
  falls back to 0.
- Add days parameter to queryKey: prevents latent cache collision
  when a future card wants a different time window.
- Regenerate api.generated.d.ts: the OpenAPI-derived types were
  stale since Improvement #2 landed (neither feature regenerated
  the file). Now includes CostAnalyticsResponse with all health
  fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent reviews converged on concrete issues:

- Deduplicate formatDuration: use existing formatDurationMs from
  @/lib/format (dashboard now renders consistent duration format
  across all cards instead of the inconsistent '2m 30s' vs '2.5m')
- Clock-skew guard in getAvgDuration: AND completed_at >= started_at
  prevents negative durations from corrupting the average
- NaN defense in Number coercion: guard against 'NaN' strings from
  PostgreSQL NUMERIC edge cases
- days parameter in queryKey: forward-compatibility for future cards
  that want different time windows
- Regenerated api.generated.d.ts: OpenAPI-derived types were stale
  from Improvement #2; now includes full analytics response shape

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design spec and implementation plan for Improvement #6 that were
written during brainstorming but not committed with the feature branch.
@prospapledge88 prospapledge88 merged commit 2f8816e into main Apr 14, 2026
prospapledge88 pushed a commit that referenced this pull request Apr 17, 2026
…coleam00#1263)

* fix(bundled-defaults): auto-generate import list, emit inline strings

Root-cause fix for bundle drift (15 commands + 7 workflows previously
missing from binary distributions) and a prerequisite for packaging
@archon/workflows as a Node-loadable SDK.

The hand-maintained `bundled-defaults.ts` import list is replaced by
`scripts/generate-bundled-defaults.ts`, which walks
`.archon/{commands,workflows}/defaults/` and emits a generated source
file with inline string literals. `bundled-defaults.ts` becomes a thin
facade that re-exports the generated records and keeps the
`isBinaryBuild()` helper.

Inline strings (via JSON.stringify) replace Bun's
`import X from '...' with { type: 'text' }` attributes. The binary build
still embeds the data at compile time, but the module now loads under
Node too — removing SDK blocker #2.

- Generator: `scripts/generate-bundled-defaults.ts` (+ `--check` mode for CI)
- `package.json`: `generate:bundled`, `check:bundled`; wired into `validate`
- `build-binaries.sh`: regenerates defaults before compile
- Test: `bundle completeness` now derives expected set from on-disk files
- All 56 defaults (36 commands + 20 workflows) now in the bundle

* fix(bundled-defaults): address PR review feedback

Review: coleam00#1263 (comment)

Generator:
- Guard against .yaml/.yml name collisions (previously silent overwrite)
- Add early access() check with actionable error when run from wrong cwd
- Type top-level catch as unknown; print only message for Error instances
- Drop redundant /* eslint-disable */ emission (global ignore covers it)
- Fix misleading CI-mechanism claim in header comment
- Collapse dead `if (!ext) continue` guard into a single typed pass

Scripts get real type-checking + linting:
- New scripts/tsconfig.json extending root config
- type-check now includes scripts/ via `tsc --noEmit -p scripts/tsconfig.json`
- Drop `scripts/**` from eslint ignores; add to projectService file scope

Tests:
- Inline listNames helper (Rule of Three)
- Drop redundant toBeDefined/typeof assertions; the Record<string, string>
  type plus length > 50 already cover them
- Add content-fidelity round-trip assertion (defense against generator
  content bugs, not just key-set drift)

Facade comment: drop dead reference to .claude/rules/dx-quirks.md.

CI: wire `bun run check:bundled` into .github/workflows/test.yml so the
header's CI-verification claim is truthful.

Docs: CLAUDE.md step count four→five; add contributor bullet about
`bun run generate:bundled` in the Defaults section and CONTRIBUTING.md.

* chore(e2e): bump Codex model to gpt-5.2

gpt-5.1-codex-mini is deprecated and unavailable on ChatGPT-account Codex
auth. Plain gpt-5.2 works. Verified end-to-end:

- e2e-codex-smoke: structured output returns {category:'math'}
- e2e-mixed-providers: claude+codex both return expected tokens
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants