fix(workflows): make archon-adversarial-dev sed usage portable and add regression test#1
Merged
LaplaceYoung merged 1 commit intodevfrom Apr 22, 2026
Merged
Conversation
Wirasm
added a commit
that referenced
this pull request
Apr 22, 2026
…gent) (coleam00#1270) * feat(providers): add Pi community provider (@mariozechner/pi-coding-agent) Introduces Pi as the first community provider under the Phase 2 registry, registered with builtIn: false. Wraps Pi's full coding-agent harness the same way ClaudeProvider wraps @anthropic-ai/claude-agent-sdk and CodexProvider wraps @openai/codex-sdk. - PiProvider implements IAgentProvider; fresh AgentSession per sendQuery call - AsyncQueue bridges Pi's callback-based session.subscribe() to Archon's AsyncGenerator<MessageChunk> contract - Server-safe: AuthStorage.inMemory + SessionManager.inMemory + SettingsManager.inMemory + DefaultResourceLoader with all no* flags — no filesystem access, no cross-request state - API key seeded per-call from options.env → process.env fallback - Model refs: '<pi-provider-id>/<model-id>' (e.g. google/gemini-2.5-pro, openrouter/qwen/qwen3-coder) with syntactic compatibility check - registerPiProvider() wired at CLI, server, and config-loader entrypoints, kept separate from registerBuiltinProviders() since builtIn: false is load-bearing for the community-provider validation story - All 12 capability flags declared false in v1 — dag-executor warnings fire honestly for any unmapped nodeConfig field - 58 new tests covering event mapping, async-queue semantics, model-ref parsing, defensive config parsing, registry integration Supported Pi providers (v1): anthropic, openai, google, groq, mistral, cerebras, xai, openrouter, huggingface. Extend PI_PROVIDER_ENV_VARS as needed. Out of scope (v1): session resume, MCP, hooks, skills mapping, thinking level mapping, structured output, OAuth flows, model catalog validation. These remain false on PI_CAPABILITIES until intentionally wired. * feat(providers/pi): read ~/.pi/agent/auth.json for OAuth + api_key passthrough Replaces the v1 env-var-only auth flow with AuthStorage.create(), which reads ~/.pi/agent/auth.json. This transparently picks up credentials the user has populated via `pi` → `/login` (OAuth subscriptions: Claude Pro/Max, ChatGPT Plus, GitHub Copilot, Gemini CLI, Antigravity) or by editing the file directly. Env-var behavior preserved: when ANTHROPIC_API_KEY / GEMINI_API_KEY / etc. is set (in process.env or per-request options.env), the adapter calls setRuntimeApiKey which is priority #1 in Pi's resolution chain. Auth.json entries are priority coleam00#2-coleam00#3. Pi's internal env-var fallback remains priority coleam00#4 as a safety net. Archon does not implement OAuth flows itself — it only rides on creds the user created via the Pi CLI. OAuth refresh still happens inside Pi (auth-storage.ts:369-413) under a file lock; concurrent refreshes between the Pi CLI and Archon are race-safe by Pi's own design. - Fail-fast error now mentions both the env-var path and `pi /login` - 2 new tests: OAuth cred from auth.json; env var wins over auth.json - 12 existing tests still pass (env-var-only path unchanged) CI compatibility: no auth.json in CI, no change — env-var (secrets) flows through Pi's getEnvApiKey fallback identically to v1. * test(e2e): add Pi provider smoke test workflow Mirrors e2e-claude-smoke.yaml: single prompt node + bash assert. Targets `anthropic/claude-haiku-4-5` via `provider: pi`; works in CI (ANTHROPIC_API_KEY secret) and locally (user's `pi /login` OAuth). Verified locally with an Anthropic OAuth subscription — full run takes ~4s from session_started to assert PASS, exercising the async-queue bridge and agent_end → result-chunk assembly under real Pi event timing. Not yet wired into .github/workflows/e2e-smoke.yml — separate PR once this lands, to keep the Pi provider PR minimal. * feat(providers/pi): v2 — thinkingLevel, tool restrictions, systemPrompt Extends the Pi adapter with three node-level translations, flipping the corresponding capability flags from false → true so the dag-executor no longer emits warnings for these fields on Pi nodes. 1. effort / thinking → Pi thinkingLevel (options-translator.ts) - Archon EffortLevel enum: low|medium|high|max (from packages/workflows/src/schemas/dag-node.ts). `max` maps to Pi's `xhigh` since Archon's enum lacks it. - Pi-native strings (minimal, xhigh, off) also accepted for programmatic callers bypassing the schema. - `off` on either field → no thinkingLevel (Pi's implicit off). - Claude-shape object `thinking: {type:'enabled', budget_tokens:N}` yields a system warning and is not applied. 2. allowed_tools / denied_tools → filtered Pi built-in tools - Supports all 7 Pi tools: read, bash, edit, write, grep, find, ls. - Case-insensitive normalization. - Empty `allowed_tools: []` means no tools (LLM-only), matching e2e-claude-smoke's idiom. - Unknown names (Claude-specific like `WebFetch`) collected and surfaced as a system warning; ignored tools don't fail the run. 3. systemPrompt (AgentRequestOptions + nodeConfig.systemPrompt) - Threaded through `DefaultResourceLoader({systemPrompt})`; Pi's default prompt is replaced entirely. Request-level wins over node-level. Capability flag changes: - thinkingControl: false → true - effortControl: false → true - toolRestrictions: false → true Package delta: - +1 direct dep: @sinclair/typebox (Pi types reference it; adding as direct dep resolves the TS portable-type error). - +1 test file: options-translator.test.ts (19 tests, 100% coverage). - provider.test.ts extended with 11 new tests covering all three paths. - registry.test.ts updated: capability assertion reflects new flags. Live-verified: `bun run cli workflow run e2e-pi-smoke --no-worktree` succeeds in 1.2s with thinkingLevel=low, toolCount=0. Smoke YAML updated to use `effort: low` (schema-valid) + `allowed_tools: []` (LLM-only). * test(e2e): add comprehensive Pi smoke covering every CI-compatible node type Exercises every node type Archon supports under `provider: pi`, except `approval:` (pauses for human input, incompatible with CI): 1. prompt — inline AI prompt 2. command — named command file (uses e2e-echo-command.md) 3. loop — bounded iterative AI prompt (max_iterations: 2) 4. bash — shell script with JSON output 5. script — bun runtime (echo-args.js) 6. script — uv / Python runtime (echo-py.py) Plus DAG features on top of Pi: - depends_on + $nodeId.output substitution - when: conditional with JSON dot-access - trigger_rule: all_success merge - final assert node validates every upstream output is non-empty Complements the minimal e2e-pi-smoke.yaml — that stays as the fast-path smoke for connectivity checks; this one is the broader surface coverage. Verified locally end-to-end against Anthropic OAuth (pi /login): PASS, all 9 non-final nodes produce output, assert succeeds. * feat(providers/pi): resolve Archon `skills:` names to Pi skill paths Flips capabilities.skills: false → true by translating Archon's name-based `skills:` nodeConfig (e.g. `skills: [agent-browser]`) to absolute directory paths Pi's DefaultResourceLoader can consume via additionalSkillPaths. Search order for each skill name (first match wins): 1. <cwd>/.agents/skills/<name>/ — project-local, agentskills.io 2. <cwd>/.claude/skills/<name>/ — project-local, Claude convention 3. ~/.agents/skills/<name>/ — user-global, agentskills.io 4. ~/.claude/skills/<name>/ — user-global, Claude convention A directory resolves only if it contains a SKILL.md. Unresolved names are collected and surfaced as a system-chunk warning (e.g. "Pi could not resolve skill names: foo, bar. Searched .agents/skills and .claude/skills (project + user-global)."), matching the semantic of "requested but not found" without aborting the run. Pi's buildSystemPrompt auto-appends the agentskills.io XML block for each loaded skill, so the model sees them — no separate prompt injection needed (Pi differs from Claude here; Claude wraps in an AgentDefinition with a preloaded prompt, Pi uses XML block in system prompt). Ancestor directory traversal above cwd is deliberately skipped in this pass — matches the Pi provider's cwd-bound scope and avoids ambiguity about which repo's skills win when Archon runs from a subdirectory. Bun's os.homedir() bypasses the HOME env var; the resolver uses `process.env.HOME ?? homedir()` so tests can stage a synthetic home dir. Tests: - 11 new tests in options-translator.test.ts cover project/user, .agents/ vs .claude/, project-wins-over-user, SKILL.md presence check, dedup, missing-name collection. - 2 new integration tests in provider.test.ts cover the missing-skill warning path and the "no skills configured → no additionalSkillPaths" path. - registry.test.ts updated to assert skills: true in capabilities. Live-verified locally: `.claude/skills/archon-dev/SKILL.md` resolves, pi.session_started log shows `skillCount: 1, missingSkillCount: 0`, smoke workflow passes in 1.2s. * feat(providers/pi): session resume via Pi session store Flips capabilities.sessionResume: false → true. Pi now persists sessions under ~/.pi/agent/sessions/<encoded-cwd>/<uuid>.jsonl by default — same pattern Claude and Codex use for their respective stores, same blast radius as those providers. Flow: - No resumeSessionId → SessionManager.create(cwd) (fresh, persisted) - resumeSessionId + match in SessionManager.list(cwd) → open(path) - resumeSessionId + no match → fresh session + system warning ("⚠️ Could not resume Pi session. Starting fresh conversation.") Matches Codex's resume_thread_failed fallback at packages/providers/src/codex/provider.ts:553-558. The sessionId flows back to Archon via the terminal `result` chunk — bridgeSession annotates it with session.sessionId unconditionally so Archon's orchestrator can persist it and pass it as resumeSessionId on the next turn. Same mechanism used for Claude/Codex. Cross-cwd resume (e.g. worktree switch) is deliberately not supported in this pass: list(cwd) scans only the current cwd's session dir. A workflow that changes cwd mid-run lands on a fresh session, which matches Pi's mental model. Bridge sessionId annotation uses session.sessionId, which Pi always populates (UUID) — so no special-case for inMemory sessions is needed. Factored the resolver into session-resolver.ts (5 unit tests): - no id → create - id + match → open - id + no match → create with resumeFailed: true - list() throws → resumeFailed: true (graceful) - empty-string id → treated as "no resume requested" Integration tests in provider.test.ts add 3 cases: - resume-not-found yields warning + calls create - resume-match calls open with the file path, no warning - result chunk always carries sessionId Verified live end-to-end against Anthropic OAuth: - first call → sessionId 019d...; model replies "noted" - second call with that sessionId → "resumed: true" in logs; model correctly recalls prior turn ("Crimson.") - bogus sessionId → "⚠️ Could not resume..." warning + fresh UUID * refactor(providers,core): generalize community-provider registration Addresses the community-pattern regression flagged in the PR coleam00#1270 review: a second community provider should require editing only its own directory, not seven files across providers/ + core/ + cli/ + server/. Three changes: 1. Drop typed `pi` slot from AssistantDefaultsConfig + AssistantDefaults. Community providers live behind the generic `[string]` index that `ProviderDefaultsMap` was explicitly designed to provide. The typed claude/codex slots stay — they give IDE autocomplete for built-in config access without `as` casts, which was the whole reason the intersection exists. Community providers parse their own config via Record<string, unknown> anyway, so the typed slot added no real parser safety. 2. Loop-based getDefaults + mergeAssistantDefaults. No more hardcoded `pi: {}` spreads. getDefaults() seeds from `getRegisteredProviders()`; mergeAssistantDefaults clones every slot present in `base`. Adding a new provider requires zero edits to this function. 3. New `registerCommunityProviders()` aggregator in registry.ts. Entrypoints (CLI, server, config-loader) call ONE function after `registerBuiltinProviders()` rather than one call per community provider. Adding a new community provider is now a single-line edit to registerCommunityProviders(). This makes Pi (and future community providers) actually behave like Phase 2 (coleam00#1195) advertised: drop the implementation under packages/providers/src/community/<id>/, export a `register<Id>Provider`, add one line to the aggregator. Tests: - New `registerCommunityProviders` suite (2 tests: registers pi, idempotent). - config-loader.test updated: assert built-in slots explicitly rather than exhaustive map shape. No functional change for Pi end-users. Purely structural. * fix(providers/pi,core): correctness + hygiene fixes from PR coleam00#1270 review Addresses six of the review's important findings, all within the same PR branch: 1. envInjection: false → true The provider reads requestOptions.env on every call (for API-key passthrough). Declaring the capability false caused a spurious dag-executor warning for every Pi user who configured codebase env vars — which is the MAIN auth path. Flipping to true removes the false positive. 2. toSafeAssistantDefaults: denylist → allowlist The old shape deleted `additionalDirectories`, `settingSources`, `codexBinaryPath` before sending defaults to the web UI. Any future sensitive provider field (OAuth token, absolute path, internal metadata) would silently leak via the `[key: string]: unknown` index signature. New SAFE_ASSISTANT_FIELDS map lists exactly what to expose per provider; unknown providers get an empty allowlist so the web UI sees "provider exists" but no config details. 3. AsyncQueue single-consumer invariant The type was documented single-consumer but unenforced. A second `for await` would silently race with the first over buffer + waiters. Added a synchronous guard in Symbol.asyncIterator that throws on second call — copy-paste mistakes now fail fast with a clear message instead of dropping items. 4. session.dispose() / session.abort() silent catches Both catch blocks now log at debug via a module-scoped logger so SDK regressions surface without polluting normal output. 5. Type scripted events as AgentSessionEvent in provider.test.ts Was `Record<string, unknown>` — Pi field renames would silently keep tests passing. Now typed against Pi's actual event union. 6. Leaked /tmp/pi-research/... path in provider.ts comment Local-machine path that crept in during research. Replaced with the upstream GitHub URL (matches convention at provider.ts:110). Plus review-flagged simplifications: - Extract lookupPiModel wrapper — isolates the `as unknown as` cast behind one searchable name. - Hoist QueueItem → BridgeQueueItem at module scope (export'd for test visibility; not used externally yet but enables unit testing the mapping in isolation if needed later). - getRegisteredProviderNames: remove side-effecting registration calls. `loadConfig()` already bootstraps the registry before any caller can observe this helper — the hidden coupling was misleading. Plus missing-coverage tests from the review (pr-test-analyzer): - session.prompt() rejection → error surfaces to consumer - pre-aborted signal → session.abort() called - mid-stream abort → session.abort() called - modelFallbackMessage → system chunk yielded - AsyncQueue second-consumer → throws synchronously No behavioral changes for end users beyond the envInjection warning fix. * docs: Pi provider + community-provider contributor guide Addresses the PR coleam00#1270 review's docs-impact findings: the original Pi PR had no user-facing or contributor-facing documentation, and architecture.md still referenced the pre-Phase-2 factory.ts pattern (factory.ts was deleted in coleam00#1195). 1. packages/docs-web/src/content/docs/reference/architecture.md - Replace stale factory.ts references with the registry pattern. - Update inline IAgentProvider block: add getCapabilities, add options parameter. - Rewrite MessageChunk block as the actual discriminated union (was a placeholder with optional fields that didn't match the current type). - "Adding a New AI Agent Provider" checklist now distinguishes built-in (register in registerBuiltinProviders) from community (separate guide). Links to the new contributor guide. 2. packages/docs-web/src/content/docs/contributing/adding-a-community-provider.md (new) - Step-by-step guide using Pi as the reference implementation. - Covers: directory layout, capability discipline (start false, flip one at a time), provider class skeleton, registration via aggregator, test isolation (Bun mock.module pollution), what NOT to do (no edits to AssistantDefaultsConfig, no direct registerProvider from entrypoints, no overclaiming capabilities). 3. packages/docs-web/src/content/docs/getting-started/ai-assistants.md - New "Pi (Community Provider)" section: install, OAuth + API-key table per Pi backend, model ref format, workflow examples, capability matrix showing what Pi supports (session resume, tool restrictions, effort/thinking, skills, system prompt, envInjection) and what it doesn't (MCP, hooks, structured output, cost control, fallback model, sandbox). 4. .env.example - New Pi section with commented env vars for each supported backend (ANTHROPIC_API_KEY through HUGGINGFACE_API_KEY), each paired with its Pi provider id. OAuth flow (pi /login → auth.json) is explicitly called out — Archon reads that file too. 5. CHANGELOG.md - Unreleased entry for Pi, registerCommunityProviders aggregator, and the new contributor guide.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
archon-adversarial-devworkflow init step works on environments wheresed -iis non-portable by using an atomic temp-file +mvreplacement.sed -ivariants in bundled workflow content.state.json.Description
sed -icall in.archon/workflows/defaults/archon-adversarial-dev.yamlwith a temp-file flow usingSTATE_TMP,sed ... > "$STATE_TMP", andmv "$STATE_TMP" "$ARTIFACTS/state.json".packages/workflows/src/defaults/bundled-defaults.test.tsthat requires theSTATE_TMPpattern, the redirectedsedinvocation, themvstep, and asserts that nosed -ivariants appear via a regex.Testing
bun test packages/workflows/src/defaults/bundled-defaults.test.ts, which failed in this environment due to module resolution errorCannot find module '@archon/paths'and therefore could not complete the test run.bun --filter @archon/workflows test, which likewise failed withCannot find module '@archon/paths'and did not produce passing results here.Codex Task