Skip to content

fix(workflows): make archon-adversarial-dev sed usage portable and add regression test#1

Merged
LaplaceYoung merged 1 commit intodevfrom
young/fix-issues-in-pr-and-comment
Apr 22, 2026
Merged

fix(workflows): make archon-adversarial-dev sed usage portable and add regression test#1
LaplaceYoung merged 1 commit intodevfrom
young/fix-issues-in-pr-and-comment

Conversation

@LaplaceYoung
Copy link
Copy Markdown
Owner

Motivation

  • Ensure the archon-adversarial-dev workflow init step works on environments where sed -i is non-portable by using an atomic temp-file + mv replacement.
  • Prevent regressions by asserting the portable flow is present and disallowing sed -i variants in bundled workflow content.
  • Improve cross-platform robustness for workflow initialization that writes state.json.

Description

  • Replaced non-portable sed -i call in .archon/workflows/defaults/archon-adversarial-dev.yaml with a temp-file flow using STATE_TMP, sed ... > "$STATE_TMP", and mv "$STATE_TMP" "$ARTIFACTS/state.json".
  • Added a regression test in packages/workflows/src/defaults/bundled-defaults.test.ts that requires the STATE_TMP pattern, the redirected sed invocation, the mv step, and asserts that no sed -i variants appear via a regex.
  • Small test-suite-related adjustments only; no production API or runtime behavior changes beyond the workflow YAML replacement.

Testing

  • Ran bun test packages/workflows/src/defaults/bundled-defaults.test.ts, which failed in this environment due to module resolution error Cannot find module '@archon/paths' and therefore could not complete the test run.
  • Ran bun --filter @archon/workflows test, which likewise failed with Cannot find module '@archon/paths' and did not produce passing results here.
  • The changes are unit-test covered by the new assertions; full CI in the repository (with correct workspace resolution) is expected to run the suite successfully.

Codex Task

@LaplaceYoung LaplaceYoung merged commit 8e4b866 into dev Apr 22, 2026
Wirasm added a commit that referenced this pull request Apr 22, 2026
…gent) (coleam00#1270)

* feat(providers): add Pi community provider (@mariozechner/pi-coding-agent)

Introduces Pi as the first community provider under the Phase 2 registry,
registered with builtIn: false. Wraps Pi's full coding-agent harness the
same way ClaudeProvider wraps @anthropic-ai/claude-agent-sdk and
CodexProvider wraps @openai/codex-sdk.

- PiProvider implements IAgentProvider; fresh AgentSession per sendQuery call
- AsyncQueue bridges Pi's callback-based session.subscribe() to Archon's
  AsyncGenerator<MessageChunk> contract
- Server-safe: AuthStorage.inMemory + SessionManager.inMemory +
  SettingsManager.inMemory + DefaultResourceLoader with all no* flags —
  no filesystem access, no cross-request state
- API key seeded per-call from options.env → process.env fallback
- Model refs: '<pi-provider-id>/<model-id>' (e.g. google/gemini-2.5-pro,
  openrouter/qwen/qwen3-coder) with syntactic compatibility check
- registerPiProvider() wired at CLI, server, and config-loader entrypoints,
  kept separate from registerBuiltinProviders() since builtIn: false is
  load-bearing for the community-provider validation story
- All 12 capability flags declared false in v1 — dag-executor warnings fire
  honestly for any unmapped nodeConfig field
- 58 new tests covering event mapping, async-queue semantics, model-ref
  parsing, defensive config parsing, registry integration

Supported Pi providers (v1): anthropic, openai, google, groq, mistral,
cerebras, xai, openrouter, huggingface. Extend PI_PROVIDER_ENV_VARS as
needed.

Out of scope (v1): session resume, MCP, hooks, skills mapping, thinking
level mapping, structured output, OAuth flows, model catalog validation.
These remain false on PI_CAPABILITIES until intentionally wired.

* feat(providers/pi): read ~/.pi/agent/auth.json for OAuth + api_key passthrough

Replaces the v1 env-var-only auth flow with AuthStorage.create(), which
reads ~/.pi/agent/auth.json. This transparently picks up credentials the
user has populated via `pi` → `/login` (OAuth subscriptions: Claude
Pro/Max, ChatGPT Plus, GitHub Copilot, Gemini CLI, Antigravity) or by
editing the file directly.

Env-var behavior preserved: when ANTHROPIC_API_KEY / GEMINI_API_KEY /
etc. is set (in process.env or per-request options.env), the adapter
calls setRuntimeApiKey which is priority #1 in Pi's resolution chain.
Auth.json entries are priority coleam00#2-coleam00#3. Pi's internal env-var fallback
remains priority coleam00#4 as a safety net.

Archon does not implement OAuth flows itself — it only rides on creds
the user created via the Pi CLI. OAuth refresh still happens inside Pi
(auth-storage.ts:369-413) under a file lock; concurrent refreshes
between the Pi CLI and Archon are race-safe by Pi's own design.

- Fail-fast error now mentions both the env-var path and `pi /login`
- 2 new tests: OAuth cred from auth.json; env var wins over auth.json
- 12 existing tests still pass (env-var-only path unchanged)

CI compatibility: no auth.json in CI, no change — env-var (secrets)
flows through Pi's getEnvApiKey fallback identically to v1.

* test(e2e): add Pi provider smoke test workflow

Mirrors e2e-claude-smoke.yaml: single prompt node + bash assert.
Targets `anthropic/claude-haiku-4-5` via `provider: pi`; works in CI
(ANTHROPIC_API_KEY secret) and locally (user's `pi /login` OAuth).

Verified locally with an Anthropic OAuth subscription — full run takes
~4s from session_started to assert PASS, exercising the async-queue
bridge and agent_end → result-chunk assembly under real Pi event timing.

Not yet wired into .github/workflows/e2e-smoke.yml — separate PR once
this lands, to keep the Pi provider PR minimal.

* feat(providers/pi): v2 — thinkingLevel, tool restrictions, systemPrompt

Extends the Pi adapter with three node-level translations, flipping the
corresponding capability flags from false → true so the dag-executor no
longer emits warnings for these fields on Pi nodes.

1. effort / thinking → Pi thinkingLevel (options-translator.ts)
   - Archon EffortLevel enum: low|medium|high|max (from
     packages/workflows/src/schemas/dag-node.ts). `max` maps to Pi's
     `xhigh` since Archon's enum lacks it.
   - Pi-native strings (minimal, xhigh, off) also accepted for
     programmatic callers bypassing the schema.
   - `off` on either field → no thinkingLevel (Pi's implicit off).
   - Claude-shape object `thinking: {type:'enabled', budget_tokens:N}`
     yields a system warning and is not applied.

2. allowed_tools / denied_tools → filtered Pi built-in tools
   - Supports all 7 Pi tools: read, bash, edit, write, grep, find, ls.
   - Case-insensitive normalization.
   - Empty `allowed_tools: []` means no tools (LLM-only), matching
     e2e-claude-smoke's idiom.
   - Unknown names (Claude-specific like `WebFetch`) collected and
     surfaced as a system warning; ignored tools don't fail the run.

3. systemPrompt (AgentRequestOptions + nodeConfig.systemPrompt)
   - Threaded through `DefaultResourceLoader({systemPrompt})`; Pi's
     default prompt is replaced entirely. Request-level wins over
     node-level.

Capability flag changes:
- thinkingControl: false → true
- effortControl:   false → true
- toolRestrictions: false → true

Package delta:
- +1 direct dep: @sinclair/typebox (Pi types reference it; adding as
  direct dep resolves the TS portable-type error).
- +1 test file: options-translator.test.ts (19 tests, 100% coverage).
- provider.test.ts extended with 11 new tests covering all three paths.
- registry.test.ts updated: capability assertion reflects new flags.

Live-verified: `bun run cli workflow run e2e-pi-smoke --no-worktree`
succeeds in 1.2s with thinkingLevel=low, toolCount=0. Smoke YAML updated
to use `effort: low` (schema-valid) + `allowed_tools: []` (LLM-only).

* test(e2e): add comprehensive Pi smoke covering every CI-compatible node type

Exercises every node type Archon supports under `provider: pi`, except
`approval:` (pauses for human input, incompatible with CI):
  1. prompt   — inline AI prompt
  2. command  — named command file (uses e2e-echo-command.md)
  3. loop     — bounded iterative AI prompt (max_iterations: 2)
  4. bash     — shell script with JSON output
  5. script   — bun runtime (echo-args.js)
  6. script   — uv / Python runtime (echo-py.py)

Plus DAG features on top of Pi:
  - depends_on + $nodeId.output substitution
  - when: conditional with JSON dot-access
  - trigger_rule: all_success merge
  - final assert node validates every upstream output is non-empty

Complements the minimal e2e-pi-smoke.yaml — that stays as the fast-path
smoke for connectivity checks; this one is the broader surface coverage.

Verified locally end-to-end against Anthropic OAuth (pi /login): PASS,
all 9 non-final nodes produce output, assert succeeds.

* feat(providers/pi): resolve Archon `skills:` names to Pi skill paths

Flips capabilities.skills: false → true by translating Archon's name-based
`skills:` nodeConfig (e.g. `skills: [agent-browser]`) to absolute directory
paths Pi's DefaultResourceLoader can consume via additionalSkillPaths.

Search order for each skill name (first match wins):
  1. <cwd>/.agents/skills/<name>/      — project-local, agentskills.io
  2. <cwd>/.claude/skills/<name>/      — project-local, Claude convention
  3. ~/.agents/skills/<name>/          — user-global, agentskills.io
  4. ~/.claude/skills/<name>/          — user-global, Claude convention

A directory resolves only if it contains a SKILL.md. Unresolved names are
collected and surfaced as a system-chunk warning (e.g. "Pi could not
resolve skill names: foo, bar. Searched .agents/skills and .claude/skills
(project + user-global)."), matching the semantic of "requested but not
found" without aborting the run.

Pi's buildSystemPrompt auto-appends the agentskills.io XML block for each
loaded skill, so the model sees them — no separate prompt injection needed
(Pi differs from Claude here; Claude wraps in an AgentDefinition with a
preloaded prompt, Pi uses XML block in system prompt).

Ancestor directory traversal above cwd is deliberately skipped in this
pass — matches the Pi provider's cwd-bound scope and avoids ambiguity
about which repo's skills win when Archon runs from a subdirectory.

Bun's os.homedir() bypasses the HOME env var; the resolver uses
`process.env.HOME ?? homedir()` so tests can stage a synthetic home dir.

Tests:
- 11 new tests in options-translator.test.ts cover project/user, .agents/
  vs .claude/, project-wins-over-user, SKILL.md presence check, dedup,
  missing-name collection.
- 2 new integration tests in provider.test.ts cover the missing-skill
  warning path and the "no skills configured → no additionalSkillPaths"
  path.
- registry.test.ts updated to assert skills: true in capabilities.

Live-verified locally: `.claude/skills/archon-dev/SKILL.md` resolves,
pi.session_started log shows `skillCount: 1, missingSkillCount: 0`,
smoke workflow passes in 1.2s.

* feat(providers/pi): session resume via Pi session store

Flips capabilities.sessionResume: false → true. Pi now persists sessions
under ~/.pi/agent/sessions/<encoded-cwd>/<uuid>.jsonl by default — same
pattern Claude and Codex use for their respective stores, same blast
radius as those providers.

Flow:
  - No resumeSessionId → SessionManager.create(cwd) (fresh, persisted)
  - resumeSessionId + match in SessionManager.list(cwd) → open(path)
  - resumeSessionId + no match → fresh session + system warning
    ("⚠️ Could not resume Pi session. Starting fresh conversation.")
    Matches Codex's resume_thread_failed fallback at
    packages/providers/src/codex/provider.ts:553-558.

The sessionId flows back to Archon via the terminal `result` chunk —
bridgeSession annotates it with session.sessionId unconditionally so
Archon's orchestrator can persist it and pass it as resumeSessionId on
the next turn. Same mechanism used for Claude/Codex.

Cross-cwd resume (e.g. worktree switch) is deliberately not supported in
this pass: list(cwd) scans only the current cwd's session dir. A workflow
that changes cwd mid-run lands on a fresh session, which matches Pi's
mental model.

Bridge sessionId annotation uses session.sessionId, which Pi always
populates (UUID) — so no special-case for inMemory sessions is needed.

Factored the resolver into session-resolver.ts (5 unit tests):
  - no id → create
  - id + match → open
  - id + no match → create with resumeFailed: true
  - list() throws → resumeFailed: true (graceful)
  - empty-string id → treated as "no resume requested"

Integration tests in provider.test.ts add 3 cases:
  - resume-not-found yields warning + calls create
  - resume-match calls open with the file path, no warning
  - result chunk always carries sessionId

Verified live end-to-end against Anthropic OAuth:
  - first call → sessionId 019d...; model replies "noted"
  - second call with that sessionId → "resumed: true" in logs; model
    correctly recalls prior turn ("Crimson.")
  - bogus sessionId → "⚠️ Could not resume..." warning + fresh UUID

* refactor(providers,core): generalize community-provider registration

Addresses the community-pattern regression flagged in the PR coleam00#1270 review:
a second community provider should require editing only its own directory,
not seven files across providers/ + core/ + cli/ + server/.

Three changes:

1. Drop typed `pi` slot from AssistantDefaultsConfig + AssistantDefaults.
   Community providers live behind the generic `[string]` index that
   `ProviderDefaultsMap` was explicitly designed to provide. The typed
   claude/codex slots stay — they give IDE autocomplete for built-in
   config access without `as` casts, which was the whole reason the
   intersection exists. Community providers parse their own config via
   Record<string, unknown> anyway, so the typed slot added no real
   parser safety.

2. Loop-based getDefaults + mergeAssistantDefaults. No more hardcoded
   `pi: {}` spreads. getDefaults() seeds from `getRegisteredProviders()`;
   mergeAssistantDefaults clones every slot present in `base`. Adding a
   new provider requires zero edits to this function.

3. New `registerCommunityProviders()` aggregator in registry.ts.
   Entrypoints (CLI, server, config-loader) call ONE function after
   `registerBuiltinProviders()` rather than one call per community
   provider. Adding a new community provider is now a single-line edit
   to registerCommunityProviders().

This makes Pi (and future community providers) actually behave like
Phase 2 (coleam00#1195) advertised: drop the implementation under
packages/providers/src/community/<id>/, export a `register<Id>Provider`,
add one line to the aggregator.

Tests:
- New `registerCommunityProviders` suite (2 tests: registers pi,
  idempotent).
- config-loader.test updated: assert built-in slots explicitly rather
  than exhaustive map shape.

No functional change for Pi end-users. Purely structural.

* fix(providers/pi,core): correctness + hygiene fixes from PR coleam00#1270 review

Addresses six of the review's important findings, all within the same
PR branch:

1. envInjection: false → true
   The provider reads requestOptions.env on every call (for API-key
   passthrough). Declaring the capability false caused a spurious
   dag-executor warning for every Pi user who configured codebase env
   vars — which is the MAIN auth path. Flipping to true removes the
   false positive.

2. toSafeAssistantDefaults: denylist → allowlist
   The old shape deleted `additionalDirectories`, `settingSources`,
   `codexBinaryPath` before sending defaults to the web UI. Any future
   sensitive provider field (OAuth token, absolute path, internal
   metadata) would silently leak via the `[key: string]: unknown` index
   signature. New SAFE_ASSISTANT_FIELDS map lists exactly what to
   expose per provider; unknown providers get an empty allowlist so
   the web UI sees "provider exists" but no config details.

3. AsyncQueue single-consumer invariant
   The type was documented single-consumer but unenforced. A second
   `for await` would silently race with the first over buffer +
   waiters. Added a synchronous guard in Symbol.asyncIterator that
   throws on second call — copy-paste mistakes now fail fast with a
   clear message instead of dropping items.

4. session.dispose() / session.abort() silent catches
   Both catch blocks now log at debug via a module-scoped logger so
   SDK regressions surface without polluting normal output.

5. Type scripted events as AgentSessionEvent in provider.test.ts
   Was `Record<string, unknown>` — Pi field renames would silently
   keep tests passing. Now typed against Pi's actual event union.

6. Leaked /tmp/pi-research/... path in provider.ts comment
   Local-machine path that crept in during research. Replaced with
   the upstream GitHub URL (matches convention at provider.ts:110).

Plus review-flagged simplifications:
  - Extract lookupPiModel wrapper — isolates the `as unknown as` cast
    behind one searchable name.
  - Hoist QueueItem → BridgeQueueItem at module scope (export'd for
    test visibility; not used externally yet but enables unit testing
    the mapping in isolation if needed later).
  - getRegisteredProviderNames: remove side-effecting registration
    calls. `loadConfig()` already bootstraps the registry before any
    caller can observe this helper — the hidden coupling was
    misleading.

Plus missing-coverage tests from the review (pr-test-analyzer):
  - session.prompt() rejection → error surfaces to consumer
  - pre-aborted signal → session.abort() called
  - mid-stream abort → session.abort() called
  - modelFallbackMessage → system chunk yielded
  - AsyncQueue second-consumer → throws synchronously

No behavioral changes for end users beyond the envInjection warning
fix.

* docs: Pi provider + community-provider contributor guide

Addresses the PR coleam00#1270 review's docs-impact findings: the original Pi
PR had no user-facing or contributor-facing documentation, and
architecture.md still referenced the pre-Phase-2 factory.ts pattern
(factory.ts was deleted in coleam00#1195).

1. packages/docs-web/src/content/docs/reference/architecture.md
   - Replace stale factory.ts references with the registry pattern.
   - Update inline IAgentProvider block: add getCapabilities, add
     options parameter.
   - Rewrite MessageChunk block as the actual discriminated union
     (was a placeholder with optional fields that didn't match the
     current type).
   - "Adding a New AI Agent Provider" checklist now distinguishes
     built-in (register in registerBuiltinProviders) from community
     (separate guide). Links to the new contributor guide.

2. packages/docs-web/src/content/docs/contributing/adding-a-community-provider.md (new)
   - Step-by-step guide using Pi as the reference implementation.
   - Covers: directory layout, capability discipline (start false,
     flip one at a time), provider class skeleton, registration via
     aggregator, test isolation (Bun mock.module pollution), what
     NOT to do (no edits to AssistantDefaultsConfig, no direct
     registerProvider from entrypoints, no overclaiming capabilities).

3. packages/docs-web/src/content/docs/getting-started/ai-assistants.md
   - New "Pi (Community Provider)" section: install, OAuth +
     API-key table per Pi backend, model ref format, workflow
     examples, capability matrix showing what Pi supports (session
     resume, tool restrictions, effort/thinking, skills, system
     prompt, envInjection) and what it doesn't (MCP, hooks,
     structured output, cost control, fallback model, sandbox).

4. .env.example
   - New Pi section with commented env vars for each supported
     backend (ANTHROPIC_API_KEY through HUGGINGFACE_API_KEY), each
     paired with its Pi provider id. OAuth flow (pi /login → auth.json)
     is explicitly called out — Archon reads that file too.

5. CHANGELOG.md
   - Unreleased entry for Pi, registerCommunityProviders aggregator,
     and the new contributor guide.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant