Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- **`$LOOP_PREV_OUTPUT` workflow variable (loop nodes only)** — exposes the previous iteration's cleaned output (after `<promise>` tag stripping) to the current iteration's prompt. Empty on the first iteration and on the first iteration after resuming from an interactive approval gate. Enables `fresh_context: true` loops to reference what the prior pass said or did without carrying full session history. (#1367)

### Changed

- **Provider/model resolution: trust the SDK, drop allow-lists.** Removed `inferProviderFromModel` and `isModelCompatible` entirely. Provider is now resolved via a flat explicit chain — `node.provider ?? workflow.provider ?? config.assistant` — and never inferred from the model string. Model strings pass through to the SDK unchanged; the SDK validates them at request time. Codex's stream loop now matches Claude's contract (every terminal close emits exactly one `result` chunk; `error` events without a recovering `turn.completed` synthesize `result.isError` with subtype `codex_stream_incomplete`; `turn.failed` becomes `codex_turn_failed`). AI nodes that exit the streaming loop with empty assistant text and no structured output now fail loudly with `dag.node_empty_output` instead of completing as silent zero-output successes. Provider-id typos (workflow-level and per-node) are caught at YAML load time. **Migration**: workflows that previously relied on cross-provider model inference (e.g. `model: gpt-5.2-codex` with no `provider:`, expecting Archon to pick `codex` because Claude's allow-list rejected the string) must now set `provider:` explicitly. Workflows that already set both `provider:` and `model:` — and workflows that set only `model:` matching `config.assistant` — keep working unchanged. (#1463)

### Fixed

- **Claude provider crashed in dev mode with `error: unknown option '--no-env-file'`.** The Claude Agent SDK switched from shipping `cli.js` to per-platform native binaries (via optional deps) in the 0.2.x series. Archon's `shouldPassNoEnvFile` predicate kept emitting the Bun-only `--no-env-file` flag in dev mode (when the SDK resolves its bundled binary), which the native binary rejects. Tightened the predicate to only emit the flag for explicitly-configured Bun-runnable JS entry points (`.js`/`.mjs`/`.cjs`). Target-repo `.env` isolation is unchanged — `stripCwdEnv()` at process boot remains the primary guard, and the native Claude binary does not auto-load `.env` from its cwd. (#1461)
Expand Down
7 changes: 3 additions & 4 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -501,10 +501,9 @@ assistants:
3. SDK defaults

**Model Validation:**
- Workflows are validated at load time for provider/model compatibility
- Claude models: `sonnet`, `opus`, `haiku`, `claude-*`, `inherit`
- Codex models: Any model except Claude-specific aliases
- Invalid combinations fail workflow loading with clear error messages
- Workflows are validated at load time for provider _identity_ only — `provider:` (workflow-level and per-node) must be a registered provider id, otherwise the YAML is rejected with `Unknown provider '<id>'. Registered: claude, codex, pi`.
- Model strings are NOT validated by Archon. Whatever the user writes in `model:` is forwarded verbatim to the resolved SDK. Vendor SDKs ship new models faster than Archon can update; the SDK and the upstream API are the source of truth for what names exist.
- Provider is resolved via an explicit chain: `node.provider ?? workflow.provider ?? config.assistant`. Model never influences provider selection.

### Running the App in Worktrees

Expand Down
2 changes: 1 addition & 1 deletion packages/docs-web/src/content/docs/book/quick-reference.md
Original file line number Diff line number Diff line change
Expand Up @@ -293,7 +293,7 @@ defaults:
| `Routing unclear — falling back to archon-assist` | No workflow matched the input | Use an explicit workflow name: `archon workflow run my-workflow "..."` |
| `Worktree already exists for branch X` | Prior run left a worktree | Run `archon complete X` or `archon isolation cleanup` |
| `Not a git repository` | Running outside a repo | `cd` into a git repo first — workflow and isolation commands require one |
| `Model X is not valid for provider Y` | Provider/model mismatch | Each provider accepts specific models — check the provider's `isModelCompatible` rules. Claude accepts `sonnet`, `opus`, `haiku`, `claude-*`; Codex accepts other models. |
| `Unknown provider 'X'. Registered: claude, codex, pi` | Typo in `provider:` (workflow root or node-level) | Set `provider:` to one of the registered ids. Model strings themselves are not validated at load time — the SDK rejects unknown models at request time. |
| `$BASE_BRANCH referenced but could not be detected` | No base branch set and auto-detection failed | Set `worktree.baseBranch` in `.archon/config.yaml` or ensure `main`/`master` exists |
| Workflow hangs with no output | Node idle timeout hit | Increase `idle_timeout` on the node (milliseconds) |

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,6 @@ export function registerYourProvider(): void {
displayName: 'Your Provider (community)',
factory: () => new YourProvider(),
capabilities: YOUR_CAPABILITIES,
isModelCompatible: (model) => /* pattern check */,
builtIn: false, // ← important: community providers are NOT built-in
});
}
Expand All @@ -147,7 +146,7 @@ Co-locate tests next to your code. The Pi tests use this isolation pattern:

- Mock the SDK (`mock.module` at the top of the file, before importing your provider).
- Tests that touch `mock.module` are split into separate `bun test` invocations in `packages/providers/package.json` (see existing entries for the Pi files). Bun's `mock.module` is process-global and irreversible — splitting prevents cross-file pollution.
- Registry test (`packages/providers/src/registry.test.ts`): add a `describe` block asserting `builtIn: false`, idempotent registration, and `isModelCompatible` behavior.
- Registry test (`packages/providers/src/registry.test.ts`): add a `describe` block asserting `builtIn: false` and idempotent registration.

### 5. Capability discipline

Expand Down
30 changes: 15 additions & 15 deletions packages/docs-web/src/content/docs/guides/authoring-workflows.md
Original file line number Diff line number Diff line change
Expand Up @@ -602,16 +602,15 @@ provider: claude # Any registered provider (default: from config)
model: sonnet # Model override (default: from config assistants.claude.model)
```

**Claude models:**
- `sonnet` - Fast, balanced (recommended)
- `opus` - Powerful, expensive
- `haiku` - Fast, lightweight
- `claude-*` - Full model IDs (e.g., `claude-3-5-sonnet-20241022`)
- `inherit` - Use model from previous session
**Model strings:** Whatever you write in `model:` is forwarded verbatim to the resolved provider's SDK. Archon doesn't keep an internal allow-list, because vendor SDKs ship new models faster than this doc can. The provider's API decides whether the string is valid at request time.

**Codex models:**
- Any OpenAI model ID (e.g., `gpt-5.3-codex`, `o5-pro`)
- Cannot use Claude model aliases
Common shapes you'll see in practice:

- **Claude (Anthropic):** family aliases (`sonnet`, `opus`, `haiku`), full model IDs (`claude-opus-4-7`, `claude-3-5-sonnet-20241022`), context-window suffixed forms (`opus[1m]`, `claude-opus-4-7[1m]`), or `inherit` to reuse the previous session's model.
- **Codex (OpenAI):** any OpenAI model ID — `gpt-5.3-codex`, `gpt-5.2`, `o5-pro`, etc.
- **Pi (community):** `<backend>/<model-id>` refs — e.g. `google/gemini-2.5-pro`, `openrouter/qwen/qwen3-coder`.

If the SDK rejects the string at request time, the node fails loudly with the SDK's error message — Archon never silently re-routes a model from one provider to another based on the string.

### Codex-Specific Options

Expand Down Expand Up @@ -676,18 +675,19 @@ nodes:
**Platforms:** `interactive` only affects the web platform. CLI, Slack, Telegram, and
GitHub always run workflows in foreground mode regardless of this setting.

### Model Validation
### Provider Validation

Workflows are validated at load time:
- Provider/model compatibility checked
- Invalid combinations fail with clear error messages
- Validation errors shown in `/workflow list`
Workflows are validated at load time for **provider identity only**:
- Both the workflow-level `provider:` and any per-node `provider:` overrides must name a registered provider (`claude`, `codex`, `pi`).
- Validation errors are shown in `/workflow list`.

Example validation error:
```
Model "sonnet" is not compatible with provider "codex"
Unknown provider 'claud'. Registered: claude, codex, pi
```
Comment on lines 684 to 687
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language tag to this fenced block.

Line 685 opens a bare code fence, which triggers MD040 in markdownlint. Using text here keeps the docs clean and avoids validation noise.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 685-685: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md` around
lines 684 - 687, Change the bare fenced code block that contains the example
validation error "Unknown provider 'claud'. Registered: claude, codex, pi" so
the opening fence is labeled with the text language (i.e., replace ``` with
```text) to satisfy markdownlint MD040; locate the fenced block near the Example
validation error and update only the opening fence.


Model strings are not validated at load time — they're forwarded to the SDK as-is and validated by the upstream API at request time.

### Resource Validation (CLI)

To validate that all referenced command files, MCP config files, and skill directories exist on disk, run:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -414,7 +414,6 @@ export function registerBuiltinProviders(): void {
displayName: 'Your Assistant',
factory: () => new YourAssistantProvider(),
capabilities: YOUR_ASSISTANT_CAPABILITIES,
isModelCompatible: (model) => /* pattern check */,
builtIn: true,
},
// ...existing entries
Expand Down
124 changes: 107 additions & 17 deletions packages/providers/src/codex/provider.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -870,10 +870,13 @@ describe('CodexProvider', () => {
);
});

test('handles error events', async () => {
test('error events followed by turn.completed yield a clean result (recoverable)', async () => {
// SDK error events that are followed by turn.completed indicate the SDK
// recovered internally. The dropped error message is logged but not
// surfaced \u2014 only one terminal result chunk is yielded.
mockRunStreamed.mockResolvedValue({
events: (async function* () {
yield { type: 'error', message: 'Something went wrong' };
yield { type: 'error', message: 'Transient blip' };
yield { type: 'turn.completed', usage: defaultUsage };
})(),
});
Expand All @@ -883,14 +886,44 @@ describe('CodexProvider', () => {
chunks.push(chunk);
}

expect(chunks[0]).toEqual({ type: 'system', content: '\u26A0\uFE0F Something went wrong' });
expect(mockLogger.error).toHaveBeenCalledWith(
{ message: 'Something went wrong' },
'stream_error'
);
expect(chunks).toHaveLength(1);
expect(chunks[0]).toEqual({
type: 'result',
sessionId: 'new-thread-id',
tokens: { input: 10, output: 5 },
});
expect(mockLogger.error).toHaveBeenCalledWith({ message: 'Transient blip' }, 'stream_error');
});

test('error event followed by stream close yields fail-stop result.isError', async () => {
// The SDK sends an error event (e.g. "model not supported") and the
// iterator closes without turn.completed or turn.failed. The provider
// synthesizes a fail-stop result so the dag-executor's msg.isError
// branch catches the failure \u2014 same chunk shape as Claude.
mockRunStreamed.mockResolvedValue({
events: (async function* () {
yield { type: 'error', message: "'opus[1m]' model is not supported" };
})(),
});

const chunks = [];
for await (const chunk of client.sendQuery('test', '/workspace')) {
chunks.push(chunk);
}

expect(chunks).toHaveLength(1);
expect(chunks[0]).toEqual({
type: 'result',
sessionId: 'new-thread-id',
isError: true,
errorSubtype: 'codex_stream_incomplete',
errors: ["'opus[1m]' model is not supported"],
});
});

test('suppresses MCP timeout errors', async () => {
test('MCP client errors followed by turn.completed yield clean result', async () => {
// MCP client errors are non-fatal \u2014 Codex retries internally.
// Only after turn.completed do we know the SDK recovered.
mockRunStreamed.mockResolvedValue({
events: (async function* () {
yield { type: 'error', message: 'MCP client connection timeout' };
Expand All @@ -903,22 +936,46 @@ describe('CodexProvider', () => {
chunks.push(chunk);
}

// Should only have the result, not the MCP error
expect(chunks).toHaveLength(1);
expect(chunks[0]).toEqual({
type: 'result',
sessionId: 'new-thread-id',
tokens: { input: 10, output: 5 },
});

// Error is still logged even though not sent to user
// Logged but not surfaced as failure
expect(mockLogger.error).toHaveBeenCalledWith(
{ message: 'MCP client connection timeout' },
'stream_error'
);
});

test('handles turn.failed events', async () => {
test('MCP-only error followed by stream close still fails (no terminal = failure)', async () => {
// The stream-incomplete fail-stop fires whenever the iterator closes
// without a terminal event \u2014 that's an SDK contract violation
// regardless of cause. But the captured error message does NOT carry
// the MCP-client text, since MCP errors are filtered from capture.
mockRunStreamed.mockResolvedValue({
events: (async function* () {
yield { type: 'error', message: 'MCP client transport closed' };
})(),
});

const chunks = [];
for await (const chunk of client.sendQuery('test', '/workspace')) {
chunks.push(chunk);
}

expect(chunks).toHaveLength(1);
expect(chunks[0]).toMatchObject({
type: 'result',
isError: true,
errorSubtype: 'codex_stream_incomplete',
});
const errors = (chunks[0] as { errors?: string[] }).errors;
expect(errors?.[0]).not.toContain('MCP client');
});

test('turn.failed yields result.isError with codex_turn_failed subtype', async () => {
mockRunStreamed.mockResolvedValue({
events: (async function* () {
yield { type: 'turn.failed', error: { message: 'Rate limit exceeded' } };
Expand All @@ -930,17 +987,21 @@ describe('CodexProvider', () => {
chunks.push(chunk);
}

expect(chunks).toHaveLength(1);
expect(chunks[0]).toEqual({
type: 'system',
content: '\u274C Turn failed: Rate limit exceeded',
type: 'result',
sessionId: 'new-thread-id',
isError: true,
errorSubtype: 'codex_turn_failed',
errors: ['Rate limit exceeded'],
});
expect(mockLogger.error).toHaveBeenCalledWith(
{ errorMessage: 'Rate limit exceeded' },
'turn_failed'
);
});

test('handles turn.failed without error message', async () => {
test('turn.failed without error message yields fail-stop with Unknown error', async () => {
mockRunStreamed.mockResolvedValue({
events: (async function* () {
yield { type: 'turn.failed', error: null };
Expand All @@ -952,16 +1013,45 @@ describe('CodexProvider', () => {
chunks.push(chunk);
}

expect(chunks).toHaveLength(1);
expect(chunks[0]).toEqual({
type: 'system',
content: '\u274C Turn failed: Unknown error',
type: 'result',
sessionId: 'new-thread-id',
isError: true,
errorSubtype: 'codex_turn_failed',
errors: ['Unknown error'],
});
expect(mockLogger.error).toHaveBeenCalledWith(
{ errorMessage: 'Unknown error' },
'turn_failed'
);
});

test('iterator that closes with zero events yields codex_stream_incomplete with default message', async () => {
// Bare-stream-close fallback: no error event, no terminal event,
// iterator just ends. Locks in the default message used when there is
// no captured non-MCP error to attribute the failure to.
mockRunStreamed.mockResolvedValue({
events: (async function* () {
// no events
})(),
});

const chunks = [];
for await (const chunk of client.sendQuery('test', '/workspace')) {
chunks.push(chunk);
}

expect(chunks).toHaveLength(1);
expect(chunks[0]).toEqual({
type: 'result',
sessionId: 'new-thread-id',
isError: true,
errorSubtype: 'codex_stream_incomplete',
errors: ['Codex stream closed without turn.completed or turn.failed'],
});
});

test('throws on runStreamed error', async () => {
const networkError = new Error('Network failure');
mockRunStreamed.mockRejectedValue(networkError);
Expand Down
45 changes: 41 additions & 4 deletions packages/providers/src/codex/provider.ts
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,13 @@ async function* streamCodexEvents(
const state: CodexStreamState = {};
let accumulatedText = '';

// If the iterator closes without a terminal event (e.g. the model was
// rejected before the turn even started), we synthesize a fail-stop result
// after the loop so the dag-executor's `msg.isError` branch catches it
// — matching Claude's contract. Both terminal branches below `return`,
// so reaching the post-loop block can only mean no terminal fired.
let lastNonMcpError: string | undefined;

for await (const event of events) {
if (abortSignal?.aborted) {
getLog().info('query_aborted_between_events');
Expand All @@ -213,8 +220,14 @@ async function* streamCodexEvents(
if (event.type === 'error') {
const errorEvent = event as { message: string };
getLog().error({ message: errorEvent.message }, 'stream_error');
// MCP client errors are non-fatal — Codex retries internally and may
// still reach turn.completed. Other errors are captured; whether they
// are fatal is decided when the stream terminates: turn.completed
// means the SDK recovered, so the captured error is dropped; loop
// closure without a terminal means the captured error caused the
// stream to abort and is surfaced as the failure cause.
if (!errorEvent.message.includes('MCP client')) {
yield { type: 'system', content: `⚠️ ${errorEvent.message}` };
lastNonMcpError = errorEvent.message;
}
continue;
}
Expand All @@ -223,8 +236,14 @@ async function* streamCodexEvents(
const errorObj = (event as { error?: { message?: string } }).error;
const errorMessage = errorObj?.message ?? 'Unknown error';
getLog().error({ errorMessage }, 'turn_failed');
yield { type: 'system', content: `❌ Turn failed: ${errorMessage}` };
break;
yield {
type: 'result',
sessionId: threadId ?? undefined,
isError: true,
errorSubtype: 'codex_turn_failed',
errors: [errorMessage],
};
return;
}

if (event.type === 'item.completed') {
Expand Down Expand Up @@ -419,9 +438,27 @@ async function* streamCodexEvents(
tokens: usage,
...(structuredOutput !== undefined ? { structuredOutput } : {}),
};
break;
return;
}
}

// Reaching here means the iterator closed without yielding turn.completed
// or turn.failed (both branches `return` immediately). Common cause: model
// rejected by the API (model not supported, auth refused) before the turn
// started. Surface as a fail-stop. The dag-executor's `msg.isError` branch
// (dag-executor.ts: throws `Node '<id>' failed: SDK returned <subtype>`)
// turns this into a thrown node failure — distinct from the empty-output
// guard further down, which returns `{ state: 'failed' }` for AI nodes
// that streamed nothing but never raised an isError.
const message = lastNonMcpError ?? 'Codex stream closed without turn.completed or turn.failed';
getLog().error({ message }, 'stream_incomplete');
yield {
type: 'result',
sessionId: threadId ?? undefined,
isError: true,
errorSubtype: 'codex_stream_incomplete',
errors: [message],
};
}

// ─── Error Classification & Retry ────────────────────────────────────────
Expand Down
1 change: 0 additions & 1 deletion packages/providers/src/community/pi/index.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
export { PI_CAPABILITIES } from './capabilities';
export { parsePiConfig, type PiProviderDefaults } from './config';
export { isPiModelCompatible, parsePiModelRef, type PiModelRef } from './model-ref';
export { PiProvider } from './provider';
export { registerPiProvider } from './registration';
Loading
Loading