fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout#1092
…t-event timeout (#1067)

Three bugs fixed:
(1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process; the new stripCwdEnv() boot import removes them before any module reads env.
(2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting.
(3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges; the new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s).

Changes:
- Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules
- Import boot module as first import in CLI entry point
- Remove skipPlatformAdapters: true from serve.ts
- Add withFirstMessageTimeout + diagnostics to ClaudeClient
- Add CLAUDECODE=1 nested-session warning to CLI
- Add 9 unit tests (6 strip-cwd-env + 3 timeout)

Fixes #1067

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough
This PR replaces the subprocess environment allowlist security model with CWD environment variable stripping at CLI boot time. It removes …
Sequence Diagram(s)
sequenceDiagram
participant CLI as CLI Startup
participant Boot as `@archon/paths/strip-cwd-env-boot`
participant StripCwd as stripCwdEnv()
participant ProcessEnv as process.env
participant Dotenv as dotenv
participant App as Application Code
CLI->>Boot: Execute boot import (first import)
Boot->>StripCwd: Call stripCwdEnv()
StripCwd->>ProcessEnv: Remove CWD .env keys<br/>(from Bun auto-load)
StripCwd->>ProcessEnv: Remove CLAUDECODE marker<br/>& nested Claude Code vars
StripCwd->>ProcessEnv: Delete NODE_OPTIONS,<br/>VSCODE_INSPECTOR_OPTIONS
StripCwd-->>Boot: Return (side-effect complete)
Boot-->>CLI: Boot module loaded
CLI->>Dotenv: Load ~/.archon/.env<br/>with override: true
Dotenv->>ProcessEnv: Merge archon config<br/>(wins over inherited vars)
Dotenv-->>CLI: Env loaded
CLI->>App: Import remaining modules<br/>(read sanitized process.env)
App-->>CLI: Ready
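The stripping step in the diagram above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `@archon/paths` code: the helper names `collectEnvKeys` and `stripCwdKeys` are hypothetical, the file contents are passed in as strings instead of being read from disk, and a small hand-rolled line parser stands in for the dotenv parsing that the review describes.

```typescript
// Hypothetical sketch; not the real stripCwdEnv() implementation.
type Env = Record<string, string | undefined>;

/** Collect variable names from .env-style text without writing to any environment. */
function collectEnvKeys(contents: string): string[] {
  const keys: string[] = [];
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_.-]*)\s*=/.exec(line);
    // Malformed lines (no KEY= prefix) are ignored rather than treated as errors.
    if (match && match[1] && !keys.includes(match[1])) keys.push(match[1]);
  }
  return keys;
}

/**
 * Delete from `env` every key named in any CWD .env file.
 * Discovery (parsing) and deletion are kept as two separate steps.
 */
function stripCwdKeys(env: Env, cwdEnvFiles: string[]): string[] {
  const removed: string[] = [];
  for (const contents of cwdEnvFiles) {
    for (const key of collectEnvKeys(contents)) {
      if (key in env) {
        delete env[key];
        removed.push(key);
      }
    }
  }
  return removed;
}
```

Keys that appear in a CWD file but not in the environment are a no-op, and keys that never appeared in a CWD file are left untouched, which matches the behaviours the unit tests in this PR are described as covering.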
sequenceDiagram
participant Client as Claude Client
participant Query as query() SDK
participant Timeout as withFirstMessageTimeout()
participant Gen as AsyncGenerator
participant Timer as setTimeout()
participant Subprocess as Claude Subprocess
Client->>Timeout: Call with timeout config<br/>(60s default)
Timeout->>Timer: Start race timer
Timeout->>Gen: Call gen.next()
Gen->>Subprocess: Spawn subprocess
Subprocess-->>Gen: First event arrives
Gen-->>Timeout: Yield event
Timeout->>Timer: Clear timeout (first event received)
Timeout-->>Client: Return event
Note over Timeout: If timeout fires first:
Timeout->>Subprocess: Abort controller
Subprocess-->>Subprocess: Cleanup
Timeout-->>Client: Throw FirstEventTimeoutError<br/>+ log diagnostics
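The race shown in the second diagram could look something like the sketch below. The names `withFirstMessageTimeout` and `FirstEventTimeoutError` come from the PR discussion, but the signature and body here are assumptions for illustration; the real wrapper also logs a diagnostic payload, which is omitted.

```typescript
// Hypothetical sketch; signature and body are assumptions, not the ClaudeClient code.
class FirstEventTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`No first event received within ${timeoutMs}ms`);
    this.name = "FirstEventTimeoutError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable
  }
}

async function* withFirstMessageTimeout<T>(
  gen: AsyncGenerator<T>,
  controller: AbortController,
  timeoutMs: number,
): AsyncGenerator<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new FirstEventTimeoutError(timeoutMs)), timeoutMs);
  });

  // Race only the FIRST event against the deadline.
  const first = await Promise.race([gen.next(), deadline])
    .catch((err: unknown) => {
      // On timeout, abort the underlying work before surfacing the error.
      if (err instanceof FirstEventTimeoutError) controller.abort();
      throw err;
    })
    .finally(() => clearTimeout(timer)); // never leave a dangling timer

  if (first.done) return; // generator completed without yielding
  yield first.value;
  yield* gen; // later events are passed through with no deadline
}
```

Clearing the timer in `finally` covers both outcomes of the race, so a successful first event does not leave a pending timeout behind, and a generator that finishes immediately simply ends the wrapper.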
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🔍 Comprehensive PR Review
PR: #1092 (fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout)

Summary
This PR correctly fixes three real bugs from issue #1067. The implementation is focused and well-reasoned: the …

Verdict: …

🟠 High Issues (Should Fix Before Merge)
HIGH-1: Dangling …
| # | Issue | Location | Suggestion |
|---|---|---|---|
| L1 | Happy-path test timeoutMs: 5000 leaves a dangling timer | claude.test.ts:1113 | Fixed automatically by HIGH-1 fix |
| L2 | controller.signal.aborted not asserted in timeout test | claude.test.ts | Add expect(controller.signal.aborted).toBe(true) |
| L3 | firstValue.done === true branch untested | claude.ts:303 | Add test with an immediately-completing generator |
| L4 | Distinct keys across multiple .env files not tested | strip-cwd-env.test.ts | Add test with KEY_A in .env and KEY_B in .env.local |
| L5 | getFirstEventTimeoutMs env-var override path untested | claude.ts:233-241 | Optional; function is private, risk is low |
| L6 | stripCwdEnv JSDoc refers to override: true the CLI no longer uses | strip-cwd-env.ts:24 | Change to "loaded afterward by each entry point" |
| L7 | .claude/rules/cli.md Startup Behavior still says override: true | .claude/rules/cli.md | Update to reflect new boot sequence |
| L8 | ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING undocumented | configuration.md | Add to Core env var table |
| L9 | CLAUDE.md @archon/paths description incomplete | CLAUDE.md | Add stripCwdEnv/strip-cwd-env-boot; note dotenv as allowed external dep |
| L10 | No troubleshooting entry for nested Claude Code session hang | troubleshooting.md | Add section with CLAUDECODE=1 warning + workaround env vars |
✅ What's Good
- `withFirstMessageTimeout` timeout path is exemplary: calls `controller.abort()`, emits structured `log.error` with full diagnostic payload (env key names but not values), and throws with the GitHub issue URL for discoverability. This is exactly CLAUDE.md's "fail fast + explicit errors" pattern.
- `processEnv: {}` trick for safe key collection: parses dotenv files without writing to `process.env`, then explicitly deletes only the matched keys. Clean separation of discovery and deletion.
- `__timeout__` sentinel pattern: simple and effective way to distinguish timeout from real generator errors without adding an extra error class.
- Boot import ordering: `@archon/paths/strip-cwd-env-boot` is confirmed as the very first import in `cli.ts` (line 12), before `parseArgs`, `config`, `resolve`, or `existsSync`. Critical correctness constraint met.
- `strip-cwd-env.test.ts` behavioral coverage: 6 tests covering malformed lines, missing file, multi-file, key-not-in-env no-op, and preservation of non-CWD keys, all testing observable outcomes.
- One-line `serve.ts` fix: removing `skipPlatformAdapters: true` is clean with no side effects since adapters self-gate on token presence.
- `getFirstEventTimeoutMs()` defensive parsing: validates `Number.isFinite(parsed) && parsed > 0`; safe fallback to 60s default on invalid input.
- `ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING` escape hatch: well-named, follows existing env-var naming conventions, and the warning message itself advertises it.
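The defensive parsing praised above might be sketched like this. The validation expression and the 60s default are taken from the review; the env var name `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS` appears later in this PR, and passing the environment in as a parameter is an assumption made here so the function is easy to exercise in isolation.

```typescript
// Hypothetical sketch; the real function is private to claude.ts and may differ.
const DEFAULT_FIRST_EVENT_TIMEOUT_MS = 60_000;

function getFirstEventTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS;
  if (raw === undefined) return DEFAULT_FIRST_EVENT_TIMEOUT_MS;
  const parsed = Number(raw);
  // Reject NaN, Infinity, zero, and negatives; fall back to the default instead.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_FIRST_EVENT_TIMEOUT_MS;
}
```

Falling back silently keeps a typo in the variable from turning into a zero-length or infinite deadline.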
Suggested Follow-up Issues
| Title | Priority |
|---|---|
| "Fix CWD .env leak in direct bun dev:server path (server/src/index.ts)" | P2 (if MEDIUM-1 not fixed here) |
| "Add troubleshooting guide for nested Claude Code session hang (CLAUDECODE=1 warning)" | P3 |
Reviewed by Archon comprehensive-pr-review workflow — 5 specialized agents
Full artifacts: ~/.archon/workspaces/coleam00/Archon/artifacts/runs/13050cf1201ed061a931279a5a35f648/review/
Fixed:
- Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1)
- Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1)
- Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2)
- Update stale configuration.md docs for new env-loading mechanism (HIGH-2)
- Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs
- Add nested Claude Code hang troubleshooting entry
- Fix boot module JSDoc: "CLI and server" → "CLI" only
- Fix stripCwdEnv JSDoc: remove stale "override: true" reference
- Update .claude/rules/cli.md startup behavior section
- Update CLAUDE.md @archon/paths description with new exports

Tests added:
- Assert controller.signal.aborted on timeout
- Handle generator that completes immediately without yielding
- Strip distinct keys from different .env files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
⚡ Self-Fix Report (Aggressive)
Status: COMPLETE
Fixes Applied (15 total)
Skipped (0): none, all findings addressed
Validation: ✅ Type check | ✅ Lint | ✅ Tests (all pass)
Self-fix by Archon · aggressive mode · fixes pushed to …
…MessageTimeout

Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Archon PR Validation Report
Verdict: APPROVE

Summary
All three bugs (CWD …

Bug Confirmation

Issues
No blocking issues found.

What's Done Well

Validated by archon-validate-pr workflow
Review from #1068 / #1071 authors
We authored the original PRs (#1068 and #1071) that this consolidates. The consolidation is solid: … We'll take over from here and push fixes for the issues below directly to this branch.

Issues to address
…marker strip, tests

1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3)
2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs: the SDK bypasses the env option and uses process.env directly (#1097)
3. Add CLAUDECODE=1 warning to server entry point (was only in CLI)
4. Add diagnostic payload content test for withFirstMessageTimeout
5. Integrate #1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved.

Root cause per #1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point.

Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection).
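The prefix-based marker strip described in item 5 could be sketched as below. The three preserved auth variable names and the two debugger-related names are taken from the commit message; the function name `stripSessionMarkers` and the env-as-parameter shape are assumptions for illustration, and `CLAUDE_CODE_EXAMPLE_MARKER` in the usage note is a made-up name.

```typescript
// Hypothetical sketch of the marker strip; not the actual stripCwdEnv() body.
const PRESERVED_AUTH_VARS = new Set<string>([
  "CLAUDE_CODE_OAUTH_TOKEN",
  "CLAUDE_CODE_USE_BEDROCK",
  "CLAUDE_CODE_USE_VERTEX",
]);

function stripSessionMarkers(env: Record<string, string | undefined>): string[] {
  const removed: string[] = [];
  // Object.keys() takes a snapshot, so deleting while looping is safe.
  for (const key of Object.keys(env)) {
    const isSessionMarker =
      key === "CLAUDECODE" ||
      (key.startsWith("CLAUDE_CODE_") && !PRESERVED_AUTH_VARS.has(key));
    const isDebuggerHook = key === "NODE_OPTIONS" || key === "VSCODE_INSPECTOR_OPTIONS";
    if (isSessionMarker || isDebuggerHook) {
      delete env[key];
      removed.push(key);
    }
  }
  return removed;
}
```

Matching on the prefix means a newly introduced marker such as `CLAUDE_CODE_EXAMPLE_MARKER` would be removed without a code change, while the explicit exception list keeps the auth variables intact.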
…only CWD

The allowlist was wrong for a single-developer tool:
- It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.)
- It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option; see #1097)
- It attracted a constant stream of PRs adding keys (#1060, #1093, #1099)

New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing.

Changes:
- Delete env-allowlist.ts and env-allowlist.test.ts
- Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping; user controls their config)
- Replace 4 allowlist-based tests with 1 pass-through test
- Remove env-allowlist.test.ts from core test batch
- Update security.md and cli.md docs to reflect the new model

The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point; those are process-level safety (not per-subprocess filtering) and are needed regardless.
The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000.

stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed.

Also adds 6 integration tests covering the full entry-point flow:
1. Global auth user with ANTHROPIC_API_KEY in CWD .env: stripped
2. OAuth token in archon env + random key in CWD: CWD stripped, archon kept
3. General leak test: nothing from CWD reaches subprocess
4. Same key in both CWD and archon: archon value wins
5. CLAUDECODE markers stripped even when not from CWD .env
6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip
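The PORT example above comes down to one branch in how a parsed env file is merged. The sketch below imitates that merge rule with a hypothetical `applyEnvFile` helper; it is not dotenv itself and only illustrates why the override flag changes which value wins.

```typescript
// Hypothetical sketch of the merge rule; the real code calls dotenv with override: true.
function applyEnvFile(
  env: Record<string, string | undefined>,
  parsed: Record<string, string>,
  override: boolean,
): void {
  for (const [key, value] of Object.entries(parsed)) {
    // Without override, an already-set variable (e.g. from the shell) is kept.
    if (override || env[key] === undefined) env[key] = value;
  }
}

// Shell profile exported PORT=9999; ~/.archon/.env says PORT=3000.
const withoutOverride: Record<string, string | undefined> = { PORT: "9999" };
applyEnvFile(withoutOverride, { PORT: "3000" }, false); // PORT stays "9999"

const withOverride: Record<string, string | undefined> = { PORT: "9999" };
applyEnvFile(withOverride, { PORT: "3000" }, true); // PORT becomes "3000"
```

CWD stripping and the override flag address different sources: the first removes untrusted keys entirely, the second only decides precedence between two trusted sources.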
PR Review Summary: Multi-Agent Review
Reviewed by 6 specialized agents: code-reviewer, docs-impact, test-analyzer, silent-failure-hunter, type-design-analyzer, code-simplifier.
Critical Issues (2 found)
Important Issues (5 found)
Suggestions (8 found)
Strengths
Documentation Issues
Verdict: NEEDS FIXES
Two critical issues must be addressed before merge:
Recommended Actions
…uth logic

Review findings addressed:
1. CLAUDECODE warning was dead code: the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points.
2. useGlobalAuth token stripping removed (intentional, not regression): the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering.
3. Docs "no override needed" corrected: cli.md and configuration.md now reflect the actual code (override: true).
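The ordering bug in finding 1 is easy to see in a small sketch: the check has to read `CLAUDECODE` before anything deletes it. The helper name, the warning text, and the rule that any non-empty `ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING` value suppresses the warning are all assumptions made for illustration.

```typescript
// Hypothetical sketch of "warn first, then delete"; not the actual stripCwdEnv() code.
function warnThenStripClaudeCode(
  env: Record<string, string | undefined>,
  warn: (message: string) => void,
): void {
  const nested = env.CLAUDECODE === "1";
  // Assumption: any non-empty value suppresses the warning.
  const suppressed = Boolean(env.ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING);
  if (nested && !suppressed) {
    warn("CLAUDECODE=1 detected: running inside a Claude Code session may hang.");
  }
  // Deletion happens last; checking after this line could never fire.
  delete env.CLAUDECODE;
}
```

Keeping the check and the deletion in the same function removes the dependency on import order between the boot module and the entry points.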
…t timeout (coleam00#1067, coleam00#1030, coleam00#1098, coleam00#1070)

* fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (coleam00#1067)

Three bugs fixed:
(1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process; the new stripCwdEnv() boot import removes them before any module reads env.
(2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting.
(3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges; the new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s).

Changes:
- Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules
- Import boot module as first import in CLI entry point
- Remove skipPlatformAdapters: true from serve.ts
- Add withFirstMessageTimeout + diagnostics to ClaudeClient
- Add CLAUDECODE=1 nested-session warning to CLI
- Add 9 unit tests (6 strip-cwd-env + 3 timeout)

Fixes coleam00#1067

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings for PR coleam00#1092

Fixed:
- Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1)
- Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1)
- Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2)
- Update stale configuration.md docs for new env-loading mechanism (HIGH-2)
- Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs
- Add nested Claude Code hang troubleshooting entry
- Fix boot module JSDoc: "CLI and server" → "CLI" only
- Fix stripCwdEnv JSDoc: remove stale "override: true" reference
- Update .claude/rules/cli.md startup behavior section
- Update CLAUDE.md @archon/paths description with new exports

Tests added:
- Assert controller.signal.aborted on timeout
- Handle generator that completes immediately without yielding
- Strip distinct keys from different .env files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* simplify: replace string sentinel with typed error class in withFirstMessageTimeout

Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address review findings (dotenv version, docs, server warning, marker strip, tests)

1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3)
2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs: the SDK bypasses the env option and uses process.env directly (coleam00#1097)
3. Add CLAUDECODE=1 warning to server entry point (was only in CLI)
4. Add diagnostic payload content test for withFirstMessageTimeout
5. Integrate coleam00#1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved.

Root cause per coleam00#1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point.

Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection).

* refactor: remove SUBPROCESS_ENV_ALLOWLIST (trust user config, strip only CWD)

The allowlist was wrong for a single-developer tool:
- It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.)
- It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option; see coleam00#1097)
- It attracted a constant stream of PRs adding keys (coleam00#1060, coleam00#1093, coleam00#1099)

New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing.

Changes:
- Delete env-allowlist.ts and env-allowlist.test.ts
- Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping; user controls their config)
- Replace 4 allowlist-based tests with 1 pass-through test
- Remove env-allowlist.test.ts from core test batch
- Update security.md and cli.md docs to reflect the new model

The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point; those are process-level safety (not per-subprocess filtering) and are needed regardless.

* fix: restore override:true for archon env, add integration tests

The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000.

stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed.

Also adds 6 integration tests covering the full entry-point flow:
1. Global auth user with ANTHROPIC_API_KEY in CWD .env: stripped
2. OAuth token in archon env + random key in CWD: CWD stripped, archon kept
3. General leak test: nothing from CWD reaches subprocess
4. Same key in both CWD and archon: archon value wins
5. CLAUDECODE markers stripped even when not from CWD .env
6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip

* test: add DATABASE_URL leak scenarios to env integration tests

* fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic

Review findings addressed:
1. CLAUDECODE warning was dead code: the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points.
2. useGlobalAuth token stripping removed (intentional, not regression): the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering.
3. Docs "no override needed" corrected: cli.md and configuration.md now reflect the actual code (override: true).

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
…t timeout (coleam00#1067, coleam00#1030, coleam00#1098, coleam00#1070) * fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (coleam00#1067) Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process — new stripCwdEnv() boot import removes them before any module reads env. (2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting. (3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges — new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s). Changes: - Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules - Import boot module as first import in CLI entry point - Remove skipPlatformAdapters: true from serve.ts - Add withFirstMessageTimeout + diagnostics to ClaudeClient - Add CLAUDECODE=1 nested-session warning to CLI - Add 9 unit tests (6 strip-cwd-env + 3 timeout) Fixes coleam00#1067 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for PR coleam00#1092 Fixed: - Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1) - Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1) - Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2) - Update stale configuration.md docs for new env-loading mechanism (HIGH-2) - Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs - Add nested Claude Code hang troubleshooting entry - Fix boot module JSDoc: "CLI and server" → "CLI" only - Fix stripCwdEnv JSDoc: remove stale "override: true" reference - Update .claude/rules/cli.md startup behavior section - Update CLAUDE.md @archon/paths description with new exports Tests added: - Assert controller.signal.aborted on timeout - Handle generator that completes immediately without yielding - Strip 
distinct keys from different .env files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * simplify: replace string sentinel with typed error class in withFirstMessageTimeout Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address review findings — dotenv version, docs, server warning, marker strip, tests 1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3) 2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs — the SDK bypasses the env option and uses process.env directly (coleam00#1097) 3. Add CLAUDECODE=1 warning to server entry point (was only in CLI) 4. Add diagnostic payload content test for withFirstMessageTimeout 5. Integrate coleam00#1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved. Root cause per coleam00#1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point. Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection). 
* refactor: remove SUBPROCESS_ENV_ALLOWLIST — trust user config, strip only CWD The allowlist was wrong for a single-developer tool: - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.) - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option — see coleam00#1097) - It attracted a constant stream of PRs adding keys (coleam00#1060, coleam00#1093, coleam00#1099) New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing. Changes: - Delete env-allowlist.ts and env-allowlist.test.ts - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping — user controls their config) - Replace 4 allowlist-based tests with 1 pass-through test - Remove env-allowlist.test.ts from core test batch - Update security.md and cli.md docs to reflect the new model The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point — those are process-level safety (not per-subprocess filtering) and are needed regardless. * fix: restore override:true for archon env, add integration tests The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000. stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed. Also adds 6 integration tests covering the full entry-point flow: 1. Global auth user with ANTHROPIC_API_KEY in CWD .env — stripped 2. OAuth token in archon env + random key in CWD — CWD stripped, archon kept 3. 
General leak test — nothing from CWD reaches subprocess 4. Same key in both CWD and archon — archon value wins 5. CLAUDECODE markers stripped even when not from CWD .env 6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip * test: add DATABASE_URL leak scenarios to env integration tests * fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic Review findings addressed: 1. CLAUDECODE warning was dead code — the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points. 2. useGlobalAuth token stripping removed (intentional, not regression) — the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering. 3. Docs "no override needed" corrected — cli.md and configuration.md now reflect the actual code (override: true). --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
…t timeout (coleam00#1067, coleam00#1030, coleam00#1098, coleam00#1070) * fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (coleam00#1067) Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process — new stripCwdEnv() boot import removes them before any module reads env. (2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting. (3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges — new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s). Changes: - Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules - Import boot module as first import in CLI entry point - Remove skipPlatformAdapters: true from serve.ts - Add withFirstMessageTimeout + diagnostics to ClaudeClient - Add CLAUDECODE=1 nested-session warning to CLI - Add 9 unit tests (6 strip-cwd-env + 3 timeout) Fixes coleam00#1067 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for PR coleam00#1092 Fixed: - Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1) - Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1) - Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2) - Update stale configuration.md docs for new env-loading mechanism (HIGH-2) - Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs - Add nested Claude Code hang troubleshooting entry - Fix boot module JSDoc: "CLI and server" → "CLI" only - Fix stripCwdEnv JSDoc: remove stale "override: true" reference - Update .claude/rules/cli.md startup behavior section - Update CLAUDE.md @archon/paths description with new exports Tests added: - Assert controller.signal.aborted on timeout - Handle generator that completes immediately without yielding - Strip 
distinct keys from different .env files

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* simplify: replace string sentinel with typed error class in withFirstMessageTimeout

  Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address review findings - dotenv version, docs, server warning, marker strip, tests

  1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3)
  2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs: the SDK bypasses the env option and uses process.env directly (coleam00#1097)
  3. Add CLAUDECODE=1 warning to server entry point (was only in CLI)
  4. Add diagnostic payload content test for withFirstMessageTimeout
  5. Integrate coleam00#1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved.

  Root cause per coleam00#1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point.

  Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection).

* refactor: remove SUBPROCESS_ENV_ALLOWLIST - trust user config, strip only CWD

  The allowlist was wrong for a single-developer tool:
  - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.)
  - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option; see coleam00#1097)
  - It attracted a constant stream of PRs adding keys (coleam00#1060, coleam00#1093, coleam00#1099)

  New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing.

  Changes:
  - Delete env-allowlist.ts and env-allowlist.test.ts
  - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping; user controls their config)
  - Replace 4 allowlist-based tests with 1 pass-through test
  - Remove env-allowlist.test.ts from core test batch
  - Update security.md and cli.md docs to reflect the new model

  The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point. Those are process-level safety (not per-subprocess filtering) and are needed regardless.

* fix: restore override:true for archon env, add integration tests

  The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000.

  stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed.

  Also adds 6 integration tests covering the full entry-point flow:
  1. Global auth user with ANTHROPIC_API_KEY in CWD .env: stripped
  2. OAuth token in archon env + random key in CWD: CWD stripped, archon kept
  3. General leak test: nothing from CWD reaches subprocess
  4. Same key in both CWD and archon: archon value wins
  5. CLAUDECODE markers stripped even when not from CWD .env
  6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip

* test: add DATABASE_URL leak scenarios to env integration tests

* fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic

  Review findings addressed:
  1. CLAUDECODE warning was dead code: the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points.
  2. useGlobalAuth token stripping removed (intentional, not regression): the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering.
  3. Docs "no override needed" corrected: cli.md and configuration.md now reflect the actual code (override: true).

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
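The stripping order described in these commits can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in `@archon/paths`: the names `stripCwdEnvSketch`, `parseEnvKeys`, and `PRESERVED_AUTH_VARS` are invented here, the `.env` text is passed in as strings rather than read from disk, and a tiny regex parser stands in for the dotenv-based parsing the PR uses.

```typescript
// Hypothetical sketch only; the real stripCwdEnv() may differ in every detail.
type Env = Record<string, string | undefined>;

// Auth variables the commit message says must survive the CLAUDE_CODE_* strip.
const PRESERVED_AUTH_VARS = new Set<string>([
  "CLAUDE_CODE_OAUTH_TOKEN",
  "CLAUDE_CODE_USE_BEDROCK",
  "CLAUDE_CODE_USE_VERTEX",
]);

/** Collect key names from dotenv-style text without writing to any env object. */
function parseEnvKeys(text: string): string[] {
  const keys: string[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    const key = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line)?.[1];
    if (key !== undefined) keys.push(key); // malformed lines are skipped
  }
  return keys;
}

/** Remove CWD .env keys and nested-session markers from `env`, in place. */
function stripCwdEnvSketch(env: Env, cwdEnvFileContents: string[]): string[] {
  const removed: string[] = [];
  const drop = (key: string): void => {
    if (key in env) {
      delete env[key];
      removed.push(key);
    }
  };

  // Step 1: every key named in a CWD .env file is treated as untrusted.
  for (const text of cwdEnvFileContents) {
    for (const key of parseEnvKeys(text)) drop(key);
  }

  // Step 2: process-level markers are removed regardless of where they came from.
  for (const key of Object.keys(env)) {
    const isSessionMarker =
      key === "CLAUDECODE" ||
      (key.startsWith("CLAUDE_CODE_") && !PRESERVED_AUTH_VARS.has(key));
    if (isSessionMarker || key === "NODE_OPTIONS" || key === "VSCODE_INSPECTOR_OPTIONS") {
      drop(key);
    }
  }
  return removed;
}
```

In the flow the commits describe, a step like this would run first at the entry point, and `~/.archon/.env` would be loaded afterwards with `override: true`, so a key present in both places ends up with the Archon value.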
Summary
- (1) Bun auto-loads CWD `.env` before user code: non-overlapping keys from the target repo leak into the Archon process even after the `override: true` partial fix; (2) `archon serve` hardcodes `skipPlatformAdapters: true`, silently preventing all platform adapters (Telegram, Discord, Slack, GitHub) from ever starting; (3) no first-event timeout on the Claude SDK query: a subprocess wedge causes a silent 30-min hang at `dag_node_started`.
- [...] `LOG_LEVEL`, leaked tokens) when running Archon from within a target repo. Bug 2 makes `archon serve` permanently Web-only regardless of token configuration. Bug 3 gives no actionable error when the Claude subprocess fails to start.
- Added `stripCwdEnv()` boot utility in `@archon/paths` that strips Bun-auto-loaded CWD env keys before any module reads `process.env`; removed the one-line `skipPlatformAdapters: true` hardcode from `serve.ts`; added `withFirstMessageTimeout` wrapper around the `query()` call in `ClaudeClient` with configurable timeout and structured diagnostics; added `CLAUDECODE=1` nested-session warning to CLI.
- [...] `~/.archon/.env` loading logic; all existing env-leak-gate / `SUBPROCESS_ENV_ALLOWLIST` mechanisms.
UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
- `cli.ts` -> `@archon/paths/strip-cwd-env-boot`
- `strip-cwd-env-boot.ts` -> `strip-cwd-env.ts`
- `serve.ts` -> `startServer()` (`skipPlatformAdapters: true`)
- `claude.ts` -> `withFirstMessageTimeout` (`query()` generator)
- `claude.ts` -> `buildFirstEventHangDiagnostics`
Label Snapshot
- `risk: low`
- `size: S`
- `cli`, `paths`, `core`
- `cli:boot`, `paths:env`, `core:claude-client`
Change Metadata
- `bug`
- `multi`
Linked Issue
Validation Evidence (required)
- `bun run validate` run: type-check, lint, format, tests all pass
Security Impact (required)
- `stripCwdEnv()` removes CWD `.env` keys from `process.env` (security improvement: prevents target-repo tokens from leaking into the Archon process and subprocesses)
- `stripCwdEnv()` only reads (parses without writing) CWD `.env` files
- Uses `dotenv.config({ processEnv: {} })` to parse without re-contaminating; only removes keys appearing in CWD `.env` files, never touches `~/.archon/.env` keys
Compatibility / Migration
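To illustrate what the `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS` setting bounds, here is a minimal sketch of racing the first event of an async generator against a deadline. Only the names `withFirstMessageTimeout` and `FirstEventTimeoutError` come from the PR; the signature, the error message, and the `timeoutMs` field are assumptions made for illustration, and the real wrapper also emits diagnostics that are omitted here.

```typescript
// Hypothetical sketch; not the ClaudeClient implementation.
class FirstEventTimeoutError extends Error {
  timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`No first event received within ${timeoutMs}ms`);
    this.name = "FirstEventTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

/**
 * Yields everything `source` yields, but rejects with FirstEventTimeoutError
 * if the first event does not arrive within `timeoutMs`.
 */
async function* withFirstMessageTimeout<T>(
  source: AsyncGenerator<T>,
  timeoutMs: number,
): AsyncGenerator<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new FirstEventTimeoutError(timeoutMs)), timeoutMs);
  });

  let first: IteratorResult<T>;
  try {
    // Only the first next() call is raced; later events are not time-limited.
    first = await Promise.race([source.next(), deadline]);
  } finally {
    clearTimeout(timer); // avoid a stray rejection once the race is settled
  }

  if (first.done) return; // source finished without producing any event
  yield first.value;
  yield* source; // hand the rest of the stream through unchanged
}
```

A caller would iterate the wrapped generator with `for await` and use an `instanceof FirstEventTimeoutError` check in its `catch` block to tell a wedged subprocess apart from other failures. The sketch does not try to cancel the stuck source, since a wedged generator may not respond to `return()` either.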
- `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS` (default: 60000ms) and `ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING`
Human Verification (required)
- `bun run validate` suite (type-check, lint, format, tests) passed with 0 failures
- `stripCwdEnv()` handles missing files, malformed lines, keys absent from `process.env`; `withFirstMessageTimeout` tested for normal completion, stuck generator timeout, and error message content
- Not verified: `archon serve` with actual Telegram/Slack tokens; actual subprocess hang scenario (environmental, cannot reproduce deterministically in CI)
Side Effects / Blast Radius (required)
- `archon serve` platform adapter startup, `ClaudeClient` query loop
- `stripCwdEnv()` runs before any module init. If a user intentionally relied on CWD `.env` keys in the Archon process, they would need to move those keys to `~/.archon/.env` (documented intended behavior)
- `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS` allows tuning or effectively disabling the timeout; `claude.first_event_timeout` log event provides structured diagnostics for hang diagnosis
Rollback Plan (required)
- `git revert dcd392f3`
- `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS` (set very high to effectively disable timeout); `ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING` (suppress nested session warning)
- If `stripCwdEnv()` regresses, users see `~/.archon/.env` keys missing (auth failures); if platform adapter regression, `no_platform_adapters_configured` in logs despite tokens being set
Risks and Mitigations
- Risk: `stripCwdEnv()` might remove keys the user intended the Archon process to see
  - Mitigation: [...] `.env` files Bun auto-loads. `~/.archon/.env` loads after, so Archon config keys are always present. Documented in code comments.
- Mitigation: `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS` env var allows increasing the timeout. Default 60s is well above the normal 2-3s startup time observed in practice.
Issues resolved
PRs superseded (close on merge)
Related (not fixed here)
Credits
- [...] `process.env` into the spawned child regardless of the explicit `env` option, which was the key insight enabling the allowlist removal and the correct nested-session fix
Summary by CodeRabbit
New Features
- [...] `ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS`).
- `ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING` to suppress warnings in specific environments.
Bug Fixes
Documentation