fix(core): add AWS env vars to subprocess allowlist for Bedrock auth#1060
fix(core): add AWS env vars to subprocess allowlist for Bedrock auth#1060liorfranko wants to merge 1 commit intocoleam00:devfrom
Conversation
AWS credentials (AWS_PROFILE, AWS_REGION, AWS_ACCESS_KEY_ID, etc.) were stripped from Claude Code subprocesses by the env allowlist, causing 401 auth failures when using Bedrock. The allowlist already included CLAUDE_CODE_USE_BEDROCK but not the AWS vars it depends on. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe subprocess environment allowlist was extended to include six AWS-related environment variables ( Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/core/src/utils/env-allowlist.ts (1)
37-43: Add a regression test for required Bedrock AWS keys in the allowlist.Given this was a production auth break, a small unit test asserting these keys are preserved by
buildCleanSubprocessEnv()would prevent accidental regressions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/utils/env-allowlist.ts` around lines 37 - 43, Add a unit test that asserts buildCleanSubprocessEnv() preserves the required Bedrock AWS keys in the allowlist: 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN', 'AWS_PROFILE', 'AWS_REGION', and 'AWS_DEFAULT_REGION'. Mock or supply an env object containing these keys with sample values, call buildCleanSubprocessEnv(env) and assert each of the listed AWS keys exists and keeps its value in the returned object; place the test alongside existing env-related tests and use the same test helpers/assertion style as the repo.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/core/src/utils/env-allowlist.ts`:
- Around line 37-43: Add a unit test that asserts buildCleanSubprocessEnv()
preserves the required Bedrock AWS keys in the allowlist: 'AWS_ACCESS_KEY_ID',
'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN', 'AWS_PROFILE', 'AWS_REGION', and
'AWS_DEFAULT_REGION'. Mock or supply an env object containing these keys with
sample values, call buildCleanSubprocessEnv(env) and assert each of the listed
AWS keys exists and keeps its value in the returned object; place the test
alongside existing env-related tests and use the same test helpers/assertion
style as the repo.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b8fe2a73-d23f-40bc-95b1-4af04559d69f
📒 Files selected for processing (1)
packages/core/src/utils/env-allowlist.ts
…only CWD The allowlist was wrong for a single-developer tool: - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.) - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option — see #1097) - It attracted a constant stream of PRs adding keys (#1060, #1093, #1099) New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing. Changes: - Delete env-allowlist.ts and env-allowlist.test.ts - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping — user controls their config) - Replace 4 allowlist-based tests with 1 pass-through test - Remove env-allowlist.test.ts from core test batch - Update security.md and cli.md docs to reflect the new model The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point — those are process-level safety (not per-subprocess filtering) and are needed regardless.
|
Addressed by the allowlist removal in #1092 — |
…t timeout (#1067, #1030, #1098, #1070) * fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (#1067) Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process — new stripCwdEnv() boot import removes them before any module reads env. (2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting. (3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges — new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s). Changes: - Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules - Import boot module as first import in CLI entry point - Remove skipPlatformAdapters: true from serve.ts - Add withFirstMessageTimeout + diagnostics to ClaudeClient - Add CLAUDECODE=1 nested-session warning to CLI - Add 9 unit tests (6 strip-cwd-env + 3 timeout) Fixes #1067 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for PR #1092 Fixed: - Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1) - Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1) - Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2) - Update stale configuration.md docs for new env-loading mechanism (HIGH-2) - Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs - Add nested Claude Code hang troubleshooting entry - Fix boot module JSDoc: "CLI and server" → "CLI" only - Fix stripCwdEnv JSDoc: remove stale "override: true" reference - Update .claude/rules/cli.md startup behavior section - Update CLAUDE.md @archon/paths description with new exports Tests added: - Assert controller.signal.aborted on timeout - Handle generator that completes immediately without yielding - Strip distinct keys from different .env files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * simplify: replace string sentinel with typed error class in withFirstMessageTimeout Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address review findings — dotenv version, docs, server warning, marker strip, tests 1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3) 2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs — the SDK bypasses the env option and uses process.env directly (#1097) 3. Add CLAUDECODE=1 warning to server entry point (was only in CLI) 4. Add diagnostic payload content test for withFirstMessageTimeout 5. Integrate #1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved. Root cause per #1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point. Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection). * refactor: remove SUBPROCESS_ENV_ALLOWLIST — trust user config, strip only CWD The allowlist was wrong for a single-developer tool: - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.) - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option — see #1097) - It attracted a constant stream of PRs adding keys (#1060, #1093, #1099) New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing. Changes: - Delete env-allowlist.ts and env-allowlist.test.ts - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping — user controls their config) - Replace 4 allowlist-based tests with 1 pass-through test - Remove env-allowlist.test.ts from core test batch - Update security.md and cli.md docs to reflect the new model The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point — those are process-level safety (not per-subprocess filtering) and are needed regardless. * fix: restore override:true for archon env, add integration tests The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000. stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed. Also adds 6 integration tests covering the full entry-point flow: 1. Global auth user with ANTHROPIC_API_KEY in CWD .env — stripped 2. OAuth token in archon env + random key in CWD — CWD stripped, archon kept 3. General leak test — nothing from CWD reaches subprocess 4. Same key in both CWD and archon — archon value wins 5. CLAUDECODE markers stripped even when not from CWD .env 6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip * test: add DATABASE_URL leak scenarios to env integration tests * fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic Review findings addressed: 1. CLAUDECODE warning was dead code — the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points. 2. useGlobalAuth token stripping removed (intentional, not regression) — the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering. 3. Docs "no override needed" corrected — cli.md and configuration.md now reflect the actual code (override: true). --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
|
Resolved by #1092 (allowlist removed — all AWS Bedrock vars pass through automatically). |
…t timeout (coleam00#1067, coleam00#1030, coleam00#1098, coleam00#1070) * fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (coleam00#1067) Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process — new stripCwdEnv() boot import removes them before any module reads env. (2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting. (3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges — new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s). Changes: - Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules - Import boot module as first import in CLI entry point - Remove skipPlatformAdapters: true from serve.ts - Add withFirstMessageTimeout + diagnostics to ClaudeClient - Add CLAUDECODE=1 nested-session warning to CLI - Add 9 unit tests (6 strip-cwd-env + 3 timeout) Fixes coleam00#1067 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for PR coleam00#1092 Fixed: - Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1) - Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1) - Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2) - Update stale configuration.md docs for new env-loading mechanism (HIGH-2) - Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs - Add nested Claude Code hang troubleshooting entry - Fix boot module JSDoc: "CLI and server" → "CLI" only - Fix stripCwdEnv JSDoc: remove stale "override: true" reference - Update .claude/rules/cli.md startup behavior section - Update CLAUDE.md @archon/paths description with new exports Tests added: - Assert controller.signal.aborted on timeout - Handle generator that completes immediately without yielding - Strip distinct keys from different .env files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * simplify: replace string sentinel with typed error class in withFirstMessageTimeout Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address review findings — dotenv version, docs, server warning, marker strip, tests 1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3) 2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs — the SDK bypasses the env option and uses process.env directly (coleam00#1097) 3. Add CLAUDECODE=1 warning to server entry point (was only in CLI) 4. Add diagnostic payload content test for withFirstMessageTimeout 5. Integrate coleam00#1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved. Root cause per coleam00#1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point. Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection). * refactor: remove SUBPROCESS_ENV_ALLOWLIST — trust user config, strip only CWD The allowlist was wrong for a single-developer tool: - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.) - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option — see coleam00#1097) - It attracted a constant stream of PRs adding keys (coleam00#1060, coleam00#1093, coleam00#1099) New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing. Changes: - Delete env-allowlist.ts and env-allowlist.test.ts - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping — user controls their config) - Replace 4 allowlist-based tests with 1 pass-through test - Remove env-allowlist.test.ts from core test batch - Update security.md and cli.md docs to reflect the new model The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point — those are process-level safety (not per-subprocess filtering) and are needed regardless. * fix: restore override:true for archon env, add integration tests The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000. stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed. Also adds 6 integration tests covering the full entry-point flow: 1. Global auth user with ANTHROPIC_API_KEY in CWD .env — stripped 2. OAuth token in archon env + random key in CWD — CWD stripped, archon kept 3. General leak test — nothing from CWD reaches subprocess 4. Same key in both CWD and archon — archon value wins 5. CLAUDECODE markers stripped even when not from CWD .env 6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip * test: add DATABASE_URL leak scenarios to env integration tests * fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic Review findings addressed: 1. CLAUDECODE warning was dead code — the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points. 2. useGlobalAuth token stripping removed (intentional, not regression) — the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering. 3. Docs "no override needed" corrected — cli.md and configuration.md now reflect the actual code (override: true). --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
…t timeout (coleam00#1067, coleam00#1030, coleam00#1098, coleam00#1070) * fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (coleam00#1067) Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process — new stripCwdEnv() boot import removes them before any module reads env. (2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting. (3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges — new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s). Changes: - Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules - Import boot module as first import in CLI entry point - Remove skipPlatformAdapters: true from serve.ts - Add withFirstMessageTimeout + diagnostics to ClaudeClient - Add CLAUDECODE=1 nested-session warning to CLI - Add 9 unit tests (6 strip-cwd-env + 3 timeout) Fixes coleam00#1067 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for PR coleam00#1092 Fixed: - Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1) - Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1) - Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2) - Update stale configuration.md docs for new env-loading mechanism (HIGH-2) - Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs - Add nested Claude Code hang troubleshooting entry - Fix boot module JSDoc: "CLI and server" → "CLI" only - Fix stripCwdEnv JSDoc: remove stale "override: true" reference - Update .claude/rules/cli.md startup behavior section - Update CLAUDE.md @archon/paths description with new exports Tests added: - Assert controller.signal.aborted on timeout - Handle generator that completes immediately without yielding - Strip distinct keys from different .env files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * simplify: replace string sentinel with typed error class in withFirstMessageTimeout Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address review findings — dotenv version, docs, server warning, marker strip, tests 1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3) 2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs — the SDK bypasses the env option and uses process.env directly (coleam00#1097) 3. Add CLAUDECODE=1 warning to server entry point (was only in CLI) 4. Add diagnostic payload content test for withFirstMessageTimeout 5. Integrate coleam00#1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved. Root cause per coleam00#1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point. Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection). * refactor: remove SUBPROCESS_ENV_ALLOWLIST — trust user config, strip only CWD The allowlist was wrong for a single-developer tool: - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.) - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option — see coleam00#1097) - It attracted a constant stream of PRs adding keys (coleam00#1060, coleam00#1093, coleam00#1099) New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing. Changes: - Delete env-allowlist.ts and env-allowlist.test.ts - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping — user controls their config) - Replace 4 allowlist-based tests with 1 pass-through test - Remove env-allowlist.test.ts from core test batch - Update security.md and cli.md docs to reflect the new model The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point — those are process-level safety (not per-subprocess filtering) and are needed regardless. * fix: restore override:true for archon env, add integration tests The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000. stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed. Also adds 6 integration tests covering the full entry-point flow: 1. Global auth user with ANTHROPIC_API_KEY in CWD .env — stripped 2. OAuth token in archon env + random key in CWD — CWD stripped, archon kept 3. General leak test — nothing from CWD reaches subprocess 4. Same key in both CWD and archon — archon value wins 5. CLAUDECODE markers stripped even when not from CWD .env 6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip * test: add DATABASE_URL leak scenarios to env integration tests * fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic Review findings addressed: 1. CLAUDECODE warning was dead code — the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points. 2. useGlobalAuth token stripping removed (intentional, not regression) — the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering. 3. Docs "no override needed" corrected — cli.md and configuration.md now reflect the actual code (override: true). --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
…t timeout (coleam00#1067, coleam00#1030, coleam00#1098, coleam00#1070) * fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (coleam00#1067) Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking non-overlapping keys into the Archon process — new stripCwdEnv() boot import removes them before any module reads env. (2) archon serve hardcoded skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting. (3) Claude SDK query had no first-event timeout, causing silent 30-min hangs when the subprocess wedges — new withFirstMessageTimeout wrapper races the first event against a configurable deadline (default 60s). Changes: - Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules - Import boot module as first import in CLI entry point - Remove skipPlatformAdapters: true from serve.ts - Add withFirstMessageTimeout + diagnostics to ClaudeClient - Add CLAUDECODE=1 nested-session warning to CLI - Add 9 unit tests (6 strip-cwd-env + 3 timeout) Fixes coleam00#1067 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for PR coleam00#1092 Fixed: - Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1) - Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1) - Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2) - Update stale configuration.md docs for new env-loading mechanism (HIGH-2) - Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs - Add nested Claude Code hang troubleshooting entry - Fix boot module JSDoc: "CLI and server" → "CLI" only - Fix stripCwdEnv JSDoc: remove stale "override: true" reference - Update .claude/rules/cli.md startup behavior section - Update CLAUDE.md @archon/paths description with new exports Tests added: - Assert controller.signal.aborted on timeout - Handle generator that completes immediately without yielding - Strip distinct keys from different .env files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * simplify: replace string sentinel with typed error class in withFirstMessageTimeout Replace the '__timeout__' string sentinel used to identify timeout rejections with a dedicated FirstEventTimeoutError class. instanceof checks are more explicit and robust than string comparison on error messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address review findings — dotenv version, docs, server warning, marker strip, tests 1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3) 2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs — the SDK bypasses the env option and uses process.env directly (coleam00#1097) 3. Add CLAUDECODE=1 warning to server entry point (was only in CLI) 4. Add diagnostic payload content test for withFirstMessageTimeout 5. Integrate coleam00#1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS from process.env at entry point. Pattern-matched on CLAUDE_CODE_* prefix rather than hardcoding 6 names, so future Claude Code markers are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved. Root cause per coleam00#1097: the Claude Agent SDK leaks process.env into the spawned child regardless of the explicit env option, so the only way to prevent the nested-session deadlock is to delete the markers from process.env at the entry point. Validation: bun run validate passes, 125 paths tests (6 new marker tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak verified stripped (target repo .env DATABASE_URL does not affect Archon DB selection). * refactor: remove SUBPROCESS_ENV_ALLOWLIST — trust user config, strip only CWD The allowlist was wrong for a single-developer tool: - It blocked keys the user intentionally set in ~/.archon/.env (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.) - It was bypassed by the SDK anyway (process.env leaks to subprocess regardless of the env option — see coleam00#1097) - It attracted a constant stream of PRs adding keys (coleam00#1060, coleam00#1093, coleam00#1099) New model: CWD .env keys are the only untrusted source. stripCwdEnv() at entry point handles that. Everything in ~/.archon/.env + shell env passes through to the subprocess. No filtering, no second-guessing. Changes: - Delete env-allowlist.ts and env-allowlist.test.ts - Simplify buildSubprocessEnv() to return { ...process.env } with auth-mode logging (no token stripping — user controls their config) - Replace 4 allowlist-based tests with 1 pass-through test - Remove env-allowlist.test.ts from core test batch - Update security.md and cli.md docs to reflect the new model The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip remain in stripCwdEnv() at entry point — those are process-level safety (not per-subprocess filtering) and are needed regardless. * fix: restore override:true for archon env, add integration tests The integration tests caught a real issue: without override:true, the ~/.archon/.env load doesn't win over shell-inherited env vars. If the user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000, the user expects Archon to use 3000. stripCwdEnv() handles CWD .env files (untrusted). override:true handles shell-inherited vars (trusted but less specific than ~/.archon/.env). Different concerns, both needed. Also adds 6 integration tests covering the full entry-point flow: 1. Global auth user with ANTHROPIC_API_KEY in CWD .env — stripped 2. OAuth token in archon env + random key in CWD — CWD stripped, archon kept 3. General leak test — nothing from CWD reaches subprocess 4. Same key in both CWD and archon — archon value wins 5. CLAUDECODE markers stripped even when not from CWD .env 6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip * test: add DATABASE_URL leak scenarios to env integration tests * fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic Review findings addressed: 1. CLAUDECODE warning was dead code — the boot import deleted CLAUDECODE from process.env before the warning check in cli.ts/server/index.ts could fire. Moved the warning into stripCwdEnv() itself, emitted BEFORE the deletion. Removed duplicate warning code from both entry points. 2. useGlobalAuth token stripping removed (intentional, not regression) — the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when useGlobalAuth=true. Per design discussion: the user controls ~/.archon/.env and all keys they set are intentional. If they want global auth, they just don't set tokens. Simplified buildSubprocessEnv to log auth mode for diagnostics only, no filtering. 3. Docs "no override needed" corrected — cli.md and configuration.md now reflect the actual code (override: true). --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
Summary
AWS_PROFILE,AWS_REGION,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_SESSION_TOKEN,AWS_DEFAULT_REGION) were missing from the subprocess env allowlistCLAUDE_CODE_USE_BEDROCKbut not the AWS vars it depends onTest plan
🤖 Generated with Claude Code
Summary by CodeRabbit