Skip to content

fix: Windows HOME resolution and MCP plugin warning suppression#1134

Closed
MrFadiAi wants to merge 1 commit intocoleam00:devfrom
MrFadiAi:fix/windows-home-mcp-warnings
Closed

fix: Windows HOME resolution and MCP plugin warning suppression#1134
MrFadiAi wants to merge 1 commit intocoleam00:devfrom
MrFadiAi:fix/windows-home-mcp-warnings

Conversation

@MrFadiAi
Copy link
Copy Markdown
Contributor

@MrFadiAi MrFadiAi commented Apr 12, 2026

Summary

  • Fix Windows path resolution bug where HOME environment variable is unset, causing CLI and server to fail loading ~/.archon/.env configuration
  • Suppress non-fatal MCP connection failures from user-level plugins (e.g., telegram) in headless workflow subprocesses — only surface warnings for workflow-configured MCP servers

Changes

Windows HOME Resolution (@archon/cli, @archon/server)

  • Replace process.env.HOME with getArchonHome() from @archon/paths for cross-platform config path resolution in both CLI and server entry points
  • On Windows, HOME is often unset — getArchonHome() correctly falls back to USERPROFILE or HOMEDRIVE+HOMEPATH

MCP Plugin Warning Suppression (@archon/workflows, @archon/core)

  • Track workflow-configured MCP server names in the DAG executor
  • Filter MCP connection failure messages to only surface warnings for servers explicitly configured via mcp: in workflow YAML
  • User-level plugin MCP failures (e.g., telegram) are logged at debug level instead of warn and not sent to the user
  • Pass settings: { enabledPlugins: {} } to Claude Agent SDK subprocess to attempt disabling user plugins

Test plan

  • Verified on Windows 11 where HOME is unset — CLI and server now correctly load ~/.archon/.env
  • Verified archon workflow run archon-assist "Say hello" completes without telegram MCP warning
  • TypeScript type-check passes for all packages
  • ESLint + Prettier pass (pre-commit hook)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved MCP server connection error handling by distinguishing between workflow-level and plugin-level failures; only workflow failures are reported to users, while plugin failures are logged internally.
  • New Features

    • Claude agent subprocess now exclusively uses explicitly configured MCP servers for enhanced predictability and control.

- Use getArchonHome() instead of process.env.HOME for cross-platform
  config path resolution in CLI and server entry points
- Suppress non-fatal MCP connection failures from user-level plugins
  (e.g., telegram) in headless workflow subprocesses
- Only surface MCP failures for workflow-configured servers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

Updates env file path resolution across CLI, server, and tests to use a centralized getArchonHome() function. Disables user plugin execution in Claude subprocess by adding plugin settings. Modifies workflow executor to differentiate and filter MCP server connection failures between workflow-specific and plugin-specific sources.

Changes

Cohort / File(s) Summary
Environment Path Standardization
packages/cli/src/cli.ts, packages/cli/src/cli.test.ts, packages/server/src/index.ts
Replaced manual home directory resolution with centralized getArchonHome() function for computing the global ~/.archon/.env file path. Updated imports and adjusted test setup to reflect the new path resolution method.
Claude Provider Configuration
packages/core/src/providers/claude.ts
Added settings: { enabledPlugins: {} } to Claude Agent SDK options to disable user and plugin execution within the subprocess, restricting activity to supplied MCP servers only.
MCP Failure Filtering
packages/workflows/src/dag-executor.ts
Implemented differentiation between workflow-specific and plugin-specific MCP server connection failures in system message handling. Only workflow MCP failures are surfaced to users via safeSendMessage; plugin failures are logged at debug level and suppressed from user output.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 Paths now home, with getArchon's grace,
MCP failures sorted to their rightful place—
Plugins hush while workflows speak,
Claude's plugins rest, no longer meek!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers the main problem, solution, and test plan, but is missing critical template sections including architecture diagrams, validation evidence, security impact, compatibility analysis, human verification details, and rollback plan. Complete the description template by adding Architecture Diagrams (Before/After), Validation Evidence (test command outputs), Security Impact assessment, Compatibility/Migration details, Human Verification scope, Side Effects/Blast Radius analysis, and Rollback Plan.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the two main changes: Windows HOME resolution fix and MCP plugin warning suppression, matching the core objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/cli/src/cli.test.ts (1)

241-246: ⚠️ Potential issue | 🟠 Major

Test is environment-dependent and can be flaky.

This path now targets the real user ~/.archon/.env when present, so test outcomes can vary by machine/user file contents. Prefer a temp .env fixture to keep this deterministic.

Proposed deterministic test adjustment
   it('should allow ~/.archon/.env to override Bun-auto-loaded vars via override:true', async () => {
     const { config } = await import('dotenv');
-    const { resolve } = await import('path');
-    const { existsSync } = await import('fs');
+    const { resolve, join } = await import('path');
+    const { mkdtemp, writeFile, rm } = await import('fs/promises');
+    const { tmpdir } = await import('os');

     // Simulate Bun auto-loading a stale value
     process.env.TEST_ARCHON_OVERRIDE = 'from-cwd-env';

-    // Write a temporary env content and load with override
-    const globalEnvPath = resolve(getArchonHome(), '.env');
-    if (existsSync(globalEnvPath)) {
-      const result = config({ path: globalEnvPath, override: true });
-      // If ~/.archon/.env exists and has DATABASE_URL, it should override
-      expect(result.error).toBeUndefined();
-    }
+    const tempDir = await mkdtemp(join(tmpdir(), 'archon-cli-test-'));
+    const testEnvPath = resolve(tempDir, '.env');
+    await writeFile(testEnvPath, 'TEST_ARCHON_OVERRIDE=from-archon-env\n', 'utf8');
+    const result = config({ path: testEnvPath, override: true });
+    expect(result.error).toBeUndefined();
+    expect(process.env.TEST_ARCHON_OVERRIDE).toBe('from-archon-env');
+    await rm(tempDir, { recursive: true, force: true });

     // Clean up
     delete process.env.TEST_ARCHON_OVERRIDE;
   });

As per coding guidelines: "Keep tests deterministic — no flaky timing or network dependence without guardrails; ensure local validation commands (bun run validate) map directly to CI expectations (Determinism + Reproducibility)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli.test.ts` around lines 241 - 246, The test reads the real
user home (~/.archon/.env) via getArchonHome() and becomes flaky; instead create
a temporary directory and a temp .env fixture, write the expected DATABASE_URL
into that file, mock or override getArchonHome() to return the temp dir (or set
the env value used by getArchonHome()), then call resolve(getArchonHome(),
'.env') and load with config({ path: ..., override: true }) and assert
result.error is undefined; finally clean up the temp file/dir. Target symbols:
globalEnvPath, getArchonHome, resolve, existsSync, and config.
🧹 Nitpick comments (2)
packages/core/src/providers/claude.ts (1)

388-391: Scope plugin suppression to workflow/headless calls only.

This hard-codes plugin disabling for every ClaudeProvider call. If this provider is reused outside DAG/headless runs, it can unintentionally disable user-intended plugins there too. Prefer a typed AgentRequestOptions flag (for example, disableUserPlugins?: boolean) and apply settings.enabledPlugins only when that flag is set by workflow execution paths.

Proposed direction
-        settings: { enabledPlugins: {} },
+        ...(requestOptions?.disableUserPlugins === true
+          ? { settings: { enabledPlugins: {} } }
+          : {}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/providers/claude.ts` around lines 388 - 391, The provider
currently unconditionally injects settings.enabledPlugins into every
ClaudeProvider call; change this to a conditional based on a new optional
boolean flag on the request type (e.g., add disableUserPlugins?: boolean to
AgentRequestOptions) and only set settings: { enabledPlugins: {} } on the
outgoing request when AgentRequestOptions.disableUserPlugins === true (default
false). Update the types for AgentRequestOptions, the call site(s) that create
requests for workflow/headless runs to set disableUserPlugins: true, and the
ClaudeProvider request-building logic (referencing ClaudeProvider and
settings.enabledPlugins) so normal non-workflow calls keep user plugins enabled.
packages/workflows/src/dag-executor.ts (1)

1037-1071: Minor duplication in filtering logic.

The code filters failedEntries twice with the same split(' (')[0] operation. While this is readable and the arrays are small, a single-pass partition would be slightly more efficient.

♻️ Optional: single-pass partition
-          const workflowFailures = failedEntries.filter(entry => {
-            const name = entry.split(' (')[0];
-            return workflowMcpNames.has(name);
-          });
-
-          if (workflowFailures.length > 0) {
+          const workflowFailures: string[] = [];
+          const pluginFailures: string[] = [];
+          for (const entry of failedEntries) {
+            const name = entry.split(' (')[0];
+            if (workflowMcpNames.has(name)) {
+              workflowFailures.push(entry);
+            } else {
+              pluginFailures.push(entry);
+            }
+          }
+
+          if (workflowFailures.length > 0) {
             const workflowMsg = `MCP server connection failed: ${workflowFailures.join(', ')}`;
             getLog().warn(
               { nodeId: node.id, mcpStatus: workflowMsg },
               'dag.mcp_server_connection_failed'
             );
             const delivered = await safeSendMessage(
               platform,
               conversationId,
               workflowMsg,
               nodeContext
             );
             if (!delivered) {
               getLog().error(
                 { nodeId: node.id, mcpStatus: workflowMsg, workflowRunId: workflowRun.id },
                 'dag.mcp_connection_failure_delivery_failed'
               );
             }
           }
-          // User-plugin MCP failures: log at debug, don't surface to user
-          const pluginFailures = failedEntries.filter(entry => {
-            const name = entry.split(' (')[0];
-            return !workflowMcpNames.has(name);
-          });
+
           if (pluginFailures.length > 0) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 1037 - 1071, The code
currently computes workflowFailures and pluginFailures by filtering
failedEntries twice using entry.split(' (')[0]; change this to a single-pass
partition over failedEntries (e.g., a for loop or Array.reduce) that extracts
const name = entry.split(' (')[0] once and pushes the entry into either
workflowFailures or pluginFailures based on workflowMcpNames.has(name); keep the
subsequent logic that logs workflowFailures, calls safeSendMessage, and logs
delivery errors unchanged, only replace the duplicate filtering with the
single-pass partition that produces workflowFailures and pluginFailures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/cli/src/cli.test.ts`:
- Around line 241-246: The test reads the real user home (~/.archon/.env) via
getArchonHome() and becomes flaky; instead create a temporary directory and a
temp .env fixture, write the expected DATABASE_URL into that file, mock or
override getArchonHome() to return the temp dir (or set the env value used by
getArchonHome()), then call resolve(getArchonHome(), '.env') and load with
config({ path: ..., override: true }) and assert result.error is undefined;
finally clean up the temp file/dir. Target symbols: globalEnvPath,
getArchonHome, resolve, existsSync, and config.

---

Nitpick comments:
In `@packages/core/src/providers/claude.ts`:
- Around line 388-391: The provider currently unconditionally injects
settings.enabledPlugins into every ClaudeProvider call; change this to a
conditional based on a new optional boolean flag on the request type (e.g., add
disableUserPlugins?: boolean to AgentRequestOptions) and only set settings: {
enabledPlugins: {} } on the outgoing request when
AgentRequestOptions.disableUserPlugins === true (default false). Update the
types for AgentRequestOptions, the call site(s) that create requests for
workflow/headless runs to set disableUserPlugins: true, and the ClaudeProvider
request-building logic (referencing ClaudeProvider and settings.enabledPlugins)
so normal non-workflow calls keep user plugins enabled.

In `@packages/workflows/src/dag-executor.ts`:
- Around line 1037-1071: The code currently computes workflowFailures and
pluginFailures by filtering failedEntries twice using entry.split(' (')[0];
change this to a single-pass partition over failedEntries (e.g., a for loop or
Array.reduce) that extracts const name = entry.split(' (')[0] once and pushes
the entry into either workflowFailures or pluginFailures based on
workflowMcpNames.has(name); keep the subsequent logic that logs
workflowFailures, calls safeSendMessage, and logs delivery errors unchanged,
only replace the duplicate filtering with the single-pass partition that
produces workflowFailures and pluginFailures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c58d79d8-8e7b-46a7-a670-0e0fe9c298c3

📥 Commits

Reviewing files that changed from the base of the PR and between eb75ab6 and 43c0b97.

📒 Files selected for processing (5)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
  • packages/core/src/providers/claude.ts
  • packages/server/src/index.ts
  • packages/workflows/src/dag-executor.ts

@Wirasm
Copy link
Copy Markdown
Collaborator

Wirasm commented Apr 21, 2026

Thanks for raising this, @MrFadiAi — closing in favor of a fresh fix on current dev.

This PR is 9 days stale and all three changes have drifted:

  1. Windows HOME fix — already shipped. packages/cli/src/cli.ts and packages/server/src/index.ts no longer reference process.env.HOME; PR CLI silently strips repo-local .env vars; logs say (0) from .env instead of stripped N keys #1302 moved env loading into loadArchonEnv() (@archon/paths/env-loader), which uses getArchonHome() internally and handles the Windows case (USERPROFILE / HOMEDRIVE+HOMEPATH / ARCHON_HOME override). No further change needed.

  2. settings: { enabledPlugins: {} } in claude.ts — file moved. packages/core/src/providers/claude.ts was extracted to packages/providers/src/claude/provider.ts during the provider registry refactor, so the patch can't apply.

  3. MCP plugin warning suppression in dag-executor.ts — core idea is good, but the surrounding code has been touched by feat(isolation,workflows): worktree location + per-workflow isolation policy #1310 and feat(paths,workflows): unify ~/.archon/{workflows,commands,scripts} + drop globalSearchPath (closes #1136) #1315 and needs a manual rebase.

The MCP noise filter (point 3) is the piece still worth landing. We'll re-do it on current dev with tests and post the follow-up PR here — you'll be credited via Co-authored-by: on the commit.

@Wirasm Wirasm closed this Apr 21, 2026
Wirasm added a commit that referenced this pull request Apr 22, 2026
…#1327)

* fix(workflows): filter user-plugin MCP noise out of workflow warnings

Before this change, the dag-executor surfaced every entry in the Claude
SDK's "MCP server connection failed: …" system message to the user. That
message includes user-level plugin MCPs inherited from ~/.claude/ (e.g.
`telegram`) that fail to connect in the headless workflow subprocess —
they're non-actionable noise for the workflow author.

Fix:
- Pre-compute the set of workflow-configured MCP server names per node
  by parsing the `mcp:` config file once at the start of
  executeNodeInternal. No caller-facing API change; no duplication of
  the provider's env-var expansion logic (we only need the keys).
- Split the system-message handler: the `MCP server connection failed:`
  path now surfaces only the subset of failing names that match the
  node's configured set; user-plugin failures are debug-logged as
  `dag.mcp_plugin_connection_suppressed`. The `⚠️` branch is unchanged.

Supersedes #1134 (closed as stale — the Windows HOME fix in that PR was
already shipped via #1302, and the claude.ts enabledPlugins change
targeted a file that has since moved into @archon/providers).

Credits @MrFadiAi for identifying and reporting the underlying issue.

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>

* fix(workflows): preserve MCP failure status in filtered message + observability

Address review feedback on PR #1327:

- parseMcpFailureServerNames now returns {name, segment} entries so the
  forwarded "MCP server connection failed: ..." message preserves the
  per-server status detail (e.g. "(timeout)", "(disconnected)") that the
  bare-name reconstruction was dropping.
- loadConfiguredMcpServerNames now debug-logs read failures as
  dag.mcp_filter_config_read_failed instead of swallowing them silently.
  A transient EMFILE/EBUSY at filter time would otherwise silently
  reclassify a real workflow-MCP failure as plugin noise.
- Add 4 integration tests through executeDagWorkflow covering the mixed
  workflow/plugin split, all-plugin suppression, no-mcp:-config nodes,
  and the unchanged ⚠️ warning path.
- Drop a WHAT comment above configuredMcpNames and a temporal phrase
  ("before this filter landed") that would rot on merge.
- Document the filtering boundary in guides/mcp-servers.md and add a
  troubleshooting row for users debugging silently-suppressed plugin MCPs.

---------

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>
@Wirasm Wirasm mentioned this pull request Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants