Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions assistant/src/daemon/__tests__/conversation-tool-setup.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
/**
* Tests for `isToolActiveForContext` host-tool capability gating.
*
* The chrome-extension interface only supports the `host_browser` host proxy
* capability, so host tools must be gated by
* `supportsHostProxy(transport, capability)` instead of a single
* `hasNoClient` check. These tests assert that:
*
* - Each host tool is only projected for transports whose interface supports
* the matching capability (e.g. `host_bash` is only active for macOS, not
* chrome-extension — regression coverage for the Codex P1 host-tool leak).
* - The `hasNoClient` precondition still takes precedence so HTTP-only paths
* never see host tools.
* - Contexts without a `transportInterface` fall back to the permissive
* coarse-grained behavior so callers that have not yet plumbed the field
* through `SkillProjectionContext` continue to see the host tools their
* `hasNoClient` state allows.
*/

import { describe, expect, test } from "bun:test";

import type { SkillProjectionCache } from "../conversation-skill-tools.js";
import {
isToolActiveForContext,
type SkillProjectionContext,
} from "../conversation-tool-setup.js";

function makeCtx(
overrides: Partial<SkillProjectionContext> = {},
): SkillProjectionContext {
return {
skillProjectionState: new Map(),
skillProjectionCache: {} as SkillProjectionCache,
coreToolNames: new Set<string>(),
toolsDisabledDepth: 0,
...overrides,
};
}

describe("isToolActiveForContext — host tool capability gating", () => {
test("host_bash is active for macOS transport (full host proxy support)", () => {
expect(
isToolActiveForContext(
"host_bash",
makeCtx({ hasNoClient: false, transportInterface: "macos" }),
),
).toBe(true);
});

test("host_bash is NOT active for chrome-extension transport (Codex P1 regression)", () => {
// Regression coverage: chrome-extension only supports `host_browser`, so
// `host_bash` must NOT be projected even though a client is connected
// (`hasNoClient: false`). Without per-capability gating the model could
// attempt host_bash calls that the transport cannot dispatch.
expect(
isToolActiveForContext(
"host_bash",
makeCtx({
hasNoClient: false,
transportInterface: "chrome-extension",
}),
),
).toBe(false);
});

test("host_browser is active for chrome-extension transport", () => {
expect(
isToolActiveForContext(
"host_browser",
makeCtx({
hasNoClient: false,
transportInterface: "chrome-extension",
}),
),
).toBe(true);
});
Comment on lines +66 to +76

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Test uses hasNoClient: false for chrome-extension, masking the real runtime state

The test at line 66-76 (host_browser is active for chrome-extension transport) uses hasNoClient: false, but the actual runtime state for chrome-extension sessions through the SSE sendMessageToConversation path keeps hasNoClient === true (see assistant/src/daemon/server.ts:1226-1229 and assistant/src/daemon/conversation.ts:569-570). This means the test passes but doesn't cover the real-world scenario. A test case with { hasNoClient: true, transportInterface: "chrome-extension" } expecting host_browser to be true would catch the reported bug.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.


test("host_browser is active for macOS transport", () => {
expect(
isToolActiveForContext(
"host_browser",
makeCtx({ hasNoClient: false, transportInterface: "macos" }),
),
).toBe(true);
});

test("host_file_read is NOT active for chrome-extension transport", () => {
expect(
isToolActiveForContext(
"host_file_read",
makeCtx({
hasNoClient: false,
transportInterface: "chrome-extension",
}),
),
).toBe(false);
});

test("host_bash respects hasNoClient even when transport supports it", () => {
// The existing `hasNoClient` gate must continue to take precedence: even
// a macOS-capable transport must not surface host tools when no client is
// actually connected (e.g. the HTTP-only path).
expect(
isToolActiveForContext(
"host_bash",
makeCtx({ hasNoClient: true, transportInterface: "macos" }),
),
).toBe(false);
});

test("host_bash falls back to permissive behavior when transport is undefined", () => {
// Backwards-compat fallback: contexts that don't pass a transport
// interface (e.g. tests, callers that haven't plumbed the new field)
// keep the coarse-grained behavior so we don't accidentally hide tools.
expect(
isToolActiveForContext(
"host_bash",
makeCtx({ hasNoClient: false, transportInterface: undefined }),
),
).toBe(true);
});
});
49 changes: 48 additions & 1 deletion assistant/src/daemon/conversation-tool-setup.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,11 @@
* keeping the constructor body focused on wiring.
*/

import {
type HostProxyCapability,
type InterfaceId,
supportsHostProxy,
} from "../channels/types.js";
import { isHttpAuthDisabled } from "../config/env.js";
import { getIsPlatform } from "../config/env-registry.js";
import type { CesClient } from "../credential-execution/client.js";
Expand Down Expand Up @@ -459,6 +464,15 @@ export interface SkillProjectionContext {
subagentAllowedTools?: Set<string>;
/** True when this conversation belongs to a subagent spawned by SubagentManager. */
readonly isSubagent?: boolean;
/**
* The interface id of the connected client driving the current turn (e.g.
* "macos", "chrome-extension"). Used to gate host tools by per-capability
* `supportsHostProxy(transport, capability)` so that interfaces which only
* support a subset of the host proxy set (e.g. chrome-extension supports
* `host_browser` but not `host_bash`/`host_file`) do not leak unsupported
* host tools into the LLM tool definitions.
*/
readonly transportInterface?: InterfaceId;
}

// ── Conditional tool sets ────────────────────────────────────────────
Expand All @@ -469,6 +483,25 @@ const HOST_TOOL_NAMES = new Set([
"host_file_write",
"host_file_edit",
"host_bash",
"host_browser",
]);
/**
* Maps each host tool name to the host proxy capability that the connected
* client interface must support. `isToolActiveForContext` uses this to gate
* each host tool individually so that partial-capability transports (e.g.
* chrome-extension only supports `host_browser`) only see the host tools
* their interface can actually service.
*
* Note: there is no `host_cu` tool exposed via the tool gating layer today;
* computer-use is preactivated as a skill and projected through the skill
* tools path. Only the host tools listed in `HOST_TOOL_NAMES` need entries.
*/
const HOST_TOOL_TO_CAPABILITY = new Map<string, HostProxyCapability>([
["host_bash", "host_bash"],
["host_file_read", "host_file"],
["host_file_write", "host_file"],
["host_file_edit", "host_file"],
["host_browser", "host_browser"],
]);
const CLIENT_CAPABILITY_TOOL_NAMES = new Set(["app_open"]);
const PLATFORM_TOOL_NAMES = new Set(["request_system_permission"]);
Expand Down Expand Up @@ -501,7 +534,21 @@ export function isToolActiveForContext(
// Host tools require a connected client — without one, there is no human
// to approve execution and the guardian auto-approve path would allow
// unchecked host command execution on the daemon host.
return !ctx.hasNoClient;
if (ctx.hasNoClient) return false;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow host_browser when non-interactive client is connected

This new host-tool branch still hard-fails on hasNoClient, which unintentionally removes host_browser for chrome-extension turns. In runtime routing, chrome-extension is intentionally marked non-interactive (hasNoClient = true) while still wiring a live browser proxy, so this condition now prevents the model from ever seeing host_browser even though supportsHostProxy("chrome-extension", "host_browser") is true. That regresses browser automation for extension sessions while fixing the bash/file leak.

Useful? React with 👍 / 👎.

// Then gate per-capability against the connected interface so that
// partial-capability transports (e.g. chrome-extension only supports
// `host_browser`) only see host tools their interface can service.
// Without this check, host_bash/host_file_* would surface for
// chrome-extension sessions and the model would attempt calls the
// transport cannot dispatch.
const capability = HOST_TOOL_TO_CAPABILITY.get(name);
const transport = ctx.transportInterface;
// Backwards-compat fallback: contexts that have not yet been plumbed
// with a transport interface (tests, callers that don't pass the new
// field) keep the coarse-grained behavior so we don't accidentally hide
// tools from them.
if (!transport || !capability) return true;
return supportsHostProxy(transport, capability);
Comment on lines 534 to +551

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Adding host_browser to HOST_TOOL_NAMES breaks chrome-extension sessions where hasNoClient === true

Before this PR, host_browser was not in HOST_TOOL_NAMES, so isToolActiveForContext("host_browser", ctx) fell through to return true — always active regardless of hasNoClient. This PR adds host_browser to HOST_TOOL_NAMES (line 486), which subjects it to the if (ctx.hasNoClient) return false gate at line 537.

For chrome-extension sessions arriving through the SSE sendMessageToConversation path, hasNoClient is intentionally kept true. The comment at assistant/src/daemon/server.ts:1226-1229 explains: "We do NOT call updateClient(onEvent, false) for that case, because flipping hasNoClient false would also enable host_bash/host_file/host_cu tool gating." And assistant/src/daemon/conversation.ts:569-570 confirms: "The chrome-extension interface is non-interactive (so hasNoClient === true), but it DOES have a connected client that can service host_browser_request events."

With hasNoClient === true and host_browser now in HOST_TOOL_NAMES, the early return false fires before the new per-capability supportsHostProxy check is ever reached. The test at line 66-76 masks this by using hasNoClient: false, which doesn't match the actual runtime state for chrome-extension sessions on the SSE path.

Affected code path in isToolActiveForContext
if (HOST_TOOL_NAMES.has(name)) {
    if (ctx.hasNoClient) return false; // ← blocks host_browser for chrome-extension
    // per-capability check never reached
    const capability = HOST_TOOL_TO_CAPABILITY.get(name);
    const transport = ctx.transportInterface;
    if (!transport || !capability) return true;
    return supportsHostProxy(transport, capability);
}

(Refers to lines 486-551)

Prompt for agents
The problem is that host_browser was added to HOST_TOOL_NAMES, which gates it behind the hasNoClient check. But for chrome-extension sessions on the SSE sendMessageToConversation path (server.ts:1232-1240), hasNoClient is intentionally kept true — the code deliberately avoids calling updateClient(onEvent, false) to prevent other host tools from leaking.

Now that per-capability gating via supportsHostProxy(transport, capability) is in place, there are two possible approaches:

1. In isToolActiveForContext, restructure the HOST_TOOL_NAMES branch so that when transportInterface is available, the per-capability check runs BEFORE (or instead of) the hasNoClient check. The hasNoClient check could be the fallback only when no transport is known. This way chrome-extension with hasNoClient=true but transportInterface='chrome-extension' would still pass host_browser through supportsHostProxy.

2. Update server.ts:1232-1240 (sendMessageToConversation) and conversation-process.ts drain queue logic to call updateClient(onEvent, false) for chrome-extension now that per-capability gating handles the host_bash/host_file differentiation. This would set hasNoClient=false for chrome-extension, matching the test expectation. But this has broader implications for isInteractive and other hasNoClient-gated behavior.

Approach 1 is safer and more localized. The test at conversation-tool-setup.test.ts:66-76 should also be updated to cover the actual runtime state (hasNoClient: true with transportInterface: 'chrome-extension').
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines 534 to +551

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 server.ts hasNoClient workaround for chrome-extension is now potentially obsolete

The code at assistant/src/daemon/server.ts:1221-1240 deliberately avoids calling updateClient(onEvent, false) for chrome-extension to prevent host_bash/host_file tool gating from triggering. Now that per-capability gating via supportsHostProxy(transport, capability) is in place, this workaround may be unnecessary — the per-capability check would correctly block host_bash/host_file for chrome-extension even if hasNoClient were false. Cleaning this up (and setting hasNoClient = false for chrome-extension) would fix the reported bug and simplify the state model. However, there may be other hasNoClient-gated behaviors (e.g. isInteractive at conversation-tool-setup.ts:221) that need auditing first.

(Refers to lines 533-551)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

}
if (CLIENT_CAPABILITY_TOOL_NAMES.has(name)) {
return !ctx.hasNoClient;
Expand Down
11 changes: 11 additions & 0 deletions assistant/src/daemon/conversation.ts
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import type { ResolvedSystemPrompt } from "../agent/loop.js";
import { AgentLoop } from "../agent/loop.js";
import type {
InterfaceId,
TurnChannelContext,
TurnInterfaceContext,
} from "../channels/types.js";
Expand Down Expand Up @@ -1090,6 +1091,16 @@ export class Conversation {
return this.currentTurnInterfaceContext;
}

/**
* Implements the `transportInterface` field of `SkillProjectionContext` so
* that `isToolActiveForContext` can gate host tools by per-capability
* `supportsHostProxy(transport, capability)`. Derived from the live turn
* interface context so it tracks the connected client across turns.
*/
get transportInterface(): InterfaceId | undefined {
return this.currentTurnInterfaceContext?.userMessageInterface;
}

async persistUserMessage(
content: string,
attachments: UserMessageAttachment[],
Expand Down
Loading