Host Browser Proxy — Phase 2 (Chrome extension as CDP proxy) #24159
The main-branch release commit (#24108) bumped assistant/package.json to 0.6.2 but did not regenerate the openapi spec. Regenerate it on the feature branch so CI's OpenAPI Spec Check passes for Phase 2 PRs.
…h/file/cu proxies (#24115)
…cel messages (#24113)
* feat(clients/macos): decode host_browser_request and host_browser_cancel messages
* fix: type HostBrowserRequest.timeoutSeconds as Double?
  Matches the daemon's number-typed wire contract and mirrors HostBashRequest.timeoutSeconds, so fractional timeouts like 0.01s don't throw a type-mismatch and drop the whole host_browser_request event.
…ion backend stub (#24110)
* feat(browser-session): add BrowserSessionManager scaffold with extension backend stub
* test(browser-session): import public API via index.ts to satisfy knip
  Updates manager.test.ts to consume BrowserSessionManager, createExtensionBackend, and types through the public ../index.js entry point instead of deep-importing ../manager.js and ../backends/extension.js. This keeps knip happy during the scaffold phase: index.ts becomes a transitively-reachable entry point from src/**/__tests__/**/*.ts before any production module consumes it.
* fix(browser-session): enforce session existence in BrowserSessionManager.send
  Throws when the caller passes a sessionId that doesn't exist or has been disposed. Still advisory for single-backend Phase 2, but makes disposeSession() an actual enforcement boundary so commands can't run against stale ids once Phase 4 adds multi-backend routing.
* feat(chrome-extension): add standalone CDP proxy module
* fix(chrome-extension): inject runtime.lastError and thread sessionId through CDP proxy
  - Add runtime.lastError to ChromeDebuggerApi so mocked tests can surface errors
  - Fold frame.sessionId into sendCommand params for flat-session routing
  - Extract sessionId from event params when building CdpEventFrame
  - Document flat-session handling in the module docstring
* fix(chrome-extension): route flat-session sessionId through DebuggerSession target
  Chrome 125+ debugger.sendCommand takes sessionId on the target argument (DebuggerSession), not inside commandParams. Switch back to passing sessionId on the target. Same change on the onEvent listener — read sessionId from 'source' rather than params, since flat-session events surface it on the source. Also clean up the module docstring to drop PR-level narrative per clients/AGENTS.md's comment quality rule.
* fix(chrome-extension): bind defaultChromeDebuggerApi methods to chrome.debugger
  Returning methods from a Proxy via Reflect.get without binding causes 'Illegal invocation' at runtime because Chrome's native bindings check this against the original chrome.debugger object. Replace the Proxy with a plain object whose methods are explicitly bound.
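The binding fix in the last commit above can be illustrated with a small sketch. `FakeNativeDebugger` is a hypothetical stand-in for `chrome.debugger` that checks its receiver, and `createProxyApi` / `createBoundApi` are illustrative names, not the module's actual exports.

```typescript
// Hypothetical stand-in for a native API that validates its receiver,
// similar in spirit to how chrome.debugger rejects a foreign `this`.
class FakeNativeDebugger {
  attach(tabId: number): string {
    if (!(this instanceof FakeNativeDebugger)) {
      throw new TypeError("Illegal invocation");
    }
    return `attached:${tabId}`;
  }
}

interface DebuggerApi {
  attach(tabId: number): string;
}

// Problematic shape: the method comes back unbound, so calling it through
// the proxy runs it with the proxy (not the native object) as `this`.
function createProxyApi(target: DebuggerApi): DebuggerApi {
  return new Proxy({} as DebuggerApi, {
    get: (_unused, prop) => Reflect.get(target, prop),
  });
}

// Fixed shape: a plain object whose methods are bound to the native object.
function createBoundApi(target: DebuggerApi): DebuggerApi {
  return { attach: target.attach.bind(target) };
}

const nativeDebugger = new FakeNativeDebugger();
const proxyApi = createProxyApi(nativeDebugger);
const boundApi = createBoundApi(nativeDebugger);
```

Calling `proxyApi.attach(1)` throws in this sketch, while `boundApi.attach(1)` succeeds, which mirrors the failure mode the commit describes.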
…old (#24114)
* feat(chrome-extension-native-host): add native messaging helper scaffold
* fix(chrome-extension-native-host): robust port discovery, JSON error handling, and assistant terminology
  - Add --assistant-port CLI arg so Chrome-spawned helpers can be pointed at a non-default port when the lockfile isn't present
  - Surface malformed stdin JSON as a protocol-level error frame instead of a silent crash
  - Rename user-facing 'daemon' to 'assistant' in error messages per AGENTS.md terminology rule
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(chrome-extension-native-host): finish daemon→assistant rename in client prose, vars, and smoke test
  - README section header and prose use 'assistant' (per root AGENTS.md §139)
  - DEFAULT_DAEMON_PORT → DEFAULT_ASSISTANT_PORT, resolveDaemonPort → resolveAssistantPort (per clients/AGENTS.md §403-404)
  - Smoke test example uses dynamic import() instead of require() since the package is ESM
* fix(chrome-extension-native-host): flush stdout before exiting
  Wait for process.stdout.write callback to fire before calling process.exit(), so the native-messaging frame actually reaches Chrome on pipe-backed stdout before the process terminates. Without this, Chrome can see a disconnect instead of the intended token_response or error frame under backpressure or larger payloads.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
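A minimal sketch of the flush-before-exit pattern from the last fix above, assuming Node-style streams. Chrome's native messaging protocol frames each message as a 32-bit length in native byte order followed by UTF-8 JSON. The helper names `encodeNativeMessage` and `writeFrame` are hypothetical and not taken from the package.

```typescript
import { endianness } from "node:os";
import { Writable } from "node:stream";

// Frame a message the way Chrome native messaging expects:
// 4-byte length prefix (native byte order) + UTF-8 encoded JSON.
function encodeNativeMessage(message: unknown): Buffer {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  const header = Buffer.alloc(4);
  if (endianness() === "LE") {
    header.writeUInt32LE(body.length, 0);
  } else {
    header.writeUInt32BE(body.length, 0);
  }
  return Buffer.concat([header, body]);
}

// Resolve only after the stream reports the chunk as written, so a caller
// can safely terminate the process afterwards.
function writeFrame(out: Writable, message: unknown): Promise<void> {
  return new Promise((resolve, reject) => {
    out.write(encodeNativeMessage(message), (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}

// Intended usage (not executed here):
//   await writeFrame(process.stdout, { type: "error", message: "..." });
//   process.exit(1);
```

Awaiting the write callback before `process.exit()` is what keeps the frame from being lost on a pipe-backed stdout.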
* feat(chrome-extension): add cloud OAuth sign-in skeleton
* fix(chrome-extension): run OAuth sign-in from service worker and validate guardianId
  - Popup now sends a message to the background worker to initiate cloud sign-in instead of running launchWebAuthFlow directly. This avoids the MV3 popup teardown race where the awaited OAuth promise never resolves if the popup blurs during the auth window.
  - Add guardianId type check to getStoredToken so malformed stored tokens can't leak 'Signed in as guardian:undefined' into the popup UI.
…host proxy gating (#24111)
* feat(channels): add chrome-extension interface id and per-capability host proxy gating
* fix(channels): keep hostBrowserProxy available for non-interactive chrome-extension interfaces
  updateClient/drain-queue paths used !isInteractive as a proxy for hasNoClient, which incorrectly marks the chrome-extension's hostBrowserProxy unavailable immediately after construction. Decouple the flags: chrome-extension is non-interactive (no prompter UI) but still has a connected client for host_browser_request events.
  - conversation-routes.ts: derive hasNoClient as !(isInteractive || supportsHostProxy(sourceInterface, 'host_browser'))
  - server.ts persistAndProcessMessage: same pattern so queued sends don't lose availability
  - conversation-process.ts drain queue: add restore path via new Conversation.restoreBrowserProxyAvailability() helper
  - conversation.ts: add restoreBrowserProxyAvailability() that re-enables only the browser proxy (gated on hasNoClient)
  - channels/types.ts: clarify supportsHostProxy no-arg JSDoc to call out the desktop-only semantics
  - conversation-confirmation-signals.test.ts: cover the new restore helper
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(channels): targeted hostBrowserProxy enable without relaxing hasNoClient
  Cycle 1 derived hasNoClient as !(isInteractive || supportsHostProxy(id, 'host_browser')) to keep the chrome-extension's browser proxy available. That inadvertently made tool gating treat the conversation as fully interactive (isInteractive derives from !ctx.hasNoClient), enabling host_bash/host_file tools that chrome-extension can't service. Revert to the literal hasNoClient = !isInteractive and instead call a targeted restoreBrowserProxyAvailability() after updateClient. The helper now enables the browser proxy regardless of hasNoClient so the single-proxy chrome-extension turn works without leaking host_bash/host_file tool availability.
  Part of JARVIS-1175
* fix(channels): drop 'historically' from JSDoc and tighten chrome-extension else-if in server.ts
  - assistant/AGENTS.md: comments describe current state, not history
  - server.ts: scope the non-interactive host-browser restore branch to interfaces that specifically only support host_browser (not macos, which hits the interactive branch)
* test: add restoreBrowserProxyAvailability to Conversation mocks
  Two test files use object-literal mocks for Conversation that need the new method so they don't throw TypeError at the new call site in handleSendMessage.
* fix(routes): optional-chain restoreBrowserProxyAvailability for test mocks
* test: allowlist chrome-extension-native-host in gateway-only guard
  The native messaging helper intentionally POSTs to the local daemon's /v1/browser-extension-pair endpoint on 127.0.0.1 to mint capability tokens for the extension; it's a bootstrap path that cannot and should not go through the gateway. Add it to the guard-test allowlist.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on clients (#24129)
* feat(runtime): route host_browser_request to connected chrome-extension clients
* fix(runtime): gateway guardianId plumbing + queue-drain-safe chrome-extension sender
  - handleBrowserRelayUpgrade now looks for x-guardian-id header/query param as a fallback when the JWT sub is a service token (gateway-forwarded case)
  - Conversation exposes hostBrowserSenderOverride so restoreBrowserProxyAvailability preserves the registry-routed sender on drain-queue restores instead of clobbering it with the SSE hub sender
…proxy behind feature flag (#24125)
* feat(chrome-extension): dispatch host_browser_request frames via CDP proxy behind feature flag
* fix(chrome-extension): use camelCase wire format, tolerate re-attach, guard postResult catch
  - Match daemon's actual host_browser_request envelope shape (requestId, cdpMethod, cdpParams, cdpSessionId — only timeout_seconds stays snake_case)
  - POST /v1/host-browser-result with camelCase keys to match the runtime schema
  - Track attached CDP targets and skip re-attach; dispose clears the set
  - Wrap postResult calls inside the catch handler so a secondary failure is logged instead of becoming an unhandled rejection
* fix(chrome-extension): invalidate attachedTargets cache on debugger detach
  Subscribe to CdpProxy.onDetach in the dispatcher and remove the corresponding key from the attached-targets cache when Chrome notifies us of a detach (tab close, navigation, infobar cancel, external takeover). Without this, the cache held a stale entry forever and subsequent commands skipped the re-attach, causing permanent CDP failures.
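The attach-once cache with detach invalidation described above might look roughly like the following. This is a sketch under assumptions: `AttachedTargetCache`, the string target key, and the injected `attach` / `subscribeToDetach` callbacks are hypothetical and do not reflect the dispatcher's real signatures.

```typescript
type DetachListener = (targetKey: string) => void;

class AttachedTargetCache {
  private readonly attached = new Set<string>();
  private readonly attach: (targetKey: string) => Promise<void>;

  constructor(
    attach: (targetKey: string) => Promise<void>,
    subscribeToDetach: (listener: DetachListener) => void,
  ) {
    this.attach = attach;
    // Drop the entry when the debugger reports a detach, so the next
    // command re-attaches instead of trusting a stale cache hit.
    subscribeToDetach((targetKey) => {
      this.attached.delete(targetKey);
    });
  }

  // Attach at most once per target while the cache entry is valid.
  async ensureAttached(targetKey: string): Promise<void> {
    if (this.attached.has(targetKey)) return;
    await this.attach(targetKey);
    this.attached.add(targetKey);
  }

  // Mirrors "dispose clears the set" from the commit notes.
  dispose(): void {
    this.attached.clear();
  }
}
```

Without the detach subscription, a second `ensureAttached` call after a tab close would be skipped and every later command would fail against the detached target.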
…nt (#24130)
* feat(runtime): add /v1/browser-extension-pair capability token endpoint
* fix(runtime): align pair endpoint with native helper contract + move secret out of workspace
  - Accept extensionOrigin (preferred) and origin (legacy) in request body
  - Return expiresAt as ISO 8601 string instead of numeric ms, matching what the chrome-extension-native-host helper validates
  - Move capabilityTokenSecret out of workspace/data into protected storage alongside the actor-token-signing-key per AGENTS.md workspace-isolation rule
  - Fix isLoopbackHostHeader to correctly parse IPv6 bracket notation
* fix(runtime): align pair allowlist with native helper + reject malformed bracketed Host headers
  - ALLOWED_EXTENSION_ORIGINS now matches the chrome-extension-native-host placeholder so the dev pair flow works end-to-end
  - parseHostHeader rejects inputs like '[::1]attacker.com' where content after the closing bracket is not an optional ':port'
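To make the bracketed Host header rule concrete, here is a minimal sketch. The function names match those mentioned in the commits, but the bodies, return shape, and loopback criteria are assumptions for illustration, not the runtime's actual implementation.

```typescript
interface ParsedHost {
  host: string;
  port?: number;
}

// Parse a Host header value. Bracketed IPv6 literals may only be followed
// by an optional ":port"; anything else after "]" is rejected.
function parseHostHeader(header: string): ParsedHost | null {
  const value = header.trim();
  if (value.startsWith("[")) {
    const close = value.indexOf("]");
    if (close === -1) return null;
    const host = value.slice(1, close);
    const rest = value.slice(close + 1);
    if (rest !== "" && !/^:\d{1,5}$/.test(rest)) return null;
    return rest === "" ? { host } : { host, port: Number(rest.slice(1)) };
  }
  const match = /^([^:\[\]]+)(?::(\d{1,5}))?$/.exec(value);
  if (!match) return null;
  return match[2] === undefined
    ? { host: match[1] }
    : { host: match[1], port: Number(match[2]) };
}

// Assumed loopback criteria: localhost, ::1, or any 127.x.x.x address.
function isLoopbackHostHeader(header: string): boolean {
  const parsed = parseHostHeader(header);
  if (!parsed) return false;
  const host = parsed.host.toLowerCase();
  return (
    host === "localhost" ||
    host === "::1" ||
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host)
  );
}
```

In this sketch `[::1]:8080` is accepted, while `[::1]attacker.com` is rejected because the text after the closing bracket is not an optional `:port`.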
… install (#24128)
* feat(installer): write Chrome native messaging host manifest on macOS install
* fix(build): parenthesize native-host staleness check
  Bash || and && are equal-precedence left-to-right, so the unparenthesized condition incorrectly required bun.lock to also be newer for a package.json update to trigger a rebuild. Group the bun.lock subexpression explicitly.
* fix(installer): conform InstallError to LocalizedError so localizedDescription is useful
…tive messaging (#24142)
* feat(chrome-extension): bootstrap self-hosted capability token via native messaging
* fix(chrome-extension): nativeMessaging permission, disconnect race, persistence fallback, popup->worker delegation
  - Add nativeMessaging permission to manifest so Chrome actually allows chrome.runtime.connectNative('com.vellum.daemon')
  - Set settled=true synchronously on token_response so a fast onDisconnect can't win the race and reject a valid pairing
  - On chrome.storage.local.set failure, log and resolve with the in-memory token instead of discarding it (single-session fallback)
  - Move the pair flow into the service worker via chrome.runtime.sendMessage so the popup teardown can't kill the awaited promise mid-flight
…ket (#24143)
* feat(chrome-extension): connect to cloud gateway browser-relay WebSocket
* fix(chrome-extension): surface missing-token connect failures and ignore stale socket close events
  - Worker now returns an actionable error when the selected relay mode has no usable token (cloud not signed in, self-hosted not paired)
  - RelayConnection's close listener ignores events from superseded sockets so a setMode mid-flight does not nuke the new socket reference
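The stale-close guard described in the last bullet can be sketched as follows. `RelayConnectionSketch` and `SocketLike` are hypothetical simplifications; the real RelayConnection presumably wraps a WebSocket and carries more state.

```typescript
// Minimal socket surface needed for the sketch (a real WebSocket also fits).
interface SocketLike {
  onclose: (() => void) | null;
  close(): void;
}

class RelayConnectionSketch {
  private socket: SocketLike | null = null;
  private readonly openSocket: (url: string) => SocketLike;

  constructor(openSocket: (url: string) => SocketLike) {
    this.openSocket = openSocket;
  }

  // Opening a new socket supersedes any previous one.
  connect(url: string): void {
    const previous = this.socket;
    const socket = this.openSocket(url);
    this.socket = socket;
    socket.onclose = () => {
      // A late close from a superseded socket must not clear the
      // reference to the socket that replaced it.
      if (this.socket !== socket) return;
      this.socket = null;
    };
    previous?.close();
  }

  get connected(): boolean {
    return this.socket !== null;
  }
}
```

Comparing the closing socket against the current reference is what lets a mode switch mid-flight keep the new connection alive.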
…apability bootstrap (#24154)
…est round-trip (#24153)
* test(host-browser): e2e smoke test for cloud-hosted host_browser_request round-trip
* test(host-browser): exercise actual timeout path and clarify mock WS header support
  - Disconnected test renamed/restructured to use a never-resolving CDP handler plus a short timeout_seconds, so the proxy's setTimeout path is actually covered
  - Removed/implemented extraHandshakeHeaders on the mock fixture so the advertised API matches reality
Self-review is starting. Results will be posted here when complete.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f9190ff33
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    .then((stored: StoredLocalToken) => sendResponse({ ok: true, token: stored }))
    .catch((err) => sendResponse({ ok: false, error: err instanceof Error ? err.message : String(err) }));
Reply to popup in self-hosted pair handler
The self-hosted-pair message path never calls the runtime message callback (sendResponseFn); it calls the relay socket helper sendResponse(...) instead. Because this branch returns true (async response expected), popup callers wait for a response that never arrives and end up with a runtime error, so local pairing appears to fail even when the native bootstrap succeeds.
    { type: "token_response", token, expiresAt },
    0,
Include guardianId in native token_response
The native host drops guardianId when it forwards /v1/browser-extension-pair output to Chrome, but bootstrapLocalToken() requires guardianId on token_response and rejects frames without it as malformed. In practice this means successful pair endpoint responses still fail the extension bootstrap path, so no local capability token is accepted/persisted.
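The mismatch described here can be illustrated with a small sketch of both sides of the contract. The field names come from the comment above; `toTokenResponse` and `parseTokenResponse` are hypothetical helpers, and the assumption that the pair endpoint returns `guardianId` alongside `token` and `expiresAt` follows from the review text rather than from the code.

```typescript
interface TokenResponse {
  type: "token_response";
  token: string;
  expiresAt: string; // ISO 8601 string
  guardianId: string;
}

// Native-host side: forward every field the extension validates,
// including guardianId.
function toTokenResponse(pair: {
  token: string;
  expiresAt: string;
  guardianId: string;
}): TokenResponse {
  return {
    type: "token_response",
    token: pair.token,
    expiresAt: pair.expiresAt,
    guardianId: pair.guardianId,
  };
}

// Extension side: treat a frame without a string guardianId as malformed.
function parseTokenResponse(frame: unknown): TokenResponse | null {
  if (typeof frame !== "object" || frame === null) return null;
  const f = frame as Record<string, unknown>;
  if (f.type !== "token_response") return null;
  if (
    typeof f.token !== "string" ||
    typeof f.expiresAt !== "string" ||
    typeof f.guardianId !== "string"
  ) {
    return null;
  }
  return {
    type: "token_response",
    token: f.token,
    expiresAt: f.expiresAt,
    guardianId: f.guardianId,
  };
}
```

A frame built as `{ type: "token_response", token, expiresAt }` fails this validation, which is the failure the review comment reports.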
    const [token, port] = await Promise.all([getBearerToken(), getRelayPort()]);
    const headers: Record<string, string> = { 'content-type': 'application/json' };
    if (token) headers.authorization = `Bearer ${token}`;
    const resp = await fetch(`http://127.0.0.1:${port}/v1/host-browser-result`, {
Post host_browser results to active relay origin
Result delivery is hardcoded to http://127.0.0.1:<port>/v1/host-browser-result with the local bearer token, even when relay mode is cloud. In cloud mode, host_browser_request frames come from the gateway WebSocket, so posting responses to localhost causes delivery failures/timeouts (especially on machines without a local assistant), and the request never resolves server-side.
Self-review complete
Result: GAPS FOUND (Pass 3 incomplete due to interruption)
Pass 1 — External reviewer feedback: GAPS FOUND
12 actionable items remain after fix cycles, including 4 P0/P1 bugs:
P0 / P1 (functional / security):
P2 / quality:
Pass 2 — Plan faithfulness: GAPS FOUND (1 item)
Pass 3 — Repo integration review: NOT COMPLETED
Interrupted before completion. The Pass 1 + Pass 2 findings above include the integration concerns that Pass 3 would have flagged most prominently (host tool leak in PR 4, wire format coordination in PR 9/10, popup-worker delegation miss in PR 14).
Recommendation
Address the 4 P0/P1 items before merging to main. The other 7 P2 items can land in a follow-up cleanup PR. Self-hosted pairing and the unauthorized-origin path will not work end-to-end without items 2, 3, and 4.
Feature branch is ready for your review. Because --auto-merge is not set, this PR is presented for manual merge.
…e history-narrating docstrings (#24229)
💡 Codex Review
vellum-assistant/clients/chrome-extension/popup/popup.ts
Lines 167 to 169 in 98b776f
The popup computes relayMode from the radio state but never writes that value as part of the same connect flow before calling runtime.sendMessage({ type: 'connect' }). If the user toggles mode and immediately clicks Connect, the worker can still use its stale in-memory relayMode (updated only via async storage.onChanged) and attempt the wrong transport, producing incorrect self-hosted/cloud connection behavior.
    ) {
      const envelopeType = (parsed as { type: string }).type;
      if (envelopeType === 'host_browser_request') {
        if (!cdpProxyEnabled) return;
Reject disabled host_browser requests explicitly
When cdpProxyEnabled is false (the default when vellum.cdpProxyEnabled is unset), host_browser_request frames are dropped with an early return. The daemon still advertises host_browser for the chrome-extension transport, so these calls wait for the full proxy timeout and fail as intermittent tool hangs instead of an immediate, actionable error. This creates a production-facing regression for any extension session that has not enabled the beta toggle.
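One way to turn the silent drop into an immediate error is sketched below. The request fields follow the camelCase envelope named earlier in this PR, but `HostBrowserResult`, `handleHostBrowserRequest`, and the injected `dispatch` / `postResult` callbacks are hypothetical; the real /v1/host-browser-result schema may differ.

```typescript
interface HostBrowserRequest {
  type: "host_browser_request";
  requestId: string;
  cdpMethod: string;
  cdpParams?: Record<string, unknown>;
}

// Hypothetical result shape, for illustration only.
interface HostBrowserResult {
  requestId: string;
  isError: boolean;
  content: string;
}

interface HandlerDeps {
  cdpProxyEnabled: boolean;
  dispatch: (req: HostBrowserRequest) => Promise<string>;
  postResult: (result: HostBrowserResult) => Promise<void>;
}

async function handleHostBrowserRequest(
  req: HostBrowserRequest,
  deps: HandlerDeps,
): Promise<void> {
  // Answer right away when the flag is off, so the caller does not
  // wait for the full proxy timeout.
  if (!deps.cdpProxyEnabled) {
    await deps.postResult({
      requestId: req.requestId,
      isError: true,
      content: "CDP proxy is disabled (vellum.cdpProxyEnabled is not set)",
    });
    return;
  }
  let result: HostBrowserResult;
  try {
    const content = await deps.dispatch(req);
    result = { requestId: req.requestId, isError: false, content };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    result = { requestId: req.requestId, isError: true, content: message };
  }
  await deps.postResult(result);
}
```

With this shape, a disabled extension produces an actionable error result instead of a tool call that hangs until timeout.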
Round 3 fix-ups complete
All 4 round-3 findings have been addressed. Feature branch tip is now
Round-3 PRs merged
Cumulative totals
Total: 34 merged PRs on the feature branch.
Remaining deferred items (Phase 3 scope)
Feature branch status
Ready for manual review. The 3 P1 regressions introduced by round-1 fixes are resolved, verified, and pinned with regression tests. All in-scope P2/P3 cleanups have landed. Code quality on the feature branch is now significantly higher than when round-2 review started.
    if (HOST_TOOL_NAMES.has(name)) {
      // Host tools require a connected client — without one, there is no human
      // to approve execution and the guardian auto-approve path would allow
      // unchecked host command execution on the daemon host.
      const capability = HOST_TOOL_TO_CAPABILITY.get(name);
      const transport = ctx.transportInterface;

      // Per-capability check is authoritative for structural support: if the
      // transport cannot service this capability, the tool is filtered out.
      if (transport && capability && !supportsHostProxy(transport, capability)) {
        return false;
      }

      // chrome-extension is its own executor — the extension's popup gates
      // commands via its own UI, and the transport does not use an SSE-level
      // interactive approval channel. hasNoClient is intentionally `true` for
      // chrome-extension turns (chrome-extension is not in INTERACTIVE_INTERFACES)
      // and must not gate host_browser. Trust the per-capability check.
      if (transport === "chrome-extension") {
        return true;
🚩 HOST_TOOL_NAMES and HOST_TOOL_TO_CAPABILITY sets must stay in sync for chrome-extension safety
The isToolActiveForContext function at assistant/src/daemon/conversation-tool-setup.ts:533-555 has a structural fragility: if a future tool is added to HOST_TOOL_NAMES (line 481-487) but NOT to HOST_TOOL_TO_CAPABILITY (line 499-505), the per-capability check at line 539 is skipped (because capability is undefined/falsy), and the transport === "chrome-extension" check at line 548 unconditionally returns true. This would silently grant chrome-extension access to a host tool it cannot service. Currently all five entries are mapped so this is not a live bug, but the two data structures have no compile-time or runtime enforcement that they stay in sync. A guard test or assertion would prevent this from regressing.
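The guard suggested above could be as small as the following sketch. The constant names match the ones in the comment, but the entries shown are hypothetical placeholders, and `findUnmappedHostTools` / `assertHostToolTablesInSync` are illustrative helpers rather than existing code.

```typescript
// Hypothetical entries, for illustration only; the real tables live in
// conversation-tool-setup.ts and contain five tools.
const HOST_TOOL_TO_CAPABILITY: ReadonlyMap<string, string> = new Map([
  ["host_bash", "host_bash"],
  ["host_file_read", "host_file"],
  ["host_browser", "host_browser"],
]);
const HOST_TOOL_NAMES: ReadonlySet<string> = new Set([
  "host_bash",
  "host_file_read",
  "host_browser",
]);

// List every host tool that has no capability mapping.
function findUnmappedHostTools(
  names: ReadonlySet<string>,
  mapping: ReadonlyMap<string, string>,
): string[] {
  return Array.from(names).filter((name) => !mapping.has(name));
}

// Fail loudly (at module load or in a guard test) if the tables drift apart.
function assertHostToolTablesInSync(
  names: ReadonlySet<string>,
  mapping: ReadonlyMap<string, string>,
): void {
  const unmapped = findUnmappedHostTools(names, mapping);
  if (unmapped.length > 0) {
    throw new Error(
      `Host tools missing a capability mapping: ${unmapped.join(", ")}`,
    );
  }
}

assertHostToolTablesInSync(HOST_TOOL_NAMES, HOST_TOOL_TO_CAPABILITY);
```

An alternative design is to derive the name set from the map's keys, which removes the second table and the possibility of drift altogether.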
Summary
Phase 2 of the Browser Use Architecture TDD. Refactors the Chrome extension into a thin chrome.debugger CDP JSON-RPC proxy and ships both cloud-hosted (extension → Vellum gateway) and self-hosted (extension → local daemon via Chrome Native Messaging) transports. Bundles the Phase 1 quick follow-ups (Swift host_browser decoders + sibling host-proxy defensive backports).
The legacy ExtensionCommand protocol stays alive throughout Phase 2 behind `vellum.cdpProxyEnabled` in chrome.storage.local. Phase 3 will flip the default and migrate user-facing browser tools to call hostBrowserProxy.request().
PRs merged into feature branch
Wave 1 (foundations, parallel)
Wave 2 (integration, parallel)
Wave 3 (auth flows, parallel)
Wave 4 (E2E smoke tests, parallel)
Non-goals (deferred to Phase 3+)
Part of plan: `.private/plans/host-browser-proxy-phase-2.md`