test(host-browser): e2e smoke test for self-hosted native-messaging capability bootstrap #24154

Merged
noanflaherty merged 1 commit into noanflaherty/host-browser-proxy-phase-2 from run-plan/host-browser-ph2/pr-16
Apr 7, 2026

@noanflaherty noanflaherty commented Apr 7, 2026

Summary

  • Native helper integration test spawns the compiled binary, mocks the daemon pair endpoint, and asserts the request_token → token_response framing
  • Daemon E2E test boots the runtime, spawns the helper, and asserts that the returned capability token verifies via verifyHostBrowserCapability
  • VELLUM_DAEMON_TOKEN_PATH dev fallback covered by a focused test
  • Tests skip gracefully when the helper binary isn't built
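
For readers unfamiliar with the framing the first bullet refers to: Chrome's native messaging protocol prefixes each UTF-8 JSON message with a 32-bit length in native byte order. The sketch below illustrates that framing under stated assumptions. The helper bodies are hypothetical (the test file's own encodeFrame may differ), decodeFrames is an invented name, and little-endian byte order is assumed because that is what typical x86 and ARM hosts use.

```typescript
// Hypothetical helpers for illustration only; not the PR's implementation.
// Frame layout: [uint32 length, little-endian assumed][UTF-8 JSON body].

function encodeFrame(message: unknown): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(message));
  const frame = new Uint8Array(4 + body.length);
  // Write the body length into the first four bytes.
  new DataView(frame.buffer).setUint32(0, body.length, true);
  frame.set(body, 4);
  return frame;
}

function decodeFrames(bytes: Uint8Array): unknown[] {
  const frames: unknown[] = [];
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const decoder = new TextDecoder();
  let offset = 0;
  while (offset + 4 <= bytes.length) {
    const length = view.getUint32(offset, true);
    const start = offset + 4;
    const end = start + length;
    if (end > bytes.length) break; // incomplete trailing frame: stop here
    frames.push(JSON.parse(decoder.decode(bytes.subarray(start, end))));
    offset = end;
  }
  return frames;
}

// Round trip of the request frame named in the summary.
const requestFrame = encodeFrame({ type: "request_token" });
const roundTripped = decodeFrames(requestFrame);
```

A test along the lines described in the summary would write a frame like requestFrame to the helper's stdin and decode whatever the helper writes to stdout.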

Part of plan: host-browser-proxy-phase-2.md (PR 16 of 16)



@noanflaherty noanflaherty merged commit 00eb2ba into noanflaherty/host-browser-proxy-phase-2 Apr 7, 2026
10 checks passed
@noanflaherty noanflaherty deleted the run-plan/host-browser-ph2/pr-16 branch April 7, 2026 23:35

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 3 potential issues.


* End-to-end smoke test for the self-hosted native-messaging capability
* bootstrap path.
*
* This test exercises the full PR 7 / PR 11 / PR 13 flow at the subprocess

🟡 Comments narrate project history with PR number references instead of describing current state

The file header comment block and inline comments extensively reference specific PR numbers (PR 7, PR 11, PR 13, PR 15, PR 16) to describe current features. This violates the assistant/AGENTS.md code comments rule: "Comments should describe the current state of the codebase, not narrate its history. Avoid phrases like 'no longer does X', 'previously used Y', or 'was removed in PR Z' — future readers should not need to understand past implementations to understand the current code." Using PR numbers as feature labels requires future readers to map PR numbers to features, which is exactly the kind of project-history narration the rule prohibits.

All PR reference locations in this file
  • Line 5: "the full PR 7 / PR 11 / PR 13 flow"
  • Line 9: "route from PR 11"
  • Line 10: "(PR 7, ...)"
  • Line 31: "PR 15's mock-chrome-extension fixture"
  • Line 313: "scope of PR 16"
Prompt for agents
The file header docstring (lines 1-40) and the inline comment at line 313 reference specific PR numbers (PR 7, PR 11, PR 13, PR 15, PR 16) to describe features and scope. Per assistant/AGENTS.md, comments should describe the current state of the codebase, not narrate its history. Replace PR references with descriptions of the actual features they refer to. For example, instead of 'the full PR 7 / PR 11 / PR 13 flow', describe the actual flow: 'the native messaging helper, browser-extension-pair endpoint, and capability token flow'. Instead of 'route from PR 11', say 'the handleBrowserExtensionPair route'. Instead of 'PR 15's mock-chrome-extension fixture', say 'the mock-chrome-extension fixture'. Instead of 'scope of PR 16', just describe what the test currently covers.

Comment on lines +314 to +316
test.skip("TODO(phase-3): WebSocket round-trip via /v1/browser-relay?token=<cap>", () => {
  /* See TODO above. */
});

🟡 test.skip used for unimplemented feature placeholder instead of test.todo

Line 314 uses test.skip with an empty body for a test that documents expected behavior for an unimplemented feature (Phase 3 WebSocket support). The root AGENTS.md testing rule says: "When adding tests that reproduce a bug or document expected behavior before the fix lands, use test.todo("description", () => {}) so mainline stays green. Convert test.todo to test when the implementation PR lands." The test name itself includes TODO(phase-3), confirming this is a placeholder for future work. The existing codebase follows this convention (see assistant/src/__tests__/conversation-agent-loop-overflow.test.ts which uses test.todo for the same pattern).

Suggested change
- test.skip("TODO(phase-3): WebSocket round-trip via /v1/browser-relay?token=<cap>", () => {
-   /* See TODO above. */
- });
+ test.todo("TODO(phase-3): WebSocket round-trip via /v1/browser-relay?token=<cap>", () => {
+   /* See TODO above. */
+ });

Comment on lines +280 to +301
test("rejects disallowed extension origin with an error frame", async () => {
  const pair = mock!;
  // Reset the request log so we can assert the helper never contacted
  // the pair endpoint in the unauthorized case.
  pair.requests.length = 0;

  const result = await runHelper({
    extensionOrigin: DISALLOWED_ORIGIN,
    assistantPort: pair.port,
    stdinBytes: encodeFrame({ type: "request_token" }),
  });

  expect(result.exitCode).not.toBe(0);
  expect(result.frames).toHaveLength(1);
  const frame = result.frames[0] as { type: string; message?: string };
  expect(frame.type).toBe("error");
  expect(frame.message).toBe("unauthorized_origin");

  // No pair request should have been sent — the helper rejects
  // unknown extension origins before touching the network.
  expect(pair.requests.length).toBe(0);
});

🚩 Native host fall-through after writeFrameAndExit may cause test flakiness

The native host binary at clients/chrome-extension-native-host/src/index.ts:253-263 has a control flow issue: after calling writeFrameAndExit() for an unauthorized origin, execution falls through to set up stdin listeners (line 265+). writeFrameAndExit returns a never-resolving promise but is not awaited, so the synchronous code after the if block continues. The process.exit(1) in the write callback fires asynchronously on the next tick. If stdin data is already buffered (as it is in the test — the test writes stdin bytes before the process starts), there's a theoretical race window where the stdin handler could process the frame before exit fires. In practice this is unlikely to cause flakiness because the event loop tick ordering tends to be deterministic for buffered writes, but it's a pre-existing design concern in the binary that the new integration test at clients/chrome-extension-native-host/src/__tests__/integration.test.ts:280-301 exercises.
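
The control-flow concern above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the native host's actual code: the Deps type, both main functions, and the trace helper are hypothetical stand-ins, with write playing the role of a callback-style stdout write and exit the role of process.exit. It contrasts a version that falls through after scheduling the exit with one that returns immediately.

```typescript
// Hypothetical stand-ins for the process APIs, injected so the ordering is observable.
type Deps = {
  write: (frame: object, done: () => void) => void; // like a stdout write with a flush callback
  exit: (code: number) => void;                     // like process.exit
  listenStdin: () => void;                          // like registering a stdin "data" handler
};

type Main = (origin: string, allowed: Set<string>, deps: Deps) => void;

// Exit only after the write callback fires, so the frame is flushed first.
function writeFrameAndExit(deps: Deps, frame: object, code: number): void {
  deps.write(frame, () => deps.exit(code));
}

// Falls through: the stdin listener is still registered for a rejected origin.
const mainWithFallThrough: Main = (origin, allowed, deps) => {
  if (!allowed.has(origin)) {
    writeFrameAndExit(deps, { type: "error", message: "unauthorized_origin" }, 1);
  }
  deps.listenStdin();
};

// Halts: returning right after scheduling the exit skips the stdin setup.
const mainHalting: Main = (origin, allowed, deps) => {
  if (!allowed.has(origin)) {
    writeFrameAndExit(deps, { type: "error", message: "unauthorized_origin" }, 1);
    return;
  }
  deps.listenStdin();
};

// Records the order of events; write callbacks run only after main returns,
// mimicking a callback that fires on a later tick.
function trace(main: Main): string[] {
  const events: string[] = [];
  const pending: Array<() => void> = [];
  main("chrome-extension://unknown/", new Set(["chrome-extension://allowed/"]), {
    write: (_frame, done) => {
      events.push("write");
      pending.push(done);
    },
    exit: (code) => {
      events.push(`exit(${code})`);
    },
    listenStdin: () => {
      events.push("listenStdin");
    },
  });
  pending.forEach((done) => done());
  return events;
}
```

In this simulation, trace(mainWithFallThrough) records the stdin listener being registered between the write and the exit, which is the race window the comment describes, while trace(mainHalting) records only the write followed by the exit.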


noanflaherty added a commit that referenced this pull request Apr 8, 2026
* chore: regenerate openapi.yaml for version 0.6.2 bump

The main-branch release commit (#24108) bumped assistant/package.json to
0.6.2 but did not regenerate the openapi spec. Regenerate it on the feature
branch so CI's OpenAPI Spec Check passes for Phase 2 PRs.

* fix(daemon): backport host-browser-proxy defensive guards to host-bash/file/cu proxies (#24115)

* docs(browser): document chrome.debugger infobar decision (#24106)

* feat(clients/macos): decode host_browser_request and host_browser_cancel messages (#24113)

* feat(clients/macos): decode host_browser_request and host_browser_cancel messages

* fix: type HostBrowserRequest.timeoutSeconds as Double?

Matches the daemon's number-typed wire contract and mirrors
HostBashRequest.timeoutSeconds, so fractional timeouts like 0.01s don't
throw a type-mismatch and drop the whole host_browser_request event.

* feat(browser-session): add BrowserSessionManager scaffold with extension backend stub (#24110)

* feat(browser-session): add BrowserSessionManager scaffold with extension backend stub

* test(browser-session): import public API via index.ts to satisfy knip

Updates manager.test.ts to consume BrowserSessionManager, createExtensionBackend,
and types through the public ../index.js entry point instead of deep-importing
../manager.js and ../backends/extension.js. This keeps knip happy during the
scaffold phase: index.ts becomes a transitively-reachable entry point from
src/**/__tests__/**/*.ts before any production module consumes it.

* fix(browser-session): enforce session existence in BrowserSessionManager.send

Throws when the caller passes a sessionId that doesn't exist or has
been disposed. Still advisory for single-backend Phase 2, but makes
disposeSession() an actual enforcement boundary so commands can't run
against stale ids once Phase 4 adds multi-backend routing.

* feat(chrome-extension): add standalone CDP proxy module (#24112)

* feat(chrome-extension): add standalone CDP proxy module

* fix(chrome-extension): inject runtime.lastError and thread sessionId through CDP proxy

- Add runtime.lastError to ChromeDebuggerApi so mocked tests can surface errors
- Fold frame.sessionId into sendCommand params for flat-session routing
- Extract sessionId from event params when building CdpEventFrame
- Document flat-session handling in the module docstring

* fix(chrome-extension): route flat-session sessionId through DebuggerSession target

Chrome 125+ debugger.sendCommand takes sessionId on the target argument
(DebuggerSession), not inside commandParams. Switch back to passing
sessionId on the target. Same change on the onEvent listener — read
sessionId from 'source' rather than params, since flat-session events
surface it on the source.

Also clean up the module docstring to drop PR-level narrative per
clients/AGENTS.md's comment quality rule.

* fix(chrome-extension): bind defaultChromeDebuggerApi methods to chrome.debugger

Returning methods from a Proxy via Reflect.get without binding causes
'Illegal invocation' at runtime because Chrome's native bindings check
this against the original chrome.debugger object. Replace the Proxy with
a plain object whose methods are explicitly bound.
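
The failure mode in the commit message above can be reproduced without Chrome. The following is a minimal sketch in which NativeLike is a hypothetical stand-in for chrome.debugger: it checks its receiver against a WeakSet the way a native binding checks `this`. Chrome's real check is internal and is only approximated here.

```typescript
// Hypothetical stand-in for an object with native bindings that validate `this`.
const genuineReceivers = new WeakSet<object>();

class NativeLike {
  constructor() {
    genuineReceivers.add(this);
  }
  attach(): string {
    if (!genuineReceivers.has(this)) throw new TypeError("Illegal invocation");
    return "attached";
  }
}

const target = new NativeLike();

// Unbound: a method fetched through the proxy is later called with `this` === proxy,
// which is not a genuine receiver.
const unbound = new Proxy(target, {
  get: (obj, prop, receiver) => Reflect.get(obj, prop, receiver),
});

// Bound: a plain object whose methods are explicitly bound to the original target.
const bound = { attach: target.attach.bind(target) };

function succeeds(fn: () => unknown): boolean {
  try {
    fn();
    return true;
  } catch {
    return false;
  }
}

const unboundWorks = succeeds(() => unbound.attach()); // receiver check fails
const boundWorks = succeeds(() => bound.attach());     // receiver check passes
```

The bound plain object trades the proxy's automatic forwarding for an explicit method list, which also makes the surface a mock has to implement easier to see.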

* feat(chrome-extension-native-host): add native messaging helper scaffold (#24114)

* feat(chrome-extension-native-host): add native messaging helper scaffold

* fix(chrome-extension-native-host): robust port discovery, JSON error handling, and assistant terminology

- Add --assistant-port CLI arg so Chrome-spawned helpers can be pointed
  at a non-default port when the lockfile isn't present
- Surface malformed stdin JSON as a protocol-level error frame instead
  of a silent crash
- Rename user-facing 'daemon' to 'assistant' in error messages per
  AGENTS.md terminology rule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(chrome-extension-native-host): finish daemon→assistant rename in client prose, vars, and smoke test

- README section header and prose use 'assistant' (per root AGENTS.md §139)
- DEFAULT_DAEMON_PORT → DEFAULT_ASSISTANT_PORT, resolveDaemonPort → resolveAssistantPort (per clients/AGENTS.md §403-404)
- Smoke test example uses dynamic import() instead of require() since the package is ESM

* fix(chrome-extension-native-host): flush stdout before exiting

Wait for process.stdout.write callback to fire before calling
process.exit(), so the native-messaging frame actually reaches Chrome
on pipe-backed stdout before the process terminates. Without this,
Chrome can see a disconnect instead of the intended token_response
or error frame under backpressure or larger payloads.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(chrome-extension): add cloud OAuth sign-in skeleton (#24117)

* feat(chrome-extension): add cloud OAuth sign-in skeleton

* fix(chrome-extension): run OAuth sign-in from service worker and validate guardianId

- Popup now sends a message to the background worker to initiate cloud
  sign-in instead of running launchWebAuthFlow directly. This avoids
  the MV3 popup teardown race where the awaited OAuth promise never
  resolves if the popup blurs during the auth window.
- Add guardianId type check to getStoredToken so malformed stored
  tokens can't leak 'Signed in as guardian:undefined' into the popup UI.

* feat(channels): add chrome-extension interface id and per-capability host proxy gating (#24111)

* feat(channels): add chrome-extension interface id and per-capability host proxy gating

* fix(channels): keep hostBrowserProxy available for non-interactive chrome-extension interfaces

updateClient/drain-queue paths used !isInteractive as a proxy for
hasNoClient, which incorrectly marks the chrome-extension's
hostBrowserProxy unavailable immediately after construction.
Decouple the flags: chrome-extension is non-interactive (no prompter
UI) but still has a connected client for host_browser_request events.

- conversation-routes.ts: derive hasNoClient as !(isInteractive || supportsHostProxy(sourceInterface, 'host_browser'))
- server.ts persistAndProcessMessage: same pattern so queued sends don't lose availability
- conversation-process.ts drain queue: add restore path via new Conversation.restoreBrowserProxyAvailability() helper
- conversation.ts: add restoreBrowserProxyAvailability() that re-enables only the browser proxy (gated on hasNoClient)
- channels/types.ts: clarify supportsHostProxy no-arg JSDoc to call out the desktop-only semantics
- conversation-confirmation-signals.test.ts: cover the new restore helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(channels): targeted hostBrowserProxy enable without relaxing hasNoClient

Cycle 1 derived hasNoClient as !(isInteractive || supportsHostProxy(id, 'host_browser')) to
keep the chrome-extension's browser proxy available. That inadvertently made tool gating treat
the conversation as fully interactive (isInteractive derives from !ctx.hasNoClient), enabling
host_bash/host_file tools that chrome-extension can't service.

Revert to the literal hasNoClient = !isInteractive and instead call a targeted
restoreBrowserProxyAvailability() after updateClient. The helper now enables the browser
proxy regardless of hasNoClient so the single-proxy chrome-extension turn works without
leaking host_bash/host_file tool availability.

Part of JARVIS-1175

* fix(channels): drop 'historically' from JSDoc and tighten chrome-extension else-if in server.ts

- assistant/AGENTS.md: comments describe current state, not history
- server.ts: scope the non-interactive host-browser restore branch to interfaces that
  specifically only support host_browser (not macos, which hits the interactive branch)

* test: add restoreBrowserProxyAvailability to Conversation mocks

Two test files use object-literal mocks for Conversation that need the
new method so they don't throw TypeError at the new call site in
handleSendMessage.

* fix(routes): optional-chain restoreBrowserProxyAvailability for test mocks

* test: allowlist chrome-extension-native-host in gateway-only guard

The native messaging helper intentionally POSTs to the local daemon's
/v1/browser-extension-pair endpoint on 127.0.0.1 to mint capability
tokens for the extension; it's a bootstrap path that cannot and should
not go through the gateway. Add it to the guard-test allowlist.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime): route host_browser_request to connected chrome-extension clients (#24129)

* feat(runtime): route host_browser_request to connected chrome-extension clients

* fix(runtime): gateway guardianId plumbing + queue-drain-safe chrome-extension sender

- handleBrowserRelayUpgrade now looks for x-guardian-id header/query param as a
  fallback when the JWT sub is a service token (gateway-forwarded case)
- Conversation exposes hostBrowserSenderOverride so restoreBrowserProxyAvailability
  preserves the registry-routed sender on drain-queue restores instead of clobbering
  it with the SSE hub sender

* feat(chrome-extension): dispatch host_browser_request frames via CDP proxy behind feature flag (#24125)

* feat(chrome-extension): dispatch host_browser_request frames via CDP proxy behind feature flag

* fix(chrome-extension): use camelCase wire format, tolerate re-attach, guard postResult catch

- Match daemon's actual host_browser_request envelope shape (requestId, cdpMethod,
  cdpParams, cdpSessionId — only timeout_seconds stays snake_case)
- POST /v1/host-browser-result with camelCase keys to match the runtime schema
- Track attached CDP targets and skip re-attach; dispose clears the set
- Wrap postResult calls inside the catch handler so a secondary failure is logged
  instead of becoming an unhandled rejection

* fix(chrome-extension): invalidate attachedTargets cache on debugger detach

Subscribe to CdpProxy.onDetach in the dispatcher and remove the
corresponding key from the attached-targets cache when Chrome notifies
us of a detach (tab close, navigation, infobar cancel, external
takeover). Without this, the cache held a stale entry forever and
subsequent commands skipped the re-attach, causing permanent CDP
failures.

* feat(runtime): add /v1/browser-extension-pair capability token endpoint (#24130)

* feat(runtime): add /v1/browser-extension-pair capability token endpoint

* fix(runtime): align pair endpoint with native helper contract + move secret out of workspace

- Accept extensionOrigin (preferred) and origin (legacy) in request body
- Return expiresAt as ISO 8601 string instead of numeric ms, matching what the
  chrome-extension-native-host helper validates
- Move capabilityTokenSecret out of workspace/data into protected storage alongside
  the actor-token-signing-key per AGENTS.md workspace-isolation rule
- Fix isLoopbackHostHeader to correctly parse IPv6 bracket notation

* fix(runtime): align pair allowlist with native helper + reject malformed bracketed Host headers

- ALLOWED_EXTENSION_ORIGINS now matches the chrome-extension-native-host
  placeholder so the dev pair flow works end-to-end
- parseHostHeader rejects inputs like '[::1]attacker.com' where content
  after the closing bracket is not an optional ':port'
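
As an illustration of the bracket handling described above, here is a hedged sketch of a Host header parser. The function names mirror those in the commit message, but the bodies are hypothetical and written from scratch, and the loopback set (localhost, 127.0.0.1, ::1) is an assumption rather than the runtime's actual list.

```typescript
// Hypothetical sketch; not the runtime's implementation.
type ParsedHost = { hostname: string; port?: number };

function parsePort(text: string): number | null {
  if (!/^\d{1,5}$/.test(text)) return null;
  const port = Number(text);
  return port <= 65535 ? port : null;
}

function parseHostHeader(header: string): ParsedHost | null {
  const value = header.trim();
  if (value.startsWith("[")) {
    // IPv6 literal: "[addr]" optionally followed by ":port" and nothing else.
    const close = value.indexOf("]");
    if (close === -1) return null;
    const hostname = value.slice(1, close);
    const rest = value.slice(close + 1);
    if (rest === "") return { hostname };
    if (!rest.startsWith(":")) return null; // rejects "[::1]attacker.com"
    const port = parsePort(rest.slice(1));
    return port === null ? null : { hostname, port };
  }
  const colon = value.lastIndexOf(":");
  if (colon === -1) return value === "" ? null : { hostname: value };
  // More than one colon without brackets is an unbracketed IPv6 literal: reject.
  if (value.indexOf(":") !== colon) return null;
  const hostname = value.slice(0, colon);
  const port = parsePort(value.slice(colon + 1));
  return hostname === "" || port === null ? null : { hostname, port };
}

function isLoopbackHostHeader(header: string): boolean {
  const parsed = parseHostHeader(header);
  if (parsed === null) return false;
  const hostname = parsed.hostname.toLowerCase();
  return hostname === "localhost" || hostname === "127.0.0.1" || hostname === "::1";
}
```

Under these assumptions, "[::1]:7821" and "127.0.0.1:7821" are treated as loopback, while "[::1]attacker.com" fails to parse and is rejected.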

* feat(installer): write Chrome native messaging host manifest on macOS install (#24128)

* feat(installer): write Chrome native messaging host manifest on macOS install

* fix(build): parenthesize native-host staleness check

Bash || and && are equal-precedence left-to-right, so the unparenthesized
condition incorrectly required bun.lock to also be newer for a package.json
update to trigger a rebuild. Group the bun.lock subexpression explicitly.

* fix(installer): conform InstallError to LocalizedError so localizedDescription is useful

* feat(chrome-extension): bootstrap self-hosted capability token via native messaging (#24142)

* feat(chrome-extension): bootstrap self-hosted capability token via native messaging

* fix(chrome-extension): nativeMessaging permission, disconnect race, persistence fallback, popup->worker delegation

- Add nativeMessaging permission to manifest so Chrome actually allows
  chrome.runtime.connectNative('com.vellum.daemon')
- Set settled=true synchronously on token_response so a fast onDisconnect
  can't win the race and reject a valid pairing
- On chrome.storage.local.set failure, log and resolve with the in-memory
  token instead of discarding it (single-session fallback)
- Move the pair flow into the service worker via chrome.runtime.sendMessage
  so the popup teardown can't kill the awaited promise mid-flight

* feat(chrome-extension): connect to cloud gateway browser-relay WebSocket (#24143)

* feat(chrome-extension): connect to cloud gateway browser-relay WebSocket

* fix(chrome-extension): surface missing-token connect failures and ignore stale socket close events

- Worker now returns an actionable error when the selected relay mode has
  no usable token (cloud not signed in, self-hosted not paired)
- RelayConnection's close listener ignores events from superseded sockets
  so a setMode mid-flight does not nuke the new socket reference
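
The stale-close guard in the second bullet can be sketched as follows. FakeSocket and RelayConnectionSketch are hypothetical stand-ins, not the extension's RelayConnection. The idea shown is that each close listener captures the socket it was registered on and ignores the event if a newer socket has since replaced it.

```typescript
// Hypothetical minimal socket with a close event, standing in for a WebSocket.
class FakeSocket {
  private closeListeners: Array<() => void> = [];
  onClose(listener: () => void): void {
    this.closeListeners.push(listener);
  }
  emitClose(): void {
    this.closeListeners.forEach((listener) => listener());
  }
}

class RelayConnectionSketch {
  private socket: FakeSocket | null = null;

  connect(socket: FakeSocket): void {
    this.socket = socket;
    socket.onClose(() => {
      // A superseded socket closing must not clear the newer socket reference.
      if (this.socket !== socket) return;
      this.socket = null;
    });
  }

  isConnected(): boolean {
    return this.socket !== null;
  }
}

// A mode switch replaces the first socket before its close event arrives.
const connection = new RelayConnectionSketch();
const firstSocket = new FakeSocket();
const secondSocket = new FakeSocket();
connection.connect(firstSocket);
connection.connect(secondSocket);
firstSocket.emitClose();
const stillConnected = connection.isConnected();
```

Without the identity check, the late close from firstSocket would null out the reference to secondSocket even though that socket is still open.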

* test(host-browser): e2e smoke test for self-hosted native-messaging capability bootstrap (#24154)

* test(host-browser): e2e smoke test for cloud-hosted host_browser_request round-trip (#24153)

* test(host-browser): e2e smoke test for cloud-hosted host_browser_request round-trip

* test(host-browser): exercise actual timeout path and clarify mock WS header support

- Disconnected test renamed/restructured to use a never-resolving CDP handler
  plus a short timeout_seconds, so the proxy's setTimeout path is actually
  covered
- Removed/implemented extraHandshakeHeaders on the mock fixture so the
  advertised API matches reality

* test(cdp-proxy): add unit tests and fix sync targetToDebuggee throw (#24187)

* fix(chrome-extension): evict attached-target cache on CDP send failure (#24188)

* test(host-browser-e2e): rewrite header and convert test.skip to test.todo (#24190)

* test(host-bash-proxy): use bun:test fake timers for timeout regression test (#24189)

* fix(chrome-extension): popup pairing reply + relay-aware host_browser result POST (#24194)

* fix(chrome-extension-native-host): halt unauthorized origins and forward guardianId (#24192)

* fix(daemon): gate host tools by per-capability supportsHostProxy (#24195)

* chore(chrome-extension): typecheck worker.ts + popup.ts and use "assistant" terminology (#24199)

* fix(chrome-extension): popup connect handler honors selected relay mode (#24225)

* chore(chrome-extension): extend bun:test ambient shim with common symbols (#24226)

* fix(daemon): preserve host_browser for chrome-extension in per-capability tool gate (#24224)

* fix(chrome-extension): read live relay mode per request + defensive worker cleanups (#24227)

* chore(chrome-extension): remove stale cdp-proxy declarations and outdated comment (#24228)

* chore(chrome-extension-native-host): split writeFrameAndExit + rewrite history-narrating docstrings (#24229)

* chore(chrome-extension): tighten bun:test shim so only test.todo has optional callback (#24234)

* chore(daemon): rewrite host-tool gating test comment in forward-looking voice (#24233)

* chore(chrome-extension): dedupe RelayConnection.mode accessor (keep getCurrentMode) (#24235)

* fix(chrome-extension): worker reads live relay mode from storage on connect (#24236)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>