feat(runtime): instantiate HostAppControlProxy for capable clients #29333
When a connecting client supports the host_app_control capability, unconditionally instantiate HostAppControlProxy and attach it to the Conversation, plus preactivate the app-control skill. The feature flag is read only by the skill-projection layer via SKILL.md frontmatter; no in-code flag check is needed since unreached tools are harmless.
Part of plan: app-control-skill.md (PR 14 of 16)
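The attach-and-preactivate behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `Conversation` and `HostAppControlProxy` classes here are simplified stand-ins, and `attachAppControlIfCapable`, `setProcessing`, and `getPreactivatedSkillIds` are hypothetical names introduced only for the example.

```typescript
// Illustrative sketch only: simplified stand-ins, not the real daemon classes.
type HostProxyCapability = "host_cu" | "host_app_control";

class HostAppControlProxy {
  readonly conversationId: string;
  constructor(conversationId: string) {
    this.conversationId = conversationId;
  }
}

class Conversation {
  readonly id: string;
  hostAppControlProxy: HostAppControlProxy | undefined = undefined;
  private processing = false;
  private preactivatedSkillIds: string[] = [];

  constructor(id: string) {
    this.id = id;
  }

  isProcessing(): boolean {
    return this.processing;
  }

  // Hypothetical helper so the example can simulate a busy conversation.
  setProcessing(value: boolean): void {
    this.processing = value;
  }

  setHostAppControlProxy(proxy: HostAppControlProxy): void {
    this.hostAppControlProxy = proxy;
  }

  addPreactivatedSkillId(skillId: string): void {
    if (this.preactivatedSkillIds.indexOf(skillId) === -1) {
      this.preactivatedSkillIds.push(skillId);
    }
  }

  // Hypothetical accessor used only to inspect state in the example.
  getPreactivatedSkillIds(): string[] {
    return this.preactivatedSkillIds.slice();
  }
}

// Hypothetical function name. There is deliberately no feature-flag check:
// per the PR description, gating happens in the skill-projection layer.
function attachAppControlIfCapable(
  conversation: Conversation,
  clientCapabilities: HostProxyCapability[],
): void {
  if (clientCapabilities.indexOf("host_app_control") === -1) {
    return; // client cannot execute app-control actions
  }
  conversation.setHostAppControlProxy(new HostAppControlProxy(conversation.id));
  // Preactivation is skipped while a turn is in flight, as in the reviewed diff.
  if (!conversation.isProcessing()) {
    conversation.addPreactivatedSkillId("app-control");
  }
}
```

In this sketch the proxy is attached even when the conversation is busy, while preactivation is deferred; whatever drains the queue later is then responsible for re-adding the skill.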
💡 Codex Review
Reviewed commit: 77c1f82a3d
    if (!conversation.isProcessing()) {
      conversation.addPreactivatedSkillId("app-control");
Preactivate app-control for queued macOS turns
When POST /v1/messages hits a busy conversation, this branch skips app-control preactivation, but queued-turn replay in drainQueueImpl only re-adds computer-use (see assistant/src/daemon/conversation-process.ts), not app-control. That means macOS messages accepted while processing will run later without app-control tools projected, so users can intermittently lose app-control capability depending on queue timing.
    if (!conversation.isProcessing()) {
      conversation.addPreactivatedSkillId("app-control");
    }
🔴 Missing app-control skill preactivation in drain queue causes tools to be unavailable for queued messages
In conversation-routes.ts, the addPreactivatedSkillId("app-control") call is correctly guarded by !conversation.isProcessing() (line 1420), mirroring CU's pattern where preactivation is deferred to drain-queue time. However, the drain queue implementation in conversation-process.ts only re-adds "computer-use" (conversation-process.ts:434 and conversation-process.ts:872) but never re-adds "app-control". When a macOS user sends a message while the conversation is busy: (1) the proxy is attached, (2) but addPreactivatedSkillId("app-control") is skipped because isProcessing() is true, (3) preactivatedSkillIds is reset to undefined at drain start (conversation-process.ts:368, conversation-process.ts:829), (4) only "computer-use" is restored — "app-control" is silently lost. This means app-control tools won't be projected into the tool set for any queued macOS turn.
Prompt for agents
The app-control preactivation in conversation-routes.ts (line 1420-1422) is guarded by !isProcessing(), which defers preactivation to drain-queue time — exactly mirroring the CU pattern at lines 1402-1403. But the drain queue implementation in conversation-process.ts was not updated to include app-control preactivation alongside the existing computer-use preactivation.
Fix: In assistant/src/daemon/conversation-process.ts, add conversation.addPreactivatedSkillId("app-control") in both drain paths — the single-message drain (around line 434, inside the supportsHostProxy(sourceInterface) block) and the batched drain (around line 872, inside the same guard). The condition should be the same as the existing computer-use preactivation: if sourceInterface && supportsHostProxy(sourceInterface), since the no-arg supportsHostProxy returns true only for macOS, which is the same interface that supports host_app_control.
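The drain-time behavior this fix asks for can be sketched as below. It is a rough model under assumptions, not the contents of conversation-process.ts: `QueuedConversation`, `prepareQueuedTurn`, `HOST_PROXY_SKILL_IDS`, and the `"web"` interface value are hypothetical names for illustration, and `supportsHostProxy` is reduced to the macOS-only check the review describes.

```typescript
// Illustrative sketch only; names other than supportsHostProxy and
// addPreactivatedSkillId are hypothetical.
type SourceInterface = "macos" | "web"; // "web" is an assumed non-macOS example

// Simplified stand-in: the review states the check is true only for macOS.
function supportsHostProxy(sourceInterface: SourceInterface): boolean {
  return sourceInterface === "macos";
}

class QueuedConversation {
  preactivatedSkillIds: string[] | undefined = undefined;

  addPreactivatedSkillId(skillId: string): void {
    if (this.preactivatedSkillIds === undefined) {
      this.preactivatedSkillIds = [];
    }
    if (this.preactivatedSkillIds.indexOf(skillId) === -1) {
      this.preactivatedSkillIds.push(skillId);
    }
  }
}

// Skills that must be restored for every queued host-proxy turn.
const HOST_PROXY_SKILL_IDS: string[] = ["computer-use", "app-control"];

// Models the start of one drain step: reset, then restore under the same guard
// that already protects the computer-use re-add.
function prepareQueuedTurn(
  conversation: QueuedConversation,
  sourceInterface: SourceInterface | undefined,
): void {
  conversation.preactivatedSkillIds = undefined; // reset at drain start
  if (sourceInterface && supportsHostProxy(sourceInterface)) {
    for (const skillId of HOST_PROXY_SKILL_IDS) {
      conversation.addPreactivatedSkillId(skillId);
    }
  }
}
```

Keeping the restored skills in one list means the single-message and batched drain paths cannot drift apart when another host-proxy skill is added.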
* feat(daemon): add host_app_control capability and message types (#29318)
  Add the host_app_control capability to the HostProxyCapability union (macOS only) and declare the wire types (HostAppControlRequest, HostAppControlInput discriminated union, HostAppControlCancel, HostAppControlState, HostAppControlResultPayload). No consumers yet; this is type-only scaffolding for the proxy class in PR 4.
  Part of plan: app-control-skill.md (PR 2 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(macos): add HostAppControl request and result types (#29319)
  Add Swift types (HostAppControlRequest, HostAppControlInput discriminated enum, HostAppControlCancel, HostAppControlState, HostAppControlResultPayload, WindowBounds) mirroring the TypeScript wire shapes added in PR 2. Codable round-trip matches the JSON conventions used by HostCuRequest.
  Part of plan: app-control-skill.md (PR 3 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* refactor(daemon): extract HostProxyBase from HostCuProxy (#29320)
  Extract the structurally-shared lifecycle (pending map, timeout, abort SSE, dispose, isAvailable) from HostCuProxy into a new abstract HostProxyBase class. HostCuProxy now extends the base and retains only CU-specific state (step counter, AX-tree diff, loop detector).
  Part of plan: app-control-skill.md (PR 1 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(tools): add app-control proxy-tool definitions (#29321)
  Define the 8 app-control proxy tools (start, observe, press, combo, type, click, drag, stop) with executionMode: 'proxy' and stub execute() that throws. Add forwardAppControlProxyTool() bridge helper. Mirrors the computer-use tool-definition pattern.
  Part of plan: app-control-skill.md (PR 5 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(daemon): add HostAppControlProxy over HostProxyBase (#29323)
  Add HostAppControlProxy extending the shared HostProxyBase. Owns app-control-specific state: per-instance active-app, PNG-hash loop guard (5 identical observations -> stuck warning), and a module-level singleton lock so only one conversation holds an active session at a time. Disposes release the lock.
  Part of plan: app-control-skill.md (PR 4 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(macos): add per-process keyboard input helper (#29324)
  Add AppKeyboard helper that posts synthetic keyboard events to a target process via CGEventPostToPid (NOT CGEventPost) so input is scoped to the target app and never leaks to other foregrounded windows. Supports press (with optional hold duration), combo (simultaneous multi-key hold), and type (Unicode-aware string typing). On cancellation, all held keys are released before re-throwing.
  Part of plan: app-control-skill.md (PR 7 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(macos): add per-process mouse input helper (#29325)
  Add AppMouse helper that posts synthetic mouse clicks and drags to a target process via CGEventPostToPid (NOT CGEventPost). Coordinates are window-relative and translated to global at post time. Click supports left/right/middle and an optional double-click flag (sets mouseEventClickState=2). Drag posts mouseDown -> 10 interpolated mouseDragged events -> mouseUp.
  Part of plan: app-control-skill.md (PR 8 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(macos): add per-app window screenshot helper (#29326)
  Add AppWindowCapture for capturing the frontmost normal window of a target process by PID. Returns CaptureResult with state (running/missing/minimized) and PNG base64 + window bounds when available. Distinguishes a missing process from a minimized one. PNG encoding via CGImageDestination.
  Part of plan: app-control-skill.md (PR 6 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(runtime): add /v1/host-app-control-result route (#29327)
  Add the result-pickup HTTP endpoint that the macOS client POSTs to after executing an app-control action. Mirrors the host-cu-result route. Forwards the payload to conversation.hostAppControlProxy.resolve(requestId, payload). Adds the field declaration on Conversation; full lifecycle wiring lands in PR 10.
  Part of plan: app-control-skill.md (PR 9 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(skills): add app-control bundled skill and feature flag (#29328)
  Register the new app-control bundled skill (SKILL.md + TOOLS.json + 8 tool stubs forwarding through skill-proxy-bridge). Add the app-control feature flag (defaultEnabled: false, scope: assistant). The skill is gated by the flag via SKILL.md frontmatter; no in-code flag checks needed since the projection layer handles gating.
  Part of plan: app-control-skill.md (PR 12 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(macos): add AppControlExecutor dispatching tool actions (#29330)
  Implement AppControlExecutor that switches on HostAppControlRequest.input and dispatches to AppWindowCapture (async, ScreenCaptureKit-backed since macOS 15 deprecated CGWindowListCreateImage), AppKeyboard, and AppMouse. Resolves the target app to a pid_t via bundle ID first then localized name. Click/drag fetch current window bounds before posting events.
  Part of plan: app-control-skill.md (PR 13 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(daemon): wire HostAppControlProxy into Conversation lifecycle (#29329)
  Mirror the four hostCuProxy attachment points in Conversation: declare the field, add setHostAppControlProxy, dispose the proxy in Conversation.dispose, and parallel any teardown/availability checks. PR 9 added the field declaration; this PR completes the lifecycle wiring.
  Part of plan: app-control-skill.md (PR 10 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(daemon): route app_control_* tools through HostAppControlProxy (#29331)
  Add a sibling branch to the computer_use_* dispatch in surfaceProxyResolver. app_control_stop is handled locally (calls proxy.dispose, returns a stopped summary, no client round-trip), matching CU's _done/_respond pattern. All other app_control_* tools forward to ctx.hostAppControlProxy.request. Returns an isError unavailability result when no proxy or no client connected.
  Part of plan: app-control-skill.md (PR 11 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(macos): wire AppControlExecutor into connection setup (#29332)
  Add hostAppControlRequest and hostAppControlCancel handlers in the SSE message dispatch, mirroring the existing hostCu* handlers. Each request launches a cancellable Task that calls AppControlExecutor.perform(_:) and POSTs the result to /v1/host-app-control-result. Capability advertisement now includes both host_cu and host_app_control.
  Part of plan: app-control-skill.md (PR 15 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* feat(runtime): instantiate HostAppControlProxy for capable clients (#29333)
  When a connecting client supports the host_app_control capability, unconditionally instantiate HostAppControlProxy and attach it to the Conversation, plus preactivate the app-control skill. The feature flag is read only by the skill-projection layer via SKILL.md frontmatter; no in-code flag check is needed since unreached tools are harmless.
  Part of plan: app-control-skill.md (PR 14 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* test(app-control): end-to-end mocked SSE flow + CGEventPost guard (#29335)
  Add an end-to-end app-control flow test driving a fake conversation through start -> observe -> stop with mocked SSE broadcasts and POSTs to /v1/host-app-control-result, plus singleton-lock coverage. Add a static-analysis guard that fails if any AppControl swift file uses the deprecated global CGEventPost (CGEventPostToPid / CGEvent.postToPid are required).
  Part of plan: app-control-skill.md (PR 16 of 16)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* chore(tools): delete unused app-control definitions.ts (#29338)
  The 400-line tools/app-control/definitions.ts was referenced only by app-control-tool-schemas.test.ts. The production bundled-skill path uses TOOLS.json + bundled-tool-registry.ts. The hand-duplicated schemas in definitions.ts had no sync enforcement against TOOLS.json. Rewrite the schema test to validate TOOLS.json directly. The skill-proxy-bridge.ts helper is preserved (the bundled-skill stubs still use it).
  Part of plan: app-control-skill.md (fix round 1)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(runtime): register host_app_control pending interactions and route capability correctly (#29339)
  Two production-breaking fixes for app-control:
  1. registerPendingInteraction now handles host_app_control_request by registering with kind: 'host_app_control'. Without this, every result POST from the macOS client fell through the route handler's early-return and the proxy's promise never resolved.
  2. capabilityForMessageType now matches the longest prefix before the trailing _request/_cancel suffix. Previously it sliced to the second underscore, mapping host_app_control_request to undefined and broadcasting to all subscribers instead of routing only to host_app_control-capable clients.
  Part of plan: app-control-skill.md (fix round 1)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(daemon): inject tool discriminator, clear stop reference, drop dead state (#29340)
  Four entangled correctness fixes:
  1. surfaceProxyResolver injects 'tool' (e.g. 'start', 'observe') derived from toolName before forwarding to HostAppControlProxy. Without this, the Swift client could not decode requests and the singleton-lock guard never fired.
  2. app_control_stop now clears the Conversation's hostAppControlProxy reference after dispose so subsequent tool calls cleanly fail with 'unavailable' instead of dispatching to a disposed proxy.
  3. Delete the write-only _actionHistory ring buffer, recordActionFingerprint method, and actionHistory getter; nothing in production read them.
  4. PNG-hash STUCK_REPEAT_THRESHOLD lowered from 5 to 4 so the warning fires after 5 total identical observations as the plan specified, not 6.
  Part of plan: app-control-skill.md (fix round 1)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(daemon): preactivate app-control on queue dequeue (#29342)
  Both dequeue paths in conversation-process.ts reset preactivatedSkillIds and only re-added computer-use. Add the parallel re-add for app-control so the skill remains projected for queued messages 2+, mirroring the CU branch.
  Part of plan: app-control-skill.md (fix round 1)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(types): add conversationId to HostAppControlCancel, drop unused occluded state (#29341)
  Two wire-type coherence fixes:
  1. HostAppControlCancel (TS + Swift) was missing conversationId, but host-proxy-base.ts has always sent it on the wire. Schema now matches the actual envelope, matching HostCuCancelRequest's shape.
  2. Drop the HostAppControlState.occluded variant from TS, Swift, the route Zod schema, TOOLS.json, and definitions.ts. AppWindowCapture only emits running/minimized/missing; nothing produces occluded. Re-add when a producer exists.
  Part of plan: app-control-skill.md (fix round 1)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* refactor(runtime): simplify capabilityForMessageType to direct lookup (#29350)
  The longest-prefix matcher with HOST_PREFIX_KEYS_BY_LENGTH was over-engineered for current state: every registered key matches a stripped stem exactly. Replace with a direct table lookup keyed on the stem (after stripping _request/_cancel). Behaviorally identical for all currently-defined message types; existing tests still pass.
  Part of plan: app-control-skill.md (slop cleanup)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* chore(daemon): remove test-only public API from host proxies (#29351)
  Two pieces of dead public API surface caught by self-review:
  1. HostProxyBase.cancel() was only invoked by its own test file; the production cancel path runs via AbortSignal handling inside dispatchRequest.
  2. HostAppControlProxy.activeApp / ActiveApp / currentApp are written in the start-success branch but only read by tests; the actual singleton mechanism is activeAppControlConversationId.
  Delete both with their tests.
  Part of plan: app-control-skill.md (slop cleanup)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* chore(macos): drop unread Swift fields on app-control wire structs (#29352)
  Two Swift fields decoded but never consumed:
  1. HostAppControlRequest.toolName: AppControlExecutor switches on input only; the discriminator lives in input.tool.
  2. HostAppControlCancel.conversationId: AppDelegate's cancel handler invokes cancelHostAppControlRequest(msg.requestId) and never reads conversationId. The sibling HostCuCancelRequest doesn't carry it either, so the 'wire-shape parity' rationale was inconsistent.
  The wire envelope still includes both fields (daemon-side TS types unchanged); Swift's Codable silently ignores them on decode.
  Part of plan: app-control-skill.md (slop cleanup)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* refactor(daemon): extract host-proxy preactivation helper (#29353)
  The same supportsHostProxy(sourceInterface, capability) gate plus addPreactivatedSkillId(skillId) pattern appeared in four places (conversation-routes.ts, process-message.ts, two paths in conversation-process.ts), one entry per host-proxy capability per call site. Consolidate into a single source of truth: HOST_PROXY_SKILL_PREACTIVATIONS and preactivateHostProxySkills(). Adding a new host-proxy capability now means updating one list, not four call sites.
  Part of plan: app-control-skill.md (slop cleanup)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(macos): surface AppWindowCapture errors in HostAppControlResultPayload (#29357)
  ScreenCaptureKit failures (most commonly: Screen Recording permission not granted) silently returned nil from captureWindowPNG, and AppWindowCapture.capture(forPid:) still reported state: running with no PNG. Daemon and LLM saw a 'successful' payload with no error and no image, which is confusing for the user, who has no signal that the macOS app is missing a permission. Wire the underlying error string through CaptureResult.captureError into HostAppControlResultPayload.executionError. The window state remains correctly classified (running/minimized/missing); the new error field is an orthogonal signal that capture itself failed even though the window exists. For click/drag tools, the executor only surfaces the capture error when window bounds are also missing; we only need the bounds for those tools, so a missing PNG is non-fatal there.
  Part of plan: app-control-skill.md (post-merge UX fix)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(app-control): activate target before input + add app_control_sequence + observe settle delay (#29363)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
* fix(app-control): register route policy, regenerate registry + openapi
  - Register host-app-control-result route policy (approval.write scope)
  - Regenerate bundled-tool-registry.ts to include app-control-sequence
  - Regenerate openapi.yaml for /v1/host-app-control-result endpoint
  Fixes failing CI: Test (bundled-tool-registry-guard, guard-tests), OpenAPI Spec Check, and Lint (knip unused-files) on #29343.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(macos): map snake_case wire keys for HostAppControlInput coding keys (#29372)
  Co-authored-by: Vellum Assistant <assistant@vellum.ai>
---------
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
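The capabilityForMessageType routing bug from #29339 and the direct-lookup form from #29350 can be illustrated with a small sketch. Everything here is an assumption built from the commit descriptions, not the real implementation: the table name `CAPABILITY_BY_STEM`, the helper `sliceToSecondUnderscore` (a reconstruction of the old behavior), and the message-type strings other than host_app_control_request are hypothetical.

```typescript
// Illustrative sketch only; reconstructed from the commit descriptions.
type HostProxyCapability = "host_cu" | "host_app_control";

// Hypothetical lookup table keyed on the message-type stem.
const CAPABILITY_BY_STEM: Record<string, HostProxyCapability> = {
  host_cu: "host_cu",
  host_app_control: "host_app_control",
};

const MESSAGE_SUFFIXES: string[] = ["_request", "_cancel"];

function lookupStem(stem: string): HostProxyCapability | undefined {
  return Object.prototype.hasOwnProperty.call(CAPABILITY_BY_STEM, stem)
    ? CAPABILITY_BY_STEM[stem]
    : undefined;
}

// Direct lookup: strip a trailing _request/_cancel, then look up the stem.
function capabilityForMessageType(
  messageType: string,
): HostProxyCapability | undefined {
  for (const suffix of MESSAGE_SUFFIXES) {
    const cut = messageType.length - suffix.length;
    if (cut > 0 && messageType.slice(cut) === suffix) {
      return lookupStem(messageType.slice(0, cut));
    }
  }
  return undefined; // not a host-proxy message type
}

// Reconstruction of the earlier bug: cutting at the second underscore works
// for two-segment stems such as host_cu but truncates host_app_control.
function sliceToSecondUnderscore(
  messageType: string,
): HostProxyCapability | undefined {
  const first = messageType.indexOf("_");
  const second = first === -1 ? -1 : messageType.indexOf("_", first + 1);
  const stem = second === -1 ? messageType : messageType.slice(0, second);
  return lookupStem(stem);
}
```

With the old slicing, a three-segment stem yields `host_app`, which matches nothing, so the caller would fall back to broadcasting to all subscribers as the commit message describes.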