
refactor(runtime): simplify capabilityForMessageType to direct lookup #29350

Merged
siddseethepalli merged 1 commit into siddseethepalli/app-control-skill from app-control/slop-r1-4-prefix-simplify
May 3, 2026

@siddseethepalli commented on May 3, 2026

Summary

  • Replace longest-prefix matcher with direct HOST_PREFIX_TO_CAPABILITY[stem] lookup.
  • Delete HOST_PREFIX_KEYS_BY_LENGTH (no longer needed).
  • All existing test cases still pass.

Addresses slop issue #6 from app-control-skill.md round-2 review.



The longest-prefix matcher with HOST_PREFIX_KEYS_BY_LENGTH was over-engineered for current state — every registered key matches a stripped stem exactly. Replace with a direct table lookup keyed on the stem (after stripping _request/_cancel). Behaviorally identical for all currently-defined message types; existing tests still pass.
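
A minimal sketch of the direct lookup described above, not the merged code. The capability names `host_cu` and `host_app_control` appear elsewhere in this thread; the exact table contents and the `host_cu_*` message-type names are assumptions made for illustration.

```typescript
// Capability names are taken from this thread; the table contents are assumed.
type HostProxyCapability = "host_cu" | "host_app_control";

const HOST_PREFIX_TO_CAPABILITY: Readonly<Record<string, HostProxyCapability>> = {
  host_cu: "host_cu",
  host_app_control: "host_app_control",
};

function capabilityForMessageType(type: string): HostProxyCapability | undefined {
  // Strip a trailing _request or _cancel; other message types are not routed here.
  const match = /^(.+)_(?:request|cancel)$/.exec(type);
  if (match === null) return undefined;
  const stem = match[1];
  // Own-property check so inherited keys such as "constructor" never match.
  return Object.prototype.hasOwnProperty.call(HOST_PREFIX_TO_CAPABILITY, stem)
    ? HOST_PREFIX_TO_CAPABILITY[stem]
    : undefined;
}
```

Under these assumptions, `capabilityForMessageType("host_app_control_request")` yields `"host_app_control"`, and an unregistered stem yields `undefined`. A key that is only a proper prefix of the stem no longer matches, which is the behavior the longest-prefix matcher would have provided.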

Part of plan: app-control-skill.md (slop cleanup)
@siddseethepalli merged commit 1f7c402 into siddseethepalli/app-control-skill on May 3, 2026
@siddseethepalli deleted the app-control/slop-r1-4-prefix-simplify branch on May 3, 2026 06:31

@devin-ai-integration (bot) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.


siddseethepalli added a commit that referenced this pull request May 3, 2026

* feat(daemon): add host_app_control capability and message types (#29318)

Add the host_app_control capability to the HostProxyCapability union (macOS only) and declare the wire types (HostAppControlRequest, HostAppControlInput discriminated union, HostAppControlCancel, HostAppControlState, HostAppControlResultPayload). No consumers yet — this is type-only scaffolding for the proxy class in PR 4.

Part of plan: app-control-skill.md (PR 2 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(macos): add HostAppControl request and result types (#29319)

Add Swift types (HostAppControlRequest, HostAppControlInput discriminated enum, HostAppControlCancel, HostAppControlState, HostAppControlResultPayload, WindowBounds) mirroring the TypeScript wire shapes added in PR 2. Codable round-trip matches the JSON conventions used by HostCuRequest.

Part of plan: app-control-skill.md (PR 3 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* refactor(daemon): extract HostProxyBase from HostCuProxy (#29320)

Extract the structurally-shared lifecycle (pending map, timeout, abort SSE, dispose, isAvailable) from HostCuProxy into a new abstract HostProxyBase class. HostCuProxy now extends the base and retains only CU-specific state (step counter, AX-tree diff, loop detector).
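
A rough sketch of the shared lifecycle this commit describes (pending map, timeout, dispose), assuming a much simpler shape than the real class. The method names `send` and `pendingCount`, the `RecordingProxy` subclass, and the request-id format are hypothetical. The abort-SSE and `isAvailable` parts are omitted.

```typescript
type Pending<Res> = {
  resolve: (value: Res) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

abstract class HostProxyBase<Req, Res> {
  private readonly pending = new Map<string, Pending<Res>>();
  private readonly timeoutMs: number;
  private nextId = 0;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  /** Subclasses deliver the request to the client (hypothetical hook). */
  protected abstract send(requestId: string, request: Req): void;

  request(request: Req): { requestId: string; result: Promise<Res> } {
    const requestId = `req-${this.nextId++}`;
    const result = new Promise<Res>((resolve, reject) => {
      // Reject and forget the entry if no result arrives in time.
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`request ${requestId} timed out`));
      }, this.timeoutMs);
      this.pending.set(requestId, { resolve, reject, timer });
    });
    this.send(requestId, request);
    return { requestId, result };
  }

  /** Called by the result route; returns false for unknown request ids. */
  resolve(requestId: string, payload: Res): boolean {
    const entry = this.pending.get(requestId);
    if (entry === undefined) return false;
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(payload);
    return true;
  }

  /** Rejects everything still in flight and clears the map. */
  dispose(): void {
    this.pending.forEach((entry, id) => {
      clearTimeout(entry.timer);
      entry.reject(new Error(`proxy disposed before ${id} resolved`));
    });
    this.pending.clear();
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}

// Hypothetical subclass that only records what it would have sent.
class RecordingProxy extends HostProxyBase<string, string> {
  readonly sent: string[] = [];
  protected send(requestId: string, request: string): void {
    this.sent.push(`${requestId}:${request}`);
  }
}
```

A subclass such as HostCuProxy would then keep only its own state and implement the delivery hook.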

Part of plan: app-control-skill.md (PR 1 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(tools): add app-control proxy-tool definitions (#29321)

Define the 8 app-control proxy tools (start, observe, press, combo, type, click, drag, stop) with executionMode: 'proxy' and stub execute() that throws. Add forwardAppControlProxyTool() bridge helper. Mirrors the computer-use tool-definition pattern.

Part of plan: app-control-skill.md (PR 5 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(daemon): add HostAppControlProxy over HostProxyBase (#29323)

Add HostAppControlProxy extending the shared HostProxyBase. It owns app-control-specific state: per-instance active-app, a PNG-hash loop guard (5 identical observations -> stuck warning), and a module-level singleton lock so only one conversation holds an active session at a time. Disposing the proxy releases the lock.
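
A minimal sketch of a module-level singleton lock of the kind described here. The variable name `activeAppControlConversationId` is mentioned later in this thread; the two helper functions and the re-entrant behavior for the owning conversation are assumptions.

```typescript
// Module-level state: the conversation currently holding the session, if any.
let activeAppControlConversationId: string | undefined;

/** Returns true if the conversation holds the lock after the call. */
function tryAcquireAppControlLock(conversationId: string): boolean {
  if (
    activeAppControlConversationId !== undefined &&
    activeAppControlConversationId !== conversationId
  ) {
    return false; // another conversation owns the session
  }
  activeAppControlConversationId = conversationId;
  return true;
}

/** Releases the lock only when called by the current owner. */
function releaseAppControlLock(conversationId: string): void {
  if (activeAppControlConversationId === conversationId) {
    activeAppControlConversationId = undefined;
  }
}
```

In this sketch, a proxy's dispose path would call `releaseAppControlLock` with its own conversation id, so a non-owner disposing its proxy cannot free a session it does not hold.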

Part of plan: app-control-skill.md (PR 4 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(macos): add per-process keyboard input helper (#29324)

Add AppKeyboard helper that posts synthetic keyboard events to a target process via CGEventPostToPid (NOT CGEventPost) so input is scoped to the target app and never leaks to other foregrounded windows. Supports press (with optional hold duration), combo (simultaneous multi-key hold), and type (Unicode-aware string typing). On cancellation, all held keys are released before re-throwing.

Part of plan: app-control-skill.md (PR 7 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(macos): add per-process mouse input helper (#29325)

Add AppMouse helper that posts synthetic mouse clicks and drags to a target process via CGEventPostToPid (NOT CGEventPost). Coordinates are window-relative and translated to global at post time. Click supports left/right/middle and an optional double-click flag (sets mouseEventClickState=2). Drag posts mouseDown -> 10 interpolated mouseDragged events -> mouseUp.
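
The drag sequence and coordinate translation above are plain arithmetic, sketched here in TypeScript rather than Swift. This is an illustration only: the type and function names are hypothetical, the window origin is assumed to be the window's top-left corner in global coordinates, and the 10 dragged points are assumed to be evenly spaced with the last one at the end point.

```typescript
type Point = { x: number; y: number };
type MouseEventKind = "mouseDown" | "mouseDragged" | "mouseUp";
type SyntheticMouseEvent = { kind: MouseEventKind; at: Point };

const DRAG_STEPS = 10; // number of interpolated mouseDragged events

/** Translates a window-relative point to global coordinates. */
function toGlobal(windowOrigin: Point, p: Point): Point {
  return { x: windowOrigin.x + p.x, y: windowOrigin.y + p.y };
}

/** Plans mouseDown -> DRAG_STEPS mouseDragged -> mouseUp in global coordinates. */
function planDrag(windowOrigin: Point, from: Point, to: Point): SyntheticMouseEvent[] {
  const events: SyntheticMouseEvent[] = [
    { kind: "mouseDown", at: toGlobal(windowOrigin, from) },
  ];
  for (let i = 1; i <= DRAG_STEPS; i++) {
    // Linear interpolation; multiply before dividing to keep integer inputs exact.
    const p: Point = {
      x: from.x + ((to.x - from.x) * i) / DRAG_STEPS,
      y: from.y + ((to.y - from.y) * i) / DRAG_STEPS,
    };
    events.push({ kind: "mouseDragged", at: toGlobal(windowOrigin, p) });
  }
  events.push({ kind: "mouseUp", at: toGlobal(windowOrigin, to) });
  return events;
}
```

Translating at post time, as the commit describes, means the plan stays valid only for the window bounds read immediately before posting.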

Part of plan: app-control-skill.md (PR 8 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(macos): add per-app window screenshot helper (#29326)

Add AppWindowCapture for capturing the frontmost normal window of a target process by PID. Returns CaptureResult with state (running/missing/minimized) and PNG base64 + window bounds when available. Distinguishes a missing process from a minimized one. PNG encoding via CGImageDestination.

Part of plan: app-control-skill.md (PR 6 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(runtime): add /v1/host-app-control-result route (#29327)

Add the result-pickup HTTP endpoint that the macOS client POSTs to after executing an app-control action. Mirrors the host-cu-result route. Forwards the payload to conversation.hostAppControlProxy.resolve(requestId, payload). Adds the field declaration on Conversation; full lifecycle wiring lands in PR 10.

Part of plan: app-control-skill.md (PR 9 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(skills): add app-control bundled skill and feature flag (#29328)

Register the new app-control bundled skill (SKILL.md + TOOLS.json + 8 tool stubs forwarding through skill-proxy-bridge). Add the app-control feature flag (defaultEnabled: false, scope: assistant). The skill is gated by the flag via SKILL.md frontmatter; no in-code flag checks needed since the projection layer handles gating.

Part of plan: app-control-skill.md (PR 12 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(macos): add AppControlExecutor dispatching tool actions (#29330)

Implement AppControlExecutor that switches on HostAppControlRequest.input and dispatches to AppWindowCapture (async, ScreenCaptureKit-backed since macOS 15 deprecated CGWindowListCreateImage), AppKeyboard, and AppMouse. Resolves the target app to a pid_t via bundle ID first then localized name. Click/drag fetch current window bounds before posting events.

Part of plan: app-control-skill.md (PR 13 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(daemon): wire HostAppControlProxy into Conversation lifecycle (#29329)

Mirror the four hostCuProxy attachment points in Conversation: declare the field, add setHostAppControlProxy, dispose the proxy in Conversation.dispose, and mirror any teardown/availability checks. PR 9 added the field declaration; this PR completes the lifecycle wiring.

Part of plan: app-control-skill.md (PR 10 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(daemon): route app_control_* tools through HostAppControlProxy (#29331)

Add a sibling branch to the computer_use_* dispatch in surfaceProxyResolver. app_control_stop is handled locally (calls proxy.dispose, returns a stopped summary, no client round-trip), matching CU's _done/_respond pattern. All other app_control_* tools forward to ctx.hostAppControlProxy.request. Returns an isError unavailability result when no proxy or no client connected.
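
A simplified, synchronous sketch of the dispatch branch described above. The interface shapes, the function name `resolveAppControlTool`, the result strings, and the order of the checks are assumptions; the real proxy call is presumably asynchronous.

```typescript
type ToolResult = { content: string; isError: boolean };

// Hypothetical minimal view of the proxy used by the resolver.
interface AppControlProxyLike {
  isAvailable(): boolean;
  request(toolName: string, input: Record<string, unknown>): ToolResult;
  dispose(): void;
}

interface ResolverContext {
  hostAppControlProxy?: AppControlProxyLike;
}

const APP_CONTROL_PREFIX = "app_control_";

function unavailable(): ToolResult {
  return { content: "App control is unavailable.", isError: true };
}

/** Returns undefined when the tool is not an app_control_* tool. */
function resolveAppControlTool(
  ctx: ResolverContext,
  toolName: string,
  input: Record<string, unknown>,
): ToolResult | undefined {
  if (!toolName.startsWith(APP_CONTROL_PREFIX)) return undefined;
  const proxy = ctx.hostAppControlProxy;
  if (proxy === undefined) return unavailable();
  if (toolName === "app_control_stop") {
    // Handled locally: no client round-trip.
    proxy.dispose();
    return { content: "App control session stopped.", isError: false };
  }
  if (!proxy.isAvailable()) return unavailable(); // no client connected
  return proxy.request(toolName, input);
}
```

Handling `app_control_stop` before the availability check is a choice made in this sketch so that a session can be torn down even after the client disconnects; the source does not state the actual order.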

Part of plan: app-control-skill.md (PR 11 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(macos): wire AppControlExecutor into connection setup (#29332)

Add hostAppControlRequest and hostAppControlCancel handlers in the SSE message dispatch, mirroring the existing hostCu* handlers. Each request launches a cancellable Task that calls AppControlExecutor.perform(_:) and POSTs the result to /v1/host-app-control-result. Capability advertisement now includes both host_cu and host_app_control.

Part of plan: app-control-skill.md (PR 15 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* feat(runtime): instantiate HostAppControlProxy for capable clients (#29333)

When a connecting client supports the host_app_control capability, unconditionally instantiate HostAppControlProxy and attach it to the Conversation, plus preactivate the app-control skill. The feature flag is read only by the skill-projection layer via SKILL.md frontmatter — no in-code flag check is needed since unreached tools are harmless.

Part of plan: app-control-skill.md (PR 14 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* test(app-control): end-to-end mocked SSE flow + CGEventPost guard (#29335)

Add an end-to-end app-control flow test driving a fake conversation through start -> observe -> stop with mocked SSE broadcasts and POSTs to /v1/host-app-control-result, plus singleton-lock coverage. Add a static-analysis guard that fails if any AppControl swift file uses the deprecated global CGEventPost (CGEventPostToPid / CGEvent.postToPid are required).

Part of plan: app-control-skill.md (PR 16 of 16)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* chore(tools): delete unused app-control definitions.ts (#29338)

The 400-line tools/app-control/definitions.ts was referenced only by app-control-tool-schemas.test.ts. The production bundled-skill path uses TOOLS.json + bundled-tool-registry.ts. The hand-duplicated schemas in definitions.ts had no sync enforcement against TOOLS.json. Rewrite the schema test to validate TOOLS.json directly.

The skill-proxy-bridge.ts helper is preserved (the bundled-skill stubs still use it).

Part of plan: app-control-skill.md (fix round 1)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(runtime): register host_app_control pending interactions and route capability correctly (#29339)

Two production-breaking fixes for app-control:
1. registerPendingInteraction now handles host_app_control_request by registering with kind: 'host_app_control'. Without this, every result POST from the macOS client fell through the route handler's early-return and the proxy's promise never resolved.
2. capabilityForMessageType now matches the longest prefix before the trailing _request/_cancel suffix. Previously it sliced to the second underscore, mapping host_app_control_request to undefined and broadcasting to all subscribers instead of routing only to host_app_control-capable clients.
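
The second fix can be illustrated with a small reconstruction. Both functions below are sketches written from the description above, not the original code; the table contents and the `host_cu_cancel` message type are assumptions.

```typescript
const HOST_PREFIX_TO_CAPABILITY: Readonly<Record<string, string>> = {
  host_cu: "host_cu",
  host_app_control: "host_app_control",
};

// Longest keys first, so a more specific prefix wins over a shorter one.
const HOST_PREFIX_KEYS_BY_LENGTH = Object.keys(HOST_PREFIX_TO_CAPABILITY).sort(
  (a, b) => b.length - a.length,
);

/** Reconstruction of the bug: keep only the text before the second underscore. */
function sliceToSecondUnderscore(type: string): string {
  return type.split("_").slice(0, 2).join("_");
}

/** Sketch of the fix: strip the suffix, then take the longest matching key. */
function longestPrefixCapability(type: string): string | undefined {
  const stem = type.replace(/_(?:request|cancel)$/, "");
  const key = HOST_PREFIX_KEYS_BY_LENGTH.find(
    (k) => stem === k || stem.startsWith(k + "_"),
  );
  return key === undefined ? undefined : HOST_PREFIX_TO_CAPABILITY[key];
}
```

With this table, `sliceToSecondUnderscore("host_app_control_request")` produces `"host_app"`, which is not a registered key, so the old lookup returned `undefined` and the message was broadcast to every subscriber. The suffix-stripping version resolves the same message type to `"host_app_control"`.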

Part of plan: app-control-skill.md (fix round 1)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(daemon): inject tool discriminator, clear stop reference, drop dead state (#29340)

Four entangled correctness fixes:
1. surfaceProxyResolver injects 'tool' (e.g. 'start', 'observe') derived from toolName before forwarding to HostAppControlProxy. Without this, the Swift client could not decode requests and the singleton-lock guard never fired.
2. app_control_stop now clears the Conversation's hostAppControlProxy reference after dispose so subsequent tool calls cleanly fail with 'unavailable' instead of dispatching to a disposed proxy.
3. Delete the write-only _actionHistory ring buffer, recordActionFingerprint method, and actionHistory getter; nothing in production read them.
4. PNG-hash STUCK_REPEAT_THRESHOLD lowered from 5 to 4 so the warning fires after 5 total identical observations as the plan specified, not 6.
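
The off-by-one in item 4 is easier to see in a sketch. It assumes the guard counts repeats of the previous hash, so the first observation contributes zero; the class and helper names are hypothetical, and hashes are passed in as strings rather than computed from PNG data.

```typescript
class PngHashLoopGuard {
  private lastHash: string | undefined = undefined;
  private repeats = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  /** Returns true when this observation should trigger the stuck warning. */
  observe(hash: string): boolean {
    if (hash === this.lastHash) {
      this.repeats += 1; // same frame as before
    } else {
      this.lastHash = hash; // new frame: restart the count
      this.repeats = 0;
    }
    return this.repeats >= this.threshold;
  }
}

/** Counts how many identical observations it takes to trigger the warning. */
function observationsUntilWarning(threshold: number): number {
  const guard = new PngHashLoopGuard(threshold);
  let count = 0;
  do {
    count += 1;
  } while (!guard.observe("same-hash"));
  return count;
}
```

Under this counting scheme a threshold of 5 needs 6 identical observations, while a threshold of 4 fires on the 5th, matching the behavior the plan specified.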

Part of plan: app-control-skill.md (fix round 1)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(daemon): preactivate app-control on queue dequeue (#29342)

Both dequeue paths in conversation-process.ts reset preactivatedSkillIds and only re-added computer-use. Add the parallel re-add for app-control so the skill remains projected for queued messages 2+, mirroring the CU branch.

Part of plan: app-control-skill.md (fix round 1)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(types): add conversationId to HostAppControlCancel, drop unused occluded state (#29341)

Two wire-type coherence fixes:
1. HostAppControlCancel (TS + Swift) was missing conversationId, but host-proxy-base.ts has always sent it on the wire. Schema now matches the actual envelope, matching HostCuCancelRequest's shape.
2. Drop the HostAppControlState.occluded variant from TS, Swift, the route Zod schema, TOOLS.json, and definitions.ts. AppWindowCapture only emits running/minimized/missing; nothing produces occluded. Re-add when a producer exists.

Part of plan: app-control-skill.md (fix round 1)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* refactor(runtime): simplify capabilityForMessageType to direct lookup (#29350)

The longest-prefix matcher with HOST_PREFIX_KEYS_BY_LENGTH was over-engineered for current state — every registered key matches a stripped stem exactly. Replace with a direct table lookup keyed on the stem (after stripping _request/_cancel). Behaviorally identical for all currently-defined message types; existing tests still pass.

Part of plan: app-control-skill.md (slop cleanup)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* chore(daemon): remove test-only public API from host proxies (#29351)

Two pieces of dead public API surface caught by self-review:
1. HostProxyBase.cancel() was only invoked by its own test file; the production cancel path runs via AbortSignal handling inside dispatchRequest.
2. HostAppControlProxy.activeApp / ActiveApp / currentApp are written in the start-success branch but only read by tests; the actual singleton mechanism is activeAppControlConversationId.

Delete both with their tests.

Part of plan: app-control-skill.md (slop cleanup)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* chore(macos): drop unread Swift fields on app-control wire structs (#29352)

Two Swift fields decoded but never consumed:
1. HostAppControlRequest.toolName — AppControlExecutor switches on input only; the discriminator lives in input.tool.
2. HostAppControlCancel.conversationId — AppDelegate's cancel handler invokes cancelHostAppControlRequest(msg.requestId) and never reads conversationId. The sibling HostCuCancelRequest doesn't carry it either, so the 'wire-shape parity' rationale was inconsistent.

The wire envelope still includes both fields (daemon-side TS types unchanged); Swift's Codable silently ignores them on decode.

Part of plan: app-control-skill.md (slop cleanup)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* refactor(daemon): extract host-proxy preactivation helper (#29353)

The same supportsHostProxy(sourceInterface, capability) gate plus addPreactivatedSkillId(skillId) pattern appeared in four places (conversation-routes.ts, process-message.ts, two paths in conversation-process.ts) — one entry per host-proxy capability per call site. Consolidate into a single source of truth: HOST_PROXY_SKILL_PREACTIVATIONS and preactivateHostProxySkills(). Adding a new host-proxy capability now means updating one list, not four call sites.
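
A sketch of how such a consolidated helper might look. The names `HOST_PROXY_SKILL_PREACTIVATIONS`, `preactivateHostProxySkills`, `supportsHostProxy`, and `addPreactivatedSkillId` come from the commit message; the signatures, the entry shape, and passing the two functions as parameters are assumptions made to keep the example self-contained.

```typescript
type HostProxyCapability = "host_cu" | "host_app_control";

type HostProxySkillPreactivation = {
  capability: HostProxyCapability;
  skillId: string;
};

// Single source of truth: one entry per host-proxy capability.
const HOST_PROXY_SKILL_PREACTIVATIONS: ReadonlyArray<HostProxySkillPreactivation> = [
  { capability: "host_cu", skillId: "computer-use" },
  { capability: "host_app_control", skillId: "app-control" },
];

function preactivateHostProxySkills(
  sourceInterface: string,
  supportsHostProxy: (sourceInterface: string, capability: HostProxyCapability) => boolean,
  addPreactivatedSkillId: (skillId: string) => void,
): void {
  for (const { capability, skillId } of HOST_PROXY_SKILL_PREACTIVATIONS) {
    // Preactivate only the skills whose capability the client supports.
    if (supportsHostProxy(sourceInterface, capability)) {
      addPreactivatedSkillId(skillId);
    }
  }
}
```

Each former call site would then make one call to the helper instead of repeating a gate-and-add pair per capability.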

Part of plan: app-control-skill.md (slop cleanup)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(macos): surface AppWindowCapture errors in HostAppControlResultPayload (#29357)

ScreenCaptureKit failures (most commonly: Screen Recording permission not granted) silently returned nil from captureWindowPNG, and AppWindowCapture.capture(forPid:) still reported state: running with no PNG. Daemon and LLM saw a 'successful' payload with no error and no image — confusing for the user, who has no signal that the macOS app is missing a permission.

Wire the underlying error string through CaptureResult.captureError into HostAppControlResultPayload.executionError. The window state remains correctly classified (running/minimized/missing); the new error field is an orthogonal signal that capture itself failed even though the window exists.

For click/drag tools, the executor only surfaces the capture error when window bounds are also missing — we only need the bounds for those tools, so a missing PNG is non-fatal there.

Part of plan: app-control-skill.md (post-merge UX fix)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(app-control): activate target before input + add app_control_sequence + observe settle delay (#29363)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

* fix(app-control): register route policy, regenerate registry + openapi

- Register host-app-control-result route policy (approval.write scope)
- Regenerate bundled-tool-registry.ts to include app-control-sequence
- Regenerate openapi.yaml for /v1/host-app-control-result endpoint

Fixes failing CI: Test (bundled-tool-registry-guard, guard-tests),
OpenAPI Spec Check, and Lint (knip unused-files) on #29343.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(macos): map snake_case wire keys for HostAppControlInput coding keys (#29372)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

---------

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>