Skip to content

device-use v2: standalone iOS workbench + MCP server (AAP-ready)#249

Merged
zvadaadam merged 35 commits into
mainfrom
zvadaadam/agent-app-previews
Apr 18, 2026
Merged

device-use v2: standalone iOS workbench + MCP server (AAP-ready)#249
zvadaadam merged 35 commits into
mainfrom
zvadaadam/agent-app-previews

Conversation

@zvadaadam

@zvadaadam zvadaadam commented Apr 18, 2026

Copy link
Copy Markdown
Owner

Summary

Refactors packages/device-use from a stateless CLI + Swift binary into a standalone iOS Simulator workbench: a long-running Bun server hosting a React viewer, an MCP HTTP endpoint, a binary-WS touch proxy, and the same engine the CLI now shares as a peer. Ships as the reference implementation of an Agentic Apps Protocol (AAP) app — agentic-app.json is included; the Deus host integration follows in a separate PR.

20 commits, +4.8k / −1k LOC, scoped entirely to packages/device-use/ + docs/device-use-v2-design.md + .github/workflows/device-use-e2e.yml.

What landed

  • Server (src/server/) — Hono on Bun.serve, one process serving:

    • / React viewer (Vite-built)
    • /mcp MCP HTTP transport (24 tools, JSON-RPC over SSE)
    • /ws event bus (broadcast every tool-event + tool-log)
    • /sim-input binary WS proxy → simbridge /ws (low-latency touch + buttons)
    • /api/tools/*, /api/state, /api/stream, /api/events REST surface
    • /stream.mjpeg passthrough from simbridge
    • /health liveness probe
  • Engine (src/engine/) — kept; expanded with xcodebuild, logs.ts, project-info.ts, app-lifecycle primitives. CLI and server are independent peers — both import the engine, neither depends on the other.

  • Frontend (src/frontend/) — React + Vite + Zustand. Phone-frame canvas with live MJPEG, tap/swipe/long-press via binary WS, hardware Home/Lock buttons (Cmd+H/Cmd+L), real-time @ref element sidebar with auto-snapshot on interaction, build-status feedback, agent activity feed, log drawer, keyboard input that batches keystrokes through type_text.

  • CLIdevice-use serve boots the server. All other commands (list, boot, snapshot, tap, type, screenshot, etc.) unchanged. SDK + stream subcommand deleted (replaced by the server + /sim-input).

  • AAP manifestpackages/device-use/agentic-app.json. Declares launch command, ready probe, viewer URL, MCP endpoint, storage, lifecycle, requires.

  • Design docdocs/device-use-v2-design.md. Locked architectural decisions, implementation log, and a handoff section for the next PR (AAP host integration in Deus).

  • CI.github/workflows/device-use-e2e.yml. Nightly + workflow_dispatch on macos-15; runs the full e2e (build → install → launch → tap) against a real simulator.

Test plan

  • bun test — 86/86 pass (engine unit + server integration)
  • bun run typecheck — clean
  • bun scripts/e2e-server.ts — full real-sim flow passes (18 tool-events, 97 tool-log frames)
  • bun scripts/touch-smoke.ts/sim-input binary WS proxy passes touch correctly
  • bun scripts/longpress-smoke.ts — 1.5s hold opens iOS context menu (verified)
  • Manual: bun run starthttp://localhost:3100 → boot sim → set TestApp project → ▶ Run → click apps → swipe → home → keyboard typing
  • Manual: MCP from Claude Code → tools/list returns 24 tools, tools/call round-trips

What's deliberately NOT in this PR

Tracked as v1.1 in the design doc:

  • Wheel-to-scroll mapping — drag works; mousewheel doesn't.
  • Pinch & rotation — needs simbridge multi-touch; out of scope.
  • CLI actions surfacing in /ws activity feed — peer model preserved; opt-in --via-server flag is the planned shape.
  • Deletion of legacy simulator code in apps/web/src/features/simulator/ etc. — happens in the AAP host PR after soak.
  • Live-sim e2e per PR — only nightly. macOS minutes are 10× Linux; nightly is the right cadence.

Architectural choices reviewers will likely ask about

All deliberate, all in the design doc:

  • MCP transport is stateless per-request, not session-based — required by the SDK's WebStandardStreamableHTTPServerTransport.
  • Touch input bypasses invokeTool for latency (/sim-input → simbridge WS direct). REST + MCP taps still go through invokeTool so MCP/REST agents see them in the activity feed.
  • StateStore is a JSON file, not SQLite — ~200 bytes of state per workspace.
  • Two as any casts in toolInputSchema() — zod-to-json-schema's input type is too deep for the union we pass; documented inline.
  • Binds to 0.0.0.0 so localhost resolves regardless of v4/v6 — discovered while debugging an orphaned IPv6 listener shadowing us.

Known visible gaps

Documented honestly; none are PR blockers:

  • Volume buttons dropped — simbridge's sendButton only wires home/lock.
  • Keyboard input is best-effort batched (150ms idle); fast typing of "hello" arrives as one type_text call. Special keys other than Enter/Backspace are not forwarded.
  • The 10s sim-store poll triggers list_devices started/completed events that show in the activity feed (small noise — could move to event-driven later).

Next steps (separate PR)

The Deus-side AAP host integration. The "Handoff to next PR" section at the bottom of docs/device-use-v2-design.md lays out the receiver-side responsibilities, the ~2.5k LOC of legacy simulator code that gets deleted once it works, 8 lessons device-use learned that should shape the host's behavior, and a concrete first-commit slice.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Web-based iOS simulator workbench: viewer, device frame, sidebar, logs, toasts, snapshot/tap/type/run controls, MCP HTTP + WebSocket APIs, live log streaming, persistent workspace state, and a new local "serve" command.
  • Removed
    • Legacy SDK session API and prior streaming CLI/manager.
  • Documentation
    • Added design doc, updated README, and new Agentic Apps manifest for workspace launch.
  • Tests
    • New unit, integration, smoke, and e2e test suites plus runner scripts.
  • Chores
    • Nightly macOS e2e CI workflow, package metadata and .gitignore updates, dependency bump.

zvadaadam and others added 20 commits April 17, 2026 23:30
Adds the foundation for device-use v2 as a standalone product: a Hono
server on Bun.serve that will later host /mcp, /ws, /api/*, and a
React viewer at /. Dev mode proxies to Vite for HMR; prod serves the
built bundle from dist/frontend.

- Hono server at packages/device-use/src/server/index.ts with /health
- React scaffold at packages/device-use/src/frontend/ (main/App/index.html)
- Vite config rooted at src/frontend, outputs to dist/frontend
- tsconfig: JSX + DOM libs
- scripts: bun run dev (server), dev:frontend (vite), build, start

Also lands docs/device-use-v2-design.md — the full design record
(15 locked architectural decisions, phased implementation plan,
draft AAP manifest for later).

Part of the AAP (Agentic Apps Protocol) effort. This PR is the
device-use standalone refactor; AAP protocol itself is a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…per, logs

Adds three new engine primitives needed for the v2 "build + run + observe"
flow, each injectable for unit tests:

- project-info.ts — parses `xcodebuild -list -json` for an .xcodeproj or
  .xcworkspace; returns schemes / targets / configurations. Used by the
  agent (and the viewer's scheme picker) to know what to build.

- xcodebuild.ts — `build()` wraps `xcodebuild build` with spawn so the
  stdout/stderr line stream can be piped into tool-log events (for the
  viewer's logs drawer and the agent's per-build trace). Supports
  AbortSignal so stop-app can cancel an in-flight build. Predicts the
  built .app path when a derivedDataPath is provided; `resolveAppPath()`
  is the fallback that queries -showBuildSettings -json.

- logs.ts — `streamLogs()` wraps `simctl spawn <udid> log stream` with
  injectable spawner. Supports bundleId + pid filtering via --predicate,
  and a stop() handle for graceful teardown.

All three follow the existing engine pattern (executor/spawner as first
arg, typed errors extending DeviceUseError). 18 new unit tests — all
using fake executors/spawners, zero real subprocess deps.

Exit: bun test passes 72/72; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires all 23 MCP tools behind a single invoker that emits tool-event
frames over WebSocket. The same invokeTool path handles REST, MCP, and
WS calls — one engine path, one trace, zero duplicated behavior.

New modules:
- state.ts — persistent JSON at {storage}/state.json (pinned UDID,
  active project/scheme). Write-serialized to avoid clobbers.
- events.ts — EventBus with 200-event ring buffer + subscriber set.
  Emits ToolEvent + ToolLog; reuses MCP tool-call shape (no parallel type).
- invoker.ts — central dispatcher. Every MCP/REST/WS tool call goes
  through invokeTool; emits started/completed/failed with a shared id.
- tools.ts — 23 tool definitions (Zod-schema'd handlers). Each handler
  can use ctx.executor / ctx.state / ctx.stream / ctx.refMap / ctx.events.
- mcp.ts — stateless MCP HTTP handler. Creates a fresh Server+Transport
  per request (SDK requirement for stateless mode) and routes tool calls
  through invokeTool.
- stream.ts — StreamManager owns the long-lived simbridge --stream
  subprocess; exposes proxyStream() for /stream.mjpeg passthrough.

Server wires them together:
- GET  /health              — liveness
- GET  /api/state           — persisted state
- GET  /api/tools           — all tools + JSON schemas
- POST /api/tools/:name     — invoke a tool (REST)
- GET  /api/events          — recent tool-event history
- GET  /stream.mjpeg        — MJPEG passthrough
- GET  /api/stream          — stream status
- ALL  /mcp                 — MCP HTTP transport
- WS   /ws                  — tool-event + tool-log subscribe + invoke

Deps: @modelcontextprotocol/sdk, zod-to-json-schema.

Exit verified end-to-end:
- 86 tests pass (14 new integration tests)
- typecheck clean
- curl /api/tools → 23-tool JSON
- curl /mcp tools/list + tools/call → proper SSE responses
- scripts/ws-smoke.ts → WS receives started+completed tool-event after
  REST invoke, correlation id matches

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r, logs)

Builds out the standalone UI served at /. Works entirely through the
HTTP + WS endpoints landed in Phase 3 — no direct engine imports
from the frontend.

Components:
- TopBar: sim picker, project path input, scheme dropdown,
  ▶ Run button with disabled/building/failed states
- DeviceFrame: phone bezel + MJPEG stream from /stream.mjpeg;
  placeholder when no sim is pinned or sim not booted;
  ripple animation dispatched from tap tool-events via a Zustand
  subscription (not a useEffect setState)
- Sidebar: snapshot-on-demand with clickable @ref list
  (each taps the element) + live activity feed of recent tool-events
- LogsDrawer: auto-scrolled tail of tool-log frames
- Toasts: one-liner on build/install/launch completion or any failure

Stores (Zustand):
- sim-store — sims list, pinned UDID, stream info
- project-store — active project path, scheme, configuration,
  build status
- activity-store — tool-event ring buffer + toast dispatch
- logs-store — tool-log ring buffer (500 lines)

Infra:
- lib/ws.ts — WS client hook with exponential reconnect.
  Routes tool-event → activity-store, tool-log → logs-store.
- lib/api.ts — typed fetch wrappers around /api/tools/*.
- styles.css — minimal dark theme using the same color tokens
  as the Deus IDE. No Tailwind (keeps deps light).

Exit verified:
- typecheck clean, eslint clean
- vite build succeeds (202KB JS, 5KB CSS, 43 modules)
- Hono→Vite proxy serves the page via /

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dd serve)

The legacy SDK (fluent session builder) and CLI `stream` subcommand
were built for a stateless-CLI world. v2 replaces them with the
long-lived server, MCP endpoint, and /stream.mjpeg passthrough —
so these surfaces are dead weight.

Deleted:
- src/sdk/                    (fluent `session()` builder + apps helpers)
- src/cli/stream/             (manager + viewer-html generator)
- src/cli/commands/stream.ts  (CLI stream enable/disable/status)
- scripts/sdk-smoke.ts

Added:
- src/cli/commands/serve.ts — `device-use serve [--port N] [--host] [--open]`
  spawns bun on the server module and inherits stdio. Works in dev
  (source path) and in bundled form.

Updated:
- src/cli/index.ts            — drop stream import, add serve
- scripts/build-ts.ts         — drop sdk build entry; now produces
                                 dist/cli.js + dist/engine.js
- package.json                — description updated; main/exports
                                 point at engine (primary programmatic
                                 entry); bin unchanged

CLI surface (preserved): list, boot, shutdown, open, session, snapshot,
query, tap, swipe, type, fill, screenshot, wait-for, open-url, apps,
appstate, launch, terminate, permission, doctor, install, serve.

Exit verified:
- typecheck clean
- 86 tests pass
- `bun run cli help` lists all commands including `serve`
- `bun run cli serve --port 3101` starts server, /health → 200
- `bun run build:ts` produces dist/cli.js + dist/engine.js cleanly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final phase: makes device-use v2 AAP-ready and documents the shape.

- agentic-app.json — manifest at the package root. Host IDEs (Deus,
  MCP-speaking IDEs) read this to launch device-use, probe readiness
  via /health, embed the viewer at /, and wire agent tool calls into
  /mcp. Uses the schema shape agreed in docs/device-use-v2-design.md.

- run tool (composite) — chains build → resolveAppPath → install →
  launch_app in one call. Reads CFBundleIdentifier from the built
  Info.plist when bundleId isn't provided. Total MCP tools now 24.
  The Viewer's ▶ Run button calls build + install + launch as
  separate steps today; future phases can switch to this composite
  for a cleaner progress story.

- README rewritten — new architecture, how to run the server, MCP
  endpoint config for Claude Desktop, full tool list.

- AGENTS.md rewritten — new layout, how to add a tool (invokeTool
  path), how to add a CLI command, event shape, hard rules ("never
  bypass invokeTool", "stateless CLI stays stateless", etc.).

- .gitignore — adds .device-use/ (server's per-workspace state dir).

End-to-end verified:
- typecheck clean
- 86 tests pass
- curl /health → {ok:true}
- curl /api/tools → 24 tools
- curl /mcp tools/list → 24 tools
- set_active_simulator persists via state.json
- WebSocket streams started+completed tool-events with correlation id
- Pre-flight Info.plist-read in run tool handles bundleId resolution

device-use v2 is now the standalone product the AAP will later consume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/e2e-server.ts spawns a fresh device-use server on port 3199,
subscribes to /ws, exercises the full data flow, and asserts event
correlation across REST, MCP, and WebSocket paths. Runs in ~25s when
xcodebuild has a warm DerivedData cache.

Covers:
1. Server lifecycle — spawn + /health probe + graceful SIGTERM.
2. Tool registry — REST /api/tools reports 24 tools.
3. MCP HTTP transport — initialize + tools/list + tools/call(get_state)
   over JSON-RPC-over-SSE with per-request stateless transport.
4. Simulator lifecycle — list_devices, set_active_simulator,
   boot (Shutdown → Booted).
5. Project introspection — get_project_info parses xcodebuild -list -json
   and finds the TestApp scheme.
6. Composite run tool — build → resolveAppPath → install → auto-resolve
   bundleId from Info.plist → launch. End-to-end in one call.
7. Snapshot — a11y tree walk, @refs assigned to interactive nodes.
8. Tap — ref-based tap through simbridge.
9. State persistence — state.json round-trip verified on disk AND
   via get_state tool.
10. WS event correlation — every tool emits paired started+completed
    frames with matching ids; build streams tool-log frames during
    xcodebuild.

First successful run:
  24 tools · 19 tool-events · 273 tool-log frames · 0 failures · 25s

Future regression: `bun scripts/e2e-server.ts` from the package dir.
Requires Xcode + an iPhone 17 Pro simulator available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The server previously bound to 127.0.0.1 only. On a machine with a
stale IPv6 listener on the same port (e.g. an orphaned sim-helper
from agent-simulator — observed in the wild, PPID=1), browsers
resolving `localhost` to ::1 first would connect to the wrong server
and render the stale UI.

Binding to 0.0.0.0 (all IPv4 interfaces) means happy-eyeballs falls
back to 127.0.0.1 cleanly when ::1 has no listener, and any IPv6
orphan on the same port becomes harmless rather than a hijacker.

Root cause, not a workaround: the orphan was a real bug in older
tooling; this change makes us robust against that class of collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The stream subprocess (simbridge --stream) was only auto-starting at
server startup for a pre-pinned sim. Booting interactively via the UI
would pin + boot the sim but never start the stream — viewer stuck on
"Waiting for stream…" even though everything else worked (snapshot,
activity feed, etc.).

Fix:
- boot tool — after simctl boot succeeds, start the stream for that
  UDID. Failures are logged but don't fail the boot (CLI callers can
  still boot without needing a stream).
- set_active_simulator tool — if the newly-pinned sim is already
  booted, start the stream too. Covers the "open viewer after sim
  is already booted externally" path.

Both return the new stream info in their result so the UI doesn't
need to wait for the 10s refresh tick to pick it up.

Verified against a real sim: restart server → pinned UDID auto-boots
and streams on an assigned port → /stream.mjpeg returns
200 multipart/x-mixed-replace with frames.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mouse events on the MJPEG image now drive the simulator, not just
watch it. Click → tap(x, y). Drag → swipe(from, to) with an 8pt
threshold to distinguish the two. Cursor turns crosshair when a
clickable stream is loaded.

To map CSS pixels → simulator points, StreamManager.start() now
probes the upstream /config on ready and computes logical point
dimensions (assuming the standard 3x retina scale). StreamInfo
gains `size: { pxW, pxH, ptW, ptH }`, exposed via /api/stream.
The DeviceFrame multiplies the click's fractional position by
ptW/ptH to get point coords for the tap/swipe tools.

Ripple animations now use the dynamic ptW from streamInfo instead
of the previous hardcoded 402pt estimate, so they land on the
right spot on any device.

Every interaction routes through invokeTool → same activity feed
entries, same tool-event broadcast, same MCP trace. Zero new
invocation paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clicks on the MJPEG <img> silently did nothing — browsers don't fire
mouse events reliably on images that swap frames continuously. The
original Deus simulator viewer avoided this by rendering MJPEG frames
into a <canvas> and sending touch events as binary WebSocket frames
to simbridge's /ws (0x03 prefix + JSON payload, begin/move/end
lifecycle). Adopting that proven pattern.

Server:
- /sim-input WebSocket endpoint. Opens an upstream WS to simbridge's
  /ws for the active stream and forwards binary frames one-way
  (browser → simbridge). Buffers messages until upstream handshake
  completes. Closes upstream when client disconnects.
- Upgrade flow attaches { kind: "events" | "sim-input", wsUrl? } to
  the Bun WS data field so a single websocket handler can multiplex.

Frontend:
- DeviceFrame now renders the stream into a <canvas>. A requestAnimation
  Frame loop drives ctx.drawImage(img, 0, 0) from an off-screen Image
  pointed at /stream.mjpeg. Canvas fires mouse events reliably even
  while frames are updating.
- Mouse events encode 0x03-prefixed TouchEventPayload (begin on
  mousedown, move on drag, end on mouseup) and send normalized 0..1
  coordinates via the /sim-input WS.
- Window-level mouseup catches drags that release outside the canvas.
- Ripples fire locally on mousedown — immediate UI feedback without
  waiting for a server round-trip.

Verified end-to-end: tap at Maps' center (0.27, 0.20) → Maps app
launched with "Allow Location" permission dialog. scripts/touch-smoke.ts
exercises the full path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tap

User reported the refs sidebar showed stale elements: the list still
referenced the home screen after they'd opened Maps. Fix: every canvas
mouseup schedules a snapshot refresh (debounced 500ms, collapses rapid
taps into one call). The sidebar also auto-snapshots the moment the
stream first goes live, and shows the foreground app name next to the
"elements" header so the user knows what they're looking at.

- refs-store.ts — new Zustand store with refs + foreground + loading,
  a refresh() action, and a scheduleRefresh(delayMs) debouncer
- DeviceFrame — calls scheduleRefresh on canvas mouseup AND on the
  window-level mouseup (drag-release-off-canvas path)
- Sidebar — consumes the refs store, auto-snapshots on first stream
  connect, re-snapshots after any ref tap (sidebar → canvas tap →
  screen changes → refs stale otherwise)

Activity feed also gets clearer: users can see which tap was theirs
(REST via api.tap from ref clicks) vs. which came from the direct WS
/sim-input path (those bypass invokeTool and don't show in activity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lback)

Ref taps were hitting the bounding-box center, which misses the hit
target for elements like switches/checkboxes where the tappable area
is not centered on the a11y frame (e.g. Maps' "Traffic" checkbox —
the a11y node's center is on the label, the switch is to the right).

Fix: use the engine's existing tapEntry helper, which tries in order:
  1. tapById(identifier) — simbridge scans the a11y tree and taps the
     matching element's natural hit target
  2. tapByLabel(label) — same, by accessibility label
  3. tap(x, y) — coordinate fallback
This matches what the CLI's `tap @eN` command already does. Ref taps
in the viewer sidebar now toggle switches, check checkboxes, and
activate controls whose hit target differs from the a11y bounds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a row of hardware buttons (Home / Lock / Vol+ / Vol−) below the
phone. Clicks send 0x04 + JSON{button} frames to simbridge's /ws via
the existing /sim-input proxy — same transport as touch, so they're
instant and don't round-trip through invokeTool.

Also:
- Cmd+H / Cmd+L keyboard shortcuts for Home / Lock (respects focus;
  ignores keys while typing in inputs)
- Canvas right-click now preventDefault's (no more browser context
  menu popping over the stream)
- Button press triggers the same debounced refs-snapshot refresh that
  canvas taps do, so the sidebar stays in sync after going home

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical cleanup after the v2 refactor, no behavior changes. 86
tests + e2e-server (18 tool events, 97 tool-log frames) still pass.

Server:
- EventBus now owns `logHandles: Map<string, LogHandle>` directly.
  stream_logs / stop_logs drop the `ctx.events as unknown as …` cast
  and the `_logHandles ??= new Map()` dance.
- New `toolInputSchema()` helper in tools.ts wraps the zod-to-json-
  schema call with its `as any` cast. /api/tools and /mcp tools/list
  both use it; removes the duplicated expression and two casts.
- WS_SUBSCRIBERS map collapsed from `{unsubscribe, ws: unknown}` to
  the unsubscribe function directly. The `ws: unknown` field was
  never read.
- Dropped the `void buf;` no-op in stream.ts — the simbridge stderr
  handler just discards the parameter.

Frontend:
- snapshot tool now returns an explicit `foreground: string | null`
  (the root Application node's label). Frontend reads it as a named
  field instead of digging into `tree[0].label`.
- snapshot call in api.ts stops requesting interactiveOnly:false
  (it was inflating the sidebar with non-clickable static text).
  Server default of `interactiveOnly: true` is what the UI wants.
- project-store: BuildStatus type trimmed from 6 states to 4
  (idle|running|done|failed). The old "building|installing|launching"
  sub-states were dead — nothing set them.
- TopBar's Run button wired to `api.run()` (the composite: build →
  install → launch) instead of `api.build()`. "▶ run" actually runs.
- DeviceFrame: extracted `normalize(clientX, clientY)` helper so the
  canvas mouseup and window-level mouseup share one coordinate path.
- Every scheduleRefresh() callsite dropped its explicit 500ms arg;
  that's now the store's default.

CLI:
- `device-use serve --open` now polls /health (up to 6s) before
  opening the browser, replacing the 1500ms blind sleep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…them

simbridge's SimAccessibilityBridge.sendButton only dispatches home and
lock; volumeup/volumedown payloads reach the handler but are no-ops.
Keeping the UI honest by removing the two buttons that don't do
anything. If volume control comes back later, it'll need a sendButton
case in Swift first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sim-store refreshes /api/stream every 10s and writes a fresh
streamInfo object into state. Both effects in DeviceFrame (the MJPEG
canvas render loop and the /sim-input WebSocket) depended on the
whole streamInfo object, so each poll tore down the canvas, cleared
it, recreated the Image element, and reconnected the WS. Result: a
visible black flash on the phone every 10 seconds and a one-frame
gap in touch delivery if you happened to be mid-drag.

Fix: depend on primitive identity — streamUdid (and streamPort for
the WS). Same UDID/port → same effect instance, no teardown. If the
underlying sim actually changes UDID (user switches sim), both
effects re-fire correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dentity

The first stab at the 10s blink fixed DeviceFrame's effect deps but
missed two other places that still churned references on every poll:

1. sim-store.refresh() set streamInfo / sims / pinnedUdid
   unconditionally, so every 10s each of those fields got a new
   reference even when the content was unchanged. Anything subscribed
   to them via `useSimStore()` re-rendered.
2. Sidebar's "take an initial snapshot on first stream connect"
   effect depended on the full streamInfo object; every poll
   re-fired the effect's guard check.

Real fix — at the store, not the consumers:
- sim-store now compares new vs. previous stream/sims/pinnedUdid
  and reuses the old reference when content matches. No downstream
  sees a spurious update.
- Sidebar effect now depends on `streamInfo?.udid` (primitive)
  instead of the object, matching DeviceFrame's pattern. Even if
  we later regress #1, #2 is independently safe.

Served bundle: index-Cre9C4oH.js.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes closing the last gaps before the PR opens.

1. Keyboard input on the viewer — keystrokes while focus isn't on
   another input get batched into a 150ms idle buffer and flushed via
   the type_text tool. Enter flushes with submit=true (presses Return
   after). Backspace trims the local buffer. Modifier keys (Cmd/Ctrl)
   are preserved for the existing hardware-button shortcuts
   (Cmd+H / Cmd+L). Buffering collapses fast typing like "hello"
   from 5 sequential simbridge subprocesses into one.

2. Long-press verified — scripts/longpress-smoke.ts holds begin for
   1500ms, then releases end, and asserts the iOS context menu
   appears (screen refs grow, including the "Dismiss context menu"
   action). Passes against a real iPhone simulator. The architecture
   already supported it: our /sim-input WS forwards begin+end raw to
   simbridge, which keeps the touch down for whatever interval the
   client chooses. No code needed — just proof.

3. GitHub Actions workflow for nightly e2e on macos-15. Picks any
   available iPhone, boots it, runs scripts/e2e-server.ts end-to-end
   (spawn → /health → WS → REST + MCP → build → install → launch →
   tap → state persistence). Schedule + manual dispatch only; we
   don't run per-PR (macOS minutes cost 10×). The e2e script now
   honors $E2E_SIM_UDID to target a specific sim without changing
   its default behavior for local dev.

Verified end-to-end:
  - type_text via REST: {success:true}
  - longpress-smoke.ts: refs 3 → 7, iOS context menu opened
  - touch-smoke.ts: still passes
  - e2e-server.ts: full flow passes (18 tool-events, 97 tool-logs)
  - 86/86 unit tests pass
  - typecheck clean

Out of scope for this PR (noted in design doc as v1.1):
  - wheel→scroll mapping
  - pinch/rotation multi-touch
  - CLI actions appearing in /ws activity feed (peer model stays pure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "Handoff to next PR" block to docs/device-use-v2-design.md so
whoever picks up the Deus-side AAP integration has a working brief.

Covers:
- What to build in Deus (apps.service.ts, app-registrar.ts, lifecycle
  tools, frontend launcher) with concrete file paths.
- What to delete after AAP works (~2.5k LOC of legacy simulator code
  in apps/web/src/features/simulator/ + simulator-context.ts +
  deus-tools/simulator.ts).
- 8 lessons device-use learned that should shape AAP host behavior:
  requires validation, {port} substitution, mandatory health probe,
  storage env var, child.on(exit) > polling, low-latency input
  back-channel, runtime setMcpServers, framework-agnostic frontend.
- Things device-use deliberately deferred (wheel-scroll, CLI in WS,
  multi-sim side-by-side) and where they belong.
- Open design questions for the next PR.
- A concrete first-commit slice for the Deus AAP PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 18, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Refactors device-use into a Bun-backed standalone iOS simulator workbench: adds a long-lived server (HTTP/MCP/WS), React frontend, unified tool registry and invoker, simbridge stream manager, persistent workspace state, Xcode/xcrun engine wrappers, e2e and smoke test scripts, and removes the previous SDK/session and legacy stream CLI.

Changes

Cohort / File(s) Summary
CI / Docs / Manifests
.github/workflows/device-use-e2e.yml, docs/device-use-v2-design.md, packages/device-use/AGENTS.md, packages/device-use/README.md, packages/device-use/agentic-app.json
Adds nightly/manual e2e workflow, design doc, updated AGENTS/README, and an Agentic Apps manifest.
Root & Package config
package.json, packages/device-use/package.json, packages/device-use/tsconfig.json, packages/device-use/vite.config.ts, packages/device-use/.gitignore
Bumps dependency, switches package entrypoint to engine artifact, enables DOM/React TypeScript libs/JSX, adds Vite config, and ignores .device-use/.
CLI surface
packages/device-use/src/cli/commands/serve.ts, packages/device-use/src/cli/index.ts
Adds serve CLI (port/host/--open, health polling + opener) and registers it; replaces previous stream registration.
Removed legacy CLI stream
packages/device-use/src/cli/commands/stream.ts, packages/device-use/src/cli/stream/manager.ts, packages/device-use/src/cli/stream/viewer-html.ts
Deletes legacy stream command, manager, and viewer HTML generator and their exported APIs.
Build scripts
packages/device-use/scripts/build-ts.ts
Removes SDK build entry (no longer emits SDK artifact).
SDK removal
packages/device-use/src/sdk/...
packages/device-use/src/sdk/builder.ts, packages/device-use/src/sdk/apps.ts, packages/device-use/src/sdk/index.ts
Removes SDK/session builder, app-name→bundle resolver, and SDK public entry exports.
Server core
packages/device-use/src/server/index.ts, packages/device-use/src/server/events.ts, packages/device-use/src/server/state.ts, packages/device-use/src/server/stream.ts, packages/device-use/src/server/invoker.ts, packages/device-use/src/server/mcp.ts
Adds Bun server entrypoint (Hono) with /health, /api, /mcp, /ws, /sim-input, EventBus, StateStore, StreamManager, and invokeTool dispatcher.
Tool registry
packages/device-use/src/server/tools.ts
Introduces unified TOOLS registry, Context creation, tool input schema generation, UDID resolution, and implementations for device/project/build/install/run/automation/logging tools.
Engine utilities
packages/device-use/src/engine/xcodebuild.ts, packages/device-use/src/engine/project-info.ts, packages/device-use/src/engine/logs.ts, packages/device-use/src/engine/index.ts
Adds xcodebuild wrapper (build/resolveAppPath), project introspection (getProjectInfo), log streaming (streamLogs), and re-exports.
Frontend app & components
packages/device-use/src/frontend/App.tsx, packages/device-use/src/frontend/main.tsx, packages/device-use/src/frontend/index.html, packages/device-use/src/frontend/styles.css, packages/device-use/src/frontend/components/*
Adds React frontend, entrypoint, HTML, stylesheet, and UI components (TopBar, DeviceFrame, Sidebar, LogsDrawer, Toasts).
Frontend stores & client API
packages/device-use/src/frontend/stores/*, packages/device-use/src/frontend/lib/api.ts, packages/device-use/src/frontend/lib/ws.ts
Adds Zustand stores (sim, project, refs, activity, logs), typed fetch API client, and WS hook with reconnect/backoff.
Stream & input smoke / e2e scripts
packages/device-use/scripts/e2e-server.ts, packages/device-use/scripts/longpress-smoke.ts, packages/device-use/scripts/touch-smoke.ts, packages/device-use/scripts/ws-smoke.ts (removed sdk-smoke.ts)
Adds e2e runner that boots simulator and validates flows, touch/longpress/ws smoke tests; removes legacy SDK smoke script.
Tests
packages/device-use/test/*
xcodebuild.test.ts, project-info.test.ts, logs.test.ts, server.test.ts
Adds unit/integration tests for build, project introspection, log streaming, and server/tool invocation/state persistence.
Frontend tooling
packages/device-use/vite.config.ts, packages/device-use/src/frontend/*
Configures Vite dev/build for frontend and adds React/Vite toolchain files.
Many new TS modules
packages/device-use/src/{server,engine,frontend}/...
Large additions: event bus, state store, stream manager, tools registry, invoker, MCP adapter, engine wrappers, frontend stores/components and related types.

Sequence Diagram

sequenceDiagram
    actor User
    participant Frontend as React Viewer
    participant Server as Bun HTTP/WS Server
    participant Tools as Tool Registry / Invoker
    participant Engine as xcodebuild/xcrun wrappers
    participant Bridge as simbridge subprocess
    participant Simulator as iOS Simulator

    User->>Frontend: open UI / click Run
    Frontend->>Server: POST /api/tools/run {project, scheme, udid}
    Server->>Tools: invokeTool("run", params)
    Tools->>Server: emit tool-event (started)
    Tools->>Engine: build(project, scheme, destination)
    Engine->>Bridge: spawn xcodebuild / xcrun (child process)
    Bridge->>Simulator: build/install/launch actions
    Simulator-->>Bridge: runtime responses
    Bridge-->>Engine: stdout/stderr lines
    Engine->>Tools: BuildResult / appPath
    Tools->>Engine: install(appPath, udid) / launchApp(...)
    Tools->>Server: emit tool-event (completed)
    Server-->>Frontend: WS tool-event / tool-log frames
    Frontend->>Server: WS binary sim-input (touch)
    Server->>Bridge: forward binary frame
    Bridge->>Simulator: inject HID input
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰
I hopped from SDK to Bun so bright,
A tiny server sprang to light.
Canvas, websockets, simbridge hum —
Logs and toasts and a run button.
Hop hop — the app wakes; what a sight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'device-use v2: standalone iOS workbench + MCP server (AAP-ready)' accurately and concisely describes the main change: refactoring device-use from a stateless CLI to a long-running standalone server workbench with MCP support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch zvadaadam/agent-app-previews

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

🧹 Nitpick comments (16)
packages/device-use/scripts/ws-smoke.ts (1)

27-28: JSON.parse may throw on malformed frames.

If the server sends non-JSON data, JSON.parse will throw and crash the script. This is acceptable for a smoke test (crash = test failure), but wrapping in try/catch would make failure diagnostics clearer.

🛡️ Optional defensive parsing
 ws.addEventListener("message", (e) => {
+  let msg;
+  try {
-  const msg = JSON.parse(typeof e.data === "string" ? e.data : e.data.toString());
+    msg = JSON.parse(typeof e.data === "string" ? e.data : e.data.toString());
+  } catch {
+    console.warn("Ignoring malformed frame:", e.data);
+    return;
+  }
   seen.push(msg);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/scripts/ws-smoke.ts` around lines 27 - 28, The message
handler attached with ws.addEventListener("message", ...) calls JSON.parse
directly on e.data which will throw on malformed JSON and crash; wrap the parse
in a try/catch inside that handler (around JSON.parse and subsequent use of msg)
to catch parsing errors, log a clear diagnostic including the raw frame (e.data
/ e.data.toString()) and the error, and then either return or close the socket
so the smoke test fails with a readable message; update references to
JSON.parse, msg, and e.data in that handler accordingly.
packages/device-use/src/frontend/components/Toasts.tsx (1)

5-7: Redundant null check after length guard.

Line 7's if (!latest) return null is unreachable — if toasts.length === 0 on line 5, we've already returned. After that guard, toasts[toasts.length - 1] is guaranteed to exist.

🧹 Optional cleanup
 export function Toasts() {
   const toasts = useActivityStore((s) => s.toasts);
   if (toasts.length === 0) return null;
   const latest = toasts[toasts.length - 1];
-  if (!latest) return null;

   return (
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/components/Toasts.tsx` around lines 5 - 7,
The redundant null check in the Toasts component should be removed: after the
early return "if (toasts.length === 0) return null" the computed "const latest =
toasts[toasts.length - 1]" is guaranteed to be defined, so delete the
unreachable "if (!latest) return null" check (or collapse the two checks into a
single guard) to simplify the logic around the toasts/latest variables.
packages/device-use/src/frontend/lib/ws.ts (1)

21-31: Consider ts-pattern for message type dispatch (optional).

Per coding guidelines, discriminated unions should prefer ts-pattern with .exhaustive(). However, since this parses external JSON where unknown types should be silently ignored, the current simple conditionals are pragmatic. If more message types are added, consider refactoring.

♻️ Example with ts-pattern (if desired)
import { match } from "ts-pattern";

// In the message handler:
match(msg)
  .with({ type: "tool-event" }, (m) => useActivityStore.getState().push(m))
  .with({ type: "tool-log" }, (m) => useLogsStore.getState().append(m))
  .otherwise(() => {/* ignore unknown */});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/lib/ws.ts` around lines 21 - 31, The message
handler attached in ws.addEventListener currently uses ad-hoc conditionals to
dispatch parsed JSON frames to useActivityStore.getState().push and
useLogsStore.getState().append; refactor this dispatch to use ts-pattern's match
for clearer discriminated-union handling (e.g., match(msg).with({type:
"tool-event"}, m => useActivityStore.getState().push(m)).with({type:
"tool-log"}, m => useLogsStore.getState().append(m)).otherwise(() => {/* ignore
unknown */})), and if you add more message types later, use .exhaustive() to
ensure coverage while preserving the current behavior of silently ignoring
unknown external messages.
packages/device-use/src/frontend/components/LogsDrawer.tsx (1)

8-10: Consider debouncing rapid scroll updates.

When logs stream rapidly, calling scrollIntoView({ behavior: "smooth" }) on every append may queue up animations and cause visual jank. Consider debouncing or using behavior: "instant" for high-frequency updates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/components/LogsDrawer.tsx` around lines 8 -
10, The current useEffect calling bottomRef.current?.scrollIntoView({ behavior:
"smooth", block: "end" }) on every change to lines can queue animations when
logs stream rapidly; modify the scroll logic in the useEffect to debounce rapid
updates (e.g., store a timeout or timestamp in a ref and only call
scrollIntoView after a short delay or when no new lines arrive for X ms) or
switch to instant scrolling for high-frequency updates (detect update rate via a
lastUpdate ref and if updates are too frequent call scrollIntoView({ behavior:
"instant", block: "end" }) instead), updating the useEffect, bottomRef usage,
and any added refs/cleanup to clear timeouts on unmount.
packages/device-use/test/xcodebuild.test.ts (1)

21-37: The fake-child cleanup hook is a no-op right now.

afterEach() clears STARTED, but makeSpawner() never pushes the created FakeChild into that array. If a future test leaks a late emit or unread stream data, this guard won’t actually isolate the suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/test/xcodebuild.test.ts` around lines 21 - 37, The
cleanup is ineffective because makeSpawner never records created FakeChild
instances in STARTED; modify makeSpawner to push each created child into the
STARTED array (e.g., after creating `const child = new FakeChild();` call
`STARTED.push(child)`), so the existing afterEach() guard can actually clear and
prevent leaked emits or unread streams; reference symbols: makeSpawner, STARTED,
FakeChild, spawner.
packages/device-use/src/frontend/stores/refs-store.ts (1)

22-48: Move snapshot fetching out of Zustand and into TanStack Query.

refs, foreground, and loading are server-state concerns, and this store is re-implementing fetch lifecycle/invalidation manually. Keeping only the debounce/local UI bits here and letting Query own api.snapshot() would simplify refresh behavior and align with the repo’s expected state split.

As per coding guidelines, "Use TanStack Query v5 for server state management."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/stores/refs-store.ts` around lines 22 - 48,
The store currently performs server fetching in useRefsStore (keys: refs,
foreground, loading, refresh, scheduleRefresh) by calling api.snapshot(); remove
the fetch lifecycle from refresh and loading management and instead use TanStack
Query v5 to fetch snapshot data (e.g., a useSnapshotQuery that returns refs and
foreground). Keep useRefsStore only for local UI/debounce concerns: expose local
foreground mutation and scheduleRefresh (update scheduleRefresh to call
queryClient.invalidateQueries or the query's refetch trigger instead of invoking
api.snapshot). Replace any direct uses of useRefsStore.refresh with the query
refetch/invalidation and remove manual loading state handling from the store.
Ensure components read refs/foreground from the query and only use the store for
debounced UI state.
.github/workflows/device-use-e2e.yml (1)

21-24: Consider pinning Bun version for CI stability.

Using bun-version: latest can cause unexpected failures if a new Bun release introduces breaking changes. Pinning to a specific version improves reproducibility.

♻️ Proposed fix
       - name: Setup Bun
         uses: oven-sh/setup-bun@v2
         with:
-          bun-version: latest
+          bun-version: "1.2"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/device-use-e2e.yml around lines 21 - 24, Replace the
dynamic bun-version in the GitHub Actions step that sets up Bun (the "Setup Bun"
step using oven-sh/setup-bun@v2) with a pinned, explicit version string (e.g.,
"0.x.y") instead of "latest" so CI is stable and reproducible; update the
bun-version input to the chosen fixed semver and document the pinned version in
the workflow comment.
packages/device-use/src/frontend/components/TopBar.tsx (1)

58-64: Prefer Tailwind class over inline style.

Using inline style bypasses Tailwind's design system. Consider using a Tailwind width utility instead.

♻️ Proposed fix
       <input
         type="text"
         placeholder="project path (.xcodeproj or .xcworkspace)"
         defaultValue={project.path ?? ""}
         onBlur={onProjectPath}
-        style={{ minWidth: 280 }}
+        className="min-w-70"
       />

As per coding guidelines: "Use Tailwind CSS v4 for styling."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/components/TopBar.tsx` around lines 58 - 64,
Replace the inline style on the input in TopBar.tsx with a Tailwind utility:
remove style={{ minWidth: 280 }} and add a className using the Tailwind
arbitrary value (for example className="min-w-[280px]") while preserving
existing props (type, placeholder, defaultValue={project.path ?? ""},
onBlur={onProjectPath}); ensure you merge the className if one already exists on
the element.
packages/device-use/scripts/touch-smoke.ts (1)

41-51: WebSocket error event discards diagnostic information.

The error handler rejects with a generic message, losing the actual error details that could help diagnose connection failures.

♻️ Proposed fix
   await new Promise<void>((resolve, reject) => {
     const t = setTimeout(() => reject(new Error("ws open timeout")), 3000);
     ws.addEventListener("open", () => {
       clearTimeout(t);
       resolve();
     });
-    ws.addEventListener("error", () => {
+    ws.addEventListener("error", (ev) => {
       clearTimeout(t);
-      reject(new Error("ws error"));
+      reject(new Error(`ws error: ${ev.message ?? "unknown"}`));
     });
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/scripts/touch-smoke.ts` around lines 41 - 51, The "error"
listener on the WebSocket promise throws a generic Error and discards diagnostic
details; change the listener on ws (the ws.addEventListener("error", ... )
block) to accept the error/event argument (e.g., (err) or (ev)) and reject with
that error or a new Error that includes its message/stack (e.g., reject(err) or
reject(new Error(`ws error: ${err?.message || String(err)}`))) so the real
diagnostic information is preserved when the promise fails.
packages/device-use/src/frontend/stores/project-store.ts (1)

37-46: Consider handling setActiveProject failure.

If api.setActiveProject() fails (network error, server error), the local state is still updated (lines 40-45), potentially causing UI/server state divergence. Consider checking the result before updating local state, or at minimum logging the error.

🛡️ Suggested approach
   setProject: async (path, scheme, configuration) => {
-    await api.setActiveProject(path, scheme, configuration);
+    const setResult = await api.setActiveProject(path, scheme, configuration);
+    if (!setResult.success) {
+      console.warn("[project-store] setActiveProject failed:", setResult.error);
+      // Optionally: return early or show toast
+    }
     const info = await api.getProjectInfo(path);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/stores/project-store.ts` around lines 37 -
46, The setProject handler updates local state regardless of
api.setActiveProject outcome; change it to check the result of
api.setActiveProject(path, scheme, configuration) and only proceed to call
api.getProjectInfo and set({...}) when the setActiveProject call succeeds,
otherwise log the error (or surface it to the caller) and avoid mutating the
local store (or revert any partial changes). Specifically update the setProject
function to capture the response/error from api.setActiveProject, handle failure
paths (logging via processLogger/console or returning an error), and only use
api.getProjectInfo and set(...) when setActiveProject indicates success.
docs/device-use-v2-design.md (1)

39-39: Add language specifier to fenced code block.

The architecture diagram code block is missing a language specifier. Use text or plaintext for ASCII diagrams.

📝 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/device-use-v2-design.md` at line 39, The fenced code block containing
the architecture diagram is missing a language specifier; update that block (the
ASCII "architecture diagram" fenced code block) to start with a language tag
such as ```text or ```plaintext so the diagram is rendered correctly (e.g.,
replace ``` with ```text at the opening fence).
packages/device-use/test/server.test.ts (1)

94-106: Misleading test name.

The test name says "emits started + failed on unknown tool" but the assertion (line 104) correctly expects only 1 event (failed). The invoker intentionally skips started for unknown tools. Consider renaming for clarity.

📝 Suggested fix
-  test("emits started + failed on unknown tool", async () => {
+  test("emits failed (no started) on unknown tool", async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/test/server.test.ts` around lines 94 - 106, The test
title "emits started + failed on unknown tool" is misleading because
invokeTool("nope", ...) intentionally skips emitting a "started" event and only
emits a single "failed" tool-event; rename the test description to something
like "emits failed on unknown tool" (update the string passed to test(...) that
wraps the assertions using invokeTool, events, and toolEvents) so the name
matches the assertions and behavior of invokeTool.
packages/device-use/src/frontend/components/DeviceFrame.tsx (1)

97-108: Unused streamPort in dependency array.

The WebSocket effect depends on streamPort (line 108) but doesn't use it in the effect body. While this may be intentional (to reconnect when stream config changes), it could cause unnecessary WS reconnections when only the port changes without affecting the connection URL (which uses location.host).

If the intent is to reconnect when the underlying stream restarts, consider documenting this with a comment, or remove streamPort if reconnection isn't needed:

-  }, [booted, streamUdid, streamPort]);
+  // Reconnect WS when stream restarts (new UDID or port indicates simbridge restart)
+  }, [booted, streamUdid, streamPort]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/components/DeviceFrame.tsx` around lines 97
- 108, The useEffect in DeviceFrame that opens the /sim-input WebSocket includes
streamPort in its dependency array but never references it, causing unnecessary
reconnects; either remove streamPort from the dependency array to avoid spurious
reconnections or (if reconnect on stream restart is intended) add a short
comment above the effect explaining that streamPort is included intentionally to
force reconnection on stream changes; if the latter is truly required for
reconnect behavior, alternatively incorporate streamPort into the connection key
(e.g., the URL or a query param) so its inclusion is explicit. Ensure you update
the dependency list on the useEffect and/or add the clarifying comment near
wsRef and the useEffect.
packages/device-use/src/server/index.ts (1)

167-212: Consider typed WebSocket wrapper.

The ws as unknown as object casts (lines 181, 209) are used as Map keys because Bun's WebSocket type doesn't have a stable identity type. While functional, this is a type-safety workaround.

For future maintainability, consider creating a typed wrapper or using WeakMap with the WebSocket instance if Bun's types improve. Current approach works but relies on object identity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/server/index.ts` around lines 167 - 212, The code
uses unsafe casts ws as unknown as object when storing keys in WS_SUBSCRIBERS
and WS_UPSTREAMS inside the websocket.open handler; replace this fragile pattern
by introducing a typed wrapper or keying mechanism: create a small wrapper
type/class (e.g., ClientConnection) that holds the Bun WebSocket and any
metadata (unsubscribe, upstream, buffered) and store instances of that wrapper
in WS_SUBSCRIBERS/WS_UPSTREAMS (or switch those maps to WeakMap<WebSocket, ...>
keyed by the actual WebSocket type if Bun's types allow it); update
websocket.open (and any lookup/cleanup sites) to create and store the wrapper
(or use the WebSocket directly with WeakMap) instead of casting, and update
references to retrieve the wrapper to access unsubscribe/upstream/buffered.
packages/device-use/src/server/stream.ts (1)

107-108: Consider draining stdout to prevent backpressure.

The process is spawned with stdio: ["ignore", "pipe", "pipe"], meaning stdout is piped but never consumed (only stderr has a no-op handler on line 108). If simbridge writes to stdout, this could cause backpressure and potentially hang the subprocess.

🛡️ Suggested fix
     // Drop simbridge's stderr — it's noisy about private-framework loads.
+    child.stdout?.on("data", () => {});
     child.stderr?.on("data", () => {});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/server/stream.ts` around lines 107 - 108, The
subprocess's stdout is piped but never consumed, causing possible backpressure;
add a no-op drain for stdout by attaching a data handler to the child.stdout
stream (similar to the existing child.stderr?.on("data", () => {})) so stdout is
consumed; locate the spawned process represented by the variable child in
stream.ts and add child.stdout?.on("data", () => {}), and consider also handling
child.stdout?.on("error", ...) if needed for robustness.
packages/device-use/AGENTS.md (1)

18-30: Add language specifier to fenced code block.

The directory tree code block is missing a language specifier. Use text or plaintext for non-syntax-highlighted blocks to satisfy linters.

📝 Suggested fix
-```
+```text
 packages/device-use/
 ├── native/              Swift simbridge — don't rewrite without good reason
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/AGENTS.md` around lines 18 - 30, The fenced
directory-tree block in AGENTS.md is missing a language specifier; update the
opening code fence (the triple backticks that start the block) to include a
plain-text specifier (e.g., change ``` to ```text or ```plaintext) so the block
is treated as non-syntax-highlighted text; leave the block contents (the tree
with lines like "packages/device-use/" and "├── native/") unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/device-use-v2-design.md`:
- Around line 54-57: The architecture diagram text currently reads "TanStack
Start on Nitro" but decision `#5` locks the runtime to "Hono on Bun.serve"; update
the diagram string "TanStack Start on Nitro" to "Hono on Bun.serve" (and any
nearby labels like routes: /, /mcp, /ws if needed) so the diagram matches the
documented decision `#5`.
- Around line 63-65: The document references "TanStack Start React viewer" which
conflicts with decision `#5`; update that phrase to reflect the chosen stack by
replacing "TanStack Start React viewer" with "Vite + React viewer (using
TanStack Router for routing)" so the frontend description consistently states
Vite + React and mentions TanStack Router as the routing library.

In `@packages/device-use/agentic-app.json`:
- Around line 10-33: The manifest uses "localhost" for UI/MCP URLs while the
launched command defaults to binding to 127.0.0.1; unify them to the same
loopback address to avoid IPv6/hostname resolution mismatches. Either add a host
flag to the launcher (in "launch.args": include "--host", "127.0.0.1") or change
the published endpoints to 127.0.0.1 (replace "localhost" with "127.0.0.1" in
"ui.url" and in "agent.tools.url") so the IDE connects to the exact interface
the serve command binds to.

In `@packages/device-use/package.json`:
- Around line 54-62: The package.json for the device-use package is missing the
zustand dependency used by the frontend stores (sim-store.ts, refs-store.ts,
project-store.ts, logs-store.ts, activity-store.ts); add "zustand" (e.g.
"^4.0.0" or the repo standard zustand version) to the "dependencies" object in
packages/device-use/package.json and reinstall (npm/yarn/pnpm) to update the
lockfile so imports from "zustand" resolve at runtime.

In `@packages/device-use/README.md`:
- Around line 65-88: The fenced code block containing the ASCII diagram in the
README is missing a language tag which triggers markdownlint; update the opening
fence from ``` to ```text so the block is explicitly marked as plain text (i.e.,
change the ASCII art fenced block in the README to start with ```text).

In `@packages/device-use/scripts/e2e-server.ts`:
- Around line 168-173: The current shutdown Promise relies on checking
serverProc!.killed inside the setTimeout, which is incorrect because killed
becomes true as soon as the signal is sent; instead track process exit with a
local flag and use that in the timeout to decide whether to send SIGKILL: add a
boolean like exited=false, set it to true inside the serverProc!.once("exit",
...) handler (and resolve there), send SIGTERM immediately via
serverProc!.kill("SIGTERM"), and in the setTimeout callback check the exited
flag (not serverProc!.killed) and call serverProc!.kill("SIGKILL") only if
exited is still false; ensure resolve is only called once from the exit handler.

In `@packages/device-use/src/cli/commands/serve.ts`:
- Around line 29-33: The moduleCandidates array contains an incorrect compiled
candidate path: from the cli/commands location the entry "../server/index.js"
resolves to cli/server/index.js instead of the package-level compiled server;
update the candidate in moduleCandidates (where path.resolve is called with
here) to point one directory higher (use ../../server/index.js) so the compiled
layout (dist/cli + dist/server) correctly resolves the server entrypoint.
- Around line 51-65: The code always calls spawn(opener, [url]) after the
polling loop even if /health never returned OK; change the logic in the
params.open async IIFE to track whether a successful health probe occurred
(e.g., set sawHealthy = true when res.ok) and only call spawn(opener, [url],
...) when sawHealthy is true; also break out of the loop immediately after a
successful probe (you already do that) and consider skipping the opener if the
server process has exited (check any server process variable available in this
scope) so that spawn(opener, ...) is executed only when a 200 was observed and
the server is still running.
- Around line 68-72: The current await new Promise using child.once("exit")
always returns success: true even when the child process failed; change to await
both 'error' and 'exit' events from the spawned child (listen for
child.once("error", err => reject(err)) and child.once("exit", (code, signal) =>
code === 0 ? resolve() : reject(new Error(`exit code ${code} signal
${signal}`)))) so failures are propagated; then catch/reject and return success:
false with an informative message containing the exit code/signal or error.
Refer to the existing child variable and the Promise around child.once("exit")
in serve.ts to locate where to replace the behavior.

In `@packages/device-use/src/engine/project-info.ts`:
- Around line 40-43: detectKind currently treats any path not ending with
".xcworkspace" as "xcodeproj", which masks invalid inputs; update
detectKind(projectPath: string) to explicitly check for ".xcworkspace" and
".xcodeproj" and throw a clear Error for any other extensions so callers fail
fast with a helpful message (include projectPath in the error text for context).

In `@packages/device-use/src/engine/xcodebuild.ts`:
- Around line 150-159: The code currently sets result.appPath to a synthesized
path (using options.derivedDataPath, options.configuration, and options.scheme)
even when that path may not exist; update the block in xcodebuild.ts so that
after constructing the candidate path (the same join of options.derivedDataPath,
"Build/Products", `${options.configuration ?? "Debug"}-iphonesimulator`,
`${options.scheme}.app`) you verify the directory actually exists (synchronously
or asynchronously via fs.existsSync / fs.promises.access) and only assign
result.appPath when the check passes; otherwise leave result.appPath unset (or
null) so callers won’t receive a bogus path.

In `@packages/device-use/src/frontend/components/Sidebar.tsx`:
- Around line 13-17: The effect in Sidebar (useEffect) wrongly skips refreshing
refs when streamUdid changes because it checks refs.length === 0 against the
global useRefsStore; remove the refs.length condition so that when streamUdid
changes you always call useRefsStore.getState().refresh(), i.e. change the
effect to run on [streamUdid] and unconditionally invoke refresh() when
streamUdid is truthy; keep the existing call site
useRefsStore.getState().refresh() and ensure dependencies include streamUdid
(and not refs.length) so refs are reloaded on every simulator-stream change.

In `@packages/device-use/src/frontend/components/TopBar.tsx`:
- Around line 11-15: The onBoot handler currently calls await
api.boot(pinnedUdid) and then always calls useSimStore.getState().refresh(),
which swallows errors and refreshes state even on failure; wrap the api.boot
call in a try/catch inside onBoot (check pinnedUdid as before), call
useSimStore.getState().refresh() only on successful boot, and in the catch
surface the error to the user (e.g., via an existing notification/toast utility
or processLogger) so failures are visible; adjust onBoot to reference api.boot,
pinnedUdid and useSimStore.getState().refresh() accordingly.

In `@packages/device-use/src/frontend/stores/sim-store.ts`:
- Around line 64-67: The setPinned function updates local state before
confirming the API succeeded; change it so api.setActiveSimulator(udid) is
awaited inside a try/catch and only call set({ pinnedUdid: udid }) after a
successful response. In the catch block (or on non-success response) do not
update pinnedUdid, log or surface the error, and optionally return or rethrow
the error so callers can react; update the function signature/return value if
needed to convey success/failure. Ensure you modify the setPinned async function
and references to api.setActiveSimulator and set({ pinnedUdid }) to implement
this behavior.

In `@packages/device-use/src/frontend/styles.css`:
- Around line 82-85: The .topbar button.primary rule uses a hardcoded color
("white") and direct color literals elsewhere; update this selector and the
other mentioned blocks (around the ranges noted in the comment) to use CSS
variables instead of literals — e.g., replace color: white with color:
var(--text-on-accent) and replace any hex/rgba values with appropriate tokens
like var(--accent), var(--muted), or new variables added to the token set;
ensure you update the selectors .topbar button.primary and the other rules
referenced (the blocks around 124-134, 161-173, 176-183, 386-399) to
consistently reference CSS variables and add any missing variables to the theme
token definitions.
- Line 345: Replace the deprecated CSS declaration "word-break: break-word" with
the modern equivalent "overflow-wrap: break-word" in the stylesheet; locate the
rule that currently contains the "word-break: break-word;" property (the
declaration shown in packages/device-use/src/frontend/styles.css) and swap it to
use "overflow-wrap: break-word;" to restore the same wrapping behavior without
using deprecated syntax.

In `@packages/device-use/src/server/state.ts`:
- Around line 66-72: The persist() method currently assigns this.writeChain =
this.writeChain.then(async () => { await mkdir(...); await writeFile(...); }) so
a single rejected mkdir()/writeFile() will leave this.writeChain rejected and
forever skip later update()/clear() writes; fix by catching errors inside the
chained callback so the promise returned to this.writeChain always resolves:
wrap the mkdir/writeFile calls in a try/catch (inside persist's then callback),
log the error (or handle it) but do not rethrow, ensuring this.writeChain
remains fulfilled and subsequent calls to persist() continue to execute;
reference persist(), this.writeChain, mkdir, writeFile, update, clear,
storageDir, file and state to locate where to add the try/catch.

In `@packages/device-use/src/server/tools.ts`:
- Around line 553-562: The handler currently calls streamLogs(...) but doesn't
pass onError/onExit handlers and also doesn't guard against streamLogs throwing,
so stale entries are left in ctx.events.logHandles; update the handler in the
async function that calls resolveUdid/streamLogs to (1) wrap the streamLogs call
in try/catch and only insert into ctx.events.logHandles and return a
subscriptionId on success, and (2) pass onError and onExit callbacks to
streamLogs that remove the handle from ctx.events.logHandles (using the id), and
emit a final event (e.g., "tool-log" with stream "stderr" or a dedicated
"tool-exit") via ctx.events.emit so callers know the subscription ended.

---

Nitpick comments:
In @.github/workflows/device-use-e2e.yml:
- Around line 21-24: Replace the dynamic bun-version in the GitHub Actions step
that sets up Bun (the "Setup Bun" step using oven-sh/setup-bun@v2) with a
pinned, explicit version string (e.g., "0.x.y") instead of "latest" so CI is
stable and reproducible; update the bun-version input to the chosen fixed semver
and document the pinned version in the workflow comment.

In `@docs/device-use-v2-design.md`:
- Line 39: The fenced code block containing the architecture diagram is missing
a language specifier; update that block (the ASCII "architecture diagram" fenced
code block) to start with a language tag such as ```text or ```plaintext so the
diagram is rendered correctly (e.g., replace ``` with ```text at the opening
fence).

In `@packages/device-use/AGENTS.md`:
- Around line 18-30: The fenced directory-tree block in AGENTS.md is missing a
language specifier; update the opening code fence (the triple backticks that
start the block) to include a plain-text specifier (e.g., change ``` to ```text
or ```plaintext) so the block is treated as non-syntax-highlighted text; leave
the block contents (the tree with lines like "packages/device-use/" and "├──
native/") unchanged.

In `@packages/device-use/scripts/touch-smoke.ts`:
- Around line 41-51: The "error" listener on the WebSocket promise throws a
generic Error and discards diagnostic details; change the listener on ws (the
ws.addEventListener("error", ... ) block) to accept the error/event argument
(e.g., (err) or (ev)) and reject with that error or a new Error that includes
its message/stack (e.g., reject(err) or reject(new Error(`ws error:
${err?.message || String(err)}`))) so the real diagnostic information is
preserved when the promise fails.

In `@packages/device-use/scripts/ws-smoke.ts`:
- Around line 27-28: The message handler attached with
ws.addEventListener("message", ...) calls JSON.parse directly on e.data which
will throw on malformed JSON and crash; wrap the parse in a try/catch inside
that handler (around JSON.parse and subsequent use of msg) to catch parsing
errors, log a clear diagnostic including the raw frame (e.data /
e.data.toString()) and the error, and then either return or close the socket so
the smoke test fails with a readable message; update references to JSON.parse,
msg, and e.data in that handler accordingly.

In `@packages/device-use/src/frontend/components/DeviceFrame.tsx`:
- Around line 97-108: The useEffect in DeviceFrame that opens the /sim-input
WebSocket includes streamPort in its dependency array but never references it,
causing unnecessary reconnects; either remove streamPort from the dependency
array to avoid spurious reconnections or (if reconnect on stream restart is
intended) add a short comment above the effect explaining that streamPort is
included intentionally to force reconnection on stream changes; if the latter is
truly required for reconnect behavior, alternatively incorporate streamPort into
the connection key (e.g., the URL or a query param) so its inclusion is
explicit. Ensure you update the dependency list on the useEffect and/or add the
clarifying comment near wsRef and the useEffect.

In `@packages/device-use/src/frontend/components/LogsDrawer.tsx`:
- Around line 8-10: The current useEffect calling
bottomRef.current?.scrollIntoView({ behavior: "smooth", block: "end" }) on every
change to lines can queue animations when logs stream rapidly; modify the scroll
logic in the useEffect to debounce rapid updates (e.g., store a timeout or
timestamp in a ref and only call scrollIntoView after a short delay or when no
new lines arrive for X ms) or switch to instant scrolling for high-frequency
updates (detect update rate via a lastUpdate ref and if updates are too frequent
call scrollIntoView({ behavior: "instant", block: "end" }) instead), updating
the useEffect, bottomRef usage, and any added refs/cleanup to clear timeouts on
unmount.

In `@packages/device-use/src/frontend/components/Toasts.tsx`:
- Around line 5-7: The redundant null check in the Toasts component should be
removed: after the early return "if (toasts.length === 0) return null" the
computed "const latest = toasts[toasts.length - 1]" is guaranteed to be defined,
so delete the unreachable "if (!latest) return null" check (or collapse the two
checks into a single guard) to simplify the logic around the toasts/latest
variables.

In `@packages/device-use/src/frontend/components/TopBar.tsx`:
- Around line 58-64: Replace the inline style on the input in TopBar.tsx with a
Tailwind utility: remove style={{ minWidth: 280 }} and add a className using the
Tailwind arbitrary value (for example className="min-w-[280px]") while
preserving existing props (type, placeholder, defaultValue={project.path ?? ""},
onBlur={onProjectPath}); ensure you merge the className if one already exists on
the element.

In `@packages/device-use/src/frontend/lib/ws.ts`:
- Around line 21-31: The message handler attached in ws.addEventListener
currently uses ad-hoc conditionals to dispatch parsed JSON frames to
useActivityStore.getState().push and useLogsStore.getState().append; refactor
this dispatch to use ts-pattern's match for clearer discriminated-union handling
(e.g., match(msg).with({type: "tool-event"}, m =>
useActivityStore.getState().push(m)).with({type: "tool-log"}, m =>
useLogsStore.getState().append(m)).otherwise(() => {/* ignore unknown */})), and
if you add more message types later, use .exhaustive() to ensure coverage while
preserving the current behavior of silently ignoring unknown external messages.

In `@packages/device-use/src/frontend/stores/project-store.ts`:
- Around line 37-46: The setProject handler updates local state regardless of
api.setActiveProject outcome; change it to check the result of
api.setActiveProject(path, scheme, configuration) and only proceed to call
api.getProjectInfo and set({...}) when the setActiveProject call succeeds,
otherwise log the error (or surface it to the caller) and avoid mutating the
local store (or revert any partial changes). Specifically update the setProject
function to capture the response/error from api.setActiveProject, handle failure
paths (logging via processLogger/console or returning an error), and only use
api.getProjectInfo and set(...) when setActiveProject indicates success.

In `@packages/device-use/src/frontend/stores/refs-store.ts`:
- Around line 22-48: The store currently performs server fetching in
useRefsStore (keys: refs, foreground, loading, refresh, scheduleRefresh) by
calling api.snapshot(); remove the fetch lifecycle from refresh and loading
management and instead use TanStack Query v5 to fetch snapshot data (e.g., a
useSnapshotQuery that returns refs and foreground). Keep useRefsStore only for
local UI/debounce concerns: expose local foreground mutation and scheduleRefresh
(update scheduleRefresh to call queryClient.invalidateQueries or the query's
refetch trigger instead of invoking api.snapshot). Replace any direct uses of
useRefsStore.refresh with the query refetch/invalidation and remove manual
loading state handling from the store. Ensure components read refs/foreground
from the query and only use the store for debounced UI state.

In `@packages/device-use/src/server/index.ts`:
- Around line 167-212: The code uses unsafe casts ws as unknown as object when
storing keys in WS_SUBSCRIBERS and WS_UPSTREAMS inside the websocket.open
handler; replace this fragile pattern by introducing a typed wrapper or keying
mechanism: create a small wrapper type/class (e.g., ClientConnection) that holds
the Bun WebSocket and any metadata (unsubscribe, upstream, buffered) and store
instances of that wrapper in WS_SUBSCRIBERS/WS_UPSTREAMS (or switch those maps
to WeakMap<WebSocket, ...> keyed by the actual WebSocket type if Bun's types
allow it); update websocket.open (and any lookup/cleanup sites) to create and
store the wrapper (or use the WebSocket directly with WeakMap) instead of
casting, and update references to retrieve the wrapper to access
unsubscribe/upstream/buffered.

In `@packages/device-use/src/server/stream.ts`:
- Around line 107-108: The subprocess's stdout is piped but never consumed,
causing possible backpressure; add a no-op drain for stdout by attaching a data
handler to the child.stdout stream (similar to the existing
child.stderr?.on("data", () => {})) so stdout is consumed; locate the spawned
process represented by the variable child in stream.ts and add
child.stdout?.on("data", () => {}), and consider also handling
child.stdout?.on("error", ...) if needed for robustness.

In `@packages/device-use/test/server.test.ts`:
- Around line 94-106: The test title "emits started + failed on unknown tool" is
misleading because invokeTool("nope", ...) intentionally skips emitting a
"started" event and only emits a single "failed" tool-event; rename the test
description to something like "emits failed on unknown tool" (update the string
passed to test(...) that wraps the assertions using invokeTool, events, and
toolEvents) so the name matches the assertions and behavior of invokeTool.

In `@packages/device-use/test/xcodebuild.test.ts`:
- Around line 21-37: The cleanup is ineffective because makeSpawner never
records created FakeChild instances in STARTED; modify makeSpawner to push each
created child into the STARTED array (e.g., after creating `const child = new
FakeChild();` call `STARTED.push(child)`), so the existing afterEach() guard can
actually clear and prevent leaked emits or unread streams; reference symbols:
makeSpawner, STARTED, FakeChild, spawner.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc0cdb83-0cdc-42eb-b77f-c3ac9fe32c8f

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1d771 and 1d9ed9e.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (55)
  • .github/workflows/device-use-e2e.yml
  • docs/device-use-v2-design.md
  • package.json
  • packages/device-use/.gitignore
  • packages/device-use/AGENTS.md
  • packages/device-use/README.md
  • packages/device-use/agentic-app.json
  • packages/device-use/package.json
  • packages/device-use/scripts/build-ts.ts
  • packages/device-use/scripts/e2e-server.ts
  • packages/device-use/scripts/longpress-smoke.ts
  • packages/device-use/scripts/sdk-smoke.ts
  • packages/device-use/scripts/touch-smoke.ts
  • packages/device-use/scripts/ws-smoke.ts
  • packages/device-use/src/cli/commands/serve.ts
  • packages/device-use/src/cli/commands/stream.ts
  • packages/device-use/src/cli/index.ts
  • packages/device-use/src/cli/stream/manager.ts
  • packages/device-use/src/cli/stream/viewer-html.ts
  • packages/device-use/src/engine/index.ts
  • packages/device-use/src/engine/logs.ts
  • packages/device-use/src/engine/project-info.ts
  • packages/device-use/src/engine/xcodebuild.ts
  • packages/device-use/src/frontend/App.tsx
  • packages/device-use/src/frontend/components/DeviceFrame.tsx
  • packages/device-use/src/frontend/components/LogsDrawer.tsx
  • packages/device-use/src/frontend/components/Sidebar.tsx
  • packages/device-use/src/frontend/components/Toasts.tsx
  • packages/device-use/src/frontend/components/TopBar.tsx
  • packages/device-use/src/frontend/index.html
  • packages/device-use/src/frontend/lib/api.ts
  • packages/device-use/src/frontend/lib/ws.ts
  • packages/device-use/src/frontend/main.tsx
  • packages/device-use/src/frontend/stores/activity-store.ts
  • packages/device-use/src/frontend/stores/logs-store.ts
  • packages/device-use/src/frontend/stores/project-store.ts
  • packages/device-use/src/frontend/stores/refs-store.ts
  • packages/device-use/src/frontend/stores/sim-store.ts
  • packages/device-use/src/frontend/styles.css
  • packages/device-use/src/sdk/apps.ts
  • packages/device-use/src/sdk/builder.ts
  • packages/device-use/src/sdk/index.ts
  • packages/device-use/src/server/events.ts
  • packages/device-use/src/server/index.ts
  • packages/device-use/src/server/invoker.ts
  • packages/device-use/src/server/mcp.ts
  • packages/device-use/src/server/state.ts
  • packages/device-use/src/server/stream.ts
  • packages/device-use/src/server/tools.ts
  • packages/device-use/test/logs.test.ts
  • packages/device-use/test/project-info.test.ts
  • packages/device-use/test/server.test.ts
  • packages/device-use/test/xcodebuild.test.ts
  • packages/device-use/tsconfig.json
  • packages/device-use/vite.config.ts
💤 Files with no reviewable changes (8)
  • packages/device-use/scripts/build-ts.ts
  • packages/device-use/scripts/sdk-smoke.ts
  • packages/device-use/src/sdk/apps.ts
  • packages/device-use/src/cli/commands/stream.ts
  • packages/device-use/src/sdk/builder.ts
  • packages/device-use/src/cli/stream/viewer-html.ts
  • packages/device-use/src/sdk/index.ts
  • packages/device-use/src/cli/stream/manager.ts

Comment thread docs/device-use-v2-design.md
Comment thread docs/device-use-v2-design.md
Comment thread packages/device-use/agentic-app.json
Comment thread packages/device-use/package.json
Comment thread packages/device-use/README.md Outdated
Comment thread packages/device-use/src/frontend/stores/sim-store.ts
Comment thread packages/device-use/src/frontend/styles.css Outdated
Comment thread packages/device-use/src/frontend/styles.css Outdated
Comment thread packages/device-use/src/server/state.ts Outdated
Comment thread packages/device-use/src/server/tools.ts

@zvadaadam zvadaadam left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI code review (Claude)

CodeRabbit already covered most of what I would have flagged (writeChain .catch() bug, hardcoded colors in styles.css, serve.ts path bug, install/launch error swallowing, etc.). I'm only posting the two findings that didn't show up in CodeRabbit's pass:

  1. set_active_project silently wipes scheme/configuration when called with just a path — triggered by every blur of the path input.
  2. Frontend 10s polling violates the root CLAUDE.md "WebSocket push over polling" Hard Rule.

Generated by Claude Code as an automated code review.

configuration: z.string().optional(),
}),
handler: async (ctx, params) => {
await ctx.state.update({ project: params });

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI code review (Claude) — bug, not flagged by CodeRabbit

set_active_project silently wipes scheme and configuration whenever called with just a path.

params is stored wholesale, and StateStore.update (src/server/state.ts:51-56) shallow-merges only at the top level — so state.project is replaced entirely, not merged with the previous value.

The frontend hits this on every blur of the path input: TopBar.onProjectPath (src/frontend/components/TopBar.tsx:25-29) calls useProjectStore.setProject(path) with no scheme → api.setActiveProject(path, undefined, undefined)JSON.stringify drops the undefined keys → server receives {path}state.project becomes {path}, scheme gone. Re-focusing and blurring the field without changing it wipes the pinned scheme and disables ▶ Run.

Fix: merge inside the handler, e.g.

const prev = ctx.state.get().project ?? {};
await ctx.state.update({ project: { ...prev, ...params } });

or only overwrite keys the caller actually provided.

useSimStore.getState().refresh();
useProjectStore.getState().refresh();
// Refresh sim list every 10s (catches external simctl boot/shutdown).
const t = setInterval(() => useSimStore.getState().refresh(), 10_000);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI code review (Claude) — CLAUDE.md violation, not flagged by CodeRabbit

Root CLAUDE.md Hard Rules: "WebSocket push over polling — only poll for git diffs on working sessions." The package's AGENTS.md does not override this.

This 10s setInterval continuously polls /api/state + /api/stream. The justification (catching external xcrun simctl boot/shutdown) is real, but the rule has only one carve-out (git diffs). Two earlier commits in this branch (ef6e7ca2, 8f0fb70e) had to fight UI flicker caused by this exact polling — the cost of the violation is already showing up.

Fix: poll xcrun simctl list -j server-side and emit a sim-state event over the existing /ws bus when the list changes; frontend subscribes once instead of polling on a timer.

17 inline comments from the bot — addressing all of them. Brief
mapping (file → comment id):

CRITICAL
  package.json#62           Add missing zustand dep (was hoisted to root, would
                            fail bun install --frozen-lockfile in CI).

MAJOR
  agentic-app.json#33       UI/MCP URLs use 127.0.0.1 instead of localhost so
                            host IDEs don't trip over IPv4/IPv6 happy-eyeballs
                            ambiguity. serve.ts also defaults --host to 0.0.0.0
                            (matches the server's own default), and the printed
                            probe URL collapses 0.0.0.0 → 127.0.0.1.
  serve.ts#33               Compiled-layout candidate path was off-by-one
                            (..server vs ../..server). Fixed.
  serve.ts#65               Browser only opens after /health responds 200; on
                            startup failure, no dead tab.
  serve.ts#72               Propagate child exit code; non-zero exits return
                            {success:false} with the code.
  e2e-server.ts#173         ChildProcess.killed is set when the signal is sent,
                            not when the process exits — using it as a kill-fallback
                            check would race a SIGTERM-ignoring child. Switched
                            to a tracked `exited` flag + clearTimeout on exit.
  Sidebar.tsx#17            Refresh refs whenever streamUdid changes (sim swap),
                            not only when refs.length === 0. Stops stale refs
                            from a previous sim from lingering.
  styles.css#85             Added --on-accent / --stage-bg / --frame-bg /
                            --canvas-bg / --shadow-* tokens; removed every
                            literal `white`, `#06060a`, `#1a1a22`, `#000`, and
                            `rgba(...)` from the stylesheet body.
  state.ts#72               persist() now catches mkdir/writeFile failures
                            inside the chained promise so a single transient
                            error can't permanently brick the writeChain.
  tools.ts#562              stream_logs wires onError + onExit to delete the
                            handle from logHandles, so failed/exited streams
                            don't leak.

MINOR
  design-doc#57,#65         Architecture diagram said "TanStack Start on Nitro"
                            but the locked stack (decision #5) is "Hono on
                            Bun.serve" / "Vite + React". Aligned.
  README.md#88              Added `text` language tag to the architecture fenced
                            code block (markdownlint MD040).
  project-info.ts#43        detectKind() throws XcodebuildError on inputs that
                            aren't .xcodeproj or .xcworkspace, instead of
                            silently treating them as .xcodeproj.
  xcodebuild.ts#159         Removed the synthesized appPath from BuildResult —
                            it was a guess based on `scheme.app` that's wrong
                            whenever target product name differs. Callers
                            should use resolveAppPath() (already used by the
                            `run` composite tool).
  TopBar.tsx#15             onBoot pushes a failed tool-event into the activity
                            store on api.boot() failure, so the toast surfaces
                            the error.
  sim-store.ts#67           setPinned checks res.success — only commits the
                            pinnedUdid change on success, otherwise stores
                            error state.
  styles.css#345            Replaced deprecated `word-break: break-word` with
                            `overflow-wrap: anywhere` (modern equivalent).

Side effects:
  scripts/*.ts              Added `export {}` to the three smoke scripts so
                            tsc treats them as modules, not global scripts
                            (was causing duplicate-symbol errors after a
                            second smoke script joined the project).

Verified:
  - 86/86 unit tests pass
  - typecheck clean
  - bun scripts/e2e-server.ts passes against a real iPhone 17 Pro
    (17 tool-events, 97 tool-log frames, 0 failures)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 1

CodeRabbit posted 17 actionable inline comments (1 critical, 8 major, 8 minor). All addressed in f14d701d.

Critical (1)

  • zustand missing from packages/device-use/package.json — added (was hoisted to root locally, would have failed bun install --frozen-lockfile in CI).

Major (8)

  • agentic-app.jsonui.url/agent.tools.url now 127.0.0.1 (avoid v4/v6 happy-eyeballs ambiguity); serve.ts defaults --host to 0.0.0.0 to match server's bind.
  • serve.ts compiled-layout candidate path — was off-by-one (../server resolved into cli/server/), now correct.
  • serve.ts open-browser-on-failure — open only fires inside the /health 200 success branch.
  • serve.ts exit-code propagation — non-zero exits return {success:false, message:"Server exited with code N"}.
  • e2e-server.ts ChildProcess.killed race — switched to a tracked exited flag + clearTimeout on exit so SIGKILL only fires when the child actually ignored SIGTERM.
  • Sidebar.tsx stale refs after sim swap — refresh on every streamUdid change, not only when refs.length === 0.
  • state.ts permanent persistence brick — persist() catches mkdir/writeFile inside the chain so a single transient FS error no longer poisons writeChain for the server's lifetime.
  • tools.ts log-stream subscription leak — stream_logs wires onError + onExit to delete the handle from logHandles.

Minor (8)

  • ✅ Design doc architecture diagram aligned with locked decision Fix browser panel layout with console open #5 (Hono on Bun.serve / Vite + React).
  • ✅ README architecture fence got a text language tag (markdownlint MD040).
  • project-info.ts detectKind() throws on inputs that aren't .xcodeproj or .xcworkspace.
  • xcodebuild.ts removed the synthesized appPath from BuildResult — the run composite already calls resolveAppPath() for the real path.
  • TopBar.tsx onBoot surfaces failures as a failed tool-event (toast).
  • sim-store.ts setPinned only commits the pinned UDID change on res.success, otherwise sets error.
  • styles.css added --on-accent / --stage-bg / --frame-bg / --canvas-bg / --shadow-* tokens; replaced every literal hex/rgba/white in the stylesheet body. (Did not extract literals that only appear once.)
  • styles.css deprecated word-break: break-word → modern overflow-wrap: anywhere.

Side effect

  • scripts/{touch,longpress,ws}-smoke.ts — added export {} so tsc treats them as modules (they were colliding on top-level names like frame() and PORT).

Verified

  • bun test 86/86 pass
  • bun run typecheck clean
  • bun scripts/e2e-server.ts PASS — 17 tool-events, 97 tool-log frames, 0 failures against a real iPhone simulator.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
packages/device-use/src/server/tools.ts (1)

159-172: ⚠️ Potential issue | 🟠 Major

set_active_project silently wipes scheme and configuration when called with only a path.

This was flagged in a previous review and remains unaddressed. When params contains only path, storing it wholesale via ctx.state.update({ project: params }) replaces the entire project object, losing any previously set scheme or configuration. The frontend's TopBar blur handler can trigger this, breaking the "▶ Run" button.

🐛 Proposed fix — merge with previous project state
 const set_active_project = tool({
   name: "set_active_project",
   description:
     "Pin the active Xcode project (path + scheme) for this workspace. Used by build/run when params are omitted.",
   schema: z.object({
     path: z.string(),
     scheme: z.string().optional(),
     configuration: z.string().optional(),
   }),
   handler: async (ctx, params) => {
-    await ctx.state.update({ project: params });
-    return { ok: true, project: params };
+    const prev = ctx.state.get().project ?? {};
+    const merged = { ...prev, ...params };
+    await ctx.state.update({ project: merged });
+    return { ok: true, project: merged };
   },
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/server/tools.ts` around lines 159 - 172, The handler
for set_active_project currently overwrites the entire project state with params
which causes scheme and configuration to be wiped when only path is provided;
change the handler to read the existing project from ctx.state (e.g.,
ctx.state.project or via ctx.state.get), shallow-merge the existing project with
the incoming params so unspecified fields are preserved, and then call
ctx.state.update({ project: mergedProject }) instead of ctx.state.update({
project: params }); ensure you reference set_active_project and ctx.state.update
when making the change.
docs/device-use-v2-design.md (1)

183-184: ⚠️ Potential issue | 🟡 Minor

Phase 1 still references the rejected TanStack Start + Nitro stack.

Lines 183-184 describe adding "TanStack Start" and creating a "minimal Nitro server", but decision #5 (line 23) documents that the team refined away from TanStack Start on Nitro during Phase 1 scaffolding itself, ultimately choosing "Hono on Bun.serve" for the backend. The Phase 1 description should reflect the final chosen stack, not the initial exploration.

📝 Suggested fix
 ### Phase 1 — Scaffold the new server + frontend (~1 day)
 
-- Add TanStack Start to `packages/device-use` (new deps, tsconfig adjustments).
-- `src/server/` — minimal Nitro server with `/health` route.
+- Add Hono + Vite + TanStack Router to `packages/device-use` (new deps, tsconfig adjustments).
+- `src/server/` — minimal Hono on Bun.serve with `/health` route.
 - `src/frontend/` — minimal React app served at `/`.
 - `bun run dev` / `bun run build` / `bun run start` wire-up.
 - **Exit criterion:** `bun run dev` → open `localhost:3100/` → "device-use v2" heading renders; `GET /health` returns 200.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/device-use-v2-design.md` around lines 183 - 184, Update the Phase 1
description that still references "TanStack Start" and "minimal Nitro server":
replace the bullet "- Add TanStack Start to `packages/device-use` (new deps,
tsconfig adjustments)." with a note about adding the chosen Hono stack (e.g.,
"Add Hono on Bun.serve to `packages/device-use`") and change "- `src/server/` —
minimal Nitro server with `/health` route." to reflect the final implementation
(e.g., "src/server/ — minimal Hono server on Bun.serve with `/health` route").
Ensure the Phase 1 heading/text that mentions "Phase 1" and those two bullets
are updated so the document accurately states the final backend choice ("Hono on
Bun.serve") instead of "TanStack Start" or "Nitro".
packages/device-use/src/frontend/stores/sim-store.ts (1)

64-70: ⚠️ Potential issue | 🟡 Minor

Handle thrown API errors in setPinned as well as non-success responses.

Line 65 is only guarded for res.success === false; if api.setActiveSimulator() throws, error state is not set and callers see a rejected promise path.

💡 Proposed fix
   setPinned: async (udid) => {
-    const res = await api.setActiveSimulator(udid);
-    if (!res.success) {
-      set({ error: res.error ?? "failed to pin simulator" });
-      return;
-    }
-    set({ pinnedUdid: udid, error: null });
+    try {
+      const res = await api.setActiveSimulator(udid);
+      if (!res.success) {
+        set({ error: res.error ?? "failed to pin simulator" });
+        return;
+      }
+      set({ pinnedUdid: udid, error: null });
+    } catch (err) {
+      set({ error: err instanceof Error ? err.message : "failed to pin simulator" });
+    }
   },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/src/frontend/stores/sim-store.ts` around lines 64 - 70,
The setPinned async action (setPinned) only handles non-success responses from
api.setActiveSimulator but not exceptions; wrap the api.setActiveSimulator(udid)
call in a try/catch (or attach .catch) inside setPinned, set the store's error
via set({ error: <message> }) when an exception occurs, and ensure the function
returns after setting error (so callers do not see an unhandled rejection); keep
the existing success-path (set({ pinnedUdid: udid, error: null })) intact and
include the thrown error message (or a fallback string) in the error state.
🧹 Nitpick comments (3)
packages/device-use/test/xcodebuild.test.ts (1)

32-37: Remove unused STARTED array.

The STARTED array is declared and cleared in afterEach but never populated or read anywhere in the test file.

🧹 Suggested cleanup
-const STARTED: FakeChild[] = [];
-
-afterEach(() => {
-  // Guard: don't leave fake children emitting after a test ends.
-  STARTED.length = 0;
-});
-
 describe("build", () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/test/xcodebuild.test.ts` around lines 32 - 37, Remove the
unused STARTED array and its cleanup: delete the const STARTED: FakeChild[] =
[]; declaration and remove the afterEach block that only resets STARTED.length,
ensuring no other code refers to STARTED (search for STARTED, FakeChild or that
afterEach helper) before committing the change in xcodebuild.test.ts.
packages/device-use/scripts/e2e-server.ts (1)

213-226: Consider deriving expected tool count dynamically.

The hardcoded 24 for tool count validation will require manual updates whenever tools are added or removed. Consider importing TOOLS.length from the tools module if the e2e script can safely depend on server internals, or document this as an intentional snapshot assertion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/device-use/scripts/e2e-server.ts` around lines 213 - 226, The checks
use a hardcoded expected value "24" for tool counts (see toolCount check and
mcp("tools/list") assertion), which will break when tools change; update both
comparisons to derive the expected count from the actual tools definition
(import TOOLS and use TOOLS.length) and compare against that value instead of
24, or if this is intentionally a snapshot test, add a clear comment explaining
it's intentional; ensure the import of TOOLS is added at the top and replace the
two literal 24s used with the dynamic expectedCount when calling fail and
logging via mcp and toolCount checks (referencing toolCount, mcp("tools/list"),
listRes.tools.length, fail, and log).
docs/device-use-v2-design.md (1)

39-39: Consider adding a language specifier to the code block.

The fenced code block at line 39 lacks a language specifier. While the ASCII art diagram renders correctly, adding text or another appropriate specifier would satisfy markdown linters and be consistent with best practices.

📝 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 │  packages/device-use/                                       │
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/device-use-v2-design.md` at line 39, The fenced code block in
docs/device-use-v2-design.md (the ASCII art diagram starting at the code fence
around line 39) lacks a language specifier; update that opening triple-backtick
to include a language such as text (e.g., ```text) so markdown linters accept it
and the rendering remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/device-use-v2-design.md`:
- Line 168: Update the dev/prod modes description so it reflects the actual
chosen stack (Hono on Bun.serve for the backend) instead of saying "Standard
TanStack Start"; modify the sentence that currently reads "`bun run dev` = Vite
HMR + `tsx watch` for server. `bun run start` = bundled frontend + compiled
server. Standard TanStack Start." to explicitly state that `bun run dev` uses
Vite HMR with a Bun/Hono dev server (e.g., `tsx watch` driving Hono on
Bun.serve) and that `bun run start` runs the bundled frontend with the
production Bun/Hono server; ensure the phrase "Standard TanStack Start" is
removed or replaced and that this aligns with decision `#5` which switched to Hono
on Bun.serve.

In `@packages/device-use/src/frontend/stores/sim-store.ts`:
- Around line 15-19: The current sameStream function only compares udid and port
which can let url/size changes be ignored; update sameStream (and the other
occurrence noted) to include url and size in the comparison (or perform a
shallow/deep equality of StreamInfo objects) so any change to url or size will
be detected and state will update; locate the sameStream function and replace
its equality check to compare udid, port, url, and size (or use a reliable
object-equality helper) to prevent stale viewer metadata.
- Around line 22-30: The equality function sameSims currently compares udid,
state, and name but omits runtime, causing missed updates; update sameSims (and
the other occurrence mentioned) to also compare x.runtime and y.runtime (e.g.,
add a check like x.runtime !== y.runtime) so any change to the simulator runtime
triggers inequality and subscriber updates.
- Around line 40-71: The refresh() and setPinned() handlers in sim-store.ts are
performing server fetches and mutations directly; rework them to use TanStack
Query v5 instead: create queries for device list, state and stream (e.g.,
useQuery keys like ["devices"], ["simState"], ["stream"]) and a mutation for
setActiveSimulator via useMutation, then in sim-store's refresh and setPinned
either read from/queryClient (queryClient.getQueryData or invalidate/refetch
queries) or call mutation.mutateAsync and onSuccess invalidate/refetch the
relevant queries; preserve existing identity logic by deriving
sims/pinnedUdid/streamInfo from the query data and continue using
sameSims/sameStream to avoid unnecessary replacements. Ensure setPinned uses the
mutation's onSuccess to update/refresh the ["devices"] and ["simState"] queries
and set errors via the store only from query/mutation results rather than
calling api directly.

---

Duplicate comments:
In `@docs/device-use-v2-design.md`:
- Around line 183-184: Update the Phase 1 description that still references
"TanStack Start" and "minimal Nitro server": replace the bullet "- Add TanStack
Start to `packages/device-use` (new deps, tsconfig adjustments)." with a note
about adding the chosen Hono stack (e.g., "Add Hono on Bun.serve to
`packages/device-use`") and change "- `src/server/` — minimal Nitro server with
`/health` route." to reflect the final implementation (e.g., "src/server/ —
minimal Hono server on Bun.serve with `/health` route"). Ensure the Phase 1
heading/text that mentions "Phase 1" and those two bullets are updated so the
document accurately states the final backend choice ("Hono on Bun.serve")
instead of "TanStack Start" or "Nitro".

In `@packages/device-use/src/frontend/stores/sim-store.ts`:
- Around line 64-70: The setPinned async action (setPinned) only handles
non-success responses from api.setActiveSimulator but not exceptions; wrap the
api.setActiveSimulator(udid) call in a try/catch (or attach .catch) inside
setPinned, set the store's error via set({ error: <message> }) when an exception
occurs, and ensure the function returns after setting error (so callers do not
see an unhandled rejection); keep the existing success-path (set({ pinnedUdid:
udid, error: null })) intact and include the thrown error message (or a fallback
string) in the error state.

In `@packages/device-use/src/server/tools.ts`:
- Around line 159-172: The handler for set_active_project currently overwrites
the entire project state with params which causes scheme and configuration to be
wiped when only path is provided; change the handler to read the existing
project from ctx.state (e.g., ctx.state.project or via ctx.state.get),
shallow-merge the existing project with the incoming params so unspecified
fields are preserved, and then call ctx.state.update({ project: mergedProject })
instead of ctx.state.update({ project: params }); ensure you reference
set_active_project and ctx.state.update when making the change.

---

Nitpick comments:
In `@docs/device-use-v2-design.md`:
- Line 39: The fenced code block in docs/device-use-v2-design.md (the ASCII art
diagram starting at the code fence around line 39) lacks a language specifier;
update that opening triple-backtick to include a language such as text (e.g.,
```text) so markdown linters accept it and the rendering remains unchanged.

In `@packages/device-use/scripts/e2e-server.ts`:
- Around line 213-226: The checks use a hardcoded expected value "24" for tool
counts (see toolCount check and mcp("tools/list") assertion), which will break
when tools change; update both comparisons to derive the expected count from the
actual tools definition (import TOOLS and use TOOLS.length) and compare against
that value instead of 24, or if this is intentionally a snapshot test, add a
clear comment explaining it's intentional; ensure the import of TOOLS is added
at the top and replace the two literal 24s used with the dynamic expectedCount
when calling fail and logging via mcp and toolCount checks (referencing
toolCount, mcp("tools/list"), listRes.tools.length, fail, and log).

In `@packages/device-use/test/xcodebuild.test.ts`:
- Around line 32-37: Remove the unused STARTED array and its cleanup: delete the
const STARTED: FakeChild[] = []; declaration and remove the afterEach block that
only resets STARTED.length, ensuring no other code refers to STARTED (search for
STARTED, FakeChild or that afterEach helper) before committing the change in
xcodebuild.test.ts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 419393ca-8ab7-47b2-82b5-526e7235cd2d

📥 Commits

Reviewing files that changed from the base of the PR and between 1d9ed9e and f14d701.

📒 Files selected for processing (18)
  • docs/device-use-v2-design.md
  • packages/device-use/README.md
  • packages/device-use/agentic-app.json
  • packages/device-use/package.json
  • packages/device-use/scripts/e2e-server.ts
  • packages/device-use/scripts/longpress-smoke.ts
  • packages/device-use/scripts/touch-smoke.ts
  • packages/device-use/scripts/ws-smoke.ts
  • packages/device-use/src/cli/commands/serve.ts
  • packages/device-use/src/engine/project-info.ts
  • packages/device-use/src/engine/xcodebuild.ts
  • packages/device-use/src/frontend/components/Sidebar.tsx
  • packages/device-use/src/frontend/components/TopBar.tsx
  • packages/device-use/src/frontend/stores/sim-store.ts
  • packages/device-use/src/frontend/styles.css
  • packages/device-use/src/server/state.ts
  • packages/device-use/src/server/tools.ts
  • packages/device-use/test/xcodebuild.test.ts
✅ Files skipped from review due to trivial changes (3)
  • packages/device-use/README.md
  • packages/device-use/scripts/touch-smoke.ts
  • packages/device-use/agentic-app.json
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/device-use/src/frontend/components/Sidebar.tsx
  • packages/device-use/src/frontend/components/TopBar.tsx
  • packages/device-use/src/cli/commands/serve.ts
  • packages/device-use/src/engine/project-info.ts
  • packages/device-use/src/server/state.ts
  • packages/device-use/package.json
  • packages/device-use/src/engine/xcodebuild.ts

Comment thread docs/device-use-v2-design.md Outdated
Comment thread packages/device-use/src/frontend/stores/sim-store.ts
Comment thread packages/device-use/src/frontend/stores/sim-store.ts
Comment thread packages/device-use/src/frontend/stores/sim-store.ts
zvadaadam and others added 2 commits April 18, 2026 11:06
The zustand dep was added to packages/device-use/package.json in
f14d701 but the workspace lockfile entry wasn't captured. This is
the missing one-line lockfile sync — `bun install --frozen-lockfile`
was already passing in CI because the root entry was up-to-date,
but the workspace block needed the addition for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- design-doc#168 — removed stale "TanStack Start" reference from the
  dev/prod modes row; replaced with the actual setup (Hono server with
  --hot, separate Vite dev server, prod = bundled frontend + same
  Hono server in NODE_ENV=production).
- sim-store#19 (sameStream) — extended comparison beyond udid + port
  to include url and structural size comparison (pxW/pxH/ptW/ptH).
  Stream metadata changes (e.g. orientation rotation altering size
  while UDID/port stay fixed) now correctly notify subscribers.
- sim-store#30 (sameSims) — added runtime to the per-simulator
  equality check. Runtime upgrades now propagate.

Disagreed with one suggestion (TanStack Query refactor for sim-store
fetches). Reasoning in the round summary on the PR — short version:
device-use is a standalone package whose frontend deliberately runs
on Zustand only, has no TanStack Query dependency today, and the
guideline cited applies to apps/web (Deus's main UI), not to
packaged sub-products. Adding TanStack Query for two REST reads
would inflate the bundle and introduce a paradigm split between the
WS event subscription and the polled data, with no real benefit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 2

CodeRabbit posted 4 inline comments after the round-1 push. Triaged: 3 fixed, 1 disagreed.

Fixed (3) in 248d2810

  • ✅ Design doc — stale "TanStack Start" reference in dev/prod modes row replaced with the real setup (Hono + --hot, separate Vite, prod = Hono in NODE_ENV=production).
  • sameStream extended to compare udid + port + url + structural size (pxW/pxH/ptW/ptH). Catches orientation rotation that changes size while UDID/port stay fixed.
  • sameSims extended to compare runtime per-entry. Same fix covers the second reference at line 54 (function is shared).

Disagreed (1) — TanStack Query refactor for sim-store
Per the inline reply on the comment: device-use is a standalone package (Zustand only by design, no TanStack Query dep today, frontend bundle stays small). The CLAUDE.md guideline cited applies to apps/web/ (Deus's main UI), not to packaged sub-products. Two REST reads + a trivial single-POST mutation don't earn the cost of adding TS-Query and forking the mental model between WS-event state (Zustand) and polled state (TS-Query). Will revisit if device-use grows real list-view caching needs.

Verified

  • 86/86 tests pass
  • typecheck clean

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/device-use-v2-design.md (1)

183-185: ⚠️ Potential issue | 🟡 Minor

Phase 1 still references deprecated stack choices.

This section says “Add TanStack Start” and “minimal Nitro server,” which conflicts with locked decision #5 and the rest of this doc’s Hono-on-Bun.serve direction. Please update this phase text to Hono + Vite terminology for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/device-use-v2-design.md` around lines 183 - 185, Update Phase 1 wording
to match the locked decision and the rest of the doc: replace "Add TanStack
Start" with a Hono + Vite phrasing (e.g., "Add Hono + Vite to
`packages/device-use`" including new deps and tsconfig adjustments), change
"`src/server/` — minimal Nitro server with `/health` route" to "`src/server/` —
minimal Hono-on-Bun.serve server with `/health` route", and change
"`src/frontend/` — minimal React app served at `/`" to explicitly state the
React app is served via Vite (e.g., "minimal React app served via Vite at `/`");
keep file/dir references `packages/device-use`, `src/server`, and `src/frontend`
so reviewers can find the updated lines.
🧹 Nitpick comments (1)
docs/device-use-v2-design.md (1)

39-39: Add a language to the fenced code block.

The architecture diagram block should include a language tag (for markdownlint MD040), e.g. text.

Suggested doc fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 ...
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/device-use-v2-design.md` at line 39, The fenced code block containing
the architecture diagram is missing a language tag (causing markdownlint MD040);
update the opening fence (the triple backticks that start the diagram block) to
include a language tag such as text (e.g., change ``` to ```text) so the diagram
block is fenced with a language identifier and leave the closing ``` unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/device-use-v2-design.md`:
- Line 318: The docs currently reference `launch.ready.http`, which conflicts
with the manifest shape shown earlier that uses `launch.ready` with `type:
"http"`; update the sentence to use the manifest field name consistently (e.g.,
`launch.ready` and indicate `type: "http"` where relevant) so it does not imply
a different schema path, and ensure related occurrences refer to `launch.ready`
rather than `launch.ready.http`.

---

Duplicate comments:
In `@docs/device-use-v2-design.md`:
- Around line 183-185: Update Phase 1 wording to match the locked decision and
the rest of the doc: replace "Add TanStack Start" with a Hono + Vite phrasing
(e.g., "Add Hono + Vite to `packages/device-use`" including new deps and
tsconfig adjustments), change "`src/server/` — minimal Nitro server with
`/health` route" to "`src/server/` — minimal Hono-on-Bun.serve server with
`/health` route", and change "`src/frontend/` — minimal React app served at `/`"
to explicitly state the React app is served via Vite (e.g., "minimal React app
served via Vite at `/`"); keep file/dir references `packages/device-use`,
`src/server`, and `src/frontend` so reviewers can find the updated lines.

---

Nitpick comments:
In `@docs/device-use-v2-design.md`:
- Line 39: The fenced code block containing the architecture diagram is missing
a language tag (causing markdownlint MD040); update the opening fence (the
triple backticks that start the diagram block) to include a language tag such as
text (e.g., change ``` to ```text) so the diagram block is fenced with a
language identifier and leave the closing ``` unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5cea110d-dfb8-412c-bd89-7f2ec910ed1f

📥 Commits

Reviewing files that changed from the base of the PR and between f14d701 and 248d281.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • docs/device-use-v2-design.md
  • packages/device-use/src/frontend/stores/sim-store.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/device-use/src/frontend/stores/sim-store.ts

Comment thread docs/device-use-v2-design.md Outdated
CodeRabbit flagged that `launch.ready.http` was an inconsistent path
shorthand — the actual manifest schema is `launch.ready` with
`{type: "http", path: "/health"}`. Reworded to use the correct
shape so future readers don't assume a `launch.ready.http` key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 3

CodeRabbit posted 1 minor doc-wording nit. Fixed in 92c6ff1e — reworded launch.ready.http shorthand to the actual schema shape (launch.ready with {type: "http", path: "/health"}).

No other findings. Ready to merge once CI on the new SHA confirms green.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/device-use-v2-design.md`:
- Line 85: Update the `/ws` bullet so it does not claim standalone CLI commands
are visible live; change it to state that `/ws` exposes bidirectional events for
subscribed frontends and live updates only reflect actions routed through the
server/viewer (standalone CLI actions do not populate server/viewer activity).
Ensure the wording references `/ws`, "CLI commands", and "server/viewer
activity" so readers understand the limitation already noted elsewhere.
- Line 30: The doc incorrectly shows the storage path as
`{storage.workspace}/.device-use/state.json`; update the text to reflect how
StateStore and DEUS_STORAGE actually behave by documenting the file as
`{storage.workspace}/state.json` (since DEUS_STORAGE is set to
`{workspace}/.device-use` and StateStore writes to `{storageDir}/state.json`),
and apply the same correction to the other occurrences noted (lines referenced
203 and 320); mention the relevant symbols StateStore, storageDir, and
DEUS_STORAGE so readers can map env var to actual file path.
- Around line 242-243: Update the out-of-scope note that incorrectly claims
agentic-app.json is separate: change the wording around the "AAP protocol itself
(`agentic-app.json`, registry, lifecycle in Deus backend)" to state that the
manifest file agentic-app.json is included in this PR (manifest included) while
protocol/host integration (registry and lifecycle handling in the Deus backend)
remains deferred to a follow-up; apply the same rewording to the repeated
occurrence referenced at line 253 so both locations reflect "manifest included,
protocol/host integration deferred."
- Around line 183-185: Update the stale Phase 1 stack text that references
"TanStack Start" and "Nitro" to reflect the locked stack: replace those list
items so they read that `packages/device-use` uses Hono on Bun.serve for the
server and a Vite-served frontend (update the bullets mentioning `src/server/`
and `src/frontend/` accordingly), and ensure any mentions of tsconfig/deps
reference the actual deps/config needed for Hono+Bun and Vite.
- Line 39: The fenced architecture block uses a plain code fence (```) without a
language tag; update the opening fence to include a language identifier (for
example change ``` to ```text) so the block becomes a text-fenced code block and
satisfies markdownlint MD040 for the architecture diagram in the document.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: daa94643-9771-460d-ac9a-627e72b8d0cd

📥 Commits

Reviewing files that changed from the base of the PR and between 248d281 and 92c6ff1.

📒 Files selected for processing (1)
  • docs/device-use-v2-design.md

Comment thread docs/device-use-v2-design.md Outdated
Comment thread docs/device-use-v2-design.md Outdated
Comment thread docs/device-use-v2-design.md Outdated
Comment thread docs/device-use-v2-design.md Outdated
Comment thread docs/device-use-v2-design.md Outdated
All 5 are doc-only fixes in docs/device-use-v2-design.md:

- Storage path duplication — `state.json` lives at
  `{storage.workspace}/state.json`, not `{storage.workspace}/.device-use/state.json`
  (the `.device-use` segment is already inside `storage.workspace`).
  Fixed in 3 places (table row #12, backend wiring section).

- Architecture fence missing language — added `text` (markdownlint MD040).

- /ws bullet overstated CLI visibility — clarified that only actions
  routed through the server (REST, MCP, viewer) populate the activity
  stream; standalone CLI invocations bypass it (peer model).

- Phase 1 stack text was stale — said "Add TanStack Start" + "Nitro"
  but the locked stack (decision #5) is Hono on Bun.serve + Vite.
  Bullet rewritten to match what was actually built.

- Out-of-scope note about agentic-app.json was outdated — the manifest
  IS in this PR; the next PR is the Deus host that consumes it. Reworded
  + retitled the manifest section to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 4

CodeRabbit posted 5 inline doc-only nits in docs/device-use-v2-design.md. All addressed in fa9fe358.

  • ✅ Storage path duplication ({storage.workspace}/.device-use/state.json{storage.workspace}/state.json) — fixed in 3 places.
  • ✅ Architecture fence missing text language tag.
  • /ws bullet clarified — only server-routed actions populate the activity stream; standalone CLI invocations bypass it (peer model).
  • ✅ Phase 1 stack text rewritten — was 'Add TanStack Start / Nitro' (the original plan), now matches the locked stack (Hono on Bun.serve + Vite).
  • ✅ Out-of-scope note clarified — agentic-app.json IS in this PR; only the Deus host integration that consumes it is deferred.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/device-use-v2-design.md`:
- Around line 270-271: Docs show manifest examples using "localhost" for the
"ui" url and "agent.tools.url"; update those example values to use the IPv4-safe
literal "127.0.0.1" instead of "localhost" so they match the PR's manifest fix
and avoid IPv4/IPv6 ambiguity (replace the URL values referenced by the "ui" key
and by "agent" → "tools" → "url").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fe59951-a281-459f-8428-2ac2460e52f4

📥 Commits

Reviewing files that changed from the base of the PR and between 92c6ff1 and fa9fe35.

📒 Files selected for processing (1)
  • docs/device-use-v2-design.md

Comment thread docs/device-use-v2-design.md Outdated
CodeRabbit noticed the design-doc manifest example still showed
\`localhost\` URLs while the actual \`packages/device-use/agentic-app.json\`
uses \`127.0.0.1\` (the IPv4-safe form chosen during the round-1 fix
to avoid IPv4/IPv6 happy-eyeballs ambiguity). Aligned the example so
copy/paste from the doc doesn't regress the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 5

CodeRabbit posted 1 doc-consistency nit. Fixed in HEAD — aligned the design-doc manifest example URLs (localhost127.0.0.1) so copy/paste from the doc doesn't regress the round-1 IPv4/IPv6 fix.

zvadaadam and others added 8 commits April 18, 2026 11:27
The viewer now uses the same color/text/radius/font system that
apps/web/src/global.css defines for the Deus IDE itself, so embedding
device-use as an AAP app feels native instead of looking like a
separate skin.

Tokens mirrored 1:1 from Deus's dark theme:

- Backgrounds — six-tier hierarchy (--bg-base #0b0b0b → --bg-surface
  → --bg-elevated → --bg-raised → --bg-overlay → --bg-muted #2a2a2a).
  Each step adds perceptible depth without needing borders.
- Text — five-level OKLCH grayscale (--text-primary 0.86 → 0.32),
  generous gaps between levels.
- Accent — `oklch(0.78 0.09 345)` (Deus's "cool rose" primary), not
  the previous purple. Used for the run button, status dot glow,
  ref ids, and ripples.
- Semantic signals — equiluminant OKLCH (success / warning /
  destructive all at L≈0.65-0.68), matches Deus exactly.
- Borders — `color-mix(in oklch, var(--foreground) 5%, transparent)`,
  semi-transparent so they adapt to any surface.
- Radius scale — 2 / 4 / 6 / 8 / 10 / 12 / 16 / 20 / 24 px.
  Device frame uses --radius-4xl (24px), canvas inside uses
  --radius-3xl (20px), buttons --radius-md/lg.
- Typography — 13px base (matches Deus's dense IDE scale), system
  -apple-system / SF Pro Text + SF Mono, negative letter-spacing
  throughout (-0.005em / -0.015em on brand) for the "premium texture"
  layer.
- Custom scrollbars themed to bg-raised → bg-overlay on hover.

JSX inline styles updated to reference the new tokens
(--accent-2 → --primary, --danger → --destructive, --dim →
--text-muted-foreground).

Class names unchanged — same DOM structure, just new visual tokens
underneath. Verified:
  - 86/86 tests pass
  - typecheck clean
  - vite build: index-CKuhJYDJ.css (9.62 KB), index-BOvIo28T.js
  - bun scripts/e2e-server.ts PASS — 18 tool-events, 97 tool-log
    frames, 0 failures

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The logs pane was eating 220px at the bottom even when empty, which
is most of the time. Now it's just a ~32px header row by default; the
phone + sidebar get the extra vertical space back.

- Click the header row to toggle expand/collapse.
- Chevron flips ▸ / ▾ to signal state.
- Auto-expands on any new log line — if you triggered a build or a
  log stream, the drawer pops open for you. Collapsing is always
  manual. Auto-expand uses a direct Zustand subscription (not a
  useEffect setState) to satisfy react-hooks/set-state-in-effect.
- Clear button hidden when collapsed (no affordance without content).
- Grid row for logs changed from fixed 220px to `auto`; the
  `.expanded` class pins the body at 220px when open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Native <select> popups use OS chrome (white on macOS), breaking the
dark Deus theme. Swapped for @radix-ui/react-select wrapped in a
small themed component — same primitive Deus's apps/web Select uses,
just styled with plain CSS against our token system (no Tailwind).

What's new:
- src/frontend/components/Select.tsx — minimal wrapper exposing a
  plain {value, onValueChange, options, placeholder} API. Options
  can be any ReactNode so the sim picker can render a state dot + a
  dim runtime suffix inline.
- Styled trigger matches the other topbar inputs (bg-elevated, 1px
  border, 6px radius, hover brightens to bg-raised, focus-visible
  rings with --ring).
- Portaled popover uses --popover / --border / --shadow-elevated,
  radius-xl; items have 6/10 padding, left checkmark indicator in
  --primary, data-highlighted lifts to bg-elevated.
- Chevron rotates 180° on open (data-state="open").
- Smooth 180ms in / 150ms out with cubic-bezier matching apps/web.

TopBar wiring:
- Sim picker renders each option as `[dot] Name · runtime`, with the
  dot green+glow when that sim is Booted.
- Scheme picker renders as a plain list.
- Empty "no simulators" case stays inline text (no dropdown needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without explicit cache directives, browsers heuristically cache the
HTML shell and keep pointing at old Vite-hashed asset URLs across
deploys — user refreshes, sees a stale bundle. This was hitting the
dropdown-themed commit: the new bundle was built + served but the
browser's cached HTML still referenced the pre-Radix build.

Fix in the prod static-serving branch:
- /assets/* → `Cache-Control: public, max-age=31536000, immutable`
  (Vite fingerprints these, so the URL changes whenever the content
  does — safe to cache forever).
- Everything else (including the HTML shell) → `Cache-Control: no-cache`
  (revalidate every load so new bundle URLs are picked up immediately).

Verified via `curl -D -`. HTML now sends `no-cache`, asset sends
`public, max-age=31536000, immutable`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old frame was just a rounded rect with 12px padding and 40px
radius — didn't match iPhone proportions, had no dynamic island,
no camera hole, and no accurate bezel thickness. Ported the
GenericDeviceFrame shape from
apps/web/src/features/simulator/ui/DeviceFrame.tsx exactly:

- Aspect ratio per device class (iPhone 430:932, iPad 834:1194);
  the frame scales to fit the stage height while preserving ratio.
- Shell radius 3.25rem (iPhone) / 2.75rem (iPad), screen radius
  2.6rem / 2.1rem — the two-layer corner system.
- Percentage-based screen insets (iPhone 1.7% top / 2.5% sides /
  1.9% bottom) matching Apple's actual bezel thickness.
- Dynamic island: 38% × 4.1%, positioned at 2.35% top, bg-base
  rounded-full. Shows for iPhone only.
- Camera dot: 10×10, positioned at 1.45% top with a subtle ring.
  Shows for iPad only.
- Inner bezel ring at inset 0.6% — catches light, hints at edge
  thickness without adding chrome.
- iPhone vs iPad detection from pinned sim's name (same heuristic
  apps/web uses).

We don't bundle the Apple-derived bezel assets (apps/web loads them
from a generated manifest via scripts/prepare-device-chrome.mjs —
needs Xcode + licensing considerations for packaged device-use);
the generic shell is the fallback Deus uses anyway when the manifest
isn't present, so visual parity is the same as Deus's default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ady has them

The simulator's MJPEG stream captures iOS's full screen, including the
native status bar with its own dynamic island (iPhone) or camera
cutout (iPad). My overlay versions drew a second pill on top at
slightly different sizes/positions, creating the visible "two things
stacked" artifact the user flagged.

Fix: remove the overlay divs + their CSS. The stream shows the real
ones rendered by iOS at the right size/position.

Why Deus's apps/web draws them anyway: their code path assumes the
captured stream excludes the status bar (that is the case when they
eventually swap in the Apple-derived bezel assets, which crop it).
For the generic-shell + full-screen MJPEG case (what we have), the
overlays are redundant.

Kept: shell background, proper aspect ratio, shell/screen radii,
percentage screen insets, subtle inner ring. Those still make the
frame feel intentional without faking what's already in the stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… fits exactly

The hardcoded 430/932 shell AR only matches the base iPhone 16 Pro.
For iPhone Air (420×912) or iPhone 17 Pro Max (440×956), the stream
either got stretched/squished by the canvas filling the box, or left
a black gap at the top/bottom because the inner screen box didn't
match the stream's aspect ratio.

Fix: compute the shell aspect ratio from streamInfo.size.ptW/ptH
so that, after subtracting the percentage bezel insets, the inner
screen box has exactly the same ratio as the stream. The canvas then
fills the screen cleanly with no object-fit letterboxing needed.

Math:
  widthFrac  = 1 − (left% + right%) / 100
  heightFrac = 1 − (top%  + bottom%) / 100
  screenAR   = ptW / ptH                     (from streamInfo.size)
  shellAR    = screenAR × widthFrac / heightFrac

Bezel insets (1.7/2.5/2.5/1.9 % for iPhone, 2.1 all-around for iPad)
are unchanged. Fallback to the previous hardcoded ratios when we
don't have streamInfo yet (empty-state placeholder). Dropped the
canvas `object-fit: contain` since there's nothing to contain now —
the box matches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract ElementsPanel and ActivityPanel as discrete components under
components/sidebar/, share a Panel wrapper for chrome, and add a
collapsible shell with a Cmd+B toggle. Sidebar state persists via a new
ui-store (localStorage). Collapsed state shows a thin rail button on
the right edge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Sidebar refactor — collapsible + componentized

Commit: a5015691

What changed

  • New stores/ui-store.ts — Zustand store for UI state (sidebarCollapsed), persisted to localStorage under device-use.ui.v1.
  • New components/sidebar/Panel.tsx — shared wrapper with title, action, flex, children.
  • New components/sidebar/ElementsPanel.tsx + ActivityPanel.tsx — each panel is now its own file with its own store subscription.
  • Refactored Sidebar.tsx — thin composition layer; handles collapsed state + Cmd+B / Ctrl+B toggle.
  • App.tsx applies .sidebar-collapsed on .app so the grid collapses the right column.
  • styles.css — renamed .section*.sidebar-panel*, added .sidebar-rail (thin edge button when collapsed), .sidebar-chrome + .sidebar-collapse (chevron), 180ms ease on the grid transition.

UX

  • Collapsed by default? No — stays expanded until user toggles.
  • Toggle via chevron on the sidebar header, Cmd+B, or the edge rail when collapsed.
  • State persists across reloads.

Verified: bun run typecheck clean, eslint src/frontend clean, bun run build:frontend builds. Server restarted in prod mode; new bundle (index-D1G2j9a6.js / index-BG7noG1n.css) served.

Working design doc covering (a) hyperframes investigation as a
candidate second AAP app and (b) AAP v1 protocol revisions
surfaced by that investigation. Not implemented yet —
informs upcoming PR sequence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam zvadaadam merged commit 8538df3 into main Apr 18, 2026
4 checks passed
zvadaadam added a commit that referenced this pull request May 31, 2026
17 inline comments from the bot — addressing all of them. Brief
mapping (file → comment id):

CRITICAL
  package.json#62           Add missing zustand dep (was hoisted to root, would
                            fail bun install --frozen-lockfile in CI).

MAJOR
  agentic-app.json#33       UI/MCP URLs use 127.0.0.1 instead of localhost so
                            host IDEs don't trip over IPv4/IPv6 happy-eyeballs
                            ambiguity. serve.ts also defaults --host to 0.0.0.0
                            (matches the server's own default), and the printed
                            probe URL collapses 0.0.0.0 → 127.0.0.1.
  serve.ts#33               Compiled-layout candidate path was off-by-one
                            (..server vs ../..server). Fixed.
  serve.ts#65               Browser only opens after /health responds 200; on
                            startup failure, no dead tab.
  serve.ts#72               Propagate child exit code; non-zero exits return
                            {success:false} with the code.
  e2e-server.ts#173         ChildProcess.killed is set when the signal is sent,
                            not when the process exits — using it as a kill-fallback
                            check would race a SIGTERM-ignoring child. Switched
                            to a tracked `exited` flag + clearTimeout on exit.
  Sidebar.tsx#17            Refresh refs whenever streamUdid changes (sim swap),
                            not only when refs.length === 0. Stops stale refs
                            from a previous sim from lingering.
  styles.css#85             Added --on-accent / --stage-bg / --frame-bg /
                            --canvas-bg / --shadow-* tokens; removed every
                            literal `white`, `#06060a`, `#1a1a22`, `#000`, and
                            `rgba(...)` from the stylesheet body.
  state.ts#72               persist() now catches mkdir/writeFile failures
                            inside the chained promise so a single transient
                            error can't permanently brick the writeChain.
  tools.ts#562              stream_logs wires onError + onExit to delete the
                            handle from logHandles, so failed/exited streams
                            don't leak.

MINOR
  design-doc#57,#65         Architecture diagram said "TanStack Start on Nitro"
                            but the locked stack (decision #5) is "Hono on
                            Bun.serve" / "Vite + React". Aligned.
  README.md#88              Added `text` language tag to the architecture fenced
                            code block (markdownlint MD040).
  project-info.ts#43        detectKind() throws XcodebuildError on inputs that
                            aren't .xcodeproj or .xcworkspace, instead of
                            silently treating them as .xcodeproj.
  xcodebuild.ts#159         Removed the synthesized appPath from BuildResult —
                            it was a guess based on `scheme.app` that's wrong
                            whenever target product name differs. Callers
                            should use resolveAppPath() (already used by the
                            `run` composite tool).
  TopBar.tsx#15             onBoot pushes a failed tool-event into the activity
                            store on api.boot() failure, so the toast surfaces
                            the error.
  sim-store.ts#67           setPinned checks res.success — only commits the
                            pinnedUdid change on success, otherwise stores
                            error state.
  styles.css#345            Replaced deprecated `word-break: break-word` with
                            `overflow-wrap: anywhere` (modern equivalent).

Side effects:
  scripts/*.ts              Added `export {}` to the three smoke scripts so
                            tsc treats them as modules, not global scripts
                            (was causing duplicate-symbol errors after a
                            second smoke script joined the project).

Verified:
  - 86/86 unit tests pass
  - typecheck clean
  - bun scripts/e2e-server.ts passes against a real iPhone 17 Pro
    (17 tool-events, 97 tool-log frames, 0 failures)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant