device-use v2: standalone iOS workbench + MCP server (AAP-ready)#249
Conversation
Adds the foundation for device-use v2 as a standalone product: a Hono server on Bun.serve that will later host /mcp, /ws, /api/*, and a React viewer at /. Dev mode proxies to Vite for HMR; prod serves the built bundle from dist/frontend. - Hono server at packages/device-use/src/server/index.ts with /health - React scaffold at packages/device-use/src/frontend/ (main/App/index.html) - Vite config rooted at src/frontend, outputs to dist/frontend - tsconfig: JSX + DOM libs - scripts: bun run dev (server), dev:frontend (vite), build, start Also lands docs/device-use-v2-design.md — the full design record (15 locked architectural decisions, phased implementation plan, draft AAP manifest for later). Part of the AAP (Agentic Apps Protocol) effort. This PR is the device-use standalone refactor; AAP protocol itself is a separate PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…per, logs Adds three new engine primitives needed for the v2 "build + run + observe" flow, each injectable for unit tests: - project-info.ts — parses `xcodebuild -list -json` for an .xcodeproj or .xcworkspace; returns schemes / targets / configurations. Used by the agent (and the viewer's scheme picker) to know what to build. - xcodebuild.ts — `build()` wraps `xcodebuild build` with spawn so the stdout/stderr line stream can be piped into tool-log events (for the viewer's logs drawer and the agent's per-build trace). Supports AbortSignal so stop-app can cancel an in-flight build. Predicts the built .app path when a derivedDataPath is provided; `resolveAppPath()` is the fallback that queries -showBuildSettings -json. - logs.ts — `streamLogs()` wraps `simctl spawn <udid> log stream` with injectable spawner. Supports bundleId + pid filtering via --predicate, and a stop() handle for graceful teardown. All three follow the existing engine pattern (executor/spawner as first arg, typed errors extending DeviceUseError). 18 new unit tests — all using fake executors/spawners, zero real subprocess deps. Exit: bun test passes 72/72; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires all 23 MCP tools behind a single invoker that emits tool-event
frames over WebSocket. The same invokeTool path handles REST, MCP, and
WS calls — one engine path, one trace, zero duplicated behavior.
New modules:
- state.ts — persistent JSON at {storage}/state.json (pinned UDID,
active project/scheme). Write-serialized to avoid clobbers.
- events.ts — EventBus with 200-event ring buffer + subscriber set.
Emits ToolEvent + ToolLog; reuses MCP tool-call shape (no parallel type).
- invoker.ts — central dispatcher. Every MCP/REST/WS tool call goes
through invokeTool; emits started/completed/failed with a shared id.
- tools.ts — 23 tool definitions (Zod-schema'd handlers). Each handler
can use ctx.executor / ctx.state / ctx.stream / ctx.refMap / ctx.events.
- mcp.ts — stateless MCP HTTP handler. Creates a fresh Server+Transport
per request (SDK requirement for stateless mode) and routes tool calls
through invokeTool.
- stream.ts — StreamManager owns the long-lived simbridge --stream
subprocess; exposes proxyStream() for /stream.mjpeg passthrough.
Server wires them together:
- GET /health — liveness
- GET /api/state — persisted state
- GET /api/tools — all tools + JSON schemas
- POST /api/tools/:name — invoke a tool (REST)
- GET /api/events — recent tool-event history
- GET /stream.mjpeg — MJPEG passthrough
- GET /api/stream — stream status
- ALL /mcp — MCP HTTP transport
- WS /ws — tool-event + tool-log subscribe + invoke
Deps: @modelcontextprotocol/sdk, zod-to-json-schema.
Exit verified end-to-end:
- 86 tests pass (14 new integration tests)
- typecheck clean
- curl /api/tools → 23-tool JSON
- curl /mcp tools/list + tools/call → proper SSE responses
- scripts/ws-smoke.ts → WS receives started+completed tool-event after
REST invoke, correlation id matches
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r, logs) Builds out the standalone UI served at /. Works entirely through the HTTP + WS endpoints landed in Phase 3 — no direct engine imports from the frontend. Components: - TopBar: sim picker, project path input, scheme dropdown, ▶ Run button with disabled/building/failed states - DeviceFrame: phone bezel + MJPEG stream from /stream.mjpeg; placeholder when no sim is pinned or sim not booted; ripple animation dispatched from tap tool-events via a Zustand subscription (not a useEffect setState) - Sidebar: snapshot-on-demand with clickable @ref list (each taps the element) + live activity feed of recent tool-events - LogsDrawer: auto-scrolled tail of tool-log frames - Toasts: one-liner on build/install/launch completion or any failure Stores (Zustand): - sim-store — sims list, pinned UDID, stream info - project-store — active project path, scheme, configuration, build status - activity-store — tool-event ring buffer + toast dispatch - logs-store — tool-log ring buffer (500 lines) Infra: - lib/ws.ts — WS client hook with exponential reconnect. Routes tool-event → activity-store, tool-log → logs-store. - lib/api.ts — typed fetch wrappers around /api/tools/*. - styles.css — minimal dark theme using the same color tokens as the Deus IDE. No Tailwind (keeps deps light). Exit verified: - typecheck clean, eslint clean - vite build succeeds (202KB JS, 5KB CSS, 43 modules) - Hono→Vite proxy serves the page via / Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dd serve)
The legacy SDK (fluent session builder) and CLI `stream` subcommand
were built for a stateless-CLI world. v2 replaces them with the
long-lived server, MCP endpoint, and /stream.mjpeg passthrough —
so these surfaces are dead weight.
Deleted:
- src/sdk/ (fluent `session()` builder + apps helpers)
- src/cli/stream/ (manager + viewer-html generator)
- src/cli/commands/stream.ts (CLI stream enable/disable/status)
- scripts/sdk-smoke.ts
Added:
- src/cli/commands/serve.ts — `device-use serve [--port N] [--host] [--open]`
spawns bun on the server module and inherits stdio. Works in dev
(source path) and in bundled form.
Updated:
- src/cli/index.ts — drop stream import, add serve
- scripts/build-ts.ts — drop sdk build entry; now produces
dist/cli.js + dist/engine.js
- package.json — description updated; main/exports
point at engine (primary programmatic
entry); bin unchanged
CLI surface (preserved): list, boot, shutdown, open, session, snapshot,
query, tap, swipe, type, fill, screenshot, wait-for, open-url, apps,
appstate, launch, terminate, permission, doctor, install, serve.
Exit verified:
- typecheck clean
- 86 tests pass
- `bun run cli help` lists all commands including `serve`
- `bun run cli serve --port 3101` starts server, /health → 200
- `bun run build:ts` produces dist/cli.js + dist/engine.js cleanly
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final phase: makes device-use v2 AAP-ready and documents the shape.
- agentic-app.json — manifest at the package root. Host IDEs (Deus,
MCP-speaking IDEs) read this to launch device-use, probe readiness
via /health, embed the viewer at /, and wire agent tool calls into
/mcp. Uses the schema shape agreed in docs/device-use-v2-design.md.
- run tool (composite) — chains build → resolveAppPath → install →
launch_app in one call. Reads CFBundleIdentifier from the built
Info.plist when bundleId isn't provided. Total MCP tools now 24.
The Viewer's ▶ Run button calls build + install + launch as
separate steps today; future phases can switch to this composite
for a cleaner progress story.
- README rewritten — new architecture, how to run the server, MCP
endpoint config for Claude Desktop, full tool list.
- AGENTS.md rewritten — new layout, how to add a tool (invokeTool
path), how to add a CLI command, event shape, hard rules ("never
bypass invokeTool", "stateless CLI stays stateless", etc.).
- .gitignore — adds .device-use/ (server's per-workspace state dir).
End-to-end verified:
- typecheck clean
- 86 tests pass
- curl /health → {ok:true}
- curl /api/tools → 24 tools
- curl /mcp tools/list → 24 tools
- set_active_simulator persists via state.json
- WebSocket streams started+completed tool-events with correlation id
- Pre-flight Info.plist-read in run tool handles bundleId resolution
device-use v2 is now the standalone product the AAP will later consume.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/e2e-server.ts spawns a fresh device-use server on port 3199, subscribes to /ws, exercises the full data flow, and asserts event correlation across REST, MCP, and WebSocket paths. Runs in ~25s when xcodebuild has a warm DerivedData cache. Covers: 1. Server lifecycle — spawn + /health probe + graceful SIGTERM. 2. Tool registry — REST /api/tools reports 24 tools. 3. MCP HTTP transport — initialize + tools/list + tools/call(get_state) over JSON-RPC-over-SSE with per-request stateless transport. 4. Simulator lifecycle — list_devices, set_active_simulator, boot (Shutdown → Booted). 5. Project introspection — get_project_info parses xcodebuild -list -json and finds the TestApp scheme. 6. Composite run tool — build → resolveAppPath → install → auto-resolve bundleId from Info.plist → launch. End-to-end in one call. 7. Snapshot — a11y tree walk, @refs assigned to interactive nodes. 8. Tap — ref-based tap through simbridge. 9. State persistence — state.json round-trip verified on disk AND via get_state tool. 10. WS event correlation — every tool emits paired started+completed frames with matching ids; build streams tool-log frames during xcodebuild. First successful run: 24 tools · 19 tool-events · 273 tool-log frames · 0 failures · 25s Future regression: `bun scripts/e2e-server.ts` from the package dir. Requires Xcode + an iPhone 17 Pro simulator available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The server previously bound to 127.0.0.1 only. On a machine with a stale IPv6 listener on the same port (e.g. an orphaned sim-helper from agent-simulator — observed in the wild, PPID=1), browsers resolving `localhost` to ::1 first would connect to the wrong server and render the stale UI. Binding to 0.0.0.0 (all IPv4 interfaces) means happy-eyeballs falls back to 127.0.0.1 cleanly when ::1 has no listener, and any IPv6 orphan on the same port becomes harmless rather than a hijacker. Root cause, not a workaround: the orphan was a real bug in older tooling; this change makes us robust against that class of collision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The stream subprocess (simbridge --stream) was only auto-starting at server startup for a pre-pinned sim. Booting interactively via the UI would pin + boot the sim but never start the stream — viewer stuck on "Waiting for stream…" even though everything else worked (snapshot, activity feed, etc.). Fix: - boot tool — after simctl boot succeeds, start the stream for that UDID. Failures are logged but don't fail the boot (CLI callers can still boot without needing a stream). - set_active_simulator tool — if the newly-pinned sim is already booted, start the stream too. Covers the "open viewer after sim is already booted externally" path. Both return the new stream info in their result so the UI doesn't need to wait for the 10s refresh tick to pick it up. Verified against a real sim: restart server → pinned UDID auto-boots and streams on an assigned port → /stream.mjpeg returns 200 multipart/x-mixed-replace with frames. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mouse events on the MJPEG image now drive the simulator, not just
watch it. Click → tap(x, y). Drag → swipe(from, to) with an 8pt
threshold to distinguish the two. Cursor turns crosshair when a
clickable stream is loaded.
To map CSS pixels → simulator points, StreamManager.start() now
probes the upstream /config on ready and computes logical point
dimensions (assuming the standard 3x retina scale). StreamInfo
gains `size: { pxW, pxH, ptW, ptH }`, exposed via /api/stream.
The DeviceFrame multiplies the click's fractional position by
ptW/ptH to get point coords for the tap/swipe tools.
Ripple animations now use the dynamic ptW from streamInfo instead
of the previous hardcoded 402pt estimate, so they land on the
right spot on any device.
Every interaction routes through invokeTool → same activity feed
entries, same tool-event broadcast, same MCP trace. Zero new
invocation paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clicks on the MJPEG <img> silently did nothing — browsers don't fire
mouse events reliably on images that swap frames continuously. The
original Deus simulator viewer avoided this by rendering MJPEG frames
into a <canvas> and sending touch events as binary WebSocket frames
to simbridge's /ws (0x03 prefix + JSON payload, begin/move/end
lifecycle). Adopting that proven pattern.
Server:
- /sim-input WebSocket endpoint. Opens an upstream WS to simbridge's
/ws for the active stream and forwards binary frames one-way
(browser → simbridge). Buffers messages until upstream handshake
completes. Closes upstream when client disconnects.
- Upgrade flow attaches { kind: "events" | "sim-input", wsUrl? } to
the Bun WS data field so a single websocket handler can multiplex.
Frontend:
- DeviceFrame now renders the stream into a <canvas>. A requestAnimation
Frame loop drives ctx.drawImage(img, 0, 0) from an off-screen Image
pointed at /stream.mjpeg. Canvas fires mouse events reliably even
while frames are updating.
- Mouse events encode 0x03-prefixed TouchEventPayload (begin on
mousedown, move on drag, end on mouseup) and send normalized 0..1
coordinates via the /sim-input WS.
- Window-level mouseup catches drags that release outside the canvas.
- Ripples fire locally on mousedown — immediate UI feedback without
waiting for a server round-trip.
Verified end-to-end: tap at Maps' center (0.27, 0.20) → Maps app
launched with "Allow Location" permission dialog. scripts/touch-smoke.ts
exercises the full path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tap User reported the refs sidebar showed stale elements: the list still referenced the home screen after they'd opened Maps. Fix: every canvas mouseup schedules a snapshot refresh (debounced 500ms, collapses rapid taps into one call). The sidebar also auto-snapshots the moment the stream first goes live, and shows the foreground app name next to the "elements" header so the user knows what they're looking at. - refs-store.ts — new Zustand store with refs + foreground + loading, a refresh() action, and a scheduleRefresh(delayMs) debouncer - DeviceFrame — calls scheduleRefresh on canvas mouseup AND on the window-level mouseup (drag-release-off-canvas path) - Sidebar — consumes the refs store, auto-snapshots on first stream connect, re-snapshots after any ref tap (sidebar → canvas tap → screen changes → refs stale otherwise) Activity feed also gets clearer: users can see which tap was theirs (REST via api.tap from ref clicks) vs. which came from the direct WS /sim-input path (those bypass invokeTool and don't show in activity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lback)
Ref taps were hitting the bounding-box center, which misses the hit
target for elements like switches/checkboxes where the tappable area
is not centered on the a11y frame (e.g. Maps' "Traffic" checkbox —
the a11y node's center is on the label, the switch is to the right).
Fix: use the engine's existing tapEntry helper, which tries in order:
1. tapById(identifier) — simbridge scans the a11y tree and taps the
matching element's natural hit target
2. tapByLabel(label) — same, by accessibility label
3. tap(x, y) — coordinate fallback
This matches what the CLI's `tap @eN` command already does. Ref taps
in the viewer sidebar now toggle switches, check checkboxes, and
activate controls whose hit target differs from the a11y bounds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a row of hardware buttons (Home / Lock / Vol+ / Vol−) below the
phone. Clicks send 0x04 + JSON{button} frames to simbridge's /ws via
the existing /sim-input proxy — same transport as touch, so they're
instant and don't round-trip through invokeTool.
Also:
- Cmd+H / Cmd+L keyboard shortcuts for Home / Lock (respects focus;
ignores keys while typing in inputs)
- Canvas right-click now preventDefault's (no more browser context
menu popping over the stream)
- Button press triggers the same debounced refs-snapshot refresh that
canvas taps do, so the sidebar stays in sync after going home
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical cleanup after the v2 refactor, no behavior changes. 86
tests + e2e-server (18 tool events, 97 tool-log frames) still pass.
Server:
- EventBus now owns `logHandles: Map<string, LogHandle>` directly.
stream_logs / stop_logs drop the `ctx.events as unknown as …` cast
and the `_logHandles ??= new Map()` dance.
- New `toolInputSchema()` helper in tools.ts wraps the zod-to-json-
schema call with its `as any` cast. /api/tools and /mcp tools/list
both use it; removes the duplicated expression and two casts.
- WS_SUBSCRIBERS map collapsed from `{unsubscribe, ws: unknown}` to
the unsubscribe function directly. The `ws: unknown` field was
never read.
- Dropped the `void buf;` no-op in stream.ts — the simbridge stderr
handler just discards the parameter.
Frontend:
- snapshot tool now returns an explicit `foreground: string | null`
(the root Application node's label). Frontend reads it as a named
field instead of digging into `tree[0].label`.
- snapshot call in api.ts stops requesting interactiveOnly:false
(it was inflating the sidebar with non-clickable static text).
Server default of `interactiveOnly: true` is what the UI wants.
- project-store: BuildStatus type trimmed from 6 states to 4
(idle|running|done|failed). The old "building|installing|launching"
sub-states were dead — nothing set them.
- TopBar's Run button wired to `api.run()` (the composite: build →
install → launch) instead of `api.build()`. "▶ run" actually runs.
- DeviceFrame: extracted `normalize(clientX, clientY)` helper so the
canvas mouseup and window-level mouseup share one coordinate path.
- Every scheduleRefresh() callsite dropped its explicit 500ms arg;
that's now the store's default.
CLI:
- `device-use serve --open` now polls /health (up to 6s) before
opening the browser, replacing the 1500ms blind sleep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…them simbridge's SimAccessibilityBridge.sendButton only dispatches home and lock; volumeup/volumedown payloads reach the handler but are no-ops. Keeping the UI honest by removing the two buttons that don't do anything. If volume control comes back later, it'll need a sendButton case in Swift first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sim-store refreshes /api/stream every 10s and writes a fresh streamInfo object into state. Both effects in DeviceFrame (the MJPEG canvas render loop and the /sim-input WebSocket) depended on the whole streamInfo object, so each poll tore down the canvas, cleared it, recreated the Image element, and reconnected the WS. Result: a visible black flash on the phone every 10 seconds and a one-frame gap in touch delivery if you happened to be mid-drag. Fix: depend on primitive identity — streamUdid (and streamPort for the WS). Same UDID/port → same effect instance, no teardown. If the underlying sim actually changes UDID (user switches sim), both effects re-fire correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dentity The first stab at the 10s blink fixed DeviceFrame's effect deps but missed two other places that still churned references on every poll: 1. sim-store.refresh() set streamInfo / sims / pinnedUdid unconditionally, so every 10s each of those fields got a new reference even when the content was unchanged. Anything subscribed to them via `useSimStore()` re-rendered. 2. Sidebar's "take an initial snapshot on first stream connect" effect depended on the full streamInfo object; every poll re-fired the effect's guard check. Real fix — at the store, not the consumers: - sim-store now compares new vs. previous stream/sims/pinnedUdid and reuses the old reference when content matches. No downstream sees a spurious update. - Sidebar effect now depends on `streamInfo?.udid` (primitive) instead of the object, matching DeviceFrame's pattern. Even if we later regress #1, #2 is independently safe. Served bundle: index-Cre9C4oH.js. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes closing the last gaps before the PR opens.
1. Keyboard input on the viewer — keystrokes while focus isn't on
another input get batched into a 150ms idle buffer and flushed via
the type_text tool. Enter flushes with submit=true (presses Return
after). Backspace trims the local buffer. Modifier keys (Cmd/Ctrl)
are preserved for the existing hardware-button shortcuts
(Cmd+H / Cmd+L). Buffering collapses fast typing like "hello"
from 5 sequential simbridge subprocesses into one.
2. Long-press verified — scripts/longpress-smoke.ts holds begin for
1500ms, then releases end, and asserts the iOS context menu
appears (screen refs grow, including the "Dismiss context menu"
action). Passes against a real iPhone simulator. The architecture
already supported it: our /sim-input WS forwards begin+end raw to
simbridge, which keeps the touch down for whatever interval the
client chooses. No code needed — just proof.
3. GitHub Actions workflow for nightly e2e on macos-15. Picks any
available iPhone, boots it, runs scripts/e2e-server.ts end-to-end
(spawn → /health → WS → REST + MCP → build → install → launch →
tap → state persistence). Schedule + manual dispatch only; we
don't run per-PR (macOS minutes cost 10×). The e2e script now
honors $E2E_SIM_UDID to target a specific sim without changing
its default behavior for local dev.
Verified end-to-end:
- type_text via REST: {success:true}
- longpress-smoke.ts: refs 3 → 7, iOS context menu opened
- touch-smoke.ts: still passes
- e2e-server.ts: full flow passes (18 tool-events, 97 tool-logs)
- 86/86 unit tests pass
- typecheck clean
Out of scope for this PR (noted in design doc as v1.1):
- wheel→scroll mapping
- pinch/rotation multi-touch
- CLI actions appearing in /ws activity feed (peer model stays pure)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "Handoff to next PR" block to docs/device-use-v2-design.md so
whoever picks up the Deus-side AAP integration has a working brief.
Covers:
- What to build in Deus (apps.service.ts, app-registrar.ts, lifecycle
tools, frontend launcher) with concrete file paths.
- What to delete after AAP works (~2.5k LOC of legacy simulator code
in apps/web/src/features/simulator/ + simulator-context.ts +
deus-tools/simulator.ts).
- 8 lessons device-use learned that should shape AAP host behavior:
requires validation, {port} substitution, mandatory health probe,
storage env var, child.on(exit) > polling, low-latency input
back-channel, runtime setMcpServers, framework-agnostic frontend.
- Things device-use deliberately deferred (wheel-scroll, CLI in WS,
multi-sim side-by-side) and where they belong.
- Open design questions for the next PR.
- A concrete first-commit slice for the Deus AAP PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors device-use into a Bun-backed standalone iOS simulator workbench: adds a long-lived server (HTTP/MCP/WS), React frontend, unified tool registry and invoker, simbridge stream manager, persistent workspace state, Xcode/xcrun engine wrappers, e2e and smoke test scripts, and removes the previous SDK/session and legacy stream CLI. Changes
Sequence DiagramsequenceDiagram
actor User
participant Frontend as React Viewer
participant Server as Bun HTTP/WS Server
participant Tools as Tool Registry / Invoker
participant Engine as xcodebuild/xcrun wrappers
participant Bridge as simbridge subprocess
participant Simulator as iOS Simulator
User->>Frontend: open UI / click Run
Frontend->>Server: POST /api/tools/run {project, scheme, udid}
Server->>Tools: invokeTool("run", params)
Tools->>Server: emit tool-event (started)
Tools->>Engine: build(project, scheme, destination)
Engine->>Bridge: spawn xcodebuild / xcrun (child process)
Bridge->>Simulator: build/install/launch actions
Simulator-->>Bridge: runtime responses
Bridge-->>Engine: stdout/stderr lines
Engine->>Tools: BuildResult / appPath
Tools->>Engine: install(appPath, udid) / launchApp(...)
Tools->>Server: emit tool-event (completed)
Server-->>Frontend: WS tool-event / tool-log frames
Frontend->>Server: WS binary sim-input (touch)
Server->>Bridge: forward binary frame
Bridge->>Simulator: inject HID input
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 18
🧹 Nitpick comments (16)
packages/device-use/scripts/ws-smoke.ts (1)
27-28: JSON.parse may throw on malformed frames.If the server sends non-JSON data,
JSON.parsewill throw and crash the script. This is acceptable for a smoke test (crash = test failure), but wrapping in try/catch would make failure diagnostics clearer.🛡️ Optional defensive parsing
ws.addEventListener("message", (e) => { + let msg; + try { - const msg = JSON.parse(typeof e.data === "string" ? e.data : e.data.toString()); + msg = JSON.parse(typeof e.data === "string" ? e.data : e.data.toString()); + } catch { + console.warn("Ignoring malformed frame:", e.data); + return; + } seen.push(msg);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/scripts/ws-smoke.ts` around lines 27 - 28, The message handler attached with ws.addEventListener("message", ...) calls JSON.parse directly on e.data which will throw on malformed JSON and crash; wrap the parse in a try/catch inside that handler (around JSON.parse and subsequent use of msg) to catch parsing errors, log a clear diagnostic including the raw frame (e.data / e.data.toString()) and the error, and then either return or close the socket so the smoke test fails with a readable message; update references to JSON.parse, msg, and e.data in that handler accordingly.packages/device-use/src/frontend/components/Toasts.tsx (1)
5-7: Redundant null check after length guard.Line 7's
if (!latest) return nullis unreachable — iftoasts.length === 0on line 5, we've already returned. After that guard,toasts[toasts.length - 1]is guaranteed to exist.🧹 Optional cleanup
export function Toasts() { const toasts = useActivityStore((s) => s.toasts); if (toasts.length === 0) return null; const latest = toasts[toasts.length - 1]; - if (!latest) return null; return (🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/components/Toasts.tsx` around lines 5 - 7, The redundant null check in the Toasts component should be removed: after the early return "if (toasts.length === 0) return null" the computed "const latest = toasts[toasts.length - 1]" is guaranteed to be defined, so delete the unreachable "if (!latest) return null" check (or collapse the two checks into a single guard) to simplify the logic around the toasts/latest variables.packages/device-use/src/frontend/lib/ws.ts (1)
21-31: Consider ts-pattern for message type dispatch (optional).Per coding guidelines, discriminated unions should prefer
ts-patternwith.exhaustive(). However, since this parses external JSON where unknown types should be silently ignored, the current simple conditionals are pragmatic. If more message types are added, consider refactoring.♻️ Example with ts-pattern (if desired)
import { match } from "ts-pattern"; // In the message handler: match(msg) .with({ type: "tool-event" }, (m) => useActivityStore.getState().push(m)) .with({ type: "tool-log" }, (m) => useLogsStore.getState().append(m)) .otherwise(() => {/* ignore unknown */});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/lib/ws.ts` around lines 21 - 31, The message handler attached in ws.addEventListener currently uses ad-hoc conditionals to dispatch parsed JSON frames to useActivityStore.getState().push and useLogsStore.getState().append; refactor this dispatch to use ts-pattern's match for clearer discriminated-union handling (e.g., match(msg).with({type: "tool-event"}, m => useActivityStore.getState().push(m)).with({type: "tool-log"}, m => useLogsStore.getState().append(m)).otherwise(() => {/* ignore unknown */})), and if you add more message types later, use .exhaustive() to ensure coverage while preserving the current behavior of silently ignoring unknown external messages.packages/device-use/src/frontend/components/LogsDrawer.tsx (1)
8-10: Consider debouncing rapid scroll updates.When logs stream rapidly, calling
scrollIntoView({ behavior: "smooth" })on every append may queue up animations and cause visual jank. Consider debouncing or usingbehavior: "instant"for high-frequency updates.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/components/LogsDrawer.tsx` around lines 8 - 10, The current useEffect calling bottomRef.current?.scrollIntoView({ behavior: "smooth", block: "end" }) on every change to lines can queue animations when logs stream rapidly; modify the scroll logic in the useEffect to debounce rapid updates (e.g., store a timeout or timestamp in a ref and only call scrollIntoView after a short delay or when no new lines arrive for X ms) or switch to instant scrolling for high-frequency updates (detect update rate via a lastUpdate ref and if updates are too frequent call scrollIntoView({ behavior: "instant", block: "end" }) instead), updating the useEffect, bottomRef usage, and any added refs/cleanup to clear timeouts on unmount.packages/device-use/test/xcodebuild.test.ts (1)
21-37: The fake-child cleanup hook is a no-op right now.
afterEach()clearsSTARTED, butmakeSpawner()never pushes the createdFakeChildinto that array. If a future test leaks a late emit or unread stream data, this guard won’t actually isolate the suite.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/test/xcodebuild.test.ts` around lines 21 - 37, The cleanup is ineffective because makeSpawner never records created FakeChild instances in STARTED; modify makeSpawner to push each created child into the STARTED array (e.g., after creating `const child = new FakeChild();` call `STARTED.push(child)`), so the existing afterEach() guard can actually clear and prevent leaked emits or unread streams; reference symbols: makeSpawner, STARTED, FakeChild, spawner.packages/device-use/src/frontend/stores/refs-store.ts (1)
22-48: Move snapshot fetching out of Zustand and into TanStack Query.
refs,foreground, andloadingare server-state concerns, and this store is re-implementing fetch lifecycle/invalidation manually. Keeping only the debounce/local UI bits here and letting Query ownapi.snapshot()would simplify refresh behavior and align with the repo’s expected state split.As per coding guidelines, "Use TanStack Query v5 for server state management."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/stores/refs-store.ts` around lines 22 - 48, The store currently performs server fetching in useRefsStore (keys: refs, foreground, loading, refresh, scheduleRefresh) by calling api.snapshot(); remove the fetch lifecycle from refresh and loading management and instead use TanStack Query v5 to fetch snapshot data (e.g., a useSnapshotQuery that returns refs and foreground). Keep useRefsStore only for local UI/debounce concerns: expose local foreground mutation and scheduleRefresh (update scheduleRefresh to call queryClient.invalidateQueries or the query's refetch trigger instead of invoking api.snapshot). Replace any direct uses of useRefsStore.refresh with the query refetch/invalidation and remove manual loading state handling from the store. Ensure components read refs/foreground from the query and only use the store for debounced UI state..github/workflows/device-use-e2e.yml (1)
21-24: Consider pinning Bun version for CI stability.Using
bun-version: latestcan cause unexpected failures if a new Bun release introduces breaking changes. Pinning to a specific version improves reproducibility.♻️ Proposed fix
- name: Setup Bun uses: oven-sh/setup-bun@v2 with: - bun-version: latest + bun-version: "1.2"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/device-use-e2e.yml around lines 21 - 24, Replace the dynamic bun-version in the GitHub Actions step that sets up Bun (the "Setup Bun" step using oven-sh/setup-bun@v2) with a pinned, explicit version string (e.g., "0.x.y") instead of "latest" so CI is stable and reproducible; update the bun-version input to the chosen fixed semver and document the pinned version in the workflow comment.packages/device-use/src/frontend/components/TopBar.tsx (1)
58-64: Prefer Tailwind class over inline style.Using inline
stylebypasses Tailwind's design system. Consider using a Tailwind width utility instead.♻️ Proposed fix
<input type="text" placeholder="project path (.xcodeproj or .xcworkspace)" defaultValue={project.path ?? ""} onBlur={onProjectPath} - style={{ minWidth: 280 }} + className="min-w-70" />As per coding guidelines: "Use Tailwind CSS v4 for styling."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/components/TopBar.tsx` around lines 58 - 64, Replace the inline style on the input in TopBar.tsx with a Tailwind utility: remove style={{ minWidth: 280 }} and add a className using the Tailwind arbitrary value (for example className="min-w-[280px]") while preserving existing props (type, placeholder, defaultValue={project.path ?? ""}, onBlur={onProjectPath}); ensure you merge the className if one already exists on the element.packages/device-use/scripts/touch-smoke.ts (1)
41-51: WebSocket error event discards diagnostic information.The error handler rejects with a generic message, losing the actual error details that could help diagnose connection failures.
♻️ Proposed fix
await new Promise<void>((resolve, reject) => { const t = setTimeout(() => reject(new Error("ws open timeout")), 3000); ws.addEventListener("open", () => { clearTimeout(t); resolve(); }); - ws.addEventListener("error", () => { + ws.addEventListener("error", (ev) => { clearTimeout(t); - reject(new Error("ws error")); + reject(new Error(`ws error: ${ev.message ?? "unknown"}`)); }); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/scripts/touch-smoke.ts` around lines 41 - 51, The "error" listener on the WebSocket promise throws a generic Error and discards diagnostic details; change the listener on ws (the ws.addEventListener("error", ... ) block) to accept the error/event argument (e.g., (err) or (ev)) and reject with that error or a new Error that includes its message/stack (e.g., reject(err) or reject(new Error(`ws error: ${err?.message || String(err)}`))) so the real diagnostic information is preserved when the promise fails.packages/device-use/src/frontend/stores/project-store.ts (1)
37-46: Consider handlingsetActiveProjectfailure.If
api.setActiveProject()fails (network error, server error), the local state is still updated (lines 40-45), potentially causing UI/server state divergence. Consider checking the result before updating local state, or at minimum logging the error.🛡️ Suggested approach
setProject: async (path, scheme, configuration) => { - await api.setActiveProject(path, scheme, configuration); + const setResult = await api.setActiveProject(path, scheme, configuration); + if (!setResult.success) { + console.warn("[project-store] setActiveProject failed:", setResult.error); + // Optionally: return early or show toast + } const info = await api.getProjectInfo(path);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/stores/project-store.ts` around lines 37 - 46, The setProject handler updates local state regardless of api.setActiveProject outcome; change it to check the result of api.setActiveProject(path, scheme, configuration) and only proceed to call api.getProjectInfo and set({...}) when the setActiveProject call succeeds, otherwise log the error (or surface it to the caller) and avoid mutating the local store (or revert any partial changes). Specifically update the setProject function to capture the response/error from api.setActiveProject, handle failure paths (logging via processLogger/console or returning an error), and only use api.getProjectInfo and set(...) when setActiveProject indicates success.docs/device-use-v2-design.md (1)
39-39: Add language specifier to fenced code block.The architecture diagram code block is missing a language specifier. Use
textorplaintextfor ASCII diagrams.📝 Suggested fix
-``` +```text ┌─────────────────────────────────────────────────────────────┐🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/device-use-v2-design.md` at line 39, The fenced code block containing the architecture diagram is missing a language specifier; update that block (the ASCII "architecture diagram" fenced code block) to start with a language tag such as ```text or ```plaintext so the diagram is rendered correctly (e.g., replace ``` with ```text at the opening fence).packages/device-use/test/server.test.ts (1)
94-106: Misleading test name.The test name says "emits started + failed on unknown tool" but the assertion (line 104) correctly expects only 1 event (
failed). The invoker intentionally skipsstartedfor unknown tools. Consider renaming for clarity.📝 Suggested fix
- test("emits started + failed on unknown tool", async () => { + test("emits failed (no started) on unknown tool", async () => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/test/server.test.ts` around lines 94 - 106, The test title "emits started + failed on unknown tool" is misleading because invokeTool("nope", ...) intentionally skips emitting a "started" event and only emits a single "failed" tool-event; rename the test description to something like "emits failed on unknown tool" (update the string passed to test(...) that wraps the assertions using invokeTool, events, and toolEvents) so the name matches the assertions and behavior of invokeTool.packages/device-use/src/frontend/components/DeviceFrame.tsx (1)
97-108: UnusedstreamPortin dependency array.The WebSocket effect depends on
streamPort(line 108) but doesn't use it in the effect body. While this may be intentional (to reconnect when stream config changes), it could cause unnecessary WS reconnections when only the port changes without affecting the connection URL (which useslocation.host).If the intent is to reconnect when the underlying stream restarts, consider documenting this with a comment, or remove
streamPortif reconnection isn't needed:- }, [booted, streamUdid, streamPort]); + // Reconnect WS when stream restarts (new UDID or port indicates simbridge restart) + }, [booted, streamUdid, streamPort]);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/components/DeviceFrame.tsx` around lines 97 - 108, The useEffect in DeviceFrame that opens the /sim-input WebSocket includes streamPort in its dependency array but never references it, causing unnecessary reconnects; either remove streamPort from the dependency array to avoid spurious reconnections or (if reconnect on stream restart is intended) add a short comment above the effect explaining that streamPort is included intentionally to force reconnection on stream changes; if the latter is truly required for reconnect behavior, alternatively incorporate streamPort into the connection key (e.g., the URL or a query param) so its inclusion is explicit. Ensure you update the dependency list on the useEffect and/or add the clarifying comment near wsRef and the useEffect.packages/device-use/src/server/index.ts (1)
167-212: Consider typed WebSocket wrapper.The
ws as unknown as objectcasts (lines 181, 209) are used as Map keys because Bun's WebSocket type doesn't have a stable identity type. While functional, this is a type-safety workaround.For future maintainability, consider creating a typed wrapper or using
WeakMapwith the WebSocket instance if Bun's types improve. Current approach works but relies on object identity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/server/index.ts` around lines 167 - 212, The code uses unsafe casts ws as unknown as object when storing keys in WS_SUBSCRIBERS and WS_UPSTREAMS inside the websocket.open handler; replace this fragile pattern by introducing a typed wrapper or keying mechanism: create a small wrapper type/class (e.g., ClientConnection) that holds the Bun WebSocket and any metadata (unsubscribe, upstream, buffered) and store instances of that wrapper in WS_SUBSCRIBERS/WS_UPSTREAMS (or switch those maps to WeakMap<WebSocket, ...> keyed by the actual WebSocket type if Bun's types allow it); update websocket.open (and any lookup/cleanup sites) to create and store the wrapper (or use the WebSocket directly with WeakMap) instead of casting, and update references to retrieve the wrapper to access unsubscribe/upstream/buffered.packages/device-use/src/server/stream.ts (1)
107-108: Consider draining stdout to prevent backpressure.The process is spawned with
stdio: ["ignore", "pipe", "pipe"], meaning stdout is piped but never consumed (only stderr has a no-op handler on line 108). If simbridge writes to stdout, this could cause backpressure and potentially hang the subprocess.🛡️ Suggested fix
// Drop simbridge's stderr — it's noisy about private-framework loads. + child.stdout?.on("data", () => {}); child.stderr?.on("data", () => {});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/server/stream.ts` around lines 107 - 108, The subprocess's stdout is piped but never consumed, causing possible backpressure; add a no-op drain for stdout by attaching a data handler to the child.stdout stream (similar to the existing child.stderr?.on("data", () => {})) so stdout is consumed; locate the spawned process represented by the variable child in stream.ts and add child.stdout?.on("data", () => {}), and consider also handling child.stdout?.on("error", ...) if needed for robustness.packages/device-use/AGENTS.md (1)
18-30: Add language specifier to fenced code block.The directory tree code block is missing a language specifier. Use
textorplaintextfor non-syntax-highlighted blocks to satisfy linters.📝 Suggested fix
-``` +```text packages/device-use/ ├── native/ Swift simbridge — don't rewrite without good reason🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/AGENTS.md` around lines 18 - 30, The fenced directory-tree block in AGENTS.md is missing a language specifier; update the opening code fence (the triple backticks that start the block) to include a plain-text specifier (e.g., change ``` to ```text or ```plaintext) so the block is treated as non-syntax-highlighted text; leave the block contents (the tree with lines like "packages/device-use/" and "├── native/") unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/device-use-v2-design.md`:
- Around line 54-57: The architecture diagram text currently reads "TanStack
Start on Nitro" but decision `#5` locks the runtime to "Hono on Bun.serve"; update
the diagram string "TanStack Start on Nitro" to "Hono on Bun.serve" (and any
nearby labels like routes: /, /mcp, /ws if needed) so the diagram matches the
documented decision `#5`.
- Around line 63-65: The document references "TanStack Start React viewer" which
conflicts with decision `#5`; update that phrase to reflect the chosen stack by
replacing "TanStack Start React viewer" with "Vite + React viewer (using
TanStack Router for routing)" so the frontend description consistently states
Vite + React and mentions TanStack Router as the routing library.
In `@packages/device-use/agentic-app.json`:
- Around line 10-33: The manifest uses "localhost" for UI/MCP URLs while the
launched command defaults to binding to 127.0.0.1; unify them to the same
loopback address to avoid IPv6/hostname resolution mismatches. Either add a host
flag to the launcher (in "launch.args": include "--host", "127.0.0.1") or change
the published endpoints to 127.0.0.1 (replace "localhost" with "127.0.0.1" in
"ui.url" and in "agent.tools.url") so the IDE connects to the exact interface
the serve command binds to.
In `@packages/device-use/package.json`:
- Around line 54-62: The package.json for the device-use package is missing the
zustand dependency used by the frontend stores (sim-store.ts, refs-store.ts,
project-store.ts, logs-store.ts, activity-store.ts); add "zustand" (e.g.
"^4.0.0" or the repo standard zustand version) to the "dependencies" object in
packages/device-use/package.json and reinstall (npm/yarn/pnpm) to update the
lockfile so imports from "zustand" resolve at runtime.
In `@packages/device-use/README.md`:
- Around line 65-88: The fenced code block containing the ASCII diagram in the
README is missing a language tag which triggers markdownlint; update the opening
fence from ``` to ```text so the block is explicitly marked as plain text (i.e.,
change the ASCII art fenced block in the README to start with ```text).
In `@packages/device-use/scripts/e2e-server.ts`:
- Around line 168-173: The current shutdown Promise relies on checking
serverProc!.killed inside the setTimeout, which is incorrect because killed
becomes true as soon as the signal is sent; instead track process exit with a
local flag and use that in the timeout to decide whether to send SIGKILL: add a
boolean like exited=false, set it to true inside the serverProc!.once("exit",
...) handler (and resolve there), send SIGTERM immediately via
serverProc!.kill("SIGTERM"), and in the setTimeout callback check the exited
flag (not serverProc!.killed) and call serverProc!.kill("SIGKILL") only if
exited is still false; ensure resolve is only called once from the exit handler.
In `@packages/device-use/src/cli/commands/serve.ts`:
- Around line 29-33: The moduleCandidates array contains an incorrect compiled
candidate path: from the cli/commands location the entry "../server/index.js"
resolves to cli/server/index.js instead of the package-level compiled server;
update the candidate in moduleCandidates (where path.resolve is called with
here) to point one directory higher (use ../../server/index.js) so the compiled
layout (dist/cli + dist/server) correctly resolves the server entrypoint.
- Around line 51-65: The code always calls spawn(opener, [url]) after the
polling loop even if /health never returned OK; change the logic in the
params.open async IIFE to track whether a successful health probe occurred
(e.g., set sawHealthy = true when res.ok) and only call spawn(opener, [url],
...) when sawHealthy is true; also break out of the loop immediately after a
successful probe (you already do that) and consider skipping the opener if the
server process has exited (check any server process variable available in this
scope) so that spawn(opener, ...) is executed only when a 200 was observed and
the server is still running.
- Around line 68-72: The current await new Promise using child.once("exit")
always returns success: true even when the child process failed; change to await
both 'error' and 'exit' events from the spawned child (listen for
child.once("error", err => reject(err)) and child.once("exit", (code, signal) =>
code === 0 ? resolve() : reject(new Error(`exit code ${code} signal
${signal}`)))) so failures are propagated; then catch/reject and return success:
false with an informative message containing the exit code/signal or error.
Refer to the existing child variable and the Promise around child.once("exit")
in serve.ts to locate where to replace the behavior.
In `@packages/device-use/src/engine/project-info.ts`:
- Around line 40-43: detectKind currently treats any path not ending with
".xcworkspace" as "xcodeproj", which masks invalid inputs; update
detectKind(projectPath: string) to explicitly check for ".xcworkspace" and
".xcodeproj" and throw a clear Error for any other extensions so callers fail
fast with a helpful message (include projectPath in the error text for context).
In `@packages/device-use/src/engine/xcodebuild.ts`:
- Around line 150-159: The code currently sets result.appPath to a synthesized
path (using options.derivedDataPath, options.configuration, and options.scheme)
even when that path may not exist; update the block in xcodebuild.ts so that
after constructing the candidate path (the same join of options.derivedDataPath,
"Build/Products", `${options.configuration ?? "Debug"}-iphonesimulator`,
`${options.scheme}.app`) you verify the directory actually exists (synchronously
or asynchronously via fs.existsSync / fs.promises.access) and only assign
result.appPath when the check passes; otherwise leave result.appPath unset (or
null) so callers won’t receive a bogus path.
In `@packages/device-use/src/frontend/components/Sidebar.tsx`:
- Around line 13-17: The effect in Sidebar (useEffect) wrongly skips refreshing
refs when streamUdid changes because it checks refs.length === 0 against the
global useRefsStore; remove the refs.length condition so that when streamUdid
changes you always call useRefsStore.getState().refresh(), i.e. change the
effect to run on [streamUdid] and unconditionally invoke refresh() when
streamUdid is truthy; keep the existing call site
useRefsStore.getState().refresh() and ensure dependencies include streamUdid
(and not refs.length) so refs are reloaded on every simulator-stream change.
In `@packages/device-use/src/frontend/components/TopBar.tsx`:
- Around line 11-15: The onBoot handler currently calls await
api.boot(pinnedUdid) and then always calls useSimStore.getState().refresh(),
which swallows errors and refreshes state even on failure; wrap the api.boot
call in a try/catch inside onBoot (check pinnedUdid as before), call
useSimStore.getState().refresh() only on successful boot, and in the catch
surface the error to the user (e.g., via an existing notification/toast utility
or processLogger) so failures are visible; adjust onBoot to reference api.boot,
pinnedUdid and useSimStore.getState().refresh() accordingly.
In `@packages/device-use/src/frontend/stores/sim-store.ts`:
- Around line 64-67: The setPinned function updates local state before
confirming the API succeeded; change it so api.setActiveSimulator(udid) is
awaited inside a try/catch and only call set({ pinnedUdid: udid }) after a
successful response. In the catch block (or on non-success response) do not
update pinnedUdid, log or surface the error, and optionally return or rethrow
the error so callers can react; update the function signature/return value if
needed to convey success/failure. Ensure you modify the setPinned async function
and references to api.setActiveSimulator and set({ pinnedUdid }) to implement
this behavior.
In `@packages/device-use/src/frontend/styles.css`:
- Around line 82-85: The .topbar button.primary rule uses a hardcoded color
("white") and direct color literals elsewhere; update this selector and the
other mentioned blocks (around the ranges noted in the comment) to use CSS
variables instead of literals — e.g., replace color: white with color:
var(--text-on-accent) and replace any hex/rgba values with appropriate tokens
like var(--accent), var(--muted), or new variables added to the token set;
ensure you update the selectors .topbar button.primary and the other rules
referenced (the blocks around 124-134, 161-173, 176-183, 386-399) to
consistently reference CSS variables and add any missing variables to the theme
token definitions.
- Line 345: Replace the deprecated CSS declaration "word-break: break-word" with
the modern equivalent "overflow-wrap: break-word" in the stylesheet; locate the
rule that currently contains the "word-break: break-word;" property (the
declaration shown in packages/device-use/src/frontend/styles.css) and swap it to
use "overflow-wrap: break-word;" to restore the same wrapping behavior without
using deprecated syntax.
In `@packages/device-use/src/server/state.ts`:
- Around line 66-72: The persist() method currently assigns this.writeChain =
this.writeChain.then(async () => { await mkdir(...); await writeFile(...); }) so
a single rejected mkdir()/writeFile() will leave this.writeChain rejected and
forever skip later update()/clear() writes; fix by catching errors inside the
chained callback so the promise returned to this.writeChain always resolves:
wrap the mkdir/writeFile calls in a try/catch (inside persist's then callback),
log the error (or handle it) but do not rethrow, ensuring this.writeChain
remains fulfilled and subsequent calls to persist() continue to execute;
reference persist(), this.writeChain, mkdir, writeFile, update, clear,
storageDir, file and state to locate where to add the try/catch.
In `@packages/device-use/src/server/tools.ts`:
- Around line 553-562: The handler currently calls streamLogs(...) but doesn't
pass onError/onExit handlers and also doesn't guard against streamLogs throwing,
so stale entries are left in ctx.events.logHandles; update the handler in the
async function that calls resolveUdid/streamLogs to (1) wrap the streamLogs call
in try/catch and only insert into ctx.events.logHandles and return a
subscriptionId on success, and (2) pass onError and onExit callbacks to
streamLogs that remove the handle from ctx.events.logHandles (using the id), and
emit a final event (e.g., "tool-log" with stream "stderr" or a dedicated
"tool-exit") via ctx.events.emit so callers know the subscription ended.
---
Nitpick comments:
In @.github/workflows/device-use-e2e.yml:
- Around line 21-24: Replace the dynamic bun-version in the GitHub Actions step
that sets up Bun (the "Setup Bun" step using oven-sh/setup-bun@v2) with a
pinned, explicit version string (e.g., "0.x.y") instead of "latest" so CI is
stable and reproducible; update the bun-version input to the chosen fixed semver
and document the pinned version in the workflow comment.
In `@docs/device-use-v2-design.md`:
- Line 39: The fenced code block containing the architecture diagram is missing
a language specifier; update that block (the ASCII "architecture diagram" fenced
code block) to start with a language tag such as ```text or ```plaintext so the
diagram is rendered correctly (e.g., replace ``` with ```text at the opening
fence).
In `@packages/device-use/AGENTS.md`:
- Around line 18-30: The fenced directory-tree block in AGENTS.md is missing a
language specifier; update the opening code fence (the triple backticks that
start the block) to include a plain-text specifier (e.g., change ``` to ```text
or ```plaintext) so the block is treated as non-syntax-highlighted text; leave
the block contents (the tree with lines like "packages/device-use/" and "├──
native/") unchanged.
In `@packages/device-use/scripts/touch-smoke.ts`:
- Around line 41-51: The "error" listener on the WebSocket promise throws a
generic Error and discards diagnostic details; change the listener on ws (the
ws.addEventListener("error", ... ) block) to accept the error/event argument
(e.g., (err) or (ev)) and reject with that error or a new Error that includes
its message/stack (e.g., reject(err) or reject(new Error(`ws error:
${err?.message || String(err)}`))) so the real diagnostic information is
preserved when the promise fails.
In `@packages/device-use/scripts/ws-smoke.ts`:
- Around line 27-28: The message handler attached with
ws.addEventListener("message", ...) calls JSON.parse directly on e.data which
will throw on malformed JSON and crash; wrap the parse in a try/catch inside
that handler (around JSON.parse and subsequent use of msg) to catch parsing
errors, log a clear diagnostic including the raw frame (e.data /
e.data.toString()) and the error, and then either return or close the socket so
the smoke test fails with a readable message; update references to JSON.parse,
msg, and e.data in that handler accordingly.
In `@packages/device-use/src/frontend/components/DeviceFrame.tsx`:
- Around line 97-108: The useEffect in DeviceFrame that opens the /sim-input
WebSocket includes streamPort in its dependency array but never references it,
causing unnecessary reconnects; either remove streamPort from the dependency
array to avoid spurious reconnections or (if reconnect on stream restart is
intended) add a short comment above the effect explaining that streamPort is
included intentionally to force reconnection on stream changes; if the latter is
truly required for reconnect behavior, alternatively incorporate streamPort into
the connection key (e.g., the URL or a query param) so its inclusion is
explicit. Ensure you update the dependency list on the useEffect and/or add the
clarifying comment near wsRef and the useEffect.
In `@packages/device-use/src/frontend/components/LogsDrawer.tsx`:
- Around line 8-10: The current useEffect calling
bottomRef.current?.scrollIntoView({ behavior: "smooth", block: "end" }) on every
change to lines can queue animations when logs stream rapidly; modify the scroll
logic in the useEffect to debounce rapid updates (e.g., store a timeout or
timestamp in a ref and only call scrollIntoView after a short delay or when no
new lines arrive for X ms) or switch to instant scrolling for high-frequency
updates (detect update rate via a lastUpdate ref and if updates are too frequent
call scrollIntoView({ behavior: "instant", block: "end" }) instead), updating
the useEffect, bottomRef usage, and any added refs/cleanup to clear timeouts on
unmount.
In `@packages/device-use/src/frontend/components/Toasts.tsx`:
- Around line 5-7: The redundant null check in the Toasts component should be
removed: after the early return "if (toasts.length === 0) return null" the
computed "const latest = toasts[toasts.length - 1]" is guaranteed to be defined,
so delete the unreachable "if (!latest) return null" check (or collapse the two
checks into a single guard) to simplify the logic around the toasts/latest
variables.
In `@packages/device-use/src/frontend/components/TopBar.tsx`:
- Around line 58-64: Replace the inline style on the input in TopBar.tsx with a
Tailwind utility: remove style={{ minWidth: 280 }} and add a className using the
Tailwind arbitrary value (for example className="min-w-[280px]") while
preserving existing props (type, placeholder, defaultValue={project.path ?? ""},
onBlur={onProjectPath}); ensure you merge the className if one already exists on
the element.
In `@packages/device-use/src/frontend/lib/ws.ts`:
- Around line 21-31: The message handler attached in ws.addEventListener
currently uses ad-hoc conditionals to dispatch parsed JSON frames to
useActivityStore.getState().push and useLogsStore.getState().append; refactor
this dispatch to use ts-pattern's match for clearer discriminated-union handling
(e.g., match(msg).with({type: "tool-event"}, m =>
useActivityStore.getState().push(m)).with({type: "tool-log"}, m =>
useLogsStore.getState().append(m)).otherwise(() => {/* ignore unknown */})), and
if you add more message types later, use .exhaustive() to ensure coverage while
preserving the current behavior of silently ignoring unknown external messages.
In `@packages/device-use/src/frontend/stores/project-store.ts`:
- Around line 37-46: The setProject handler updates local state regardless of
api.setActiveProject outcome; change it to check the result of
api.setActiveProject(path, scheme, configuration) and only proceed to call
api.getProjectInfo and set({...}) when the setActiveProject call succeeds,
otherwise log the error (or surface it to the caller) and avoid mutating the
local store (or revert any partial changes). Specifically update the setProject
function to capture the response/error from api.setActiveProject, handle failure
paths (logging via processLogger/console or returning an error), and only use
api.getProjectInfo and set(...) when setActiveProject indicates success.
In `@packages/device-use/src/frontend/stores/refs-store.ts`:
- Around line 22-48: The store currently performs server fetching in
useRefsStore (keys: refs, foreground, loading, refresh, scheduleRefresh) by
calling api.snapshot(); remove the fetch lifecycle from refresh and loading
management and instead use TanStack Query v5 to fetch snapshot data (e.g., a
useSnapshotQuery that returns refs and foreground). Keep useRefsStore only for
local UI/debounce concerns: expose local foreground mutation and scheduleRefresh
(update scheduleRefresh to call queryClient.invalidateQueries or the query's
refetch trigger instead of invoking api.snapshot). Replace any direct uses of
useRefsStore.refresh with the query refetch/invalidation and remove manual
loading state handling from the store. Ensure components read refs/foreground
from the query and only use the store for debounced UI state.
In `@packages/device-use/src/server/index.ts`:
- Around line 167-212: The code uses unsafe casts ws as unknown as object when
storing keys in WS_SUBSCRIBERS and WS_UPSTREAMS inside the websocket.open
handler; replace this fragile pattern by introducing a typed wrapper or keying
mechanism: create a small wrapper type/class (e.g., ClientConnection) that holds
the Bun WebSocket and any metadata (unsubscribe, upstream, buffered) and store
instances of that wrapper in WS_SUBSCRIBERS/WS_UPSTREAMS (or switch those maps
to WeakMap<WebSocket, ...> keyed by the actual WebSocket type if Bun's types
allow it); update websocket.open (and any lookup/cleanup sites) to create and
store the wrapper (or use the WebSocket directly with WeakMap) instead of
casting, and update references to retrieve the wrapper to access
unsubscribe/upstream/buffered.
In `@packages/device-use/src/server/stream.ts`:
- Around line 107-108: The subprocess's stdout is piped but never consumed,
causing possible backpressure; add a no-op drain for stdout by attaching a data
handler to the child.stdout stream (similar to the existing
child.stderr?.on("data", () => {})) so stdout is consumed; locate the spawned
process represented by the variable child in stream.ts and add
child.stdout?.on("data", () => {}), and consider also handling
child.stdout?.on("error", ...) if needed for robustness.
In `@packages/device-use/test/server.test.ts`:
- Around line 94-106: The test title "emits started + failed on unknown tool" is
misleading because invokeTool("nope", ...) intentionally skips emitting a
"started" event and only emits a single "failed" tool-event; rename the test
description to something like "emits failed on unknown tool" (update the string
passed to test(...) that wraps the assertions using invokeTool, events, and
toolEvents) so the name matches the assertions and behavior of invokeTool.
In `@packages/device-use/test/xcodebuild.test.ts`:
- Around line 21-37: The cleanup is ineffective because makeSpawner never
records created FakeChild instances in STARTED; modify makeSpawner to push each
created child into the STARTED array (e.g., after creating `const child = new
FakeChild();` call `STARTED.push(child)`), so the existing afterEach() guard can
actually clear and prevent leaked emits or unread streams; reference symbols:
makeSpawner, STARTED, FakeChild, spawner.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fc0cdb83-0cdc-42eb-b77f-c3ac9fe32c8f
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (55)
.github/workflows/device-use-e2e.ymldocs/device-use-v2-design.mdpackage.jsonpackages/device-use/.gitignorepackages/device-use/AGENTS.mdpackages/device-use/README.mdpackages/device-use/agentic-app.jsonpackages/device-use/package.jsonpackages/device-use/scripts/build-ts.tspackages/device-use/scripts/e2e-server.tspackages/device-use/scripts/longpress-smoke.tspackages/device-use/scripts/sdk-smoke.tspackages/device-use/scripts/touch-smoke.tspackages/device-use/scripts/ws-smoke.tspackages/device-use/src/cli/commands/serve.tspackages/device-use/src/cli/commands/stream.tspackages/device-use/src/cli/index.tspackages/device-use/src/cli/stream/manager.tspackages/device-use/src/cli/stream/viewer-html.tspackages/device-use/src/engine/index.tspackages/device-use/src/engine/logs.tspackages/device-use/src/engine/project-info.tspackages/device-use/src/engine/xcodebuild.tspackages/device-use/src/frontend/App.tsxpackages/device-use/src/frontend/components/DeviceFrame.tsxpackages/device-use/src/frontend/components/LogsDrawer.tsxpackages/device-use/src/frontend/components/Sidebar.tsxpackages/device-use/src/frontend/components/Toasts.tsxpackages/device-use/src/frontend/components/TopBar.tsxpackages/device-use/src/frontend/index.htmlpackages/device-use/src/frontend/lib/api.tspackages/device-use/src/frontend/lib/ws.tspackages/device-use/src/frontend/main.tsxpackages/device-use/src/frontend/stores/activity-store.tspackages/device-use/src/frontend/stores/logs-store.tspackages/device-use/src/frontend/stores/project-store.tspackages/device-use/src/frontend/stores/refs-store.tspackages/device-use/src/frontend/stores/sim-store.tspackages/device-use/src/frontend/styles.csspackages/device-use/src/sdk/apps.tspackages/device-use/src/sdk/builder.tspackages/device-use/src/sdk/index.tspackages/device-use/src/server/events.tspackages/device-use/src/server/index.tspackages/device-use/src/server/invoker.tspackages/device-use/src/server/mcp.tspackages/device-use/src/server/state.tspackages/device-use/src/server/stream.tspackages/device-use/src/server/tools.tspackages/device-use/test/logs.test.tspackages/device-use/test/project-info.test.tspackages/device-use/test/server.test.tspackages/device-use/test/xcodebuild.test.tspackages/device-use/tsconfig.jsonpackages/device-use/vite.config.ts
💤 Files with no reviewable changes (8)
- packages/device-use/scripts/build-ts.ts
- packages/device-use/scripts/sdk-smoke.ts
- packages/device-use/src/sdk/apps.ts
- packages/device-use/src/cli/commands/stream.ts
- packages/device-use/src/sdk/builder.ts
- packages/device-use/src/cli/stream/viewer-html.ts
- packages/device-use/src/sdk/index.ts
- packages/device-use/src/cli/stream/manager.ts
zvadaadam
left a comment
There was a problem hiding this comment.
AI code review (Claude)
CodeRabbit already covered most of what I would have flagged (writeChain .catch() bug, hardcoded colors in styles.css, serve.ts path bug, install/launch error swallowing, etc.). I'm only posting the two findings that didn't show up in CodeRabbit's pass:
set_active_projectsilently wipesscheme/configurationwhen called with just a path — triggered by every blur of the path input.- Frontend 10s polling violates the root CLAUDE.md "WebSocket push over polling" Hard Rule.
Generated by Claude Code as an automated code review.
| configuration: z.string().optional(), | ||
| }), | ||
| handler: async (ctx, params) => { | ||
| await ctx.state.update({ project: params }); |
There was a problem hiding this comment.
AI code review (Claude) — bug, not flagged by CodeRabbit
set_active_project silently wipes scheme and configuration whenever called with just a path.
params is stored wholesale, and StateStore.update (src/server/state.ts:51-56) shallow-merges only at the top level — so state.project is replaced entirely, not merged with the previous value.
The frontend hits this on every blur of the path input: TopBar.onProjectPath (src/frontend/components/TopBar.tsx:25-29) calls useProjectStore.setProject(path) with no scheme → api.setActiveProject(path, undefined, undefined) → JSON.stringify drops the undefined keys → server receives {path} → state.project becomes {path}, scheme gone. Re-focusing and blurring the field without changing it wipes the pinned scheme and disables ▶ Run.
Fix: merge inside the handler, e.g.
const prev = ctx.state.get().project ?? {};
await ctx.state.update({ project: { ...prev, ...params } });or only overwrite keys the caller actually provided.
| useSimStore.getState().refresh(); | ||
| useProjectStore.getState().refresh(); | ||
| // Refresh sim list every 10s (catches external simctl boot/shutdown). | ||
| const t = setInterval(() => useSimStore.getState().refresh(), 10_000); |
There was a problem hiding this comment.
AI code review (Claude) — CLAUDE.md violation, not flagged by CodeRabbit
Root CLAUDE.md Hard Rules: "WebSocket push over polling — only poll for git diffs on working sessions." The package's AGENTS.md does not override this.
This 10s setInterval continuously polls /api/state + /api/stream. The justification (catching external xcrun simctl boot/shutdown) is real, but the rule has only one carve-out (git diffs). Two earlier commits in this branch (ef6e7ca2, 8f0fb70e) had to fight UI flicker caused by this exact polling — the cost of the violation is already showing up.
Fix: poll xcrun simctl list -j server-side and emit a sim-state event over the existing /ws bus when the list changes; frontend subscribes once instead of polling on a timer.
17 inline comments from the bot — addressing all of them. Brief
mapping (file → comment id):
CRITICAL
package.json#62 Add missing zustand dep (was hoisted to root, would
fail bun install --frozen-lockfile in CI).
MAJOR
agentic-app.json#33 UI/MCP URLs use 127.0.0.1 instead of localhost so
host IDEs don't trip over IPv4/IPv6 happy-eyeballs
ambiguity. serve.ts also defaults --host to 0.0.0.0
(matches the server's own default), and the printed
probe URL collapses 0.0.0.0 → 127.0.0.1.
serve.ts#33 Compiled-layout candidate path was off-by-one
(..server vs ../..server). Fixed.
serve.ts#65 Browser only opens after /health responds 200; on
startup failure, no dead tab.
serve.ts#72 Propagate child exit code; non-zero exits return
{success:false} with the code.
e2e-server.ts#173 ChildProcess.killed is set when the signal is sent,
not when the process exits — using it as a kill-fallback
check would race a SIGTERM-ignoring child. Switched
to a tracked `exited` flag + clearTimeout on exit.
Sidebar.tsx#17 Refresh refs whenever streamUdid changes (sim swap),
not only when refs.length === 0. Stops stale refs
from a previous sim from lingering.
styles.css#85 Added --on-accent / --stage-bg / --frame-bg /
--canvas-bg / --shadow-* tokens; removed every
literal `white`, `#06060a`, `#1a1a22`, `#000`, and
`rgba(...)` from the stylesheet body.
state.ts#72 persist() now catches mkdir/writeFile failures
inside the chained promise so a single transient
error can't permanently brick the writeChain.
tools.ts#562 stream_logs wires onError + onExit to delete the
handle from logHandles, so failed/exited streams
don't leak.
MINOR
design-doc#57,#65 Architecture diagram said "TanStack Start on Nitro"
but the locked stack (decision #5) is "Hono on
Bun.serve" / "Vite + React". Aligned.
README.md#88 Added `text` language tag to the architecture fenced
code block (markdownlint MD040).
project-info.ts#43 detectKind() throws XcodebuildError on inputs that
aren't .xcodeproj or .xcworkspace, instead of
silently treating them as .xcodeproj.
xcodebuild.ts#159 Removed the synthesized appPath from BuildResult —
it was a guess based on `scheme.app` that's wrong
whenever target product name differs. Callers
should use resolveAppPath() (already used by the
`run` composite tool).
TopBar.tsx#15 onBoot pushes a failed tool-event into the activity
store on api.boot() failure, so the toast surfaces
the error.
sim-store.ts#67 setPinned checks res.success — only commits the
pinnedUdid change on success, otherwise stores
error state.
styles.css#345 Replaced deprecated `word-break: break-word` with
`overflow-wrap: anywhere` (modern equivalent).
Side effects:
scripts/*.ts Added `export {}` to the three smoke scripts so
tsc treats them as modules, not global scripts
(was causing duplicate-symbol errors after a
second smoke script joined the project).
Verified:
- 86/86 unit tests pass
- typecheck clean
- bun scripts/e2e-server.ts passes against a real iPhone 17 Pro
(17 tool-events, 97 tool-log frames, 0 failures)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greenlight — Round 1CodeRabbit posted 17 actionable inline comments (1 critical, 8 major, 8 minor). All addressed in Critical (1)
Major (8)
Minor (8)
Side effect
Verified
|
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (3)
packages/device-use/src/server/tools.ts (1)
159-172:⚠️ Potential issue | 🟠 Major
set_active_projectsilently wipesschemeandconfigurationwhen called with only apath.This was flagged in a previous review and remains unaddressed. When
paramscontains onlypath, storing it wholesale viactx.state.update({ project: params })replaces the entireprojectobject, losing any previously setschemeorconfiguration. The frontend's TopBar blur handler can trigger this, breaking the "▶ Run" button.🐛 Proposed fix — merge with previous project state
const set_active_project = tool({ name: "set_active_project", description: "Pin the active Xcode project (path + scheme) for this workspace. Used by build/run when params are omitted.", schema: z.object({ path: z.string(), scheme: z.string().optional(), configuration: z.string().optional(), }), handler: async (ctx, params) => { - await ctx.state.update({ project: params }); - return { ok: true, project: params }; + const prev = ctx.state.get().project ?? {}; + const merged = { ...prev, ...params }; + await ctx.state.update({ project: merged }); + return { ok: true, project: merged }; }, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/server/tools.ts` around lines 159 - 172, The handler for set_active_project currently overwrites the entire project state with params which causes scheme and configuration to be wiped when only path is provided; change the handler to read the existing project from ctx.state (e.g., ctx.state.project or via ctx.state.get), shallow-merge the existing project with the incoming params so unspecified fields are preserved, and then call ctx.state.update({ project: mergedProject }) instead of ctx.state.update({ project: params }); ensure you reference set_active_project and ctx.state.update when making the change.docs/device-use-v2-design.md (1)
183-184:⚠️ Potential issue | 🟡 MinorPhase 1 still references the rejected TanStack Start + Nitro stack.
Lines 183-184 describe adding "TanStack Start" and creating a "minimal Nitro server", but decision
#5(line 23) documents that the team refined away from TanStack Start on Nitro during Phase 1 scaffolding itself, ultimately choosing "Hono on Bun.serve" for the backend. The Phase 1 description should reflect the final chosen stack, not the initial exploration.📝 Suggested fix
### Phase 1 — Scaffold the new server + frontend (~1 day) -- Add TanStack Start to `packages/device-use` (new deps, tsconfig adjustments). -- `src/server/` — minimal Nitro server with `/health` route. +- Add Hono + Vite + TanStack Router to `packages/device-use` (new deps, tsconfig adjustments). +- `src/server/` — minimal Hono on Bun.serve with `/health` route. - `src/frontend/` — minimal React app served at `/`. - `bun run dev` / `bun run build` / `bun run start` wire-up. - **Exit criterion:** `bun run dev` → open `localhost:3100/` → "device-use v2" heading renders; `GET /health` returns 200.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/device-use-v2-design.md` around lines 183 - 184, Update the Phase 1 description that still references "TanStack Start" and "minimal Nitro server": replace the bullet "- Add TanStack Start to `packages/device-use` (new deps, tsconfig adjustments)." with a note about adding the chosen Hono stack (e.g., "Add Hono on Bun.serve to `packages/device-use`") and change "- `src/server/` — minimal Nitro server with `/health` route." to reflect the final implementation (e.g., "src/server/ — minimal Hono server on Bun.serve with `/health` route"). Ensure the Phase 1 heading/text that mentions "Phase 1" and those two bullets are updated so the document accurately states the final backend choice ("Hono on Bun.serve") instead of "TanStack Start" or "Nitro".packages/device-use/src/frontend/stores/sim-store.ts (1)
64-70:⚠️ Potential issue | 🟡 MinorHandle thrown API errors in
setPinnedas well as non-success responses.Line 65 is only guarded for
res.success === false; ifapi.setActiveSimulator()throws, error state is not set and callers see a rejected promise path.💡 Proposed fix
setPinned: async (udid) => { - const res = await api.setActiveSimulator(udid); - if (!res.success) { - set({ error: res.error ?? "failed to pin simulator" }); - return; - } - set({ pinnedUdid: udid, error: null }); + try { + const res = await api.setActiveSimulator(udid); + if (!res.success) { + set({ error: res.error ?? "failed to pin simulator" }); + return; + } + set({ pinnedUdid: udid, error: null }); + } catch (err) { + set({ error: err instanceof Error ? err.message : "failed to pin simulator" }); + } },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/src/frontend/stores/sim-store.ts` around lines 64 - 70, The setPinned async action (setPinned) only handles non-success responses from api.setActiveSimulator but not exceptions; wrap the api.setActiveSimulator(udid) call in a try/catch (or attach .catch) inside setPinned, set the store's error via set({ error: <message> }) when an exception occurs, and ensure the function returns after setting error (so callers do not see an unhandled rejection); keep the existing success-path (set({ pinnedUdid: udid, error: null })) intact and include the thrown error message (or a fallback string) in the error state.
🧹 Nitpick comments (3)
packages/device-use/test/xcodebuild.test.ts (1)
32-37: Remove unusedSTARTEDarray.The
STARTEDarray is declared and cleared inafterEachbut never populated or read anywhere in the test file.🧹 Suggested cleanup
-const STARTED: FakeChild[] = []; - -afterEach(() => { - // Guard: don't leave fake children emitting after a test ends. - STARTED.length = 0; -}); - describe("build", () => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/test/xcodebuild.test.ts` around lines 32 - 37, Remove the unused STARTED array and its cleanup: delete the const STARTED: FakeChild[] = []; declaration and remove the afterEach block that only resets STARTED.length, ensuring no other code refers to STARTED (search for STARTED, FakeChild or that afterEach helper) before committing the change in xcodebuild.test.ts.packages/device-use/scripts/e2e-server.ts (1)
213-226: Consider deriving expected tool count dynamically.The hardcoded
24for tool count validation will require manual updates whenever tools are added or removed. Consider importingTOOLS.lengthfrom the tools module if the e2e script can safely depend on server internals, or document this as an intentional snapshot assertion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/device-use/scripts/e2e-server.ts` around lines 213 - 226, The checks use a hardcoded expected value "24" for tool counts (see toolCount check and mcp("tools/list") assertion), which will break when tools change; update both comparisons to derive the expected count from the actual tools definition (import TOOLS and use TOOLS.length) and compare against that value instead of 24, or if this is intentionally a snapshot test, add a clear comment explaining it's intentional; ensure the import of TOOLS is added at the top and replace the two literal 24s used with the dynamic expectedCount when calling fail and logging via mcp and toolCount checks (referencing toolCount, mcp("tools/list"), listRes.tools.length, fail, and log).docs/device-use-v2-design.md (1)
39-39: Consider adding a language specifier to the code block.The fenced code block at line 39 lacks a language specifier. While the ASCII art diagram renders correctly, adding
textor another appropriate specifier would satisfy markdown linters and be consistent with best practices.📝 Suggested fix
-``` +```text ┌─────────────────────────────────────────────────────────────┐ │ packages/device-use/ │🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/device-use-v2-design.md` at line 39, The fenced code block in docs/device-use-v2-design.md (the ASCII art diagram starting at the code fence around line 39) lacks a language specifier; update that opening triple-backtick to include a language such as text (e.g., ```text) so markdown linters accept it and the rendering remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/device-use-v2-design.md`:
- Line 168: Update the dev/prod modes description so it reflects the actual
chosen stack (Hono on Bun.serve for the backend) instead of saying "Standard
TanStack Start"; modify the sentence that currently reads "`bun run dev` = Vite
HMR + `tsx watch` for server. `bun run start` = bundled frontend + compiled
server. Standard TanStack Start." to explicitly state that `bun run dev` uses
Vite HMR with a Bun/Hono dev server (e.g., `tsx watch` driving Hono on
Bun.serve) and that `bun run start` runs the bundled frontend with the
production Bun/Hono server; ensure the phrase "Standard TanStack Start" is
removed or replaced and that this aligns with decision `#5` which switched to Hono
on Bun.serve.
In `@packages/device-use/src/frontend/stores/sim-store.ts`:
- Around line 15-19: The current sameStream function only compares udid and port
which can let url/size changes be ignored; update sameStream (and the other
occurrence noted) to include url and size in the comparison (or perform a
shallow/deep equality of StreamInfo objects) so any change to url or size will
be detected and state will update; locate the sameStream function and replace
its equality check to compare udid, port, url, and size (or use a reliable
object-equality helper) to prevent stale viewer metadata.
- Around line 22-30: The equality function sameSims currently compares udid,
state, and name but omits runtime, causing missed updates; update sameSims (and
the other occurrence mentioned) to also compare x.runtime and y.runtime (e.g.,
add a check like x.runtime !== y.runtime) so any change to the simulator runtime
triggers inequality and subscriber updates.
- Around line 40-71: The refresh() and setPinned() handlers in sim-store.ts are
performing server fetches and mutations directly; rework them to use TanStack
Query v5 instead: create queries for device list, state and stream (e.g.,
useQuery keys like ["devices"], ["simState"], ["stream"]) and a mutation for
setActiveSimulator via useMutation, then in sim-store's refresh and setPinned
either read from/queryClient (queryClient.getQueryData or invalidate/refetch
queries) or call mutation.mutateAsync and onSuccess invalidate/refetch the
relevant queries; preserve existing identity logic by deriving
sims/pinnedUdid/streamInfo from the query data and continue using
sameSims/sameStream to avoid unnecessary replacements. Ensure setPinned uses the
mutation's onSuccess to update/refresh the ["devices"] and ["simState"] queries
and set errors via the store only from query/mutation results rather than
calling api directly.
---
Duplicate comments:
In `@docs/device-use-v2-design.md`:
- Around line 183-184: Update the Phase 1 description that still references
"TanStack Start" and "minimal Nitro server": replace the bullet "- Add TanStack
Start to `packages/device-use` (new deps, tsconfig adjustments)." with a note
about adding the chosen Hono stack (e.g., "Add Hono on Bun.serve to
`packages/device-use`") and change "- `src/server/` — minimal Nitro server with
`/health` route." to reflect the final implementation (e.g., "src/server/ —
minimal Hono server on Bun.serve with `/health` route"). Ensure the Phase 1
heading/text that mentions "Phase 1" and those two bullets are updated so the
document accurately states the final backend choice ("Hono on Bun.serve")
instead of "TanStack Start" or "Nitro".
In `@packages/device-use/src/frontend/stores/sim-store.ts`:
- Around line 64-70: The setPinned async action (setPinned) only handles
non-success responses from api.setActiveSimulator but not exceptions; wrap the
api.setActiveSimulator(udid) call in a try/catch (or attach .catch) inside
setPinned, set the store's error via set({ error: <message> }) when an exception
occurs, and ensure the function returns after setting error (so callers do not
see an unhandled rejection); keep the existing success-path (set({ pinnedUdid:
udid, error: null })) intact and include the thrown error message (or a fallback
string) in the error state.
In `@packages/device-use/src/server/tools.ts`:
- Around line 159-172: The handler for set_active_project currently overwrites
the entire project state with params which causes scheme and configuration to be
wiped when only path is provided; change the handler to read the existing
project from ctx.state (e.g., ctx.state.project or via ctx.state.get),
shallow-merge the existing project with the incoming params so unspecified
fields are preserved, and then call ctx.state.update({ project: mergedProject })
instead of ctx.state.update({ project: params }); ensure you reference
set_active_project and ctx.state.update when making the change.
---
Nitpick comments:
In `@docs/device-use-v2-design.md`:
- Line 39: The fenced code block in docs/device-use-v2-design.md (the ASCII art
diagram starting at the code fence around line 39) lacks a language specifier;
update that opening triple-backtick to include a language such as text (e.g.,
```text) so markdown linters accept it and the rendering remains unchanged.
In `@packages/device-use/scripts/e2e-server.ts`:
- Around line 213-226: The checks use a hardcoded expected value "24" for tool
counts (see toolCount check and mcp("tools/list") assertion), which will break
when tools change; update both comparisons to derive the expected count from the
actual tools definition (import TOOLS and use TOOLS.length) and compare against
that value instead of 24, or if this is intentionally a snapshot test, add a
clear comment explaining it's intentional; ensure the import of TOOLS is added
at the top and replace the two literal 24s used with the dynamic expectedCount
when calling fail and logging via mcp and toolCount checks (referencing
toolCount, mcp("tools/list"), listRes.tools.length, fail, and log).
In `@packages/device-use/test/xcodebuild.test.ts`:
- Around line 32-37: Remove the unused STARTED array and its cleanup: delete the
const STARTED: FakeChild[] = []; declaration and remove the afterEach block that
only resets STARTED.length, ensuring no other code refers to STARTED (search for
STARTED, FakeChild or that afterEach helper) before committing the change in
xcodebuild.test.ts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 419393ca-8ab7-47b2-82b5-526e7235cd2d
📒 Files selected for processing (18)
docs/device-use-v2-design.mdpackages/device-use/README.mdpackages/device-use/agentic-app.jsonpackages/device-use/package.jsonpackages/device-use/scripts/e2e-server.tspackages/device-use/scripts/longpress-smoke.tspackages/device-use/scripts/touch-smoke.tspackages/device-use/scripts/ws-smoke.tspackages/device-use/src/cli/commands/serve.tspackages/device-use/src/engine/project-info.tspackages/device-use/src/engine/xcodebuild.tspackages/device-use/src/frontend/components/Sidebar.tsxpackages/device-use/src/frontend/components/TopBar.tsxpackages/device-use/src/frontend/stores/sim-store.tspackages/device-use/src/frontend/styles.csspackages/device-use/src/server/state.tspackages/device-use/src/server/tools.tspackages/device-use/test/xcodebuild.test.ts
✅ Files skipped from review due to trivial changes (3)
- packages/device-use/README.md
- packages/device-use/scripts/touch-smoke.ts
- packages/device-use/agentic-app.json
🚧 Files skipped from review as they are similar to previous changes (7)
- packages/device-use/src/frontend/components/Sidebar.tsx
- packages/device-use/src/frontend/components/TopBar.tsx
- packages/device-use/src/cli/commands/serve.ts
- packages/device-use/src/engine/project-info.ts
- packages/device-use/src/server/state.ts
- packages/device-use/package.json
- packages/device-use/src/engine/xcodebuild.ts
The zustand dep was added to packages/device-use/package.json in f14d701 but the workspace lockfile entry wasn't captured. This is the missing one-line lockfile sync — `bun install --frozen-lockfile` was already passing in CI because the root entry was up-to-date, but the workspace block needed the addition for consistency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- design-doc#168 — removed stale "TanStack Start" reference from the dev/prod modes row; replaced with the actual setup (Hono server with --hot, separate Vite dev server, prod = bundled frontend + same Hono server in NODE_ENV=production). - sim-store#19 (sameStream) — extended comparison beyond udid + port to include url and structural size comparison (pxW/pxH/ptW/ptH). Stream metadata changes (e.g. orientation rotation altering size while UDID/port stay fixed) now correctly notify subscribers. - sim-store#30 (sameSims) — added runtime to the per-simulator equality check. Runtime upgrades now propagate. Disagreed with one suggestion (TanStack Query refactor for sim-store fetches). Reasoning in the round summary on the PR — short version: device-use is a standalone package whose frontend deliberately runs on Zustand only, has no TanStack Query dependency today, and the guideline cited applies to apps/web (Deus's main UI), not to packaged sub-products. Adding TanStack Query for two REST reads would inflate the bundle and introduce a paradigm split between the WS event subscription and the polled data, with no real benefit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greenlight — Round 2CodeRabbit posted 4 inline comments after the round-1 push. Triaged: 3 fixed, 1 disagreed. Fixed (3) in
Disagreed (1) — TanStack Query refactor for sim-store Verified
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/device-use-v2-design.md (1)
183-185:⚠️ Potential issue | 🟡 MinorPhase 1 still references deprecated stack choices.
This section says “Add TanStack Start” and “minimal Nitro server,” which conflicts with locked decision
#5and the rest of this doc’s Hono-on-Bun.serve direction. Please update this phase text to Hono + Vite terminology for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/device-use-v2-design.md` around lines 183 - 185, Update Phase 1 wording to match the locked decision and the rest of the doc: replace "Add TanStack Start" with a Hono + Vite phrasing (e.g., "Add Hono + Vite to `packages/device-use`" including new deps and tsconfig adjustments), change "`src/server/` — minimal Nitro server with `/health` route" to "`src/server/` — minimal Hono-on-Bun.serve server with `/health` route", and change "`src/frontend/` — minimal React app served at `/`" to explicitly state the React app is served via Vite (e.g., "minimal React app served via Vite at `/`"); keep file/dir references `packages/device-use`, `src/server`, and `src/frontend` so reviewers can find the updated lines.
🧹 Nitpick comments (1)
docs/device-use-v2-design.md (1)
39-39: Add a language to the fenced code block.The architecture diagram block should include a language tag (for markdownlint MD040), e.g.
text.Suggested doc fix
-``` +```text ┌─────────────────────────────────────────────────────────────┐ ... -``` +```🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/device-use-v2-design.md` at line 39, The fenced code block containing the architecture diagram is missing a language tag (causing markdownlint MD040); update the opening fence (the triple backticks that start the diagram block) to include a language tag such as text (e.g., change ``` to ```text) so the diagram block is fenced with a language identifier and leave the closing ``` unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/device-use-v2-design.md`:
- Line 318: The docs currently reference `launch.ready.http`, which conflicts
with the manifest shape shown earlier that uses `launch.ready` with `type:
"http"`; update the sentence to use the manifest field name consistently (e.g.,
`launch.ready` and indicate `type: "http"` where relevant) so it does not imply
a different schema path, and ensure related occurrences refer to `launch.ready`
rather than `launch.ready.http`.
---
Duplicate comments:
In `@docs/device-use-v2-design.md`:
- Around line 183-185: Update Phase 1 wording to match the locked decision and
the rest of the doc: replace "Add TanStack Start" with a Hono + Vite phrasing
(e.g., "Add Hono + Vite to `packages/device-use`" including new deps and
tsconfig adjustments), change "`src/server/` — minimal Nitro server with
`/health` route" to "`src/server/` — minimal Hono-on-Bun.serve server with
`/health` route", and change "`src/frontend/` — minimal React app served at `/`"
to explicitly state the React app is served via Vite (e.g., "minimal React app
served via Vite at `/`"); keep file/dir references `packages/device-use`,
`src/server`, and `src/frontend` so reviewers can find the updated lines.
---
Nitpick comments:
In `@docs/device-use-v2-design.md`:
- Line 39: The fenced code block containing the architecture diagram is missing
a language tag (causing markdownlint MD040); update the opening fence (the
triple backticks that start the diagram block) to include a language tag such as
text (e.g., change ``` to ```text) so the diagram block is fenced with a
language identifier and leave the closing ``` unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5cea110d-dfb8-412c-bd89-7f2ec910ed1f
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
docs/device-use-v2-design.mdpackages/device-use/src/frontend/stores/sim-store.ts
✅ Files skipped from review due to trivial changes (1)
- packages/device-use/src/frontend/stores/sim-store.ts
CodeRabbit flagged that `launch.ready.http` was an inconsistent path
shorthand — the actual manifest schema is `launch.ready` with
`{type: "http", path: "/health"}`. Reworded to use the correct
shape so future readers don't assume a `launch.ready.http` key.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greenlight — Round 3CodeRabbit posted 1 minor doc-wording nit. Fixed in No other findings. Ready to merge once CI on the new SHA confirms green. |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/device-use-v2-design.md`:
- Line 85: Update the `/ws` bullet so it does not claim standalone CLI commands
are visible live; change it to state that `/ws` exposes bidirectional events for
subscribed frontends and live updates only reflect actions routed through the
server/viewer (standalone CLI actions do not populate server/viewer activity).
Ensure the wording references `/ws`, "CLI commands", and "server/viewer
activity" so readers understand the limitation already noted elsewhere.
- Line 30: The doc incorrectly shows the storage path as
`{storage.workspace}/.device-use/state.json`; update the text to reflect how
StateStore and DEUS_STORAGE actually behave by documenting the file as
`{storage.workspace}/state.json` (since DEUS_STORAGE is set to
`{workspace}/.device-use` and StateStore writes to `{storageDir}/state.json`),
and apply the same correction to the other occurrences noted (lines referenced
203 and 320); mention the relevant symbols StateStore, storageDir, and
DEUS_STORAGE so readers can map env var to actual file path.
- Around line 242-243: Update the out-of-scope note that incorrectly claims
agentic-app.json is separate: change the wording around the "AAP protocol itself
(`agentic-app.json`, registry, lifecycle in Deus backend)" to state that the
manifest file agentic-app.json is included in this PR (manifest included) while
protocol/host integration (registry and lifecycle handling in the Deus backend)
remains deferred to a follow-up; apply the same rewording to the repeated
occurrence referenced at line 253 so both locations reflect "manifest included,
protocol/host integration deferred."
- Around line 183-185: Update the stale Phase 1 stack text that references
"TanStack Start" and "Nitro" to reflect the locked stack: replace those list
items so they read that `packages/device-use` uses Hono on Bun.serve for the
server and a Vite-served frontend (update the bullets mentioning `src/server/`
and `src/frontend/` accordingly), and ensure any mentions of tsconfig/deps
reference the actual deps/config needed for Hono+Bun and Vite.
- Line 39: The fenced architecture block uses a plain code fence (```) without a
language tag; update the opening fence to include a language identifier (for
example change ``` to ```text) so the block becomes a text-fenced code block and
satisfies markdownlint MD040 for the architecture diagram in the document.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: daa94643-9771-460d-ac9a-627e72b8d0cd
📒 Files selected for processing (1)
docs/device-use-v2-design.md
All 5 are doc-only fixes in docs/device-use-v2-design.md:
- Storage path duplication — `state.json` lives at
`{storage.workspace}/state.json`, not `{storage.workspace}/.device-use/state.json`
(the `.device-use` segment is already inside `storage.workspace`).
Fixed in 3 places (table row #12, backend wiring section).
- Architecture fence missing language — added `text` (markdownlint MD040).
- /ws bullet overstated CLI visibility — clarified that only actions
routed through the server (REST, MCP, viewer) populate the activity
stream; standalone CLI invocations bypass it (peer model).
- Phase 1 stack text was stale — said "Add TanStack Start" + "Nitro"
but the locked stack (decision #5) is Hono on Bun.serve + Vite.
Bullet rewritten to match what was actually built.
- Out-of-scope note about agentic-app.json was outdated — the manifest
IS in this PR; the next PR is the Deus host that consumes it. Reworded
+ retitled the manifest section to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greenlight — Round 4CodeRabbit posted 5 inline doc-only nits in
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/device-use-v2-design.md`:
- Around line 270-271: Docs show manifest examples using "localhost" for the
"ui" url and "agent.tools.url"; update those example values to use the IPv4-safe
literal "127.0.0.1" instead of "localhost" so they match the PR's manifest fix
and avoid IPv4/IPv6 ambiguity (replace the URL values referenced by the "ui" key
and by "agent" → "tools" → "url").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6fe59951-a281-459f-8428-2ac2460e52f4
📒 Files selected for processing (1)
docs/device-use-v2-design.md
CodeRabbit noticed the design-doc manifest example still showed \`localhost\` URLs while the actual \`packages/device-use/agentic-app.json\` uses \`127.0.0.1\` (the IPv4-safe form chosen during the round-1 fix to avoid IPv4/IPv6 happy-eyeballs ambiguity). Aligned the example so copy/paste from the doc doesn't regress the fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greenlight — Round 5CodeRabbit posted 1 doc-consistency nit. Fixed in HEAD — aligned the design-doc manifest example URLs ( |
The viewer now uses the same color/text/radius/font system that
apps/web/src/global.css defines for the Deus IDE itself, so embedding
device-use as an AAP app feels native instead of looking like a
separate skin.
Tokens mirrored 1:1 from Deus's dark theme:
- Backgrounds — six-tier hierarchy (--bg-base #0b0b0b → --bg-surface
→ --bg-elevated → --bg-raised → --bg-overlay → --bg-muted #2a2a2a).
Each step adds perceptible depth without needing borders.
- Text — five-level OKLCH grayscale (--text-primary 0.86 → 0.32),
generous gaps between levels.
- Accent — `oklch(0.78 0.09 345)` (Deus's "cool rose" primary), not
the previous purple. Used for the run button, status dot glow,
ref ids, and ripples.
- Semantic signals — equiluminant OKLCH (success / warning /
destructive all at L≈0.65-0.68), matches Deus exactly.
- Borders — `color-mix(in oklch, var(--foreground) 5%, transparent)`,
semi-transparent so they adapt to any surface.
- Radius scale — 2 / 4 / 6 / 8 / 10 / 12 / 16 / 20 / 24 px.
Device frame uses --radius-4xl (24px), canvas inside uses
--radius-3xl (20px), buttons --radius-md/lg.
- Typography — 13px base (matches Deus's dense IDE scale), system
-apple-system / SF Pro Text + SF Mono, negative letter-spacing
throughout (-0.005em / -0.015em on brand) for the "premium texture"
layer.
- Custom scrollbars themed to bg-raised → bg-overlay on hover.
JSX inline styles updated to reference the new tokens
(--accent-2 → --primary, --danger → --destructive, --dim →
--text-muted-foreground).
Class names unchanged — same DOM structure, just new visual tokens
underneath. Verified:
- 86/86 tests pass
- typecheck clean
- vite build: index-CKuhJYDJ.css (9.62 KB), index-BOvIo28T.js
- bun scripts/e2e-server.ts PASS — 18 tool-events, 97 tool-log
frames, 0 failures
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The logs pane was eating 220px at the bottom even when empty, which is most of the time. Now it's just a ~32px header row by default; the phone + sidebar get the extra vertical space back. - Click the header row to toggle expand/collapse. - Chevron flips ▸ / ▾ to signal state. - Auto-expands on any new log line — if you triggered a build or a log stream, the drawer pops open for you. Collapsing is always manual. Auto-expand uses a direct Zustand subscription (not a useEffect setState) to satisfy react-hooks/set-state-in-effect. - Clear button hidden when collapsed (no affordance without content). - Grid row for logs changed from fixed 220px to `auto`; the `.expanded` class pins the body at 220px when open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Native <select> popups use OS chrome (white on macOS), breaking the
dark Deus theme. Swapped for @radix-ui/react-select wrapped in a
small themed component — same primitive Deus's apps/web Select uses,
just styled with plain CSS against our token system (no Tailwind).
What's new:
- src/frontend/components/Select.tsx — minimal wrapper exposing a
plain {value, onValueChange, options, placeholder} API. Options
can be any ReactNode so the sim picker can render a state dot + a
dim runtime suffix inline.
- Styled trigger matches the other topbar inputs (bg-elevated, 1px
border, 6px radius, hover brightens to bg-raised, focus-visible
rings with --ring).
- Portaled popover uses --popover / --border / --shadow-elevated,
radius-xl; items have 6/10 padding, left checkmark indicator in
--primary, data-highlighted lifts to bg-elevated.
- Chevron rotates 180° on open (data-state="open").
- Smooth 180ms in / 150ms out with cubic-bezier matching apps/web.
TopBar wiring:
- Sim picker renders each option as `[dot] Name · runtime`, with the
dot green+glow when that sim is Booted.
- Scheme picker renders as a plain list.
- Empty "no simulators" case stays inline text (no dropdown needed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without explicit cache directives, browsers heuristically cache the HTML shell and keep pointing at old Vite-hashed asset URLs across deploys — user refreshes, sees a stale bundle. This was hitting the dropdown-themed commit: the new bundle was built + served but the browser's cached HTML still referenced the pre-Radix build. Fix in the prod static-serving branch: - /assets/* → `Cache-Control: public, max-age=31536000, immutable` (Vite fingerprints these, so the URL changes whenever the content does — safe to cache forever). - Everything else (including the HTML shell) → `Cache-Control: no-cache` (revalidate every load so new bundle URLs are picked up immediately). Verified via `curl -D -`. HTML now sends `no-cache`, asset sends `public, max-age=31536000, immutable`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old frame was just a rounded rect with 12px padding and 40px radius — didn't match iPhone proportions, had no dynamic island, no camera hole, and no accurate bezel thickness. Ported the GenericDeviceFrame shape from apps/web/src/features/simulator/ui/DeviceFrame.tsx exactly: - Aspect ratio per device class (iPhone 430:932, iPad 834:1194); the frame scales to fit the stage height while preserving ratio. - Shell radius 3.25rem (iPhone) / 2.75rem (iPad), screen radius 2.6rem / 2.1rem — the two-layer corner system. - Percentage-based screen insets (iPhone 1.7% top / 2.5% sides / 1.9% bottom) matching Apple's actual bezel thickness. - Dynamic island: 38% × 4.1%, positioned at 2.35% top, bg-base rounded-full. Shows for iPhone only. - Camera dot: 10×10, positioned at 1.45% top with a subtle ring. Shows for iPad only. - Inner bezel ring at inset 0.6% — catches light, hints at edge thickness without adding chrome. - iPhone vs iPad detection from pinned sim's name (same heuristic apps/web uses). We don't bundle the Apple-derived bezel assets (apps/web loads them from a generated manifest via scripts/prepare-device-chrome.mjs — needs Xcode + licensing considerations for packaged device-use); the generic shell is the fallback Deus uses anyway when the manifest isn't present, so visual parity is the same as Deus's default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ady has them The simulator's MJPEG stream captures iOS's full screen, including the native status bar with its own dynamic island (iPhone) or camera cutout (iPad). My overlay versions drew a second pill on top at slightly different sizes/positions, creating the visible "two things stacked" artifact the user flagged. Fix: remove the overlay divs + their CSS. The stream shows the real ones rendered by iOS at the right size/position. Why Deus's apps/web draws them anyway: their code path assumes the captured stream excludes the status bar (that is the case when they eventually swap in the Apple-derived bezel assets, which crop it). For the generic-shell + full-screen MJPEG case (what we have), the overlays are redundant. Kept: shell background, proper aspect ratio, shell/screen radii, percentage screen insets, subtle inner ring. Those still make the frame feel intentional without faking what's already in the stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… fits exactly The hardcoded 430/932 shell AR only matches the base iPhone 16 Pro. For iPhone Air (420×912) or iPhone 17 Pro Max (440×956), the stream either got stretched/squished by the canvas filling the box, or left a black gap at the top/bottom because the inner screen box didn't match the stream's aspect ratio. Fix: compute the shell aspect ratio from streamInfo.size.ptW/ptH so that, after subtracting the percentage bezel insets, the inner screen box has exactly the same ratio as the stream. The canvas then fills the screen cleanly with no object-fit letterboxing needed. Math: widthFrac = 1 − (left% + right%) / 100 heightFrac = 1 − (top% + bottom%) / 100 screenAR = ptW / ptH (from streamInfo.size) shellAR = screenAR × widthFrac / heightFrac Bezel insets (1.7/2.5/2.5/1.9 % for iPhone, 2.1 all-around for iPad) are unchanged. Fallback to the previous hardcoded ratios when we don't have streamInfo yet (empty-state placeholder). Dropped the canvas `object-fit: contain` since there's nothing to contain now — the box matches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract ElementsPanel and ActivityPanel as discrete components under components/sidebar/, share a Panel wrapper for chrome, and add a collapsible shell with a Cmd+B toggle. Sidebar state persists via a new ui-store (localStorage). Collapsed state shows a thin rail button on the right edge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar refactor — collapsible + componentizedCommit: What changed
UX
Verified: |
Working design doc covering (a) hyperframes investigation as a candidate second AAP app and (b) AAP v1 protocol revisions surfaced by that investigation. Not implemented yet — informs upcoming PR sequence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
17 inline comments from the bot — addressing all of them. Brief
mapping (file → comment id):
CRITICAL
package.json#62 Add missing zustand dep (was hoisted to root, would
fail bun install --frozen-lockfile in CI).
MAJOR
agentic-app.json#33 UI/MCP URLs use 127.0.0.1 instead of localhost so
host IDEs don't trip over IPv4/IPv6 happy-eyeballs
ambiguity. serve.ts also defaults --host to 0.0.0.0
(matches the server's own default), and the printed
probe URL collapses 0.0.0.0 → 127.0.0.1.
serve.ts#33 Compiled-layout candidate path was off-by-one
(..server vs ../..server). Fixed.
serve.ts#65 Browser only opens after /health responds 200; on
startup failure, no dead tab.
serve.ts#72 Propagate child exit code; non-zero exits return
{success:false} with the code.
e2e-server.ts#173 ChildProcess.killed is set when the signal is sent,
not when the process exits — using it as a kill-fallback
check would race a SIGTERM-ignoring child. Switched
to a tracked `exited` flag + clearTimeout on exit.
Sidebar.tsx#17 Refresh refs whenever streamUdid changes (sim swap),
not only when refs.length === 0. Stops stale refs
from a previous sim from lingering.
styles.css#85 Added --on-accent / --stage-bg / --frame-bg /
--canvas-bg / --shadow-* tokens; removed every
literal `white`, `#06060a`, `#1a1a22`, `#000`, and
`rgba(...)` from the stylesheet body.
state.ts#72 persist() now catches mkdir/writeFile failures
inside the chained promise so a single transient
error can't permanently brick the writeChain.
tools.ts#562 stream_logs wires onError + onExit to delete the
handle from logHandles, so failed/exited streams
don't leak.
MINOR
design-doc#57,#65 Architecture diagram said "TanStack Start on Nitro"
but the locked stack (decision #5) is "Hono on
Bun.serve" / "Vite + React". Aligned.
README.md#88 Added `text` language tag to the architecture fenced
code block (markdownlint MD040).
project-info.ts#43 detectKind() throws XcodebuildError on inputs that
aren't .xcodeproj or .xcworkspace, instead of
silently treating them as .xcodeproj.
xcodebuild.ts#159 Removed the synthesized appPath from BuildResult —
it was a guess based on `scheme.app` that's wrong
whenever target product name differs. Callers
should use resolveAppPath() (already used by the
`run` composite tool).
TopBar.tsx#15 onBoot pushes a failed tool-event into the activity
store on api.boot() failure, so the toast surfaces
the error.
sim-store.ts#67 setPinned checks res.success — only commits the
pinnedUdid change on success, otherwise stores
error state.
styles.css#345 Replaced deprecated `word-break: break-word` with
`overflow-wrap: anywhere` (modern equivalent).
Side effects:
scripts/*.ts Added `export {}` to the three smoke scripts so
tsc treats them as modules, not global scripts
(was causing duplicate-symbol errors after a
second smoke script joined the project).
Verified:
- 86/86 unit tests pass
- typecheck clean
- bun scripts/e2e-server.ts passes against a real iPhone 17 Pro
(17 tool-events, 97 tool-log frames, 0 failures)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Refactors
packages/device-usefrom a stateless CLI + Swift binary into a standalone iOS Simulator workbench: a long-running Bun server hosting a React viewer, an MCP HTTP endpoint, a binary-WS touch proxy, and the same engine the CLI now shares as a peer. Ships as the reference implementation of an Agentic Apps Protocol (AAP) app —agentic-app.jsonis included; the Deus host integration follows in a separate PR.20 commits, +4.8k / −1k LOC, scoped entirely to
packages/device-use/+docs/device-use-v2-design.md+.github/workflows/device-use-e2e.yml.What landed
Server (
src/server/) — Hono on Bun.serve, one process serving:/React viewer (Vite-built)/mcpMCP HTTP transport (24 tools, JSON-RPC over SSE)/wsevent bus (broadcast everytool-event+tool-log)/sim-inputbinary WS proxy → simbridge/ws(low-latency touch + buttons)/api/tools/*,/api/state,/api/stream,/api/eventsREST surface/stream.mjpegpassthrough from simbridge/healthliveness probeEngine (
src/engine/) — kept; expanded withxcodebuild,logs.ts,project-info.ts,app-lifecycleprimitives. CLI and server are independent peers — both import the engine, neither depends on the other.Frontend (
src/frontend/) — React + Vite + Zustand. Phone-frame canvas with live MJPEG, tap/swipe/long-press via binary WS, hardware Home/Lock buttons (Cmd+H/Cmd+L), real-time@refelement sidebar with auto-snapshot on interaction, build-status feedback, agent activity feed, log drawer, keyboard input that batches keystrokes throughtype_text.CLI —
device-use serveboots the server. All other commands (list, boot, snapshot, tap, type, screenshot, etc.) unchanged. SDK +streamsubcommand deleted (replaced by the server +/sim-input).AAP manifest —
packages/device-use/agentic-app.json. Declares launch command, ready probe, viewer URL, MCP endpoint, storage, lifecycle, requires.Design doc —
docs/device-use-v2-design.md. Locked architectural decisions, implementation log, and a handoff section for the next PR (AAP host integration in Deus).CI —
.github/workflows/device-use-e2e.yml. Nightly + workflow_dispatch onmacos-15; runs the full e2e (build → install → launch → tap) against a real simulator.Test plan
bun test— 86/86 pass (engine unit + server integration)bun run typecheck— cleanbun scripts/e2e-server.ts— full real-sim flow passes (18 tool-events, 97 tool-log frames)bun scripts/touch-smoke.ts—/sim-inputbinary WS proxy passes touch correctlybun scripts/longpress-smoke.ts— 1.5s hold opens iOS context menu (verified)bun run start→ http://localhost:3100 → boot sim → set TestApp project → ▶ Run → click apps → swipe → home → keyboard typingtools/listreturns 24 tools,tools/callround-tripsWhat's deliberately NOT in this PR
Tracked as v1.1 in the design doc:
/wsactivity feed — peer model preserved; opt-in--via-serverflag is the planned shape.apps/web/src/features/simulator/etc. — happens in the AAP host PR after soak.Architectural choices reviewers will likely ask about
All deliberate, all in the design doc:
WebStandardStreamableHTTPServerTransport.invokeToolfor latency (/sim-input→ simbridge WS direct). REST + MCP taps still go throughinvokeToolso MCP/REST agents see them in the activity feed.StateStoreis a JSON file, not SQLite — ~200 bytes of state per workspace.as anycasts intoolInputSchema()— zod-to-json-schema's input type is too deep for the union we pass; documented inline.0.0.0.0so localhost resolves regardless of v4/v6 — discovered while debugging an orphaned IPv6 listener shadowing us.Known visible gaps
Documented honestly; none are PR blockers:
sendButtononly wires home/lock.type_textcall. Special keys other than Enter/Backspace are not forwarded.list_devices started/completedevents that show in the activity feed (small noise — could move to event-driven later).Next steps (separate PR)
The Deus-side AAP host integration. The "Handoff to next PR" section at the bottom of
docs/device-use-v2-design.mdlays out the receiver-side responsibilities, the ~2.5k LOC of legacy simulator code that gets deleted once it works, 8 lessons device-use learned that should shape the host's behavior, and a concrete first-commit slice.🤖 Generated with Claude Code
Summary by CodeRabbit