Skip to content

feat: AAP host + Mobile Use app integration (launch/stop/skills end-to-end)#250

Merged
zvadaadam merged 7 commits into
mainfrom
zvadaadam/aap-host-review
Apr 20, 2026
Merged

feat: AAP host + Mobile Use app integration (launch/stop/skills end-to-end)#250
zvadaadam merged 7 commits into
mainfrom
zvadaadam/aap-host-review

Conversation

@zvadaadam

@zvadaadam zvadaadam commented Apr 19, 2026

Copy link
Copy Markdown
Owner

Summary

Implements the Agentic Apps Protocol host so Deus can spawn, supervise, and MCP-route installed agentic apps — deus.mobile-use ships as the first integrated app, launchable from the UI's Apps tab or via Claude's mcp__deus__launch_app tool (+ stop_app, list_apps, read_app_skill).

Backend owns lifecycle (spawn + ready-probe + SIGTERM stop, orphan sweep via PID journal), agent-server's app-registrar broadcasts MCP config via query.setMcpServers while preserving the session's built-in SDK servers (attach/detachQuery — without this the deus transport would disconnect mid-tool-call and hang launch_app forever), frontend opens/closes Browser tabs on apps:launched / apps:stopped q:events matching by an immutable openedAt stamp so Electron's chrome-error rewrite after app exit doesn't break the match.

Along the way fixed: device-use's serve now runs in-process (was orphaning a grandchild server on stop), its UI serves the prod bundle (NODE_ENV=production in manifest) with the static root anchored to the package via import.meta.url (AAP spawns with cwd={workspace}), and register-mcp is fire-and-forget to break a re-entrant control-request deadlock.

Test plan

  • Backend unit + integration (9 AAP test files, manifest/template/registry/lifecycle/mcp-bridge/port-allocator/storage/integration/command-dispatch)
  • App-registrar tests pin SDK-server preservation (regression guard)
  • Integration test pins the fire-and-forget register-mcp contract
  • Live agent-loop probe: launch_appread_app_skillstop_app round-trips, apps:launched + apps:stopped events fire with correct payloads
  • UI smoke: Apps tab → click Launch → viewer tab opens → click Stop → tab auto-closes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Agentic Apps (AAP): new "Apps" tab and launcher UI to browse, launch, stop, and monitor workspace-scoped apps with real-time status; launched apps open browser tabs and stopped apps close matching tabs.
    • Backend hosting: app lifecycle (launch/probe/stop), dynamic port allocation, orphan-process cleanup, MCP registration bridge, and APIs for listing/launching/stopping apps.
  • Documentation

    • Added comprehensive AAP host design doc.
  • Tests

    • Extensive unit and integration tests covering AAP lifecycle, tooling, and edge cases.
  • Chores

    • Ignore local .scratch/ directory.

…o-end)

Implements the Agentic Apps Protocol host so Deus can spawn, supervise, and
MCP-route installed agentic apps. Ships with `deus.mobile-use` (device-use)
as the first integrated app — launchable from the UI's Apps tab or via the
agent's `mcp__deus__launch_app` tool.

Architecture (per docs/aap-host-design.md):
- Shared: manifest schema (Zod), template substitution, public AAP types,
  `apps:launched` / `apps:stopped` q:events.
- Backend (services/aap/): apps.service owns the `runningApps` in-memory
  map; lifecycle (spawn + ready-probe + SIGTERM stop) runs single-process;
  pid-journal catches orphans across restarts; port-allocator + storage
  helpers; mcp-bridge fires register/unregister to the agent-server; new
  `readAppSkill(appId)` reads manifest-declared skill files on demand.
  q:command launchApp/stopApp for user-initiated UI path; shared service
  method for the agent-RPC path so both converge on identical inputs.
- Agent-server: deus-tools now exposes `list_apps`, `launch_app`,
  `stop_app`, `read_app_skill`; `app-registrar` holds the dynamic MCP map
  and broadcasts via `query.setMcpServers` while preserving each query's
  initial SDK servers via attachQuery/detachQuery — without this the SDK
  disconnects the transport mid-tool-call and hangs `launch_app` forever.
- Frontend: features/apps/ (AppsLauncher card in a new Apps content tab,
  status chips, `useInstalledApps`/`useRunningApps` subscriptions);
  `useAppsLaunched` opens a Browser tab on launch, `useAppsStopped` closes
  it on exit; Browser tabs stamp `openedAt` at creation so matching
  survives Electron's chrome-error rewrite after the backend dies.

Device-use fixes discovered during integration:
- `serve` runs in-process (was spawning a grandchild that orphaned on
  SIGTERM/SIGKILL since the cli.js wrapper couldn't forward signals).
- Manifest sets NODE_ENV=production so the UI serves the prod bundle
  instead of proxying to a nonexistent :5173 Vite dev server.
- Static root anchored to the package via `import.meta.url` — AAP spawns
  with `cwd={workspace}`, so `./dist/frontend` would resolve wrong.

AAP launch triggers fire-and-forget MCP registration from the backend to
break a re-entrant control-request deadlock: the very tool call in flight
is what owns the transport the register RPC needs to update.

Tests: 9 AAP test files cover manifest/template/registry/lifecycle/
mcp-bridge/port-allocator/storage/integration/command-dispatch; an
integration test pins the fire-and-forget contract; app-registrar tests
pin SDK-server preservation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 19, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Implements Agentic Apps (AAP) end-to-end: shared manifest/template/types, backend app lifecycle and runtime (spawn/probe/stop, PID journal, port allocation), MCP bridge to agent-server, agent-server registrar and MCP tools, frontend Apps UI/hooks, tests, docs, and adds .scratch/ to .gitignore.

Changes

Cohort / File(s) Summary
Backend — AAP service & runtime
apps/backend/src/services/aap/apps.service.ts, .../lifecycle.ts, .../pid-journal.ts, .../port-allocator.ts, .../storage.ts, .../registry.ts, .../index.ts, apps/backend/src/config/installed-apps.ts, apps/backend/src/lib/repo-root.ts, apps/backend/src/server.ts
Adds full AAP hosting stack: launch/stop/list/read APIs, lifecycle spawn/probe/stop primitives, PID journal, port allocator, storage/gitignore helpers, installed-manifests config, repo-root resolver, startup orphan sweep and shutdown stop hooks.
Backend — agent integration & commands
apps/backend/src/services/aap/mcp-bridge.ts, apps/backend/src/services/agent/service.ts, apps/backend/src/services/agent/client.ts, apps/backend/src/services/agent/commands.ts, apps/backend/src/services/agent/index.ts, apps/backend/src/services/query-engine.ts
Adds MCP bridge (register/unregister) calling agent RPCs, backend→agent passthrough (sendRequestToAgent), AAP command handlers (launchApp/stopApp), query resources (apps/running_apps), and workspace-archive stop logic.
Agent-server — registrar & tools
apps/agent-server/app-registrar.ts, apps/agent-server/agents/deus-tools/apps.ts, apps/agent-server/agents/deus-tools/index.ts, apps/agent-server/agents/claude/claude-handler.ts, apps/agent-server/agents/session-store.ts, apps/agent-server/event-broadcaster.ts, apps/agent-server/index.ts, apps/agent-server/rpc-schemas.ts
Introduces in-memory registrar that broadcasts full MCP maps to Claude Query instances with attach/detach APIs; composes AAP MCP tools (list/launch/stop/read_skill) into deus-tools; integrates attach/detach into Claude handler; adds EventBroadcaster AAP RPCs and RPC schemas; SessionStore.values() added.
Shared — manifest, template, types, events
shared/aap/manifest.ts, shared/aap/template.ts, shared/aap/types.ts, shared/events.ts
Adds AAP manifest Zod schema and helpers, template substitution utilities, shared AAP types and events, and extends protocol/query/command registries for apps.
Frontend — Apps UI, hooks, browser wiring
apps/web/src/features/apps/..., apps/web/src/features/browser/..., apps/web/src/features/workspace/..., apps/web/src/app/layouts/..., apps/web/src/app/layouts/content-tabs.ts
Adds Apps tab and UI (AppsLauncher, AppCard, AppStatusChip), hooks for installed/running/launched/stopped apps, browser pending new/close-tab queue and openedAt metadata, workspace tab type "apps", and mounts hooks in ContentView.
Device-use app & server
packages/device-use/agentic-app.json, packages/device-use/src/cli/commands/serve.ts, packages/device-use/src/server/index.ts
Manifest updates (NODE_ENV, skills, bootstrap hint); serve runs in-process with signal handling; static asset paths resolved via module URL.
Tests & docs
apps/backend/test/..., apps/agent-server/test/..., docs/aap-host-design.md
Adds extensive unit and integration tests covering lifecycle, registry, storage, registrar/mcp-bridge, deus-tools, lifecycle primitives, and integration flows; adds AAP design doc.
Misc
.gitignore
Ignore .scratch/ directory.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (Frontend)
    participant Frontend as Apps UI
    participant Backend as Backend Service
    participant AgentServer as Agent-Server
    participant Query as Claude Query (SDK)

    User->>Frontend: Click "Launch" (appId, workspaceId)
    Frontend->>Backend: sendCommand("launchApp", {appId, workspaceId})
    Backend->>Backend: validate → allocate port → spawn child → waitForReady
    Backend->>Backend: record running entry → broadcast apps:launched
    Backend->>AgentServer: registerMcpForRunningApp(appId,mcpUrl) (fire-and-forget)
    AgentServer->>AgentServer: registerAppMcp(serverName,url) → update registeredServers
    AgentServer->>Query: query.setMcpServers(fullPerQueryMap)
    Backend-->>Frontend: return LaunchAppResult { runningAppId, url, bootstrap? }
    Frontend->>Frontend: useAppsLaunched handler opens browser tab
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰
I hopped through manifests, ports, and logs,
Spawned tiny servers, soothed their clogs,
Bridged MCPs and sent the call,
Opened tabs and caught them all,
Now apps are home—hip-hop hurrah!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 64.20% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: implementing AAP host infrastructure with Mobile Use app integration across launch, stop, and skills workflows end-to-end.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch zvadaadam/aap-host-review

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/agent-server/agents/claude/claude-handler.ts (1)

490-505: ⚠️ Potential issue | 🔴 Critical

Hoist queryResult out of the try block before using it in finally.

queryResult is declared with block scope inside try (line 490), but the finally block at line 685 tries to access it. This causes a ReferenceError at runtime. Hoist the variable declaration and initialize it to null, then null-guard the cleanup to handle cases where the SDK query creation fails before assignment.

Proposed fix
       // Mutable context accumulated during the streaming loop.
       const ctx = createStreamContext();
+      let queryResult: ReturnType<typeof claudeSDK> | null = null;
 
       try {
-      const queryResult = claudeSDK({ prompt: promptInput, options: sdkOptions });
+      queryResult = claudeSDK({ prompt: promptInput, options: sdkOptions });
       console.log(
         `[TIMING][${generatorId}] claudeSDK() constructor returned in ${Date.now() - tSdkSpawn}ms`
       );
 
       claudeQueries.set(sessionId, queryResult);

and in the finally block:

     } finally {
       if (claudeSessions.owns(sessionId, session)) {
-        detachQuery(queryResult);
+        if (queryResult) detachQuery(queryResult);
         claudeQueries.delete(sessionId);
         claudeSessions.delete(sessionId);
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/agent-server/agents/claude/claude-handler.ts` around lines 490 - 505,
The variable queryResult is declared inside the try block but accessed in the
finally block causing a ReferenceError; hoist queryResult (declare it before the
try, initialized to null) in claude-handler.ts near where claudeSDK is invoked,
then update the finally cleanup to null-guard any operations that use
queryResult (e.g., claudeQueries.delete(sessionId), attachQuery cleanup, or
other teardown) so cleanup only runs when queryResult was successfully assigned
after calling claudeSDK({ prompt: promptInput, options: sdkOptions }) and before
calling attachQuery.
🧹 Nitpick comments (6)
apps/backend/src/services/aap/storage.ts (1)

14-20: Enforce the absolute-path contract before mkdir().

Right now ensureStorageDirs() will create any truthy path. If a manifest substitution ever yields a relative path, this writes under the backend process cwd instead of the intended workspace/global root. Fail fast here so malformed app config cannot create stray directories.

Suggested hardening
 export async function ensureStorageDirs(paths: {
   workspace?: string;
   global?: string;
 }): Promise<void> {
-  if (paths.workspace) await mkdir(paths.workspace, { recursive: true });
-  if (paths.global) await mkdir(paths.global, { recursive: true });
+  if (paths.workspace) {
+    if (!isAbsolute(paths.workspace)) {
+      throw new Error(`storage.workspace must be absolute: ${paths.workspace}`);
+    }
+    await mkdir(paths.workspace, { recursive: true });
+  }
+  if (paths.global) {
+    if (!isAbsolute(paths.global)) {
+      throw new Error(`storage.global must be absolute: ${paths.global}`);
+    }
+    await mkdir(paths.global, { recursive: true });
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/services/aap/storage.ts` around lines 14 - 20,
ensureStorageDirs currently calls mkdir on any truthy path which can create
directories relative to the process CWD; validate inputs and fail fast by
checking that any provided paths (the workspace and global parameters) are
absolute before calling mkdir. In the ensureStorageDirs function, use
path.isAbsolute (or equivalent) to assert each provided path is absolute and
throw a clear error if not, so mkdir(paths.workspace, ...) and
mkdir(paths.global, ...) are only called after the absolute-path check; include
the offending path value in the error message for easier debugging.
apps/backend/test/unit/services/aap/port-allocator.test.ts (1)

24-25: Rename the test to match actual behavior.

Line 24 says “sequential calls,” but Line 25 runs concurrent allocations via Promise.all. Renaming improves intent clarity.

✏️ Suggested rename
-  it("returns different ports across sequential calls (sanity)", async () => {
+  it("returns different ports across concurrent calls (sanity)", async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/test/unit/services/aap/port-allocator.test.ts` around lines 24 -
25, The test name says "sequential calls" but the body calls allocateFreePort()
concurrently with Promise.all; update the it(...) description to reflect the
actual behavior (e.g., "returns different ports across concurrent calls" or
"returns different ports when called concurrently") so the test name matches the
use of allocateFreePort() in the Promise.all invocation.
apps/agent-server/test/deus-tools.test.ts (1)

267-383: Missing test coverage for read_app_skill tool.

Test suites exist for list_apps, launch_app, and stop_app, but read_app_skill has no corresponding test coverage despite being one of the four AAP tools defined in apps/agent-server/agents/deus-tools/apps.ts.

Do you want me to generate the test suite for read_app_skill?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/agent-server/test/deus-tools.test.ts` around lines 267 - 383, Add a new
test suite for the read_app_skill tool to mirror coverage of the other AAP
tools: call getRegisteredTools(server.instance)["read_app_skill"] and assert
that on success mockFrontendAPI.requestReadAppSkill is called with { sessionId:
SESSION_ID, appId } and the handler returns the skill JSON/text as expected; add
a test for when the backend returns a not-found or empty skill response (assert
appropriate message/omission), and a test for when
mockFrontendAPI.requestReadAppSkill.mockRejectedValue(...) to ensure the handler
returns an "AAP error: ..." string (does NOT throw). Use the same test patterns
and assertions as list_apps/launch_app/stop_app suites to keep behavior
consistent.
apps/backend/test/unit/services/agent-commands-aap.test.ts (1)

13-38: Use vi.hoisted for the shared mock handles.

Vitest 4 hoists vi.mock, and its factory cannot safely capture file-scope variables unless they come from vi.hoisted. These factories currently close over mockLaunchApp / mockStopApp / mockGetWorkspaceForMiddleware / mockComputeWorkspacePath, so please move those spies into a hoisted block before the mocks. (v4.vitest.dev)

♻️ Suggested change
-import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";

-// ---- Hoisted mocks ----
-const mockLaunchApp = vi.fn();
-const mockStopApp = vi.fn();
-const mockGetWorkspaceForMiddleware = vi.fn();
-const mockComputeWorkspacePath = vi.fn();
+// ---- Hoisted mocks ----
+const {
+  mockLaunchApp,
+  mockStopApp,
+  mockGetWorkspaceForMiddleware,
+  mockComputeWorkspacePath,
+} = vi.hoisted(() => ({
+  mockLaunchApp: vi.fn(),
+  mockStopApp: vi.fn(),
+  mockGetWorkspaceForMiddleware: vi.fn(),
+  mockComputeWorkspacePath: vi.fn(),
+}));

As per coding guidelines "Write tests using Vitest with vi.mock() and vi.hoisted()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/test/unit/services/agent-commands-aap.test.ts` around lines 13 -
38, Move the file-scope mock spy variables into a vi.hoisted block so the
vi.mock factories don't close over regular variables; specifically hoist
mockLaunchApp, mockStopApp, mockGetWorkspaceForMiddleware, and
mockComputeWorkspacePath using vi.hoisted(...) before the vi.mock(...) calls,
then keep the vi.mock factories referencing those hoisted handles (the factories
that return launchApp/stopApp, getWorkspaceForMiddleware, and
computeWorkspacePath) so Vitest v4's hoisting works correctly.
apps/backend/test/integration/aap.test.ts (1)

593-594: Clarify the intent of void statements.

The void a; void b; statements appear to be silencing unused-variable warnings after stopAllApps() already cleaned up. Consider adding a brief comment explaining this is intentional to avoid lint errors, or use // eslint-disable-next-line @typescript-eslint/no-unused-vars`` if applicable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/test/integration/aap.test.ts` around lines 593 - 594, The two
lines "void a; void b;" are used to silence unused-variable warnings after
stopAllApps() has cleaned up; replace them with an explicit intention comment or
an inline ESLint suppression to make the intent clear: either add a brief
comment referencing stopAllApps() and why a and b are intentionally unused, or
add "// eslint-disable-next-line `@typescript-eslint/no-unused-vars`" immediately
above the declarations of a and b (or remove the variables if they are no longer
needed) so the lint rule and future readers understand this is intentional.
docs/aap-host-design.md (1)

44-64: Add language specifiers to fenced code blocks.

The ASCII architecture diagram (Lines 44-64) and file layout tree (Lines 76-120) lack language specifiers. While these aren't executable code, adding text or plaintext satisfies markdown linting and improves editor syntax highlighting.

📝 Suggested fix
-```
+```text
 Backend (owns state)                  Agent-server (stateless)           Frontend
-```
+```text
 shared/aap/
 ├── manifest.ts            # Zod schema, Manifest type, idToServerName

Also applies to: 76-120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/aap-host-design.md` around lines 44 - 64, Add a language specifier
(e.g., "text" or "plaintext") to the fenced code blocks containing the ASCII
architecture diagram ("Backend (owns state)                  Agent-server
(stateless)           Frontend") and the file tree starting with "shared/aap/"
so markdown linting and editor highlighting pass; update the opening fences for
those blocks to include the specifier and leave the block contents unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/agent-server/test/deus-tools.test.ts`:
- Around line 82-85: The test is missing an assertion that the new AAP tool
"read_app_skill" is registered; update the assertion block that checks toolNames
(variable toolNames in the test) to include
expect(toolNames).toContain("read_app_skill") alongside the existing
expect(toolNames).toContain("list_apps"), expect(... "launch_app"), and
expect(... "stop_app") checks so the test validates the new tool registration.
- Around line 11-14: Add a mock for requestReadAppSkill in the same mock object
where requestListApps, requestLaunchApp, and requestStopApp are defined so tests
for the read_app_skill tool can run; locate the mock definition in
deus-tools.test.ts (the object containing
requestListApps/requestLaunchApp/requestStopApp) and add requestReadAppSkill:
vi.fn() alongside them, ensuring any test setup that references read_app_skill
uses this mock.

In `@apps/backend/src/services/aap/lifecycle.ts`:
- Around line 267-269: The current check for string-form "bin" mistakenly
compares pj.bin to command and returns pj.main; instead, when typeof pj.bin ===
"string" you should treat the binary name as the package name: check pj.name ===
command and return resolvePath(packageRoot, pj.bin) (use pj.name, pj.bin and
packageRoot/resolvePath to locate the change). Ensure you remove the incorrect
pj.main comparison and return the resolved pj.bin path.

In `@apps/backend/src/services/aap/pid-journal.ts`:
- Around line 55-56: The current chain .map((s) => parseInt(s.trim(),
10)).filter((n) => Number.isFinite(n) && n > 0) accepts strings like "123abc";
replace the loose parseInt coercion with a strict digit-only check: for each
string (the mapped variable), trim it, verify it matches /^\d+$/ (or equivalent)
before converting to a number, then filter by Number.isInteger(...) && n > 0;
update the mapping/filtering logic in the PID parsing pipeline (the map + filter
that produces PIDs) to only convert and emit values that pass the digit-only
test so malformed entries are ignored.

In `@apps/backend/src/services/aap/registry.ts`:
- Around line 34-45: The loop over manifest.skills currently only rejects
absolute paths (isAbsolute(rel)) but still permits paths like "../foo" that
escape the package directory; update the validation in the for loop that
processes manifest.skills (where isAbsolute(rel) and resolvePath(packageRoot,
rel) are used) to ensure the resolved absolute path (abs) is contained within
packageRoot — for example compute relative = path.relative(packageRoot, abs) and
throw an Error if relative startsWith('..') or path.isAbsolute(relative); keep
the existing existsSync(abs) check and use the same manifest.id and rel values
in the new error message.

In `@apps/backend/src/services/query-engine.ts`:
- Around line 542-545: The call to await Promise.race([
stopAppsForWorkspace(workspaceId), new Promise(... ARCHIVE_STOP_CEILING_MS) ])
can reject if stopAppsForWorkspace throws, which will abort the archive; make
stopping apps best-effort by swallowing errors from stopAppsForWorkspace before
racing with the timeout. Replace the direct call with a wrapped promise like
stopAppsForWorkspace(workspaceId).catch(err => { /* log a warning */ }) so the
race never rejects due to stopApps failures, preserving the existing timeout
(ARCHIVE_STOP_CEILING_MS) and keeping the archive mutation resilient.

In `@apps/backend/test/unit/shared/events.test.ts`:
- Around line 291-293: The test's expectations for PROTOCOL_EVENTS are stale:
update the assertions in the unit test that reference PROTOCOL_EVENTS to also
expect the "apps:stopped" event alongside "apps:launched" (i.e., ensure
PROTOCOL_EVENTS contains "apps:stopped") and change the length assertion from 22
to 23 to match the current exported constant set; locate the assertions in the
test referencing PROTOCOL_EVENTS and adjust them accordingly.

In `@apps/web/src/features/apps/ui/AppCard.tsx`:
- Around line 127-135: The primary action branch currently shows a disabled
"Launch" button even when the app is in the stopping state; update the rendering
logic inside the Button (the element using handleLaunch, busy, isTransitioning,
chipStatus) to detect when running.status (or chipStatus) === "stopping" and
render a "Stopping..." label (and keep the button disabled) instead of "Launch"
so the main action matches the chip and state machine.

In `@apps/web/src/features/browser/store/browserWindowStore.ts`:
- Around line 36-43: The current single-slot pendingNewTab/pendingCloseTab will
drop events; change both to FIFO queues (e.g., pendingNewTabs:
PendingNewTabRequest[] and pendingCloseTabs: PendingCloseTabRequest[]) and
update the API: have requestNewTab and requestCloseTabByUrlPrefix push a new
request object (including a unique requestId/timestamp) onto the appropriate
queue, and change consumePendingNewTab and consumePendingClose to accept a
requestId (or return the dequeued item’s requestId) so the BrowserPanel can
acknowledge and remove a specific entry instead of clearing the whole slot;
update related types and any callers (e.g., BrowserPanel consumer code and the
set/clear helpers like setDetachedWindowOpen/clearDetachedWindow) to
enqueue/dequeue by requestId.

In `@apps/web/src/features/browser/types.ts`:
- Around line 56-63: PersistedBrowserTab currently omits the openedAt field so
hydratePersistedTab() drops the invariant that identifies app-owned tabs; update
the PersistedBrowserTab type to include openedAt?: string and change
hydratePersistedTab() (and any serialization/save logic that builds
PersistedBrowserTab) to copy/preserve the openedAt value from the persisted data
into the hydrated BrowserTab object so restored tabs retain their openedAt
identifier for apps:stopped matching.

---

Outside diff comments:
In `@apps/agent-server/agents/claude/claude-handler.ts`:
- Around line 490-505: The variable queryResult is declared inside the try block
but accessed in the finally block causing a ReferenceError; hoist queryResult
(declare it before the try, initialized to null) in claude-handler.ts near where
claudeSDK is invoked, then update the finally cleanup to null-guard any
operations that use queryResult (e.g., claudeQueries.delete(sessionId),
attachQuery cleanup, or other teardown) so cleanup only runs when queryResult
was successfully assigned after calling claudeSDK({ prompt: promptInput,
options: sdkOptions }) and before calling attachQuery.

---

Nitpick comments:
In `@apps/agent-server/test/deus-tools.test.ts`:
- Around line 267-383: Add a new test suite for the read_app_skill tool to
mirror coverage of the other AAP tools: call
getRegisteredTools(server.instance)["read_app_skill"] and assert that on success
mockFrontendAPI.requestReadAppSkill is called with { sessionId: SESSION_ID,
appId } and the handler returns the skill JSON/text as expected; add a test for
when the backend returns a not-found or empty skill response (assert appropriate
message/omission), and a test for when
mockFrontendAPI.requestReadAppSkill.mockRejectedValue(...) to ensure the handler
returns an "AAP error: ..." string (does NOT throw). Use the same test patterns
and assertions as list_apps/launch_app/stop_app suites to keep behavior
consistent.

In `@apps/backend/src/services/aap/storage.ts`:
- Around line 14-20: ensureStorageDirs currently calls mkdir on any truthy path
which can create directories relative to the process CWD; validate inputs and
fail fast by checking that any provided paths (the workspace and global
parameters) are absolute before calling mkdir. In the ensureStorageDirs
function, use path.isAbsolute (or equivalent) to assert each provided path is
absolute and throw a clear error if not, so mkdir(paths.workspace, ...) and
mkdir(paths.global, ...) are only called after the absolute-path check; include
the offending path value in the error message for easier debugging.

In `@apps/backend/test/integration/aap.test.ts`:
- Around line 593-594: The two lines "void a; void b;" are used to silence
unused-variable warnings after stopAllApps() has cleaned up; replace them with
an explicit intention comment or an inline ESLint suppression to make the intent
clear: either add a brief comment referencing stopAllApps() and why a and b are
intentionally unused, or add "// eslint-disable-next-line
`@typescript-eslint/no-unused-vars`" immediately above the declarations of a and b
(or remove the variables if they are no longer needed) so the lint rule and
future readers understand this is intentional.

In `@apps/backend/test/unit/services/aap/port-allocator.test.ts`:
- Around line 24-25: The test name says "sequential calls" but the body calls
allocateFreePort() concurrently with Promise.all; update the it(...) description
to reflect the actual behavior (e.g., "returns different ports across concurrent
calls" or "returns different ports when called concurrently") so the test name
matches the use of allocateFreePort() in the Promise.all invocation.

In `@apps/backend/test/unit/services/agent-commands-aap.test.ts`:
- Around line 13-38: Move the file-scope mock spy variables into a vi.hoisted
block so the vi.mock factories don't close over regular variables; specifically
hoist mockLaunchApp, mockStopApp, mockGetWorkspaceForMiddleware, and
mockComputeWorkspacePath using vi.hoisted(...) before the vi.mock(...) calls,
then keep the vi.mock factories referencing those hoisted handles (the factories
that return launchApp/stopApp, getWorkspaceForMiddleware, and
computeWorkspacePath) so Vitest v4's hoisting works correctly.

In `@docs/aap-host-design.md`:
- Around line 44-64: Add a language specifier (e.g., "text" or "plaintext") to
the fenced code blocks containing the ASCII architecture diagram ("Backend (owns
state)                  Agent-server (stateless)           Frontend") and the
file tree starting with "shared/aap/" so markdown linting and editor
highlighting pass; update the opening fences for those blocks to include the
specifier and leave the block contents unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f317c5c7-b071-46f8-8bed-2e442eddc37a

📥 Commits

Reviewing files that changed from the base of the PR and between 8538df3 and 5256998.

📒 Files selected for processing (62)
  • .gitignore
  • apps/agent-server/agents/claude/claude-handler.ts
  • apps/agent-server/agents/deus-tools/apps.ts
  • apps/agent-server/agents/deus-tools/index.ts
  • apps/agent-server/agents/session-store.ts
  • apps/agent-server/app-registrar.ts
  • apps/agent-server/event-broadcaster.ts
  • apps/agent-server/index.ts
  • apps/agent-server/rpc-schemas.ts
  • apps/agent-server/test/app-registrar.test.ts
  • apps/agent-server/test/deus-tools.test.ts
  • apps/backend/src/config/installed-apps.ts
  • apps/backend/src/lib/repo-root.ts
  • apps/backend/src/server.ts
  • apps/backend/src/services/aap/apps.service.ts
  • apps/backend/src/services/aap/index.ts
  • apps/backend/src/services/aap/lifecycle.ts
  • apps/backend/src/services/aap/mcp-bridge.ts
  • apps/backend/src/services/aap/pid-journal.ts
  • apps/backend/src/services/aap/port-allocator.ts
  • apps/backend/src/services/aap/registry.ts
  • apps/backend/src/services/aap/storage.ts
  • apps/backend/src/services/agent/client.ts
  • apps/backend/src/services/agent/commands.ts
  • apps/backend/src/services/agent/index.ts
  • apps/backend/src/services/agent/service.ts
  • apps/backend/src/services/query-engine.ts
  • apps/backend/test/integration/aap.test.ts
  • apps/backend/test/unit/services/aap/lifecycle.test.ts
  • apps/backend/test/unit/services/aap/mcp-bridge.test.ts
  • apps/backend/test/unit/services/aap/port-allocator.test.ts
  • apps/backend/test/unit/services/aap/registry.test.ts
  • apps/backend/test/unit/services/aap/storage.test.ts
  • apps/backend/test/unit/services/agent-commands-aap.test.ts
  • apps/backend/test/unit/shared/aap/manifest.test.ts
  • apps/backend/test/unit/shared/aap/template.test.ts
  • apps/backend/test/unit/shared/events.test.ts
  • apps/web/src/app/layouts/ContentView.tsx
  • apps/web/src/app/layouts/content-tabs.ts
  • apps/web/src/features/apps/hooks/index.ts
  • apps/web/src/features/apps/hooks/useAppsLaunched.ts
  • apps/web/src/features/apps/hooks/useAppsStopped.ts
  • apps/web/src/features/apps/hooks/useInstalledApps.ts
  • apps/web/src/features/apps/hooks/useRunningApps.ts
  • apps/web/src/features/apps/index.ts
  • apps/web/src/features/apps/ui/AppCard.tsx
  • apps/web/src/features/apps/ui/AppStatusChip.tsx
  • apps/web/src/features/apps/ui/AppsLauncher.tsx
  • apps/web/src/features/apps/ui/index.ts
  • apps/web/src/features/browser/store/browserWindowStore.ts
  • apps/web/src/features/browser/types.ts
  • apps/web/src/features/browser/ui/BrowserPanel.tsx
  • apps/web/src/features/workspace/store/workspaceLayoutStore.ts
  • docs/aap-host-design.md
  • packages/device-use/agentic-app.json
  • packages/device-use/src/cli/commands/serve.ts
  • packages/device-use/src/server/index.ts
  • shared/aap/manifest.ts
  • shared/aap/template.ts
  • shared/aap/types.ts
  • shared/events.ts
  • shared/schema.ts

Comment thread apps/agent-server/test/deus-tools.test.ts
Comment thread apps/agent-server/test/deus-tools.test.ts
Comment thread apps/backend/src/services/aap/lifecycle.ts Outdated
Comment thread apps/backend/src/services/aap/pid-journal.ts Outdated
Comment thread apps/backend/src/services/aap/registry.ts
Comment thread apps/backend/src/services/query-engine.ts
Comment thread apps/backend/test/unit/shared/events.test.ts Outdated
Comment thread apps/web/src/features/apps/ui/AppCard.tsx
Comment on lines +36 to +43
pendingNewTab: PendingNewTabRequest | null;
pendingCloseTab: PendingCloseTabRequest | null;
setDetachedWindowOpen: (context: DetachedBrowserWorkspaceContext) => void;
clearDetachedWindow: () => void;
requestNewTab: (workspaceId: string, url: string) => void;
consumePendingNewTab: () => void;
requestCloseTabByUrlPrefix: (workspaceId: string, urlPrefix: string) => void;
consumePendingCloseTab: () => void;

@coderabbitai coderabbitai Bot Apr 19, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Single-slot “pending” state will drop tab open/close events.

pendingNewTab and pendingCloseTab only hold one request each. If two apps:launched or apps:stopped events arrive before BrowserPanel consumes the first one, the later set() overwrites the earlier request. The zero-arg consumePending*() calls can also clear a newer request accidentally. Please make these actual queues, and acknowledge consumption by requestId.

Also applies to: 74-96

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/features/browser/store/browserWindowStore.ts` around lines 36 -
43, The current single-slot pendingNewTab/pendingCloseTab will drop events;
change both to FIFO queues (e.g., pendingNewTabs: PendingNewTabRequest[] and
pendingCloseTabs: PendingCloseTabRequest[]) and update the API: have
requestNewTab and requestCloseTabByUrlPrefix push a new request object
(including a unique requestId/timestamp) onto the appropriate queue, and change
consumePendingNewTab and consumePendingClose to accept a requestId (or return
the dequeued item’s requestId) so the BrowserPanel can acknowledge and remove a
specific entry instead of clearing the whole slot; update related types and any
callers (e.g., BrowserPanel consumer code and the set/clear helpers like
setDetachedWindowOpen/clearDetachedWindow) to enqueue/dequeue by requestId.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the single-slot design. Two reasons:

  1. Backend dedups. apps.service enforces one running instance per (appId, workspace), so a duplicate launchApp returns the existing runningAppId instead of emitting a second apps:launched. The race the suggestion targets (two events arriving before BrowserPanel consumes) is essentially unreachable for the same app, and a user would have to click two distinct app cards within the same React macrotask for it to fire across apps.
  2. Consumer is synchronous. BrowserPanel's useEffect fires on the same render cycle that the producer's set() triggers — no animation frame or async wait in between. This pattern (single-slot + immediate consume) is the same one we use for chatInsertStore.

Promoting to a queue with requestId ack would touch four call sites and add a new failure mode (forgotten ack → unbounded growth) for a race we can't reproduce. If we ever see it in practice, the fix lands in one place — happy to revisit then.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zvadaadam, understood — both arguments are solid. The backend's single-instance enforcement removes the root cause of the race, and the synchronous useEffect / consume cycle makes the window vanishingly small even across different apps. Acknowledged, I'll stop flagging this pattern here.


✏️ Learnings added
Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 250
File: apps/web/src/features/browser/store/browserWindowStore.ts:36-43
Timestamp: 2026-04-19T22:14:39.702Z
Learning: In `apps/web/src/features/browser/store/browserWindowStore.ts`, the single-slot `pendingNewTab`/`pendingCloseTab` design is intentional. Two reasons: (1) `apps.service` enforces one running instance per (appId, workspace), so a second `apps:launched` event for the same app is never emitted — the duplicate `launchApp` call returns the existing `runningAppId`; (2) `BrowserPanel`'s `useEffect` consumer fires synchronously on the same render cycle that the producer's Zustand `set()` triggers (same pattern as `chatInsertStore`), so the slot is consumed before any second event could overwrite it. Do not suggest converting to a FIFO queue or requestId-ack pattern in future reviews.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 244
File: apps/web/src/features/session/hooks/useSlashCommand.ts:87-154
Timestamp: 2026-04-15T21:01:15.460Z
Learning: In `apps/web/src/features/session/hooks/useSlashCommand.ts`, the `useEffect` that fetches skills/commands on workspace change intentionally omits a stale-response cancellation guard. The trade-off is deliberate: workspace path changes are rare (only on repo switch), the backend reads local filesystem files in sub-100ms, and the race condition window is practically impossible. Do not flag the absence of a cancellation guard in this hook in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 247
File: apps/web/src/app/layouts/useChatTabs.ts:328-347
Timestamp: 2026-04-17T08:21:39.540Z
Learning: In `apps/web/src/app/layouts/useChatTabs.ts`, `updateChatTabAgentHarness` is intentionally a no-op when `tab.data?.hasStarted` is true. Once a session has messages, the harness is locked server-side in `handleSendMessage`; allowing local UI harness changes on started tabs would cause drift from the persisted session until reload. Do not flag this early-return as missing logic in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 246
File: apps/backend/src/services/agent/commands.ts:240-272
Timestamp: 2026-04-16T17:05:23.922Z
Learning: In `apps/backend/src/services/agent/commands.ts`, the `sim:launchApp`, `sim:terminateApp`, and `sim:uninstallApp` handlers intentionally do not guard against the TOCTOU race between `getContextForWorkspace()` and `execFile("xcrun", ...)`. The decision is deliberate: simctl operations complete in under 100ms and session mutations (startStream/stopStream) are rare, making the race window practically negligible. Do not flag the absence of a session lock in these handlers in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 244
File: apps/web/src/features/session/hooks/useSlashCommand.ts:67-154
Timestamp: 2026-04-15T21:01:13.186Z
Learning: In `apps/web/src/features/session/hooks/useSlashCommand.ts`, the one-shot fetch pattern using `useEffect` + `useState` + `fetchedWorkspaceRef` is intentional. Skills/commands are local config reads that happen once per workspace open and do not need TanStack Query (no cache invalidation, stale-while-revalidate, or server state management required). Do not suggest migrating this to TanStack Query; it would be over-engineering for this use case.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 240
File: apps/web/src/features/session/hooks/useTextBuffer.ts:49-60
Timestamp: 2026-04-12T23:40:23.156Z
Learning: In `apps/web/src/features/session/hooks/useTextBuffer.ts`, the markdown code-fence check in the rAF streaming buffer is intentionally conservative: it flushes the full text immediately when any unclosed ``` is detected in the current slice, even if it's a false positive (e.g. right after a closing fence). This trade-off is deliberate — premature flush (showing full text) is harmless, while rendering a partial/unclosed code fence causes visual glitches. Do not flag this as a bug in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 246
File: apps/agent-server/event-broadcaster.ts:401-408
Timestamp: 2026-04-16T17:05:42.204Z
Learning: In zvadaadam/deus-machine, `EventBroadcaster.tunnels` in `apps/agent-server/event-broadcaster.ts` only has the backend attached in practice. The agent-server exposes a single WebSocket endpoint that only the backend connects to. `requestSimulatorContext` (calling `rpc("getSimulatorContext", ...)`) is safe because `apps/backend/src/services/agent/service.ts` intercepts `onFrontendRpc` calls with `method === "getSimulatorContext"` and returns `getContextForSession(sessionId)` directly, never forwarding to relay or desktop clients. Do not flag multi-tunnel routing concerns for this method.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 247
File: apps/backend/cli.ts:278-278
Timestamp: 2026-04-17T08:15:31.894Z
Learning: In zvadaadam/deus-machine, `apps/backend/cli.ts` is a local developer-only integration CLI that reads `--agent` exclusively from process argv and writes to the developer's own local SQLite DB. SQL interpolation of `opts.agent` without escaping is acceptable risk here — do not flag raw argv interpolation in this file in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/box-ide PR: 177
File: src/app/layouts/hooks/useRepoActions.ts:21-22
Timestamp: 2026-03-12T23:25:58.674Z
Learning: In zvadaadam/box-ide, two TypeScript path aliases coexist: 'shared/*' -> '/shared/' (root-level, cross-package) and '@/shared/*' -> 'src/shared/' (frontend-only). Do not flag mixed usage as inconsistent since they reference different directories and scopes. Ensure imports use the correct alias per intended scope and that tsconfig.json and vite.config.ts are configured accordingly. This applies to all TypeScript files in the repo.

Comment thread apps/web/src/features/browser/types.ts
- claude-handler: hoist queryResult declaration so the finally block can
  detach the SDK query without a ReferenceError on early throws (CI red)
- deus-tools test: add requestReadAppSkill mock + read_app_skill suite,
  bump tool count 41 → 42 (CI red)
- events test: PROTOCOL_EVENTS now contains apps:stopped, length 22 → 23
  (CI red)
- pid-journal: strict /^\d+$/ parse so a corrupted entry like "123abc"
  can't target an unrelated PID during the orphan sweep
- aap registry: contain skill paths within the package root so a future
  untrusted manifest can't escape via "../foo" via read_app_skill
- aap lifecycle: fix bin shorthand resolution per the npm spec — match
  pj.name === command and resolve pj.bin (was matching pj.bin and
  resolving pj.main, dead code today but a foot-gun for future apps)
- query-engine archiveWorkspace: swallow stopAppsForWorkspace rejection
  so a slow/crashing child can't block the archive — orphan sweep on
  next boot is the safety net
- AppCard: render "Stopping..." (matching the chip) instead of a
  disabled "Launch" while running.status === "stopping"
- browser types: persist openedAt in PersistedBrowserTab + hydrate it
  back so apps:stopped can still match an app-owned tab after a
  workspace reload

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
apps/backend/src/services/aap/lifecycle.ts (2)

52-52: Naming mismatch: RING_MAX_LINES limits chunks, not lines.

The constant name suggests line-based limiting, but pushBounded pushes entire Buffer.toString() results (chunks). A single data event can span multiple lines or be a partial line. The current behavior is fine for a tail buffer, but the name is misleading.

📝 Suggested rename for clarity
-const RING_MAX_LINES = 50;
+const RING_MAX_CHUNKS = 50;

Also applies to: 85-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/services/aap/lifecycle.ts` at line 52, The constant
RING_MAX_LINES is misleading because it limits pushed chunks (from
Buffer.toString()) rather than lines; rename RING_MAX_LINES to something
reflecting chunk-based limits (e.g., RING_MAX_CHUNKS or RING_MAX_ENTRIES) and
update all references (including where pushBounded is called) and any related
comments to match the new name; ensure uses in lifecycle.ts (the declaration and
the logic around pushBounded at lines referenced in the review) are updated
consistently so behavior remains unchanged but the identifier accurately
describes that it bounds chunks, not lines.

155-163: Use ts-pattern with .exhaustive() for the ReadyProbe discriminated union.

ReadyProbe is a discriminated union on type ("http" | "tcp"). The current if-else chain with a fallback return false bypasses compile-time exhaustiveness checking.

♻️ Suggested refactor using ts-pattern
+import { match } from "ts-pattern";
+
 async function probeOnce(probe: ReadyProbe, port: number): Promise<boolean> {
   try {
-    if (probe.type === "http") return await probeHttp(port, probe.path);
-    if (probe.type === "tcp") return await probeTcp(port);
-    return false;
+    return await match(probe)
+      .with({ type: "http" }, (p) => probeHttp(port, p.path))
+      .with({ type: "tcp" }, () => probeTcp(port))
+      .exhaustive();
   } catch {
     return false;
   }
 }

As per coding guidelines: "Prefer ts-pattern for discriminated unions. Use .exhaustive() for exhaustiveness checking."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend/src/services/aap/lifecycle.ts` around lines 155 - 163, Replace
the if-chain in probeOnce so the ReadyProbe discriminated union is handled via
ts-pattern: match(probe).with({type: "http"}, () => probeHttp(port,
probe.path)).with({type: "tcp"}, () => probeTcp(port)).exhaustive(); keep the
try/catch and return semantics (wrap the match call in the existing try and
return false on catch) and reference the functions ReadyProbe, probeOnce,
probeHttp, and probeTcp so the pattern match is used for compile-time
exhaustiveness checking with .exhaustive().
apps/agent-server/agents/claude/claude-handler.ts (1)

500-509: Consider refining the type cast.

The cast to McpSdkServerConfigWithInstance is technically imprecise since sdkOptions.mcpServers contains config objects passed to the SDK, not the connected instances returned by the SDK. However, this is functionally safe because:

  1. broadcast() spreads these into a Record<string, McpServerConfig> (the base type)
  2. TypeScript's structural typing means missing instance properties are ignored downstream

The current code works, but a more precise type would be Record<string, McpServerConfig> matching what broadcast() actually consumes.

♻️ Optional: Use more precise type
-      const sdkServers = (sdkOptions.mcpServers ?? {}) as Record<
-        string,
-        McpSdkServerConfigWithInstance
-      >;
+      const sdkServers = (sdkOptions.mcpServers ?? {}) as Record<string, McpServerConfig>;

This would also require updating the attachQuery signature in app-registrar.ts to accept Record<string, McpServerConfig>.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/agent-server/agents/claude/claude-handler.ts` around lines 500 - 509,
The code narrows sdkOptions.mcpServers to McpSdkServerConfigWithInstance which
is imprecise; change the local type to Record<string, McpServerConfig> (the
shape broadcast() actually consumes) and update attachQuery's parameter type in
app-registrar.ts to accept Record<string, McpServerConfig> so the cast is no
longer needed; locate sdkOptions.mcpServers and the attachQuery call in
claude-handler.ts and update the attachQuery signature in app-registrar.ts
accordingly (retain behavior, just adjust types).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/web/src/features/browser/ui/BrowserPanel.tsx`:
- Around line 352-354: The filter for doomed tabs currently only matches when
t.openedAt exists; change the predicate in the tabs.filter used to compute
doomed so it first tries t.openedAt.startsWith(pendingCloseTab.urlPrefix) and,
if t.openedAt is falsy, falls back to checking
t.currentUrl?.startsWith(pendingCloseTab.urlPrefix) so older persisted tabs
without openedAt are included; update the expression around doomed to use this
fallback logic and keep pendingCloseTab.urlPrefix as the matching value.

---

Nitpick comments:
In `@apps/agent-server/agents/claude/claude-handler.ts`:
- Around line 500-509: The code narrows sdkOptions.mcpServers to
McpSdkServerConfigWithInstance which is imprecise; change the local type to
Record<string, McpServerConfig> (the shape broadcast() actually consumes) and
update attachQuery's parameter type in app-registrar.ts to accept Record<string,
McpServerConfig> so the cast is no longer needed; locate sdkOptions.mcpServers
and the attachQuery call in claude-handler.ts and update the attachQuery
signature in app-registrar.ts accordingly (retain behavior, just adjust types).

In `@apps/backend/src/services/aap/lifecycle.ts`:
- Line 52: The constant RING_MAX_LINES is misleading because it limits pushed
chunks (from Buffer.toString()) rather than lines; rename RING_MAX_LINES to
something reflecting chunk-based limits (e.g., RING_MAX_CHUNKS or
RING_MAX_ENTRIES) and update all references (including where pushBounded is
called) and any related comments to match the new name; ensure uses in
lifecycle.ts (the declaration and the logic around pushBounded at lines
referenced in the review) are updated consistently so behavior remains unchanged
but the identifier accurately describes that it bounds chunks, not lines.
- Around line 155-163: Replace the if-chain in probeOnce so the ReadyProbe
discriminated union is handled via ts-pattern: match(probe).with({type: "http"},
() => probeHttp(port, probe.path)).with({type: "tcp"}, () =>
probeTcp(port)).exhaustive(); keep the try/catch and return semantics (wrap the
match call in the existing try and return false on catch) and reference the
functions ReadyProbe, probeOnce, probeHttp, and probeTcp so the pattern match is
used for compile-time exhaustiveness checking with .exhaustive().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0d9b6487-0a44-447f-a9ea-4046996ff224

📥 Commits

Reviewing files that changed from the base of the PR and between 5256998 and 836b68d.

📒 Files selected for processing (10)
  • apps/agent-server/agents/claude/claude-handler.ts
  • apps/agent-server/test/deus-tools.test.ts
  • apps/backend/src/services/aap/lifecycle.ts
  • apps/backend/src/services/aap/pid-journal.ts
  • apps/backend/src/services/aap/registry.ts
  • apps/backend/src/services/query-engine.ts
  • apps/backend/test/unit/shared/events.test.ts
  • apps/web/src/features/apps/ui/AppCard.tsx
  • apps/web/src/features/browser/types.ts
  • apps/web/src/features/browser/ui/BrowserPanel.tsx
✅ Files skipped from review due to trivial changes (1)
  • apps/backend/src/services/aap/pid-journal.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • apps/backend/src/services/query-engine.ts
  • apps/backend/test/unit/shared/events.test.ts
  • apps/web/src/features/browser/types.ts
  • apps/web/src/features/apps/ui/AppCard.tsx
  • apps/backend/src/services/aap/registry.ts

Comment on lines +352 to +354
const doomed = tabs.filter(
(t) => t.openedAt && t.openedAt.startsWith(pendingCloseTab.urlPrefix)
);

@coderabbitai coderabbitai Bot Apr 19, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a fallback matcher for tabs without openedAt.

Line 353 currently excludes tabs that don’t have openedAt (e.g., older persisted tabs before this field existed). That can miss expected close behavior on apps:stopped. Keep openedAt as primary, but fall back to currentUrl when missing.

💡 Proposed patch
-    const doomed = tabs.filter(
-      (t) => t.openedAt && t.openedAt.startsWith(pendingCloseTab.urlPrefix)
-    );
+    const doomed = tabs.filter((t) => {
+      const matchUrl = t.openedAt ?? t.currentUrl ?? "";
+      return matchUrl.startsWith(pendingCloseTab.urlPrefix);
+    });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const doomed = tabs.filter(
(t) => t.openedAt && t.openedAt.startsWith(pendingCloseTab.urlPrefix)
);
const doomed = tabs.filter((t) => {
const matchUrl = t.openedAt ?? t.currentUrl ?? "";
return matchUrl.startsWith(pendingCloseTab.urlPrefix);
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/features/browser/ui/BrowserPanel.tsx` around lines 352 - 354,
The filter for doomed tabs currently only matches when t.openedAt exists; change
the predicate in the tabs.filter used to compute doomed so it first tries
t.openedAt.startsWith(pendingCloseTab.urlPrefix) and, if t.openedAt is falsy,
falls back to checking t.currentUrl?.startsWith(pendingCloseTab.urlPrefix) so
older persisted tabs without openedAt are included; update the expression around
doomed to use this fallback logic and keep pendingCloseTab.urlPrefix as the
matching value.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the openedAt-only matcher. The block comment two lines above (apps/web/src/features/browser/ui/BrowserPanel.tsx:343-347) explains why currentUrl is the wrong fallback: when an app dies, Electron's BrowserView fails the next reload with ERR_CONNECTION_REFUSED and rewrites currentUrl to chrome-error://chromewebdata/, so the startsWith(urlPrefix) check would silently miss the very tab we want to close. openedAt is the immutable identifier that solves that.

The 'older persisted tabs' migration scenario also doesn't apply here — this PR introduces both the field and its persistence simultaneously, and there's no shipped version where AAP-owned tabs existed without it. Falling back to currentUrl would also force-close any user-opened tab that happens to share a URL prefix (e.g. their own localhost:3000 dev tab), which the AAP stop flow shouldn't do.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zvadaadam, you're right on all three counts — the reasoning is airtight:

  1. currentUrl erasure: Electron's ERR_CONNECTION_REFUSED reload rewriting currentUrl to chrome-error://chromewebdata/ means any startsWith(urlPrefix) check on currentUrl would silently miss the very tab it's supposed to close — the opposite of a safe fallback.
  2. No migration gap: Since openedAt stamping and its persistence land in the same PR as AAP tab creation, there's no pre-existing persisted state where AAP-owned tabs lack the field.
  3. Accidental close risk: Falling back to currentUrl could close unrelated user tabs that happen to share the same localhost origin/port prefix.

Withdrawing the suggestion. The openedAt-only approach is correct here.


✏️ Learnings added
Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 250
File: apps/web/src/features/browser/ui/BrowserPanel.tsx:352-354
Timestamp: 2026-04-19T22:14:42.822Z
Learning: In `apps/web/src/features/browser/ui/BrowserPanel.tsx`, the `pendingCloseTab` effect intentionally matches only on `t.openedAt` (not `t.currentUrl`) when filtering tabs to close on `apps:stopped`. Do NOT suggest a `currentUrl` fallback for three reasons: (1) Electron rewrites `currentUrl` to `chrome-error://chromewebdata/` on `ERR_CONNECTION_REFUSED` reload when an app dies, so `startsWith(urlPrefix)` on `currentUrl` would silently miss the tab; (2) `openedAt` and its persistence were introduced in the same PR as AAP tab creation, so there is no migration scenario where AAP-owned tabs lack the field; (3) falling back to `currentUrl` risks accidentally closing user-opened tabs that share the same localhost origin/port prefix.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 244
File: apps/web/src/features/session/hooks/useSlashCommand.ts:87-154
Timestamp: 2026-04-15T21:01:15.460Z
Learning: In `apps/web/src/features/session/hooks/useSlashCommand.ts`, the `useEffect` that fetches skills/commands on workspace change intentionally omits a stale-response cancellation guard. The trade-off is deliberate: workspace path changes are rare (only on repo switch), the backend reads local filesystem files in sub-100ms, and the race condition window is practically impossible. Do not flag the absence of a cancellation guard in this hook in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 247
File: apps/web/src/app/layouts/useChatTabs.ts:328-347
Timestamp: 2026-04-17T08:21:39.540Z
Learning: In `apps/web/src/app/layouts/useChatTabs.ts`, `updateChatTabAgentHarness` is intentionally a no-op when `tab.data?.hasStarted` is true. Once a session has messages, the harness is locked server-side in `handleSendMessage`; allowing local UI harness changes on started tabs would cause drift from the persisted session until reload. Do not flag this early-return as missing logic in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 240
File: apps/web/src/features/session/hooks/useTextBuffer.ts:49-60
Timestamp: 2026-04-12T23:40:23.156Z
Learning: In `apps/web/src/features/session/hooks/useTextBuffer.ts`, the markdown code-fence check in the rAF streaming buffer is intentionally conservative: it flushes the full text immediately when any unclosed ``` is detected in the current slice, even if it's a false positive (e.g. right after a closing fence). This trade-off is deliberate — premature flush (showing full text) is harmless, while rendering a partial/unclosed code fence causes visual glitches. Do not flag this as a bug in future reviews.

Learnt from: zvadaadam
Repo: zvadaadam/deus-machine PR: 247
File: apps/backend/cli.ts:278-278
Timestamp: 2026-04-17T08:15:31.894Z
Learning: In zvadaadam/deus-machine, `apps/backend/cli.ts` is a local developer-only integration CLI that reads `--agent` exclusively from process argv and writes to the developer's own local SQLite DB. SQL interpolation of `opts.agent` without escaping is acceptable risk here — do not flag raw argv interpolation in this file in future reviews.

@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 1

Fixed (9 in 836b68d4)

  • read_app_skill — added missing mock + assertion + 3-test suite (count 41 → 42)
  • PROTOCOL_EVENTS length 22 → 23 (apps:stopped now asserted)
  • pid-journal: strict /^\d+$/ parse so corrupted journals can't target unrelated PIDs
  • aap/registry: path.relative containment check for skill paths (defense-in-depth for v2 untrusted manifests)
  • aap/lifecycle: bin shorthand resolver now matches pj.name === command and resolves pj.bin per the npm spec
  • query-engine.archiveWorkspace: stopAppsForWorkspace rejection no longer blocks the archive (orphan sweep is the safety net)
  • AppCard: renders Stopping... while chipStatus === "stopping" instead of a disabled "Launch"
  • PersistedBrowserTab: persists + hydrates openedAt so restored AAP tabs still match apps:stopped
  • Plus: hoisted queryResult declaration in claude-handler to fix the ReferenceError: queryResult is not defined that was failing CI

Disagreed (2)

  • Single-slot pending tab requests → queue: the backend dedupes one-instance-per-(appId, workspace), the consumer's useEffect fires synchronously with the producer's set(), and promoting to a queue with requestId ack adds a real failure mode (forgotten ack → unbounded growth) for an unreproducible race. Pattern matches our existing chatInsertStore design.
  • openedAt matcher → fall back to currentUrl: the entire purpose of openedAt is that currentUrl becomes chrome-error://chromewebdata/ after the app dies, so the fallback would re-introduce the bug openedAt was added to solve. The "old persisted tabs" migration scenario doesn't apply since this PR introduces both the field and its persistence together.

CI is green (4/4 checks passing).

- aap/lifecycle: rename RING_MAX_LINES → RING_MAX_CHUNKS — the bound is on
  pushBounded chunks (one per stdout/stderr "data" event), not lines, and
  the old name was misleading
- aap/lifecycle: switch probeOnce's ReadyProbe handling to ts-pattern
  match(...).exhaustive() per CLAUDE.md ("Prefer ts-pattern for
  discriminated unions, use .exhaustive() for exhaustiveness checking").
  Drops the dead `return false` fallback that was bypassing the check
- app-registrar / claude-handler: type protectedByQuery (and attachQuery's
  parameter) as Record<string, McpServerConfig> — the union the SDK's
  setMcpServers actually consumes. Drops the imprecise cast to
  McpSdkServerConfigWithInstance in claude-handler

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 2

Caught 3 nitpick comments that were collapsed inside the second CodeRabbit review body (not posted as inline comments).

Fixed (3 in c26fe273)

  • aap/lifecycle.ts: rename RING_MAX_LINESRING_MAX_CHUNKSpushBounded pushes whole chunks (one per Buffer.toString() event), not lines
  • aap/lifecycle.ts:probeOnce: replace the if/else chain with ts-pattern match(...).exhaustive() per the CLAUDE.md convention. Drops the dead return false fallback that was masking the missing exhaustiveness check
  • app-registrar + claude-handler: type protectedByQuery and attachQuery's parameter as Record<string, McpServerConfig> — the union the SDK's setMcpServers actually consumes — and drop the imprecise cast to McpSdkServerConfigWithInstance in claude-handler

All 407 agent-server tests + 14 lifecycle tests still pass. CI green.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/agent-server/agents/claude/claude-handler.ts`:
- Around line 683-685: The detachQuery call should run unconditionally so the
Query's protectedByQuery entry is always removed even if
claudeSessions.owns(sessionId, session) becomes false; change the finally block
to first call detachQuery(queryResult) if queryResult is truthy, and then
perform the ownership-guarded cleanup (i.e., only call
claudeQueries.delete(sessionId) and any claudeSessions-related deletions inside
the if (claudeSessions.owns(sessionId, session)) block). Keep the
functions/variables claudeSessions.owns, detachQuery, queryResult, and
claudeQueries.delete as the locating references.

In `@apps/agent-server/app-registrar.ts`:
- Around line 90-92: attachQuery currently only stores sdkServers in
protectedByQuery for a new Query but does not populate that Query with the
existing registered servers, so queries attached after apps were registered
never see pre-existing entries; modify attachQuery (or call a sync step from
broadcast after attachQuery) to merge the current registeredServers snapshot
into the new Query immediately — i.e., after protectedByQuery.set(query,
sdkServers) invoke the same codepath that applies a registration payload to a
Query (using the registeredServers map) so the newly attached Query receives all
currently registered servers without waiting for the next register/unregister
event.
- Around line 63-81: Concurrent registry updates can interleave: wrap the
mutate-then-broadcast sequence in a single serialized lock so registerAppMcp and
unregisterAppMcp never run concurrently and an older broadcast cannot overwrite
a newer snapshot; implement a module-scoped async mutex/promise-chain (e.g., a
single Promise lastOp or a small mutex class) and perform the
registeredServers.set/delete and the subsequent await broadcast() inside that
critical section, and apply the same serialization pattern to any other
functions that call broadcast() (the code path that triggers
query.setMcpServers()) so all registry mutations are ordered.

In `@apps/backend/src/services/aap/lifecycle.ts`:
- Around line 64-65: The manifest's relative paths must be anchored to the app
package root: when computing cwd use the substituted launch.cwd but if that
value is a relative path resolve it against packageRoot (so cwd becomes
path.resolve(packageRoot, substitutedCwd) when not absolute); likewise update
resolveCommand(launch.command, packageRoot) to treat commands that contain path
separators or start with ./ or ../ as file paths and resolve those relative
commands against packageRoot (instead of the process cwd), and apply the same
fix wherever resolveCommand or launch.cwd are used later (e.g., the spawn/exec
paths around the 264–299 logic) so spawned processes run from the packageRoot
context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 334ffec3-d6f0-4577-ab86-61594e6ea309

📥 Commits

Reviewing files that changed from the base of the PR and between 836b68d and c26fe27.

📒 Files selected for processing (3)
  • apps/agent-server/agents/claude/claude-handler.ts
  • apps/agent-server/app-registrar.ts
  • apps/backend/src/services/aap/lifecycle.ts

Comment thread apps/agent-server/agents/claude/claude-handler.ts
Comment thread apps/agent-server/app-registrar.ts Outdated
Comment thread apps/agent-server/app-registrar.ts
Comment thread apps/backend/src/services/aap/lifecycle.ts Outdated
- claude-handler: detach the Query from the registrar unconditionally in
  the finally block. The registrar's protectedByQuery is keyed by Query
  identity, so a rapid re-query (which makes claudeSessions.owns return
  false) used to leak the dead Query handle until process exit. Only the
  sessionId-keyed maps still need the ownership guard.
- app-registrar: serialize register/unregister broadcasts through a single
  promise chain. The JSON-RPC dispatcher fires handlers without locking,
  so two concurrent state changes could race — the later mutation finishing
  first, then the earlier broadcast overwriting the SDK with a stale
  snapshot. FIFO chain enforces submission order without a real mutex.
- app-registrar: catch new queries up to the current dynamic state when
  attachQuery runs. Previously, apps registered before a session existed
  stayed invisible to that session until the next register/unregister
  event happened to fire.
- aap/lifecycle: anchor relative `launch.cwd` and path-form `launch.command`
  (e.g. "./dist/cli.js") to the app's packageRoot. Node's spawn would
  otherwise resolve them against the backend's process cwd — fine for
  device-use today, but a foot-gun for any future manifest using
  relative paths.

Tests:
- app-registrar: catch-up, no-op-when-empty-registry, FIFO serialization
- aap/lifecycle: relative cwd resolves under packageRoot, not process.cwd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 3

CodeRabbit caught a 4-comment cluster around AAP registrar concurrency / lifecycle. All actionable, all fixed in da5f4778:

Fixed (4)

  • claude-handler: detachQuery runs unconditionally in finally so a rapid re-query no longer leaks the dead Query's protectedByQuery entry until process exit
  • app-registrar: serialize registerAppMcp / unregisterAppMcp through a FIFO promise chain — concurrent RPC handlers can't let an older broadcast overwrite a newer snapshot
  • app-registrar: attachQuery catches a newly-joining query up to the current dynamic state, so apps launched before the session started are visible immediately instead of waiting for the next register/unregister event
  • aap/lifecycle: anchor relative launch.cwd and path-form launch.command (e.g. ./dist/cli.js) to packageRoot — Node's spawn would otherwise resolve them against the backend's process cwd

Tests added (4)

  • attachQuery after a register catches the new query up to the current dynamic state
  • attachQuery is a no-op (no setMcpServers) when nothing is registered yet
  • serializes concurrent register/unregister in FIFO order
  • anchors a relative launch.cwd to packageRoot, not the backend's process cwd

All 410 agent-server tests + 61 AAP unit/integration tests pass locally. CI green.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
apps/agent-server/test/app-registrar.test.ts (1)

162-181: Consider exposing a test hook for awaiting the update chain.

The __sync__ register/unregister pattern works but is indirect. A dedicated __awaitPendingUpdatesForTests() helper in the registrar would make the intent clearer and avoid the dead-letter 0.0.0.0:0 URL. However, this is a minor ergonomic nit—the current approach correctly validates the FIFO guarantee.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/agent-server/test/app-registrar.test.ts` around lines 162 - 181, Add a
dedicated test hook on the registrar to await the pending update chain instead
of using the indirect "__sync__" register/unregister trick: expose a method
named something like awaitPendingUpdatesForTests() in the registrar
implementation and export it for tests, then update this test to call await
awaitPendingUpdatesForTests() after attachQuery (replacing the
registerAppMcp("__sync__", ...) / unregisterAppMcp("__sync__") calls). Ensure
the new helper waits until the same update chain that attachQuery triggers has
been processed so the existing expectation on q1.setMcpServers remains valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/agent-server/test/app-registrar.test.ts`:
- Around line 100-109: Rename the test case to reflect that unregistering an
unknown server does not broadcast; update the it(...) description string so it
matches the assertion (e.g., change the string in the test containing
"unregister an unknown server is a no-op for the map but still broadcasts
current state" to something like "unregister an unknown server is a no-op and
does not broadcast"), keeping the test body that uses claudeQueries, q1, and
asserts q1.setMcpServers.not.toHaveBeenCalled() and the call to
unregisterAppMcp("never_registered") unchanged.

In `@apps/backend/src/services/aap/lifecycle.ts`:
- Around line 120-122: The current code finalizes buffered stderr inside the
child.once("exit", ...) handler which can miss pending stderr data; change the
listener to child.once("close", (code, signal) => { finalizeExit(code, signal,
getStderr()); }) so finalizeExit is invoked after stdio is fully drained; update
any references to the exit handler to use "close" and keep using finalizeExit
and getStderr as before.

---

Nitpick comments:
In `@apps/agent-server/test/app-registrar.test.ts`:
- Around line 162-181: Add a dedicated test hook on the registrar to await the
pending update chain instead of using the indirect "__sync__"
register/unregister trick: expose a method named something like
awaitPendingUpdatesForTests() in the registrar implementation and export it for
tests, then update this test to call await awaitPendingUpdatesForTests() after
attachQuery (replacing the registerAppMcp("__sync__", ...) /
unregisterAppMcp("__sync__") calls). Ensure the new helper waits until the same
update chain that attachQuery triggers has been processed so the existing
expectation on q1.setMcpServers remains valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90b886b1-0e23-46c2-b44b-98459fb3392e

📥 Commits

Reviewing files that changed from the base of the PR and between c26fe27 and da5f477.

📒 Files selected for processing (5)
  • apps/agent-server/agents/claude/claude-handler.ts
  • apps/agent-server/app-registrar.ts
  • apps/agent-server/test/app-registrar.test.ts
  • apps/backend/src/services/aap/lifecycle.ts
  • apps/backend/test/unit/services/aap/lifecycle.test.ts

Comment thread apps/agent-server/test/app-registrar.test.ts Outdated
Comment thread apps/backend/src/services/aap/lifecycle.ts
- aap/lifecycle: switch the spawnApp completion listener from "exit" to
  "close" so all stdio is fully drained before getStderr() is captured.
  The ring buffer exists for crash diagnostics; using "exit" could miss
  the very stderr chunks that explain why a child died.
- lifecycle.test "SIGKILLs a child that ignores SIGTERM": await the
  onExit callback explicitly. Previously the test relied on stopChild
  returning at the same tick as onExit; "close" lands a tick or two
  after "exit" so the assertion would race the callback.
- app-registrar.test: rename "no-op for the map but still broadcasts
  current state" → "silent no-op — no broadcast" so the description
  matches the assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 4

Two minor cleanups in 9b8ee7bd:

Fixed (2)

  • aap/lifecycle: finalize buffered stderr on the close event instead of exit. The ring buffer exists for crash diagnostics; using exit could miss the late stderr chunks that explain why the child died. Updated the SIGKILL regression test to await onExit directly (close lands slightly after exit, so the old assertion was racing the callback).
  • app-registrar test: renamed "no-op for the map but still broadcasts current state""silent no-op — no broadcast" so the description matches the assertion.

All 410 agent-server + 61 AAP tests pass locally.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/test/unit/services/aap/lifecycle.test.ts`:
- Around line 149-163: The test starts a delayed setTimeout that flips the
server response but never clears its timer, so if the test fails early the
pending callback can mutate server state; store the timer id returned from
setTimeout (replace the anonymous call with e.g. const flipTimer =
setTimeout(...)) and in the existing finally block call clearTimeout(flipTimer)
before awaiting close() to ensure the delayed flip is cancelled; update
references around the server flip code and the finally block where close() is
awaited to clear the timer.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cec806a6-b27d-4b99-9c29-ed43f174ad25

📥 Commits

Reviewing files that changed from the base of the PR and between da5f477 and 9b8ee7b.

📒 Files selected for processing (3)
  • apps/agent-server/test/app-registrar.test.ts
  • apps/backend/src/services/aap/lifecycle.ts
  • apps/backend/test/unit/services/aap/lifecycle.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/agent-server/test/app-registrar.test.ts
  • apps/backend/src/services/aap/lifecycle.ts

Comment thread apps/backend/test/unit/services/aap/lifecycle.test.ts Outdated
- "SIGTERMs a graceful child" was racing the onExit callback after the
  spawnApp listener moved from "exit" to "close" (close lands a tick or
  two after exit). Await the callback through a Promise like the SIGKILL
  test already does, so the assertion fires after onExit lands.
- "keeps polling when the server returns 5xx" never cleared its 400ms
  flip timer in the finally block — an early throw would leave the
  callback alive and hit a closed server in the next test. Hoist the
  timer outside try and clearTimeout in finally for tighter isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 5

Fixed (2)

  • lifecycle.test "SIGTERMs a graceful child" — was racing the onExit callback after Round 4's exitclose switch (close lands a tick or two after exit, so the post-stopChild assertion read empty values). Await the callback through a Promise like the SIGKILL test already does. This unbreaks CI.
  • lifecycle.test "keeps polling when 5xx then 200" — hoist the 400ms flip timer outside the try block and clearTimeout in finally. An early throw would otherwise leave the callback alive and hit a closed server in the next test.

All 15 lifecycle tests pass locally.

CodeRabbit suggested finalizing on "close" to capture any trailing stderr
chunks emitted between exit and close. That's theoretically safer, but on
Linux CI it makes the SIGTERM lifecycle test time out at 10s:

  spawn("sh -c \"sleep 30\"") → sh (PID X) forks sleep (PID Y)
  stopChild sends SIGTERM to X; dash exits immediately without
  propagating SIGTERM to Y. `sleep` is reparented to init and keeps
  holding its inherited stdout/stderr pipe fds open. Node's "close"
  event only fires when ALL writers release the pipe — so close waits
  up to 30 seconds for `sleep` to naturally exit.

macOS bash kills its children on SIGTERM, so close fires immediately
there. But we can't rely on that on Linux.

Trade-off: the ring buffer already captures everything delivered up to
the exit moment. For crash diagnostics that's the material signal —
a handful of trailing bytes from a not-yet-flushed write are not worth
the cross-platform hang. Revert to "exit" and document the rationale
in place so the next reviewer doesn't re-propose the same change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvadaadam

Copy link
Copy Markdown
Owner Author

Greenlight — Round 6 (revert)

Rolling back the Round 4 exitclose switch in 58214fb0. Linux CI surfaced a real issue:

sh -c "sleep 30" on Linux doesn't propagate SIGTERM to its child. When stopChild SIGTERMs the shell, sleep is reparented to init and holds the inherited stdio pipe fds open — Node's close event only fires once every writer has released the pipe, so it waits up to 30 seconds for sleep to naturally exit. That's what tripped the 10s test timeout on the previous CI run.

macOS bash kills its children on SIGTERM so close fires immediately there; we can't rely on that.

Trade-off: the ring buffer already captures everything delivered by the time exit fires — for crash-tail diagnostics, the trailing bytes we might lose to a pending flush aren't worth a cross-platform hang on every orphan-leaving child. Rationale is in-code so a future reviewer doesn't re-propose the same change.

15 lifecycle tests pass. CI should be green.

@zvadaadam zvadaadam merged commit dc6a46a into main Apr 20, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant