From 6fab62d8cb195e25e0208105f02818df4eaa1b9d Mon Sep 17 00:00:00 2001 From: AviPeltz Date: Sun, 26 Apr 2026 20:12:18 -0700 Subject: [PATCH] chore: ignore local-only plans dir Move ad-hoc reference notes out of the tracked plans/ tree and into plans/local/, which is gitignored. --- .gitignore | 3 + ...ound-agents-chat-architecture-reference.md | 340 ---------------- ...de-electron-chat-architecture-reference.md | 298 -------------- plans/t3code-chat-architecture-reference.md | 379 ------------------ 4 files changed, 3 insertions(+), 1017 deletions(-) delete mode 100644 plans/background-agents-chat-architecture-reference.md delete mode 100644 plans/opencode-electron-chat-architecture-reference.md delete mode 100644 plans/t3code-chat-architecture-reference.md diff --git a/.gitignore b/.gitignore index f4011029bab..2c81080d953 100644 --- a/.gitignore +++ b/.gitignore @@ -90,3 +90,6 @@ superset-dev-data/ # MCP config (contains per-user server URLs/tokens) .mcp.json .cursor/mcp.json + +# Local-only plans (not tracked) +plans/local/ diff --git a/plans/background-agents-chat-architecture-reference.md b/plans/background-agents-chat-architecture-reference.md deleted file mode 100644 index e8d64d2dfb5..00000000000 --- a/plans/background-agents-chat-architecture-reference.md +++ /dev/null @@ -1,340 +0,0 @@ -# Background Agents (Open-Inspect) Chat Architecture — Reference Notes - -Research notes on how [Open-Inspect](./temp/background-agents) implements chat for background coding agents with multiplayer real-time collaboration. Written as background for the v2 chat transport rearchitecture (see `host-service-chat-architecture.md`, `v2-chat-greenfield-architecture.md`, and the companion `t3code-chat-architecture-reference.md` and `opencode-electron-chat-architecture-reference.md`). All paths below are relative to `temp/background-agents/` unless noted. - -## TL;DR - -Open-Inspect (inspired by Ramp's Inspect) is the most directly relevant reference architecture for what we're building. It uses **Cloudflare Durable Objects as the control plane** — one DO per session holding per-session SQLite, a WebSocket hub, and a FIFO prompt queue — and **Modal sandboxes as the execution plane**. Multiple humans can collaborate on one session in real time: every connected client subscribes to the same DO, the DO broadcasts events, and a small participant-presence service keeps everyone aware of who's there. WebSocket hibernation makes thousands of idle sessions cheap. Sessions can spawn child sessions into separate sandboxes. Input can originate from the web UI, Slack, GitHub, Linear, or webhooks — they all converge on the same session DO. This is **exactly the DO-based architecture I proposed as P5 of our plan, already built, production-style.** - -## Architecture diagram - -``` -┌──────────────────────────┐ ┌─────────────────────────────────┐ ┌─────────────────────┐ -│ Clients (many per │ │ Control plane │ │ Execution plane │ -│ session, many types) │ │ Cloudflare Workers + DO │ │ Modal sandbox │ -├──────────────────────────┤ ├─────────────────────────────────┤ ├─────────────────────┤ -│ │ │ │ │ │ -│ Web UI │ │ SessionDO (one per session) │ │ supervisor │ -│ (Next.js + React) │────┐ │ ┌──────────────────────────┐ │ │ ├─ entrypoint.py │ -│ │ │ │ │ per-session SQLite │ │ │ ├─ OpenCode agent │ -│ Slack bot │────┼─▶│ │ session · participants │ │ ┌──▶│ └─ bridge │ -│ │ │ │ │ messages (FIFO queue) │ │ │ │ (WS back) │ -│ GitHub bot (PR hooks) │────┤ │ │ events (indexed stream)│ │ │ │ │ -│ │ │ │ │ artifacts · sandbox │ │ │ │ filesystem: │ -│ Linear bot │────┤ │ │ ws_client_mapping │ │ │ │ workspace + │ -│ │ │ │ └──────────────────────────┘ │ │ │ dev environment │ -│ Webhooks / cron │────┘ │ │ │ │ │ -│ │ │ WebSocket hub (hibernation) │◀──┼───│ emits: │ -│ ────────────────────────│ │ many client WS + │ │ │ token events │ -│ │ │ one sandbox WS per session │ │ │ tool-call events│ -│ open a session: │ │ │ │ │ step-finish │ -│ POST /sessions │─────▶ │ HTTP surface: │───┘ │ cost · errors │ -│ (web / bot / hook) │ │ POST /sessions │ │ │ -│ │ │ POST /sessions/:id/ws-token │ │ │ -│ join its stream: │ │ GET /sessions/:id/ws │ │ │ -│ GET /sessions/:id/ws │─────▶ │ POST /sessions/:id/children/* │ │ │ -│ (WS, hibernation tag) │ │ POST /sessions/:id/cancel │ │ │ -│ │ │ GET /sessions/:id/spawn-ctx │ │ │ -│ │ │ │ │ │ -└──────────────────────────┘ └─────────────────────────────────┘ └─────────────────────┘ - │ ▲ - ▼ │ - ┌───────────────────┐ │ - │ Global D1 │ │ - │ session_index │ │ - │ repo_metadata │ │ - │ repo_secrets │ Modal lifecycle: spawn, │ - │ user_scm_tokens │ ready, snapshot, stop, │ - └───────────────────┘ restore-from-snapshot │ - │ - spawn_task (agent-initiated) ─────────────┘ - child session = new SessionDO + new sandbox - - Transports: - • Client ↔ Control plane : HTTPS + WebSocket (with hibernation) - • Control plane ↔ Sandbox: WebSocket (sandbox-auth-token handshake) - • Control plane ↔ D1 : HTTP (global metadata, secrets, session index) - • Sandbox ↔ git remote : HTTPS (GitHub App token, ephemeral) -``` - -### Same thing as a Mermaid diagram - -```mermaid -flowchart LR - subgraph Clients["Clients (many per session)"] - direction TB - Web["Web UI (Next.js)"] - Slack["Slack bot"] - GitHub["GitHub bot"] - Linear["Linear bot"] - Webhooks["Webhooks / cron"] - end - - subgraph ControlPlane["Control plane (Cloudflare Workers + Durable Objects)"] - direction TB - subgraph DO["SessionDO (one per session)"] - direction TB - SQLite[("per-session SQLite
session · participants ·
messages (FIFO) · events ·
artifacts · sandbox ·
ws_client_mapping")] - WSHub["WebSocket hub
client WSs + sandbox WS
(hibernation)"] - Presence["Presence service"] - Lifecycle["SandboxLifecycleManager
(pure decision fns)"] - end - HTTP["HTTP surface
POST /sessions · /children/* · /cancel
GET /ws · /spawn-context"] - D1[("Global D1
session_index · repo_metadata ·
repo_secrets · user_scm_tokens")] - HTTP --> DO - DO --> D1 - end - - subgraph Exec["Execution plane (Modal)"] - direction TB - Sandbox["Sandbox
supervisor · OpenCode agent · bridge"] - Snap[("Modal Image snapshots
(filesystem state)")] - Sandbox --> Snap - end - - Web -->|HTTPS + WS| HTTP - Slack --> HTTP - GitHub --> HTTP - Linear --> HTTP - Webhooks --> HTTP - - Web <-.->|WS: sandbox_event · presence · etc| WSHub - WSHub <-.->|WS: sandbox-auth-token| Sandbox - DO -.->|spawn_task creates child DO| DO -``` - -Solid arrows = requests. Dotted arrows = long-lived streaming connections. - -## Topology - -Three explicit tiers and the middle one is a Durable Object: - -1. **Clients** — web, Slack, GitHub, Linear, webhooks. All converge on the same HTTP + WS surface. -2. **Control plane** — Cloudflare Workers routing requests to per-session Durable Objects. Each DO owns one session: its SQLite database, its WebSocket connections, its lifecycle state. Stateless D1 database underneath for global indexes and repo metadata. -3. **Execution plane** — Modal sandboxes. One sandbox per session (or per child session). Runs OpenCode agent inside a real dev environment. Connects back to its session's DO via WebSocket. - -## Packages - -- `shared/` — TypeScript types, auth utilities, session/spawn context shapes. Consumed by everything. -- `control-plane/` — Cloudflare Workers + Durable Objects. The brain. Hosts `SessionDO`. -- `web/` — Next.js 16 + React 19 app. Session UI, OAuth, dashboard, real-time streaming. -- `slack-bot/`, `github-bot/`, `linear-bot/` — Cloudflare Workers (Hono). Translate external events into control-plane HTTP calls. -- `modal-infra/` — Python 3.12 Modal app. Sandbox supervisor + OpenCode runner + bridge that talks to the session DO. -- `sandbox-runtime/` — Python. Shared sandbox utilities. -- `daytona-infra/` — alternative sandbox provider, less used. Modal is the default. -- `terraform/` — IaC for Cloudflare Workers, Vercel, Modal, D1 schema migrations. Production deployment model, not demo-ware. - -## Control plane: Durable Objects - -Every session is a `SessionDO` addressed by session ID. Each DO owns: - -- **Per-session SQLite database** (lives inside the DO). Tables (`control-plane/src/session/schema.ts`): - - `session` — repo, branch, model, status (`created | active | completed | failed | archived | cancelled`), cost. - - `participants` — users present in this session with encrypted SCM tokens and `ws_auth_token` hashes. - - `messages` — prompt queue, FIFO by insertion order. - - `events` — sandbox events (tokens, tool calls, step-finish, errors). Indexed on `(created_at, id)` for cursor-paginated reads. - - `artifacts` — PRs, screenshots, branch refs. - - `sandbox` — current sandbox id, status, auth token, snapshot image id. - - `ws_client_mapping` — stable `wsId → participantId` so hibernated WebSockets can be rehydrated. -- **Active WebSocket connections** — many client WSs + one sandbox WS per session. -- **Sandbox lifecycle state machine** — implemented as pure decision functions (`evaluateSpawnDecision`, `evaluateCircuitBreaker`, `evaluateInactivityTimeout`, `evaluateHeartbeatHealth` in `control-plane/src/sandbox/lifecycle/manager.ts`). - -**Why DOs earn their keep here:** - -- Single-threaded per session → no concurrency races inside one session. -- SQLite-backed storage → durable, survivable across deploys, supports range reads. -- WebSocket hibernation → sessions can be idle for hours with zero cost and wake instantly when a message arrives. -- Global addressability → a Slack bot in one Worker can route to the exact DO holding the session. - -**Global state in D1** (regular Cloudflare D1, not per-session): - -- `session_index` — list of all sessions, keyed by user_id, for dashboards. -- `repo_metadata` — descriptions, aliases, Slack channel associations. -- `repo_secrets` — AES-256-GCM encrypted environment variables per repo. -- `user_scm_tokens` — cached OAuth tokens with refresh logic. - -## Transport: WebSocket with hibernation - -One WS per client, plus one WS per sandbox, all terminating at the session's DO. - -**Authentication flow:** - -1. User OAuths against GitHub → gets user id and SCM token. -2. Client calls `POST /sessions/:id/ws-token` and receives a 24-hour JWT. -3. Client opens `GET /sessions/:id/ws` WebSocket. -4. Client sends `{ type: "subscribe", token, clientId }` as first message. -5. DO validates the token hash against `participants.ws_auth_token`, looks up the participant, tags the WS with `wsid:` via `ctx.acceptWebSocket(ws, [tag])`, records the mapping in `ws_client_mapping`. -6. DO replies with `{ type: "subscribed", sessionId, state, artifacts, participantId, replay? }`. - -After that, the WS is hibernation-eligible — the DO can sleep while the WS stays open. - -**Client → server messages:** - -- `ping` · `subscribe` · `prompt { content, model?, attachments? }` · `stop` · `typing` · `presence { status, cursor? }` - -**Server → client messages:** - -- `pong` · `subscribed { …, replay? }` · `sandbox_event { event }` · `presence_sync` · `presence_update` · `sandbox_spawning | sandbox_ready | sandbox_error` · `artifact_created` · `snapshot_saved` · `session_status` · `child_session_update` · `error` - -**Hibernation recovery.** When the DO wakes after hibernation, it reads `ws_client_mapping` to re-associate each WS with its participant. No client action is required; the client is simply still subscribed. - -## Session model - -**One session = one piece of work tied to a repo.** Sessions are long-lived across client connections — you close your browser, come back tomorrow, the session is still there. - -**Created via:** web (`POST /sessions`), Slack `@mention`, GitHub PR webhook, Linear issue assignment, or automation trigger. All converge on the same creation path that: - -1. Generates session id. -2. Writes to `session_index` (global D1). -3. Creates the `SessionDO` and initializes its per-session SQLite. -4. Inserts the initial prompt into `messages` as `pending`. - -**Status lifecycle:** `created → active → completed | failed | archived | cancelled`. - -**Message queue.** Prompts go into the `messages` table with FIFO order. The DO processes one at a time. Concurrent `prompt` messages from two users on the same session just queue up — no dropping, no merge conflict. - -## Multiplayer real-time collaboration - -This is what makes Open-Inspect unusually relevant for us. Multiple humans can subscribe to the same session DO and see identical state in real time. - -**Event broadcasting.** When the sandbox emits an event, the DO: - -1. Persists it into the per-session `events` table. -2. Calls `forEachClientSocket("authenticated_only", ws => ws.send({ type: "sandbox_event", event }))`. - -Every connected client sees the same stream in the same order. No per-client state, no reconciliation. - -**Presence.** A `PresenceService` inside the DO maintains the roster of currently connected clients with last-seen timestamps and active/idle status. `presence_sync` on join hands a client the current roster; `presence_update` fans out on changes. - -**Identity.** Each event carries `participantId`, `name`, `avatar`, derived from the `participants` table. Clients render messages with correct attribution regardless of which user typed them. - -**Concurrency.** DOs are single-threaded per session — a Cloudflare platform guarantee. If User A and User B send prompts at the same moment, both hit the DO serially; both inserts go into `messages` in insertion order. The agent processes them one at a time. No CRDTs, no locks, no conflict resolution — the DO's single-threaded invariant does the work. - -**Event replay on reconnect.** The `subscribed` reply to a re-joining client can include an optional `replay: { events, hasMore, cursor }` payload. The client catches up via cursor-paginated history, then joins the live stream. Cursor is `{ timestamp, id }` into the `events` table's index — the same shape any paginator would use. - -**What this gets you.** A user starts a session on desktop, walks away, someone else on a phone opens the same session and sees the full history plus the live tail. Both can chime in; both see the other's messages. When the user gets back to desktop, they see both their and their colleague's contributions. No extra plumbing, no conflict resolution, no sync issues. - -## Parallel sub-tasks (`spawn_task`) - -An agent tool lets a running session spawn **child sessions** into separate sandboxes. Implemented as a typed tool the agent can call during a turn. - -**Guardrails:** - -- `MAX_SPAWN_DEPTH = 2` — children can't spawn children (prevents fork bombs). -- `MAX_CONCURRENT_CHILDREN = 5` — at most five running at once. -- `MAX_TOTAL_CHILDREN = 15` — lifetime per parent session. - -**Mechanics:** - -1. Parent agent calls `spawn_task({ title, prompt })`. -2. Control plane creates new session with `parent_session_id`, `spawn_source: "agent"`, `spawn_depth: parent.depth + 1`, inheriting repo/model/owner. -3. Child fetches `GET /sessions/:id/spawn-context` to get parent-owned SCM tokens and model config. -4. Child sandbox spins up, child DO enqueues the prompt, child runs independently. -5. Parent continues its own turn; it does not block on the child. -6. Child posts progress via `POST /sessions/:parent-id/children/:child-id` which the parent DO broadcasts as `child_session_update` to the parent's subscribers. -7. Parent agent can call `get_task_status` and `cancel_task` tools to poll or abort children. -8. Final merge (PRs, file changes) is explicit — the parent decides what to do with children's artifacts. - -This is a non-trivial pattern. No CRDT, no automatic merging — just typed tools, a parent-child graph, and explicit coordination. - -## Sandbox lifecycle (Modal) - -Three startup modes, chosen by the lifecycle manager based on current state: - -- **Fresh start:** spawn container → clone repo → run `.openinspect/setup.sh` → run `.openinspect/start.sh` → agent ready. Slowest (~30-300s). -- **Snapshot restore:** restore filesystem from Modal Image snapshot → `git pull` → run `start.sh` → agent ready. Usually <10s. -- **Repo image start:** use pre-built image → incremental `git pull` → `start.sh` → agent ready. Also fast. - -**Snapshots** capture filesystem state and are taken after successful prompts, before inactivity timeout, or on explicit request. Stored as Modal Image IDs referenced from the `sandbox` row in per-session SQLite. - -**Warming on typing.** When a client sends `{ type: "typing" }`, the control plane broadcasts `sandbox_warming` and begins spawning the sandbox speculatively. By the time the actual prompt arrives the sandbox is often ready. Hides cold-start latency. - -**Lifecycle decisions are pure functions.** `evaluateSpawnDecision(state) → decision`, `evaluateCircuitBreaker(state) → decision`, etc., return discriminated-union results; the manager then performs side effects via injected dependencies. Easy to unit-test, easy to reason about. - -## Entry point unification - -All integrations terminate at the same `POST /sessions` + WebSocket surface. The session doesn't know whether a prompt came from Slack or the web — the message goes into the same `messages` queue. - -**Callback notifications** for integrations that need a feedback loop: `CallbackNotificationService` dispatches async tasks (`ctx.waitUntil`) when the agent makes progress — a tool call, a PR creation, a completion — and the Slack/GitHub/Linear bot posts an update back to the original thread/PR/issue. These don't block the session and don't count against the prompt queue. - -## Persistence - -| Data | Location | -|---|---| -| Messages queue + history | Per-session SQLite in DO | -| Events stream | Per-session SQLite in DO, indexed on `(created_at, id)` | -| Participants + encrypted SCM tokens | Per-session SQLite in DO | -| Artifacts (PRs, screenshots) | Per-session SQLite + R2 for large media | -| Sandbox lifecycle state | Per-session SQLite in DO | -| WebSocket hibernation mapping | `ws_client_mapping` in per-session SQLite | -| Global session index | D1 (global) | -| Repo metadata + aliases | D1 | -| Repo secrets (encrypted) | D1 | -| User OAuth tokens (cached) | D1 | - -Everything session-scoped is local to the DO. Everything shared is in D1. No Postgres, no Redis, no broker. Cloudflare's platform primitives carry it. - -## Tool calls, approvals, interrupts - -**Tool calls** emit `tool_call` events: `{ tool, args, callId, status: "running" | "completed" | "error" }`. Broadcast to all clients. No server-side approval step — tools execute immediately. - -**Interrupts.** Client sends `{ type: "stop" }`. DO closes the sandbox WS, marks sandbox `stopped`, sets session status `cancelled`, broadcasts `session_status`. - -**Approvals.** Not implemented. Would require adding a `pending` tool-call state with a client-originated approval message and a sandbox-side block. Achievable but not present. - -## Auth / single-tenant model - -Open-Inspect is **single-tenant by design** — "all users are trusted members of the same organization." One shared GitHub App installation per deployment, no per-user repo access validation, no tenant isolation. - -**Token types:** - -- **GitHub App token** (shared, ephemeral) — clone and push from sandbox. -- **User OAuth token** (per user) — PR creation and attribution. AES-256-GCM encrypted at rest in `participants`. -- **Sandbox auth token** — one per session; sandbox uses it to prove itself to the session DO. -- **WebSocket JWT** — one per client-session pair, 24h TTL. - -**Why single-tenant?** The shared GitHub App model is the architectural shortcut that makes collaboration easy. Multi-tenant would require per-tenant GitHub App installations, access validation on session creation, and tenant isolation in the data model — none of which is here. - -## Noteworthy patterns worth stealing - -- **Durable Object per session as the whole "one owner per session + durable state + multi-subscriber fan-out" primitive in one building block.** This is the P5 architecture in my v2-chat plan, already built. No Postgres event table, no in-process pubsub, no LISTEN/NOTIFY — DOs give it to you. -- **WebSocket hibernation with `ws_client_mapping`.** Idle sessions cost nothing; reconnects are seamless; multi-device works without any special code. -- **Cursor-based event replay** (`{ timestamp, id }` paginator over the events table). Simple, indexed, works for any subscriber joining late. -- **FIFO prompt queue in SQL.** Concurrent prompts from multiple users queue cleanly; no race, no drop. -- **Pure decision functions for lifecycle.** Testable, easy to reason about, decoupled from I/O. -- **Presence as a first-class thing.** Not just "who's connected" but also status (active/idle), last-seen, role. Worth copying the shape if we ever add presence. -- **Speculative sandbox warming on typing.** Cheap UX win that hides a few hundred ms of cold-start every turn. -- **Entry-point unification at the HTTP layer.** One `POST /sessions`, many producers. Keeps the bot implementations skinny. -- **Pure-tool model for sub-task spawning.** The agent gets `spawn_task` / `get_task_status` / `cancel_task` as regular typed tools; parallelism is an agent-level concern, not an infrastructure one. - -## Things that are fragile (or we'd do differently) - -- **DO storage size cap.** SQLite per DO is 10 GB today. Per-session is fine but a very chatty session could bump into it. No eviction story visible. -- **24 h WS JWT with no auto-rotation.** Close code 4001 on expiry, and the web client's retry logic isn't obvious from the code — if a user leaves a tab open overnight, they likely have to re-auth. -- **D1 as the global session-index bottleneck.** SQLite under the hood; fine at low-to-mid scale, but high concurrent session creation could hit contention. Not a problem we'll face any time soon. -- **No approval flow.** Auto-approve-everything is a deliberate choice given sandboxes are ephemeral and scoped, but it means there's no building block for "ask the user before running this shell command." -- **Single-tenant assumption baked in.** Shared GitHub App + no per-user access validation is the explicit design. Fine for Ramp-shaped internal tools, not fine for a product with external users. -- **Snapshot failures are silent.** If a Modal snapshot fails, the next session pays a cold-start cost and nobody tells you. -- **Events are lost on session archival/deletion.** No long-term archival to object storage beyond media artifacts. If "audit this session from six months ago" becomes a requirement, we'd need to add it. -- **Sandbox runs OpenCode, period.** No provider abstraction inside the sandbox. If we ever want Claude Code or Codex inside Modal, it's another implementation in `modal-infra`, not a swap. - -## Signal for our rearchitecture - -Ranked by direct relevance: - -1. **This is the DO-based P5 we were sketching, already running.** Same shape — one DO per session, per-session SQLite, WebSocket hub with hibernation, events table with cursor replay. The exact things I said DOs give you (single-threaded ordering, built-in fan-out, durable storage per session, cheap idle cost) are the exact things this uses. -2. **Multi-subscriber real-time is solved by "subscribe every client to the same DO and broadcast events."** No custom broker. If we ever want multi-user collaboration in a Superset workspace, the pattern transfers directly. -3. **Entry-point unification** — `POST /sessions` from web or a bot lands in the same session. For Superset this maps onto "session is a workspace thing, reachable from any client" — web, mobile, a hypothetical Slack bot — without special per-integration logic. -4. **Hibernation as a cost story.** Idle chat sessions are free (no running server). Important if we want to keep long-lived history accessible. -5. **FIFO prompt queue in SQL as the concurrency primitive.** Concurrent `sendMessage` from two devices? Both insert into the same table, agent picks up one at a time, no race. Much simpler than in-process per-session async queues. -6. **Cursor replay over a timestamped events table.** Same idea as our `replayEvents(fromSeq)` RPC — subtly different (timestamp+id pair vs monotonic seq). Their version has the advantage of not needing a separate counter; ours has the advantage that gap detection is a subtraction. -7. **Separation of control plane and execution plane.** Cloudflare DO for state + WebSocket, Modal for the sandbox. If we move toward cloud runtime, this split is the right shape: state layer stays tight and durable, execution layer is wherever the agent happens to run. - -Things **not** to take directly: - -- **Single-tenant shortcut.** Fine for them, not a fit for us. -- **Auto-approve everything.** We want disciplined typed approvals like t3code's. -- **OpenCode-only in the sandbox.** We have our own harness (Mastracode) and shouldn't replace it. -- **Cloudflare ecosystem lock-in** — a real cost to weigh even when adopting the *pattern*. Implementing the same shape over Node + Postgres is achievable; it's just more code than DOs give you for free. - -The bigger meta-signal: **everything we've been sketching about cloud runtime + multi-device + multi-subscriber already has a concrete, running reference implementation here.** If we get to P5 of our plan and the decision is "Cloudflare DOs vs. rolling our own on Postgres," this repo is the argument for Cloudflare. It's not a prototype — it's a real system with bot integrations, parallel sub-tasks, presence, snapshots, and a Terraform deploy story. diff --git a/plans/opencode-electron-chat-architecture-reference.md b/plans/opencode-electron-chat-architecture-reference.md deleted file mode 100644 index 6f91a0edbad..00000000000 --- a/plans/opencode-electron-chat-architecture-reference.md +++ /dev/null @@ -1,298 +0,0 @@ -# OpenCode (Electron) Chat Architecture — Reference Notes - -Research notes on how [OpenCode](./temp/opencode) wires chat inside its Electron desktop app. Written as background for the v2 chat transport rearchitecture (see `host-service-chat-architecture.md`, `chat-mastra-rebuild-execplan.md`, and the companion `t3code-chat-architecture-reference.md`). All paths below are relative to `temp/opencode/` unless noted. - -## TL;DR - -OpenCode runs its agent runtime **in the same Node process as the Electron main process**. On startup, main calls `spawnLocalServer()` which binds a Hono HTTP server to `127.0.0.1:` with HTTP Basic auth (username `opencode`, password = a UUID generated per app launch). The renderer talks to that server over plain HTTP + **Server-Sent Events** — not IPC, not WebSockets. Writes are REST; reads are a single long-lived SSE subscription to a global event bus. Client state is SolidJS + a lightweight event bus; no Redux/Zustand. Partial assistant output streams as **`message.part.delta`** events (field + delta string) that the client applies token-by-token, coalesced to one flush per animation frame. - -## Architecture diagram - -``` -┌────────────────────────── Electron (one Node / V8 process) ──────────────────────────┐ -│ │ -│ ┌─────────────── MAIN ─────────────┐ ┌────────── RENDERER ──────────┐ │ -│ │ │ │ │ │ -│ │ preload/index.ts │ │ SolidJS app │ │ -│ │ └─ contextBridge "api": │ │ │ │ -│ │ awaitInitialization() ────┼─ IPC ────▶ │ window.api │ │ -│ │ storeGet / storeSet │ (bootstrap│ { url, username, password } │ │ -│ │ dialogs, killSidecar │ only) │ │ │ │ -│ │ │ │ ▼ │ │ -│ │ spawnLocalServer() │ ◀── REST ─│ global-sdk.tsx │ │ -│ │ └─ Hono @ 127.0.0.1: │ ── SSE ──▶│ SSE consumer │ │ -│ │ basic auth: opencode / UUID │ │ per-frame coalescer (~16ms)│ │ -│ │ CORS: oc://renderer │ │ │ │ │ -│ │ │ │ │ ▼ │ │ -│ │ ▼ │ │ Solid event bus │ │ -│ │ Effect runtime (in-proc) │ │ (keyed by directory) │ │ -│ │ └─ Bus (PubSub, fan-out) │ │ │ │ │ -│ │ SessionPrompt.Service │ │ ▼ │ │ -│ │ Permission.Service │ │ Per-view reactive stores │ │ -│ │ ToolRegistry.Service │ │ (SolidJS createStore) │ │ -│ │ LLM (Vercel AI SDK) │ │ │ │ │ -│ │ │ │ │ ▼ │ │ -│ │ ▼ │ │ SolidJS components │ │ -│ │ SQLite (Drizzle) │ │ │ │ -│ │ └─ opencode.db │ │ │ │ -│ │ MessageTable · PartTable · │ │ │ │ -│ │ SessionTable │ │ │ │ -│ │ │ │ │ │ -│ └──────────────────────────────────┘ └──────────────────────────────┘ │ -│ │ -│ ───────────────── transport between the two halves (loopback) ───────────────── │ -│ │ -│ (1) REST (write path) POST /session/:sessionID/message │ -│ body: PromptInput │ -│ returns: MessageV2.WithParts (final turn, synchronous) │ -│ │ -│ (2) SSE (read path) GET /event (10s server heartbeat, 15s client timeout) │ -│ data: { type: "message.part.updated", properties: {part}} │ -│ data: { type: "message.part.delta", │ -│ properties: { partID, field, delta } } │ -│ data: { type: "session.updated", ... } │ -│ data: { type: "lsp.updated", ... } │ -│ │ -└──────────────────────────────────────────────────────────────────────────────────────┘ -``` - -Two things to notice. First, there is no process boundary between `spawnLocalServer()` and the rest of main — the "server" is just another Effect layer inside the same V8 isolate, not a child process or a separate binary. Second, the renderer never uses IPC for chat: bootstrap goes over `contextBridge`, but all message traffic rides the loopback HTTP interface with the basic-auth credentials handed to it at startup. - -### Same thing as a Mermaid diagram - -```mermaid -flowchart LR - subgraph Proc["Electron (one Node / V8 process)"] - direction LR - - subgraph Main["Main"] - direction TB - Preload["preload/index.ts
contextBridge 'api'
awaitInitialization · storeGet/Set ·
dialogs · killSidecar"] - Server["spawnLocalServer()
Hono @ 127.0.0.1:rnd port
basic auth: opencode / UUID
CORS: oc://renderer"] - subgraph Runtime["Effect runtime (in-proc)"] - direction TB - Bus["Bus (PubSub, fan-out to SSE)"] - SP["SessionPrompt.Service"] - Perm["Permission.Service"] - Tools["ToolRegistry.Service"] - LLM["LLM (Vercel AI SDK)"] - end - DB[("SQLite (Drizzle)
opencode.db
MessageTable · PartTable ·
SessionTable")] - end - - subgraph Render["Renderer (SolidJS)"] - direction TB - WinApi["window.api
{ url, username, password }"] - Sdk["global-sdk.tsx
SSE consumer +
per-frame coalescer (~16ms)"] - EB["Solid event bus
(keyed by directory)"] - Stores["Per-view reactive stores
(createStore)"] - UI["SolidJS components"] - end - end - - Preload -. "IPC (bootstrap only)" .-> WinApi - WinApi --> Sdk - Sdk -->|"(1) REST  POST /session/:id/message"| Server - Server -.->|"(2) SSE  GET /event
message.part.updated
message.part.delta { field, delta }"| Sdk - Sdk --> EB - EB --> Stores - Stores --> UI - - Server --> SP - SP --> LLM - SP --> Tools - Tools --> Perm - SP --> Bus - SP --> DB - Bus -.-> Server -``` - -Solid arrows = request / command direction. Dotted arrows = server-pushed events or IPC bootstrap. The renderer's only IPC call is to pick up `{ url, username, password }`; everything chat-shaped rides REST and SSE over loopback. - -## Topology - -Three tiers, but only two processes: - -1. **Electron main** — `packages/desktop-electron/src/main/index.ts`. Manages windows, lifecycle, IPC. Also hosts the agent server *in-process*. -2. **Sidecar server** — `packages/desktop-electron/src/main/server.ts::spawnLocalServer()` imports the Hono server from a virtual module (`virtual:opencode-server`) and calls `Server.listen()`. Not a child process; same Node runtime as main. -3. **Renderer** — `packages/desktop-electron/src/renderer/index.tsx`. SolidJS app. Talks to the sidecar over HTTP with credentials from the preload bridge. - -Boundaries: - -- `packages/opencode/src/server/server.ts` — the Hono server definition. -- `packages/desktop-electron/src/preload/index.ts` — `contextBridge.exposeInMainWorld("api", …)`. The only IPC surface is *bootstrap* (`awaitInitialization` returns `{ url, username, password }`, plus store get/set, dialogs, clipboard, `killSidecar`). Chat itself never goes through IPC. - -Port is chosen dynamically (TCP port 0, OS picks). CORS on the server is restricted to the custom scheme `oc://renderer`. Main waits on `GET /global/health` in a ~100 ms poll loop before allowing the renderer to finish initialization (`main/index.ts` ~145-196). - -## Transport - -Two channels: - -- **REST (request/response)** — session CRUD, message listing, metadata, and *sending* a user message. The key endpoint is `POST /session/:sessionID/message` (`packages/opencode/src/server/routes/instance/session.ts` ~846-891). It validates via Zod, calls `SessionPrompt.Service.prompt()`, and returns the final `MessageV2.WithParts` object. The handler *does* use Hono's `stream()` helper, but only calls `stream.write(JSON.stringify(msg))` once, after the turn completes — so functionally it behaves like a non-streamed JSON body. Real-time updates come over SSE, not over this stream. - -- **SSE (server → client push)** — one global event stream at `GET /event` (`packages/opencode/src/server/routes/instance/event.ts`). Started once on app init in `packages/app/src/context/global-sdk.tsx` (~140). All state changes — message parts created/updated, session status, LSP, etc. — are published to the internal `Bus` and flushed to every connected SSE client. - -SSE headers worth noting (`event.ts` ~36-38): - -``` -Cache-Control: no-cache, no-transform -X-Accel-Buffering: no -``` - -A 10 s server-side heartbeat (~51-58) keeps proxies from killing idle connections. Client treats >15 s of silence (`global-sdk.tsx` ~111) as dead and reconnects. - -Effectively: **writes are REST, reads are one always-on SSE pipe**. There is no WebSocket, no tRPC, no polling. - -## Server runtime - -Built on **Effect** (Effect-ts) plus the **Vercel AI SDK**. Not Mastra, not a bespoke harness loop in the style of `packages/chat` in our repo. - -Key services (all Effect Layers): - -- `SessionPrompt.Service` — `packages/opencode/src/session/prompt.ts` (~80+). Exposes `prompt(input)` for a one-shot user→AI turn, `loop(input)` for the agentic multi-step loop, and defers cancellation to `SessionRunState.Service`. -- `ToolRegistry.Service` — tool definitions and dispatch. -- `Permission.Service` — gatekeeper for tool execution; emits events when approval is needed. -- `LLM` — wraps Vercel AI SDK for model calls. -- `Bus` — `packages/opencode/src/bus/index.ts`. A layer-based PubSub: one unbounded Effect `PubSub` per event type plus a wildcard channel. Every state change goes through it, and the SSE route subscribes to all. - -The flow from `POST /session/.../message` is: parse → resolve session → hydrate history from SQLite → kick `SessionPrompt.prompt()` → Effect runtime drives the agent loop → each state change publishes events on the `Bus` → SSE fan-out pushes them to every connected renderer → server finally returns the terminal message on the REST response. - -## Message / event model - -The canonical shape is `MessageV2` (`packages/opencode/src/session/message-v2.ts`): - -- Each message has `.info` (metadata) and `.parts[]`. -- Part kinds (the `type` discriminator on each part): `text`, `reasoning`, `file`, `agent`, `compaction`, `subtask`, `retry`, `step-start`, `step-finish`, `tool`, `snapshot`, `patch`. Tool calls and their results live on the single `tool` part, not a separate `tool_result` kind — the result is carried in a nested `state` field whose shape changes as the tool runs. - -Two event types carry updates: - -- **`message.part.updated`** — full `Part` object. Used for part creation, tool results, and final snapshots. -- **`message.part.delta`** (~602-611): - ```ts - { - type: "message.part.delta", - properties: { sessionID, messageID, partID, field, delta } - } - ``` - `field` is the part field being updated (typically `"text"`); `delta` is the string to append. This is how token streaming is expressed — not full text replacement, not a unified diff. Just `append(delta)` into `part[field]`. - -A typical assistant turn looks like: - -``` -message.part.updated { part: { id, type: "text", text: "" } } -message.part.delta { field: "text", delta: "Hello" } -message.part.delta { field: "text", delta: " world" } -message.part.updated { part: { id, type: "text", text: "Hello world" } } -``` - -No global sequence numbers, no per-session counters. SSE ordering is the ordering guarantee. There is no `replayEvents` RPC and no gap detection on the client. - -## Client state: SolidJS + event bus - -No Redux, no Zustand, no central store. The renderer uses: - -- **SolidJS** fine-grained reactivity with `createStore` for reactive objects. -- **Solid Query** (`@tanstack/solid-query`) for REST fetches (session lists, history hydration). -- **`@solid-primitives/event-bus`** for the event stream: `event.on(directory, listener)` / `event.listen(directory)` exposed from `global-sdk.tsx`. - -On app init: - -1. `useServer()` picks an active server (there can be several — the same renderer can attach to multiple). -2. `useGlobalSDK()` builds the SDK client and starts the SSE subscription. -3. SSE events are dispatched onto the event bus keyed by `directory` (project/worktree). -4. Session views subscribe to the bus for their directory and mutate reactive stores accordingly. -5. SolidJS picks up the reactive read and re-renders only the affected DOM nodes. - -The pattern is unusual but coherent: each UI screen is its own little reducer over the event stream, keeping its own local reactive store. There is no global chat state object. - -### Event coalescing - -`global-sdk.tsx` (~54-95) batches incoming events by key and flushes once per animation frame (~16 ms). For a given `partID` field, intermediate deltas may be discarded if a full `message.part.updated` arrives before flush (~170-172). This keeps the UI smooth under high-frequency LLM token streams but is a place where an overeager renderer will miss intermediate states. - -## Send flow, end-to-end - -1. **Compose.** SolidJS composer captures text, attachments, context. -2. **REST call.** `sdk.session.prompt({ sessionID, parts: [{ type: "text", text: "hello" }] })` → `POST /session/{sessionID}/message`. -3. **Server accepts.** `session.ts` handler validates, loads history from SQLite, calls `SessionPrompt.Service.prompt()`. -4. **Agent loop runs.** `LLM.stream()` via the Vercel AI SDK drives token generation. For each incremental chunk, the loop publishes `PartDelta` to the `Bus`. Tool execution publishes `PartUpdated` events with tool-result parts. -5. **Fan-out.** The SSE route (`/event`) is already subscribed to the Bus; each event is serialized as `data: {...}\n\n`. -6. **Renderer consumes.** A `for await (const event of events.stream)` loop in `global-sdk.tsx` dispatches to the Solid event bus. -7. **Local reducers.** Page-level stores (session view, message list) update themselves reactively. -8. **SolidJS renders.** Fine-grained reactivity means only the affected DOM node updates. -9. **REST returns.** The original `POST /message` resolves with the final `MessageV2.WithParts`. The UI usually already reflects this state from the SSE stream by the time REST returns. - -Note the ordering: the SSE stream is often *ahead of* the REST response. The REST call is effectively a synchronous "start this and wait for completion" with the real-time updates coming out-of-band. - -## Tool approvals and interrupts - -Approval *requests* ride the SSE bus; approval *replies* are a dedicated typed REST endpoint. - -- When a tool is about to run, the agent loop consults `Permission.Service`. If the user hasn't allowed this tool, a permission event is published to the Bus and the agent loop blocks on the Effect primitive waiting for the user's response. -- Client sees the event via SSE, shows inline UI or a dialog, and submits the reply via `POST /permission/:requestID/reply` (`packages/opencode/src/server/routes/instance/permission.ts`, operationId `permission.reply`). Body shape: `{ reply: Permission.Reply, message?: string }`. `Permission.Service.reply({ requestID, reply, message })` resolves the blocked Effect. -- `GET /permission` (operationId `permission.list`) lets the client enumerate all pending approvals across sessions. -- Cancellation: `SessionRunState.Service.cancel(sessionID)` flips a cancellation flag; the agent loop checks it at natural boundaries and exits. The renderer calls this via the session API. - -So the approval protocol is actually quite disciplined — it's a typed `(requestID, reply)` request/response, just carried on the REST side rather than in a single orchestration command stream like t3code's `thread.approval.respond`. - -## Persistence - -SQLite via Drizzle ORM. - -- Drizzle schema at `packages/opencode/src/session/session.sql.ts`. SQL table names are lowercase (`session`, `message`, `part`); the TS exports are `SessionTable`, `MessageTable`, `PartTable`. -- `MessageTable` columns: `id`, `session_id`, `data` (JSON, typed as `InfoData` — role/streaming/etc. live inside the JSON blob), plus `Timestamps`. -- `PartTable` columns: `id`, `message_id`, `session_id`, `data` (JSON, typed as `PartData` — the part kind lives inside the JSON blob), plus `Timestamps`. -- `SessionTable` carries richer metadata: `id`, `project_id`, `workspace_id`, `parent_id`, `slug`, `directory`, `title`, `version`, `share_url`, summary counters, `revert` snapshot, permission ruleset, plus `Timestamps`. -- Path: `$XDG_DATA_HOME/opencode/opencode.db` (defaults under `~/.local/share/opencode/`). -- On app start, main ensures the file exists; `JsonMigration.run()` initializes or upgrades the schema (`main/index.ts` ~140-217). -- History is loaded on demand: `Session.list()` for the sidebar, `Session.get(sessionID)` hydrates a session with all its parts joined. - -No event log, no projections, no replay. Once a `message.part.updated` commits, the prior deltas are forgotten. Deltas in flight during a crash are lost — the REST response is the commit boundary, not the individual SSE events. - -## Reconnect and resumability - -Minimal. The client simply reopens the SSE stream on failure: - -- Retry loop with 250 ms delay (`global-sdk.tsx` ~201). -- 15 s heartbeat timeout; if no server event in that window the client aborts and reconnects. -- On reconnect, the client re-fetches message history via REST — there is no "resume from sequence N" mechanism. If the server crashed mid-turn, the turn is just gone. - -This works because SQLite is the source of truth for completed messages and because users tolerate the occasional lost in-flight turn. It would not work in a collaborative setting where multiple clients share a session and need identical state. - -## Electron-specific wiring - -- **Preload** (`src/preload/index.ts`) exposes a minimal `api` via `contextBridge`. All chat traffic bypasses IPC; only bootstrap, settings, and platform features (dialogs, clipboard, notifications) cross the bridge. -- **Sidecar boot** (`src/main/server.ts::spawnLocalServer`, `src/main/index.ts` ~44-50): - - Random port via `getSidecarPort()` (OS-assigned). - - Server instance lives in the same V8 isolate as Electron main. - - Password: random UUID generated per launch; handed to renderer via preload. - - Health check: poll `GET /global/health` every ~100 ms until 200. -- **Custom protocol**: `oc://renderer` is registered and used as the sole allowed CORS origin. `opencode://` deep links are also registered for file associations and external launch URLs (`main/index.ts` ~114). -- **Shutdown**: `killSidecar` IPC handler triggers a graceful server stop on app quit. - -## Noteworthy patterns - -- **In-process sidecar.** Zero IPC overhead; the "server" is just another Effect layer running in the same process. Radically simple to deploy and reason about. Tradeoff: crash in main kills the runtime; no independent restart. -- **HTTP + single global SSE pipe.** One durable subscription for *all* state, filtered on the client by session/directory. Much simpler than per-resource subscriptions. -- **Delta events with a `field` selector.** Token streaming expressed as `append(delta)` into a named field on a part, rather than as token objects or diffs. The client's apply function is three lines. -- **Per-frame event coalescing (`~16 ms`).** Caps render cost regardless of server output rate. -- **Custom URL scheme as CORS origin.** Keeps the sidecar inaccessible from stray `http://localhost` origins even if the port leaked. -- **Effect-first codebase.** Services, DI, and error handling are all Effect; the same patterns compose from REST handlers down into LLM calls. - -## Things that are fragile (or that we'd do differently) - -- **No sequence numbers.** SSE ordering is the only ordering guarantee. A dropped event between heartbeats is silently lost; the client only self-heals by re-fetching history. For a local single-user app this is fine; for anything with multiple clients or mobile backgrounding it isn't. -- **No replay protocol.** Once the connection drops you refetch the whole session. Fine for small histories, rough for long agent sessions with many parts. -- **REST body is the final message.** The "send" call blocks until the turn completes. Any UI that doesn't already consume the SSE stream will look frozen during long turns. -- **Random-port + password in memory.** Elegant but means closing and reopening the window invalidates the credentials; everything downstream has to refetch them through the preload bridge. No way to share session with a second client process. -- **Delta coalescing can drop intermediate states.** If a full `message.part.updated` lands before the frame flush, pending deltas for that part are discarded. Usually what you want; occasionally surprising when debugging. -- **Approval transport is split across two channels.** Request events come over SSE; replies go over a separate REST endpoint (`POST /permission/:requestID/reply`). It's fully typed — not ad-hoc — but a client has to wire both sides independently, unlike t3code where request and reply are both commands/events on the same orchestration stream. - -## Signal for our rearchitecture - -Ranked by relevance to our current problem: - -1. **Single global event stream, scoped on the client.** Similar to t3code's shell-plus-detail split. For our workspace-scoped chat, this would become one host-service subscription the client routes by session/workspace. This is the transport shape we probably want. -2. **Delta-as-append.** `{ field, delta }` is dead simple and avoids both token-object complexity and unified-diff complexity. Compare to t3code's unified-diff stream — OpenCode's is considerably cheaper to implement. -3. **REST write + SSE read.** If we don't want tRPC subscriptions on WebSockets, this is a viable alternative: keep mutations as plain tRPC queries/mutations and open one SSE endpoint for events. Host-service already has Hono; adding an SSE route is trivial. -4. **In-process sidecar architecture.** Close cousin of our host-service direction. Ours is explicitly a separate process for reasons (multi-client, mobile, web parity), but the *ownership* story is the same: one runtime, multiple subscribers. -5. **What NOT to take.** The lack of sequence numbers and replay. This is the main thing t3code does better and the main thing we need given our desktop/web/mobile story and reconnect requirements. diff --git a/plans/t3code-chat-architecture-reference.md b/plans/t3code-chat-architecture-reference.md deleted file mode 100644 index 7c5da26975f..00000000000 --- a/plans/t3code-chat-architecture-reference.md +++ /dev/null @@ -1,379 +0,0 @@ -# T3 Code Chat Architecture — Reference Notes - -Research notes on how [T3 Code](./temp/t3code) implements its chat system. Written as background for the v2 chat transport rearchitecture (see `host-service-chat-architecture.md` and `chat-mastra-rebuild-execplan.md`). All file paths below are relative to `temp/t3code/` unless noted. - -## TL;DR - -T3 Code is **event-sourced**. The server owns an append-only, sequence-numbered event log in SQLite. Clients connect via an **Effect-native WebSocket RPC** (not tRPC), open two long-lived subscriptions, and apply each event to a **Zustand store**. User actions are commands that dispatch through the same RPC; their effects come back as events. On reconnect, clients detect sequence gaps and call `replayEvents(from, to)` to catch up. There is no polling. - -## Architecture diagram - -```mermaid -flowchart LR - subgraph Client["Client (web / electron renderer, React 19 + Zustand)"] - direction TB - UI["React components"] - Store["Zustand store
threadShell · messages ·
activities · turnDiffs"] - Recovery["Recovery coordinator
tracks latestSequence /
highestObservedSequence"] - UI <--> Store - Recovery -.-> Store - end - - subgraph RPC["Effect RPC over WebSocket (/ws)"] - direction TB - Cmd["dispatchCommand (req/res)"] - Replay["replayEvents(from, to) (req/res)"] - Shell["subscribeShell (stream)"] - Detail["subscribeThread(id) (stream)"] - end - - subgraph Server["apps/server (Node + Effect)"] - direction TB - Engine["OrchestrationEngine"] - Dedup["CommandReceipts
(idempotency by commandId)"] - Log[("SQLite event log
append-only · seq-numbered")] - Proj[("Projections
messages · activities ·
approvals · sessions · turns")] - Ingest["ProviderRuntime
Ingestion reactor"] - subgraph Providers["ProviderService
ProviderSessionStatus:
connecting · ready · running · error · closed"] - direction LR - Claude["ClaudeAdapter"] - Codex["CodexAdapter"] - Cursor["CursorAdapter"] - Op["OpenCodeAdapter"] - end - Engine --> Dedup - Engine --> Log - Log --> Proj - Engine --> Providers - Providers --> Ingest - Ingest --> Log - end - - Store -->|writes: commands| Cmd - Cmd --> Engine - Recovery -->|gap detected| Replay - Replay --> Log - Shell --> Store - Detail --> Store - Log -.->|fan-out| Shell - Log -.->|fan-out| Detail - Proj -.->|initial snapshot| Shell - Proj -.->|initial snapshot| Detail -``` - -Solid arrows = request / command direction. Dotted arrows = server-pushed events or snapshots. Note the two **independent** subscription streams (`subscribeShell` and `subscribeThread`): they both read from the same event log but the client treats them as disjoint writers into different regions of the store. - -### Same thing in plain text - -``` -┌──────────────── CLIENT (apps/web, apps/desktop renderer) ────────────────┐ -│ │ -│ React components │ -│ ▲ │ -│ │ │ -│ Zustand store (partitioned by which stream writes it) │ -│ ├─ threadShellById / sidebarThreadSummaryById ◀─ shell stream only │ -│ ├─ messageByThreadId / messageIdsByThreadId ◀─ detail stream only │ -│ ├─ activityIdsByThreadId ◀─ detail stream only │ -│ └─ turnDiffSummaryByThreadId ◀─ detail stream only │ -│ ▲ │ -│ │ │ -│ reducers: │ -│ applyEnvironmentOrchestrationEvent(state, event, env) per-event │ -│ syncServerThreadDetail(state, thread, env) full snapshot │ -│ ▲ │ -│ │ │ -│ createOrchestrationRecoveryCoordinator(...) │ -│ latestSequence · highestObservedSequence │ -│ classify each event: ignore / defer / recover / apply │ -│ │ -└──────┬────────────────────────────────────────────────────────────▲───────┘ - │ │ - │ commands events │ - ▼ │ - ┌─────────────── Effect RPC over WebSocket (/ws) ─────────────────────┐ - │ │ - │ (1) dispatchCommand req/res client → server │ - │ (2) replayEvents(from, to) req/res client → server (on gap) │ - │ (3) subscribeShell stream server → client (1 per conn) │ - │ (4) subscribeThread(id) stream server → client (per open) │ - │ │ - └────┬─────────────────────────────────────────────────────────▲───────┘ - │ │ - ▼ │ -┌──────────────────── SERVER (apps/server, Node + Effect) ──────────────────┐ -│ │ -│ ws.ts (Effect RPC handler) │ -│ │ │ -│ ▼ │ -│ OrchestrationEngine │ -│ ├─ CommandReceipts (idempotency by commandId) │ -│ │ │ -│ ├──▶ SQLite event log (append-only, global monotonic `sequence`) │ -│ │ ├─ thread.message-sent { streaming: bool } │ -│ │ ├─ thread.turn-start-requested │ -│ │ ├─ thread.turn-diff-completed { diff: unified diff } │ -│ │ ├─ thread.activity-appended (tool calls, errors, …) │ -│ │ ├─ thread.approval-response-requested │ -│ │ ├─ thread.user-input-response-requested │ -│ │ ├─ thread.session-set · thread.reverted │ -│ │ └─ project.* │ -│ │ │ │ -│ │ ▼ │ -│ │ Projections (materialized reads in SQLite) │ -│ │ projection_thread_messages · …_activities │ -│ │ projection_pending_approvals · …_thread_sessions │ -│ │ projection_thread_turns │ -│ │ │ -│ └──▶ ProviderService (per-thread ProviderSessionStatus: │ -│ connecting · ready · running · error · closed)│ -│ ├─ ClaudeAdapter │ -│ ├─ CodexAdapter │ -│ ├─ CursorAdapter │ -│ └─ OpenCodeAdapter │ -│ │ │ -│ ▼ │ -│ ProviderRuntimeIngestion │ -│ translates ProviderRuntimeEvent → orchestration events │ -│ and appends them back into the event log │ -│ │ -└────────────────────────────────────────────────────────────────────────────┘ -``` - -## Topology - -- `apps/server` — Node.js + Effect process. Single source of truth. -- `apps/web` — Vite/React client. -- `apps/desktop` — Electron shell around the web app. -- `packages/contracts` — shared schemas (events, commands, errors). -- `packages/client-runtime` — shared client wiring. - -Both clients (web + desktop renderer) talk to the server the same way: one WebSocket to `/ws`. The server can back any number of concurrent clients on the same environment. - -## Transport: Effect RPC over WebSocket - -Not tRPC. T3 uses `effect/unstable/rpc` with `effect/unstable/socket/Socket`. - -- Client: `apps/web/src/rpc/wsTransport.ts`, `apps/web/src/rpc/protocol.ts` (WS endpoint `/ws`, line ~46). -- Server: `apps/server/src/ws.ts`. -- Connection lifecycle / backoff: `apps/web/src/rpc/wsConnectionState.ts`. - -The transport exposes two shapes: - -1. **Request/response** — used for all write paths (e.g. `dispatchCommand`, `replayEvents`, `thread.approval.respond`). -2. **Subscriptions** — hot streams the server pushes to. Two are used in practice: `subscribeShell` and `subscribeThread(threadId)`. - -Writes are commands the client pushes; reads are event streams the server pushes. The client never polls. - -## Server runtime - -Chat is agent-agnostic. The server does not use Mastra or the Vercel AI SDK. Instead, each supported agent has an adapter implementing `ProviderAdapter`: - -- `apps/server/src/provider/Services/ProviderAdapter.ts` (interface) -- `apps/server/src/provider/Services/ClaudeAdapter.ts` -- `apps/server/src/provider/Services/CodexAdapter.ts` -- `apps/server/src/provider/Services/CursorAdapter.ts` -- `apps/server/src/provider/Services/OpenCodeAdapter.ts` - -`ProviderService` (`apps/server/src/provider/Services/ProviderService.ts`) is a cross-provider facade. Per thread it owns a `ProviderSession` whose status is one of `connecting | ready | running | error | closed` (see `ProviderSessionStatus` in `packages/contracts/src/provider.ts` ~26-32). A command like `thread.turn.start` calls `sendTurn()` on the chosen provider, which runs the agent and produces a stream of `ProviderRuntimeEvent`s. Those are ingested and turned into orchestration events (see next section). - -## The event log - -The central abstraction. Every state change — user message, assistant message, approval request, approval response, tool call, plan upsert, session state transition, revert — is a single `OrchestrationEvent`: - -```ts -// EventBaseFields — shared by every event in the union -{ - sequence: NonNegativeInt // monotonic, global - eventId: EventId - aggregateKind: "project" | "thread" - aggregateId: string - occurredAt: IsoDateTime - commandId: CommandId // the command that produced this event - causationEventId: EventId | null // event that caused this one (if chained) - correlationId: CorrelationId // groups a whole causal chain - metadata: { providerTurnId?, adapterKey?, ingestedAt?, requestId?, providerItemId?, ... } -} -// ...plus a type-specific discriminator and payload fields per event variant. -``` - -Schema: `packages/contracts/src/orchestration.ts` — `EventBaseFields` at ~945-955, event union at ~957+. Events are tagged structs (Effect `Schema.TaggedStruct` per variant), not a single `{ type, payload }` shape. There is no explicit `actor` field — the originating actor is inferred from `commandId` / `metadata`. - -Key event types: - -- `thread.message-sent` — user or assistant message added (assistant emits with `streaming: true` first, then `false` on completion). -- `thread.turn-start-requested` — turn initiated. -- `thread.turn-diff-completed` — streaming content delivered as a unified diff (see §streaming). -- `thread.activity-appended` — tool calls, errors, setup-script activity, etc. -- `thread.approval-response-requested` / `thread.approval.respond` — tool-call approvals. -- `thread.user-input-response-requested` / `thread.user-input.respond` — structured user input. -- `thread.proposed-plan-upserted` — plan generation. -- `thread.session-set` — FSM transitions. -- `thread.reverted` — revert to a checkpoint. -- Project events: `project.created`, `project.meta-updated`, `project.deleted`. - -Events are **immutable** and append-only. Sequence numbers are globally monotonic, not per-session. This makes cross-thread ordering straightforward and replay trivial. - -## Persistence - -SQLite. One event table plus a handful of projection tables that are derived from it for fast reads: - -- Event store interface: `apps/server/src/persistence/Services/OrchestrationEventStore.ts`. -- Projections: - - `projection_thread_messages` - - `projection_thread_activities` - - `projection_pending_approvals` - - `projection_thread_sessions` - - `projection_thread_turns` - -On startup (`serverRuntimeStartup.ts`) the server replays the log to rebuild the read models in memory. The shell stream is served from these projections so a freshly-connected client gets the computed sidebar state in one shot. - -## Client state: Zustand + two streams - -Store: `apps/web/src/store.ts` (~2k lines; state shape at ~40-90). - -```ts -interface EnvironmentState { - // Sidebar / session-level — written by shell stream - threadShellById: Record - sidebarThreadSummaryById: Record - - // Per-thread content — written only by detail stream - messageIdsByThreadId: Record - messageByThreadId: Record> - activityIdsByThreadId: Record - proposedPlanIdsByThreadId: Record - turnDiffSummaryByThreadId: Record> - - bootstrapComplete: boolean -} -``` - -The client runs two independent subscriptions: - -- **Shell stream** (`subscribeShell`) — one per connection. Broadcasts session state, sidebar summaries, and pending flags for every thread in the environment. Cheap and always-on. -- **Detail stream** (`subscribeThread(threadId)`) — one per currently-open thread. Delivers the full per-thread payload (messages, activities, turn diffs). - -Convention (documented in the store-file architecture comment): only the detail stream writes to per-thread content fields, only the shell stream writes to sidebar summary fields. Both may write to `threadShellById` / `threadSessionById`, but writes go through `writeThreadState()` which does structural equality to avoid redundant re-renders. - -This split is the thing that kills the race-condition class our current design has. The two streams don't fight for the same state, and cross-arriving events from the wrong stream are ignored by convention. - -Reducers: - -- `applyEnvironmentOrchestrationEvent(state, event, environmentId)` — per-event reducer for the shell stream. -- `syncServerThreadDetail(state, thread, environmentId)` — full-snapshot reducer for the detail stream; used on initial subscribe and on sequence-gap recovery. - -## Message send flow, end-to-end - -```mermaid -sequenceDiagram - autonumber - participant User - participant Store as Client store - participant WS as WebSocket RPC - participant Engine as OrchestrationEngine - participant Log as Event log + projections - participant Prov as Provider adapter - - User->>Store: submit message - Store->>Store: optimistic user message - Store->>WS: dispatchCommand(thread.turn.start, commandId) - WS->>Engine: command - Engine->>Engine: dedup by commandId - Engine->>Log: append thread.turn-start-requested - Log-->>WS: fan-out event - WS-->>Store: shell + detail apply event - Engine->>Prov: sendTurn() - loop streaming - Prov->>Engine: ProviderRuntimeEvent - Engine->>Log: append turn-diff-completed / activity-appended - Log-->>WS: fan-out - WS-->>Store: apply to turnDiffs / activities - end - Prov->>Engine: turn complete - Engine->>Log: append thread.message-sent (streaming:false) - Log-->>WS: fan-out - WS-->>Store: finalize assistant message - Store-->>User: render -``` - -Primarily in `apps/web/src/components/ChatView.tsx` (around 2610+). - -1. **Compose.** Client generates `messageId` and `commandId` (`newCommandId()`), inserts an optimistic user message into the store immediately. -2. **Dispatch.** RPC call `api.orchestration.dispatchCommand({ type: "thread.turn.start", threadId, message: { messageId, text, attachments }, modelSelection, titleSeed, runtimeMode, interactionMode, bootstrap?, createdAt })`. Note `messageId` / `text` / `attachments` are nested under `message`, not top-level on the command. -3. **Server accepts.** `apps/server/src/ws.ts` (~548) validates, deduplicates by `commandId` (see `OrchestrationCommandReceipts`), handles bootstrap (thread creation, worktree setup) if present, emits `thread.turn-start-requested`, routes to the provider adapter. -4. **Provider runs.** Adapter emits `ProviderRuntimeEvent`s. The ingestion reactor translates them into orchestration events: `thread.message-sent` (streaming), `thread.turn-diff-completed` (diffs), `thread.activity-appended` (tool calls), etc. -5. **Publish.** Events are appended to the event store, projections updated, pushed to all subscribers. -6. **Client applies.** Detail subscription gets the stream first (it's already open for the focused thread). Shell subscription updates the sidebar summary shortly after. -7. **Complete.** A final `thread.message-sent` with `streaming: false` is the authoritative terminal state for the assistant turn. - -If `dispatchCommand` fails the optimistic message is rolled back and the composer is restored. - -## Streaming assistant output - -`thread.turn-diff-completed` carries a **unified diff** against the in-progress turn, not token deltas or full snapshots. Schema: `ThreadTurnDiff` in `packages/contracts/src/orchestration.ts` (~1100). - -```ts -ThreadTurnDiff = TurnCountRange.mapFields(Struct.assign({ - threadId: ThreadId, - diff: Schema.String, // unified diff -})) -``` - -The client accumulates diffs into `turnDiffSummaryByThreadId`. When the authoritative `thread.message-sent` with `streaming: false` lands, that becomes the source of truth and the in-progress diff buffer is reconciled. - -The diff format keeps the wire size bounded even for long responses, which matters because the event log persists every event. - -## Tool calls, approvals, interrupts - -All in the same event stream. No side channels. - -Agent-initiated request → `thread.approval-response-requested` or `thread.user-input-response-requested` event. Client derives pending-approval UI via `derivePendingApprovals` in `apps/web/src/session-logic.ts`. - -User responds: - -```ts -dispatchCommand({ - type: "thread.approval.respond", - threadId, - requestId, - decision: "accept" | "decline" | "acceptForSession" | "cancel", -}) -``` - -`ProviderService.respondToRequest()` routes the answer back to the adapter, which unblocks the agent or fails the turn. - -Cancellation is `thread.turn.interrupt` → the provider session stops and emits `thread.session-stop-requested`. - -## Reconnect and replay - -Handled by `createOrchestrationRecoveryCoordinator` (a factory, not a class) in `apps/web/src/orchestrationRecovery.ts` (~88+). It returns a coordinator object that owns the recovery state. - -Client tracks two cursors: - -- `latestSequence` — highest sequence successfully applied. -- `highestObservedSequence` — highest sequence seen (may be ahead if events arrive out of order across the two streams). - -Every incoming event is classified as `ignore | defer | recover | apply`. If a gap is detected, the coordinator calls the RPC `replayEvents(fromSequence, toSequence)` to fetch the missing slice, applies it, and drains the deferred queue. - -Commands are idempotent by `CommandId` via the server-side `OrchestrationCommandReceipts` table, so retries on reconnect don't duplicate effects. - -This is the pattern that lets t3 be rude with the network and still be correct. - -## Things worth stealing - -- **Single monotonic sequence per environment.** Makes gap detection a subtraction. -- **Command IDs with server-side dedup.** Retries are free. -- **Dual stream (shell + detail) with a written-by-who convention.** Removes the race between session-level state and per-thread content that `getDisplayState()` + `listMessages()` causes today. This is the single highest-value idea. -- **Diff-based streaming in the event log.** Bounded wire size, full auditability. -- **Projections as a pattern, not just a perf trick.** Keeps the client's initial render cheap without coupling clients to the log shape. -- **One event type for approvals / questions / tool calls.** Not a side channel. -- **Explicit provider status (`connecting | ready | running | error | closed`).** Makes "is the agent running" a boolean derived from one field, not an inference across two polls. -- **Causation + correlation IDs on every event.** `causationEventId` chains events back to the event that spawned them; `correlationId` groups a whole causal chain (e.g. a turn). Useful for debugging and for ordering beyond bare sequence numbers. - -## Things to approach carefully - -- **Effect RPC.** Nice ergonomics for t3, but we're already tRPC-shaped. Porting the *patterns* (subscriptions, sequenced events, replay RPC) to tRPC subscriptions over WS gets us 90% of the value without switching RPC systems. -- **Event-sourced everything.** t3 pays a persistence cost on every state change. For us, only the *transport* race needs fixing; whether the chat store becomes fully event-sourced on disk is a separate question from whether the wire protocol is event-driven. -- **Global sequence vs per-session sequence.** Global is cleaner for multi-thread clients (sidebars), but per-session is simpler to implement on top of the existing harness subscription. Pick one and commit. -- **Unified-diff streaming format.** Clever, but requires a diff library on client and server and adds complexity vs. "emit a `message_updated` event with latest full content." Worth it only if we care about wire size for very long turns.