diff --git a/docs/plans/mcp-multi-org-and-execute.md b/docs/plans/mcp-multi-org-and-execute.md new file mode 100644 index 000000000..272f8fc19 --- /dev/null +++ b/docs/plans/mcp-multi-org-and-execute.md @@ -0,0 +1,332 @@ +# MCP multi-org + `execute`/`search`: addendum to the search-execute design doc + +Extends `docs/mcp-search-execute-design-doc.md` (owletto proper, status "Planned, not yet implemented") with two scopes the original didn't fully land: (1) cross-org addressing inside `execute`, and (2) the full frontend + UX surface the new tools imply. Language decision: **TypeScript over a typed `ClientSDK` in `isolated-vm`** — reviewed by a second and third opinion (codex, pi), both concurred. Bash-as-primary was evaluated and rejected because reactions are the real workload and shell quoting degrades stored user code. + +Target repo for implementation: `packages/owletto-backend` + `packages/owletto-web` in the `lobu` monorepo. The owletto repo is deprecated. + +## Decisions locked in + +- `execute` runtime: `isolated-vm` V8 isolate (not bash, not node:vm, not subprocess). +- `execute` authoring language: TypeScript compiled via esbuild, same path as today's reaction scripts. +- Cross-org addressing: `client.org(slugOrId)` accessor returning a proxy SDK bound to a re-validated `ToolContext`. No per-tool `org_slug` parameter. +- Access level for `execute`: `write` (member-tier), not `admin`. Per-call `checkToolAccess` on every SDK method is the actual gate. +- `search` + `execute` exposed on **both** scoped (`/mcp/{slug}`) and unscoped (`/mcp`) endpoints. Scoped is the default for Claude/Cursor connectors today. +- `list_organizations` + `switch_organization` exposed on scoped endpoints too. Non-script users and read-tier members still need the serial-hop path. Rename `join_organization` → drop it (semantically it's a switch when the user is already a member, and a no-op entry when they're not). + +## Why TypeScript, short version + +Bash + a CLI was seriously considered. Rejected on four axes: + +1. **Reactions are stored, deferred user code.** Picking bash means stored reactions inherit shell quoting, pipe-failure semantics, `jq` shape drift, CLI version skew, and re-auth overhead — all as part of the durable product surface. Typed TS with TypeBox `Value.Errors` gives the model field-level repair signal. +2. **Multi-org efficiency.** One SDK client holds session context, caches membership, and reuses an auth handshake across N orgs. Bash turns a cross-org walk into N CLI invocations with N auth handshakes and N JSON parses. +3. **Runtime compounding.** Reactions today are TS source compiled by esbuild, stored in DB. `execute` and reactions sharing one SDK + one sandbox collapses two runtimes into one. Splitting them would cost a second sandbox forever. +4. **Agent fluency is an affordance problem.** LLMs write bash more natively than typed SDKs, but LLMs also repair typed errors far faster than shell errors. Solve fluency with tiny authored surface (one global `client`, top-level `await`, plain objects) and `search` returning copy-pasteable signatures — not by changing language. + +## Cross-org SDK: the `client.org()` accessor + +Today's `ClientSDK` is built once per request with a fixed `toolCtx.organizationId`. The cross-org accessor returns a proxy bound to a fresh `ToolContext`: + +```ts +export default async (ctx, client) => { + const orgs = await client.organizations.list(); // user's memberships + public orgs they can read + const buremba = orgs.find(o => o.slug === 'buremba'); + if (!buremba) throw new Error('buremba not found'); + const watchers = await client.org(buremba.id).watchers.list({ template: 'reddit' }); + return watchers.filter(w => w.status !== 'active' || w.pending > 0); +}; +``` + +Contract: + +- `client.org(slugOrId)` returns `ClientSDK`. Accepts slug or UUID. First call per `(userId, orgId)` tuple verifies membership against `member`; subsequent calls within the same isolate hit an in-process LRU cache keyed by `(userId, orgId)` with a 30s TTL. +- Membership lookup populates `memberRole` (`owner | admin | member`) into the swapped `ToolContext`. Public-visibility orgs the user isn't a member of return a `memberRole: null` context, and the existing `isPublicReadable(toolName, args)` path in `src/auth/tool-access.ts` gates writes. +- Non-member on a private org: `.org()` throws `AccessDenied` synchronously, before any SDK method dispatches. +- `client.organizations.{list, current}` — new SDK namespace. `list` wraps `listOrganizations`; `current` returns the session's default org. +- The `ctx` passed to `execute` scripts carries `organization_id` = the session's default org (pinned URL or last `switch_organization`). `client` with no `.org()` call uses that same default. + +Authz invariants preserved: + +- Every SDK method still fires `checkToolAccess(toolName, args, ctx)`. The org swap changes `ctx`, never bypasses the check. +- Membership is re-verified on each `.org()` call, not cached across calls for >30s. A script that calls `.org(X).entities.create()` after membership was revoked mid-script fails on the second call. +- Public-workspace scripts (`role: null` on session default) can read but never write, same as today's tool surface. + +## `execute` access level: write, not admin + +The original design doc says `getRequiredAccessLevel('execute') = 'admin'`. Flip to `write`. Rationale: + +- Per-method access checks already exist in the SDK dispatch. A member running `execute` can call any method they could call as a direct tool — composition does not create new authorization. +- An `admin`-only gate on `execute` would force members onto the aggregation-tool path we're explicitly killing. Members either deserve scripted composition or they don't. +- Read-tier sessions (no `mcp:write` scope or no member role) still can't call write methods inside a script — the first write attempt fails mid-execution with a typed `AccessDenied`. Partial side effects already committed remain committed; no two-phase rollback. +- Bare-minimum entry gate: `execute` requires authentication. Read-tier sessions can run read-only scripts. Write-tier runs anything their per-method access permits. + +Public-workspace callers (`role: null`) can run `execute` but every write method throws. `search` is read-only and available to everyone. + +## Scoped-endpoint UX fix: expose org tools everywhere + +Drop the "org-switching tools only on /mcp" rule in `src/tools/execute.ts` (`ORG_AGNOSTIC_TOOLS`). Expose `list_organizations` + `switch_organization` on `/mcp/{slug}` too. Reasons: + +- On a scoped URL the default org is the pinned one, but nothing is actually at risk by letting the user list memberships or switch mid-session. The pin is ergonomic, not a hard wall. +- Drop `join_organization` entirely. For already-authenticated users on a scoped URL, "join" is a misnomer — they're either already a member (no-op), or not (should fail with a "not a member" error, identical to `switch_organization`'s behavior). One tool, one semantic. + +Session-resume behavior (`src/mcp-handler.ts` line 356) still rejects cross-scope recovery (scoped ↔ unscoped mismatch). Unchanged — that's correct. + +## Authoring affordances + +These make "LLMs write bash more fluently than typed SDKs" a non-concern: + +- **Tiny surface in authored scripts.** `export default async (ctx, client) => { ... }`. One `client` global. No imports. Top-level `await` supported via esbuild's `format: 'esm'` wrapper. Plain objects/arrays. No classes, no decorators, no framework ceremony. +- **`search("ns.method")` returns signature + copy-pasteable example.** The design doc already specifies inline TypeBox-derived signatures. Extend each method's metadata with a minimal example literal: + + ```ts + // Example: + // const w = await client.watchers.list({ entity_id: 42, status: 'active' }); + ``` + + Stored in `src/sandbox/method-metadata.ts` next to the summary/throws annotations. +- **Structured errors keyed for repair.** TypeBox `Value.Errors` surface as: + + ```ts + { name: 'ValidationError', method: 'watchers.create', + fields: [{ path: 'extraction_schema', expected: 'object', got: 'string', + example: { type: 'object', properties: { ... } } }] } + ``` + + The `example` field on validation errors nudges the model to the right shape on retry. +- **Dry-run first-class.** `client.watchers.testReaction` already exists in the design doc. Add `execute` dry-run mode too: `{ script, dry_run: true }` runs under the same write-interception wrapper that reactions use, returning the `would_have` list without committing. Cost is one wrapper branch, same SDK. + +## Frontend plan + +Two new surfaces in `packages/owletto-web`, plus two upgrades. + +### New: `/[owner]/tools/execute` — script console + +A first-class execute + search page. Inspired by SQL console patterns; no equivalent exists today. + +- Monaco editor with TypeScript language mode. Seeded with the standard preamble: `export default async (ctx, client) => {`. +- Inline signature panel on the right — driven by the `search` tool. Search box + namespace tree. Selecting a method injects its example into the editor at cursor. +- Org selector dropdown (top of page) — defaults to session org, switches the default `ctx.organization_id` for the run. Independent of the `client.org()` in-script accessor (which overrides per-call). +- "Dry-run" button (writes intercepted, surfaced as `would_have` list) and "Run" button. Results pane below with structured JSON output, `logs` array, `error` with line/col mapping back to user source. +- Visible run history (last 20 per org) — click to reload a script. Stored per-user in localStorage initially; DB-backed later if needed. + +Files: +- `src/app/[owner]/tools/execute/page.tsx` (new route) +- `src/components/tools/execute-console/{editor,signature-panel,results-pane,run-history}.tsx` +- `src/hooks/use-execute.ts` → POSTs `{ script, dry_run, org_slug }` to `/api/mcp/execute` (internal proxy to the backend's MCP `execute` tool). + +### New: `/[owner]/settings/organizations` — org membership + invites + +Today org CRUD is a dropdown overlay. A dedicated page is needed for cross-org work: + +- Tab "Members" — list members of the current org with roles. +- Tab "Invites" — pending `invitation` rows sent to this user's email. Accept/decline. +- Tab "Your Organizations" — flat list of all orgs the user belongs to with direct-link switch. +- Tab "Delete" (owners only). + +Files: +- `src/app/[owner]/settings/organizations/page.tsx` +- `src/components/settings/organizations/{members,invites,my-orgs,delete}-tab.tsx` + +### Upgrade: watcher reaction editor + +Today: plain `