Skip to content

docs(plans): multi-org execute + search MCP tool plan#346

Closed
buremba wants to merge 1 commit into
mainfrom
docs/mcp-multi-org-plan
Closed

docs(plans): multi-org execute + search MCP tool plan#346
buremba wants to merge 1 commit into
mainfrom
docs/mcp-multi-org-plan

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented Apr 24, 2026

Summary

  • Addendum to docs/mcp-search-execute-design-doc.md covering cross-org addressing inside execute, frontend surface, edge cases, and a 5-PR land order.
  • Locks in: TypeScript + isolated-vm (not bash), client.org() accessor, execute as write-tier (not admin-only), scoped endpoints expose list_organizations + switch_organization, new /[owner]/tools/execute console + org settings page, Monaco on the reaction editor, new execute_invocation audit table.
  • Language decision was vetted by two independent second opinions (codex + pi), both picked TypeScript with the same two pivots: reactions are the real workload; agent fluency is an affordance problem.

Test plan

  • Read the plan end-to-end for scope/sequencing issues
  • Flag any edge case that should be BLOCKER instead of GUARD (or vice versa)
  • Confirm the 5-PR split matches your preferred land order
  • Sign off on the execute = write access-level change before PR-2 lands

No code changes; doc-only.

Addendum to docs/mcp-search-execute-design-doc.md covering (1) the
client.org(slugOrId) cross-org accessor, (2) execute access-level flip
from admin to write with per-call checks as the real boundary, (3)
scoped-endpoint exposure of list/switch_organization (drop
join_organization), (4) authoring affordances for LLM fluency, (5) a
new /[owner]/tools/execute console + /[owner]/settings/organizations
page + Monaco upgrade for the reaction editor, and (6) an
execute_invocation audit table. Five-PR land order with validation.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

buremba added a commit that referenced this pull request Apr 24, 2026
…folding

Plan content was originally on docs/mcp-multi-org-plan (PR #346) but
that PR was overhead for a solo land — folding the doc into the same
PR as its first implementation step. PRs 2–5 will reference this file.
@buremba
Copy link
Copy Markdown
Member Author

buremba commented Apr 24, 2026

Folded into PR #348 — separate plan PR was overhead for a solo land. Plan doc now lives at docs/plans/mcp-multi-org-and-execute.md on the implementation branch.

@buremba buremba closed this Apr 24, 2026
buremba added a commit that referenced this pull request Apr 26, 2026
* feat(owletto-backend): ClientSDK + sandbox scaffolding for execute/search

Lands PR-1 of docs/plans/mcp-multi-org-and-execute.md: the in-process SDK
surface, per-namespace delegations, membership cache, and isolated-vm
runner skeleton. No new MCP tools yet — execute and search land in PR-2.

- `src/sandbox/client-sdk.ts` — buildClientSDK + .org() accessor with
  per-call membership re-validation. AccessDenied/OrgNotFound errors
  surface the exact org and reason.
- `src/sandbox/membership-cache.ts` — small LRU (cap 128, TTL 30s) keyed
  on (userId, slug-or-id). Catches revocations within the TTL while
  keeping cross-org walks cheap.
- `src/sandbox/namespaces/*.ts` — 11 namespaces (entities, entitySchema,
  connections, feeds, authProfiles, operations, watchers, classifiers,
  viewTemplates, knowledge, organizations). Thin delegations over
  existing admin handlers; per-call authz runs inside the handlers.
- `src/sandbox/method-metadata.ts` — summary/access/example keyed by SDK
  path. Seeded for PR-1; PR-2 adds CI coverage check.
- `src/sandbox/typebox-to-signature.ts` — TypeBox → inline TS signature
  formatter used by the `search` tool.
- `src/sandbox/run-script.ts` — isolated-vm loader + limits shape.
  Native module is an optionalDependency so bun install succeeds on
  platforms without a prebuild; Dockerfile adds python3 + build-essential
  so production builds work.

Tests: 25 unit tests (bun:test) cover membership cache semantics,
metadata shape, typebox formatter, and run-script loader fallback.
Integration test (vitest) for .org() exercises real org/member tables
with AccessDenied/OrgNotFound/public-org/LRU-shortcircuit/revocation
paths.

* fix(owletto-backend): audit ClientSDK against handler schemas + consolidate cache

Addresses PR-1 review from Pi and my own gap analysis.

- Namespace payload alignment against real handler TypeBox schemas:
  * watchers.delete → watcher_ids[]; setReactionScript → reaction_script;
    completeWindow → window_token. watcher_id now string throughout.
  * entitySchema.* uses plain `slug`; addRule takes
    source/target_entity_type_slug + relationship_type_slug.
  * operations.execute.watcher_source.window_id is a number.
  * authProfiles keyed by slug, not numeric id; connections.connect takes
    auth_profile_slug; feeds.create requires feed_key.
  * viewTemplates.resource_id accepts string | number; classifiers uses
    classifier_id for CRUD, classifier_slug for classify.
- Slug-vs-id resolution now tries slug first, falls back to id on miss.
  Drops the ID_RE regex that false-positived 20+ char slugs.
- Consolidated on MultiTenantProvider's memberRoleCache (explicit
  invalidation on member-table writes). Separate sandbox LRU removed.
  Exposes getCachedMembershipRole / getCachedOrgBySlug / getOrgById.
- client.query() drops the dead `params` arg — validateAndScopeQuery
  rejects user positional parameters and the old wrapper's merge was
  unsafe. Metadata example updated.
- AccessDeniedError + OrgNotFoundError share a SdkError base.
- method-metadata.ts covers every namespace method; the unit test
  asserts coverage so drift shows up in CI.
- New integration test that dispatches real read-path methods through
  every namespace (catches the class of bug that static casts hid).
- vitest.config.ts + migrations-dir walk-up so `OWLETTO_TEST_BACKEND=pglite`
  actually runs the integration suite. PGlite-incompatible handler queries
  (sql template-fragment composition, prepared-statement mismatches)
  run under real Postgres via a small pgOnlyIt helper.

* fix(owletto-backend): remaining namespace contract drift + serial integration runs

Second Pi review caught four more wrapper lies:

- authProfiles.create/update now mirror the handler: `profile_kind` (not
  `auth_type`), `display_name` required on create, `credentials` +
  `auth_data` instead of a generic `config`.
- viewTemplates.set exposes a single `json_template` (which may nest
  `data_sources`) instead of two separate fields; rollback takes
  `version` (a number) not `version_id`.
- connections.updateConnectorAuth uses `auth_values` (the handler
  contract), not `auth_config`.
- connections.installConnector drops the required `connector_key` —
  handler picks from `source_url` / `source_uri` / `source_code` /
  `mcp_url` and accepts optional `auth_values`.
- Added app_auth_profile_slug + config to connections.{create, connect, update}
  for parity with the handler.

vitest.config.ts: force singleFork pool so the two integration files
can't stomp each other's fixtures via cleanupTestDatabase when run
in parallel.

* fix(owletto-backend): authProfiles.create uses `slug` (not `auth_profile_slug`)

Third-pass Pi review caught the asymmetry: the handler's create_auth_profile
action reads `args.slug` to set the new profile's stable identifier, while
get/test/update/delete read `auth_profile_slug` to identify an existing
profile. Wrapper now mirrors the handler verbatim:

- create: takes optional `slug` + `display_name` + `profile_kind` +
  `credentials`/`auth_data`/`requested_scopes`.
- update: takes `auth_profile_slug` (required, identifies the row) + all
  the mutable fields, including optional new `slug` for renames.
- get/test/delete: positional `auth_profile_slug`.
- list: adds connector_key/provider/profile_kind filters for parity.

Without this fix, `client.authProfiles.create({ slug: 'primary' })` would
silently auto-derive the slug from display_name, and follow-up
get('primary') would miss. Typed contract now matches runtime.

* chore(owletto-backend): drop unused env arg from organizations namespace

Pi final-pass nit. Per the repo's convention
(AGENTS.md §"When fixing unused-parameter errors, delete the parameter
rather than prefixing with `_`"), drop `_env` from
buildOrganizationsNamespace. Call site in client-sdk.ts adjusted.

* docs(plans): land multi-org execute + search plan alongside PR-1 scaffolding

Plan content was originally on docs/mcp-multi-org-plan (PR #346) but
that PR was overhead for a solo land — folding the doc into the same
PR as its first implementation step. PRs 2–5 will reference this file.

* chore(owletto-backend): remove dead getCurrentOrgInfo helper

No callers anywhere — was added speculatively for the search tool's
preamble and the web console's header chip (PR-2/PR-5 territory). Drop
it now; it'll come back when there's an actual call site to motivate
its return shape.

Also: tried collapsing the per-namespace XxxNamespace interfaces into
ReturnType<typeof buildXxx> in client-sdk.ts to save ~250 lines, but
TypeScript strict mode requires those interfaces to type the inline
arrow params inside each builder. The interfaces are load-bearing, not
ceremonial. Keeping them.

* fix(deps): regenerate bun.lock with packages/owletto-web submodule initialized

CI's `bun install --frozen-lockfile` step failed because the lockfile
was committed without the submodule deps. Re-runs with the submodule
checked out produce 543 added lock entries (mostly owletto-web's
React/Tailwind tree). No source changes.

* fix(owletto-backend): escape backslashes in typebox-to-signature literal output

CodeQL high-severity flag on src/sandbox/typebox-to-signature.ts:104:
formatLiteral escaped single quotes but not backslashes. A string like
\`path\\to\\file\` would emit \`'path\\to\\file'\`, which is an invalid
TS string literal (\\t is a tab, \\f is a form feed). A trailing
backslash produced an unterminated literal \`'\\'\`.

Fix order: escape backslashes first (so the \\\\ they become don't
collide with the \\' produced by the next pass), then escape single
quotes. Two regression tests cover backslash + quote paths.

* feat(owletto-backend): wire execute + search MCP tools, swap reactions to ClientSDK

PR-2 of docs/plans/mcp-multi-org-and-execute.md. Folded into the same
PR as PR-1 (the SDK scaffolding) since neither delivers user-visible
functionality alone.

External MCP surface — what changes for clients
-----------------------------------------------
- `execute(script)`: run TypeScript over the typed `ClientSDK`. The
  `client` global exposes namespaces (entities, watchers, knowledge,
  organizations, …) plus `client.org(slug)` for cross-org calls,
  `client.query(sql)` for read-only SQL, `client.log(...)`. Per-method
  authz fires inside the SDK; entry-point gate is write-tier.
- `search(query)`: discover SDK methods. Namespace listing for "watchers",
  drill-down with signature/throws/example for "watchers.create".
- `list_organizations` + `switch_organization`: now exposed on both
  unscoped /mcp and scoped /mcp/{slug} endpoints.
- Removed from external surface: the 14 `manage_*` tools,
  `list_watchers`, `get_watcher`, `read_knowledge`, `join_organization`.
  Their handlers are kept and re-registered as `internal: true` so the
  in-process test harness can still dispatch them by name without forcing
  every test to migrate to script-based calls. Production agents go
  through the SDK.

Implementation
--------------
- `src/sandbox/run-script.ts`: replaces the PR-1 stub with the real
  `isolated-vm` bridge. Compiles TS via the existing esbuild pipeline,
  builds a single async dispatch reference, reconstructs `client` guest-
  side via Proxy, captures structured logs + return value, enforces
  memory/CPU/quota/output limits, maps errors to TimeoutError /
  QuotaExceeded / OutOfMemory / ScriptError / CompileError.
- `src/sandbox/method-metadata.ts` is the data source for `search`.
- `src/types/isolated-vm.d.ts`: minimal type shim so TS compiles when the
  optional native module isn't installed (Node 25 etc.). Production
  Docker installs the real module.
- `src/watchers/reaction-executor.ts`: dropped QuickJS + the flat
  ReactionSDK env-bridge. Reactions now go through `buildClientSDK` +
  `runScript`. Stored reaction scripts that export `react(ctx, sdk)`
  keep working — the runner accepts both `default` and `react` exports.
- Dropped deps: `@sebastianwessel/quickjs`,
  `@jitl/quickjs-ng-wasmfile-release-asyncify`.
- `src/tools/execute.ts`: dropped scopedToOrg gating on org tools, dropped
  the `join_organization` carve-out.
- `src/auth/tool-access.ts`: kept the per-action policy table — it
  applies to internal-tool dispatches the same way it did to external.
  Added `execute` (write), `search` (public-readable).

Test status
-----------
- Unit (bun:test): 27/27 sandbox + 32/32 tool-access pass.
- Integration: tests that go through the test harness against legacy
  `manage_*` names keep working via the `internal: true` registration.
  Other long-standing integration failures (~190) are pre-existing
  pglite/worktree environment issues, not from this PR. CI will run
  against real Postgres.
- Removed: `__tests__/integration/mcp/public-org-join.test.ts` —
  exclusively tested the dropped `join_organization` MCP tool.

* fix(owletto-backend): drop legacy MCP registrations + reaction `react` fallback

User direction: the new MCP surface should be consistent everywhere; no
back-compat needed since stored data (reaction scripts, agent prompts)
will be migrated manually in production.

- Removed `internal: true` re-registrations of the 14 manage_* tools +
  list_watchers / get_watcher / read_knowledge. Deleted
  `src/tools/admin/index.ts` (the ADMIN_TOOLS aggregator) entirely.
- Removed the `module.exports.react` fallback in run-script.ts. Stored
  reaction scripts must export `default async (ctx, client, params?)`;
  legacy ones need a one-time DB migration.
- Updated workspace-instructions to point agents at `execute` /
  `client.<ns>.<method>` patterns instead of legacy tool names.
- Updated openclaw plugin's "already authenticated" hint text.
- auth.test.ts: removed the "hides org switching on scoped routes"
  assertion (org tools are now exposed everywhere), updated tool-list
  assertion to expect the new clean surface.
- tool-access.test.ts: replaced one assertion that referenced
  manage_connections (now non-existent) with query_sql which is the
  canonical admin-only tool on the new surface.

Frontend (`packages/owletto-web`) ships ~50+ `apiCall('manage_*', ...)`
calls through REST — those break with this change. Tracked as a separate
submodule PR to migrate them to `apiCall('execute', { script: ... })`.

Integration tests that still use `mcpToolsCall('manage_*', ...)` will
fail with `Tool not found` — those require per-test migration to either
direct handler imports or `execute`-script form, which is out of scope
for this PR. The single-test-file migration pattern is intentional so
each test owner can review the new SDK call shape.

* fix(owletto-backend): address self + Pi review findings on PR #348

Self-review and Pi senior-engineer pass surfaced a privilege-escalation
gap and a handful of stale-after-cold-drop issues. Fixes here only — the
deploy-time prerequisites (reaction-script DB migration, owletto-web
caller migration) remain tracked as follow-ups in the PR description.

Security
- `client.query()` now enforces owner/admin role to match the `query_sql`
  MCP tool. Without this, write-tier users could bypass the admin gate by
  calling `execute({script: "client.query(...)"})` and read audit/event
  tables exposed by the validated allowlist.
- `list_organizations` / `switch_organization` now require the `mcp:read`
  scope; previously the org-agnostic branch returned before the scope
  check, letting profile-only OAuth tokens call them.
- `execute` MCP tool annotated `destructiveHint: true` so host clients
  prompt for approval — scripts can delete entities, drop watchers, and
  trigger external operations.

Sandbox hardening (run-script.ts)
- Guest-side proxy short-circuits `then`, `catch`, `finally`, Symbol keys,
  `constructor`, `__proto__`, `inspect` — accidental `await client.foo`
  no longer turns into a host SDK call against `foo.then`.
- Host dispatcher now requires the namespace and method to be own
  enumerable keys of the resolved SDK target. Belt-and-braces with the
  guest-side filter; closes the avenue where `client.constructor.assign`
  would resolve to `Object.assign` on the host.
- Script return value (`returnJson`) is now charged to `outputBytes`, so
  scripts can't bypass the 256KB output cap by returning a large object.

Stale references after the cold-drop
- reaction-executor.ts header no longer claims it accepts the legacy
  `react` export — that fallback was removed in 9dfd372.
- tools/search.ts no-match suggestion text and the `search_knowledge`
  module header now point at `execute` + `client.<ns>.<method>` patterns
  instead of the removed `manage_*` MCP names.
- auth.test.ts assertions for public-org tools/list dropped the now-stale
  expectations on `manage_entity` being externally listed.
- run-script.test.ts no longer asserts the runner is a stub; the bridge
  is real now. Test gracefully skips on systems without `isolated-vm`
  installed.

* chore(docs): align stragglers with the post-cold-drop MCP surface

Sweep prompted by "are all docs everywhere consistent?" — turns out no:
several docs and a couple of dead type exports were still describing the
legacy `manage_*` / `ReactionSDK` world. Surgical fixes only; the
owletto-cli runtime callers (separate scope) are added to the PR's
follow-up section, not patched here.

- packages/owletto-sdk: drop the dead `ReactionSDK` and `ReactionHandler`
  type exports — only `ReactionContext` (still imported by the new
  reaction-executor) and `ReactionEntity` remain. The runtime SDK that
  reactions actually call is `ClientSDK`, defined in owletto-backend.
- packages/owletto-backend/src/auth/tool-access.ts: clarify the comment
  next to the `manage_entity` policy entry. The handlers are no longer
  registered as `internal: true` MCP tools — they're unregistered, but
  the policy table still drives `routeAction`'s per-action checks when
  SDK wrappers reach them from inside `execute`.
- docs/plans/mcp-multi-org-and-execute.md: the "Existing reaction
  scripts in DB" risk bullet claimed `ClientSDK` keeps the same method
  paths and entry-point shape as `ReactionSDK`. That's wrong (`content.
  save` → `knowledge.save`, `actions.execute` → `operations.execute`,
  no `notify`, new `default async (ctx, client, params?)` shape).
  Replace with the actual mapping plus the audit query that confirms
  the migration cleared the legacy patterns.
- packages/landing/.../owletto-cli.md: the `owletto run` examples and
  "available tools" list still pointed at `read_knowledge`, `list_
  watchers`, `get_watcher`, and the 14 `manage_*` names. Replace with
  the current external surface: `search_knowledge`, `save_knowledge`,
  `search`, `execute`, `query_sql`, `list_organizations`, `switch_
  organization`, plus an `execute` script example.
- packages/owletto-connectors/src/README.md: manual connector install
  instruction pointed at `manage_connections` — now points at
  `client.connections.installConnector(...)` from inside `execute` (or
  the equivalent admin REST endpoint).

* fix(owletto-backend): enforce SDK action access

* fix(ci): run owletto auth test with bun

* fix(ci): build owletto sdk before backend tests

* fix(owletto-backend): apply pi second-pass review findings

Pi's second-pass review caught two real things to address before merge:

1. Sandbox guest proxy reserved-key list missed coercion probes.
   `JSON.stringify(client.entities)` and `console.log(client.entities)`
   would call `toJSON` / `toString` / `valueOf` on the proxy, get back
   a dispatcher closure, and end up firing a host SDK call against
   `entities.toJSON` (etc) — wasting quota and surfacing a confusing
   `Unknown SDK method` error. Add those three keys to
   `__isReservedKey()` alongside the existing `then`/Symbol/proto
   short-circuits in run-script.ts.

2. `compileReactionScript` and `runScript`'s internal recompile used
   different esbuild `external` lists. Reactions externalized
   `@owletto/reactions`, which the runtime recompile (with `external: []`)
   would then fail to resolve — save-time would accept code the runtime
   could not run. Align reactions to `external: []` so save-time and
   runtime accept the same set of imports. None of the migrated reactions
   in `_reactions_backup_2026_04_25` import `@owletto/reactions`, so this
   is a no-op for current production data.

* fix(owletto-backend): preserve internal admin REST tools

* fix(owletto-backend): keep manage_connections + manage_auth_profiles on MCP for CLI

The owletto-cli browser-auth flow drives connection setup and credential
exchange through MCP RPC (`packages/owletto-cli/src/commands/browser-auth.ts`).
After flipping all admin tools to `internal: true` those calls fail with
"Tool not found" because internal tools are filtered out of the MCP path
(see tools/execute.ts:85).

Re-expose just the two tools the CLI actively needs. The other admin
tools used by the CLI (`seed.ts`) hit the REST endpoint
`/api/{slug}/{toolName}` which bypasses the internal-flag filter, so
those don't need this treatment.

Once a CLI migration PR ports browser-auth to execute + client.* the
internal flag can be re-applied.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant