feat(world-model): cross-org references — schema search path + write guard #374

Merged
buremba merged 3 commits into main from feat/cross-org-refs
Apr 26, 2026

@buremba buremba commented Apr 26, 2026

First slice of the world-model plan. Exercises the FK foundation from #370 with two small changes — no schema migration.

What this enables

  • A tenant agent can write a tax_filing entity whose type is defined in public-uk-tax, with no per-tenant cloning of the vocabulary. The resolved `entity_type_id` is materialized on the row.
  • A tenant relationship can point at a canonical world entity (HMRC, Barclays, …) when that entity lives in a `visibility = 'public'` org. Public → tenant references remain forbidden.

What's in the diff

| File | Change |
| --- | --- |
| `utils/entity-management.ts` (createEntity validator) | Slug → `entity_type_id` lookup widens: tenant org first, then any `visibility = 'public'` org. `ORDER BY` keeps tenant-local types ahead of public ones when both exist. |
| `utils/entity-link-upsert.ts` | Same search path applied to the auto-link insert site. |
| `utils/relationship-validation.ts` (validateScopeRule) | Source must still be in the caller's org. Target may be same-org OR `visibility='public'`. Anything else (a private org you don't control) is rejected. |
| `utils/tests/entity-management-schema-search.test.ts` | Unit tests: fall-through to public catalog; tenant-local wins; unknown type rejected; private orgs not searched. |
| `tests/integration/relationships/entity-relationships.test.ts` | Adds a positive cross-org link test (target in a public catalog org). The pre-existing reject test for cross-org-to-private still passes (now via the relaxed guard, not the old absolute rule). |
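
As a rough illustration of the widened slug lookup, here is a minimal in-memory sketch. It is not the code from `utils/entity-management.ts`: the `Org` and `EntityType` shapes, the function name `resolveEntityTypeId`, and the id-based tie-break between several public catalogs are all assumptions made for the example.

```typescript
// Hypothetical in-memory shapes; the real tables and columns may differ.
type Visibility = "public" | "private";

interface Org {
  id: number;
  visibility: Visibility;
}

interface EntityType {
  id: number;
  slug: string;
  organizationId: number;
}

// Resolve a slug to an entity type id: caller's org first, then any public org.
// Private orgs other than the caller's are never searched.
function resolveEntityTypeId(
  slug: string,
  callerOrgId: number,
  orgs: Org[],
  types: EntityType[],
): number | null {
  const visibilityByOrg = new Map<number, Visibility>();
  for (const org of orgs) visibilityByOrg.set(org.id, org.visibility);

  const candidates = types.filter(
    (t) =>
      t.slug === slug &&
      (t.organizationId === callerOrgId ||
        visibilityByOrg.get(t.organizationId) === "public"),
  );

  // Stand-in for the ORDER BY: tenant-local rows sort ahead of public ones.
  // Breaking ties between public catalogs by id is an assumption of this sketch.
  candidates.sort((a, b) => {
    const aRank = a.organizationId === callerOrgId ? 0 : 1;
    const bRank = b.organizationId === callerOrgId ? 0 : 1;
    return aRank - bRank || a.id - b.id;
  });

  const first = candidates[0];
  return first ? first.id : null;
}

const orgs: Org[] = [
  { id: 1, visibility: "private" }, // the caller's tenant
  { id: 2, visibility: "public" }, // a public catalog, e.g. public-uk-tax
  { id: 3, visibility: "private" }, // another tenant
];

const types: EntityType[] = [
  { id: 10, slug: "tax_filing", organizationId: 2 },
  { id: 11, slug: "invoice", organizationId: 1 },
  { id: 12, slug: "invoice", organizationId: 2 },
  { id: 13, slug: "secret_type", organizationId: 3 },
];
```

With this data, `tax_filing` falls through to the public catalog, `invoice` resolves to the tenant-local row, and `secret_type` is not found because its org is private.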

Out of scope (intentional)

  • Public catalog seeding. No `public-uk-tax` / `public-uk-finance` rows yet — that's a separate piece of work. This PR is the application path; once a catalog exists, agents can resolve types and references against it immediately.
  • Schema search path declared per-agent. Right now an agent's resolution searches all public catalogs. When there are multiple catalogs with overlapping slugs, an explicit `uses_catalog` declaration on the agent will be the right answer; deferred until the ambiguity actually exists.

Test plan

  • `bunx tsc --noEmit` clean
  • `make build-packages` clean
  • `bun run check` (Biome) clean
  • CI tests

buremba added 2 commits April 26, 2026 23:12
…guard

Two small changes that exercise the FK foundation #370 set up:

1. **Schema search path in createEntity / entity-link-upsert.** When an
   entity_type slug isn't registered in the entity's own org, fall back to
   any org with `organization.visibility = 'public'`. First match wins,
   tenant-local types preferred when both exist. The resolved
   `entity_type_id` is materialized on the entity row, so reads never need
   to repeat the search. Lets a tenant agent write a `tax_filing` entity
   whose type lives in `public-uk-tax`, no per-tenant cloning.

2. **Cross-org write guard in validateScopeRule.** Relationship targets may
   now be in a different org *if* that org is a public catalog
   (`visibility = 'public'`). Sources still must be in the caller's org.
   Public → tenant references stay forbidden. The relationship row's
   organization_id remains the source's, keeping the assertion under the
   caller's control. Lets a tenant relationship point at a canonical entity
   like HMRC or Barclays without copying it locally.
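
The guard in item 2 can be sketched as a pure function. This is an illustration under stated assumptions, not the body of `validateScopeRule`: the `EntityRef` shape, the `checkScope` name, and the result type are hypothetical.

```typescript
type Visibility = "public" | "private";

// Hypothetical view of an entity plus the visibility of its owning org.
interface EntityRef {
  id: number;
  organizationId: number;
  orgVisibility: Visibility;
}

type ScopeResult =
  | { ok: true; relationshipOrgId: number }
  | { ok: false; reason: string };

function checkScope(
  callerOrgId: number,
  source: EntityRef,
  target: EntityRef,
): ScopeResult {
  // Sources must always belong to the caller's org.
  if (source.organizationId !== callerOrgId) {
    return { ok: false, reason: "source must be in the caller's org" };
  }
  // Targets may be same-org, or live in a public catalog org.
  const sameOrg = target.organizationId === callerOrgId;
  if (!sameOrg && target.orgVisibility !== "public") {
    return { ok: false, reason: "target is in a private org the caller does not control" };
  }
  // The relationship row stays under the source's (the caller's) org.
  return { ok: true, relationshipOrgId: source.organizationId };
}

const tenantEntity: EntityRef = { id: 100, organizationId: 1, orgVisibility: "private" };
const hmrc: EntityRef = { id: 7, organizationId: 2, orgVisibility: "public" };
const otherTenantEntity: EntityRef = { id: 55, organizationId: 3, orgVisibility: "private" };

const tenantToPublic = checkScope(1, tenantEntity, hmrc);
const publicToTenant = checkScope(1, hmrc, tenantEntity);
const tenantToPrivate = checkScope(1, tenantEntity, otherTenantEntity);
```

In this sketch the public → tenant direction is rejected by the source check alone, since a public-catalog entity is never in the caller's org.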

No schema migration; both changes piggyback on the global FKs already in
place (#370 for `entity_types.id`, baseline for `entities.id`).

Tests:
- `entity-management-schema-search.test.ts` — 4 unit tests (fall-through,
  tenant-local-wins, unknown-type-rejected, private-org-not-snooped)
- `entity-relationships.test.ts` — adds positive case for cross-org link
  to a public-catalog entity
1. Document the slug-poisoning caveat — visibility='public' alone is trusted
   today; long-term we'll narrow with `is_catalog` or per-agent
   `uses_catalog`. Operationally we restrict visibility flips to admins.
2. Add unit test for the entity-link-upsert resolver path so future drift
   between createEntity and entity-link-upsert is caught.
3. Add the missing negative case in entity-relationships.test.ts: source
   entity in a different org from the caller is rejected (sources must
   always be in the caller's org).
4. Comment typo (et alias).

TOCTOU between type lookup and INSERT (pi finding 2) noted but deferred —
the window is microseconds, the failure mode is semantic-not-corruption,
and the cleanest fix is a transactional rewrite of createEntity that's out
of scope for this PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b19d1a218

Comment on lines +247 to +250
AND (
et.organization_id = ${data.organization_id}
OR o.visibility = 'public'
)

P1: Enforce schema checks for public-catalog entity types

This change allows createEntity to resolve entity_type from any public org, but metadata validation is still scoped to ctx.organizationId in validateEntityMetadata (utils/schema-validation.ts), which returns "valid" when it cannot find a schema. In practice, creating or updating a tenant entity that uses a public catalog type can now bypass that type's JSON schema entirely, so required fields and constraints from the catalog are silently skipped and invalid metadata is persisted.
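
To make the failure mode concrete, here is a small sketch of a fail-open validator with an org-scoped schema lookup. It only models the behavior the review describes; `TypeSchema`, `findSchemaInOrg`, and `validateFailOpen` are hypothetical names, and a simple required-field check stands in for real JSON schema validation.

```typescript
// Hypothetical schema record: required keys stand in for a full JSON schema.
interface TypeSchema {
  organizationId: number;
  slug: string;
  required: string[];
}

// Lookup scoped to a single org, as the review says the validator does.
function findSchemaInOrg(
  schemas: TypeSchema[],
  orgId: number,
  slug: string,
): TypeSchema | undefined {
  return schemas.find((s) => s.organizationId === orgId && s.slug === slug);
}

// Fail-open: when no schema is found, the metadata is reported as valid.
function validateFailOpen(
  schemas: TypeSchema[],
  orgId: number,
  slug: string,
  metadata: Record<string, unknown>,
): { valid: boolean; missing: string[] } {
  const schema = findSchemaInOrg(schemas, orgId, slug);
  if (!schema) return { valid: true, missing: [] };
  const missing = schema.required.filter((field) => !(field in metadata));
  return { valid: missing.length === 0, missing };
}

// The only tax_filing schema lives in the public catalog org (id 2).
const schemas: TypeSchema[] = [
  { organizationId: 2, slug: "tax_filing", required: ["tax_year"] },
];

// Tenant org 1 validates empty metadata: no schema is found, so it passes.
const scopedToTenant = validateFailOpen(schemas, 1, "tax_filing", {});
// The same metadata checked against the catalog org's schema fails.
const scopedToCatalog = validateFailOpen(schemas, 2, "tax_filing", {});
```

The first call passes and the second fails on `tax_year`, which is the gap between where the type is resolved and where its schema is looked up.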

github-actions Bot added the `triage:needs-human` label (Triage agent escalated for human review) on Apr 26, 2026
Pi flagged: validateScopeRule checks `from_entity_id` before
canonicalization, but `canonicalizeSymmetricEdge` may swap a
public-catalog target into the stored `from_entity_id` slot when its
numeric id is lower. The row stays under the caller's org (so it's
tenant-owned) but the stored source ends up cosmetically inverted vs
the documented invariant ("source must always be the caller's org").

Fix: canonicalize symmetric edges only when both endpoints are in the
caller's org. Cross-org symmetric edges keep caller-from / public-to as
provided. Same-org symmetric still canonicalizes by id so dedup catches
a→b and b→a as the same edge. Cross-org dedup is unaffected because
validateScopeRule already forbids a `public → tenant` create direction.
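
A minimal sketch of the fix described above, assuming a simplified edge shape. The names `Endpoint`, `Edge`, and `canonicalizeSymmetric` are hypothetical and do not reproduce the real `canonicalizeSymmetricEdge` signature.

```typescript
interface Endpoint {
  id: number;
  organizationId: number;
}

interface Edge {
  from: Endpoint;
  to: Endpoint;
}

// Reorder a symmetric edge by id only when both endpoints are in the caller's org.
// Cross-org edges keep the caller-owned endpoint in the `from` slot as provided.
function canonicalizeSymmetric(edge: Edge, callerOrgId: number): Edge {
  const bothLocal =
    edge.from.organizationId === callerOrgId &&
    edge.to.organizationId === callerOrgId;
  if (bothLocal && edge.to.id < edge.from.id) {
    return { from: edge.to, to: edge.from };
  }
  return edge;
}

const a: Endpoint = { id: 5, organizationId: 1 };
const b: Endpoint = { id: 3, organizationId: 1 };
const publicEntity: Endpoint = { id: 2, organizationId: 9 }; // lower id, public catalog org

const sameOrgForward = canonicalizeSymmetric({ from: a, to: b }, 1);
const sameOrgReverse = canonicalizeSymmetric({ from: b, to: a }, 1);
const crossOrg = canonicalizeSymmetric({ from: a, to: publicEntity }, 1);
```

Both same-org inputs end up stored as 3 → 5, so dedup still treats a→b and b→a as one edge, while the cross-org edge keeps the tenant entity as its source even though the public entity has the lower id.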

github-actions Bot commented Apr 26, 2026

Triage decision: needs-human

Reasons:

  • Review comment contains P1 escalation keyword (automatic escalation per triage policy)
  • Comment from chatgpt-codex-connector[bot]: "Enforce schema checks for public-catalog entity types" flagged as P1 priority
  • Security concern regarding schema validation bypass for public catalog entity types

Next: Assigned to @buremba for human review of the P1 security/validation concern. The PR allows entities to resolve types from public catalogs but metadata validation is still scoped to the tenant org, potentially allowing invalid metadata to bypass catalog constraints.

@buremba buremba merged commit 426b2e2 into main Apr 26, 2026
12 checks passed
@buremba buremba deleted the feat/cross-org-refs branch April 26, 2026 22:21
buremba added a commit that referenced this pull request Apr 26, 2026
…377)

* feat(world-model): cross-org relationship_types + catalog discovery in search

Closes the two BLOCKER gaps pi flagged after #374:

1. **Schema search path for entity_relationship_types** (`tools/admin/manage_entity.ts::handleLink`).
   Mirrors what #374 did for entity_types: tenant first, then any
   `visibility='public'` org. Tenant-local relationship types still win.
   Without this, even though entities can use public-catalog vocabulary,
   relationships couldn't; e.g. a tenant relating their `$member` to a
   canonical Apple Inc would have to register a local copy of `works_at`.

2. **Public-catalog discovery in `tools/search.ts`**. Adds an
   `include_public_catalogs` arg (defaults to true) so tenant agents can
   find canonical entities (HMRC, banks, currencies, …) by name/type
   without knowing entity ids upfront. Result rows already carry
   `organization_id`, so the agent can tell tenant-local from canonical
   hits. `fetchEntityById` widens the same way so an entity_id lookup
   following a search hit resolves cleanly.
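
The effect of the `include_public_catalogs` argument can be sketched with an in-memory filter. Only the argument name and its default come from the commit message; the `Row` and `SearchArgs` shapes and the `searchEntities` function are assumptions for illustration, and the real tool queries the database.

```typescript
interface Row {
  id: number;
  name: string;
  organizationId: number;
  orgVisibility: "public" | "private";
}

interface SearchArgs {
  query: string;
  include_public_catalogs?: boolean; // defaults to true, per the commit message
}

function searchEntities(rows: Row[], callerOrgId: number, args: SearchArgs): Row[] {
  const includePublic = args.include_public_catalogs ?? true;
  const needle = args.query.toLowerCase();
  return rows.filter(
    (r) =>
      r.name.toLowerCase().includes(needle) &&
      (r.organizationId === callerOrgId ||
        (includePublic && r.orgVisibility === "public")),
  );
}

const rows: Row[] = [
  { id: 1, name: "HMRC", organizationId: 2, orgVisibility: "public" },
  { id: 2, name: "HMRC liaison", organizationId: 1, orgVisibility: "private" },
  { id: 3, name: "HMRC dossier", organizationId: 3, orgVisibility: "private" },
];

const withCatalogs = searchEntities(rows, 1, { query: "hmrc" });
const tenantOnly = searchEntities(rows, 1, { query: "hmrc", include_public_catalogs: false });
```

Each returned row still carries `organizationId`, so a caller can tell tenant-local hits from canonical ones, and the row owned by the other private org never appears in either result.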

No DB migration. Tests:
- `tools/__tests__/search-cross-org.test.ts` (3): public+tenant in one
  call; flag=false hides public; private orgs not snooped
- `entity-relationships.test.ts`: tenant uses a `works-at-public`
  relationship_type defined in a public catalog org

* fix(cross-org-fixes): close privacy leaks in cross-org search

Pi flagged two BLOCKERS in the previous round:

1. **Connection metadata leak.** `formatEntityResult` calls
   `fetchConnectionsForEntity(primaryEntity.id)` with no caller-org
   scope. For a public-catalog entity referenced by multiple tenants,
   any tenant searching that entity would receive other tenants'
   connection display names, configs, and feed entity names. Now
   skipped when the primary entity is in a different org from the
   caller — connections are tenant operational data, never canonical.

2. **Cross-tenant stat side channel.** Count subqueries in the SELECT
   (content_count, connection_count, watcher_count, children_count)
   computed globally for the entity id; for public-catalog entities
   referenced from many tenants, this leaks aggregate activity volumes.
   Now gated `CASE WHEN e.organization_id = $callerOrg THEN ... ELSE 0
   END` for each count, so cross-org rows return zeros for operational
   stats. Children query also scoped to primary's own org.

Also addressing IMPORTANT #3: tenant-local results were getting pushed
out by high-scoring public matches. ORDER BY is now `(e.organization_id =
$caller) DESC, match_score DESC`, so caller-org rows sort ahead of public
matches and match_score orders rows within each group.
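
The count gating and the new ordering can be sketched in memory as follows. The `Hit` shape and the function names are hypothetical; the real implementation does this in SQL with `CASE WHEN` and `ORDER BY`. Note that the ordering expression, as quoted, places every caller-org row ahead of every cross-org row before match score is considered.

```typescript
interface Hit {
  id: number;
  organizationId: number;
  matchScore: number;
  contentCount: number;
  connectionCount: number;
  watcherCount: number;
  childrenCount: number;
}

// In-memory stand-in for CASE WHEN e.organization_id = $callerOrg THEN ... ELSE 0 END.
function gateOperationalCounts(hit: Hit, callerOrgId: number): Hit {
  if (hit.organizationId === callerOrgId) return hit;
  return { ...hit, contentCount: 0, connectionCount: 0, watcherCount: 0, childrenCount: 0 };
}

// In-memory stand-in for ORDER BY (e.organization_id = $caller) DESC, match_score DESC.
function rankHits(hits: Hit[], callerOrgId: number): Hit[] {
  return hits
    .map((h) => gateOperationalCounts(h, callerOrgId))
    .sort((a, b) => {
      const aLocal = a.organizationId === callerOrgId ? 1 : 0;
      const bLocal = b.organizationId === callerOrgId ? 1 : 0;
      return bLocal - aLocal || b.matchScore - a.matchScore;
    });
}

const hits: Hit[] = [
  { id: 1, organizationId: 2, matchScore: 0.9, contentCount: 120, connectionCount: 8, watcherCount: 30, childrenCount: 4 },
  { id: 2, organizationId: 1, matchScore: 0.4, contentCount: 3, connectionCount: 1, watcherCount: 2, childrenCount: 0 },
];

const ranked = rankHits(hits, 1);
```

Here the tenant-local hit ranks first despite its lower score, and the public-catalog hit is returned with all four operational counts set to zero.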

* fix(cross-org-fixes): zero out children content_count for cross-org primaries

Pi follow-up: maintains the 'operational counts are zero for cross-org'
invariant consistently — children of a public-catalog primary now show
content_count=0 to match the primary's own zeroed stats.
buremba added a commit that referenced this pull request Apr 27, 2026
* feat(world-model): cross-org schema CRUD + read-side tolerance

Closes the tenant-facing surface that consumes the agent-side cross-org
plumbing landed in #374/#377. Items #1, #4 from
docs/plans/world-model.md "Outstanding work"; #3, #5 collapse to doc-only.

- manage_entity_schema list/get widen to (caller_org OR visibility=public)
  with tenant-first ORDER BY; rows now carry organization_slug. Same
  pattern used in entity-management.ts:249-260 resolver.
- resolve_path widens both intermediate and leaf entity lookups so a
  tenant path can traverse into a public-catalog entity referenced via
  a cross-org relationship.
- getEntity widens the read; comment already promised "own org or public".
- Re-key entity_count helpers from slug to entity_type_id so cross-org
  slug collisions don't merge counts across rows.
- Item #3 noted as already shipped (organization-dropdown.tsx already
  splits Your Organizations / Public Organizations with a separator).
- Item #5 deferred — no exposed updateOrganization mutation today; the
  guard SQL is preserved inline for the future implementer.
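
The re-keying of the `entity_count` helpers can be illustrated with a small grouping example. The `EntityRow` shape and the `countBy` helper are hypothetical, and the data is made up to show a slug collision between a tenant-local type and a public-catalog type.

```typescript
interface EntityRow {
  id: number;
  entityTypeId: number;
  typeSlug: string;
}

// Count rows per key; the key function decides what counts as "the same type".
function countBy<K>(rows: EntityRow[], key: (row: EntityRow) => K): Map<K, number> {
  const counts = new Map<K, number>();
  for (const row of rows) {
    const k = key(row);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return counts;
}

const entityRows: EntityRow[] = [
  { id: 1, entityTypeId: 100, typeSlug: "company" }, // tenant-local "company" type
  { id: 2, entityTypeId: 100, typeSlug: "company" },
  { id: 3, entityTypeId: 200, typeSlug: "company" }, // public-catalog "company" type
];

const bySlug = countBy(entityRows, (r) => r.typeSlug);
const byTypeId = countBy(entityRows, (r) => r.entityTypeId);
```

Keyed by slug, both types collapse into a single `company` bucket of 3. Keyed by `entity_type_id`, the counts stay separate at 2 and 1.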

* docs(world-model): item #6 first-pass changelog

Pruned classification-test-brand (id=45) from market-intelligence.
Held back the $member rows (real membership, not cruft) and the
template-seed verticals (need user call before pruning whole orgs).

* fix(world-model): gate operational counts + $member ACL after cross-org widening

Pi review of #386 flagged three real regressions introduced by the
cross-org read widening. Fixes:

- getEntity: scope total_content / active_connections / watchers_count /
  children_count by caller org. When `e` is a public-catalog row, totals
  now reflect the caller's references to it, never aggregate cross-tenant
  activity.
- resolve_path leaf: same scoping for total_content (events) and
  watchers_count.
- Exclude $member from public-catalog fallback in getEntity, resolve_path
  intermediate, and resolve_path leaf. Member-redaction uses
  ctx.memberRole (caller's workspace role), so a tenant admin/owner could
  otherwise read a public catalog's $member email by virtue of being
  admin of their own org. $member rows are per-tenant by design.
- rtHandleList relationship_count: scope by caller's organization so
  public relationship-type rows don't expose global usage volume.
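
The `$member` exclusion can be sketched as a read predicate. This is an assumption-laden illustration: the `EntityRow` shape and the `isReadableByCaller` name are hypothetical, and the real checks live in the SQL of `getEntity` and `resolve_path`.

```typescript
interface EntityRow {
  id: number;
  organizationId: number;
  orgVisibility: "public" | "private";
  typeSlug: string;
}

// Own-org rows are always readable. Public-catalog rows are readable
// unless they are $member rows, which are per-tenant by design.
function isReadableByCaller(entity: EntityRow, callerOrgId: number): boolean {
  if (entity.organizationId === callerOrgId) return true;
  return entity.orgVisibility === "public" && entity.typeSlug !== "$member";
}

const ownMember: EntityRow = { id: 1, organizationId: 1, orgVisibility: "private", typeSlug: "$member" };
const catalogMember: EntityRow = { id: 2, organizationId: 2, orgVisibility: "public", typeSlug: "$member" };
const catalogCompany: EntityRow = { id: 3, organizationId: 2, orgVisibility: "public", typeSlug: "company" };
const otherTenantCompany: EntityRow = { id: 4, organizationId: 3, orgVisibility: "private", typeSlug: "company" };
```

Excluding the row before any role-based redaction runs means the caller's own admin role is never applied to another org's member record.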

Pre-existing concerns flagged in review but out of scope for this PR
(documented for follow-up): resolve_path bootstrap entity-type counts
(unscoped + missing deleted_at), schema get's slug ambiguity across
multiple public catalogs, requireRelationshipType denying list_rules on
public RTs.
buremba added a commit that referenced this pull request Apr 27, 2026
…ng + tests

Audit follow-up to #386, #399 found three additional consumers of the
widened cross-org reads that need attention:

1. utils/schema-validation.ts getEntityTypeSchema() loaded only the
   caller's org schema. After #374 a tenant entity can carry a public
   catalog type (resolved via the schema search path) — validation then
   ran against an empty schema and silently let bad metadata through.
   Widened with the same (caller_org OR visibility=public) + tenant-
   first ORDER BY pattern as the resolver in entity-management.ts.
   Now creating an entity with a public catalog type validates against
   the catalog's metadata_schema.

2. tools/search.ts entitySelectColumns() — connection_count and
   active_connection_count subqueries were CASE-WHEN-gated on caller-
   org-equality, but the inner FROM feeds f / JOIN connections cn
   didn't restate f.organization_id = e.organization_id like the
   children/watcher_count subqueries in the same function do. Added
   the predicate for consistency: defensive belt-and-suspenders against
   any future cross-org feed.entity_ids reference.

3. tools/get_watchers.ts entity-context query — the LEFT JOIN feeds /
   current_event_records on entity_id was unscoped. requireReadAccess
   blocks cross-org callers from reaching this site today, but the
   join itself should match the entity's org regardless. Added the
   filters so the count is always entity-org-local.
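
Fix 1 above might look roughly like this once the schema lookup uses the same search path as the resolver. The names `lookupSchemaWidened` and `missingRequiredFields` are hypothetical, a required-field list stands in for the catalog's `metadata_schema`, and treating a type with no visible schema as unconstrained is an assumption of the sketch.

```typescript
interface TypeSchema {
  organizationId: number;
  orgVisibility: "public" | "private";
  slug: string;
  required: string[];
}

// Caller's org or any public org, with the tenant-local schema preferred.
function lookupSchemaWidened(
  schemas: TypeSchema[],
  callerOrgId: number,
  slug: string,
): TypeSchema | undefined {
  const visible = schemas.filter(
    (s) =>
      s.slug === slug &&
      (s.organizationId === callerOrgId || s.orgVisibility === "public"),
  );
  return visible.find((s) => s.organizationId === callerOrgId) ?? visible[0];
}

function missingRequiredFields(
  schemas: TypeSchema[],
  callerOrgId: number,
  slug: string,
  metadata: Record<string, unknown>,
): string[] {
  const schema = lookupSchemaWidened(schemas, callerOrgId, slug);
  if (!schema) return []; // assumption: no visible schema means nothing to enforce
  return schema.required.filter((field) => !(field in metadata));
}

const catalogSchemas: TypeSchema[] = [
  { organizationId: 2, orgVisibility: "public", slug: "tax_filing", required: ["tax_year"] },
  { organizationId: 3, orgVisibility: "private", slug: "payslip", required: ["employer"] },
];

const rejected = missingRequiredFields(catalogSchemas, 1, "tax_filing", {});
const accepted = missingRequiredFields(catalogSchemas, 1, "tax_filing", { tax_year: 2025 });
const notVisible = missingRequiredFields(catalogSchemas, 1, "payslip", {});
```

A tenant entity with a public catalog type is now checked against the catalog's required fields, while schemas in other private orgs remain invisible to the caller.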

Also adds packages/owletto-backend/src/__tests__/integration/entity-types/
cross-org.test.ts covering the post-#386/#399 contract:
- list returns local + cross-org rows tenant-first with organization_slug
- get resolves public catalog types and surfaces organization_slug
- $member is per-tenant: get auto-provisions in caller, never returns
  the catalog's $member
- tenant-first ordering wins on slug collisions
- list_rules works on cross-org rel types (read mode)
- add_rule still 403s on cross-org (write mode strict)
- create with cross-org type validates against the catalog's schema

Tests run against the project's standard integration test backend
(real Postgres). PGlite mode has a pre-existing auth issue affecting
all integration tests in this repo.
buremba added a commit that referenced this pull request Apr 27, 2026
…ng + tests (#407)

* fix(world-model): cross-org schema validation + defensive count scoping + tests

* fix(search): scope content_count subquery by entity org

Pi caught one more in entitySelectColumns(): content_count subquery
joined current_event_records by entityLinkMatchSql only, with no
ev.organization_id filter. Cross-org events that share an entity_id
or an identity-namespace match could inflate counts even though the
outer CASE WHEN guards against returning anything for cross-org rows.
Same pattern as the connection_count / active_connection_count fixes
in the parent commit.