feat(world-model): cross-org references (schema search path + write guard) #374
…guard

Two small changes that exercise the FK foundation #370 set up:

1. **Schema search path in createEntity / entity-link-upsert.** When an entity_type slug isn't registered in the entity's own org, fall back to any org with `organization.visibility = 'public'`. Tenant-local types sort first, so they win when both exist; otherwise the first public match wins. The resolved `entity_type_id` is materialized on the entity row, so reads never need to repeat the search. This lets a tenant agent write a `tax_filing` entity whose type lives in `public-uk-tax`, with no per-tenant cloning.
2. **Cross-org write guard in validateScopeRule.** Relationship targets may now be in a different org *if* that org is a public catalog (`visibility = 'public'`). Sources must still be in the caller's org, and public → tenant references stay forbidden. The relationship row's organization_id remains the source's, keeping the assertion under the caller's control. This lets a tenant relationship point at a canonical entity like HMRC or Barclays without copying it locally.

No schema migration; both changes piggyback on the global FKs already in place (#370 for `entity_types.id`, baseline for `entities.id`).

Tests:
- `entity-management-schema-search.test.ts`: 4 unit tests (fall-through, tenant-local-wins, unknown-type-rejected, private-org-not-snooped)
- `entity-relationships.test.ts`: adds a positive case for a cross-org link to a public-catalog entity
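The search path and the write guard can be sketched in TypeScript over in-memory rows. This is a minimal sketch, not the repo's code. `resolveEntityTypeId`, `checkRelationshipScope`, and the `Org` / `EntityTypeRow` / `EntityRef` shapes are hypothetical stand-ins for what createEntity and validateScopeRule do in SQL. The sketch shows three behaviors: tenant-first ordering, the refusal to look into other private orgs, and the rule that only the target of a relationship may be cross-org.

```typescript
// Hypothetical in-memory stand-ins for organization / entity_types / entities
// rows; the real code performs these checks in SQL.
type Visibility = "public" | "private";

interface Org {
  id: number;
  visibility: Visibility;
}

interface EntityTypeRow {
  id: number;
  organizationId: number;
  slug: string;
}

interface EntityRef {
  id: number;
  organizationId: number;
}

function isPublicOrg(orgId: number, orgs: Map<number, Org>): boolean {
  return orgs.get(orgId)?.visibility === "public";
}

// Schema search path: consider types in the caller's org or in any public org,
// sort tenant-local rows first, and take the first match. Types owned by other
// private orgs are never candidates.
function resolveEntityTypeId(
  slug: string,
  callerOrgId: number,
  types: EntityTypeRow[],
  orgs: Map<number, Org>,
): number | null {
  const candidates = types.filter(
    (t) =>
      t.slug === slug &&
      (t.organizationId === callerOrgId || isPublicOrg(t.organizationId, orgs)),
  );
  candidates.sort(
    (a, b) =>
      Number(b.organizationId === callerOrgId) -
      Number(a.organizationId === callerOrgId),
  );
  return candidates[0]?.id ?? null;
}

type ScopeResult =
  | { ok: true; relationshipOrgId: number }
  | { ok: false; reason: string };

// Cross-org write guard: the source must be in the caller's org; the target may
// be local or in a public-catalog org. The relationship row is owned by the
// source's (= caller's) org.
function checkRelationshipScope(
  from: EntityRef,
  to: EntityRef,
  callerOrgId: number,
  orgs: Map<number, Org>,
): ScopeResult {
  if (from.organizationId !== callerOrgId) {
    return { ok: false, reason: "source must be in the caller's org" };
  }
  if (to.organizationId !== callerOrgId && !isPublicOrg(to.organizationId, orgs)) {
    return { ok: false, reason: "target must be local or in a public catalog" };
  }
  return { ok: true, relationshipOrgId: from.organizationId };
}
```

A public → tenant link fails the source check, because the public entity is never in the caller's org. With several public matches for one slug, this sketch keeps input order (stable sort). In SQL, the winner depends on the query's ORDER BY.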
1. Document the slug-poisoning caveat: visibility='public' alone is trusted today; long-term we'll narrow with `is_catalog` or per-agent `uses_catalog`. Operationally, visibility flips are restricted to admins.
2. Add a unit test for the entity-link-upsert resolver path so future drift between createEntity and entity-link-upsert is caught.
3. Add the missing negative case in entity-relationships.test.ts: a source entity in a different org from the caller is rejected (sources must always be in the caller's org).
4. Fix a comment typo (the `et` alias).

TOCTOU between type lookup and INSERT (pi finding 2) is noted but deferred. The window is microseconds, the failure mode is semantic rather than corruption, and the cleanest fix is a transactional rewrite of createEntity, which is out of scope for this PR.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b19d1a218
```sql
AND (
  et.organization_id = ${data.organization_id}
  OR o.visibility = 'public'
)
```
Enforce schema checks for public-catalog entity types
This change allows createEntity to resolve entity_type from any public org, but metadata validation is still scoped to ctx.organizationId in validateEntityMetadata (utils/schema-validation.ts), which returns "valid" when it cannot find a schema. In practice, creating or updating a tenant entity that uses a public catalog type can now bypass that type's JSON schema entirely, so required fields and constraints from the catalog are silently skipped and invalid metadata is persisted.
Pi flagged: validateScopeRule checks `from_entity_id` before
canonicalization, but `canonicalizeSymmetricEdge` may swap a
public-catalog target into the stored `from_entity_id` slot when its
numeric id is lower. The row stays under the caller's org (so it's
tenant-owned) but the stored source ends up cosmetically inverted vs
the documented invariant ("source must always be the caller's org").
Fix: canonicalize symmetric edges only when both endpoints are in the
caller's org. Cross-org symmetric edges keep caller-from / public-to as
provided. Same-org symmetric still canonicalizes by id so dedup catches
a→b and b→a as the same edge. Cross-org dedup is unaffected because
validateScopeRule already forbids a `public → tenant` create direction.
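Here is a minimal TypeScript sketch of the revised canonicalization rule. It assumes hypothetical `Endpoint` / `Edge` shapes and a `canonicalizeSymmetricEdgeSketch` helper, which is not the repo's `canonicalizeSymmetricEdge` signature. Endpoint ids are swapped into ascending order only when both endpoints belong to the caller's org.

```typescript
// Hypothetical shapes; the real helper's inputs may differ.
interface Endpoint {
  id: number;
  organizationId: number;
}

interface Edge {
  fromId: number;
  toId: number;
}

// Same-org symmetric edges are canonicalized by id so a->b and b->a dedup to
// one row. Cross-org edges keep caller-from / public-to exactly as provided, so
// the stored source always stays in the caller's org.
function canonicalizeSymmetricEdgeSketch(
  from: Endpoint,
  to: Endpoint,
  callerOrgId: number,
): Edge {
  const bothInCallerOrg =
    from.organizationId === callerOrgId && to.organizationId === callerOrgId;
  if (bothInCallerOrg && to.id < from.id) {
    return { fromId: to.id, toId: from.id };
  }
  return { fromId: from.id, toId: to.id };
}
```

Same-org a→b and b→a collapse to one stored edge. A cross-org edge keeps the caller's entity in the `from` slot even when the public entity's id is lower.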
Triage decision: Reasons:
Next: Assigned to @buremba for human review of the P1 security/validation concern. The PR allows entities to resolve types from public catalogs but metadata validation is still scoped to the tenant org, potentially allowing invalid metadata to bypass catalog constraints.
…377)

* feat(world-model): cross-org relationship_types + catalog discovery in search

Closes the two BLOCKER gaps pi flagged after #374:

1. **Schema search path for entity_relationship_types** (`tools/admin/manage_entity.ts::handleLink`). Mirrors what #374 did for entity_types: tenant first, then any `visibility='public'` org. Tenant-local relationship types still win. Without this, entities could use public-catalog vocabulary but relationships couldn't: a tenant relating their `$member` to a canonical Apple Inc would have had to register a local copy of `works_at`.
2. **Public-catalog discovery in `tools/search.ts`**. Adds an `include_public_catalogs` arg (defaults to true) so tenant agents can find canonical entities (HMRC, banks, currencies, …) by name/type without knowing entity ids upfront. Result rows already carry `organization_id`, so the agent can tell tenant-local from canonical hits. `fetchEntityById` widens the same way so an entity_id lookup following a search hit resolves cleanly.

No DB migration.

Tests:
- `tools/__tests__/search-cross-org.test.ts` (3): public+tenant in one call; flag=false hides public; private orgs not snooped
- `entity-relationships.test.ts`: tenant uses a `works-at-public` relationship_type defined in a public catalog org

* fix(cross-org-fixes): close privacy leaks in cross-org search

Pi flagged two BLOCKERS in the previous round:

1. **Connection metadata leak.** `formatEntityResult` calls `fetchConnectionsForEntity(primaryEntity.id)` with no caller-org scope. For a public-catalog entity referenced by multiple tenants, any tenant searching that entity would receive other tenants' connection display names, configs, and feed entity names. Connections are now skipped when the primary entity is in a different org from the caller: connections are tenant operational data, never canonical.
2. **Cross-tenant stat side channel.** The count subqueries in the SELECT (content_count, connection_count, watcher_count, children_count) were computed globally for the entity id; for public-catalog entities referenced from many tenants, this leaks aggregate activity volumes. Each count is now gated with `CASE WHEN e.organization_id = $callerOrg THEN ... ELSE 0 END`, so cross-org rows return zeros for operational stats. The children query is also scoped to the primary's own org.

Also addressing IMPORTANT #3: tenant-local results were getting pushed out by high-scoring public matches. ORDER BY is now `(e.organization_id = $caller) DESC, match_score DESC`, so caller-org rows sort ahead of public matches and match_score orders rows within each group.

* fix(cross-org-fixes): zero out children content_count for cross-org primaries

Pi follow-up: maintains the "operational counts are zero for cross-org" invariant consistently. Children of a public-catalog primary now show content_count=0 to match the primary's own zeroed stats.
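The ordering and count gating from these commits can be mirrored over in-memory rows. This is a sketch only: `SearchRow`, `gateOperationalCounts`, and `rankSearchRows` are hypothetical names, and the real logic lives in the SQL of `tools/search.ts`. The sketch sorts caller-org rows ahead of public ones and orders by score within each group. It then zeroes operational counts for cross-org rows, as the `CASE WHEN` gate does.

```typescript
// Hypothetical row shape for a search hit; field names echo the commit message
// (match_score, content_count, ...) but are not the repo's types.
interface SearchRow {
  entityId: number;
  organizationId: number;
  matchScore: number;
  contentCount: number;
  connectionCount: number;
  watcherCount: number;
  childrenCount: number;
}

// Mirrors CASE WHEN e.organization_id = $callerOrg THEN <count> ELSE 0 END:
// operational stats are only reported for rows in the caller's org.
function gateOperationalCounts(row: SearchRow, callerOrgId: number): SearchRow {
  if (row.organizationId === callerOrgId) return row;
  return {
    ...row,
    contentCount: 0,
    connectionCount: 0,
    watcherCount: 0,
    childrenCount: 0,
  };
}

// Mirrors ORDER BY (e.organization_id = $caller) DESC, match_score DESC.
function rankSearchRows(rows: SearchRow[], callerOrgId: number): SearchRow[] {
  return [...rows]
    .sort((a, b) => {
      const aLocal = Number(a.organizationId === callerOrgId);
      const bLocal = Number(b.organizationId === callerOrgId);
      if (aLocal !== bLocal) return bLocal - aLocal;
      return b.matchScore - a.matchScore;
    })
    .map((row) => gateOperationalCounts(row, callerOrgId));
}
```

A public hit with a higher score still lands after every caller-org hit, and it carries zeroed activity counts.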
* feat(world-model): cross-org schema CRUD + read-side tolerance

Closes the tenant-facing surface that consumes the agent-side cross-org plumbing landed in #374/#377. Items #1 and #4 from docs/plans/world-model.md "Outstanding work"; #3 and #5 collapse to doc-only.

- manage_entity_schema list/get widen to (caller_org OR visibility=public) with tenant-first ORDER BY; rows now carry organization_slug. Same pattern as the resolver in entity-management.ts:249-260.
- resolve_path widens both intermediate and leaf entity lookups so a tenant path can traverse into a public-catalog entity referenced via a cross-org relationship.
- getEntity widens the read; its comment already promised "own org or public".
- Re-key entity_count helpers from slug to entity_type_id so cross-org slug collisions don't merge counts across rows.
- Item #3 noted as already shipped (organization-dropdown.tsx already splits Your Organizations / Public Organizations with a separator).
- Item #5 deferred: there's no exposed updateOrganization mutation today; the guard SQL is preserved inline for the future implementer.

* docs(world-model): item #6 first-pass changelog

Pruned classification-test-brand (id=45) from market-intelligence. Held back the $member rows (real membership, not cruft) and the template-seed verticals (need a user call before pruning whole orgs).

* fix(world-model): gate operational counts + $member ACL after cross-org widening

Pi review of #386 flagged three real regressions introduced by the cross-org read widening. Fixes:

- getEntity: scope total_content / active_connections / watchers_count / children_count by caller org. When `e` is a public-catalog row, totals now reflect the caller's references to it, never aggregate cross-tenant activity.
- resolve_path leaf: same scoping for total_content (events) and watchers_count.
- Exclude $member from the public-catalog fallback in getEntity, resolve_path intermediate, and resolve_path leaf. Member-redaction uses ctx.memberRole (the caller's workspace role), so a tenant admin/owner could otherwise read a public catalog's $member email by virtue of being admin of their own org. $member rows are per-tenant by design.
- rtHandleList relationship_count: scope by the caller's organization so public relationship-type rows don't expose global usage volume.

Pre-existing concerns flagged in review but out of scope for this PR (documented for follow-up): resolve_path bootstrap entity-type counts (unscoped + missing deleted_at), schema get's slug ambiguity across multiple public catalogs, and requireRelationshipType denying list_rules on public RTs.
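Two of the rules above can be sketched in TypeScript under hypothetical names (`EntityRow`, `isReadable`, `countByTypeId`); `typeSlug` stands in for a join to entity_types. The first rule: public-catalog entities are readable through the fallback unless they are `$member` rows. The second: entity counts are keyed by `entity_type_id`, so slug collisions across orgs don't merge.

```typescript
// Hypothetical row shape; typeSlug stands in for a join to entity_types.
interface EntityRow {
  id: number;
  organizationId: number;
  entityTypeId: number;
  typeSlug: string;
}

// Slugs that are per-tenant by design and must never be served through the
// public-catalog fallback.
const PER_TENANT_SLUGS: ReadonlySet<string> = new Set(["$member"]);

// Caller-org entities are always readable; public-catalog entities are readable
// only when their type is not per-tenant.
function isReadable(
  e: EntityRow,
  callerOrgId: number,
  publicOrgIds: ReadonlySet<number>,
): boolean {
  if (e.organizationId === callerOrgId) return true;
  return publicOrgIds.has(e.organizationId) && !PER_TENANT_SLUGS.has(e.typeSlug);
}

// entity_count keyed by entity_type_id: a tenant `company` type and a public
// catalog `company` type are different rows and keep separate counts.
function countByTypeId(entities: EntityRow[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const e of entities) {
    counts.set(e.entityTypeId, (counts.get(e.entityTypeId) ?? 0) + 1);
  }
  return counts;
}
```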
…ng + tests

Audit follow-up to #386, #399 found three additional consumers of the widened cross-org reads that need attention:

1. utils/schema-validation.ts getEntityTypeSchema() loaded only the caller's org schema. After #374 a tenant entity can carry a public catalog type (resolved via the schema search path); validation then ran against an empty schema and silently let bad metadata through. Widened with the same (caller_org OR visibility=public) + tenant-first ORDER BY pattern as the resolver in entity-management.ts. Now creating an entity with a public catalog type validates against the catalog's metadata_schema.
2. tools/search.ts entitySelectColumns(): connection_count and active_connection_count subqueries were CASE-WHEN-gated on caller-org equality, but the inner FROM feeds f / JOIN connections cn didn't restate `f.organization_id = e.organization_id` like the children/watcher_count subqueries in the same function do. Added the predicate for consistency: defensive belt-and-suspenders against any future cross-org feed.entity_ids reference.
3. tools/get_watchers.ts entity-context query: the LEFT JOIN feeds / current_event_records on entity_id was unscoped. requireReadAccess blocks cross-org callers from reaching this site today, but the join itself should match the entity's org regardless. Added the filters so the count is always entity-org-local.

Also adds `packages/owletto-backend/src/__tests__/integration/entity-types/cross-org.test.ts` covering the post-#386/#399 contract:

- list returns local + cross-org rows tenant-first with organization_slug
- get resolves public catalog types and surfaces organization_slug
- $member is per-tenant: get auto-provisions in caller, never returns the catalog's $member
- tenant-first ordering wins on slug collisions
- list_rules works on cross-org rel types (read mode)
- add_rule still 403s on cross-org (write mode strict)
- create with cross-org type validates against the catalog's schema

Tests run against the project's standard integration test backend (real Postgres). PGlite mode has a pre-existing auth issue affecting all integration tests in this repo.
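The schema-validation fix in item 1 can be sketched with a deliberately simplified schema that only lists required keys. This schema is a stand-in for the catalog's real `metadata_schema`, not JSON Schema, and `getEntityTypeSchemaSketch` and `validateMetadataSketch` are hypothetical names. Passing an empty set of public orgs reproduces the old caller-only lookup, which finds no schema and fails open.

```typescript
// Hypothetical simplified schema: only a list of required metadata keys.
interface MetadataSchema {
  required: string[];
}

interface TypeSchemaRow {
  organizationId: number;
  slug: string;
  metadataSchema: MetadataSchema;
}

// Widened lookup: (caller org OR public org), tenant-first.
function getEntityTypeSchemaSketch(
  slug: string,
  callerOrgId: number,
  rows: TypeSchemaRow[],
  publicOrgIds: ReadonlySet<number>,
): MetadataSchema | undefined {
  const matches = rows.filter(
    (r) =>
      r.slug === slug &&
      (r.organizationId === callerOrgId || publicOrgIds.has(r.organizationId)),
  );
  matches.sort(
    (a, b) =>
      Number(b.organizationId === callerOrgId) -
      Number(a.organizationId === callerOrgId),
  );
  return matches[0]?.metadataSchema;
}

// Returns no errors when no schema is found, like the "valid when it cannot
// find a schema" behaviour the review describes; the widened lookup is what
// keeps public-catalog types out of that branch.
function validateMetadataSketch(
  metadata: Record<string, unknown>,
  schema: MetadataSchema | undefined,
): string[] {
  if (schema === undefined) return [];
  return schema.required
    .filter((key) => !(key in metadata))
    .map((key) => `missing required field: ${key}`);
}
```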
…ng + tests (#407)

* fix(world-model): cross-org schema validation + defensive count scoping + tests

* fix(search): scope content_count subquery by entity org

Pi caught one more in entitySelectColumns(): the content_count subquery joined current_event_records by entityLinkMatchSql only, with no ev.organization_id filter. Cross-org events that share an entity_id or an identity-namespace match could inflate counts even though the outer CASE WHEN guards against returning anything for cross-org rows. Same pattern as the connection_count / active_connection_count fixes in the parent commit.
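Here is a sketch of the content_count rule after this fix. It simplifies `entityLinkMatchSql` to plain `entityId` equality, which is an assumption: the real predicate also matches identity namespaces. The `EntityRef` / `EventRecord` shapes are hypothetical. The outer gate returns zero for cross-org rows, and the inner filter only counts events in the entity's own org.

```typescript
// Hypothetical shapes for an entity and a current_event_records row.
interface EntityRef {
  id: number;
  organizationId: number;
}

interface EventRecord {
  entityId: number;
  organizationId: number;
}

function contentCountSketch(
  entity: EntityRef,
  events: EventRecord[],
  callerOrgId: number,
): number {
  // Outer CASE WHEN: cross-org rows report zero.
  if (entity.organizationId !== callerOrgId) return 0;
  // Inner predicate added by the fix: only events in the entity's own org count.
  return events.filter(
    (ev) =>
      ev.entityId === entity.id && ev.organizationId === entity.organizationId,
  ).length;
}
```

For a caller-org entity, an event from another org that happens to share its entity_id no longer adds to the count.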
First slice of the world-model plan. Exercises the FK foundation from #370 with two small changes — no schema migration.
What this enables
A tenant agent can write a `tax_filing` entity whose type is defined in `public-uk-tax`, with no per-tenant cloning of the vocabulary. The resolved `entity_type_id` is materialized on the row.

What's in the diff
Out of scope (intentional)
Test plan