Skip to content

feat(db)!: convert entities.entity_type to entity_type_id FK#370

Merged
buremba merged 5 commits into
mainfrom
feat/entity-type-fk
Apr 26, 2026
Merged

feat(db)!: convert entities.entity_type to entity_type_id FK#370
buremba merged 5 commits into
mainfrom
feat/entity-type-fk

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented Apr 26, 2026

⚠️ DO NOT MERGE until db/migrations/20260426120000_entities_entity_type_fk.sql has been applied manually to prod. This migration is not expand/contract — it adds and drops in one step. The code in this PR requires the new column; pre-migration code requires the old one. Operational sequence below.

Why

Two problems collapse to the same fix.

  1. Integrity. entities.entity_type was a free-floating text slug. Renaming or hard-deleting an entity_types row silently orphaned every referencing entity — joins went empty, the validator only protected new inserts. With a real FK on entity_types(id), Postgres refuses to drop a referenced type and renames update for free (slug is display-only via JOIN).
  2. Cross-org vocabulary. entity_types.id is globally unique (one sequence across all orgs). An entity in tenant org A can carry a type defined in public-catalog org B by FK alone — no entity_type_org_id column needed. This is the foundation for the world-model plan in docs/plans/world-model.md.

What changes

  • Migration at db/migrations/20260426120000_entities_entity_type_fk.sql: add nullable entity_type_id FK, backfill (live entity_types preferred, soft-deleted as fallback), fail loudly on orphans, set NOT NULL, drop the text column.
  • All read sites switched to JOIN entity_types et ON et.id = e.entity_type_id and select et.slug AS entity_type. Parent-entity joins use pet; relationship from/to use fet/tet.
  • All INSERT sites resolve slug → id first, then INSERT entity_type_id.
  • execute-data-sources.ts preserves the public entities.entity_type contract for user-supplied query_sql by aliasing et.slug AS entity_type in the entities CTE.
  • Test fixtures updated to the same pattern.

Out of scope (intentional)

This PR is same-org-only. INSERT-side validators still resolve entity_type slugs against the entity's own org (matching today's behaviour). The cross-org capability is enabled by the FK — entities.entity_type_id can already point at any org's entity_types row — but the resolution path (schema search path through an agent's uses_catalog declarations, plus the cross-org write guard for entity_relationships) is left to the next PR (slim install + agent_template entity). Adding it here without an exercising path would be dead code.

entity_relationships.relationship_type_id is already an integer FK — no change needed.

Manual deploy procedure

The migration adds and drops in one transaction; new code requires the new column. To avoid serving traffic against a half-migrated DB:

# 1. Stop the gateway / quiesce writers.

# 2. Apply the migration (single transaction; rolls back on orphans).
psql "$DATABASE_URL" -f db/migrations/20260426120000_entities_entity_type_fk.sql

# 3. Sanity-check.
psql "$DATABASE_URL" -c '\d entities'                                                       # entity_type column gone, entity_type_id NOT NULL present
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM entities WHERE entity_type_id IS NULL;"        # expect 0

# 4. Merge this PR + deploy + start the gateway.

If the migration aborts on orphans, surface the offenders before retrying:

SELECT e.id, e.organization_id, e.entity_type, e.name
FROM entities e
LEFT JOIN entity_types et
  ON et.slug = e.entity_type AND et.organization_id = e.organization_id
WHERE et.id IS NULL;

Review (pi pass)

Pi reviewed the diff and flagged three missed SQL sites (manage_entity.ts:545, search.ts:405, workspace-instructions.ts:25) — fixed in 01c23330. Same commit also tightens the migration backfill against soft-deleted/active slug collisions ((deleted_at IS NULL) DESC, id DESC ordering preserves history while resolving deterministically).

Test plan

  • bunx tsc -p packages/owletto-backend --noEmit clean (only pre-existing agents/install.ts:799 errors).
  • make build-packages clean.
  • bun run check (Biome) clean.
  • Integration tests pass against a DB with the migration applied.
  • Manual smoke: create entity, list entities, link entities, list with type filter, delete entity — all paths use the new FK.

Replaces the text-slug column on entities with an integer FK on
entity_types(id). Two motivations folded into one change:

1. Integrity. The previous text-slug pattern silently orphaned entities
   when their entity_type was renamed or hard-deleted; only the create-
   path validator enforced same-org existence, and updates/deletes
   bypassed it. The FK now does that work at the DB layer: renames are
   safe (slug becomes display-only via JOIN), Postgres refuses to drop
   a referenced type.

2. Cross-org vocabulary. entity_types.id is globally unique, so an
   entity in tenant org A can carry a type defined in public-catalog
   org B by FK alone — without an extra org_id column on entities.
   Sets up the world-model plan in docs/plans/world-model.md for slim
   install and public-catalog references in subsequent PRs.

Migration: db/migrations/20260426120000_entities_entity_type_fk.sql.
Single-prod-DB: adds nullable column, backfills from existing slug
lookup, fails loudly on orphaned rows, sets NOT NULL, drops the text
column. RUN MANUALLY before deploying.

All read sites switched to "JOIN entity_types et ON et.id = e.entity_type_id"
and select "et.slug AS entity_type". INSERT sites resolve slug → id
first. Test fixtures updated to the same pattern.

The cross-org write guard for entity_relationships is deferred to the
PR that introduces cross-org INSERTs (slim install + agent_template
entity); adding it now would be dead code.
@buremba buremba marked this pull request as ready for review April 26, 2026 17:26
@github-actions github-actions Bot added the triage:needs-human Triage agent escalated for human review label Apr 26, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f488ff547b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +28 to +30
FROM public.entity_types et
WHERE et.slug = e.entity_type
AND et.organization_id = e.organization_id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Disambiguate entity-type backfill before setting FK IDs

This backfill join is non-deterministic when an org has multiple entity_types rows with the same slug (which is possible because uniqueness is only enforced for deleted_at IS NULL). In that case, one entities row can match multiple et rows and PostgreSQL may assign an arbitrary et.id, so entities can be rebound to a stale/deleted type after migration. Make the mapping deterministic (for example, prefer the active row, or otherwise fail when multiple matches exist) before writing entity_type_id.

Useful? React with 👍 / 👎.

Comment on lines 670 to 672
LEFT JOIN entities e
ON e.organization_id = et.organization_id
AND e.entity_type = et.slug
ON e.entity_type_id = et.id
WHERE et.deleted_at IS NULL
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Scope entity-type counts to the current organization

This join now counts every entity that references the type ID, regardless of which organization owns the entity. Once cross-org type references exist, entity_count in resolve_path bootstrap will include other orgs’ entities and leak/overstate counts for the current workspace. The previous logic constrained counts to same-org entities; add an e.organization_id = ${organizationId} condition here to preserve org-local semantics.

Useful? React with 👍 / 👎.

…c backfill

Pi review caught three SELECTs that referenced the (now-dropped) entity_type
text column and would crash post-migration:

- tools/admin/manage_entity.ts:545 — handleUpdate fetched before-state
- tools/search.ts:405 — fetchTopEntitiesByType
- utils/workspace-instructions.ts:25 — entity-type counts

Each rewritten to JOIN entity_types and select et.slug AS entity_type.

Backfill in the migration was non-deterministic: the (slug, organization_id)
pair has multiple matches when an active entity_types row coexists with a
soft-deleted predecessor, since the UNIQUE index only covers `deleted_at IS
NULL`. Switched to a correlated subquery ordered by `(deleted_at IS NULL)
DESC, id DESC` so live rows always win, soft-deleted rows are the fallback
for history preservation.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 26, 2026

Triage decision: needs-human

Reasons:

  • Review comments contain escalation keyword "P1" (automated bot review flagged priority database migration issues)
  • PR size exceeds auto-merge threshold (883 lines, 26 files; limits 300 lines, 10 files)
  • Current merge state is BLOCKED
  • Database migration requires manual operational steps as documented in PR description

Next: Human review required for P1 database migration issues and manual deployment coordination. Review inline comments for specific migration concerns.

buremba added 3 commits April 26, 2026 18:50
DROP COLUMN entity_type would silently cascade-drop
entities_organization_id_entity_type_slug_parent_id_key. Make the drop
explicit, and document why we don't recreate it: the stronger
entities_slug_parent_unique (UNIQUE on org_id, COALESCE(parent_id, 0),
slug) already enforces slug uniqueness within (org, parent) regardless
of entity type, with NULL-parent collapsing — so the entity-type-keyed
constraint never caught anything the index didn't already catch.

Down migration recreates the constraint and the column comment for
rollback fidelity.

Caught by pi review v2.
Tests routinely create entities in fresh orgs (e.g. createTestOrganization
followed by createTestEntity) without first calling seedSystemEntityTypes.
Pre-FK that worked because the slug-text column accepted any string; the
new FK requires an entity_types row to exist. Either seed in every test
setup or upsert here — the latter is one-time-cheap and keeps existing
test code working.

Caught by pi review v3.
dbmate auto-regenerated after applying 20260426120000_entities_entity_type_fk.sql
to prod. Reflects:
- entities.entity_type column dropped
- entities.entity_type_id integer NOT NULL FK on entity_types(id)
- idx_entities_entity_type_id index
- entities_organization_id_entity_type_slug_parent_id_key constraint dropped
  (redundant with entities_slug_parent_unique)
@buremba buremba added the skip-size-check Bypass PR size gate for intentionally large single-concern changes label Apr 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Triage decision: needs-human

Reasons:

  • Review comments contain escalation keyword "P1" (automated bot review flagged priority database migration issues)
  • Current merge state is BLOCKED
  • Database migration requires manual operational steps as documented in PR description
  • Complex schema change converting entity_type from text slug to FK requires careful review

Next: Human review required for P1 database migration issues and manual deployment coordination. Review inline comments for specific migration concerns including non-deterministic backfill handling and cross-org entity count scoping.

@buremba buremba enabled auto-merge (squash) April 26, 2026 20:00
@buremba buremba merged commit ab7ecde into main Apr 26, 2026
16 of 17 checks passed
@buremba buremba deleted the feat/entity-type-fk branch April 26, 2026 20:01
buremba added a commit that referenced this pull request Apr 26, 2026
…guard (#374)

* feat(world-model): cross-org references — schema search path + write guard

Two small changes that exercise the FK foundation #370 set up:

1. **Schema search path in createEntity / entity-link-upsert.** When an
   entity_type slug isn't registered in the entity's own org, fall back to
   any org with `organization.visibility = 'public'`. First match wins,
   tenant-local types preferred when both exist. The resolved
   `entity_type_id` is materialized on the entity row, so reads never need
   to repeat the search. Lets a tenant agent write a `tax_filing` entity
   whose type lives in `public-uk-tax`, no per-tenant cloning.

2. **Cross-org write guard in validateScopeRule.** Relationship targets may
   now be in a different org *if* that org is a public catalog
   (`visibility = 'public'`). Sources still must be in the caller's org.
   Public → tenant references stay forbidden. The relationship row's
   organization_id remains the source's, keeping the assertion under the
   caller's control. Lets a tenant relationship point at a canonical entity
   like HMRC or Barclays without copying it locally.

No schema migration; both changes piggyback on the global FKs already in
place (#370 for `entity_types.id`, baseline for `entities.id`).

Tests:
- `entity-management-schema-search.test.ts` — 4 unit tests (fall-through,
  tenant-local-wins, unknown-type-rejected, private-org-not-snooped)
- `entity-relationships.test.ts` — adds positive case for cross-org link
  to a public-catalog entity

* fix(cross-org-refs): address pi review

1. Document the slug-poisoning caveat — visibility='public' alone is trusted
   today; long-term we'll narrow with `is_catalog` or per-agent
   `uses_catalog`. Operationally we restrict visibility flips to admins.
2. Add unit test for the entity-link-upsert resolver path so future drift
   between createEntity and entity-link-upsert is caught.
3. Add the missing negative case in entity-relationships.test.ts: source
   entity in a different org from the caller is rejected (sources must
   always be in the caller's org).
4. Comment typo (et alias).

TOCTOU between type lookup and INSERT (pi finding 2) noted but deferred —
the window is microseconds, the failure mode is semantic-not-corruption,
and the cleanest fix is a transactional rewrite of createEntity that's out
of scope for this PR.

* fix(cross-org-refs): preserve caller-from for cross-org symmetric edges

Pi flagged: validateScopeRule checks `from_entity_id` before
canonicalization, but `canonicalizeSymmetricEdge` may swap a
public-catalog target into the stored `from_entity_id` slot when its
numeric id is lower. The row stays under the caller's org (so it's
tenant-owned) but the stored source ends up cosmetically inverted vs
the documented invariant ("source must always be the caller's org").

Fix: canonicalize symmetric edges only when both endpoints are in the
caller's org. Cross-org symmetric edges keep caller-from / public-to as
provided. Same-org symmetric still canonicalizes by id so dedup catches
a→b and b→a as the same edge. Cross-org dedup is unaffected because
validateScopeRule already forbids a `public → tenant` create direction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-size-check Bypass PR size gate for intentionally large single-concern changes triage:needs-human Triage agent escalated for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant