chore(db): squash 82 migrations into baseline + retire schema.sql + embedded patches by buremba · Pull Request #908 · lobu-ai/lobu

buremba · 2026-05-19T03:45:02Z

Summary

Clean-cut consolidation of the DB schema management story. Net: -13,024 / +1,683 lines, 89 files.

What changed

| Change | Why |
| --- | --- |
| `db/migrations/`: 82 files → 1 baseline | Single source of truth; cold start drops from 82 sequential applies to one CREATE-everything |
| `db/schema.sql` deleted | Baseline IS the schema; no more dual source, no more drift gate |
| `scripts/normalize-schema.sh` deleted | Was only needed to scrub pg_dump output for the drift diff |
| `Makefile`: `db-schema` target removed | No schema.sql to regenerate |
| `.github/workflows/ci.yml` | Removed: drift gate, normalize step, `--schema-file` flag on `dbmate up`. Kept: dbmate-up validation + status check + immutability check (honors `[squash-baseline]` sentinel) |
| `packages/server/src/db/embedded-schema-patches.ts` deleted | Embedded path now runs migrations the same way prod does; no second mirror to maintain |
| `packages/server/src/start-local.ts` | Replaced the "skip migrations if `organization` exists" branch + patches loop with a single `schema_migrations`-aware applier that mirrors dbmate's behavior |
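
To make the `schema_migrations`-aware applier concrete, here is a minimal sketch of the control flow it describes: ensure the ledger exists, read applied versions, apply only the unseen files, and record each one on success. This is not the code in `start-local.ts`; `LedgerDb`, `MigrationFile`, `selectPending`, `applyPending` and `makeFakeDb` are hypothetical names, and the ledger column type is an assumption.

```typescript
// Hypothetical minimal database interface, for illustration only.
interface LedgerDb {
  exec(sql: string): Promise<void>;
  appliedVersions(): Promise<string[]>;
  recordVersion(version: string): Promise<void>;
}

interface MigrationFile {
  version: string; // filename prefix, e.g. "00000000000000"
  upSql: string;   // SQL to run when the migration is applied
}

// Pure helper: migrations not yet in the ledger, in ascending version order.
function selectPending(files: MigrationFile[], applied: string[]): MigrationFile[] {
  const seen = new Set(applied);
  return files
    .filter((file) => !seen.has(file.version))
    .sort((a, b) => (a.version < b.version ? -1 : a.version > b.version ? 1 : 0));
}

// Apply the pending migrations and record each version after it succeeds.
// A real applier would likely wrap each migration in a transaction.
async function applyPending(db: LedgerDb, files: MigrationFile[]): Promise<string[]> {
  // Assumed ledger shape; the real table is created by dbmate or the boot path.
  await db.exec(
    "CREATE TABLE IF NOT EXISTS schema_migrations (version varchar(128) PRIMARY KEY)"
  );
  const pending = selectPending(files, await db.appliedVersions());
  const newlyApplied: string[] = [];
  for (const file of pending) {
    await db.exec(file.upSql);
    await db.recordVersion(file.version);
    newlyApplied.push(file.version);
  }
  return newlyApplied;
}

// In-memory stand-in used only to exercise the control flow.
function makeFakeDb(initialLedger: string[]): LedgerDb & { executed: string[] } {
  const ledger = new Set(initialLedger);
  const executed: string[] = [];
  return {
    executed,
    async exec(sql: string) { executed.push(sql); },
    async appliedVersions() { return Array.from(ledger); },
    async recordVersion(version: string) { ledger.add(version); },
  };
}
```

Because the decision is driven only by the ledger contents, the same function covers a fresh database (everything pending) and a database whose ledger already lists the baseline (nothing pending).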

Audit findings (4 parallel Explore agents)

| Category | Findings |
| --- | --- |
| Dead tables (zero TS readers/writers) | `mcp_proxy_sessions`, `organization_lobu_links`, 4 `migration_*` temp artifacts |
| Dead columns | `agents.skill_auto_granted_domains` (0 hits), `runs.retry_delay_seconds` (comments only) |
| Dead views / functions / triggers | none (past migrations already cleaned up) |
| Deprecation markers | none actionable (already done in past migrations) |

Kept after spot-check: `agents.{soulMd, nixConfig, networkConfig, pluginsConfig}` — heavily referenced via camelCase in the owletto web admin agent editor + CLI apply diff/desired-state. Audit's snake_case grep missed them.
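
The blind spot described above (a snake_case-only search missing camelCase references) can be illustrated with a small helper that expands a name into both spellings before searching. This is a sketch, not the audit's actual tooling; `snakeToCamel`, `camelToSnake`, `searchVariants` and `countReferences` are hypothetical names, and the source texts are stand-ins for file contents.

```typescript
// Convert a snake_case name to the camelCase form typically used in TypeScript.
function snakeToCamel(column: string): string {
  return column.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Convert a camelCase name to snake_case.
function camelToSnake(field: string): string {
  return field.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Every spelling worth searching for before declaring a column dead.
function searchVariants(column: string): string[] {
  const variants = new Set<string>([column, snakeToCamel(column), camelToSnake(column)]);
  return Array.from(variants);
}

// Count how many source texts mention the column under any spelling.
function countReferences(column: string, sources: string[]): number {
  const variants = searchVariants(column);
  return sources.filter((src) => variants.some((v) => src.includes(v))).length;
}

// Example: a search for either spelling finds the camelCase usage.
console.log(searchVariants("soulMd"));
```

A plain substring match will over-count (for example, hits inside comments), so a result of zero is strong evidence of a dead column while a non-zero result still needs a manual spot-check.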

Top-15 tables get `COMMENT ON TABLE` descriptions

`events`, `runs`, `agents`, `connections`, `entities`, `auth_profiles`, `organization`, `user`, `member`, `watchers`, `feeds`, `personal_access_tokens`, `oauth_tokens`, `entity_types`, `event_classifications`. Self-documenting schema.

Data safety

Nothing is dropped from prod. All "removed" tables get renamed; all "removed" columns get snapshotted into side tables. Data stays inside the live DB, queryable any time, restorable in one SQL statement.

Two layered recovery paths:

Path A — CNPG point-in-time recovery (preferred, full restore)

This codebase already has CNPG WAL archiving wired:

| Config | Value | Source |
| --- | --- | --- |
| Backup target | Cloudflare R2 (`s3://summaries-db-backup`) | `packages/owletto/deploy/k8s/apps/lobu/base/helmrelease.yaml` |
| `retentionPolicy` | 30 days | same |
| `archive_timeout` | 900s (15-min force-archive) | same |
| `ScheduledBackup` | daily 02:00 UTC | same |
| Recovery template | `db-recovery.yaml` (used 2026-03-15 after Reddit re-sync incident) | proof-point in same dir |

If anything breaks post-deploy, apply a fresh `Cluster` CR with `bootstrap.recovery.recoveryTarget.targetTime` set to the pre-surgery timestamp recorded in STEP 0 — CNPG restores transaction-level state from the latest base backup + WAL replay. Flip the app's `DATABASE_URL` to point at the recovered cluster.

Path B — In-DB safety net (lightweight, just-the-squash rollback)

Surgery leaves behind:

- 6 renamed tables (suffix `_d20260519`)
- 2 column-snapshot tables (one per dropped column, keyed off the parent's PK)
- The pre-surgery `schema_migrations` ledger CSV at `/tmp/schema-migrations-pre-squash.csv`

To undo just the squash without a full PITR:

```
-- 1. Restore the ledger:
psql "$PROD_DATABASE_URL" -c "DELETE FROM public.schema_migrations"
psql "$PROD_DATABASE_URL" \
  -c "\\copy public.schema_migrations FROM '/tmp/schema-migrations-pre-squash.csv' CSV HEADER"
-- 2. Rename tables back:
ALTER TABLE public.mcp_proxy_sessions_d20260519 RENAME TO mcp_proxy_sessions;
-- ... (one per renamed table)
-- 3. ADD COLUMN + UPDATE from snapshot tables.
```

Full rollback details in the baseline file's header.
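
As an illustration of the rename-based safety net, the sketch below generates the forward rename and the matching rollback statement for each dead table, and refuses any backup name longer than 63 characters (the identifier limit in a default Postgres build, beyond which names are truncated). The table list comes from this PR; the helper names `backupName`, `renameSql` and `rollbackSql` are hypothetical, and the authoritative script remains the one in the baseline file's header.

```typescript
// Identifier limit in a default Postgres build (NAMEDATALEN - 1).
// Using string length assumes ASCII-only table names.
const PG_MAX_IDENTIFIER_LENGTH = 63;

// The six tables flagged as dead by the audit in this PR.
const DEAD_TABLES: string[] = [
  "mcp_proxy_sessions",
  "organization_lobu_links",
  "migration_20260315300000_entity_type_org_backfill",
  "migration_20260316100000_created_entity_types",
  "migration_20260316100000_deleted_default_entity_types",
  "migration_20260316100000_events_kind_backup",
];

// Build the backup name, failing loudly instead of letting Postgres truncate it.
function backupName(table: string, suffix: string): string {
  const renamed = table + suffix;
  if (renamed.length > PG_MAX_IDENTIFIER_LENGTH) {
    throw new Error(`${renamed} exceeds ${PG_MAX_IDENTIFIER_LENGTH} characters`);
  }
  return renamed;
}

// Statement used during surgery: move the table out of the way.
function renameSql(table: string, suffix: string): string {
  return `ALTER TABLE public.${table} RENAME TO ${backupName(table, suffix)};`;
}

// Statement used during rollback: restore the original name.
function rollbackSql(table: string, suffix: string): string {
  return `ALTER TABLE public.${backupName(table, suffix)} RENAME TO ${table};`;
}

for (const table of DEAD_TABLES) {
  console.log(renameSql(table, "_d20260519"));
}
```

Generating both directions from the same list keeps the rollback in step with the surgery: every table renamed forward has exactly one statement that renames it back.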

Rollout procedure

STEP 0 — record pre-surgery timestamp:
```
psql "$PROD_DATABASE_URL" -c "SELECT now()" | tee /tmp/pre-surgery-ts.txt
```
STEP 1 — full pg_dump as belt + suspenders (CNPG PITR is the real safety net).
STEP 2 — CSV the schema_migrations ledger (for Path B rollback).
STEP 3 — sanity-check row counts on the 6 droppee tables. Abort if non-zero.
STEP 4 — apply the rename + snapshot + ledger-reset surgery in a single BEGIN/COMMIT (full script in the baseline header).
STEP 5 — deploy the new image. `dbmate up` skips the baseline (already-applied per the surgery's ledger insert).

Fresh DBs (local dev, PGlite, CI): no surgery; `dbmate up` applies the baseline from scratch. For local PGlite: remove the data directory in the workspace where you run `lobu run` (`rm -rf <workspace>/data`) so the next boot rebuilds the schema from the baseline.

Verification

Manual end-to-end test, two Docker containers:

| Container | Setup | Outcome |
| --- | --- | --- |
| A: "prod simulation" | `pgvector/pgvector:pg16` + 82 old migrations + surgery | Canonical schema aligns to baseline; 8 backup tables retain rows |
| B: "fresh DB" | `pgvector/pgvector:pg16` + new baseline only | 67 user tables + dbmate-managed schema_migrations |

Diff between A and B's canonical schemas (excluding backup tables): empty at the column level, across all 67 tables.

- `bun run typecheck` clean
- Container B applies the baseline in 2.6s without errors
- Container A surgery preserves all rows; canonical schema matches B
- `agents.skill_auto_granted_domains` absent in both; data preserved in `agents_d20260519_skill_auto_granted_domains` on A
- `runs.retry_delay_seconds` absent in both; data preserved in `runs_d20260519_retry_delay_seconds` on A
- `schema_migrations` has only `'00000000000000'` post-surgery on A; fresh-applied on B
- CI: `migrations` job + `dbmate up` validation (running now)
- Code review on the baseline file's accuracy
- Pre-merge: confirm the recovery procedure works on a CNPG staging clone

CI

The `[squash-baseline]` sentinel in commit messages signals to CI's immutability check that this PR is a one-time squash. Future schema PRs without that marker still get strict "applied migrations are immutable" enforcement.
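
The sentinel rule can be sketched as a small predicate. This is only an illustration of the policy, not the actual check in `ci.yml`. It assumes that a single commit message containing the sentinel is enough to bypass the check, and that without it any modified or deleted file under `db/migrations/` is a violation. `ChangedFile` and `immutabilityViolations` are hypothetical names.

```typescript
const SQUASH_SENTINEL = "[squash-baseline]";

// Hypothetical description of one file touched by the PR.
interface ChangedFile {
  path: string;
  status: "added" | "modified" | "deleted";
}

// Return the migration files that violate "applied migrations are immutable".
// An empty result means the check passes.
function immutabilityViolations(commitMessages: string[], changes: ChangedFile[]): string[] {
  // One-time escape hatch: any commit carrying the sentinel disables the check.
  if (commitMessages.some((message) => message.includes(SQUASH_SENTINEL))) {
    return [];
  }
  // Otherwise only brand-new migration files are allowed.
  return changes
    .filter((change) => change.path.startsWith("db/migrations/") && change.status !== "added")
    .map((change) => change.path);
}
```

Keeping the bypass keyed to an explicit marker means the default stays strict: a PR that deletes or edits a migration without the sentinel still fails.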

Summary by CodeRabbit

Release Notes

Chores
- Consolidated numerous database schema migrations into a baseline schema to streamline database upgrades and initialization processes.
- Removed legacy migration tooling, embedded schema patching infrastructure, and related build utilities.
- Simplified CI pipeline configuration and removed schema normalization scripts.
- Updated local development bootstrapping to use streamlined migration ledger tracking.

…a.sql + embedded patches

Clean-cut consolidation of the DB schema management story. Authorized by the user with "nobody is using our app yet, we can patch prod once." Net: -13,024 / +1,518 lines.

What changed
------------

db/migrations/
- Replaced 82 existing files (the stale 00000000000000_baseline.sql + 81 forward deltas) with one regenerated baseline that captures the current schema verbatim.
- Baseline generated by: `dbmate up` all 82 against a fresh pgvector/pgvector:pg16 container (same image CI uses) → drop dead schema (audit-confirmed) → annotate top-15 tables with COMMENT ON → pg_dump --schema-only → strip dump noise.

db/schema.sql: DELETED. The baseline IS the schema. No more dual source of truth; no more drift gate; no more "did I forget to regenerate?" gotcha.

scripts/normalize-schema.sh: DELETED. Was only used to scrub pg_dump output before the drift diff. No diff, no script.

Makefile
- Removed `db-schema` target (no schema.sql to regenerate).
- Help line dropped.

.github/workflows/ci.yml
- Removed: normalize step, drift-gate step, `--schema-file` flag on `dbmate up`.
- Kept: immutability check, rebased to exclude `00000000000000_baseline.sql` so future re-squashes can ship.
- Kept: `dbmate up` + status check (validates baseline applies cleanly to a fresh DB).

packages/server/src/db/embedded-schema-patches.ts: DELETED. Embedded path now runs the migrations directory the same way prod does; no second mirror to maintain.

packages/server/src/start-local.ts
- Replaced the "skip migrations if `organization` exists" branch + `applyEmbeddedSchemaPatches` loop with a single `schema_migrations`-aware applier that mirrors dbmate's behavior: ensure ledger table, read applied versions, apply only the unseen ones, record each on success.
- Idempotent against any starting state (fresh or pre-initialized), so legacy embedded DBs catch up to the baseline on next boot without a separate code path.

Dead schema dropped in the baseline (audit-flagged)
----------------------------------------------------

Tables (6):
- mcp_proxy_sessions (no readers/writers in TS)
- organization_lobu_links (no readers/writers in TS)
- migration_20260315300000_entity_type_org_backfill (one-off temp)
- migration_20260316100000_created_entity_types (one-off temp)
- migration_20260316100000_deleted_default_entity_types (one-off temp)
- migration_20260316100000_events_kind_backup (one-off temp)

Columns (2):
- agents.skill_auto_granted_domains (jsonb, 0 hits)
- runs.retry_delay_seconds (in comments only, never assigned)

Kept after verification: agents.{soulMd,nixConfig,networkConfig,pluginsConfig}, which see heavy camelCase usage in the owletto web admin agent editor + lobu CLI apply diff/desired-state.

Documentation
-------------

Added COMMENT ON TABLE for the 15 load-bearing tables: events, runs, agents, connections, entities, auth_profiles, organization, user, member, watchers, feeds, personal_access_tokens, oauth_tokens, entity_types, event_classifications. Self-documenting schema.

Prod rollout (REQUIRED before deploying this image)
-----------------------------------------------------

Run on each prod DB once, BEFORE rolling out the new code:

    BEGIN;
    DROP TABLE IF EXISTS public.mcp_proxy_sessions CASCADE;
    DROP TABLE IF EXISTS public.organization_lobu_links CASCADE;
    DROP TABLE IF EXISTS public.migration_20260315300000_entity_type_org_backfill CASCADE;
    DROP TABLE IF EXISTS public.migration_20260316100000_created_entity_types CASCADE;
    DROP TABLE IF EXISTS public.migration_20260316100000_deleted_default_entity_types CASCADE;
    DROP TABLE IF EXISTS public.migration_20260316100000_events_kind_backup CASCADE;
    ALTER TABLE public.agents DROP COLUMN IF EXISTS skill_auto_granted_domains;
    ALTER TABLE public.runs DROP COLUMN IF EXISTS retry_delay_seconds;
    DELETE FROM public.schema_migrations;
    INSERT INTO public.schema_migrations (version) VALUES ('00000000000000');
    COMMIT;

Why this order: new pods boot with `dbmate up` which now expects schema_migrations to list `'00000000000000'`. The DELETE+INSERT makes that the only applied row, so dbmate skips the baseline (whose contents already match prod's schema after the DROPs).

PGlite / local dev: wipe and rebuild
--------------------------------------

The simplest path: `rm -rf <workspace>/data` next to your `lobu run` invocations. Next boot recreates the schema from the baseline. Equivalent: `dbmate drop && dbmate up` against your dev DB.

Why not let dbmate self-heal on boot
-------------------------------------

Without the surgery, prod's `schema_migrations` table has 82 ghost rows (the old applied versions). dbmate would see baseline as unapplied, try to apply it, but its strict `CREATE TABLE` (no IF NOT EXISTS) would error against prod's already-existing tables. Hence the manual reset.

Pre-flight verification
-----------------------

`bun run typecheck` clean. `dbmate --migrations-dir db/migrations up` against a fresh pgvector/pgvector:pg16 container applies the baseline successfully and produces the same schema state as the pre-squash 82-migration chain.

Audit trail
-----------

Four Explore agents ran in parallel before the squash to find stale schema everywhere. Findings (zero false positives uncovered when spot-checking):
- Audit 1 (dead tables): 6 dead, all dropped.
- Audit 2 (dead views/functions/triggers/sequences): 0 dead. Past migrations had already cleaned up event_thread_tree, normalize_event_created_by, three notify_* functions.
- Audit 3 (dead columns): 1 confirmed (agents.skill_auto_granted_domains), 1 confirmed dead by comment-only references (runs.retry_delay_seconds). 4 false-positive suspects (agents.soulMd et al), verified alive in owletto admin UI + CLI apply.
- Audit 4 (deprecation markers in source): no actionable items not already cleaned up.

Codex pushback was applied earlier in the conversation: don't drop `embedded-schema-patches.ts` while keeping the two-execution-model embedded boot (which would break). This commit also rewrites that boot path to remove the dual-model, so the file's deletion is now safe.
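
The reasoning behind the manual ledger reset can be summarised as a three-way decision on what the ledger lists and whether the tables already exist. The sketch below encodes that reasoning only; it does not call dbmate, and `BootOutcome`, `BASELINE_VERSION` and `classifyBoot` are hypothetical names.

```typescript
type BootOutcome = "skip-baseline" | "apply-baseline" | "conflict";

const BASELINE_VERSION = "00000000000000";

// Predict what applying the squashed baseline would do to a database.
// ledger: versions listed in schema_migrations.
// schemaExists: whether the application tables are already present.
function classifyBoot(ledger: string[], schemaExists: boolean): BootOutcome {
  if (ledger.includes(BASELINE_VERSION)) {
    // The baseline is recorded as applied, so the migration run skips it.
    return "skip-baseline";
  }
  // The baseline is pending: its plain CREATE TABLE statements succeed on an
  // empty database and fail when the tables are already there.
  return schemaExists ? "conflict" : "apply-baseline";
}

console.log(classifyBoot([], false));
```

The "conflict" case is the one the surgery exists to prevent: a populated database whose ledger does not list the baseline version.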

coderabbitai · 2026-05-19T03:45:09Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between 54de2e0 and 283c2ee.

📒 Files selected for processing (92)

.github/workflows/ci.yml
Makefile
db/migrations/00000000000000_baseline.sql
db/migrations/20260405193000_add_mcp_sessions.sql
db/migrations/20260408120000_remove_system_connectors.sql
db/migrations/20260408120001_optional_compiled_code.sql
db/migrations/20260409110000_add_active_watcher_run_index.sql
db/migrations/20260409130000_connector_default_config.sql
db/migrations/20260410120000_add_agent_secrets.sql
db/migrations/20260413170000_add_watcher_group_id.sql
db/migrations/20260416120000_add_entity_wa_jid_index.sql
db/migrations/20260417100000_add_entity_identities.sql
db/migrations/20260418100000_add_auth_runs.sql
db/migrations/20260418110000_add_runs_created_by_user.sql
db/migrations/20260419120000_add_event_identity_indexes.sql
db/migrations/20260420120000_extend_reserved_org_slugs.sql
db/migrations/20260424030000_add_watcher_run_correlation.sql
db/migrations/20260424130000_relax_events_client_id_fk.sql
db/migrations/20260425100000_normalize_watcher_feedback.sql
db/migrations/20260425120000_add_run_diagnostics.sql
db/migrations/20260425130000_add_repair_agent_plumbing.sql
db/migrations/20260426120000_entities_entity_type_fk.sql
db/migrations/20260426130000_db_integrity_cleanup.sql
db/migrations/20260426130001_db_integrity_cleanup_concurrent.sql
db/migrations/20260427133000_events_created_by_nullable.sql
db/migrations/20260427140000_identity_engine_indexes.sql
db/migrations/20260427150000_drop_events_source_id.sql
db/migrations/20260427160000_drop_dead_schema.sql
db/migrations/20260427170000_market_founder_to_member.sql
db/migrations/20260428040000_cascade_events_watchers_org_fk.sql
db/migrations/20260428050000_add_runs_approved_input.sql
db/migrations/20260429010000_auth_profile_tenant_scoped_fk.sql
db/migrations/20260429060000_extend_runs_for_lobu_queue.sql
db/migrations/20260429120000_agent_changed_notify.sql
db/migrations/20260429120100_user_auth_profiles_and_model_prefs.sql
db/migrations/20260429120200_fix_notify_old_keys.sql
db/migrations/20260429130000_oauth_states_cli_sessions_rate_limits.sql
db/migrations/20260429140000_phase8_grants_chat_connections_mcp_sessions.sql
db/migrations/20260429140100_runs_priority_expires_at_retry_delay.sql
db/migrations/20260429180000_drop_invalidatable_cache_triggers.sql
db/migrations/20260430005614_agents_apply_fields.sql
db/migrations/20260430022231_fix_connection_config_encryption.sql
db/migrations/20260430151215_add_task_run_type.sql
db/migrations/20260501000000_drop_cli_sessions.sql
db/migrations/20260501133000_lobu_memory_mcp_id.sql
db/migrations/20260502000000_drop_chat_connections.sql
db/migrations/20260503000000_agent_secrets_org_scope.sql
db/migrations/20260504000000_flatten_agents_drop_sandbox_model.sql
db/migrations/20260510220000_connector_required_capability.sql
db/migrations/20260512000000_device_worker_connection_binding.sql
db/migrations/20260512131703_connections_slug.sql
db/migrations/20260513000000_chat_user_identities.sql
db/migrations/20260513120000_auth_profiles_device_binding.sql
db/migrations/20260513150000_auth_profiles_cdp_url.sql
db/migrations/20260513200000_notifications_as_events.sql
db/migrations/20260514000000_scheduled_jobs.sql
db/migrations/20260514120000_auth_profiles_connector_key_nullable.sql
db/migrations/20260514130000_connection_action_modes.sql
db/migrations/20260514160000_auth_profiles_mirror_mode.sql
db/migrations/20260515120000_agents_per_org_pk.sql
db/migrations/20260515150000_geo_enrichment.sql
db/migrations/20260515160000_drop_agents_org_id_unique.sql
db/migrations/20260515170000_auth_profiles_default_for_connector.sql
db/migrations/20260516120000_agents_per_org_pk_swap.sql
db/migrations/20260516200000_events_search_tsv.sql
db/migrations/20260516200100_events_lifecycle_changes_index.sql
db/migrations/20260517010000_drop_unused_indexes.sql
db/migrations/20260517020000_softdelete_orphan_feeds.sql
db/migrations/20260517030000_pat_worker_id_binding.sql
db/migrations/20260517040000_archive_orphan_watchers.sql
db/migrations/20260517050000_watcher_agent_id_not_null.sql
db/migrations/20260517060000_watcher_schema_additions.sql
db/migrations/20260517150000_goals_primitive.sql
db/migrations/20260517160000_drop_goals_primitive.sql
db/migrations/20260518000000_pending_interactions.sql
db/migrations/20260518010000_runs_heartbeat_reaper_index.sql
db/migrations/20260518020000_runs_heartbeat_inflight_narrow.sql
db/migrations/20260518040000_agent_transcript_snapshot.sql
db/migrations/20260518050000_runs_denormalize_agent_conversation.sql
db/migrations/20260518060000_revert_runs_denormalize.sql
db/migrations/20260518070000_runs_heartbeat_inflight_widen.sql
db/migrations/20260519000000_passkey_table.sql
db/migrations/20260519020000_chat_state_tables.sql
db/migrations/20260519020001_revoked_tokens.sql
db/schema.sql
packages/server/src/__tests__/integration/embedded-schema-patches.test.ts
packages/server/src/__tests__/integration/identity/founder-to-member-migration.test.ts
packages/server/src/db/embedded-schema-patches.ts
packages/server/src/gateway/auth/revoked-token-store.ts
packages/server/src/gateway/connections/state-adapter.ts
packages/server/src/start-local.ts
scripts/normalize-schema.sh

📝 Walkthrough

This PR implements a large-scale database migration squash and a boot refactor. All 82 existing migration files (the stale baseline plus 81 forward deltas, dated April to May 2026) are replaced by a single regenerated baseline migration. The CI workflow gains a `[squash-baseline]` commit-message bypass for the immutability check, drops the schema normalization steps, and is reduced to applying the migrations and checking their status. The local server boot switches from embedded schema patches to a migration ledger backed by the `schema_migrations` table.

Changes

Squashed Baseline Migration & Boot Refactor

| Layer / File(s) | Summary |
| --- | --- |
| CI immutability and migration workflow simplification `.github/workflows/ci.yml` | Immutability check gains `[squash-baseline]` bypass; `dbmate up` is invoked without `--schema-file` for drift checking; schema normalization logic and post-migration snapshot diffing are removed. |
| Build target removal `Makefile` | The `db-schema` make target (which ran `dbmate` + `./scripts/normalize-schema.sh`) is deleted; `.PHONY` and `help` output are updated. |
| Local server migration ledger and boot refactor `packages/server/src/start-local.ts` | Embedded schema patches import and fallback are removed. `runMigrations` now ensures `schema_migrations` table exists, loads applied versions into memory, filters and applies unapplied migrations from `db/migrations/`, and records versions via `INSERT ... ON CONFLICT IGNORE`. |
| Documentation clarifications on schema source `packages/server/src/gateway/auth/revoked-token-store.ts`, `packages/server/src/gateway/connections/state-adapter.ts` | Module docs clarify that Postgres schema is now sourced from the squashed baseline migration (`db/migrations/00000000000000_baseline.sql`) applied by all boot paths, replacing the prior embedded patches approach. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

- lobu-ai/lobu#834: Adds public.pending_interactions table that this PR removes via deleted migration 20260518000000_pending_interactions.sql.
- lobu-ai/lobu#901: Adds scripts/normalize-schema.sh and db-schema target that this PR removes.
- lobu-ai/lobu#893: Modifies packages/server/src/db/embedded-schema-patches.ts that this PR deletes entirely.

Suggested labels

skip-size-check

Poem

🐰 Migrations once many, now one baseline true,
Schema patches gone, ledger-tracked through and through,
CI bypasses squash-commits with grace,
Boot from db/migrations finds its rightful place!

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


codecov-commenter · 2026-05-19T03:50:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

…teps to baseline header

Two follow-ups to the schema squash (#908):

1. CI's migration-immutability check now skips when ANY commit in the PR contains the sentinel `[squash-baseline]` in its message. This is the one-time-per-squash escape hatch. Code review is the human gate; future random deletions won't ship without the sentinel + reviewer sign-off.
2. The baseline file's header now spells out the full prod-rollout procedure with explicit backup commands (pg_dump full snapshot, CSV dumps of the schema_migrations ledger and the named droppee tables) plus a rollback procedure for both kinds of restoration. Audit said the dropped tables had zero TS readers; the dumps are paranoia.

[squash-baseline]

… rollback

Three fixes after end-to-end verification against two containers (Container A: 82 old migrations + surgery; Container B: fresh DB + new baseline). Canonical schemas now match byte-for-byte across both.

1. Strip schema_migrations table CREATE from the baseline body. dbmate creates that table itself on first use, and the previous baseline colliding `CREATE TABLE public.schema_migrations` would error a fresh `dbmate up` with `relation "schema_migrations" already exists`. Verified Container B now applies in 2.6s with 67 user tables + the dbmate-managed schema_migrations.
2. Switch the surgery from DROP TABLE to RENAME for the 6 dead tables so no row-level data can be lost even if the audit was wrong about one of them. Column drops get the same treatment: snapshot the value into a side table BEFORE the ALTER TABLE DROP COLUMN, so restore is a single UPDATE if needed. Shortened suffix to `_d20260519` (10 chars) to fit Postgres's 63-char identifier limit for the long `migration_*` artifact table names. All renames verified non-truncating.
3. Anchor the rollback story on CNPG point-in-time recovery, which this codebase already has wired up:
   - `Cluster.spec.backup.barmanObjectStore` streams WAL to Cloudflare R2 (`s3://summaries-db-backup`)
   - `retentionPolicy: 30d` (30-day PITR window)
   - `archive_timeout: 900` (15-min WAL force-archive)
   - `ScheduledBackup` daily at 02:00 UTC
   - Recovery proof-point exists at packages/owletto/deploy/k8s/apps/lobu/base/db-recovery.yaml (used on 2026-03-15 after the Reddit re-sync incident)

The baseline header now documents two layered recovery paths:
- Path A (preferred, full): apply a recovery Cluster CR with `recoveryTarget.targetTime` set to the pre-surgery timestamp captured in STEP 0. CNPG fetches the latest base backup + replays WAL up to that time. Flip the app's DATABASE_URL to point at the recovered cluster.
- Path B (lightweight, just the squash): restore the ledger CSV + rename the backup tables back + ADD COLUMN + UPDATE from snapshot tables.

End-to-end test (manual; not automated since requires Docker):
- Boot pgvector/pgvector:pg16, apply old 82 migrations → snapshot A.
- Run surgery on A → snapshot A' (post-surgery).
- Boot fresh pgvector, apply new baseline → snapshot B.
- Diff canonical (non-d20260519, non-dropped_*) schemas between A' and B:
  * Identical column-by-column for all 67+ canonical tables
  * agents.skill_auto_granted_domains absent in both
  * runs.retry_delay_seconds absent in both
  * schema_migrations contains only '00000000000000' in both
- A' additionally has 8 backup tables (renamed originals + 2 column snapshots). B doesn't have them by design.

[squash-baseline]

CI integration job failed on two tests that hard-coded paths to files the baseline squash removed:

1. `packages/server/src/__tests__/integration/embedded-schema-patches.test.ts`
   - Imported `EMBEDDED_SCHEMA_PATCHES` from the now-deleted `db/embedded-schema-patches.ts`. Test was a unit-shape verifier for that file; with the file gone (embedded path now runs migrations the same way prod does), the test has nothing to verify.
2. `packages/server/src/__tests__/integration/identity/founder-to-member-migration.test.ts`
   - Read `db/migrations/20260427170000_market_founder_to_member.sql` directly (extracted -- migrate:up section, re-ran it manually to check idempotency). The migration is now collapsed into the baseline; idempotency of one historical migration is no longer a meaningful unit boundary.

Both tests deleted.

Also updated two stale doc-comments in `packages/server/src/gateway/{auth/revoked-token-store,connections/state-adapter}.ts` that pointed at the deleted forward-delta migrations + the deleted `embedded-schema-patches.ts`. Both now point at the baseline.

[squash-baseline]

Codex review of #908 flagged three real issues. All three fixed:

1. (HIGH) PITR recovery doc was wrong about the ledger state. The recovered DB has the OLD 82 ledger rows but not '00000000000000'. When the new image's dbmate-up runs against it, it sees the baseline as pending and tries CREATE TABLE against existing tables, which errors. Rewrote recovery path A to spell out the two reconciliation choices:
   (a) revert to old image then repoint (safest: old image expects 82 ledger rows; new recovered DB has them);
   (b) keep new image and manually DELETE FROM schema_migrations; INSERT … VALUES ('00000000000000'); on the recovered DB before any new-image migration job runs.
   Without (b), the new image fails on first boot against the recovered DB.
2. (HIGH) agents column snapshot missed the composite primary key. agents_pkey is (organization_id, id); the snapshot stored only `id AS agent_id`, and the rollback UPDATE matched only `a.id = b.agent_id`. When the same id is reused across orgs (the PK contract permits it), restoration would target a wrong row nondeterministically. Snapshot now includes organization_id; the rollback UPDATE joins on both. runs.PK is just (id) so the runs snapshot is unchanged.
3. (MEDIUM) pg_dump emitted SELECT pg_catalog.set_config('search_path', '', false) which is session-scoped. Subsequent forward migrations using unqualified names would fail under CI's `dbmate up` (which doesn't reset between files). Changed `false` → `true` (transaction-scoped) with a comment explaining why.

Re-verified Container B (fresh DB + baseline) applies in 1.7s, 68 tables. Surgery logic in the header was edited in-place (it's a comment block, not an executable statement), so the change is doc-only on the apply side; the script you'd paste at surgery time is now the correct one.

[squash-baseline]

buremba · 2026-05-19T04:21:09Z

Codex review (high-effort) — applied (commit `e3dad80`)

Three findings, all fixed before merge.

| # | Severity | Finding | Fix |
| --- | --- | --- | --- |
| 1 | HIGH | PITR recovery doc was wrong about the ledger. Recovered DB has the OLD 82 ledger rows but not `'00000000000000'`. New image's `dbmate up` would treat the baseline as pending → CREATE TABLE against existing tables → error. | Rewrote recovery path A. Two reconciliation choices now spelled out: (a) revert to old image then repoint (safest), (b) keep new image and insert baseline ledger row before any new-image migration job. |
| 2 | HIGH | `agents` snapshot missed the composite primary key. `agents_pkey` is `(organization_id, id)`; snapshot stored only `id`. Restore would clobber the wrong org's agent if `id` is reused across orgs. | Snapshot now keeps `organization_id, id, value`; rollback UPDATE joins on both. (`runs.PK = (id)` so the runs snapshot is unchanged.) |
| 3 | MEDIUM | `pg_dump` emitted `SELECT pg_catalog.set_config('search_path', '', false)`, which is session-scoped and leaks past the baseline. Later forward migrations using unqualified names would fail under CI's `dbmate up`. | Changed `false` → `true` (transaction-scoped). |

Re-verification: Container B (fresh DB + baseline) still applies cleanly in 1.7s with 68 tables.
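
Finding 2 is easier to see with data. The sketch below restores a dropped column from snapshot rows keyed on the full composite key `(organization_id, id)`, so two organizations that reuse the same agent id each get their own value back. It is an in-memory illustration, not the rollback SQL from the baseline header; the row shapes and the names `compositeKey` and `restoreFromSnapshot` are hypothetical.

```typescript
// Hypothetical row shapes for the parent table and its column snapshot.
interface AgentRow {
  organizationId: string;
  id: string;
  skillAutoGrantedDomains?: unknown;
}

interface AgentSnapshotRow {
  organizationId: string;
  id: string;
  value: unknown;
}

// Encode both key parts; JSON keeps ("a", "b:c") distinct from ("a:b", "c").
function compositeKey(organizationId: string, id: string): string {
  return JSON.stringify([organizationId, id]);
}

// Restore the column by matching on the full composite primary key.
function restoreFromSnapshot(agents: AgentRow[], snapshot: AgentSnapshotRow[]): AgentRow[] {
  const byKey = new Map<string, unknown>();
  for (const row of snapshot) {
    byKey.set(compositeKey(row.organizationId, row.id), row.value);
  }
  return agents.map((agent) => {
    const key = compositeKey(agent.organizationId, agent.id);
    return byKey.has(key)
      ? { ...agent, skillAutoGrantedDomains: byKey.get(key) }
      : agent;
  });
}
```

Keying the map on `id` alone would let the second organization's snapshot row overwrite the first in the lookup, which is the in-memory analogue of the wrong-row update described in the finding.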

Direct codex confirmations:

- runMigrations() in start-local.ts correctly handles the post-surgery prod DB (with only '00000000000000' in schema_migrations).
- Fresh PGlite bootstrap path works.
- Surgery sequence is data-preserving after the agents composite-PK fix.

…earing

CI integration job caught what the audit missed: `runs.retry_delay_seconds` is heavily used by RunsQueue, not "comments only" as the audit reported.

Concrete references in packages/server/src/gateway/infrastructure/queue/runs-queue.ts:
- L301: `const retryDelaySeconds = options?.retryDelay ?? null;`
- L368: `retry_delay_seconds,` (INSERT column list)
- L386: `retryDelaySeconds,` (INSERT value binding)
- L575: type signature exposes `retryDelaySeconds: number | null;`
- L584: SQL projection types `retry_delay_seconds: number | string | null`
- L603: `RETURNING r.id, r.action_input, r.attempts, r.max_attempts, r.retry_delay_seconds`
- L617-620: post-claim value extraction

Audit's snake-case grep returned 7 hits "in comments/type hints", but the file genuinely uses the snake_case column in SQL strings AND the camelCase JS binding for the same data. The miss was a methodology blind spot the audit also had on agents.soulMd / nixConfig (caught earlier by spot-check).

The CI failure was 6 RunsQueue integration tests all failing with:

    PostgresError: column "retry_delay_seconds" of relation "runs" does not exist

Fix:
- Re-add `retry_delay_seconds integer,` to the runs CREATE TABLE block in the baseline (between `expires_at` and the constraints, matching origin/main's schema.sql).
- Remove the runs.retry_delay_seconds drop from the surgery script (it stays in prod; nothing to surgery).
- Remove the runs_d20260519_retry_delay_seconds snapshot table from the rollback section.
- Update the docstring drop list.

agents.skill_auto_granted_domains stays dropped; the audit was correct about that one (verified with spot-check earlier).

Re-verified Container B (fresh DB + baseline) applies in 1.9s. The `runs` table now has the column; SELECT confirms it.

[squash-baseline]

buremba added 4 commits May 19, 2026 04:50

buremba marked this pull request as ready for review May 19, 2026 04:33

buremba merged commit 54207a8 into main May 19, 2026
19 of 20 checks passed

buremba deleted the chore/db-squash-baseline branch May 19, 2026 04:33

buremba merged 6 commits into main from chore/db-squash-baseline
