Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 187 additions & 0 deletions docs/MIGRATIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
# Migrations

dbmate-managed SQL under `db/migrations/`. Names are `<UTC-yyyymmddHHMMSS>_<slug>.sql` with a `-- migrate:up` / `-- migrate:down` split. Pre-upgrade Helm hook runs `dbmate up` against the prod database before the app rolls.

Most migrations are routine: an additive column, a small index, a view bump. The risk lives in the ones that **rewrite a hot table** or **rebuild a large index**. Get those wrong and prod stalls under `ACCESS EXCLUSIVE` until the database's `statement_timeout` fires, after which dbmate exits non-zero, the migration leaves no row in `schema_migrations`, and the app deploys forward into a schema it expects but doesn't have. That's the 2026-05-16 outage in one sentence ([PR #767](https://github.com/lobu-ai/lobu/pull/767)).

The rules below are what we wish #765 had followed.

---

## Always

- **Read what your DDL actually does.** Postgres has three speeds: `O(1)` metadata flip, `O(rows)` table rewrite under `ACCESS EXCLUSIVE`, and `O(rows)` write with regular row locks. Know which one you wrote.
- **State the operational cost in the migration body** as a comment, in seconds for the real table size, not "should be fast." Cite the row count you tested on. If you didn't test on the real size, say so.
- **Keep migrations idempotent on the up-side** when practical (`CREATE INDEX IF NOT EXISTS`, `ALTER TABLE … ADD COLUMN IF NOT EXISTS`). dbmate doesn't retry, but a half-applied transaction recovered manually is easier when the SQL is re-runnable.

---

## Watch out for

### `ADD COLUMN ... GENERATED ALWAYS AS (...) STORED`

This rewrites every row to materialize the generated value. On the 1.15M-row `events` table the rewrite took >60s and tripped the DB's `statement_timeout`. Do not use `GENERATED STORED` on any table over ~100k rows during a Helm hook.

**Safe pattern:**

```sql
-- migrate:up

-- Step 1: nullable column, no backfill, no rewrite. O(1).
ALTER TABLE events ADD COLUMN IF NOT EXISTS search_tsv tsvector;

-- Step 2: trigger so new rows get the value (fast inserts; trigger overhead
-- is the cost of doing this incrementally).
CREATE OR REPLACE FUNCTION events_set_search_tsv() RETURNS trigger AS $$
BEGIN
NEW.search_tsv :=
setweight(to_tsvector('english', COALESCE(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(NEW.payload_text, '')), 'B');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_events_set_search_tsv ON events;
CREATE TRIGGER trg_events_set_search_tsv
BEFORE INSERT OR UPDATE OF title, payload_text ON events
FOR EACH ROW EXECUTE FUNCTION events_set_search_tsv();
```

Then **backfill in a separate migration / script** that runs in batches outside the Helm hook (e.g. a `BATCH_BACKFILL=1`-gated boot path or a `scripts/backfill-*.sh` runbook):

```sql
UPDATE events SET search_tsv =
setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(payload_text, '')), 'B')
WHERE id BETWEEN $START AND $END AND search_tsv IS NULL;
```

Pace ~10k rows per batch with a sleep between batches; each batch is its own transaction so VACUUM can keep up.

### `CREATE INDEX` on a large table

Plain `CREATE INDEX` takes a `SHARE` lock on the table — reads still work, but **writes block** for the duration of the build. On `events` that's seconds of stalled INSERTs.

**Safe pattern:** `CREATE INDEX CONCURRENTLY`. Multiple passes, but only takes `SHARE UPDATE EXCLUSIVE` so reads + writes continue.

```sql
-- CONCURRENTLY does not run inside a transaction. dbmate runs each .sql
-- file in a single implicit transaction unless you opt out, so:
--
-- migrate:up transaction:false
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_search_tsv
ON events USING gin (search_tsv);
```

If the migration *must* be one file with the column-add too, the column-add stays in a normal transaction-wrapped migration and the index is a follow-up `transaction:false` migration.

**`CONCURRENTLY` + `IF NOT EXISTS` trap:** if a previous `CREATE INDEX CONCURRENTLY` failed mid-build (timeout, deadlock, connection dropped), Postgres leaves an **invalid** index behind. The index *name* exists, so the next `CREATE INDEX CONCURRENTLY IF NOT EXISTS` skips the rebuild — and now your "successful" deploy is using a half-built index that the planner refuses to read from. Before retrying, check:

```sql
SELECT i.relname, x.indisvalid
FROM pg_class i
JOIN pg_index x ON x.indexrelid = i.oid
WHERE i.relname = 'idx_events_search_tsv';
```

If `indisvalid = false`, drop it with `DROP INDEX CONCURRENTLY idx_events_search_tsv;` (also `transaction:false`) and retry the create.

### `ALTER TABLE ... ALTER COLUMN ... SET NOT NULL`

Postgres has to verify every row, scanning the whole table under `ACCESS EXCLUSIVE`. On a hot table this stalls.

**Safe pattern (PG 12+):** add a `CHECK (col IS NOT NULL) NOT VALID` first, run `ALTER TABLE … VALIDATE CONSTRAINT` (only takes `SHARE UPDATE EXCLUSIVE`), then `ALTER COLUMN SET NOT NULL` becomes `O(1)` because the constraint already proves the property.

```sql
ALTER TABLE events ADD CONSTRAINT events_title_not_null
CHECK (title IS NOT NULL) NOT VALID;
ALTER TABLE events VALIDATE CONSTRAINT events_title_not_null;
ALTER TABLE events ALTER COLUMN title SET NOT NULL;
ALTER TABLE events DROP CONSTRAINT events_title_not_null;
```

### Cascading `ON DELETE SET NULL` / `ON DELETE CASCADE` on hot tables

A single `DELETE FROM connections WHERE id IN (...)` triggers an internal `UPDATE events SET connection_id = NULL WHERE connection_id IN (...)`. For a connection with 92k events that's a 13s blocking UPDATE — visible to ops as "the admin action hung the API."

**Safe pattern:** keep cascades for *small* parent tables only. For parents that fan out to events, watcher_window_events, or anything else 100k+ rows wide:

1. Make sure the **child FK column is indexed** (e.g. `idx_events_connection_id`) — without it, the cascade falls back to a seq scan and the UPDATE goes from 13s to minutes.
2. Manage the dependent nulling in application code with batched UPDATEs (e.g. 1k rows per loop, sleep between).
3. Run the parent `DELETE` only **after** the child rows are already nulled/orphaned — the cascade then has nothing to do and the delete is fast.

Indexing alone helps the cascade, but it doesn't eliminate the per-row WAL write; batching before the delete is what keeps the API responsive.

### Bare `DROP INDEX`

Takes `ACCESS EXCLUSIVE`. Use `DROP INDEX CONCURRENTLY` (also `transaction:false`).

---

## Operational checklist for any "big" migration

If the migration touches a table over ~100k rows or rebuilds a GIN/ivfflat index, before you open the PR:

1. **Time it locally against a copy of prod-sized data**, not a 1k-row dev DB. We have `psql "$DATABASE_URL" -c '\COPY ...'` paths to clone a single table for this purpose.
2. **Write the actual observed timing in the migration's header comment**, not a guess.
3. **Compare against the DB's `statement_timeout`** for the migration role. Prod is 60s. If your migration is close, restructure — don't ship hoping it'll fit.
4. **State the recovery plan** in the PR body: if the migration times out, what's the manual `SET statement_timeout=0; \i path/to/file` recipe?
5. **Verify the new app image still boots against an un-applied schema** by running it locally with the migration held back. With the boot-time schema-version assertion from [#767](https://github.com/lobu-ai/lobu/pull/767), it should refuse to start — that's good. The point is to confirm the refusal is loud and clean.

---

## When dbmate fails in prod

Symptoms: `summaries-app-lobu-migrate-*` pod in `Error` state, app pod in `CrashLoopBackOff` or stuck in a tight error loop. Confirm the schema state:

```sh
DB="$(... pull DATABASE_URL from secret ...)"
psql "$DB" -tAc "SELECT max(version) FROM schema_migrations"
psql "$DB" -tAc "SELECT 1 FROM information_schema.columns WHERE table_name='<table>' AND column_name='<col>'"
```

If a migration left no `schema_migrations` row, the safe recovery is to apply it manually with the cap lifted. Two paths depending on whether the migration uses `CONCURRENTLY`:

**Standard migration (no CONCURRENTLY):** one transaction with the timeouts lifted.

```sh
psql "$DB" -v ON_ERROR_STOP=1 <<'SQL'
BEGIN;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Split recovery for transaction:false migrations

When the failed migration is one of the CREATE INDEX CONCURRENTLY/DROP INDEX CONCURRENTLY migrations recommended above, pasting its migrate:up section inside this BEGIN block will fail because PostgreSQL rejects concurrent index operations inside a transaction block. That makes the prod recovery recipe unusable for exactly the large-index migrations this document tells authors to write; call out that transaction:false migrations must be replayed without BEGIN/COMMIT and then record the schema version separately.

Useful? React with 👍 / 👎.

SET LOCAL statement_timeout = 0;
SET LOCAL lock_timeout = 0;
-- paste the migrate:up section here
INSERT INTO public.schema_migrations(version) VALUES ('<UTC-yyyymmddHHMMSS>');
COMMIT;
SQL
```

**`transaction:false` migration (any `CREATE/DROP INDEX CONCURRENTLY` or `REINDEX CONCURRENTLY`):** these can't run inside `BEGIN`. Use session-level `SET` and run statements one at a time. Verify success between each, then record the migration:

```sh
psql "$DB" <<'SQL'
SET statement_timeout = 0;
SET lock_timeout = 0;

-- one statement at a time, no BEGIN
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_foo ON bar (baz);
-- verify the index is valid before continuing:
SELECT indisvalid FROM pg_index WHERE indexrelid = 'idx_foo'::regclass;
-- if false: DROP INDEX CONCURRENTLY idx_foo; and rerun the create above.

-- once everything is healthy, record the migration in its own tiny txn:
BEGIN;
INSERT INTO public.schema_migrations(version) VALUES ('<UTC-yyyymmddHHMMSS>');
COMMIT;
SQL
```

Then `kubectl rollout restart deploy/<release>-lobu-app` to wipe cached prepared statements.

If the migration partially applied (column created, index missing, view missing), figure out which subset succeeded with `\d <table>` and run only the missing pieces.

---

## See also

- [PR #767](https://github.com/lobu-ai/lobu/pull/767) — the boot-time schema-version assertion that catches behind-DB images at startup.
- The post-incident commentary on the original migration ([#765](https://github.com/lobu-ai/lobu/pull/765)) for what "operational note" looks like when a migration's risks are flagged but not addressed.
Loading