feat(gateway): mirror assistant_ingress_invites schema in gateway DB#29890
Conversation
… as transport loss Codex review on #29886 caught that assistantDbTransaction() treats every non-success response from ipcCallAssistant() as a transport failure and throws 'assistant may not be ready'. But ipcCallAssistant() also returns undefined on handler-level errors (msg.error path) — including SQL constraint violations. Real transaction errors were being misclassified as assistant unavailability, masking the underlying SQL error and breaking debuggability + retry decisions. Fix: - assistant: db_proxy_transaction now wraps caught errors in RouteError with statusCode 500 + DB_PROXY_TRANSACTION_FAILED code, preserving the original SQL message. Empty-steps validation throws a 400 RouteError (INVALID_PARAMS) instead of a bare Error. - gateway: assistantDbTransaction() switches to ipcCallAssistantStrict so handler errors come through as IpcHandlerError (carrying the SQL message + statusCode) and transport errors as IpcTransportError. - Tests updated to assert RouteError shape + that the original SQL constraint message survives wrapping.
Add the assistant_ingress_invites table to the gateway DB so future PRs
can migrate invite redemption (currently in invite-redemption-service.ts)
into the gateway process. With this schema in place plus contacts /
contact_channels already gateway-side, voice-invite redemption can
upsert the contact channel and record invite use in a single LOCAL
gateway-DB transaction instead of round-tripping the assistant DB.
Schema mirrors assistant/src/memory/schema/contacts.ts:assistantIngressInvites
exactly, with two additions:
* contactId now has a real FK on contacts.id (ON DELETE CASCADE).
The assistant-side schema only has the column without a reference
because contacts live in a different DB on that side.
* Four indexes covering the active query patterns from invite-store.ts:
- idx_..._voice_lookup (source_channel, status, expected_external_user_id)
for findActiveVoiceInvites
- idx_..._code_lookup (invite_code_hash, source_channel)
for findByInviteCodeHash
- idx_..._token_hash (token_hash) for findByTokenHash
- idx_..._contact (contact_id) for contact-scoped lookups
No data migration: voice invites have a TTL of 24h, so aging out during
the cutover window is acceptable. The table starts empty in the gateway
DB; reads/writes will continue going to the assistant DB until follow-up
PRs migrate the invite-creation and invite-redemption code paths.
Schema test verifies columns, NOT NULL constraints, defaults
(max_uses=1, use_count=0, status='active'), FK cascade behaviour, and
index column composition.
| // 6-digit code redemption: findByInviteCodeHash filters by | ||
| // (invite_code_hash, source_channel, status='active'). | ||
| index("idx_assistant_ingress_invites_code_lookup").on( | ||
| table.inviteCodeHash, | ||
| table.sourceChannel, | ||
| ), |
There was a problem hiding this comment.
🚩 Code redemption index may not fully cover the query pattern
The comment at gateway/src/db/schema.ts:179-180 says findByInviteCodeHash filters by (invite_code_hash, source_channel, status='active'), but the index idx_assistant_ingress_invites_code_lookup at lines 181-184 only covers (invite_code_hash, source_channel) — it does not include status. This means queries with a status='active' filter will need to do a post-index lookup for the status column. This is a minor inefficiency rather than a correctness bug, and may be intentional to keep the index smaller since most invites will be active during their short TTL.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Tests for schemas are ridiculous and useless lol - please remove
| // being collapsed into a generic transport failure. The previous | ||
| // ipcCallAssistant() based implementation masked real SQL errors as | ||
| // "assistant may not be ready", driving incorrect retry/fallback decisions. | ||
| const result = await ipcCallAssistantStrict("db_proxy_transaction", { |
There was a problem hiding this comment.
Is there any harm in ipcCallAssistant always being strict? not seeing the value of having both
There was a problem hiding this comment.
Fair point — the non-strict variant is mostly a footgun, lets callers silently no-op when the assistant is down/throwing.
Surveyed the call sites:
handle-inbound.ts(db_proxy in inbound pipeline) — should fail loudvoice-approval-sync.ts(db_proxy in approval sync) — should fail loudcontact-prompt.ts,contacts-control-plane-proxy.ts(resolve_contact_prompt, emit_event) — should fail loudpost-assistant-ready.ts(health check at startup) — currently lenient on purpose so startup can recoverroute-schema-cache.ts(get_route_schema) — defensive fallback
5/6 sites would be improved by going strict. Out of scope for this PR, but I'll open a follow-up that renames ipcCallAssistant → ipcCallAssistantOptional (or similar) to make the footgun explicit, migrates the 5 callers to strict, and leaves post-assistant-ready on the optional path with a comment.
There was a problem hiding this comment.
I was wrong — there isn't a real reason for the lenient form here either. Walked through waitForAssistant again:
while (Date.now() < deadline) {
const result = await ipcCallAssistant("health");
if (result !== undefined) return true;
await sleep(2000);
}The lenient form treats "assistant down" and "health handler threw" identically — both yield undefined, both get retried. With strict, the poll loop wraps in try/catch on IpcTransportError (expected, retry) and lets handler errors propagate (a thrown health handler is a real bug we shouldn't paper over). That's strictly better than today.
So: every caller can be strict. Going to roll the rename + migration into the same follow-up PR — ipcCallAssistant becomes the only variant (signature and semantics of today's Strict), and ipcCallAssistantStrict gets dropped. No optional path. Will open after this PR merges.
There was a problem hiding this comment.
Let's then not introduce ipcCallAssistantStrict and just use ipcCallAssistant
There was a problem hiding this comment.
Done in cabc2cd542. ipcCallAssistantStrict removed, ipcCallAssistant is now strict-by-default — throws IpcHandlerError (preserves statusCode + code) or IpcTransportError. Migrated every caller in this PR:
assistantDbTransactioncalls it directlyipc-runtime-proxyalready used the strict shape, just renamedcontacts-control-plane-proxyemit_eventbecomesvoid ipcCallAssistant(…).catch(() => {})(fire-and-forget)handle-inboundtouchContactChannelStatscasts result + checksresult.rows?.length; outer caller still fire-and-forgetvoice-approval-syncsame shapecontact-promptresolves errors via a newnotifyDaemonResolveErrorhelper that wraps the IPC in try/catch so resolve failures don't reject the HTTP responsepost-assistant-readywaitForAssistantswallowsIpcTransportErrorduring health polling and rethrows everything else
Tests updated to assert rejects.toBeInstanceOf(IpcTransportError) and a new IpcHandlerError propagation test. 43/43 pass on the affected suites.
| export const assistantIngressInvites = sqliteTable( | ||
| "assistant_ingress_invites", |
There was a problem hiding this comment.
Rename the table on this side to just be ingress_invites
There was a problem hiding this comment.
Done — assistant_ingress_invites → ingress_invites (drizzle export ingressInvites), all indexes renamed to idx_ingress_invites_* in 4e22952.
| id: text("id").primaryKey(), | ||
| sourceChannel: text("source_channel").notNull(), | ||
| tokenHash: text("token_hash").notNull(), | ||
| sourceConversationId: text("source_conversation_id"), |
There was a problem hiding this comment.
Remove conversation id, this is a daemon only concept
There was a problem hiding this comment.
Done — source_conversation_id removed in 4e22952.
| voiceCodeHash: text("voice_code_hash"), | ||
| voiceCodeDigits: integer("voice_code_digits"), |
There was a problem hiding this comment.
Remove all voice related columns - we want to keep this table channel agnostic
There was a problem hiding this comment.
Done in 4e22952 — removed expected_external_user_id, voice_code_hash, voice_code_digits (and the voice_lookup index). For voice redemption to keep working post-cutover, the equivalent lookup ("which active invite is this caller bound to?") will have to go through contact_channels.external_user_id → contact_id → invite. That's a redemption-code-path change, deferred to the follow-up PR that migrates redeemVoiceInvite to the gateway.
| // Display metadata for personalized voice prompts (nullable). | ||
| friendName: text("friend_name"), | ||
| guardianName: text("guardian_name"), | ||
| contactId: text("contact_id") |
There was a problem hiding this comment.
We can remove these and rely on the contacts table
There was a problem hiding this comment.
Done — friend_name and guardian_name removed in 4e22952. Voice prompts that need a display name will read it from contacts.display_name (already gateway-side) via the contact_id FK.
| { | ||
| id: text("id").primaryKey(), | ||
| sourceChannel: text("source_channel").notNull(), | ||
| tokenHash: text("token_hash").notNull(), |
There was a problem hiding this comment.
What's the difference between this column and invite_code_hash?
There was a problem hiding this comment.
Two different things, both kept (with comments now in the schema):
token_hash— hash of the long opaque token in the invite URL (e.g.vellum.me/i/<token>). Always set; every invite has a URL form.invite_code_hash— hash of the optional short human-typeable code (typically 6 digits). Nullable; not every invite is issued with a typeable fallback.
They serve different redemption flows (URL click vs typed code) and can both be set on the same invite when it supports both forms. We could collapse to one hash column with a code_format discriminator, but that would force each invite to a single redemption form — breaking the case where one invite has both a sharable URL and a typeable fallback. Let me know if you'd rather collapse anyway.
There was a problem hiding this comment.
Looked at the create code (invite-service.ts lines 211-215, invite-store.ts lines 60-67). Today they're two independent secrets, NOT reformats:
rawToken= 32 random bytes, base64url → 43 chars. Stored astokenHash. Always set.inviteCode= 6 digits fromgenerateVoiceCode(6). Stored asinviteCodeHash. Set for non-voice invites.
The URL embeds the 43-char token; the 6-digit code is shown separately. Possessing one doesn't give you the other. The design intent seems to be "two parallel credentials for the same invite" — leak one, the other is still secret.
I think your instinct is right that this is over-engineered. A cleaner design: one secret per invite, presented in whatever form the channel needs. Either:
- Short secret (6 digits) embedded in URL —
vellum.me/i/847291. Brute-forceable in isolation, so add per-invite rate limit + short TTL + channel scope to mitigate. Droptoken_hash, keepinvite_code_hash. - Long secret (43 chars) for unguessable URLs, no typeable fallback. Drop
invite_code_hash, keeptoken_hash. Voice/text typeable flows would need a different mechanism (e.g. generate a short scoped lookup key on demand).
I'd lean toward option 1 (short secret + rate limit) since the typed form is what makes voice and text-channel invites work at all, and rate-limiting per-invite is cheap. But this is a redesign that touches the creation/redemption code, not just schema. Want me to keep the schema as-is here and queue the unification as a separate PR after #29890 + the redemption-migration PR? Or collapse to one column now and we accept that the redemption code still writes token_hash as a no-op until the unification lands?
There was a problem hiding this comment.
We'll do unification in the future as we migrate logic over to the gateway. for now, let's keep the short secret.
There was a problem hiding this comment.
Done — dropped token_hash column and idx_ingress_invites_token_lookup, kept invite_code_hash as the short secret. Two indexes remain: idx_ingress_invites_code_lookup on (invite_code_hash, source_channel) and idx_ingress_invites_contact on contact_id. In e952b09c0b.
- Rename table assistant_ingress_invites -> ingress_invites; drop the 'assistant_' prefix on the gateway side. Indexes renamed to match. - Drop source_conversation_id (daemon-only concept; conversations don't exist in the gateway). - Drop voice-specific columns (expected_external_user_id, voice_code_hash, voice_code_digits) and the voice_lookup index. Make the table channel-agnostic — voice flows can locate the right invite via contact_channels.external_user_id + contact_id rather than embedded columns here. - Drop friend_name and guardian_name; rely on contacts.display_name. - Remove the schema test file. Schema tests against drizzle definitions re-state the schema in another shape and don't catch real bugs. Survives: id, source_channel, token_hash (URL token), invite_code_hash (optional short typeable code), note, max_uses/use_count, expires_at, status, redeemed_by_*, contact_id (FK CASCADE), created_at, updated_at.
| // Sits in the gateway DB so future PRs can move invite redemption | ||
| // (currently in assistant/src/runtime/invite-redemption-service.ts) into | ||
| // the gateway process. With contacts/contact_channels already gateway-side, | ||
| // invite redemption can do a single LOCAL gateway-DB transaction | ||
| // (contact-channel upsert + invite use record) instead of round-tripping | ||
| // through the assistant DB. | ||
| // | ||
| // The table is empty at install time; it begins receiving writes when the | ||
| // invite-creation and invite-redemption code paths are migrated. No data | ||
| // migration from the assistant DB — voice invites are short-lived (TTL | ||
| // ≤24h) and aging out is acceptable during the cutover window. | ||
| // | ||
| // Channel-agnostic by design. Channel-specific lookup (e.g. matching a | ||
| // caller to their pending invite) goes through contact_channels rather | ||
| // than embedded columns here. |
There was a problem hiding this comment.
don't need any of these comments
There was a problem hiding this comment.
Done — comments stripped on ingressInvites (header + per-column) in e952b09c0b.
| { | ||
| id: text("id").primaryKey(), | ||
| sourceChannel: text("source_channel").notNull(), | ||
| tokenHash: text("token_hash").notNull(), |
There was a problem hiding this comment.
We'll do unification in the future as we migrate logic over to the gateway. for now, let's keep the short secret.
| // being collapsed into a generic transport failure. The previous | ||
| // ipcCallAssistant() based implementation masked real SQL errors as | ||
| // "assistant may not be ready", driving incorrect retry/fallback decisions. | ||
| const result = await ipcCallAssistantStrict("db_proxy_transaction", { |
There was a problem hiding this comment.
Let's then not introduce ipcCallAssistantStrict and just use ipcCallAssistant
Per review: ingressInvites only ever held tokenHash transiently before the redemption path moved to gateway-managed bindings. Now that the schema reflects gateway ownership, only inviteCodeHash is needed. - Remove tokenHash column and idx_ingress_invites_token_lookup - Strip header/inline comments on the ingressInvites table - Keep idx_ingress_invites_code_lookup on (inviteCodeHash, sourceChannel) and idx_ingress_invites_contact on contactId
…ct variant
Per review: the lenient ipcCallAssistant was a transitional shim. Every
call site either (a) relied on the strict variant already, or (b) fired
and forgot — neither case wants silent error swallowing. This replaces
the lenient overload with a single strict implementation that throws
either IpcHandlerError (preserving statusCode + code) or
IpcTransportError, and migrates every caller in this PR.
Caller migrations:
- assistantDbTransaction now calls ipcCallAssistant directly (no Strict
alias)
- ipc-runtime-proxy uses ipcCallAssistant (was already effectively
strict)
- contacts-control-plane-proxy emit_event becomes
void ipcCallAssistant(...).catch(() => {}) since fire-and-forget
- handle-inbound touchContactChannelStats casts result and checks
result.rows?.length; outer caller still fire-and-forget
- voice-approval-sync follows the same shape
- contact-prompt resolves errors through a new
notifyDaemonResolveError helper that wraps the IPC in try/catch so
resolve failures don't reject the HTTP response
- post-assistant-ready waitForAssistant catches IpcTransportError
during health polling but rethrows everything else
Tests:
- assistant-client.test.ts: replace 'returns undefined' tests with
rejects.toBeInstanceOf(IpcTransportError); add IpcHandlerError test
covering message/statusCode/code propagation; gate
ASSISTANT_IPC_SOCKET_DIR in setup so tests don't hit the real daemon
- ipc-runtime-proxy.test.ts: collapse dual mocks into a single
ipcCallAssistantMock with a defaultIpcImpl
Also fix lint import sort in
assistant/src/ipc/routes/db-proxy-transaction.ts.
Summary
PR 2 of 2 for moving invite redemption out of the assistant daemon and into the gateway. The first PR (#29886) added the
db_proxy_transactionIPC route — this one mirrors theassistant_ingress_invitestable into the gateway DB so future PRs can migrate the invite-creation and redemption code paths into a local gateway-DB transaction without round-tripping the assistant DB.What's in this PR
Codex follow-up from #29886
The first commit incorporates Codex's P2 comment from #29886 —
assistantDbTransaction()was masking SQL handler errors as transport loss becauseipcCallAssistantreturnsundefinedfor both:Now uses
ipcCallAssistantStrictso handler errors come through asIpcHandlerErrorwith the original SQL message + statusCode, and transport errors asIpcTransportError. The assistant route wraps caught SQL/runtime errors in aRouteErrorwithstatusCode 500and codeDB_PROXY_TRANSACTION_FAILED, and rejects empty step lists with a 400RouteError(codeINVALID_PARAMS). Tests assert the original SQL constraint message survives the wrapping.assistant_ingress_invitesin the gateway DBThe schema mirrors
assistant/src/memory/schema/contacts.ts:assistantIngressInvitesexactly, with two additions:contactIdnow has a real FK oncontacts.id(ON DELETE CASCADE). The assistant-side schema only has the column without a reference because contacts live in a different DB on that side.assistant/src/memory/invite-store.ts:idx_..._voice_lookup (source_channel, status, expected_external_user_id)— coversfindActiveVoiceInvitesidx_..._code_lookup (invite_code_hash, source_channel)— coversfindByInviteCodeHashidx_..._token_hash (token_hash)— coversfindByTokenHashidx_..._contact (contact_id)— covers contact-scoped lookupsWhat's NOT in this PR
pushSQLiteSchemaflow handles fresh table creation; the data-migrations registry is reserved for one-shot data movements.Tests
gateway/src/__tests__/assistant-ingress-invites-schema.test.ts(5 tests, 57 expects): columns / NOT NULL / defaults / FK cascade / index column composition.assistant/src/__tests__/db-proxy-transaction.test.ts(6 tests, 28 expects): regression check for the Codex fix — original SQL message + statusCode survive error wrapping.push-schema-no-prompt+auto-approve-thresholds).Follow-ups
After this lands, in approximate order:
redeemVoiceInviteto the gateway, replacing the cross-DBdb_proxyround-trip with a local gateway transaction (this is wheredb_proxy_transactionfrom feat(ipc): add db_proxy_transaction for atomic gateway-orchestrated writes #29886 starts paying for itself during the cutover window).upsertContactChannelinvocation frominvite-redemption-service.tsfor the phone case (consumer ci: add web and platform CI/PR workflows #4 in ATL-291).