Skip to content

feat(gateway): mirror assistant_ingress_invites schema in gateway DB#29890

Merged
dvargasfuertes merged 5 commits into
mainfrom
apollo/invite-table-gateway
May 8, 2026
Merged

feat(gateway): mirror assistant_ingress_invites schema in gateway DB#29890
dvargasfuertes merged 5 commits into
mainfrom
apollo/invite-table-gateway

Conversation

@vellum-apollo-bot
Copy link
Copy Markdown
Contributor

@vellum-apollo-bot vellum-apollo-bot Bot commented May 7, 2026

Summary

PR 2 of 2 for moving invite redemption out of the assistant daemon and into the gateway. The first PR (#29886) added the db_proxy_transaction IPC route — this one mirrors the assistant_ingress_invites table into the gateway DB so future PRs can migrate the invite-creation and redemption code paths into a local gateway-DB transaction without round-tripping the assistant DB.

What's in this PR

Codex follow-up from #29886

The first commit incorporates Codex's P2 comment from #29886assistantDbTransaction() was masking SQL handler errors as transport loss because ipcCallAssistant returns undefined for both:

  • the assistant being unreachable (transport), and
  • the route handler throwing (handler error)

Now uses ipcCallAssistantStrict so handler errors come through as IpcHandlerError with the original SQL message + statusCode, and transport errors as IpcTransportError. The assistant route wraps caught SQL/runtime errors in a RouteError with statusCode 500 and code DB_PROXY_TRANSACTION_FAILED, and rejects empty step lists with a 400 RouteError (code INVALID_PARAMS). Tests assert the original SQL constraint message survives the wrapping.

assistant_ingress_invites in the gateway DB

The schema mirrors assistant/src/memory/schema/contacts.ts:assistantIngressInvites exactly, with two additions:

  • contactId now has a real FK on contacts.id (ON DELETE CASCADE). The assistant-side schema only has the column without a reference because contacts live in a different DB on that side.
  • Four indexes covering active query patterns from assistant/src/memory/invite-store.ts:
    • idx_..._voice_lookup (source_channel, status, expected_external_user_id) — covers findActiveVoiceInvites
    • idx_..._code_lookup (invite_code_hash, source_channel) — covers findByInviteCodeHash
    • idx_..._token_hash (token_hash) — covers findByTokenHash
    • idx_..._contact (contact_id) — covers contact-scoped lookups

What's NOT in this PR

  • No data migration. Voice invites have a TTL of ≤24h, so aging out during the cutover window is acceptable. The table starts empty in the gateway DB.
  • No code changes to invite-creation or invite-redemption. Those continue to read/write the assistant-side table until follow-up PRs migrate them.
  • No new gateway data migration entry. The pushSQLiteSchema flow handles fresh table creation; the data-migrations registry is reserved for one-shot data movements.

Tests

  • gateway/src/__tests__/assistant-ingress-invites-schema.test.ts (5 tests, 57 expects): columns / NOT NULL / defaults / FK cascade / index column composition.
  • assistant/src/__tests__/db-proxy-transaction.test.ts (6 tests, 28 expects): regression check for the Codex fix — original SQL message + statusCode survive error wrapping.
  • Full gateway test suite still green after rebase (sanity-checked push-schema-no-prompt + auto-approve-thresholds).

Follow-ups

After this lands, in approximate order:

  1. Migrate invite-creation writes to the gateway DB.
  2. Migrate redeemVoiceInvite to the gateway, replacing the cross-DB db_proxy round-trip with a local gateway transaction (this is where db_proxy_transaction from feat(ipc): add db_proxy_transaction for atomic gateway-orchestrated writes #29886 starts paying for itself during the cutover window).
  3. Remove the upsertContactChannel invocation from invite-redemption-service.ts for the phone case (consumer ci: add web and platform CI/PR workflows #4 in ATL-291).

Open in Devin Review

… as transport loss

Codex review on #29886 caught that assistantDbTransaction() treats every
non-success response from ipcCallAssistant() as a transport failure and
throws 'assistant may not be ready'. But ipcCallAssistant() also returns
undefined on handler-level errors (msg.error path) — including SQL
constraint violations. Real transaction errors were being misclassified
as assistant unavailability, masking the underlying SQL error and
breaking debuggability + retry decisions.

Fix:
- assistant: db_proxy_transaction now wraps caught errors in RouteError
  with statusCode 500 + DB_PROXY_TRANSACTION_FAILED code, preserving
  the original SQL message. Empty-steps validation throws a 400
  RouteError (INVALID_PARAMS) instead of a bare Error.
- gateway: assistantDbTransaction() switches to ipcCallAssistantStrict
  so handler errors come through as IpcHandlerError (carrying the SQL
  message + statusCode) and transport errors as IpcTransportError.
- Tests updated to assert RouteError shape + that the original SQL
  constraint message survives wrapping.
Add the assistant_ingress_invites table to the gateway DB so future PRs
can migrate invite redemption (currently in invite-redemption-service.ts)
into the gateway process. With this schema in place plus contacts /
contact_channels already gateway-side, voice-invite redemption can
upsert the contact channel and record invite use in a single LOCAL
gateway-DB transaction instead of round-tripping the assistant DB.

Schema mirrors assistant/src/memory/schema/contacts.ts:assistantIngressInvites
exactly, with two additions:

  * contactId now has a real FK on contacts.id (ON DELETE CASCADE).
    The assistant-side schema only has the column without a reference
    because contacts live in a different DB on that side.
  * Four indexes covering the active query patterns from invite-store.ts:
      - idx_..._voice_lookup (source_channel, status, expected_external_user_id)
        for findActiveVoiceInvites
      - idx_..._code_lookup (invite_code_hash, source_channel)
        for findByInviteCodeHash
      - idx_..._token_hash (token_hash) for findByTokenHash
      - idx_..._contact (contact_id) for contact-scoped lookups

No data migration: voice invites have a TTL of 24h, so aging out during
the cutover window is acceptable. The table starts empty in the gateway
DB; reads/writes will continue going to the assistant DB until follow-up
PRs migrate the invite-creation and invite-redemption code paths.

Schema test verifies columns, NOT NULL constraints, defaults
(max_uses=1, use_count=0, status='active'), FK cascade behaviour, and
index column composition.
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment thread gateway/src/db/schema.ts Outdated
Comment on lines +179 to +184
// 6-digit code redemption: findByInviteCodeHash filters by
// (invite_code_hash, source_channel, status='active').
index("idx_assistant_ingress_invites_code_lookup").on(
table.inviteCodeHash,
table.sourceChannel,
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Code redemption index may not fully cover the query pattern

The comment at gateway/src/db/schema.ts:179-180 says findByInviteCodeHash filters by (invite_code_hash, source_channel, status='active'), but the index idx_assistant_ingress_invites_code_lookup at lines 181-184 only covers (invite_code_hash, source_channel) — it does not include status. This means queries with a status='active' filter will need to do a post-index lookup for the status column. This is a minor inefficiency rather than a correctness bug, and may be intentional to keep the index smaller since most invites will be active during their short TTL.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests for schemas are ridiculous and useless lol - please remove

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — file removed in 4e22952.

Comment thread gateway/src/db/assistant-db-proxy.ts Outdated
// being collapsed into a generic transport failure. The previous
// ipcCallAssistant() based implementation masked real SQL errors as
// "assistant may not be ready", driving incorrect retry/fallback decisions.
const result = await ipcCallAssistantStrict("db_proxy_transaction", {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm in ipcCallAssistant always being strict? not seeing the value of having both

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — the non-strict variant is mostly a footgun, lets callers silently no-op when the assistant is down/throwing.

Surveyed the call sites:

  • handle-inbound.ts (db_proxy in inbound pipeline) — should fail loud
  • voice-approval-sync.ts (db_proxy in approval sync) — should fail loud
  • contact-prompt.ts, contacts-control-plane-proxy.ts (resolve_contact_prompt, emit_event) — should fail loud
  • post-assistant-ready.ts (health check at startup) — currently lenient on purpose so startup can recover
  • route-schema-cache.ts (get_route_schema) — defensive fallback

5/6 sites would be improved by going strict. Out of scope for this PR, but I'll open a follow-up that renames ipcCallAssistantipcCallAssistantOptional (or similar) to make the footgun explicit, migrates the 5 callers to strict, and leaves post-assistant-ready on the optional path with a comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong — there isn't a real reason for the lenient form here either. Walked through waitForAssistant again:

while (Date.now() < deadline) {
  const result = await ipcCallAssistant("health");
  if (result !== undefined) return true;
  await sleep(2000);
}

The lenient form treats "assistant down" and "health handler threw" identically — both yield undefined, both get retried. With strict, the poll loop wraps in try/catch on IpcTransportError (expected, retry) and lets handler errors propagate (a thrown health handler is a real bug we shouldn't paper over). That's strictly better than today.

So: every caller can be strict. Going to roll the rename + migration into the same follow-up PR — ipcCallAssistant becomes the only variant (signature and semantics of today's Strict), and ipcCallAssistantStrict gets dropped. No optional path. Will open after this PR merges.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's then not introduce ipcCallAssistantStrict and just use ipcCallAssistant

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in cabc2cd542. ipcCallAssistantStrict removed, ipcCallAssistant is now strict-by-default — throws IpcHandlerError (preserves statusCode + code) or IpcTransportError. Migrated every caller in this PR:

  • assistantDbTransaction calls it directly
  • ipc-runtime-proxy already used the strict shape, just renamed
  • contacts-control-plane-proxy emit_event becomes void ipcCallAssistant(…).catch(() => {}) (fire-and-forget)
  • handle-inbound touchContactChannelStats casts result + checks result.rows?.length; outer caller still fire-and-forget
  • voice-approval-sync same shape
  • contact-prompt resolves errors via a new notifyDaemonResolveError helper that wraps the IPC in try/catch so resolve failures don't reject the HTTP response
  • post-assistant-ready waitForAssistant swallows IpcTransportError during health polling and rethrows everything else

Tests updated to assert rejects.toBeInstanceOf(IpcTransportError) and a new IpcHandlerError propagation test. 43/43 pass on the affected suites.

Comment thread gateway/src/db/schema.ts Outdated
Comment on lines +141 to +142
export const assistantIngressInvites = sqliteTable(
"assistant_ingress_invites",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the table on this side to just be ingress_invites

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — assistant_ingress_invitesingress_invites (drizzle export ingressInvites), all indexes renamed to idx_ingress_invites_* in 4e22952.

Comment thread gateway/src/db/schema.ts Outdated
id: text("id").primaryKey(),
sourceChannel: text("source_channel").notNull(),
tokenHash: text("token_hash").notNull(),
sourceConversationId: text("source_conversation_id"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove conversation id, this is a daemon only concept

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — source_conversation_id removed in 4e22952.

Comment thread gateway/src/db/schema.ts Outdated
Comment on lines +158 to +159
voiceCodeHash: text("voice_code_hash"),
voiceCodeDigits: integer("voice_code_digits"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all voice related columns - we want to keep this table channel agnostic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4e22952 — removed expected_external_user_id, voice_code_hash, voice_code_digits (and the voice_lookup index). For voice redemption to keep working post-cutover, the equivalent lookup ("which active invite is this caller bound to?") will have to go through contact_channels.external_user_idcontact_id → invite. That's a redemption-code-path change, deferred to the follow-up PR that migrates redeemVoiceInvite to the gateway.

Comment thread gateway/src/db/schema.ts Outdated
Comment on lines +162 to +165
// Display metadata for personalized voice prompts (nullable).
friendName: text("friend_name"),
guardianName: text("guardian_name"),
contactId: text("contact_id")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove these and rely on the contacts table

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — friend_name and guardian_name removed in 4e22952. Voice prompts that need a display name will read it from contacts.display_name (already gateway-side) via the contact_id FK.

Comment thread gateway/src/db/schema.ts Outdated
{
id: text("id").primaryKey(),
sourceChannel: text("source_channel").notNull(),
tokenHash: text("token_hash").notNull(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this column and invite_code_hash?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two different things, both kept (with comments now in the schema):

  • token_hash — hash of the long opaque token in the invite URL (e.g. vellum.me/i/<token>). Always set; every invite has a URL form.
  • invite_code_hash — hash of the optional short human-typeable code (typically 6 digits). Nullable; not every invite is issued with a typeable fallback.

They serve different redemption flows (URL click vs typed code) and can both be set on the same invite when it supports both forms. We could collapse to one hash column with a code_format discriminator, but that would force each invite to a single redemption form — breaking the case where one invite has both a sharable URL and a typeable fallback. Let me know if you'd rather collapse anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the create code (invite-service.ts lines 211-215, invite-store.ts lines 60-67). Today they're two independent secrets, NOT reformats:

  • rawToken = 32 random bytes, base64url → 43 chars. Stored as tokenHash. Always set.
  • inviteCode = 6 digits from generateVoiceCode(6). Stored as inviteCodeHash. Set for non-voice invites.

The URL embeds the 43-char token; the 6-digit code is shown separately. Possessing one doesn't give you the other. The design intent seems to be "two parallel credentials for the same invite" — leak one, the other is still secret.

I think your instinct is right that this is over-engineered. A cleaner design: one secret per invite, presented in whatever form the channel needs. Either:

  • Short secret (6 digits) embedded in URL — vellum.me/i/847291. Brute-forceable in isolation, so add per-invite rate limit + short TTL + channel scope to mitigate. Drop token_hash, keep invite_code_hash.
  • Long secret (43 chars) for unguessable URLs, no typeable fallback. Drop invite_code_hash, keep token_hash. Voice/text typeable flows would need a different mechanism (e.g. generate a short scoped lookup key on demand).

I'd lean toward option 1 (short secret + rate limit) since the typed form is what makes voice and text-channel invites work at all, and rate-limiting per-invite is cheap. But this is a redesign that touches the creation/redemption code, not just schema. Want me to keep the schema as-is here and queue the unification as a separate PR after #29890 + the redemption-migration PR? Or collapse to one column now and we accept that the redemption code still writes token_hash as a no-op until the unification lands?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll do unification in the future as we migrate logic over to the gateway. for now, let's keep the short secret.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — dropped token_hash column and idx_ingress_invites_token_lookup, kept invite_code_hash as the short secret. Two indexes remain: idx_ingress_invites_code_lookup on (invite_code_hash, source_channel) and idx_ingress_invites_contact on contact_id. In e952b09c0b.

- Rename table assistant_ingress_invites -> ingress_invites; drop the
  'assistant_' prefix on the gateway side. Indexes renamed to match.
- Drop source_conversation_id (daemon-only concept; conversations don't
  exist in the gateway).
- Drop voice-specific columns (expected_external_user_id, voice_code_hash,
  voice_code_digits) and the voice_lookup index. Make the table
  channel-agnostic — voice flows can locate the right invite via
  contact_channels.external_user_id + contact_id rather than embedded
  columns here.
- Drop friend_name and guardian_name; rely on contacts.display_name.
- Remove the schema test file. Schema tests against drizzle definitions
  re-state the schema in another shape and don't catch real bugs.

Survives: id, source_channel, token_hash (URL token), invite_code_hash
(optional short typeable code), note, max_uses/use_count, expires_at,
status, redeemed_by_*, contact_id (FK CASCADE), created_at, updated_at.
Comment thread gateway/src/db/schema.ts Outdated
Comment on lines +128 to +142
// Sits in the gateway DB so future PRs can move invite redemption
// (currently in assistant/src/runtime/invite-redemption-service.ts) into
// the gateway process. With contacts/contact_channels already gateway-side,
// invite redemption can do a single LOCAL gateway-DB transaction
// (contact-channel upsert + invite use record) instead of round-tripping
// through the assistant DB.
//
// The table is empty at install time; it begins receiving writes when the
// invite-creation and invite-redemption code paths are migrated. No data
// migration from the assistant DB — voice invites are short-lived (TTL
// ≤24h) and aging out is acceptable during the cutover window.
//
// Channel-agnostic by design. Channel-specific lookup (e.g. matching a
// caller to their pending invite) goes through contact_channels rather
// than embedded columns here.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need any of these comments

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — comments stripped on ingressInvites (header + per-column) in e952b09c0b.

Comment thread gateway/src/db/schema.ts Outdated
{
id: text("id").primaryKey(),
sourceChannel: text("source_channel").notNull(),
tokenHash: text("token_hash").notNull(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll do unification in the future as we migrate logic over to the gateway. for now, let's keep the short secret.

Comment thread gateway/src/db/assistant-db-proxy.ts Outdated
// being collapsed into a generic transport failure. The previous
// ipcCallAssistant() based implementation masked real SQL errors as
// "assistant may not be ready", driving incorrect retry/fallback decisions.
const result = await ipcCallAssistantStrict("db_proxy_transaction", {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's then not introduce ipcCallAssistantStrict and just use ipcCallAssistant

Per review: ingressInvites only ever held tokenHash transiently before
the redemption path moved to gateway-managed bindings. Now that the
schema reflects gateway ownership, only inviteCodeHash is needed.

- Remove tokenHash column and idx_ingress_invites_token_lookup
- Strip header/inline comments on the ingressInvites table
- Keep idx_ingress_invites_code_lookup on (inviteCodeHash, sourceChannel)
  and idx_ingress_invites_contact on contactId
…ct variant

Per review: the lenient ipcCallAssistant was a transitional shim. Every
call site either (a) relied on the strict variant already, or (b) fired
and forgot — neither case wants silent error swallowing. This replaces
the lenient overload with a single strict implementation that throws
either IpcHandlerError (preserving statusCode + code) or
IpcTransportError, and migrates every caller in this PR.

Caller migrations:
- assistantDbTransaction now calls ipcCallAssistant directly (no Strict
  alias)
- ipc-runtime-proxy uses ipcCallAssistant (was already effectively
  strict)
- contacts-control-plane-proxy emit_event becomes
  void ipcCallAssistant(...).catch(() => {}) since fire-and-forget
- handle-inbound touchContactChannelStats casts result and checks
  result.rows?.length; outer caller still fire-and-forget
- voice-approval-sync follows the same shape
- contact-prompt resolves errors through a new
  notifyDaemonResolveError helper that wraps the IPC in try/catch so
  resolve failures don't reject the HTTP response
- post-assistant-ready waitForAssistant catches IpcTransportError
  during health polling but rethrows everything else

Tests:
- assistant-client.test.ts: replace 'returns undefined' tests with
  rejects.toBeInstanceOf(IpcTransportError); add IpcHandlerError test
  covering message/statusCode/code propagation; gate
  ASSISTANT_IPC_SOCKET_DIR in setup so tests don't hit the real daemon
- ipc-runtime-proxy.test.ts: collapse dual mocks into a single
  ipcCallAssistantMock with a defaultIpcImpl

Also fix lint import sort in
assistant/src/ipc/routes/db-proxy-transaction.ts.
@dvargasfuertes dvargasfuertes merged commit b4ece43 into main May 8, 2026
14 checks passed
@dvargasfuertes dvargasfuertes deleted the apollo/invite-table-gateway branch May 8, 2026 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant