fix(web): gate onboarding Google step on a live platform session, not a cached id#33123
Conversation
… a cached id Closes LUM-2180 The onboarding Google connect step was offered whenever a `cloud === "vellum"` assistant id was cached in the lockfile. That cached id outlives the platform session (logout/expiry never clears platform entries from the lockfile), so a local user with no live session could be shown the Google connect screen and attach OAuth to a stale, possibly-invalid assistant id. The actual precondition for the platform-backed steps (Google connect and prior-assistants import) is a *live* platform session — they talk to the platform with platform auth, and a cached id is not a substitute. Gate both on a single documented predicate `isPlatformFunnelAvailable`, dropping the `localPlatformAssistantId !== null` arm.
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5b556a644
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ing gate The local gateway auth path sets isLoading:false before its getSession() probe settles, so hasPlatformSession===false is ambiguous: it can mean "no session" or "not yet known". Gating the platform-backed onboarding steps on the bare flag raced a returning user past their prior-assistants/Google steps in the pared-down funnel before the probe resolved. Track platformSessionResolved in the auth store and trust a cached platform assistant id as a funnel signal only while the probe is in flight. Once the probe settles, a cached id alone no longer offers the steps — cloud=vellum lockfile entries outlive the session, which is the stale-id bug being fixed.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de412df204
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…robe Re-running the local gateway platform-session probe (app-resume refresh, return from a provider callback) left platformSessionResolved as a stale true from the previous probe. During the new probe window the onboarding gate then saw resolved=true with hasPlatformSession=false, so a cached platform assistant no longer kept the Google/prior-assistants funnel available and a returning user could be raced past those steps. Centralize the probe's reset-before / set-after invariant in a probePlatformSession helper shared by both gateway init and refresh paths, so resolved re-enters the unknown state for the duration of every probe. Add a regression test covering the re-run reset.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f9a6f2365
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Overlapping platform-session probes (an app-resume refresh firing while the initial probe is still in flight) could let a slower earlier probe run its .finally() and flip platformSessionResolved back to true while the newer probe is still pending. With hasPlatformSession still false in that window, isPlatformFunnelAvailable treated the not-yet-known session as confirmed-absent and dropped the prior-assistants/Google steps. Stamp each probe with a monotonic id and resolve overlaps latest-wins: only the newest probe may settle hasPlatformSession or platformSessionResolved, so a stale completion can no longer retire the unknown window a later probe just opened. Add a regression test covering out-of-order probe completion.
|
@codex review |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
✦ Vex Review — APPROVE
Real bug, real fix, real concurrency safety. LUM-2180 fix-PR that closes a race the LUM-2000 capability-gating refactor (#33068) exposed when the cached localPlatformAssistantId arm was removed.
Root Cause (well-diagnosed)
Onboarding's canOfferGoogleStep + canOfferPriorAssistants both gated on hasPlatformSession. In local gateway-auth mode initSession() flips isLoading: false immediately and only THEN async-probes getSession() — so a brief hasPlatformSession === false window means "unknown, not yet probed", not "confirmed absent". The previous localPlatformAssistantId !== null arm papered over this by keeping the funnel up during the probe window; LUM-2180 removed it and exposed the race.
Returning users with cached platform assistants would briefly see the funnel collapse, race past Google/prior-assistants into the name screen, and re-stabilize seconds later when the probe settled. Exactly the bug the issue described.
Final Fix at HEAD 1c6bee29 — Tri-state Gate + Probe Generation
(1) New platformSessionResolved boolean in auth-store.ts. Flips true exactly when the platform probe settles — synchronously on non-gateway paths (set({ platformSessionResolved: true }) at lines 213/253/308 + every isLoggedIn=false path), asynchronously inside probePlatformSession.finally() on the gateway path.
(2) New isPlatformFunnelAvailable predicate in prechat-steps.ts:13. Tri-state truth table:
platformSessionResolved |
hasPlatformSession |
funnel? |
|---|---|---|
| true | true | yes |
| true | false | no (confirmed absent) |
| false | * | yes (unknown — keep funnel up during probe) |
Both canOfferGoogleStep and canOfferPriorAssistants consume the single predicate — they can't drift apart again.
(3) Probe generation counter — auth-store.ts:140-191 (the Codex P2 #2 fix at HEAD):
let latestPlatformProbe = 0;
function probePlatformSession(set, options = {}) {
const probeId = ++latestPlatformProbe;
const isStale = (): boolean => probeId !== latestPlatformProbe;
set({ platformSessionResolved: false });
getSession()
.then((result) => { if (isStale()) return; /* ... */ })
.catch(() => { if (isStale()) return; /* ... */ })
.finally(() => { if (isStale()) return; set({ platformSessionResolved: true }); });
}All three callbacks short-circuit when a newer probe has superseded this one. The .finally() early-return is the critical one — without it, a slow earlier probe can flip resolved=true while a newer probe is still pending, opening the exact tri-state-collapse race the boolean was added to prevent.
Bot Iteration Trail (clean)
Three Codex P2 findings, each addressed in a subsequent commit:
| Codex finding | sha flagged | sha fixed |
|---|---|---|
| P2 #1: cached-id arm removed, fast funnel can drop during initial probe | b5b556a6 |
de412df2 (platformSessionResolved tri-state) |
P2 #2 (round 1): refreshSession doesn't reset resolved=false before new probe |
de412df2 |
5f9a6f23 (reset at top of probePlatformSession) |
P2 #2 (round 2): older probe's .finally still flips resolved=true under app-resume |
5f9a6f23 |
1c6bee29 (probe generation counter, latest-wins) |
Each fix correctly identifies the next race that the prior fix exposes — this is exactly the iteration shape #33094's TanStack rollback work followed. Devin's resolution replies on each thread accurately diagnose the bug class before naming the fix.
Tests Added (102 lines in auth-store.test.ts + 100 in prechat-steps.test.ts)
auth-store.test.ts: two-probe interleaving — older probe's.finally()runs while newer probe is pending; assertsplatformSessionResolvedstaysfalseuntil the latest probe settles. This is the test that would have failed against the5f9a6f23HEAD.prechat-steps.test.ts: the tri-state truth table forisPlatformFunnelAvailableacross(resolved, hasSession)permutations +canOfferGoogleStep/canOfferPriorAssistantsconsume the same predicate.onboarding-lifecycle-sync.test.tsx: 3 lines, asserts gate consumes resolution.
Anti-pattern Grep on Diff
Clean. Zero as casts, zero non-null ! in production code (the one ! is getSessionGates!.push(resolve) in auth-store.test.ts — test scaffolding asserting the gate array was set up). Zero || 0 fallbacks. Six files, +342/-68.
Net
A textbook LUM-2180 fix — diagnoses the cached-id arm as papering over a deeper unknowability race, replaces it with explicit tri-state semantics, gets the probe-generation invariant right at HEAD after two Codex iterations. The same shape (++generation + latest-wins early-returns in async callbacks) is reusable for any "supersedable async probe" pattern in the codebase — worth promoting to a small util if a second consumer appears.
Codex 👍 + "Nice work!" at HEAD 1c6bee29. CI 7/7 green.
Ship it.
…ing/welcome fork (#33201) The local-mode auth middleware chose between the onboarding hosting and welcome screens off a bare `hasPlatformSession`. That flag is set by a fire-and-forget platform-session probe (the local gateway auth paths return before the session is known), so when the middleware runs during the probe window it reads an ambiguous `false` and sends a returning platform user -- one with zero local assistants -- to the new-user welcome flow instead of the hosting picker. Wait for `platformSessionResolved` before forking, mirroring the `waitForAuthReady` gate already used for `isLoading` and the prechat funnel fix (#33123) that introduced the resolved flag for exactly this race. Co-authored-by: ashlee@vellum.ai <ashlee@vellum.ai>
Prompt / plan
Fix LUM-2180 — a real bug found while reviewing the onboarding capability-gating refactor (#33068): the Google connect step (and the prior-assistants import) could be offered to a local user with no live platform session, because the gate trusted a cached platform assistant id that outlives the session.
Closes LUM-2180.
What changed
isPlatformFunnelAvailable({ localMode, hasPlatformSession, platformSessionResolved, hasCachedPlatformAssistant }).localPlatformAssistantId !== nullarm ofcanOfferGoogleStep. A cached id is still used as the assistant id for the Google step when it legitimately renders; it no longer decides whether the step is offered once the session state is known.canOfferGoogleStepandcanOfferPriorAssistantsderive from the same predicate, so the two platform-gated steps move together (previously they could diverge — prior-assistants gated off while Google stayed on).platformSessionResolvedto the auth store to distinguish "platform session probe still in flight" from "probe settled, no session" (see race section).Why it's needed
A user who previously had a platform/managed session has
cloud === "vellum"entries cached in the lockfile (persisted in local storage). On logout or session expiry the store clearshasPlatformSession, but the lockfile entries are not pruned — they're only re-synced when a new valid session is established. SoreadLocalPlatformAssistantId()kept returning a stale id,canOfferGoogleStepstayedtrue, and a local user with no authenticated channel could reach the Google connect screen and attach OAuth to a stale, possibly-invalid assistant id.The correct precondition is a live session: both steps talk to the platform with platform auth, and a cached id is not a substitute for being authenticated.
Pre-existing race fixed in this PR (Codex P2)
Gating purely on
hasPlatformSessionexposed a pre-existing race in local gateway-auth mode.auth-store.initSession()setsisLoading: falseimmediately after priming the local gateway and only then probesgetSession()asynchronously. During that probe windowhasPlatformSession === falseis ambiguous — it can mean "no session" or "not yet known". The oldlocalPlatformAssistantId !== nullarm accidentally papered over the race by keeping the funnel up during the probe; removing it would have raced a returning platform user past their prior-assistants/Google steps in the fast name→Google pared-down funnel.Fix — make the gate tri-state instead of blocking onboarding on the probe:
platformSessionResolvedflipstrueexactly when the probe settles: synchronously on the non-gateway paths, via.finally()on the fire-and-forget gateway/local probes, and immediately on logout/error paths (where absence is confirmed).resolved === false), a cached platform assistant is treated as a strong signal a session exists, so the funnel stays available.resolved === true), only a livehasPlatformSessionoffers the funnel — a cached id alone does not, which is the stale-id fix.Net behavior: returning platform user mid-probe keeps their steps; confirmed session-loss after probe hides the funnel (LUM-2180); a fresh local user with no cache is never optimistically shown the funnel.
Probe lifecycle: reset-before-probe + latest-wins
The probe is fired from three sites (gateway init, local-mode init, refresh). Two follow-on issues in the tri-state flag's lifecycle are handled by centralizing all three in one
probePlatformSession(set, options)helper:platformSessionResolvedis set back tofalseup front, so a re-run probe (app-resume refresh, return from a provider callback) re-enters the "unknown" state instead of inheriting a staletruefrom the previous probe — otherwise the race reappears on the second and later probes..then/.catch/.finallycallbacks early-return unless the probe is still the latest. This prevents a slower earlier probe from flippingplatformSessionResolvedback totrue(or mutatinghasPlatformSession) while a newer probe is still pending, which would otherwise retire the "unknown" window the newer probe just opened.Safety / compatibility
!localMode) is unaffected — the predicate returnstrueregardless of session there, preserving existing managed/Electron behavior. Local mode with a live session is likewise unchanged.Root cause analysis
localPlatformAssistantId !== nullwas OR'd intocanOfferGoogleStep. That proxy also happened to mask the gateway-auth probe race.auth-store.tsclearhasPlatformSessionbut never prune platform lockfile entries — an asymmetry (sync-on-gain, no-prune-on-loss) that should have flagged "cached id ≠ session." An existing test even encoded the buggy behavior as expected.platformSessionResolved), centralize the probe so the reset-before/settle-after invariant lives in one place, and unit-test all(resolved, cached)combinations plus the probe-lifecycle edges (re-run reset, out-of-order completion) directly. Tests that locked in incorrect behavior were rewritten to assert the corrected behavior.isPlatformFunnelAvailable,platformSessionResolved, andprobePlatformSession, plus the unit tests, are the durable guardrail.Test plan
bun test src/domains/onboarding/prechat-steps.test.ts—isPlatformFunnelAvailablesuite covers: platform mode always available regardless of session state; local + live session available; local + probe-resolved + cached id → unavailable (LUM-2180); local + probe-in-flight + cached id → available (race fix); local + probe-in-flight + no cache → unavailable.bun test src/domains/onboarding/onboarding-lifecycle-sync.test.tsx— onboarding integration/lifecycle-sync green; mock extended withplatformSessionResolved.bun test src/stores/auth-store.test.ts— auth-store green, including two probe-lifecycle regression tests: a re-run gateway probe resetsplatformSessionResolveduntil it settles, and a stale probe completing after a newer one does not settle resolution.bunx tsc --noEmitandbun run lintclean.Link to Devin session: https://app.devin.ai/sessions/15bca57bd4c64a3085cfb80e1f26355a
Requested by: @ashleeradka