fix(web): gate platform API calls on organization ID readiness #33953
Platform API endpoints require the Vellum-Organization-Id header. During app startup, the organization store loads asynchronously, but the assistant lifecycle query and client feature flag fetch could fire before it resolved — sending requests without the header and getting 400 responses from Django. This caused the hatching screen to hang on "Setting up your assistant…" indefinitely. Gate both the lifecycle server query and the feature flag sync on the organization ID being available before firing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
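The gate described above can be sketched as a pure predicate. This is a minimal illustration, not the PR's actual code: the `LifecycleGateInputs` shape and the `shouldQueryServerSketch` name are hypothetical, and the real values come from hooks and helpers in the app rather than a plain object.

```typescript
// Hypothetical snapshot of the inputs that feed the gate; in the app these
// come from session and organization state, not from a plain object.
interface LifecycleGateInputs {
  isAuthenticated: boolean;
  isGatewayAuthMode: boolean;
  hasPlatformSession: boolean;
  isLocalMode: boolean;
  currentOrganizationId: string | null;
}

// True only when a platform request could carry the
// Vellum-Organization-Id header.
function shouldQueryServerSketch(inputs: LifecycleGateInputs): boolean {
  return (
    inputs.isAuthenticated &&
    !inputs.isGatewayAuthMode &&
    (inputs.hasPlatformSession || !inputs.isLocalMode) &&
    Boolean(inputs.currentOrganizationId)
  );
}

// The organization store is still loading: the query must not fire.
const hydrating: LifecycleGateInputs = {
  isAuthenticated: true,
  isGatewayAuthMode: false,
  hasPlatformSession: true,
  isLocalMode: false,
  currentOrganizationId: null,
};

// The organization store has resolved ("org_123" is a made-up ID).
const ready: LifecycleGateInputs = { ...hydrating, currentOrganizationId: "org_123" };

console.log(shouldQueryServerSketch(hydrating)); // false
console.log(shouldQueryServerSketch(ready)); // true
```

Keeping the check in one predicate makes the invariant easy to test: no organization ID, no platform request.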
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1618ae5c66
  const shouldQueryServer =
    isAuthenticated(sessionStatus) &&
    !isGatewayAuthMode() &&
-   (hasPlatformSession || !isLocalMode());
+   (hasPlatformSession || !isLocalMode()) &&
+   !!currentOrganizationId;
Gate the imperative lifecycle fetch on org readiness
For authenticated platform sessions while currentOrganizationId is still hydrating, this only disables the passive useAssistantQuery; the effect below still calls lifecycleService.respondToInputs(), which immediately runs checkAssistant()/fetchQuery() and hits assistantsList without the Vellum-Organization-Id header. In the common case where selectedPlatformAssistantId remains null (multi-assistant disabled), the effect also won't rerun when the org later arrives, so the startup 400 can still put the lifecycle into an error/stuck state instead of recovering.
Root cause: the race was introduced by #31194 (migrate OrganizationProvider from React Context to Zustand store). The original Context provider blocked rendering until the org was resolved; the Zustand migration made the fetch non-blocking, so children can fire platform API calls before the org ID is ready. A |
The Codex review correctly identified that respondToInputs() calls checkAssistant() imperatively via fetchQuery, bypassing the passive useAssistantQuery enabled gate. Pass hasOrganization through the service inputs and short-circuit respondToInputs() before the imperative fetch when the org store hasn't resolved yet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
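A rough sketch of the service-level short-circuit described in this commit message follows. It assumes a heavily simplified service: `LifecycleServiceSketch`, `ServiceInputs`, and the `fetchCount` counter are hypothetical stand-ins, and the real `checkAssistant()` runs an imperative fetch rather than incrementing a counter.

```typescript
// Hypothetical, simplified inputs; the real service takes more fields.
interface ServiceInputs {
  hasOrganization: boolean;
}

class LifecycleServiceSketch {
  // Counts how many times the imperative fetch would have run.
  public fetchCount = 0;

  // Stand-in for checkAssistant(); the real method issues the platform request.
  private checkAssistant(): void {
    this.fetchCount += 1;
  }

  respondToInputs(inputs: ServiceInputs): void {
    // Short-circuit before the imperative fetch while the org store is
    // still hydrating, so no request goes out without the header.
    if (!inputs.hasOrganization) {
      return;
    }
    this.checkAssistant();
  }
}

const service = new LifecycleServiceSketch();
service.respondToInputs({ hasOrganization: false }); // skipped
service.respondToInputs({ hasOrganization: true }); // fetch runs
console.log(service.fetchCount); // 1
```

The early return only helps if the caller invokes `respondToInputs()` again once the organization arrives, which is why the review below mentions the organization ID in the effect's dependency array.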
Ran into this issue when hatching assistants on dev: the screen kept loading indefinitely.
✦ COMMENT — fix shape is right, one helper-use nit.
What this does: gates shouldQueryServer, useClientFeatureFlagSync, and the imperative respondToInputs → checkAssistant path on the org store having a currentOrganizationId. Cleanly addresses Codex's P1 about the lifecycle effect still firing checkAssistant without the org header — confirmed in lifecycle-service.ts:204 with the early-return + currentOrganizationId added to the effect deps array.
One observation (non-blocking): there's already a canonical helper for this exact case — useIsOrgReady() in apps/web/src/hooks/use-is-org-ready.ts, used in ~8 production sites (conversation-queries.ts, text-to-speech-card.tsx, use-stored-credential-presence.ts, etc.) and called out in apps/web/AGENTS.md as the canonical org-hydration gate. Established in #32912 (LUM-2114) for exactly this 400 Organization-Id header failure mode, reused across #33160 / #33165 / #33201.
It's also slightly more correct: returns !hasPlatformSession || currentOrgId != null, so local/self-hosted/gateway-only sessions read true instead of getting blocked on an org that never arrives. With raw !!currentOrganizationId in the new lifecycle-service.ts gate, respondToInputs will early-return for the entire session in pure local-mode without a platform session — worth confirming checkAssistant() isn't load-bearing on that path (your test plan covers it via shouldQueryServer=false, but the service-level gate is a separate code path).
Suggested shape — three call sites become one:
const isOrgReady = useIsOrgReady();
// shouldQueryServer && isOrgReady
// useClientFeatureFlagSync(hasPlatformSession && !isSessionInitializing && isOrgReady)
// hasOrganization: isOrgReady
Heads up: Codex's review is anchored at 1618ae5c (the pre-fix commit). The 471d9bd4 commit is what actually addresses its P1, so a fresh @codex review would clear the stale finding at HEAD.
Vellum Constitution — Trust-seeking: consolidating on one named gate keeps the org-hydration invariant auditable; reinventing it per call-site is how the next 400 sneaks back in.
Replace raw `!!currentOrganizationId` / `hasOrganization` checks with the canonical `useIsOrgReady()` hook (established in #32912). The hook returns `!hasPlatformSession || currentOrgId != null`, so local/gateway sessions pass through instead of being blocked on an org that never arrives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
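To make the difference between the two gates concrete, here is a small sketch that evaluates the quoted expression as a plain function. `isOrgReadySketch` and `rawOrgGate` are hypothetical names for illustration only; the real `useIsOrgReady()` is a hook that reads these values from app state.

```typescript
// The expression quoted in the commit message, as a pure function.
function isOrgReadySketch(
  hasPlatformSession: boolean,
  currentOrgId: string | null | undefined,
): boolean {
  return !hasPlatformSession || currentOrgId != null;
}

// The raw check this commit replaces.
function rawOrgGate(currentOrgId: string | null | undefined): boolean {
  return !!currentOrgId;
}

// Local or gateway session: no platform session, so no org will ever arrive.
console.log(isOrgReadySketch(false, null)); // true  (passes through)
console.log(rawOrgGate(null)); // false (blocked for the whole session)

// Platform session, org still hydrating: both gates hold the request back.
console.log(isOrgReadySketch(true, null)); // false

// Platform session, org resolved ("org_123" is a made-up ID).
console.log(isOrgReadySketch(true, "org_123")); // true
```

The only case where the two gates disagree is a session without a platform login, which is exactly the local-mode path the review flagged.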
Summary
- Platform API endpoints (`assistantsList`, `featureFlagsClientFlagValuesRetrieve`) require the `Vellum-Organization-Id` header, but could fire before the organization store finished loading, causing 400 errors and leaving the hatching screen stuck on "Setting up your assistant…" indefinitely.
- Gate both the lifecycle server query (`shouldQueryServer`) and the client feature flag sync on `currentOrganizationId` being non-null before firing.

Test plan
- … `/v1/assistants` or `/v1/feature-flags/client-flag-values/` during startup
- … `shouldQueryServer` is already false

🤖 Generated with Claude Code