fix(web): root-cause fix for daemon query retries + TQ conversion (WEB-7, WEB-2H) #33165

Merged
vex-assistant-bot[bot] merged 4 commits into main from devin/1780446522-fix-daemon-query-retry-and-tq-conversion
Jun 3, 2026

@devin-ai-integration devin-ai-integration Bot commented Jun 3, 2026

Contributor

Prompt / plan

Root-cause fix for Sentry issues WEB-7 (77 events) and WEB-2H (1 event). Instead of suppressing errors via bestEffort (symptom treatment), this prevents them via TanStack Query retry and proper data-fetching patterns.

What changed

1. Shared daemon-error utilities (utils/daemon-errors.ts)

  • isExpectedDaemonTransientError(error) — classifies 503/502/401/400-org-header as expected transient daemon startup errors. Moved from lib/sentry/capture-error.ts (wrong location per CONVENTIONS.md — not Sentry-specific)
  • shouldRetryDaemonError(failureCount, error) — TQ retry predicate. retry: shouldRetryDaemonError is now the convention for daemon queries
  • Re-exported from capture-error.ts for backward compatibility

2. WEB-7 fix (use-history-pagination.ts)

  • Replaced unjustified retry: false with retry: shouldRetryDaemonError
  • TQ now retries 503 "still starting up" transparently (user sees spinner → success, no error flash)

3. WEB-2H fix (web-search-card.tsx)

  • Converted imperative useEffect + async IIFE secret read to TQ useQuery — gains retry, caching, abort, refetch-on-focus
  • Eliminated: secretScopeRef, secretReadRevision (manual cache-bust), cancelled boolean, webSearchHasStoredKey useState
  • Fixed permanent-false bug: old code set webSearchHasStoredKey = false on transient daemon 503 with no retry
  • Added optimistic savedOverride state to prevent save button race condition during async config refetch
  • Uses useDaemonConfigQuery + useDaemonConfigMutation + useProvisionProviderKey from the refactor in PR #33156 (refactor(web): split useDaemonConfig god-hook, fix untyped patches and double invalidation (LUM-2184))

4. Defense-in-depth: bestEffort: true in error handlers catches anything retry doesn't cover (retries exhausted)

5. Convention docs — Updated CONVENTIONS.md and STATE_MANAGEMENT.md with daemon retry guidance
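
As a rough illustration of item 1, the two shared utilities could look something like the sketch below. The status list (503/502/401/400-org-header) and the cap of 3 retries come from this PR's description and review; the DaemonErrorLike shape, the toDaemonErrorLike helper, and the substring used to detect the org-header case are assumptions made for the example, not the actual contents of utils/daemon-errors.ts.

```typescript
// Hypothetical error shape: the real daemon client may expose status/message differently.
interface DaemonErrorLike {
  status?: number;
  message?: string;
}

// The review mentions a cap of 3 retries.
const MAX_DAEMON_RETRIES = 3;

// Hypothetical helper: narrows an unknown thrown value to the fields we inspect.
function toDaemonErrorLike(error: unknown): DaemonErrorLike | null {
  if (typeof error !== "object" || error === null) return null;
  const candidate = error as { status?: unknown; message?: unknown };
  return {
    status: typeof candidate.status === "number" ? candidate.status : undefined,
    message: typeof candidate.message === "string" ? candidate.message : undefined,
  };
}

// Classifies 503/502/401 and the 400 org-header case as expected transient errors.
function isExpectedDaemonTransientError(error: unknown): boolean {
  const e = toDaemonErrorLike(error);
  if (e === null || e.status === undefined) return false;
  if (e.status === 503 || e.status === 502 || e.status === 401) return true;
  if (e.status === 400) {
    // Assumed matching rule: the message mentions the missing org header.
    return (e.message ?? "").includes("Organization-Id header");
  }
  return false;
}

// Retry predicate with the (failureCount, error) signature TanStack Query accepts for `retry`.
function shouldRetryDaemonError(failureCount: number, error: unknown): boolean {
  if (failureCount >= MAX_DAEMON_RETRIES) return false;
  return isExpectedDaemonTransientError(error);
}
```

Under these assumptions a 500 or a plain TypeError is never retried, so real bugs still fail fast, while a 503 is retried until the cap is reached.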

Architecture

daemon 503 → TQ retry (1s → 2s → 4s) → success (user sees loading → data)
                                       ↓ retries exhausted
                                  captureError(e, { bestEffort: true })
                                       → silences expected transient errors
                                       → reports real bugs (500, data integrity)
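
The flow in the diagram can be approximated without any library, as in the sketch below. It is only an illustration: fetchWithRetry and retryScheduleMs are hypothetical names, and the delay formula is the one documented as TanStack Query's default retryDelay, which yields the 1s → 2s → 4s steps shown above. In the real code TanStack Query drives this loop; a caller's error handler would then pass whatever survives to captureError(e, { bestEffort: true }).

```typescript
type RetryPredicate = (failureCount: number, error: unknown) => boolean;

// Exponential backoff capped at 30s (same formula as TanStack Query's documented default).
function backoffDelayMs(attemptIndex: number): number {
  return Math.min(1000 * 2 ** attemptIndex, 30000);
}

// Hypothetical helper: the delays that precede each of the first `maxRetries` retries.
function retryScheduleMs(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxRetries; i += 1) {
    delays.push(backoffDelayMs(i));
  }
  return delays;
}

// Hypothetical stand-in for the retry loop the query library runs internally.
// `sleep` is injectable so the loop can be exercised without real timers.
async function fetchWithRetry<T>(
  fetcher: () => Promise<T>,
  shouldRetry: RetryPredicate,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let failureCount = 0;
  for (;;) {
    try {
      return await fetcher();
    } catch (error) {
      // Predicate says stop (retries exhausted or not transient): surface the error.
      if (!shouldRetry(failureCount, error)) throw error;
      await sleep(backoffDelayMs(failureCount));
      failureCount += 1;
    }
  }
}
```

With three retries the schedule sums to 7000 ms, which matches the recovery window of roughly 7s mentioned in the review.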

Closes LUM-2199
Related: LUM-2205 (same pattern in provider-editor-modal.tsx, separate PR)

Test plan

  • bunx tsc --noEmit — clean
  • bun run lint — clean
  • All 7 CI checks passing (lint, typecheck, build, test, socket security)
  • Unit tests added for isExpectedDaemonTransientError and shouldRetryDaemonError (daemon-errors.test.ts)
  • Config/logic-only changes — Sentry noise will clear as stale caches expire post-deploy

Link to Devin session: https://app.devin.ai/sessions/c2e17ff1867f4ebd90aac007ea0f5453
Requested by: @ashleeradka

…B-7, WEB-2H)

Closes LUM-2199

- Extract shouldRetryDaemonError utility to utils/daemon-errors.ts
- Move isExpectedDaemonTransientError from lib/sentry/ to utils/ (re-export for compat)
- Replace retry: false with shouldRetryDaemonError in use-history-pagination.ts (WEB-7)
- Convert imperative credential read to TQ useQuery in web-search-card.tsx (WEB-2H)
- Derive saved state from daemon config via useMemo, eliminating redundant useState
- Keep bestEffort: true as defense-in-depth in error handlers

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@linear

linear Bot commented Jun 3, 2026


LUM-2199

@devin-ai-integration

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor Author


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Comment thread apps/web/src/domains/settings/ai/web-search-card.tsx Outdated
ashleeradka and others added 2 commits June 3, 2026 00:48
…ter config patch

The savedWebSearchMode/savedWebSearchProvider derivation from daemon config
lags behind during the async refetch window after patchDaemonConfig. This
briefly re-enables the save button. Fix: set a synchronous override in
handleSave, cleared when the daemon config refetch completes.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
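
The override described in this commit message can be modelled with a couple of pure functions, sketched below. This is a simplified illustration, not the component's code: WebSearchSettings, effectiveSaved, and isSaveEnabled are hypothetical names, and the mode/provider values used with them are placeholders. In the component the override would be set synchronously in handleSave and reset to null once the daemon config refetch completes.

```typescript
// Hypothetical shape for the two saved fields the commit message mentions.
interface WebSearchSettings {
  mode: string;
  provider: string;
}

// While an override is set it wins over the (possibly stale) value derived from daemon config.
function effectiveSaved(
  fromConfig: WebSearchSettings,
  override: WebSearchSettings | null,
): WebSearchSettings {
  return override ?? fromConfig;
}

// The save button is enabled only when the draft differs from the effective saved value.
function isSaveEnabled(
  draft: WebSearchSettings,
  fromConfig: WebSearchSettings,
  override: WebSearchSettings | null,
): boolean {
  const saved = effectiveSaved(fromConfig, override);
  return draft.mode !== saved.mode || draft.provider !== saved.provider;
}
```

During the refetch window the config-derived value still holds the old settings, so without the override the draft looks dirty and the button re-enables; with the override set to the just-saved draft it stays disabled.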
…ctor

Adopted useDaemonConfigQuery + useDaemonConfigMutation + useProvisionProviderKey
from #33156, combined with TQ credential query, savedOverride, and
requiresProviderCredential rename from this branch.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@ashleeradka

Contributor

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad62a900de

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}
return data!.found;
},
enabled: !!assistantId && requiresProviderCredential,


P2: Gate credential query on org readiness

In platform-hosted sessions this daemon request needs the Vellum-Organization-Id header, but the new TanStack query can run as soon as assistantId exists even if the org store has not hydrated yet. I checked apps/web/docs/STATE_MANAGEMENT.md and useIsOrgReady(): queries that need this header should include that gate; otherwise, if the 400 org-header retries exhaust before hydration completes, the query stays errored with webSearchHasStoredKey treated as false and users with an existing BYOK web-search key can be blocked/asked for a key until something else causes a refetch.


Contributor Author


Good catch, this is a real bug. Fixed in ca4f9da.

Added a useIsOrgReady() gate to the credential query's enabled condition, matching the established pattern in conversation-queries.ts. The query now waits for org store hydration before firing, preventing 400 "Organization-Id header" errors that could exhaust retries and leave webSearchHasStoredKey permanently false.

Note: useDaemonConfigQuery itself also lacks this gate (pre-existing), but that's a separate concern: it's gated on assistantId, which in platform mode implies the lifecycle service has run, which typically means org is ready. The credential query was more vulnerable because it could mount eagerly via requiresProviderCredential before the config query resolves.
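
A minimal sketch of the gated enabled condition follows. The three inputs mirror the names used in this thread (assistantId, requiresProviderCredential, and the boolean returned by useIsOrgReady()); the function names and the query key are hypothetical, and the real query is defined inline in web-search-card.tsx rather than through a builder like this.

```typescript
interface CredentialQueryInputs {
  assistantId: string | undefined;
  requiresProviderCredential: boolean;
  // Assumed to be the boolean returned by useIsOrgReady().
  isOrgReady: boolean;
}

// The query may only fire once all three conditions hold.
function isCredentialQueryEnabled(inputs: CredentialQueryInputs): boolean {
  return !!inputs.assistantId && inputs.requiresProviderCredential && inputs.isOrgReady;
}

// Hypothetical builder for the subset of query options relevant to the gate.
function buildCredentialQueryOptions(inputs: CredentialQueryInputs): {
  queryKey: [string, string | undefined];
  enabled: boolean;
} {
  return {
    // Hypothetical query key, chosen for the example only.
    queryKey: ["web-search-credential", inputs.assistantId],
    enabled: isCredentialQueryEnabled(inputs),
  };
}
```

Because the query stays disabled until the org store has hydrated, the 400 org-header response never gets a chance to consume the retry budget.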

…e hydration

The secretsReadPost daemon call needs the Vellum-Organization-Id header,
which the org store provides after async hydration. Without the isOrgReady
gate, the query could fire before hydration completes, exhaust retries on
400 'Organization-Id header' errors, and leave webSearchHasStoredKey as
false permanently.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

@vex-assistant-bot vex-assistant-bot Bot left a comment

Contributor


APPROVE

Value: Replaces a bestEffort-style symptom mute with proper TanStack Query retry + the useIsOrgReady() hydration gate so WEB-7 (77 events) and WEB-2H stop firing for the reason they fire, not just at the Sentry shipping layer. Centralizes the daemon-transient predicate where every retry-aware caller can reach it.

What this does: Moves isExpectedDaemonTransientError from lib/sentry/capture-error.ts → utils/daemon-errors.ts (correct home, not Sentry-specific), adds shouldRetryDaemonError as the canonical TQ retry predicate (3 retries max, exponential backoff, fails fast on 500/programming bugs), wires it into use-history-pagination (replaces unjustified retry: false), and rewrites web-search-card.tsx to use useDaemonConfigQuery + useQuery for the secret-presence check, eliminating secretScopeRef, secretReadRevision, cancelled, and the permanent-false bug.

Review detail

Verified at HEAD ca4f9da9:

  • daemon-errors.ts matches the predicate shape adopted in #33160 (503/502/401/400+org-header). MAX_DAEMON_RETRIES = 3 is sensible — at 1s/2s/4s backoff that's ~7s of recovery window, well above typical hydration time.
  • use-history-pagination.ts:144 (retry: shouldRetryDaemonError) is the right call. Old retry: false on history pagination was the textbook case the PR description calls out.
  • capture-error.ts re-exports the predicate for migration compatibility — diff is strictly subtractive (–31/+5).

Codex P2 / Devin P1 — both real, both already addressed:

  • Codex P2 (org-header hydration race on credential query): legitimate — same pattern just landed in #32912/#33160. Fixed by Devin in ca4f9da9 by adding useIsOrgReady() to the enabled predicate at line 125. Verified.
  • Devin P1 (save button briefly re-enables in the refetch window between setSaving(false) and daemon config invalidation): real race. Fixed in e2c6e1d8 via the savedOverride synchronous bridge that clears when reconciled updates. The dual-state pattern (savedOverride for synchronous bridging + reconciled for ground truth) is clean.

Anti-pattern grep on full file at HEAD:

  • Zero new runtime-boundary casts. as ServiceMode on the localStorage read is pre-existing parity with sibling cards.
  • data!.found at L123 — gated by if (!response.ok) throw immediately prior; HeyAPI shape guarantees data when response is ok. Defensible.
  • assistantId! at L112 — gated by enabled: !!assistantId && requiresProviderCredential && isOrgReady. Defensible.
  • Zero @ts-ignore, zero eslint-disable, zero || 0.

Architecture note: useDaemonConfigQuery itself doesn't have the useIsOrgReady gate (Devin called this out in the response thread as a separate, pre-existing concern). Worth a follow-up — it's gated on assistantId, which transitively gates on assistantsListOptions succeeding, but on the platform path that upstream query needs the org header too. Probably a clean follow-up ticket rather than scope creep into this PR.

Test coverage: daemon-errors.test.ts (+107) covers the full status matrix (503/502/401/400-org/400-other/500/TypeError/Error) for both isExpectedDaemonTransientError and shouldRetryDaemonError, plus the failureCount >= MAX_DAEMON_RETRIES cap.

Vellum Constitution — Trust-seeking: replaces a silent-by-default Sentry mute with bounded, observable retry + an explicit hydration gate. Failure modes that survive retries still surface; expected transients no longer pollute the signal.

@vex-assistant-bot

Contributor

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. 🎉

