fix(web): add bestEffort mode to captureError for daemon transient errors#33160
Merged
vex-assistant-bot[bot] merged 1 commit intoJun 2, 2026
Conversation
…rors
captureError previously had a binary gap: it silenced browser-level
TypeError: Failed to fetch (via isTransientNetworkError) but treated
every HTTP error response identically. This forced best-effort
background fetches to choose between captureError (over-reporting
expected daemon startup errors) and empty catches (silencing real bugs).
Add isExpectedDaemonTransientError() and a bestEffort option to
captureError. When bestEffort is true, expected daemon transient HTTP
errors (503/502/401/400-org-header) are silently dropped while
unexpected errors (500, data integrity, programming errors) still get
reported to Sentry.
Update refreshConversationRow call sites in useActiveConversation and
useConversationSync to use captureError({ bestEffort: true }) instead
of either bare captureError (which over-reports) or empty catches
(which under-reports).
Also add org-readiness gating to useActiveConversation to prevent
fetches from firing before the Vellum-Organization-Id header is
available.
Closes LUM-2189
Closes LUM-2193
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Contributor
There was a problem hiding this comment.
✦ Vex Review: Approved
Clean, defensible widening of . The flag is opt-in and additive — existing callers unchanged.
** shape:**
- Only matches (browser-level /network errors continue to flow through )
- 503/502/401 by status alone — safe; these statuses on a daemon API call don't have other plausible meanings
- 400 narrowed by — brittle on message text, but documented in comment and the alternative (matching all 400s as transient) would mask real client bugs. Worth keeping an eye on if the daemon error message ever changes.
Call-site fixes:
- : real fix is the new gate — same hydration-race pattern as #32912. Wrapping captureError with is defense-in-depth for the residual 502/503/auth-race windows.
- : trading for does lose the warning-tier surfaceability for these errors entirely; that's intentional — the whole point is to stop them from appearing in Sentry when transient.
Anti-pattern grep on the diff: clean. Zero casts, zero non-null , zero bare . Tests cover 7 predicate cases (positive + negative) plus 3 captureError integration cases including the "still report 500 under bestEffort" guard rail.
Closes LUM-2189 + LUM-2193. Supersedes #33151 cleanly.
CI 7/7 green at HEAD.
This was referenced Jun 3, 2026
This was referenced Jun 3, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Prompt / plan
Closes LUM-2189 — transient error resilience for
refreshConversationRowCloses LUM-2193 —
captureErrorlacks best-effort mode for expected daemon transient errorsSupersedes #33151 (which used empty catches instead of the
bestEffortapproach).Problem
captureError()had a binary gap between two error categories:TypeError: Failed to fetch— silenced viaisTransientNetworkError✅This forced every best-effort background fetch to choose between
captureError(over-reports expected startup noise → WEB-20) or empty.catch(() => {})(silences real bugs like 500s and data-integrity errors). LUM-2114, LUM-2188, LUM-2189 each independently chose empty catches.Changes
isExpectedDaemonTransientError()— new helper incapture-error.tsthat identifies expected transient HTTP errors from daemon API calls:ApiError(503)— daemon still bootingApiError(502)— reverse proxy can't reach daemon podApiError(401)— auth session race during loginApiError(400)with "Organization-Id header" — org store not yet hydratedOnly
ApiErrorinstances match.TypeError, genericError, and plain objects pass through.captureError({ bestEffort: true })— when the flag is set,isExpectedDaemonTransientErrorerrors are silently dropped in addition to transient network errors. Unexpected errors (500, data-integrity, programming errors) still get reported. Without the flag, behavior is unchanged.Call-site updates:
useActiveConversation— addsuseIsOrgReady()gate (prevents fetch before org header is available) +captureError({ bestEffort: true })(silences expected transients, reports real bugs)useConversationSync.refreshRow—captureError({ bestEffort: true })CONVENTIONS.md — documents the
bestEffortoption in the "Manual error reporting" section.Root cause analysis
How did the code get into this state?
captureErrorwas designed for the browser-level network error case (PR that introducedisTransientNetworkError). HTTP error responses from the daemon (503/502/401/400) are valid responses, not network errors, so they weren't filtered. Each new background-fetch call site had to independently decide how to handle them.What led to the gap? The original
captureErrorimplementation assumed errors were either browser network failures (transient) or application errors (report). It didn't model the middle ground: HTTP responses that are expected during daemon startup but not application bugs.Warning signs? Multiple call sites (LUM-2114, LUM-2188) choosing empty catches over
captureErrorwas the signal that the abstraction had a gap. Each empty catch was a local workaround for the missingbestEffortmode.Prevention? The
bestEffortoption makes the correct default easy — future call sites usecaptureError({ bestEffort: true })and get the right filtering without reinventing error-handling decisions.Alternatives considered
isTransientNetworkErrorto cover HTTP errors — semantically wrong. 503/502/401 are not network errors; they're valid HTTP responses. Conflating them would make the function name misleading and could affect other call sites that only want to filter network-level failures.error instanceof ApiError && [503, 502, ...].includes(error.status). Rejected because it duplicates the same logic at every call site and is error-prone (easy to forget a status code). The centralizedisExpectedDaemonTransientErroris the DRY version.Test plan
bun test src/lib/sentry/capture-error.test.ts— 22 tests pass:isExpectedDaemonTransientErrorfor all status codes,captureErrorwith/withoutbestEffortflag, existingnormalizeToErrorandcaptureErrortests unchangedbun test src/domains/chat/hooks/use-active-conversation.test.tsx— 6 tests pass: existing behavior + new "does not fetch when org is not ready" testbunx tsc --noEmit— cleanbun run lint— no new warningsLink to Devin session: https://app.devin.ai/sessions/c2e17ff1867f4ebd90aac007ea0f5453
Requested by: @ashleeradka