fix: replace bootstrap polling loops with reactive AsyncStream observation #21686
Convert awaitDaemonReady() and performInitialBootstrap() from polling loops (500ms-10s exponential backoff on @MainActor) to reactive Combine observation using the $isConnected.values AsyncPublisher.

Root cause: After Mac sleep/wake, multiple @MainActor tasks resume simultaneously (health checks, token refresh, reconnection, SSE), and the polling loops add unnecessary main-thread wakeups that push total main-actor work past the 2000ms ANR threshold.

Changes:
- awaitDaemonReady(): Uses withTaskGroup to race $isConnected.values against a timeout, matching the pattern in awaitLocalBootstrapCompleted()
- performInitialBootstrap(): Replaces exponential-backoff connection polling with the awaitConnectionEstablished() reactive helper
- New awaitConnectionEstablished(): Suspends via for-await on $isConnected.values until connected, producing zero main-actor wakeups while disconnected

Refs: LUM-459, LUM-486

Apple refs checked (2026-03-26): WWDC23 'Analyze hangs with Instruments', Apple docs 'Improving app responsiveness', clients/AGENTS.md §165

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
LUM-459 App Hanging: App hanging for at least 2000 ms.
Summary: Sentry reported the macOS app hanging for >= 2000ms (likely an ANR / main-thread stall) in vellum-assistant-macos.
Key Context:
✨ Generated by Linear Issue Context Agent

LUM-486 Convert bootstrap/connection polling to reactive Combine observation
Context
Parent: LUM-459 (App Hanging: App hanging for at least 2000 ms)
Problem
Acceptance Criteria
Files to Modify
Technical Approach
LUM-459 App hang: bootstrap polling loops + Combine AsyncPublisher cancellation defect cause main-thread stalls ≥ 2000 ms
Summary
Sentry ANR VELLUM-ASSISTANT-MACOS-8Z: the macOS app hangs for ≥ 2000 ms (first seen 2026-03-25 in vellum-macos@0.5.8).
Root Causes
1. Main-actor polling loops in bootstrap code
2. Combine's
GatewayHTTPClient is a stateless enum with only static methods for HTTP operations (get, post, patch, put, delete, download, stream). None of these methods touch UI state: they construct URLs, build requests, and execute network calls via URLSession. The type-level @MainActor annotation forces ALL callers to serialize their network I/O through the main actor, which directly contributes to the ANR reported in LUM-459.

After Mac sleep/wake, the main actor becomes a bottleneck as health checks, bootstrap, SSE, and credential refresh all queue behind each other. Removing @MainActor allows network operations to run on any executor, reducing main-thread pressure.

All existing callers (SwiftUI views, @MainActor stores) continue to work; they just no longer force an unnecessary actor hop for pure HTTP work.

Refs: LUM-459, LUM-487

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
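As a rough illustration of the change described in this commit, the sketch below shows a stateless enum of static HTTP helpers with no global-actor annotation, called from a main-actor store. `ExampleHTTPClient`, `ExampleStore`, and `makeGETRequest` are hypothetical names invented for this sketch and are not the project's types; the comment about where the request runs assumes Swift's default treatment of nonisolated async functions (SE-0338).

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession lives here on non-Apple platforms
#endif

// Hypothetical stand-in for a stateless HTTP helper (not the real GatewayHTTPClient).
enum ExampleHTTPClient {
    /// Builds a GET request. Pure value construction, no UI state involved.
    static func makeGETRequest(baseURL: URL, path: String) -> URLRequest {
        var request = URLRequest(url: baseURL.appendingPathComponent(path))
        request.httpMethod = "GET"
        return request
    }

    /// No @MainActor on the type, so this async method is nonisolated and
    /// (under default SE-0338 semantics) does not execute on the caller's actor.
    static func get(baseURL: URL, path: String) async throws -> Data {
        let request = makeGETRequest(baseURL: baseURL, path: path)
        let (data, _) = try await URLSession.shared.data(for: request)
        return data
    }
}

// Hypothetical main-actor caller: it still compiles and works unchanged.
@MainActor
final class ExampleStore {
    var lastByteCount = 0

    func refresh(baseURL: URL) async throws {
        // The main actor is free while this call is suspended.
        let data = try await ExampleHTTPClient.get(baseURL: baseURL, path: "health")
        lastByteCount = data.count // back on the main actor for the state update
    }
}
```

The store only touches its own state after the `await` returns, so the main actor is occupied for the assignment rather than for the duration of the network call.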
After removing @MainActor from GatewayHTTPClient, resolveConnection() can no longer synchronously access AuthService.shared.baseURL (which is @MainActor-isolated). The correct fix per Apple's concurrency model:
- resolveConnection() → async throws (awaits AuthService.shared.baseURL)
- isConnectionManaged() → async throws (cascaded from resolveConnection)
- buildURL() → async throws (cascaded from resolveConnection)
- handleAuthenticationFailure() → async (cascaded from isConnectionManaged)
- Update all callers with await

All high-level API methods (get, post, stream, etc.) were already async, so this change only affects the internal resolution path. No API surface changes for external callers.

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
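The cascade described in this commit can be sketched in isolation. In the example below, `ExampleAuthService`, `ExampleClient`, and `MissingBaseURL` are hypothetical stand-ins that only mirror the roles named in the commit message; the real signatures are not shown in this PR text and may differ.

```swift
import Foundation

// Hypothetical main-actor-isolated service holding the base URL.
@MainActor
final class ExampleAuthService {
    static let shared = ExampleAuthService()
    var baseURL: URL? = URL(string: "https://example.invalid")
}

// Hypothetical error for the sketch.
struct MissingBaseURL: Error {}

// Hypothetical nonisolated client, analogous to a type that lost its @MainActor.
enum ExampleClient {
    /// Reading a @MainActor-isolated property from nonisolated code requires
    /// `await`, so this function has to become async.
    static func resolveConnection() async throws -> URL {
        guard let url = await ExampleAuthService.shared.baseURL else {
            throw MissingBaseURL()
        }
        return url
    }

    /// The async requirement cascades to every function on the resolution path.
    static func buildURL(path: String) async throws -> URL {
        try await resolveConnection().appendingPathComponent(path)
    }
}
```

Callers that were already async only gain an `await` at the call site, which is consistent with the commit's note that the external API surface does not change.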
Address Devin Review feedback: awaitConnectionEstablished() accesses connectionManager.$isConnected, which is a @Published property on the @MainActor-isolated GatewayConnectionManager. Adding explicit @MainActor ensures safety regardless of whether AppDelegate inherits @MainActor from NSApplicationDelegate, matching the pattern used in awaitDaemonReady().

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
Combine's AsyncPublisher (.values) does not cooperate with Swift task cancellation: a cancelled 'for await' on .values never terminates, which causes withTaskGroup to hang (it waits for all children before returning).

Replace with an AsyncStream bridge that properly returns nil when the consuming task is cancelled, allowing structured concurrency groups to complete promptly.

Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
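A minimal sketch of such a bridge, assuming a main-actor `ObservableObject` with a `@Published` Boolean, might look like the following. `ExampleConnectionManager` is a hypothetical type written for illustration; the actual `isConnectedStream` implementation in GatewayConnectionManager.swift is not reproduced in this PR text and may differ.

```swift
import Combine

// Hypothetical stand-in for the connection manager; only the bridging pattern matters.
@MainActor
final class ExampleConnectionManager: ObservableObject {
    @Published var isConnected = false

    /// Each access creates a fresh Combine subscription exposed as an AsyncStream.
    var isConnectedStream: AsyncStream<Bool> {
        AsyncStream { continuation in
            // @Published replays the current value on subscription, then sends changes.
            let cancellable = $isConnected.sink { value in
                continuation.yield(value)
            }
            // Runs when the consumer finishes or its task is cancelled.
            // Capturing `cancellable` here also keeps the subscription alive.
            // Note: AnyCancellable is not declared Sendable, so complete
            // strict-concurrency checking may require an @unchecked Sendable wrapper.
            continuation.onTermination = { _ in
                cancellable.cancel()
            }
        }
    }
}
```

Because the `onTermination` closure owns the `AnyCancellable`, the subscription lives exactly as long as the stream has a consumer, and a cancelled consumer tears it down instead of leaving it dangling.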
Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
…story Co-Authored-By: ashlee@vellum.ai <ashlee@vellum.ai>
Summary
Fixes LUM-459 / LUM-486 — Sentry ANR VELLUM-ASSISTANT-MACOS-8Z
Problem
awaitDaemonReady() polls connectionManager.isConnected every 500 ms and performInitialBootstrap() polls with exponential backoff (500 ms → 10 s), both on @MainActor. After sleep/wake, these loops resume alongside health checks, token refresh, reconnection, and SSE, all serialised through the main actor, exceeding the 2 s ANR threshold.

What changed
- GatewayConnectionManager.swift, isConnectedStream property: an AsyncStream<Bool> bridge over $isConnected via Combine .sink, with onTermination cleanup.
- AppDelegate+Bootstrap.swift, awaitDaemonReady(): replaced polling loop with withTaskGroup racing isConnectedStream against a timeout (same pattern as existing awaitLocalBootstrapCompleted()).
- AppDelegate+Bootstrap.swift, performInitialBootstrap(): replaced exponential-backoff connection wait with awaitConnectionEstablished(), which suspends via isConnectedStream until connected. Outer token-bootstrap retry loop unchanged.
- AppDelegate+Bootstrap.swift, awaitConnectionEstablished(): @MainActor, suspends on isConnectedStream.

Why AsyncStream instead of $isConnected.values
Combine's AsyncPublisher (.values) does not terminate its iterator on task cancellation. In a withTaskGroup that calls cancelAll(), the child iterating .values never exits, so the group hangs indefinitely. AsyncStream's iterator returns nil on cancellation, allowing task groups to complete normally. This is a known Combine defect (FB9700937), confirmed by Apple engineer Philippe Hausler.

Benefits
- AsyncStream + onTermination cooperates with Swift structured concurrency, unlike AsyncPublisher.

Why it's safe
- @Published emits the current value on .sink subscription, so the guard-then-observe pattern has no TOCTOU gap.
- isConnectedStream creates a fresh Combine subscription per call; onTermination cleans it up when the consumer finishes or is cancelled.
- performInitialBootstrap no longer uses exponential backoff; it suspends entirely until connected. The old backoff added latency with no benefit since connection state changes are event-driven.

References
- AsyncStream as the recommended bridge for callback-based APIs → async/await
- onTermination with .cancelled for cooperative cancellation cleanup
- AsyncPublisher cancellation defect (FB9700937)
- withTaskGroup waits for all children before returning
- ObservableObject → @Observable migration guidance (not done here; 50+ views)

Review & Testing Checklist for Human
- import Combine is new in both files; isConnectedStream uses $isConnected.sink. Confirm no compiler errors with strict concurrency checking enabled.
- awaitDaemonReady times out gracefully (shows timeout screen) rather than hanging.
- performInitialBootstrap waits for connection then bootstraps credentials successfully.

Link to Devin session: https://app.devin.ai/sessions/0dbe772658c84bb583dd673f7d076dcb
Requested by: @ashleeradka
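
For reviewers who want to reason about the timeout race in awaitDaemonReady() without opening the diff, here is a hedged sketch of the general pattern: one child task iterates an `AsyncStream<Bool>`, another sleeps, and whichever finishes first decides the result. The function name `awaitTrue(in:timeoutNanoseconds:)` is hypothetical and the real implementation may be structured differently.

```swift
/// Returns true if `stream` yields `true` before the timeout elapses, otherwise false.
/// Hypothetical helper for illustration only.
func awaitTrue(in stream: AsyncStream<Bool>, timeoutNanoseconds: UInt64) async -> Bool {
    await withTaskGroup(of: Bool.self) { group in
        // Child 1: suspends until the stream reports true. When this task is
        // cancelled, AsyncStream's iterator returns nil and the loop ends.
        group.addTask {
            for await connected in stream {
                if connected { return true }
            }
            return false // stream finished or task was cancelled
        }
        // Child 2: the timeout. Task.sleep throws on cancellation; try? absorbs it.
        group.addTask {
            try? await Task.sleep(nanoseconds: timeoutNanoseconds)
            return false
        }
        // The first child to finish decides the result. Cancel the other one so
        // the group, which waits for all children, can return promptly.
        let winner = await group.next() ?? false
        group.cancelAll()
        return winner
    }
}
```

The `cancelAll()` call is the step that depends on the stream honouring cancellation: the group cannot return until the iterating child exits, which is why the choice of `AsyncStream` over `.values` matters here.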