[LUM-782] Fix hang in performHealthCheck by caching active assistant#24558
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8342ba065
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Cache a LockfileAssistant snapshot on GatewayConnectionManager instead of reading the lockfile from disk on every health check cycle. This eliminates 3 synchronous file I/O + JSON parse calls that blocked the main thread every 15 seconds (2 seconds during updates). - Add cachedAssistant property, refreshed on connect/reconfigure and when activeAssistantDidChange fires - Replace synchronous lockfile reads in isLocal, isManaged, performHealthCheck, and handleAuthenticationFailure with cached values - Move updateServiceGroupVersion off the main actor via Task.detached Co-Authored-By: tkheyfets <timur@vellum.ai>
…AuthenticationFailure Keep GatewayHTTPClient.isConnectionManaged() in iOS #else branches to preserve managed connection detection via UserDefaults. The cache optimization is macOS-only since the lockfile I/O bottleneck only exists on macOS. Co-Authored-By: tkheyfets <timur@vellum.ai>
|
@codex review |
e81c50b to
6aafc98
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e81c50b28f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Co-Authored-By: tkheyfets <timur@vellum.ai>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c941715c9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…d fallback Co-Authored-By: tkheyfets <timur@vellum.ai>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b70aa85ebf
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ation Co-Authored-By: tkheyfets <timur@vellum.ai>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 431fcdb1f1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Summary
GatewayConnectionManager.performHealthCheck()was performing 3 synchronous lockfile reads (disk I/O + JSON parse) on the main thread every 15 seconds (2s during updates), causing periodic UI hangs. This PR caches the activeLockfileAssistantonGatewayConnectionManager— populated on connect, cleared on reconfigure/disconnect, and refreshed viaactiveAssistantDidChange— so macOS hot-path reads (isLocal,isManaged,performHealthCheck,handleAuthenticationFailure) use the in-memory cache instead of hitting disk. iOS retains the originalGatewayHTTPClient.isConnectionManaged()call since itsresolveConnection()reads from UserDefaults (no disk I/O bottleneck).updateServiceGroupVersionalso uses the cached assistant ID instead of a synchronousloadActiveAssistantId()call.To keep the cache consistent with cross-process lockfile changes (e.g.
vellum usein CLI),connectImpl()now callsLockfileAssistant.startWatching()which monitors the lockfile viaDispatchSourceand firesactiveAssistantDidChangewhen the active assistant field changes on disk.deinitcallsstopWatching()for cleanup.refreshCachedAssistant()mirrorsresolveConnectedAssistant()'s resolution order: lockfileactiveAssistant→ UserDefaultsconnectedAssistantId→ nil, covering the edge case where the lockfile hasn't been migrated yet at connect time.cachedAssistantis cleared beforedisconnect()inreconfigure()so thatautoWakeIfAssistantDied()does not read staleisLocal/isManagedduring assistant switches.Review & Testing Checklist for Human
startWatching()/stopWatching()are static onLockfileAssistantand were previously only called in E2E tests. Verify that the connection manager's lifecycle (connect → deinit) is the correct production scope and that no other code conflicts. If the connection manager is recreated frequently, the watcher restarts each time (safe —startWatching()callsstopWatching()first).connectImpl(),reconfigure(), andactiveAssistantDidChange(via file watcher) cover all paths where the active assistant can change. Any path that mutates the active assistant without writing the lockfile will result in stale data.vellum use) and confirm the cache refreshes (connection state updates correctly). Test managed assistant flow on both macOS and iOS if applicable. Verifyreconfigure()flow (assistant switch) does not trigger spurious auto-wake.Notes
@MainActoroff HTTP clients but left synchronous lockfile calls inGatewayConnectionManageruntouched. This PR completes that work for the health check path.performHealthCheckandhandleAuthenticationFailurestill useGatewayHTTPClient.isConnectionManaged()on iOS. Only macOS switches to the cached value.handlePostSparkleUpdateusescachedAssistant?.assistantId ?? LockfileAssistant.loadActiveAssistantId()as a fallback since the cache may not be populated in all post-update scenarios.refreshCachedAssistant()itself still performs synchronous I/O, but only runs on connect/reconfigure/assistant-switch — not every 15s — so the impact is negligible.performHealthCheckderiveshealthPathfromcachedAssistant, butGatewayHTTPClient.get(...)resolves the target assistant viaresolveConnection()which prioritizes_assistantOverride(used in transfer/teleport flows). These could theoretically diverge during awithAssistant(...)scope, but the overlap window is negligible in practice and the_assistantOverridepattern has pre-existing data-race concerns.Link to Devin session: https://app.devin.ai/sessions/9d56576283954b95834f2bfcbb526db9
Requested by: @tkheyfets