[LUM-782] Fix hang in performHealthCheck by caching active assistant #24558

Merged
tkheyfets merged 5 commits into main from devin/1775747392-fix-health-check-hang on Apr 9, 2026

@devin-ai-integration devin-ai-integration Bot commented Apr 9, 2026

Summary

GatewayConnectionManager.performHealthCheck() performed three synchronous lockfile reads (disk I/O plus JSON parse) on the main thread every 15 seconds (every 2 seconds during updates), causing periodic UI hangs. This PR caches the active LockfileAssistant on GatewayConnectionManager. The cache is populated on connect, cleared on reconfigure/disconnect, and refreshed via activeAssistantDidChange, so the macOS hot-path reads (isLocal, isManaged, performHealthCheck, handleAuthenticationFailure) use the in-memory value instead of hitting disk. iOS retains the original GatewayHTTPClient.isConnectionManaged() call, since its resolveConnection() reads from UserDefaults and has no disk I/O bottleneck. updateServiceGroupVersion also uses the cached assistant ID instead of a synchronous loadActiveAssistantId() call.
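
For readers who want the shape of the caching pattern without opening the diff, here is a minimal sketch. It is not the code from this PR: AssistantSnapshot, CachedConnectionSketch, the injected loadFromLockfile closure, and the `false` defaults for an empty cache are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical stand-in for LockfileAssistant; the field names are assumptions.
struct AssistantSnapshot: Equatable {
    let assistantId: String
    let isLocal: Bool
    let isManaged: Bool
}

// Sketch of a connection manager that reads the lockfile only on
// lifecycle events and serves hot-path reads from memory.
final class CachedConnectionSketch {
    // Injected loader standing in for the synchronous lockfile read.
    private let loadFromLockfile: () -> AssistantSnapshot?
    private(set) var cachedAssistant: AssistantSnapshot?

    init(loadFromLockfile: @escaping () -> AssistantSnapshot?) {
        self.loadFromLockfile = loadFromLockfile
    }

    // Lifecycle events are the only places that touch the loader.
    func connect() { refreshCachedAssistant() }
    func activeAssistantDidChange() { refreshCachedAssistant() }
    func disconnect() { cachedAssistant = nil }

    private func refreshCachedAssistant() {
        cachedAssistant = loadFromLockfile()
    }

    // Hot-path reads: no disk access. Defaulting to false when the
    // cache is empty is an assumption of this sketch.
    var isLocal: Bool { cachedAssistant?.isLocal ?? false }
    var isManaged: Bool { cachedAssistant?.isManaged ?? false }

    // What a periodic health check would consult instead of the lockfile.
    func healthCheckTargetId() -> String? { cachedAssistant?.assistantId }
}
```

With this shape, any number of health check ticks after connect() costs zero loader calls; the loader runs again only on the next lifecycle event.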

To keep the cache consistent with cross-process lockfile changes (e.g. vellum use in CLI), connectImpl() now calls LockfileAssistant.startWatching() which monitors the lockfile via DispatchSource and fires activeAssistantDidChange when the active assistant field changes on disk. deinit calls stopWatching() for cleanup.
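
A rough sketch of what a DispatchSource-based lockfile watcher can look like follows. It is not LockfileAssistant.startWatching() itself: ActiveAssistantTracker, LockfileWatcherSketch, and the "activeAssistant" JSON key are assumptions. The sketch also does not re-arm after an atomic replace (rename) of the file, which a production watcher would likely need to handle.

```swift
import Foundation

// Remembers the last active assistant seen so only real changes are reported.
// The "activeAssistant" key is an assumption about the lockfile layout.
struct ActiveAssistantTracker {
    private(set) var lastSeen: String?

    // Returns true when the active assistant in `data` differs from the last one seen.
    mutating func update(with data: Data) -> Bool {
        let object = try? JSONSerialization.jsonObject(with: data) as? [String: Any]
        let current = object?["activeAssistant"] as? String
        let changed = current != lastSeen
        lastSeen = current
        return changed
    }
}

#if canImport(Darwin)
// File system object sources are only available on Apple platforms.
final class LockfileWatcherSketch {
    private var source: DispatchSourceFileSystemObject?
    private var tracker = ActiveAssistantTracker()

    func start(path: String, queue: DispatchQueue = .main, onChange: @escaping () -> Void) {
        stop() // restarting is safe: any previous watcher is cancelled first
        let fd = open(path, O_EVTONLY)
        guard fd >= 0 else { return }
        // Seed the tracker so the first event is compared against current contents.
        if let data = FileManager.default.contents(atPath: path) {
            _ = tracker.update(with: data)
        }
        let newSource = DispatchSource.makeFileSystemObjectSource(
            fileDescriptor: fd, eventMask: [.write, .extend], queue: queue)
        newSource.setEventHandler { [weak self] in
            guard let self, let data = FileManager.default.contents(atPath: path) else { return }
            // Fire only when the tracked field changed, not on every write.
            if self.tracker.update(with: data) { onChange() }
        }
        newSource.setCancelHandler { close(fd) } // release the descriptor on cancel
        newSource.resume()
        source = newSource
    }

    func stop() {
        source?.cancel()
        source = nil
    }

    deinit { stop() }
}
#endif
```

Comparing the field rather than reacting to every write keeps unrelated lockfile edits from triggering a cache refresh.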

refreshCachedAssistant() mirrors resolveConnectedAssistant()'s resolution order: lockfile activeAssistant → UserDefaults connectedAssistantId → nil, covering the edge case where the lockfile hasn't been migrated yet at connect time. cachedAssistant is cleared before disconnect() in reconfigure() so that autoWakeIfAssistantDied() does not read stale isLocal/isManaged during assistant switches.
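
The resolution order described above can be sketched as a small pure function. The function name and the closure-based parameters are hypothetical; the real refreshCachedAssistant() resolves a full LockfileAssistant rather than a bare ID.

```swift
import Foundation

// Resolution order: lockfile activeAssistant, then the UserDefaults
// connectedAssistantId, then nil. Both sources are injected as closures
// so the order can be exercised without touching disk.
func resolveAssistantId(
    fromLockfile: () -> String?,
    fromUserDefaults: () -> String?
) -> String? {
    // `??` evaluates its right-hand side lazily, so UserDefaults is
    // consulted only when the lockfile has no active assistant.
    fromLockfile() ?? fromUserDefaults()
}
```

A call site might pass `{ UserDefaults.standard.string(forKey: "connectedAssistantId") }` as the second closure; whether that string is the literal key used by the app is an assumption.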

Review & Testing Checklist for Human

  • Build in Xcode: CI skips macOS/iOS Swift builds — this must compile locally before merging. No compilation has been verified.
  • Lockfile watcher lifecycle: startWatching() / stopWatching() are static on LockfileAssistant and were previously only called in E2E tests. Verify that the connection manager's lifecycle (connect → deinit) is the correct production scope and that no other code conflicts. If the connection manager is recreated frequently, the watcher restarts each time (safe — startWatching() calls stopWatching() first).
  • Cache staleness: Confirm that connectImpl(), reconfigure(), and activeAssistantDidChange (via file watcher) cover all paths where the active assistant can change. Any path that mutates the active assistant without writing the lockfile will result in stale data.
  • Test plan: Launch the app, connect to a local assistant, verify health checks run without UI hitches. Switch assistants via CLI (vellum use) and confirm the cache refreshes (connection state updates correctly). Test managed assistant flow on both macOS and iOS if applicable. Verify reconfigure() flow (assistant switch) does not trigger spurious auto-wake.

Notes

  • Prior art: PR #21729, "Move network layer off @MainActor isolation" (LUM-492), moved the HTTP clients off @MainActor but left the synchronous lockfile calls in GatewayConnectionManager untouched. This PR completes that work for the health check path.
  • iOS behavior is unchanged: performHealthCheck and handleAuthenticationFailure still use GatewayHTTPClient.isConnectionManaged() on iOS. Only macOS switches to the cached value.
  • handlePostSparkleUpdate uses cachedAssistant?.assistantId ?? LockfileAssistant.loadActiveAssistantId() as a fallback since the cache may not be populated in all post-update scenarios.
  • refreshCachedAssistant() itself still performs synchronous I/O, but only runs on connect/reconfigure/assistant-switch — not every 15s — so the impact is negligible.
  • Known limitation: performHealthCheck derives healthPath from cachedAssistant, but GatewayHTTPClient.get(...) resolves the target assistant via resolveConnection() which prioritizes _assistantOverride (used in transfer/teleport flows). These could theoretically diverge during a withAssistant(...) scope, but the overlap window is negligible in practice and the _assistantOverride pattern has pre-existing data-race concerns.
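
As an illustration of the handlePostSparkleUpdate fallback noted above, the sketch below shows the cache-first, disk-second lookup. CachedAssistantSketch and the injected loadActiveAssistantIdFromDisk closure are hypothetical stand-ins for cachedAssistant and LockfileAssistant.loadActiveAssistantId().

```swift
import Foundation

// Hypothetical stand-in for the cached LockfileAssistant.
struct CachedAssistantSketch {
    let assistantId: String
}

// Prefer the in-memory value; fall back to a synchronous disk read only
// when the cache has not been populated yet.
func assistantIdForPostUpdate(
    cached: CachedAssistantSketch?,
    loadActiveAssistantIdFromDisk: () -> String?
) -> String? {
    // The right-hand side of `??` runs only when `cached` is nil.
    cached?.assistantId ?? loadActiveAssistantIdFromDisk()
}
```

Because the disk read sits behind `??`, the synchronous I/O is paid only in the post-update scenarios where the cache is still empty.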

Link to Devin session: https://app.devin.ai/sessions/9d56576283954b95834f2bfcbb526db9
Requested by: @tkheyfets


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8342ba065

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread clients/shared/Network/GatewayConnectionManager.swift Outdated
Comment thread clients/shared/Network/GatewayConnectionManager.swift Outdated
devin-ai-integration Bot and others added 2 commits April 9, 2026 20:12
Cache a LockfileAssistant snapshot on GatewayConnectionManager instead of
reading the lockfile from disk on every health check cycle. This eliminates
3 synchronous file I/O + JSON parse calls that blocked the main thread
every 15 seconds (2 seconds during updates).

- Add cachedAssistant property, refreshed on connect/reconfigure and when
  activeAssistantDidChange fires
- Replace synchronous lockfile reads in isLocal, isManaged, performHealthCheck,
  and handleAuthenticationFailure with cached values
- Move updateServiceGroupVersion off the main actor via Task.detached

Co-Authored-By: tkheyfets <timur@vellum.ai>
…AuthenticationFailure

Keep GatewayHTTPClient.isConnectionManaged() in iOS #else branches to
preserve managed connection detection via UserDefaults. The cache
optimization is macOS-only since the lockfile I/O bottleneck only
exists on macOS.

Co-Authored-By: tkheyfets <timur@vellum.ai>
@tkheyfets

@codex review

@devin-ai-integration devin-ai-integration Bot force-pushed the devin/1775747392-fix-health-check-hang branch from e81c50b to 6aafc98 on April 9, 2026 20:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: e81c50b28f

Comment thread clients/shared/Network/GatewayConnectionManager.swift
@tkheyfets

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 0c941715c9

Comment thread clients/shared/Network/GatewayConnectionManager.swift Outdated
Comment thread clients/shared/Network/GatewayConnectionManager.swift
…d fallback

Co-Authored-By: tkheyfets <timur@vellum.ai>
@tkheyfets

@codex review

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 6 additional findings in Devin Review.

Comment thread clients/shared/Network/GatewayConnectionManager.swift

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: b70aa85ebf

Comment thread clients/shared/Network/GatewayConnectionManager.swift
Comment thread clients/shared/Network/GatewayConnectionManager.swift Outdated
…ation

Co-Authored-By: tkheyfets <timur@vellum.ai>
@tkheyfets

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 431fcdb1f1

Comment thread clients/shared/Network/GatewayConnectionManager.swift
@tkheyfets tkheyfets merged commit 8173a64 into main Apr 9, 2026
7 checks passed
@tkheyfets tkheyfets deleted the devin/1775747392-fix-health-check-hang branch April 9, 2026 21:25