fix: eliminate ~5s delay when opening Switch Models panel with local provider #8626

Merged
jh-block merged 4 commits into main from jhugo/local-inference-model-list-performance on Apr 17, 2026

Conversation

@jh-block Collaborator

Category: fix
User Impact: The "Switch models" panel now opens instantly instead of showing "Loading models…" for ~5 seconds when the local inference provider is selected.

Problem: Opening the Switch Models panel triggered a long blocking load. Two things contributed: (1) the list_local_models endpoint called ensure_featured_models_in_registry() which made up to 6 sequential HTTP requests to HuggingFace to resolve metadata for featured models — models the UI immediately filtered out since only downloaded models are selectable, and (2) the modal eagerly fetched model lists from every configured provider in parallel via Promise.all, blocking the UI until the slowest provider responded, even though only one provider's models are displayed at a time.

Solution: Removed the unnecessary HuggingFace sync from the model-listing endpoint and moved it to a dedicated POST /local-inference/sync-featured endpoint, called only by the Settings → Models management page that actually needs it. Parallelized the HF API calls with join_all so that when the sync does run, the lookups happen concurrently. Changed the modal to lazy-load models per provider on selection instead of eagerly fetching all of them, with results cached so switching back is instant.

File changes

crates/goose-server/src/routes/local_inference.rs
Removed ensure_featured_models_in_registry() from list_local_models so it returns instantly from the registry. Added a new POST /local-inference/sync-featured endpoint. Restructured ensure_featured_models_in_registry into 3 phases (check registry → resolve concurrently via join_all → update registry) to parallelize HuggingFace API calls.
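
Since join_all in the Rust handler plays the same role Promise.all plays in TypeScript, here is a hedged sketch of that three-phase shape; every identifier below is an illustrative stand-in, not the actual server code:

```ts
// Hedged sketch: Promise.all below is the analogue of the handler's join_all.
// registryContains, resolveFromHuggingFace, and addToRegistry are stand-ins.
interface FeaturedModel { id: string }
declare function registryContains(id: string): boolean;
declare function resolveFromHuggingFace(id: string): Promise<FeaturedModel>;
declare function addToRegistry(models: FeaturedModel[]): void;

async function ensureFeaturedModelsInRegistry(featuredIds: string[]) {
  // 1. Check the registry for featured models that still need resolution.
  const missing = featuredIds.filter((id) => !registryContains(id));
  // 2. Resolve them concurrently: six lookups cost roughly one round-trip
  //    instead of six sequential ones.
  const resolved = await Promise.all(missing.map(resolveFromHuggingFace));
  // 3. Write everything back in a single registry update.
  addToRegistry(resolved);
}
```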

crates/goose-server/src/openapi.rs
Registered the new sync_featured_models endpoint in the OpenAPI spec.

ui/desktop/openapi.json
Generated OpenAPI spec with the new endpoint.

ui/desktop/src/api/sdk.gen.ts
Generated SDK client with the syncFeaturedModels function.

ui/desktop/src/api/types.gen.ts
Generated types for the new endpoint.

ui/desktop/src/api/index.ts
Generated export for syncFeaturedModels.

ui/desktop/src/components/settings/localInference/LocalInferenceSettings.tsx
Calls syncFeaturedModels() before listLocalModels() so the Settings → Models management page still populates featured models for browsing/downloading.
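
A rough sketch of that ordering; the two function names come from the generated SDK above, while the import path and return handling are assumptions:

```ts
// Sketch only: the import path and response shape are assumptions.
import { syncFeaturedModels, listLocalModels } from '../../../api';

async function loadModelsForManagementPage() {
  // Only the management page needs featured-model metadata, so it pays for
  // the (now parallelized) HuggingFace sync explicitly...
  await syncFeaturedModels();
  // ...then lists from the registry, which every other caller now gets
  // instantly because list_local_models no longer syncs.
  return await listLocalModels();
}
```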

ui/desktop/src/components/settings/models/subcomponents/SwitchModelModal.tsx
Split the initial load into two phases: provider list fetch (fast) and per-provider model fetch (on demand when selected). Results are cached in a ref so switching back to a previously-loaded provider is instant.
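
A minimal sketch of the lazy-load-with-cache pattern, assuming a React hook shape; fetchModelsForProvider and Model are stand-ins, not the modal's real internals, and error handling is omitted here (the follow-up sketch later in the thread covers it):

```tsx
import { useEffect, useRef, useState } from 'react';

interface Model { id: string; name: string }
declare function fetchModelsForProvider(provider: string): Promise<Model[]>;

function useProviderModels(selectedProvider: string | null) {
  // A ref survives re-renders without causing them, so cached lists make
  // switching back to a previously loaded provider instant.
  const cache = useRef(new Map<string, Model[]>());
  const [models, setModels] = useState<Model[]>([]);
  const [loading, setLoading] = useState(false);

  useEffect(() => {
    if (!selectedProvider) return;
    const hit = cache.current.get(selectedProvider);
    if (hit) {
      setModels(hit);
      setLoading(false); // cached path: never show the spinner
      return;
    }
    let cancelled = false;
    setLoading(true);
    fetchModelsForProvider(selectedProvider).then((result) => {
      if (cancelled) return; // user moved on to another provider mid-flight
      cache.current.set(selectedProvider, result);
      setModels(result);
      setLoading(false);
    });
    return () => {
      cancelled = true;
      setLoading(false); // reset added by a follow-up commit in this PR
    };
  }, [selectedProvider]);

  return { models, loading };
}
```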

Reproduction Steps

  1. Configure the local inference provider (and optionally one or more cloud providers)
  2. Open a chat session and click the model selector to open the "Switch models" panel
  3. Select the local inference provider from the dropdown
  4. Observe that the model list loads instantly instead of showing "Loading models…" for several seconds
  5. Switch to another provider — its models load on first selection, then are cached for subsequent switches

…provider

Remove unnecessary HuggingFace API calls from list_local_models endpoint
and lazy-load model lists per-provider instead of eagerly fetching all.

Two root causes addressed:

1. list_local_models called ensure_featured_models_in_registry() on every
   request, making up to 6 sequential HTTP calls to HuggingFace to resolve
   metadata for featured models the UI immediately filtered out (only
   downloaded models are shown in the selector). Removed this call and
   exposed a dedicated POST /local-inference/sync-featured endpoint for
   the Settings > Models management page that actually needs it. Also
   parallelized the HF API calls using join_all.

2. SwitchModelModal fetched models for ALL configured providers eagerly
   on open, blocking on Promise.all until the slowest provider responded.
   Changed to lazy per-provider fetching triggered when the user selects
   a provider, with results cached so switching back is instant.

Signed-off-by: jh-block <jhugo@block.xyz>

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41c1804dfc

Comment thread crates/goose-server/src/routes/local_inference.rs
On fresh installs the registry is empty, so LocalModelPicker would show
no models without syncing featured models first. Same pattern as
LocalInferenceSettings.

Signed-off-by: jh-block <jhugo@block.xyz>

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63ac8682e2

Comment thread ui/desktop/src/components/settings/models/subcomponents/SwitchModelModal.tsx Outdated
When a provider fetch was cancelled (user switches providers before the
request completes), loadingModels stayed true indefinitely. Now the
cleanup function and the cached-provider early return both reset loading.

Also clear stale error/warning entries when a provider fetch succeeds on
retry, so the error UI doesn't persist over a valid model list.

Signed-off-by: jh-block <jhugo@block.xyz>
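
A self-contained sketch of the pattern this commit describes, generalized away from the modal's real code; the hook and its parameters are illustrative:

```tsx
import { useEffect, useState } from 'react';

// Generic cancellation-safe loader mirroring the fix above: cleanup flips a
// flag and resets loading, and a successful retry clears the stale error.
function useLoad<T>(key: string, load: (key: string) => Promise<T>) {
  const [value, setValue] = useState<T | null>(null);
  const [error, setError] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  useEffect(() => {
    let cancelled = false;
    setLoading(true);
    load(key)
      .then((result) => {
        if (cancelled) return;
        setValue(result);
        setError(null); // success on retry must not leave old error UI up
      })
      .catch((err) => {
        if (!cancelled) setError(String(err));
      })
      .finally(() => {
        if (!cancelled) setLoading(false);
      });
    return () => {
      cancelled = true;
      setLoading(false); // otherwise an abandoned fetch leaves loading stuck true
    };
  }, [key, load]); // pass a stable `load` (e.g. module-level fn) to avoid refetch loops

  return { value, error, loading };
}
```
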
@jh-block jh-block requested a review from DOsinga April 17, 2026 16:35
let mut entries_to_add = Vec::new();
let mut mmproj_downloads_needed: Vec<(String, String, PathBuf)> = Vec::new();

// Phase 1: Check registry to find which featured models need HF resolution.
Collaborator

can we drop the phasing comments? I find that they are extra likely to rot

Collaborator Author

oops

Signed-off-by: jh-block <jhugo@block.xyz>
@jh-block jh-block enabled auto-merge April 17, 2026 18:56
@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3c6396c50

) -> Result<Json<Vec<LocalModelResponse>>, ErrorResponse> {
ensure_featured_models_in_registry().await?;

let recommended_id = recommend_local_model(&state.inference_runtime);

P2: Keep mmproj backfill reachable from local model listing

Removing ensure_featured_models_in_registry() from list_local_models also removes the only GET-path that runs model.enrich_with_featured_mmproj() and the mmproj auto-download loop for already-downloaded models. After this change, that repair logic only runs through POST /local-inference/sync-featured, which is invoked in LocalModelPicker and LocalInferenceSettings but not when users only use Switch Model flows or other API clients calling /local-inference/models; in those cases, older registry entries can remain without mmproj metadata/files and vision-capable local models stay degraded until a separate sync is triggered.


@jh-block jh-block added this pull request to the merge queue Apr 17, 2026
Merged via the queue into main with commit 1a18c27 Apr 17, 2026
21 checks passed
@jh-block jh-block deleted the jhugo/local-inference-model-list-performance branch April 17, 2026 19:23