fix: eliminate ~5s delay when opening Switch Models panel with local provider #8626
Conversation
fix: eliminate ~5s delay when opening Switch Models panel with local provider

Remove unnecessary HuggingFace API calls from the list_local_models endpoint and lazy-load model lists per provider instead of eagerly fetching them all. Two root causes addressed:

1. list_local_models called ensure_featured_models_in_registry() on every request, making up to 6 sequential HTTP calls to HuggingFace to resolve metadata for featured models the UI immediately filtered out (only downloaded models are shown in the selector). Removed this call and exposed a dedicated POST /local-inference/sync-featured endpoint for the Settings > Models management page that actually needs it. Also parallelized the HF API calls using join_all.

2. SwitchModelModal fetched models for ALL configured providers eagerly on open, blocking on Promise.all until the slowest provider responded. Changed to lazy per-provider fetching, triggered when the user selects a provider, with results cached so switching back is instant.

Signed-off-by: jh-block <jhugo@block.xyz>
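A minimal sketch of the lazy per-provider fetch with caching that this commit describes; `fetchProviderModels`, `ModelInfo`, and the cache shape are illustrative assumptions, not the actual SwitchModelModal code.

```typescript
type ModelInfo = { id: string; name: string };

// Hypothetical stand-in for a provider's model-list request.
async function fetchProviderModels(provider: string): Promise<ModelInfo[]> {
  const res = await fetch(`/providers/${provider}/models`); // illustrative route
  return res.json();
}

const modelCache = new Map<string, ModelInfo[]>();

// Before: the modal fetched every provider on open and blocked on the slowest:
//   await Promise.all(allProviders.map(fetchProviderModels));

// After: fetch only the provider the user selects, and reuse cached results.
async function onProviderSelected(provider: string): Promise<ModelInfo[]> {
  const cached = modelCache.get(provider);
  if (cached) return cached; // switching back to this provider is instant

  const models = await fetchProviderModels(provider);
  modelCache.set(provider, models);
  return models;
}
```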
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 41c1804dfc
On fresh installs the registry is empty, so LocalModelPicker would show no models without syncing featured models first. Same pattern as LocalInferenceSettings.

Signed-off-by: jh-block <jhugo@block.xyz>
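A minimal sketch of that ordering, assuming the generated SDK functions this PR names (`syncFeaturedModels`, `listLocalModels`); the import path and exact signatures are assumptions.

```typescript
// Assumed import path; the generated SDK lives under ui/desktop/src/api.
import { listLocalModels, syncFeaturedModels } from './api';

async function loadPickerModels() {
  // Sync featured models into the registry first, so a fresh install
  // does not show an empty picker.
  await syncFeaturedModels();
  // The listing now returns registry entries even on first run.
  return listLocalModels();
}
```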
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63ac8682e2
When a provider fetch was cancelled (user switches providers before the request completes), loadingModels stayed true indefinitely. Now the cleanup function and the cached-provider early return both reset loading. Also clear stale error/warning entries when a provider fetch succeeds on retry, so the error UI doesn't persist over a valid model list.

Signed-off-by: jh-block <jhugo@block.xyz>
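A sketch of how those two fixes can fit together in a React effect; the hook name, state shape, and `fetchProviderModels` stand-in are illustrative assumptions, not the actual component code.

```typescript
import { useEffect, useRef, useState } from 'react';

type ModelInfo = { id: string; name: string };

// Hypothetical stand-in for the real per-provider request.
async function fetchProviderModels(provider: string): Promise<ModelInfo[]> {
  const res = await fetch(`/providers/${provider}/models`); // illustrative route
  return res.json();
}

function useProviderModels(provider: string) {
  const cache = useRef(new Map<string, ModelInfo[]>());
  const [models, setModels] = useState<ModelInfo[]>([]);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    const cached = cache.current.get(provider);
    if (cached) {
      setModels(cached);
      setLoading(false); // cached-provider early return also resets loading
      return;
    }

    let cancelled = false;
    setLoading(true);
    fetchProviderModels(provider)
      .then((result) => {
        if (cancelled) return;
        cache.current.set(provider, result);
        setModels(result);
        setError(null); // clear the stale error once a retry succeeds
        setLoading(false);
      })
      .catch((err) => {
        if (cancelled) return;
        setError(String(err));
        setLoading(false);
      });

    return () => {
      // The user switched providers before the request completed; without
      // this reset, `loading` could stay true indefinitely.
      cancelled = true;
      setLoading(false);
    };
  }, [provider]);

  return { models, loading, error };
}
```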
```rust
let mut entries_to_add = Vec::new();
let mut mmproj_downloads_needed: Vec<(String, String, PathBuf)> = Vec::new();

// Phase 1: Check registry to find which featured models need HF resolution.
```
can we drop the phasing comments? I find that they are extra likely to rot
Signed-off-by: jh-block <jhugo@block.xyz>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d3c6396c50
```rust
) -> Result<Json<Vec<LocalModelResponse>>, ErrorResponse> {
    ensure_featured_models_in_registry().await?;

    let recommended_id = recommend_local_model(&state.inference_runtime);
```
Keep mmproj backfill reachable from local model listing
Removing `ensure_featured_models_in_registry()` from `list_local_models` also removes the only GET path that runs `model.enrich_with_featured_mmproj()` and the mmproj auto-download loop for already-downloaded models. After this change, that repair logic runs only through `POST /local-inference/sync-featured`, which is invoked by LocalModelPicker and LocalInferenceSettings but not by Switch Model flows or other API clients calling `/local-inference/models`. In those cases, older registry entries can remain without mmproj metadata/files, and vision-capable local models stay degraded until a separate sync is triggered.
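For illustration, an external client could avoid that degraded state by triggering the sync itself before listing; the endpoint paths are from this PR, while the base URL and response handling are assumptions.

```typescript
const BASE = 'http://localhost:3000'; // placeholder base URL for goose-server

async function listLocalModelsWithBackfill() {
  // After this PR, POST /local-inference/sync-featured is the only path that
  // runs the mmproj backfill, so clients that need it must call it explicitly.
  await fetch(`${BASE}/local-inference/sync-featured`, { method: 'POST' });
  const res = await fetch(`${BASE}/local-inference/models`);
  return res.json();
}
```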
Category: fix
User Impact: The "Switch models" panel now opens instantly instead of showing "Loading models…" for ~5 seconds when the local inference provider is selected.
Problem: Opening the Switch Models panel triggered a long blocking load. Two things contributed: (1) the `list_local_models` endpoint called `ensure_featured_models_in_registry()`, which made up to 6 sequential HTTP requests to HuggingFace to resolve metadata for featured models that the UI immediately filtered out anyway, since only downloaded models are selectable; and (2) the modal eagerly fetched model lists from every configured provider in parallel via `Promise.all`, blocking the UI until the slowest provider responded, even though only one provider's models are displayed at a time.

Solution: Removed the unnecessary HuggingFace sync from the model listing endpoint and moved it to a dedicated `POST /local-inference/sync-featured` endpoint called only by the Settings → Models management page, which actually needs it. Parallelized the HF API calls with `join_all` for when the sync is needed. Changed the modal to lazy-load models per provider on selection instead of eagerly fetching them all, with results cached so switching back is instant.

File changes
crates/goose-server/src/routes/local_inference.rs
Removed `ensure_featured_models_in_registry()` from `list_local_models` so it returns instantly from the registry. Added a new `POST /local-inference/sync-featured` endpoint. Restructured `ensure_featured_models_in_registry` into 3 phases (check registry → resolve concurrently via `join_all` → update registry) to parallelize the HuggingFace API calls; see the sketch after this list.

crates/goose-server/src/openapi.rs
Registered the new `sync_featured_models` endpoint in the OpenAPI spec.

ui/desktop/openapi.json
Generated OpenAPI spec with the new endpoint.

ui/desktop/src/api/sdk.gen.ts
Generated SDK client with the `syncFeaturedModels` function.

ui/desktop/src/api/types.gen.ts
Generated types for the new endpoint.

ui/desktop/src/api/index.ts
Generated export for `syncFeaturedModels`.

ui/desktop/src/components/settings/localInference/LocalInferenceSettings.tsx
Calls `syncFeaturedModels()` before `listLocalModels()` so the Settings → Models management page still populates featured models for browsing/downloading.

ui/desktop/src/components/settings/models/subcomponents/SwitchModelModal.tsx
Split the initial load into two phases: provider list fetch (fast) and per-provider model fetch (on demand when selected). Results are cached in a ref so switching back to a previously loaded provider is instant.
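The `ensure_featured_models_in_registry` restructuring itself is Rust code using `join_all`; as a language-neutral illustration of the same sequential-to-concurrent change, here is the analogous shape in TypeScript, with `resolveFeaturedModel` as a hypothetical stand-in for the per-model HuggingFace lookup.

```typescript
// Hypothetical stand-in for one HuggingFace metadata lookup.
async function resolveFeaturedModel(id: string): Promise<unknown> {
  const res = await fetch(`https://huggingface.co/api/models/${id}`);
  return res.json();
}

async function resolveAllFeatured(ids: string[]) {
  // Before: up to 6 sequential round trips, each awaiting the previous one:
  //   for (const id of ids) { results.push(await resolveFeaturedModel(id)); }

  // After: start every lookup at once and await them together, so total
  // latency is roughly the slowest single call rather than the sum of all.
  return Promise.all(ids.map((id) => resolveFeaturedModel(id)));
}
```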
Reproduction Steps