Stop load openai fast model for openapi compatible custom endpoint #8644
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7b14148cb
Compare the host exactly to api.openai.com to avoid false positives.
jh-block left a comment:
Thanks for the contribution. I agree with the need for this change, but the fix is slightly too narrow: api.openai.com is not the only OpenAI API endpoint. There is at least eu.api.openai.com (EU data residency) as well, and I would not be surprised if there are other prefixes. Maybe this would be better:
.map(|h| h == "api.openai.com" || h.ends_with(".api.openai.com"))
Agreed. It would be great if the API had a way to indicate this, but I don't think I could find one... Updated the fix.
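For illustration, a minimal runnable sketch of the agreed host check; is_openai_api_host is a hypothetical helper name, not the actual function in openai.rs:

/// True when `host` is an official OpenAI API host: api.openai.com
/// itself or any subdomain of it, such as eu.api.openai.com
/// (EU data residency).
fn is_openai_api_host(host: &str) -> bool {
    host == "api.openai.com" || host.ends_with(".api.openai.com")
}

fn main() {
    assert!(is_openai_api_host("api.openai.com"));
    assert!(is_openai_api_host("eu.api.openai.com"));
    // Custom OpenAI-compatible endpoints must not match, so no
    // default fast model gets baked in for them.
    assert!(!is_openai_api_host("localhost"));
    // Suffix matching (rather than a substring check) also rejects
    // look-alike hosts.
    assert!(!is_openai_api_host("api.openai.com.example.com"));
}

The suffix form covers regional prefixes like eu. without hard-coding each one.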
Summary
When using a custom OpenAI-compatible endpoint, gpt-4o-mini is always loaded automatically even if it doesn't exist, producing repeated WARN messages in the log.
Analysis
OpenAiProvider::from_env() read OPENAI_HOST only after unconditionally calling model.with_fast("gpt-4o-mini", ...). So any user pointing the openai provider at a custom endpoint (like http://localhost:8000) always got gpt-4o-mini baked in as the fast model, which then fails with a 404 at runtime whenever complete_fast() is called (MOIM summarization, context compaction, orchestrator).
Fix
The fix (openai.rs:73-86): read OPENAI_HOST first, then only set the default fast model when the host contains api.openai.com. For any other host, fast_model_config stays None, so use_fast_model() falls back to the main model instead of attempting a non-existent gpt-4o-mini.
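A minimal sketch of the reordered flow, using the host check agreed in the review above; ModelConfig and configure() here are simplified stand-ins, not the actual code at openai.rs:73-86:

struct ModelConfig {
    model: String,
    fast_model: Option<String>,
}

impl ModelConfig {
    fn new(model: &str) -> Self {
        Self { model: model.into(), fast_model: None }
    }

    fn with_fast(mut self, fast: &str) -> Self {
        self.fast_model = Some(fast.into());
        self
    }

    // Falls back to the main model when no fast model is configured,
    // which is what custom endpoints now get.
    fn use_fast_model(&self) -> &str {
        self.fast_model.as_deref().unwrap_or(&self.model)
    }
}

// The real code reads the host from OPENAI_HOST; it is a parameter
// here to keep the sketch self-contained.
fn configure(model: &str, host: &str) -> ModelConfig {
    let config = ModelConfig::new(model);
    // The host is known before any fast-model default is applied,
    // so gpt-4o-mini is only baked in for official OpenAI hosts.
    if host == "api.openai.com" || host.ends_with(".api.openai.com") {
        config.with_fast("gpt-4o-mini")
    } else {
        config
    }
}

fn main() {
    // Custom endpoint: fast_model stays None, so fast calls (MOIM
    // summarization, context compaction) reuse the main model.
    let local = configure("Qwen3.5-35B-A3B-FP8", "localhost");
    assert_eq!(local.use_fast_model(), "Qwen3.5-35B-A3B-FP8");

    // Official endpoint: the gpt-4o-mini default still applies.
    let official = configure("gpt-4o", "api.openai.com");
    assert_eq!(official.use_fast_model(), "gpt-4o-mini");
}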
With OPENAI_HOST: http://localhost:8000, the fast model will now silently use /models/Qwen3.5-35B-A3B-FP8 for MOIM/summarization calls instead of retrying gpt-4o-mini 3× and then falling back. If you later want a dedicated fast model on your local server, you can set it via a custom provider config with fast_model: "your-fast-model".
Testing
Tested with a local vLLM-hosted Qwen3.5-35B and the OpenAI API endpoints. The WARN messages disappeared after the fix.