
Stop load openai fast model for openapi compatible custom endpoint #8644

Merged
jh-block merged 3 commits into aaif-goose:main from spikewang:fix/no-openai-fast-model-for-custom on Apr 21, 2026

Conversation

@spikewang
Contributor

Summary

When using a custom OpenAI-compatible endpoint, the openai provider always tried to load gpt-4o-mini automatically as the fast model, even when that model did not exist on the server.

There are always warnings in the log like this:

{"timestamp":"2026-04-19T00:48:34.916138Z","level":"WARN","fields":{"message":"Provider request failed with status: 404 Not Found. Payload: Some(Object {\"error\": Object {\"message\": String(\"The model `gpt-4o-mini` does not exist.\"), \"type\": String(\"NotFoundError\"), \"param\": String(\"model\"), \"code\": Number(404)}}). Returning error: RequestFailed(\"Resource not found (404): The model `gpt-4o-mini` does not exist.\")"},"target":"goose::providers::openai_compatible","span":{"gen_ai.request.model":"gpt-4o-mini","session.id":"20260419_4","name":"complete"},"spans":[{"gen_ai.request.model":"gpt-4o-mini","session.id":"20260419_4","name":"complete"}]}
{"timestamp":"2026-04-19T00:48:34.916189Z","level":"WARN","fields":{"message":"Request failed, retrying (1/3): RequestFailed(\"Resource not found (404): The model `gpt-4o-mini` does not exist.\")"},"target":"goose::providers::retry","span":{"gen_ai.request.model":"gpt-4o-mini","session.id":"20260419_4","name":"complete"},"spans":[{"gen_ai.request.model":"gpt-4o-mini","session.id":"20260419_4","name":"complete"}]}

Analysis

OpenAiProvider::from_env() read OPENAI_HOST only after unconditionally calling model.with_fast("gpt-4o-mini", ...). So any user pointing the openai provider at a custom endpoint (such as http://localhost:8000) always got gpt-4o-mini baked in as the fast model, which then failed with a 404 at runtime whenever complete_fast() was called (MOIM summarization, context compaction, orchestrator).

Fix

The fix (openai.rs:73-86): read OPENAI_HOST first, then set the default fast model only when the host contains api.openai.com. For any other host, fast_model_config stays None, so use_fast_model() falls back to the main model instead of attempting a non-existent gpt-4o-mini.
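The reordering can be sketched as follows. This is a simplified, illustrative version, not the exact code in crates/goose/src/providers/openai.rs; the helper name and constants are hypothetical, while OPENAI_HOST, gpt-4o-mini, and the fall-back behavior of use_fast_model() come from the PR description.

```rust
const DEFAULT_FAST_MODEL: &str = "gpt-4o-mini";

/// Return the default fast model only for the hosted OpenAI API.
/// For any other host this stays `None`, so use_fast_model()
/// falls back to the main model instead of a non-existent gpt-4o-mini.
fn default_fast_model(host: &str) -> Option<&'static str> {
    if host.contains("api.openai.com") {
        Some(DEFAULT_FAST_MODEL)
    } else {
        None
    }
}

fn main() {
    // Read OPENAI_HOST first, then decide whether to set a fast model.
    let host = std::env::var("OPENAI_HOST")
        .unwrap_or_else(|_| "https://api.openai.com".to_string());
    println!("default fast model for {host}: {:?}", default_fast_model(&host));
}
```

The key point is simply ordering: the host must be known before any with_fast(...) call, so a custom endpoint never has a hosted-API model baked in.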

With OPENAI_HOST: http://localhost:8000, MOIM/summarization calls now silently use the main model (/models/Qwen3.5-35B-A3B-FP8) as the fast model instead of retrying gpt-4o-mini three times and then falling back. If you later want a dedicated fast model on your local server, you can set one via a custom provider config with fast_model: "your-fast-model".

Testing

Tested with a locally hosted vLLM serving Qwen3.5-35B and with the OpenAI API endpoints. The WARN messages disappeared after the fix.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7b14148cb


Comment thread on crates/goose/src/providers/openai.rs (outdated): "…to api.openai.com to avoid false positives."
Collaborator

@jh-block left a comment


Thanks for the contribution. I agree with the need for this change, but the fix is slightly too narrow: api.openai.com is not the only OpenAI API endpoint. There is at least eu.api.openai.com (EU data residency) as well, and I would not be surprised if there are other prefixes. Maybe this would be better:

    .map(|h| h == "api.openai.com" || h.ends_with(".api.openai.com"))
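The suggested predicate could be wrapped in a small helper (the function name and the scheme/port stripping are illustrative additions, not code from the PR) that compares the hostname rather than the whole URL, so regional prefixes like eu.api.openai.com match while look-alike domains do not:

```rust
/// Hypothetical helper: true only for api.openai.com itself
/// or a subdomain of it (e.g. eu.api.openai.com).
fn is_openai_api_host(host_url: &str) -> bool {
    // Strip the scheme, then cut at the first path separator or port.
    let h = host_url
        .trim_start_matches("https://")
        .trim_start_matches("http://");
    let h = h.split('/').next().unwrap_or("");
    let h = h.split(':').next().unwrap_or("");
    h == "api.openai.com" || h.ends_with(".api.openai.com")
}

fn main() {
    for url in [
        "https://api.openai.com/v1",
        "https://eu.api.openai.com",
        "http://localhost:8000",
        "https://api.openai.com.evil.example",
    ] {
        println!("{url}: {}", is_openai_api_host(url));
    }
}
```

Unlike a plain contains("api.openai.com"), the suffix check rejects hosts such as api.openai.com.evil.example or myapi.openai.com while still accepting any regional prefix.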

@jh-block jh-block self-assigned this Apr 20, 2026
@spikewang
Contributor Author

spikewang commented Apr 21, 2026

Agreed. It would be great if the API had a way to indicate this, but I don't think I could find one... Updated the fix.

@spikewang spikewang requested a review from jh-block April 21, 2026 04:36
@jh-block jh-block added this pull request to the merge queue Apr 21, 2026
Merged via the queue into aaif-goose:main with commit aa731a9 Apr 21, 2026
20 checks passed
lifeizhou-ap added a commit that referenced this pull request Apr 22, 2026
* main:
  feat: extend goose2 context window ux with auto-compaction (#8721)
  improve goose2 agent management flows (#8737)
  alexhancock/tui-improvements (#8736)
  fix: add strict:false to Responses API tools and gpt-5.4 to known models (#8636)
  persist and reliably apply chat model selection (#8734)
  merge goose-acp crate into goose (#8726)
  docs: AGENTS.md section on goose2 desktop backend architecture (#8732)
  feat: goose2 message bubble + action tray (#8720)
  consolidate provider ACP methods onto inventory (#8710)
  ci: declare and enforce MSRV of 1.91.1 (#8670)
  fix(ui): correct grammar in apps view description (#8668) (#8679)
  Stop load openai fast model for openapi compatible custom endpoint (#8644)
lifeizhou-ap added a commit that referenced this pull request Apr 22, 2026
* main: (41 commits)
  removed the specific code owner for documentation change (#8749)
  fix(providers): handle missing delta field in streaming chunks (#8700)
  refactor(providers): extract http_status module and rename handle_status_openai_compat (#8620)
  fix(providers/openai): accept streaming chunks with both reasoning fields (#8715)
  feat: associate threads with projects (#8745)
  upgrade goose sdk and tui to be compatible with the latest agentclientprotocol/sdk package (#8667)
  feat: extend goose2 context window ux with auto-compaction (#8721)
  improve goose2 agent management flows (#8737)
  alexhancock/tui-improvements (#8736)
  fix: add strict:false to Responses API tools and gpt-5.4 to known models (#8636)
  persist and reliably apply chat model selection (#8734)
  merge goose-acp crate into goose (#8726)
  docs: AGENTS.md section on goose2 desktop backend architecture (#8732)
  feat: goose2 message bubble + action tray (#8720)
  consolidate provider ACP methods onto inventory (#8710)
  ci: declare and enforce MSRV of 1.91.1 (#8670)
  fix(ui): correct grammar in apps view description (#8668) (#8679)
  Stop load openai fast model for openapi compatible custom endpoint (#8644)
  feat(hooks): add Husky git hooks for ui/goose2 (#8577)
  fix: links in chat could not be opened (#8544)
  ...