
fix: increase Ollama retry config + add transient-only mode #7677

Merged
DOsinga merged 4 commits into aaif-goose:main from fresh3nough:fix/ollama-model-loading-timeout-7635
Mar 26, 2026

Conversation

@fresh3nough
Contributor

Problem

When using Ollama with large models (e.g. Qwen3.5 35b), goose gives up after ~7 seconds with a 500 'connection refused' error. Ollama returns HTTP 500 while loading a model into memory, which can take 10-120s for large models on consumer hardware.

The default retry config (3 retries with 1s/2s/4s backoff = ~7s total) is insufficient for this scenario.

Fix

Override retry_config() in OllamaProvider with values tuned for local model loading:

  • 10 retries (up from 3)
  • 2s initial interval (up from 1s)
  • 1.5x backoff multiplier (down from 2.0, more gradual ramp)
  • 15s max interval (down from 30s)

This provides ~100s of total retry wait time (even with worst-case jitter, >60s), which handles models that take up to ~2 minutes to load.
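
As a back-of-envelope check of that figure, here is a tiny standalone Rust sketch (not goose code) that sums the capped backoff intervals for the values above:

```rust
// Sums the capped exponential backoff intervals for the new Ollama
// config: 10 retries, 2s initial interval, 1.5x multiplier, 15s cap.
fn main() {
    let mut interval: f64 = 2.0; // seconds
    let mut total: f64 = 0.0;
    for attempt in 1..=10 {
        total += interval;
        println!("retry {attempt}: wait {interval:.2}s (cumulative {total:.2}s)");
        interval = (interval * 1.5).min(15.0);
    }
    // The cumulative total comes out to ~101s, matching the ~100s estimate.
}
```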

Testing

  • Added unit tests verifying the retry config values and that total wait time exceeds 60s
  • All existing Ollama tests continue to pass
  • cargo clippy clean

Closes #7635

@fresh3nough force-pushed the fix/ollama-model-loading-timeout-7635 branch from 616e2df to 0334f3a on March 5, 2026 15:24

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 616e2df1c5


Comment thread crates/goose/src/providers/ollama.rs
…7635)

When Ollama loads a large model (e.g. Qwen3.5 35b), it returns HTTP 500
errors while the model is loading into memory. The default retry config
(3 retries, ~7s total) was insufficient for models that take 10-120s to
load, causing 'connection refused' errors.

Override retry_config() in OllamaProvider with Ollama-specific values:
- 10 retries (up from 3)
- 2s initial interval (up from 1s)
- 1.5x backoff multiplier (down from 2.0, more gradual)
- 15s max interval (down from 30s)

This provides ~100s of total retry wait time, enough for large models
on slower hardware.

Closes aaif-goose#7635

Signed-off-by: fre <anonwurcod@proton.me>
@fresh3nough force-pushed the fix/ollama-model-loading-timeout-7635 branch from 0334f3a to 0b03c0a on March 5, 2026 15:32
Client errors (400/404), such as mistyped model names, now fail fast instead
of waiting 80-100s through the full retry backoff. Transient errors
(5xx during model loading, connection refused, rate limits) still use
the extended Ollama retry config.

- Add transient_only flag to RetryConfig
- Update should_retry predicate to accept config
- Set transient_only on Ollama retry config
- Add unit tests for retry predicate behavior

Signed-off-by: fre <anonwurcod@proton.me>
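
A minimal sketch of the gating this commit describes, assuming a plausible shape for RetryConfig and should_retry (the actual goose signatures may differ):

```rust
use reqwest::StatusCode;
use std::time::Duration;

// Assumed shape of the config; in the PR, transient_only is a private
// field that only the Ollama provider's config sets.
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_interval: Duration,
    pub backoff_multiplier: f64,
    pub max_interval: Duration,
    transient_only: bool,
}

// The predicate now takes the config so it can honor transient_only.
fn should_retry(config: &RetryConfig, status: Option<StatusCode>, is_connect_error: bool) -> bool {
    if !config.transient_only {
        // Other providers keep the previous retry behavior.
        return status.map_or(is_connect_error, |s| !s.is_success());
    }
    match status {
        // 5xx during model loading and 429 rate limits are transient.
        Some(s) if s.is_server_error() || s == StatusCode::TOO_MANY_REQUESTS => true,
        // No HTTP status at all usually means a connection-level failure.
        None => is_connect_error,
        // Client errors (400/404, e.g. a mistyped model name) fail fast.
        Some(_) => false,
    }
}
```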
@fresh3nough force-pushed the fix/ollama-model-loading-timeout-7635 branch from 65bffa9 to 31bddd6 on March 5, 2026 15:45
Collaborator

@DOsinga left a comment


this does seem to change a lot more than just the retry for ollama

Comment thread crates/goose/src/providers/ollama.rs Outdated
@fresh3nough
Contributor Author

in addition I'm updating the PR title to reflect the transient-only mode

@fresh3nough changed the title from "fix: increase Ollama retry config for slow model loading" to "fix: increase Ollama retry config + add transient-only mode" on Mar 11, 2026
Remove test_ollama_retry_config_values (asserts constants equal
themselves) and test_ollama_retry_config_provides_sufficient_wait_time
(recomputes backoff math). The transient_only behavior test is retained
as it exercises actual feature logic.

Signed-off-by: fre <anonwurcod@proton.me>
@fresh3nough force-pushed the fix/ollama-model-loading-timeout-7635 branch from bcc0d6b to d571860 on March 11, 2026 16:01
@DOsinga
Collaborator

DOsinga commented Mar 11, 2026

so I am thinking we shouldn't fix this using the retry mechanism - it would retry any Ollama call, even a genuinely faulty one, for over a minute. is there a different way of doing this? something more targeted to the issue

@fresh3nough
Contributor Author

fresh3nough commented Mar 11, 2026

gotcha, couple options:

Option 1: long first-byte timeout + normal chunk timeout (builds on your earlier Ollama work; sketch below)

  • Use tokio::time::timeout only until the first SSE line/chunk arrives (like the "defer stall timeout" pattern we discussed before).
  • After the first chunk, revert to the normal 30s per-chunk timeout.
  • This handles slow model loading without touching retries at all.
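
A minimal sketch of option 1, assuming a futures Stream of SSE chunks (illustrative only; not the goose streaming code):

```rust
use std::time::Duration;
use futures::{Stream, StreamExt};
use tokio::time::timeout;

// Wait generously for the first chunk (the model may still be loading),
// then enforce the normal 30s stall timeout between subsequent chunks.
async fn collect_chunks<S, T, E>(mut stream: S) -> Result<Vec<T>, String>
where
    S: Stream<Item = Result<T, E>> + Unpin,
    E: std::fmt::Display,
{
    let mut chunks = Vec::new();
    let mut budget = Duration::from_secs(120); // first-byte budget
    loop {
        match timeout(budget, stream.next()).await {
            Ok(Some(Ok(chunk))) => {
                chunks.push(chunk);
                budget = Duration::from_secs(30); // per-chunk stall timeout
            }
            Ok(Some(Err(e))) => return Err(format!("stream error: {e}")),
            Ok(None) => return Ok(chunks), // stream ended normally
            Err(_) => return Err(format!("no chunk within {budget:?}")),
        }
    }
}
```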

or

Option 2: smart error-pattern retry predicate

Only extend retries when the error matches known "model still loading" patterns. Common Ollama signatures during cold start (from real issues + community reports):

  • HTTP 503 Service Unavailable
  • HTTP 500 with a body containing "llama runner", "loading model", "server not yet available", "timed out waiting for llama runner to start", or "model is loading"

In code, add a new method to OllamaProvider:

```rust
fn is_model_loading_error(&self, err: &reqwest::Error) -> bool {
    if err.status() == Some(StatusCode::SERVICE_UNAVAILABLE) {
        return true;
    }
    let body = err.to_string().to_lowercase();
    body.contains("llama runner")
        || body.contains("loading model")
        || body.contains("server not yet available")
        || body.contains("timed out waiting for llama runner to start")
        || body.contains("model is loading")
}
```

Then update the should_retry predicate to use this only for Ollama (and only up to ~120s total). Real errors fail instantly. This is precise, has zero user impact, and is easy to maintain.

Option 2 is more what I was thinking tbh.

Collaborator

@DOsinga left a comment


Clean fix — the transient_only mode correctly handles the Codex concern about 4xx errors getting the long Ollama backoff, the tests are meaningful, and the mechanical RetryConfig::new() updates to bedrock/databricks are the right approach now that transient_only is a private field. LGTM.
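
For context on those mechanical updates, a hypothetical continuation of the RetryConfig sketch earlier in this thread (the real constructor signature may differ):

```rust
use std::time::Duration;

impl RetryConfig {
    // With transient_only private, providers such as bedrock and
    // databricks must build their configs through new(), which leaves
    // the flag false; only the Ollama provider opts in.
    pub fn new(
        max_retries: u32,
        initial_interval: Duration,
        backoff_multiplier: f64,
        max_interval: Duration,
    ) -> Self {
        Self {
            max_retries,
            initial_interval,
            backoff_multiplier,
            max_interval,
            transient_only: false,
        }
    }
}
```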

Resolve merge conflict in crates/goose/src/providers/retry.rs by combining:
- upstream auth credential refresh logic
- PR transient_only support for should_retry

Signed-off-by: fre <anonwurcod@proton.me>
@DOsinga added this pull request to the merge queue Mar 26, 2026
Merged via the queue into aaif-goose:main with commit dfbd2dd Mar 26, 2026
21 checks passed
hydrosquall pushed a commit to hydrosquall/goose that referenced this pull request Mar 31, 2026
…se#7677)

Signed-off-by: fre <anonwurcod@proton.me>
Signed-off-by: Cameron Yick <cameron.yick@datadoghq.com>


Development

Successfully merging this pull request may close these issues.

Stops waiting for ollama before specified timeout reached

3 participants