fix: increase Ollama retry config + add transient-only mode (#7677)
Conversation
Force-pushed 616e2df to 0334f3a
💡 Codex Review (automated review suggestions for this pull request). Reviewed commit: 616e2df1c5
…7635)

When Ollama loads a large model (e.g. Qwen3.5 35b), it returns HTTP 500 errors while the model is loading into memory. The default retry config (3 retries, ~7s total) was insufficient for models that take 10-120s to load, causing 'connection refused' errors.

Override retry_config() in OllamaProvider with Ollama-specific values:
- 10 retries (up from 3)
- 2s initial interval (up from 1s)
- 1.5x backoff multiplier (down from 2.0, more gradual)
- 15s max interval (down from 30s)

This provides ~100s of total retry wait time, enough for large models on slower hardware.

Closes aaif-goose#7635

Signed-off-by: fre <anonwurcod@proton.me>
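The override described in the commit message could be sketched roughly as follows. This is a minimal stand-in, not goose's actual code: the RetryConfig struct and its field names here are assumptions based on the values listed above, with a small helper that checks the ~100s total-wait claim.

```rust
use std::time::Duration;

// Stand-in for goose's RetryConfig; field names are assumptions
// taken from the values described in the commit message.
#[derive(Debug, Clone)]
struct RetryConfig {
    max_retries: u32,
    initial_interval: Duration,
    backoff_multiplier: f64,
    max_interval: Duration,
}

// Ollama-specific override: large models can take 10-120s to load,
// so stretch the retry window to roughly 100s of total wait.
fn ollama_retry_config() -> RetryConfig {
    RetryConfig {
        max_retries: 10,                          // up from the default 3
        initial_interval: Duration::from_secs(2), // up from 1s
        backoff_multiplier: 1.5,                  // down from 2.0, more gradual
        max_interval: Duration::from_secs(15),    // down from 30s
    }
}

// Worst-case total wait (ignoring jitter): sum of capped backoff intervals.
fn total_wait(cfg: &RetryConfig) -> Duration {
    let mut total = 0.0;
    let mut interval = cfg.initial_interval.as_secs_f64();
    for _ in 0..cfg.max_retries {
        total += interval.min(cfg.max_interval.as_secs_f64());
        interval *= cfg.backoff_multiplier;
    }
    Duration::from_secs_f64(total)
}

fn main() {
    let cfg = ollama_retry_config();
    // 2 + 3 + 4.5 + 6.75 + 10.125 + 15*5 = 101.375s, matching the ~100s claim.
    println!("total wait ≈ {:.0}s", total_wait(&cfg).as_secs_f64());
}
```

Summing the intervals (capped at 15s from the sixth retry onward) gives about 101s, consistent with the "~100s of total retry wait time" stated above.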
Force-pushed 0334f3a to 0b03c0a
Client errors (400/404), such as mistyped model names, now fail fast instead of waiting 80-100s through the full retry backoff. Transient errors (5xx during model loading, connection refused, rate limits) still use the extended Ollama retry config.

- Add a transient_only flag to RetryConfig
- Update the should_retry predicate to accept the config
- Set transient_only on the Ollama retry config
- Add unit tests for retry predicate behavior

Signed-off-by: fre <anonwurcod@proton.me>
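The transient_only split might look like this minimal sketch. The RetryConfig and should_retry signatures here are stand-ins (goose's real predicate inspects richer provider error types rather than bare status codes); it only illustrates the fail-fast behavior the commit describes.

```rust
// Stand-in config; goose's real RetryConfig has more fields.
struct RetryConfig {
    transient_only: bool,
}

// Decide whether an HTTP status should be retried under this config.
fn should_retry(cfg: &RetryConfig, status: u16) -> bool {
    // 5xx (e.g. Ollama mid model load) and 429 rate limits are transient.
    let transient = status >= 500 || status == 429;
    if cfg.transient_only {
        transient // 4xx client errors (mistyped model name) fail fast
    } else {
        transient || status >= 400 // old behavior: 4xx also retried
    }
}

fn main() {
    let ollama = RetryConfig { transient_only: true };
    assert!(should_retry(&ollama, 500));  // model still loading: keep retrying
    assert!(!should_retry(&ollama, 404)); // bad model name: fail fast
    println!("ok");
}
```

With transient_only set, a typo'd model name surfaces immediately instead of burning through the extended ~100s Ollama backoff.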
Force-pushed 65bffa9 to 31bddd6
DOsinga
left a comment
this does seem to change a lot more than just the retry for ollama
In addition, I'm updating the PR title to mention the extended transient-only mode.
Remove test_ollama_retry_config_values (asserts constants equal themselves) and test_ollama_retry_config_provides_sufficient_wait_time (recomputes the backoff math). The transient_only behavior test is retained, as it exercises actual feature logic.

Signed-off-by: fre <anonwurcod@proton.me>
Force-pushed bcc0d6b to d571860
So I am thinking we shouldn't fix this using the retry mechanism: it would retry any Ollama call, even a genuinely faulty one, for over a minute. Is there a different way of doing this? Something more targeted to the issue.
Gotcha, couple of options:

Option 1: long first-byte timeout + normal chunk timeout (builds on your earlier Ollama work)
- Use tokio::time::timeout only until the first SSE line/chunk arrives (like the "defer stall timeout" pattern we discussed before).
- After the first chunk, revert to the 30s per-chunk timeout.
- This handles slow model loading without touching retries at all.

Option 2: smart error-pattern retry predicate
Add a new method in OllamaProvider:

```rust
fn is_model_loading_error(&self, err: &reqwest::Error) -> bool {
```

Then update the should_retry predicate to use this only for Ollama (and only up to ~120s total). Real errors fail instantly. This is precise, has zero user impact, and is easy to maintain.

2 is more what I was thinking tbh
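Fleshed out, the second option might look like the sketch below. Everything here is an assumption for illustration: the real helper would take a reqwest::Error, and the status/message patterns are stand-ins for whatever Ollama actually emits while a model is loading.

```rust
// Hypothetical helper in the spirit of OllamaProvider::is_model_loading_error:
// classify an error as "model still loading" vs a real failure.
// Status codes and message patterns are illustrative assumptions.
fn is_model_loading_error(status: Option<u16>, message: &str) -> bool {
    // Ollama returns 500s while a model loads into memory, and the
    // socket may be refused outright during a cold start.
    matches!(status, Some(500..=599))
        || message.contains("connection refused")
        || message.contains("loading model")
}

fn main() {
    assert!(is_model_loading_error(Some(500), "internal error"));
    assert!(is_model_loading_error(None, "tcp connect: connection refused"));
    assert!(!is_model_loading_error(Some(404), "model not found"));
    println!("ok");
}
```

A predicate like this would let only loading-related failures enter the extended retry loop, while real errors (bad model name, malformed request) fail on the first attempt.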
DOsinga
left a comment
Clean fix: the transient_only mode correctly handles the Codex concern about 4xx errors getting the long Ollama backoff, the tests are meaningful, and the mechanical RetryConfig::new() updates to bedrock/databricks are the right approach now that transient_only is a private field. LGTM.
Resolve merge conflict in crates/goose/src/providers/retry.rs by combining:
- upstream auth credential refresh logic
- PR transient_only support for should_retry

Signed-off-by: fre <anonwurcod@proton.me>
…se#7677)

Signed-off-by: fre <anonwurcod@proton.me>
Signed-off-by: Cameron Yick <cameron.yick@datadoghq.com>
Problem
When using Ollama with large models (e.g. Qwen3.5 35b), goose gives up after ~7 seconds with a 500 'connection refused' error. Ollama returns HTTP 500 while loading a model into memory, which can take 10-120s for large models on consumer hardware.
The default retry config (3 retries with 1s/2s/4s backoff = ~7s total) is insufficient for this scenario.
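The ~7s figure follows directly from summing the default exponential backoff, ignoring jitter; a quick check:

```rust
// Default config per the description above: 3 retries,
// 1s initial interval, 2.0x backoff multiplier.
fn default_total_wait_secs() -> u64 {
    let (retries, initial, mult) = (3u32, 1u64, 2u64);
    (0..retries).map(|i| initial * mult.pow(i)).sum() // 1 + 2 + 4
}

fn main() {
    println!("default retry window: {}s", default_total_wait_secs()); // 7s
}
```

Seven seconds of total wait is well short of the 10-120s a large model can spend loading, which is why every retry is exhausted before Ollama becomes responsive.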
Fix
Override retry_config() in OllamaProvider with values tuned for local model loading:

- 10 retries (up from 3)
- 2s initial interval (up from 1s)
- 1.5x backoff multiplier (down from 2.0)
- 15s max interval (down from 30s)

This provides ~100s of total retry wait time (even with worst-case jitter, >60s), which handles models that take up to ~2 minutes to load.
Testing
cargo clippy clean

Closes #7635