Generic retry and error parsing #3558

DOsinga · 2025-07-21T16:11:09Z

This lifts the retry mechanism and the error parsing out of the databricks provider so we can call them generically. It would be good to discuss this, merge it, and then apply it to all the other providers.

fixes #887
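For readers following along, here is a minimal sketch of what a provider-independent retry helper can look like. It is not the code in this PR: `ProviderError` is reduced to three illustrative variants, `RetryConfig` and `with_retry` are hypothetical names, and the helper is synchronous to stay dependency-free, whereas the providers are async.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Illustrative error type; the real ProviderError has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    RateLimited(String),
    ServerError(String),
    Authentication(String),
}

impl ProviderError {
    /// Only transient failures are worth retrying.
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            ProviderError::RateLimited(_) | ProviderError::ServerError(_)
        )
    }
}

/// Hypothetical retry settings shared by all providers.
#[derive(Debug, Clone, Copy)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_delay: Duration,
    pub backoff_multiplier: u32,
}

/// Calls `op` until it succeeds, fails with a non-retryable error,
/// or has been retried `max_retries` times (so at most `max_retries + 1` calls).
pub fn with_retry<T, F>(config: RetryConfig, mut op: F) -> Result<T, ProviderError>
where
    F: FnMut() -> Result<T, ProviderError>,
{
    let mut delay = config.initial_delay;
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() && attempt < config.max_retries => {
                attempt += 1;
                sleep(delay);
                // Exponential backoff: wait longer before each further attempt.
                delay *= config.backoff_multiplier;
            }
            Err(err) => return Err(err),
        }
    }
}
```

Because the helper only depends on `is_retryable`, each provider just has to map its failures onto the shared error type to get the same retry behaviour.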

michaelneale · 2025-07-21T23:47:45Z

seems sensible approach - if we go with this - can we also close down #3194 and #3547 (which is my update of it) - as this does part of that (not sure if you want any of the other stuff from there)

DOsinga · 2025-07-22T18:23:01Z


sorry, hadn't seen your version of this, just the jackjack - I like the configbuilder thing from yours, I'll add that in a follow up. mostly what stops me from getting this in is the testing


DOsinga · 2025-08-03T08:50:10Z

this is ready for review @michaelneale - I incorporated the client logic from the other PR. I left out connection pooling as the LLMs tell me we already had that

DOsinga · 2025-08-03T08:51:08Z

/cc @ahau-square for visibility

jamadeo

Looks like a great improvement -- how do we satisfactorily test?

I think there is a subtle bug in your ApiClient .with_ methods though

jamadeo · 2025-07-21T16:12:10Z

crates/goose/src/providers/anthropic.rs


    /// Fetch supported models from Anthropic; returns Err on failure, Ok(None) if not present
-    async fn fetch_supported_models_async(&self) -> Result<Option<Vec<String>>, ProviderError> {
+    async fn fetch_supported_models(&self) -> Result<Option<Vec<String>>, ProviderError> {


thank you for renaming this

jamadeo · 2025-08-04T14:40:07Z

crates/goose-server/src/routes/agent.rs

+        None => return Err(StatusCode::BAD_REQUEST),
+    };
+
+    let model_config = ModelConfig::new(&model).map_err(|_| StatusCode::BAD_REQUEST)?;


would be nice if we could include text with these, but maybe leave that for another change

I do agree - I didn't quickly see how that works though. there's a whole bunch of other handlers that would be helped
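One possible shape for "text with these", sketched without any web framework: an error value that carries a message next to the status code. `HandlerError` and `parse_model` are hypothetical names, and `parse_model` only stands in for the `ModelConfig::new` call in the diff. If the handlers are axum handlers, returning a `(StatusCode, String)` tuple is another option, since axum implements `IntoResponse` for that tuple.

```rust
use std::fmt;

/// Hypothetical handler error that keeps a message next to the HTTP status.
#[derive(Debug, PartialEq)]
pub struct HandlerError {
    pub status: u16,
    pub message: String,
}

impl HandlerError {
    pub fn bad_request(message: impl Into<String>) -> Self {
        Self {
            status: 400,
            message: message.into(),
        }
    }
}

impl fmt::Display for HandlerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.status, self.message)
    }
}

/// Stand-in for the validation in the handler: every failure path says why.
pub fn parse_model(model: Option<&str>) -> Result<String, HandlerError> {
    let model = model.ok_or_else(|| HandlerError::bad_request("missing `model` field"))?;
    if model.trim().is_empty() {
        return Err(HandlerError::bad_request("`model` must not be empty"));
    }
    Ok(model.to_string())
}
```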

jamadeo · 2025-08-04T14:48:13Z

crates/goose/src/providers/api_client.rs

+    #[allow(dead_code)]
+    OAuth(OAuthConfig),
+    Custom(Box<dyn AuthProvider>),
+}


jamadeo · 2025-08-04T14:49:54Z

crates/goose/src/providers/api_client.rs

+
+impl ApiClient {
+    pub fn new(host: String, auth: AuthMethod) -> Result<Self> {
+        Self::with_timeout(host, auth, Duration::from_secs(600))


const DEFAULT_TIMEOUT = ...
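Spelled out, the suggestion could look roughly like this. The `ApiClient` below is a simplified stand-in with only the fields needed to show the constant (no auth, no `Result`), so treat the struct shape as an assumption.

```rust
use std::time::Duration;

/// Ten minutes, the same value as the literal 600 in the diff.
pub const DEFAULT_TIMEOUT: Duration = Duration::from_secs(600);

/// Simplified stand-in for the PR's ApiClient.
#[derive(Debug)]
pub struct ApiClient {
    pub host: String,
    pub timeout: Duration,
}

impl ApiClient {
    /// Uses the named default instead of a magic number.
    pub fn new(host: String) -> Self {
        Self::with_timeout(host, DEFAULT_TIMEOUT)
    }

    pub fn with_timeout(host: String, timeout: Duration) -> Self {
        Self { host, timeout }
    }
}
```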

jamadeo · 2025-08-04T14:51:55Z

crates/goose/src/providers/api_client.rs

+        self.client = Client::builder()
+            .timeout(self.timeout)
+            .default_headers(self.default_headers.clone())
+            .build()?;


these .with_ methods look like they could be chained because they take a reference but they make a new client every time -- maybe a builder would be better? .with_timeout().with_headers() won't do the expected thing right?

This is a bit clunky and maybe a builder pattern would be better, but since these methods also set the timeout and the headers on themselves it should work, no?

sorry yes you're right, I think I got mixed up because with_timeout is not a method after all
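For comparison, a rough sketch of the builder alternative mentioned above: options are collected by value and the client is constructed once in `build()`, rather than being rebuilt inside every `with_` method. All names here are hypothetical, and `ClientSettings` is a plain struct; a real version would presumably call `reqwest::Client::builder()` inside `build()`.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Plain data standing in for the configured HTTP client.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientSettings {
    pub host: String,
    pub timeout: Duration,
    pub headers: HashMap<String, String>,
}

/// Hypothetical builder: each setter consumes and returns the builder,
/// so calls chain without constructing anything until `build()`.
pub struct ClientSettingsBuilder {
    host: String,
    timeout: Duration,
    headers: HashMap<String, String>,
}

impl ClientSettingsBuilder {
    pub fn new(host: impl Into<String>) -> Self {
        Self {
            host: host.into(),
            timeout: Duration::from_secs(600),
            headers: HashMap::new(),
        }
    }

    pub fn timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }

    pub fn header(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
        self.headers.insert(name.into(), value.into());
        self
    }

    /// The only place where the final value is assembled.
    pub fn build(self) -> ClientSettings {
        ClientSettings {
            host: self.host,
            timeout: self.timeout,
            headers: self.headers,
        }
    }
}
```

With this shape, `ClientSettingsBuilder::new(host).timeout(t).header(k, v).build()` applies both options regardless of call order.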

jamadeo · 2025-08-04T14:57:15Z

crates/goose/src/providers/api_client.rs

+    }
+}
+
+pub struct ApiRequestBuilder<'a> {


nit: you could just use an owned String for path and drop the lifetime here
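To illustrate the nit, here are the two shapes side by side. This assumes `path` is the only borrowed field; if the real `ApiRequestBuilder` also borrows the client, the lifetime would have to stay. The type names and `request_for_model` are made up for the example.

```rust
/// Borrowed variant: the builder cannot outlive the string it points at.
pub struct BorrowedRequestBuilder<'a> {
    pub path: &'a str,
}

/// Owned variant: no lifetime parameter, at the cost of one allocation.
pub struct OwnedRequestBuilder {
    pub path: String,
}

impl OwnedRequestBuilder {
    pub fn new(path: impl Into<String>) -> Self {
        Self { path: path.into() }
    }

    /// Joins host and path with exactly one slash between them.
    pub fn url(&self, host: &str) -> String {
        format!(
            "{}/{}",
            host.trim_end_matches('/'),
            self.path.trim_start_matches('/')
        )
    }
}

/// The formatted path is a temporary; an owned field lets the builder
/// be returned from the function that created the path.
pub fn request_for_model(model: &str) -> OwnedRequestBuilder {
    OwnedRequestBuilder::new(format!("v1/models/{model}"))
}
```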

DOsinga · 2025-08-04T15:32:23Z

I don't know how we satisfactorily test this other than making sure we have keys for all providers and then have scenario tests for all of them or some such.


michaelneale · 2025-08-05T06:37:05Z

Aside: how well does the retry work in databricks? I haven't found it working for me lately (well I keep hitting things which I thought were retry-able, upstream errors etc)?
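Whether an upstream error gets retried comes down to how the response is classified. The mapping below is only an assumed example of such a classification, not the one this PR implements; `ErrorKind`, `classify_status`, and `is_retryable` are hypothetical names.

```rust
/// Hypothetical coarse error categories derived from an HTTP status.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    Authentication,
    RateLimited,
    ServerError,
    RequestFailed,
}

/// Returns None for success statuses and a category for everything else.
pub fn classify_status(status: u16) -> Option<ErrorKind> {
    match status {
        200..=299 => None,
        401 | 403 => Some(ErrorKind::Authentication),
        429 => Some(ErrorKind::RateLimited),
        // Gateway and upstream failures (502, 503, 504) land here.
        500..=599 => Some(ErrorKind::ServerError),
        _ => Some(ErrorKind::RequestFailed),
    }
}

/// Assumption: only rate limits and server-side failures are transient.
pub fn is_retryable(kind: ErrorKind) -> bool {
    matches!(kind, ErrorKind::RateLimited | ErrorKind::ServerError)
}
```

An error that arrives with a status outside the retryable set, or that is wrapped in a 200 response body, would never be retried under a scheme like this, which is one way "retry-able looking" failures can slip through.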

michaelneale · 2025-08-05T06:38:49Z

crates/goose/src/providers/anthropic.rs

            "Claude and other models from Anthropic",
            ANTHROPIC_DEFAULT_MODEL,
-            vec![
-                ModelInfo::new("claude-sonnet-4-0", 200000),


where do these live now - not used as it is always loaded?

note: these will be in a follow up for per provider custom stuff

previously we declared ANTHROPIC_KNOWN_MODELS but were not using that at all. I now drag it in and just annotate it with 200k.

but yeah, the true way is for all providers to declare their models and token limits and then in models.rs read that to estimate limits instead of doing it twice or three times as we do now. small steps
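A small sketch of the "drag it in and annotate with 200k" step, under stated assumptions: `ModelInfo` is a stripped-down stand-in with just a name and a context limit, the two entries in `ANTHROPIC_KNOWN_MODELS` are illustrative rather than the real list, and `known_models` is a hypothetical helper.

```rust
/// Stripped-down stand-in for ModelInfo: a name plus a context limit in tokens.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelInfo {
    pub name: String,
    pub context_limit: usize,
}

impl ModelInfo {
    pub fn new(name: impl Into<String>, context_limit: usize) -> Self {
        Self {
            name: name.into(),
            context_limit,
        }
    }
}

/// Illustrative entries only; the real list lives in anthropic.rs.
pub const ANTHROPIC_KNOWN_MODELS: &[&str] = &["claude-sonnet-4-0", "claude-opus-4-0"];

/// The single limit applied to every known model.
pub const ANTHROPIC_CONTEXT_LIMIT: usize = 200_000;

/// Builds the model list from the one declared source of names,
/// instead of repeating each model with its limit by hand.
pub fn known_models() -> Vec<ModelInfo> {
    ANTHROPIC_KNOWN_MODELS
        .iter()
        .map(|name| ModelInfo::new(*name, ANTHROPIC_CONTEXT_LIMIT))
        .collect()
}
```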

michaelneale · 2025-08-05T08:04:54Z

crates/goose-cli/src/session/output.rs

+fn print_value(value: &Value, debug: bool) {
+    let formatted = match value {
+        Value::String(s) => {
+            if !debug && s.len() > get_tool_params_max_length() {


this shows up as:

is that intended when not debug?

I don't know - I didn't really touch this code, the new linter rules forced me to break up this function

ah - fair enough, yes refactoring makes reviewing diffs very difficult when it breaks up things (wish the diff was a bit smarter, and could point out the code previously existed - if you search for it you can see but easy to miss). Yeah makes sense, just odd I never saw this before.

michaelneale

I think this is ok - I did test it with:

anthropic/opus
databricks/sonnet
openrouter/qwen3-coder

and all seemed to work well (but didn't test other providers).

not sure what this takes away (i.e. would it need a follow-on for anthropic provider models?)

and one comment about the big bold yellow REDACTED but seems nice and the direction we want it to go in.


Generic retry and error parsing

0a7fbde

DOsinga requested a review from baxen July 21, 2025 16:11

Douwe Osinga added 2 commits July 22, 2025 10:38

More providers

c6fd1bf

Goose did the rest

f08ec24

DOsinga mentioned this pull request Jul 22, 2025

Expand databricks agent error handling to fallback to token limit errors #3535

Closed

Merge main into generic-retry-and-error-parsing

b60d032

Resolved merge conflicts in provider files by keeping retry logic from HEAD branch. Fixed compilation issues with async closures and reference handling.

DOsinga mentioned this pull request Aug 2, 2025

fix: auto-compact on context limit error #3635

Merged

4 tasks

Douwe Osinga added 7 commits August 2, 2025 18:41

Start working on the ApiClient

c215f99

WIP

b092746

WIP

dbbaad0

Checkpoint

a4d3f58

Change the models

1f31c76

Add one more

c9145e4

Do the other providers

d2ab2fc

DOsinga requested review from ahau-square and michaelneale and removed request for ahau-square and baxen August 3, 2025 08:50

Douwe Osinga added 3 commits August 4, 2025 10:32

Merge branch 'main' into generic-retry-and-error-parsing

b542699

Allow

1d84913

Complexity

fe0365e

jamadeo approved these changes Aug 4, 2025


michaelneale assigned michaelneale and unassigned michaelneale Aug 4, 2025

michaelneale reviewed Aug 5, 2025


michaelneale approved these changes Aug 5, 2025


DOsinga merged commit 918fadd into main Aug 5, 2025
11 checks passed

DOsinga deleted the generic-retry-and-error-parsing branch August 5, 2025 08:37

alexhancock mentioned this pull request Aug 7, 2025

chore(release): release version 1.3.0 #3921

Merged
