Collaborator

@DOsinga DOsinga commented Jul 21, 2025

This lifts the retry mechanism and the error parsing out of the databricks provider so we can call them generally. It would be good to discuss this, merge it, and then apply it to all the other providers.

fixes #887
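
A minimal sketch of what a provider-agnostic retry helper could look like, assuming a blocking call and exponential backoff. `SketchError`, `RetryConfig`, and `with_retry` are hypothetical names for illustration only; the code in this PR works with async closures and `ProviderError`, and its actual retry policy is not shown in this conversation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type for this sketch (not the PR's `ProviderError`).
#[derive(Debug, PartialEq)]
enum SketchError {
    RateLimited,
    ServerError(u16),
    Authentication,
}

impl SketchError {
    /// Assumption: rate limits and 5xx responses are worth retrying,
    /// authentication failures are not.
    fn is_retryable(&self) -> bool {
        match self {
            SketchError::RateLimited => true,
            SketchError::ServerError(code) => *code >= 500,
            SketchError::Authentication => false,
        }
    }
}

/// Hypothetical retry settings.
struct RetryConfig {
    max_retries: u32,
    initial_delay: Duration,
    backoff_multiplier: u32,
    max_delay: Duration,
}

/// Runs `operation`, retrying retryable errors with exponential backoff.
/// The operation is called at most `max_retries + 1` times.
fn with_retry<T>(
    config: &RetryConfig,
    mut operation: impl FnMut() -> Result<T, SketchError>,
) -> Result<T, SketchError> {
    let mut delay = config.initial_delay;
    let mut attempt = 0;
    loop {
        match operation() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() && attempt < config.max_retries => {
                attempt += 1;
                sleep(delay);
                // Grow the delay, but never beyond the configured cap.
                delay = (delay * config.backoff_multiplier).min(config.max_delay);
            }
            Err(err) => return Err(err),
        }
    }
}
```

Keeping the "is this retryable?" decision on the error type is one way to let every provider share the same loop while still parsing its own error responses.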

@DOsinga DOsinga requested a review from baxen July 21, 2025 16:11
@michaelneale
Collaborator

seems sensible approach - if we go with this - can we also close down #3194 and #3547 (which is my update of it) - as this does part of that (not sure if you want any of the other stuff from there)

Collaborator Author

DOsinga commented Jul 22, 2025

> seems sensible approach - if we go with this - can we also close down #3194 and #3547 (which is my update of it) - as this does part of that (not sure if you want any of the other stuff from there)

Sorry, I hadn't seen your version of this, just the jackjack one. I like the configbuilder thing from yours; I'll add that in a follow-up. What mostly stops me from getting this in is the testing.

Resolved merge conflicts in provider files by keeping retry logic from HEAD branch.
Fixed compilation issues with async closures and reference handling.

Collaborator Author

DOsinga commented Aug 3, 2025

this is ready for review @michaelneale - I incorporated the client logic from the other PR. I left out connection pooling as the LLMs tell me we already had that

@DOsinga DOsinga requested review from ahau-square and michaelneale and removed request for ahau-square and baxen August 3, 2025 08:50
Collaborator Author

DOsinga commented Aug 3, 2025

/cc @ahau-square for visibility

Collaborator

@jamadeo jamadeo left a comment

Looks like a great improvement -- how do we satisfactorily test?

I think there is a subtle bug in your ApiClient .with_ methods though


  /// Fetch supported models from Anthropic; returns Err on failure, Ok(None) if not present
- async fn fetch_supported_models_async(&self) -> Result<Option<Vec<String>>, ProviderError> {
+ async fn fetch_supported_models(&self) -> Result<Option<Vec<String>>, ProviderError> {
Collaborator

thank you for renaming this

    None => return Err(StatusCode::BAD_REQUEST),
};

let model_config = ModelConfig::new(&model).map_err(|_| StatusCode::BAD_REQUEST)?;
Collaborator

would be nice if we could include text with these, but maybe leave that for another change

Collaborator Author

I do agree - I didn't quickly see how that works though. There's a whole bunch of other handlers that would benefit from it.

    #[allow(dead_code)]
    OAuth(OAuthConfig),
    Custom(Box<dyn AuthProvider>),
}
Collaborator

nice


impl ApiClient {
    pub fn new(host: String, auth: AuthMethod) -> Result<Self> {
        Self::with_timeout(host, auth, Duration::from_secs(600))
Collaborator

const DEFAULT_TIMEOUT = ...

Collaborator Author

done

self.client = Client::builder()
    .timeout(self.timeout)
    .default_headers(self.default_headers.clone())
    .build()?;
Collaborator

these .with_ methods look like they could be chained because they take a reference but they make a new client every time -- maybe a builder would be better? .with_timeout().with_headers() won't do the expected thing right?

Collaborator Author

This is a bit clunky and maybe a builder pattern would be better, but since these methods also set the timeout and the headers on themselves it should work, no?

Collaborator

sorry yes you're right, I think I got mixed up because with_timeout is not a method after all
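
The point settled in this thread can be sketched as follows, assuming the `.with_` methods take `self` by value and rebuild an inner client from settings stored on the struct. `ApiClientSketch`, `BuiltClient`, and `with_header` are hypothetical stand-ins (no real HTTP client is involved); `with_timeout` is modelled as an associated constructor, as the last comment notes, and the 600-second default comes from the diff above.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Default taken from the `Duration::from_secs(600)` in the diff above.
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(600);

/// Stand-in for an HTTP client that cannot be reconfigured once built.
#[derive(Debug, Clone, PartialEq)]
struct BuiltClient {
    timeout: Duration,
    headers: HashMap<String, String>,
}

/// Hypothetical wrapper that remembers its settings next to the built client.
struct ApiClientSketch {
    timeout: Duration,
    default_headers: HashMap<String, String>,
    client: BuiltClient,
}

impl ApiClientSketch {
    fn new() -> Self {
        Self::with_timeout(DEFAULT_TIMEOUT)
    }

    /// Associated constructor, not a chained method.
    fn with_timeout(timeout: Duration) -> Self {
        let default_headers = HashMap::new();
        let client = BuiltClient {
            timeout,
            headers: default_headers.clone(),
        };
        Self { timeout, default_headers, client }
    }

    /// Records the header on `self`, then rebuilds the inner client from the
    /// stored settings, so settings applied earlier are carried over.
    fn with_header(mut self, key: &str, value: &str) -> Self {
        self.default_headers.insert(key.to_string(), value.to_string());
        self.client = BuiltClient {
            timeout: self.timeout,
            headers: self.default_headers.clone(),
        };
        self
    }
}
```

Because each call rebuilds from the accumulated fields rather than from defaults, chaining keeps every earlier setting; the cost is one rebuilt client per call, which a dedicated builder would avoid.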

}
}

pub struct ApiRequestBuilder<'a> {
Collaborator

nit: you could just use an owned String for path and drop the lifetime here

Collaborator Author

thanks
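
A small sketch of the nit above, assuming the path is the only borrowed field in the builder (if the real `ApiRequestBuilder` also borrows other data, such as the client, the lifetime would have to stay). `OwnedPathBuilder`, `url`, and `builder_for_model` are hypothetical names, and the `v1/models/...` path is made up for the example.

```rust
/// Hypothetical request builder that owns its path, so the struct needs no
/// lifetime parameter. A borrowing version would be declared as
/// `struct Builder<'a> { path: &'a str }`.
struct OwnedPathBuilder {
    path: String,
}

impl OwnedPathBuilder {
    fn new(path: impl Into<String>) -> Self {
        Self { path: path.into() }
    }

    /// Joins host and path with exactly one slash between them.
    fn url(&self, host: &str) -> String {
        format!(
            "{}/{}",
            host.trim_end_matches('/'),
            self.path.trim_start_matches('/')
        )
    }
}

/// The path is a temporary `String` created inside this function. An owning
/// builder can be returned from here; one borrowing `&path` could not be.
fn builder_for_model(model: &str) -> OwnedPathBuilder {
    let path = format!("v1/models/{model}");
    OwnedPathBuilder::new(path)
}
```

The trade-off is one allocation per request in exchange for a simpler type signature.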

Collaborator Author

DOsinga commented Aug 4, 2025

> Looks like a great improvement -- how do we satisfactorily test?
>
> I think there is a subtle bug in your ApiClient .with_ methods though

I don't know how we can satisfactorily test this other than making sure we have keys for all providers and then having scenario tests for all of them, or some such.

* main:
  Token counting in Auto-compact uses provider metadata (#3788)
  docs: Add YouTube link to Git MCP Tutorial (#3831)
  feat: more robust client initialization for the app (#3830)
  Build app bundles on release branches always (#3789)
  fix param order of debug_conversation_fixer (#3796)
  Fix directory switcher not working in active chat sessions and file browser not defaulting to current session directory path (#3791)
  File completion in CLI (#3822)
@michaelneale
Collaborator

Aside: how well does the retry work in databricks? I haven't found it working for me lately (well, I keep hitting things which I thought were retryable, upstream errors etc.).

"Claude and other models from Anthropic",
ANTHROPIC_DEFAULT_MODEL,
vec![
    ModelInfo::new("claude-sonnet-4-0", 200000),
Collaborator

@michaelneale michaelneale Aug 5, 2025

where do these live now - not used as it is always loaded?

Collaborator

note: these will be in a follow up for per provider custom stuff

Collaborator Author

previously we declared ANTHROPIC_KNOWN_MODELS but were not using that at all. I now drag it in and just annotate it with 200k.

but yeah, the true way is for all providers to declare their models and token limits and then in models.rs read that to estimate limits instead of doing it twice or three times as we do now. small steps
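
As a rough illustration of that direction, here is a sketch in which a provider declares its models once and a single lookup reads the limit from that declaration. `ModelInfoSketch`, `annotate`, and `context_limit_for` are hypothetical names, not the code in `models.rs`; only the `claude-sonnet-4-0` name and the 200k figure come from the diff above.

```rust
/// Hypothetical mirror of the `ModelInfo::new(name, limit)` call in the diff.
#[derive(Debug, Clone, PartialEq)]
struct ModelInfoSketch {
    name: String,
    context_limit: usize,
}

/// Annotates every known model name with the same context limit, as the
/// comment describes doing for the Anthropic list.
fn annotate(known_models: &[&str], context_limit: usize) -> Vec<ModelInfoSketch> {
    known_models
        .iter()
        .map(|name| ModelInfoSketch {
            name: name.to_string(),
            context_limit,
        })
        .collect()
}

/// Single place to look up a limit; returns `None` for undeclared models so
/// the caller can decide on a fallback.
fn context_limit_for(models: &[ModelInfoSketch], name: &str) -> Option<usize> {
    models
        .iter()
        .find(|model| model.name == name)
        .map(|model| model.context_limit)
}
```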

fn print_value(value: &Value, debug: bool) {
    let formatted = match value {
        Value::String(s) => {
            if !debug && s.len() > get_tool_params_max_length() {
Collaborator

this shows up as:

[image]

is that intended when not debug?

Collaborator Author

I don't know - I didn't really touch this code, the new linter rules force me to break up this function

Collaborator

ah - fair enough, yes refactoring makes reviewing diffs very difficult when it breaks up things (wish the diff was a bit smarter, and could point out the code previously existed - if you search for it you can see but easy to miss). Yeah makes sense, just odd I never saw this before.

Collaborator

@michaelneale michaelneale left a comment

I think this is ok - I did test it with:

  • anthropic/opus
  • databricks/sonnet
  • openrouter/qwen3-coder

and all seemed to work well (but didn't test other providers).

not sure what this takes away (i.e. would need a follow-on for anthropic provider models?)

and one comment about the big bold yellow REDACTED but seems nice and the direction we want it to go in.

@DOsinga DOsinga merged commit 918fadd into main Aug 5, 2025
11 checks passed
@DOsinga DOsinga deleted the generic-retry-and-error-parsing branch August 5, 2025 08:37
michaelneale added a commit that referenced this pull request Aug 5, 2025
* main:
  Fix leaky env variable causing flaky test (#3761)
  Update gemini error msg (#3847)
  Generic retry and error parsing (#3558)
  Clear the current line on ctrl-c in line with other tools (#3764)
katzdave added a commit that referenced this pull request Aug 5, 2025
* 'main' of github.com:block/goose:
  Changed app settings configuration form to match settings panels (#3829)
  Tell the user to hit compact (#3851)
  Pin @mcp-ui/client in package.json (#3860)
  blog for mcp-jupyter server (#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (#3828)
  Detect client disconnects and cancel tool calls (#3782)
  Suppress ansi with pipes (#3775)
  Fix leaky env variable causing flaky test (#3761)
  Update gemini error msg (#3847)
  Generic retry and error parsing (#3558)
  Clear the current line on ctrl-c in line with other tools (#3764)
  chore: upgrade morph to use new model with instruction (#3745)
  add CODEOWNERS file with /documentation owners (#3840)
kathawthorne added a commit to kathawthorne/goose that referenced this pull request Aug 5, 2025
…-files

* upstream/main: (150 commits)
  fix: replace glob/grep tool with shell (block#3834)
  docs: Add Youtube Link to dev.to tutorial (block#3869)
  Changed app settings configuration form to match settings panels (block#3829)
  Tell the user to hit compact (block#3851)
  Pin @mcp-ui/client in package.json (block#3860)
  blog for mcp-jupyter server (block#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (block#3828)
  Detect client disconnects and cancel tool calls (block#3782)
  Suppress ansi with pipes (block#3775)
  Fix leaky env variable causing flaky test (block#3761)
  Update gemini error msg (block#3847)
  Generic retry and error parsing (block#3558)
  Clear the current line on ctrl-c in line with other tools (block#3764)
  chore: upgrade morph to use new model with instruction (block#3745)
  add CODEOWNERS file with /documentation owners (block#3840)
  Token counting in Auto-compact uses provider metadata (block#3788)
  docs: Add YouTube link to Git MCP Tutorial (block#3831)
  feat: more robust client initialization for the app (block#3830)
  Build app bundles on release branches always (block#3789)
  fix param order of debug_conversation_fixer (block#3796)
  ...

# Conflicts:
#	crates/goose-mcp/src/developer/mod.rs
kathawthorne added a commit to kathawthorne/goose that referenced this pull request Aug 5, 2025
…e-editable-displayable-title

* upstream/main: (134 commits)
  fix: optimise reading large file content (block#3767)
  fix: replace glob/grep tool with shell (block#3834)
  docs: Add Youtube Link to dev.to tutorial (block#3869)
  Changed app settings configuration form to match settings panels (block#3829)
  Tell the user to hit compact (block#3851)
  Pin @mcp-ui/client in package.json (block#3860)
  blog for mcp-jupyter server (block#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (block#3828)
  Detect client disconnects and cancel tool calls (block#3782)
  Suppress ansi with pipes (block#3775)
  Fix leaky env variable causing flaky test (block#3761)
  Update gemini error msg (block#3847)
  Generic retry and error parsing (block#3558)
  Clear the current line on ctrl-c in line with other tools (block#3764)
  chore: upgrade morph to use new model with instruction (block#3745)
  add CODEOWNERS file with /documentation owners (block#3840)
  Token counting in Auto-compact uses provider metadata (block#3788)
  docs: Add YouTube link to Git MCP Tutorial (block#3831)
  feat: more robust client initialization for the app (block#3830)
  Build app bundles on release branches always (block#3789)
  ...
michaelneale added a commit that referenced this pull request Aug 5, 2025
* main: (33 commits)
  fix: optimise reading large file content (#3767)
  fix: replace glob/grep tool with shell (#3834)
  docs: Add Youtube Link to dev.to tutorial (#3869)
  Changed app settings configuration form to match settings panels (#3829)
  Tell the user to hit compact (#3851)
  Pin @mcp-ui/client in package.json (#3860)
  blog for mcp-jupyter server (#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (#3828)
  Detect client disconnects and cancel tool calls (#3782)
  Suppress ansi with pipes (#3775)
  Fix leaky env variable causing flaky test (#3761)
  Update gemini error msg (#3847)
  Generic retry and error parsing (#3558)
  Clear the current line on ctrl-c in line with other tools (#3764)
  chore: upgrade morph to use new model with instruction (#3745)
  add CODEOWNERS file with /documentation owners (#3840)
  Token counting in Auto-compact uses provider metadata (#3788)
  docs: Add YouTube link to Git MCP Tutorial (#3831)
  feat: more robust client initialization for the app (#3830)
  Build app bundles on release branches always (#3789)
  ...
Development

Successfully merging this pull request may close these issues.

Handle API Rate Limits sensibly

4 participants