Contributor

@uddhav uddhav commented Apr 20, 2025

GCP Vertex AI: Add retry handling for Anthropic API 529 overloaded status code

Adds support for handling Anthropic's HTTP 529 "API Overloaded" status code in the GCP Vertex AI provider. This status code indicates temporary backend capacity issues rather than quota exhaustion.

[Screenshot: 2025-04-17, 9:10:33 AM]

Changes

  • Added detection and retry logic for 529 "API overloaded" responses
  • Applied the same backoff strategy used for 429 rate limit errors
  • Enhanced error messages to distinguish between rate limits and overloaded states
  • Added a unit test to verify correct handling of the 529 status code
  • chore: cleaned up the deprecated GCP Vertex AI model ID for Gemini 2.0 Pro Experimental

This improves reliability when interacting with Anthropic models through the Vertex AI provider during high-traffic periods.
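
For readers who want a concrete picture of the behavior described above, here is a minimal sketch, not the code from this PR. It assumes a simple exponential backoff shared by 429 and 529 responses; every name in it (`describe_retryable`, `backoff_delay`, `send_with_retry`) and every default value is hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical constants and helpers; not the provider's actual API.
RATE_LIMITED = 429  # rate limit / quota exceeded
OVERLOADED = 529    # Anthropic "API overloaded": temporary capacity issue


def describe_retryable(status: int) -> Optional[str]:
    """Return a distinct reason string for retryable statuses, else None."""
    if status == RATE_LIMITED:
        return "rate limit exceeded (HTTP 429)"
    if status == OVERLOADED:
        return "API overloaded (HTTP 529)"
    return None


def backoff_delay(attempt: int, initial: float = 1.0,
                  multiplier: float = 2.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds; the defaults are assumed values."""
    return min(initial * multiplier ** attempt, cap)


def send_with_retry(send: Callable[[], Tuple[int, str]],
                    max_retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or retries run out.

    429 and 529 share the same backoff schedule; only the message differs.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        reason = describe_retryable(status)
        if reason is None:
            # Success or a non-retryable error: hand it back to the caller.
            return status, body
        if attempt == max_retries:
            raise RuntimeError(f"giving up after {max_retries} retries: {reason}")
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real delays, for example `send_with_retry(lambda: (200, "ok"), sleep=lambda _: None)`.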

@uddhav uddhav marked this pull request as ready for review April 20, 2025 05:37
@uddhav uddhav force-pushed the gcp-vertex-ai-overloaded-retry branch from d88ab0f to ab94cfc Compare April 21, 2025 22:42
@uddhav uddhav force-pushed the gcp-vertex-ai-overloaded-retry branch from ab94cfc to dfbb7e4 Compare May 2, 2025 01:51
Collaborator

baxen commented Jun 16, 2025

We're doing a cleanup run on providers for #2953. Since this is now a bit out of date with other changes, I suggest we close this PR and include it in the refactor. Thank you for the contrib!
