diff --git a/docs/my-website/docs/providers/vertex.md b/docs/my-website/docs/providers/vertex.md
index 94619082e88..d919d0412cd 100644
--- a/docs/my-website/docs/providers/vertex.md
+++ b/docs/my-website/docs/providers/vertex.md
@@ -1687,6 +1687,20 @@ litellm.vertex_location = "us-central1 # Your Location
 | gemini-2.5-flash-lite-preview-09-2025 | `completion('gemini-2.5-flash-lite-preview-09-2025', messages)`, `completion('vertex_ai/gemini-2.5-flash-lite-preview-09-2025', messages)` |
 | gemini-3.1-flash-lite-preview | `completion('gemini-3.1-flash-lite-preview', messages)`, `completion('vertex_ai/gemini-3.1-flash-lite-preview', messages)` |
+## PayGo / Priority Cost Tracking
+
+LiteLLM automatically tracks spend for Vertex AI Gemini models using the correct pricing tier based on the response's `usageMetadata.trafficType`:
+
+| Vertex AI `trafficType` | LiteLLM `service_tier` | Pricing applied |
+|-------------------------|------------------------|-----------------|
+| `ON_DEMAND_PRIORITY` | `priority` | PayGo / priority pricing (`input_cost_per_token_priority`, `output_cost_per_token_priority`) |
+| `ON_DEMAND` | standard | Default on-demand pricing |
+| `FLEX` / `BATCH` | `flex` | Batch/flex pricing |
+
+When you use [Vertex AI PayGo](https://cloud.google.com/vertex-ai/generative-ai/pricing) (on-demand priority) or batch workloads, LiteLLM reads `trafficType` from the response and applies the matching cost per token from the [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). No configuration is required; spend tracking works out of the box for both standard and PayGo requests.
+
+See [Spend Tracking](../proxy/cost_tracking.md) for general cost tracking setup.
+
 ## Private Service Connect (PSC) Endpoints
 
 LiteLLM supports Vertex AI models deployed to Private Service Connect (PSC) endpoints, allowing you to use custom `api_base` URLs for private deployments.
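The `trafficType`-to-tier mapping in the table above can be sketched as a small helper. This is illustrative only: the `PRICING` dict, `TRAFFIC_TYPE_SUFFIX` map, and `cost_for_usage` function are hypothetical stand-ins for LiteLLM's cost-map lookup, not its actual internals, and the per-token prices are made-up placeholders.

```python
# Illustrative sketch of mapping Vertex AI usageMetadata.trafficType to a
# pricing tier. PRICING and cost_for_usage are hypothetical stand-ins for
# the per-model entries in the model cost map, not LiteLLM's real internals.

# Hypothetical per-token prices (USD) for one model.
PRICING = {
    "input_cost_per_token": 0.30e-6,
    "output_cost_per_token": 2.50e-6,
    "input_cost_per_token_priority": 0.60e-6,
    "output_cost_per_token_priority": 5.00e-6,
    "input_cost_per_token_flex": 0.15e-6,
    "output_cost_per_token_flex": 1.25e-6,
}

# trafficType from the Vertex response -> suffix of the cost-map key to use.
TRAFFIC_TYPE_SUFFIX = {
    "ON_DEMAND_PRIORITY": "_priority",  # PayGo / priority pricing
    "ON_DEMAND": "",                    # standard on-demand pricing
    "FLEX": "_flex",                    # batch/flex pricing
    "BATCH": "_flex",
}

def cost_for_usage(traffic_type: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute request cost using the tier implied by trafficType."""
    suffix = TRAFFIC_TYPE_SUFFIX.get(traffic_type, "")
    return (
        prompt_tokens * PRICING[f"input_cost_per_token{suffix}"]
        + completion_tokens * PRICING[f"output_cost_per_token{suffix}"]
    )
```

With these placeholder prices, the same 1,000-in / 100-out request costs twice as much under `ON_DEMAND_PRIORITY` as under `ON_DEMAND`, which is the effect the tier-specific keys exist to capture.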
diff --git a/docs/my-website/docs/proxy/cost_tracking.md b/docs/my-website/docs/proxy/cost_tracking.md
index b1e5eae2a62..f28eec287d4 100644
--- a/docs/my-website/docs/proxy/cost_tracking.md
+++ b/docs/my-website/docs/proxy/cost_tracking.md
@@ -8,6 +8,8 @@ Track spend for keys, users, and teams across 100+ LLMs.
 
 LiteLLM automatically tracks spend for all known models. See our [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
 
+Provider-specific cost tracking (e.g., [Vertex AI PayGo / priority pricing](../providers/vertex.md#paygo--priority-cost-tracking), [Bedrock service tiers](../providers/bedrock.md#usage---service-tier), [Azure base model mapping](./custom_pricing.md#set-base_model-for-cost-tracking-eg-azure-deployments)) is applied automatically when the response includes tier metadata.
+
 :::tip Keep Pricing Data Updated
 [Sync model pricing data from GitHub](./sync_models_github.md) to ensure accurate cost tracking.
 :::
diff --git a/docs/my-website/docs/proxy/custom_pricing.md b/docs/my-website/docs/proxy/custom_pricing.md
index b61da85bb1d..2a28ddbc454 100644
--- a/docs/my-website/docs/proxy/custom_pricing.md
+++ b/docs/my-website/docs/proxy/custom_pricing.md
@@ -104,9 +104,18 @@ There are other keys you can use to specify costs for different scenarios and mo
 - `input_cost_per_video_per_second` - Cost per second of video input
 - `input_cost_per_video_per_second_above_128k_tokens` - Video cost for large contexts
 - `input_cost_per_character` - Character-based pricing for some providers
+- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini, Bedrock)
+- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing
 
 These keys evolve based on how new models handle multimodality.
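The tier-specific keys added above sit alongside the standard keys in the model cost map, and the same names can be used when overriding pricing for a deployment. A minimal sketch of a proxy `config.yaml`, following the custom-pricing pattern from this doc; the model name and all prices are placeholders, and whether every tier key is honored in `litellm_params` may vary by LiteLLM version:

```yaml
model_list:
  - model_name: my-gemini            # placeholder alias
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
      # standard on-demand pricing (USD per token)
      input_cost_per_token: 0.0000003
      output_cost_per_token: 0.0000025
      # applied when the response reports ON_DEMAND_PRIORITY traffic
      input_cost_per_token_priority: 0.0000006
      output_cost_per_token_priority: 0.000005
      # applied for FLEX / BATCH traffic
      input_cost_per_token_flex: 0.00000015
      output_cost_per_token_flex: 0.00000125
```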
 The latest version can be found at [https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json).
 
+### Service Tier / PayGo Pricing (Vertex AI, Bedrock)
+
+For providers that support multiple pricing tiers (e.g., Vertex AI PayGo, Bedrock service tiers), LiteLLM automatically applies the correct cost based on the response:
+
+- **Vertex AI Gemini**: Uses `usageMetadata.trafficType` (`ON_DEMAND_PRIORITY` → priority, `FLEX`/`BATCH` → flex). See [Vertex AI - PayGo / Priority Cost Tracking](../providers/vertex.md#paygo--priority-cost-tracking).
+- **Bedrock**: Uses `serviceTier` from the response. See [Bedrock - Usage - Service Tier](../providers/bedrock.md#usage---service-tier).
+
 ## Zero-Cost Models (Bypass Budget Checks)
 
 **Use Case**: You have on-premises or free models that should be accessible even when users exceed their budget limits.