Merged
**docs/my-website/docs/providers/vertex.md** (14 additions, 0 deletions)

@@ -1687,6 +1687,20 @@ `litellm.vertex_location = "us-central1" # Your Location`
| gemini-2.5-flash-lite-preview-09-2025 | `completion('gemini-2.5-flash-lite-preview-09-2025', messages)`, `completion('vertex_ai/gemini-2.5-flash-lite-preview-09-2025', messages)` |
| gemini-3.1-flash-lite-preview | `completion('gemini-3.1-flash-lite-preview', messages)`, `completion('vertex_ai/gemini-3.1-flash-lite-preview', messages)` |

## PayGo / Priority Cost Tracking

LiteLLM automatically tracks spend for Vertex AI Gemini models using the correct pricing tier based on the response's `usageMetadata.trafficType`:

| Vertex AI `trafficType` | LiteLLM `service_tier` | Pricing applied |
|-------------------------|------------------------|-----------------|
| `ON_DEMAND_PRIORITY` | `priority` | PayGo / priority pricing (`input_cost_per_token_priority`, `output_cost_per_token_priority`) |
| `ON_DEMAND` | standard | Default on-demand pricing |
| `FLEX` / `BATCH` | `flex` | Batch/flex pricing |

> **Review comment (Contributor): Inconsistent backtick formatting for standard tier**
>
> The `priority` and `flex` values in the LiteLLM `service_tier` column are wrapped in backticks, but `standard` is not. For consistency:
>
> ```diff
> -| `ON_DEMAND` | standard | Default on-demand pricing |
> +| `ON_DEMAND` | `standard` | Default on-demand pricing |
> ```

When you use [Vertex AI PayGo](https://cloud.google.com/vertex-ai/generative-ai/pricing) (on-demand priority) or batch workloads, LiteLLM reads `trafficType` from the response and applies the matching cost per token from the [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). No configuration is required — spend tracking works out of the box for both standard and PayGo requests.

See [Spend Tracking](../proxy/cost_tracking.md) for general cost tracking setup.
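
The tier-selection logic described above can be sketched in plain Python. This is an illustrative sketch only: the prices and the `cost_for_tier` helper are hypothetical, not LiteLLM's actual implementation or real cost-map values.

```python
# Hypothetical model-cost-map entry; prices are made up for illustration.
model_info = {
    "input_cost_per_token": 1.0e-7,
    "output_cost_per_token": 4.0e-7,
    "input_cost_per_token_priority": 2.0e-7,
    "output_cost_per_token_priority": 8.0e-7,
}


def cost_for_tier(info: dict, prompt_tokens: int, completion_tokens: int,
                  tier: str = "standard") -> float:
    """Pick tier-specific cost keys if present, falling back to standard pricing."""
    suffix = "" if tier == "standard" else f"_{tier}"
    in_rate = info.get(f"input_cost_per_token{suffix}", info["input_cost_per_token"])
    out_rate = info.get(f"output_cost_per_token{suffix}", info["output_cost_per_token"])
    return prompt_tokens * in_rate + completion_tokens * out_rate


# standard request: 1000 * 1e-7 + 500 * 4e-7 = 3.0e-4
standard = cost_for_tier(model_info, 1000, 500)
# priority (PayGo) request: 1000 * 2e-7 + 500 * 8e-7 = 6.0e-4
priority = cost_for_tier(model_info, 1000, 500, tier="priority")
# flex keys are absent here, so the sketch falls back to standard rates
flex = cost_for_tier(model_info, 1000, 500, tier="flex")
```

The fallback to standard rates when a tier-specific key is missing mirrors the graceful-degradation behavior the docs describe: a request never errors just because a tier price is not in the cost map.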

## Private Service Connect (PSC) Endpoints

LiteLLM supports Vertex AI models deployed to Private Service Connect (PSC) endpoints, allowing you to use custom `api_base` URLs for private deployments.
**docs/my-website/docs/proxy/cost_tracking.md** (2 additions, 0 deletions)

@@ -8,6 +8,8 @@ Track spend for keys, users, and teams across 100+ LLMs.

LiteLLM automatically tracks spend for all known models. See our [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)

Provider-specific cost tracking (e.g., [Vertex AI PayGo / priority pricing](../providers/vertex.md#paygo--priority-cost-tracking), [Bedrock service tiers](../providers/bedrock.md#usage---service-tier), [Azure base model mapping](./custom_pricing.md#set-base_model-for-cost-tracking-eg-azure-deployments)) is applied automatically when the response includes tier metadata.

:::tip Keep Pricing Data Updated
[Sync model pricing data from GitHub](./sync_models_github.md) to ensure accurate cost tracking.
:::
**docs/my-website/docs/proxy/custom_pricing.md** (9 additions, 0 deletions)

@@ -104,9 +104,18 @@
- `input_cost_per_video_per_second` - Cost per second of video input
- `input_cost_per_video_per_second_above_128k_tokens` - Video cost for large contexts
- `input_cost_per_character` - Character-based pricing for some providers
- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini, Bedrock)
- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing
> **Review comment on lines +107 to +108 (Contributor): Misleading Bedrock attribution for priority/flex pricing keys**
>
> The description says `input_cost_per_token_priority` / `output_cost_per_token_priority` applies to "(Vertex AI Gemini, Bedrock)", but inspecting `model_prices_and_context_window.json` shows that no Bedrock model entries currently have `input_cost_per_token_priority` or `input_cost_per_token_flex` keys. Only Vertex AI (and Google Gemini, Azure, OpenAI) models have these entries populated.
>
> The code does have a fallback (`_get_cost_per_unit` in `llm_cost_calc/utils.py`) that gracefully returns standard pricing when a tier-specific key is missing, so Bedrock requests won't error; but the cost tracking will silently use standard rates regardless of the requested service tier, which is the opposite of what users reading this note might expect.
>
> Consider either:
>
> - Removing "Bedrock" from this line (since the keys are not pre-populated for it), or
> - Clarifying that for Bedrock, these keys can be manually set in custom pricing to enable tier-differentiated tracking, but no out-of-the-box priority/flex pricing data exists for Bedrock today.
>
> Suggested change:
>
> ```diff
> -- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini, Bedrock)
> -- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing
> +- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini; can be manually set for other providers)
> +- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing (Vertex AI Gemini; can be manually set for other providers)
> ```

These keys evolve based on how new models handle multimodality. The latest version can be found at [https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json).

### Service Tier / PayGo Pricing (Vertex AI, Bedrock)

For providers that support multiple pricing tiers (e.g., Vertex AI PayGo, Bedrock service tiers), LiteLLM automatically applies the correct cost based on the response:

- **Vertex AI Gemini**: Uses `usageMetadata.trafficType` (`ON_DEMAND_PRIORITY` → priority, `FLEX`/`BATCH` → flex). See [Vertex AI - PayGo / Priority Cost Tracking](../providers/vertex.md#paygo--priority-cost-tracking).
- **Bedrock**: Uses `serviceTier` from the response. See [Bedrock - Usage - Service Tier](../providers/bedrock.md#usage---service-tier).
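
The two bullets above amount to a small mapping from provider-reported metadata to a `service_tier` label. A minimal sketch of that mapping, assuming the metadata dicts shown here (the function name is illustrative, not LiteLLM's internal API):

```python
def service_tier_from_response(provider: str, metadata: dict) -> str:
    """Map provider tier metadata to a service_tier label (illustrative sketch)."""
    if provider == "vertex_ai":
        # Vertex AI reports usageMetadata.trafficType on the response.
        traffic_type = metadata.get("trafficType", "ON_DEMAND")
        return {
            "ON_DEMAND_PRIORITY": "priority",
            "FLEX": "flex",
            "BATCH": "flex",
        }.get(traffic_type, "standard")
    if provider == "bedrock":
        # Bedrock reports serviceTier directly on the response.
        return metadata.get("serviceTier", "standard")
    return "standard"
```

Because both lookups default to `"standard"`, a response that carries no tier metadata is billed at on-demand rates, matching the out-of-the-box behavior described above.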

## Zero-Cost Models (Bypass Budget Checks)

**Use Case**: You have on-premises or free models that should be accessible even when users exceed their budget limits.