Merged
**docs/my-website/docs/providers/vertex.md** (14 additions, 0 deletions)

@@ -1687,6 +1687,20 @@ `litellm.vertex_location = "us-central1" # Your Location`
| gemini-2.5-flash-lite-preview-09-2025 | `completion('gemini-2.5-flash-lite-preview-09-2025', messages)`, `completion('vertex_ai/gemini-2.5-flash-lite-preview-09-2025', messages)` |
| gemini-3.1-flash-lite-preview | `completion('gemini-3.1-flash-lite-preview', messages)`, `completion('vertex_ai/gemini-3.1-flash-lite-preview', messages)` |

## PayGo / Priority Cost Tracking

LiteLLM automatically tracks spend for Vertex AI Gemini models using the correct pricing tier based on the response's `usageMetadata.trafficType`:

| Vertex AI `trafficType` | LiteLLM `service_tier` | Pricing applied |
|-------------------------|------------------------|-----------------|
| `ON_DEMAND_PRIORITY` | `priority` | PayGo / priority pricing (`input_cost_per_token_priority`, `output_cost_per_token_priority`) |
| `ON_DEMAND` | standard | Default on-demand pricing |
| `FLEX` / `BATCH` | `flex` | Batch/flex pricing |

> **Review comment (Contributor): Inconsistent backtick formatting for standard tier**
>
> The `priority` and `flex` values in the LiteLLM `service_tier` column are wrapped in backticks, but `standard` is not. For consistency:
>
> ```diff
> -| `ON_DEMAND` | standard | Default on-demand pricing |
> +| `ON_DEMAND` | `standard` | Default on-demand pricing |
> ```

When you use [Vertex AI PayGo](https://cloud.google.com/vertex-ai/generative-ai/pricing) (on-demand priority) or batch workloads, LiteLLM reads `trafficType` from the response and applies the matching cost per token from the [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). No configuration is required — spend tracking works out of the box for both standard and PayGo requests.

See [Spend Tracking](../proxy/cost_tracking.md) for general cost tracking setup.
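
The tier-selection logic described above can be sketched in plain Python. This is an illustrative sketch only: the prices and the `cost_for_tier` helper are hypothetical, not LiteLLM's actual implementation or real cost-map values.

```python
# Hypothetical model-cost-map entry; prices are made up for illustration.
model_info = {
    "input_cost_per_token": 1.0e-7,
    "output_cost_per_token": 4.0e-7,
    "input_cost_per_token_priority": 2.0e-7,
    "output_cost_per_token_priority": 8.0e-7,
}


def cost_for_tier(info: dict, prompt_tokens: int, completion_tokens: int,
                  tier: str = "standard") -> float:
    """Pick tier-specific cost keys if present, falling back to standard pricing."""
    suffix = "" if tier == "standard" else f"_{tier}"
    in_rate = info.get(f"input_cost_per_token{suffix}", info["input_cost_per_token"])
    out_rate = info.get(f"output_cost_per_token{suffix}", info["output_cost_per_token"])
    return prompt_tokens * in_rate + completion_tokens * out_rate


# standard request: 1000 * 1e-7 + 500 * 4e-7 = 3.0e-4
standard = cost_for_tier(model_info, 1000, 500)
# priority (PayGo) request: 1000 * 2e-7 + 500 * 8e-7 = 6.0e-4
priority = cost_for_tier(model_info, 1000, 500, tier="priority")
# flex keys are absent here, so the sketch falls back to standard rates
flex = cost_for_tier(model_info, 1000, 500, tier="flex")
```

The fallback to standard rates when a tier-specific key is missing mirrors the graceful-degradation behavior the docs describe: a request never errors just because a tier price is not in the cost map.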

## Private Service Connect (PSC) Endpoints

LiteLLM supports Vertex AI models deployed to Private Service Connect (PSC) endpoints, allowing you to use custom `api_base` URLs for private deployments.
**docs/my-website/docs/proxy/cost_tracking.md** (2 additions, 0 deletions)

@@ -8,6 +8,8 @@ Track spend for keys, users, and teams across 100+ LLMs.

LiteLLM automatically tracks spend for all known models. See our [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)

Provider-specific cost tracking (e.g., [Vertex AI PayGo / priority pricing](../providers/vertex.md#paygo--priority-cost-tracking), [Bedrock service tiers](../providers/bedrock.md#usage---service-tier), [Azure base model mapping](./custom_pricing.md#set-base_model-for-cost-tracking-eg-azure-deployments)) is applied automatically when the response includes tier metadata.

:::tip Keep Pricing Data Updated
[Sync model pricing data from GitHub](./sync_models_github.md) to ensure accurate cost tracking.
:::
**docs/my-website/docs/proxy/custom_pricing.md** (9 additions, 0 deletions)

@@ -104,9 +104,18 @@
- `input_cost_per_video_per_second` - Cost per second of video input
- `input_cost_per_video_per_second_above_128k_tokens` - Video cost for large contexts
- `input_cost_per_character` - Character-based pricing for some providers
- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini, Bedrock)
- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing
> **Review comment on lines +107 to +108 (Contributor): Misleading Bedrock attribution for priority/flex pricing keys**
>
> The description says `input_cost_per_token_priority` / `output_cost_per_token_priority` applies to "(Vertex AI Gemini, Bedrock)", but inspecting `model_prices_and_context_window.json` shows that no Bedrock model entries currently have `input_cost_per_token_priority` or `input_cost_per_token_flex` keys. Only Vertex AI (and Google Gemini, Azure, OpenAI) models have these entries populated.
>
> The code does have a fallback (`_get_cost_per_unit` in `llm_cost_calc/utils.py`) that gracefully returns standard pricing when a tier-specific key is missing, so Bedrock requests won't error; but the cost tracking will silently use standard rates regardless of the requested service tier, which is the opposite of what users reading this note might expect.
>
> Consider either:
>
> - Removing "Bedrock" from this line (since the keys are not pre-populated for it), or
> - Clarifying that for Bedrock, these keys can be manually set in custom pricing to enable tier-differentiated tracking, but no out-of-the-box priority/flex pricing data exists for Bedrock today.
>
> Suggested change:
>
> ```diff
> -- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini, Bedrock)
> -- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing
> +- `input_cost_per_token_priority` / `output_cost_per_token_priority` - Priority/PayGo pricing (Vertex AI Gemini; can be manually set for other providers)
> +- `input_cost_per_token_flex` / `output_cost_per_token_flex` - Batch/flex pricing (Vertex AI Gemini; can be manually set for other providers)
> ```

These keys evolve based on how new models handle multimodality. The latest version can be found at [https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json).

### Service Tier / PayGo Pricing (Vertex AI, Bedrock)

For providers that support multiple pricing tiers (e.g., Vertex AI PayGo, Bedrock service tiers), LiteLLM automatically applies the correct cost based on the response:

- **Vertex AI Gemini**: Uses `usageMetadata.trafficType` (`ON_DEMAND_PRIORITY` → priority, `FLEX`/`BATCH` → flex). See [Vertex AI - PayGo / Priority Cost Tracking](../providers/vertex.md#paygo--priority-cost-tracking).
- **Bedrock**: Uses `serviceTier` from the response. See [Bedrock - Usage - Service Tier](../providers/bedrock.md#usage---service-tier).
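
The two bullets above amount to a small mapping from provider-reported metadata to a `service_tier` label. A minimal sketch of that mapping, assuming the metadata dicts shown here (the function name is illustrative, not LiteLLM's internal API):

```python
def service_tier_from_response(provider: str, metadata: dict) -> str:
    """Map provider tier metadata to a service_tier label (illustrative sketch)."""
    if provider == "vertex_ai":
        # Vertex AI reports usageMetadata.trafficType on the response.
        traffic_type = metadata.get("trafficType", "ON_DEMAND")
        return {
            "ON_DEMAND_PRIORITY": "priority",
            "FLEX": "flex",
            "BATCH": "flex",
        }.get(traffic_type, "standard")
    if provider == "bedrock":
        # Bedrock reports serviceTier directly on the response.
        return metadata.get("serviceTier", "standard")
    return "standard"
```

Because both lookups default to `"standard"`, a response that carries no tier metadata is billed at on-demand rates, matching the out-of-the-box behavior described above.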

## Zero-Cost Models (Bypass Budget Checks)

**Use Case**: You have on-premises or free models that should be accessible even when users exceed their budget limits.