Add Priority PayGo cost tracking gemini/vertex ai#21909
Greptile Summary
This PR adds Priority PayGo cost tracking for Gemini and Vertex AI models by mapping Gemini's `usageMetadata.trafficType` to LiteLLM's `service_tier` cost-key suffixes.
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/cost_calculator.py | Adds traffic_type-to-service_tier mapping and extraction from hidden_params for Gemini/Vertex AI. Provider-specific code placed outside llms/ directory contrary to repo conventions. |
| litellm/litellm_core_utils/llm_cost_calc/utils.py | Extends threshold pricing and audio cost to support service_tier, but above-threshold cache read/creation keys are not mapped to service_tier variants — causing incorrect pricing for priority tier above 200k tokens. |
| litellm/llms/gemini/cost_calculator.py | Passes new service_tier parameter through to generic_cost_per_token. Straightforward, correct change. |
| litellm/llms/vertex_ai/cost_calculator.py | Adds service_tier parameter and passes it to generic_cost_per_token. Note: the _handle_128k_pricing early-return path at line 263 still ignores service_tier, but no current models have both 128k threshold pricing and priority tiers. |
| model_prices_and_context_window.json | Adds priority pricing fields (_priority suffixed keys) and supports_service_tier: true for Gemini 3.x Pro/Flash and 2.5 Pro models across both gemini/ and vertex_ai/ prefixes. Trailing newline removed. |
| litellm/model_prices_and_context_window_backup.json | Mirror of model_prices_and_context_window.json changes — adds priority pricing fields for Gemini models. Kept in sync with primary file. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Gemini/Vertex AI Response] -->|usageMetadata.trafficType| B[_hidden_params.provider_specific_fields]
    B --> C{completion_cost}
    C -->|Extract traffic_type| D[_map_traffic_type_to_service_tier]
    D -->|ON_DEMAND_PRIORITY| E["service_tier = 'priority'"]
    D -->|FLEX / BATCH| F["service_tier = 'flex'"]
    D -->|ON_DEMAND| G["service_tier = None"]
    E --> H[cost_per_token]
    F --> H
    G --> H
    H -->|gemini provider| I[gemini/cost_calculator.py]
    H -->|vertex_ai provider| J[vertex_ai/cost_calculator.py]
    I --> K[generic_cost_per_token]
    J --> K
    K --> L[_get_token_base_cost]
    L -->|service_tier suffix| M["Key lookup: e.g. input_cost_per_token_priority"]
    L -->|above threshold + service_tier| N["Key lookup: e.g. input_cost_per_token_above_200k_tokens_priority"]
    L -->|above threshold cache - BUG| O["Key lookup: cache_read_..._above_200k_tokens\n(missing _priority suffix)"]
    M --> P[Final Cost]
    N --> P
    O --> P
```
Last reviewed commit: 164cde9
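The flow in the flowchart can be sketched as a standalone snippet. The function and key names below (`provider_specific_fields`, `traffic_type`) mirror those shown in the diagram, but this is an illustration of the mapping logic, not the actual litellm implementation:

```python
from typing import Optional

# Assumed mapping, copied from the PR's _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER
_TRAFFIC_TYPE_TO_SERVICE_TIER = {
    "ON_DEMAND_PRIORITY": "priority",
    "FLEX": "flex",
    "BATCH": "flex",
    "ON_DEMAND": None,  # standard pricing: no suffix applied
}


def map_traffic_type(traffic_type: Optional[str]) -> Optional[str]:
    """Map a Gemini usageMetadata.trafficType value to a service_tier string."""
    if traffic_type is None:
        return None
    return _TRAFFIC_TYPE_TO_SERVICE_TIER.get(traffic_type.upper())


def extract_service_tier(hidden_params: dict) -> Optional[str]:
    """Pull traffic_type out of provider_specific_fields and map it.

    The dict shape here is an assumption based on the flowchart's
    _hidden_params.provider_specific_fields step.
    """
    fields = hidden_params.get("provider_specific_fields") or {}
    return map_traffic_type(fields.get("traffic_type"))
```

A priority-traffic response would then resolve to `service_tier = "priority"`, selecting the `_priority`-suffixed cost keys downstream.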
```diff
   "supports_web_search": true,
   "supports_url_context": true,
   "supports_native_streaming": true,
-  "tpm": 800000
+  "input_cost_per_token_priority": 3.6e-06,
+  "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
+  "output_cost_per_token_priority": 2.16e-05,
+  "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
+  "cache_read_input_token_cost_priority": 3.6e-07,
+  "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
+  "supports_service_tier": true
```
tpm field accidentally removed
The `"tpm": 800000` field that was previously present for `gemini/gemini-3.1-pro-preview` has been dropped in this change. The diff shows the old line `- "tpm": 800000` being replaced with the new priority pricing fields, without `tpm` being retained. This will affect rate limiting behavior for this model.
The same field was correctly preserved for the other models (e.g. gemini/gemini-3-pro-preview, gemini/gemini-3-flash-preview, gemini/gemini-3.1-pro-preview-customtools).
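A regression like this is easy to catch mechanically. The helper below is a hypothetical sanity check (not part of the PR) that diffs two versions of the pricing map and reports any model that silently lost a field such as `tpm`:

```python
def models_dropping_field(old: dict, new: dict, field: str = "tpm") -> list:
    """Return names of models that had `field` before the change but lost it.

    `old` and `new` are parsed versions of model_prices_and_context_window.json
    before and after an edit; models added or removed entirely are ignored.
    """
    return sorted(
        name
        for name, entry in old.items()
        if field in entry and name in new and field not in new[name]
    )
```

Running it over the pre- and post-change JSON would have flagged `gemini/gemini-3.1-pro-preview` immediately.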
Suggested change:

```diff
   "supports_native_streaming": true,
-  "tpm": 800000
+  "tpm": 800000,
   "input_cost_per_token_priority": 3.6e-06,
   "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
   "output_cost_per_token_priority": 2.16e-05,
   "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
   "cache_read_input_token_cost_priority": 3.6e-07,
   "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
   "supports_service_tier": true
```
```python
_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER: dict = {
    # ON_DEMAND_PRIORITY maps to "priority" — selects input_cost_per_token_priority, etc.
    "ON_DEMAND_PRIORITY": "priority",
    # FLEX / BATCH maps to "flex" — selects input_cost_per_token_flex, etc.
    "FLEX": "flex",
    "BATCH": "flex",
    # ON_DEMAND is standard pricing — no service_tier suffix applied
    "ON_DEMAND": None,
}


def _map_traffic_type_to_service_tier(traffic_type: Optional[str]) -> Optional[str]:
    """
    Map a Gemini usageMetadata.trafficType value to a LiteLLM service_tier string.

    This allows the same `_priority` / `_flex` cost-key suffix logic used for
    OpenAI/Azure to work for Gemini and Vertex AI models.

    trafficType values seen in practice
    ------------------------------------
    ON_DEMAND          -> standard pricing (service_tier = None)
    ON_DEMAND_PRIORITY -> priority pricing (service_tier = "priority")
    FLEX / BATCH       -> batch/flex pricing (service_tier = "flex")
    """
    if traffic_type is None:
        return None
    return _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER.get(traffic_type.upper())
```
Gemini-specific mapping placed outside llms/
The _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER dictionary and _map_traffic_type_to_service_tier function are Gemini/Vertex AI-specific concepts (ON_DEMAND, ON_DEMAND_PRIORITY, FLEX, BATCH are Gemini traffic types). Per repository conventions, provider-specific code should live inside the llms/ directory (e.g., litellm/llms/gemini/cost_calculator.py or litellm/llms/vertex_ai/cost_calculator.py).
Consider moving this mapping to litellm/llms/gemini/cost_calculator.py or a shared Gemini/Vertex utility, and importing it here.
Context Used: Rule from dashboard - What: Avoid writing provider-specific code outside of the llms/ directory.
Why: This practice ensur... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```diff
   "supports_web_search": true,
   "supports_url_context": true,
   "supports_native_streaming": true,
-  "tpm": 800000
+  "input_cost_per_token_priority": 3.6e-06,
+  "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
+  "output_cost_per_token_priority": 2.16e-05,
+  "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
+  "cache_read_input_token_cost_priority": 3.6e-07,
+  "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
+  "supports_service_tier": true
```
tpm field accidentally removed (backup JSON)
Same issue as the primary JSON file — the tpm: 800000 field for gemini/gemini-3.1-pro-preview was dropped when the priority pricing fields were added. Both files should be kept in sync.
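Since the backup file must mirror the primary pricing map, a simple diff helper makes drift like this visible in CI. This is a hypothetical sketch, not existing repo tooling; it operates on the two parsed JSON maps:

```python
def out_of_sync_models(primary: dict, backup: dict) -> list:
    """Return model names whose entries differ between the primary pricing
    map and its backup (including models present in only one of the two)."""
    names = set(primary) | set(backup)
    return sorted(n for n in names if primary.get(n) != backup.get(n))
```

An empty result means the two files are in sync; any returned name pinpoints exactly which model entry diverged.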
Suggested change:

```diff
   "supports_native_streaming": true,
-  "tpm": 800000
+  "tpm": 800000,
   "input_cost_per_token_priority": 3.6e-06,
   "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
   "output_cost_per_token_priority": 2.16e-05,
   "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
   "cache_read_input_token_cost_priority": 3.6e-07,
   "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
   "supports_service_tier": true
```
Additional Comments (1)
Currently this only affects older models (gemini-1.5-flash etc.) that don't have priority pricing configured, but it creates a latent bug if priority pricing is ever added for those models.
Additional Comments (1)
The input and output above-threshold keys are correctly mapped to service_tier-specific variants (e.g. `input_cost_per_token_above_200k_tokens_priority`), but the above-threshold cache read/creation keys are not: `cache_read_input_token_cost_above_200k_tokens` is looked up without the `_priority` suffix, causing incorrect pricing for the priority tier above 200k tokens. The fix would be similar to what's done for the input/output keys: prefer the service_tier-suffixed key when it exists.
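The suffix-preferring lookup could look something like the sketch below. The function name `tiered_cost_key` and the `model_info` dict parameter are illustrative, not the actual `_get_token_base_cost` signature in litellm; the key names match those in the pricing JSON:

```python
from typing import Optional


def tiered_cost_key(base_key: str, service_tier: Optional[str], model_info: dict) -> float:
    """Look up a per-token cost, preferring the service_tier-suffixed variant.

    e.g. base_key="cache_read_input_token_cost_above_200k_tokens" with
    service_tier="priority" first tries
    "cache_read_input_token_cost_above_200k_tokens_priority", then falls
    back to the unsuffixed key, then to 0.0 if neither is configured.
    """
    if service_tier:
        tiered = f"{base_key}_{service_tier}"
        if tiered in model_info:
            return model_info[tiered]
    return model_info.get(base_key, 0.0)
```

Applying this uniformly to the cache read/creation keys would close the gap flagged in the flowchart's "BUG" branch.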
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- Passes `make test-unit`
- Tagged `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
Changes