Add Priority PayGo cost tracking gemini/vertex ai #21909

Merged
Sameerlite merged 4 commits into main from litellm_cost_tracking_gemini
Feb 23, 2026

Conversation

@Sameerlite
Collaborator

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix

Changes

@vercel

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Feb 23, 2026 1:27pm |


@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Greptile Summary

This PR adds Priority PayGo cost tracking for Gemini and Vertex AI models by mapping Gemini's trafficType field (e.g. ON_DEMAND_PRIORITY, FLEX) to LiteLLM's existing service_tier mechanism, and adding priority-specific pricing keys to the model pricing JSON for Gemini 3.x Pro/Flash and 2.5 Pro models.

  • Adds _map_traffic_type_to_service_tier to translate Gemini traffic types to service tiers used by the cost key suffix logic (_priority, _flex)
  • Extracts traffic_type from _hidden_params.provider_specific_fields during completion_cost and maps it to service_tier
  • Threads service_tier through Gemini and Vertex AI cost calculator functions to generic_cost_per_token
  • Extends _get_token_base_cost to apply service_tier to above-threshold input/output keys and filters out service_tier-specific threshold keys from the threshold detection loop
  • Adds priority pricing fields (input_cost_per_token_priority, output_cost_per_token_priority, etc.) for ~15 model entries across both JSON files
  • Bug: Above-threshold cache read/creation keys are not mapped to service_tier variants, so priority-tier requests above 200k tokens will use standard cache read rates instead of the higher priority rates
  • No tests included — the PR pre-submission checklist requires at least 1 test in tests/litellm/, and this feature has enough complexity to warrant coverage (traffic type mapping, above-threshold tiered pricing, service_tier extraction from hidden params)

Confidence Score: 2/5

  • This PR has a logic bug in above-threshold cache pricing for priority tier and lacks required tests — recommend fixing before merge.
  • Score of 2 reflects: (1) a concrete logic bug where above-threshold cache read/creation costs ignore service_tier, leading to incorrect billing for priority-tier requests above 200k tokens; (2) no tests are included despite the PR checklist requiring at least 1 test; (3) provider-specific code placed outside the llms/ directory contrary to repo conventions. The core plumbing for service_tier threading is correct, and the pricing JSON additions look accurate.
  • litellm/litellm_core_utils/llm_cost_calc/utils.py — above-threshold cache keys not mapped to service_tier variants. litellm/cost_calculator.py — provider-specific Gemini mapping should be moved to llms/ directory.

Important Files Changed

Filename Overview
litellm/cost_calculator.py Adds traffic_type-to-service_tier mapping and extraction from hidden_params for Gemini/Vertex AI. Provider-specific code placed outside llms/ directory contrary to repo conventions.
litellm/litellm_core_utils/llm_cost_calc/utils.py Extends threshold pricing and audio cost to support service_tier, but above-threshold cache read/creation keys are not mapped to service_tier variants — causing incorrect pricing for priority tier above 200k tokens.
litellm/llms/gemini/cost_calculator.py Passes new service_tier parameter through to generic_cost_per_token. Straightforward, correct change.
litellm/llms/vertex_ai/cost_calculator.py Adds service_tier parameter and passes it to generic_cost_per_token. Note: the _handle_128k_pricing early-return path at line 263 still ignores service_tier, but no current models have both 128k threshold pricing and priority tiers.
model_prices_and_context_window.json Adds priority pricing fields (_priority suffixed keys) and supports_service_tier: true for Gemini 3.x Pro/Flash and 2.5 Pro models across both gemini/ and vertex_ai/ prefixes. Trailing newline removed.
litellm/model_prices_and_context_window_backup.json Mirror of model_prices_and_context_window.json changes — adds priority pricing fields for Gemini models. Kept in sync with primary file.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Gemini/Vertex AI Response] -->|usageMetadata.trafficType| B[_hidden_params.provider_specific_fields]
    B --> C{completion_cost}
    C -->|Extract traffic_type| D[_map_traffic_type_to_service_tier]
    D -->|ON_DEMAND_PRIORITY| E["service_tier = 'priority'"]
    D -->|FLEX / BATCH| F["service_tier = 'flex'"]
    D -->|ON_DEMAND| G["service_tier = None"]
    E --> H[cost_per_token]
    F --> H
    G --> H
    H -->|gemini provider| I[gemini/cost_calculator.py]
    H -->|vertex_ai provider| J[vertex_ai/cost_calculator.py]
    I --> K[generic_cost_per_token]
    J --> K
    K --> L[_get_token_base_cost]
    L -->|service_tier suffix| M["Key lookup: e.g. input_cost_per_token_priority"]
    L -->|above threshold + service_tier| N["Key lookup: e.g. input_cost_per_token_above_200k_tokens_priority"]
    L -->|above threshold cache - BUG| O["Key lookup: cache_read_..._above_200k_tokens\n(missing _priority suffix)"]
    M --> P[Final Cost]
    N --> P
    O --> P
```

Last reviewed commit: 164cde9

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 4 comments

Comment on lines 17135 to +17144

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"input_cost_per_token_priority": 3.6e-06,
+"input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
+"output_cost_per_token_priority": 2.16e-05,
+"output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
+"cache_read_input_token_cost_priority": 3.6e-07,
+"cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
+"supports_service_tier": true
```

tpm field accidentally removed

The tpm: 800000 field that was previously present for gemini/gemini-3.1-pro-preview has been dropped in this change. The diff shows the old line - "tpm": 800000 was replaced with the new priority pricing fields, but tpm was not retained. This will affect rate limiting behavior for this model.

The same field was correctly preserved for the other models (e.g. gemini/gemini-3-pro-preview, gemini/gemini-3-flash-preview, gemini/gemini-3.1-pro-preview-customtools).

Suggested change

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"tpm": 800000,
 "input_cost_per_token_priority": 3.6e-06,
 "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
 "output_cost_per_token_priority": 2.16e-05,
 "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
 "cache_read_input_token_cost_priority": 3.6e-07,
 "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
 "supports_service_tier": true
```

Comment on lines +710 to +737
```python
_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER: dict = {
    # ON_DEMAND_PRIORITY maps to "priority" — selects input_cost_per_token_priority, etc.
    "ON_DEMAND_PRIORITY": "priority",
    # FLEX / BATCH maps to "flex" — selects input_cost_per_token_flex, etc.
    "FLEX": "flex",
    "BATCH": "flex",
    # ON_DEMAND is standard pricing — no service_tier suffix applied
    "ON_DEMAND": None,
}


def _map_traffic_type_to_service_tier(traffic_type: Optional[str]) -> Optional[str]:
    """
    Map a Gemini usageMetadata.trafficType value to a LiteLLM service_tier string.

    This allows the same `_priority` / `_flex` cost-key suffix logic used for
    OpenAI/Azure to work for Gemini and Vertex AI models.

    trafficType values seen in practice
    ------------------------------------
    ON_DEMAND          -> standard pricing (service_tier = None)
    ON_DEMAND_PRIORITY -> priority pricing (service_tier = "priority")
    FLEX / BATCH       -> batch/flex pricing (service_tier = "flex")
    """
    if traffic_type is None:
        return None
    service_tier = _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER.get(traffic_type.upper())
    return service_tier
```

Gemini-specific mapping placed outside llms/

The _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER dictionary and _map_traffic_type_to_service_tier function are Gemini/Vertex AI-specific concepts (ON_DEMAND, ON_DEMAND_PRIORITY, FLEX, BATCH are Gemini traffic types). Per repository conventions, provider-specific code should live inside the llms/ directory (e.g., litellm/llms/gemini/cost_calculator.py or litellm/llms/vertex_ai/cost_calculator.py).

Consider moving this mapping to litellm/llms/gemini/cost_calculator.py or a shared Gemini/Vertex utility, and importing it here.

Context Used: Rule from dashboard - What: Avoid writing provider-specific code outside of the llms/ directory.

Why: This practice ensur... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines 17135 to +17144

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"input_cost_per_token_priority": 3.6e-06,
+"input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
+"output_cost_per_token_priority": 2.16e-05,
+"output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
+"cache_read_input_token_cost_priority": 3.6e-07,
+"cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
+"supports_service_tier": true
```

tpm field accidentally removed (backup JSON)

Same issue as the primary JSON file — the tpm: 800000 field for gemini/gemini-3.1-pro-preview was dropped when the priority pricing fields were added. Both files should be kept in sync.

Suggested change

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"tpm": 800000,
 "input_cost_per_token_priority": 3.6e-06,
 "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
 "output_cost_per_token_priority": 2.16e-05,
 "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
 "cache_read_input_token_cost_priority": 3.6e-07,
 "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
 "supports_service_tier": true
```

@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Additional Comments (1)

litellm/llms/vertex_ai/cost_calculator.py
service_tier ignored in 128k pricing path

The _handle_128k_pricing function does not accept or use the service_tier parameter. When a Vertex AI model has input_cost_per_token_above_128k_tokens set (e.g., older gemini-1.5 models), this early-return path will always use standard pricing regardless of the traffic type. The service_tier parameter added to cost_per_token is never forwarded here.

Currently this only affects older models (gemini-1.5-flash etc.) that don't have priority pricing configured, but it creates a latent bug if priority pricing is ever added for those models.

@Sameerlite
Collaborator Author

@greptile-apps

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 2 comments


Comment on lines +710 to +737
```python
_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER: dict = {
    # ON_DEMAND_PRIORITY maps to "priority" — selects input_cost_per_token_priority, etc.
    "ON_DEMAND_PRIORITY": "priority",
    # FLEX / BATCH maps to "flex" — selects input_cost_per_token_flex, etc.
    "FLEX": "flex",
    "BATCH": "flex",
    # ON_DEMAND is standard pricing — no service_tier suffix applied
    "ON_DEMAND": None,
}


def _map_traffic_type_to_service_tier(traffic_type: Optional[str]) -> Optional[str]:
    """
    Map a Gemini usageMetadata.trafficType value to a LiteLLM service_tier string.

    This allows the same `_priority` / `_flex` cost-key suffix logic used for
    OpenAI/Azure to work for Gemini and Vertex AI models.

    trafficType values seen in practice
    ------------------------------------
    ON_DEMAND          -> standard pricing (service_tier = None)
    ON_DEMAND_PRIORITY -> priority pricing (service_tier = "priority")
    FLEX / BATCH       -> batch/flex pricing (service_tier = "flex")
    """
    if traffic_type is None:
        return None
    service_tier = _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER.get(traffic_type.upper())
    return service_tier
```

Gemini-specific mapping outside llms/ directory

_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER and _map_traffic_type_to_service_tier encode Gemini/Vertex AI-specific concepts (ON_DEMAND, ON_DEMAND_PRIORITY, FLEX, BATCH are Gemini traffic types). Per repository conventions, provider-specific code should live inside the llms/ directory (e.g., litellm/llms/gemini/cost_calculator.py or a shared Gemini/Vertex utility) and be imported here.

This keeps cost_calculator.py provider-agnostic and aligns with the existing pattern where provider-specific cost logic lives under litellm/llms/{provider}/.

Context Used: Rule from dashboard - What: Avoid writing provider-specific code outside of the llms/ directory.

Why: This practice ensur... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Additional Comments (1)

litellm/litellm_core_utils/llm_cost_calc/utils.py
Above-threshold cache costs ignore service_tier

The input and output above-threshold keys are correctly mapped to service_tier-specific variants (e.g. input_cost_per_token_above_200k_tokens_priority), but the cache read and cache creation above-threshold keys on lines 266-274 are not. Since the JSON data includes keys like cache_read_input_token_cost_above_200k_tokens_priority (which differs from the standard rate — e.g. 7.2e-07 vs 4e-07 for gemini-3.1-pro-preview), a priority-tier request exceeding 200k tokens will use the standard cache read rate instead of the higher priority rate.

The fix would be similar to what's done for input/output keys — use _get_service_tier_cost_key to construct the tiered cache keys when service_tier is set:

```python
                    # Apply tiered pricing to cache costs
                    cache_creation_tiered_key = (
                        _get_service_tier_cost_key(
                            f"cache_creation_input_token_cost_above_{threshold_str}_tokens",
                            service_tier,
                        )
                        if service_tier
                        else f"cache_creation_input_token_cost_above_{threshold_str}_tokens"
                    )
                    cache_creation_1hr_tiered_key = (
                        f"cache_creation_input_token_cost_above_1hr_above_{threshold_str}_tokens"
                    )
                    cache_read_tiered_key = (
                        _get_service_tier_cost_key(
                            f"cache_read_input_token_cost_above_{threshold_str}_tokens",
                            service_tier,
                        )
                        if service_tier
                        else f"cache_read_input_token_cost_above_{threshold_str}_tokens"
                    )
```

@Sameerlite Sameerlite merged commit f97ee62 into main Feb 23, 2026
32 of 34 checks passed