Add Priority PayGo cost tracking gemini/vertex ai #21909

Merged
Sameerlite merged 4 commits into main from litellm_cost_tracking_gemini
Feb 23, 2026

Conversation

@Sameerlite
Collaborator

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix

Changes

@vercel

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Feb 23, 2026 1:27pm |


@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Greptile Summary

This PR adds Priority PayGo cost tracking for Gemini and Vertex AI models by mapping Gemini's trafficType field (e.g. ON_DEMAND_PRIORITY, FLEX) to LiteLLM's existing service_tier mechanism, and adding priority-specific pricing keys to the model pricing JSON for Gemini 3.x Pro/Flash and 2.5 Pro models.

  • Adds _map_traffic_type_to_service_tier to translate Gemini traffic types to service tiers used by the cost key suffix logic (_priority, _flex)
  • Extracts traffic_type from _hidden_params.provider_specific_fields during completion_cost and maps it to service_tier
  • Threads service_tier through Gemini and Vertex AI cost calculator functions to generic_cost_per_token
  • Extends _get_token_base_cost to apply service_tier to above-threshold input/output keys and filters out service_tier-specific threshold keys from the threshold detection loop
  • Adds priority pricing fields (input_cost_per_token_priority, output_cost_per_token_priority, etc.) for ~15 model entries across both JSON files
  • Bug: Above-threshold cache read/creation keys are not mapped to service_tier variants, so priority-tier requests above 200k tokens will use standard cache read rates instead of the higher priority rates
  • No tests included — the PR pre-submission checklist requires at least 1 test in tests/litellm/, and this feature has enough complexity to warrant coverage (traffic type mapping, above-threshold tiered pricing, service_tier extraction from hidden params)

Confidence Score: 2/5

  • This PR has a logic bug in above-threshold cache pricing for priority tier and lacks required tests — recommend fixing before merge.
  • Score of 2 reflects: (1) a concrete logic bug where above-threshold cache read/creation costs ignore service_tier, leading to incorrect billing for priority-tier requests above 200k tokens; (2) no tests are included despite the PR checklist requiring at least 1 test; (3) provider-specific code placed outside the llms/ directory contrary to repo conventions. The core plumbing for service_tier threading is correct, and the pricing JSON additions look accurate.
  • litellm/litellm_core_utils/llm_cost_calc/utils.py — above-threshold cache keys not mapped to service_tier variants. litellm/cost_calculator.py — provider-specific Gemini mapping should be moved to llms/ directory.

Important Files Changed

Filename Overview
litellm/cost_calculator.py Adds traffic_type-to-service_tier mapping and extraction from hidden_params for Gemini/Vertex AI. Provider-specific code placed outside llms/ directory contrary to repo conventions.
litellm/litellm_core_utils/llm_cost_calc/utils.py Extends threshold pricing and audio cost to support service_tier, but above-threshold cache read/creation keys are not mapped to service_tier variants — causing incorrect pricing for priority tier above 200k tokens.
litellm/llms/gemini/cost_calculator.py Passes new service_tier parameter through to generic_cost_per_token. Straightforward, correct change.
litellm/llms/vertex_ai/cost_calculator.py Adds service_tier parameter and passes it to generic_cost_per_token. Note: the _handle_128k_pricing early-return path at line 263 still ignores service_tier, but no current models have both 128k threshold pricing and priority tiers.
model_prices_and_context_window.json Adds priority pricing fields (_priority suffixed keys) and supports_service_tier: true for Gemini 3.x Pro/Flash and 2.5 Pro models across both gemini/ and vertex_ai/ prefixes. Trailing newline removed.
litellm/model_prices_and_context_window_backup.json Mirror of model_prices_and_context_window.json changes — adds priority pricing fields for Gemini models. Kept in sync with primary file.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Gemini/Vertex AI Response] -->|usageMetadata.trafficType| B[_hidden_params.provider_specific_fields]
    B --> C{completion_cost}
    C -->|Extract traffic_type| D[_map_traffic_type_to_service_tier]
    D -->|ON_DEMAND_PRIORITY| E["service_tier = 'priority'"]
    D -->|FLEX / BATCH| F["service_tier = 'flex'"]
    D -->|ON_DEMAND| G["service_tier = None"]
    E --> H[cost_per_token]
    F --> H
    G --> H
    H -->|gemini provider| I[gemini/cost_calculator.py]
    H -->|vertex_ai provider| J[vertex_ai/cost_calculator.py]
    I --> K[generic_cost_per_token]
    J --> K
    K --> L[_get_token_base_cost]
    L -->|service_tier suffix| M["Key lookup: e.g. input_cost_per_token_priority"]
    L -->|above threshold + service_tier| N["Key lookup: e.g. input_cost_per_token_above_200k_tokens_priority"]
    L -->|above threshold cache - BUG| O["Key lookup: cache_read_..._above_200k_tokens\n(missing _priority suffix)"]
    M --> P[Final Cost]
    N --> P
    O --> P
```

Last reviewed commit: 164cde9

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 4 comments

Comment on lines 17135 to +17144

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"input_cost_per_token_priority": 3.6e-06,
+"input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
+"output_cost_per_token_priority": 2.16e-05,
+"output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
+"cache_read_input_token_cost_priority": 3.6e-07,
+"cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
+"supports_service_tier": true
```

tpm field accidentally removed

The tpm: 800000 field that was previously present for gemini/gemini-3.1-pro-preview has been dropped in this change. The diff shows the old line - "tpm": 800000 was replaced with the new priority pricing fields, but tpm was not retained. This will affect rate limiting behavior for this model.

The same field was correctly preserved for the other models (e.g. gemini/gemini-3-pro-preview, gemini/gemini-3-flash-preview, gemini/gemini-3.1-pro-preview-customtools).

Suggested change

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"tpm": 800000,
 "input_cost_per_token_priority": 3.6e-06,
 "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
 "output_cost_per_token_priority": 2.16e-05,
 "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
 "cache_read_input_token_cost_priority": 3.6e-07,
 "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
 "supports_service_tier": true
```

Comment on lines +710 to +737
```python
_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER: dict = {
    # ON_DEMAND_PRIORITY maps to "priority" — selects input_cost_per_token_priority, etc.
    "ON_DEMAND_PRIORITY": "priority",
    # FLEX / BATCH maps to "flex" — selects input_cost_per_token_flex, etc.
    "FLEX": "flex",
    "BATCH": "flex",
    # ON_DEMAND is standard pricing — no service_tier suffix applied
    "ON_DEMAND": None,
}


def _map_traffic_type_to_service_tier(traffic_type: Optional[str]) -> Optional[str]:
    """
    Map a Gemini usageMetadata.trafficType value to a LiteLLM service_tier string.

    This allows the same `_priority` / `_flex` cost-key suffix logic used for
    OpenAI/Azure to work for Gemini and Vertex AI models.

    trafficType values seen in practice
    ------------------------------------
    ON_DEMAND          -> standard pricing (service_tier = None)
    ON_DEMAND_PRIORITY -> priority pricing (service_tier = "priority")
    FLEX / BATCH       -> batch/flex pricing (service_tier = "flex")
    """
    if traffic_type is None:
        return None
    service_tier = _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER.get(traffic_type.upper())
    return service_tier
```

Gemini-specific mapping placed outside llms/

The _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER dictionary and _map_traffic_type_to_service_tier function are Gemini/Vertex AI-specific concepts (ON_DEMAND, ON_DEMAND_PRIORITY, FLEX, BATCH are Gemini traffic types). Per repository conventions, provider-specific code should live inside the llms/ directory (e.g., litellm/llms/gemini/cost_calculator.py or litellm/llms/vertex_ai/cost_calculator.py).

Consider moving this mapping to litellm/llms/gemini/cost_calculator.py or a shared Gemini/Vertex utility, and importing it here.

Context Used: Rule from dashboard - What: Avoid writing provider-specific code outside of the llms/ directory.

Why: This practice ensur... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines 17135 to +17144

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"input_cost_per_token_priority": 3.6e-06,
+"input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
+"output_cost_per_token_priority": 2.16e-05,
+"output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
+"cache_read_input_token_cost_priority": 3.6e-07,
+"cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
+"supports_service_tier": true
```

tpm field accidentally removed (backup JSON)

Same issue as the primary JSON file — the tpm: 800000 field for gemini/gemini-3.1-pro-preview was dropped when the priority pricing fields were added. Both files should be kept in sync.

Suggested change

```diff
 "supports_web_search": true,
 "supports_url_context": true,
 "supports_native_streaming": true,
-"tpm": 800000
+"tpm": 800000,
 "input_cost_per_token_priority": 3.6e-06,
 "input_cost_per_token_above_200k_tokens_priority": 7.2e-06,
 "output_cost_per_token_priority": 2.16e-05,
 "output_cost_per_token_above_200k_tokens_priority": 3.24e-05,
 "cache_read_input_token_cost_priority": 3.6e-07,
 "cache_read_input_token_cost_above_200k_tokens_priority": 7.2e-07,
 "supports_service_tier": true
```

@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Additional Comments (1)

litellm/llms/vertex_ai/cost_calculator.py
service_tier ignored in 128k pricing path

The _handle_128k_pricing function does not accept or use the service_tier parameter. When a Vertex AI model has input_cost_per_token_above_128k_tokens set (e.g., older gemini-1.5 models), this early-return path will always use standard pricing regardless of the traffic type. The service_tier parameter added to cost_per_token is never forwarded here.

Currently this only affects older models (gemini-1.5-flash etc.) that don't have priority pricing configured, but it creates a latent bug if priority pricing is ever added for those models.

@Sameerlite
Collaborator Author

@greptile-apps

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 2 comments


Comment on lines +710 to +737
```python
_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER: dict = {
    # ON_DEMAND_PRIORITY maps to "priority" — selects input_cost_per_token_priority, etc.
    "ON_DEMAND_PRIORITY": "priority",
    # FLEX / BATCH maps to "flex" — selects input_cost_per_token_flex, etc.
    "FLEX": "flex",
    "BATCH": "flex",
    # ON_DEMAND is standard pricing — no service_tier suffix applied
    "ON_DEMAND": None,
}


def _map_traffic_type_to_service_tier(traffic_type: Optional[str]) -> Optional[str]:
    """
    Map a Gemini usageMetadata.trafficType value to a LiteLLM service_tier string.

    This allows the same `_priority` / `_flex` cost-key suffix logic used for
    OpenAI/Azure to work for Gemini and Vertex AI models.

    trafficType values seen in practice
    ------------------------------------
    ON_DEMAND          -> standard pricing (service_tier = None)
    ON_DEMAND_PRIORITY -> priority pricing (service_tier = "priority")
    FLEX / BATCH       -> batch/flex pricing (service_tier = "flex")
    """
    if traffic_type is None:
        return None
    service_tier = _GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER.get(traffic_type.upper())
    return service_tier
```

Gemini-specific mapping outside llms/ directory

_GEMINI_TRAFFIC_TYPE_TO_SERVICE_TIER and _map_traffic_type_to_service_tier encode Gemini/Vertex AI-specific concepts (ON_DEMAND, ON_DEMAND_PRIORITY, FLEX, BATCH are Gemini traffic types). Per repository conventions, provider-specific code should live inside the llms/ directory (e.g., litellm/llms/gemini/cost_calculator.py or a shared Gemini/Vertex utility) and be imported here.

This keeps cost_calculator.py provider-agnostic and aligns with the existing pattern where provider-specific cost logic lives under litellm/llms/{provider}/.

Context Used: Rule from dashboard - What: Avoid writing provider-specific code outside of the llms/ directory.

Why: This practice ensur... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Additional Comments (1)

litellm/litellm_core_utils/llm_cost_calc/utils.py
Above-threshold cache costs ignore service_tier

The input and output above-threshold keys are correctly mapped to service_tier-specific variants (e.g. input_cost_per_token_above_200k_tokens_priority), but the cache read and cache creation above-threshold keys on lines 266-274 are not. Since the JSON data includes keys like cache_read_input_token_cost_above_200k_tokens_priority (which differs from the standard rate — e.g. 7.2e-07 vs 4e-07 for gemini-3.1-pro-preview), a priority-tier request exceeding 200k tokens will use the standard cache read rate instead of the higher priority rate.

The fix would be similar to what's done for input/output keys — use _get_service_tier_cost_key to construct the tiered cache keys when service_tier is set:

```python
                    # Apply tiered pricing to cache costs
                    cache_creation_tiered_key = (
                        _get_service_tier_cost_key(
                            f"cache_creation_input_token_cost_above_{threshold_str}_tokens",
                            service_tier,
                        )
                        if service_tier
                        else f"cache_creation_input_token_cost_above_{threshold_str}_tokens"
                    )
                    cache_creation_1hr_tiered_key = (
                        f"cache_creation_input_token_cost_above_1hr_above_{threshold_str}_tokens"
                    )
                    cache_read_tiered_key = (
                        _get_service_tier_cost_key(
                            f"cache_read_input_token_cost_above_{threshold_str}_tokens",
                            service_tier,
                        )
                        if service_tier
                        else f"cache_read_input_token_cost_above_{threshold_str}_tokens"
                    )
```

@Sameerlite Sameerlite merged commit f97ee62 into main Feb 23, 2026
32 of 34 checks passed