
Day 0 gemini 3.1 flash lite preview support #22674

Merged
Sameerlite merged 4 commits into main from litellm_gemini-3.1-flash-lite-preview
Mar 3, 2026

Conversation

@Sameerlite (Collaborator)

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

@vercel

vercel bot commented Mar 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| litellm | Error | Mar 3, 2026 4:53pm |


@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR adds day 0 support for gemini-3.1-flash-lite-preview by adding model entries to the pricing JSON (bare, gemini/, and vertex_ai/ variants), updating provider docs, adding a blog post, and including a cost calculation test.

  • Model pricing and capability flags added consistently across all three JSON key variants in model_prices_and_context_window.json and backup
  • New cost calculation test correctly validates reasoning token pricing for the model
  • Issue: The test change in test_vertex_and_google_ai_studio_gemini.py reverses an assertion (from thinkingConfig in result to thinkingConfig not in result) without a corresponding code change — the production code still auto-adds thinkingConfig for Gemini 3+ models, so this test will fail
  • Issue: The blog post's reasoning_effort mapping table is inaccurate — _map_reasoning_effort_to_thinking_level doesn't recognize gemini-3.1-flash-lite-preview in its is_gemini3flash check, so minimal/medium/disable/none map to different levels than documented
  • The backup JSON diff includes many unrelated changes (Anthropic cache pricing, deprecation date updates) that appear to be a full sync

Confidence Score: 2/5

  • PR has a failing test and documented behavior that doesn't match the code — needs fixes before merging
  • The model pricing entries look correct, but there are two significant issues: (1) a test assertion was changed to contradict the actual code behavior, meaning the test will fail, and (2) the blog post documents reasoning_effort mappings that don't match what the code actually does for this model name. The thinking level handling code needs to be updated to recognize gemini-3.1-flash-lite-preview.
  • Pay close attention to tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py (contradicts code) and docs/my-website/blog/gemini_3_1_flash_lite/index.md (inaccurate mapping table)

Important Files Changed

| Filename | Overview |
| --- | --- |
| model_prices_and_context_window.json | Adds gemini-3.1-flash-lite-preview entries for bare, gemini/, and vertex_ai/ providers with pricing, token limits, and capability flags. Entries are consistent across all three variants. |
| litellm/model_prices_and_context_window_backup.json | Syncs backup JSON with main file. Includes gemini-3.1-flash-lite-preview entries and many other unrelated changes (cache pricing for Anthropic models, deprecation date updates). |
| tests/test_litellm/litellm_core_utils/llm_cost_calc/test_llm_cost_calc_utils.py | Adds mock-based cost calculation test for gemini-3.1-flash-lite-preview with reasoning tokens. Test logic is correct and validates prompt/completion cost math. |
| tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py | Changes test assertion from expecting thinkingConfig to be present to expecting it absent, but the underlying code still auto-adds thinkingConfig for Gemini 3+ models. This test will fail. |
| docs/my-website/blog/gemini_3_1_flash_lite/index.md | New blog post for gemini-3.1-flash-lite-preview. The reasoning_effort mapping table is inaccurate — the code doesn't handle this model name in the is_gemini3flash check. |
| docs/my-website/docs/providers/gemini.md | Adds gemini-3.1-flash-lite-preview to the supported models table. Straightforward documentation addition. |
| docs/my-website/docs/providers/vertex.md | Adds gemini-3.1-flash-lite-preview to the Vertex AI supported models table. Straightforward documentation addition. |
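The cost calculation test described above exercises simple per-token arithmetic. A minimal, hypothetical sketch of that math (prices mirror the gemini-3.1-flash-lite-preview JSON entry; `estimate_cost` is illustrative, not LiteLLM's API):

```python
# Illustrative cost math for gemini-3.1-flash-lite-preview.
# Prices are taken from the pricing JSON entry; the helper is hypothetical.
INPUT_COST_PER_TOKEN = 2.5e-07       # input_cost_per_token
OUTPUT_COST_PER_TOKEN = 1.5e-06      # output_cost_per_token
REASONING_COST_PER_TOKEN = 1.5e-06   # output_cost_per_reasoning_token

def estimate_cost(prompt_tokens: int, completion_tokens: int, reasoning_tokens: int) -> float:
    """Total request cost in USD for the given token counts."""
    prompt_cost = prompt_tokens * INPUT_COST_PER_TOKEN
    # Reasoning tokens are billed on top of visible completion tokens.
    completion_cost = (
        completion_tokens * OUTPUT_COST_PER_TOKEN
        + reasoning_tokens * REASONING_COST_PER_TOKEN
    )
    return prompt_cost + completion_cost

# 1000 prompt + 200 completion + 100 reasoning tokens
print(f"{estimate_cost(1000, 200, 100):.6f}")  # → 0.000700
```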

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User calls completion with gemini-3.1-flash-lite-preview] --> B{Provider prefix?}
    B -->|bare key| C[Resolves to vertex_ai-language-models]
    B -->|gemini/| D[Resolves to gemini provider]
    B -->|vertex_ai/| E[Resolves to vertex_ai-language-models]
    C --> F[map_openai_params]
    D --> F
    E --> F
    F --> G{_is_gemini_3_or_newer?}
    G -->|Yes: contains gemini-3| H{reasoning_effort provided?}
    H -->|Yes| I[_map_reasoning_effort_to_thinking_level]
    H -->|No| J[Auto-add thinkingConfig with thinkingLevel=low]
    I --> K{is_gemini3flash check}
    K -->|No match for 3.1-flash-lite| L[Falls to else branch: minimal→low, medium→high]
    K -->|Would match if updated| M[Correct mapping: minimal→minimal, medium→medium]
    J --> N[Send request to Gemini API]
    L --> N
    M --> N
```
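The routing step at the top of the flowchart can be sketched as a plain prefix lookup. This is a hedged illustration: the function name is hypothetical, and the provider strings mirror the litellm_provider values in the pricing JSON.

```python
def resolve_provider(model: str) -> str:
    # Hypothetical sketch: map the model key prefix to the
    # litellm_provider recorded in model_prices_and_context_window.json.
    if model.startswith("gemini/"):
        return "gemini"
    if model.startswith("vertex_ai/"):
        return "vertex_ai-language-models"
    # Bare keys also resolve to the vertex_ai-language-models entry.
    return "vertex_ai-language-models"

print(resolve_provider("gemini/gemini-3.1-flash-lite-preview"))  # → gemini
```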

Last reviewed commit: 9d06106

@greptile-apps bot left a comment


7 files reviewed, 4 comments


Comment on lines +14337 to +14386
```json
"gemini-3.1-flash-lite-preview": {
"cache_read_input_token_cost": 2.5e-08,
"cache_read_input_token_cost_per_audio_token": 5e-08,
"input_cost_per_audio_token": 5e-07,
"input_cost_per_token": 2.5e-07,
"litellm_provider": "vertex_ai-language-models",
"max_audio_length_hours": 8.4,
"max_audio_per_prompt": 1,
"max_images_per_prompt": 3000,
"max_input_tokens": 1048576,
"max_output_tokens": 65536,
"max_pdf_size_mb": 30,
"max_tokens": 65536,
"max_video_length": 1,
"max_videos_per_prompt": 10,
"mode": "chat",
"output_cost_per_reasoning_token": 1.5e-06,
"output_cost_per_token": 1.5e-06,
"source": "https://ai.google.dev/gemini-api/docs/models",
"supported_endpoints": [
"/v1/chat/completions",
"/v1/completions",
"/v1/batch"
],
"supported_modalities": [
"text",
"image",
"audio",
"video"
],
"supported_output_modalities": [
"text"
],
"supports_audio_input": true,
"supports_audio_output": false,
"supports_code_execution": true,
"supports_file_search": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_pdf_input": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"supports_url_context": true,
"supports_video_input": true,
"supports_vision": true,
"supports_web_search": true
},
```
Missing supports_native_streaming on bare key

The bare gemini-3.1-flash-lite-preview entry is missing "supports_native_streaming": true, while both gemini/gemini-3.1-flash-lite-preview (line 17193) and vertex_ai/gemini-3.1-flash-lite-preview (line 32457) include it. Other comparable bare-key entries like gemini-3-pro-preview (line 14877) also have this field. This inconsistency could cause streaming behavior to differ depending on which key is used to look up the model.
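Inconsistencies like this can be caught mechanically by diffing capability keys across the three key variants. A hedged sketch using inline stand-in data (a real check would json.load model_prices_and_context_window.json instead):

```python
# Stand-in data illustrating the reported inconsistency; only the
# supports_native_streaming flag differs between variants here.
pricing = {
    "gemini-3.1-flash-lite-preview": {"supports_reasoning": True},
    "gemini/gemini-3.1-flash-lite-preview": {
        "supports_reasoning": True,
        "supports_native_streaming": True,
    },
    "vertex_ai/gemini-3.1-flash-lite-preview": {
        "supports_reasoning": True,
        "supports_native_streaming": True,
    },
}

base = "gemini-3.1-flash-lite-preview"
variants = [base, f"gemini/{base}", f"vertex_ai/{base}"]

# Union of all keys across variants, then flag any variant missing a key.
all_keys = set().union(*(pricing[v] for v in variants))
for key in sorted(all_keys):
    missing = [v for v in variants if key not in pricing[v]]
    if missing:
        print(f"{key} missing from: {', '.join(missing)}")
```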

Suggested change (the same entry with "supports_native_streaming": true added):

```json
"gemini-3.1-flash-lite-preview": {
"cache_read_input_token_cost": 2.5e-08,
"cache_read_input_token_cost_per_audio_token": 5e-08,
"input_cost_per_audio_token": 5e-07,
"input_cost_per_token": 2.5e-07,
"litellm_provider": "vertex_ai-language-models",
"max_audio_length_hours": 8.4,
"max_audio_per_prompt": 1,
"max_images_per_prompt": 3000,
"max_input_tokens": 1048576,
"max_output_tokens": 65536,
"max_pdf_size_mb": 30,
"max_tokens": 65536,
"max_video_length": 1,
"max_videos_per_prompt": 10,
"mode": "chat",
"output_cost_per_reasoning_token": 1.5e-06,
"output_cost_per_token": 1.5e-06,
"source": "https://ai.google.dev/gemini-api/docs/models",
"supported_endpoints": [
"/v1/chat/completions",
"/v1/completions",
"/v1/batch"
],
"supported_modalities": [
"text",
"image",
"audio",
"video"
],
"supported_output_modalities": [
"text"
],
"supports_audio_input": true,
"supports_audio_output": false,
"supports_code_execution": true,
"supports_file_search": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_pdf_input": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_system_messages": true,
"supports_tool_choice": true,
"supports_url_context": true,
"supports_video_input": true,
"supports_vision": true,
"supports_web_search": true,
"supports_native_streaming": true
},
```

Comment on lines +32 to +52
If you only want cost tracking, no change to your current LiteLLM version is needed. But if you want support for the new features introduced alongside it, such as thinking levels, you will need v1.80.8-stable.1 or above.
:::

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

``` showLineNumbers title="docker run litellm"
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==v1.80.8-stable.1
```

Outdated version references in blog post

The blog post references v1.80.8-stable.1 for both Docker and pip install, but the current litellm version is 1.82.0 (per pyproject.toml). These version references appear outdated. Additionally, the pip install litellm==v1.80.8-stable.1 uses a v prefix which is non-standard for pip version specifiers — it should typically be pip install litellm==1.80.8.

@Sameerlite merged commit daa0397 into main on Mar 3, 2026
1 of 85 checks passed
Comment on lines +2477 to +2478
```python
# Should NOT have thinkingConfig automatically added when user provides no reasoning_effort
assert "thinkingConfig" not in result
```

Test assertion contradicts actual code behavior

This test now asserts "thinkingConfig" not in result, but no corresponding change was made to the production code. The map_openai_params method in vertex_and_google_ai_studio_gemini.py (around line 1085-1099) still auto-adds thinkingConfig with a default thinkingLevel for all Gemini 3+ non-image models when no reasoning_effort is provided:

```python
if VertexGeminiConfig._is_gemini_3_or_newer(model):
    ...
    thinking_config["thinkingLevel"] = (
        "minimal" if is_gemini3flash else "low"
    )
    optional_params["thinkingConfig"] = thinking_config
```

Since _is_gemini_3_or_newer checks for "gemini-3" in model, this will match gemini-3-pro-preview and auto-add thinkingConfig. This means this modified assertion will fail at runtime. Either the code needs to be updated to stop auto-adding thinkingConfig, or this test change should be reverted.
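The reviewer's point can be reproduced in isolation. A hypothetical sketch of the gate (names mirror the review's description of vertex_and_google_ai_studio_gemini.py, not the actual library code):

```python
def is_gemini_3_or_newer(model: str) -> bool:
    # Per the review, the gate is a plain substring check.
    return "gemini-3" in model

def map_openai_params(model: str, reasoning_effort=None) -> dict:
    """Illustrative: auto-add thinkingConfig for Gemini 3+ when no effort is given."""
    optional_params = {}
    if is_gemini_3_or_newer(model) and reasoning_effort is None:
        is_gemini3flash = "gemini-3-flash" in model
        optional_params["thinkingConfig"] = {
            "thinkingLevel": "minimal" if is_gemini3flash else "low"
        }
    return optional_params

# gemini-3.1-flash-lite-preview contains "gemini-3", so thinkingConfig IS added,
# which is why the flipped test assertion would fail.
print("thinkingConfig" in map_openai_params("gemini-3.1-flash-lite-preview"))  # → True
```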

Comment on lines +170 to +175
| `reasoning_effort` | `thinkingLevel` | Use case |
| --- | --- | --- |
| `minimal` | `minimal` | Ultra-fast responses, simple queries |
| `low` | `low` | Basic instruction following |
| `medium` | `medium` | Balanced reasoning for moderate complexity |
| `high` | `high` | Maximum reasoning depth, complex problems |
| `disable` | `minimal` | Disable extended reasoning |
| `none` | `minimal` | No extended reasoning |

Blog reasoning_effort mapping table is inaccurate for this model

The mapping table claims minimal → minimal, disable → minimal, none → minimal, and medium → medium. However, looking at _map_reasoning_effort_to_thinking_level in vertex_and_google_ai_studio_gemini.py, the is_gemini3flash check only matches "gemini-3-flash-preview" or "gemini-3-flash" — it does not match gemini-3.1-flash-lite-preview. Similarly, is_gemini31pro only matches "gemini-3.1-pro-preview".

As a result, for gemini-3.1-flash-lite-preview:

  • minimal actually maps to low (not minimal)
  • medium actually maps to high (not medium)
  • disable and none actually map to low (not minimal)

Either the code needs to be updated to recognize gemini-3.1-flash-lite-preview as a model that supports minimal/medium thinking levels, or the table should reflect the actual behavior.
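One possible fix, sketched under the review's description of the mapping (the function and model lists here are illustrative, not the actual LiteLLM code), is to extend the flash check so the new model gets the documented mappings:

```python
def map_reasoning_effort_to_thinking_level(reasoning_effort: str, model: str) -> str:
    # Hypothetical fix: treat the new lite model like the other flash models
    # so minimal/medium/disable/none map as the blog post documents.
    supports_all_levels = any(
        name in model
        for name in ("gemini-3-flash", "gemini-3.1-flash-lite-preview")
    )
    if supports_all_levels:
        mapping = {
            "minimal": "minimal", "low": "low", "medium": "medium",
            "high": "high", "disable": "minimal", "none": "minimal",
        }
    else:
        # Behavior the review describes for unrecognized models:
        # only "low" and "high" thinking levels are used.
        mapping = {
            "minimal": "low", "low": "low", "medium": "high",
            "high": "high", "disable": "low", "none": "low",
        }
    return mapping[reasoning_effort]

print(map_reasoning_effort_to_thinking_level("medium", "gemini-3.1-flash-lite-preview"))  # → medium
```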
