Day 0 gemini-3.1-flash-lite-preview support #22674
Conversation
Greptile Summary: This PR adds day 0 support for gemini-3.1-flash-lite-preview.
Confidence Score: 2/5
|
| Filename | Overview |
|---|---|
| model_prices_and_context_window.json | Adds gemini-3.1-flash-lite-preview entries for bare, gemini/, and vertex_ai/ providers with pricing, token limits, and capability flags. Entries are consistent across all three variants. |
| litellm/model_prices_and_context_window_backup.json | Syncs backup JSON with main file. Includes gemini-3.1-flash-lite-preview entries and many other unrelated changes (cache pricing for Anthropic models, deprecation date updates). |
| tests/test_litellm/litellm_core_utils/llm_cost_calc/test_llm_cost_calc_utils.py | Adds mock-based cost calculation test for gemini-3.1-flash-lite-preview with reasoning tokens. Test logic is correct and validates prompt/completion cost math. |
| tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py | Changes test assertion from expecting thinkingConfig to be present to expecting it absent, but the underlying code still auto-adds thinkingConfig for Gemini 3+ models. This test will fail. |
| docs/my-website/blog/gemini_3_1_flash_lite/index.md | New blog post for gemini-3.1-flash-lite-preview. The reasoning_effort mapping table is inaccurate — the code doesn't handle this model name in the is_gemini3flash check. |
| docs/my-website/docs/providers/gemini.md | Adds gemini-3.1-flash-lite-preview to the supported models table. Straightforward documentation addition. |
| docs/my-website/docs/providers/vertex.md | Adds gemini-3.1-flash-lite-preview to the Vertex AI supported models table. Straightforward documentation addition. |
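The cost-calculation test mentioned above validates prompt/completion cost math for reasoning tokens. A rough, self-contained sketch of that math (not LiteLLM's actual implementation) is shown below, using the per-token prices from the model_prices JSON added in this PR; billing reasoning tokens at the output rate is an assumption based on `output_cost_per_reasoning_token` equaling `output_cost_per_token` in this entry:

```python
# Hypothetical sketch of the prompt/completion cost math for
# gemini-3.1-flash-lite-preview, using the per-token prices from the
# model_prices JSON in this PR (not LiteLLM's actual cost calculator).
INPUT_COST_PER_TOKEN = 2.5e-07
OUTPUT_COST_PER_TOKEN = 1.5e-06
OUTPUT_COST_PER_REASONING_TOKEN = 1.5e-06  # same rate as regular output here

def completion_cost(prompt_tokens: int, completion_tokens: int,
                    reasoning_tokens: int = 0) -> tuple[float, float]:
    prompt_cost = prompt_tokens * INPUT_COST_PER_TOKEN
    # Reasoning tokens are counted separately but billed at the same rate.
    output_cost = (completion_tokens * OUTPUT_COST_PER_TOKEN
                   + reasoning_tokens * OUTPUT_COST_PER_REASONING_TOKEN)
    return prompt_cost, output_cost

prompt_cost, output_cost = completion_cost(1000, 200, reasoning_tokens=300)
# 1000 * 2.5e-07 = 2.5e-04; (200 + 300) * 1.5e-06 = 7.5e-04
```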
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User calls completion with gemini-3.1-flash-lite-preview] --> B{Provider prefix?}
    B -->|bare key| C[Resolves to vertex_ai-language-models]
    B -->|gemini/| D[Resolves to gemini provider]
    B -->|vertex_ai/| E[Resolves to vertex_ai-language-models]
    C --> F[map_openai_params]
    D --> F
    E --> F
    F --> G{_is_gemini_3_or_newer?}
    G -->|Yes: contains gemini-3| H{reasoning_effort provided?}
    H -->|Yes| I[_map_reasoning_effort_to_thinking_level]
    H -->|No| J[Auto-add thinkingConfig with thinkingLevel=low]
    I --> K{is_gemini3flash check}
    K -->|No match for 3.1-flash-lite| L[Falls to else branch: minimal→low, medium→high]
    K -->|Would match if updated| M[Correct mapping: minimal→minimal, medium→medium]
    J --> N[Send request to Gemini API]
    L --> N
    M --> N
```
Last reviewed commit: 9d06106
```json
"gemini-3.1-flash-lite-preview": {
    "cache_read_input_token_cost": 2.5e-08,
    "cache_read_input_token_cost_per_audio_token": 5e-08,
    "input_cost_per_audio_token": 5e-07,
    "input_cost_per_token": 2.5e-07,
    "litellm_provider": "vertex_ai-language-models",
    "max_audio_length_hours": 8.4,
    "max_audio_per_prompt": 1,
    "max_images_per_prompt": 3000,
    "max_input_tokens": 1048576,
    "max_output_tokens": 65536,
    "max_pdf_size_mb": 30,
    "max_tokens": 65536,
    "max_video_length": 1,
    "max_videos_per_prompt": 10,
    "mode": "chat",
    "output_cost_per_reasoning_token": 1.5e-06,
    "output_cost_per_token": 1.5e-06,
    "source": "https://ai.google.dev/gemini-api/docs/models",
    "supported_endpoints": [
        "/v1/chat/completions",
        "/v1/completions",
        "/v1/batch"
    ],
    "supported_modalities": [
        "text",
        "image",
        "audio",
        "video"
    ],
    "supported_output_modalities": [
        "text"
    ],
    "supports_audio_input": true,
    "supports_audio_output": false,
    "supports_code_execution": true,
    "supports_file_search": true,
    "supports_function_calling": true,
    "supports_parallel_function_calling": true,
    "supports_pdf_input": true,
    "supports_prompt_caching": true,
    "supports_reasoning": true,
    "supports_response_schema": true,
    "supports_system_messages": true,
    "supports_tool_choice": true,
    "supports_url_context": true,
    "supports_video_input": true,
    "supports_vision": true,
    "supports_web_search": true
},
```
Missing supports_native_streaming on bare key
The bare gemini-3.1-flash-lite-preview entry is missing "supports_native_streaming": true, while both gemini/gemini-3.1-flash-lite-preview (line 17193) and vertex_ai/gemini-3.1-flash-lite-preview (line 32457) include it. Other comparable bare-key entries like gemini-3-pro-preview (line 14877) also have this field. This inconsistency could cause streaming behavior to differ depending on which key is used to look up the model.
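One way to surface this kind of drift is to diff the `supports_*` flags across the model's variants. The helper below is hypothetical (not part of this PR or LiteLLM) and operates on plain dicts shaped like the model_prices entries:

```python
import json

def flag_diff(entries: dict) -> dict:
    """Return supports_* flags whose values differ across model entries.

    `entries` maps a variant name (e.g. "bare", "gemini/") to its
    model_prices dict; a flag missing from one variant shows up as None.
    """
    all_flags = set()
    for entry in entries.values():
        all_flags |= {k for k in entry if k.startswith("supports_")}
    return {
        flag: {name: entry.get(flag) for name, entry in entries.items()}
        for flag in sorted(all_flags)
        if len({json.dumps(entry.get(flag)) for entry in entries.values()}) > 1
    }

# Minimal illustration of the inconsistency described in this comment:
# the bare key lacks supports_native_streaming while the prefixed key has it.
bare = {"supports_reasoning": True}
prefixed = {"supports_reasoning": True, "supports_native_streaming": True}
diff = flag_diff({"bare": bare, "gemini/": prefixed})
# diff == {"supports_native_streaming": {"bare": None, "gemini/": True}}
```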
Suggested change:

```json
"gemini-3.1-flash-lite-preview": {
    "cache_read_input_token_cost": 2.5e-08,
    "cache_read_input_token_cost_per_audio_token": 5e-08,
    "input_cost_per_audio_token": 5e-07,
    "input_cost_per_token": 2.5e-07,
    "litellm_provider": "vertex_ai-language-models",
    "max_audio_length_hours": 8.4,
    "max_audio_per_prompt": 1,
    "max_images_per_prompt": 3000,
    "max_input_tokens": 1048576,
    "max_output_tokens": 65536,
    "max_pdf_size_mb": 30,
    "max_tokens": 65536,
    "max_video_length": 1,
    "max_videos_per_prompt": 10,
    "mode": "chat",
    "output_cost_per_reasoning_token": 1.5e-06,
    "output_cost_per_token": 1.5e-06,
    "source": "https://ai.google.dev/gemini-api/docs/models",
    "supported_endpoints": [
        "/v1/chat/completions",
        "/v1/completions",
        "/v1/batch"
    ],
    "supported_modalities": [
        "text",
        "image",
        "audio",
        "video"
    ],
    "supported_output_modalities": [
        "text"
    ],
    "supports_audio_input": true,
    "supports_audio_output": false,
    "supports_code_execution": true,
    "supports_file_search": true,
    "supports_function_calling": true,
    "supports_parallel_function_calling": true,
    "supports_pdf_input": true,
    "supports_prompt_caching": true,
    "supports_reasoning": true,
    "supports_response_schema": true,
    "supports_system_messages": true,
    "supports_tool_choice": true,
    "supports_url_context": true,
    "supports_video_input": true,
    "supports_vision": true,
    "supports_web_search": true,
    "supports_native_streaming": true
},
```
If you only want cost tracking, no change to your current LiteLLM version is needed. But if you want support for the new features introduced along with it, like thinking levels, you will need v1.80.8-stable.1 or above.
:::

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

``` showLineNumbers title="docker run litellm"
docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==v1.80.8-stable.1
```
Outdated version references in blog post
The blog post references v1.80.8-stable.1 for both the Docker and pip install instructions, but the current litellm version is 1.82.0 (per pyproject.toml), so these references appear outdated. Additionally, `pip install litellm==v1.80.8-stable.1` uses a `v` prefix, which is non-standard for pip version specifiers; it should typically be `pip install litellm==1.80.8`.
```python
# Should NOT have thinkingConfig automatically added when user provides no reasoning_effort
assert "thinkingConfig" not in result
```
Test assertion contradicts actual code behavior
This test now asserts "thinkingConfig" not in result, but no corresponding change was made to the production code. The map_openai_params method in vertex_and_google_ai_studio_gemini.py (around line 1085-1099) still auto-adds thinkingConfig with a default thinkingLevel for all Gemini 3+ non-image models when no reasoning_effort is provided:
```python
if VertexGeminiConfig._is_gemini_3_or_newer(model):
    ...
    thinking_config["thinkingLevel"] = (
        "minimal" if is_gemini3flash else "low"
    )
    optional_params["thinkingConfig"] = thinking_config
```

Since `_is_gemini_3_or_newer` checks for `"gemini-3"` in the model name, it will match `gemini-3-pro-preview` and auto-add `thinkingConfig`, so the modified assertion will fail at runtime. Either the code needs to be updated to stop auto-adding `thinkingConfig`, or this test change should be reverted.
| `reasoning_effort` | `thinkingLevel` | Use case |
|---|---|---|
| `minimal` | `minimal` | Ultra-fast responses, simple queries |
| `low` | `low` | Basic instruction following |
| `medium` | `medium` | Balanced reasoning for moderate complexity |
| `high` | `high` | Maximum reasoning depth, complex problems |
| `disable` | `minimal` | Disable extended reasoning |
| `none` | `minimal` | No extended reasoning |
Blog reasoning_effort mapping table is inaccurate for this model
The mapping table claims minimal → minimal, disable → minimal, none → minimal, and medium → medium. However, looking at _map_reasoning_effort_to_thinking_level in vertex_and_google_ai_studio_gemini.py, the is_gemini3flash check only matches "gemini-3-flash-preview" or "gemini-3-flash" — it does not match gemini-3.1-flash-lite-preview. Similarly, is_gemini31pro only matches "gemini-3.1-pro-preview".
As a result, for gemini-3.1-flash-lite-preview:
- `minimal` actually maps to `low` (not `minimal`)
- `medium` actually maps to `high` (not `medium`)
- `disable` and `none` actually map to `low` (not `minimal`)
Either the code needs to be updated to recognize gemini-3.1-flash-lite-preview as a model that supports minimal/medium thinking levels, or the table should reflect the actual behavior.
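Under that reading of the code, the effective mapping can be sketched as follows; this is a simplified reconstruction based on the review above, not the actual LiteLLM source (it omits the `is_gemini31pro` branch):

```python
def map_reasoning_effort_to_thinking_level(model: str, effort: str) -> str:
    # Simplified reconstruction of _map_reasoning_effort_to_thinking_level:
    # only these exact names are recognized as Gemini 3 Flash.
    is_gemini3flash = model in ("gemini-3-flash-preview", "gemini-3-flash")
    if is_gemini3flash:
        # Recognized models get the full level set, as the blog table claims.
        return {"minimal": "minimal", "low": "low", "medium": "medium",
                "high": "high", "disable": "minimal", "none": "minimal"}[effort]
    # gemini-3.1-flash-lite-preview is NOT matched, so it falls through here.
    return {"minimal": "low", "low": "low", "medium": "high",
            "high": "high", "disable": "low", "none": "low"}[effort]

assert map_reasoning_effort_to_thinking_level(
    "gemini-3.1-flash-lite-preview", "minimal") == "low"
assert map_reasoning_effort_to_thinking_level(
    "gemini-3.1-flash-lite-preview", "medium") == "high"
```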
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- [ ] Add testing in the `tests/test_litellm/` directory. Adding at least 1 test is a hard requirement - see details
- [ ] Pass `make test-unit`
- [ ] Tag `@greptileai` and receive a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
Changes