Day 0 gemini-3.1-flash-lite-preview support #22674
Conversation
Greptile Summary: This PR adds day 0 support for gemini-3.1-flash-lite-preview.
Confidence Score: 2/5
|
| Filename | Overview |
|---|---|
| model_prices_and_context_window.json | Adds gemini-3.1-flash-lite-preview entries for bare, gemini/, and vertex_ai/ providers with pricing, token limits, and capability flags. Entries are consistent across all three variants. |
| litellm/model_prices_and_context_window_backup.json | Syncs backup JSON with main file. Includes gemini-3.1-flash-lite-preview entries and many other unrelated changes (cache pricing for Anthropic models, deprecation date updates). |
| tests/test_litellm/litellm_core_utils/llm_cost_calc/test_llm_cost_calc_utils.py | Adds mock-based cost calculation test for gemini-3.1-flash-lite-preview with reasoning tokens. Test logic is correct and validates prompt/completion cost math. |
| tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py | Changes test assertion from expecting thinkingConfig to be present to expecting it absent, but the underlying code still auto-adds thinkingConfig for Gemini 3+ models. This test will fail. |
| docs/my-website/blog/gemini_3_1_flash_lite/index.md | New blog post for gemini-3.1-flash-lite-preview. The reasoning_effort mapping table is inaccurate — the code doesn't handle this model name in the is_gemini3flash check. |
| docs/my-website/docs/providers/gemini.md | Adds gemini-3.1-flash-lite-preview to the supported models table. Straightforward documentation addition. |
| docs/my-website/docs/providers/vertex.md | Adds gemini-3.1-flash-lite-preview to the Vertex AI supported models table. Straightforward documentation addition. |
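The cost-calculation test mentioned above validates prompt/completion cost math for reasoning tokens. A rough, self-contained sketch of that math (not LiteLLM's actual implementation) is shown below, using the per-token prices from the model_prices JSON added in this PR; billing reasoning tokens at the output rate is an assumption based on `output_cost_per_reasoning_token` equaling `output_cost_per_token` in this entry:

```python
# Hypothetical sketch of the prompt/completion cost math for
# gemini-3.1-flash-lite-preview, using the per-token prices from the
# model_prices JSON in this PR (not LiteLLM's actual cost calculator).
INPUT_COST_PER_TOKEN = 2.5e-07
OUTPUT_COST_PER_TOKEN = 1.5e-06
OUTPUT_COST_PER_REASONING_TOKEN = 1.5e-06  # same rate as regular output here

def completion_cost(prompt_tokens: int, completion_tokens: int,
                    reasoning_tokens: int = 0) -> tuple[float, float]:
    prompt_cost = prompt_tokens * INPUT_COST_PER_TOKEN
    # Reasoning tokens are counted separately but billed at the same rate.
    output_cost = (completion_tokens * OUTPUT_COST_PER_TOKEN
                   + reasoning_tokens * OUTPUT_COST_PER_REASONING_TOKEN)
    return prompt_cost, output_cost

prompt_cost, output_cost = completion_cost(1000, 200, reasoning_tokens=300)
# 1000 * 2.5e-07 = 2.5e-04; (200 + 300) * 1.5e-06 = 7.5e-04
```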
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User calls completion with gemini-3.1-flash-lite-preview] --> B{Provider prefix?}
    B -->|bare key| C[Resolves to vertex_ai-language-models]
    B -->|gemini/| D[Resolves to gemini provider]
    B -->|vertex_ai/| E[Resolves to vertex_ai-language-models]
    C --> F[map_openai_params]
    D --> F
    E --> F
    F --> G{_is_gemini_3_or_newer?}
    G -->|Yes: contains gemini-3| H{reasoning_effort provided?}
    H -->|Yes| I[_map_reasoning_effort_to_thinking_level]
    H -->|No| J[Auto-add thinkingConfig with thinkingLevel=low]
    I --> K{is_gemini3flash check}
    K -->|No match for 3.1-flash-lite| L[Falls to else branch: minimal→low, medium→high]
    K -->|Would match if updated| M[Correct mapping: minimal→minimal, medium→medium]
    J --> N[Send request to Gemini API]
    L --> N
    M --> N
```
Last reviewed commit: 9d06106
```json
"gemini-3.1-flash-lite-preview": {
    "cache_read_input_token_cost": 2.5e-08,
    "cache_read_input_token_cost_per_audio_token": 5e-08,
    "input_cost_per_audio_token": 5e-07,
    "input_cost_per_token": 2.5e-07,
    "litellm_provider": "vertex_ai-language-models",
    "max_audio_length_hours": 8.4,
    "max_audio_per_prompt": 1,
    "max_images_per_prompt": 3000,
    "max_input_tokens": 1048576,
    "max_output_tokens": 65536,
    "max_pdf_size_mb": 30,
    "max_tokens": 65536,
    "max_video_length": 1,
    "max_videos_per_prompt": 10,
    "mode": "chat",
    "output_cost_per_reasoning_token": 1.5e-06,
    "output_cost_per_token": 1.5e-06,
    "source": "https://ai.google.dev/gemini-api/docs/models",
    "supported_endpoints": [
        "/v1/chat/completions",
        "/v1/completions",
        "/v1/batch"
    ],
    "supported_modalities": [
        "text",
        "image",
        "audio",
        "video"
    ],
    "supported_output_modalities": [
        "text"
    ],
    "supports_audio_input": true,
    "supports_audio_output": false,
    "supports_code_execution": true,
    "supports_file_search": true,
    "supports_function_calling": true,
    "supports_parallel_function_calling": true,
    "supports_pdf_input": true,
    "supports_prompt_caching": true,
    "supports_reasoning": true,
    "supports_response_schema": true,
    "supports_system_messages": true,
    "supports_tool_choice": true,
    "supports_url_context": true,
    "supports_video_input": true,
    "supports_vision": true,
    "supports_web_search": true
},
```
Missing supports_native_streaming on bare key
The bare gemini-3.1-flash-lite-preview entry is missing "supports_native_streaming": true, while both gemini/gemini-3.1-flash-lite-preview (line 17193) and vertex_ai/gemini-3.1-flash-lite-preview (line 32457) include it. Other comparable bare-key entries like gemini-3-pro-preview (line 14877) also have this field. This inconsistency could cause streaming behavior to differ depending on which key is used to look up the model.
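One way to surface this kind of drift is to diff the `supports_*` flags across the model's variants. The helper below is hypothetical (not part of this PR or LiteLLM) and operates on plain dicts shaped like the model_prices entries:

```python
import json

def flag_diff(entries: dict) -> dict:
    """Return supports_* flags whose values differ across model entries.

    `entries` maps a variant name (e.g. "bare", "gemini/") to its
    model_prices dict; a flag missing from one variant shows up as None.
    """
    all_flags = set()
    for entry in entries.values():
        all_flags |= {k for k in entry if k.startswith("supports_")}
    return {
        flag: {name: entry.get(flag) for name, entry in entries.items()}
        for flag in sorted(all_flags)
        if len({json.dumps(entry.get(flag)) for entry in entries.values()}) > 1
    }

# Minimal illustration of the inconsistency described in this comment:
# the bare key lacks supports_native_streaming while the prefixed key has it.
bare = {"supports_reasoning": True}
prefixed = {"supports_reasoning": True, "supports_native_streaming": True}
diff = flag_diff({"bare": bare, "gemini/": prefixed})
# diff == {"supports_native_streaming": {"bare": None, "gemini/": True}}
```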
Suggested change:

```json
"gemini-3.1-flash-lite-preview": {
    "cache_read_input_token_cost": 2.5e-08,
    "cache_read_input_token_cost_per_audio_token": 5e-08,
    "input_cost_per_audio_token": 5e-07,
    "input_cost_per_token": 2.5e-07,
    "litellm_provider": "vertex_ai-language-models",
    "max_audio_length_hours": 8.4,
    "max_audio_per_prompt": 1,
    "max_images_per_prompt": 3000,
    "max_input_tokens": 1048576,
    "max_output_tokens": 65536,
    "max_pdf_size_mb": 30,
    "max_tokens": 65536,
    "max_video_length": 1,
    "max_videos_per_prompt": 10,
    "mode": "chat",
    "output_cost_per_reasoning_token": 1.5e-06,
    "output_cost_per_token": 1.5e-06,
    "source": "https://ai.google.dev/gemini-api/docs/models",
    "supported_endpoints": [
        "/v1/chat/completions",
        "/v1/completions",
        "/v1/batch"
    ],
    "supported_modalities": [
        "text",
        "image",
        "audio",
        "video"
    ],
    "supported_output_modalities": [
        "text"
    ],
    "supports_audio_input": true,
    "supports_audio_output": false,
    "supports_code_execution": true,
    "supports_file_search": true,
    "supports_function_calling": true,
    "supports_parallel_function_calling": true,
    "supports_pdf_input": true,
    "supports_prompt_caching": true,
    "supports_reasoning": true,
    "supports_response_schema": true,
    "supports_system_messages": true,
    "supports_tool_choice": true,
    "supports_url_context": true,
    "supports_video_input": true,
    "supports_vision": true,
    "supports_web_search": true,
    "supports_native_streaming": true
},
```
If you only want cost tracking, no change to your current LiteLLM version is needed. But if you want support for the new features introduced along with it, like thinking levels, you will need v1.80.8-stable.1 or above.
:::

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

``` showLineNumbers title="docker run litellm"
docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==v1.80.8-stable.1
```
Outdated version references in blog post
The blog post references v1.80.8-stable.1 for both the Docker and pip install instructions, but the current litellm version is 1.82.0 (per pyproject.toml), so these references appear outdated. Additionally, `pip install litellm==v1.80.8-stable.1` uses a `v` prefix, which is non-standard for pip version specifiers; it should typically be `pip install litellm==1.80.8`.
```python
# Should NOT have thinkingConfig automatically added when user provides no reasoning_effort
assert "thinkingConfig" not in result
```
Test assertion contradicts actual code behavior
This test now asserts "thinkingConfig" not in result, but no corresponding change was made to the production code. The map_openai_params method in vertex_and_google_ai_studio_gemini.py (around line 1085-1099) still auto-adds thinkingConfig with a default thinkingLevel for all Gemini 3+ non-image models when no reasoning_effort is provided:
```python
if VertexGeminiConfig._is_gemini_3_or_newer(model):
    ...
    thinking_config["thinkingLevel"] = (
        "minimal" if is_gemini3flash else "low"
    )
    optional_params["thinkingConfig"] = thinking_config
```

Since `_is_gemini_3_or_newer` checks for `"gemini-3"` in the model name, it will match `gemini-3-pro-preview` and auto-add `thinkingConfig`, so the modified assertion will fail at runtime. Either the code needs to be updated to stop auto-adding `thinkingConfig`, or this test change should be reverted.
| `reasoning_effort` | `thinkingLevel` | Use case |
|---|---|---|
| `minimal` | `minimal` | Ultra-fast responses, simple queries |
| `low` | `low` | Basic instruction following |
| `medium` | `medium` | Balanced reasoning for moderate complexity |
| `high` | `high` | Maximum reasoning depth, complex problems |
| `disable` | `minimal` | Disable extended reasoning |
| `none` | `minimal` | No extended reasoning |
Blog reasoning_effort mapping table is inaccurate for this model
The mapping table claims minimal → minimal, disable → minimal, none → minimal, and medium → medium. However, looking at _map_reasoning_effort_to_thinking_level in vertex_and_google_ai_studio_gemini.py, the is_gemini3flash check only matches "gemini-3-flash-preview" or "gemini-3-flash" — it does not match gemini-3.1-flash-lite-preview. Similarly, is_gemini31pro only matches "gemini-3.1-pro-preview".
As a result, for gemini-3.1-flash-lite-preview:
- `minimal` actually maps to `low` (not `minimal`)
- `medium` actually maps to `high` (not `medium`)
- `disable` and `none` actually map to `low` (not `minimal`)
Either the code needs to be updated to recognize gemini-3.1-flash-lite-preview as a model that supports minimal/medium thinking levels, or the table should reflect the actual behavior.
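Under that reading of the code, the effective mapping can be sketched as follows; this is a simplified reconstruction based on the review above, not the actual LiteLLM source (it omits the `is_gemini31pro` branch):

```python
def map_reasoning_effort_to_thinking_level(model: str, effort: str) -> str:
    # Simplified reconstruction of _map_reasoning_effort_to_thinking_level:
    # only these exact names are recognized as Gemini 3 Flash.
    is_gemini3flash = model in ("gemini-3-flash-preview", "gemini-3-flash")
    if is_gemini3flash:
        # Recognized models get the full level set, as the blog table claims.
        return {"minimal": "minimal", "low": "low", "medium": "medium",
                "high": "high", "disable": "minimal", "none": "minimal"}[effort]
    # gemini-3.1-flash-lite-preview is NOT matched, so it falls through here.
    return {"minimal": "low", "low": "low", "medium": "high",
            "high": "high", "disable": "low", "none": "low"}[effort]

assert map_reasoning_effort_to_thinking_level(
    "gemini-3.1-flash-lite-preview", "minimal") == "low"
assert map_reasoning_effort_to_thinking_level(
    "gemini-3.1-flash-lite-preview", "medium") == "high"
```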
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- [ ] Add testing in the `tests/test_litellm/` directory. Adding at least 1 test is a hard requirement - see details
- [ ] Pass `make test-unit`
- [ ] Tag `@greptileai` and receive a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
Changes