[Feat] Add pricing for Nebius models #22614
Conversation
…json Add 30 Nebius AI Studio models covering:

- Text-to-text: DeepSeek (R1, R1-0528, R1-Distill, V3, V3-0324), Meta Llama (3.1-8B/70B/405B, 3.3-70B), Qwen (3-235B/32B/30B/14B/4B, 2.5-72B/32B, 2.5-Coder-7B, QwQ-32B), Mistral Nemo, NousResearch Hermes-3, NVIDIA Nemotron Ultra/Super, Google Gemma-3-27B, Llama-Guard-3
- Vision: Qwen2.5-VL-72B, Qwen2-VL-72B, Qwen2-VL-7B
- Embedding: BAAI/bge-en-icl, BAAI/bge-multilingual-gemma2, intfloat/e5-mistral-7b

Pricing sourced from https://nebius.com/prices-ai-studio (base flavor). Context windows sourced from https://docs.nebius.com/studio/inference/models/

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Greptile Summary

Adds 30 Nebius AI Studio model entries to the model cost/context configuration (chat, vision, and embedding models from DeepSeek, Llama, Qwen, Mistral, NVIDIA, Google, and BAAI/intfloat). Also normalizes two Perplexity embedding model prices to scientific notation.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| model_prices_and_context_window.json | Adds 30 Nebius AI Studio model entries (chat, vision, embedding) with pricing and context windows. Model naming inconsistencies with constants.py (nvidia underscores vs dots, NousResearch name mismatch). Minor perplexity format normalization included. |
| litellm/model_prices_and_context_window_backup.json | Exact copy of the model_prices_and_context_window.json changes (backup file). Same model naming inconsistencies apply here. |
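The Perplexity price normalization noted in the summary is purely cosmetic: JSON parsers read decimal and scientific notation into the same float value, so the rewrite cannot change any computed cost. A minimal standard-library check (a sketch, not part of the PR):

```python
import json

# JSON numbers in decimal and scientific notation parse to the same float,
# so rewriting 0.0000002 as 2e-07 is a formatting-only change.
decimal_form = json.loads('{"input_cost_per_token": 0.0000002}')
sci_form = json.loads('{"input_cost_per_token": 2e-07}')

assert decimal_form["input_cost_per_token"] == sci_form["input_cost_per_token"]
```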
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["User calls litellm.completion\n(model='nebius/deepseek-ai/DeepSeek-R1')"] --> B["get_llm_provider_logic\nextract provider='nebius'"]
    B --> C["Lookup model in\nmodel_prices_and_context_window.json"]
    C --> D["get_model_info()\nReturns pricing, context window,\ncapability flags"]
    B --> E["Route to nebius API\n(api.studio.nebius.ai/v1)"]
    F["constants.py\nnebius_models set"] --> G["validate_environment()\nchecks NEBIUS_API_KEY"]
    style C fill:#90EE90
    style F fill:#FFD700
```
Last reviewed commit: ca34ec9
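The provider-extraction step in the flowchart can be illustrated with a toy version of the prefix split (a sketch only; LiteLLM's real `get_llm_provider` handles many more cases, such as aliases and `api_base` inference):

```python
def split_provider(model: str) -> tuple[str, str]:
    """Split 'provider/model-path' into (provider, model).

    Only covers the explicit-prefix form, e.g. 'nebius/...'.
    """
    provider, _, rest = model.partition("/")
    return provider, rest

provider, model_name = split_provider("nebius/deepseek-ai/DeepSeek-R1")
print(provider)     # nebius
print(model_name)   # deepseek-ai/DeepSeek-R1
```

Note that the remainder keeps its own internal slashes, which is why the JSON keys contain full org-qualified names like `deepseek-ai/DeepSeek-R1`.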
```json
        "supports_function_calling": true,
        "source": "https://nebius.com/prices-ai-studio"
    },
    "nebius/nvidia/Llama-3.1-Nemotron-Ultra-253B-v1": {
```
Model name mismatch with constants.py
The nvidia model names in this file use dots (Llama-3.1, Llama-3.3) but litellm/constants.py:973-974 uses underscores (Llama-3_1, Llama-3_3). Similarly, nebius/NousResearch/Hermes-3-Llama-3.1-405B (line 24363) doesn't match NousResearch/Hermes-3-Llama-405B in constants.py:964.
These mismatches mean exact string lookups between the two sources won't match. For example, litellm.utils.validate_environment() checks model in litellm.nebius_models using names from constants.py, while get_model_info() uses the JSON keys. Either the JSON names or the constants.py names should be updated so the two sources agree.
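A small consistency check along these lines could catch such drift in CI. The helper below is a hypothetical sketch: the two example sets stand in for the real `litellm.nebius_models` constant and the parsed pricing JSON keys.

```python
def find_unpriced_models(constant_names: set[str], pricing_keys: set[str]) -> set[str]:
    """Return names from constants.py that have no 'nebius/'-prefixed pricing key."""
    priced = {
        key.removeprefix("nebius/")
        for key in pricing_keys
        if key.startswith("nebius/")
    }
    return constant_names - priced

# Names flagged in the review: dots in the JSON key, underscores in constants.py.
pricing_keys = {"nebius/nvidia/Llama-3.1-Nemotron-Ultra-253B-v1"}
constants = {"nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"}
print(find_unpriced_models(constants, pricing_keys))
# {'nvidia/Llama-3_1-Nemotron-Ultra-253B-v1'}  -> mismatch detected
```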
```json
    "nebius/deepseek-ai/DeepSeek-R1": {
        "max_tokens": 128000,
        "max_input_tokens": 128000,
        "max_output_tokens": 128000,
        "input_cost_per_token": 8e-07,
        "output_cost_per_token": 2.4e-06,
        "litellm_provider": "nebius",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_reasoning": true,
        "source": "https://nebius.com/prices-ai-studio"
    },
```
max_tokens equals max_input_tokens for all models — likely incorrect for some
Per the file header, max_tokens is a "LEGACY parameter. set to max_output_tokens if provider specifies it." For many of these models, max_tokens = max_input_tokens = max_output_tokens (e.g., DeepSeek-R1 at 128000 for all three). However, other providers define DeepSeek-R1 with different input vs. output limits (e.g., together_ai has max_input_tokens: 128000 but max_output_tokens: 20480; azure_ai has max_input_tokens: 128000 but max_output_tokens: 8192).
If Nebius genuinely supports equal input and output token limits for all these models, this is fine — but it's worth verifying against the Nebius documentation, since many users may get unexpected results if the actual output limit is lower.
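A quick audit over the new entries would surface these cases automatically. This is a sketch, not a proposed test: the inline dict stands in for the parsed `model_prices_and_context_window.json`, and equal limits are only flagged for manual verification, not treated as errors.

```python
def symmetric_limit_models(entries: dict[str, dict]) -> list[str]:
    """List models whose max_output_tokens equals max_input_tokens.

    Equal limits are not necessarily wrong -- some providers do allow
    them -- but per the review comment they are worth checking against
    the provider's documentation.
    """
    return [
        name
        for name, info in entries.items()
        if info.get("max_output_tokens") == info.get("max_input_tokens")
    ]

entries = {
    "nebius/deepseek-ai/DeepSeek-R1": {
        "max_input_tokens": 128000,
        "max_output_tokens": 128000,
    },
    # together_ai's entry for the same base model, for contrast:
    "together_ai/deepseek-ai/DeepSeek-R1": {
        "max_input_tokens": 128000,
        "max_output_tokens": 20480,
    },
}
print(symmetric_limit_models(entries))
# ['nebius/deepseek-ai/DeepSeek-R1']
```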
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/test_litellm/` directory. Adding at least 1 test is a hard requirement - see details
- New model entries are accessible via `litellm.get_model_info()` and the proxy server model info endpoint.
- My PR passes all unit tests on `make test-unit`
- My PR was reviewed by `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
Changes
Adds 30 Nebius AI Studio models (text-to-text, vision, and embedding) to LiteLLM's model price and context window configuration.
- Updates `model_prices_and_context_window.json` and `litellm/model_prices_and_context_window_backup.json`.
- Sets capability flags where applicable (`supports_reasoning`, `supports_vision`, `supports_function_calling`).
- New entries are resolvable via `litellm.get_model_info()` and the proxy `/model/info` endpoint.
- Existing tests (`test_cost_calculator.py`, `test_model_cost_map_resilience.py`, `test_deepseek_model_metadata.py`, `llm_cost_calc/`) pass, to ensure no regressions.