[Feat] Add pricing for Nebius models#22614

Merged
ishaan-jaff merged 1 commit into main from cursor/development-environment-setup-a1ab on Mar 3, 2026
Conversation

@ishaan-jaff
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
    • Verified model recognition via litellm.get_model_info() and proxy server model info endpoint.
  • My PR passes all unit tests on make test-unit
    • All relevant unit tests passed; one pre-existing, unrelated failure was confirmed.
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

Adds 30 Nebius AI Studio models (text-to-text, vision, and embedding) to LiteLLM's model price and context window configuration.

  • Why: To enable support for Nebius AI Studio models within LiteLLM, allowing users to leverage these models with correct pricing and context window information.
  • What:
    • Added 30 Nebius models to model_prices_and_context_window.json and litellm/model_prices_and_context_window_backup.json.
    • Models include various chat, vision, and embedding capabilities from providers like DeepSeek, Meta Llama, Qwen, Mistral, NousResearch, NVIDIA, Google, and BAAI/intfloat.
    • Included accurate pricing (per-token) and context window information sourced from Nebius documentation.
    • Set appropriate capability flags (supports_reasoning, supports_vision, supports_function_calling).
  • Verification:
    • Confirmed all 30 Nebius models are correctly recognized by litellm.get_model_info().
    • Verified the LiteLLM proxy server successfully starts, recognizes Nebius models, and serves their information via /model/info.
    • Ran relevant unit tests (test_cost_calculator.py, test_model_cost_map_resilience.py, test_deepseek_model_metadata.py, llm_cost_calc/) to ensure no regressions.
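
The recognition checks above can be approximated with a small standalone script. A minimal sketch: the entry below is copied verbatim from this PR's diff, but the validation rules (`check_entry`, `REQUIRED_CHAT_KEYS`) are assumptions inferred from the JSON file's conventions, not LiteLLM's own test suite.

```python
# Sample entry as added in this PR (nebius/deepseek-ai/DeepSeek-R1).
SAMPLE = {
    "nebius/deepseek-ai/DeepSeek-R1": {
        "max_tokens": 128000,
        "max_input_tokens": 128000,
        "max_output_tokens": 128000,
        "input_cost_per_token": 8e-07,
        "output_cost_per_token": 2.4e-06,
        "litellm_provider": "nebius",
        "mode": "chat",
        "supports_function_calling": True,
        "supports_reasoning": True,
        "source": "https://nebius.com/prices-ai-studio",
    }
}

# Assumed minimal schema for chat-mode entries.
REQUIRED_CHAT_KEYS = {
    "max_input_tokens",
    "input_cost_per_token",
    "output_cost_per_token",
    "litellm_provider",
    "mode",
}


def check_entry(name: str, entry: dict) -> list:
    """Return a list of problems found in a single model entry."""
    problems = []
    # The JSON key is expected to start with the provider prefix.
    if not name.startswith(entry["litellm_provider"] + "/"):
        problems.append(f"{name}: key prefix != litellm_provider")
    if entry["mode"] == "chat":
        missing = REQUIRED_CHAT_KEYS - entry.keys()
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
    # Embedding entries conventionally omit max_output_tokens.
    if entry["mode"] == "embedding" and "max_output_tokens" in entry:
        problems.append(f"{name}: embedding entry should omit max_output_tokens")
    return problems


for model, entry in SAMPLE.items():
    assert check_entry(model, entry) == [], check_entry(model, entry)
```

In practice one would run `check_entry` over every `nebius/`-prefixed key loaded from model_prices_and_context_window.json rather than an inline sample.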


…json

Add 30 Nebius AI Studio models covering:
- Text-to-text: DeepSeek (R1, R1-0528, R1-Distill, V3, V3-0324), Meta Llama
  (3.1-8B/70B/405B, 3.3-70B), Qwen (3-235B/32B/30B/14B/4B, 2.5-72B/32B,
  2.5-Coder-7B, QwQ-32B), Mistral Nemo, NousResearch Hermes-3, NVIDIA
  Nemotron Ultra/Super, Google Gemma-3-27B, Llama-Guard-3
- Vision: Qwen2.5-VL-72B, Qwen2-VL-72B, Qwen2-VL-7B
- Embedding: BAAI/bge-en-icl, BAAI/bge-multilingual-gemma2, intfloat/e5-mistral-7b

Pricing sourced from https://nebius.com/prices-ai-studio (base flavor).
Context windows sourced from https://docs.nebius.com/studio/inference/models/

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
@cursor

cursor bot commented Mar 3, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

@vercel

vercel bot commented Mar 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm
Deployment: Error
Updated (UTC): Mar 3, 2026 2:42am


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

ishaan-jaff changed the title from "Development environment setup" to "[Feat] Add pricing for Nebius models" on Mar 3, 2026
ishaan-jaff merged commit 86d5b4c into main on Mar 3, 2026
26 of 40 checks passed
@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

Adds 30 Nebius AI Studio model entries to the model cost/context configuration (chat, vision, and embedding models from DeepSeek, Llama, Qwen, Mistral, NVIDIA, Google, and BAAI/intfloat). Also normalizes two Perplexity embedding model prices to scientific notation.

  • Model naming inconsistencies: Several model names in the JSON don't match litellm/constants.py — the nvidia Nemotron models use dots (3.1, 3.3) in the JSON but underscores (3_1, 3_3) in constants, and the NousResearch Hermes model includes 3.1 in the JSON key but omits it in constants. This could cause lookup mismatches in validate_environment().
  • Token limit accuracy: All 27 chat models set max_tokens = max_input_tokens = max_output_tokens, which differs from how other providers define the same models (e.g., DeepSeek-R1 on together_ai has 128k input but only 20k output). Worth verifying these are the actual Nebius limits.
  • Structure follows conventions: Model entries correctly use litellm_provider: "nebius", include the appropriate capability flags (supports_reasoning, supports_vision, supports_function_calling), and embedding models properly omit max_output_tokens.

Confidence Score: 3/5

  • This PR is low-risk (data-only JSON changes) but has naming inconsistencies with constants.py that should be resolved to avoid lookup mismatches.
  • Score of 3 reflects that the changes are purely additive data entries with no code logic changes, but model naming mismatches between the JSON file and constants.py (nvidia dots vs underscores, NousResearch missing version) could cause provider lookup issues. The equal max_input/max_output token values also warrant verification.
  • model_prices_and_context_window.json — verify nvidia/NousResearch model names match constants.py and confirm token limits are correct per Nebius docs.

Important Files Changed

Filename Overview
model_prices_and_context_window.json Adds 30 Nebius AI Studio model entries (chat, vision, embedding) with pricing and context windows. Model naming inconsistencies with constants.py (nvidia underscores vs dots, NousResearch name mismatch). Minor perplexity format normalization included.
litellm/model_prices_and_context_window_backup.json Exact copy of the model_prices_and_context_window.json changes (backup file). Same model naming inconsistencies apply here.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["User calls litellm.completion\n(model='nebius/deepseek-ai/DeepSeek-R1')"] --> B["get_llm_provider_logic\nextract provider='nebius'"]
    B --> C["Lookup model in\nmodel_prices_and_context_window.json"]
    C --> D["get_model_info()\nReturns pricing, context window,\ncapability flags"]
    B --> E["Route to nebius API\n(api.studio.nebius.ai/v1)"]
    
    F["constants.py\nnebius_models set"] --> G["validate_environment()\nchecks NEBIUS_API_KEY"]
    
    style C fill:#90EE90
    style F fill:#FFD700

Last reviewed commit: ca34ec9

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

"supports_function_calling": true,
"source": "https://nebius.com/prices-ai-studio"
},
"nebius/nvidia/Llama-3.1-Nemotron-Ultra-253B-v1": {
Contributor

Model name mismatch with constants.py

The nvidia model names in this file use dots (Llama-3.1, Llama-3.3) but litellm/constants.py:973-974 uses underscores (Llama-3_1, Llama-3_3). Similarly, nebius/NousResearch/Hermes-3-Llama-3.1-405B (line 24363) doesn't match NousResearch/Hermes-3-Llama-405B in constants.py:964.

These mismatches mean exact string lookups between the two sources won't match. For example, litellm.utils.validate_environment() checks model in litellm.nebius_models using names from constants.py, while get_model_info() uses the JSON keys. Either the JSON names or constants.py names should be aligned.
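
The lookup failure described above can be demonstrated directly. In this sketch, both name sets are taken from the review comment (the constants.py names and the new JSON keys, with the `nebius/` provider prefix stripped); the set-difference check is illustrative, not actual LiteLLM code.

```python
# Names as reported for litellm/constants.py in the review comment above.
CONSTANTS_NAMES = {
    "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",
    "NousResearch/Hermes-3-Llama-405B",
}

# Keys as added to model_prices_and_context_window.json in this PR
# (provider prefix "nebius/" stripped for comparison).
JSON_NAMES = {
    "nvidia/Llama-3.1-Nemotron-Ultra-253B-v1",
    "NousResearch/Hermes-3-Llama-3.1-405B",
}

# An exact string lookup -- effectively what a membership test against
# litellm.nebius_models does -- misses both entries.
misses = JSON_NAMES - CONSTANTS_NAMES
print(sorted(misses))  # both JSON keys fail to match constants.py
```

Aligning either side (renaming the JSON keys or the constants.py entries) would make the set difference empty; a normalization layer that maps dots to underscores would only mask the inconsistency.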

Comment on lines +24229 to +24240
"nebius/deepseek-ai/DeepSeek-R1": {
"max_tokens": 128000,
"max_input_tokens": 128000,
"max_output_tokens": 128000,
"input_cost_per_token": 8e-07,
"output_cost_per_token": 2.4e-06,
"litellm_provider": "nebius",
"mode": "chat",
"supports_function_calling": true,
"supports_reasoning": true,
"source": "https://nebius.com/prices-ai-studio"
},
Contributor

max_tokens equals max_input_tokens for all models — likely incorrect for some

Per the file header, max_tokens is a "LEGACY parameter. set to max_output_tokens if provider specifies it." For many of these models, max_tokens = max_input_tokens = max_output_tokens (e.g., DeepSeek-R1 at 128000 for all three). However, other providers define DeepSeek-R1 with different input vs. output limits (e.g., together_ai has max_input_tokens: 128000 but max_output_tokens: 20480; azure_ai has max_input_tokens: 128000 but max_output_tokens: 8192).

If Nebius genuinely supports equal input and output token limits for all these models, this is fine — but it's worth verifying against the Nebius documentation, since many users may get unexpected results if the actual output limit is lower.
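
A cross-provider consistency check makes this concrete. The limits below are the ones quoted in this comment (the together_ai and azure_ai values for DeepSeek-R1 plus the new nebius entry); the `flag_outliers` helper is a hypothetical lint, not part of LiteLLM.

```python
# DeepSeek-R1 token limits per provider, as quoted in the review comment above.
LIMITS = {
    "nebius/deepseek-ai/DeepSeek-R1": {
        "max_input_tokens": 128000, "max_output_tokens": 128000,
    },
    "together_ai/deepseek-ai/DeepSeek-R1": {
        "max_input_tokens": 128000, "max_output_tokens": 20480,
    },
    "azure_ai/deepseek-r1": {
        "max_input_tokens": 128000, "max_output_tokens": 8192,
    },
}


def flag_outliers(limits: dict) -> list:
    """Flag entries whose output limit equals the input limit -- often a
    sign the value was copied across fields rather than sourced."""
    return [
        model for model, v in limits.items()
        if v["max_output_tokens"] == v["max_input_tokens"]
    ]


print(flag_outliers(LIMITS))  # ['nebius/deepseek-ai/DeepSeek-R1']
```

An equal input/output limit is not wrong by itself, but when every other provider lists a much lower output cap for the same model, the flagged entry is worth re-checking against the Nebius docs.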
