diff --git a/docs/my-website/src/pages/index.md b/docs/my-website/src/pages/index.md
index 91215b33c5d..296a06bd7e9 100644
--- a/docs/my-website/src/pages/index.md
+++ b/docs/my-website/src/pages/index.md
@@ -7,42 +7,41 @@ https://github.com/BerriAI/litellm
## **Call 100+ LLMs using the OpenAI Input/Output Format**
-- Translate inputs to provider's `completion`, `embedding`, and `image_generation` endpoints
-- [Consistent output](https://docs.litellm.ai/docs/completion/output), text responses will always be available at `['choices'][0]['message']['content']`
+- Translate inputs to the provider's endpoints (`/chat/completions`, `/responses`, `/embeddings`, `/images`, `/audio`, `/batches`, and more)
+- [Consistent output](https://docs.litellm.ai/docs/supported_endpoints) - same response format regardless of which provider you use (see the sketch after this list)
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
- Track spend & set budgets per project [LiteLLM Proxy Server](https://docs.litellm.ai/docs/simple_proxy)
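+
+For example, the request and the response keep the same shape no matter which provider serves the call - a minimal sketch, assuming `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are set:
+
+```python
+from litellm import completion
+
+# Same request shape for every provider; only the model string changes.
+openai_response = completion(
+    model="openai/gpt-5",
+    messages=[{"role": "user", "content": "Hello, how are you?"}],
+)
+anthropic_response = completion(
+    model="anthropic/claude-sonnet-4-5-20250929",
+    messages=[{"role": "user", "content": "Hello, how are you?"}],
+)
+
+# Same response shape too - the text is always at choices[0].message.content
+print(openai_response.choices[0].message.content)
+print(anthropic_response.choices[0].message.content)
+```
+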
## How to use LiteLLM
-You can use litellm through either:
-1. [LiteLLM Proxy Server](#litellm-proxy-server-llm-gateway) - Server (LLM Gateway) to call 100+ LLMs, load balance, cost tracking across projects
-2. [LiteLLM python SDK](#basic-usage) - Python Client to call 100+ LLMs, load balance, cost tracking
-### **When to use LiteLLM Proxy Server (LLM Gateway)**
-
-:::tip
-
-Use LiteLLM Proxy Server if you want a **central service (LLM Gateway) to access multiple LLMs**
-
-Typically used by Gen AI Enablement / ML PLatform Teams
-
-:::
-
- - LiteLLM Proxy gives you a unified interface to access multiple LLMs (100+ LLMs)
- - Track LLM Usage and setup guardrails
- - Customize Logging, Guardrails, Caching per project
-
-### **When to use LiteLLM Python SDK**
-
-:::tip
-
- Use LiteLLM Python SDK if you want to use LiteLLM in your **python code**
-
-Typically used by developers building llm projects
-
-:::
-
- - LiteLLM SDK gives you a unified interface to access multiple LLMs (100+ LLMs)
- - Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
+You can use LiteLLM through either the Proxy Server or the Python SDK. Both give you a unified interface to call 100+ LLMs. Choose the option that best fits your needs:
+
+| | LiteLLM Proxy Server | LiteLLM Python SDK |
+|---|---|---|
+| Use Case | Central service (LLM Gateway) to access multiple LLMs | Use LiteLLM directly in your Python code |
+| Who Uses It? | Gen AI Enablement / ML Platform Teams | Developers building LLM projects |
+| Key Features | • Centralized API gateway with authentication & authorization<br/>• Multi-tenant cost tracking and spend management per project/user<br/>• Per-project customization (logging, guardrails, caching)<br/>• Virtual keys for secure access control<br/>• Admin dashboard UI for monitoring and management | • Direct Python library integration in your codebase<br/>• [Router](https://docs.litellm.ai/docs/routing) with retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - sketched below<br/>• Application-level load balancing and cost tracking<br/>• Exception handling with OpenAI-compatible errors<br/>• Observability callbacks (Lunary, MLflow, Langfuse, etc.) |
+
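+The Router called out in the table is what provides the retry/fallback logic. A minimal sketch, assuming an OpenAI key plus a hypothetical Azure deployment, with both registered under one `gpt-5` alias:
+
+```python
+import os
+
+from litellm import Router
+
+# Two deployments behind one alias; the Router retries and falls back between them.
+router = Router(model_list=[
+    {
+        "model_name": "gpt-5",  # the alias callers use
+        "litellm_params": {"model": "openai/gpt-5", "api_key": os.environ["OPENAI_API_KEY"]},
+    },
+    {
+        "model_name": "gpt-5",
+        "litellm_params": {
+            "model": "azure/my-gpt-5-deployment",  # hypothetical deployment name
+            "api_base": os.environ["AZURE_API_BASE"],
+            "api_key": os.environ["AZURE_API_KEY"],
+            "api_version": "2024-02-01",  # assumption: set to your Azure API version
+        },
+    },
+])
+
+response = router.completion(
+    model="gpt-5",
+    messages=[{"role": "user", "content": "Hello, how are you?"}],
+)
+```
+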
## **LiteLLM Python SDK**
@@ -67,7 +66,7 @@ import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
- model="gpt-3.5-turbo",
+ model="openai/gpt-5",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
```
@@ -83,13 +82,27 @@ import os
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
response = completion(
- model="claude-2",
+ model="anthropic/claude-sonnet-4-5-20250929",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
```
+
+
+```python
+from litellm import completion
+import os
+## set ENV variables
+os.environ["XAI_API_KEY"] = "your-api-key"
+
+response = completion(
+ model="xai/grok-2-latest",
+ messages=[{ "content": "Hello, how are you?","role": "user"}]
+)
+```
+
```python
@@ -97,11 +110,11 @@ from litellm import completion
import os
# auth: run 'gcloud auth application-default login'
-os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
-os.environ["VERTEX_LOCATION"] = "us-central1"
+os.environ["VERTEXAI_PROJECT"] = "hardy-device-386718"
+os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
- model="chat-bison",
+ model="vertex_ai/gemini-1.5-pro",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
```
@@ -212,8 +225,61 @@ response = completion(
+
+
+```python
+from litellm import completion
+import os
+
+## set ENV variables. Visit https://vercel.com/docs/ai-gateway#using-the-ai-gateway-with-an-api-key for instructions on obtaining a key
+os.environ["VERCEL_AI_GATEWAY_API_KEY"] = "your-vercel-api-key"
+
+response = completion(
+ model="vercel_ai_gateway/openai/gpt-5",
+ messages=[{ "content": "Hello, how are you?","role": "user"}]
+)
+```
+
+
+
+### Response Format (OpenAI Chat Completions Format)
+
+```json
+{
+ "id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
+ "created": 1734366691,
+ "model": "gpt-5",
+ "object": "chat.completion",
+ "system_fingerprint": null,
+ "choices": [
+ {
+ "finish_reason": "stop",
+ "index": 0,
+ "message": {
+ "content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
+ "role": "assistant",
+ "tool_calls": null,
+ "function_call": null
+ }
+ }
+ ],
+ "usage": {
+ "completion_tokens": 43,
+ "prompt_tokens": 13,
+ "total_tokens": 56,
+ "completion_tokens_details": null,
+ "prompt_tokens_details": {
+ "audio_tokens": null,
+ "cached_tokens": 0
+ },
+ "cache_creation_input_tokens": 0,
+ "cache_read_input_tokens": 0
+ }
+}
+```
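+
+Because the object follows this shape, fields can be read straight off the response - a quick sketch reusing the `response` from the calls above:
+
+```python
+print(response.choices[0].message.content)  # the assistant's reply text
+print(response.model)                       # e.g. "gpt-5"
+print(response.usage.total_tokens)          # e.g. 56
+```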
+
### Responses API
Use `litellm.responses()` for advanced models that support reasoning content like GPT-5, o3, etc.
@@ -265,11 +331,11 @@ from litellm import responses
import os
# auth: run 'gcloud auth application-default login'
-os.environ["VERTEX_PROJECT"] = "jr-smith-386718"
-os.environ["VERTEX_LOCATION"] = "us-central1"
+os.environ["VERTEXAI_PROJECT"] = "jr-smith-386718"
+os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = responses(
- model="chat-bison",
+ model="vertex_ai/gemini-1.5-pro",
messages=[{ "content": "What is the capital of France?","role": "user"}]
)
```
@@ -314,7 +380,7 @@ import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
- model="gpt-3.5-turbo",
+ model="openai/gpt-5",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
@@ -331,14 +397,29 @@ import os
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
response = completion(
- model="claude-2",
+ model="anthropic/claude-sonnet-4-5-20250929",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
```
+
+
+```python
+from litellm import completion
+import os
+## set ENV variables
+os.environ["XAI_API_KEY"] = "your-api-key"
+
+response = completion(
+ model="xai/grok-2-latest",
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
+ stream=True,
+)
+```
+
```python
@@ -346,11 +427,11 @@ from litellm import completion
import os
# auth: run 'gcloud auth application-default login'
-os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
-os.environ["VERTEX_LOCATION"] = "us-central1"
+os.environ["VERTEXAI_PROJECT"] = "hardy-device-386718"
+os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
- model="chat-bison",
+ model="vertex_ai/gemini-1.5-pro",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
@@ -370,7 +451,7 @@ os.environ["NVIDIA_NIM_API_BASE"] = "nvidia_nim_endpoint_url"
response = completion(
model="nvidia_nim/",
- messages=[{ "content": "Hello, how are you?","role": "user"}]
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
```
@@ -466,22 +547,74 @@ response = completion(
```
+
+
+
+```python
+from litellm import completion
+import os
+
+## set ENV variables. Visit https://vercel.com/docs/ai-gateway#using-the-ai-gateway-with-an-api-key for instructions on obtaining a key
+os.environ["VERCEL_AI_GATEWAY_API_KEY"] = "your-vercel-api-key"
+
+response = completion(
+ model="vercel_ai_gateway/openai/gpt-5",
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
+ stream=True,
+)
+```
+
+
+
+### Streaming Response Format (OpenAI Format)
+
+```json
+{
+ "id": "chatcmpl-2be06597-eb60-4c70-9ec5-8cd2ab1b4697",
+ "created": 1734366925,
+ "model": "claude-sonnet-4-5-20250929",
+ "object": "chat.completion.chunk",
+ "system_fingerprint": null,
+ "choices": [
+ {
+ "finish_reason": null,
+ "index": 0,
+ "delta": {
+ "content": "Hello",
+ "role": "assistant",
+ "function_call": null,
+ "tool_calls": null,
+ "audio": null
+ },
+ "logprobs": null
+ }
+ ]
+}
+```
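+
+With `stream=True`, `completion()` returns an iterator of chunks in the shape above; a minimal sketch of consuming it:
+
+```python
+for chunk in response:
+    delta = chunk.choices[0].delta.content
+    if delta is not None:  # some chunks (e.g. the final one) carry no content
+        print(delta, end="")
+```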
+
### Exception handling
LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error handling you have for those should work out of the box with LiteLLM.
```python
-from openai.error import OpenAIError
+import litellm
from litellm import completion
+import os
os.environ["ANTHROPIC_API_KEY"] = "bad-key"
try:
- # some code
- completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
-except OpenAIError as e:
- print(e)
+ completion(model="anthropic/claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
+except litellm.AuthenticationError as e:
+ # Thrown when the API key is invalid
+ print(f"Authentication failed: {e}")
+except litellm.RateLimitError as e:
+ # Thrown when you've exceeded your rate limit
+ print(f"Rate limited: {e}")
+except litellm.APIError as e:
+ # Thrown for general API errors
+ print(f"API error: {e}")
```
### Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks))
@@ -502,7 +635,7 @@ os.environ["OPENAI_API_KEY"]
litellm.success_callback = ["lunary", "mlflow", "langfuse", "helicone"] # log input/output to lunary, mlflow, langfuse, helicone
#openai call
-response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
+response = completion(model="openai/gpt-5", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```
### Track Costs, Usage, Latency for streaming
@@ -527,7 +660,7 @@ litellm.success_callback = [track_cost_callback] # set custom callback function
# litellm.completion() call
response = completion(
- model="gpt-3.5-turbo",
+ model="openai/gpt-5",
messages=[
{
"role": "user",
@@ -584,7 +717,7 @@ Example `litellm_config.yaml`
```yaml
model_list:
- - model_name: gpt-3.5-turbo
+ - model_name: gpt-5
litellm_params:
model: azure/
api_base: os.environ/AZURE_API_BASE # runs os.getenv("AZURE_API_BASE")
@@ -621,7 +754,7 @@ docker run \
import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:4000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
-response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
+response = client.chat.completions.create(model="gpt-5", messages=[
{
"role": "user",
"content": "this is a test request, write a short poem"