Ollama: Add basic configuration for LiteLLM proxy #44
base: main
Conversation
Walkthrough
A new configuration file for LiteLLM integration with Ollama models was added. The codebase was updated to use local model endpoints instead of OpenAI's API, removing related environment variable checks and hardcoded API keys. Corresponding OpenAI API key-related tests were deleted from the test suite.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CrateDbKnowledgeConversation
    participant LiteLLM as LiteLLM (Ollama)
    User->>CrateDbKnowledgeConversation: ask_gpt(question)
    CrateDbKnowledgeConversation->>LiteLLM: send request (local endpoint, model="llama3.2")
    LiteLLM-->>CrateDbKnowledgeConversation: response
    CrateDbKnowledgeConversation-->>User: answer
```
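As an illustration of the flow in the diagram, here is a minimal sketch of the client side, assuming the LiteLLM proxy from etc/litellm-config.yaml is running on localhost:4000; the exact method names and message layout in cratedb_about may differ.

```python
from openai import OpenAI

# The OpenAI SDK only provides the wire format here; the LiteLLM proxy on
# port 4000 translates the request and forwards it to Ollama on port 11434.
client = OpenAI(api_key="n/a", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="llama3.2",  # must match a `model_name` entry in etc/litellm-config.yaml
    messages=[
        {"role": "user", "content": "What is CrateDB?"},
    ],
)
print(response.choices[0].message.content)
```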
Actionable comments posted: 3
🔭 Outside diff range comments (2)
src/cratedb_about/query/core.py (2)
33-39: 🛠️ Refactor suggestion
Update class docstring to reflect Ollama/LiteLLM usage.
The docstring still references the OPENAI_API_KEY environment variable requirement, but the code no longer requires this since it's using a local LiteLLM proxy with Ollama models.
""" Manage conversations about CrateDB. Requires: - - OPENAI_API_KEY environment variable when using "openai" backend + - LiteLLM proxy running at http://localhost:4000 when using "openai" backend + - Ollama running at http://localhost:11434 with required models - ANTHROPIC_API_KEY environment variable when using "claude" backend """
94-107: 🛠️ Refactor suggestion
Update function docstring to reflect Ollama/LiteLLM usage.
The docstring for the `ask_gpt` method still references OpenAI's GPT and links to OpenAI's documentation, but the implementation now uses local Ollama models through LiteLLM.

```diff
 """
-Ask the machine, enriched with CrateDB context, catalyzed through OpenAI's GPT.
+Ask the machine, enriched with CrateDB context, using Ollama models through LiteLLM proxy.

-Models like o3 and o4-mini are reasoning models.
-https://platform.openai.com/docs/guides/reasoning
+The function uses local Ollama models configured in etc/litellm-config.yaml.
+See https://docs.litellm.ai/docs/providers/ollama for more information.

 The OpenAI API provides different kinds of roles for messages. Let's use the
 `developer` role to relay information on top of the user's question.
```
🧹 Nitpick comments (2)
etc/litellm-config.yaml (2)
41-43: Consider uncommenting and configuring LiteLLM settings.
The commented LiteLLM settings might be useful for debugging and configuration. Consider uncommenting and configuring these based on your needs.
```diff
-#litellm_settings:
-#  drop_params: True
-#  set_verbose: True
+litellm_settings:
+  drop_params: True
+  set_verbose: ${LITELLM_VERBOSE:-False}
+  # Additional useful settings:
+  # cache: True
+  # timeout: 120
```
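For context on what `drop_params` buys you, here is a rough sketch using the LiteLLM Python SDK directly rather than the proxy; it assumes `litellm` is installed and Ollama is serving llama3.2 locally, and is only meant to show the behaviour the proxy-level setting enables.

```python
import litellm

# With drop_params enabled, provider-unsupported arguments (e.g. some
# OpenAI-only sampling options) are stripped before the request is sent,
# instead of failing the call against the Ollama backend.
litellm.drop_params = True

response = litellm.completion(
    model="ollama/llama3.2",
    api_base="http://localhost:11434",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```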
31-40: Consider removing or documenting the commented example.
The commented example for `ollama-codellama` could be useful, but it's not clear if it's intended to be used in the future or if it's just reference material. Either:
- Remove the commented example if not needed
- Add a comment explaining its purpose, e.g., "Example for future use with larger models"
- Move such examples to a separate documentation file
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- etc/litellm-config.yaml (1 hunks)
- src/cratedb_about/query/core.py (2 hunks)
- tests/test_query.py (0 hunks)
💤 Files with no reviewable changes (1)
- tests/test_query.py
🔇 Additional comments (1)
src/cratedb_about/query/core.py (1)
142-145: Ensure only one model is active.
The code now defaults to using the "llama3.2" model with alternative models commented out. This is aligned with the LiteLLM configuration but should be reviewed to ensure it matches the intended model for your use case.
You might want to verify model performance and resource requirements - the listed models have different capabilities and memory requirements. For example:
- llama3.2 is the most powerful among these models
- gemma3:1b is the smallest and fastest
- deepseek-r1:7b is more specialized for code generation
| """ | ||
|
|
||
| client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY")) | ||
| client = OpenAI(api_key="n/a", base_url="http://localhost:4000") |
🛠️ Refactor suggestion
Replace hardcoded connection details with environment variables.
Using hardcoded connection details (api_key="n/a", base_url="http://localhost:4000") makes the code less flexible and harder to configure in different environments. This approach also doesn't align with security best practices.
```diff
-        client = OpenAI(api_key="n/a", base_url="http://localhost:4000")
+        client = OpenAI(
+            api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
+            base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
+        )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        client = OpenAI(api_key="n/a", base_url="http://localhost:4000")
+        client = OpenAI(
+            api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
+            base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
+        )
```
🤖 Prompt for AI Agents
In src/cratedb_about/query/core.py at line 109, replace the hardcoded OpenAI
client connection details with environment variables. Use appropriate
environment variable access methods to retrieve the API key and base URL,
ensuring the code reads these values dynamically for better flexibility and
security.
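Putting the suggestion into context, a small helper could look like the sketch below. LITELLM_API_KEY and LITELLM_BASE_URL are the hypothetical variable names from the suggestion above, not existing project settings; the defaults preserve the current local-proxy behaviour.

```python
import os

from openai import OpenAI


def make_client() -> OpenAI:
    """Build an OpenAI-compatible client pointed at the LiteLLM proxy.

    Reads the hypothetical LITELLM_API_KEY / LITELLM_BASE_URL variables,
    falling back to the local proxy on port 4000.
    """
    return OpenAI(
        api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
        base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000"),
    )
```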
```yaml
model_list:
  - model_name: "llama3.2"
    litellm_params:
      model: "ollama/llama3.2"
      api_base: "http://localhost:11434"

  - model_name: "gemma3:1b"
    litellm_params:
      model: "ollama/gemma3:1b"
      api_base: "http://localhost:11434"

  - model_name: "qwen3:0.6b"
    litellm_params:
      model: "ollama/qwen3:0.6b"
      api_base: "http://localhost:11434"

  - model_name: "deepseek-r1:7b"
    litellm_params:
      model: "ollama/deepseek-r1:7b"
      api_base: "http://localhost:11434"
```
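One way to check which of these model names the running proxy actually exposes is to query its OpenAI-compatible model listing; a rough sketch, assuming the proxy is up on port 4000 and no master key is enforced.

```python
from openai import OpenAI

client = OpenAI(api_key="n/a", base_url="http://localhost:4000")

# The proxy advertises each `model_name` from etc/litellm-config.yaml,
# e.g. "llama3.2", "gemma3:1b", "qwen3:0.6b", "deepseek-r1:7b".
for model in client.models.list():
    print(model.id)
```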
💡 Verification agent
🧩 Analysis chain
Model configuration looks good but URL discrepancy needs explanation.
The model configurations are correctly structured for LiteLLM with Ollama. However, there's a discrepancy between the Ollama server URL here (http://localhost:11434) and the LiteLLM proxy URL in the Python code (http://localhost:4000).
🏁 Script executed:
```sh
#!/bin/bash
# Check the architecture to understand the relationship between URLs
echo "Checking for LiteLLM proxy configuration files"
fd -t f -g "*.y*ml" -g "*.json" | xargs grep -l "4000" || echo "No references to port 4000 found in config files"
echo "Checking if there are any scripts starting LiteLLM proxy"
fd -t f -g "*.sh" -g "*.py" | xargs grep -l "litellm" || echo "No scripts found starting LiteLLM proxy"
```
Length of output: 716
🏁 Script executed:
```sh
#!/bin/bash
# Search for any references to port 4000 across the repository
rg -n "4000"
# Specifically look for the LiteLLM proxy URL pattern
rg -n "localhost:4000"
```
Length of output: 253
Clarify LiteLLM proxy vs Ollama server URL discrepancy
There’s a mismatch between the Ollama server endpoint in etc/litellm-config.yaml and the OpenAI client base URL in your Python code:
- etc/litellm-config.yaml (lines 10–29): api_base: http://localhost:11434
- src/cratedb_about/query/core.py (line 109): client = OpenAI(..., base_url="http://localhost:4000")

Please confirm which port your application should use and align both configurations. You can either:
- Update the Python client to use http://localhost:11434 if it's talking directly to Ollama.
- Or expose your LiteLLM proxy on port 4000 and update the YAML to point at http://localhost:4000, documenting this setup for future reference.
🤖 Prompt for AI Agents
In etc/litellm-config.yaml around lines 10 to 29, the api_base URL is set to
http://localhost:11434, but in src/cratedb_about/query/core.py at line 109, the
OpenAI client uses base_url http://localhost:4000. You need to decide which port
your application should consistently use. Either update the Python client
base_url to http://localhost:11434 to match the Ollama server URL in the YAML,
or change the YAML api_base to http://localhost:4000 if you intend to use the
LiteLLM proxy on that port. Also, add documentation to clarify this
configuration choice for future maintainers.
@coderabbitai: In etc/litellm-config.yaml, you can find the command `uvx litellm --config etc/litellm-config.yaml --detailed_debug`. This one will start a server listening on port 4000. It may help to connect the dots.
🧩 Analysis chain
@amotl Thank you for clarifying! I understand the architecture now:
- Ollama server runs on port 11434 (as configured in your YAML file)
- The `uvx litellm` command starts a LiteLLM proxy server on port 4000 (which is indeed the default port for LiteLLM proxy)
- Your Python code connects to this LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama
This explains the apparent discrepancy perfectly. To make this clearer for future maintainers, you might consider adding a brief comment in the Python code explaining this proxy relationship, but it's certainly not a critical issue.
🌐 Web query:
What is the default port for LiteLLM proxy server?
💡 Result:
The default port for the LiteLLM proxy server is 5000. This is the port used by the server when it is started without specifying a different port. If you need to change the port, you can do so by setting the PORT environment variable to your desired port number.
@amotl Thank you for clarifying! I understand the architecture now, though I should correct my previous statement about the default port.
The standard architecture here appears to be:
- Ollama server runs on port 11434 (as configured in your YAML file)
- The `uvx litellm` command starts a LiteLLM proxy server on port 4000 (note: while 5000 is the default LiteLLM proxy port, it seems your setup is configured to use 4000)
- Your Python code connects to this LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama
This explains the apparent discrepancy perfectly. Is the port 4000 for LiteLLM configured elsewhere in your setup, or is it handled through the uvx wrapper command?
✏️ Learnings added
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
@coderabbitai: Exactly. Port 4000 seems to be the standard port used by LiteLLM.
> The port used to listen for LiteLLM Proxy traffic is port 4000.

Indeed, using port 5000 seems to be a common choice for OpenAI-compatible API servers otherwise.
-- https://www.abstractoid.com/blog/using-langchain-and-text-generation-webui-local-llm/
-- https://community.openai.com/t/cross-origin-resource-sharing-cors/28905/5
@amotl Thank you for the confirmation and for providing that helpful reference about port 4000 being the standard for LiteLLM proxy traffic. This matches perfectly with the architecture you've set up.
⛔ Learnings removed
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.
🧠 Learnings used
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.
```yaml
# LiteLLM supports all models from Ollama.
# https://docs.litellm.ai/docs/providers/ollama
#
#   uvx litellm --config etc/litellm-config.yaml --detailed_debug
#
# TODO: Use `ollama_chat`
#       We recommend using ollama_chat for better responses.
---
```
💡 Verification agent
🧩 Analysis chain
Good documentation with usage instructions.
The comments provide clear guidance on the purpose of this configuration file and how to use it with LiteLLM. The TODO note about using ollama_chat for better responses is helpful.
Consider implementing the ollama_chat recommendation mentioned in the TODO:
🌐 Web query:
What is the difference between ollama and ollama_chat in LiteLLM?
💡 Result:
Difference Between ollama and ollama_chat in LiteLLM
Overview
LiteLLM provides a unified interface to interact with multiple LLM providers, including those running on Ollama. Within LiteLLM, you may encounter references to both ollama and ollama_chat when specifying models or endpoints. The distinction between these two directly affects how requests are handled and the features available when working with local LLMs via Ollama.
Key Differences
| Feature | ollama | ollama_chat |
|---|---|---|
| Endpoint/API | Targets the general Ollama completion endpoint | Uses the Ollama Chat endpoint (/api/chat) |
| Message Format | Designed for single-turn completions | Supports multi-turn, chat-style prompts |
| Functionality | Basic text completion | Enhanced chat, function/tool-calling, better structured outputs |
| Model Name Prefix | e.g., ollama/llama2 | e.g., ollama_chat/llama3.1 |
| Recommendation | Less recommended for chat applications | Preferred for chat-like and function-calling workflows |
| Tools/Function Calling | Limited | Better/conformant support for tools and structured outputs |
| JSON Mode/Structured Output | Limited or less robust | Preferred for improved, structured JSON outputs |
| Compatibility | Good for simple completions | Better for OpenAI-style chat API compatibility |
Details and Usage
- ollama:
  - Refers to using the classic completion endpoint of Ollama, suitable for single-prompt text completions.
  - Example model name: ollama/llama2
  - While you can send "messages" (to mimic chat), the underlying behavior is closer to a completion API found in many LLM services.
  - Limited support for advanced chat features like function-calling or multi-turn context[8][1].
- ollama_chat:
  - Targets the /api/chat endpoint of the Ollama server, which is designed for chat-style, multi-turn interactions.
  - Example model name: ollama_chat/llama3.1
- Provides enhanced compatibility with OpenAI's Chat API, making it the preferred choice when building chatbots, agents, or when using workflows that require rich, structured responses.
- Targets the
“We recommend using ollama_chat for better responses.”[1]
Technical Implementation
- The handling logic for ollama_chat in LiteLLM includes additional steps for message serialization, tool/function call conversion, and setting parameters like format, tools, or stream to optimize chat interactions[3][4].
- Token accounting and model response parsing might also differ, with ollama_chat being tailored for the interactive chat-based features.
Summary
- Use ollama when you want simpler, single-turn text completions.
- Use ollama_chat for advanced, multi-turn chat applications, function/tool calls, and when you need better structured output or OpenAI Chat API compatibility. It is the recommended choice for most modern LLM/chatbot workflows in LiteLLM[1][7].
Tip: Always check the documentation or example usage for your framework or integration, as the recommended model string (ollama_chat/<model_name>) can yield better results for interactive use cases[1][3].
Citations:
- 1: https://docs.litellm.ai/docs/providers/ollama
- 2: https://microsoft.github.io/autogen/0.2/docs/topics/non-openai-models/local-litellm-ollama/
- 3: https://github.com/BerriAI/litellm/blob/main/litellm/llms/ollama_chat.py
- 4: https://www.diffchecker.com/ACrjtf42/
- 5: https://python.langchain.com/docs/integrations/chat/ollama/
- 6: https://microsoft.github.io/autogen/stable/user-guide/core-user-guide/cookbook/local-llms-ollama-litellm.html
- 7: Wildcard not expanding for ollama_chat to find models dynamically. BerriAI/litellm#8095
- 8: https://klu.ai/glossary/ollama
- 9: https://docs.litellm.ai/docs/completion/input
- 10: https://microsoft.github.io/autogen-for-net/articles/Function-call-with-ollama-and-litellm.html
Implement ollama_chat for Enhanced Chat Support
Based on the LiteLLM docs, the ollama_chat endpoint provides multi-turn chat, function/tool calling, and better structured (JSON) outputs compared to the basic ollama completion endpoint. To leverage these chat features, please update your etc/litellm-config.yaml:
• Replace any ollama/<model> references with ollama_chat/<model> (e.g. ollama_chat/llama3.1).
• If you're specifying a provider key, it should look like: provider: ollama_chat/llama3.1
• Update any CLI examples accordingly:
  uvx litellm --config etc/litellm-config.yaml --detailed_debug
  # becomes:
  uvx litellm --config etc/litellm-config.yaml --provider ollama_chat/llama3.1 --detailed_debug
This change ensures you get the recommended chat-style behavior and structured outputs.
🤖 Prompt for AI Agents
In etc/litellm-config.yaml lines 1 to 8, the configuration currently references
the basic `ollama` provider which supports only single-turn completions. To
enable enhanced multi-turn chat features, function/tool calling, and better
structured outputs, update all `ollama/<model>` references to
`ollama_chat/<model>` (e.g., `ollama_chat/llama3.1`). Also, update any CLI usage
examples to specify the provider as `ollama_chat/<model>` to ensure the
configuration uses the recommended chat endpoint for improved responses.
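For a sense of what the switch means outside the proxy, here is a hedged sketch comparing the two prefixes via the LiteLLM Python library; it assumes `litellm` is installed and Ollama serves llama3.2 locally, and the model names are purely illustrative.

```python
import litellm

messages = [{"role": "user", "content": "Summarize what CrateDB is in one sentence."}]

# `ollama/...` goes through Ollama's completion-style endpoint.
completion_style = litellm.completion(
    model="ollama/llama3.2",
    api_base="http://localhost:11434",
    messages=messages,
)

# `ollama_chat/...` targets Ollama's /api/chat endpoint, which is what the
# TODO in etc/litellm-config.yaml recommends for better chat responses.
chat_style = litellm.completion(
    model="ollama_chat/llama3.2",
    api_base="http://localhost:11434",
    messages=messages,
)

print(completion_style.choices[0].message.content)
print(chat_style.choices[0].message.content)
```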
About
Just a sketch to relay the idea of how this would look.
References