43 changes: 43 additions & 0 deletions etc/litellm-config.yaml
@@ -0,0 +1,43 @@
# LiteLLM supports all models from Ollama.
# https://docs.litellm.ai/docs/providers/ollama
#
# uvx litellm --config etc/litellm-config.yaml --detailed_debug
#
# TODO: Use `ollama_chat`
# We recommend using ollama_chat for better responses.
---
Comment on lines +1 to +8
💡 Verification agent

🧩 Analysis chain

Good documentation with usage instructions.

The comments provide clear guidance on the purpose of this configuration file and how to use it with LiteLLM. The TODO note about using ollama_chat for better responses is helpful.

Consider implementing the ollama_chat recommendation mentioned in the TODO:


🌐 Web query:

What is the difference between ollama and ollama_chat in LiteLLM?

💡 Result:

Difference Between ollama and ollama_chat in LiteLLM

Overview

LiteLLM provides a unified interface to interact with multiple LLM providers, including those running on Ollama. Within LiteLLM, you may encounter references to both ollama and ollama_chat when specifying models or endpoints. The distinction between these two directly affects how requests are handled and the features available when working with local LLMs via Ollama.


Key Differences

  • Endpoint/API: ollama targets the general Ollama completion endpoint; ollama_chat uses the Ollama chat endpoint (/api/chat).
  • Message format: ollama is designed for single-turn completions; ollama_chat supports multi-turn, chat-style prompts.
  • Functionality: ollama provides basic text completion; ollama_chat adds enhanced chat, function/tool calling, and better structured outputs.
  • Model name prefix: e.g., ollama/llama2 versus ollama_chat/llama3.1.
  • Recommendation: ollama is less recommended for chat applications; ollama_chat is preferred for chat-like and function-calling workflows.
  • Tools/function calling: limited with ollama; better, conformant support for tools and structured outputs with ollama_chat.
  • JSON mode / structured output: limited or less robust with ollama; preferred for improved, structured JSON outputs with ollama_chat.
  • Compatibility: ollama is good for simple completions; ollama_chat offers better OpenAI-style chat API compatibility.

Details and Usage

  • ollama:

    • Refers to using the classic completion endpoint of Ollama, suitable for single-prompt text completions.
    • Example model name: ollama/llama2
    • While you can send "messages" (to mimic chat), the underlying behavior is closer to a completion API found in many LLM services.
    • Limited support for advanced chat features like function-calling or multi-turn context[8][1].
  • ollama_chat:

    • Targets the /api/chat endpoint of the Ollama server, which is designed for chat-style, multi-turn interactions.
    • Example model name: ollama_chat/llama3.1
    • Explicitly recommended for better responses in chat scenarios and improved handling of features like JSON mode, tool/function calling, and structured outputs[1][3].
    • Provides enhanced compatibility with OpenAI's Chat API, making it the preferred choice when building chatbots, agents, or when using workflows that require rich, structured responses.

“We recommend using ollama_chat for better responses.”[1]

Technical Implementation

  • The handling logic for ollama_chat in LiteLLM includes additional steps for message serialization, tool/function call conversion, and setting parameters like format, tools, or stream to optimize chat interactions[3][4].
  • Token accounting and model response parsing might also differ, with ollama_chat being tailored for the interactive chat-based features.

Summary

  • Use ollama when you want simpler, single-turn text completions.
  • Use ollama_chat for advanced, multi-turn chat applications, function/tool calls, and when you need better structured output or OpenAI Chat API compatibility. It is the recommended choice for most modern LLM/chatbot workflows in LiteLLM[1][7].

Tip: Always check the documentation or example usage for your framework or integration, as the recommended model string (ollama_chat/<model_name>) can yield better results for interactive use cases[1][3].

Implement ollama_chat for Enhanced Chat Support

Based on the LiteLLM docs, the ollama_chat endpoint provides multi-turn chat, function/tool calling, and better structured (JSON) outputs compared to the basic ollama completion endpoint. To leverage these chat features, please update your etc/litellm-config.yaml:

• Replace any ollama/<model> references with ollama_chat/<model> (e.g. ollama_chat/llama3.2).
• In each model_list entry, the litellm_params model key would then read:

model: "ollama_chat/llama3.2"

• The CLI invocation stays the same, since the proxy selects the chat endpoint from the model prefix in the configuration:

uvx litellm --config etc/litellm-config.yaml --detailed_debug

This change ensures you get the recommended chat-style behavior and structured outputs.
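
For illustration, here is a minimal sketch of what the prefix change means at the SDK level, assuming the litellm Python package is installed and a local Ollama server with llama3.2 pulled is listening on port 11434 (the proxy configuration in this PR achieves the same routing declaratively):

# Minimal sketch: the same prompt routed completion-style vs. chat-style.
# Assumptions: `pip install litellm`, local Ollama on :11434, llama3.2 pulled.
from litellm import completion

messages = [{"role": "user", "content": "Name one CrateDB feature."}]

# `ollama/` prefix: classic completion endpoint.
resp_completion = completion(
    model="ollama/llama3.2",
    messages=messages,
    api_base="http://localhost:11434",
)

# `ollama_chat/` prefix: chat endpoint, recommended for multi-turn and tool calling.
resp_chat = completion(
    model="ollama_chat/llama3.2",
    messages=messages,
    api_base="http://localhost:11434",
)

print(resp_chat.choices[0].message.content)

Both calls return OpenAI-style responses; only the Ollama endpoint they are routed to differs.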

🤖 Prompt for AI Agents
In etc/litellm-config.yaml lines 1 to 8, the configuration currently references
the basic `ollama` provider which supports only single-turn completions. To
enable enhanced multi-turn chat features, function/tool calling, and better
structured outputs, update all `ollama/<model>` references to
`ollama_chat/<model>` (e.g., `ollama_chat/llama3.2`). The CLI usage examples can
stay unchanged, since the chat endpoint is selected by the model prefix in the
configuration.


model_list:
- model_name: "llama3.2"
litellm_params:
model: "ollama/llama3.2"
api_base: "http://localhost:11434"

- model_name: "gemma3:1b"
litellm_params:
model: "ollama/gemma3:1b"
api_base: "http://localhost:11434"

- model_name: "qwen3:0.6b"
litellm_params:
model: "ollama/qwen3:0.6b"
api_base: "http://localhost:11434"

- model_name: "deepseek-r1:7b"
litellm_params:
model: "ollama/deepseek-r1:7b"
api_base: "http://localhost:11434"
Comment on lines +10 to +29
@coderabbitai coderabbitai bot May 19, 2025

💡 Verification agent

🧩 Analysis chain

Model configuration looks good but URL discrepancy needs explanation.

The model configurations are correctly structured for LiteLLM with Ollama. However, there's a discrepancy between the Ollama server URL here (http://localhost:11434) and the LiteLLM proxy URL in the Python code (http://localhost:4000).


🏁 Script executed:

#!/bin/bash
# Check the architecture to understand the relationship between URLs
echo "Checking for LiteLLM proxy configuration files"
fd -t f -g "*.y*ml" -g "*.json" | xargs grep -l "4000" || echo "No references to port 4000 found in config files"

echo "Checking if there are any scripts starting LiteLLM proxy"
fd -t f -g "*.sh" -g "*.py" | xargs grep -l "litellm" || echo "No scripts found starting LiteLLM proxy"

Length of output: 716


🏁 Script executed:

#!/bin/bash
# Search for any references to port 4000 across the repository
rg -n "4000"

# Specifically look for the LiteLLM proxy URL pattern
rg -n "localhost:4000"

Length of output: 253


Clarify LiteLLM proxy vs Ollama server URL discrepancy

There’s a mismatch between the Ollama server endpoint in etc/litellm-config.yaml and the OpenAI client base URL in your Python code:

  • etc/litellm-config.yaml (lines 10–29):
    api_base: http://localhost:11434

  • src/cratedb_about/query/core.py (line 109):
    client = OpenAI(..., base_url="http://localhost:4000")

Please confirm which port your application should use and align both configurations. You can either:

  • Update the Python client to use http://localhost:11434 if it’s talking directly to Ollama.
  • Or expose your LiteLLM proxy on port 4000 and update the YAML to point at http://localhost:4000, documenting this setup for future reference.
🤖 Prompt for AI Agents
In etc/litellm-config.yaml around lines 10 to 29, the api_base URL is set to
http://localhost:11434, but in src/cratedb_about/query/core.py at line 109, the
OpenAI client uses base_url http://localhost:4000. You need to decide which port
your application should consistently use. Either update the Python client
base_url to http://localhost:11434 to match the Ollama server URL in the YAML,
or change the YAML api_base to http://localhost:4000 if you intend to use the
LiteLLM proxy on that port. Also, add documentation to clarify this
configuration choice for future maintainers.

Member Author

@coderabbitai: In etc/litellm-config.yaml, you can find the command uvx litellm --config etc/litellm-config.yaml --detailed_debug. This command starts a server listening on port 4000, which may help to connect the dots.


🧩 Analysis chain

@amotl Thank you for clarifying! I understand the architecture now:

  1. Ollama server runs on port 11434 (as configured in your YAML file)
  2. The uvx litellm command starts a LiteLLM proxy server on port 4000 (which is indeed the default port for LiteLLM proxy)
  3. Your Python code connects to this LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama

This explains the apparent discrepancy perfectly. To make this clearer for future maintainers, you might consider adding a brief comment in the Python code explaining this proxy relationship, but it's certainly not a critical issue.
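
To make that request path concrete, here is a minimal sketch of the client side, assuming the LiteLLM proxy from etc/litellm-config.yaml is already running on port 4000 (the model name comes from the model_list above; the actual query code in this PR may use a different OpenAI SDK call):

# Sketch of the two-tier path: this client talks to the LiteLLM proxy on :4000,
# which forwards the request to the Ollama server on :11434.
from openai import OpenAI

client = OpenAI(api_key="n/a", base_url="http://localhost:4000")  # proxy, not Ollama

response = client.chat.completions.create(
    model="llama3.2",  # a model_name defined in etc/litellm-config.yaml
    messages=[{"role": "user", "content": "Does CrateDB provide AUTOINCREMENT?"}],
)
print(response.choices[0].message.content)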


🌐 Web query:

What is the default port for LiteLLM proxy server?

💡 Result:

The default port for the LiteLLM proxy server is 5000. This is the port used by the server when it is started without specifying a different port. If you need to change the port, you can do so by setting the PORT environment variable to your desired port number.


@amotl Thank you for clarifying! I understand the architecture now, though I should correct my previous statement about the default port.

The standard architecture here appears to be:

  1. Ollama server runs on port 11434 (as configured in your YAML file)
  2. The uvx litellm command starts a LiteLLM proxy server on port 4000 (note: while 5000 is the default LiteLLM proxy port, it seems your setup is configured to use 4000)
  3. Your Python code connects to this LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama

This explains the apparent discrepancy perfectly. Is the port 4000 for LiteLLM configured elsewhere in your setup, or is it handled through the uvx wrapper command?


✏️ Learnings added
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Member Author

@amotl amotl May 19, 2025

@coderabbitai: Exactly. Port 4000 seems to be the standard port used by LiteLLM.

The port used to listen for LiteLLM Proxy traffic is port 4000.

-- https://techcommunity.microsoft.com/blog/educatordeveloperblog/step-by-step-integrate-ollama-web-ui-to-use-azure-open-ai-api-with-litellm-proxy/4386612

Indeed, port 5000 seems to be a common choice for OpenAI-compatible API servers otherwise.

-- https://www.abstractoid.com/blog/using-langchain-and-text-generation-webui-local-llm/
-- https://community.openai.com/t/cross-origin-resource-sharing-cors/28905/5


@amotl Thank you for the confirmation and for providing that helpful reference about port 4000 being the standard for LiteLLM proxy traffic. This matches perfectly with the architecture you've set up.


⛔ Learnings removed
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.

🧠 Learnings used
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.


# https://github.com/BerriAI/litellm/issues/1517#issuecomment-1922022209
#model_list:
# - model_name: ollama-codellama
# litellm_params:
# model: ollama/codellama:70b
# api_base: http://0.0.0.0:11434
# rpm: 1440
# model_info:
# version: 2

#litellm_settings:
# drop_params: True
# set_verbose: True
12 changes: 6 additions & 6 deletions src/cratedb_about/query/core.py
@@ -50,10 +50,6 @@
raise ImportError("The 'openai' package is required when using the OpenAI backend")
if self.backend == "claude" and not CLAUDE_AVAILABLE:
raise ImportError("The 'claudette' package is required when using the Claude backend")
if self.backend == "openai" and not os.environ.get("OPENAI_API_KEY"):
raise ValueError(
"OPENAI_API_KEY environment variable is required when using 'openai' backend"
)
if self.backend == "claude" and not os.environ.get("ANTHROPIC_API_KEY"):
raise ValueError(
"ANTHROPIC_API_KEY environment variable is required when using 'claude' backend"
@@ -110,7 +106,7 @@
- https://community.openai.com/t/how-is-developer-message-better-than-system-prompt/1062784
"""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
client = OpenAI(api_key="n/a", base_url="http://localhost:4000")

Check warning (Codecov / codecov/patch) on line 109 in src/cratedb_about/query/core.py: Added line #L109 was not covered by tests.

🛠️ Refactor suggestion

Replace hardcoded connection details with environment variables.

Using hardcoded connection details (api_key="n/a", base_url="http://localhost:4000") makes the code less flexible and harder to configure in different environments. This approach also doesn't align with security best practices.

- client = OpenAI(api_key="n/a", base_url="http://localhost:4000")
+ client = OpenAI(
+     api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
+     base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
client = OpenAI(api_key="n/a", base_url="http://localhost:4000")
client = OpenAI(
api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
)
🤖 Prompt for AI Agents
In src/cratedb_about/query/core.py at line 109, replace the hardcoded OpenAI
client connection details with environment variables. Use appropriate
environment variable access methods to retrieve the API key and base URL,
ensuring the code reads these values dynamically for better flexibility and
security.
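
As a usage note on the suggestion above, a short sketch of how the environment-driven configuration could be exercised at runtime; LITELLM_API_KEY and LITELLM_BASE_URL are the variable names proposed in this comment, not ones the project currently defines:

# Hypothetical usage of the env-driven client construction proposed above.
# Defaults target the local LiteLLM proxy; both values can be overridden, e.g.
#   LITELLM_BASE_URL=http://other-host:4000 LITELLM_API_KEY=sk-... python query.py
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
    base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000"),
)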


input_messages: t.List[Message] = []

@@ -140,9 +136,13 @@
)

# model = "gpt-4o" # noqa: ERA001
model = "gpt-4.1" # noqa: ERA001
# model = "gpt-4.1" # noqa: ERA001
# model = "o4-mini" # noqa: ERA001
# model = "o3" # noqa: ERA001
model = "llama3.2" # noqa: ERA001

Check warning (Codecov / codecov/patch) on line 142 in src/cratedb_about/query/core.py: Added line #L142 was not covered by tests.
# model = "gemma3:1b" # noqa: ERA001
# model = "qwen3:0.6b" # noqa: ERA001
# model = "deepseek-r1:7b" # noqa: ERA001
reasoning = None
if model == "o4-mini":
reasoning = Reasoning(
21 changes: 0 additions & 21 deletions tests/test_query.py
@@ -2,7 +2,6 @@

import hishel
import httpx
import openai
import pytest

from cratedb_about import CrateDbKnowledgeConversation
@@ -54,26 +53,6 @@ def test_example_question():
assert "How to enumerate active jobs?" in Example.questions


def test_ask_openai_no_api_key():
"""
Validate inquiry with OpenAI, failing without an API key.
"""
with pytest.raises(ValueError) as ex:
CrateDbKnowledgeConversation()
assert ex.match("OPENAI_API_KEY environment variable is required when using 'openai' backend")


def test_ask_openai_invalid_api_key(mocker):
"""
Validate inquiry with OpenAI, failing when using an invalid API key.
"""
mocker.patch.dict("os.environ", {"OPENAI_API_KEY": "foo"})
knowledge = CrateDbKnowledgeConversation()
with pytest.raises(openai.AuthenticationError) as ex:
knowledge.ask("CrateDB does not seem to provide an AUTOINCREMENT feature?")
assert ex.match("Incorrect API key provided: foo")


@pytest.mark.skipif(sys.version_info < (3, 10), reason="requires Python 3.10 or higher")
def test_ask_claude_no_api_key():
"""