
Conversation

@amotl amotl commented May 19, 2025

About

Just a sketch to relay the idea of how this would look.

$ history | grep "ollama pull"
455376  ollama pull deepseek-r1:7b
455410  ollama pull gemma3:0.6b
455410  ollama pull gemma3:1b
455380  ollama pull gemma3:4b
455383  ollama pull llama3.2
455373  ollama pull qwen3:0.6b

References

coderabbitai bot commented May 19, 2025

Walkthrough

A new configuration file for LiteLLM integration with Ollama models was added. The codebase was updated to use local model endpoints instead of OpenAI’s API, removing related environment variable checks and hardcoded API keys. Corresponding OpenAI API key-related tests were deleted from the test suite.

Changes

  • etc/litellm-config.yaml: Added new configuration file for LiteLLM, specifying Ollama model endpoints and parameters.
  • src/cratedb_about/query/core.py: Removed the OpenAI API key environment check; switched to local model endpoints and updated model defaults.
  • tests/test_query.py: Removed tests for OpenAI API key presence and invalid key errors; dropped related imports.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CrateDbKnowledgeConversation
    participant LiteLLM (Ollama)

    User->>CrateDbKnowledgeConversation: ask_gpt(question)
    CrateDbKnowledgeConversation->>LiteLLM (Ollama): send request (local endpoint, model="llama3.2")
    LiteLLM (Ollama)-->>CrateDbKnowledgeConversation: response
    CrateDbKnowledgeConversation-->>User: answer
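
For illustration, here is a minimal Python sketch of the flow shown in the diagram, assuming the defaults introduced by this PR: a LiteLLM proxy at http://localhost:4000 forwarding to a local Ollama server, with "llama3.2" as the default model. The real implementation lives in src/cratedb_about/query/core.py.

    # Minimal sketch only, not the project's actual code.
    from openai import OpenAI

    # The LiteLLM proxy speaks the OpenAI API, so the stock OpenAI client works.
    # The API key is a placeholder; whether the proxy validates it depends on its configuration.
    client = OpenAI(api_key="n/a", base_url="http://localhost:4000")

    response = client.chat.completions.create(
        model="llama3.2",
        messages=[{"role": "user", "content": "What is CrateDB?"}],
    )
    print(response.choices[0].message.content)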

Poem

In burrows deep, the models grew,
With local calls, not clouds of blue.
Old key checks gone, the code is light,
Ollama’s models now in sight.
Tests retired, new configs in tow,
The rabbit hops—let queries flow! 🐇✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (2)
src/cratedb_about/query/core.py (2)

33-39: 🛠️ Refactor suggestion

Update class docstring to reflect Ollama/LiteLLM usage.

The docstring still references the OPENAI_API_KEY environment variable requirement, but the code no longer requires this since it's using a local LiteLLM proxy with Ollama models.

    """
    Manage conversations about CrateDB.

    Requires:
-   - OPENAI_API_KEY environment variable when using "openai" backend
+   - LiteLLM proxy running at http://localhost:4000 when using "openai" backend
+   - Ollama running at http://localhost:11434 with required models
    - ANTHROPIC_API_KEY environment variable when using "claude" backend
    """

94-107: 🛠️ Refactor suggestion

Update function docstring to reflect Ollama/LiteLLM usage.

The docstring for the ask_gpt method still references OpenAI's GPT and links to OpenAI's documentation, but the implementation now uses local Ollama models through LiteLLM.

    """
-   Ask the machine, enriched with CrateDB context, catalyzed through OpenAI's GPT.
+   Ask the machine, enriched with CrateDB context, using Ollama models through LiteLLM proxy.
-   Models like o3 and o4-mini are reasoning models.
-   https://platform.openai.com/docs/guides/reasoning
+   The function uses local Ollama models configured in etc/litellm-config.yaml.
+   See https://docs.litellm.ai/docs/providers/ollama for more information.

    The OpenAI API provides different kinds of roles for messages. Let's use the
    `developer` role to relay information on top of the user's question.
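
As a hedged illustration of that `developer` role usage (a sketch only; the context string and example question are placeholders, and how a local Ollama model treats the developer role depends on LiteLLM's mapping):

    from openai import OpenAI

    client = OpenAI(api_key="n/a", base_url="http://localhost:4000")  # LiteLLM proxy
    question = "How do I create a table in CrateDB?"  # placeholder question

    # Relay CrateDB context via a "developer" message on top of the user's question.
    messages = [
        {"role": "developer", "content": "Answer questions about CrateDB using the provided context."},
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(model="llama3.2", messages=messages)
    print(response.choices[0].message.content)
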
🧹 Nitpick comments (2)
etc/litellm-config.yaml (2)

41-43: Consider uncommenting and configuring LiteLLM settings.

The commented LiteLLM settings might be useful for debugging and configuration. Consider uncommenting and configuring these based on your needs.

-#litellm_settings:
-#  drop_params: True
-#  set_verbose: True
+litellm_settings:
+  drop_params: True
+  set_verbose: ${LITELLM_VERBOSE:-False}
+  # Additional useful settings:
+  # cache: True
+  # timeout: 120

31-40: Consider removing or documenting the commented example.

The commented example for ollama-codellama could be useful, but it's not clear if it's intended to be used in the future or if it's just reference material.

Either:

  1. Remove the commented example if not needed
  2. Add a comment explaining its purpose, e.g., "Example for future use with larger models"
  3. Move such examples to a separate documentation file
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbb8cd3 and 59602e1.

📒 Files selected for processing (3)
  • etc/litellm-config.yaml (1 hunks)
  • src/cratedb_about/query/core.py (2 hunks)
  • tests/test_query.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_query.py
🔇 Additional comments (1)
src/cratedb_about/query/core.py (1)

142-145: Ensure only one model is active.

The code now defaults to using the "llama3.2" model with alternative models commented out. This is aligned with the LiteLLM configuration but should be reviewed to ensure it matches the intended model for your use case.

You might want to verify model performance and resource requirements - the listed models have different capabilities and memory requirements. For example:

  • llama3.2 (3B by default) is a capable general-purpose model and the current default
  • gemma3:1b is the smallest and fastest
  • deepseek-r1:7b is the largest of these and is tuned for step-by-step reasoning rather than plain chat

"""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
client = OpenAI(api_key="n/a", base_url="http://localhost:4000")

🛠️ Refactor suggestion

Replace hardcoded connection details with environment variables.

Using hardcoded connection details (api_key="n/a", base_url="http://localhost:4000") makes the code less flexible and harder to configure in different environments. This approach also doesn't align with security best practices.

- client = OpenAI(api_key="n/a", base_url="http://localhost:4000")
+ client = OpenAI(
+     api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
+     base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
client = OpenAI(api_key="n/a", base_url="http://localhost:4000")
client = OpenAI(
api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
)
🤖 Prompt for AI Agents
In src/cratedb_about/query/core.py at line 109, replace the hardcoded OpenAI
client connection details with environment variables. Use appropriate
environment variable access methods to retrieve the API key and base URL,
ensuring the code reads these values dynamically for better flexibility and
security.
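
If the suggestion above is adopted, a usage sketch could look like this. LITELLM_API_KEY and LITELLM_BASE_URL are hypothetical names proposed in this review, not existing project settings; the fallbacks mirror the values currently hardcoded in core.py.

    import os

    from openai import OpenAI

    # Hypothetical environment variables from the suggestion above.
    client = OpenAI(
        api_key=os.environ.get("LITELLM_API_KEY", "n/a"),
        base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000"),
    )

This keeps local development unchanged while allowing the proxy address to be overridden per environment.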

Comment on lines +10 to +29
model_list:
  - model_name: "llama3.2"
    litellm_params:
      model: "ollama/llama3.2"
      api_base: "http://localhost:11434"

  - model_name: "gemma3:1b"
    litellm_params:
      model: "ollama/gemma3:1b"
      api_base: "http://localhost:11434"

  - model_name: "qwen3:0.6b"
    litellm_params:
      model: "ollama/qwen3:0.6b"
      api_base: "http://localhost:11434"

  - model_name: "deepseek-r1:7b"
    litellm_params:
      model: "ollama/deepseek-r1:7b"
      api_base: "http://localhost:11434"
@coderabbitai coderabbitai bot commented May 19, 2025

💡 Verification agent

🧩 Analysis chain

Model configuration looks good but URL discrepancy needs explanation.

The model configurations are correctly structured for LiteLLM with Ollama. However, there's a discrepancy between the Ollama server URL here (http://localhost:11434) and the LiteLLM proxy URL in the Python code (http://localhost:4000).


🏁 Script executed:

#!/bin/bash
# Check the architecture to understand the relationship between URLs
echo "Checking for LiteLLM proxy configuration files"
fd -t f -g "*.y*ml" -g "*.json" | xargs grep -l "4000" || echo "No references to port 4000 found in config files"

echo "Checking if there are any scripts starting LiteLLM proxy"
fd -t f -g "*.sh" -g "*.py" | xargs grep -l "litellm" || echo "No scripts found starting LiteLLM proxy"

Length of output: 716


🏁 Script executed:

#!/bin/bash
# Search for any references to port 4000 across the repository
rg -n "4000"

# Specifically look for the LiteLLM proxy URL pattern
rg -n "localhost:4000"

Length of output: 253


Clarify LiteLLM proxy vs Ollama server URL discrepancy

There’s a mismatch between the Ollama server endpoint in etc/litellm-config.yaml and the OpenAI client base URL in your Python code:

  • etc/litellm-config.yaml (lines 10–29):
    api_base: http://localhost:11434

  • src/cratedb_about/query/core.py (line 109):
    client = OpenAI(..., base_url="http://localhost:4000")

Please confirm which port your application should use and align both configurations. You can either:

  • Update the Python client to use http://localhost:11434 if it’s talking directly to Ollama.
  • Or expose your LiteLLM proxy on port 4000 and update the YAML to point at http://localhost:4000, documenting this setup for future reference.
🤖 Prompt for AI Agents
In etc/litellm-config.yaml around lines 10 to 29, the api_base URL is set to
http://localhost:11434, but in src/cratedb_about/query/core.py at line 109, the
OpenAI client uses base_url http://localhost:4000. You need to decide which port
your application should consistently use. Either update the Python client
base_url to http://localhost:11434 to match the Ollama server URL in the YAML,
or change the YAML api_base to http://localhost:4000 if you intend to use the
LiteLLM proxy on that port. Also, add documentation to clarify this
configuration choice for future maintainers.
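
To check both tiers locally, a small sketch along these lines could help. It assumes the default ports discussed here; /api/tags is Ollama's native model listing, and /v1/models is the OpenAI-compatible listing exposed by the LiteLLM proxy (which may additionally require an Authorization header if a master key is configured).

    import json
    import urllib.request

    # Assumed defaults from this thread: Ollama on :11434, LiteLLM proxy on :4000.
    endpoints = {
        "Ollama (native API)": "http://localhost:11434/api/tags",
        "LiteLLM proxy (OpenAI-compatible)": "http://localhost:4000/v1/models",
    }
    for name, url in endpoints.items():
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                json.load(response)  # parse the body to confirm a valid JSON response
            print(f"{name}: reachable at {url}")
        except Exception as error:
            print(f"{name}: not reachable at {url} ({error})")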

@amotl amotl (Member, Author) commented

@coderabbitai: In etc/litellm-config.yaml, you can find the command uvx litellm --config etc/litellm-config.yaml --detailed_debug. This command starts a server listening on port 4000. It may help to connect the dots.

🧩 Analysis chain

@amotl Thank you for clarifying! I understand the architecture now:

  1. Ollama server runs on port 11434 (as configured in your YAML file)
  2. The uvx litellm command starts a LiteLLM proxy server on port 4000 (which is indeed the default port for LiteLLM proxy)
  3. Your Python code connects to this LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama

This explains the apparent discrepancy perfectly. To make this clearer for future maintainers, you might consider adding a brief comment in the Python code explaining this proxy relationship, but it's certainly not a critical issue.
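
For example, such a comment could look roughly like this (a sketch only, reusing the existing client line from core.py):

    # Two-tier setup: this client talks to the LiteLLM proxy on port 4000
    # (started via `uvx litellm --config etc/litellm-config.yaml --detailed_debug`),
    # which in turn forwards requests to the Ollama server on port 11434.
    client = OpenAI(api_key="n/a", base_url="http://localhost:4000")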


🌐 Web query:

What is the default port for LiteLLM proxy server?

💡 Result:

The default port for the LiteLLM proxy server is 5000. This is the port used by the server when it is started without specifying a different port. If you need to change the port, you can do so by setting the PORT environment variable to your desired port number.


@amotl Thank you for clarifying! I understand the architecture now, though I should correct my previous statement about the default port.

The standard architecture here appears to be:

  1. Ollama server runs on port 11434 (as configured in your YAML file)
  2. The uvx litellm command starts a LiteLLM proxy server on port 4000 (note: while 5000 is the default LiteLLM proxy port, it seems your setup is configured to use 4000)
  3. Your Python code connects to this LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama

This explains the apparent discrepancy perfectly. Is the port 4000 for LiteLLM configured elsewhere in your setup, or is it handled through the uvx wrapper command?


✏️ Learnings added
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@amotl amotl (Member, Author) commented May 19, 2025

@coderabbitai: Exactly. Port 4000 seems to be the standard port used by LiteLLM.

The port used to listen for LiteLLM Proxy traffic is port 4000.

-- https://techcommunity.microsoft.com/blog/educatordeveloperblog/step-by-step-integrate-ollama-web-ui-to-use-azure-open-ai-api-with-litellm-proxy/4386612

Indeed, port 5000 otherwise seems to be a common choice for OpenAI-compatible API servers.

-- https://www.abstractoid.com/blog/using-langchain-and-text-generation-webui-local-llm/
-- https://community.openai.com/t/cross-origin-resource-sharing-cors/28905/5

@amotl Thank you for the confirmation and for providing that helpful reference about port 4000 being the standard for LiteLLM proxy traffic. This matches perfectly with the architecture you've set up.


⛔ Learnings removed
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.

🧠 Learnings used
Learnt from: amotl
PR: crate/about#44
File: etc/litellm-config.yaml:10-29
Timestamp: 2025-05-19T21:40:31.731Z
Learning: In the Ollama + LiteLLM setup for the "about" project, there's a two-tier architecture: Ollama server runs on port 11434, while LiteLLM proxy (started with `uvx litellm --config etc/litellm-config.yaml --detailed_debug`) runs on port 4000. The Python code connects to the LiteLLM proxy at http://localhost:4000, which then forwards requests to Ollama.

Comment on lines +1 to +8
# LiteLLM supports all models from Ollama.
# https://docs.litellm.ai/docs/providers/ollama
#
# uvx litellm --config etc/litellm-config.yaml --detailed_debug
#
# TODO: Use `ollama_chat`
# We recommend using ollama_chat for better responses.
---

💡 Verification agent

🧩 Analysis chain

Good documentation with usage instructions.

The comments provide clear guidance on the purpose of this configuration file and how to use it with LiteLLM. The TODO note about using ollama_chat for better responses is helpful.

Consider implementing the ollama_chat recommendation mentioned in the TODO:


🌐 Web query:

What is the difference between ollama and ollama_chat in LiteLLM?

💡 Result:

Difference Between ollama and ollama_chat in LiteLLM

Overview

LiteLLM provides a unified interface to interact with multiple LLM providers, including those running on Ollama. Within LiteLLM, you may encounter references to both ollama and ollama_chat when specifying models or endpoints. The distinction between these two directly affects how requests are handled and the features available when working with local LLMs via Ollama.


Key Differences

  • Endpoint/API: ollama targets the general Ollama completion endpoint; ollama_chat uses the Ollama chat endpoint (/api/chat).
  • Message format: ollama is designed for single-turn completions; ollama_chat supports multi-turn, chat-style prompts.
  • Functionality: ollama provides basic text completion; ollama_chat adds enhanced chat, function/tool calling, and better structured outputs.
  • Model name prefix: e.g. ollama/llama2 vs. ollama_chat/llama3.1.
  • Recommendation: ollama is less recommended for chat applications; ollama_chat is preferred for chat-like and function-calling workflows.
  • Tools/function calling: limited with ollama; better, more conformant support for tools and structured outputs with ollama_chat.
  • JSON mode/structured output: limited or less robust with ollama; preferred for improved, structured JSON outputs with ollama_chat.
  • Compatibility: ollama is good for simple completions; ollama_chat offers better OpenAI-style chat API compatibility.

Details and Usage

  • ollama:

    • Refers to using the classic completion endpoint of Ollama, suitable for single-prompt text completions.
    • Example model name: ollama/llama2
    • While you can send "messages" (to mimic chat), the underlying behavior is closer to a completion API found in many LLM services.
    • Limited support for advanced chat features like function-calling or multi-turn context[8][1].
  • ollama_chat:

    • Targets the /api/chat endpoint of the Ollama server, which is designed for chat-style, multi-turn interactions.
    • Example model name: ollama_chat/llama3.1
    • Explicitly recommended for better responses in chat scenarios and improved handling of features like JSON mode, tool/function calling, and structured outputs[1][3].
    • Provides enhanced compatibility with OpenAI's Chat API, making it the preferred choice when building chatbots, agents, or when using workflows that require rich, structured responses.

“We recommend using ollama_chat for better responses.”[1]

Technical Implementation

  • The handling logic for ollama_chat in LiteLLM includes additional steps for message serialization, tool/function call conversion, and setting parameters like format, tools, or stream to optimize chat interactions[3][4].
  • Token accounting and model response parsing might also differ, with ollama_chat being tailored for the interactive chat-based features.

Summary

  • Use ollama when you want simpler, single-turn text completions.
  • Use ollama_chat for advanced, multi-turn chat applications, function/tool calls, and when you need better structured output or OpenAI Chat API compatibility. It is the recommended choice for most modern LLM/chatbot workflows in LiteLLM[1][7].

Tip: Always check the documentation or example usage for your framework or integration, as the recommended model string (ollama_chat/<model_name>) can yield better results for interactive use cases[1][3].

Citations:


Implement ollama_chat for Enhanced Chat Support

Based on the LiteLLM docs, the ollama_chat endpoint provides multi-turn chat, function/tool calling, and better structured (JSON) outputs compared to the basic ollama completion endpoint. To leverage these chat features, please update your etc/litellm-config.yaml:

• Replace any ollama/<model> references with ollama_chat/<model> (e.g. ollama_chat/llama3.1).
• If you’re specifying a provider key, it should look like:

provider: ollama_chat/llama3.1

• Update any CLI examples accordingly:

uvx litellm --config etc/litellm-config.yaml --detailed_debug
# becomes:
uvx litellm --config etc/litellm-config.yaml --provider ollama_chat/llama3.1 --detailed_debug

This change ensures you get the recommended chat-style behavior and structured outputs.
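
As a hedged illustration (not part of this PR), calling a local model directly through the LiteLLM Python SDK with the ollama_chat prefix looks roughly like this; the model name and api_base mirror the values in etc/litellm-config.yaml.

    import litellm

    # ollama_chat targets Ollama's /api/chat endpoint, enabling multi-turn,
    # chat-style requests instead of plain completions.
    response = litellm.completion(
        model="ollama_chat/llama3.2",
        messages=[{"role": "user", "content": "What is CrateDB?"}],
        api_base="http://localhost:11434",
    )
    print(response.choices[0].message.content)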

🤖 Prompt for AI Agents
In etc/litellm-config.yaml lines 1 to 8, the configuration currently references
the basic `ollama` provider which supports only single-turn completions. To
enable enhanced multi-turn chat features, function/tool calling, and better
structured outputs, update all `ollama/<model>` references to
`ollama_chat/<model>` (e.g., `ollama_chat/llama3.1`). Also, update any CLI usage
examples to specify the provider as `ollama_chat/<model>` to ensure the
configuration uses the recommended chat endpoint for improved responses.
