Replies: 2 comments
-
I discovered llama-server also offers a web UI :P The output seems OK here. Are there any Python libs to talk to this API? My question is basically "How do I use llama.cpp from another program?"
-
OK, so this is an "OpenAI-compatible API":

from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="kanker")
response = client.chat.completions.create(
[...]

That works fine. For anyone reading, something like this helps (more useful than the llama.cpp docs): https://blog.steelph0enix.dev/posts/llama-cpp-guide/ Thanks for coming to my TED talk!
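For reference, a complete version of that call might look roughly like the sketch below. It is untested, and the model name, messages, and api_key value are all placeholders; as far as I know llama-server ignores the model field and accepts any non-empty key unless the server was started with an API key.

```python
from openai import OpenAI  # pip install openai

# llama-server exposes an OpenAI-compatible API under /v1.
# The api_key only needs to be a non-empty string unless the server
# was configured to require a real key.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="sk-local")

response = client.chat.completions.create(
    model="local-model",  # placeholder; the server answers with whatever model it loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this conversation into 4 to 7 bullet points."},
    ],
    temperature=0.7,
    top_p=0.9,
)

print(response.choices[0].message.content)
```

Streaming should work the same way as with the hosted OpenAI API (pass stream=True and iterate over the chunks), though I have not verified that against llama-server.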
-
Sorry, complete noob here.

- llama-cli works great: it summarizes a conversation for me from an input file (conversation.txt).
- llama-server has wildly different output and does not seem to follow my prompt exactly.

I'd like to use llama.cpp from a program, and I don't want to call the llama-cli binary each time. The correct output should just be some bullet points, which works in llama-cli, but not in llama-server.

CLI example

./build/bin/llama-cli -m qwen3-06.gguf \
  --prompt "You are a summarizer, summarize the following text conversation into points, use minimum of 4 points, and a maximum of 7 points. Output the key takeaways of the conversation:\n\n$(cat conversation.txt)" \
  --temp 0.7 \
  --top_p 0.9 \
  --repeat_penalty 1.1
input (conversation.txt, truncated)

output
This is OK.
Server example
running
input
output
The output starts with:
And much more text. What makes it do that? How do I get only the bullet points, as with the CLI? (One possible workaround is sketched below.)
(In addition, I am also open to including llama.h directly into my C++ program if that is easier.)
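One likely reason for the difference, plus a possible workaround (an untested sketch, so treat the endpoint and field names as assumptions to verify against your server's README): the server's /v1/chat/completions route wraps the request in the model's chat template before generating, which can change the output style a lot compared with feeding the raw prompt. llama-server also has a native /completion endpoint that takes a raw prompt and sampling parameters directly, so something like this should behave more like the CLI invocation above:

```python
import requests  # pip install requests

# Reuse the same conversation file and prompt as the llama-cli example.
with open("conversation.txt", "r", encoding="utf-8") as f:
    conversation = f.read()

prompt = (
    "You are a summarizer, summarize the following text conversation into points, "
    "use minimum of 4 points, and a maximum of 7 points. "
    "Output the key takeaways of the conversation:\n\n" + conversation
)

# llama-server's native completion endpoint accepts a raw prompt plus sampling
# parameters; the field names below mirror the CLI flags but should be checked
# against the server documentation for your build.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": prompt,
        "temperature": 0.7,
        "top_p": 0.9,
        "repeat_penalty": 1.1,
        "n_predict": 512,  # arbitrary cap on generated tokens (assumption)
    },
    timeout=300,
)
resp.raise_for_status()

# In non-streaming mode the generated text should be in the "content" field.
print(resp.json()["content"])
```

If you would rather stay on the OpenAI-compatible chat endpoint, the closest equivalent is to put the summarization instructions in a system message and the conversation text in a user message, but the chat template (and, for some models, "thinking" output) will still shape the response.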