Gibberish responses with Llama-2-13B #596

Open
rlleshi opened this issue Aug 10, 2023 · 9 comments
Labels: model (Model specific issue), quality (Quality of model output)

Comments


rlleshi commented Aug 10, 2023

I am testing this nice Python wrapper for llama.cpp, but the model's responses don't make much sense.

llm = Llama(model_path="./models/llama-2-13b.ggmlv3.q4_0.bin", n_gpu_layers=35, n_ctx=2048)
output = llm("What is the capital of Germany? Answer only with the name of the capital.", echo=True, temperature=0, max_tokens=512)

Gives the following output:

What is the capital of Germany? Answer only with the name of the capital.
What is the capital of France? Answer only with the name of the capital.
What is the capital of Italy? Answer only with the name of the capital.
What is the capital of Spain? Answer only with the name of the capital.
What is the capital of Portugal? Answer only with the name of the capital.
....

I wonder if the default hyperparameters of llama-cpp-python differ significantly from those of llama.cpp?

Either way, this kind of response shouldn't happen. I tested similar prompts and the model easily breaks down as above.

Needless to say, the responses are as expected when using llama.cpp itself.

Am I missing something?

gjmulder added the model (Model specific issue) and quality (Quality of model output) labels on Aug 10, 2023
@rd-neosoft

I had a similar issue with Hugging Face; adding repetition_penalty along with temperature, top_p, and top_k solved it for me.


rlleshi commented Aug 14, 2023

I mean, temperature shouldn't really play a role in gibberish responses, right?

Otherwise, the other parameters are left at their defaults in llama-cpp-python (and I think repetition_penalty is already applied somewhere downstream). Did you use values different from the defaults of top_k=40 and top_p=0.95?

rlleshi changed the title from "Gibberish responses" to "Gibberish responses with Llama-2-13B" on Aug 15, 2023
rlleshi mentioned this issue on Aug 15, 2023

rd-neosoft commented Aug 17, 2023

@rlleshi Yes, repetition_penalty is used; I think its default is 1.0. But when you are getting gibberish responses, it's better to try different repetition_penalty, top_k, and top_p values.
In my case repetition_penalty=1.2, top_k=10, top_p=0.95 worked for me.
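
For reference, a minimal sketch of passing those values through llama-cpp-python's completion call, reusing the llm object from the original report. Note that llama-cpp-python names the parameter repeat_penalty (repetition_penalty is the Hugging Face transformers name), and the non-zero temperature here is just an illustrative choice:

# Sketch: sampling parameters suggested above, applied to the llm from the original report.
# repeat_penalty is llama-cpp-python's counterpart to Hugging Face's repetition_penalty.
output = llm(
    "What is the capital of Germany? Answer only with the name of the capital.",
    max_tokens=512,
    temperature=0.7,      # illustrative non-zero value
    top_k=10,
    top_p=0.95,
    repeat_penalty=1.2,
)
print(output["choices"][0]["text"])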


rlleshi commented Aug 18, 2023

So I've managed to avoid the repetition and gibberish output problems. But the model's output through this Python binding is still far less robust than the output of llama.cpp itself.

Just one example below:

Prompt: What is 10+10 - 100?

llama.cpp: The calculation is as follows:\n\n10 + 10 = 20\n\n20 - 100 = -80
llama-cpp-python: 20


geoHeil commented Sep 7, 2023

Interesting - I am facing similar issues. @rlleshi were you able to fully resolve these by now?


rlleshi commented Sep 18, 2023

@geoHeil nope


nkgrush commented Sep 20, 2023

@rlleshi @geoHeil I see a problem with your Llama prompting scheme. It was poorly documented at the Llama-2 release, but the system and user prompts must always be enclosed in [INST] all text that is not generated by the model goes here [/INST]. [INST] is a special sequence (multiple tokens) that marks user instructions. Otherwise you get next-token completions rather than chat responses, which seems to be the case here, rather than an issue with llama-cpp-python.

See docs for the complete prompting scheme: https://huggingface.co/blog/llama2#how-to-prompt-llama-2
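
To make that concrete, a minimal sketch of the template from the linked blog post applied to the example prompt from this issue, assuming a Llama-2 chat model is loaded into the same llm object; the system prompt text is only a placeholder:

# Sketch of the Llama-2 chat prompt format from the linked blog post.
# The [INST]/[/INST] and <<SYS>>/<</SYS>> markers are part of the official template.
# llama-cpp-python normally prepends the BOS token itself, so the literal <s> is omitted here.
system_prompt = "You are a helpful assistant."  # placeholder system prompt
user_message = "What is the capital of Germany? Answer only with the name of the capital."

prompt = (
    f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    f"{user_message} [/INST]"
)
output = llm(prompt, temperature=0, max_tokens=512)
print(output["choices"][0]["text"])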


rlleshi commented Sep 25, 2023

@nkgrush I actually tried the official prompting, but I was getting worse results that way.

I experimented with a bunch of other prompts. What worked best was """{user_input} \n\n### Response:\n""" with stop set to ["###"], and the content of user_input formatted like this: '### {m["role"]}: {m["content"]}'.
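
A minimal sketch of how that template could be assembled; build_prompt and messages are illustrative names, not part of llama-cpp-python:

# Sketch of the template described above.
# build_prompt and messages are illustrative names for this example only.
def build_prompt(messages):
    user_input = "\n".join(f'### {m["role"]}: {m["content"]}' for m in messages)
    return f"{user_input} \n\n### Response:\n"

messages = [{"role": "user", "content": "What is the capital of Germany?"}]
output = llm(build_prompt(messages), stop=["###"], temperature=0, max_tokens=512)
print(output["choices"][0]["text"])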

@earonesty (Contributor)

Version 0.2.7 got the chat/user/sys prompting right. Later versions seem to break it. Try downgrading.
