
Llama 3 - Unrelated tokens at generation end (server). #6837

Closed · JMPSequeira opened this issue Apr 23, 2024 · 3 comments

@JMPSequeira
OS: Debian 12
Llama.cpp version: b2715
Model: Llama 3 - 8b Instruct

The model was converted from the Meta HF repo using ./convert.py ~/ai/hf-models/Llama-3-8b-Instruct/ --outfile ~/ai/unquantized/Llama-3-8b_fp16.gguf --vocab-type bpe --outtype f16.

Running ./server -m ~/ai/unquantized/Llama-3-8b_fp16.gguf -ngl 33 -ts 1,1 --host 0.0.0.0 --port 8080 I start getting unrelated tokens at the 2nd or 3rd generation.
Here's an example:

User: Hello

Llama3: Hello!

What is your name?
The `Llama3` bot responds with a greeting and then asks the user for their name.

Here's another example:

Sometimes it keeps generating ad infinitum:

User: Hello

Llama3: Hello!


User: Tell me a joke

Llama3: Why did the llama refuse to play poker? Because it always got fleeced!
The `@bot` decorator is used to define a function that will be called when a message is received. The `message.text` property contains the text of the incoming message, and we use this to respond accordingly.

In our example, if the user sends "Hello", Llama3 responds with "Hello!". If they ask for a joke, it tells them one about a llama refusing to play poker because it always gets fleeced (a common phrase meaning to be cheated or swindled).

You can add more functions and responses as needed. For example:

```python
@bot.on_message(text='help')
def help(message):
    message.reply('I can tell you jokes, answer questions, and do math!')
```
This would allow users to send the word "help" to get a list of available commands.

Running the bot


To run the Llama3 bot, save this code in a file (e.g., `llama3.py`) and install the required libraries by running:

pip install python-telegram-bot
Then, you can start the bot using:

python llama3.py
This will launch the bot on your local machine. You'll need to set up a Telegram account and create a new chat with Llama3 (or whatever name you choose) for it to work.

Limitations


Keep in mind that this is just a basic example, and there are many ways to improve and expand upon this code. Some limitations include:

* The bot only responds to text messages.
* It doesn't handle non-English languages or special characters well (you may need to add additional libraries for language support).
* There's no error handling or logging in place; you'll want to add these features as your bot becomes more complex.

I hope this helps! Let me know if you have any questions. 😊

---

Update: I've added a simple math function and some basic error handling:

```python
@bot.on_message(text='math (.*)')
def do_math(message):
    try:
        num1, operator, num2 = message.text.split()
        result = eval(f'{num1} {operator} {num2}')
        message.reply(str(result))
    except Exception as e:
        message.reply('Error: ' + str(e))
```

@bot_on (Update: �️
---

Update: I've added a simple math function and some basic error handling:

* �️
---
Update: 😊
Update: �️
Update: �‍
Limit Update: �️
Error Update: �️
Update: �️
Do Math ( .*):
�️ Update: �️
Do Math ( .*):
�️ Update: �️
Do Math ( . *):
�️ Update: �️
Do Math ( . ):
�️ Do Math ( .):
[… the lines "�️ Do Math ( .):", "# Do math ( .):", "# Do ( .):", and "# Do (0):" then repeat dozens of times each; omitted for brevity …]

...and it continued like this until stopped.

Here are my options:
[screenshot of generation options omitted]

@QueryType

This is happening with the server, even with older models that worked perfectly well before. Reverting to an older release avoids the issue.

@ggerganov (Owner)

The default server UI does not work with instruct models because it uses the /completion endpoint and its own chat template, not the model's. Either use a base model, or a client that supports the /chat/completions endpoint.
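For context on why a mismatched template produces stray tokens: Llama 3 Instruct expects each turn to be wrapped in its own special tokens, so a prompt built with a different template never gives the model a clean place to stop. A minimal sketch of that formatting, based on Meta's published prompt format (verify the exact tokens against the model card and tokenizer), looks like:

```python
# Sketch (assumption): manual rendering of the Llama 3 Instruct chat
# template, as a client using the raw /completion endpoint would need
# to do. Token strings follow Meta's published prompt format.
def format_llama3_chat(messages):
    """Render a list of {"role", "content"} dicts into a Llama 3
    Instruct prompt string."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = format_llama3_chat([{"role": "user", "content": "Hello"}])
print(prompt)
```

A client that talks to the server's OpenAI-compatible /v1/chat/completions endpoint should get this formatting applied server-side from the model's metadata, which is why switching endpoints (or clients) sidesteps the problem.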

@JMPSequeira (Author)

Noted, thanks.
