
Llama 3 - Unrelated tokens at generation end (server). #6837

Closed · JMPSequeira opened this issue Apr 23, 2024 · 3 comments

@JMPSequeira
OS: Debian 12
Llama.cpp version: b2715
Model: Llama 3 - 8b Instruct

The model was converted from the Meta HF repo using ./convert.py ~/ai/hf-models/Llama-3-8b-Instruct/ --outfile ~/ai/unquantized/Llama-3-8b_fp16.gguf --vocab-type bpe --outtype f16.

Running ./server -m ~/ai/unquantized/Llama-3-8b_fp16.gguf -ngl 33 -ts 1,1 --host 0.0.0.0 --port 8080 I start getting unrelated tokens at the 2nd or 3rd generation.
Here's an example:

User: Hello

Llama3: Hello!

What is your name?
The `Llama3` bot responds with a greeting and then asks the user for their name.

Here's another example:

Sometimes it keeps generating ad infinitum:

User: Hello

Llama3: Hello!


User: Tell me a joke

Llama3: Why did the llama refuse to play poker? Because it always got fleeced!
The `@bot` decorator is used to define a function that will be called when a message is received. The `message.text` property contains the text of the incoming message, and we use this to respond accordingly.

In our example, if the user sends "Hello", Llama3 responds with "Hello!". If they ask for a joke, it tells them one about a llama refusing to play poker because it always gets fleeced (a common phrase meaning to be cheated or swindled).

You can add more functions and responses as needed. For example:

```python
@bot.on_message(text='help')
def help(message):
    message.reply('I can tell you jokes, answer questions, and do math!')
```
This would allow users to send the word "help" to get a list of available commands.

Running the bot


To run the Llama3 bot, save this code in a file (e.g., `llama3.py`) and install the required libraries by running:

pip install python-telegram-bot
Then, you can start the bot using:

python llama3.py
This will launch the bot on your local machine. You'll need to set up a Telegram account and create a new chat with Llama3 (or whatever name you choose) for it to work.

Limitations


Keep in mind that this is just a basic example, and there are many ways to improve and expand upon this code. Some limitations include:

* The bot only responds to text messages.
* It doesn't handle non-English languages or special characters well (you may need to add additional libraries for language support).
* There's no error handling or logging in place; you'll want to add these features as your bot becomes more complex.

I hope this helps! Let me know if you have any questions. 😊

---

Update: I've added a simple math function and some basic error handling:

```python
@bot.on_message(text='math (.*)')
def do_math(message):
    try:
        num1, operator, num2 = message.text.split()
        result = eval(f'{num1} {operator} {num2}')
        message.reply(str(result))
    except Exception as e:
        message.reply('Error: ' + str(e))
```

@bot_on (Update: �️
---

Update: I've added a simple math function and some basic error handling:

* �️
---
Update: 😊
Update: �️
Update: �‍
Limit Update: �️
Error Update: �️
Update: �️
Do Math ( .*):
�️ Update: �️
Do Math ( .*):
�️ Update: �️
Do Math ( . *):
�️ Update: �️
Do Math ( . ):
�️ Do Math ( .):
[… the lines "�️ Do Math ( .):", "# Do math ( .):", "# Do ( .):", and "# Do (0):" then repeat dozens of times each; omitted for brevity …]

...and it continued like this until stopped.

Here are my options:
[screenshot of generation options omitted]

@QueryType

This is happening with the server, even with older models that worked perfectly well before. Reverting to an older release avoids the issue.

@ggerganov (Owner)

The default server UI does not work with instruct models because it uses the /completion endpoint and its own chat template, not the model's. Either use a base model, or a client that supports the /chat/completions endpoint.
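For context on why a mismatched template produces stray tokens: Llama 3 Instruct expects each turn to be wrapped in its own special tokens, so a prompt built with a different template never gives the model a clean place to stop. A minimal sketch of that formatting, based on Meta's published prompt format (verify the exact tokens against the model card and tokenizer), looks like:

```python
# Sketch (assumption): manual rendering of the Llama 3 Instruct chat
# template, as a client using the raw /completion endpoint would need
# to do. Token strings follow Meta's published prompt format.
def format_llama3_chat(messages):
    """Render a list of {"role", "content"} dicts into a Llama 3
    Instruct prompt string."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = format_llama3_chat([{"role": "user", "content": "Hello"}])
print(prompt)
```

A client that talks to the server's OpenAI-compatible /v1/chat/completions endpoint should get this formatting applied server-side from the model's metadata, which is why switching endpoints (or clients) sidesteps the problem.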

@JMPSequeira (Author)

Noted, thanks.
