Chat tokenization fixes in generate.py & API #1035
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1035
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d90e33b with merge base 147c292.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lgtm, remember to verify that
encoded = torch.tensor(tokens, dtype=torch.int, device=self.builder_args.device)
print(self.tokenizer.decode(tokens))
Just checking that this is an intentional print
Yes - this prints the prompt on the server side so that the full prompt can be tracked entirely from the server logs.
However, this raises a larger issue in the generate/API stack - we need to replace print statements with a logger so that users can choose not to print these debug messages.
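For illustration, here is a minimal sketch (not the PR's code) of routing that debug print through Python's standard logging module so users can opt out; the logger name and helper function are hypothetical:

```
import logging

logger = logging.getLogger("torchchat.generate")  # logger name is illustrative


def log_full_prompt(tokenizer, tokens):
    # Would replace: print(self.tokenizer.decode(tokens))
    logger.debug("Full prompt: %s", tokenizer.decode(tokens))


# Users who want the prompt echoed opt in explicitly, e.g.:
logging.basicConfig(level=logging.DEBUG)
```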
    and x.item() == self.tokenizer.special_tokens["<|eot_id|>"]
):
    buffer = buffer[:-1]  # drop the eot_id from the output buffer
    pass
Why is this a pass again?
The callback function is only used in generate() for the CLI interactive chat to print results to stdout. I naively copied this code when refactoring the original generate.py and carried it over to openaiapi, where the callback isn't used.
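For context, a hedged sketch of the kind of CLI callback being discussed; the eot_id handling mirrors the diff above, but the function name, signature, and flush threshold are assumptions, not the PR's exact code:

```
def make_cli_callback(tokenizer):
    buffer = []

    def callback(x, done_generating=False):
        # Decode the newly generated token and accumulate it.
        buffer.append(tokenizer.decode([x.item()]))
        if done_generating and x.item() == tokenizer.special_tokens["<|eot_id|>"]:
            buffer.pop()  # drop the eot_id from the output buffer
        # Flush periodically (and at the end) to stdout for the interactive chat.
        if len(buffer) >= 4 or done_generating:
            print("".join(buffer), end="", flush=True)
            buffer.clear()

    return callback
```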
I believe this PR introduced a regression; fix proposal in #1061.
This fixes the following assert that is easy to repro in any chat session:

```
Traceback (most recent call last):
  File "/home/ubuntu/cali/torchchat/torchchat.py", line 69, in <module>
    generate_main(args)
  File "/home/ubuntu/cali/torchchat/generate.py", line 896, in main
    for _ in gen.chat(generator_args):
  File "/home/ubuntu/cali/torchchat/generate.py", line 748, in chat
    self.chat_formatter.encode_header(
  File "/home/ubuntu/cali/torchchat/generate.py", line 53, in encode_header
    tokens.extend(self.tokenizer.encode(role, bos=False, eos=False))
  File "/home/ubuntu/cali/torchchat/tokenizer/tiktoken.py", line 133, in encode
    assert type(s) is str
```

I believe this regressed with #1035.
Currently, only the chat() function encodes chat-style messages into the correct format during an interactive session.
I adapted the ChatFormat class into a generic _ChatFormatter base class; each formatter implements encode_dialog_prompt, which takes a series of message objects and encodes the system prompt and the user/assistant messages correctly. This lets us load in a whole conversation (i.e. the message objects in a completion request) and keep the API/server stateless.
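A minimal sketch of that pattern, assuming a Llama-3-style tokenizer with encode() and a special_tokens table (as seen in the diff above); the class names, token names, and overall structure are illustrative, not the PR's exact code:

```
from abc import ABC, abstractmethod
from typing import Dict, List


class _ChatFormatter(ABC):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    @abstractmethod
    def encode_dialog_prompt(self, dialog: List[Dict[str, str]]) -> List[int]:
        """Encode a whole conversation (a list of {'role', 'content'} messages)."""
        ...


class Llama3ChatFormatter(_ChatFormatter):
    def _encode_header(self, role: str) -> List[int]:
        tokens = [self.tokenizer.special_tokens["<|start_header_id|>"]]
        tokens.extend(self.tokenizer.encode(role, bos=False, eos=False))
        tokens.append(self.tokenizer.special_tokens["<|end_header_id|>"])
        tokens.extend(self.tokenizer.encode("\n\n", bos=False, eos=False))
        return tokens

    def encode_dialog_prompt(self, dialog: List[Dict[str, str]]) -> List[int]:
        tokens = [self.tokenizer.special_tokens["<|begin_of_text|>"]]
        for message in dialog:
            tokens.extend(self._encode_header(message["role"]))
            tokens.extend(
                self.tokenizer.encode(message["content"].strip(), bos=False, eos=False)
            )
            tokens.append(self.tokenizer.special_tokens["<|eot_id|>"])
        # Prime the model to respond as the assistant.
        tokens.extend(self._encode_header("assistant"))
        return tokens
```

Because the whole dialog is re-encoded on every request, the server never has to remember prior turns.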
Test:
Initially, I prompted with the following cURL request:
Then, I added the model's response and prompted it again with the following:
And got the following response:
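The original requests and responses are not preserved in this thread. As a hedged sketch of the stateless round trip being tested, assuming an OpenAI-style /v1/chat/completions endpoint at a local address (both assumptions) and placeholder messages rather than the original prompts:

```
import json
import urllib.request

URL = "http://127.0.0.1:5000/v1/chat/completions"  # assumed server address and path


def chat(messages):
    body = json.dumps({"model": "llama3", "messages": messages}).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# First turn: send only the user message.
history = [{"role": "user", "content": "Hello, who are you?"}]
reply = chat(history)

# Second turn: append the assistant's reply and the next user message, then
# resend the whole conversation -- the server itself stays stateless.
history.append(reply["choices"][0]["message"])
history.append({"role": "user", "content": "Tell me more."})
reply = chat(history)
```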