This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

GPT-J Will Not Accept Certain Tokens in Prompt #212

Open
danforbes opened this issue May 11, 2023 · 5 comments
Labels: issue:bug (Something isn't working)

Comments

@danforbes
Contributor

GPT-J fails to tokenize certain characters when they appear in a prompt. So far I have only been able to induce this behavior with the ! character, but I haven't performed an exhaustive search.

llm: ./target/release/llm gptj infer -m ~/.ggml-models/gpt4all-j-v1.3-groovy.bin -p "!"
✓ Loaded 285 tensors (3.8 GB) after 1980ms

[2023-05-11T14:36:15Z ERROR llm] Failed to tokenize initial prompt.
danforbes added the issue:bug (Something isn't working) label on May 11, 2023
@philpax
Collaborator

philpax commented May 11, 2023

Our current tokenizer is built around scores. Perhaps we should use a simpler tokenizer for the models where it's known no score is present for the tokens?
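For models that ship a vocabulary without scores, a greedy longest-prefix matcher would be one such simpler tokenizer. Below is a minimal sketch, assuming the vocabulary is exposed as a plain map from token bytes to ids; the function and parameter names are illustrative, not the crate's actual API.

```rust
use std::collections::HashMap;

/// Greedy longest-prefix tokenization over a plain vocabulary.
/// `vocab` maps raw token bytes to ids; no per-token scores are needed.
fn tokenize_greedy(vocab: &HashMap<Vec<u8>, u32>, text: &[u8]) -> Option<Vec<u32>> {
    let max_len = vocab.keys().map(|t| t.len()).max().unwrap_or(0);
    let mut ids = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let remaining = text.len() - pos;
        // Try the longest candidate first, shrinking until a vocab entry matches.
        let (id, len) = (1..=max_len.min(remaining))
            .rev()
            .find_map(|len| vocab.get(&text[pos..pos + len]).map(|&id| (id, len)))?;
        ids.push(id);
        pos += len;
    }
    Some(ids)
}

fn main() {
    // Tiny illustrative vocabulary: "!" maps to id 0, "he"/"llo" cover "hello".
    let vocab: HashMap<Vec<u8>, u32> =
        [(b"!".to_vec(), 0), (b"he".to_vec(), 1), (b"llo".to_vec(), 2)]
            .into_iter()
            .collect();
    assert_eq!(tokenize_greedy(&vocab, b"hello!"), Some(vec![1, 2, 0]));
}
```

With this approach a prompt like "!" tokenizes as long as the character (or its bytes) appears in the vocabulary, and an unmatched byte surfaces as an explicit None instead of a tokenization failure deeper in the pipeline.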

@LLukas22
Contributor

Couldn't we use Hugging Face's tokenizers crate? Then we would have parity with nearly every implementation out there 🤔
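For reference, the Hugging Face `tokenizers` crate exposes this from Rust; a rough sketch of loading a model's tokenizer.json and encoding the failing prompt follows. The file path and crate version are assumptions, not how llm actually wires it up.

```rust
// Cargo.toml: tokenizers = "0.13" (version is an assumption)
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "tokenizer.json" is a placeholder path; GPT-J's tokenizer file would be
    // shipped or downloaded alongside the GGML weights.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Encode the prompt that currently fails; no token scores are involved.
    let encoding = tokenizer.encode("!", false)?;
    println!("token ids: {:?}", encoding.get_ids());
    Ok(())
}
```

Since this reuses the exact tokenizer definition published with each model, the ids would match other implementations that consume the same tokenizer.json.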

@philpax
Collaborator

philpax commented May 15, 2023

Yeah maybe, see #35

@danforbes
Contributor Author

@RedBoxing - can you see if this is fixed on your RWKV branch?

@RedBoxing
Contributor

No issues at all!
