This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

GPT-J Will Not Accept Certain Tokens in Prompt #212

Open
danforbes opened this issue May 11, 2023 · 5 comments
Labels: issue:bug (Something isn't working)

Comments

@danforbes
Contributor

GPT-J fails to tokenize certain characters when they appear in a prompt. So far I have only been able to induce this behavior with the ! character, but I haven't performed an exhaustive search.

llm: ./target/release/llm gptj infer -m ~/.ggml-models/gpt4all-j-v1.3-groovy.bin -p "!"
✓ Loaded 285 tensors (3.8 GB) after 1980ms

[2023-05-11T14:36:15Z ERROR llm] Failed to tokenize initial prompt.
danforbes added the issue:bug (Something isn't working) label on May 11, 2023
@philpax
Collaborator

philpax commented May 11, 2023

Our current tokenizer is built around scores. Perhaps we should use a simpler tokenizer for the models where it's known no score is present for the tokens?
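For models that ship a vocabulary without scores, a greedy longest-prefix matcher would be one such simpler tokenizer. Below is a minimal sketch, assuming the vocabulary is exposed as a plain map from token bytes to ids; the function and parameter names are illustrative, not the crate's actual API.

```rust
use std::collections::HashMap;

/// Greedy longest-prefix tokenization over a plain vocabulary.
/// `vocab` maps raw token bytes to ids; no per-token scores are needed.
fn tokenize_greedy(vocab: &HashMap<Vec<u8>, u32>, text: &[u8]) -> Option<Vec<u32>> {
    let max_len = vocab.keys().map(|t| t.len()).max().unwrap_or(0);
    let mut ids = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let remaining = text.len() - pos;
        // Try the longest candidate first, shrinking until a vocab entry matches.
        let (id, len) = (1..=max_len.min(remaining))
            .rev()
            .find_map(|len| vocab.get(&text[pos..pos + len]).map(|&id| (id, len)))?;
        ids.push(id);
        pos += len;
    }
    Some(ids)
}

fn main() {
    // Tiny illustrative vocabulary: "!" maps to id 0, "he"/"llo" cover "hello".
    let vocab: HashMap<Vec<u8>, u32> =
        [(b"!".to_vec(), 0), (b"he".to_vec(), 1), (b"llo".to_vec(), 2)]
            .into_iter()
            .collect();
    assert_eq!(tokenize_greedy(&vocab, b"hello!"), Some(vec![1, 2, 0]));
}
```

With this approach a prompt like "!" tokenizes as long as the character (or its bytes) appears in the vocabulary, and an unmatched byte surfaces as an explicit None instead of a tokenization failure deeper in the pipeline.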

@LLukas22
Contributor

Couldn't we use Hugging Face's tokenizers crate? Then we would have parity with nearly every implementation out there 🤔
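For reference, the Hugging Face `tokenizers` crate exposes this from Rust; a rough sketch of loading a model's tokenizer.json and encoding the failing prompt follows. The file path and crate version are assumptions, not how llm actually wires it up.

```rust
// Cargo.toml: tokenizers = "0.13" (version is an assumption)
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "tokenizer.json" is a placeholder path; GPT-J's tokenizer file would be
    // shipped or downloaded alongside the GGML weights.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Encode the prompt that currently fails; no token scores are involved.
    let encoding = tokenizer.encode("!", false)?;
    println!("token ids: {:?}", encoding.get_ids());
    Ok(())
}
```

Since this reuses the exact tokenizer definition published with each model, the ids would match other implementations that consume the same tokenizer.json.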

@philpax
Collaborator

philpax commented May 15, 2023

Yeah maybe, see #35

@danforbes
Contributor Author

@RedBoxing - can you see if this is fixed on your RWKV branch?

@RedBoxing
Contributor

No issues at all!
