Closed
Description
The BlenderBot tokenizer has been trained to treat spaces as part of the tokens (a bit like SentencePiece), so a word is encoded differently depending on whether it is at the beginning of the sentence (without a preceding space) or not. However, the two examples in the `BlenderbotTokenizer` documentation, one without and one with a leading space, show the same output:
[6950, 1085, 2]
The same problem occurs in `BlenderbotTokenizerFast`:
[6950, 1085, 2]
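To illustrate why the two outputs should differ, here is a minimal sketch of GPT-2 style byte-level pre-tokenization, assuming the BlenderBot tokenizer follows that scheme. The regex is a simplified, ASCII-only approximation, and `pre_tokenize` is a hypothetical helper, not part of the transformers API. A leading space is folded into the word that follows it and rendered as `Ġ`, so `Hello` and ` Hello` become different pre-tokens:

```python
import re

# Hypothetical, simplified ASCII-only stand-in for the GPT-2 style
# pre-tokenization pattern (the real one uses Unicode letter/number classes).
_PATTERN = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+")


def pre_tokenize(text: str) -> list[str]:
    """Split text into pre-tokens, attaching a leading space to the following word.

    The space is rendered as 'Ġ' (U+0120), the symbol byte-level BPE
    uses for the space byte, so " Hello" and "Hello" stay distinct.
    """
    return [piece.replace(" ", "\u0120") for piece in _PATTERN.findall(text)]


if __name__ == "__main__":
    print(pre_tokenize("Hello world"))   # ['Hello', 'Ġworld']
    print(pre_tokenize(" Hello world"))  # ['ĠHello', 'Ġworld']
```

Because the first pre-token differs between the two inputs, they are looked up as different vocabulary entries, so the documented `input_ids` for the two examples would be expected to differ as well.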