We should fix the issues found by @vjeux in the llama tokenizer. They are explained in detail here: https://github.com/ggerganov/llama.cpp/pull/252#issuecomment-1603672216. This might be a good first issue.