@jackzhxng (Contributor) commented May 1, 2025

Handles special tokens before pre-tokenization and splitting in the C++ Hugging Face tokenizer.
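To illustrate the idea, here is a minimal sketch of pulling special tokens out of the input before any pre-tokenization runs, so that markers such as `<|im_start|>` stay whole instead of being broken up by later splitting. This is not the code from this PR: `Segment` and `split_on_special_tokens` are hypothetical names, and the matching rule (earliest match wins, ties go to the longest token) is an assumption made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One piece of the input: either a special token or ordinary text.
// (Hypothetical type, for illustration only.)
struct Segment {
  std::string text;
  bool is_special;
};

// Splits `input` so that every special token becomes its own segment.
// The earliest match wins; if several tokens match at the same position,
// the longest one is taken (an assumption of this sketch).
std::vector<Segment> split_on_special_tokens(
    const std::string& input,
    const std::vector<std::string>& special_tokens) {
  std::vector<Segment> segments;
  std::size_t pos = 0;
  while (pos < input.size()) {
    std::size_t best_start = std::string::npos;
    std::size_t best_len = 0;
    for (const auto& tok : special_tokens) {
      if (tok.empty()) continue;
      const std::size_t found = input.find(tok, pos);
      if (found == std::string::npos) continue;
      if (found < best_start ||
          (found == best_start && tok.size() > best_len)) {
        best_start = found;
        best_len = tok.size();
      }
    }
    if (best_start == std::string::npos) {
      // No special token left: the rest is ordinary text.
      segments.push_back({input.substr(pos), false});
      break;
    }
    if (best_start > pos) {
      // Ordinary text that precedes the special token.
      segments.push_back({input.substr(pos, best_start - pos), false});
    }
    segments.push_back({input.substr(best_start, best_len), true});
    pos = best_start + best_len;
  }
  return segments;
}
```

In a tokenizer built along these lines, segments marked `is_special` would presumably be mapped straight to their token IDs, while the remaining segments would go through the usual pre-tokenization and splitting.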

Llama runner output with this change, using special tokens and the JSON tokenizer:

cmake-out/examples/models/llama/llama_main --model_path qwen3-0_6B_long_context.pte --tokenizer_path ./qwen3_tokenizer/tokenizer.json --prompt="<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
Tell me a story about Julius Caesar<|im_end|>
<|im_start|>assistant
<think>

</think>



" --temperature 0.6 --seq_len 1024
>> Julius Caesar was a Roman general and statesman who played a pivotal role in the fall of the Roman Republic and the establishment of the Roman Empire. He is often remembered for his leadership, courage, and decisive actions during the Battle of the Colosseum in 44 BCE. However, there are also many stories about his life and death. Some believe that he was poisoned by his own brothers and died in 44 BCE, while others argue that his death was due to a conspiracy involving his family and political rivals. His legacy remains one of the most celebrated in Roman history.<|im_end|>
<|endoftext|>

@facebook-github-bot added the CLA Signed label May 1, 2025
@jackzhxng jackzhxng changed the base branch from main to jz/fix-null-eos-bos May 1, 2025 17:42
@jackzhxng jackzhxng merged commit 58a6381 into jz/fix-null-eos-bos May 1, 2025
4 checks passed