Commit

docs: add RWKV tokenization to tokenization README
danbev committed Sep 2, 2024
1 parent 02da473 commit b234445
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions notes/tokenization/README.md
@@ -15,6 +15,7 @@ different tokenization types in llama.cpp:
* [WordPiece](./wordpiece.md)
* [SentencePiece](./sentencepiece.md)
* [Unigram](./unigram.md)
* [RWKV](./rwkv.md)

### Tokenization in llama.cpp
Llama.cpp supports the following types of tokenization:
@@ -25,6 +26,7 @@ Llama.cpp supports the following types of tokenization:
LLAMA_VOCAB_TYPE_BPE = 2, // GPT-2 tokenizer based on byte-level BPE
LLAMA_VOCAB_TYPE_WPM = 3, // BERT tokenizer based on WordPiece
LLAMA_VOCAB_TYPE_UGM = 4, // T5 tokenizer based on Unigram
LLAMA_VOCAB_TYPE_RWKV = 5, // RWKV tokenizer based on greedy tokenization
};
```
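
The comment on `LLAMA_VOCAB_TYPE_RWKV` describes the tokenizer as greedy. As a rough illustration of what greedy (longest-match) tokenization means, here is a minimal sketch. It is not the llama.cpp implementation: the `toy_vocab` type, the `greedy_tokenize` function, and the use of `-1` for unmatched bytes are all hypothetical names and choices made for this example, and a real tokenizer would more likely use a trie than repeated hash lookups.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy vocabulary for illustration: token text -> token id.
using toy_vocab = std::unordered_map<std::string, int>;

// Greedy longest-match tokenization: at each position, take the longest
// vocabulary entry that is a prefix of the remaining input.
// Assumption for this sketch: a byte with no matching entry is emitted
// as the placeholder id -1 and skipped.
std::vector<int> greedy_tokenize(const toy_vocab & vocab, const std::string & text) {
    // The longest token bounds how far we need to look ahead.
    std::size_t max_len = 0;
    for (const auto & kv : vocab) {
        max_len = std::max(max_len, kv.first.size());
    }

    std::vector<int> ids;
    std::size_t pos = 0;
    while (pos < text.size()) {
        bool found = false;
        // Try the longest candidate first, then shrink one byte at a time.
        for (std::size_t len = std::min(max_len, text.size() - pos); len > 0; --len) {
            auto it = vocab.find(text.substr(pos, len));
            if (it != vocab.end()) {
                ids.push_back(it->second);
                pos += len;
                found = true;
                break;
            }
        }
        if (!found) {
            ids.push_back(-1); // placeholder for an unmatched byte
            pos += 1;
        }
    }
    return ids;
}
```

With a toy vocabulary containing `a`, `ab`, and `abc`, the input `abcd` is matched as `abc` first rather than `a` or `ab`, because the longest prefix always wins. Unlike BPE, no merge rules are applied and no alternative segmentations are scored.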