LeoMax-Xiong/leomax-tokenizers

leomax_tokenizer

This repository is a learning project for studying fast_tokenizer.

Build environment

Ubuntu

gcc-10.5

macOS

clang-14.0.3

Tokenization algorithms

WordPiece

  • Download the vocabulary used for testing:
wget https://bj.bcebos.com/paddlenlp/models/transformers/ernie/vocab.txt
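
For readers new to WordPiece, the greedy longest-match-first lookup it is commonly described with can be sketched as below. This is an illustrative Python sketch, not this repository's implementation: the function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` token, the `max_chars` limit, and the toy vocabulary are all assumptions chosen for the example, and the vocabulary file downloaded above is not read here.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##", max_chars=100):
    """Split one word into sub-word pieces by greedy longest-match-first lookup.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Very long words are mapped to the unknown token outright.
    if len(word) > max_chars:
        return [unk_token]

    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix of the remainder is in the vocabulary: the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary (assumed for the example, not taken from vocab.txt).
toy_vocab = {"un", "##aff", "##able"}
pieces = wordpiece_tokenize("unaffable", toy_vocab)
unknown = wordpiece_tokenize("xyz", toy_vocab)
```

With the toy vocabulary, `pieces` is `["un", "##aff", "##able"]` and `unknown` is `["[UNK]"]`. A real tokenizer would typically normalize and pre-split the text into words before applying this per-word step.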

About

A wrapper around tokenizers.
