To do the inference with just C and without the SentencePiece processor, one easy way would be to save the id-to-token mapping in model.bin:
from sentencepiece import SentencePieceProcessor

tokenizer = SentencePieceProcessor(tokenizer_model)
vocab = [tokenizer.id_to_piece(id) for id in range(tokenizer.get_piece_size())]
and then use a simple array lookup to get the token for an id:
const char *decode(int id) { return vocab[id]; }
That would make run_wrap.py unnecessary, and the inference would be in pure C (kinda).
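For illustration, here is a minimal C sketch of that idea: load the id-to-token table from a binary file once, after which decoding really is just an array lookup. The file name (vocab.bin), the layout (an int32 vocab size, then for each id an int32 length followed by the raw token bytes), and the helper names load_vocab/decode are assumptions made up for this example, not the actual format or API of run.c.

// Sketch only: assumes a hypothetical vocab.bin laid out as
//   int32 vocab_size, then per id: int32 len, then len raw bytes.
#include <stdio.h>
#include <stdlib.h>

static char **vocab;     // vocab[id] -> NUL-terminated token string
static int vocab_size;

void load_vocab(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) { perror("vocab.bin"); exit(1); }
    fread(&vocab_size, sizeof(int), 1, f);
    vocab = malloc(vocab_size * sizeof(char *));
    for (int id = 0; id < vocab_size; id++) {
        int len;
        fread(&len, sizeof(int), 1, f);
        vocab[id] = malloc(len + 1);
        fread(vocab[id], 1, len, f);   // raw token bytes
        vocab[id][len] = '\0';
    }
    fclose(f);
}

const char *decode(int id) {
    return vocab[id];   // the whole "decoder" is just this lookup
}

On the Python side, the vocab list built in the snippet above could be dumped in the same assumed layout (e.g. with struct.pack writing each piece's length and bytes) so the C program can read it back.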
Ref: https://github.com/google/sentencepiece/blob/master/src/spm_decode_main.cc
Closing old issue. This has been implemented in run.c for a while with ASCII support, and UTF-8 support is about to be merged in #226.