Add an abstract class for Tokenizer #53
I see that #40 changed the implementation to output only token IDs instead of strings. I'll need to update this PR with the new changes.
Could we add return types for encode?
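A minimal sketch of what a return annotation on an abstract `encode` could look like. The `list[int]` return type is an assumption for illustration, not the PR's actual signature. Declaring the type on the abstract method documents the contract every subclass is expected to honor, and the standard `typing.get_type_hints` can read it back at runtime:

```python
import abc
import typing


class Tokenizer(abc.ABC):
  """Converts strings to token IDs (illustrative sketch)."""

  @abc.abstractmethod
  def encode(self, s: str) -> list[int]:
    """Returns the token IDs for `s` (return type assumed for illustration)."""


# The annotation is part of the interface and can be inspected at runtime.
hints = typing.get_type_hints(Tokenizer.encode)
print(hints["return"])
```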
Thanks, all looks good to me now.
Thanks for the PR, this looks great. I'd appreciate it if you could also add unit tests (or update the existing ones) for the changes.
* Add an abstract class for tokenizer
* Add sentence piece tokenizer as a subclass of Tokenizer
* Fix decode method for SentencePieceTokenizer
* Fix circular import issue
* fix type annotations
* fix linting issues
* Format files using pyink
* Update the tokenizer decode interface to return ids instead of str
* format using pyink
* Move Tokenizer class to a tokenizer_api.py file
* Update engine.build_tokenizer method to return SentencePieceTokenizer by default
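The commits above describe an abstract `Tokenizer` base class in `tokenizer_api.py`, a `SentencePieceTokenizer` subclass, a `decode` that returns IDs instead of a string, and a `build_tokenizer` that returns `SentencePieceTokenizer` by default. The following is a minimal sketch of that shape, not the PR's actual code: the method signatures, the `eos_id` property, the EOS-trimming behavior of `decode`, the `WhitespaceTokenizer` stand-in (used so the example runs without a SentencePiece model file), and the free function `build_tokenizer` are all hypothetical.

```python
import abc


class Tokenizer(abc.ABC):
  """Abstract tokenizer interface (sketch of a tokenizer_api.py-style module)."""

  @property
  @abc.abstractmethod
  def eos_id(self) -> int:
    """ID that marks the end of a sequence."""

  @abc.abstractmethod
  def encode(self, s: str) -> list[int]:
    """Converts a string into token IDs."""

  @abc.abstractmethod
  def decode(self, token_ids: list[int]) -> list[int]:
    """Post-processes generated IDs; returns IDs rather than a string."""


class WhitespaceTokenizer(Tokenizer):
  """Hypothetical stand-in for SentencePieceTokenizer (needs no model file)."""

  def __init__(self, vocab: list[str]):
    # Reserve ID 0 for end-of-sequence; vocabulary words get IDs from 1 up.
    self._ids = {word: i + 1 for i, word in enumerate(vocab)}

  @property
  def eos_id(self) -> int:
    return 0

  def encode(self, s: str) -> list[int]:
    # Split on whitespace and look up each word's ID.
    return [self._ids[word] for word in s.split()]

  def decode(self, token_ids: list[int]) -> list[int]:
    # Keep the IDs up to (not including) the first EOS, if there is one.
    if self.eos_id in token_ids:
      return token_ids[: token_ids.index(self.eos_id)]
    return list(token_ids)


def build_tokenizer(vocab: list[str]) -> Tokenizer:
  """Hypothetical factory that returns the default concrete tokenizer."""
  return WhitespaceTokenizer(vocab)


tok = build_tokenizer(["hello", "world"])
ids = tok.encode("hello world hello")
trimmed = tok.decode(ids + [tok.eos_id, 2])
print(ids, trimmed)
```

Because callers depend only on the `Tokenizer` interface, switching the default to a different subclass would only require changing the factory.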