Add unit tests for tokenizers and filters #1156
@mocobeta That's a really good idea to cover these with tests.
@PSeitz thanks for your reply. I'll try to add tests for other components.
@mocobeta awesome. I'm happy to merge this, but the PR is in draft state.
Thanks, @fulmicoton. I'd like to include a few more tests for other tokenizers, and then I will open this soon.
Codecov Report
@@            Coverage Diff             @@
##             main    #1156      +/-   ##
==========================================
+ Coverage   93.78%   93.86%   +0.08%
==========================================
  Files         203      203
  Lines       33654    33994     +340
==========================================
+ Hits        31561    31910     +349
+ Misses       2093     2084       -9
I think this covers all existing tokenizers/filters. Tests added here are very basic ones though, I hope this will be of some help.
@mocobeta It definitely helps! Thank you!
Some tokenizers/filters seem to have no unit test (e.g. SimpleTokenizer and StopWordFilter).
I think it would be nice to add basic tests for them for future development. To start with, I added a test module for SimpleTokenizer; does it make sense?
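For readers who want a picture of what a "basic test" for a tokenizer can look like, here is a minimal, self-contained sketch. It does not use the project's actual API: the `Token` struct, `simple_tokenize`, and `assert_token` are hypothetical names written for this illustration, and the splitting rule (break on every non-alphanumeric character) is an assumption about what a simple tokenizer does. The point is only the shape of the test: feed a short string, then check each token's position, text, and byte offsets.

```rust
/// Hypothetical token type for illustration only (not the project's type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    /// Byte offset of the first character of the token.
    pub offset_from: usize,
    /// Byte offset one past the last character of the token.
    pub offset_to: usize,
    /// Index of the token in the stream, starting at 0.
    pub position: usize,
    /// The token text.
    pub text: String,
}

/// Hypothetical stand-in for a simple tokenizer: splits `text` on every
/// non-alphanumeric character and records byte offsets and positions.
pub fn simple_tokenize(text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    for (idx, ch) in text.char_indices() {
        if ch.is_alphanumeric() {
            // Remember where the current token began.
            if start.is_none() {
                start = Some(idx);
            }
        } else if let Some(from) = start.take() {
            // A separator closes the token that was in progress.
            push_token(&mut tokens, text, from, idx);
        }
    }
    // Flush a token that runs to the end of the input.
    if let Some(from) = start {
        push_token(&mut tokens, text, from, text.len());
    }
    tokens
}

fn push_token(tokens: &mut Vec<Token>, text: &str, from: usize, to: usize) {
    let position = tokens.len();
    tokens.push(Token {
        offset_from: from,
        offset_to: to,
        position,
        text: text[from..to].to_string(),
    });
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Test helper: checks every field of a token in one call.
    fn assert_token(token: &Token, position: usize, text: &str, from: usize, to: usize) {
        assert_eq!(token.position, position, "unexpected position for {:?}", token);
        assert_eq!(token.text, text, "unexpected text for {:?}", token);
        assert_eq!(token.offset_from, from, "unexpected start offset for {:?}", token);
        assert_eq!(token.offset_to, to, "unexpected end offset for {:?}", token);
    }

    #[test]
    fn test_simple_tokenizer() {
        let tokens = simple_tokenize("Hello, happy tax payer!");
        assert_eq!(tokens.len(), 4);
        assert_token(&tokens[0], 0, "Hello", 0, 5);
        assert_token(&tokens[1], 1, "happy", 7, 12);
        assert_token(&tokens[2], 2, "tax", 13, 16);
        assert_token(&tokens[3], 3, "payer", 17, 22);
    }
}
```

A shared helper like `assert_token` keeps each tokenizer or filter test down to a few lines, which makes it cheap to add one small test per component.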