chore(nlp): Bulgarian added to contentful #1250
good job!! 💯 💯 💯
Codecov Report
@@              Coverage Diff               @@
##        contentful/nl    #1250     +/-   ##
=================================================
+ Coverage       64.62%   64.73%   +0.11%
=================================================
  Files             232      236       +4
  Lines            6443     6469      +26
  Branches         1115     1118       +3
=================================================
+ Hits             4164     4188      +24
  Misses           1966     1966
- Partials          313      315       +2
Depends on #1249. Please review it first. ⚠️
Description
Bulgarian tokenizer, stemmer, and stopwords have been added to contentful nlp.
Context
Adding them will make it possible to process Bulgarian text.
Approach taken / Explain the design
The Tokenizer used is the base-tokenizer from the nlpjs library.
The Stemmer has been implemented using this GitHub repository as a reference.
The Stopwords have been collected from here.
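To illustrate how the three pieces fit together, here is a minimal sketch of a tokenize, filter stopwords, stem pipeline. It is not the code from this pull request: every name (`tokenize`, `stem`, `preprocess`, `STOPWORDS`, `SUFFIXES`) is hypothetical, the stopword set is a tiny sample rather than the collected list, and the suffix stripping is a toy rule for a few definite-article endings, not the rule set of the referenced stemmer.

```typescript
// Illustrative sketch only; hypothetical names, not the Botonic or nlpjs API.

// Tiny sample of Bulgarian stopwords (assumption: the real list is much longer).
const STOPWORDS: ReadonlySet<string> = new Set([
  "и", "на", "в", "е", "се", "да", "за", "от",
]);

// Toy suffix list (definite-article endings), longest first.
const SUFFIXES: readonly string[] = ["ите", "ът", "та", "то", "те"];

// Lowercase the text and split it on whitespace and common punctuation.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s.,!?;:()"'\-]+/)
    .filter((token) => token.length > 0);
}

// Strip the first matching suffix, keeping a stem of at least 3 characters.
function stem(token: string): string {
  for (const suffix of SUFFIXES) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
}

// Full pipeline: tokenize, drop stopwords, then stem what remains.
function preprocess(text: string): string[] {
  return tokenize(text)
    .filter((token) => !STOPWORDS.has(token))
    .map(stem);
}

console.log(preprocess("Книгата е на масата.")); // [ 'книга', 'маса' ]
```

Removing stopwords before stemming keeps the stopword lookup on surface forms, which is why the sample list contains unstemmed words.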
Testing
The pull request...