Add simplemma analyzer #591
Codecov Report
@@           Coverage Diff           @@
##           master    #591    +/-   ##
=======================================
  Coverage   99.48%   99.48%
=======================================
  Files          84       86     +2
  Lines        5615     5645    +30
=======================================
+ Hits         5586     5616    +30
  Misses         29       29
Kudos, SonarCloud Quality Gate passed! 0 bugs, no coverage information.
I did some test runs with MLLM and Parabel. There were no problems with parallelization. The initial version was a bit slow and memory-hungry, so I implemented two optimizations, both copied from the Voikko analyzer:
Table of test results:
Observations from the table:
I think this is good enough for review and subsequent merging unless any new problems surface.
LGTM
I added documentation about the simplemma analyzer in the wiki.
Fixes #590
This PR adds a `simplemma` analyzer based on the Simplemma lemmatizer. Opening as a draft PR to get feedback from QA tools. The analyzer itself needs more testing, e.g. the quality of results and how it works with multiprocessing.
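For readers unfamiliar with the idea, here is a minimal sketch of what a lemmatizer-backed analyzer can look like. It is not the code added in this PR: the class name `LemmatizingAnalyzer`, the `tokenize_words` method, and the token pattern are hypothetical and chosen only for illustration. The `simplemma_lemmatize` helper assumes a recent Simplemma release that exposes `simplemma.lemmatize(token, lang=...)`; older releases used a different call style, so treat that part as an assumption.

```python
import re
from typing import Callable, List

# Hypothetical token pattern: runs of word characters.
_TOKEN_RE = re.compile(r"\w+")


class LemmatizingAnalyzer:
    """Illustrative analyzer that maps each token to its lemma.

    The lemmatizer is injected as a callable so the class can be
    exercised without any particular lemmatization library installed.
    """

    def __init__(self, lang: str, lemmatize: Callable[[str, str], str]):
        self.lang = lang
        self._lemmatize = lemmatize

    def tokenize_words(self, text: str) -> List[str]:
        # Split the text into tokens, then lemmatize each one.
        return [self._lemmatize(tok, self.lang) for tok in _TOKEN_RE.findall(text)]


def simplemma_lemmatize(word: str, lang: str) -> str:
    # Imported lazily so the sketch loads even if simplemma is missing.
    # Assumption: a recent simplemma with lemmatize(token, lang=...).
    import simplemma

    return simplemma.lemmatize(word, lang=lang)
```

With Simplemma installed, `LemmatizingAnalyzer("en", simplemma_lemmatize)` would plug the real lemmatizer in. Injecting the callable also makes it easy to swap in a toy lookup table when testing.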