Simplemma analyzer #590

Closed
osma opened this issue May 27, 2022 · 0 comments · Fixed by #591
osma (Member) commented May 27, 2022

Just like with spaCy (see #374), we could add an analyzer that uses simplemma for lemmatization. Simplemma is a very fast, lightweight multilingual lemmatizer that currently supports 38 languages. Its lemmatization accuracy may not be as high as that of e.g. Stanza (see #539), but based on experiments I've performed, that doesn't seem to matter much in practice.

Simplemma is implemented in pure Python without external dependencies, so I think it should be possible to include it as a core feature rather than an optional one, unless unexpected problems turn up, e.g. with supported Python versions.
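
For illustration, here is a rough sketch of what such an analyzer could look like. It is not the implementation from #591. The class name `SimplemmaAnalyzerSketch`, the `lemmatize_fn` parameter and the regex tokenizer are hypothetical, and the sketch assumes a simplemma release whose `lemmatize()` accepts a `lang` keyword (the calling convention may differ between releases).

```python
import re
from typing import Callable, List, Optional


def _simplemma_lemmatizer(lang: str) -> Callable[[str], str]:
    """Return a word -> lemma function backed by simplemma.

    simplemma is imported lazily so the module loads even if it is missing.
    Assumption: simplemma.lemmatize() accepts a `lang` keyword argument.
    """
    import simplemma

    return lambda word: simplemma.lemmatize(word, lang=lang)


class SimplemmaAnalyzerSketch:
    """Hypothetical analyzer: split text into words, then lemmatize each one."""

    # Simple stand-in tokenizer; a real analyzer would likely use something richer.
    TOKEN_RE = re.compile(r"\w+")

    def __init__(self, lang: str,
                 lemmatize_fn: Optional[Callable[[str], str]] = None) -> None:
        self.lang = lang
        # A custom lemmatizer can be injected, e.g. for testing without simplemma.
        self._lemmatize = lemmatize_fn or _simplemma_lemmatizer(lang)

    def tokenize_words(self, text: str) -> List[str]:
        """Return the lemmatized word tokens of `text`, in order."""
        return [self._lemmatize(token) for token in self.TOKEN_RE.findall(text)]
```

With simplemma installed, `SimplemmaAnalyzerSketch("en").tokenize_words(text)` would return the lemmas for an English text. Passing `lemmatize_fn` keeps the class usable and testable when simplemma is not available.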

osma added this to the Short term milestone May 27, 2022
osma closed this as completed in #591 May 30, 2022
juhoinkinen modified the milestones: Short term, 0.58 Jun 23, 2022