Using lemmatization can result in better-quality keyphrases, since similar keyphrases will be grouped together.
Adding lemmatization as an option could be a great feature.
If the option is activated, the 'lemmatizer' component will be added to the spaCy pipeline, and the lemmas of words will be used instead of the raw text to build keyphrases.
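A minimal sketch of the idea, assuming keyphrases are identified by their joined token strings. `TOY_LEMMAS`, `lemmatize`, `keyphrase_key` and `count_keyphrases` are hypothetical names, and the lookup table only stands in for the lemmas spaCy's lemmatizer would expose as `token.lemma_`, so the example runs without downloading a model.

```python
from collections import Counter

# Hypothetical toy lemma table standing in for spaCy's `token.lemma_`;
# a real pipeline would get these lemmas from the 'lemmatizer' component.
TOY_LEMMAS = {
    "networks": "network",
    "models": "model",
    "vectors": "vector",
}


def lemmatize(token: str) -> str:
    """Return the lemma of a lowercase token, or the token itself if unknown."""
    return TOY_LEMMAS.get(token, token)


def keyphrase_key(phrase: str, use_lemmatizer: bool) -> str:
    """Build the string that identifies a keyphrase (raw text or lemmas)."""
    tokens = phrase.lower().split()
    if use_lemmatizer:
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


def count_keyphrases(phrases, use_lemmatizer=False):
    """Count candidate keyphrases, optionally merging them by lemma."""
    return Counter(keyphrase_key(p, use_lemmatizer) for p in phrases)


candidates = ["neural network", "neural networks", "language model", "language models"]
print(len(count_keyphrases(candidates)))  # 4 distinct keyphrases on raw text
print(count_keyphrases(candidates, use_lemmatizer=True))
# Counter({'neural network': 2, 'language model': 2})
```

With the option off, singular and plural forms stay separate candidates; with it on, they collapse into one keyphrase whose count is the sum of both.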
There should also be a function to retrieve the lemmatized documents. They will be built and stored while the pipeline runs; this is necessary to calculate tf-idf on the lemmatized keyphrases.
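A rough sketch of why the lemmatized documents are needed for tf-idf, assuming a plain tf-idf with raw counts and unsmoothed idf = ln(N / df) (not necessarily the exact formula the vectorizers use). `LemmatizingPipeline`, `get_lemmatized_documents`, `count_phrase` and `tfidf` are hypothetical names, and `TOY_LEMMAS` again stands in for spaCy's lemmatizer.

```python
import math

# Hypothetical toy lemma table standing in for spaCy's `token.lemma_`.
TOY_LEMMAS = {"networks": "network", "models": "model", "are": "be", "is": "be"}


class LemmatizingPipeline:
    """Hypothetical sketch: lemmatize documents once and keep the result."""

    def __init__(self):
        self._lemmatized_docs = []

    def process(self, docs):
        # Build and store lemmatized documents during the pipeline pass.
        self._lemmatized_docs = [
            " ".join(TOY_LEMMAS.get(tok, tok) for tok in doc.lower().split())
            for doc in docs
        ]

    def get_lemmatized_documents(self):
        """Return a copy of the lemmatized documents built by `process`."""
        return list(self._lemmatized_docs)


def count_phrase(phrase, doc):
    """Count whole-token occurrences of `phrase` in `doc`."""
    words, target = doc.split(), phrase.split()
    n = len(target)
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))


def tfidf(phrase, docs):
    """Per-document tf-idf with raw counts and unsmoothed idf = ln(N / df)."""
    counts = [count_phrase(phrase, d) for d in docs]
    df = sum(1 for c in counts if c > 0)
    if df == 0:
        return [0.0] * len(docs)
    idf = math.log(len(docs) / df)
    return [c * idf for c in counts]


docs = ["Neural networks are models", "A neural network is a model", "Language models"]
pipeline = LemmatizingPipeline()
pipeline.process(docs)
lemmatized = pipeline.get_lemmatized_documents()
raw = [d.lower() for d in docs]

# The lemmatized keyphrase matches both of the first two lemmatized documents ...
print([round(x, 3) for x in tfidf("neural network", lemmatized)])  # [0.405, 0.405, 0.0]
# ... but against the raw text it only matches the second one.
print([round(x, 3) for x in tfidf("neural network", raw)])  # [0.0, 1.099, 0.0]
```

Because the candidate keyphrases are lemmas, their document frequencies have to be counted on lemmatized text too; otherwise "neural networks" in the first document is missed and the idf is inflated.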
Hello, is anyone still working on this issue? I think this would be a great feature: in my tests I see many selected keywords that are, for example, singular and plural forms of each other.
I started a branch to build this feature: https://github.com/Logora/KeyphraseVectorizers/tree/use_lemmatizer