Lemmatizing documents and keyphrases #9

hboisgibault · 2022-04-06T09:02:53Z

Using lemmatization can result in better-quality keyphrases, since similar keyphrases will be grouped together.
Adding lemmatization as an option could be a great feature.
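A minimal sketch of the grouping idea in Python. It assumes a hypothetical toy lookup table (`TOY_LEMMAS`) standing in for a real lemmatizer such as spaCy's `lemmatizer` component, and `group_by_lemma` is an illustrative helper, not part of KeyphraseVectorizers. Candidates that share a lemmatized form collapse into one group:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a real lemmatizer (e.g. spaCy's 'lemmatizer'
# component). It only covers the words used in the example below.
TOY_LEMMAS: Dict[str, str] = {
    "networks": "network",
    "models": "model",
}


def toy_lemmatize(phrase: str) -> str:
    """Lowercase a phrase and lemmatize it word by word via the toy table."""
    return " ".join(TOY_LEMMAS.get(word, word) for word in phrase.lower().split())


def group_by_lemma(
    keyphrases: Iterable[str], lemmatize: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each lemmatized keyphrase to the surface forms that collapse into it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for phrase in keyphrases:
        groups[lemmatize(phrase)].append(phrase)
    return dict(groups)


candidates = ["neural network", "neural networks", "language models", "language model"]
groups = group_by_lemma(candidates, toy_lemmatize)
print(groups)
```

With spaCy, the toy function could be swapped for something like `lambda p: " ".join(t.lemma_.lower() for t in nlp(p))` with `nlp = spacy.load("en_core_web_sm")`; the exact lemmas then depend on the model and its POS tags.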

If the option is activated, the 'lemmatizer' component will be added to the spaCy pipeline, and the lemmas of words will be used instead of the raw text to build keyphrases.
There should also be a function to retrieve the lemmatized documents. They will be built and stored during the pipeline process. This is necessary to calculate tf-idf, because the lemmatized keyphrases have to be counted in documents that are in the same lemmatized form.
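A rough illustration of why the documents need lemmatizing too, assuming a hypothetical toy lemma table in place of spaCy and a plain tf-idf (raw counts times the smoothed idf `ln((1 + n) / (1 + df)) + 1`). This is not how KeyphraseVectorizers or the `use_lemmatizer` branch computes it; `lemmatize_doc`, `count_phrase` and `keyphrase_tfidf` are illustrative names only. A lemmatized keyphrase such as "neural network" never matches the raw text "Neural networks", but it does match the lemmatized document:

```python
import math
from typing import Dict, List

# Hypothetical lookup table standing in for spaCy's 'lemmatizer' component;
# it only covers the words in the example documents below.
TOY_LEMMAS: Dict[str, str] = {
    "networks": "network", "models": "model", "were": "be", "trained": "train",
}


def lemmatize_doc(doc: str) -> List[str]:
    """Lowercase, split on whitespace and map each token to its lemma."""
    return [TOY_LEMMAS.get(word, word) for word in doc.lower().split()]


def count_phrase(tokens: List[str], phrase: List[str]) -> int:
    """Count occurrences of a multi-word phrase in a token list."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def keyphrase_tfidf(docs: List[List[str]], keyphrases: List[str]) -> List[Dict[str, float]]:
    """Score lemmatized keyphrases against lemmatized documents.

    tf is the raw count; idf is the smoothed ln((1 + n) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    counts = [{kp: count_phrase(doc, kp.split()) for kp in keyphrases} for doc in docs]
    df = {kp: sum(1 for c in counts if c[kp] > 0) for kp in keyphrases}
    idf = {kp: math.log((1 + n_docs) / (1 + df[kp])) + 1 for kp in keyphrases}
    return [{kp: c[kp] * idf[kp] for kp in keyphrases} for c in counts]


raw_docs = ["Neural networks were trained", "A neural network is small", "Language models"]
# Built once and stored, so the documents are not lemmatized a second time.
lemmatized_docs = [lemmatize_doc(d) for d in raw_docs]
keyphrases = ["neural network", "language model"]  # already in lemma form
scores = keyphrase_tfidf(lemmatized_docs, keyphrases)
print(scores)
```

With a real spaCy pipeline, each stored document could instead be `[t.lemma_.lower() for t in nlp(text)]`, collected in the same pass that extracts the keyphrases.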

I started a branch to build this feature: https://github.com/Logora/KeyphraseVectorizers/tree/use_lemmatizer

TimSchopf · 2022-06-18T21:03:20Z

Feel free to open a PR in the lemmatizer branch. I will then add this feature in a later release.

asmaier · 2024-01-07T10:08:33Z

Hello, is anyone still working on this issue? I think this would be a great feature: in my tests, many of the selected keywords are, for example, just singular and plural forms of each other.

TimSchopf added the "enhancement" (New feature or request) label on Jun 18, 2022.
