"""Multithreaded tfidf vectorizer with C++ backend.
Supports 'word', 'char' and 'char_wb' tokenization.
"""
Description
Thank you very much for the nice repository. We are currently using XLinear together with the multithreaded Tfidf vectoriser defined in pecos/pecos/utils/featurization/text/vectorizers.py (lines 190 to 194 at commit 18a1cc5).
It works like a charm. However, with the new transformer vectorisers, using Tfidf now forces me to install all of the transformers/torch dependencies in my virtual environment, even though Tfidf does not need them. It would be good if the code were decoupled, so that a user who only needs Tfidf can work in a slim environment. In the past, I used to install pecos with no dependencies and then install only the dependencies required by the specific modules I used, which is desirable for a production environment.
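One way to keep a Tfidf-only environment slim is for the package to check whether the heavy optional packages are installed before touching anything that needs them. Below is a minimal sketch using the standard library's `importlib.util.find_spec`, which locates a top-level package without importing it. The helper name `missing_optional_deps` and its default package list are illustrative assumptions, not part of PECOS.

```python
import importlib.util
from typing import Iterable, List


def missing_optional_deps(names: Iterable[str] = ("torch", "transformers")) -> List[str]:
    """Return the top-level packages in `names` that are not installed.

    find_spec() only locates a top-level package; it does not import it,
    so this check stays cheap even when torch is installed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_deps()
    if missing:
        print("Pretrained vectorizers unavailable; missing:", ", ".join(missing))
    else:
        print("All optional dependencies for pretrained vectorizers are installed.")
```

A transformer-based vectorizer could run such a check when it is constructed and raise a clear error, while the Tfidf code path would never need to call it.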
Proposal:
The code in pecos/pecos/utils/featurization/text/vectorizers.py (lines 535 to 537 at commit 18a1cc5) could be moved to an independent file, say pretrained_vectorizers.py, together with the transformers and torch imports. I'm happy to give this change a go and contribute a pull request if the maintainers support it.
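The proposed split could be paired with lazy loading, so that requesting a Tfidf vectorizer never imports the transformer module. Below is a minimal sketch assuming a hypothetical registry that maps a vectorizer type to a module path and class name. The `PretrainedVectorizer` class name is an illustrative placeholder, `pretrained_vectorizers` is the file name suggested above rather than existing PECOS code, and `Tfidf` is assumed to be the class name used in `vectorizers.py`.

```python
import importlib
from typing import Mapping, Optional, Tuple

# Hypothetical registry: vectorizer type -> (module path, class name).
# Module and class names are illustrative assumptions, not the PECOS layout.
DEFAULT_REGISTRY: Mapping[str, Tuple[str, str]] = {
    "tfidf": ("pecos.utils.featurization.text.vectorizers", "Tfidf"),
    "pretrained": (
        "pecos.utils.featurization.text.pretrained_vectorizers",
        "PretrainedVectorizer",
    ),
}


def load_vectorizer_class(
    vectorizer_type: str,
    registry: Optional[Mapping[str, Tuple[str, str]]] = None,
) -> type:
    """Import the module for `vectorizer_type` on first use and return its class.

    Only the module that defines the requested vectorizer is imported, so once
    the torch/transformers imports live in a separate module, requesting
    "tfidf" never triggers them.
    """
    registry = DEFAULT_REGISTRY if registry is None else registry
    if vectorizer_type not in registry:
        raise ValueError(f"Unknown vectorizer type: {vectorizer_type!r}")
    module_path, class_name = registry[vectorizer_type]
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # Turn a bare ModuleNotFoundError into an actionable message.
        raise ImportError(
            f"Vectorizer {vectorizer_type!r} needs optional dependencies that "
            f"are not installed (could not import {module_path!r})."
        ) from exc
    return getattr(module, class_name)
```

With this layout, `load_vectorizer_class("tfidf")` would import only `vectorizers.py`, while `load_vectorizer_class("pretrained")` would fail with a readable message in an environment without torch and transformers.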