Decouple transformer vectorisers from sklearn-based ones #174

aCampello · 2022-08-23T14:36:24Z

Description

Thank you very much for the nice repository. We are currently using XLinear together with the multithreaded Tfidf vectoriser:

pecos/pecos/utils/featurization/text/vectorizers.py

Lines 190 to 194 in 18a1cc5

    class Tfidf(Vectorizer):
        """Multithreaded tfidf vectorizer with C++ backend.

        Supports 'word', 'char' and 'char_wb' tokenization.
        """

It works like a charm. However, with the new transformer vectorisers, using Tfidf now requires installing all of the transformers/torch dependencies in my virtual environment, even though Tfidf itself does not need them. It would be good if the code were decoupled, so that a user who only needs Tfidf can keep a slim environment. In the past, I installed pecos with no dependencies and then added only the dependencies required by the specific modules I used, which is desirable for a production environment.

Proposal:

That the pretrained transformers in

pecos/pecos/utils/featurization/text/vectorizers.py

Lines 535 to 537 in 18a1cc5

    class PretrainedTransformer(Vectorizer):
        """Vectorizer with a variety of Transformer models."""

be moved to an independent file, say pretrained_vectorizers.py, together with the transformers and torch imports.
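For illustration only, here is a minimal sketch of a related way to reach the same goal: deferring the heavy imports until the transformer class is actually instantiated, so that importing the module for Tfidf alone never touches torch. This is not PECOS code and not the file split proposed above. The class bodies are placeholders, and `require_optional` and `required_modules` are hypothetical names introduced for the example.

```python
import importlib


def require_optional(module_name: str, feature: str):
    """Import an optional dependency on demand, with a readable error if missing.

    Hypothetical helper, not part of PECOS.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'."
        ) from exc


class Vectorizer:
    """Placeholder base class; the real one lives in PECOS."""


class Tfidf(Vectorizer):
    """Stand-in for the lightweight vectorizer: no heavy imports needed."""


class PretrainedTransformer(Vectorizer):
    """Stand-in for the transformer vectorizer: heavy imports are deferred."""

    # Hypothetical attribute listing the optional packages this class needs.
    required_modules = ("torch", "transformers")

    def __init__(self):
        # The imports run only when this class is instantiated, so importing
        # the module (for example to use Tfidf) never touches torch.
        self.backends = {
            name: require_optional(name, type(self).__name__)
            for name in self.required_modules
        }
```

With this layout, `Tfidf()` works in a slim environment, while `PretrainedTransformer()` raises an `ImportError` naming the missing package if torch or transformers is absent. Moving the class into its own file, as proposed, achieves the same separation at the module level.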

I'm happy to give this change a go and contribute an MR if the maintainers upvote it.


aCampello · 2022-09-02T10:58:54Z

Any thoughts from the maintainers? @OctoberChang @weiliw-amz

OctoberChang · 2022-09-19T21:46:56Z

@aCampello, as suggested, we decoupled the vectorizer.py so that it contains only tf-idf based vectorizers.

aCampello added the "enhancement" (New feature or request) label Aug 23, 2022

OctoberChang mentioned this issue Sep 16, 2022

Remove PretrainedTransformer Vectorizer to avoid Pytest Error #179

Merged

OctoberChang closed this as completed Sep 19, 2022
