Enhancing TableVectorizer
API
#580
Replies: 4 comments
-
Hi, thanks for the post! I like the …
I strongly agree, and I notice another issue from the user's point of view: when I see the call

    tv = TableVectorizer(
        numerical_transformer=StandardScaler(),
        high_card_cat_transformer=MinHashEncoder(),
        temperature_index=RobustScaler(),
    )

without reading the …

I absolutely agree kwargs are cool, but I don't think that's an appropriate use-case :)
Interesting remark, though it's a problem things like the …
-
I like the column_specific_transformers argument; I think this is the best solution. Having both options, kwargs and column_specific_transformers, might be confusing.

I strongly agree, and I notice another issue from the user's point of view: when I see the call …
I agree with what Lilian points out and I share his views.
Thanks to both of you!
-
I agree, the most elegant solution for me is with the …
-
#583 closes this discussion.
-
Basic use-case
Last week, @GaelVaroquaux mentioned the need to add more hyper-parameters to the TableVectorizer and to be able to select encoders for specific columns. After discussing the new API design with @ogrisel and @glemaitre, one solution could be to add arbitrary keywords directly to the init:
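A minimal sketch of how an __init__ accepting arbitrary keywords could resolve per-column overrides. The names TableVectorizerSketch, transformer_for and column_transformers are hypothetical, and plain strings stand in for the real transformer objects so the example runs without scikit-learn or dirty_cat:

```python
# Hypothetical sketch, NOT the actual dirty_cat/skrub TableVectorizer code.
# Transformers are plain strings here so the example runs standalone.


class TableVectorizerSketch:
    def __init__(
        self,
        numerical_transformer="StandardScaler",
        high_card_cat_transformer="MinHashEncoder",
        **column_transformers,
    ):
        self.numerical_transformer = numerical_transformer
        self.high_card_cat_transformer = high_card_cat_transformer
        # Every keyword that is not a known hyper-parameter lands here and is
        # read as {column_name: transformer}.
        self.column_transformers = column_transformers

    def transformer_for(self, column, kind):
        """Return the transformer for `column`, whose inferred type is `kind`."""
        if column in self.column_transformers:
            # A per-column keyword overrides the type-based default.
            return self.column_transformers[column]
        if kind == "numerical":
            return self.numerical_transformer
        # Simplification: every other kind falls back to the
        # high-cardinality categorical transformer.
        return self.high_card_cat_transformer


tv = TableVectorizerSketch(temperature_index="RobustScaler")
print(tv.transformer_for("temperature_index", "numerical"))  # RobustScaler
print(tv.transformer_for("pressure", "numerical"))  # StandardScaler
```

In this sketch a column literally named numerical_transformer could never be overridden, because that keyword binds to the hyper-parameter. Also, if such a class inherited from scikit-learn's BaseEstimator, get_params ignores **kwargs in the __init__ signature, so clone() would silently drop the per-column overrides.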
Here temperature_index is a numerical column, and since we pass it to TableVectorizer as a keyword argument, we override the default StandardScaler with RobustScaler for that column.

Advanced use-case
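A hypothetical sketch of the list-of-tuples idea described next. Each tuple is assumed to be (name, transformer, columns), mirroring ColumnTransformer, and columns are matched by position so that duplicate or non-string names such as [1, 3, 3] or "0temperature" still work. The function name resolve_transformers and the string stand-ins for transformers are illustrative only:

```python
# Hypothetical sketch of a `column_specific_transformers` argument.
# The (name, transformer, columns) layout is assumed, mirroring scikit-learn's
# ColumnTransformer; transformers are plain strings so this runs standalone.


def resolve_transformers(columns, kinds, column_specific_transformers, defaults):
    """Return the transformer assigned to each column, by position."""
    # Start from the type-based default for every column.
    result = [defaults[kind] for kind in kinds]
    claimed = set()  # positions already taken by a specific transformer
    for _name, transformer, selected in column_specific_transformers:
        for position, column in enumerate(columns):
            if column in selected:
                if position in claimed:
                    raise ValueError(
                        f"column {column!r} at position {position} "
                        "is selected by more than one tuple"
                    )
                claimed.add(position)
                result[position] = transformer
    return result


columns = [1, 3, 3, "0temperature"]  # none can be written as a keyword argument
kinds = ["numerical", "numerical", "categorical", "numerical"]
defaults = {"numerical": "StandardScaler", "categorical": "MinHashEncoder"}
specific = [
    ("robust", "RobustScaler", ["0temperature"]),
    ("raw", "passthrough", [3]),
]
print(resolve_transformers(columns, kinds, specific, defaults))
# ['StandardScaler', 'passthrough', 'passthrough', 'RobustScaler']
```

Unlike ColumnTransformer, which accepts overlapping column selections and concatenates the outputs, this sketch rejects a column claimed by two tuples, on the assumption that each column should get exactly one encoder.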
Additionally, to support column names that cannot be written as keyword arguments, such as [1, 3, 3] or "0temperature", we could replicate the logic of ColumnTransformer with a list of tuples:

Advantages
… ColumnTransformer right away.

Limitations
Having both options, kwargs and column_specific_transformers, might be confusing.

What do you all think of this?
cc @LilianBoulard