[New idea - Feature Request] an option to exclude features from sampling #4962

TpTheGreat · 2022-01-19T22:42:28Z

Hi,

Summary

I want to be able to exclude specific features from col_sample,
meaning that they will be part of every tree.

Motivation and Description

Sometimes I use a column to distinguish different entities or data types. I'm not training multiple models, because the signal in the data has a lot of similarities across the different entities, but each entity is still critical on its own, and it is important that each tree knows which entity a specific row belongs to.

Currently, in order for the column to appear in more trees, I need to set a high colsample.
I would like to be able to define a low colsample, like 0.3, while making sure that the specific feature is included in each of the trees.
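
To make the request concrete, here is a minimal sketch of the sampling behavior I have in mind. It is plain Python, not LightGBM code: `sample_features` and its `always_include` argument are hypothetical names made up for illustration, and I'm assuming the pinned features count toward the colsample budget (they could just as well be added on top of it).

```python
import random


def sample_features(n_features, fraction, always_include=(), seed=None):
    """Pick the feature indices that one tree is allowed to use.

    Hypothetical helper, not part of LightGBM. Features listed in
    `always_include` are kept for every tree; the rest of the budget is
    filled by sampling without replacement from the remaining features.
    """
    rng = random.Random(seed)
    forced = sorted(set(always_include))
    forced_set = set(forced)
    pool = [i for i in range(n_features) if i not in forced_set]

    # Assumption: pinned features count toward the per-tree column budget.
    budget = max(1, round(n_features * fraction))
    n_random = min(max(0, budget - len(forced)), len(pool))

    return sorted(forced + rng.sample(pool, n_random))


# Feature 0 plays the role of the "entity" column: it shows up in every
# tree, while the other two slots are drawn at random for each tree.
for tree in range(3):
    print(tree, sample_features(10, 0.3, always_include=[0], seed=tree))
```

With a fraction of 0.3 over 10 features, each tree still sees only 3 columns, but one of them is always the entity column.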

Thanks!

StrikerRUS · 2022-01-20T02:14:20Z

@TpTheGreat Hey! Thanks a lot for your proposal!
I think we already have such a feature request: #4605.

TpTheGreat · 2022-01-21T20:58:58Z

Thanks @StrikerRUS !
Yes, it seems that it's already there, but the motivation was not enough for it to be followed through.

mayer79 · 2022-01-22T08:04:30Z

I think it is a good idea. The fast random forest implementation ranger in R offers this and I use it quite frequently in situations where there are many redundant columns and one important "golden feature".

jameslamb · 2022-02-11T02:06:09Z

I'm going to close this, given that it's the same as the request in #4605. I'm also going to lock the conversation here, to keep the conversation on this feature focused.

If you're interested in contributing this or have additional context you'd like to add for whoever does contribute this, please add it in #4605.

jameslamb closed this as completed Feb 11, 2022

jameslamb added the duplicate label Feb 11, 2022

microsoft locked as resolved and limited conversation to collaborators Feb 11, 2022
