Hi,
Summary
I would like to be able to exclude specific features from col_sample, meaning they would be part of every tree.
Motivation and Description
Sometimes I use a column to distinguish between different entities or data types. I'm not training a separate model per entity because the signal in the data is largely shared across entities, but each entity is still critical on its own, so it matters that every tree knows which entity a given row belongs to.
Currently, for that column to appear in more trees, I have to set a high colsample.
I would like to be able to set a low colsample, such as 0.3, while making sure that the specific feature is present in every tree.
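To make the request concrete, here is a rough sketch of the sampling behaviour I have in mind. It is not how any library implements column sampling; `sample_columns` and `always_include` are hypothetical names I made up for illustration, and I am assuming the subset size is still computed from the total number of features.

```python
import numpy as np

def sample_columns(n_features, colsample, always_include=(), rng=None):
    """Pick feature indices for one tree (hypothetical helper, not a library API).

    Features listed in always_include are exempt from sampling and are
    returned for every tree; the rest are drawn at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    forced = np.unique(np.asarray(always_include, dtype=int))
    # Assumption: size the subset from the full feature count,
    # as plain column sampling would.
    n_total = max(1, int(round(colsample * n_features)))
    # Only the non-forced features are actually sampled.
    pool = np.setdiff1d(np.arange(n_features), forced)
    n_sampled = min(len(pool), max(0, n_total - len(forced)))
    sampled = rng.choice(pool, size=n_sampled, replace=False)
    return np.sort(np.concatenate([forced, sampled]))

# Example: 10 features, colsample of 0.3, feature 4 is the entity column.
cols = sample_columns(10, 0.3, always_include=[4])
```

With this, each tree would be built on 3 of the 10 columns, and column 4 would always be one of them.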
Thanks!
I think it is a good idea. The fast random forest implementation ranger in R offers this and I use it quite frequently in situations where there are many redundant columns and one important "golden feature".
I'm going to close this, given that it's the same as the request in #4605. I'm also going to lock the conversation here, to keep the conversation on this feature focused.
If you're interested in contributing this or have additional context you'd like to add for whoever does contribute this, please add it in #4605.