
[New idea - Feature Request] an option to exclude features from sampling #4962

Closed
TpTheGreat opened this issue Jan 19, 2022 · 4 comments

Comments

@TpTheGreat

Hi,

Summary

I want to be able to exclude specific features from column sampling (col_sample), meaning they would be included in every tree.

Motivation and Description

Sometimes I use a column to distinguish between different entities or data types. I'm not training a separate model per entity because the signal in the data is very similar across entities. Each entity is still critical on its own, though, so it's important that every tree knows which entity a given row belongs to.

Currently, to get this column into more trees, I need to use a high colsample.
I would like to be able to set a low colsample, such as 0.3, while making sure that this specific feature is present in every tree.
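To make the request concrete, here is a minimal Python sketch of the per-tree sampling it describes. The function name sample_features_for_tree and its parameters are hypothetical, and this is not how LightGBM implements column sampling internally. It only illustrates the idea: draw a random fraction of the non-forced features for each tree, then add the forced features on top.

```python
import random
from typing import Iterable, List, Optional


def sample_features_for_tree(
    num_features: int,
    fraction: float,
    always_include: Iterable[int] = (),
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return the sorted feature indices that one tree is allowed to use.

    Hypothetical sketch, not LightGBM's API: round(fraction * num_features)
    features (at least one, capped at the pool size) are drawn at random from
    the features NOT in always_include, and the forced features are then added
    on top, so they appear in every tree no matter how small fraction is.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = rng or random.Random()
    forced = set(always_include)
    if any(i < 0 or i >= num_features for i in forced):
        raise ValueError("always_include contains an out-of-range index")
    # Only the non-forced features take part in the random draw.
    pool = [i for i in range(num_features) if i not in forced]
    k = min(len(pool), max(1, round(fraction * num_features)))
    sampled = rng.sample(pool, k)
    return sorted(forced.union(sampled))


if __name__ == "__main__":
    # 10 features, colsample-like fraction of 0.3, feature 0 is the entity column.
    rng = random.Random(42)
    for tree in range(3):
        print(tree, sample_features_for_tree(10, 0.3, always_include=[0], rng=rng))
```

In this sketch the forced features are added on top of the sampled ones, so with 10 features, a fraction of 0.3 and one forced feature, every tree sees 4 features. Counting the forced features toward the fraction instead would keep the per-tree feature count fixed. ranger, mentioned later in this thread, takes the first approach with its always.split.variables argument, which adds those variables to the mtry candidates (at each split rather than per tree).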

Thanks!

@StrikerRUS
Collaborator

@TpTheGreat Hey! Thanks a lot for your proposal!
I think we already have such a feature request: #4605.

@TpTheGreat
Author

Thanks @StrikerRUS !
Yes, it seems it's already there, but there wasn't enough motivation for anyone to follow through on it.

@mayer79
Contributor

mayer79 commented Jan 22, 2022

I think it is a good idea. The fast random forest implementation ranger in R offers this and I use it quite frequently in situations where there are many redundant columns and one important "golden feature".

@jameslamb
Collaborator

I'm going to close this, given that it's the same as the request in #4605. I'm also going to lock the conversation here, to keep the conversation on this feature focused.

If you're interested in contributing this or have additional context you'd like to add for whoever does contribute this, please add it in #4605.

@microsoft locked as resolved and limited conversation to collaborators Feb 11, 2022

4 participants