Categorical and mixed features types #14

Closed
jeremiedb opened this issue May 26, 2019 · 2 comments

Comments

@jeremiedb
Member

Support non-one-hot-encoded categorical features: features carrying item info as an Int (levels 1 to N).
Consider changing the input structure from Matrix to DataFrames to handle mixed feature types.
Consider supporting a mix of input structures: DataFrames + SparseMatrix, for efficient handling of a mixture of dense (continuous and categorical) and sparse features.
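
For readers unfamiliar with the two encodings being contrasted, here is a minimal sketch in Python (chosen only for illustration; it is not this package's API). The helper name `one_hot` and the sample data are assumptions made up for the example. It shows that a single integer-coded column carries the same information as N one-hot columns.

```python
def one_hot(codes, n_levels):
    """Expand 1-based integer codes into one-hot rows.

    Hypothetical helper for illustration: this is the encoding the
    request above would make unnecessary.
    """
    return [[1 if c == k else 0 for k in range(1, n_levels + 1)] for c in codes]


def from_one_hot(rows):
    """Recover the 1-based integer codes from one-hot rows."""
    return [row.index(1) + 1 for row in rows]


codes = [3, 1, 2, 3]   # one column, levels 1..3
n_levels = 3
dense = one_hot(codes, n_levels)   # three columns for the same information
```

The round trip `from_one_hot(dense)` gives back `codes`, so the integer column loses nothing while using one column instead of N.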

@ablaom
Contributor

ablaom commented Jul 10, 2020

> Support non one-hot encoded categorical features: features carrying item info as an Int (1 to N levels).

Maybe better to support categorical arrays to deal with classes that disappear on sub-sampling. Or require total number of classes to be specified as metadata.
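
A small sketch of the sub-sampling problem, in Python purely for illustration. The function names and data are hypothetical, and this is not how any particular package handles it. If the number of levels is inferred from the data, a subsample that happens to miss a level silently shrinks the feature; passing the total as metadata keeps per-level statistics aligned.

```python
def infer_n_levels(codes):
    """Guess the number of levels from the data (fragile under sub-sampling)."""
    return max(codes)


def level_counts(codes, n_levels):
    """Count observations per level, given the total level count as metadata."""
    counts = [0] * n_levels
    for c in codes:
        counts[c - 1] += 1   # codes are 1-based
    return counts


full = [1, 2, 3, 4, 2, 1]   # four levels in the full data
subsample = full[:3]        # level 4 happens to be absent

inferred = infer_n_levels(subsample)     # loses level 4
counts = level_counts(subsample, 4)      # keeps a slot for level 4
```

A categorical array type stores the level pool alongside the codes, which amounts to carrying this metadata automatically.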

You should also make a clear distinction between ordered factors and arbitrary categoricals, which are handled differently in tree-based algorithms. Support for the former is easy: you probably already have the potential to do this by just requiring users to encode with "integer floats", yes? By the way, DecisionTree, although it does not support arbitrary categoricals, does support any type implementing the order relation `<`, and I assume you could do the same?
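
To make the distinction concrete, here is a hedged sketch in Python (for illustration only; all names are hypothetical and this is not claimed to be what DecisionTree or this package does). For an ordered factor, candidate splits are thresholds on the level order. For an arbitrary categorical under squared-error loss, a classical shortcut is to sort the levels by mean target and scan thresholds in that order, which avoids enumerating every subset of levels.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_prefix_split(levels_in_order, x, y):
    """Scan the N-1 prefix splits of the given level order.

    Returns (set of levels sent left, total squared error).
    """
    best = (None, float("inf"))
    for i in range(1, len(levels_in_order)):
        left = set(levels_in_order[:i])
        y_left = [t for c, t in zip(x, y) if c in left]
        y_right = [t for c, t in zip(x, y) if c not in left]
        loss = sse(y_left) + sse(y_right)
        if loss < best[1]:
            best = (left, loss)
    return best


def ordered_split(x, y):
    """Ordered factor: thresholds follow the natural order of the codes."""
    return best_prefix_split(sorted(set(x)), x, y)


def unordered_split(x, y):
    """Arbitrary categorical: order the levels by mean target, then scan."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(x, y):
        sums[c] += t
        counts[c] += 1
    by_mean = sorted(counts, key=lambda c: sums[c] / counts[c])
    return best_prefix_split(by_mean, x, y)


# Level 2 behaves differently from levels 1 and 3.
x = [1, 1, 2, 2, 3, 3]
y = [10.0, 10.0, 0.0, 0.0, 10.0, 10.0]

ordered = ordered_split(x, y)       # cannot isolate level 2 with one threshold
unordered = unordered_split(x, y)   # isolates level 2 exactly
```

On this toy data a single threshold on the integer codes cannot separate level 2 from levels 1 and 3, while the unordered treatment can, which is why encoding an arbitrary categorical as "integer floats" can cost accuracy or tree depth.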

> Consider change from Matrix to DataFrames input structure to handle mixed features.

Perhaps you may as well support arbitrary tabular formats that implement the Tables.jl interface?

@jeremiedb
Member Author

Implemented in v0.15
