Selective: Feature Selection Library

Selective is a white-box feature selection library that supports unsupervised and supervised selection methods for classification and regression tasks.

The library provides:

Simple to complex selection methods: Variance, Correlation, Statistical, Linear, Tree-based, or Customized.
Interoperability with data frames as input.
Automated task detection: no need to know which feature selection method works with which machine learning task.
Benchmarking multiple selectors using cross-validation with built-in parallelization.
Inspection of the results and feature importance.

Selective is developed by the Artificial Intelligence Center of Excellence at Fidelity Investments.

Quick Start

# Import Selective and SelectionMethod
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older scikit-learn
from sklearn.datasets import load_boston
from feature.utils import get_data_label
from feature.selector import Selective, SelectionMethod

# Data
data, label = get_data_label(load_boston())

# Feature selectors from simple to more complex
selector = Selective(SelectionMethod.Variance(threshold=0.0))
selector = Selective(SelectionMethod.Correlation(threshold=0.5, method="pearson"))
selector = Selective(SelectionMethod.Statistical(num_features=3, method="anova"))
selector = Selective(SelectionMethod.Linear(num_features=3, regularization="none"))
selector = Selective(SelectionMethod.TreeBased(num_features=3))

# Feature reduction
subset = selector.fit_transform(data, label)
print("Reduction:", list(subset.columns))
print("Scores:", list(selector.get_absolute_scores()))
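
To make the simplest selector above more concrete, here is a minimal standard-library sketch of what a variance threshold does. It is an illustration only, not Selective's implementation: the helper `variance_filter`, the toy column names, and the choice to keep a feature only when its variance is strictly greater than the threshold are all assumptions made for this example.

```python
from statistics import pvariance


def variance_filter(columns, threshold=0.0):
    """Return the names of columns whose population variance exceeds threshold.

    Hypothetical helper for illustration; `columns` maps a feature name
    to a list of numeric values.
    """
    return [name for name, values in columns.items()
            if pvariance(values) > threshold]


# Toy data: "const" never changes, so it carries no information.
toy = {
    "rooms": [3.0, 4.0, 5.0, 6.0],
    "const": [1.0, 1.0, 1.0, 1.0],
    "age": [10.0, 10.0, 12.0, 12.0],
}

kept = variance_filter(toy, threshold=0.0)
print(kept)  # ['rooms', 'age']
```

With `threshold=0.0` only constant features are dropped, which is why it is a common conservative default.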

Available Methods

Method	Options
Variance per Feature	`threshold`
Correlation pairwise Features	Pearson Correlation Coefficient; Kendall Rank Correlation Coefficient; Spearman's Rank Correlation Coefficient
Statistical Analysis	ANOVA F-test Classification; F-value Regression; Chi-Square; Mutual Information Classification; Variance Inflation Factor
Linear Methods	Linear Regression; Logistic Regression; Lasso Regularization; Ridge Regularization
Tree-based Methods	Decision Tree; Random Forest; Extra Trees Classifier; XGBoost; LightGBM; AdaBoost; CatBoost; Gradient Boosting Tree
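
As a rough illustration of the pairwise correlation row in the table, the sketch below drops any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold. This is one plausible reading of correlation-based pruning, not necessarily how Selective resolves correlated pairs; the helpers `pearson` and `correlation_filter`, the keep-the-earlier-feature rule, and the toy data are assumptions for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_filter(columns, threshold=0.5):
    """Greedily keep features that are not strongly correlated with kept ones.

    Hypothetical helper: features are visited in insertion order, and the
    earlier feature of a highly correlated pair is the one that survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],    # nearly a multiple of "a"
    "c": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with "a"
}

selected = correlation_filter(toy, threshold=0.5)
print(selected)  # ['a', 'c']
```

Swapping `pearson` for a rank-based coefficient would give the Kendall or Spearman variants listed in the table.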

Benchmarking

# Imports
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
from feature.utils import get_data_label
from xgboost import XGBClassifier, XGBRegressor
from feature.selector import SelectionMethod, benchmark, calculate_statistics

# Data
data, label = get_data_label(load_boston())

# Selectors
corr_threshold = 0.5
num_features = 3
tree_params = {"n_estimators": 50, "max_depth": 5, "random_state": 111, "n_jobs": 4}
selectors = {

  # Correlation methods
  "corr_pearson": SelectionMethod.Correlation(corr_threshold, method="pearson"),
  "corr_kendall": SelectionMethod.Correlation(corr_threshold, method="kendall"),
  "corr_spearman": SelectionMethod.Correlation(corr_threshold, method="spearman"),
  
  # Statistical methods
  "stat_anova": SelectionMethod.Statistical(num_features, method="anova"),
  "stat_chi_square": SelectionMethod.Statistical(num_features, method="chi_square"),
  "stat_mutual_info": SelectionMethod.Statistical(num_features, method="mutual_info"),
  
  # Linear methods
  "linear": SelectionMethod.Linear(num_features, regularization="none"),
  "lasso": SelectionMethod.Linear(num_features, regularization="lasso", alpha=1000),
  "ridge": SelectionMethod.Linear(num_features, regularization="ridge", alpha=1000),
  
  # Non-linear tree-based methods
  "random_forest": SelectionMethod.TreeBased(num_features),
  "xgboost_classif": SelectionMethod.TreeBased(num_features, estimator=XGBClassifier(**tree_params)),
  "xgboost_regress": SelectionMethod.TreeBased(num_features, estimator=XGBRegressor(**tree_params))
}

# Benchmark (sequential)
score_df, selected_df, runtime_df = benchmark(selectors, data, label, cv=5)
print(score_df, "\n\n", selected_df, "\n\n", runtime_df)

# Benchmark (in parallel)
score_df, selected_df, runtime_df = benchmark(selectors, data, label, cv=5, n_jobs=4)
print(score_df, "\n\n", selected_df, "\n\n", runtime_df)

# Get benchmark statistics by feature
stats_df = calculate_statistics(score_df, selected_df)
print(stats_df)
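
The idea behind benchmarking with cross-validation can be sketched without the library: run a selector on each training fold and record how often each feature is chosen. The standard-library example below is only a conceptual sketch under stated assumptions; `k_fold_indices`, `top_variance`, and `selection_frequency` are hypothetical helpers, the folds are contiguous and unshuffled, and the data frames returned by `benchmark` may be computed differently.

```python
from collections import Counter
from statistics import pvariance


def k_fold_indices(n_rows, cv):
    """Yield (train, test) row-index lists for cv contiguous folds."""
    sizes = [n_rows // cv + (1 if i < n_rows % cv else 0) for i in range(cv)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


def top_variance(columns, rows, num_features):
    """Hypothetical selector: the highest-variance features on the given rows."""
    scores = {name: pvariance([values[i] for i in rows])
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:num_features]


def selection_frequency(columns, cv, num_features):
    """Fraction of folds in which each feature was selected."""
    n_rows = len(next(iter(columns.values())))
    counts = Counter()
    for train, _ in k_fold_indices(n_rows, cv):
        counts.update(top_variance(columns, train, num_features))
    return {name: counts[name] / cv for name in columns}


toy = {
    "x": [0.0, 10.0, 0.0, 10.0, 0.0, 10.0],
    "y": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "z": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}

freq = selection_frequency(toy, cv=3, num_features=1)
print(freq)  # {'x': 1.0, 'y': 0.0, 'z': 0.0}
```

A feature that is selected in every fold is a more stable choice than one that appears only for a particular split of the data.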

Visualization

import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
from feature.utils import get_data_label
from feature.selector import SelectionMethod, Selective, plot_importance

# Data
data, label = get_data_label(load_boston())

# Feature Selector
selector = Selective(SelectionMethod.Linear(num_features=10, regularization="none"))
subset = selector.fit_transform(data, label)

# Plot Feature Importance
df = pd.DataFrame(selector.get_absolute_scores(), index=data.columns)
plot_importance(df)

Installation

Selective requires Python 3.6+ and can be installed from PyPI using `pip install selective`.

Source

Alternatively, you can build a wheel package on your platform from scratch using the source code:

git clone https://github.com/fidelity/selective.git
cd selective
pip install setuptools wheel # if wheel is not installed
python setup.py sdist bdist_wheel
pip install dist/selective-X.X.X-py3-none-any.whl

Test your setup

cd selective
python -m unittest discover tests

Support

Please submit bug reports and feature requests as Issues.

License

Selective is licensed under the GNU GPL 3.0.

