easysklearn

Summary

Easysklearn is a python package designed to perform exploratory data analysis, to help with missing data imputation and to give baseline models. Also, it assists in feature selection which is a common problem when undertaking a data science or machine learning analysis. As its name indicates, this function operates like sklearn. It carries out tasks such as splitting data, feature selection, model fitting, numerical missing data imputation etc.

Installation

$ pip install -i https://test.pypi.org/simple/ easysklearn

Features

This package introduces a data science enthusiast, with little to no knowledge of machine learning, to the common steps required when undertaking a Supervised learning analysis. The package contains four functions that accept a pandas DataFrame. All functions can be used on a dataset with numerical features. The functions might have their own required and optional arguments.

eda: The eda function will split the original data into train and test dataset and will generate a statistical report such as correlation between the variables, number of missing data, class imbalance and type of data present in the dataset.
miss_data: The miss_data function will handle missing numerical data in the data frame.
baseline_fun: The baseline_fun function will give users a quick check of the performance of the selected sklearn models compared to a baseline model, upon which the model can be improved.
feature_select: The feature_select function will remove redundant features based on the forward selection.

How the easysklearn package fits into the Python ecosystem

To our knowledge, while pandas profiling provides some data statistical analysis, there is no general-purpose library for performing the aforementioned tasks together in the Python ecosystem.

Dependencies

python = "^3.8"
pandas = "^1.2.3"
numpy = "^1.20.1"
matplotlib = "^3.3.4"
sklearn = "^0.0"
seaborn = "^0.11.1"
ipython = "^7.21.0"
jupyter = "^1.0.0"

Usage

Task	Function
Exploratory data analysis	`eda(df, target)`
Numerical data imputation	`miss_data(x_train, x_test, strategy="mean")`
Compare selected model to a baseline model	`baseline_fun(X_train, y_train, type="regression", metrics_1="accuracy", metrics_2="r2")`
Feature selection to reduce data dimension	`feature_select(X_train, y_train, threshold=0.05)`

Example

from easysklearn import eda, baseline_fun, miss_data, feature_select

example_df = pd.DataFrame(np.array([[4500, np.nan, 4], [3450, 350_000, 6],
    [np.nan, 800_000, 9]]), columns = ['size', 'price', 'bedrooms'])

- eda = eda(example_df, target)
- x_train, x_test = miss_data(x_train, x_test, strategy="mean")
- results = baseline_fun(X_train, y_train, type="regression", metrics_1="accuracy", metrics_2="r2")
- best_features = feature_select(X_train, y_train, threshold=0.05)

Documentation

The official documentation is hosted on Read the Docs: https://easysklearn.readthedocs.io/en/latest/

Contributors

Development leaders:

Ifeanyi Anene
Lara Habashy
Sakshi Jain
Zhenrui Yu

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab. If you would like to contribute, please view our contributing guidelines and get familiar with the Github flow workflow.

Credits

This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, modified from the pyOpenSci/cookiecutter-pyopensci project template and the audreyr/cookiecutter-pypackage.

Name		Name	Last commit message	Last commit date
Latest commit History 107 Commits
.github/workflows		.github/workflows
docs		docs
easysklearn		easysklearn
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
.readthedocs.yml		.readthedocs.yml
CONDUCT.rst		CONDUCT.rst
CONTRIBUTING.rst		CONTRIBUTING.rst
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

easysklearn

Summary

Installation

Features

How the easysklearn package fits into the Python ecosystem

Dependencies

Usage

Example

Documentation

Contributors

Credits

About

Releases 7

Packages

Contributors 5

Languages

License

UBC-MDS/524_easysklearn

Folders and files

Latest commit

History

Repository files navigation

easysklearn

Summary

Installation

Features

How the easysklearn package fits into the Python ecosystem

Dependencies

Usage

Example

Documentation

Contributors

Credits

About

Resources

License

Stars

Watchers

Forks

Releases 7

Packages 0

Contributors 5

Languages

Packages