GitHub - lvgig/tubular: Python package implementing transformers for pre-processing steps for machine learning.

Tubular pre-processing for machine learning!

tubular implements pre-processing steps for tabular data commonly used in machine learning pipelines.

The transformers are compatible with scikit-learn Pipelines. Each has a transform method to apply the pre-processing step to data and a fit method to learn the relevant information from the data, if applicable.

The transformers in tubular work with data in pandas DataFrames.
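The fit/transform contract described above can be sketched with a small, hypothetical transformer. `MedianImputer` is an illustrative name and is not part of tubular; it only assumes the standard scikit-learn `BaseEstimator`, `TransformerMixin` and `Pipeline` classes, and shows `fit` learning values from a pandas DataFrame and `transform` applying them.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class MedianImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer (not part of tubular): fills nulls with column medians."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # learn the median of each column from the data passed to fit
        self.medians_ = X[self.columns].median().to_dict()
        return self

    def transform(self, X):
        # apply the learnt medians and return a new DataFrame
        return X.fillna(value=self.medians_)


df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, None]})

# a one-step Pipeline: fit_transform calls fit, then transform
pipeline = Pipeline([("impute", MedianImputer(columns=["a", "b"]))])
df_imputed = pipeline.fit_transform(df)
```

A transformer that has nothing to learn from the data can simply return `self` from `fit`, which is what "if applicable" suggests above.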

There are a variety of transformers to assist with:

capping
dates
imputation
mapping
categorical encoding
numeric operations

Here is a simple example of applying capping to two columns:

from tubular.capping import CappingTransformer
import pandas as pd
from sklearn.datasets import fetch_california_housing

# load the california housing dataset
cali = fetch_california_housing()
X = pd.DataFrame(cali['data'], columns=cali['feature_names'])

# initialise a capping transformer for 2 columns
capper = CappingTransformer(capping_values={'AveOccup': [0, 10], 'HouseAge': [0, 50]})

# transform the data
X_capped = capper.transform(X)
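To make the effect of capping concrete, here is a minimal sketch using plain pandas on a small hand-made DataFrame. It assumes each `[lower, upper]` pair means that values outside the range are replaced by the nearest bound. It is an illustration of the idea only, not tubular's implementation, and the sample values are made up.

```python
import pandas as pd

# small made-up sample with values outside the capping ranges
df = pd.DataFrame(
    {
        "AveOccup": [2.5, 12.0, -1.0],
        "HouseAge": [15.0, 52.0, 30.0],
    }
)

# same {column: [lower, upper]} layout as in the example above
capping_values = {"AveOccup": [0, 10], "HouseAge": [0, 50]}

# clip each column to its bounds, leaving the original DataFrame untouched
df_capped = df.copy()
for column, (lower, upper) in capping_values.items():
    df_capped[column] = df[column].clip(lower=lower, upper=upper)

print(df_capped)
```

Under that assumption, 12.0 in `AveOccup` becomes 10.0, -1.0 becomes 0.0, and 52.0 in `HouseAge` becomes 50.0, while values already inside the range are unchanged.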

Installation

The easiest way to get tubular is to install it directly from PyPI:

pip install tubular

Documentation

The documentation for tubular can be found on readthedocs.

Instructions for building the docs locally can be found in docs/README.

Examples

To help you get started, the examples folder in the repo contains notebooks that show how to use each transformer.

To open the example notebooks in Binder, click here or click the launch binder shield above, then use the directory button in the left sidebar to navigate to a specific notebook.

Issues

For bugs and feature requests, please open an issue.

Build and test

This project uses pytest as its test framework. To build the package locally and run the tests, follow the steps below.

First, clone the repo and move to the root directory:

git clone https://github.com/lvgig/tubular.git
cd tubular

Next, install tubular and the development dependencies:

pip install . -r requirements-dev.txt

Finally, run the test suite with pytest:

pytest

Contribute

tubular is under active development, and we're excited to hear from anyone interested in contributing!

See the CONTRIBUTING file for the full details of our working practices.

