
tabularasa

Tabular PyTorch neural networks with monotonicity, uncertainty, and a scikit-learn API

Overview

This library is heavily indebted to the following works:

The goal is to provide a usable open source implementation that combines the functionality of all three papers with minimal overhead and more flexibility.

Setup

Recommended:

  1. Create a new conda environment (Python 3.9+)
  2. Move to the root directory of this repository
  3. pip install -r requirements.txt
  4. pip install .

Usage

Please see the example notebooks for a walkthrough of how to use TabulaRasa:

  1. example_data: Generates a fake dataset used throughout the remaining examples.
  2. simple_mlp: Trains a simple multi-layer perceptron with an embedding for the categorical feature, linear layers, and ReLU activations. Its purpose is to illustrate the skorch API for those who are unfamiliar with it (a minimal skorch sketch follows this list), and to show that, without constraints, one feature's relationship with the target will be non-monotonic.
  3. mixed_monotonic: Trains a network similar to the simple MLP, but with a monotonic constraint on some features. In addition, this notebook illustrates the use of orthonormal certificates to estimate epistemic uncertainty from the training data provided (a conceptual sketch of orthonormal certificates appears after this list).
  4. simultaneous_quantiles: Trains a network similar to the simple MLP, but uses a loss function that can generate estimates for any requested quantile (the pinball loss behind this approach is sketched after this list). This model does not constrain features to have a monotonic relationship with the target. The predicted quantiles can be used as estimates of aleatoric uncertainty.
  5. external_monotonic: Trains a network with a monotonic constraint on some features; however, instead of a simple embedding network to handle categorical features, it uses a network from an external package, TabTransformer.
  6. tabula_rasa: Trains a TabulaRasaRegressor(), which is designed to take in a pandas DataFrame and, based on data types, automatically generate all of the transformations and sub-models needed to produce expected predictions, arbitrary quantile predictions, and estimates of epistemic uncertainty.
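
To give a feel for the skorch API referenced in the simple_mlp notebook, here is a minimal, hypothetical sketch of wrapping a plain PyTorch MLP in skorch's NeuralNetRegressor. The module, hyperparameters, and synthetic data below are illustrative assumptions, not the notebook's actual network (which also includes a categorical embedding).

```python
# Minimal sketch of the skorch API (illustrative only; the example notebooks
# use a different module with an embedding for the categorical feature).
import numpy as np
import torch
import torch.nn as nn
from skorch import NeuralNetRegressor


class SimpleMLP(nn.Module):
    """A plain MLP with linear layers and ReLU activations."""

    def __init__(self, n_features=4, n_hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, X):
        return self.net(X)


# skorch exposes a scikit-learn-style estimator around the PyTorch module
net = NeuralNetRegressor(
    SimpleMLP,
    module__n_features=4,
    max_epochs=20,
    lr=0.01,
    optimizer=torch.optim.Adam,
)

X = np.random.rand(256, 4).astype(np.float32)
y = (X.sum(axis=1, keepdims=True) + 0.1 * np.random.randn(256, 1)).astype(np.float32)
net.fit(X, y)           # scikit-learn-style fit
preds = net.predict(X)  # and predict
```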
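
The epistemic uncertainty estimates in the mixed_monotonic notebook rely on orthonormal certificates: a set of linear maps trained to send in-distribution features to (near) zero, so that a large certificate output flags inputs unlike the training data. The sketch below is a conceptual illustration under assumed shapes and hyperparameters; tabularasa's actual implementation may differ.

```python
# Sketch of orthonormal certificates for epistemic uncertainty (illustrative;
# the library's own implementation may differ in details).
import torch
import torch.nn as nn


class Certificates(nn.Module):
    """k linear 'certificates' trained to map training features to zero."""

    def __init__(self, feature_dim, k=64):
        super().__init__()
        self.C = nn.Linear(feature_dim, k, bias=False)

    def forward(self, features):
        return self.C(features)

    def penalty(self):
        # Encourage the certificate directions to stay orthonormal
        W = self.C.weight                      # shape (k, feature_dim)
        eye = torch.eye(W.shape[0])
        return ((W @ W.T - eye) ** 2).sum()


def train_certificates(features, k=64, epochs=100, lam=1.0):
    """features: (n, d) penultimate-layer activations from the trained network."""
    certs = Certificates(features.shape[1], k)
    opt = torch.optim.Adam(certs.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        # Certificates should output ~0 on in-distribution (training) features
        loss = certs(features).pow(2).mean() + lam * certs.penalty()
        loss.backward()
        opt.step()
    return certs


def epistemic_score(certs, features):
    # Larger values -> features unlike the training data -> higher uncertainty
    with torch.no_grad():
        return certs(features).pow(2).mean(dim=1)
```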
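
The simultaneous_quantiles notebook trains against a pinball (quantile) loss with a randomly sampled quantile level that is also fed to the network, so a single model can produce any quantile at inference time. Below is a minimal sketch of that loss; the function name and training-loop details are assumptions rather than the library's exact code.

```python
# Sketch of the simultaneous quantile (pinball) loss (illustrative only).
import torch


def pinball_loss(y_pred, y_true, tau):
    """Pinball loss at quantile level(s) tau in (0, 1)."""
    diff = y_true - y_pred
    return torch.maximum(tau * diff, (tau - 1) * diff).mean()


# During training, each batch gets freshly sampled quantile levels, e.g.:
#   tau = torch.rand(y_true.shape[0], 1)
#   y_pred = model(x, tau)              # the network conditions on tau
#   loss = pinball_loss(y_pred, y_true, tau)
```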

FAQ

Why is the package named "tabularasa"?

  • I'm not a strong proponent of the tabula rasa theory of development; I just wanted a name with "tabular" in it. Plus, I like that tabula rasa hints at the ability to learn anything, which ideally (although not practically) our models could.

What is the long-term plan for tabularasa?

  • Ideally (in order of how much I care about them):
    • Continue to improve and evolve the default network that's used when a specific network isn't specified.
    • Better software engineering practices (tests, error messages, etc.).
    • Potentially expand into problems beyond regression. It isn't necessarily clear how to generalize monotonicity for multiclass classification problems, but I'd be interested if others see value here.

TODO

  • Clean up wasted memory usage in TabulaRasaRegressor()
  • Make save and reload effortless
  • Make outputs from examples more deterministic
  • Generate partial dependence plots within the library
  • Get GPU working
  • Write basic unit tests
  • Allow for networks with all the combinations of monotonic, non-monotonic, and categorical features
  • Publish to PyPI
  • Improve real-time inference latency
