We present a probabilistic extension of the recently introduced Py-Boost approach and model all moments of a parametric multivariate distribution as functions of covariates. This allows us to create probabilistic predictions from which intervals and quantiles of interest can be derived.
Existing implementations of Gradient Boosting Machines, such as XGBoost and LightGBM, are mostly designed for single-target regression tasks. While efficient for low- to medium-dimensional targets, the computational cost of estimating many targets becomes prohibitive in high-dimensional settings.
As an example, consider modelling a multivariate Gaussian distribution with D=100 target variables, where the covariance matrix is approximated using the Cholesky decomposition. Modelling all conditional moments (i.e., means, standard deviations and all pairwise correlations) requires the estimation of D(D + 3)/2 = 5,150 parameters. Because most GBM implementations are based on a one-vs-all estimation strategy, where a separate tree is grown for each parameter, estimating this many parameters for a large dataset can become computationally extremely expensive.
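The parameter count above follows directly from the Cholesky parameterization: D means, D standard deviations and D(D-1)/2 pairwise correlations. A minimal NumPy sketch (the helper `n_params` is illustrative and not part of Py-BoostLSS):

```python
import numpy as np

# Number of distributional parameters for a D-dimensional Gaussian
# parameterized via the Cholesky factor of the covariance matrix:
# D means + D standard deviations + D*(D-1)/2 correlations = D*(D+3)/2.
def n_params(D: int) -> int:
    return D * (D + 3) // 2

print(n_params(100))  # 5150

# Toy illustration for D=3: any lower-triangular L with a positive
# diagonal yields a valid covariance matrix Sigma = L @ L.T, which is
# why the Cholesky factor is a convenient unconstrained parameterization.
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.2, 0.0],
              [0.3, 0.4, 0.9]])
Sigma = L @ L.T
assert np.allclose(Sigma, Sigma.T)             # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # positive definite
```
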
The recently introduced Py-Boost approach provides a more runtime-efficient GBM implementation, making it a good candidate for estimating high-dimensional target variables in a probabilistic setting. Borrowing from the original paper *SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems*, the following figure illustrates the runtime efficiency of the Py-Boost model.
Even though the original implementation of Py-Boost also supports estimation of univariate responses, Py-BoostLSS focuses on multi-target probabilistic regression settings. For univariate probabilistic GBMs, we refer to our implementations of XGBoostLSS and LightGBMLSS.
Since Py-BoostLSS is entirely GPU-based, we first need to install the corresponding PyTorch and CuPy packages. If you are on Windows, it is preferable to install CuPy via conda; all other operating systems can use pip. You can check your CUDA version with `nvcc --version`.
```shell
# CuPy (replace 11.x with your CUDA version)
# Windows only
conda install -c conda-forge cupy cudatoolkit=11.x
# Others
pip install cupy-cuda11x

# PyTorch (replace cu11x with your CUDA version)
pip3 install torch --extra-index-url https://download.pytorch.org/whl/cu11x
```
Next, you can install Py-BoostLSS.
```shell
pip install git+https://github.com/StatMixedML/Py-BoostLSS.git
```
We refer to the examples section for example notebooks.
Py-BoostLSS currently supports the following distributions. More distributions will follow soon.
| Distribution | Usage | Type | Support |
|---|---|---|---|
| Multivariate Normal (Cholesky) | `MVN()` | Continuous (Multivariate) | |
| Multivariate Normal (Low-Rank Approximation) | `MVN_LRA()` | Continuous (Multivariate) | |
| Multivariate Student-T (Cholesky) | `MVT()` | Continuous (Multivariate) | |
| Dirichlet | `DIRICHLET()` | Continuous (Multivariate) | |
Please provide feedback on how to improve Py-BoostLSS, or request additional distributions to be implemented, by opening a new issue or via the discussion section.
The implementation of Py-BoostLSS relies on the following resources:
- Py-boost: a research tool for exploring GBDTs
- SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems
We sincerely thank the original authors Anton Vakhrushev and Leonid Iosipoi for making their work publicly available.