[ENH] roadmap of probabilistic regressors to implement or to interface #7

Open
12 of 45 tasks
fkiraly opened this issue Apr 12, 2020 · 8 comments
Labels
good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to skpro interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:regression probabilistic regression module

Comments

@fkiraly
Collaborator

fkiraly commented Apr 12, 2020

A wishlist for probabilistic regression methods to implement or interface.
This is partly copied from the list I made when designing the R counterpart, mlr-org/mlr3proba#32.
The number of stars at the end of each item is its estimated difficulty or time investment.

GLM

  • generalized linear model(s) with continuous regression link, e.g., Gaussian *
    • Gaussian link, statsmodels
    • further regression links: Gamma, Tweedie, inverse Gaussian
  • generalized linear model(s) with count link, e.g., Poisson *
    • Poisson link, statsmodels
    • Poisson link, sklearn
    • further links: Binomial
  • heteroscedastic linear regression ***
  • Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***

KRR aka Gaussian process regression

  • vanilla kernel ridge regression with fixed kernel parameters and variance *
  • kernel ridge regression with MLE for kernel parameters and regularization parameter **
  • heteroscedastic KRR or Gaussian processes ***

CDE

  • variants of conditional density estimation (Nadaraya-Watson type) **
  • reduction to density estimation by binning of input variables, then apply unconditional density estimation **
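
For the binning reduction, a minimal sketch could look as follows. It assumes a single input variable, equal-width bins, and a Gaussian fitted per bin as the unconditional density estimator. The class name `BinnedGaussianCDE` and its parameters are illustrative only, not existing skpro API.

```python
from collections import defaultdict
from statistics import NormalDist


class BinnedGaussianCDE:
    """Conditional density estimate: bin x, fit an unconditional Gaussian per bin."""

    def __init__(self, lo, hi, n_bins):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins

    def _bin(self, x):
        width = (self.hi - self.lo) / self.n_bins
        idx = int((x - self.lo) / width)
        return min(max(idx, 0), self.n_bins - 1)  # clamp out-of-range inputs

    def fit(self, x, y):
        groups = defaultdict(list)
        for xi, yi in zip(x, y):
            groups[self._bin(xi)].append(yi)
        # global estimate, used for bins with fewer than two training points
        self.global_ = NormalDist.from_samples(y)
        self.by_bin_ = {
            k: NormalDist.from_samples(v) for k, v in groups.items() if len(v) >= 2
        }
        return self

    def predict_proba(self, x):
        return [self.by_bin_.get(self._bin(xi), self.global_) for xi in x]


x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
model = BinnedGaussianCDE(lo=0.0, hi=1.0, n_bins=2).fit(x, y)
pred = model.predict_proba([0.25, 0.8])  # one NormalDist per query point
```

Any other unconditional density estimator (e.g., a kernel density estimate) could replace the per-bin Gaussian; the Gaussian is used here only to keep the sketch short.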

Gradient boosting and tree-based

  • ngboost package interface *
  • probabilistic residual boosting **
  • probabilistic regression trees **

Neural networks

  • interface tensorflow probability - some hard-coded NN architectures **
  • generic tensorflow probability interface - some hard-coded NN architectures ***

Bayesian toolboxes

  • generic pymc3 interface ***
  • generic pyro interface ****
  • generic Stan interface ****
  • generic JAGS interface ****
  • generic BUGS interface ****
  • generic Bayesian interface - prior-valued hyperparameters *****

Pipeline elements for target transformation

  • distr fixed target transformation **
  • distr predictive target calibration **

Composite techniques, reduction to deterministic regression

  • take mean and sd from a deterministic regressor which already has these as return types, and plug them into some location/scale distr family (Gaussian, Laplace) *
  • use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) **
  • upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
  • generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
  • reduction via bootstrapped sampling of a deterministic regressor **
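
The "model 1 for the mean, model 2 fit to residuals" idea can be sketched roughly as below. This is an illustration under simplifying assumptions: one feature, ordinary least squares for both models, squared residuals as the target of model 2, and a Gaussian as the location/scale family. The function names and the `min_var` floor are hypothetical, not skpro API.

```python
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x with a single feature."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def fit_residual_model(x, y):
    a, b = fit_line(x, y)  # model 1: conditional mean
    sq_res = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    c, d = fit_line(x, sq_res)  # model 2: conditional variance
    return (a, b), (c, d)


def predict_proba(params, x_new, min_var=1e-6):
    (a, b), (c, d) = params
    out = []
    for xi in x_new:
        # forced lower variance bound, since model 2 may predict <= 0
        var = max(c + d * xi, min_var)
        out.append(NormalDist(mu=a + b * xi, sigma=var ** 0.5))
    return out


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.3, 2.6, 4.8, 4.1]
params = fit_residual_model(x, y)
dists = predict_proba(params, [1.5, 4.5])  # one Gaussian per query point
```

The `max(..., min_var)` step plays the role of the lower thresholder from the list: without it, a linear variance model can produce non-positive variances.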

Ensembling type pipeline elements and compositors

  • simple bagging, averaging of pdf/cdf **
  • probabilistic boosting ***
  • probabilistic stacking ***
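
For simple bagging, the averaging step amounts to an equal-weight mixture of the member predictions. A small sketch of that step only, assuming the members are already fitted and each exposes `pdf`, `cdf` and `mean` (here `statistics.NormalDist` objects stand in for them); bootstrap resampling and member fitting are left out, and the class name is illustrative.

```python
from statistics import NormalDist, fmean


class MixtureOfPredictions:
    """Equal-weight mixture: pdf and cdf are pointwise averages over members."""

    def __init__(self, members):
        self.members = list(members)

    def pdf(self, x):
        return fmean(m.pdf(x) for m in self.members)

    def cdf(self, x):
        return fmean(m.cdf(x) for m in self.members)

    @property
    def mean(self):
        # the mean of a mixture is the weighted mean of the member means
        return fmean(m.mean for m in self.members)


# two member predictions for the same test point
mix = MixtureOfPredictions([NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)])
```

Note that the averaged prediction is in general no longer in the same family as its members, e.g., the mixture above is bimodal rather than Gaussian.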

Baselines

  • always predict a Gaussian with mean = training mean, var = training var *
  • unconditional densities via distfit package, interface *
  • IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
  • IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **
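
The first baseline is simple enough to sketch in a few lines. The version below uses only the standard library and assumes "training var" means the population variance of the training targets; the class name and the `fit` / `predict_proba` method names are illustrative and not taken from skpro.

```python
from statistics import NormalDist, fmean, pvariance


class GaussianBaseline:
    """Featureless baseline: ignores X, always predicts N(train mean, train var)."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        self.var_ = pvariance(y, mu=self.mean_)  # assumption: population variance
        return self

    def predict_proba(self, X):
        # one identical Gaussian per row of X
        dist = NormalDist(mu=self.mean_, sigma=self.var_ ** 0.5)
        return [dist for _ in X]


model = GaussianBaseline().fit(X=[[0], [1], [2], [3]], y=[1.0, 2.0, 3.0, 4.0])
pred = model.predict_proba([[10], [20]])
```

The deterministic-style baseline differs only in the mean: it would come from a fitted deterministic regressor instead of the training mean, while the variance stays the training sample variance.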

Other reduction from/to probabilistic regression

  • reducing deterministic regression to probabilistic regression - take mean, median or mode **
  • reduction(s) to quantile regression, use predictive quantiles to make a distr ***
  • reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
  • reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
  • reduction to survival, as the sub-case of no censoring **
  • reduction to classification, by binning ***
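
As an illustration of "use predictive quantiles to make a distr", the sketch below builds a quantile function by linear interpolation between predicted quantiles. The interpolation scheme, the clamping outside the known levels, and the monotonicity fix are assumptions made for this example; the class name is hypothetical.

```python
from bisect import bisect_right


class QuantileDistribution:
    """Distribution defined by linear interpolation between predicted quantiles."""

    def __init__(self, levels, values):
        pairs = sorted(zip(levels, values))
        self.levels = [p for p, _ in pairs]
        # enforce non-decreasing quantile values (guards against quantile crossing)
        self.values = []
        for _, v in pairs:
            self.values.append(max(v, self.values[-1]) if self.values else v)

    def ppf(self, p):
        """Quantile function; clamps to the outermost known quantiles."""
        if p <= self.levels[0]:
            return self.values[0]
        if p >= self.levels[-1]:
            return self.values[-1]
        i = bisect_right(self.levels, p)
        p0, p1 = self.levels[i - 1], self.levels[i]
        v0, v1 = self.values[i - 1], self.values[i]
        return v0 + (v1 - v0) * (p - p0) / (p1 - p0)


# quantile predictions for one test point at levels 0.1, 0.5, 0.9
q = QuantileDistribution(levels=[0.1, 0.5, 0.9], values=[1.0, 2.0, 4.0])
median = q.ppf(0.5)
```

Clamping gives the distribution bounded support between the lowest and highest predicted quantile; a real implementation would probably want some tail extrapolation instead.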
@fkiraly fkiraly added the good first issue Good for newcomers label Apr 12, 2020
@fkiraly fkiraly added module:regression probabilistic regression module implementing algorithms Implementing algorithms, estimators, objects native to skpro interfacing algorithms Interfacing existing algorithms/estimators from third party packages labels Aug 23, 2023
@fkiraly fkiraly changed the title (wish)list of probabilistic regressors to implement or to interface [ENH] roadmap of probabilistic regressors to implement or to interface Sep 13, 2023
@fkiraly fkiraly pinned this issue Sep 13, 2023
@nilesh05apr
Contributor

@fkiraly I wish to take up this as my project. What would be a good headstart?

@fkiraly
Collaborator Author

fkiraly commented Mar 12, 2024

pick something that you find interesting, with a single star * ?

I've updated the list with checkmarks for implemented estimators.

@ShreeshaM07
Contributor

ShreeshaM07 commented Mar 16, 2024

@fkiraly , I am interested in this project idea and would like to start off by adding an interface to ngboost to skpro. Can I go ahead? Also a small doubt since I haven't contributed to skpro earlier, is using the same versions as sktime sufficient for skpro or should I create another virtual environment for it?

@fkiraly
Collaborator Author

fkiraly commented Mar 16, 2024

@ShreeshaM07, nice! Can you then quickly post in #135 that you will be working on this?

Also a small doubt since I haven't contributed to skpro earlier, is using the same versions as sktime sufficient for skpro or should I create another virtual environment for it?

I would advise having a virtual environment ready for testing, with an editable install of skpro.

Like with sktime, you can do an editable install by running `pip install -e .` in a clone of the skpro repo.

If you have an sktime environment, you might have skpro already installed, but not as editable; in that case, your changes to the code will not be reflected in the environment.

Personally, I have an environment where both sktime and skpro are installed as editable versions, to allow debugging and testing across different packages.
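
One way to check which copy of a package an environment actually imports is to look at where it is loaded from. A minimal sketch using only the standard library; the helper names are illustrative, and the `site-packages` check is a heuristic, not an official way to detect editable installs.

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Directory a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def in_site_packages(name):
    """True/False for an installed package, None if it cannot be found.

    Heuristic: a pip-installed package that does NOT live under site-packages
    (or dist-packages) is most likely imported from a local clone.
    """
    loc = package_location(name)
    if loc is None:
        return None
    return any(part in ("site-packages", "dist-packages") for part in loc.parts)


# in an environment with skpro installed, this shows which copy is picked up
print(package_location("skpro"), in_site_packages("skpro"))
```

If the printed path points into the cloned repo, code changes there should be picked up on the next import; if it points into `site-packages`, the environment is most likely using a regular, non-editable install.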

Happy to connect quickly on the discord dev-chat if you have further questions about this.

@julian-fong
Contributor

julian-fong commented Mar 20, 2024

@fkiraly Hey Franz, I would like to contribute towards some of the GLMs with regression links, is there anything I need to do setup-wise with skpro that is different than sktime?

@fkiraly
Collaborator Author

fkiraly commented Mar 21, 2024

@fkiraly Hey Franz, I would like to contribute towards some of the GLMs with regression links

Excellent! I'd recommend starting with the statsmodels ones: https://www.statsmodels.org/stable/glm.html#module-statsmodels.genmod.generalized_linear_model, and with the Gaussian link.

is there anything I need to do setup-wise with skpro that is different than sktime?

It is the same, except of course you do `pip install -e .[dev]` in a clone of skpro, not sktime.

I'm typically developing in an environment that has editable versions of both, plus scikit-base, which allows me to make changes in all three packages. The "catch" if you do this is that you have to install the editable versions in dependency order, i.e., first skbase, then skpro, then sktime; otherwise pip will pull in the non-editable PyPI versions.

@ShreeshaM07
Contributor

@fkiraly, just wanted to know where "reducing deterministic (quantile) regression to probabilistic regression - take quantile(s)" has been implemented, to get an idea of what needs to be done in these types of issues. Could you please help me out?

@fkiraly
Collaborator Author

fkiraly commented Mar 23, 2024

yes, that has been implemented already, by @Ram0nB in MultipleQuantileRegressor, see #108.
You can figure out which algorithms have been contributed already by the checkmark next to them (I hope that's all correct, but feel free to ask).

fkiraly pushed a commit that referenced this issue Mar 31, 2024
…222)

#### Reference Issues/PRs

Implements a GLM model for the Gaussian link in #7

#### What does this implement/fix? Explain your changes.

The Gaussian Regressor is a direct interface to the statsmodels package's
GLM model
fkiraly pushed a commit that referenced this issue Apr 25, 2024
#### Reference Issues/PRs
#7

#### What does this implement/fix? Explain your changes.
Added interface for Poisson Regressor
fkiraly added a commit that referenced this issue May 25, 2024
This PR removes the legacy base modules.

* base class: equivalent functionality is now contained in
`BaseDistribution`, `BaseProbaRegressor`, `_DelegatedProbaRegressor`
* pymc vendor interface: currently worked on in
#358
* density estimation: tracked via
#7

4 participants