
[ENH] (WIP) Creating a new Bayesian Regressor with PyMC as a backend #358

Draft
wants to merge 44 commits into base: main

Conversation

meraldoantonio
Contributor

@meraldoantonio meraldoantonio commented May 23, 2024

Reference Issues/PRs

#7

What does this implement/fix? Explain your changes.

This WIP PR implements a Bayesian Linear Regressor with PyMC as a backend.

Does your contribution introduce a new dependency? If yes, which one?

Yes - it depends on the PyMC family of packages: PyMC itself, XArray, and ArviZ.

What should a reviewer concentrate their feedback on?

The design of the BayesianLinearRegressor, especially:

  1. The introduction of the priors. For now, the class hardcodes the priors; we need to decide how users should inject their own priors (one illustrative possibility is sketched below).
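
A purely illustrative sketch, not the PR's actual design: users could pass a mapping from parameter names to (distribution name, hyperparameters) pairs, with the currently hardcoded values kept as defaults. All names here (DEFAULT_PRIORS, _make_prior, self.priors) are hypothetical.

import pymc as pm

# hypothetical default prior specification, mirroring the hardcoded priors
DEFAULT_PRIORS = {
    "intercept": ("Normal", {"mu": 0.0, "sigma": 10.0}),
    "slopes": ("Normal", {"mu": 0.0, "sigma": 10.0}),
    "noise": ("HalfNormal", {"sigma": 1.0}),
}

def _make_prior(name, user_priors, **kwargs):
    """Instantiate the PyMC prior for `name`, preferring a user-supplied spec."""
    dist_name, params = user_priors.get(name, DEFAULT_PRIORS[name])
    return getattr(pm, dist_name)(name, **params, **kwargs)

# inside the model context in _fit, e.g.:
#     self.slopes = _make_prior("slopes", self.priors, shape=self._X.shape[1])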

Did you add any tests for the change?

Not yet

Any other comments?

N/A

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • [x] The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators

(This is not yet done)

  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.


@meraldoantonio meraldoantonio marked this pull request as draft May 23, 2024 17:11
fkiraly added a commit that referenced this pull request May 25, 2024
This PR removes the legacy base modules.

* base class: equivalent functionality is now contained in
`BaseDistribution`, `BaseProbaRegressor`, `_DelegatedProbaRegressor`
* pymc vendor interface: currently worked on in
#358
* density estimation: tracked via
#7
# Priors for unknown model parameters
self.intercept = pm.Normal("intercept", mu=self.intercept_mu, sigma=self.intercept_sigma)
self.slopes = pm.Normal("slopes", mu=self.slopes_mu, sigma=self.slopes_sigma, shape=self._X.shape[1], dims="pred_id")
self.noise = pm.HalfNormal("noise", sigma=self.noise_sigma)
Collaborator

would inverse gamma not be more standard here, as it is conjugate to the normal?
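
For reference, a minimal PyMC sketch of this suggestion, assuming the conjugate InverseGamma prior is placed on the noise variance rather than on the standard deviation; the hyperparameter values are purely illustrative.

import pymc as pm

with pm.Model():
    # conjugate choice: InverseGamma prior on the noise *variance*
    noise_var = pm.InverseGamma("noise_var", alpha=2.0, beta=1.0)
    # derived standard deviation, keeping the sigma parameterization used above
    noise = pm.Deterministic("noise", pm.math.sqrt(noise_var))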

@fkiraly fkiraly added enhancement module:regression probabilistic regression module labels Sep 7, 2024
Collaborator

@fkiraly fkiraly left a comment

Nice contribution!

Some high-level points:

  • could you split the notebook off into a chained PR, based on the estimator PR? The notebook may require some more review (time), and should not block the estimator.
  • in the estimator, I would kindly ask you to remove the visualization dependencies from python_dependencies, and instead introduce dependency checks in the methods that need them. This way, users do not need the visualization dependencies when using the model in a deployment pipeline (a rough sketch of such an in-method check follows below).
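
A rough sketch of the kind of in-method check meant here, assuming a hypothetical plotting method and a fitted-inference-data attribute named idata_; the names are illustrative, not skpro's actual API.

def plot_posterior(self):
    """Plot posterior marginals; requires the optional plotting dependencies."""
    try:
        import arviz as az  # soft dependency, imported only when plotting
        import matplotlib  # noqa: F401  # backend needed by arviz plotting
    except ImportError as err:
        raise ImportError(
            "plot_posterior requires the optional dependencies 'arviz' and "
            "'matplotlib'; please install them to use this method."
        ) from err
    return az.plot_posterior(self.idata_)  # idata_ is a hypothetical attribute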

@fkiraly
Collaborator

fkiraly commented Oct 4, 2024

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

@meraldoantonio
Contributor Author

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

Apparently there is a bug with Arviz 0.17 and scipy>=1.13 (source 1) (source 2).

The bug is no longer present in Arviz 0.18, but that version requires Python 3.10 and above.

As a temporary solution, I've locked the scipy version in all-extras in pyproject.toml.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2024

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

@meraldoantonio
Contributor Author

meraldoantonio commented Oct 6, 2024

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

Makes sense! But I've tried this a couple of times and for some reason, without the pyproject.toml lock, the test framework keeps installing the "wrong" version of scipy (version 1.13.1), even after specifying "scipy<=1.12.0" in the python_dependencies tag...

It might be that other libraries are pulling in a conflicting version, but I haven't managed to find the exact cause.

Any ideas?

@fkiraly
Collaborator

fkiraly commented Oct 7, 2024

Any ideas?

Why are you trying to bound scipy instead of arviz? Based on your statements, I would simply bound arviz>=0.18, as well as python_version >= 3.10.
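
A minimal sketch of how such bounds could look in the estimator's tags, assuming the usual python_dependencies and python_version tags; exact tag names and placement should be checked against the skpro estimator dependencies guide.

# inside the BayesianLinearRegressor class body (sketch)
_tags = {
    # soft dependencies checked by the framework before the estimator is used
    "python_dependencies": ["pymc", "arviz>=0.18", "xarray"],
    # arviz 0.18 requires Python 3.10 or later
    "python_version": ">=3.10",
}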

@fkiraly
Collaborator

fkiraly commented Oct 7, 2024

PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as a separate PR.

Labels
enhancement module:regression probabilistic regression module
Projects
Status: PR in progress