Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial draft of version of model_builder to work with scikit-learn. #161

Merged

Conversation

pdb5627
Copy link
Contributor

@pdb5627 pdb5627 commented Apr 30, 2023

Resolves #155

@pdb5627 pdb5627 closed this Apr 30, 2023
@pdb5627 pdb5627 changed the title Initial draft of version of model_builder to work with scikit-learn. Resolves #155 Initial draft of version of model_builder to work with scikit-learn. Apr 30, 2023
@pdb5627 pdb5627 reopened this Apr 30, 2023
@twiecki
Copy link
Member

twiecki commented Apr 30, 2023

This looks great! I wonder if perhaps we should just make the base class directly use this API. But it certainly can't hurt to also try it out here first.

@michaelraczycki
Copy link
Collaborator

Nice job on the fit refactoring! From what I see the model_builder was already compatible with scikit-learn or you didn't finish pushing the changes yet?

This avoids issues with mutable objects as default parameters.
It allows users to start from the default parameters and modify parts of
them before passing them to __init__.
@pdb5627
Copy link
Contributor Author

pdb5627 commented May 1, 2023

Nice job on the fit refactoring! From what I see the model_builder was already compatible with scikit-learn or you didn't finish pushing the changes yet?

@michaelraczycki No, it wasn't compatible. I made a new file bayesian_estimator_linearmodel.py for a subclass of ModelBuilder and an example implemention subclass. I introduced new tests in a new file tests/test_bayesian_estimator_linearmodel.py. The refactoring in ModelBuilder was to put the sampling into a function that wouldn't be overridden in the subclass. I also pulled out the attr-setting code so that it could be reused by a function for sampling the prior predictive distribution.

@pdb5627
Copy link
Contributor Author

pdb5627 commented May 1, 2023

Here's a comparison of the APIs:

ModelBuilder BayesianEstimator Comment
init init Remove data from constructor. Default args come from properties.
default_model_config ***
default_sampler_config ***
_validate_data
_data_setter *** _data_setter *** Adapted for X,y rather than data
create_sample_input *** create_sample_input *** Creates just data, not configuration. Default args from properties.
build_model *** build_model *** Takes no arguments. Instead of setting MutableData values, puts in dummy data of the right number of dimensions.
sample_model
save
load load Adapted for different constructor arguments
fit fit Adapted for X,y rather than data
predict predict Adapted for X,y rather than data
predict_posterior predict_posterior Adapted for X,y rather than data
predict_proba
sample_prior_predictive
sample_posterior_predictive
_serializable_model_config
id

*** = abstract method requiring implementation in subclasses.

@michaelraczycki
Copy link
Collaborator

Oh, I focused much more on the changes you've introduced to model_builder itself, I didn't go into the implementation of Bayesian Estimator that much in detail as It doesn't affect other projects so far. I really like the ideas you're proposing, could we schedule a short call to discuss Bayesian Estimator a bit in depth? I'm working on making the model_builder to a standard used by other pymc-related projects, and I wanted to discuss with you which of the API changes you've suggested are complete must-have for sklearn compatibility

@pdb5627
Copy link
Contributor Author

pdb5627 commented May 1, 2023

Oh, I focused much more on the changes you've introduced to model_builder itself, I didn't go into the implementation of Bayesian Estimator that much in detail as It doesn't affect other projects so far. I really like the ideas you're proposing, could we schedule a short call to discuss Bayesian Estimator a bit in depth? I'm working on making the model_builder to a standard used by other pymc-related projects, and I wanted to discuss with you which of the API changes you've suggested are complete must-have for sklearn compatibility

@michaelraczycki Ok, sounds good. How would you like to set it up? I'm in UTC+3.

@michaelraczycki
Copy link
Collaborator

michaelraczycki commented May 2, 2023

I'm in UTC+0, so anything after your lunch should work. My email is [email protected], if you shoot me a message with some time slots that suit you I can create a google meeting

@michaelraczycki
Copy link
Collaborator

@pdb5627 is there something else you'd like to add here?

@michaelraczycki michaelraczycki marked this pull request as ready for review May 10, 2023 06:51
Copy link
Collaborator

@michaelraczycki michaelraczycki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job, thank you for your contribution!

@michaelraczycki michaelraczycki merged commit 03dc6d4 into pymc-devs:main May 10, 2023
@twiecki
Copy link
Member

twiecki commented May 10, 2023

Indeed, this is great @pdb5627!

@pdb5627
Copy link
Contributor Author

pdb5627 commented May 10, 2023

Thanks, guys. I'm glad I could contribute.

@pdb5627 pdb5627 deleted the modelbuilder_scikit_compatibility branch May 10, 2023 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

model_builder scikit-learn integration
3 participants