Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sklearn kwargs #2338

Merged
merged 13 commits into from
May 24, 2017
Merged

Sklearn kwargs #2338

merged 13 commits into from
May 24, 2017

Conversation

gaw89
Copy link
Contributor

@gaw89 gaw89 commented May 22, 2017

This allows users of the Sklearn API to provide all Booster params via a **kwargs dict.

@RAMitchell
Copy link
Member

Can you please document what happens if a parameter is provided in kwargs and as normal parameter? Does one override the other?

@gaw89
Copy link
Contributor Author

gaw89 commented May 23, 2017

@RAMitchell great idea. Adding that clarification now. As of now, kwargs will override normal parameters. Is this acceptable behavior?

Scratch that, it actually throws a TypeError because it's being provided multiple values for the same argument. I think this is actually the proper behavior. I am making adjustments accordingly.

I am also adding tests to demonstrate this behavior.

@wxchan
Copy link
Contributor

wxchan commented May 23, 2017

It might not be recommended to use kwargs for sklearn. see the notes on this page: http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html.

@khotilov
Copy link
Member

@gaw89 Could you please find out whether the requirement that @wxchan has pointed out is about consistency, clarity and safety (e.g., protecting against typos) required for an estimator to become an official part of sklearn (which is not a goal for xgboost), or whether using kwargs might actually break something.

@gaw89
Copy link
Contributor Author

gaw89 commented May 23, 2017

@khotilov, yes, trying to sort that out now. Unfortunately, the page that @wxchan linked doesn't give much info, so I might have to go through some Sklearn source... I may try to put in issue in for Sklearn as well to get clarification.

In the meantime, I am using this branch on my machine and it's working great. Allows me to use GPU in GridSearchCV! :)

@gaw89
Copy link
Contributor Author

gaw89 commented May 23, 2017

@khotilov and @wxchan, I filed an issue with Sklearn. Here's their response:

"It's the API of scikit-learn. Things are not garantied to work if you do
it differently. We don't support it."

So, it seems that it is primarily an API/support issue - they won't guarantee that things will not break if we use this. However, it does not guarantee that things will break either. From looking at the Sklearn source as it is presently, I don't see how this would cause problems, but I am not terribly familiar with the inner workings of Sklearn.

I'm not sure what the best course of action is here. It seems fairly unlikely that it will make things break, but there is the risk. At the same time, individually adding the various booster params to the init could be cumbersome both for maintenance and usage.

I am happy to work on updating these things individually if that is the decision.

@RAMitchell
Copy link
Member

It seems likely that they want each parameter explicit in the init function for reflection purposes.

I think we go ahead with the documentation that this is an unsupported sklearn parameter and parameters passed this way are not gauranteed to interact correctly with sklearn.

It is clear that we need a way to pass a variety of parameters to the underlying xgboost model.

Great work @gaw89.

@gaw89
Copy link
Contributor Author

gaw89 commented May 23, 2017

@RAMitchell, thanks! I went ahead and added a note to indicate this caveat with usage of kwargs. Then I fixed my lint errors again... Sick of Spyder adding whitespace at the end of my lines.

@terrytangyuan terrytangyuan merged commit 0f3a404 into dmlc:master May 24, 2017
@terrytangyuan
Copy link
Member

Thank you!

CodingCat pushed a commit to CodingCat/xgboost that referenced this pull request Jun 18, 2017
* Added kwargs support for Sklearn API

* Updated NEWS and CONTRIBUTORS

* Fixed CONTRIBUTORS.md

* Added clarification of **kwargs and test for proper usage

* Fixed lint error

* Fixed more lint errors and clf assigned but never used

* Fixed more lint errors

* Fixed more lint errors

* Fixed issue with changes from different branch bleeding over

* Fixed issue with changes from other branch bleeding over

* Added note that kwargs may not be compatible with Sklearn

* Fixed linting on kwargs note
@lock lock bot locked as resolved and limited conversation to collaborators Jan 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants