Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] support sub-classing scikit-learn estimators #6783
[python-package] support sub-classing scikit-learn estimators #6783
Changes from 9 commits
3b5f648
02c48c3
7b720cb
51b5e64
81178fd
110b0e1
104471a
d80b0df
b7e041a
68177a7
70f29a7
6796ba9
409733a
0a40e9b
64850c6
cd54639
e39d19f
7077c24
7c59cd9
98eb476
734961c
3d351a4
9be0ec1
51c18ad
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's OK to have just one
client
argument in the signature, but describe all parent args in the docstring?..There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a little better for users to see all the parameters right here, instead of having to click over to another page.
This is what XGBoost is doing too: https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRFRegressor
But I do also appreciate that it could look confusing.
If we don't do it this way, then I'd recommend we add a link in the docs for `**kwargs`` in these estimators, like this:
I have a weak preference for keeping it as-is (the signature in docs not having all parameters, but docstring having all parameters), but happy to change it if you think that's confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying your opinion!
I love your suggestion for
**kwargs
description. But my preference is also weak 🙂I think we need a third judge opinion for this question.
Either way, I'm approving this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmoralez or @borchero could one of you comment on this thread and help us break the tie?
To make progress on the release, if we don't hear back in the next 2 days I'll merge this PR as-is and we can come back and change the docs later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I only saw this now! My personal preference would actually be to keep all of the parameters (similar to the previous state) and simply make them keyword arguments. While this results in more code and some duplication of defaults, I think that this is the clearest interface for users. If you think this is undesirable @jameslamb, I'd at least opt for documenting all of the "transitive" parameters, just like in the XGBoost docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I still think that
would be better... But OK.
What I'm definitely sure in is that sklearn classes and Dask ones should follow the same pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was so focused on the Dask estimators in the most recent round of changes that I forgot about the affect this would have on
LGBM{Classifier,Ranker,Regressor}
. I agree, I need to fix this inconsistencyI do think that it'd be better to have all the arguments listed out in the signature explicitly. That's helpful for code completion in editors and
help()
in a REPL. And I strongly suspect that users useLGBM{Classifier,Ranker,Regressor}
directly much more often than they useLGBMModel
. It introduces duplication in the code, but I personally am OK with that in exchange for those benefits for users, for the reasons I mentioned in #6783 (comment)Given that set of possible benefits, @StrikerRUS would you be ok with me duplicating all the defaults into the
__init__()
signature ofLGBM{Classifier,Ranker,Regressor}
too (as currently happens for the Dask estimators) and expanding the tests to confirm that the arguments are all consistent betweenLGBMModel
,LGBM{Classifier,Ranker,Regressor}
, andDaskLGBM{Classifier,Ranker,Regressor}
?Or would you still prefer having
**kwargs
and a docstring like this?It seems from comments above that @borchero was also OK with either form... I think we are all struggling to choose a preferred form here. I don't have any other thoughts on this, so I'll happily defer to your decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll approve consistent version with explicitly listed args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'm sorry for how much effort reviewing this PR has turned into.
I do think LightGBM's users will appreciate sub-classing being easier, and still having tab completion for constructor arguments for
LGBM{Classifier,Ranker,Regressor}
.I just pushed 3d351a4 repeating all the arguments in the constructors.
Also added a test in
test_sklearn.py
similar to the Dask one, to ensure that all the default values and the set of parameters stay the same.Updated docs:
Now they look the same:
I also re-ran the sub-classing example being added to
FAQ.rst
here to be sure it works.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to apologize! Thank you for working on this very important change!