[python-package] support sub-classing scikit-learn estimators #6783

jameslamb · 2025-01-10T06:39:24Z

I recently saw a Stack Overflow post ("Why can't I wrap LGBM?") expressing the same concerns from #4426 ... it's difficult to sub-class lightgbm's scikit-learn estimators.

It doesn't have to be! Look how minimal the code is for XGBRFRegressor:

https://github.com/dmlc/xgboost/blob/45009413ce9f0d2bdfcd0c9ea8af1e71e3c0a191/python-package/xgboost/sklearn.py#L1869

This PR proposes borrowing some patterns I learned while working on xgboost's scikit-learn estimators to make it easier to sub-class lightgbm estimators. This also has the nice side effect of simplifying the lightgbm.dask code 😁
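For readers who have not seen the pattern, here is a minimal, stdlib-only sketch of the general idea: a sub-class declares only the parameters it adds or changes and forwards everything else through `**kwargs`, while `get_params()` walks the class hierarchy to find constructor arguments. `ToyModel`, `ToyRandomForest`, and `_other_params` are hypothetical stand-ins, not lightgbm's or xgboost's actual code.

```python
import inspect


class ToyModel:
    """Hypothetical stand-in for a base estimator; not a real lightgbm class."""

    def __init__(self, *, boosting_type="gbdt", n_estimators=100, **kwargs):
        self.boosting_type = boosting_type
        self.n_estimators = n_estimators
        # Extra keyword arguments are stored so get_params() can report them.
        self._other_params = dict(kwargs)

    def get_params(self):
        """Collect constructor parameters from every class in the MRO."""
        params = {}
        for cls in type(self).__mro__:
            init = cls.__dict__.get("__init__")
            if cls is object or init is None:
                continue
            for name, p in inspect.signature(init).parameters.items():
                # Skip `self`, `*args`, and `**kwargs` themselves.
                if name == "self" or p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
                    continue
                params.setdefault(name, getattr(self, name))
        params.update(self._other_params)
        return params


class ToyRandomForest(ToyModel):
    """Sub-class that only declares what it adds or changes."""

    def __init__(self, *, max_depth=3, **kwargs):
        self.max_depth = max_depth
        # Override one parent default, pass everything else through untouched.
        kwargs.setdefault("boosting_type", "rf")
        super().__init__(**kwargs)


model = ToyRandomForest(n_estimators=10, verbose=-1)
print(model.get_params())
```

The sub-class never repeats the parent's parameter list, so adding a parameter to the parent does not require touching every child class.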

Notes for Reviewers

Why make the breaking change of requiring keyword args?

As part of this PR, I'm proposing immediately switching the constructors for scikit-learn estimators here (including those in lightgbm.dask) to only support keyword arguments.

Why I'm proposing this instead of a deprecation cycle:

- scikit-learn itself does this (HistGradientBoostingClassifier example)
  - so all of its machinery passes parameters around as keyword arguments
  - keyword arguments are recommended throughout https://scikit-learn.org/stable/developers/develop.html
- I strongly suspect that using positional arguments for these constructors is rare
- anyone relying on positional arguments will get a loud and easy-to-diagnose-and-fix error, so the effort to adjust should be minimal

import lightgbm as lgb
lgb.LGBMClassifier("gbdt")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: LGBMClassifier.__init__() takes 1 positional argument but 2 were given
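As a rough illustration of where that error comes from, a bare `*` in a constructor signature makes every parameter after it keyword-only. The sketch below uses a hypothetical `ToyClassifier` (an assumption for illustration, not the real `LGBMClassifier`) and captures the error message instead of printing a traceback.

```python
class ToyClassifier:
    """Hypothetical estimator; not LightGBM's actual class."""

    def __init__(self, *, boosting_type="gbdt", num_leaves=31):
        # The bare `*` makes every following parameter keyword-only.
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves


def build_positionally():
    """Return the error message raised by a positional call, or None."""
    try:
        ToyClassifier("gbdt")
    except TypeError as err:
        return str(err)
    return None


error_message = build_positionally()

# The fix for callers is mechanical: name the argument.
clf = ToyClassifier(boosting_type="gbdt")
```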

I posted a related answer to that Stack Overflow question

https://stackoverflow.com/a/79344862/3986677

StrikerRUS · 2025-01-27T16:15:04Z

Could you please set up an RTD build for this branch? I'd like to see how the `__init__` signature will be rendered there.

jameslamb · 2025-01-27T16:18:37Z

Sure, here's a first build: https://readthedocs.org/projects/lightgbm/builds/26983170/

StrikerRUS

Great simplification, thanks for working on it!

I don't have any serious comments, just want to get some answers before approving.

docs/FAQ.rst

python-package/lightgbm/sklearn.py

tests/python_package_test/test_dask.py

StrikerRUS · 2025-01-27T17:44:13Z

python-package/lightgbm/dask.py

-            importance_type=importance_type,
-            **kwargs,
-        )
+        super().__init__(**kwargs)

    _base_doc = LGBMClassifier.__init__.__doc__


Do you think it's OK to have just the one `client` argument in the signature, but describe all of the parent's arguments in the docstring?

I think it's a little better for users to see all the parameters right here, instead of having to click over to another page.

This is what XGBoost is doing too: https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRFRegressor

But I do also appreciate that it could look confusing.

If we don't do it this way, then I'd recommend we add a link in the docs for `**kwargs` in these estimators, like this:

**kwargs Other parameters for the model. These can be any of the keyword arguments for LGBMModel or any other LightGBM parameters documented at https://lightgbm.readthedocs.io/en/latest/Parameters.html.

I have a weak preference for keeping it as-is (the signature in docs not having all parameters, but docstring having all parameters), but happy to change it if you think that's confusing.
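To make the trade-off concrete, here is a small sketch of the "short signature, full docstring" approach, using hypothetical `ToyClassifier` and `ToyDaskClassifier` classes (assumptions for illustration only). The child reuses the parent's `__init__` docstring and splices in its one extra argument, so `inspect.signature()` shows only `client` and `**kwargs` while the docstring still lists every parameter.

```python
import inspect


class ToyClassifier:
    def __init__(self, *, boosting_type="gbdt", num_leaves=31, **kwargs):
        """Construct the model.

        Parameters
        ----------
        boosting_type : str
            Boosting algorithm.
        num_leaves : int
            Maximum tree leaves.
        **kwargs
            Other parameters for the model.
        """
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves


class ToyDaskClassifier(ToyClassifier):
    def __init__(self, *, client=None, **kwargs):
        self.client = client
        super().__init__(**kwargs)

    # Reuse the parent's parameter docs and insert the new argument
    # just before the `**kwargs` entry.
    _base_doc = ToyClassifier.__init__.__doc__
    _before, _kwargs, _after = _base_doc.rpartition("**kwargs")
    __init__.__doc__ = (
        f"{_before}client : object or None\n"
        f"            Hypothetical client handle.\n"
        f"        {_kwargs}{_after}"
    )


signature_params = list(inspect.signature(ToyDaskClassifier.__init__).parameters)
doc = ToyDaskClassifier.__init__.__doc__
print(signature_params)
print(doc)
```

The rendered signature stays short, at the cost that readers have to consult the docstring body (rather than the signature line) to discover the inherited parameters.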

Thanks for clarifying your opinion!
I love your suggestion for **kwargs description. But my preference is also weak 🙂
I think we need a third opinion to settle this question.

Either way, I'm approving this PR!

@jmoralez or @borchero could one of you comment on this thread and help us break the tie?

To make progress on the release, if we don't hear back in the next 2 days I'll merge this PR as-is and we can come back and change the docs later.

StrikerRUS

Thank you very much!

jameslamb added 3 commits January 4, 2025 01:59

[python-package] make sub-classing scikit-learn estimators easier

3b5f648

tests passing

02c48c3

add docs

7b720cb

jameslamb added in progress breaking labels Jan 10, 2025

jameslamb added 4 commits January 10, 2025 00:40

Update tests/python_package_test/test_sklearn.py

51b5e64

remove docs links

81178fd

Merge branch 'python/sklearn-subclassing' of github.com:microsoft/LightGBM into python/sklearn-subclassing

110b0e1

Merge branch 'master' into python/sklearn-subclassing

104471a

jameslamb changed the title ~~WIP: [python-package] support sub-classing scikit-learn estimators~~ [python-package] support sub-classing scikit-learn estimators Jan 11, 2025

jameslamb added awaiting review and removed in progress labels Jan 11, 2025

jameslamb marked this pull request as ready for review January 11, 2025 05:06

jameslamb requested review from guolinke, shiyu1994, jmoralez, borchero and StrikerRUS as code owners January 11, 2025 05:06

jameslamb added 2 commits January 12, 2025 23:24

fix Dask tests

d80b0df

Merge branch 'python/sklearn-subclassing' of github.com:microsoft/LightGBM into python/sklearn-subclassing

b7e041a

jameslamb commented Jan 13, 2025

tests/python_package_test/test_dask.py

Merge branch 'master' into python/sklearn-subclassing

68177a7

jameslamb mentioned this pull request Jan 23, 2025

WIP: release v4.6.0 #6796

jameslamb added 2 commits January 26, 2025 11:31

Merge branch 'master' into python/sklearn-subclassing

70f29a7

Merge branch 'master' into python/sklearn-subclassing

6796ba9

StrikerRUS reviewed Jan 27, 2025

jameslamb and others added 2 commits January 29, 2025 22:29

Update tests/python_package_test/test_dask.py

409733a

Co-authored-by: Nikita Titov <[email protected]>

Update python-package/lightgbm/sklearn.py

0a40e9b

Co-authored-by: Nikita Titov <[email protected]>

jameslamb and others added 2 commits January 29, 2025 22:48

Update docs/FAQ.rst

cd54639

Co-authored-by: Nikita Titov <[email protected]>

Merge branch 'master' into python/sklearn-subclassing

e39d19f

jameslamb requested a review from StrikerRUS January 30, 2025 04:48

StrikerRUS approved these changes Jan 30, 2025
