
plot_partial_dependence() API does not work with LightGBM regression model #16878

Closed
little-eyes opened this issue Apr 9, 2020 · 4 comments

Describe the bug

I use LightGBM 2.3.2 to train a regression model, then call plot_partial_dependence() to create a partial dependence plot. I ran into the following error:

  File "C:\Users\jilli\Anaconda3\lib\site-packages\sklearn\inspection\_partial_dependence.py", line 319, in partial_dependence
    check_is_fitted(estimator)
  File "C:\Users\jilli\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 967, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LGBMRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Tracing through the code, I found that LGBMRegressor's implementation exposes its attributes as @property methods. Example below:

    @property
    def n_features_(self):
        """Get the number of features of fitted model."""
        if self._n_features is None:
            raise LGBMNotFittedError('No n_features found. Need to call fit beforehand.')
        return self._n_features

    @property
    def best_score_(self):
        """Get the best score of fitted model."""
        if self._n_features is None:
            raise LGBMNotFittedError('No best_score found. Need to call fit beforehand.')
        return self._best_score

    @property
    def best_iteration_(self):
        """Get the best iteration of fitted model."""
        if self._n_features is None:
            raise LGBMNotFittedError('No best_iteration found. Need to call fit with early_stopping_rounds beforehand.')
        return self._best_iteration

Reference here: https://github.com/Microsoft/LightGBM/blob/master/python-package/lightgbm/sklearn.py

In utils/validation.py, the check_is_fitted function only looks at vars(estimator), which here returns an empty dictionary. So the check fails, even though the estimator actually exposes the above properties:

    if attributes is not None:
        if not isinstance(attributes, (list, tuple)):
            attributes = [attributes]
        attrs = all_or_any([hasattr(estimator, attr) for attr in attributes])
    else:
        attrs = [v for v in vars(estimator)
                 if v.endswith("_") and not v.startswith("__")]
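
As a small illustration (the class below is made up, not taken from LightGBM): a @property lives on the class rather than in the instance __dict__, so vars() never reports it even though hasattr() succeeds:

    class Model:
        @property
        def n_features_(self):
            # defined on the class, so it never appears in the instance __dict__
            return 3

    m = Model()
    print(hasattr(m, "n_features_"))  # True -- the attribute is reachable
    print(vars(m))                    # {}   -- so check_is_fitted finds no '*_' attributes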

Steps/Code to Reproduce

Train a LightGBM regressor, then call the plot_partial_dependence() API on it.
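
For example, a minimal sketch of the reproduction (the dataset and hyperparameters are illustrative; plot_partial_dependence lived in sklearn.inspection in the 0.22–0.24 series):

    import numpy as np
    from lightgbm import LGBMRegressor
    from sklearn.inspection import plot_partial_dependence

    rng = np.random.RandomState(0)
    X = rng.rand(100, 3)
    y = 2 * X[:, 0] + rng.rand(100)

    model = LGBMRegressor(n_estimators=10).fit(X, y)
    # raises NotFittedError even though fit() has been called
    plot_partial_dependence(model, X, features=[0, 1])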

Expected Results

Expect a graph to be generated.

Actual Results

The error shown above.

Versions

System:
    python: 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\jilli\Anaconda3\python.exe
   machine: Windows-10-10.0.19041-SP0

Python dependencies:
       pip: 20.0.2
setuptools: 45.2.0.post20200210
   sklearn: 0.23.dev0
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: 0.29.15
    pandas: 0.25.1
matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True
rth (Member) commented Apr 9, 2020

Thanks for the report. It's an interesting problem; for now I'm not sure about the best way to fix it.

  • We could call some of those properties if they end with _, related to what you proposed in the PR. But this can be slow and unreliable (e.g. SVC uses a property for predict_proba that will always return a function when probability=True, as far as I remember).
  • A more reliable way could be some optional mechanism for an estimator to specify its fitted attributes, which check_is_fitted would then use. Maybe an optional attribute with the list of attributes to check, but this increases the API surface somewhat (a hypothetical sketch follows this list).
  • Ask LightGBM to drop those properties and use plain attributes (related to [RFC] compatibility with scikit-learn microsoft/LightGBM#2628).
  • I guess we could also special-case LightGBM in check_is_fitted given its popularity, but that's not very satisfactory.
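
For the second option, a purely hypothetical sketch (nothing below exists in scikit-learn; the _fitted_attributes name is invented) of how an estimator could declare its fitted attributes and how check_is_fitted could consult them:

    class SomeEstimator:
        # hypothetical declaration of the attributes that mark the estimator as fitted
        _fitted_attributes = ["n_features_", "best_score_"]

    def check_is_fitted_sketch(estimator):
        declared = getattr(estimator, "_fitted_attributes", None)
        if declared is not None:
            # trust the estimator's own list, which also covers @property attributes
            return all(hasattr(estimator, attr) for attr in declared)
        # otherwise fall back to the current behaviour: scan the instance __dict__
        return any(v.endswith("_") and not v.startswith("__") for v in vars(estimator))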

rth (Member) commented Apr 9, 2020

> ask LGBM to drop those properties and use plain attributes

As a side note, I'm pretty sure some of scikit-learn's common tests must be failing as a result of their current approach. If not, we should add one.

rth added the API label Apr 9, 2020
little-eyes (Author) commented

Thanks @rth for the reply. Yeah, this is an interesting problem, and I agree that the fix I have is not ideal. Your second idea might be the best approach: each estimator can register its "properties to check", and then check_is_fitted can validate them.

Another idea is to keep an is_fitted flag in the base estimator that is set to true once fit() has been called. Whenever an estimator implements fit, it can call super().fit(...), which would set the flag. But I'm not sure whether all estimators call super().fit(...) at all.

cmarmo added the Bug label and removed the Bug: triage label Dec 13, 2020
thomasjpfan (Member) commented

With the new __sklearn_is_fitted__ protocol, estimators can now define how they choose to respond to check_is_fitted. LightGBM has already introduced it into their estimator here: microsoft/LightGBM#4636
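
For illustration, a minimal sketch of the protocol (the estimator below is made up; in recent scikit-learn versions check_is_fitted calls __sklearn_is_fitted__ when an estimator defines it):

    from sklearn.base import BaseEstimator, RegressorMixin
    from sklearn.utils.validation import check_is_fitted

    class MeanRegressor(RegressorMixin, BaseEstimator):
        def fit(self, X, y):
            self._mean = sum(y) / len(y)
            self._fitted = True
            return self

        def __sklearn_is_fitted__(self):
            # consulted by check_is_fitted() instead of scanning vars() for '*_' attributes
            return getattr(self, "_fitted", False)

        def predict(self, X):
            check_is_fitted(self)  # passes once __sklearn_is_fitted__ returns True
            return [self._mean] * len(X)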

With that in mind, I am closing this issue.
