
plot_partial_dependence() API does not work with LightGBM regression model #16878

Closed
little-eyes opened this issue Apr 9, 2020 · 4 comments

Describe the bug

I use LightGBM 2.3.2 to train a regression model, then call plot_partial_dependence() to create a partial dependence plot. I ran into the following error:

  File "C:\Users\jilli\Anaconda3\lib\site-packages\sklearn\inspection\_partial_dependence.py", line 319, in partial_dependence
    check_is_fitted(estimator)
  File "C:\Users\jilli\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 967, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LGBMRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Tracing through the code, I found that LGBMRegressor's implementation exposes its attributes as @property methods. Example below:

    @property
    def n_features_(self):
        """Get the number of features of fitted model."""
        if self._n_features is None:
            raise LGBMNotFittedError('No n_features found. Need to call fit beforehand.')
        return self._n_features

    @property
    def best_score_(self):
        """Get the best score of fitted model."""
        if self._n_features is None:
            raise LGBMNotFittedError('No best_score found. Need to call fit beforehand.')
        return self._best_score

    @property
    def best_iteration_(self):
        """Get the best iteration of fitted model."""
        if self._n_features is None:
            raise LGBMNotFittedError('No best_iteration found. Need to call fit with early_stopping_rounds beforehand.')
        return self._best_iteration

Reference here: https://github.com/Microsoft/LightGBM/blob/master/python-package/lightgbm/sklearn.py

In utils/validation.py, the check_is_fitted function only looks at vars(estimator), which here returns an empty dictionary. So the check fails, even though the estimator actually exposes the above properties:

    if attributes is not None:
        if not isinstance(attributes, (list, tuple)):
            attributes = [attributes]
        attrs = all_or_any([hasattr(estimator, attr) for attr in attributes])
    else:
        attrs = [v for v in vars(estimator)
                 if v.endswith("_") and not v.startswith("__")]
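
As a small illustration (the class below is made up, not taken from LightGBM): a @property lives on the class rather than in the instance __dict__, so vars() never reports it even though hasattr() succeeds:

    class Model:
        @property
        def n_features_(self):
            # defined on the class, so it never appears in the instance __dict__
            return 3

    m = Model()
    print(hasattr(m, "n_features_"))  # True -- the attribute is reachable
    print(vars(m))                    # {}   -- so check_is_fitted finds no '*_' attributes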

Steps/Code to Reproduce

Train a LightGBM regressor, then call the plot_partial_dependence() API on it.
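
For example, a minimal sketch of the reproduction (the dataset and hyperparameters are illustrative; plot_partial_dependence lived in sklearn.inspection in the 0.22–0.24 series):

    import numpy as np
    from lightgbm import LGBMRegressor
    from sklearn.inspection import plot_partial_dependence

    rng = np.random.RandomState(0)
    X = rng.rand(100, 3)
    y = 2 * X[:, 0] + rng.rand(100)

    model = LGBMRegressor(n_estimators=10).fit(X, y)
    # raises NotFittedError even though fit() has been called
    plot_partial_dependence(model, X, features=[0, 1])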

Expected Results

Expect a graph to be generated.

Actual Results

The error shown above.

Versions

System:
    python: 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\jilli\Anaconda3\python.exe
   machine: Windows-10-10.0.19041-SP0

Python dependencies:
       pip: 20.0.2
setuptools: 45.2.0.post20200210
   sklearn: 0.23.dev0
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: 0.29.15
    pandas: 0.25.1
matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True
rth (Member) commented Apr 9, 2020

Thanks for the report. It's an interesting problem; for now I'm not sure about the best way to fix it.

  • We could call some of those properties if they end with _, related to what you proposed in the PR. But this can be slow and unreliable (e.g. SVC uses a property for predict_proba that will always return a function when probability=True, as far as I remember).
  • A more reliable way could be some optional mechanism for an estimator to specify its fitted attributes, which check_is_fitted would then use. Maybe an optional attribute with the list of attributes to check, but this increases the API surface somewhat (a hypothetical sketch follows this list).
  • Ask LightGBM to drop those properties and use plain attributes (related to [RFC] compatibility with scikit-learn microsoft/LightGBM#2628).
  • I guess we could also special-case LightGBM in check_is_fitted given its popularity, but that's not very satisfactory.
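
For the second option, a purely hypothetical sketch (nothing below exists in scikit-learn; the _fitted_attributes name is invented) of how an estimator could declare its fitted attributes and how check_is_fitted could consult them:

    class SomeEstimator:
        # hypothetical declaration of the attributes that mark the estimator as fitted
        _fitted_attributes = ["n_features_", "best_score_"]

    def check_is_fitted_sketch(estimator):
        declared = getattr(estimator, "_fitted_attributes", None)
        if declared is not None:
            # trust the estimator's own list, which also covers @property attributes
            return all(hasattr(estimator, attr) for attr in declared)
        # otherwise fall back to the current behaviour: scan the instance __dict__
        return any(v.endswith("_") and not v.startswith("__") for v in vars(estimator))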

rth (Member) commented Apr 9, 2020

> ask LGBM to drop those properties and use plain attributes

As a side note, I'm pretty sure some of scikit-learn's common tests must be failing as a result of their current approach. If not, we should add one.

rth added the API label Apr 9, 2020
little-eyes (Author) commented

Thanks @rth for the reply. Yeah, this is an interesting problem, and I agree that the fix I have is not ideal. Your second idea might be the best approach: each estimator can register its "properties to check", and then check_is_fitted can validate them.

Another idea is to keep an is_fitted flag in the base estimator that is set to true once fit() has been called. Whenever an estimator implements fit, it can call super().fit(...), which would set the flag. But I'm not sure whether all estimators call super().fit(...) at all.

cmarmo added the Bug label and removed the Bug: triage label Dec 13, 2020
thomasjpfan (Member) commented

With the new __sklearn_is_fitted__ protocol, estimators can now define how they choose to respond to check_is_fitted. LightGBM has already introduced it into their estimator here: microsoft/LightGBM#4636
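
For illustration, a minimal sketch of the protocol (the estimator below is made up; in recent scikit-learn versions check_is_fitted calls __sklearn_is_fitted__ when an estimator defines it):

    from sklearn.base import BaseEstimator, RegressorMixin
    from sklearn.utils.validation import check_is_fitted

    class MeanRegressor(RegressorMixin, BaseEstimator):
        def fit(self, X, y):
            self._mean = sum(y) / len(y)
            self._fitted = True
            return self

        def __sklearn_is_fitted__(self):
            # consulted by check_is_fitted() instead of scanning vars() for '*_' attributes
            return getattr(self, "_fitted", False)

        def predict(self, X):
            check_is_fitted(self)  # passes once __sklearn_is_fitted__ returns True
            return [self._mean] * len(X)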

With that in mind, I am closing this issue.
