Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge test_<est>_local_predict and test_<est> tests for Dask module #3833

Closed
StrikerRUS opened this issue Jan 24, 2021 · 8 comments
Closed

Comments

@StrikerRUS
Copy link
Collaborator

Refer to #3708 (comment) for more details.

@ShrillShrestha
Copy link
Contributor

@jameslamb, so by merging do you mean modifying one of them to support the functionality of both? I looked through the code and saw some similar statements being implemented, but it would be helpful if you could tell me how I shall navigate/proceed to make the change?
P.S. Please forgive my novice questions.

@jameslamb
Copy link
Collaborator

No problem, ask any questions you need. We're here to help and really thankful for your efforts!

You can look at the changes I made in https://github.com/microsoft/LightGBM/pull/3786/files for reference. I eliminated a test in there called test_classifier_proba, and just moved its contents into the test test_classifier. To close this issue, do the same with the local_predict tests.

So, for example, take

p1 = dask_classifier.to_local().predict(dX)
local_classifier = lightgbm.LGBMClassifier(**params)
local_classifier.fit(X, y, sample_weight=w)
p2 = local_classifier.predict(X)
assert_eq(p1, p2)
assert_eq(y, p1)
assert_eq(y, p2)
and move it into the test test_classifier.

To try the tests out yourself locally, you can build from local source

cd python-package
python setup.py sdist
pip install dist/lightgbm*.tar.gz

and then run just the test you modified

pytest tests/python_package_test/test_dask.py::test_classifier

@ShrillShrestha
Copy link
Contributor

Thanks, @jameslamb, I went through the reference code you provided.

I feel like just deleting the test_classifier_local_predict method should do the job. Am I headed in the right direction?
Also, I am getting the error ERROR: Failed building wheel for lightgbm when I run pip install dist/lightgbm*.tar.gz. Can you give me some advice on how to solve it?

@jameslamb
Copy link
Collaborator

I feel like just deleting the test_classifier_local_predict method should do the job.

That is not correct. You need to move the type of prediction in the link I shared (https://github.com/microsoft/LightGBM/blob/ac706e10e461d1c5987a3975cc48aa82fde33566/tests/python_package_test/test_dask.py#L259-L267) and make sure it is moved into test_classifier. And then make a similar change for DaskLGBMRegressor and DaskLGBMRanker tests.

Also, I am getting the error ERROR: Failed building wheel for lightgbm

You should see a log message that says "The full version of error log was saved into ". Open that file for details on what went wrong compiling LightGBM.

@ShrillShrestha
Copy link
Contributor

ShrillShrestha commented Jan 24, 2021

I think I need to include n_estimators=10 as a parameter or dlgbm.DaskLGBMRegressor as it's not in the params for test_regressor

params = {
"random_state": 42,
"num_leaves": 10
}
dask_regressor = dlgbm.DaskLGBMRegressor(
time_out=5,
local_listen_port=listen_port,
tree='data',
**params
)

but is in
dask_regressor = dlgbm.DaskLGBMRegressor(
local_listen_port=listen_port,
random_state=42,
n_estimators=10,
num_leaves=10,
tree_type='data'
)

Also, I saw tree, tree_types, and tree_learner_type are aliases to each other so that means I can use either one as a parameter name. The same for data and data_parallel, as they are also aliases.
Am I right with the above statements?

Also, the error was:
In file included from /tmp/pip-req-build-vmsufmof/compile/include/LightGBM/config.h:16, from /tmp/pip-req-build-vmsufmof/compile/src/boosting/gbdt_model_text.cpp:5: /tmp/pip-req-build-vmsufmof/compile/include/LightGBM/utils/common.h:35:10: fatal error: ../../../external_libs/fmt/include/fmt/format.h: No such file or directory 35 | #include "../../../external_libs/fmt/include/fmt/format.h"
Any tips to resolve this error?

@jameslamb
Copy link
Collaborator

I think I need to include n_estimators=10 as a parameter or dlgbm.DaskLGBMRegressor as it's not in the params for test_regressor

I believe the for the regression tests, we had to omit setting n_estimators=10. You can leave it out (but good attention to detail!).

Also, I saw tree, tree_types, and tree_learner_type are aliases to each other so that means I can use either one as a parameter name. The same for data and data_parallel, as they are also aliases. Am I right with the above statements?

Yes you are correct! Many of LightGBM's parameters can be provided by aliases. You can see details here https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters.

No such file or directory 35 | #include "../../../external_libs/fmt/include/fmt/format.h"

Make sure that the branch you are working on is up to date with the latest version of master for LightGBM. You can also run the following code from the root of the repo...it's possible you use git clone instead of git clone --recursive when cloning LightGBM. We use several git submodules in this project, which need to be initialized.

git submodule update --recursive

@ShrillShrestha
Copy link
Contributor

I was able to run the test, and I go 4 tests passed and 2 failed. I am getting two ValueError: Expected 2D array, got scalar array instead errors from test_classifier
method.
For merging test_classifier, I referenced the prediction p1 in
https://github.com/ShrillShrestha/LightGBM/blob/ac706e10e461d1c5987a3975cc48aa82fde33566/tests/python_package_test/test_dask.py#L259
as p1_local_predict = dask_classifier.to_local().predict(dX) below
https://github.com/ShrillShrestha/LightGBM/blob/ac706e10e461d1c5987a3975cc48aa82fde33566/tests/python_package_test/test_dask.py#L147.
and added

assert_eq(p1_local_predict, p2)
assert_eq(y, p1_local_predict) 

at the end of test_classifier before client.close(), then deleted test_classifier_local_predict method.
Is it because of
https://github.com/ShrillShrestha/LightGBM/blob/ac706e10e461d1c5987a3975cc48aa82fde33566/tests/python_package_test/test_dask.py#L246 ( _create_data parameter value for output i.e "array")
as I didn't change any _create_data parameter values in test_classifier.
How can I solve the error?

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants