replace dask-xgboost with xgboost #50

Merged: jameslamb merged 8 commits into main from feat/xgboost-1.3.0 on Dec 18, 2020

@jameslamb jameslamb commented Dec 14, 2020

Now that xgboost 1.3.0 is available, I think it's time to remove uses of dask-xgboost in the examples here and replace them with xgboost.dask (the Dask interface built into xgboost). This is the direction the XGBoost + Dask community is moving (dask/community#104, dask/dask-xgboost#39).

This PR proposes replacing dask-xgboost with xgboost. I tested these changes on the 2020.11.30 release of Saturn (in an internal environment), using the image saturncloud/saturn:2020.11.30.

While I'm touching the XGBoost code, this PR also cuts over all of the XGBoost examples to "tree_method": "hist". We had previously used the slower "tree_method": "approx" because dask-xgboost required pinning to an old XGBoost version that didn't support hist. But hist is available in 1.3.0 and can be much faster.
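
A minimal sketch of what the cutover described above might look like, assuming a running `dask.distributed` client and Dask collections `X` and `y`. Only `"tree_method": "hist"` comes from this PR; the other parameter values and the helper name `train_with_xgboost_dask` are hypothetical and are not taken from the notebooks changed here.

```python
# Hypothetical parameters for illustration; only "tree_method": "hist"
# is taken from the PR description.
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "tree_method": "hist",
    "learning_rate": 0.1,
    "max_depth": 5,
}


def train_with_xgboost_dask(client, X, y, params=None, num_boost_round=100):
    """Train on Dask collections with xgboost's built-in Dask interface.

    `client` is assumed to be a dask.distributed.Client; `X` and `y` are
    assumed to be Dask arrays or Dask DataFrames.
    """
    # Imported inside the function so this module can be loaded even where
    # xgboost is not installed.
    from xgboost import dask as dxgb

    if params is None:
        params = dict(XGB_PARAMS)

    # DaskDMatrix holds references to the distributed data partitions.
    dtrain = dxgb.DaskDMatrix(client, X, y)

    # xgboost.dask.train() returns a dict with "booster" and "history" keys.
    output = dxgb.train(client, params, dtrain, num_boost_round=num_boost_round)
    return output["booster"]
```

With a cluster available, this could be called as `booster = train_with_xgboost_dask(client, X, y)`.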

Notes for Reviewers

In this PR, I'm proposing adding a start script for examples-cpu to deal with old XGBoost versions that are currently installed in the 2020.11.30 images. I've added another PR to update the images (saturncloud/images#127), and then whenever those are rebuilt we can change the image used here and remove that start script. Please let me know what you think about this.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@rikturr rikturr left a comment

Looks great!! Found a couple small typos.

And regarding the start script/images, would it be possible to coordinate the release of images and examples together so that we can avoid the workaround in the start script? I'm afraid users could get confused seeing the longer start script, and they might even remove some of the lines, causing problems when they try to spin up the example resources.

"cell_type": "markdown",
"metadata": {},
"source": [
"`xgboost.dask.predict()` can be used to create predictiosn on a Dask collection using an XGBoost model object. Note that this model object is just a regular XGBoost booster, not a special Dask-specific model object.\n",

Suggested change
"`xgboost.dask.predict()` can be used to create predictiosn on a Dask collection using an XGBoost model object. Note that this model object is just a regular XGBoost booster, not a special Dask-specific model object.\n",
"`xgboost.dask.predict()` can be used to create predictions on a Dask collection using an XGBoost model object. Note that this model object is just a regular XGBoost booster, not a special Dask-specific model object.\n",

"cell_type": "markdown",
"metadata": {},
"source": [
"`xgboost.dask.predict()` can be used to create predictiosn on a Dask collection using an XGBoost model object. Note that this model object is just a regular XGBoost booster, not a special Dask-specific model object.\n",

Suggested change
"`xgboost.dask.predict()` can be used to create predictiosn on a Dask collection using an XGBoost model object. Note that this model object is just a regular XGBoost booster, not a special Dask-specific model object.\n",
"`xgboost.dask.predict()` can be used to create predictions on a Dask collection using an XGBoost model object. Note that this model object is just a regular XGBoost booster, not a special Dask-specific model object.\n",
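
For context on the sentence being corrected, here is a minimal sketch of the prediction step it describes, assuming a running `dask.distributed` client, a trained booster, and a Dask collection `X`. The helper name `predict_on_dask_collection` is hypothetical.

```python
def predict_on_dask_collection(client, booster, X):
    """Create predictions on a Dask collection with a regular XGBoost booster.

    `client` is assumed to be a dask.distributed.Client and `X` a Dask array
    or Dask DataFrame.
    """
    # Imported inside the function so this module can be loaded even where
    # xgboost is not installed.
    import xgboost as xgb
    from xgboost import dask as dxgb

    # The model is an ordinary xgboost.Booster, not a Dask-specific object.
    if not isinstance(booster, xgb.Booster):
        raise TypeError("expected a regular xgboost.Booster")

    # The result is a lazy Dask collection; call .compute() to materialize it.
    return dxgb.predict(client, booster, X)
```

The returned predictions stay distributed until `.compute()` is called on them.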

@jsignell jsignell removed their request for review December 14, 2020 21:45
@jameslamb jameslamb requested a review from jsignell as a code owner December 16, 2020 18:29
@jameslamb jameslamb requested a review from rikturr December 16, 2020 22:26

@rikturr rikturr left a comment

🚢

@jameslamb jameslamb merged commit 42ae8ca into main Dec 18, 2020
@jameslamb jameslamb deleted the feat/xgboost-1.3.0 branch December 18, 2020 15:09