Xgboost GPU support #26

Closed
RAMitchell opened this issue Apr 30, 2019 · 22 comments · Fixed by #84

Comments

@RAMitchell

Hi @aldanor, @beckermr, I am an xgboost dev mostly responsible for the GPU algorithms. We have had a few requests to get a GPU-enabled package up on Anaconda for Linux. What would it take to make this happen? I am fairly new to Anaconda, so any guidance would be appreciated.

Thanks,
Rory

@jakirkham
Member

Hi Rory, thanks for stopping by and filing this request.

Currently conda-forge does not have GPU support, but it is something we are hoping to change.

To get things started, I have submitted a PR (conda-forge/docker-images#93) to create a new Docker image (similar to the one currently used for Linux builds), which is based on existing NVIDIA CUDA Docker images that include NVCC. I have also submitted a PR (conda-forge/staged-recipes#8229), which creates a shim compiler package for conda-build so it can easily use the existing NVCC install in the Docker image. There are a few more things that we will want to do, but that is a good starting point.

Right now I am waiting for community feedback on these two proposals. If you have any thoughts on them, please feel free to chime in.
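
As a rough sketch for readers following along: once both PRs land, a recipe could request the proposed shim the same way it requests the C/C++ compilers. The `{{ compiler('cuda') }}` spelling below mirrors conda-build's existing compiler machinery and is an assumption here, not a finalized interface.

```yaml
# Hypothetical meta.yaml fragment: request the CUDA compiler shim alongside
# the usual compilers, so conda-build picks up the NVCC install baked into
# the CUDA Docker image.
requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}
    - {{ compiler('cuda') }}    # the proposed shim package from staged-recipes#8229
```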

@jakirkham
Member

Do you have a Conda recipe you have been using for this currently?

Barring that, what would be the best instructions to follow for building a GPU-enabled package? These, or something else?

Also, what are your version requirements for GPU libraries (e.g. CUDA, NCCL, etc.)? Are there any other library dependencies we should be aware of?

Finally, is there a good test we could run to ensure the build worked correctly?

@RAMitchell
Author

We have a Jenkins-based CI system that builds Python wheels for PyPI.

Here is the Dockerfile: https://github.com/dmlc/xgboost/blob/master/tests/ci_build/Dockerfile.gpu_build

This contains the appropriate NCCL versions. We have been releasing builds with CUDA 8.0 for maximum compatibility. We then test the CPU algorithms in a minimal container (https://github.com/dmlc/xgboost/blob/master/tests/ci_build/Dockerfile.release) to ensure the package still works on a system without a GPU, as well as testing the GPU algorithms in a GPU-enabled container.

When I say testing, I would run the Python tests here (tests/python-gpu), disabling or enabling the multi-GPU tests as appropriate. Additionally, run the Google Test executable testxgboost.exe; you will have to enable Google Tests in the CMake options to generate it.
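
As an editorial aside, the build-and-test flow described above maps roughly onto a conda recipe like the sketch below. The CMake flags (`USE_CUDA`, `GOOGLE_TEST`), the `testxgboost` binary, and the `tests/python-gpu` directory come from the comment and xgboost's build system; the recipe layout itself is only an assumption, not the eventual feedstock.

```yaml
# Hypothetical meta.yaml fragment mirroring the steps described above.
build:
  script:
    - cmake -S . -B build -DUSE_CUDA=ON -DGOOGLE_TEST=ON   # enable GPU algorithms and Google Tests
    - cmake --build build --parallel
    - ./build/testxgboost                                   # run the C++ (Google Test) suite
    - cd python-package && python setup.py install          # install the Python package

test:
  source_files:
    - tests/python-gpu
  requires:
    - pytest
  commands:
    - pytest -v tests/python-gpu    # GPU Python tests; needs a GPU-enabled machine
```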

@jakirkham
Member

Thanks for the info. Have a few more follow-up questions.

Do you require those exact versions in the container? Or are they lower bounds? Are there any versions that you have found to be problematic?

As for CUDA 8.0, PR (conda-forge/docker-images#93) currently proposes building for both CUDA 9.2 and 10.0. Should we consider adding 8.0 as well? How long are you holding onto older CUDA versions? When do you start picking up newer CUDA versions? Do you value building for multiple versions of CUDA? In particular, JIT compilation time has come up before.

Are there particular tests that you have found useful for catching common build or user issues?

@RAMitchell
Author

I haven't really given transitioning between CUDA versions much thought; we started with 8.0 and haven't upgraded since. 9.2 should be fine as well; maybe we should upgrade for PyPI too. Jenkins is currently building 8.0, 9.2, and 10.0 without issues, but there is a known compilation bug with 10.1.

I do not want to add any additional complexity to the user install experience, hence the choice to use the most stable version. If we can release multiple versions without impacting the user experience, then that is beneficial, in particular to avoid JIT compilation for Volta.

I can't recommend any specific tests; I would normally just run all the Python tests and Google Tests.

@jakirkham
Member

Thanks for the additional info.

With CUDA 8.0 it looks like we need an older compiler than what we typically use; I can look into that. I will focus on CUDA 9.2 and CUDA 10.0 in the near term, as that aligns well with what we have. If you could give me an idea of the relative importance of CUDA 8.0 to users, that would be very helpful.

Sure, that makes sense. In the wheel context, picking something old that works broadly sounds like a good choice. Conda is designed to be language-agnostic, so it is very comfortable expressing library dependencies that are not Python-specific (e.g. cudatoolkit versions). That said, I would appreciate your feedback on the packages we build to ensure they are working as expected.

@RAMitchell
Author

RAMitchell commented May 2, 2019

I would say just look at CUDA 9.2 or greater. We may remove CUDA 8.0 support in the short term anyway.

@jakirkham
Member

Sounds good. Thanks for letting me know. 🙂

@hcho3
Contributor

hcho3 commented May 15, 2019

@jakirkham Have you also considered building GPU XGBoost for Windows? I have recently spent time setting up Jenkins CI for Windows (dmlc/xgboost#4463 and dmlc/xgboost#4469), and I found out that you don't actually need a GPU to compile CUDA code. (Running CUDA code will require one, however.) You just need to install the CUDA toolkit. So in principle it should be possible to use AppVeyor or Azure Pipelines to compile CUDA code.

EDIT: Just found an example of installing the CUDA toolkit inside AppVeyor: https://ci.appveyor.com/project/tmcdonell/cuda/builds/24181692/job/7qum8ca8g7l8qfqn#L6
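
For anyone wanting to try this, a minimal sketch of an appveyor.yml install step along the lines of the linked build is shown below. The installer URL placeholder, silent-install flags, and component names are assumptions drawn from that example, not a verified configuration.

```yaml
# Hypothetical appveyor.yml fragment: install the CUDA toolkit so nvcc is
# available for compilation on a GPU-less Windows worker.
install:
  # download the CUDA network installer (set CUDA_INSTALLER_URL yourself)
  - appveyor DownloadFile %CUDA_INSTALLER_URL% -FileName cuda_setup.exe
  # silent install of only the compiler and runtime components (names illustrative)
  - cuda_setup.exe -s nvcc_10.0 cudart_10.0
  - set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;%PATH%
  - nvcc --version
```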

@jakirkham
Member

Thanks for raising this @hcho3.

I agree it would be good to think about GPUs on other platforms (including Windows). I have not thought too much about this yet.

Would you be comfortable starting an issue on the webpage repo? That is normally where we discuss things that may affect the org more generally. I would also be happy to raise that issue if you prefer (though I may ask you to fill in the Windows specifics, as you have looked more closely at this 🙂).

@jakirkham
Member

I've seen discussion of a new release coming out. Given this, what is the best way to build a GPU-enabled xgboost package now? @RAMitchell? 🙂

@RAMitchell
Author

RAMitchell commented Oct 17, 2019

I don't think much has really changed on our end regarding our build system. We no longer support CUDA 8.0. We are currently considering an intermediate release before 1.0; I don't know how soon this will happen.

Where did we get to with this last time? Is the infrastructure in place from the conda-forge perspective to build GPU code? I see there is still some work going on with NCCL in #9694. We depend on NCCL for the distributed version (e.g. with Dask) but could still release a single-GPU version to start with.

@jakirkham
Member

Thanks for the follow-up Rory! 😄

An intermediate release would be very welcome.

conda-forge now has the infrastructure to perform the builds and we have tried this out on a few feedstocks so far. We support CUDA 9.2, 10.0, and 10.1. Docs still need to be written, but that shouldn't be a blocker to getting this started here.

The NCCL package is ready to go. I am just waiting on feedback from some other people before going ahead. Though I have no objections to starting more simply here if that makes sense.
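
For context, opting a feedstock into that CUDA matrix would presumably go through the build configuration; a sketch follows, assuming key names along the lines of the pinning used for the CUDA docker images (the exact names conda-forge settles on may differ).

```yaml
# Hypothetical conda_build_config.yaml fragment: one build per CUDA version,
# each compiled with the nvcc shim inside the matching CUDA docker image.
cuda_compiler:
  - nvcc
cuda_compiler_version:
  - "9.2"
  - "10.0"
  - "10.1"
```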

@jakirkham
Member

Does xgboost still have compilation issues with CUDA 10.1, or was that fixed?

@RAMitchell
Author

Fixed afaik

@jakirkham
Member

Is the fix in 0.90 or master?

@RAMitchell
Author

Looks like 0.90 should include the fix: dmlc/xgboost#4475

@RAMitchell
Author

Any updates on this? XGBoost 1.0 has now been released.

@twsl

twsl commented Apr 8, 2020

I am waiting for it as well. What exactly is currently blocking this?

@ksangeek
Contributor

I see that the only other thing missing for updating the conda recipe with GPU support is the cudatoolkit conda package in conda-forge. It will be required at the test stage and at run time.
I see that cudatoolkit-dev and nccl are already available.

Otherwise, we already have a reference for what needs to be done in this recipe branch from the Anaconda defaults channel: https://github.com/AnacondaRecipes/xgboost-feedstock/blob/py-xgboost-gpu/recipe/meta.yaml
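
Putting those pieces together, the GPU-specific part of the recipe would presumably end up looking roughly like the sketch below. It is loosely modelled on the AnacondaRecipes branch linked above; the exact package names, pins, and section placement are assumptions.

```yaml
# Hypothetical meta.yaml fragment for a GPU-enabled xgboost output.
requirements:
  build:
    - {{ compiler('cuda') }}    # nvcc shim
  host:
    - nccl                      # already available on conda-forge
  run:
    - nccl
    - cudatoolkit >=9.2         # the missing runtime piece discussed here
```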

@jakirkham
Member

We are discussing offline how best to proceed at the moment. I will update once we have figured out a plan of action.

cc @JohnZed @quasiben

@anders-wind

Looks like cudatoolkit is available on conda-forge now: https://anaconda.org/conda-forge/cudatoolkit
Just an FYI :)
