Bump CUDA from 11.2 to 11.8 #505

Merged: weiji14 merged 5 commits into master from cuda-11.8 on Dec 18, 2023

@weiji14

weiji14 commented Dec 12, 2023

Member

The CUDA 11.8 migration across conda-forge is practically complete (see https://conda-forge.org/status/#cuda118), so we can start updating to a newer version of CUDA!

This should add support for the new NVIDIA Ada Lovelace and Hopper generation GPUs, which require compute capability 8.9 and 9.0 respectively (see https://docs.nvidia.com/cuda/archive/11.8.0/hopper-compatibility-guide/index.html#verifying-hopper-compatibility-using-cuda-11-8).

Note:

  • There are actually CUDA 12.0 builds on conda-forge already, but it's probably good to have some docker images with CUDA 11.8 first, and transition to CUDA 12 later.
  • CUDA 11.x has good forward and backward compatibility (see https://docs.nvidia.com/deploy/cuda-compatibility/index.html#cuda-intro), and as long as folks are using CUDA driver 450.36.06+, it should be ok.
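
As a rough illustration of the driver requirement in the note above, here is a minimal sketch that compares a driver version string against the 450.36.06 minimum quoted in this comment. The function names and the sample version strings are hypothetical, and the check assumes driver versions are plain dot-separated integers.

```python
def parse_driver_version(version: str) -> tuple[int, ...]:
    """Turn a dotted driver version such as '450.36.06' into (450, 36, 6)."""
    return tuple(int(part) for part in version.strip().split("."))


# Minimum driver version quoted in the note above (taken as-is from the comment).
MIN_DRIVER = parse_driver_version("450.36.06")


def driver_is_new_enough(installed: str) -> bool:
    """Return True if the installed driver is at least the quoted minimum."""
    return parse_driver_version(installed) >= MIN_DRIVER


# Hypothetical driver versions, for illustration only.
for candidate in ("535.104.05", "450.36.06", "440.33.01"):
    print(candidate, driver_is_new_enough(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "450.9" sorting after "450.36".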

Changes in this PR:

  • Update Pytorch, Torchvision and Tensorflow to use CUDA 11.8 builds
  • Update the minimum pin on Tensorflow from >=2.9.1 to >=2.14.0, because versions <2.13.1 only have CUDA 11.2 builds on conda-forge.

References:

Update Pytorch, Torchvision and Tensorflow to use CUDA 11.8 builds. Also bumped Tensorflow from 2.9.1 to 2.14.0, because versions <2.13.1 only have CUDA 11.2 builds on conda-forge.
weiji14 self-assigned this Dec 12, 2023
@pangeo-bot

Collaborator

/condalock
Automatically locking new conda environment, building, and testing images...

@github-actions

Contributor

Binder 👈 Try on Mybinder.org!

weiji14 marked this pull request as ready for review December 13, 2023 01:23
@weiji14

weiji14 commented Dec 18, 2023

Member Author

/condalock

- gpytorch
- pytorch>=2.0.0=*cuda112*
- torchvision>=0.15.1=*cuda112*
- pytorch>=2.0.0=*cuda118*

Member

Apparently we've been on 11.8 for some time, so perhaps this * syntax isn't really pinning anything?

cuda-version==11.8
cudatoolkit==11.8.0

weiji14 Dec 18, 2023

Member Author

That's cudatoolkit. CUDA 11.2 and 11.8 do have this forward/backward compatibility thing, but if you look at the conda-lock.yml file, the Pytorch build is actually the CUDA 11.2 one:

url: https://conda.anaconda.org/conda-forge/linux-64/pytorch-2.1.0-cuda112py311ha0492fd_300.conda

Compared to Pytorch compiled with CUDA 11.2, the one compiled with CUDA 11.8 enables compute capability 8.9, as seen at https://github.com/conda-forge/pytorch-cpu-feedstock/blob/7c7a57b7515eaeda67d3879b56b68466f38f0b0d/recipe/build_pytorch.sh#L144-L153.
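
To make the pinning question in this thread concrete, here is a small sketch that pulls the build string out of the lockfile URL quoted above and tests it against the two build-string globs from the diff. It uses Python's `fnmatch` only as an approximation of how conda matches build-string globs, and `build_string` is a hypothetical helper, not part of conda or conda-lock.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def build_string(conda_url: str) -> str:
    """Extract the build string from a conda package URL.

    Assumes the usual '<name>-<version>-<build>.<ext>' filename layout,
    where only the package name may itself contain hyphens.
    """
    filename = urlparse(conda_url).path.rsplit("/", 1)[-1]
    stem = filename.removesuffix(".conda").removesuffix(".tar.bz2")
    _name, _version, build = stem.rsplit("-", 2)
    return build


# URL quoted from the conda-lock.yml file in the comment above.
url = (
    "https://conda.anaconda.org/conda-forge/linux-64/"
    "pytorch-2.1.0-cuda112py311ha0492fd_300.conda"
)
build = build_string(url)

print(build)                            # cuda112py311ha0492fd_300
print(fnmatchcase(build, "*cuda112*"))  # True: the old pin selects this build
print(fnmatchcase(build, "*cuda118*"))  # False: the new pin would reject it
```

Under that approximation, the old `*cuda112*` pin was doing real work: it held Pytorch on a CUDA 11.2 build even though `cudatoolkit` itself had already resolved to 11.8.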

@scottyhq

Member

looks like tensorflow image size dropped a bit, but pytorch keeps inflating :) would be nice to get these below 10GB if possible one of these days

pangeo/ml-notebook: 10.8GB -> 10GB
pangeo/pytorch-notebook: 13.7GB -> 13.9GB

@weiji14

weiji14 commented Dec 18, 2023

Member Author

Yes, things should get smaller! This is because conda-forge has removed the need for a large cudatoolkit package in CUDA 12 (see conda-forge/conda-forge.github.io#1963) by breaking it into smaller components that are installed as needed. So hopefully that can cut down a few hundred megabytes 🤞

weiji14 merged commit 51db3df into master Dec 18, 2023
weiji14 deleted the cuda-11.8 branch December 18, 2023 22:51
weiji14 mentioned this pull request Feb 16, 2024