ML Computer GPU ML Settings
Current setup:
- Ubuntu 21.04
- CUDA 11.1 (libcuda: the GPU user-mode driver)
- GPU Driver 470.63.01 (nvidia.ko: the kernel-mode driver)
- Python 3.7 / 2.7.18
Check the driver version:
# NOTE: the CUDA version reported here is the one supported by the driver, not the one installed!
nvidia-smi
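The driver version can also be read programmatically. A minimal sketch, assuming the nvidia-ml-py (pynvml) package is installed (it is not part of the setup listed above):

```python
import pynvml

# query the kernel-mode driver through NVML (the same source nvidia-smi uses)
pynvml.nvmlInit()
version = pynvml.nvmlSystemGetDriverVersion()
# older pynvml releases return bytes, newer ones return str
if isinstance(version, bytes):
    version = version.decode()
print(version)  # e.g. 470.63.01
pynvml.nvmlShutdown()
```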
Check the CUDA toolkit version:
nvcc --version
Check the cuDNN version (for cuDNN 8+ the version macros live in cudnn_version.h rather than cudnn.h):
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Version compatibility:
- PyTorch 1.8.2 (LTS) to 1.9.1: CUDA 10.2 / 11.1
- TensorFlow 2.5 to 2.6: CUDA 11.2 with cuDNN 8.1
- CUDA 11 requires driver 450+
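To see which CUDA and cuDNN versions the installed frameworks were actually built against, a quick sketch (assuming both packages from the sections below are installed):

```python
import torch
import tensorflow as tf

# CUDA / cuDNN versions PyTorch was compiled against
print(torch.version.cuda)              # e.g. '11.1'
print(torch.backends.cudnn.version())  # e.g. 8005

# CUDA / cuDNN versions TensorFlow was compiled against (GPU builds of TF 2.3+)
build = tf.sysconfig.get_build_info()
print(build["cuda_version"], build["cudnn_version"])
```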
Uninstall the driver
# remove a driver installed with the NVIDIA .run installer (either command)
sudo /usr/bin/nvidia-uninstall
sudo nvidia-installer --uninstall
# remove driver packages installed through apt
sudo apt purge --remove "*nvidia*"
sudo apt autoremove
sudo apt autoclean
Re-install the driver
# check available drivers
ubuntu-drivers devices
# install the recommended driver
sudo ubuntu-drivers autoinstall
Uninstall the old CUDA toolkit:
sudo /usr/local/cuda-10.2/bin/cuda-uninstaller
sudo rm -rf /usr/local/cuda*
Re-install the CUDA toolkit (assuming the NVIDIA CUDA apt repository is configured):
sudo apt install cuda-toolkit-11-1
Tested with torch==1.9.0+cu111 (pip)
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
To test that PyTorch can use the GPU:
import torch
print(torch.cuda.is_available())
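A slightly fuller sanity check (a sketch, assuming at least one CUDA device is visible) that also runs a small operation on the GPU:

```python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    # name of the first visible GPU
    print(torch.cuda.get_device_name(0))
    # run a small matmul on the GPU to confirm it actually works
    x = torch.rand(3, 3, device=device)
    print((x @ x).sum().item())
else:
    print("CUDA is not available to PyTorch")
```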
Tested with tensorflow-gpu 2.6.0 (pip).
But it needs cuDNN 8, which must be downloaded separately from CUDA (it is not included in the cuda-toolkit package).
To test that TensorFlow uses the GPU:
import tensorflow as tf

print(tf.test.gpu_device_name())

tf.debugging.set_log_device_placement(True)
try:
    # Specify an invalid GPU device
    with tf.device('/device:GPU:2'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
except RuntimeError as e:
    print(e)
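To list the GPUs TensorFlow can see (a quick check, with no assumptions beyond the tensorflow-gpu install above):

```python
import tensorflow as tf

# physical GPUs visible to TensorFlow; an empty list means no GPU was detected
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
print("Num GPUs:", len(gpus))
```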
I had to add the following line to my ~/.bashrc so that TensorFlow can find libcudnn.so.8:
echo 'export LD_LIBRARY_PATH=/home/gnoel/Downloads/cuda/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
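To verify that the dynamic loader can now find cuDNN (run in a new shell, after sourcing ~/.bashrc), a small hypothetical check:

```python
import ctypes

# raises OSError if libcudnn.so.8 is not on the library search path
ctypes.CDLL("libcudnn.so.8")
print("libcudnn.so.8 loaded successfully")
```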