ML Computer GPU ML Settings
Current setup:
- Ubuntu 21.04
- CUDA 11.1 (libcuda: the GPU user-mode driver)
- GPU Driver 470.63.01 (nvidia.ko: the kernel-mode driver)
- Python 3.7 / 2.7.18
Check the driver version:
# NOTE: the CUDA version reported here is the one supported by the driver, not the one installed!
nvidia-smi
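The driver version can also be read programmatically. A minimal sketch, assuming the nvidia-ml-py (pynvml) package is installed (it is not part of the setup listed above):

```python
import pynvml

# query the kernel-mode driver through NVML (the same source nvidia-smi uses)
pynvml.nvmlInit()
version = pynvml.nvmlSystemGetDriverVersion()
# older pynvml releases return bytes, newer ones return str
if isinstance(version, bytes):
    version = version.decode()
print(version)  # e.g. 470.63.01
pynvml.nvmlShutdown()
```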
Check the CUDA toolkit version:
nvcc --version
Check the cuDNN version (for cuDNN 8+ the version macros live in cudnn_version.h rather than cudnn.h):
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Version compatibility:
- PyTorch 1.8.2 (LTS) to 1.9.1: CUDA 10.2 / 11.1
- TensorFlow 2.5 to 2.6: CUDA 11.2 with cuDNN 8.1
- CUDA 11 requires driver 450+
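To see which CUDA and cuDNN versions the installed frameworks were actually built against, a quick sketch (assuming both packages from the sections below are installed):

```python
import torch
import tensorflow as tf

# CUDA / cuDNN versions PyTorch was compiled against
print(torch.version.cuda)              # e.g. '11.1'
print(torch.backends.cudnn.version())  # e.g. 8005

# CUDA / cuDNN versions TensorFlow was compiled against (GPU builds of TF 2.3+)
build = tf.sysconfig.get_build_info()
print(build["cuda_version"], build["cudnn_version"])
```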
Uninstall the driver
# remove a driver installed with the NVIDIA .run installer (either command)
sudo /usr/bin/nvidia-uninstall
sudo nvidia-installer --uninstall
# remove driver packages installed through apt
sudo apt purge --remove "*nvidia*"
sudo apt autoremove
sudo apt autoclean
Re-install the driver
# check available drivers
ubuntu-drivers devices
# install the recommended driver
sudo ubuntu-drivers autoinstall
Uninstall the old CUDA toolkit:
sudo /usr/local/cuda-10.2/bin/cuda-uninstaller
sudo rm -rf /usr/local/cuda*
Re-install the CUDA toolkit (assuming the NVIDIA CUDA apt repository is configured):
sudo apt install cuda-toolkit-11-1
Tested with torch==1.9.0+cu111 (pip)
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
To test that PyTorch can use the GPU:
import torch
print(torch.cuda.is_available())
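A slightly fuller sanity check (a sketch, assuming at least one CUDA device is visible) that also runs a small operation on the GPU:

```python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    # name of the first visible GPU
    print(torch.cuda.get_device_name(0))
    # run a small matmul on the GPU to confirm it actually works
    x = torch.rand(3, 3, device=device)
    print((x @ x).sum().item())
else:
    print("CUDA is not available to PyTorch")
```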
Tested with tensorflow-gpu 2.6.0 (pip).
But it needs cuDNN 8, which must be downloaded separately from CUDA (it is not included in the cuda-toolkit package).
To test that TensorFlow uses the GPU:
import tensorflow as tf

print(tf.test.gpu_device_name())

tf.debugging.set_log_device_placement(True)
try:
    # Specify an invalid GPU device
    with tf.device('/device:GPU:2'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
except RuntimeError as e:
    print(e)
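To list the GPUs TensorFlow can see (a quick check, with no assumptions beyond the tensorflow-gpu install above):

```python
import tensorflow as tf

# physical GPUs visible to TensorFlow; an empty list means no GPU was detected
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
print("Num GPUs:", len(gpus))
```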
I had to add the following line to my ~/.bashrc so that TensorFlow can find libcudnn.so.8:
echo 'export LD_LIBRARY_PATH=/home/gnoel/Downloads/cuda/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
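To verify that the dynamic loader can now find cuDNN (run in a new shell, after sourcing ~/.bashrc), a small hypothetical check:

```python
import ctypes

# raises OSError if libcudnn.so.8 is not on the library search path
ctypes.CDLL("libcudnn.so.8")
print("libcudnn.so.8 loaded successfully")
```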