Hello, I ran into the "Not Enough GPU memory" problem after I solved the problem of JAX not recognizing the GPU device.
The following describes the sequence of errors.
I installed LocalColabFold using "install_colabbatch_linux.sh". When I ran "colabfold_batch", it reported "no GPU detected, will be using CPU". I then checked whether JAX could recognize the GPU device (refer to #209).
$HOME/software/localcolabfold/colabfold-conda/bin/python3.10
# Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
>>> import jax
>>> print(jax.local_devices()[0].platform)
# CUDA backend failed to initialize: Unable to load cuDNN. Is it installed? (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
# cpu
I downgraded JAX to "jax==0.4.7, jaxlib==0.4.7+cuda11.cudnn86" (refer to #209). After that, JAX could recognize the GPU device.
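For reference, the downgrade was done with pip inside the LocalColabFold conda environment, roughly as sketched below; the wheel index URL is my assumption of where the CUDA jaxlib wheels come from.
# Downgrade jax/jaxlib inside the LocalColabFold conda environment
# (the -f index URL is an assumption; CUDA builds of jaxlib are normally
#  served from Google's jax_cuda_releases page)
$ "$HOME/software/localcolabfold/colabfold-conda/bin/pip" install \
    "jax==0.4.7" "jaxlib==0.4.7+cuda11.cudnn86" \
    -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html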
$HOME/software/localcolabfold/colabfold-conda/bin/python3.10
# Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
>>> import jax
>>> print(jax.local_devices()[0].platform)
# gpu
Next, colabfold_batch failed with "No module named 'jax.extend'" (refer to #224), so I reinstalled "dm-haiku==0.0.10". After that, colabfold_batch could run on the GPU. However, I hit a new error: "Could not predict HNUJ.ctg90.87. Not Enough GPU memory? FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.".
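The dm-haiku reinstall was just a version pin with the same pip binary; a minimal sketch, assuming the same conda environment as above:
# Pin dm-haiku to a release that still works with jax 0.4.7
# (version taken from the report above)
$ "$HOME/software/localcolabfold/colabfold-conda/bin/pip" install "dm-haiku==0.0.10"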
I have two RTX 2080 Ti GPUs (11 GB each).
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01 Driver Version: 535.113.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:17:00.0 Off | N/A |
| 38% 41C P0 52W / 250W | 0MiB / 11264MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 2080 Ti Off | 00000000:25:00.0 Off | N/A |
| 25% 30C P0 21W / 250W | 0MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
My FASTA file contains 450 amino acids. Is this problem caused by insufficient GPU memory? It seems the same problem has been reported even with 40 GB of GPU memory (refer to #90).
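As a side note, these are the unified-memory environment variables that are often suggested for GPU out-of-memory errors; whether they help here, and whether colabfold_batch already sets them internally, is an assumption on my part (input.fasta and output_dir below are placeholders):
# Assumption: allowing XLA to spill GPU memory into host RAM may work around
# out-of-memory failures on an 11 GB card
$ export TF_FORCE_UNIFIED_MEMORY=1
$ export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0
$ colabfold_batch input.fasta output_dir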
In addition, since I had added CUDA 12.1 to my $PATH, I also tried modifying "install_colabbatch_linux.sh" as suggested by A-Talavera (refer to #210).
I changed "$COLABFOLDDIR/colabfold-conda/bin/pip" install --upgrade "jax[cuda11_pip]==0.4.23"
to "$COLABFOLDDIR/colabfold-conda/bin/pip" install --upgrade "jax[cuda12_pip]==0.4.23"
With this change, jaxlib-0.4.23+cuda12.cudnn89 is installed by default. I then downgraded to "jaxlib-0.4.7+cuda12.cudnn88" following the same procedure as above. colabfold_batch runs on the GPU, but it still reports "Could not predict HNUJ.ctg90.87. Not Enough GPU memory? FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details". In #224 you mentioned that "jax-0.4.23+cuda11.cudnn86" also worked with CUDA 12.1.
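A quick sanity check to confirm which jax/jaxlib build is actually active in the conda environment (just the standard version attributes, nothing ColabFold-specific):
# Print the jax and jaxlib versions seen by the LocalColabFold Python
$ "$HOME/software/localcolabfold/colabfold-conda/bin/python3.10" -c "import jax, jaxlib; print(jax.__version__, jaxlib.__version__)"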
$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
CUDA version (output of /usr/local/cuda/bin/nvcc --version):
$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
Since LocalColabFold requires CUDA 11.8+, I added CUDA 12.1 to the environment variable $PATH.
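Concretely, "added to $PATH" means something like the lines below in my shell profile; the LD_LIBRARY_PATH line is an assumption, added in case the CUDA runtime libraries are looked up there:
# Put the CUDA 12.1 toolchain first on PATH
$ export PATH=/usr/local/cuda-12.1/bin:$PATH
# Assumption: JAX/XLA may also need the matching runtime libraries on this path
$ export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH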
$ which nvcc
/usr/local/cuda-12.1/bin/nvcc
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
Could the problem be that the program uses CUDA 11.3 (/usr/local/cuda/bin/nvcc) by default, instead of the CUDA 12.1 on $PATH?
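If so, would pointing XLA explicitly at the CUDA 12.1 installation help? A sketch of what I would try; that --xla_gpu_cuda_data_dir is the right knob here is my assumption:
# Assumption: XLA_FLAGS can redirect XLA to a specific CUDA toolkit directory
# instead of the /usr/local/cuda symlink (which points at 11.3 on this machine)
$ export XLA_FLAGS="--xla_gpu_cuda_data_dir=/usr/local/cuda-12.1"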
Looking forward to your reply. Thank you.
Yulong Li