How to tell if medaka consensus is using cpu or gpu #65
I believe you are correct that the GPU is not being used correctly. A tell-tale sign is this in the log:
indicating that TensorFlow has not created a GPU-optimised model. Do you have cuDNN installed on your computer as well as CUDA 10? It is usually acquired as a separate download. That being said, I don't think having both tensorflow and tensorflow-gpu installed should by itself cause this.
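As a quick sanity check outside medaka (a sketch, assuming the pypi tensorflow-gpu 1.14 package is the one being imported), you can ask TensorFlow directly which devices it can see; if the CUDA or cuDNN libraries are missing, only CPU devices are listed:

```bash
# Sketch: list the devices TensorFlow can see. With a working CUDA 10 + cuDNN
# install this should include a '/device:GPU:0' entry; a CPU-only list means
# inference will silently fall back to the CPU.
python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"
```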
Or just run it directly. So, a complete example:
Note the change in the batch size here. The default used to be appropriate for a GPU with 11 GB (I'm using a 1080Ti), but it looks like something has changed recently which makes the default too big. The relevant part of stdout for the above:
During which:
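The command and log excerpts from this comment were lost when the page was captured. Purely as an illustration (flag names taken from medaka_consensus's usage; file names and values are hypothetical), a GPU run with a reduced batch size would look roughly like this:

```bash
# Hypothetical example -- the exact command from the thread was not preserved.
# -i basecalled reads, -d draft assembly, -o output directory, -t threads,
# -b inference batch size (reduced here because the default was reported to
# exceed the memory of an 11 GB card).
medaka_consensus -i basecalls.fastq -d draft_assembly.fasta \
    -o medaka_out -t 4 -b 100
```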
Thanks, I figured that CUDA 10.1 is not compatible with tensorflow 1.14.0. I tried installing the tensorflow 2.0 package, but that does not work at all. I will try to get a docker image for this.
Hi @HenrivdGeest, have you managed to get something working? My machine has NVIDIA driver 418.67, CUDA 10.1, and cuDNN 7.4.2. I don't think there is a problem with CUDA 10.1 and tensorflow 1.14.0 per se; I think the issue is more likely the version of cuDNN you have installed. The pypi tensorflow 1.12.0 package complains when the correct cuDNN library cannot be found, which made things fairly obvious. We tested the behaviour of medaka with the tf1.14 package in the absence of cuDNN 7.4: it does not complain and carries on as in your original post. Various cuDNN versions can be downloaded from: https://developer.nvidia.com/rdp/cudnn-archive
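If it helps to confirm what is installed, one common check (not from this thread; the header path varies between installs) is to read the version defines from cudnn.h:

```bash
# Print the cuDNN version defines; adjust the path to wherever cuDNN was
# unpacked (commonly /usr/include or the CUDA toolkit's include directory).
grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h
```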
We have found a workaround for using medaka with RTX cards.
The latest release, v0.9.0, has additional logging concerning GPU use and tips for RTX users.
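The RTX tips themselves are not reproduced in this thread. A commonly used mitigation for RTX-series cards on TensorFlow 1.14, given here only as an assumption and not as medaka's documented fix, is to let TensorFlow grow its GPU memory allocation on demand:

```bash
# Assumption, not confirmed by this thread: TensorFlow 1.14 honours this
# environment variable and allocates GPU memory on demand instead of reserving
# nearly all of it at start-up, which avoids some initialisation failures on
# RTX cards.
export TF_FORCE_GPU_ALLOW_GROWTH=true
medaka_consensus -i basecalls.fastq -d draft_assembly.fasta -o medaka_out -t 4
```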
We have a CentOS machine with an RTX 2080 Ti (11 GB) installed, plus a quad-core Xeon with 16 GB RAM.
CUDA 10 and driver 418:
NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1
I installed medaka from source using git pull and make install, and changed the requirements to tensorflow-gpu.
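For context, that install route amounts to something like the sketch below (the repository URL is medaka's; the exact requirements edit is an assumption about how the file is laid out):

```bash
# Sketch of a from-source install using the GPU TensorFlow build.
git clone https://github.com/nanoporetech/medaka.git
cd medaka
# Swap the pinned CPU package for the GPU one; assumes requirements.txt
# contains a line such as 'tensorflow==1.14.0'.
sed -i 's/^tensorflow==/tensorflow-gpu==/' requirements.txt
make install
```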
I can now run medaka consensus, but I am wondering whether it is really using the GPU or still the CPU.
A few things make me doubt it:
I can monitor the load and clock speeds of the video card while it runs. When idle, it reports this:
nvidia-smi shows the memory load/programs using the card when idle:
If I fire up the medaka consensus tool on the E. coli example, I see the following in stdout (the consensus part takes a bit less than 2 minutes, which is much faster than nanopolish's 49 minutes, but much slower than the 7 seconds in the benchmark info):
While it is running, I see that the tool is using a small amount (about 150 MB) of memory on the GPU:
Also, the load monitor shows some short elevation of clock speed and usage during the run, but only for about 15 seconds before it seems idle again:
So somehow it seems that something is using the GPU, but it is barely using it (see the monitoring sketch below).
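A simple way to make those spot checks continuous (standard nvidia-smi options, nothing medaka-specific):

```bash
# Poll GPU utilisation and memory once per second while medaka runs; sustained
# non-zero utilisation means the card is genuinely doing the inference work.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```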
If I force the batch size to something very big (-b 1000), I can make it crash and capture the error message:
I read in other reports that the message says 'by allocator GPU', but my error says 'CPU' (so I guess I do not have enough system memory).
The Python package listing shows both tensorflow and tensorflow-gpu:
Any ideas on this? Is it using the GPU, or did I miss something?
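For anyone with the same doubt, a quick check (not from this thread) of which TensorFlow build actually gets imported when both packages are present:

```bash
# 'tensorflow' and 'tensorflow-gpu' share the same import name, so only one of
# them is actually loaded; this prints the version in use and whether that
# build was compiled with CUDA support.
python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())"
```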