
How to tell if medaka consensus is using cpu or gpu #65

Closed
HenrivdGeest opened this issue Jul 11, 2019 · 5 comments

@HenrivdGeest

We have a CentOS machine with an RTX 2080 Ti (11 GB) installed, plus a quad-core Xeon with 16 GB RAM, running CUDA 10 with driver 418:
NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1
I installed medaka from source using git and the make install command, after changing the requirements to tensorflow-gpu.
I can now run medaka_consensus, but I am wondering whether it is really using the GPU, or still the CPU.
A few things make me doubt it:
I can monitor the load and clock speeds of the video card while it runs. When idle it reports this:

#Time        gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
#HH:MM:SS    Idx     W     C     C     %     %     %     %   MHz   MHz
 11:00:20      0    17    43     -     0     2     0     0   405   300
 11:00:21      0    17    43     -     0     2     0     0   405   300

nvidia-smi shows the memory usage and the programs using the card when idle:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   43C    P8    17W / 300W |     99MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1922      G   /usr/bin/X                                    39MiB |
|    0      2633      G   /usr/bin/gnome-shell                          58MiB |
+-----------------------------------------------------------------------------+

If I fire up the medaka consensus tool for the E. coli example, I see the following on stdout. (The consensus part takes a bit less than 2 minutes, which is much faster than the 49 minutes nanopolish takes, but also much slower than the 7 seconds reported in the benchmark info.)

(medaka) [geest@gt-mapper medaka_walkthrough]$ medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC}
Checking program versions
Program    Version    Required   Pass     
bgzip      1.9        1.9        True     
minimap2   2.11       2.11       True     
samtools   1.9        1.9        True     
tabix      1.9        1.9        True     
Warning: Output consensus already exists, may use old results.
Not aligning basecalls to draft, calls_to_draft.bam exists.
Running medaka consensus
[11:08:19 - Predict] Processing region(s): utg000001c:0-4702069
[11:08:19 - Predict] Setting tensorflow threads to 4.
[11:08:19 - Predict] Processing 5 long region(s) with batching.
[11:08:19 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[11:08:19 - ModelLoad] With cudnn: False
....
[11:08:45 - PWorker] 18.4% Done (0.9/4.7 Mbases) in 24.6s
....
[11:10:02 - PWorker] 100.0% Done (4.7/4.7 Mbases) in 101.8s
....
[11:10:04 - Stitch] Processing utg000001c.

During the run I see that the tool is using some memory (about 150 MiB) on the GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P2    52W / 300W |    264MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1922      G   /usr/bin/X                                    39MiB |
|    0      2189      C   .../geest/bin/medaka-0.8.0/venv/bin/python   155MiB |
|    0      2633      G   /usr/bin/gnome-shell                          57MiB |
+-----------------------------------------------------------------------------+

Also, the load monitor shows a short elevation of clock speed and usage during the run, but only for about 15 seconds before the card appears idle again:

#Time        gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
#HH:MM:SS    Idx     W     C     C     %     %     %     %   MHz   MHz
 11:08:19      0    15    43     -     0     2     0     0   405   300
 11:08:20      0    52    44     -     1     1     0     0  6800  1350
 11:08:21      0    52    44     -     0     0     0     0  6800  1350
 11:08:34      0    53    45     -     0     0     0     0  6800  1350
 11:08:35      0    53    45     -     0     0     0     0  6800  1350
....
 11:08:36      0    34    44     -     0     0     0     0   405   420
 11:08:37      0    17    44     -     0     0     0     0   405   315
 11:08:38      0    16    44     -     0     2     0     0   405   315
 11:08:39      0    16    44     -     0     2     0     0   405   300

So it seems that something is using the GPU, but barely.
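To put a number on "barely", the `sm` (%) column of `nvidia-smi dmon` output like the tables above can be averaged with a small helper. This is a hypothetical, throwaway script (not part of medaka); the column order is assumed to match the dmon header shown above:

```python
def mean_sm_utilisation(dmon_text):
    """Average the `sm` (%) column of `nvidia-smi dmon` output.

    Assumed column order (as in the dmon header above):
    time, gpu, pwr, gtemp, mtemp, sm, mem, enc, dec, mclk, pclk
    """
    values = []
    for line in dmon_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the two '#'-prefixed header lines
        fields = line.split()
        values.append(float(fields[5]))  # sm is the 6th column
    return sum(values) / len(values) if values else 0.0
```

Feeding it the monitoring output above gives an average sm utilisation well under 1%, which matches the impression that the GPU is nearly idle.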
If I force the batch size to something very big (-b 1000), I can make it crash to capture the error message:

[11:14:47 - Sampler] Took 1.56s to make features.
[11:14:47 - Sampler] Pileup for utg000001c:3999000.0-4702068.0 is of width 1485904
Traceback (most recent call last):
....
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10000,1000,128] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu

I read in other reports that the error says "by allocator gpu", but my error says "cpu" (so I guess that I do not have enough system memory).
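For what it's worth, the device the allocator ran out of memory on can be read straight from the error text. A tiny illustrative parser (not part of medaka or TensorFlow), assuming the `/device:XXX:n` form seen in the traceback above:

```python
import re

def oom_device(message):
    """Extract the device (e.g. 'CPU:0' or 'GPU:0') from a TensorFlow
    ResourceExhaustedError message, or None if no device is mentioned."""
    match = re.search(r"/device:([A-Z]+:\d+)", message)
    return match.group(1) if match else None
```

Applied to the traceback above it returns `CPU:0`, i.e. the OOM happened in host memory, not on the card.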

The Python package listing shows both tensorflow and tensorflow-gpu:

ls -l  ~/bin/medaka-0.8.0/venv/lib/python3.6/site-packages/
....
tensorboard
tensorboard-1.14.0.dist-info
tensorflow
tensorflow_estimator
tensorflow_estimator-1.14.0.dist-info
....

Any ideas on this? Is it using the GPU, or did I miss something?
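One direct way to check is to ask TensorFlow itself which devices it can see; in TF 1.x, `tensorflow.python.client.device_lib.list_local_devices()` returns descriptors for all visible devices. A minimal sketch of a check over the resulting device names (the helper itself is hypothetical, and kept TF-free so it stands alone):

```python
def has_gpu(device_names):
    """Return True if any visible TensorFlow device is a GPU.

    `device_names` would come from something like (TF 1.x API, assumed):
        from tensorflow.python.client import device_lib
        names = [d.name for d in device_lib.list_local_devices()]
    A CPU-only TensorFlow typically reports only '/device:CPU:0'.
    """
    return any("GPU" in name for name in device_names)
```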

@cjw85
Member

cjw85 commented Jul 11, 2019

I believe you are correct that the GPU is not being used. A tell-tale sign is this line in the log:

[11:08:19 - ModelLoad] With cudnn: False

indicating that TensorFlow has not created a GPU-optimised model. Do you have cuDNN installed on your machine as well as CUDA 10? It is usually acquired as a separate download.
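As a quick programmatic check, that log line can be scanned for with a throwaway helper (not part of medaka; it assumes the `With cudnn:` log format shown above):

```python
def used_cudnn(log_text):
    """Scan a medaka log for the 'With cudnn:' line.

    Returns True if a cuDNN-optimised model was built, False if not,
    and None if the line is absent from the log.
    """
    for line in log_text.splitlines():
        if "With cudnn:" in line:
            # Split on the last ':' so timestamps earlier in the line are ignored.
            return line.rsplit(":", 1)[1].strip() == "True"
    return None
```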

That being said, I don't think having both tensorflow and tensorflow-gpu installed in your environment is correct. Try uninstalling both and then installing just tensorflow-gpu, something like:

pip uninstall tensorflow-gpu tensorflow
pip install tensorflow-gpu==1.14.0

Or just run make clean install after editing requirements.txt to specify the GPU version.

So a complete example:

git clone git@github.com:nanoporetech/medaka.git
cd medaka
sed -i "s/tensorflow/tensorflow-gpu/" requirements.txt
make install
. venv/bin/activate
cd ..
wget https://s3-eu-west-1.amazonaws.com/ont-research/medaka_walkthrough_no_reads.tar.gz
tar -xzf medaka_walkthrough_no_reads.tar.gz
cd data
medaka_consensus -d draft_assm.fa -i basecalls.fa -t 8 -b 100

Note the change in batch size here. The default used to be appropriate for a GPU with 11 GB (I'm using a 1080Ti), but it looks like something has changed recently that makes the default too big. The relevant part of stdout for the above:

Running medaka consensus
[10:55:18 - Predict] Processing region(s): utg000001c:0-4703280
[10:55:18 - Predict] Setting tensorflow threads to 8.
[10:55:18 - Predict] Processing 5 long region(s) with batching.
[10:55:18 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[10:55:18 - ModelLoad] With cudnn: True
[10:55:18 - ModelLoad] Loading weights from /media/scratch/cwright/medaka_gh65/medaka/venv/lib/python3.5/site-packages/medaka-0.8.0-py3.5-linux-x86_64.egg/medaka/data/r941_min_high_model.hdf5
[10:55:19 - PWorker] Running inference for 4.7M draft bases.
[10:55:19 - Sampler] Initializing sampler for consensus of region utg000001c:0-1000000.
[10:55:20 - Feature] Processed utg000001c:0.0-999999.1 (median depth 76.0)
[10:55:20 - Sampler] Took 1.34s to make features.
[10:55:20 - Sampler] Pileup for utg000001c:0.0-999999.1 is of width 2073349
[10:55:20 - Sampler] Initializing sampler for consensus of region utg000001c:999000-2000000.
[10:55:22 - Feature] Processed utg000001c:999000.0-1999999.0 (median depth 85.0)
[10:55:22 - Sampler] Took 1.62s to make features.
[10:55:22 - Sampler] Pileup for utg000001c:999000.0-1999999.0 is of width 2190799
[10:55:22 - Sampler] Initializing sampler for consensus of region utg000001c:1999000-3000000.
[10:55:23 - Feature] Processed utg000001c:1999000.0-2999999.1 (median depth 89.0)
[10:55:23 - Sampler] Took 1.65s to make features.
[10:55:23 - Sampler] Pileup for utg000001c:1999000.0-2999999.1 is of width 2203455
[10:55:23 - Sampler] Initializing sampler for consensus of region utg000001c:2999000-4000000.
[10:55:25 - Feature] Processed utg000001c:2999000.0-3999999.1 (median depth 89.0)
[10:55:25 - Sampler] Took 1.63s to make features.
[10:55:25 - Sampler] Pileup for utg000001c:2999000.0-3999999.1 is of width 2208176
[10:55:25 - Sampler] Initializing sampler for consensus of region utg000001c:3999000-4703280.
[10:55:26 - Feature] Processed utg000001c:3999000.0-4703279.0 (median depth 80.0)
[10:55:26 - Sampler] Took 1.18s to make features.
[10:55:26 - Sampler] Pileup for utg000001c:3999000.0-4703279.0 is of width 1488424
[10:55:29 - PWorker] 97.3% Done (4.6/4.7 Mbases) in 10.5s
[10:55:30 - PWorker] All done, 0 remainder regions.
[10:55:30 - Predict] Finished processing all regions.
Running medaka stitch
[10:55:32 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[10:55:32 - Stitch] Processing utg000001c.
Polished assembly written to medaka/consensus.fasta, have a nice day.

During which gpustat shows:

(screenshot of gpustat output)

@HenrivdGeest
Author

Thanks. I figured that CUDA 10.1 is not compatible with TensorFlow 1.14.0. I tried installing the tensorflow 2.0 package, but that does not work at all. I will try to get a Docker image for this.

@cjw85
Member

cjw85 commented Jul 19, 2019

Hi @HenrivdGeest,

Have you managed to get something working? My machine has NVIDIA driver 418.67, CUDA 10.1 and cuDNN 7.4.2. I don't think there is a problem with CUDA 10.1 and tensorflow 1.14.0 per se; I think the issue is more likely the version of cuDNN you have installed.

The PyPI tensorflow 1.12.0 package complains when the correct cuDNN library cannot be found, which made things fairly obvious. We tested the behaviour of medaka with the tf 1.14 package in the absence of cuDNN 7.4: it does not complain, and carries on as in your original post.
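As an aside, the installed cuDNN version can be read from the version macros in cudnn.h (commonly found under /usr/local/cuda/include/). A small illustrative parser, assuming the standard CUDNN_MAJOR/CUDNN_MINOR/CUDNN_PATCHLEVEL defines used by cuDNN headers:

```python
import re

def cudnn_version(header_text):
    """Extract (major, minor, patch) from cudnn.h's version defines.

    Returns a tuple such as (7, 4, 2); missing macros come back as None.
    """
    version = {}
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        if match:
            version[macro] = int(match.group(1))
    return (version.get("CUDNN_MAJOR"),
            version.get("CUDNN_MINOR"),
            version.get("CUDNN_PATCHLEVEL"))
```

Usage would be, e.g., `cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())`, to confirm a 7.4.x install.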

Various cuDNN versions can be downloaded from: https://developer.nvidia.com/rdp/cudnn-archive

@cjw85
Member

cjw85 commented Aug 9, 2019

@HenrivdGeest,

We have found a workaround for using medaka with a 2080 GPU which is working for @devindrown. Can you try this if you are still having problems?

@cjw85
Member

cjw85 commented Sep 11, 2019

The latest release, v0.9.0, has additional logging concerning GPU use and tips for RTX users.

@cjw85 cjw85 closed this as completed Sep 11, 2019