Slow Training GPU RTX 2080 #700
Try without Anaconda and see if the training speed improves. Although this repo works with Anaconda, we don't have enough developer interest to support it.
Will reopen the issue if slow training speed is confirmed with a standard Python installation.
Do you have cuDNN installed?
I have version 10.0 of cuDNN installed, yes.
cuDNN 8.1 for CUDA v10.2.
Use cuDNN 7.5.
Same result with r=2, ~42 steps.
@MGSousa did it work out for you?
@Rainer2465 I changed outputs/step to four (r=4, ~91) and got somewhat decent results for now.
@MGSousa you're Brazilian, right? Can we talk a little more about it?
Nvidia provides a benchmark for Tacotron 2. Does it make sense to run it and compare the results with what we get with Real-Time-Voice-Cloning during training, or would that be comparing apples with oranges? The goal is to check whether the poor performance comes from the drivers/CUDA, from the code, or from Anaconda.
Experiment / expected behaviour:
The command I would like to use:
The repo includes a profiler which is used for encoder training. You can try something similar to find the bottleneck for synthesizer training.
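The exact interface of that profiler may differ from this, but the idea is a small timer that attributes wall-clock time to each phase of a training step. A minimal hand-rolled sketch (not the repo's actual profiler; the phase names are just placeholders):

```python
import time
import torch

class StepProfiler:
    """Tiny stand-in for the repo's profiler: accumulates wall-clock time per named phase."""

    def __init__(self):
        self.totals = {}
        self._last = None

    def tick(self, name=None):
        # Synchronize so pending CUDA kernels are charged to the phase that launched them.
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        now = time.perf_counter()
        if name is not None and self._last is not None:
            self.totals[name] = self.totals.get(name, 0.0) + (now - self._last)
        self._last = now

    def report(self, steps):
        for name, total in self.totals.items():
            print(f"{name}: {1000 * total / steps:.1f} ms/step")
```

In a training loop you would call tick() once before the step, then tick("data"), tick("forward"), tick("backward") and tick("optimizer") after each phase, and report(num_steps) at the end; the phase with the largest ms/step is the bottleneck.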
Originally posted by @Ca-ressemble-a-du-fake in #711 (comment)

With the whole repo in an unmodified state (including synthesizer hparams), I get 0.72-0.74 steps/s. Switching to a Tacotron 2 implementation on TensorFlow 1.x, training was faster. Because I wanted to understand whether this difference was caused by Taco1 vs. Taco2, or PyTorch vs. TensorFlow, I also ran a training experiment with a PyTorch Taco2 implementation. Results below.
We can conclude that PyTorch trains more slowly than TensorFlow 1.x, and that some other unknown issue is making your training even slower.
This is a nice benchmark. Which command did you use to measure GPU utilization? And which dataset did you use? Should an RTX 3070 be faster than a GTX 1660S on the same dataset, or is the rate independent of the dataset (mine was measured on a French dataset)? So can we say that around 1 step/s is OK for r = 2 and batch_size = 12? Should the results be the same when training vanilla Tacotron 2 on the same dataset? I only know basic Python (e.g. basic string manipulation, basic external program calls, ...). How do you use the profiler you mentioned earlier? I could not find any reference to it.
I made a branch for this: https://github.com/blue-fish/Real-Time-Voice-Cloning/tree/700_slow_training

Example training output with profiler:
Thanks a lot for these precise instructions! The forward and backward passes take longer than what you showed in your comment two days ago. Otherwise, if your comment just above refers to a GTX 1660S, then my profiler results look good, since an RTX 3070 should be faster.

{| Epoch: 1/61 (30/748) | Loss: 0.3490 | 1.2 steps/s | Step: 115k | }

Average execution time over 10 steps: (sorry, I did not manage to format the table as you did!)
I thought your training rate was 0.52-0.54 steps/s; did something change? Your profiler results look reasonable to me.
It was indeed (for r = 2), but then I applied gradual training: r = 16 until 20k steps, then 8 until 40k, then 7 until 80k, then 5 until 160k, and finally 2 until the end. Batch size is kept constant at 12. Does it still look reasonable to you even though r has increased? In 10k steps, r will switch back to 2, so I'll post the profiler results then for comparison. GPU utilization shows roughly 40% (range 20-60%) and memory usage 5.7/7.8 GB (mainly used by python3 with 5.3 GB), so there may be room for improvement. Maybe by increasing the batch size?
Now that r has decreased to 2 with a batch size of 12, the training rate is back to 0.52-0.54 steps/s. The backward pass takes around 1 s.

{| Epoch: 2/208 (272/748) | Loss: 0.3196 | 0.56 steps/s | Step: 166k | }

Average execution time over 10 steps:
Blocking, waiting for batch (threaded) (10/10): mean: 0ms std: 0ms

GPU utilization is still around 40% (range 20-60%) and memory usage has increased to 6.7 GB (for training only).
What is the GPU performance state reported by nvidia-smi?
If the GPU is running at P2, it doesn't seem like the GPU is the bottleneck. I am running out of ideas.
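For reference, the performance state and utilization that nvidia-smi reports can also be logged from Python with the NVML bindings. A minimal sketch, assuming the pynvml package is installed (it is not part of this repo):

```python
# Query the first GPU's performance state, utilization and memory via NVML.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
pstate = pynvml.nvmlDeviceGetPerformanceState(handle)    # 0 = P0 (max performance), higher = lower clocks
util = pynvml.nvmlDeviceGetUtilizationRates(handle)      # percentages over the last sampling period
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)             # bytes
print(f"P{pstate} | GPU {util.gpu}% | VRAM {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
pynvml.nvmlShutdown()
```

Calling this periodically from a separate terminal while training gives the same numbers as watching nvidia-smi.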
The Nvidia driver is 470 (I tried several; this one is the proprietary one). CUDA is now 11.5 (I also tried 11.3).
I tried to run the PyTorch bottleneck profiler as advised on the PyTorch forum, but I could not modify the code correctly.
I also tried to increase batch_size to 20 and quickly got a CUDA out of memory error.
Increasing batch_size to 16 did not cause the CUDA out of memory error, but it did not noticeably increase GPU utilization either. VRAM usage increased to 6.9 GB.
**Training schedule update**

First, update the training schedule in synthesizer/hparams.py so it only runs 20 steps from scratch.
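For illustration only: assuming the schedule is a list of (r, learning rate, stop step, batch size) tuples, as the r/step/batch-size discussion above suggests (the learning-rate field and the exact values are my assumptions), a 20-step run could look like this; check the real defaults in synthesizer/hparams.py before editing.

```python
# synthesizer/hparams.py -- illustrative sketch, not the repo's actual defaults.
# Assumed tuple layout: (reduction factor r, learning rate, train until this step, batch size).
tts_schedule = [
    (7, 1e-3, 20, 12),   # stop at step 20 so the profiling run stays short
]
```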
**Dataloader update**

Next, you will need to change this line in synthesizer/train.py (line 150 at commit 7432046):
**Command**

Train a new model from scratch. You can call it anything; I called mine "test".
**Output**

Click here to display profiler output
Thank you @blue-fish for your guide! Unfortunately, the computer runs out of RAM after completing the second stage of the profiling (the one that involves autograd). Neither the GPU nor the CPU was more loaded than usual, but after the second stage completed, RAM (and then swap) usage skyrocketed and the computer became unusable. I tried with 20, 10, and even 5 steps; all failed because the 12 GB of RAM were depleted. When the computer becomes usable again, I will try to profile only 2 steps.
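If the full torch.utils.bottleneck run exhausts RAM, one alternative (my suggestion, not something from this repo) is to use torch.profiler directly and record only a couple of steps. A self-contained sketch with a toy model standing in for the real training step:

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile, schedule

# Toy model and step so the sketch runs on its own; the repo's Tacotron step would go here.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(80, 80).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_one_step():
    x = torch.randn(12, 80, device=device)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Wait 1 step, warm up 1 step, record only 2 steps: keeps the profiler's memory use small.
activities = [ProfilerActivity.CPU] + ([ProfilerActivity.CUDA] if device == "cuda" else [])
with profile(activities=activities, schedule=schedule(wait=1, warmup=1, active=2, repeat=1)) as prof:
    for _ in range(5):
        train_one_step()
        prof.step()   # advance the profiler schedule after every step

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=15))
```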
So I could only profile the training for a single step. The main differences I see compared with your results are:
But the profiling covers only one step. I don't know how you manage to format comments so well; I was not successful in doing so. The detailed results are barely legible, so I cannot post them.

**Environment summary**

PyTorch 1.10.0+cu113 DEBUG compiled w/ CUDA 11.3
**cProfile output**
What should I look for / watch out for in the results?
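One way to start (standard library only, nothing specific to this repo) is to load the dump with pstats and sort it, first by cumulative time to see which high-level calls dominate a step, then by self time to see where the work is actually done. A sketch, assuming the profile was saved to a hypothetical profile.out:

```python
import pstats

# "profile.out" is a placeholder; use whatever path the cProfile run actually wrote.
stats = pstats.Stats("profile.out")
stats.strip_dirs().sort_stats("cumulative").print_stats(20)  # top 20 callers by cumulative time
stats.sort_stats("tottime").print_stats(20)                  # top 20 functions by time spent in themselves
```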
Does your motherboard support PCI Express 3.0? If it only supports PCIe 2.0, then that is likely the bottleneck. Some motherboards are manufactured with slots that fit an x16 GPU but don't include all 16 PCIe lanes for communication.
@MGSousa @StElysse @Ca-ressemble-a-du-fake For now, I am going to assume that "slow training" is caused by hardware being old or having limitations that prevent the GPU from operating at its full potential. If this is not the case, please provide hardware details including:
@blue-fish thanks for your support. You are right according to what I could check, so old hardware may indeed be the bottleneck for the GPU, although CPU usage stays around 12% while training and RAM usage is around 65%.
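As a cross-check (my suggestion, not a tool mentioned above), the PCIe link the GPU actually negotiates can be read with the NVML Python bindings. A minimal sketch, assuming pynvml is installed:

```python
# Report the current vs. maximum PCIe link generation/width of the first GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
cur_gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
cur_width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
max_gen = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(handle)
max_width = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)
print(f"PCIe link: gen {cur_gen} x{cur_width} (GPU supports up to gen {max_gen} x{max_width})")
pynvml.nvmlShutdown()
```

If the current generation or width is well below what the GPU supports, the motherboard or slot is likely limiting transfer bandwidth.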
Just a follow-up on this topic. I tried to train the model with the SIWIS dataset (French) on Google Colab. While training with a batch size of 32 and r = 7, I get roughly 3 times the training speed of my old setup (which is old except for the GPU). I am quite surprised it is not way faster on such a high-end GPU. Is there a way to know whether the code is doing 16-bit or 32-bit floating point operations? I read somewhere that one of the modes brings a performance boost, but for now that is way beyond my current understanding of the topic.
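On the fp16/fp32 question: I don't believe the synthesizer enables mixed precision by default (that is an assumption on my part), but the usual way to try it in PyTorch is torch.cuda.amp. A self-contained sketch of the pattern, with a toy linear model standing in for the repo's Tacotron model:

```python
import torch
from torch import nn

# Toy stand-ins so the sketch runs on its own; the real model, dataloader and loss go here.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(80, 80).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling so fp16 gradients don't underflow

for _ in range(10):                                   # a few dummy steps
    x = torch.randn(12, 80, device=device)            # fake batch
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = loss_fn(model(x), x)                   # forward pass runs in fp16 where it is safe
    scaler.scale(loss).backward()                     # backward on the scaled loss
    scaler.step(optimizer)                            # unscales gradients, then steps the optimizer
    scaler.update()                                   # adjust the scale factor for the next step
```

Whether this actually speeds up this particular model would have to be measured; on recent GPUs the gain usually comes from tensor cores handling the fp16 matrix multiplies.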
@MGSousa @Rainer2465 do you have a trained model today? Can we talk?
Hi,

I am training a dataset in Portuguese, but the process is very slow using CUDA with the default hparams. In this test I'm using batch_size = 8.

Expected behaviour: the training should run 2~3 times faster.

Has anyone had this problem on Windows with Anaconda?