Using cuDNN for CTC Loss #7445

sbodenstein · 2017-08-13T19:49:38Z

@piiswrong, @szha: Now that cuDNN 7 supports CTC loss, perhaps we should discard the current GPU implementation in contrib.ctc_loss (adapted from the WarpCTC implementation) and only use cuDNN for GPU? The main reasons:

it requires maintenance effort to ensure the GPU implementation works on new GPU architectures, requiring careful updating of dependencies (like modern gpu).
Users are still reporting problems with memset issues when using the WarpCTC plugin (mx.sym.WarpCTC cuda memcpy or memset failed issue #6121)

I don't think the maintenance effort is worthwhile if almost every single user training with CUDA will have cuDNN.

What are your thoughts?

szha · 2017-08-13T20:04:03Z

Thanks for raising this, @sbodenstein. I'm working on using the cudnn7 implementation of CTC for GPU.

sbodenstein · 2017-08-14T20:49:05Z

@szha: do you agree that we should remove the WarpCTC CUDA implementation?

szha · 2017-08-14T21:01:12Z

@sbodenstein I agree. There is only one catch. It seems that the current WarpCTC supports variable-length inputs whereas cudnn7 only has the intention to support it. To elaborate, the current cudnn7 CTC API for getting workspace size looks like this:

cudnnStatus_t CUDNNWINAPI cudnnGetCTCLossWorkspaceSize(
                                cudnnHandle_t                       handle,
                                const cudnnTensorDescriptor_t       probsDesc,       /* Tensor descriptor for probabilities, the dimensions are T,N,A (T is the timing steps, N is the mini batch size, A is the alphabet size) */
                                const cudnnTensorDescriptor_t       gradientsDesc,   /* Tensor descriptor for gradients, the dimensions are T,N,A. To compute costs only, set it to NULL */
                                const int                          * labels,         /* labels, in CPU memory */
                                const int                          * labelLengths,   /* the length of each label, in CPU memory */
                                const int                          * inputLengths,   /* the lengths of timing steps in each batch, in CPU memory */
                                cudnnCTCLossAlgo_t                  algo,            /* algorithm selected, supported now 0 and 1 */
                                cudnnCTCLossDescriptor_t            ctcLossDesc,
                                size_t                             *sizeInBytes );   /* pointer to the returned workspace size */

However, if I give any *inputLengths that is different from T, it throws an CUDNN_STATUS_BAD_PARAM error. Reading the doc, it has a note saying that this error is thrown if:

‣ The inputLengths do not agree with the first dimension of probsDesc.

...so my guess is that this argument is left there so that variable input lengths will be supported going forward, because asking for a list of Ts doesn't make much sense otherwise.

Back to whether we should remove the WarpCTC implementation, I think we need to first clarify with cudnn team on when the *inputLengths can support variable input lengths, and remove the WarpCTC GPU implementation when that becomes available.

sbodenstein · 2017-08-14T21:13:17Z

Ah, that is annoying! One other limitation I noticed: CUDNN_STATUS_BAD_PARAM is also thrown when

The labelLengths is greater than 256.

whilst WarpCTC:

The CUDA implementation supports a maximum label length of 639 (timesteps are unlimited).

szha · 2017-08-14T21:20:54Z

Good catch. Let me reflect this in the PR as well.

starimpact · 2017-08-16T03:29:40Z

so, the ctc of cudnn7 supports neither variable lengths inputs nor longer labellengths than 256.

sbodenstein · 2017-08-16T13:08:00Z

@szha: one annoying thing about you adding cuDNN support: if you build against cuDNN (which you always usually want to do), you automatically have to use the cuDNN WarpCTC implementation, which you might not want if you want to support variable length inputs.

szha · 2017-08-16T15:37:55Z

Current implementation still includes the WarpCTC implementation in the GPU version and only enables cudnn version when all input requirements are met, since the cudnn version is strictly more limited.

szha · 2017-09-27T20:56:42Z

Unfortunately I have to turn cudnn CTC off because of the API design. I have requested API changes to nv people and hopefully we could incorporate that once they change the API.

sbodenstein · 2017-09-28T12:12:00Z

only enables cudnn version when all input requirements are met, since the cudnn version is strictly more limited.

@szha: why was that not enough? Why do you need to completely turn it off?

szha · 2017-09-28T18:18:05Z

The input length note being supported means that there's no way for me to enforce consistency between CPU and GPU implementation. cudnn version also doesn't reset the padding area gradients to zero, which can cause instability in training. It's too much of a risk to leave it be.

szha · 2017-12-29T00:26:29Z

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

szha · 2018-04-10T17:55:57Z

cudnn integration for ctc requires action from cudnn team in improving the API.

szha mentioned this issue Aug 14, 2017

contrib ctc interface changes, cudnn7 CTC, and gluon CTC #7442

Merged

Soonhwan-Kwon mentioned this issue Nov 3, 2017

about contrib.sym.ctc_loss? ai-adv-lab/deepspeech.mxnet#7

Closed

szha added the needs triage label Dec 29, 2017

szha added Roadmap Operator and removed needs triage labels Apr 10, 2018

szha mentioned this issue Jun 7, 2018

Some mxnet ctc_loss bug & feature request #10995

Closed

jinserk mentioned this issue Jun 15, 2018

RuntimeError: cuda memcpy or memset failed t-vi/warp-ctc#4

Open

tzyll mentioned this issue Feb 17, 2019

Error: compute_ctc_loss, stat = cuda memcpy or memset failed flashlight/wav2letter#207

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using cuDNN for CTC Loss #7445

Using cuDNN for CTC Loss #7445

sbodenstein commented Aug 13, 2017

szha commented Aug 13, 2017

sbodenstein commented Aug 14, 2017

szha commented Aug 14, 2017

sbodenstein commented Aug 14, 2017

szha commented Aug 14, 2017

starimpact commented Aug 16, 2017

sbodenstein commented Aug 16, 2017

szha commented Aug 16, 2017

szha commented Sep 27, 2017

sbodenstein commented Sep 28, 2017

szha commented Sep 28, 2017

szha commented Dec 29, 2017

szha commented Apr 10, 2018

Using cuDNN for CTC Loss #7445

Using cuDNN for CTC Loss #7445

Comments

sbodenstein commented Aug 13, 2017

szha commented Aug 13, 2017

sbodenstein commented Aug 14, 2017

szha commented Aug 14, 2017

sbodenstein commented Aug 14, 2017

szha commented Aug 14, 2017

starimpact commented Aug 16, 2017

sbodenstein commented Aug 16, 2017

szha commented Aug 16, 2017

szha commented Sep 27, 2017

sbodenstein commented Sep 28, 2017

szha commented Sep 28, 2017

szha commented Dec 29, 2017

szha commented Apr 10, 2018