
ctc_loss with large alphabet size raises CUDA error #12493

Closed
hallazie opened this issue Sep 10, 2018 · 5 comments


## Environment:

Python 2.7 / Windows 7 64-bit
MXNet 1.2.0
NVIDIA driver version 397.31

## Error Message:

[15:41:28] G:\deeplearn\mxnet\dmlc-core\include\dmlc/logging.h:308: [15:41:28] g:\deeplearn\mxnet\mshadow\mshadow./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unknown error
[15:41:28] G:\deeplearn\mxnet\dmlc-core\include\dmlc/logging.h:308: [15:41:28] g:\deeplearn\mxnet\src\engine./threaded_engine.h:370: [15:41:28] g:\deeplearn\mxnet\mshadow\mshadow./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unknown error

## Minimum reproducible example

import mxnet as mx
import numpy as np

ctx = mx.gpu(0)
alphabet_size = 3000  # works with 200, crashes with 3000

# Symbolic graph: contrib CTC loss wrapped in MakeLoss so backward() is defined
in_var = mx.sym.Variable('data')
labels_var = mx.sym.Variable('label')
ctc = mx.sym.contrib.ctc_loss(in_var, labels_var)
loss = mx.symbol.MakeLoss(ctc)

# data: (sequence_length, batch_size, alphabet_size), label: (batch_size, label_length)
arg_shapes, _, _ = loss.infer_shape(data=(6, 2, alphabet_size), label=(2, 3))
arg_array = [mx.nd.normal(shape=shape, ctx=ctx) for shape in arg_shapes]

exe = loss.bind(ctx=ctx, args=arg_array)
exe.forward(is_train=True)
exe.backward()
outTest = exe.outputs[0]
print '%s' % (outTest.asnumpy())

When alphabet_size=200 the code runs fine; with alphabet_size=3000 (needed for a Chinese OCR task) it crashes with the error above.
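
For quicker testing, the same check can be run through the imperative NDArray API. This is only a minimal sketch, assuming mx.nd.contrib.ctc_loss is exposed in the build under test; the try_ctc helper name is just for illustration, and the shapes match the symbolic example above:

import mxnet as mx

def try_ctc(alphabet_size, ctx=mx.gpu(0), seq_len=6, batch=2, label_len=3):
    # Same shapes as the symbolic repro: data (seq_len, batch, alphabet_size), label (batch, label_len)
    data = mx.nd.normal(shape=(seq_len, batch, alphabet_size), ctx=ctx)
    label = mx.nd.uniform(low=1, high=alphabet_size, shape=(batch, label_len), ctx=ctx)
    out = mx.nd.contrib.ctc_loss(data, label)
    out.wait_to_read()  # force the asynchronous GPU kernel to run now
    return out.asnumpy()

print try_ctc(200)   # fine
print try_ctc(3000)  # hits the CUDA error above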

@vrakesh (Contributor) commented Sep 10, 2018

@hallazie Thank you for reporting the issue. We will look into it.
@mxnet-label-bot [CUDA, Operator]

@apeforest (Contributor) commented Sep 10, 2018

@hallazie This issue seems to have been resolved by a recent PR: #11834
I cannot reproduce the error using the latest master build on CUDA 9.0. Could you please verify and let me know if you still see the issue? Thanks!

@hallazie (Author)

> @hallazie This issue seems to have been resolved by a recent PR: #11834
> I cannot reproduce the error using the latest master build on CUDA 9.0. Could you please verify and let me know if you still see the issue? Thanks!

I'm trying to build the master branch from source but have run into some problems. I'll let you know once I've verified it.

@apeforest (Contributor) commented Sep 12, 2018

@hallazie Please let me know if you run into a specific installation issue. We'd love to help. Ideally, you should be able to install it on Windows with a single pip command: $ pip install mxnet-cu92 --pre
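
Once it is installed, a quick sanity check might look like the sketch below (just an illustration; the exact version string depends on the pre-release build you pull):

import mxnet as mx

print mx.__version__                     # should report the pre-release build just installed
print mx.nd.ones((2, 2), ctx=mx.gpu(0))  # fails immediately if the CUDA build is not working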

@hallazie (Author)

> @hallazie Please let me know if you run into a specific installation issue. We'd love to help. Ideally, you should be able to install it on Windows with a single pip command: $ pip install mxnet-cu92 --pre

The issue is solved by installing a newer release: pip install mxnet-cu91==1.2.0. Thanks for the help. :D
