could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #70
Medaka v0.8.x will not work with tensorflow(-gpu) 1.12; you must use 1.14. Tensorflow 2.0.0 is untested and not recommended. That said, this is not the cause of your errors above. I would surmise one of two things is happening.
See #65 (comment)
Thanks for the rapid reply. I appreciate the help troubleshooting the install environment. I'm pretty sure that tensorflow-gpu is installed in the environment: after I compiled and installed medaka from source, I tried to uninstall tensorflow, and the system confirmed that it is not installed and that tensorflow-gpu is.
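As a sanity check, querying the import directly also confirms the GPU build (a minimal snippet, assuming tensorflow 1.x where `tf.test.is_built_with_cuda` is available):

```python
import tensorflow as tf

print(tf.__version__)                # should report 1.14.x
print(tf.test.is_built_with_cuda())  # True for a tensorflow-gpu build
```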
I was running an incompatible cudnn version (v7.6), so I have downgraded to v7.4. I ran a test case and can confirm that cudnn itself is functional. Unfortunately, when I run medaka I'm still seeing the same behavior as before: medaka is crashing.
You mentioned logging. Other than the stdout, where should I look for the logs?
To see this, run medaka consensus with the debug logging:
On my computer this gives:
Looking at our code, that log message only means that a CuDNN-accelerated model is going to be built (because tensorflow has detected a CUDA GPU), not that the CuDNN libraries have been successfully loaded.
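To separate those two conditions, something like the following can be used (a minimal sketch, assuming tensorflow 1.14 where the cuDNN layers live under `tf.keras.layers`; the layer and input sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Condition 1: tensorflow can see a CUDA GPU.
print(tf.test.is_gpu_available(cuda_only=True))

# Condition 2: cuDNN can actually be initialised. Running a tiny CuDNNGRU
# layer forces tensorflow to create a cuDNN handle, so the
# CUDNN_STATUS_INTERNAL_ERROR above would surface right here.
inputs = tf.keras.layers.Input(shape=(10, 4))
outputs = tf.keras.layers.CuDNNGRU(8)(inputs)
model = tf.keras.models.Model(inputs, outputs)
model.predict(np.zeros((1, 10, 4), dtype=np.float32))
print("cuDNN handle created successfully")
```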
I've done a little research: are you using an RTX series card? I do not have access to these right now. This issue (which is perhaps where you came across the gpu_options.allow_growth suggestion) reports the same crash; the workaround there was to reduce the batch size. That the default value for this parameter is inappropriate for most GPUs is a known bug. For tensorflow 1.12.0 the default value was fine, and there are tensorflow github issues noting a similar memory-use increase. We intend to reduce the default value.
Yes, I've got an RTX card: I'm running on a GeForce RTX 2080. I've tried reducing the batch size, even as low as 1, and each time it still produces the same error. Running medaka in debug mode produces:
Just before the crash, I see this in the debug output:
I believe you can enable the behaviour of gpu_options.allow_growth = True with an environment variable; in tensorflow 1.14 setting `TF_FORCE_GPU_ALLOW_GROWTH=true` has the same effect. Have you tried this? When I tested this it resulted in having to reduce the batch size further. An alternative would be to force medaka to not use the cuDNN library. To do this you will have to get your hands ever so slightly dirty with the code: at the line where the model is built, switch the layer selection from the CuDNN-accelerated implementation to the standard one (see the sketch below). I'm sorry I cannot provide better guidance at this time. I will discuss this further with some contacts and find out if there are better solutions.
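A sketch of that swap, with hypothetical names — the actual line, layer size, and arguments in medaka's model-building code will differ:

```python
from tensorflow.keras.layers import GRU, CuDNNGRU  # CuDNNGRU shown for contrast

gru_size = 128  # hypothetical; medaka's real model defines its own size

# Original: cuDNN-accelerated layer, requires a working cuDNN install.
# gru = CuDNNGRU(gru_size, return_sequences=True)

# Replacement: the standard GRU kernel, which never creates a cuDNN handle.
gru = GRU(gru_size, return_sequences=True)
```

Note that the two layer types are not drop-in weight-compatible in every configuration, so treat this as a sketch of the idea rather than a tested patch.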
Changing the environment variable as you suggest FIXES the crashing behavior. I had to decrease the batch size to 50 to stop the OOM errors.
You've been very helpful and certainly clarified this for setting up my next box.
I set up Medaka v0.8.1 to run with GPU, but it crashes consistently with this error during runtime:
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I'm seeing references to
gpu_options.allow_growth = True
online, but I'm not sure how that would be implemented with this code (see the sketch below).
System:
Ubuntu 18.04
Cuda 10.1
tensorflow-gpu 1.12 (also tried 1.14 and 2.0.0-beta1)
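The pattern I've seen suggested looks like this (a generic tensorflow 1.x sketch, not medaka's actual code — I'm not sure where in medaka it would need to go):

```python
import tensorflow as tf
from tensorflow.keras import backend as K

# TF 1.x pattern: install a session whose GPU options request on-demand
# memory allocation, before any model is built.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```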