Getting Caffe to use the GPU #147
Comments
Any ideas @cypof?
It would be useful to put traces in common.cpp, where the thread-local instance of Caffe is created. They will show whether the GPU context is created correctly. Also, Caffe.set_mode might need to be called earlier, before the creation of the FloatNet.
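Because the Caffe instance is thread-local, a mode set on one thread is not seen by a different thread that later creates and runs the net. The Java sketch below is a hypothetical analogue only, not the JavaCPP API: `ThreadLocalModeDemo`, `Mode`, `setMode` and `modeOnNewThread` are made-up names, and a plain `ThreadLocal` stands in for the native per-thread instance in common.cpp.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalModeDemo {
    // Hypothetical stand-ins for Caffe's CPU/GPU mode constants.
    public enum Mode { CPU, GPU }

    // Hypothetical analogue of Caffe's thread-local instance: every thread
    // lazily gets its own value, which starts out in CPU mode.
    private static final ThreadLocal<Mode> MODE = ThreadLocal.withInitial(() -> Mode.CPU);

    public static void setMode(Mode m) { MODE.set(m); }

    public static Mode mode() { return MODE.get(); }

    // Runs on a fresh thread (standing in for the thread that creates and runs
    // the net) and reports the mode that thread sees. If setGpuFirst is true,
    // the mode is set on that same thread before it is read.
    public static Mode modeOnNewThread(boolean setGpuFirst) {
        AtomicReference<Mode> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            if (setGpuFirst) {
                setMode(Mode.GPU);
            }
            seen.set(mode());
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to `seen` visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        setMode(Mode.GPU); // set on the main thread only
        System.out.println("main thread:  " + mode());
        System.out.println("other thread: " + modeOnNewThread(false));
        System.out.println("set there:    " + modeOnNewThread(true));
    }
}
```

Running `main` prints GPU, CPU, GPU: the second thread never sees the main thread's setting. If the real code follows the same pattern, calling Caffe.set_mode(Caffe.GPU) on the thread that builds the FloatNet, right before building it, is what this analogy suggests.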
Thanks @saudet and @cypof! We tried two things.
We get the same error when we run the
It also crashes in the same way here. It wasn't like that before. I wonder what changed...
It worked with both master branches a couple of months ago. Do you know a way to debug C code called from inside the JVM? It can be done by launching the JVM from a C program, using jni.h, so that gdb can be attached. Let me know if you are all set.
Thanks for the input, guys! I've tried it in GDB, and it crashes in Boost, on a null-pointer assert while trying to dereference a shared_ptr from the RNG module. It's really strange. It has nothing to do with CUDA or Java. I remember it working with CUDA 7.0, but I'm not sure that's related. CUDA 7.5 has been out for a while already...
Ah! Thanks a lot, guys, that explains why I couldn't find a version of caffe/javacpp that worked. I didn't suspect that CUDA 7.5 was the problem! Trying 7.0 now...
Unfortunately, I observed the same crash with CUDA 7.0. I set up the .jar like this: https://github.com/amplab/SparkNet/blob/0cbff42ec8e072be215da9f302b3ad55f96e5679/doc/creating-jars.md I can also try an older version of Boost and see if that fixes the problem.
Thanks for testing! I didn't think it was related to the CUDA version either, and I don't think it's Boost either, but it's easy enough to test, so why not. If I had a bit of time, though, the next thing I'd try to do is to convert the
Thanks a lot for your help! The approach I'm trying is a bisection, to see if I can pinpoint where the problem was introduced. For this to work, I need a configuration where it doesn't crash. The oldest configuration I managed to build was commit 5070834 with caffe/javacpp-presets from the same day, which still seems to crash (that was with CUDA 7.5, however). I'll keep you updated on how it goes.
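The reason a non-crashing configuration is needed first: bisection keeps one known-good and one known-bad point and halves the range between them, so with n candidate commits it needs roughly log2(n) builds. A minimal sketch of the search (the same idea `git bisect` automates), where the commit list and the `crashes` predicate are hypothetical placeholders for "build this commit with matching presets and run the example":

```java
import java.util.List;
import java.util.function.Predicate;

public class Bisect {
    /**
     * Returns the index of the first "bad" commit in an ordered history,
     * assuming commits.get(0) is known good, the last one is known bad,
     * and every commit after the first bad one is also bad.
     */
    public static <T> int firstBad(List<T> commits, Predicate<T> isBad) {
        if (commits.size() < 2) {
            throw new IllegalArgumentException("need a known-good and a known-bad commit");
        }
        int good = 0;                 // invariant: commits[good] is good
        int bad = commits.size() - 1; // invariant: commits[bad] is bad
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(commits.get(mid))) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical history in which the regression appears at "c5".
        List<String> history = List.of("c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7");
        Predicate<String> crashes = c -> Integer.parseInt(c.substring(1)) >= 5;
        int idx = firstBad(history, crashes);
        System.out.println("first bad commit: " + history.get(idx));
    }
}
```

If even the oldest buildable commit crashes, the precondition fails and the search cannot start, which matches the situation described above.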
Yes, I understand, but I'm pretty sure it used to work with the release of JavaCPP 1.1. Since then, OpenJDK and the kernel have been updated a few times, and who knows what at this point.
This error happened for me after updating Caffe, but before I updated to JavaCPP 1.1 (that's why I tried updating).
@bfoust So it's something that changed in Caffe itself?
@saudet Not yet verified; that is, I haven't reverted Caffe yet to test that. I'm also on a different machine, so this could well be just a g++ version mismatch (32-/64-bit). I currently suspect JavaCPP is using a different version of g++ than Caffe, as Caffe would not compile without the previous version of gcc/g++ (3.8).
As a small experiment, I just tried renaming the
OK, I've found the issue! We just have to remove the
Awesome! We'll give this a try!
It worked for us. Thanks a lot!
I didn't see "CPU_ONLY" removed in your "caffe/src/main/java/org/bytedeco/javacpp/presets/caffe.java", so I tried editing that file and removing "CPU_ONLY" there:
Now it is working!
Hi, thanks for all of your help so far!
I have a question about getting Caffe to use the GPU. A minimal example is below, and a project that compiles and runs the code below is attached.
This code creates a network for Cifar10 and then calls caffeNet.ForwardBackward(inputs) some number of times. By running top in a separate window, you can see that CPU usage is very high. Furthermore, each call to ForwardBackward takes about 0.3 s (on my machine), which is much slower than you would expect for one minibatch of Cifar10 on a GPU. This suggests that the GPU is not being used. A call to nvidia-smi also does not show any GPU usage. The line Caffe.set_mode(Caffe.GPU) doesn't seem to make any difference. Are there any obvious mistakes here? Thanks!
ExampleGPU.zip
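One way to turn the "about 0.3 s per call" observation into a number that is easy to compare between Caffe.set_mode(Caffe.GPU) and CPU runs is to average wall-clock time over several calls after a short warm-up. A minimal sketch, assuming the training step is wrapped in a Runnable; `StepTimer` and `timeCalls` are hypothetical helpers, and the sleeping task in `main` only stands in for caffeNet.ForwardBackward(inputs).

```java
public class StepTimer {
    /**
     * Runs `step` `warmup` times untimed, then `iters` times timed,
     * and returns the mean wall-clock time per timed call in milliseconds.
     */
    public static double timeCalls(Runnable step, int warmup, int iters) {
        if (iters <= 0) {
            throw new IllegalArgumentException("iters must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            step.run(); // early calls may include one-time setup costs
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            step.run();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1e6 / iters;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for () -> caffeNet.ForwardBackward(inputs): sleeps ~10 ms.
        Runnable fakeStep = () -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        double ms = timeCalls(fakeStep, 2, 20);
        System.out.printf("mean time per call: %.1f ms%n", ms);
    }
}
```

Averaging over many calls after a warm-up keeps one-time costs (such as first-call initialization) from skewing the per-minibatch figure.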