-
Hello, thanks for sharing. Here is the error log:
How can I change the python command to fit my hardware?
Replies: 10 comments
-
Hi @momo1986. Thanks for reporting the issue. What version of mxnet are you using? (you can find it by using …)
-
Hello @szha, I use MXNet 1.4.0.post0. Thanks for your concern!
-
@momo1986 there have been some improvements in memory management and dropout state sharing that may be helpful to you. You can try them out by installing version 1.5.0b20190411. If the model still doesn't fit in memory, you may need to reduce the batch size or use gradient accumulation to simulate a large batch size (an example can be found in train_transformer.py).
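To illustrate the gradient accumulation idea mentioned above, here is a minimal, framework-free sketch. It is not the code from train_transformer.py; the toy model, the data, and all function names are assumptions made up for this example. It only shows that summing gradients over small micro-batches and applying one update gives the same step as a single large batch, while only one micro-batch has to be processed at a time.

```python
# Framework-free sketch of gradient accumulation.
# Assumption: a toy model y_hat = w * x with squared-error loss; all names are hypothetical.

def grad_sum(w, xs, ys):
    """Sum (not mean) of per-example gradients d/dw of (w*x - y)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def full_batch_step(w, xs, ys, lr):
    """One SGD step using the mean gradient of the whole batch at once."""
    return w - lr * grad_sum(w, xs, ys) / len(xs)

def accumulated_step(w, xs, ys, lr, micro_batch_size):
    """Same update, but gradients are accumulated over small micro-batches."""
    accumulated = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        # Only this slice would need to be in (GPU) memory at a time.
        accumulated += grad_sum(w, xs[start:stop], ys[start:stop])
    # Normalize by the full (simulated) batch size, then update once.
    return w - lr * accumulated / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

w_full = full_batch_step(0.0, xs, ys, lr=0.01)
w_accum = accumulated_step(0.0, xs, ys, lr=0.01, micro_batch_size=2)
print(w_full, w_accum)
```

The two results agree up to floating-point rounding. In a real training script the same pattern applies: run several small forward/backward passes, let the gradients add up, and call the optimizer once per simulated large batch.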
-
Hello @szha. I tried to build with mxnet 1.5. However, this problem was reported:
I rolled back to 1.4 and tried with the command:
However, it failed with this stack trace:
Thanks & regards!
-
Set both MXNET_GPU_MEM_POOL_TYPE=Round and MXNET_GPU_MEM_POOL_RESERVE=10.
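One possible way to apply these two settings from Python is sketched below. The helper name is hypothetical, and the training command in the comment is only a placeholder because the original command is not shown in this thread. The sketch builds an environment for a child process; if you set the variables inside the training script itself instead, doing so before importing mxnet is the safest assumption.

```python
import os

def mxnet_pool_env(base_env=None):
    """Return a copy of the environment with the two memory pool settings applied.

    Hypothetical helper for illustration; the variable names and values are the
    ones suggested in this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_GPU_MEM_POOL_TYPE"] = "Round"
    env["MXNET_GPU_MEM_POOL_RESERVE"] = "10"
    return env

env = mxnet_pool_env()
print(env["MXNET_GPU_MEM_POOL_TYPE"], env["MXNET_GPU_MEM_POOL_RESERVE"])

# Usage idea (placeholder command, not taken from the thread):
#   import subprocess
#   subprocess.run(["python", "your_training_script.py"], env=env, check=True)
```

Passing a copied environment to the child process keeps the change local to the training run and leaves the current shell untouched.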
-
This means the library didn't find cuda 9.2 in your path. Are you using cuda 9.2?
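A quick way to check whether the CUDA libraries are visible is to look for them in the directories listed in LD_LIBRARY_PATH. The sketch below does only that; the function name is hypothetical, and the file names libcudart.so.9.2 and libnvrtc.so.9.2 are assumptions about what a cuda 9.2 install provides. A miss here is not conclusive, since the dynamic loader also consults the ldconfig cache and default system directories.

```python
import os

def find_in_library_path(soname, library_path=None):
    """Return every match for `soname` in the given colon-separated search path.

    Defaults to the current LD_LIBRARY_PATH. Hypothetical helper for illustration.
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits

# Assumed library names for a cuda 9.2 install; adjust to your CUDA version.
for name in ("libcudart.so.9.2", "libnvrtc.so.9.2"):
    found = find_in_library_path(name)
    print(name, "->", found if found else "not found in LD_LIBRARY_PATH")
```

If nothing is found, adding the CUDA lib64 directory of your installation to LD_LIBRARY_PATH is the usual next thing to try.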
-
Hello @szha. However, the problem exists for both mxnet-cu92 and mxnet-cu90.
-
@momo1986 the missing shared object file libnvrtc.so.9.0 is for NVRTC, which should be part of cuda 9.0 as shown in the doc. Also, mxnet-cu90 only works with cuda 9.0 and mxnet-cu92 only with cuda 9.2, so let's try not to mix them on the same system. I know installing cuda can be tricky on some systems, and personally I find installing from the local runfile to be most reliable. Given that the symptom points to a problem in cuda installation, I'd recommend the following:
… LD_LIBRARY_PATH. If you're sure the right path is included and it's still not working…
Hope it helps. Feel free to let us know if you need more help. You can also join our slack channel (registration link) so that you may request help there.
-
@momo1986 did you get a chance to try out the above?
-
Closing due to lack of activity, feel free to reopen if you have more questions.