[WIP] Jetson nano & TX installation updates #14855
Conversation
Force-pushed from 63e5066 to 45d7dce.
arm8 fails because it is looking for OpenCV. I turned that on by default because it is a requirement for the Java API to build. I think I should check arm8's install steps and add OpenCV to them.
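For illustration only (this is not the actual change in the PR): if the arm8 image were Ubuntu-based and building natively, pulling in OpenCV could look like the snippet below; a cross-compiled pipeline would instead need OpenCV built for the target architecture.

```bash
# Hypothetical: install OpenCV development headers on a Debian/Ubuntu image
# so the Java API can build against them.
sudo apt-get update
sudo apt-get install -y libopencv-dev
```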
Couple of nits, but looks good. Thanks for updating this to support Nanos, @aaronmarkham
@mxnet-label-bot add [pr-work-in-progress]
```
setuptools

sudo pip install \
    graphviz==0.8.4 \
```
graphviz is installed above already
This is for your local install vs. doing it with Docker.
```python
import mxnet
mxnet.__version__
```
Usually it's good to make two tensors on the GPU and add them (to make sure the CUDA-related stuff was built correctly).
Good idea! I'll add that in. I was able to verify this myself.
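A minimal sketch of that check, assuming an MXNet build with CUDA enabled and a GPU visible on the device (my sketch, not necessarily the exact snippet that went into the docs):

```python
import mxnet as mx

# Create two tensors on the GPU and add them. asnumpy() forces the
# computation to run, so this fails immediately if the CUDA build is broken.
a = mx.nd.ones((2, 3), ctx=mx.gpu(0))
b = mx.nd.ones((2, 3), ctx=mx.gpu(0))
print((a + b).asnumpy())
```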
I removed the Scala setup step since this PR isn't going to get the Scala/Java stuff working yet. It'll focus on Python only.
Force-pushed from e01ceb6 to 64f83d2.
This PR is hung up on a CI issue. Its arm pipelines are failing with not being able to …
Sometimes the Docker cache is outdated and contains PPA links that are no longer valid. The Docker cache should actually be rebuilt by this job, though, so it's worth just retriggering the PR verification.
I sometimes have Docker issues as well; I think the way we use it triggers some caching bugs. I suggest doing the following, which has worked for me when I had problems:
This is of course only if you think your problems are related to Docker caching.
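The specific steps from that comment aren't preserved in this thread. As a purely hypothetical stand-in, a generic way to clear stale Docker caches is:

```bash
# Hypothetical example only -- not the commenter's original steps.
# Removes stopped containers, unused images, networks, and the build cache.
docker system prune --all --force
```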
Force-pushed from 64f83d2 to 7ddd568.
Rebasing to see if maybe the cache is fixed now...
Still stuck. I'm not sure what to do now.
Thanks for your work, but I have problems with your pre-built wheel. After installing your wheel with pip, I copied libmxnet.so with:
```
cp ./.local/mxnet/libmxnet.so /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
```
After that, I got this error:
```
david@TinyNuke:~$ python3 mnist_cnn.py
Traceback (most recent call last):
  File "mnist_cnn.py", line 2, in <module>
    import mxnet as mx
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/__init__.py", line 24, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/context.py", line 24, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/base.py", line 213, in <module>
    _LIB = _load_lib()
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/base.py", line 204, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so: cannot open shared object file: No such file or directory
```
Then I figured out your wheel is packed with an x86-64 libmxnet.so:
```
david@TinyNuke:~$ ls -lah /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
-rwxrwxr-x 1 david david 364M Jun  7 22:22 /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
david@TinyNuke:~$ file /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
/home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=5b07574bc684f6acdef5ba509889ad9c8298d157, with debug_info, not stripped
```
A local dynamic library on the Jetson Nano looks like this:
```
file /usr/local/cuda/lib64/libcudart.so.10.0.166
/usr/local/cuda/lib64/libcudart.so.10.0.166: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, stripped
```
If you want, we can sit together for a couple of hours and figure this out.
@dwSun You're right, I need to swap out the .so file in that wheel. I have one that works in the main comment of this PR, or just click here: https://s3.us-east-2.amazonaws.com/mxnet-public/install/jetson/1.4.1/libmxnet.so
@aaronmarkham Thanks, your .so file works perfectly on the Nano device.
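For reference, the swap described above could look like this; the destination path is taken from the traceback earlier in this thread and will differ on other machines:

```bash
# Download the aarch64 libmxnet.so linked above and overwrite the x86-64
# copy that shipped in the wheel. Adjust the site-packages path as needed.
wget https://s3.us-east-2.amazonaws.com/mxnet-public/install/jetson/1.4.1/libmxnet.so
cp libmxnet.so /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
```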
Closing in favor of #15117, although at some point the Nano should be added to CI.
Description
The Jetson instructions didn't work for me on the Nano, so I worked through a bunch of the errors and came up with these fixes.
The goal was to get around the really long compile time on the device by using cross-compilation.
I also thought it would be nice to use the Java API for inference-only applications.
So this PR gets you MXNet v1.4.1.
Gotchas
The MSHADOW_USE_PASCAL setting in the CI cross-compile route probably needs to be addressed, but I'm not sure of the impact. This PR ignores it, just as the current Docker/CI setup does.
Java Support Excluded (updated)
Thanks @lanking520 and @zachgk for helping on the Java stuff. And @larroy for general Jetson help.
I used a workaround to get v1.4.1 for Java, but it doesn't work on the Nano due to some hard-coded build flags that target Intel chipsets. Removing the flag manually resulted in other errors asking for configuration. I don't know enough about cross-compiling to sort out these issues. Combined with it being really hard to reproduce for v1.4.1, I'm letting this go for this PR. Maybe someone else can look at getting the Java API to work with 1.5.0 on the Nano.