This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[WIP] Jetson nano & TX installation updates #14855

Closed
wants to merge 16 commits into from

aaronmarkham
Contributor

@aaronmarkham aaronmarkham commented May 1, 2019

Description

The Jetson instructions didn't work for me on the Nano, so I worked through a bunch of the errors and came up with these fixes.

The goal was to avoid the very long compile time on the device itself by cross-compiling instead.
I also thought it would be nice to use the Java API for inference-only applications.

So this PR gets you MXNet v1.4.1:

Gotchas

  • There's no mention of the MSHADOW_USE_PASCAL setting in the CI cross-compile route. This probably needs to be addressed, but I'm not sure of the impact. This PR ignores it, just as the current Docker/CI setup does.

Java Support Excluded (updated)

Thanks @lanking520 and @zachgk for helping on the Java stuff. And @larroy for general Jetson help.

I used a workaround to get v1.4.1 for Java, but it doesn't work on the Nano because of some hard-coded build flags that are intended for Intel chipsets. Removing the flag manually led to other errors asking for configuration. I don't know enough about cross-compiling to sort out these issues. Given that, and how hard this is to reproduce for v1.4.1, I'm dropping Java support from this PR. Maybe someone else can look at getting the Java API to work with 1.5.0 on the Nano.

@aaronmarkham
Contributor Author

arm8 fails because it is looking for OpenCV. I turned that on by default because the Java API requires it to build. I think I should check arm8's install steps and add OpenCV to them.

@KellenSunderland
Contributor

A couple of nits, but looks good. Thanks for updating this to support the Nano, @aaronmarkham.

@anirudhacharya
Member

@mxnet-label-bot add [pr-work-in-progress]

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label May 2, 2019
ci/docker/install/arm8_python.sh
setuptools

sudo pip install \
graphviz==0.8.4 \
Contributor

graphviz is installed above already

Contributor Author

This is for your local install vs doing it with docker.


```python
import mxnet
mxnet.__version__
```
Contributor

Usually it's good to create two tensors on the GPU and add them, to make sure the CUDA-related parts were built correctly.

Contributor Author

Good idea! I'll add that in. I was able to verify this myself.
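
A minimal sketch of the check suggested above, assuming MXNet 1.x with its `mx.nd` API and a single GPU at index 0. The function name `gpu_add_check` is hypothetical and is not part of this PR. It returns `None` when MXNet cannot be loaded at all, so the same script can also run on a machine without MXNet.

```python
def gpu_add_check():
    """Add two tensors on the GPU.

    Returns True if the sum is correct, False if the GPU operation fails,
    and None if MXNet itself cannot be imported.
    """
    try:
        import mxnet as mx
    except (ImportError, OSError):
        # ImportError: MXNet is not installed.
        # OSError: libmxnet.so is missing or built for another architecture.
        return None

    try:
        ctx = mx.gpu(0)  # assumption: the device is exposed as GPU 0
        a = mx.nd.ones((2, 3), ctx=ctx)
        b = mx.nd.ones((2, 3), ctx=ctx)
        c = a + b
        # asnumpy() blocks until the GPU work finishes, so errors surface here.
        return bool((c.asnumpy() == 2).all())
    except mx.base.MXNetError:
        # Raised, for example, when MXNet was built without CUDA support.
        return False


if __name__ == "__main__":
    print("GPU add check:", gpu_add_check())
```

A `False` result suggests that the library loads but the CUDA parts are not usable, which `mxnet.__version__` alone would not reveal.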

make/crosscompile.jetson.mk
@aaronmarkham
Contributor Author

I removed the scala setup step since this PR isn't going to get the Scala/Java stuff working yet. It'll focus on Python only.

@aaronmarkham
Contributor Author

This PR is hung up on a CI issue. Its ARM pipelines are failing because they can't run apt-get update. Does anyone know why this would be the case?

@lebeg
Contributor

lebeg commented May 23, 2019

Sometimes the docker cache is outdated and contains PPA links that are no longer valid. The docker cache should actually be rebuilt by this job, so it's worth retriggering the PR verification.

@larroy
Contributor

larroy commented May 23, 2019

I sometimes have docker issues as well; I think the way we use it triggers some bugs in caching. I suggest doing the following, which has worked for me when I had problems:

  • docker pull the base images manually, ex. ubuntu
  • Fully erase the docker cache.
  • Don't use cache-from in your build.py script and build the container locally.
docker pull ubuntu:16.04
docker rm $(docker ps -a -q)
docker rmi $(docker images -a -q)

This only applies, of course, if you think your problems are related to docker caching.

@aaronmarkham
Contributor Author

Rebasing to see if maybe the cache is fixed now...

@aaronmarkham
Contributor Author

Still stuck. I'm not sure what to do now.

@aaronmarkham aaronmarkham mentioned this pull request May 31, 2019
@dwSun
Contributor

dwSun commented Jun 7, 2019

Thanks for your work.

But I have a problem with your pre-built wheel.

After installing your wheel with pip, I copied libmxnet.so with:

cp ./.local/mxnet/libmxnet.so /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so

After that, I got this error:

david@TinyNuke:~$ python3 mnist_cnn.py 
Traceback (most recent call last):
  File "mnist_cnn.py", line 2, in <module>
    import mxnet as mx
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/__init__.py", line 24, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/context.py", line 24, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/base.py", line 213, in <module>
    _LIB = _load_lib()
  File "/home/david/.local/lib/python3.6/site-packages/mxnet/base.py", line 204, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so: cannot open shared object file: No such file or directory

Then I figured out that your wheel is packaged with an x86-64 libmxnet.so:

david@TinyNuke:~$ ls -lah /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
-rwxrwxr-x 1 david david 364M 6月   7 22:22 /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
david@TinyNuke:~$ file /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so
/home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=5b07574bc684f6acdef5ba509889ad9c8298d157, with debug_info, not stripped

A native dynamic library on the Jetson Nano looks like this:

file /usr/local/cuda/lib64/libcudart.so.10.0.166
/usr/local/cuda/lib64/libcudart.so.10.0.166: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, stripped
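
The architecture check done with `file` above can also be sketched in Python by reading the ELF header directly, which may help on systems where `file` is not installed. The function name `elf_machine` and the small lookup table are illustrative assumptions. The byte offsets and machine codes come from the ELF format itself (`e_machine` at offset 18; 62 is x86-64, 183 is AArch64).

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86-64", 183: "aarch64"}


def elf_machine(path):
    """Return the target architecture named in an ELF file's header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("%s is not an ELF file" % path)
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown (%d)" % machine)
```

On the Nano, a usable libmxnet.so should report `aarch64` here. The library from the wheel described above would report `x86-64`, matching the `file` output.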

@larroy
Contributor

larroy commented Jun 13, 2019

If you want we can sit together for a couple of hours and figure this out.

@aaronmarkham
Contributor Author

@dwSun you're right, I need to swap out the .so file in that wheel. A working one is linked in the main comment of this PR, or you can download it directly: https://s3.us-east-2.amazonaws.com/mxnet-public/install/jetson/1.4.1/libmxnet.so

@dwSun
Contributor

dwSun commented Jun 19, 2019

@aaronmarkham Thanks, your .so file works perfectly on the Nano.

@aaronmarkham
Contributor Author

Closing in favor of #15117 - although at some point nano should be added to CI.

Labels
pr-work-in-progress PR is still work in progress

7 participants