- NVIDIA Jetson TK1 + Ubuntu 14.04.1
- CUDA 6.5 + CuDNN installed
- An external memory as swap disk. An USB is okay.
- Time. This process might take you hours.
- Create Swap Disk
- Set CUDA v7.0 and CuDNN 4 as a compiler
- Build TensorFlow
03/16/2017: Bazel 4.4 fully supports ARM32-bit. Therefore, we do not have to build protobuf and gRPCJava from starch
* Install Java JDK 8.0
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
# For protobuf & bazel. We will also need `maven`, but I rather install it after this step.
sudo apt-get install git zip unzip autoconf automake libtool curl zlib1g-dev
# For TensorFlow (I assumed you are using python 2 on Jetson TK1. If you are using python 3, I am not sure if the rest will work)
# For Python 2.7
sudo apt-get install python-pip python-numpy swig python-dev
sudo pip install wheel
- Download
bazel 0.4.4
wget https://github.com/bazelbuild/bazel/releases/download/0.4.4/bazel-0.4.4-dist.zip
mkdir bazel-0.4.4
unzip bazel-0.4.4-dist.zip -d bazel-0.4.4
cd bazel-0.4.4
- Create swap disk Since Jetson TK1 only has 2GB, it is safe practice to install extra swap
# Find the USB path
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 29.5G 0 disk # <------ This is my usb
`-sda1 8:1 1 29.5G 0 part [SWAP]
...
- Unmount the disk and create swap disk
sudo umount /dev/sda
sudo mkswap /dev/sda
sudo swapon /dev/sda
- Copy
bazel
to/usr/local/bin
.
sudo cp output/bazel /usr/local/bin/bazel
- Verify that bazel is working
$ bazel
Extracting Bazel installation...
.....................
[bazel release 0.4.3- (@non-git)]
Usage: bazel <command> <options> ...
Available commands:
analyze-profile Analyzes build profile data.
build Builds the specified targets.
...
**Optional : **
If you get OutOfResource
error durring installtion. Try to manually configure maximum heap size as following
- Configure file
scripts/bootstrap/compile.sh
:
# Limit heap size
vim scripts/bootstrap/compile.sh
# Around line 137
# Add this part `-J-Xmx2g` <---Set maximum Heap Size to 2GB
run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}" \
-d "${output}/classes" -source "$JAVA_VERSION" -target "$JAVA_VERSION" \
-encoding UTF-8 "@${paramfile}" -J-Xmx2g
The reason is that TF supports only CUDA 7.0 and up. Although we cannot use CUDA 7.0 on TK1, we can still install to use it as a compiler.
- Download and install CUDA 7.0
wget http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb
sudo dpkg -i cuda-repo-l4t-7-0-local_7.0-76_armhf.deb
sudo apt-get update && sudo apt-get install cuda-toolkit-7-0
Note:
If you run into error like the following, run sudo apt-get clean
and try again
dpkg-deb (subprocess): decompressing archive member: internal gzip read error: '<fd:4>: incorrect data check'
- Download
cuDNN 3.0
to use during compilation
# Download cuDNN 3.0 and decompress
tar -xvf cudnn-7.0-linux-ARMv7-v3.0-prod.tgz
cd cudnn/cuda/
# Copy to cuda folder
sudo cp ./cudnn/cuda/include/cudnn.h /usr/local/cuda/include/
- Build TensorFlow
- Download TensorFlow and checkout v0.12.1
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v0.12.1
- Apply the patch for TensorFlow v0.12.1. Please note that if you are building different version. The patch will not work. Basically, we eddited some files in the tensorflow source so it allows us to compile on jetson TK1.
patch -p1 < ../tensorflow_0.12.1_jetsontk1.patch
- Replace all
lib-64
withlib
and configure TF before installation.
sudo grep -Rl 'lib64' | sudo xargs sed -i 's/lib64/lib/g'
- Configure TF before installation (~./.bashrc and cuDNN is set up correctly).
./configure
...
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: **`3.2`**
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
......................
INFO: All external dependencies fetched successfully.
Configuration finished
- First Installation: wait for failure to edit
Macro.h
file (mentioned by cudamusing )
bazel build -c opt --jobs 1 --local_resources 1800,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
- When it failed, edit
Marco.h
file in
#ifndef EIGEN_HAS_VARIADIC_TEMPLATES
#if EIGEN_MAX_CPP_VER>=11 && (__cplusplus > 199711L || EIGEN_COMP_MSVC >= 1900) \
// --> && (!defined(__NVCC__) || !EIGEN_ARCH_ARM_OR_ARM64 || (defined __CUDACC_VER__ && __CUDACC_VER__ >= 80000) )
// ^^ Disable the use of variadic templates when compiling with versions of nvcc older than 8.0 on ARM devices:
// this prevents nvcc from crashing when compiling Eigen on Tegra X1
#define EIGEN_HAS_VARIADIC_TEMPLATES 1
- Second Installation
bazel build -c opt --jobs 1 --local_resources 1800,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
- If it is successfully built, you should see something like this
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 7153.308s, Critical Path: 333.40s
- Install TensorFlow
- Build .whl package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
- Install .whl package
sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp27-none-linux_armv7l.whl
- Remove symlink
CUDA 7.0
in order to run Tensorflow withCUDA 6.5
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-6.5/ /usr/local/cuda
# Update symlinks lib*.7.0 to local one
# Watch out the driver cuda-driver-dev-6-5
-
Congratulations! You have succesfully built TensorFlow from source on NVIDA Jetson TK1
-
Test Tensorflow
# https://www.tensorflow.org/how_tos/using_gpu/
python
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)
# Expected Output
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:872] ARMV7 does not support NUMA -returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GK20A
major: 3 minor: 2 memoryClockRate (GHz) 0.852
pciBusID 0000:00:00.0
Total memory: 1.85GiB
Free memory: 199.25MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GK20A, pci bus id: 0000:00:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GK20A, pci bus id: 0000:00:00.0
I tensorflow/core/common_runtime/direct_session.cc:149] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GK20A, pci bus id: 0000:00:00.0
>>> print sess.run(c)
b: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:388] b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:388] a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:388] MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
>>>
- Ran out of memory. Try to update
--local-resoures
where n1,n2,n3 is memroy,cpu_thread,i/o input
C++ compilation of rule '//tensorflow/core/kernels:svd_op' failed: gcc failed: error executing command -