Build TensorFlow from source

NVIDIA Jetson TK1 (Linux 32-bit armv7l)

Basic Requirements

NVIDIA Jetson TK1 + Ubuntu 14.04.1
CUDA 6.5 + cuDNN installed
External storage to use as a swap disk. A USB drive is fine.
Time. This process can take several hours.

Summary

Install dependencies
Install Bazel 0.4.4
Build TensorFlow

Create swap disk
Set up CUDA 7.0 and cuDNN v3 as a compiler
Build TensorFlow

Install TensorFlow

References

Tensorflow on Raspberry Pi 3
Tensorflow on ODROID-C2
TensorFlow on Jetson TK1

UPDATE LOGS

03/16/2017: Bazel 0.4.4 fully supports 32-bit ARM, so we no longer have to build protobuf and gRPC-Java from scratch.

1. Install Dependencies

* Install Java JDK 8.0
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

# For protobuf & bazel. We will also need `maven`, but I would rather install it after this step.
sudo apt-get install git zip unzip autoconf automake libtool curl zlib1g-dev  

# For TensorFlow (I assume you are using Python 2 on the Jetson TK1. If you are using Python 3, I am not sure whether the rest will work)
# For Python 2.7
sudo apt-get install python-pip python-numpy swig python-dev
sudo pip install wheel

2. Install Bazel

Download bazel 0.4.4

wget https://github.com/bazelbuild/bazel/releases/download/0.4.4/bazel-0.4.4-dist.zip
mkdir bazel-0.4.4
unzip bazel-0.4.4-dist.zip -d bazel-0.4.4
cd bazel-0.4.4

Create swap disk: since the Jetson TK1 has only 2 GB of RAM, it is good practice to add extra swap space.

# Find the USB path
lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    1  29.5G  0 disk 		 # <------ This is my usb
`-sda1        8:1    1  29.5G  0 part [SWAP] 
...

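Unmount the disk and set it up as swap. Note that mkswap erases any data on the device, so double-check the device name from lsblk first.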

sudo umount /dev/sda
sudo mkswap /dev/sda
sudo swapon /dev/sda
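
To confirm that the new swap space is active before starting the long build, the kernel's view can be read from /proc/meminfo. This is a minimal sketch, assuming a Linux system; the helper name `swap_total_mib` is hypothetical and not part of the original instructions.

```shell
# Hypothetical helper: print the total swap size in MiB.
# Reads /proc/meminfo by default; another file can be passed for testing.
swap_total_mib() {
    awk '/^SwapTotal:/ { printf "%d\n", int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Report the current swap total when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    echo "Swap total: $(swap_total_mib) MiB"
fi
```

A value of 0 would suggest that `swapon` did not take effect.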

Build Bazel by running ./compile.sh in the bazel-0.4.4 directory, then copy the resulting binary to /usr/local/bin.

 sudo cp output/bazel /usr/local/bin/bazel

Verify that bazel is working

 $ bazel

Extracting Bazel installation...
.....................
                                               [bazel release 0.4.3- (@non-git)]
Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  build               Builds the specified targets.
...

**Optional:** If you get an OutOfResource error during installation, try manually setting the maximum heap size as follows.

Edit the file scripts/bootstrap/compile.sh:

# Limit heap size
vim scripts/bootstrap/compile.sh
# Around line 137
# Add this part `-J-Xmx2g`   <---Set maximum Heap Size to 2GB
run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}" \
      -d "${output}/classes" -source "$JAVA_VERSION" -target "$JAVA_VERSION" \
      -encoding UTF-8 "@${paramfile}" -J-Xmx2g

B. Set up `CUDA 7.0` and `cuDNN v3` as compiler for TensorFlow

TensorFlow supports only CUDA 7.0 and later. Although the TK1 cannot run CUDA 7.0, we can still install the CUDA 7.0 toolkit and use it to compile.

Download and install CUDA 7.0

wget http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb
sudo dpkg -i cuda-repo-l4t-7-0-local_7.0-76_armhf.deb 
sudo apt-get update && sudo apt-get install cuda-toolkit-7-0

Note: If you run into an error like the following, run `sudo apt-get clean` and try again.

dpkg-deb (subprocess): decompressing archive member: internal gzip read error: '<fd:4>: incorrect data check'

Download cuDNN 3.0 to use during compilation

# Download  cuDNN 3.0 and decompress

tar -xvf cudnn-7.0-linux-ARMv7-v3.0-prod.tgz 
cd cudnn/cuda/

# Copy to cuda folder
sudo cp ./include/cudnn.h /usr/local/cuda/include/

Build TensorFlow

Download TensorFlow and check out v0.12.1

git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v0.12.1

Apply the patch for TensorFlow v0.12.1. Please note that the patch will not work if you are building a different version. Basically, we edited some files in the TensorFlow source so that it compiles on the Jetson TK1.

patch -p1 < ../tensorflow_0.12.1_jetsontk1.patch

Replace all occurrences of lib64 with lib.

sudo grep -Rl 'lib64' | sudo xargs sed -i 's/lib64/lib/g'
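
Because `sed -i` edits files in place, it can be worth listing the affected files before running the replacement. This is a minimal sketch; `list_lib64_files` is a hypothetical helper name, and it only reads files.

```shell
# Hypothetical helper: list files under a directory (default: current one)
# that contain the string "lib64". Nothing is modified.
list_lib64_files() {
    grep -Rl 'lib64' "${1:-.}" 2>/dev/null
}

# Example: count the files that the sed command would touch.
# list_lib64_files . | wc -l
```

Running it from the tensorflow checkout shows the same file set that the `grep -Rl` half of the pipeline feeds to `sed`.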

Configure TF before installation (make sure ~/.bashrc and cuDNN are set up correctly).

./configure
...
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.2   # <------ Enter 3.2 for the Jetson TK1
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
......................
INFO: All external dependencies fetched successfully.
Configuration finished

C. Install TensorFlow

First Installation: wait for the build to fail, then edit the Macros.h file (as mentioned by cudamusing)

bazel build -c opt --jobs 1 --local_resources 1800,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

When it fails, edit the Macros.h file at `~/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/eigen_archive/Eigen/src/Core/util/Macros.h`. Note that your hash directory will probably differ from mine.

#ifndef EIGEN_HAS_VARIADIC_TEMPLATES
#if EIGEN_MAX_CPP_VER>=11 && (__cplusplus > 199711L || EIGEN_COMP_MSVC >= 1900) \
// -->  && (!defined(__NVCC__) || !EIGEN_ARCH_ARM_OR_ARM64 || (defined __CUDACC_VER__ && __CUDACC_VER__ >= 80000) )
// ^^ Disable the use of variadic templates when compiling with versions of nvcc older than 8.0 on ARM devices:
//    this prevents nvcc from crashing when compiling Eigen on Tegra X1
#define EIGEN_HAS_VARIADIC_TEMPLATES 1
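
Since the hash directory differs between machines, the file can be located by searching the Bazel cache instead of typing the path by hand. This is a minimal sketch, assuming the cache lives under ~/.cache/bazel as in the path above; `find_eigen_macros` is a hypothetical helper name.

```shell
# Hypothetical helper: search a Bazel output root for Eigen's Macros.h.
# The default root (~/.cache/bazel) matches the path shown above.
find_eigen_macros() {
    find "${1:-$HOME/.cache/bazel}" -type f \
        -path '*/external/eigen_archive/Eigen/src/Core/util/Macros.h' 2>/dev/null
}

# Example: open the file in an editor once it has been located.
# vim "$(find_eigen_macros | head -n 1)"
```

The file only exists after the first build attempt has fetched the external dependencies, so an empty result before that point is expected.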

Second Installation

bazel build -c opt --jobs 1 --local_resources 1800,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

If it is successfully built, you should see something like this

Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 7153.308s, Critical Path: 333.40s

Install TensorFlow

Build .whl package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install .whl package

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.1-cp27-none-linux_armv7l.whl
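
The exact wheel file name depends on the TensorFlow version and the Python ABI, so it may differ from the one shown above. This is a minimal sketch for locating whatever wheel the build produced; `newest_wheel` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified .whl file in a
# directory (default: the output directory used above).
newest_wheel() {
    ls -t "${1:-/tmp/tensorflow_pkg}"/*.whl 2>/dev/null | head -n 1
}

# Example: install whichever wheel the build produced.
# sudo pip install "$(newest_wheel)"
```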

Replace the CUDA 7.0 symlink so that TensorFlow runs with CUDA 6.5

sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-6.5/ /usr/local/cuda

# Update the lib*.7.0 symlinks to the local ones

# Watch out for the cuda-driver-dev-6-5 driver package
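
After switching the symlink, it can help to confirm which toolkit directory /usr/local/cuda now resolves to. This is a minimal sketch, assuming GNU `readlink` (as shipped with Ubuntu); `cuda_link_target` is a hypothetical helper name.

```shell
# Hypothetical helper: print the directory that a CUDA symlink resolves to.
cuda_link_target() {
    readlink -f "${1:-/usr/local/cuda}"
}

# Report the active toolkit directory if the symlink exists.
if [ -L /usr/local/cuda ]; then
    echo "/usr/local/cuda -> $(cuda_link_target)"
fi
```

On a correctly switched system the reported path should end in cuda-6.5.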

Congratulations! You have successfully built TensorFlow from source on the NVIDIA Jetson TK1.

Test TensorFlow

# https://www.tensorflow.org/how_tos/using_gpu/
python
import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)

# Expected Output
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:872] ARMV7 does not support NUMA -returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GK20A
major: 3 minor: 2 memoryClockRate (GHz) 0.852
pciBusID 0000:00:00.0
Total memory: 1.85GiB
Free memory: 199.25MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GK20A, pci bus id: 0000:00:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GK20A, pci bus id: 0000:00:00.0
I tensorflow/core/common_runtime/direct_session.cc:149] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GK20A, pci bus id: 0000:00:00.0

>>> print sess.run(c)
b: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:388] b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:388] a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:388] MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
>>>

Known Issues during compilation

Ran out of memory. Try adjusting --local_resources n1,n2,n3, where n1, n2 and n3 are the available memory (in MB), CPU cores and I/O capacity.

C++ compilation of rule '//tensorflow/core/kernels:svd_op' failed: gcc failed: error executing command -
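
As a starting point for tuning, the memory value can be derived from the board's physical RAM. This is a minimal sketch, assuming that 90% of MemTotal is a reasonable budget (an arbitrary choice for illustration, not a Bazel recommendation) and keeping the CPU and I/O values from the build command above; `suggest_local_resources` is a hypothetical helper name.

```shell
# Hypothetical helper: print a --local_resources flag sized from MemTotal.
# Reads /proc/meminfo by default; another file can be passed for testing.
suggest_local_resources() {
    meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; take 90% of it and convert to MB.
    ram_mb=$(awk '/^MemTotal:/ { printf "%d", int($2 * 9 / 10 / 1024) }' "$meminfo")
    # CPU (0.5) and I/O (1.0) are copied from the build command in this guide.
    printf '%s\n' "--local_resources ${ram_mb},0.5,1.0"
}

# Print a suggestion for the current machine when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    suggest_local_resources
fi
```

If the build still runs out of memory, lowering the first value further or adding more swap are the obvious next things to try.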
