
TensorFlow oneDNN build manual for FUJITSU Software Compiler Package (TensorFlow v2.2.0)

Kazutoshi, Akao edited this page Sep 8, 2021 · 8 revisions

Build Instruction for TensorFlow on Fujitsu Supercomputer PRIMEHPC FX1000/FX700

Table of contents

  1. Introduction
  2. Environment and Prerequisites
  3. Installation Instructions
  4. Troubleshooting
  5. List of Software Versions

1. Introduction

This document contains instructions for installing TensorFlow on a Fujitsu Supercomputer PRIMEHPC FX1000 or FX700.
It also provides sample instructions for installing and running several important models optimized for the FX1000 and FX700.

When building TensorFlow, Bazel, the build tool it uses, downloads third-party software from the Internet.
However, since there are many requests to build on systems in isolated facilities such as corporate laboratories, we experimentally provide installation instructions for such environments (called "offline installation" in this manual). Please note that this procedure is still at the beta-test level and is not guaranteed to work.

For offline installation, you first download a set of necessary files beforehand on a system connected to the Internet (the "download system"), and then transfer them to the system to be installed (the "target system").

1.1. Terminology

The following terms and abbreviations are used in this manual.

| Terms/Abbr. | Meaning |
| - | - |
| Online Installation | Install TensorFlow on a system with direct access to the Internet (or via proxy) |
| Offline Installation | Install TensorFlow on a system that does not have direct access to the Internet |
| Target system | System on which TensorFlow is to be installed |
| Download system | System for downloading the necessary files in advance for offline installation |
| TCS | FX1000's job execution scheduler and compiler/library environment (Technical Computing Suite) |
| CP | FX700's compiler/library environment (Compiler Package) |

2. Environment and prerequisites

2.1. Download System

  • OS: UNIX or Linux
  • The following software is available: bash, python3, wget, git, unzip, tar, and curl
  • Accessible to the Target System
  • Sufficient free space in the file system. The amount of downloaded data is as follows:
    • Approx. 2GB for TensorFlow
    • Another 2GB for the sample models (Resnet, OpenNMT, and BERT)
    • 30GB for Mask R-CNN

You will need another 20GB for the target system.

The download directory is under the TensorFlow source directory (This cannot be altered).

2.2. Target system for installation

  • PRIMEHPC FX1000 or FX700
  • For FX700
    • RHEL 8.x or CentOS 8.x must be installed
    • If you want to use FCC, Compiler Package V10L20 must be installed
  • The following packages and commands must already be installed:
    make gcc cmake libffi-devel gcc-gfortran numactl git patch unzip tk tcsh tcl lsof python3 pciutils
    (For Mask R-CNN sample model) libxml2 libxslt libxslt-devel libxml2-devel

Please note that building and executing on NFS may cause unexpected problems depending on the performance and configuration of the NFS server.
It is recommended to use locally-attached storage or network storage that is fast enough.

2.3. Directory structure after installation

The directory structure after installation looks like this. The directories PREFIX, VENV_PATH, and TCSDS_PATH are specified in the configuration file env.src. These three directories and TENSORFLOW_TOP must be independent of each other. (Make sure that no directory is placed under another.)

.
  PREFIX (where local binaries are stored)
    +- bin (Python, etc.)
    +- lib

  VENV_PATH (location of python modules needed to run TensorFlow)
    +- bin (activate)
    +- lib (packages to be installed by pip)

  TCSDS_PATH (Fujitsu compiler, *: already installed before the procedure)
    +- bin (fcc, FCC, etc.)
    +- lib64

  TENSORFLOW_TOP (complete TensorFlow sources, transferred from the download system or downloaded from https://www.github.com/fujitsu/tensorflow)
    +- tensorflow
    +- third_party
    +- fcc_build_script
         +- down (where the downloaded files will be stored)
         +- sample_script (sources for resnet, OpenNMT, BERT, and Mask RCNN models will be extracted below this)
.
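The independence requirement above (no directory nested under another) can be checked mechanically before you start. The following is a minimal sketch under the assumption of absolute paths without trailing slashes; the four example paths are placeholders, not the values your site uses.

```shell
#!/bin/sh
# Sketch: verify that PREFIX, VENV_PATH, TCSDS_PATH, and TENSORFLOW_TOP
# do not nest inside one another. The four paths below are placeholders.
is_under() {  # succeeds if $1 is strictly under $2 (absolute paths assumed)
  case "$1/" in
    "$2"/*) [ "$1" != "$2" ] ;;
    *) return 1 ;;
  esac
}

set -- /opt/local /opt/venv /opt/tcsds /home/user/tensorflow
for a in "$@"; do
  for b in "$@"; do
    if [ "$a" != "$b" ] && is_under "$a" "$b"; then
      echo "ERROR: $a is nested under $b"
    fi
  done
done
```

With non-nested paths such as the placeholders above, the loop prints nothing.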

2.4. About proxy settings

If your environment requires a proxy for external access, set the following environment variables.
(Replace "user", "pass", "proxy_url", and "port" with values appropriate for your environment.)

$ export http_proxy=http://user:pass@proxy_url:port
$ export https_proxy=https://user:pass@proxy_url:port

Note: curl, wget, git, and pip3 all recognize these environment variables, so there is no need to edit rc files or .gitconfig.

3. Installation procedure

The general installation flow is as follows:

  1. Preparation
  2. Download
  3. Build

3.1. Preliminaries (Detail)

3.1-A. Download the source set

$ git clone https://github.com/fujitsu/tensorflow.git
$ cd tensorflow                                         # From now on, we'll call this directory TENSORFLOW_TOP
$ git checkout -b fujitsu_v2.2.0_for_a64fx origin/fujitsu_v2.2.0_for_a64fx

In the following examples, /home/user/tensorflow is used as TENSORFLOW_TOP.

3.1-B. Edit env.src

env.src is located in $TENSORFLOW_TOP/fcc_build_script.

The configuration is divided into two parts.

  • Control of the build

    | Flag Name | Default Value | Meaning | Remarks |
    | - | - | - | - |
    | fjenv_use_venv | True | Use VENV when true | 'false' is not tested. Do not modify this setting for now. |
    | fjenv_use_fcc | True | Use FCC when true; otherwise use GCC | 'false' is not tested. Do not modify this setting for now. |
    | fjenv_offline_install | Undefined (= false) | True to use offline installation | |

  • Setup of the build directories

    For the directory configuration, refer to the diagram in Section 2.3.

    | Variable Name | Meaning | Supplemental Information |
    | - | - | - |
    | PREFIX | Directory where the executables generated by this procedure are installed | |
    | VENV_PATH | Directory where the VENV is installed | Valid when fjenv_use_venv is set |
    | TCSDS_PATH | Base directory for TCS or CP (a directory containing bin, lib, etc.) | Valid when fjenv_use_fcc is set |
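As an illustration, an edited env.src might contain lines like the following. This is a hypothetical sketch: the variable names come from the tables above, but the values, value syntax, and paths are placeholders to adapt to your site (in particular, TCSDS_PATH must point at your actual TCS or CP installation).

```shell
# Hypothetical env.src excerpt -- paths and values are placeholders
fjenv_use_venv=true            # do not change ('false' is untested)
fjenv_use_fcc=true             # do not change ('false' is untested)
fjenv_offline_install=true     # define only for offline installation
PREFIX=/home/user/local        # locally built binaries (Python, etc.)
VENV_PATH=/home/user/venv      # Python virtual environment for TensorFlow
TCSDS_PATH=/opt/tcsds          # TCS/CP base directory (already installed)
```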

3.2. Download (Detail)

This section is only for offline installation. If you are installing on an Internet-connected system, skip this section and go to 3.3.

3.2-A. Download the Files for TensorFlow

Run the shell scripts in the fcc_build_script directory, whose names start with a number, one after another in numerical order.
The download location is $TENSORFLOW_TOP/fcc_build_script/down. (This cannot be altered).

$ pwd
/home/user/tensorflow/fcc_build_script          # $TENSORFLOW_TOP/fcc_build_script

$ bash 01_python_build.sh        download       # Download Python (23MB)
$ bash 02_bazel_build.sh         download       # Download bazel (163MB)
$ bash 03_oneDNN_build.sh        download       # Download oneDNN (286MB)
$ bash 04_make_venv.sh           download       # Download Python modules for TensorFlow (208MB)
$ bash 05-0_set_tf_src.sh        download       # Setup TensorFlow Build Environment (none)
$ bash 05-1_build_batchedblas.sh download       # Download BatchedBlas (0.8MB)
$ bash 05_tf_build.sh            download       # Download Modules for TensorFlow build(261MB)
$ bash 06_tf_install.sh          download       # Download TensorFlow Installation data (none)
$ bash 07_horovod_install.sh     download       # Download Horovod (none)
                                                # Total 941MB

The scripts will not re-download files that have already been downloaded. If you want to download the files again, run each script with the clean argument first, and then run it with download. Note that clean has higher priority than download: if you specify clean download or download clean, only clean is performed.
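The clean-over-download priority can be pictured with a small sketch. This is an illustration of the documented behavior, not the actual script code; run_script is a hypothetical stand-in.

```shell
# Sketch of the clean-vs-download priority: whichever order the
# arguments appear in, clean wins (run_script is hypothetical).
run_script() {
  action=download                       # assumed default action
  for arg in "$@"; do
    [ "$arg" = clean ] && action=clean  # clean overrides everything else
  done
  echo "$action"
}

run_script clean download   # prints "clean"
run_script download clean   # prints "clean"
run_script download         # prints "download"
```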

3.2-B. (Optional) Download the Files for Sample Models

The sample models are located under sample_script, each in a dedicated directory whose name starts with a number. In each directory, run the numbered shell scripts one by one in numerical order, with the argument download.
(There are two kinds of shell scripts, one for the build and one for the run; executing the run scripts with the download argument is harmless.)

Download data size is as follows.

| Sample Model Directory | Size |
| - | - |
| sample_script/01_resnet | 578MB |
| sample_script/02_OpenNMT | 43MB |
| sample_script/03_Bert | 995MB |
| sample_script/04_Mask-R-CNN | 30000MB |
| Total | 31616MB (31GB) |

The scripts will not re-download files that have already been downloaded. If you want to download the files again, run each script with the clean argument first, and then run it with download. Note that clean has higher priority than download: if you specify clean download or download clean, only clean is performed.

3.2-C. Transfer to the Target System

This section is only for offline installation. If you are installing on an Internet-connected system, skip this section and go to 3.3.

Transfer all the files in the TensorFlow source directory.

We do not describe the transfer method, as it depends on your system configuration.
Use scp, ftp, a shared filesystem, or any other method appropriate for your system.

3.3. Build (Detail)

3.3-A. Build TensorFlow

Run the numbered shell scripts in numerical order, one after another.
The following example shows how to install from an interactive shell. The approximate time is shown as a comment for each command (measured on an FX700, 2.0GHz, 48 cores). If you are using a job control system, modify the scripts according to its syntax and submit them as jobs.

$ pwd
/home/user/tensorflow/fcc_build_script          # $TENSORFLOW_TOP/fcc_build_script

$ bash 01_python_build.sh          [option]     # Build and install Python (5 min.)
$ bash 02_bazel_build.sh           [option]     # Install bazel (no build, <1 min.)
$ bash 03_oneDNN_build.sh          [option]     # Build and install oneDNN (30 min.)
$ bash 04_make_venv.sh             [option]     # Create VENV (70 min.)
$ bash 05-0_set_tf_src.sh          [option]     # Preparation of TensorFlow build (<1 min.)
$ bash 05-1_build_batchedblas.sh   [option]     # Build BatchedBlas (<1 min.)
$ bash 05_tf_build.sh              [option]     # Build TensorFlow (45 min.)
$ bash 06_tf_install.sh            [option]     # Create wheel package for TensorFlow and install (5 min.)
$ bash 07_horovod_install.sh       [option]     # Install Horovod (5 min.)

To verify the build, run the sample model in sample_script/01_resnet.

The scripts will not rebuild binaries that already exist. If you want to build again, run each script with the rebuild argument. Do not confuse rebuild with clean: if clean is specified, all the downloaded files are deleted, which forces you to download and transfer them again in an offline installation.

3.3-B. (Optional) Build Sample Models

The sample models are located in numbered subdirectories under the sample_script directory. Run the numbered shell scripts in numerical order, one after another.
The details of the build and verification are described below.

Since the execution speed of deep learning models can vary by 10-20%, use the execution speeds described in this manual as a guide: if your results are within that range, your build is OK. Also keep in mind that the settings of the sample models are not optimal (i.e., not the fastest possible).

01_resnet

Use the official model (for TensorFlow v1.x) from Google. https://github.com/tensorflow/models/official/r1/resnet
Tag: v2.0 (2019/10/15)

$ pwd
/home/user/tensorflow/fcc_build_script/sample_script/01_resnet

$ bash 10_setup_resnet.sh  [option]  # Setup the model (<0 min.)
$ bash 11_train_resnet-single.sh     # Run the model (1node, 1proc, 12core, synthetic data) (5 min.)
$ bash 12_train_resnet-4process.sh   # Run the model (1node, 4proc, 12core/proc, synthetic data) (5 min.)

The example result of 11_train_resnet-single.sh is as follows (FX700 result; roughly +10% faster on FX1000)

$ bash 11_train_resnet-single.sh     # Start Running (1proc, 12core)
### Start at Mon Aug  9 07:52:23 JST 2021, 11_train_resnet-single.sh rebuild
  (snip)
INFO:tensorflow:cross_entropy = 7.6896524, learning_rate = 0.0, train_accuracy = 0.0                          ## Training Start
I0809 07:54:59.046092 281473418508384 basic_session_run_hooks.py:262] cross_entropy = 7.6896524, learning_rate = 0.0, train_accuracy = 0.0
INFO:tensorflow:loss = 9.084746, step = 0
I0809 07:54:59.051896 281473418508384 basic_session_run_hooks.py:262] loss = 9.084746, step = 0
INFO:tensorflow:global_step/sec: 0.093041
I0809 07:55:09.795205 281473418508384 basic_session_run_hooks.py:702] global_step/sec: 0.093041
INFO:tensorflow:loss = 9.084746, step = 1 (10.747 sec)
I0809 07:55:09.799176 281473418508384 basic_session_run_hooks.py:260] loss = 9.084746, step = 1 (10.747 sec)  ## Ignore the elapsed time of the first step
INFO:tensorflow:global_step/sec: 0.409492
I0809 07:55:12.232794 281473418508384 basic_session_run_hooks.py:702] global_step/sec: 0.409492
INFO:tensorflow:loss = 9.078741, step = 2 (2.435 sec)
I0809 07:55:12.233708 281473418508384 basic_session_run_hooks.py:260] loss = 9.078741, step = 2 (2.435 sec)   ## The elapsed time will be short in the second step and later
INFO:tensorflow:global_step/sec: 0.428228                                                                     ## 2-2.5 sec/iter is good
I0809 07:55:14.568248 281473418508384 basic_session_run_hooks.py:702] global_step/sec: 0.428228
INFO:tensorflow:loss = 9.061764, step = 3 (2.335 sec)
I0809 07:55:14.569161 281473418508384 basic_session_run_hooks.py:260] loss = 9.061764, step = 3 (2.335 sec)   ## The step speed is shown until step 24
  (snip)
INFO:tensorflow:Starting to evaluate.                                                                         ## Evaluation
  (snip)
INFO:tensorflow:step = 1 time = 2.669 [sec]
I0809 07:56:43.429035 281473418508384 resnet_run_loop.py:777] step = 1 time = 2.669 [sec]                     ## Ignore the elapsed time of the first step
INFO:tensorflow:step = 2 time = 0.801 [sec]
I0809 07:56:44.231100 281473418508384 resnet_run_loop.py:777] step = 2 time = 0.801 [sec]                     ## For the second step and later steps, around 0.8 sec of elapsed time is good
INFO:tensorflow:Evaluation [2/25]
I0809 07:56:44.231565 281473418508384 evaluation.py:167] Evaluation [2/25]
INFO:tensorflow:step = 3 time = 0.801 [sec]
I0809 07:56:45.033299 281473418508384 resnet_run_loop.py:777] step = 3 time = 0.801 [sec]
INFO:tensorflow:step = 4 time = 0.801 [sec]
I0809 07:56:45.835202 281473418508384 resnet_run_loop.py:777] step = 4 time = 0.801 [sec]
  (snip)

The result from 12_train_resnet-4process.sh can be examined in the same way. This script invokes four TensorFlow processes, each running the same training and evaluation shown above, so the total amount of work is four times that of 11_train_resnet-single.sh.
Because of this, the result each TensorFlow process reports is slightly worse than in the single run.

$ bash 12_train_resnet-4process.sh   # Start Running (4proc, 12core/proc, use Horovod)
  (snip)
I0807 14:06:28.097440 281473036105824 basic_session_run_hooks.py:260] loss = 8.829225, step = 2 (2.641 sec)   ## High 2 seconds to low 3 seconds for the second and later steps
INFO:tensorflow:loss = 9.032612, step = 2 (2.647 sec)
I0807 14:06:28.097791 281473726658656 basic_session_run_hooks.py:260] loss = 9.032612, step = 2 (2.647 sec)   ## Each TensorFlow outputs the result
INFO:tensorflow:global_step/sec: 0.379547
INFO:tensorflow:global_step/sec: 0.377129
I0807 14:06:28.097860 281472940488800 basic_session_run_hooks.py:702] global_step/sec: 0.379547
INFO:tensorflow:loss = 9.16524, step = 2 (2.631 sec)
I0807 14:06:28.098806 281472940488800 basic_session_run_hooks.py:260] loss = 9.16524, step = 2 (2.631 sec)
I0807 14:06:28.098345 281473435875424 basic_session_run_hooks.py:702] global_step/sec: 0.377129
INFO:tensorflow:loss = 8.870877, step = 2 (2.644 sec)
I0807 14:06:28.099238 281473435875424 basic_session_run_hooks.py:260] loss = 8.870877, step = 2 (2.644 sec)
INFO:tensorflow:global_step/sec: 0.386664
INFO:tensorflow:global_step/sec: 0.386681
I0807 14:06:30.682776 281473036105824 basic_session_run_hooks.py:702] global_step/sec: 0.386664
I0807 14:06:30.683004 281473726658656 basic_session_run_hooks.py:702] global_step/sec: 0.386681
INFO:tensorflow:loss = 8.824974, step = 3 (2.586 sec)
I0807 14:06:30.683686 281473036105824 basic_session_run_hooks.py:260] loss = 8.824974, step = 3 (2.586 sec)
INFO:tensorflow:global_step/sec: 0.386693
INFO:tensorflow:loss = 9.028437, step = 3 (2.586 sec)
I0807 14:06:30.683933 281473726658656 basic_session_run_hooks.py:260] loss = 9.028437, step = 3 (2.586 sec)
I0807 14:06:30.683835 281472940488800 basic_session_run_hooks.py:702] global_step/sec: 0.386693
INFO:tensorflow:global_step/sec: 0.386609
INFO:tensorflow:loss = 9.161036, step = 3 (2.586 sec)
I0807 14:06:30.684726 281472940488800 basic_session_run_hooks.py:260] loss = 9.161036, step = 3 (2.586 sec)
I0807 14:06:30.684346 281473435875424 basic_session_run_hooks.py:702] global_step/sec: 0.386609
INFO:tensorflow:loss = 8.866662, step = 3 (2.586 sec)
I0807 14:06:30.685310 281473435875424 basic_session_run_hooks.py:260] loss = 8.866662, step = 3 (2.586 sec)
  (snip)
INFO:tensorflow:Starting to evaluate.                                                                         ## Inference
  (snip)
I0809 08:02:16.188597 281473151645792 resnet_run_loop.py:777] step = 2 time = 0.697 [sec]                     ## For the second step and later steps, around 0.8 sec of elapsed time is good
INFO:tensorflow:Evaluation [2/25]                                                                             ## More variability can be observed in four-parallel inference
I0809 08:02:16.189082 281473151645792 evaluation.py:167] Evaluation [2/25]
INFO:tensorflow:step = 2 time = 0.689 [sec]
I0809 08:02:16.301140 281472830257248 resnet_run_loop.py:777] step = 2 time = 0.689 [sec]
INFO:tensorflow:Evaluation [2/25]
I0809 08:02:16.301596 281472830257248 evaluation.py:167] Evaluation [2/25]
INFO:tensorflow:step = 1 time = 2.811 [sec]
I0809 08:02:16.357173 281472819312736 resnet_run_loop.py:777] step = 2 time = 0.745 [sec]
INFO:tensorflow:Evaluation [2/25]
I0809 08:02:16.357645 281472819312736 evaluation.py:167] Evaluation [2/25]
INFO:tensorflow:step = 3 time = 0.715 [sec]
I0809 08:02:16.904115 281473151645792 resnet_run_loop.py:777] step = 3 time = 0.715 [sec]
INFO:tensorflow:step = 3 time = 0.709 [sec]
I0809 08:02:17.010920 281472830257248 resnet_run_loop.py:777] step = 3 time = 0.709 [sec]
INFO:tensorflow:step = 2 time = 0.718 [sec]

02_OpenNMT

The model learns to translate from paired English and German sentences.

https://github.com/OpenNMT/OpenNMT-tf
Tag: v2.11.0 (2020/6/17)

$ pwd
/home/user/tensorflow/fcc_build_script/sample_script/02_OpenNMT

$ bash 20_setup_OpenNMT.sh                           # Setup (2 to 3 min.)
$ bash 21_train_OpenNMT_Transformer-single.sh        # Run the model (1node, 1proc, 24core, en-de) (7 min.)
$ bash 22_train_OpenNMT_Transformer-2process.sh      # Run the model (1node, 2proc, 24core/proc, en-de) (7 min.)

The operating speed can be checked in the source words/s field of the output.
Since the speed of this task fluctuates greatly, check the maximum value.

INFO:tensorflow:Step = 1 ; steps/s = 0.00, source words/s = 20, target words/s = 20 ; Learning rate = 0.000000 ; Loss = 10.504834
INFO:tensorflow:Saved checkpoint run/ckpt-1
INFO:tensorflow:Step = 2 ; steps/s = 0.13, source words/s = 736, target words/s = 735 ; Learning rate = 0.000000 ; Loss = 10.511767
INFO:tensorflow:Step = 3 ; steps/s = 0.24, source words/s = 1385, target words/s = 1347 ; Learning rate = 0.000000 ; Loss = 10.509029
INFO:tensorflow:Step = 4 ; steps/s = 0.22, source words/s = 1251, target words/s = 1243 ; Learning rate = 0.000001 ; Loss = 10.506723
INFO:tensorflow:Step = 5 ; steps/s = 0.24, source words/s = 1343, target words/s = 1330 ; Learning rate = 0.000001 ; Loss = 10.499746

On FX700 (2.0GHz), the maximum speed of 21_train_OpenNMT_Transformer-single.sh is about 1350, and the maximum speed of 22_train_OpenNMT_Transformer-2process.sh is about 2400.
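To find that maximum without reading the log by eye, a pipeline like the following can be used. This is a sketch: the scratch-file name is hypothetical, and the sample lines are copied from the output shown above.

```shell
# Write a few sample log lines (copied from this manual) to a scratch file,
# then extract the maximum "source words/s" value with grep + awk.
cat > /tmp/opennmt_sample.log <<'EOF'
INFO:tensorflow:Step = 2 ; steps/s = 0.13, source words/s = 736, target words/s = 735 ; Learning rate = 0.000000 ; Loss = 10.511767
INFO:tensorflow:Step = 3 ; steps/s = 0.24, source words/s = 1385, target words/s = 1347 ; Learning rate = 0.000000 ; Loss = 10.509029
INFO:tensorflow:Step = 4 ; steps/s = 0.22, source words/s = 1251, target words/s = 1243 ; Learning rate = 0.000001 ; Loss = 10.506723
EOF
grep -o 'source words/s = [0-9]*' /tmp/opennmt_sample.log \
  | awk '{ if ($NF + 0 > max) max = $NF + 0 } END { print max }'   # prints 1385
```

Point the grep at your actual training log to check your own run.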

03_Bert

Use the official model from Google.

https://github.com/tensorflow/models/official/nlp/bert
Tag: v2.2.0 (2020/4/15)

BERT performs two types of tasks.

$ pwd
/home/user/tensorflow/fcc_build_script/sample_script/03_Bert

$ bash 300_setup_bert.sh                                # Setup (2 to 3 min.)
$ bash 311_create_pretraining_data.sh                   # Prepare the data for the first task (1 min.)
$ bash 312_run_pretraining.sh                           # Run the first task (1node, 1proc, 24core) (10 min.)
$ bash 313_run_pretraining-2process.sh                  # Run the first task (1node, 2proc, 24core/proc) (8 min.)
$ bash 321_create_finetuning_data.sh                    # Prepare the data for the second task (1 min.)
$ bash 322_run_finetuning.sh                            # Run the second task (1node, 1proc, 24core) (8 min.)
$ bash 323_run_finetuning-2process.sh                   # Run the second task (1node, 2proc, 24core/proc) (8 min.)

The running speed of both tasks can be checked in the TimeHistory lines of the output.

I0807 14:45:34.346899 281473323284576 keras_utils.py:119] TimeHistory: 17.58 seconds, 10.92 examples/second between steps 88 and 92
I0807 14:45:34.350614 281473323284576 model_training_utils.py:444] Train Step: 92/100  / loss = 7.7405781745910645
I0807 14:45:38.730090 281473323284576 model_training_utils.py:444] Train Step: 93/100  / loss = 8.055837631225586
I0807 14:45:43.144752 281473323284576 model_training_utils.py:444] Train Step: 94/100  / loss = 7.836669445037842
I0807 14:45:47.515263 281473323284576 model_training_utils.py:444] Train Step: 95/100  / loss = 7.881825923919678
I0807 14:45:51.936028 281473323284576 keras_utils.py:119] TimeHistory: 17.57 seconds, 10.93 examples/second between steps 92 and 96
I0807 14:45:51.940003 281473323284576 model_training_utils.py:444] Train Step: 96/100  / loss = 7.87914514541626
I0807 14:45:56.288466 281473323284576 model_training_utils.py:444] Train Step: 97/100  / loss = 7.819365978240967
I0807 14:46:00.629250 281473323284576 model_training_utils.py:444] Train Step: 98/100  / loss = 7.822110652923584
I0807 14:46:05.090573 281473323284576 model_training_utils.py:444] Train Step: 99/100  / loss = 7.909728527069092
I0807 14:46:09.439723 281473323284576 keras_utils.py:119] TimeHistory: 17.48 seconds, 10.98 examples/second between steps 96 and 100

On FX700(2.0GHz), the expected speed is

  • For 312_run_pretraining.sh, around 17 sec. (11 examples/second)
  • For 313_run_pretraining-2process.sh, around 22 sec. (9 examples/second)

For the second task on FX700 (2.0GHz), the expected speed is

  • For 322_run_finetuning.sh, around 40 sec. (11 examples/second)
  • For 323_run_finetuning-2process.sh, around 54 sec. (9 examples/second)

04_Mask-R-CNN

Use the official model from Google.

https://github.com/tensorflow/models/research/object_detection
Commit id: dc4d11216b (2020/11/8)

  $ pwd
  /home/user/tensorflow/fcc_build_script/sample_script/04_Mask-R-CNN
  
  $ bash 40_setup_mask-r-cnn.sh                           # Setup (30 min.)
  $ bash 41-0_download_traindata.sh                       # Download the training data (26GB; download only, no build work)
  $ bash 41-1_setup_traindata.sh                          # Prepare the training data (3 hour 30 min.)
  $ bash 42_train_maskrcnn_single.sh                      # Run the task (1node, 1proc, 24core) (8 min.)
  $ bash 43_train_maskrcnn_multi.sh                       # Run the task (1node, 2proc, 24core/proc) (25 min.)

The running speed of the task can be checked in the per-step time output.

  INFO:tensorflow:Step 1 per-step time 284.779s loss=9.175
  INFO:tensorflow:Step 2 per-step time 4.036s loss=8.950
  INFO:tensorflow:Step 3 per-step time 4.115s loss=8.899
  INFO:tensorflow:Step 4 per-step time 3.995s loss=8.738
  INFO:tensorflow:Step 5 per-step time 4.150s loss=8.442
  INFO:tensorflow:Step 6 per-step time 3.905s loss=8.288

On FX700(2.0GHz), the expected speed is around 4 sec. for both 42_train_maskrcnn_single.sh and 43_train_maskrcnn_multi.sh.

4. Troubleshooting

During the download phase, git fails with 'reference is not a tree'

This can happen with an old version of git. Use git v2.0 or later.

In '04_make_venv.sh', an error occurs while building numpy.

Two causes are possible.

  • If you get an error about _ctypes, you are missing libffi-devel; install it with yum.
  • If the Fortran compiler cannot be found, install gcc-gfortran with yum.

pip doesn't accept the --no-color option

Under a job scheduler, pip may not accept this option. Edit env.src to remove --no-color from PIP3_OPTIONS.

python3 is not working

When all of the following conditions are met, you will get "cannot execute binary file: Exec format error" message.

  • Offline installation is being performed.
  • The download system is other than FX1000 or FX700 (e.g. PRIMERGY or other x86 server).
  • The download system and target system share the network storage, and you are trying to install on it.
  • You have already built TensorFlow and are going to build a sample model later.

In this case, please do one of the following:

  1. Download everything first, then build.
  2. Separate the download directory from the build directory.
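One way to confirm that an architecture mismatch is the cause is to compare the architecture the `file` command reports for the binary against the machine you are on. The helper below is a sketch: arch_matches is a hypothetical name, and PREFIX stands for the install prefix from env.src.

```shell
# Sketch for diagnosing "Exec format error": does the architecture string
# in file(1) output contain this machine's architecture (uname -m)?
arch_matches() {  # usage: arch_matches "<file -b output>" "<uname -m>"
  case "$1" in
    *"$2"*) echo match ;;
    *)      echo mismatch ;;
  esac
}

# On the target system you would run something like:
#   arch_matches "$(file -b "$PREFIX/bin/python3")" "$(uname -m)"
arch_matches "ELF 64-bit LSB executable, ARM aarch64" aarch64   # prints "match"
arch_matches "ELF 64-bit LSB executable, x86-64" aarch64        # prints "mismatch"
```

A mismatch means the binary was built on the download system's architecture and must be rebuilt on the target.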

5. List of Software Versions

| Software | Version | License | Remarks |
| - | - | - | - |
| Python | 3.8.2 | GPL | |
| TensorFlow | 2.2.0 | Apache 2.0 | |
| bazel | 2.0.0 | Apache 2.0 | |
| oneDNN | v2.1.0L01_aarch64 | | |
| BatchedBlas | 1.0 | BSD-3 | |
| Horovod | 0.19.5 | Apache 2.0 | |

pip3 list

The following list shows the result of an installation performed on 2021-08-09. Different versions may be installed depending on the date and time of installation.

Package                Version
---------------------- ---------
absl-py                0.13.0
astunparse             1.6.3
cachetools             4.2.2
certifi                2021.5.30
charset-normalizer     2.0.4
cloudpickle            1.6.0
contextlib2            21.6.0
cppy                   1.1.0
cycler                 0.10.0
Cython                 0.29.24
dataclasses            0.6
gast                   0.3.3
gin-config             0.4.0
google-auth            1.34.0
google-auth-oauthlib   0.4.5
google-pasta           0.2.0
grpcio                 1.29.0
h5py                   2.10.0
horovod                0.19.5
idna                   3.2
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.2
kiwisolver             1.3.1
lvis                   0.5.3
lxml                   4.6.3
Markdown               3.3.4
matplotlib             3.3.2
numpy                  1.18.4
oauthlib               3.1.1
OpenNMT-tf             2.11.0
opt-einsum             3.3.0
Pillow                 7.2.0
pip                    19.2.3
portalocker            2.0.0
protobuf               3.17.3
psutil                 5.8.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pybind11               2.7.1
pycocotools            2.0.2
pyonmttok              1.18.3
pyparsing              2.4.7
pyter3                 0.3
pytest-runner          5.3.1
python-dateutil        2.8.2
PyYAML                 5.3.1
requests               2.26.0
requests-oauthlib      1.3.0
rouge                  1.0.1
rsa                    4.7.2
sacrebleu              1.5.1
scipy                  1.4.1
sentencepiece          0.1.96
setuptools             41.2.0
six                    1.16.0
tensorboard            2.2.2
tensorboard-plugin-wit 1.8.0
tensorflow             2.2.0
tensorflow-addons      0.10.0
tensorflow-estimator   2.2.0
tensorflow-hub         0.12.0
termcolor              1.1.0
tf-slim                1.1.0
typeguard              2.12.1
urllib3                1.26.6
Werkzeug               2.0.1
wheel                  0.36.2
wrapt                  1.12.1

Copyright

Copyright RIKEN, Japan 2021
Copyright FUJITSU LIMITED 2021
