TensorFlow* is a widely used machine learning framework in the deep learning arena, demanding efficient use of computational resources. To take full advantage of Intel® architecture and extract maximum performance, the TensorFlow framework has been optimized using Intel® oneAPI Deep Neural Network Library (oneDNN) primitives. This sample demonstrates how to train an example neural network and shows how Intel-optimized TensorFlow enables oneDNN calls by default.
| Optimized for | Description |
| --- | --- |
| OS | Linux* Ubuntu* 18.04 and later, Windows* 10 |
| Hardware | Intel® Xeon® Scalable processor family or newer |
| Software | Intel® AI Analytics Toolkit |
| What you will learn | How to get started using Intel® Optimization for TensorFlow* |
| Time to complete | 10 minutes |
This sample code shows how to get started with Intel® Optimization for TensorFlow*. It implements an example neural network with one convolution layer and one ReLU layer. Developers can quickly build and train a TensorFlow neural network using simple Python code. In addition, by controlling a built-in environment variable, the sample explicitly shows how Intel® oneDNN primitives are called and how they perform during neural network training.
Intel-optimized TensorFlow is available as part of the Intel® AI Analytics Toolkit. For more information on the optimizations and performance data, see the blog post TensorFlow* Optimizations on Modern Intel® Architecture.
Please export the environment variable `ONEDNN_VERBOSE=1` to display the trace of deep learning primitives during execution.
- The training data is generated by `np.random`.
- The neural network with one convolution layer and one ReLU layer is created using `tf.nn.conv2d` and `tf.nn.relu`.
- The TF session is initialized by `tf.global_variables_initializer`.
- The training is implemented via the for-loop below (an end-to-end sketch follows this list):
```python
for epoch in range(0, EPOCHNUM):
    for step in range(0, BS_TRAIN):
        x_batch = x_data[step*N:(step+1)*N, :, :, :]
        y_batch = y_data[step*N:(step+1)*N, :, :, :]
        s.run(train, feed_dict={x: x_batch, y: y_batch})
```
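Putting these pieces together, the network construction can be sketched as follows. This is a minimal illustrative reconstruction, not the sample script itself: the tensor shapes, filter size, loss function, and learning rate are assumptions, and the names `EPOCHNUM`, `BS_TRAIN`, and `N` simply mirror the loop above.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style session API, as used by the sample

tf.disable_v2_behavior()

# Illustrative sizes (assumptions, not taken from the script)
N, BS_TRAIN, EPOCHNUM = 4, 10, 5
x_data = np.random.rand(N * BS_TRAIN, 28, 28, 4).astype(np.float32)
y_data = np.random.rand(N * BS_TRAIN, 28, 28, 10).astype(np.float32)

x = tf.placeholder(tf.float32, shape=[None, 28, 28, 4])
y = tf.placeholder(tf.float32, shape=[None, 28, 28, 10])

# One convolution layer followed by one ReLU layer
W = tf.Variable(tf.random_normal([3, 3, 4, 10]))
b = tf.Variable(tf.zeros([10]))
conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
relu = tf.nn.relu(conv + b)

loss = tf.losses.mean_squared_error(y, relu)
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    for epoch in range(0, EPOCHNUM):
        for step in range(0, BS_TRAIN):
            x_batch = x_data[step*N:(step+1)*N, :, :, :]
            y_batch = y_data[step*N:(step+1)*N, :, :, :]
            _, l = s.run([train, loss], feed_dict={x: x_batch, y: y_batch})
        print(epoch, l)  # one loss value per epoch, as in the sample output below
```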
Note: For convenience, the line `os.environ["ONEDNN_VERBOSE"] = "1"` has been added to the body of the script as an alternative way to set this variable.
Runtime settings for `ONEDNN_VERBOSE`, `KMP_AFFINITY`, and inter-/intra-op threads are set within the script. You can read more about these settings in this dedicated document: Maximize TensorFlow Performance on CPU: Considerations and Recommendations for Inference Workloads.
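As a rough sketch of how such runtime settings are typically applied in Python (the exact values used by the sample script may differ; the thread counts below are placeholders):

```python
import os

# Environment variables must be set before TensorFlow (and oneDNN) initializes.
os.environ["ONEDNN_VERBOSE"] = "1"                           # print the oneDNN primitive trace
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin OpenMP threads to cores

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 4  # threads available to a single op (e.g., physical cores)
config.inter_op_parallelism_threads = 1  # independent ops that may run concurrently
s = tf.Session(config=config)
```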
Code samples are licensed under the MIT license. See License.txt for details.
Third-party program licenses can be found here: third-party-programs.txt
These instructions demonstrate how to build and run a sample on a machine where you have installed the Intel AI Analytics Toolkit. If you would like to try a sample without installing a toolkit, see Running Samples in DevCloud.
TensorFlow is ready for use once you finish the Intel AI Analytics Toolkit installation. You can refer to the oneAPI main page for toolkit installation and the Toolkit Getting Started Guide for Linux for post-installation steps and scripts.
Note: If you have not already done so, set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation.

Linux Sudo: `. /opt/intel/oneapi/setvars.sh`

Linux User: `. ~/intel/oneapi/setvars.sh`

Windows: `C:\Program Files (x86)\Intel\oneAPI\setvars.bat`
For more information on environment variables, see Use the setvars Script for Linux or macOS, or Windows.
Activate the conda environment:

```
conda activate tensorflow
```

Please replace `~/intel/oneapi` with your oneAPI installation path.
By default, the Intel AI Analytics Toolkit is installed in the `/opt/intel/oneapi` folder, which requires root privileges to manage. If you would like to bypass using root access to manage your conda environment, you can clone your desired conda environment using the following command:
```
conda create --name user_tensorflow --clone tensorflow
```

Then activate your conda environment with the following command:

```
conda activate user_tensorflow
```
To run the program on Linux*, type the following commands in a terminal with Python installed:

- Navigate to the directory with the TensorFlow sample:

  ```
  cd ~/oneAPI-samples/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
  ```

- Run the sample:

  ```
  python TensorFlow_HelloWorld.py
  ```
Upon successful execution, the script prints output similar to the following:
```
0 0.4147554
1 0.3561021
2 0.33979267
3 0.33283564
4 0.32920069
[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]
```
If you export `ONEDNN_VERBOSE=1` on the command line, the oneDNN runtime verbose trace should look similar to the output shown below:
Linux:

```
export ONEDNN_VERBOSE=1
```

Windows:

```
set ONEDNN_VERBOSE=1
```

Note: historical names for this environment variable include `DNNL_VERBOSE` and `MKLDNN_VERBOSE`.
Then run the sample again:
```
python TensorFlow_HelloWorld.py
```
You will see the verbose output:
```
2022-04-24 16:56:02.497963: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
onednn_verbose,info,oneDNN v2.5.0 (commit N/A)
onednn_verbose,info,cpu,runtime:OpenMP
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:cdba:f dst_f32:p:blocked:Acdb16a:f,,,10x4x3x3,0.00195312
onednn_verbose,exec,cpu,convolution,brgconv:avx512_core,forward_training,src_f32::blocked:acdb:f wei_f32:p:blocked:Acdb16a:f bia_f32::blocked:a:f dst_f32::blocked:acdb:f,attr-post-ops:eltwise_relu ,alg:convolution_direct,mb,4.96411
onednn_verbose,exec,cpu,convolution,jit:avx512_common,backward_weights,src_f32::blocked:acdb:f wei_f32:p:blocked:Acdb16a:f bia_undef::undef::f dst_f32::blocked:acdb:f,,alg:convolution_direct,mb,0.567871
...
```
Please see the oneDNN Developer's Guide for more details on the verbose log.
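A long trace can be hard to scan. As an illustration only, the hypothetical helper below (not part of the sample) sums the reported execution times per primitive, assuming the v2.5 field layout shown in the `prim_template` line above:

```python
# summarize_verbose.py (hypothetical helper, not part of the sample)
# Usage: python TensorFlow_HelloWorld.py | python summarize_verbose.py
import sys
from collections import defaultdict

totals = defaultdict(float)
for line in sys.stdin:
    fields = line.strip().split(",")
    # exec records look like: onednn_verbose,exec,<engine>,<primitive>,...,<time in ms>
    if len(fields) > 4 and fields[0] == "onednn_verbose" and fields[1] == "exec":
        try:
            totals[fields[3]] += float(fields[-1])
        except ValueError:
            pass  # skip malformed or truncated lines

# Print primitives sorted by total execution time, largest first
for primitive, ms in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{primitive:15s} {ms:10.3f} ms")
```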
Please refer to using samples in DevCloud for general usage instructions.
- Navigate to the directory with the TensorFlow sample:

  ```
  cd ~/oneAPI-samples/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
  ```

- Submit the "TensorFlow_HelloWorld" workload on the selected node with the run script:

  ```
  ./q ./run.sh
  ```

The `run.sh` script contains all the instructions needed to run the "TensorFlow_HelloWorld" workload.
If an error occurs, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits. Learn more.
You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.
The basic steps to build and run a sample using VS Code include:
- Download a sample using the extension Code Sample Browser for Intel oneAPI Toolkits.
- Configure the oneAPI environment with the extension Environment Configurator for Intel oneAPI Toolkits.
- Open a Terminal in VS Code (Terminal > New Terminal).
- Run the sample in the VS Code terminal using the instructions below.
- (Linux only) Debug your GPU application with GDB for Intel® oneAPI toolkits using the Generate Launch Configurations extension.
To learn more about the extensions, see Using Visual Studio Code with Intel® oneAPI Toolkits.
After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.