
Intel® Optimizations for TensorFlow 2.11.0

@justkw justkw released this 29 Nov 02:08
b4afce6

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.11.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes introduced in TensorFlow 2.11.0, please also see the TensorFlow 2.11 release notes.

These release notes cover both Intel® Optimizations for TensorFlow* and official TensorFlow v2.11. In official TensorFlow v2.11, oneDNN optimizations are enabled by default in Linux x86 packages on CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, and AMX, which are found on Intel Cascade Lake and newer CPUs.
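To check whether a Linux machine exposes the hardware features named above, the CPU flags can be inspected. The following is a minimal sketch, not part of the release: the function name `nn_features` and the `NN_FLAGS` tuple are illustrative, and the flag spellings are the lowercase names assumed to appear on the `flags` line of `/proc/cpuinfo`.

```python
from pathlib import Path

# Lowercase flag names as assumed to appear in the Linux /proc/cpuinfo "flags" line.
NN_FLAGS = ("avx512_vnni", "avx512_bf16", "amx_tile", "amx_bf16", "amx_int8")


def nn_features(cpuinfo_text):
    """Return the entries of NN_FLAGS found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())  # exact token match, not substring
            return [flag for flag in NN_FLAGS if flag in present]
    return []  # no 'flags' line (for example, not Linux x86)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file if it exists; return an empty string otherwise."""
    p = Path(path)
    return p.read_text() if p.exists() else ""
```

Calling `nn_features(read_cpuinfo())` returns the subset of these flags reported by the first logical CPU; an empty list suggests that none of the listed features were detected.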

Major features:

· Please see the TensorFlow 2.11.0 release notes
· Further performance improvements for Bfloat16 models with AMX optimizations; more operations now support the BF16 datatype.
· A new set of INT8 APIs has been added to improve performance.
· Supported platforms: Linux.

Improvements:

· Updated oneDNN to version v2.7.1
· Added MLIR support for Contraction + BiasAdd + fusion
· Added a new set of APIs for quantized convolution ops/fusions that consolidates many existing convolution ops/fusions into a few. With the new ops API, a single op covers several INT8 fusions.
· Fused the Mul-Max pattern into LeakyRelu, improving performance by 9% on various GAN models.
· Enabled BF16 support for Conv3DBackpropFilterV2 and Conv3DBackpropInputV2 for performance improvement
· Enabled fp32 & bf16 Einsum for CPU by default for performance improvement
· Improved performance by ~15-20% for several models, including EfficientDet/EfficientNet, IcNet and more, by updating the AMP mkl lists
· Enabled user mode scratchpad for inner-product (FusedMatMul & quantized MatMul) for better memory usage control
· Unblocked the Matmul+Add(Bias) fusion and added a test case for it
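The Mul-Max fusion listed above rests on a simple identity: for a slope `alpha` between 0 and 1, `max(x, alpha * x)` equals LeakyRelu of `x`. The sketch below only illustrates that identity in plain Python; the function names are hypothetical and it does not reproduce the actual graph rewrite.

```python
def leaky_relu(x, alpha):
    """Reference LeakyRelu: identity for x >= 0, scaled by alpha otherwise."""
    return x if x >= 0 else alpha * x


def mul_max(x, alpha):
    """The unfused pattern: a Mul by alpha followed by a Max with the input."""
    return max(x, alpha * x)


alpha = 0.2  # the identity assumes 0 <= alpha <= 1
samples = [-3.0, -0.5, 0.0, 0.25, 2.0]

# Pair the two results for each sample so they can be compared.
results = [(leaky_relu(x, alpha), mul_max(x, alpha)) for x in samples]
all_equal = all(fused == unfused for fused, unfused in results)
```

Because both forms compute the same value, replacing the two-op pattern with a single LeakyRelu changes only the number of ops executed, which is where a speedup would come from.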

Bug fixes:

· TensorFlow 2.11.0 resolved issues
· oneDNN 2.7.1 resolved issues
· Fixed all static scan analysis findings
· Fixed AvgPool floating-point issue
· Fixed AvgPool3D floating-point issue
· Fixed memory corruption issue in AvgPool3D when oneDNN is enabled
· Fixed Integer divide-by-0 during fused convolution with oneDNN on CPUs supporting AVX512 instructions
· Fixed the primitive cache key, which could cause problems in primitive caching in rare cases where a model has FusedConv2D/3D nodes with exactly the same dimensions and parameters, differing only in the fused activation function
· Fixed _FusedConv2D crash when oneDNN is enabled
· Fixed LeakyRelu in the grappler remapper fusion (Pad + Conv3D + BiasAdd + Activation)
· Fixed unit test failure //tensorflow/python/grappler:remapper_test
· Fixed build failure by adding patch to openmp build
· Fixed memory corruption issue with oneDNN primitive cache
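The primitive cache key fix above addresses a classic cache-collision pattern. The sketch below is a simplified model of that failure mode, not the actual oneDNN or TensorFlow code: `FusedConvParams`, both key functions, and the cached strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusedConvParams:
    """Hypothetical stand-in for the parameters of a fused convolution node."""
    input_shape: tuple
    filter_shape: tuple
    strides: tuple
    activation: str


def key_without_activation(p):
    # Problematic key: two nodes that differ only in activation collide.
    return (p.input_shape, p.filter_shape, p.strides)


def key_with_activation(p):
    # Corrected key: the fused activation is part of the key.
    return (p.input_shape, p.filter_shape, p.strides, p.activation)


relu = FusedConvParams((1, 8, 8, 3), (3, 3, 3, 16), (1, 1), "Relu")
leaky = FusedConvParams((1, 8, 8, 3), (3, 3, 3, 16), (1, 1), "LeakyRelu")

cache = {key_without_activation(relu): "primitive built for Relu"}

# The LeakyRelu node looks up its key and wrongly gets the Relu primitive.
collision = cache.get(key_without_activation(leaky))

# With the activation in the key, the two nodes no longer share an entry.
keys_distinct = key_with_activation(relu) != key_with_activation(leaky)
```

Including every parameter that affects the built primitive in the key is the general remedy; the cost is a slightly larger key and potentially more cache entries.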

Versions and components

• Intel optimized TensorFlow based on TensorFlow v2.11.0: https://github.com/Intel-tensorflow/tensorflow/tree/r2.11.0_intel_release
• TensorFlow v2.11.0: https://github.com/tensorflow/tensorflow/tree/v2.11.0
• oneDNN: https://github.com/oneapi-src/oneDNN/releases/tag/v2.7.1
• Model Zoo: https://github.com/IntelAI/models

Known issues

· Bfloat16 is not guaranteed to work on AVX or AVX2.
· On Windows, to use oneDNN-enabled TensorFlow, users need to run “set TF_ENABLE_ONEDNN_OPTS=1”. Also, if the PC has hyperthreading enabled, users need to bind the ML application to one logical core per CPU in order to get the best runtime performance.
· To get the best performance on Windows, use the initialization script from the following link: https://github.com/IntelAI/models/blob/r2.7/benchmarks/common/windows_intel1dnn_setenv.bat
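As an alternative to running the `set` command in the shell, the same variable can be set from Python. This is a minimal sketch under the assumption that TensorFlow reads `TF_ENABLE_ONEDNN_OPTS` when it is loaded, so the variable must be set before `import tensorflow`; the helper name `enable_onednn_opts` is illustrative.

```python
import os


def enable_onednn_opts(env=os.environ):
    """Set TF_ENABLE_ONEDNN_OPTS=1 unless the user has already chosen a value.

    Returns the value that is in effect afterwards.
    """
    env.setdefault("TF_ENABLE_ONEDNN_OPTS", "1")
    return env["TF_ENABLE_ONEDNN_OPTS"]


# Call this before importing TensorFlow (assumption: the flag is read at load time).
effective_value = enable_onednn_opts()
# import tensorflow as tf  # uncomment in an environment with TensorFlow installed
```

Using `setdefault` keeps an explicit user setting such as `TF_ENABLE_ONEDNN_OPTS=0` intact, which is convenient when comparing runs with and without oneDNN optimizations.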