Intel® Optimizations for TensorFlow 2.14 

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.14.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For the features and fixes introduced upstream, see the TensorFlow 2.14.0 release notes.

These release notes cover optimizations made both in Intel® Optimization for TensorFlow* and in official TensorFlow v2.14.0, which enables oneDNN optimizations by default in Linux x86 packages and for CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, and AMX, which are found on Intel Cascade Lake and newer CPUs.

Breaking changes:

Intel® Optimization for TensorFlow*, version 2.14, will not receive any additional security or other updates after March 31, 2024, and is provided as is. Beginning March 31, 2024, Intel recommends that users of Intel® Optimization for TensorFlow*, version 2.14, uninstall it, discontinue its use, and install Intel® Extension for TensorFlow*, version 2.14, which provides all available optimizations. No changes to code or installation setup are needed. More information on Intel's TensorFlow extension plugin is available at https://github.com/intel/intel-extension-for-tensorflow.

oneDNN v3.0 has introduced a new quantization scheme where bias is applied after dequantization. As a result, some INT8 models may have sub-optimal performance if they contain many int32 bias nodes for convolution. For such models, we recommend that users re-quantize the graph to float32 bias using Intel® Neural Compressor.
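
For illustration, a minimal sketch of such a re-quantization with Intel® Neural Compressor's post-training quantization API (assuming Intel® Neural Compressor 2.x; the model paths and the dummy calibration dataloader below are hypothetical placeholders):

```python
# Minimal sketch: re-quantizing an FP32 TensorFlow model with Intel Neural
# Compressor so the resulting INT8 graph carries FP32 bias, as recommended
# above. Paths and calibration data are hypothetical placeholders.
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.data import DataLoader, Datasets

# Dummy calibration data; replace with a dataloader over real samples.
dataset = Datasets("tensorflow")["dummy"](shape=(100, 224, 224, 3))
calib_dataloader = DataLoader(framework="tensorflow", dataset=dataset)

config = PostTrainingQuantConfig()  # default post-training static quantization
q_model = quantization.fit(
    model="./fp32_saved_model",  # hypothetical path to the FP32 model
    conf=config,
    calib_dataloader=calib_dataloader,
)
q_model.save("./int8_saved_model")  # hypothetical output path
```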

Major features:  

  • See TensorFlow 2.14.0 release notes
  • Enabled oneDNN v3.x by default on both Linux and Windows x86 builds (a verification sketch follows this list).
  • Enabled ITT tagging by default for oneDNN primitives in Intel® VTune™ Profiler, which helps users find performance bottlenecks and provides detailed platform information such as L1/L2 cache misses or the level of AVX512 vectorization.
  • Upgraded to oneDNN v3.2.1.
  • Supported platforms: Linux
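
To confirm that a given installation is running the oneDNN code paths, the standard TF_ENABLE_ONEDNN_OPTS switch can be checked from Python. A minimal sketch (the exact log message text may vary between builds):

```python
# Minimal sketch: confirming this build runs with oneDNN optimizations.
# TF_ENABLE_ONEDNN_OPTS must be set before TensorFlow is imported; builds
# with oneDNN enabled typically log "oneDNN custom operations are on"
# at import time. Setting the variable to "0" disables the oneDNN paths.
import os

os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf

# A small matmul is enough to exercise an oneDNN-backed CPU kernel.
x = tf.random.normal([256, 256])
print(tf.matmul(x, x).shape)
print(tf.__version__)
```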

Improvements: 

  • Enabled caching of the scaled bias in QuantizedMatmul with oneDNN v3.x and enabled weight caching for the Matmul op with oneDNN v3.x.
  • Added oneDNN v3.x support to the following ops: FusedInstanceNorm, fused-matmul, batch-matmul, MKL CBLAS matmul, Einsum, QuantizedMatmul, QuantizedConvolution ops/fusions, and max-pooling and avg-pooling forward and backward (with primitive cache) for FP32 and BF16 (see the sketch after this list). The main changes in the oneDNN v3.x quantization API are that a scale needs to be set for each tensor and the bias needs to be passed as FP32.
  • Enabled reorder primitive cache and oneDNN v3.x benchmark tests.
  • Enabled weight caching in oneDNN convolution ops for oneDNN v3.x.
  • Added 3D support to the layout optimizer.
  • Cleaned up code to avoid potential bugs such as nullptr dereferences and out-of-bound memory accesses.
  • Upgraded the curl version to pick up vulnerability fixes.
  • Enabled valid Eigen kernels for FusedBatchNormV3 and its gradient on CPU, along with the relevant tests.
  • Updated the oneDNN fused conv2d op signature to align with the generic fused conv2d.
  • Enabled rsync to work on Windows by converting the file path passed to rsync from Windows-style to Linux-style.
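
Several of the ops listed above (batch-matmul, Einsum, pooling) can be exercised on the oneDNN v3.x BF16 paths directly from Python. A minimal sketch, with arbitrary illustrative shapes:

```python
# Minimal sketch: running ops that now have oneDNN v3.x kernels in BF16.
# Shapes are arbitrary; on CPUs with AVX512_BF16 or AMX these ops can map
# to reduced-precision oneDNN primitives.
import tensorflow as tf

a = tf.cast(tf.random.normal([8, 64, 128]), tf.bfloat16)
b = tf.cast(tf.random.normal([8, 128, 32]), tf.bfloat16)

batch_mm = tf.matmul(a, b)                   # batch-matmul path
einsum_mm = tf.einsum("bij,bjk->bik", a, b)  # Einsum path

images = tf.cast(tf.random.normal([1, 32, 32, 16]), tf.bfloat16)
pooled = tf.nn.max_pool2d(images, ksize=2, strides=2, padding="SAME")

print(batch_mm.shape, einsum_mm.shape, pooled.shape)
```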

Bug fixes: 

  • Resolved issues in oneDNN v3.2.1. 
  • Fixed all issues found during static scan analyses. 
  • Changed the kernel registry hash map to be allocated as a unique pointer to avoid a possible memory leak.
  • Fixed incorrect use of int for dimension sizes when invoking the oneDNN GEMM and Matmul primitives.
  • Updated all occurrences of dimension size usage to the int64_t data type across all oneDNN kernel implementations.
  • Fixed failing ResNet50 v2 benchmark tests by passing parameters in the correct order.
  • Fixed a corner case in swish and mish op fusion by making sure that not only the input ops match but the tensors match as well.
  • Fixed a performance issue observed when TF_ONEDNN_THREADPOOL_USE_CALLER_THREAD is enabled (which allows one task to run on the caller thread) by using the original threadpool scheduling approach in that case (see the sketch after this list).
  • Fixed a performance issue by removing a log statement added to execute.cc.
  • Fixed mkl_eager_op_rewrite_test by updating the test to use a raw pointer.
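
The caller-thread fix above applies to an opt-in environment variable; a minimal sketch of enabling it (the variable is read at startup, so it must be set before TensorFlow is imported):

```python
# Minimal sketch: opting in to running one oneDNN task on the caller thread.
# TF_ONEDNN_THREADPOOL_USE_CALLER_THREAD is read when TensorFlow starts,
# so it must be set before the import.
import os

os.environ["TF_ONEDNN_THREADPOOL_USE_CALLER_THREAD"] = "1"

import tensorflow as tf

x = tf.random.normal([512, 512])
print(tf.matmul(x, x).numpy().mean())
```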

Versions and components: