
v0.16

@tprimak tprimak released this 15 Aug 00:28

Performance optimizations

  • Improved performance of int8 convolutions whose input and output channel counts are not divisible by the SIMD width on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Optimized Winograd convolutions for fp32 real-time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Optimized the weights update pass of dilated convolutions for the fp32 data type on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Improved performance of reorder primitive for int8 data type.

New functionality

  • Added dilation support for deconvolution (transposed convolution) primitive.
  • Introduced deconvolution (transposed convolution) primitive for int8 data type.
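For reference, the output spatial size of a dilated transposed convolution follows the standard relation; a minimal sketch below (the function name is illustrative and not part of the MKL-DNN API):

```python
def deconv_output_size(in_size, kernel, stride, pad, dilation):
    """Output spatial size of a transposed (de)convolution with dilation.

    Standard relation:
        out = (in - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1
    Dilation widens the effective kernel to dilation * (kernel - 1) + 1.
    """
    return (in_size - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1

# A typical 2x-upsampling deconvolution: 16 -> 32 with k=4, s=2, p=1
print(deconv_output_size(16, 4, 2, 1, 1))  # -> 32
# The same layer with dilation=2 (effective kernel size 7)
print(deconv_output_size(16, 4, 2, 1, 2))  # -> 35
```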

API deprecations and breaking changes

  • The default behavior of GEMM-based convolutions has changed: they now use internally allocated thread-local scratchpad memory for im2col and col2im operations, weights reduction, and accumulation. This may cause correctness issues when multiple GEMM-based convolutions are created in one thread and executed concurrently in different threads. To support concurrent execution, build the MKL-DNN library with the -DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE CMake flag.
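A typical out-of-source build with that flag might look like the following (the paths are illustrative; only the CMake flag itself comes from the release notes):

```shell
# Assumes the mkl-dnn source tree is the current directory; paths are illustrative.
mkdir -p build && cd build
cmake .. -DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE
make -j
```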

Usability improvements

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Yasser Zamani @yasserzamani and Loo Rong Jie @rongjiecomputer. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.