
v0.16

@tprimak tprimak released this 15 Aug 00:28

Performance optimizations

  • Improved performance of int8 convolutions whose input and output channel counts are not divisible by the SIMD width on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Optimized Winograd convolutions for fp32 real-time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Optimized the weights update pass of dilated convolutions for the fp32 data type on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Improved performance of reorder primitive for int8 data type.

New functionality

  • Added dilation support for deconvolution (transposed convolution) primitive.
  • Introduced deconvolution (transposed convolution) primitive for int8 data type.
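For reference, the output spatial size of a dilated transposed convolution follows the standard relation; a minimal sketch below (the function name is illustrative and not part of the MKL-DNN API):

```python
def deconv_output_size(in_size, kernel, stride, pad, dilation):
    """Output spatial size of a transposed (de)convolution with dilation.

    Standard relation:
        out = (in - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1
    Dilation widens the effective kernel to dilation * (kernel - 1) + 1.
    """
    return (in_size - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1

# A typical 2x-upsampling deconvolution: 16 -> 32 with k=4, s=2, p=1
print(deconv_output_size(16, 4, 2, 1, 1))  # -> 32
# The same layer with dilation=2 (effective kernel size 7)
print(deconv_output_size(16, 4, 2, 1, 2))  # -> 35
```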

API deprecations and breaking changes

  • The default behavior of GEMM-based convolutions has changed: they now use internally allocated thread-local scratchpad memory for im2col and col2im operations, weights reduction, and accumulation. This may cause correctness issues when multiple GEMM-based convolutions are created in one thread and executed concurrently in different threads. To support concurrent execution, build the MKL-DNN library with the -DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE CMake flag.
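A typical out-of-source build with that flag might look like the following (the paths are illustrative; only the CMake flag itself comes from the release notes):

```shell
# Assumes the mkl-dnn source tree is the current directory; paths are illustrative.
mkdir -p build && cd build
cmake .. -DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE
make -j
```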

Usability improvements

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Yasser Zamani @yasserzamani and Loo Rong Jie @rongjiecomputer. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.