v0.20
Performance optimizations
- Improved performance of GEMM-based convolutions.
- Improved softmax performance.
- Added arbitrary eltwise fusion support in GEMM-based convolutions and inner product.
New functionality
- Introduced bfloat16 data type support in reorders, (de-)convolution, pooling, batch normalization, local response normalization, eltwise, inner product, shuffle, sum, and concat. The implementation relies on new instructions targeting future Intel Xeon Scalable processors (codename Cooper Lake). On processors with Intel AVX512 support, bfloat16 arithmetic is emulated.
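
To illustrate what emulating bfloat16 arithmetic can mean in principle, here is a minimal C++ sketch. It is not the library's implementation: the helper names (float_to_bf16, bf16_to_float, bf16_fma_emulated) are hypothetical, and the round-to-nearest-even conversion and the widen/compute/narrow scheme are assumptions chosen for illustration. It relies only on the fact that a bfloat16 value has the same layout as the upper 16 bits of an IEEE-754 float32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not part of the library API.
static_assert(sizeof(float) == 4, "sketch assumes a 32-bit IEEE-754 float");

// Narrow float32 to bfloat16 bits using round-to-nearest-even (an assumption).
inline std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN input: truncate and force a mantissa bit so it stays a NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7fff, plus 1 when the kept low bit is odd, so ties round to even.
    const std::uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    bits += rounding_bias;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bfloat16 bits to float32: the 16 bits become the upper half, exactly.
inline float bf16_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Emulated multiply-add: widen operands, compute in float32, narrow the result.
inline std::uint16_t bf16_fma_emulated(std::uint16_t a, std::uint16_t b,
                                       std::uint16_t c) {
    const float r = bf16_to_float(a) * bf16_to_float(b) + bf16_to_float(c);
    return float_to_bf16(r);
}
```

With these helpers, bf16_fma_emulated(float_to_bf16(2.0f), float_to_bf16(3.0f), float_to_bf16(1.0f)) yields the bfloat16 encoding of 7.0. Native bfloat16 instructions may accumulate and round differently from this sketch, so results are not guaranteed to match bit for bit.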
Thanks to the contributors
This release contains contributions from many Intel Performance Libraries developers. We would also like to thank everyone who asked questions and reported issues.