Baptiste Wicht edited this page Sep 2, 2016 · 8 revisions

Matrix Multiplication

Multiplication between matrices (or vector-matrix or matrix-vector) can be done simply by using the * operator:

etl::fast_matrix<T, 2, 3> a = {1,2,3,4,5,6};
etl::fast_matrix<T, 3, 2> b = {7,8,9,10,11,12};
etl::fast_matrix<T, 2, 2> c = a * b;

If BLAS is configured, a BLAS implementation will be used for the multiplication. Otherwise, a custom blocked implementation will be used. ETL always selects the fastest implementation available.

An implementation of the Strassen algorithm is also available:

etl::fast_matrix<T, 2, 2> c = etl::strassen_mmul(a, b);

But it is nowhere near as optimized as a BLAS operation. Although the Strassen algorithm is theoretically faster (O(n^2.8074) < O(n^3)) than the standard matrix-matrix multiplication algorithm, it is much slower in practice.

Convolution

ETL implements a rich set of convolution functions.

ETL contains several implementations of each algorithm and will try to select the fastest one depending on the availability of SSE/AVX/BLAS/CUDA/MKL.

1D Convolution

1-dimensional convolution of vectors is directly possible:

  • c = conv_1d_valid(a, b) or conv_1d_valid(a, b, c): Valid convolution of a and b
  • c = conv_1d_full(a, b) or conv_1d_full(a, b, c): Full convolution of a and b
  • c = conv_1d_same(a, b) or conv_1d_same(a, b, c): Same convolution of a and b

2D Convolution

2-dimensional convolution of matrices is directly possible as well:

  • C = conv_2d_valid(A, B) or conv_2d_valid(A, B, C): Valid convolution of A and B
  • C = conv_2d_full(A, B) or conv_2d_full(A, B, C): Full convolution of A and B
  • C = conv_2d_same(A, B) or conv_2d_same(A, B, C): Same convolution of A and B

To perform a cross correlation or if the weights are already flipped:

  • C = conv_2d_valid_flipped(A, B) or conv_2d_valid_flipped(A, B, C): Valid cross correlation of A and B
  • C = conv_2d_full_flipped(A, B) or conv_2d_full_flipped(A, B, C): Full cross correlation of A and B
  • C = conv_2d_same_flipped(A, B) or conv_2d_same_flipped(A, B, C): Same cross correlation of A and B

Valid convolutions have support for stride and padding:

  • C = conv_2d_valid<S1, S2, P1, P2>(A, B) or C = conv_2d_valid(A, B, S1, S2, P1, P2): Valid convolution of A and B with a stride of (S1, S2) and padding of (P1, P2).

Convolutions with multiple kernels

If you need to convolve one input with multiple kernels, you can use the multi versions of the expressions:

  • C = conv_2d_valid_multi(A, B) or conv_2d_valid_multi(A, B, C): Valid convolution of A and B
  • C = conv_2d_full_multi(A, B) or conv_2d_full_multi(A, B, C): Full convolution of A and B
  • C = conv_2d_same_multi(A, B) or conv_2d_same_multi(A, B, C): Same convolution of A and B

A is a 2D matrix whereas B and C must be 3D matrices.

Versions with flipped kernels also exist:

  • C = conv_2d_valid_multi_flipped(A, B) or conv_2d_valid_multi_flipped(A, B, C): Valid convolution of A and B
  • C = conv_2d_full_multi_flipped(A, B) or conv_2d_full_multi_flipped(A, B, C): Full convolution of A and B
  • C = conv_2d_same_multi_flipped(A, B) or conv_2d_same_multi_flipped(A, B, C): Same convolution of A and B

Padding and stride are also supported.

'4D' convolutions

If you have multiple images and multiple kernels, you can use the '4D' convolutions, which perform multiple 2D convolutions and accumulations under the hood. These are the operations used in Convolutional Neural Networks.

  • conv_4d_valid with A[N, C, Hi, Wi], B[K, C, Hf, Wf] and C[N, K, Hi - Hf + 1, Wi - Wf + 1]
  • conv_4d_full with A[N, C, Hi, Wi], B[K, C, Hf, Wf] and C[N, K, Hi + Hf - 1, Wi + Wf - 1]

Flipped versions are also available:

  • conv_4d_valid_flipped with A[N, C, Hi, Wi], B[K, C, Hf, Wf] and C[N, K, Hi - Hf + 1, Wi - Wf + 1]
  • conv_4d_full_flipped with A[N, C, Hi, Wi], B[K, C, Hf, Wf] and C[N, K, Hi + Hf - 1, Wi + Wf - 1]

Deep 2D Convolutions

If you have multiple convolutions to perform, you can use the deep versions of the expressions. For instance, you can perform N 'same' convolutions with A[N, 2, 2], B[N, 2, 2] and C[N, 2, 2], or with more dimensions (for instance, A[N, K, 2, 2], B[N, K, 2, 2], C[N, K, 2, 2]). In all cases, the 'batch' dimensions must be the same for all three structures.

  • C = conv_deep_valid(A, B) or conv_deep_valid(A, B, C)
  • C = conv_deep_full(A, B) or conv_deep_full(A, B, C)
  • C = conv_deep_same(A, B) or conv_deep_same(A, B, C)

Note that these versions are not highly optimized: they simply perform multiple 2D convolutions on the sub-structures.

Fast Fourier Transform

Fast Fourier Transform operations are available:

  • fft_1d(vector) : Returns the FFT of the given vector
  • ifft_1d(vector) : Returns the inverse FFT of the given vector
  • fft_2d(matrix) : Returns the FFT of the given 2D matrix
  • ifft_2d(matrix) : Returns the inverse FFT of the given 2D matrix
  • fft_1d_many(matrix) : Batched 1D FFT of the given matrix (the first dimension is considered as batch dimension)
  • fft_2d_many(matrix) : Batched 2D FFT of the given matrix (the first dimension is considered as batch dimension)

Depending on the configuration, they can be computed using the standard ETL implementation, the Intel MKL implementation, or the CUFFT implementation.