
Clarification Needed on Memory Requirements for Batch Matrix Multiplication #1979

Closed
IshitaShreya opened this issue Jun 26, 2024 · 3 comments

@IshitaShreya

I am using oneDNN's batch matrix multiplication (batchmatmul) to perform three matrix multiplication operations. I need clarification on whether all matrix elements must be passed as one contiguous block of memory for the operation to work correctly.
Question
For performing three matrix multiplication operations with batchmatmul, do we need to pass all matrices as a contiguous memory block, or can we use separate pointers for each matrix?

Context
https://oneapi-src.github.io/oneDNN/page_matmul_example_cpp.html#doxid-matmul-example-cpp
Here are the two scenarios I am considering:

1. Contiguous Memory Block:
Allocate a single memory block for all matrices. I combine all elements of matrices A, B, and C into single contiguous memory blocks (combined_A, combined_B, combined_C).
Example code:
// Allocate contiguous memory blocks for matrices A, B, and C
float *combined_A = new float[sizeA1 + sizeA2 + sizeA3];
float *combined_B = new float[sizeB1 + sizeB2 + sizeB3];
float *combined_C = new float[sizeC1 + sizeC2 + sizeC3];
// Perform batch matrix multiplication
batchmatmul(combined_A, combined_B, combined_C, ...); // This works

2. Separate Pointers:
Allocate separate memory blocks for each matrix.
Example code:
// Vectors of pointers to individual matrices
std::vector<float *> vecA = {ptrA1, ptrA2, ptrA3};
std::vector<float *> vecB = {ptrB1, ptrB2, ptrB3};
std::vector<float *> vecC = {ptrC1, ptrC2, ptrC3};
// Perform batch matrix multiplication
batchmatmul(vecA.data(), vecB.data(), vecC.data(), ...); // Clarification needed on this

Additional Information
If using separate pointers is supported, any guidelines or examples on how to correctly set up and pass these pointers to the batchmatmul function would be greatly appreciated.

@vpirogov vpirogov self-assigned this Jun 26, 2024
@vpirogov (Member) commented Jun 26, 2024

@IshitaShreya, oneDNN does not have a batchmatmul function. Batched matrix-matrix multiplication is supported by the matmul primitive. For the batched case, the matrices must be provided as a memory object with (at least) 3 dimensions. The matmul example covers this case.
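In practical terms, the "3-dimensional memory object" requirement means the whole batch lives in one buffer with plain row-major strides. A minimal sketch of the indexing (plain C++, no oneDNN dependency; `abc_offset` is a hypothetical helper, and all batches are assumed to share the same rows × cols shape):

```cpp
#include <cstddef>

// Offset of element (b, i, j) inside one contiguous [batch, rows, cols]
// buffer laid out row-major (what oneDNN calls format_tag::abc).
// Matrix b of the batch starts at offset b * rows * cols, so the batch
// is simply the per-matrix buffers placed back to back.
std::size_t abc_offset(std::size_t b, std::size_t i, std::size_t j,
                       std::size_t rows, std::size_t cols) {
    return (b * rows + i) * cols + j;
}
```

Because the layout is just the individual matrices placed back to back, a single allocation of `batch * rows * cols` elements is sufficient for the memory object's handle.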

@IshitaShreya (Author)

It is understood that for batched matrix multiplication, the matrices must be provided as memory objects with at least 3 dimensions. However, I would like to know whether I can pass non-contiguous memory blocks for each batch of matrices.

1. Contiguous Memory Block Example:

// Allocate contiguous memory blocks for matrices A, B, and C
float *combined_A = new float[sizeA1 + sizeA2 + sizeA3];
float *combined_B = new float[sizeB1 + sizeB2 + sizeB3];
float *combined_C = new float[sizeC1 + sizeC2 + sizeC3];
// Set up memory descriptors and memory objects
auto A_mem = memory({{batch, rowsA, colsA}, memory::data_type::f32, memory::format_tag::abc}, eng, combined_A);
auto B_mem = memory({{batch, rowsB, colsB}, memory::data_type::f32, memory::format_tag::abc}, eng, combined_B);
auto C_mem = memory({{batch, rowsC, colsC}, memory::data_type::f32, memory::format_tag::abc}, eng, combined_C);
// Perform batched matrix multiplication
matmul_prim.execute(strm, {{DNNL_ARG_SRC, A_mem}, {DNNL_ARG_WEIGHTS, B_mem}, {DNNL_ARG_DST, C_mem}});

2. Non-Contiguous Memory Block Example:
// With this approach, where I create the memory objects from vecA.data(), vecB.data(),
// and vecC.data(), I get errors such as segmentation faults and "invalid next size".

// Vectors of pointers to individual matrices
std::vector<float *> vecA = {ptrA1, ptrA2, ptrA3};
std::vector<float *> vecB = {ptrB1, ptrB2, ptrB3};
std::vector<float *> vecC = {ptrC1, ptrC2, ptrC3};
// How to properly set up memory descriptors and memory objects for non-contiguous memory?
// Example (need confirmation if this approach is correct):
auto A_mem = memory({{batch, rowsA, colsA}, memory::data_type::f32, memory::format_tag::abc}, eng, vecA.data());
auto B_mem = memory({{batch, rowsB, colsB}, memory::data_type::f32, memory::format_tag::abc}, eng, vecB.data());
auto C_mem = memory({{batch, rowsC, colsC}, memory::data_type::f32, memory::format_tag::abc}, eng, vecC.data());

// Attempt to perform batched matrix multiplication
matmul_prim.execute(strm, {{DNNL_ARG_SRC, A_mem}, {DNNL_ARG_WEIGHTS, B_mem}, {DNNL_ARG_DST, C_mem}});

Questions
Can I provide non-contiguous memory blocks for each batch (for example, 3 batches) of matrices, src (A1, A2, A3), weights (B1, B2, B3), and dst (C1, C2, C3), to the matmul primitive for batched matrix multiplication?
If non-contiguous memory is supported, could you provide an example or guidelines on how to correctly set this up?

@vpirogov (Member)

@IshitaShreya, oneDNN memory objects use contiguous memory blocks. Please refer to the memory object section of the API reference.
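Since memory objects require one contiguous block, separately allocated matrices have to be copied into a single buffer before the memory object is created. A minimal sketch, assuming all batches have the same shape; `pack_batches` is a hypothetical helper, not a oneDNN API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copy equally sized matrices, each behind its own pointer, into one
// contiguous buffer that a oneDNN memory object can wrap.
std::vector<float> pack_batches(const std::vector<const float*>& mats,
                                std::size_t elems_per_matrix) {
    std::vector<float> packed(mats.size() * elems_per_matrix);
    for (std::size_t b = 0; b < mats.size(); ++b)
        std::copy(mats[b], mats[b] + elems_per_matrix,
                  packed.begin() + b * elems_per_matrix);
    return packed;
}
```

The returned vector's `.data()` pointer can then be used the way `combined_A` is in the contiguous example above. Passing `vecA.data()` directly hands oneDNN an array of pointers rather than the matrix elements themselves, which is consistent with the segmentation faults reported.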
