
Clarification Needed on Memory Requirements for Batch Matrix Multiplication #1979

Closed
IshitaShreya opened this issue Jun 26, 2024 · 3 comments

@IshitaShreya

I am using oneDNN's batch matrix multiplication (batchmatmul) to perform three matrix multiplication operations. I need clarification on whether all matrix elements must be passed as one contiguous block of memory for the operation to work correctly.
Question
For performing three matrix multiplication operations with batchmatmul, do we need to pass all matrices as a contiguous memory block, or can we use separate pointers for each matrix?

Context
https://oneapi-src.github.io/oneDNN/page_matmul_example_cpp.html#doxid-matmul-example-cpp
Here are the two scenarios I am considering:

1. Contiguous Memory Block:
Allocate a single memory block for all matrices. I combine all elements of matrices A, B, and C into single contiguous memory blocks (combined_A, combined_B, combined_C).
Example code:
// Allocate contiguous memory blocks for matrices A, B, and C
float *combined_A = new float[sizeA1 + sizeA2 + sizeA3];
float *combined_B = new float[sizeB1 + sizeB2 + sizeB3];
float *combined_C = new float[sizeC1 + sizeC2 + sizeC3];
// Perform batch matrix multiplication
batchmatmul(combined_A, combined_B, combined_C, ...); // This works

2. Separate Pointers:
Allocate separate memory blocks for each matrix.
Example code:
// Vectors of pointers to individual matrices
std::vector<float *> vecA = {ptrA1, ptrA2, ptrA3};
std::vector<float *> vecB = {ptrB1, ptrB2, ptrB3};
std::vector<float *> vecC = {ptrC1, ptrC2, ptrC3};
// Perform batch matrix multiplication
batchmatmul(vecA.data(), vecB.data(), vecC.data(), ...); // Clarification needed on this

Additional Information
If using separate pointers is supported, any guidelines or examples on how to correctly set up and pass these pointers to the batchmatmul function would be greatly appreciated.

@vpirogov vpirogov self-assigned this Jun 26, 2024
@vpirogov (Member) commented Jun 26, 2024

@IshitaShreya, oneDNN does not have a batchmatmul function. Batched matrix-matrix multiplication is supported by the matmul primitive. For the batched case, the matrices must be provided as a memory object with (at least) 3 dimensions. The matmul example covers this case.
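In practical terms, the "3-dimensional memory object" requirement means the whole batch lives in one buffer with plain row-major strides. A minimal sketch of the indexing (plain C++, no oneDNN dependency; `abc_offset` is a hypothetical helper, and all batches are assumed to share the same rows × cols shape):

```cpp
#include <cstddef>

// Offset of element (b, i, j) inside one contiguous [batch, rows, cols]
// buffer laid out row-major (what oneDNN calls format_tag::abc).
// Matrix b of the batch starts at offset b * rows * cols, so the batch
// is simply the per-matrix buffers placed back to back.
std::size_t abc_offset(std::size_t b, std::size_t i, std::size_t j,
                       std::size_t rows, std::size_t cols) {
    return (b * rows + i) * cols + j;
}
```

Because the layout is just the individual matrices placed back to back, a single allocation of `batch * rows * cols` elements is sufficient for the memory object's handle.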

@IshitaShreya (Author)

It is understood that for batched matrix multiplication, the matrices must be provided as memory objects with at least 3 dimensions. However, I would like to know whether I can pass non-contiguous memory blocks for each batch of matrices.

1. Contiguous Memory Block Example:

// Allocate contiguous memory blocks for matrices A, B, and C
float *combined_A = new float[sizeA1 + sizeA2 + sizeA3];
float *combined_B = new float[sizeB1 + sizeB2 + sizeB3];
float *combined_C = new float[sizeC1 + sizeC2 + sizeC3];
// Set up memory descriptors and memory objects
auto A_mem = memory({{batch, rowsA, colsA}, memory::data_type::f32, memory::format_tag::abc}, eng, combined_A);
auto B_mem = memory({{batch, rowsB, colsB}, memory::data_type::f32, memory::format_tag::abc}, eng, combined_B);
auto C_mem = memory({{batch, rowsC, colsC}, memory::data_type::f32, memory::format_tag::abc}, eng, combined_C);
// Perform batched matrix multiplication
matmul_prim.execute(strm, {{DNNL_ARG_SRC, A_mem}, {DNNL_ARG_WEIGHTS, B_mem}, {DNNL_ARG_DST, C_mem}});

2. Non-Contiguous Memory Block Example:
// With this approach, where I create the memory objects from vecA.data(), vecB.data(),
// and vecC.data(), I get errors such as segmentation faults and "invalid next size".

// Vectors of pointers to individual matrices
std::vector<float *> vecA = {ptrA1, ptrA2, ptrA3};
std::vector<float *> vecB = {ptrB1, ptrB2, ptrB3};
std::vector<float *> vecC = {ptrC1, ptrC2, ptrC3};
// How to properly set up memory descriptors and memory objects for non-contiguous memory?
// Example (need confirmation if this approach is correct):
auto A_mem = memory({{batch, rowsA, colsA}, memory::data_type::f32, memory::format_tag::abc}, eng, vecA.data());
auto B_mem = memory({{batch, rowsB, colsB}, memory::data_type::f32, memory::format_tag::abc}, eng, vecB.data());
auto C_mem = memory({{batch, rowsC, colsC}, memory::data_type::f32, memory::format_tag::abc}, eng, vecC.data());

// Attempt to perform batched matrix multiplication
matmul_prim.execute(strm, {{DNNL_ARG_SRC, A_mem}, {DNNL_ARG_WEIGHTS, B_mem}, {DNNL_ARG_DST, C_mem}});

Questions
Can I provide non-contiguous memory blocks for each batch (for example, 3 batches) of matrices, src (A1, A2, A3), weights (B1, B2, B3), and dst (C1, C2, C3), to the matmul primitive for batched matrix multiplication?
If non-contiguous memory is supported, could you provide an example or guidelines on how to correctly set this up?

@vpirogov (Member)

@IshitaShreya, oneDNN memory objects use contiguous memory blocks. Please refer to the memory object section of the API reference.
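Since memory objects require one contiguous block, separately allocated matrices have to be copied into a single buffer before the memory object is created. A minimal sketch, assuming all batches have the same shape; `pack_batches` is a hypothetical helper, not a oneDNN API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copy equally sized matrices, each behind its own pointer, into one
// contiguous buffer that a oneDNN memory object can wrap.
std::vector<float> pack_batches(const std::vector<const float*>& mats,
                                std::size_t elems_per_matrix) {
    std::vector<float> packed(mats.size() * elems_per_matrix);
    for (std::size_t b = 0; b < mats.size(); ++b)
        std::copy(mats[b], mats[b] + elems_per_matrix,
                  packed.begin() + b * elems_per_matrix);
    return packed;
}
```

The returned vector's `.data()` pointer can then be used the way `combined_A` is in the contiguous example above. Passing `vecA.data()` directly hands oneDNN an array of pointers rather than the matrix elements themselves, which is consistent with the segmentation faults reported.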
