webgpu: add MatMul and Gemm cases with large shapes #26572

Open
xhcao wants to merge 9 commits into microsoft:main from xhcao:matmul_gemm_test

Conversation

@xhcao
Contributor

@xhcao xhcao commented Nov 14, 2025

Description

Motivation and Context

@xhcao
Contributor Author

xhcao commented Nov 14, 2025

This PR adds cases for #26433 and #26461.

@jchen10 @Jiawei-Shao PTAL

}

TEST(MatMul_Large, Float16_Subgroup) {
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
Contributor


MLFloat16

}

} // namespace test
} // namespace onnxruntime
Contributor


There are too many duplicated code lines between the Float32 and Float16 cases; you can have one case covering both Float32 and Float16.
Split_Dim_Inner and Subgroup are not good names for the cases: they don't explain the rationale behind the various shapes being tested.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds large-scale test cases for MatMul and Gemm operations for WebGPU provider, testing with various large tensor shapes (ranging from hundreds to over 1000 elements per dimension) to ensure correctness with both Float32 and Float16 data types.

  • Introduces comprehensive test coverage for MatMul with 2D, 3D, and 4D tensors with large dimensions
  • Adds Gemm test cases with large shapes, testing transpose operations, bias broadcasting, and alpha/beta parameters
  • Includes helper functions for computing output shapes and expected results for validation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File | Description
onnxruntime/test/providers/cpu/math/matmul_large_test.cc | Adds MatMul large shape test cases with helper functions for shape computation and result validation; tests Float32 and Float16 with various batch dimensions and subgroup sizes
onnxruntime/test/providers/cpu/math/gemm_large_test.cc | Adds Gemm large shape test cases with support for transpose, bias broadcasting, and alpha/beta scaling; includes split dimension tests for different tensor configurations


Comment on lines +229 to +232
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.3f, 1.5f);

Copilot AI Nov 21, 2025


[nitpick] The alpha and beta values are swapped (1.3f, 1.5f) compared to the corresponding Float32 test cases (1.5f, 1.3f) at lines 209-212. For consistency and easier comparison between test suites, consider using the same alpha and beta values in both Float32 and Float16 tests.

Suggested change
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.5f, 1.3f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.5f, 1.3f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.5f, 1.3f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.5f, 1.3f);

}

TEST(MatMul_Large, Float32_Subgroup) {
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});

Copilot AI Nov 21, 2025


Duplicate test case: line 173 is identical to line 170 with the same parameters {512, 1024}, {1024, 1024}. Consider removing the duplicate.

TEST(Gemm_Large, Float32_Subgroup) {
RunTestTyped<float, 13>({512, 1024}, 0, {1024, 1024}, 0, {512, 1024});
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});

Copilot AI Nov 21, 2025


Duplicate test case: line 176 is identical to line 175 with the same parameters {127, 1024}, 0, {1024, 1024}, 0, {1024}. Consider removing the duplicate.

Suggested change
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});

K = left.NumDimensions() == 2 ? left[1] : left[0];
}

N = trans_right == 1 ? N = right[0] : N = right[1];

Copilot AI Nov 21, 2025


Redundant assignment in ternary operator. The expression N = trans_right == 1 ? N = right[0] : N = right[1] contains unnecessary nested assignments. Simplify to N = trans_right == 1 ? right[0] : right[1].

Suggested change
N = trans_right == 1 ? N = right[0] : N = right[1];
N = trans_right == 1 ? right[0] : right[1];

template <typename T1, int version>
void RunTestTyped(std::initializer_list<int64_t> a_dims, std::initializer_list<int64_t> b_dims) {
assert(a_dims.size() < 5 && b_dims.size() < 5, "max supported tensor dim is 4-D.");
assert(a_dims.size() > 1 && b_dims.size() > 1, "cannot support 1-D tensor.");

Copilot AI Nov 21, 2025


assert macro has incorrect syntax. The C++ assert macro only takes a single expression argument, not a message. Use ASSERT_TRUE or ASSERT_FALSE from gtest for test assertions with messages, or include <cassert> and use assert() without the message.

Suggested change
assert(a_dims.size() > 1 && b_dims.size() > 1, "cannot support 1-D tensor.");
ASSERT_TRUE(a_dims.size() > 1 && b_dims.size() > 1) << "cannot support 1-D tensor.";

Comment on lines +181 to +188
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1023});
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<float, 13>({2, 2, 512, 1024}, {2, 1024, 1024});

Copilot AI Nov 21, 2025


Test case Float16_Subgroup is using float type instead of MLFloat16. All calls to RunTestTyped should use MLFloat16 as the template parameter (e.g., RunTestTyped<MLFloat16, 13>) to properly test Float16 operations.

Suggested change
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1023});
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<float, 13>({2, 2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<MLFloat16, 13>({512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({511, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({511, 1024}, {1024, 1023});
RunTestTyped<MLFloat16, 13>({512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<MLFloat16, 13>({2, 2, 512, 1024}, {2, 1024, 1024});

@xhcao
Contributor Author

xhcao commented Dec 1, 2025

@fs-eire All the test cases together cost ~120 s, including the CPU EP. Should I keep the CPU EP from executing these cases, making them WebGPU-EP-only and placing them in a WebGPU-EP-specific directory so they don't run on the other EPs?
Also, is there a way to avoid providing expected results, by comparing the CPU EP and WebGPU EP outputs directly and using the CPU EP result as the reference?

@guschmue
Contributor

guschmue commented Dec 3, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@xiaofeihan1
Contributor

I added GenerateGemmParams in gemm_test.cc to cover many shapes and different biasType/transpose/type combinations. Would that be useful for you?

@xhcao
Contributor Author

xhcao commented Dec 4, 2025

I added GenerateGemmParams in gemm_test.cc to cover many shapes and different biasType/transpose/type combinations. Would that be useful for you?

Yes, thanks

@xiaofeihan1
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@xhcao
Contributor Author

xhcao commented Dec 5, 2025

Some bots failed with precision issues. Is this a machine-specific issue, an issue in our code, or is the tolerance set too small?

@fs-eire
Contributor

fs-eire commented Feb 3, 2026

Sorry for the late reply.

I am thinking about whether enabling a ~120 sec test case by default is a good idea. Is it OK to have the test cases disabled by default? (They can still be run explicitly from the command line.)

@xhcao
Contributor Author

xhcao commented Feb 4, 2026

Sorry for the late reply.

I am thinking about whether enabling a ~120 sec test case by default is a good idea. Is it OK to have the test cases disabled by default? (They can still be run explicitly from the command line.)

Hi, @fs-eire, no problem. I will follow your comments and try to modify the code.

@fs-eire
Contributor

fs-eire commented Feb 4, 2026

One way is to use the DISABLED_ prefix on the test case name so that gtest won't run it by default. It can be run manually with onnxruntime_provider_test --gtest_filter=<FULLNAME>.

This may not be a very good option, to be honest, because it's very easy to forget to run the test for a long time. But it's at least better than not having the tests at all. I am open to other options.
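For reference, the invocation would look something like the following (the binary and test names are assumed for illustration; `--gtest_also_run_disabled_tests` and `--gtest_filter` are standard gtest flags):

```shell
# DISABLED_-prefixed tests are skipped by default; run them explicitly:
./onnxruntime_provider_test --gtest_filter='MatMul_Large.DISABLED_*' \
  --gtest_also_run_disabled_tests
```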

@qjia7
Contributor

qjia7 commented Feb 4, 2026

One way is to use the DISABLED_ prefix on the test case name so that gtest won't run it by default. It can be run manually with onnxruntime_provider_test --gtest_filter=<FULLNAME>.

This may not be a very good option, to be honest, because it's very easy to forget to run the test for a long time. But it's at least better than not having the tests at all. I am open to other options.

@fs-eire
Can we add some WebGPU EP options specific to tests, so that a particular MatMul algorithm can be exercised directly regardless of whether the shapes satisfy its requirements? That way, if the purpose of this PR is to test the different MatMul paths, we could cover the different MatMul algorithms with very small tests.

@fs-eire
Contributor

fs-eire commented Feb 5, 2026

One way is to use the DISABLED_ prefix on the test case name so that gtest won't run it by default. It can be run manually with onnxruntime_provider_test --gtest_filter=<FULLNAME>.
This may not be a very good option, to be honest, because it's very easy to forget to run the test for a long time. But it's at least better than not having the tests at all. I am open to other options.

@fs-eire Can we add some WebGPU EP options specific to tests, so that a particular MatMul algorithm can be exercised directly regardless of whether the shapes satisfy its requirements? That way, if the purpose of this PR is to test the different MatMul paths, we could cover the different MatMul algorithms with very small tests.

This is a good idea. However, we need to be very careful designing the flags; otherwise they can end up too numerous or inconsistent, making them difficult to maintain.


Labels

ep:WebGPU ort-web webgpu provider
