webgpu: add MatMul and Gemm cases with large shapes #26572

Open
xhcao wants to merge 9 commits into microsoft:main from xhcao:matmul_gemm_test

Conversation

@xhcao
Contributor

@xhcao xhcao commented Nov 14, 2025

Description

Motivation and Context

@xhcao
Contributor Author

xhcao commented Nov 14, 2025

This PR adds cases for #26433 and #26461.

@jchen10 @Jiawei-Shao PTAL

}

TEST(MatMul_Large, Float16_Subgroup) {
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
Contributor


MLFloat16

}

} // namespace test
} // namespace onnxruntime
Contributor


There are too many duplicated code lines between the Float32 and Float16 cases; you can have one case covering both Float32 and Float16.
Split_Dim_Inner and Subgroup are not good names for the cases: they don't explain the rationale behind the various shapes being tested.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds large-scale test cases for MatMul and Gemm operations for WebGPU provider, testing with various large tensor shapes (ranging from hundreds to over 1000 elements per dimension) to ensure correctness with both Float32 and Float16 data types.

  • Introduces comprehensive test coverage for MatMul with 2D, 3D, and 4D tensors with large dimensions
  • Adds Gemm test cases with large shapes, testing transpose operations, bias broadcasting, and alpha/beta parameters
  • Includes helper functions for computing output shapes and expected results for validation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File | Description
onnxruntime/test/providers/cpu/math/matmul_large_test.cc | Adds MatMul large shape test cases with helper functions for shape computation and result validation; tests Float32 and Float16 with various batch dimensions and subgroup sizes
onnxruntime/test/providers/cpu/math/gemm_large_test.cc | Adds Gemm large shape test cases with support for transpose, bias broadcasting, and alpha/beta scaling; includes split dimension tests for different tensor configurations


Comment on lines +229 to +232
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.3f, 1.5f);

Copilot AI Nov 21, 2025


[nitpick] The alpha and beta values are swapped (1.3f, 1.5f) compared to the corresponding Float32 test cases (1.5f, 1.3f) at lines 209-212. For consistency and easier comparison between test suites, consider using the same alpha and beta values in both Float32 and Float16 tests.

Suggested change
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.5f, 1.3f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.5f, 1.3f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.5f, 1.3f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.5f, 1.3f);

}

TEST(MatMul_Large, Float32_Subgroup) {
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});

Copilot AI Nov 21, 2025


Duplicate test case: line 173 is identical to line 170 with the same parameters {512, 1024}, {1024, 1024}. Consider removing the duplicate.

TEST(Gemm_Large, Float32_Subgroup) {
RunTestTyped<float, 13>({512, 1024}, 0, {1024, 1024}, 0, {512, 1024});
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});

Copilot AI Nov 21, 2025


Duplicate test case: line 176 is identical to line 175 with the same parameters {127, 1024}, 0, {1024, 1024}, 0, {1024}. Consider removing the duplicate.

Suggested change
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});

K = left.NumDimensions() == 2 ? left[1] : left[0];
}

N = trans_right == 1 ? N = right[0] : N = right[1];

Copilot AI Nov 21, 2025


Redundant assignment in ternary operator. The expression N = trans_right == 1 ? N = right[0] : N = right[1] contains unnecessary nested assignments. Simplify to N = trans_right == 1 ? right[0] : right[1].

Suggested change
N = trans_right == 1 ? N = right[0] : N = right[1];
N = trans_right == 1 ? right[0] : right[1];

template <typename T1, int version>
void RunTestTyped(std::initializer_list<int64_t> a_dims, std::initializer_list<int64_t> b_dims) {
assert(a_dims.size() < 5 && b_dims.size() < 5, "max supported tensor dim is 4-D.");
assert(a_dims.size() > 1 && b_dims.size() > 1, "cannot support 1-D tensor.");

Copilot AI Nov 21, 2025


assert macro has incorrect syntax. The C++ assert macro only takes a single expression argument, not a message. Use ASSERT_TRUE or ASSERT_FALSE from gtest for test assertions with messages, or include <cassert> and use assert() without the message.

Suggested change
assert(a_dims.size() > 1 && b_dims.size() > 1, "cannot support 1-D tensor.");
ASSERT_TRUE(a_dims.size() > 1 && b_dims.size() > 1) << "cannot support 1-D tensor.";

Comment on lines +181 to +188
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1023});
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<float, 13>({2, 2, 512, 1024}, {2, 1024, 1024});

Copilot AI Nov 21, 2025


Test case Float16_Subgroup is using float type instead of MLFloat16. All calls to RunTestTyped should use MLFloat16 as the template parameter (e.g., RunTestTyped<MLFloat16, 13>) to properly test Float16 operations.

Suggested change
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1023});
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<float, 13>({2, 2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<MLFloat16, 13>({512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({511, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({511, 1024}, {1024, 1023});
RunTestTyped<MLFloat16, 13>({512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<MLFloat16, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<MLFloat16, 13>({2, 2, 512, 1024}, {2, 1024, 1024});

@xhcao
Contributor Author

xhcao commented Dec 1, 2025

@fs-eire All the test cases together cost ~120 s, including the CPU EP. Should I keep the CPU EP from executing these cases, making them WebGPU-EP-only and placing them in a WebGPU-EP-specific directory so they don't run on the other EPs?
Also, is there a way to avoid providing expected results, by comparing the CPU EP and WebGPU EP outputs directly and using the CPU EP result as the reference?

@guschmue
Contributor

guschmue commented Dec 3, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@xiaofeihan1
Contributor

I added GenerateGemmParams in gemm_test.cc to cover many shapes and different biasType/transpose/type combinations. Would that be useful for you?

@xhcao
Contributor Author

xhcao commented Dec 4, 2025

I added GenerateGemmParams in gemm_test.cc to cover many shapes and different biasType/transpose/type combinations. Would that be useful for you?

Yes, thanks

@xiaofeihan1
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@xhcao
Contributor Author

xhcao commented Dec 5, 2025

Some bots failed with precision issues. Is this a machine-specific issue, an issue in our code, or is the tolerance set too small?

@fs-eire
Contributor

fs-eire commented Feb 3, 2026

Sorry for the late reply.

I am thinking about whether enabling a ~120 sec test case by default is a good idea. Is it OK to have the test cases disabled by default? (They can still be run explicitly from the command line.)

@xhcao
Contributor Author

xhcao commented Feb 4, 2026

Sorry for the late reply.

I am thinking about whether enabling a ~120 sec test case by default is a good idea. Is it OK to have the test cases disabled by default? (They can still be run explicitly from the command line.)

Hi, @fs-eire, no problem. I will follow your comments and try to modify the code.

@fs-eire
Contributor

fs-eire commented Feb 4, 2026

One way is to use the DISABLED_ prefix on the test case name so that gtest won't run it by default. It can be run manually with onnxruntime_provider_test --gtest_filter=<FULLNAME>.

This may not be a very good option, to be honest, because it's very easy to forget to run the test for a long time. But it's at least better than not having the tests at all. I am open to other options.
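For reference, the invocation would look something like the following (the binary and test names are assumed for illustration; `--gtest_also_run_disabled_tests` and `--gtest_filter` are standard gtest flags):

```shell
# DISABLED_-prefixed tests are skipped by default; run them explicitly:
./onnxruntime_provider_test --gtest_filter='MatMul_Large.DISABLED_*' \
  --gtest_also_run_disabled_tests
```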

@qjia7
Contributor

qjia7 commented Feb 4, 2026

One way is to use the DISABLED_ prefix on the test case name so that gtest won't run it by default. It can be run manually with onnxruntime_provider_test --gtest_filter=<FULLNAME>.

This may not be a very good option, to be honest, because it's very easy to forget to run the test for a long time. But it's at least better than not having the tests at all. I am open to other options.

@fs-eire
Can we add some WebGPU EP options specific to tests, so that a particular MatMul algorithm can be exercised directly regardless of whether the shapes satisfy its requirements? That way, if the purpose of this PR is to test the different MatMul paths, we could cover the different MatMul algorithms with very small tests.

@fs-eire
Contributor

fs-eire commented Feb 5, 2026

One way is to use the DISABLED_ prefix on the test case name so that gtest won't run it by default. It can be run manually with onnxruntime_provider_test --gtest_filter=<FULLNAME>.
This may not be a very good option, to be honest, because it's very easy to forget to run the test for a long time. But it's at least better than not having the tests at all. I am open to other options.

@fs-eire Can we add some WebGPU EP options specific to tests, so that a particular MatMul algorithm can be exercised directly regardless of whether the shapes satisfy its requirements? That way, if the purpose of this PR is to test the different MatMul paths, we could cover the different MatMul algorithms with very small tests.

This is a good idea. However, we need to be very careful designing the flags; otherwise they can end up too numerous or inconsistent, making them difficult to maintain.


Labels

ep:WebGPU ort-web webgpu provider
