webgpu: add MatMul and Gemm cases with large shapes #26572

xhcao wants to merge 9 commits into microsoft:main
Conversation
This PR adds cases for #26433 and #26461. @jchen10 @Jiawei-Shao PTAL
```cpp
}

TEST(MatMul_Large, Float16_Subgroup) {
  RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
}

}  // namespace test
}  // namespace onnxruntime
```
Too many code lines are duplicated between the Float32 and Float16 cases. You can have one case that covers both Float32 and Float16.
`Split_Dim_Inner` and `Subgroup` are not good names for the cases; they don't help me understand the rationale behind the various tested shapes.
Pull request overview
This PR adds large-scale test cases for the MatMul and Gemm operations of the WebGPU provider, testing various large tensor shapes (hundreds to over 1000 elements per dimension) to ensure correctness with both Float32 and Float16 data types.
- Introduces comprehensive test coverage for MatMul with 2D, 3D, and 4D tensors with large dimensions
- Adds Gemm test cases with large shapes, testing transpose operations, bias broadcasting, and alpha/beta parameters
- Includes helper functions for computing output shapes and expected results for validation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| onnxruntime/test/providers/cpu/math/matmul_large_test.cc | Adds MatMul large shape test cases with helper functions for shape computation and result validation; tests Float32 and Float16 with various batch dimensions and subgroup sizes |
| onnxruntime/test/providers/cpu/math/gemm_large_test.cc | Adds Gemm large shape test cases with support for transpose, bias broadcasting, and alpha/beta scaling; includes split dimension tests for different tensor configurations |
```cpp
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.3f, 1.5f);
RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.3f, 1.5f);
```
[nitpick] The alpha and beta values are swapped (1.3f, 1.5f) compared to the corresponding Float32 test cases (1.5f, 1.3f) at lines 209-212. For consistency and easier comparison between test suites, consider using the same alpha and beta values in both Float32 and Float16 tests.
```diff
-RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.3f, 1.5f);
-RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.3f, 1.5f);
-RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.3f, 1.5f);
-RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.3f, 1.5f);
+RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 1}, 1.5f, 1.3f);
+RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {1, 192}, 1.5f, 1.3f);
+RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {192}, 1.5f, 1.3f);
+RunTestTyped<MLFloat16, 13>({16, 1024}, 0, {1024, 192}, 0, {16, 192}, 1.5f, 1.3f);
```
```cpp
}

TEST(MatMul_Large, Float32_Subgroup) {
  RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
```
Duplicate test case: line 173 is identical to line 170 with the same parameters {512, 1024}, {1024, 1024}. Consider removing the duplicate.
```cpp
TEST(Gemm_Large, Float32_Subgroup) {
  RunTestTyped<float, 13>({512, 1024}, 0, {1024, 1024}, 0, {512, 1024});
  RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});
  RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});
```
Duplicate test case: line 176 is identical to line 175 with the same parameters {127, 1024}, 0, {1024, 1024}, 0, {1024}. Consider removing the duplicate.
```suggestion
RunTestTyped<float, 13>({127, 1024}, 0, {1024, 1024}, 0, {1024});
```
```cpp
  K = left.NumDimensions() == 2 ? left[1] : left[0];
}

N = trans_right == 1 ? N = right[0] : N = right[1];
```
Redundant assignment in ternary operator. The expression `N = trans_right == 1 ? N = right[0] : N = right[1]` contains unnecessary nested assignments. Simplify to `N = trans_right == 1 ? right[0] : right[1]`.
```diff
-N = trans_right == 1 ? N = right[0] : N = right[1];
+N = trans_right == 1 ? right[0] : right[1];
```
```cpp
template <typename T1, int version>
void RunTestTyped(std::initializer_list<int64_t> a_dims, std::initializer_list<int64_t> b_dims) {
  assert(a_dims.size() < 5 && b_dims.size() < 5, "max supported tensor dim is 4-D.");
  assert(a_dims.size() > 1 && b_dims.size() > 1, "cannot support 1-D tensor.");
```
The `assert` macro has incorrect syntax: the C++ `assert` macro takes a single expression argument, not a message. Use `ASSERT_TRUE` or `ASSERT_FALSE` from gtest for test assertions with messages, or include `<cassert>` and use `assert()` without the message.
```diff
-assert(a_dims.size() > 1 && b_dims.size() > 1, "cannot support 1-D tensor.");
+ASSERT_TRUE(a_dims.size() > 1 && b_dims.size() > 1) << "cannot support 1-D tensor.";
```
```cpp
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1024});
RunTestTyped<float, 13>({511, 1024}, {1024, 1023});
RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({1, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {1024, 1024});
RunTestTyped<float, 13>({2, 512, 1024}, {2, 1024, 1024});
RunTestTyped<float, 13>({2, 2, 512, 1024}, {2, 1024, 1024});
```
Test case `Float16_Subgroup` is using `float` type instead of `MLFloat16`. All calls to `RunTestTyped` should use `MLFloat16` as the template parameter (e.g., `RunTestTyped<MLFloat16, 13>`) to properly test Float16 operations.
```diff
-RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
-RunTestTyped<float, 13>({511, 1024}, {1024, 1024});
-RunTestTyped<float, 13>({511, 1024}, {1024, 1023});
-RunTestTyped<float, 13>({512, 1024}, {1024, 1024});
-RunTestTyped<float, 13>({1, 512, 1024}, {1024, 1024});
-RunTestTyped<float, 13>({2, 512, 1024}, {1024, 1024});
-RunTestTyped<float, 13>({2, 512, 1024}, {2, 1024, 1024});
-RunTestTyped<float, 13>({2, 2, 512, 1024}, {2, 1024, 1024});
+RunTestTyped<MLFloat16, 13>({512, 1024}, {1024, 1024});
+RunTestTyped<MLFloat16, 13>({511, 1024}, {1024, 1024});
+RunTestTyped<MLFloat16, 13>({511, 1024}, {1024, 1023});
+RunTestTyped<MLFloat16, 13>({512, 1024}, {1024, 1024});
+RunTestTyped<MLFloat16, 13>({1, 512, 1024}, {1024, 1024});
+RunTestTyped<MLFloat16, 13>({2, 512, 1024}, {1024, 1024});
+RunTestTyped<MLFloat16, 13>({2, 512, 1024}, {2, 1024, 1024});
+RunTestTyped<MLFloat16, 13>({2, 2, 512, 1024}, {2, 1024, 1024});
```
@fs-eire All test cases take ~120s in total, including the CPU EP. Should I skip these cases on the CPU EP, making them WebGPU-only and placing them in a WebGPU-specific directory so they do not run on the other EPs?
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
I added GenerateGemmParams in gemm_test.cc to cover many shapes and different biasType/transpose/type combinations. Would that be useful for you?

Yes, thanks.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
Some bots failed due to precision issues. Is it a machine-specific issue, an issue in our code, or is the tolerance set too small?
Sorry for the late reply. I am thinking about whether enabling a ~120s test case by default is a good idea. Would it be OK to disable the test cases by default? (They could still be run explicitly from the command line.)
Hi, @fs-eire, no problem. I will follow your comments and try to modify the code. |
One way is to use a flag. To be honest, this may not be a very good option, because it's very easy to forget to run the tests for a long time. But it's at least better than not having the tests at all. I am open to other options.
@fs-eire
This is a good idea. However, we need to be very careful designing the flags. Otherwise there can end up being too many of them, or inconsistent ones, which makes them difficult to maintain.