
[BENCHMARK][GEMM] Append missing wait to cutlass kernel #4071

Merged
whitneywhtsang merged 2 commits into intel:main from jle-quel:jle-quel/update-cutlass-benchmark
May 2, 2025


@jle-quel jle-quel commented May 2, 2025

Description

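This PR updates the GEMM invoker by appending the missing wait after the GEMM invocation. Without the wait, the benchmark reported incorrect results: in the "Before" table below, the maxima are implausibly high (CUTLASS-GB/s-max of 1.019553e+06 and CUTLASS-TFlops-max of 7669.584252), while the "After" table shows maxima in line with the mean and minimum values.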

# Before
matmul-performance:
        B    M      N        K  CUTLASS-GB/s  CUTLASS-GB/s-min  CUTLASS-GB/s-max  CUTLASS-TFlops  CUTLASS-TFlops-min  CUTLASS-TFlops-max  CUTLASS-CV
0  1024.0  8.0  128.0  16384.0    663.445529        658.732307      1.019553e+06        4.990767            4.955312         7669.584252    0.004058

# After
matmul-performance:
        B    M      N        K  CUTLASS-GB/s  CUTLASS-GB/s-min  CUTLASS-GB/s-max  CUTLASS-TFlops  CUTLASS-TFlops-min  CUTLASS-TFlops-max  CUTLASS-CV
0  1024.0  8.0  128.0  16384.0    664.932928         661.40312        669.719935        5.001956            4.975403            5.037967     0.00255
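
To illustrate why a missing wait can distort timing, here is a minimal host-only sketch in Python. It is not the code changed by this PR: `fake_kernel`, `time_launch`, `measure`, and the 50 ms duration are hypothetical names and values, and a thread pool merely stands in for an asynchronous device queue. The assumption is that the launch call returns before the work finishes, so a timer stopped without waiting can capture only the launch overhead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

KERNEL_SECONDS = 0.05  # hypothetical kernel duration, chosen for illustration


def fake_kernel():
    """Stand-in for a device kernel; it just sleeps."""
    time.sleep(KERNEL_SECONDS)


def time_launch(executor, wait):
    """Time one launch, optionally waiting for completion before stopping the clock."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel)  # returns immediately, like an async launch
    if wait:
        future.result()  # the "missing wait": block until the work is done
    elapsed = time.perf_counter() - start
    future.result()  # drain the queue so the next measurement starts clean
    return elapsed


def measure():
    """Return (elapsed without wait, elapsed with wait) in seconds."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        no_wait = time_launch(executor, wait=False)
        with_wait = time_launch(executor, wait=True)
    return no_wait, with_wait


if __name__ == "__main__":
    no_wait, with_wait = measure()
    print(f"without wait: {no_wait * 1e3:.3f} ms")
    print(f"with wait:    {with_wait * 1e3:.3f} ms")
```

Because throughput is computed as work divided by elapsed time, a sample whose timer stops too early yields an inflated throughput figure. In this sketch that corresponds to the measurement taken without the wait.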

@whitneywhtsang whitneywhtsang enabled auto-merge (squash) May 2, 2025 17:55
@whitneywhtsang whitneywhtsang merged commit b05ff39 into intel:main May 2, 2025
10 checks passed
@jle-quel jle-quel deleted the jle-quel/update-cutlass-benchmark branch May 5, 2025 09:13
david-hls pushed a commit to david-hls/intel-xpu-backend-for-triton that referenced this pull request Jun 18, 2025

3 participants