…76,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4
Explore the complete analysis inside the Version Insights.

Performance Review Summary: PR #326

Analysis Scope: Comparing version 6b1d7254-fefa-44ba-8e76-256501ca6ef9 against baseline aab9b31c-ad35-48ba-b9fe-4c0fd3dc2df2

Condition Assessment: Condition 1 applies - no measurable performance changes detected.

PR #326 introduces AMD RDNA4 WMMA support through two targeted code changes in CUDA kernel files. Analysis of 15 performance-critical functions across 16 binaries shows zero measurable impact on Response Time, Throughput Time, and power consumption. All functions report

Code Changes:

Performance Metrics:

Conclusion: Changes are architecture-specific correctness fixes with no runtime impact on current build. RDNA4-targeted builds would benefit from enabled WMMA functionality.
89ba2e9 to e4a4e1d
47d1dc9 to 297c352
Mirrored from ggml-org/llama.cpp#17502
Patched the failing test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4 (verified that all test cases pass when running ./build/bin/test-backend-ops test -o MUL_MAT).
Quick cleanup in mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162
for ggml-org/llama.cpp#17156
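
For readers unfamiliar with the helper mentioned above, the general idea of a fixed-size copy for packed 16-bit pairs can be illustrated on the host. This is only a sketch under stated assumptions, not the code in mma.cuh: `copy_fixed` and `pair16` are hypothetical names made up for illustration, and the real ggml_cuda_memcpy_1 is a device function whose exact signature is not shown in this PR description.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side analogue; NOT the ggml implementation.
// pair16 stands in for a packed pair of 16-bit values (such as half2).
struct pair16 {
    uint16_t x;
    uint16_t y;
};

// Copies exactly nbytes bytes. Because the size is a compile-time constant,
// compilers can typically lower this to a single fixed-width load and store.
template <int nbytes>
inline void copy_fixed(void * dst, const void * src) {
    static_assert(nbytes > 0, "nbytes must be positive");
    std::memcpy(dst, src, nbytes);
}

// Round-trips one packed pair through copy_fixed and reports whether
// both 16-bit lanes arrived unchanged.
inline bool roundtrip_ok() {
    const pair16 src{0x1234, 0xABCD};
    pair16 dst{0, 0};
    copy_fixed<sizeof(pair16)>(&dst, &src);
    return dst.x == src.x && dst.y == src.y;
}
```

Routing both 16-bit pair types through one byte-count-based helper, rather than through per-type assignments, keeps the copy path identical regardless of the element type; whether that is the motivation in mma.cuh is an assumption here.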