UPSTREAM PR #17990: HIP: Refactor mma for RDNA and CDNA #548
Performance Analysis Summary: PR #548

Overview
PR #548 refactors the AMD GPU matrix multiply-accumulate infrastructure for the RDNA3, RDNA4, and CDNA architectures. The changes introduce a data layout abstraction that replaces architecture-specific conditional compilation with template-based approaches. The analysis shows zero performance impact, as the compared versions (50ec7e39 vs 7d958d88) are functionally identical binaries.

Performance Metrics
All analyzed functions show 0% change in both response time and throughput. Power consumption analysis across 16 binaries shows negligible variation (< 0.001%), with the largest change being +1.35 nJ in llama-run.

Code Changes
The PR modifies GPU-specific matrix operation primitives in three files (mma.cuh, mmf.cuh, mmq.cuh). These changes affect AMD GPU code paths only and do not modify CPU inference or tokenization functions.

Inference Impact
Tokens per second: No impact. The tokenization and inference functions (llama_decode, llama_encode, llama_tokenize) show zero response time changes. For reference, a 2 ms slowdown in llama_decode corresponds to a 7% tokens/sec reduction on the reference model; the measured change here is 0 ns, so no inference performance degradation occurs.
Impacted functions for tokens/sec: None. llama_decode, llama_encode, and llama_tokenize maintain identical performance.
Power consumption: All binaries show stable power profiles, with changes below measurement noise.
Impacted binaries: build.bin.llama-run (+0.001%), build.bin.libllama.so (-0.0%), all others (0.0%).
Mirrored from ggml-org/llama.cpp#17990
Refactor mma.cuh for RDNA and CDNA, clean up the row-major and column-major matrix handling for future development such as FA, and add a dual matrix type for RDNA3.
CDNA is untested because I don't have a CDNA GPU. @JohannesGaessler, could you help run a raw test on your MI GPU? Honestly, I will probably also need your coding help to fix bugs on CDNA, since I can't test on that hardware myself. Thank you.
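To illustrate the general idea of selecting a matrix layout through a template parameter instead of per-architecture `#if` branches, here is a minimal host-side C++17 sketch. It is not the code from this PR: `layout`, `tile_view`, and `load` are hypothetical names chosen for illustration, and the real tile types in mma.cuh may look quite different.

```cpp
#include <cstddef>

// Hypothetical layout tag (an assumption, not an identifier from mma.cuh).
enum class layout { row_major, col_major };

// Hypothetical tile descriptor: dimensions and layout are fixed at compile time.
template <int Rows, int Cols, layout L>
struct tile_view {
    static constexpr int rows = Rows;
    static constexpr int cols = Cols;

    // Linear offset of element (i, j); the branch is resolved at compile time.
    static constexpr std::size_t offset(int i, int j) {
        if constexpr (L == layout::row_major) {
            return static_cast<std::size_t>(i) * Cols + j;
        } else {
            return static_cast<std::size_t>(j) * Rows + i;
        }
    }
};

// Layout-agnostic helper: the same call site works for either layout.
template <typename Tile>
float load(const float * data, int i, int j) {
    return data[Tile::offset(i, j)];
}
```

With this shape, a caller written against `Tile::offset` does not need to know which layout it was instantiated with, which is the kind of property that makes row-major and column-major variants easier to share across kernels.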