UPSTREAM PR #17817: HIP: fix RDNA3 FP16/BF16 matrix multiplication#467
Conversation
Performance Analysis Summary (PR #467): PR #467 introduces a hardware-specific correctness fix for AMD RDNA3 GPUs in the HIP backend, restricting FP16/BF16 WMMA operations to the RDNA4 architecture only.
Mirrored from ggml-org/llama.cpp#17817
Fixes ggml-org/llama.cpp#17797 by adding an explicit RDNA4 requirement to MMF. @jiachengjason, as outlined in https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#pull-requests-for-contributors--collaborators, please test changes to the CUDA/HIP backend for correctness using test-backend-ops.