
UPSTREAM PR #19063: HIP: Enable MMA flash attention for RDNA3 with head size 576 #1018

Open
loci-dev wants to merge 1 commit into main from upstream-PR19063-branch_linus-amg-hip-rdna3-mma-fattn-576

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#19063

Summary

This PR enables MMA-based flash attention on RDNA3 GPUs (gfx1100/1101/1102) for models with head size 576, such as GLM-4.7-Flash and other MLA (Multi-head Latent Attention) models.

Previously, flash attention with head size 576 only worked on CUDA (via #18953) and RDNA4. RDNA3 users had to disable flash attention, resulting in ~3x slower inference.

Changes

  1. fattn.cu: Route RDNA3 + head size 576 to MMA kernel (was RDNA4-only)
  2. fattn-mma-f16.cuh:
    • Enable AMD WMMA guards for all RDNA3/RDNA4 (was RDNA4-only)
    • Allow DKQ == 576 in AMD path (was limited to ≤128)
  3. mma.cuh:
    • Add RDNA3 to make_identity_mat()
    • Add RDNA3 f16→f16 WMMA intrinsic with correct 4-argument signature

Performance

Tested on AMD RX 7900 XTX (gfx1100) with GLM-4.7-Flash-REAP-23B-A3B:

| Configuration | Generation Speed |
| --- | --- |
| FA off (before) | ~77 t/s |
| FA on (before, broken) | ~27 t/s |
| FA on (after fix) | ~83 t/s |

Testing

  • Builds successfully with -DGGML_HIP=ON -DGGML_HIP_ROCWMMA_FATTN=ON -DGPU_TARGETS="gfx1100"
  • GLM-4.7-Flash-REAP inference works with flash attention enabled
  • No regressions on standard head sizes (64, 128)
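The build configuration from the first bullet can be reproduced roughly as follows (a sketch: the `build` directory name and `--build` invocation are illustrative; the `-D` flags are the ones listed above):

```shell
cmake -B build \
    -DGGML_HIP=ON \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DGPU_TARGETS="gfx1100"
cmake --build build --config Release -j
```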

Related

  • #18953 - CUDA: add gqa_ratio 4 for GLM 4.7 flash (this PR ports similar support to HIP/RDNA3)
  • #17495 - HIP: Add RDNA3 WMMA support to MMF (related RDNA3 WMMA work)

@loci-review

loci-review bot commented Jan 24, 2026

Based on the analysis, no functions were identified with meaningful performance changes between the base and target versions. The function_insights_topk tool returned empty results for both response time and throughput time metrics, indicating that the code changes between these versions did not produce measurable performance impacts in the analyzed binaries.

This suggests that the modifications were either:

  • Non-performance-affecting changes (documentation, comments, refactoring)
  • Changes to code paths not captured in the static analysis
  • Modifications with performance impacts below the detection threshold
  • Updates to components not included in the analyzed binary versions

Conclusion: No significant performance regression or improvement was detected between the two versions based on the available metrics.

See the complete breakdown in Version Insights
Have questions? Tag @loci-dev to ask about this PR.

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from a50395f to 8587aee Compare January 27, 2026 19:14
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 5fea2ef to 8a7ef20 Compare January 31, 2026 08:12
