
UPSTREAM PR #18202: HIP: Use mmq on MFMA devices for MUL_MAT_ID in cases where a lot of splits would be generated #623

Closed
loci-dev wants to merge 1 commit into main from upstream-PR18202-branch_IMbackK-mmidopt



@loci-dev

Mirrored from ggml-org/llama.cpp#18202

On MFMA hardware, MMQ performs better for medium-sized problems, while dequantization + rocBLAS performs better for large problem sizes.

Currently, ggml_cuda_should_use_mmq chooses based on batch size and data type. This is suboptimal for MUL_MAT_ID: even if the involved tensors are large, a high number of experts means the operation is split into many small matrix multiplications, each of which is dispatched to rocBLAS, causing poor performance.
This PR addresses this by choosing MMQ when the number of experts is high.
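A minimal C++ sketch of the selection rule described above, under stated assumptions: `MulMatIdShape`, `should_use_mmq_sketch`, `kBatchThreshold` and `kExpertThreshold` are hypothetical names, both threshold values are placeholders rather than the values used in ggml, and the data type is ignored. It also computes the average number of rows each per-expert split receives, assuming tokens are spread evenly across experts, which shows why a large batch can still produce many small rocBLAS calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one MUL_MAT_ID call (not a ggml type).
struct MulMatIdShape {
    int64_t n_tokens;      // batch size: token columns in src1
    int64_t n_experts;     // total experts in the layer
    int64_t n_expert_used; // experts selected per token (top-k)
};

// Placeholder thresholds, for illustration only.
constexpr int64_t kBatchThreshold  = 128; // hypothetical batch-size cutover
constexpr int64_t kExpertThreshold = 64;  // hypothetical "many experts" cutover

// Average rows per expert split, assuming tokens are spread evenly across experts.
double avg_rows_per_expert(const MulMatIdShape& s) {
    assert(s.n_experts > 0);
    return static_cast<double>(s.n_tokens * s.n_expert_used) /
           static_cast<double>(s.n_experts);
}

// Decide between MMQ and dequantize + rocBLAS for one MUL_MAT_ID call.
bool should_use_mmq_sketch(const MulMatIdShape& s, bool has_mfma) {
    // On MFMA hardware, many experts means many small splits: prefer MMQ.
    if (has_mfma && s.n_experts >= kExpertThreshold) {
        return true;
    }
    // Otherwise fall back to a batch-size-only decision (data type ignored here).
    return s.n_tokens <= kBatchThreshold;
}
```

For example, 512 tokens routed top-8 across 128 experts (the published Qwen3-30B-A3B configuration) gives an average of 32 rows per split, so each per-expert matrix multiplication is small even though the batch is large.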

Benchmarks on an MI100 at a 160 W power limit:

| Model | Microbatch size | Test | t/s master | t/s mmidopt | Speedup |
| --- | ---: | --- | ---: | ---: | ---: |
| gpt-oss 20B MXFP4 MoE | 32 | pp1024 | 737.25 | 745.02 | 1.01 |
| gpt-oss 20B MXFP4 MoE | 64 | pp1024 | 962.68 | 974.75 | 1.01 |
| gpt-oss 20B MXFP4 MoE | 128 | pp1024 | 955.28 | 967.76 | 1.01 |
| gpt-oss 20B MXFP4 MoE | 256 | pp1024 | 1720.56 | 1725.10 | 1.00 |
| gpt-oss 20B MXFP4 MoE | 512 | pp1024 | 2277.16 | 2291.13 | 1.01 |
| gpt-oss 20B MXFP4 MoE | 1024 | pp1024 | 2665.15 | 2685.24 | 1.01 |
| qwen3moe 30B.A3B Q4_K_M | 32 | pp1024 | 436.42 | 434.94 | 1.00 |
| qwen3moe 30B.A3B Q4_K_M | 64 | pp1024 | 562.45 | 563.55 | 1.00 |
| qwen3moe 30B.A3B Q4_K_M | 128 | pp1024 | 716.47 | 721.23 | 1.01 |
| qwen3moe 30B.A3B Q4_K_M | 256 | pp1024 | 1032.03 | 1124.19 | 1.09 |
| qwen3moe 30B.A3B Q4_K_M | 512 | pp1024 | 782.11 | 1497.25 | 1.91 |
| qwen3moe 30B.A3B Q4_K_M | 1024 | pp1024 | 1058.36 | 1738.98 | 1.64 |
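The Speedup column is the mmidopt throughput divided by the master throughput, rounded to two decimals. A small C++ check of that arithmetic, with illustrative function names:

```cpp
#include <cassert>
#include <cmath>

// Speedup as reported in the table: mmidopt t/s divided by master t/s.
double speedup(double tps_master, double tps_mmidopt) {
    assert(tps_master > 0.0);
    return tps_mmidopt / tps_master;
}

// Round to two decimals, matching the table's precision.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For the qwen3moe row at microbatch size 512, 1497.25 / 782.11 ≈ 1.914, which rounds to the reported 1.91.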

Future note: it may be better to select based on the size of the resulting splits.
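The future note could look roughly like the following C++ sketch, which decides from the size of the splits themselves rather than from the expert count. It assumes per-expert row counts are available on the host, whereas in practice the routing ids may only be available on the device. `kSplitRowThreshold`, `mean_split_rows` and `prefer_mmq_by_split_size` are hypothetical names, and the threshold is a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cutover: splits with fewer rows than this are assumed to run
// faster with MMQ than with dequantize + rocBLAS. Value is illustrative only.
constexpr int64_t kSplitRowThreshold = 64;

// rows_per_expert[i] = number of token rows routed to expert i in this batch.
// Experts that received no tokens produce no split and are ignored.
double mean_split_rows(const std::vector<int64_t>& rows_per_expert) {
    int64_t total = 0;
    int64_t n_splits = 0;
    for (int64_t r : rows_per_expert) {
        assert(r >= 0);
        if (r > 0) {
            total += r;
            ++n_splits;
        }
    }
    return n_splits == 0 ? 0.0 : static_cast<double>(total) / static_cast<double>(n_splits);
}

// Prefer MMQ when the typical split is small.
bool prefer_mmq_by_split_size(const std::vector<int64_t>& rows_per_expert) {
    return mean_split_rows(rows_per_expert) < kSplitRowThreshold;
}
```

Using the mean is only one choice; a median or a per-split decision would handle skewed routing differently.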

@loci-dev loci-dev force-pushed the main branch 19 times, most recently from 26a6f0f to cf53bc9, December 22, 2025 14:09
@DajanaV DajanaV closed this Dec 22, 2025
@DajanaV DajanaV deleted the upstream-PR18202-branch_IMbackK-mmidopt branch December 22, 2025 14:28
