CUDA GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4 #462

ikawrakow · 2025-05-26T16:40:49Z

This PR is a follow-up to PR #461 and adds CUDA implementations of GEMM and GEMV for IQ4_KS_R4 and IQ5_KS_R4.

Note: GEMM is implemented via dequantize+cuBLAS. If you want to use an IQX_K_R4 DeepSeek-V3/R1 model on the GPU, you may therefore need to build with -DGGML_CUDA_IQK_FORCE_BF16=1 to force bf16 arithmetic with cuBLAS, because fp16 has been noted to lead to numerical instabilities and garbled output. I did not enable GGML_CUDA_IQK_FORCE_BF16 by default because it reduces prompt processing performance and, as far as I can tell, bf16 is only required for DeepSeek.
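The note above does not say why fp16 misbehaves. One well-known difference between the two formats is dynamic range: the largest finite fp16 value is 65504, while bf16 keeps the 8-bit exponent of fp32 at the cost of mantissa precision. The sketch below illustrates only that range difference on the host, as an assumption about what may be involved, not a statement of the actual failure mode. The helper names `fits_in_fp16_range` and `round_trip_bf16` are hypothetical and are not part of this PR or of cuBLAS.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Largest finite value representable in IEEE 754 binary16 (fp16).
constexpr float kFp16Max = 65504.0f;

// Hypothetical helper: true if x is finite and its magnitude lies within
// the finite range of fp16 (rounding at the boundary is ignored).
inline bool fits_in_fp16_range(float x) {
    return std::isfinite(x) && std::fabs(x) <= kFp16Max;
}

// Hypothetical helper: round an fp32 value to bf16 (round to nearest,
// ties to even) and widen it back to fp32. NaN inputs are not handled.
inline float round_trip_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // rounding bias for the dropped 16 bits
    bits &= 0xFFFF0000u;                    // keep sign, 8 exponent bits, 7 mantissa bits
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}
```

With these helpers, a value such as 100000.0f falls outside the finite fp16 range, while `round_trip_bf16(100000.0f)` stays finite and within about 0.4% of the input. That is the trade-off the build flag selects: wider range and coarser precision, with the performance cost mentioned in the note.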

Iwan Kawrakow added 2 commits May 26, 2025 19:36

CUDA: iq4_ks_r4 GEMV and GEMM

f0efb1f

CUDA: iq5_ks_r4 GEMV and GEMM

64c754b

ikawrakow merged commit 0976467 into main May 27, 2025
