
opencl: use flat variants of gemv for very large M #24006

Merged: lhez merged 1 commit into ggml-org:master from qualcomm:lh/gemv-large-m-reroute on Jun 2, 2026

Conversation

@lhez (Contributor) commented on Jun 2, 2026

Overview

After some profiling, it turns out that the gemv-noshuffle kernels for Q4_K and Q6_K are slow for very large M (such as the vocabulary-sized matrices). By contrast, the flat variants are faster. This PR routes these large-M cases to the flat GEMV variants.

Additional information

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes. I asked Claude to profile the gemma-4 and Qwen3.5 non-MoE models, which identified this issue.

github-actions bot added the ggml and OpenCL labels on Jun 2, 2026.
lhez marked this pull request as ready for review on June 2, 2026 06:42.
lhez requested a review from a team as a code owner on June 2, 2026 06:42.
@lhez (Contributor, Author) commented on Jun 2, 2026

@ggml-org/maintainers Can I please get another approval?

lhez merged commit 63e66fd into ggml-org:master on Jun 2, 2026 (25 of 26 checks passed).
arichiardi pushed a commit to arichiardi/llama.cpp that referenced this pull request on Jun 2, 2026.
jimbothigpen pushed a commit to jimbothigpen/llama.cpp that referenced this pull request on Jun 6, 2026.

Labels

ggml: changes relating to the ggml tensor library for machine learning
OpenCL: Issues specific to the OpenCL backend

3 participants