opencl: fix crash when warming up MoE on Adreno #22876

Merged: lhez merged 1 commit into ggml-org:master from qualcomm:lh/fix-moe-warmup-crash on May 13, 2026
@lhez (Contributor) commented May 9, 2026

Overview

Warming up MoE models on Adreno (in this case, gpt-oss-20b-mxfp4) crashes with an invalid workgroup size error.

The warmup run uses ne20 = 128 (all experts), so the local size {64, ne20, 1} in the kernel launch below amounts to 64 × 128 = 8192 work-items per workgroup, which exceeds the max workgroup size of 1024. During a normal run, ne20 is the number of used experts, and the workgroup size stays within the limit.

size_t histogram_global_size[] = {(size_t)(((ne21 + 63) / 64) * 64), static_cast<size_t>(ne20), 1};
size_t histogram_local_size[] = {64, static_cast<size_t>(ne20), 1};
backend_ctx->enqueue_ndrange_kernel(kernel, 3, histogram_global_size, histogram_local_size, src);

@github-actions (bot) added the ggml and OpenCL labels May 9, 2026
@lhez marked this pull request as ready for review May 12, 2026 06:35
@lhez requested a review from a team as a code owner May 12, 2026 06:35
@lhez (Contributor, Author) commented May 13, 2026

@ggml-org/maintainers Can I get another approval?

@lhez merged commit 1e4579f into ggml-org:master May 13, 2026
77 of 78 checks passed

Labels

ggml, OpenCL

3 participants