
ggml-webgpu: Support GPU profiling beyond the maximum query count#22995

Merged

reeselevine merged 1 commit into ggml-org:master from yomaytk:new-flush-gpu-profile on May 13, 2026

@yomaytk
Contributor

@yomaytk yomaytk commented May 13, 2026

Overview

This PR fixes the bug described in the Additional Information section.

  • Flush the timestamp slots and reset the timestamp state when the used timestamp slots are nearly exhausted.
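A minimal C++ sketch of the flush-and-reset idea, assuming each profiled pass takes a begin/end pair of timestamp slots and that a flush hands the used slots to a callback. `TimestampProfiler`, `reserve_pair`, `flush`, and `kMaxQueryCount` are hypothetical names, not the ggml-webgpu code. The capacity check mirrors the `ctx->profile_timestamp_query_count + 2 <= WEBGPU_MAX_PROFILE_QUERY_COUNT` condition, but flushes instead of asserting. In a real WebGPU backend the flush step would resolve the QuerySet into a buffer (for example with `resolveQuerySet`) and read the timestamps back before the slots are reused.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the backend's limit (4096 timestamp queries).
constexpr uint32_t kMaxQueryCount = 4096;

// Hypothetical profiler state; not the actual ggml-webgpu structs.
struct TimestampProfiler {
    uint32_t used = 0;  // slots reserved since the last flush
    std::vector<uint64_t> slots = std::vector<uint64_t>(kMaxQueryCount);
    // Receives the raw timestamps of one batch (stand-in for resolve + readback).
    std::function<void(const uint64_t *, uint32_t)> on_flush;
    uint32_t flushes = 0;

    // Reserve a begin/end pair for one profiled pass.
    // If the pair would not fit, flush first instead of failing an assertion.
    uint32_t reserve_pair() {
        if (used + 2 > kMaxQueryCount) {
            flush();
        }
        uint32_t first = used;
        used += 2;
        return first;
    }

    // Record a timestamp into a previously reserved slot.
    void write(uint32_t index, uint64_t ts) { slots[index] = ts; }

    // Hand the used slots to the consumer, then reset so slot 0 can be reused.
    void flush() {
        if (used == 0) {
            return;
        }
        if (on_flush) {
            on_flush(slots.data(), used);
        }
        ++flushes;
        used = 0;
    }
};
```

Because the capacity check runs before every reservation, a pass's begin and end slots never straddle a flush, so each flushed batch contains only complete pairs. A final `flush()` after the graph finishes collects the last partial batch.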

I confirmed that GPU profiles can now be collected for Qwen3.5-35B-A3B-GGUF and several other models (Qwen3.5, Qwen3.6, Gemma 4, and Llama 3).

Additional Information

I noticed that unsloth/Qwen3.5-35B-A3B-GGUF overflowed the timestamp QuerySet when I tried to collect a GPU profile:

llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:571: GGML_ASSERT(ctx->profile_timestamp_query_count + 2 <= WEBGPU_MAX_PROFILE_QUERY_COUNT) failed

This suggests that we need logic to allow profile collection even when a model requires more than 4096 timestamp queries.
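The `+ 2` in the failing assertion suggests that each profiled pass reserves two timestamp queries, so a 4096-query QuerySet fills after 2048 passes. The sketch below works through that arithmetic, assuming exactly two queries per pass; `fits`, `max_passes_per_query_set`, and `batches_needed` are hypothetical helpers, not ggml-webgpu functions.

```cpp
#include <cstdint>

constexpr uint32_t kMaxQueryCount = 4096;  // per the PR: 4096 timestamp queries
constexpr uint32_t kQueriesPerPass = 2;    // assumption: begin + end, from the "+ 2"

// Mirrors the assertion: can one more pass be recorded without a flush?
constexpr bool fits(uint32_t used) { return used + kQueriesPerPass <= kMaxQueryCount; }

// Most passes one QuerySet can hold before the original assertion fires.
constexpr uint32_t max_passes_per_query_set() { return kMaxQueryCount / kQueriesPerPass; }

// Number of QuerySet fills (flush/readback rounds) needed for `passes` profiled passes.
constexpr uint32_t batches_needed(uint64_t passes) {
    uint64_t per = max_passes_per_query_set();
    return static_cast<uint32_t>((passes + per - 1) / per);
}
```

Under these assumptions, any graph with more than 2048 profiled passes needs at least two batches, which is exactly the case the original single-QuerySet logic could not handle.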

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - I used AI to investigate the WebGPU specification

@yomaytk yomaytk requested a review from a team as a code owner May 13, 2026 00:42
@github-actions (bot) added the ggml (changes relating to the ggml tensor library for machine learning) and WebGPU labels May 13, 2026
@reeselevine
Contributor

thanks, this is a nice clean addition!

@reeselevine reeselevine requested a review from CISC May 13, 2026 16:30
@reeselevine reeselevine merged commit 527045b into ggml-org:master May 13, 2026
46 checks passed
xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 13, 2026
@yomaytk yomaytk deleted the new-flush-gpu-profile branch May 18, 2026 13:18
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request May 19, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
carlosfundora pushed a commit to carlosfundora/llama.cpp-1-bit-turbo that referenced this pull request May 24, 2026
winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Labels

ggml (changes relating to the ggml tensor library for machine learning), WebGPU

3 participants