
UPSTREAM PR #18880: ggml webgpu: support for backend sampling (#942)

Open
loci-dev wants to merge 23 commits into main from upstream-PR18880-branch_reeselevine-sampling


@loci-dev

Mirrored from ggml-org/llama.cpp#18880

The refactor of the backend-sampler tests in ggml-org/llama.cpp#18753 broke WebGPU CI: the WebGPU backend did not yet implement the operations the tests need, and something in that refactor causes the tests to run automatically (and fail) on whatever backend is available. I could have modified the tests to skip the WebGPU backend, but instead I implemented all the operations needed to run the sampler tests via WebGPU.

So, this PR adds the following:

  • Support for CLAMP, FILL, PAD, ARGMAX, ARGSORT, TOP_K, CUMSUM, SUM, and SUM_ROWS, plus a few unary operators that sampling does not need. Those were written by @abhijitramesh, and I included them when I refactored the unary ops to JIT compilation while adding CLAMP support.
  • Support for I32 indices for SET_ROWS and I32 variants of CPY.
  • Implements compilation of new operations in a JIT manner, so we only compile variants of shaders that are actually used by the models. WebGPU compilation always happens at runtime because it needs to know what platform it's running on, but many of the existing shaders are compiled at startup, which reduces flexibility and increases resource usage for no good reason (in my opinion). We're moving to this new approach generally.
  • Update ops support
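The JIT approach described in the list above can be sketched as a cache that compiles a shader variant the first time it is requested and reuses it afterwards. This is a minimal illustration, not the backend's actual code: `VariantKey`, `Pipeline`, and `PipelineCache` are hypothetical names, and the "compiler" is a plain callback standing in for real WebGPU pipeline creation.

```cpp
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical key identifying one shader variant.
struct VariantKey {
    std::string op;       // e.g. "CLAMP"
    std::string type;     // e.g. "f32" or "f16"
    bool        inplace;  // in-place or out-of-place variant

    bool operator<(const VariantKey & o) const {
        return std::tie(op, type, inplace) < std::tie(o.op, o.type, o.inplace);
    }
};

// Stand-in for a compiled GPU pipeline (assumption: the real object is opaque).
struct Pipeline {
    std::string source;
};

class PipelineCache {
  public:
    using Compiler = std::function<Pipeline(const VariantKey &)>;

    explicit PipelineCache(Compiler compile) : compile_(std::move(compile)) {}

    // Compile on first use; later requests for the same key hit the cache.
    const Pipeline & get(const VariantKey & key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compile_count_;
            it = cache_.emplace(key, compile_(key)).first;
        }
        return it->second;
    }

    int compile_count() const { return compile_count_; }

  private:
    Compiler                       compile_;
    std::map<VariantKey, Pipeline> cache_;
    int                            compile_count_ = 0;
};
```

With this shape, a model that only ever uses the f32 out-of-place CLAMP variant pays for one compilation, whereas compiling every variant at startup pays for all of them regardless of use.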

abhijitramesh and others added 20 commits December 18, 2025 15:27
Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
precision for intermediate calculations to prevent f16 overflow.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* Follow Vulkan backend numerical stability pattern
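To illustrate why the intermediate precision matters: the largest finite f16 value is 65504, so `exp(x)` overflows in f16 once `x` exceeds roughly 11.09, and even in f32 the naive formula overflows for large inputs. The sketch below is a CPU-side illustration in C++, not the WGSL shader from this commit. The cutoff of 20 is an assumption chosen for the example, and the commit message does not state the exact form of the Vulkan pattern it follows.

```cpp
#include <cmath>

// Naive form: exp(x) overflows to +inf for large x, so the result is +inf.
inline float softplus_naive(float x) {
    return std::log(1.0f + std::exp(x));
}

// Guarded form: for large x, log(1 + exp(x)) is x to within float precision,
// so return x directly and avoid the overflow.
// The threshold value is an assumption made for this sketch.
inline float softplus_stable(float x, float threshold = 20.0f) {
    return x > threshold ? x : std::log1p(std::exp(x));
}
```

For an f16 tensor, the same idea applies after widening each element to f32, evaluating the guarded form, and narrowing the result back to f16.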

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
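As a side note on the formula `exp(x) - 1`: evaluated literally, it loses most of its significant digits when `x` is close to zero, because `exp(x)` rounds to a value near 1 before the subtraction. The commit message does not say how the shader evaluates it, so the comparison below is only a CPU-side C++ sketch of the numerical issue, using the standard library's `std::expm1` as the reference.

```cpp
#include <cmath>

// Literal formula: suffers cancellation for |x| much smaller than 1.
inline float expm1_naive(float x) {
    return std::exp(x) - 1.0f;
}

// Standard-library version, designed to stay accurate near zero.
inline float expm1_accurate(float x) {
    return std::expm1(x);
}
```

For inputs of moderate magnitude the two agree closely; the difference only matters for very small `x`.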

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements ROUND (rounds to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements TRUNC (truncates towards zero) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
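The four rounding operators above differ only for non-integer inputs, and most visibly for negative ones. The following C++ sketch uses the standard library functions to show the intended semantics side by side; `Rounded` and `round_all` are hypothetical names for this example. Note that `std::round` rounds halfway cases away from zero, while the commit messages do not say how the shaders treat ties, so the example avoids tie values.

```cpp
#include <cmath>

// Results of the four rounding modes for one input.
struct Rounded {
    float down;         // FLOOR: toward negative infinity
    float up;           // CEIL:  toward positive infinity
    float nearest;      // ROUND: to the nearest integer
    float toward_zero;  // TRUNC: drop the fractional part
};

inline Rounded round_all(float x) {
    return { std::floor(x), std::ceil(x), std::round(x), std::trunc(x) };
}
```

For `x = -2.7`, FLOOR and ROUND give -3 while CEIL and TRUNC give -2; for `x = 2.7`, FLOOR and TRUNC give 2 while CEIL and ROUND give 3.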

@loci-review

loci-review bot commented Jan 16, 2026

Explore the complete analysis inside the Version Insights


@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 8587aee to b17a397 Compare January 27, 2026 23:09
3 participants