UPSTREAM PR #18880: ggml webgpu: support for backend sampling by loci-dev · Pull Request #942 · auroralabs-loci/llama.cpp

loci-dev · 2026-01-16T17:39:20Z

The refactor of the backend-sampler tests in ggml-org/llama.cpp#18753 broke WebGPU CI: we hadn't implemented the necessary operations yet, and something about that refactor caused the tests to run automatically, and fail, on whatever backend is available. I suppose I could have just modified the tests to not run on the WebGPU backend, but instead I implemented all the operations necessary to run the sampler tests via WebGPU.

So, this PR adds the following:

* Support for CLAMP, FILL, PAD, ARGMAX, ARGSORT, TOP_K, CUMSUM, SUM, and SUM_ROWS, as well as a few unary operators that weren't needed for sampling but were written by @abhijitramesh, which I included as part of my unary ops refactor to JIT compilation while supporting CLAMP.
* Support for I32 indices for SET_ROWS and I32 variants of CPY.
* Implements compilation of new operations in a JIT manner, so we only compile variants of shaders that are actually used by the models. WebGPU compilation always happens at runtime because it needs to know what platform it's running on, but many of the existing shaders are compiled at startup, which reduces flexibility and increases resource usage for no good reason (in my opinion). We're moving to this new approach generally.
* Update ops support
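The JIT approach described above can be pictured as a cache keyed by shader variant, filled only when a variant is first requested. The sketch below is illustrative Python rather than the backend's own code: `PipelineCache`, `get_or_compile`, `fake_compile`, and the `(op, dtype, inplace)` key layout are all hypothetical names chosen for the example, and the lock is only an assumption about how initialization might be guarded.

```python
import threading
from typing import Callable, Dict, Tuple

# Hypothetical variant key: (operation name, element type, in-place flag).
VariantKey = Tuple[str, str, bool]


class PipelineCache:
    """Compiles a shader variant the first time it is requested, then reuses it."""

    def __init__(self, compile_fn: Callable[[VariantKey], str]) -> None:
        self._compile_fn = compile_fn
        self._pipelines: Dict[VariantKey, str] = {}
        self._lock = threading.Lock()
        self.compile_count = 0

    def get_or_compile(self, key: VariantKey) -> str:
        # Hold the lock across lookup and compilation so that two threads
        # asking for the same variant cannot both compile it.
        with self._lock:
            pipeline = self._pipelines.get(key)
            if pipeline is None:
                pipeline = self._compile_fn(key)
                self._pipelines[key] = pipeline
                self.compile_count += 1
            return pipeline


def fake_compile(key: VariantKey) -> str:
    # Stand-in for real shader compilation; returns a label, not a pipeline.
    op, dtype, inplace = key
    return f"{op}_{dtype}{'_inplace' if inplace else ''}"


cache = PipelineCache(fake_compile)
first = cache.get_or_compile(("clamp", "f32", False))
second = cache.get_or_compile(("clamp", "f32", False))  # served from the cache
third = cache.get_or_compile(("clamp", "f16", True))    # a new variant
```

Variants that no model ever requests are never compiled, which is the resource saving the description refers to.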

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* Follow Vulkan backend numerical stability pattern
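To illustrate why overflow matters here: in f16 the largest finite value is 65504, so exp(x) already overflows once x is a little above 11. The sketch below shows the same failure mode in double precision, alongside one common rearrangement that avoids it. It is not taken from the PR, and whether the WebGPU or Vulkan shaders use this exact form is an assumption; the function names are made up for the example.

```python
import math


def softplus_naive(x: float) -> float:
    # Direct transcription of log(1 + exp(x)); exp() overflows for large x.
    return math.log(1.0 + math.exp(x))


def softplus_stable(x: float) -> float:
    # max(x, 0) + log1p(exp(-|x|)) is algebraically the same value,
    # but exp() only ever receives a non-positive argument.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

For moderate inputs the two functions agree; for large inputs the naive form raises `OverflowError` in Python, while the rearranged form simply returns `x`.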

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
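As background on why EXPM1 exists as its own operation: evaluating exp(x) - 1 literally loses most of its significant digits when x is close to zero, because exp(x) is then very close to 1. The snippet below demonstrates the effect with Python's standard library; it says nothing about how the shader in this PR evaluates the expression.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0   # subtracting 1 cancels most significant digits
fused = math.expm1(x)       # library routine designed to stay accurate near 0

naive_rel_err = abs(naive - x) / x
fused_rel_err = abs(fused - x) / x
```

For an input this small the true result is essentially `x` itself, so the relative errors above show how far each form drifts from it.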

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements ROUND (rounds to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements TRUNC (truncates towards zero) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
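The four rounding operators differ only in how they treat the fractional part, which is easiest to see on negative and halfway inputs. The sketch below is a reference illustration in Python, not the shader code. The commit messages do not say how ROUND breaks ties, so two common rules are shown: half away from zero (as in C's `roundf`) and half to even (as in Python's `round`). Which one the WebGPU shader follows is left open here, and `round_half_away` is a name made up for the example.

```python
import math

samples = [-2.5, -1.2, 0.0, 1.7, 2.5]

floor_vals = [math.floor(v) for v in samples]   # toward negative infinity
ceil_vals = [math.ceil(v) for v in samples]     # toward positive infinity
trunc_vals = [math.trunc(v) for v in samples]   # toward zero


def round_half_away(v: float) -> float:
    # One possible tie rule: halves move away from zero (like C's roundf).
    return math.copysign(math.floor(abs(v) + 0.5), v)


round_away_vals = [round_half_away(v) for v in samples]
round_even_vals = [round(v) for v in samples]   # Python rounds halves to even
```

Note that FLOOR and TRUNC agree for non-negative inputs and differ for negative ones, while the two ROUND variants differ only on exact halves such as -2.5 and 2.5.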

… TRUNC, EXPM1, SOFTPLUS)

loci-review · 2026-01-16T18:31:54Z

Explore the complete analysis inside the Version Insights

loci-review · 2026-01-16T20:21:11Z

Explore the complete analysis inside the Version Insights

loci-review · 2026-01-16T22:25:01Z

Explore the complete analysis inside the Version Insights

abhijitramesh and others added 20 commits December 18, 2025 15:27

ggml webgpu: add EXPM1 unary operator

ac51c62

ggml webgpu: add FLOOR unary operator

e2a00cf

ggml webgpu: add CEIL unary operator

267d3b4

ggml webgpu: add ROUND unary operator

0e59487

ggml webgpu: add TRUNC unary operator

4f358f7

docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND,…

c4c4f77

… TRUNC, EXPM1, SOFTPLUS)

Updates to webgpu get_memory

0ba2cc1

Merge branch 'ggml-org:master' into master

e7a0a59

Merge remote-tracking branch 'abhijit/abhijit/unary'

0db8291

Add argmax

c212d18

Add argmax,cumsum,sum,sum_rows

4c13d60

Add necessary CPY/GET_ROWS operators

be94a50

Support for argsort using multi-pass strategy

8fa1895
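The commit title does not say which multi-pass scheme is used. As one possible reading, the sketch below sorts indices inside fixed-size blocks in a first pass and then merges neighbouring sorted runs in later passes, doubling the run length each time. It is an illustrative Python reference, not the shader implementation; `argsort_multipass` and the `block` parameter are hypothetical names.

```python
from typing import List


def argsort_multipass(values: List[float], block: int = 4) -> List[int]:
    n = len(values)
    idx = list(range(n))

    # Pass 0: sort the indices inside each fixed-size block independently.
    for start in range(0, n, block):
        chunk = idx[start:start + block]
        idx[start:start + block] = sorted(chunk, key=lambda i: values[i])

    # Later passes: merge neighbouring sorted runs, doubling the run length.
    run = block
    while run < n:
        merged: List[int] = []
        for start in range(0, n, 2 * run):
            left = idx[start:start + run]
            right = idx[start + run:start + 2 * run]
            a = b = 0
            while a < len(left) and b < len(right):
                # "<=" keeps equal values in their original order (stable).
                if values[left[a]] <= values[right[b]]:
                    merged.append(left[a])
                    a += 1
                else:
                    merged.append(right[b])
                    b += 1
            merged.extend(left[a:])
            merged.extend(right[b:])
        idx = merged
        run *= 2
    return idx
```

Each pass touches every element once and depends only on the output of the previous pass, which is the property that makes a scheme like this map onto a sequence of separate GPU dispatches.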

Update set_rows for i32 indices, move to pre-wgsl

8eaaf13

Port unary operators to pre-wgsl and support FILL

5f60b23

Implement PAD

233736c

Add support for top-k

b0588b1

clean up, scope pipeline init mutex

b33df67

Merge remote-tracking branch 'upstream/master' into sampling

df04f92

loci-dev temporarily deployed to PROD__AL_DEMO January 16, 2026 17:39 — with GitHub Actions Inactive

reeselevine added 2 commits January 16, 2026 11:08

fix newline

4071092

Add support for log

6c5cf57

loci-dev temporarily deployed to PROD__AL_DEMO January 16, 2026 19:33 — with GitHub Actions Inactive

Update LOG for better precision, and ops doc

9f06f67

loci-dev temporarily deployed to PROD__AL_DEMO January 16, 2026 21:36 — with GitHub Actions Inactive

loci-dev force-pushed the main branch from bd89648 to 27b0027 Compare January 17, 2026 00:36

loci-dev force-pushed the main branch 30 times, most recently from 8587aee to b17a397 Compare January 27, 2026 23:09

loci-dev wants to merge 23 commits into main from upstream-PR18880-branch_reeselevine-sampling
