ggml webgpu: support for backend sampling #18880
reeselevine merged 23 commits into ggml-org:master
Conversation
Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * Follow Vulkan backend numerical stability pattern
Implements EXPM1 (exp(x) - 1) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
Implements FLOOR (rounds down to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
Implements CEIL (rounds up to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
Implements ROUND (rounds to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
Implements TRUNC (truncates towards zero) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
… TRUNC, EXPM1, SOFTPLUS)
CISC
left a comment
Awesome!
The CI failure is just corrupt ccache, I have deleted the cache and will rerun it once CIs are done.
2026-01-16T21:14:19.3151243Z 35: Failing tests:
2026-01-16T21:14:19.3154644Z 35:   LOG(type=f16,ne=[10,5,4,3])
2026-01-16T21:14:19.3158114Z 35:   LOG(type=f16,ne=[7,1,5,3])
2026-01-16T21:14:19.3161950Z 35: Backend WebGPU: WebGPU: FAIL
Looks like it's numerical issues; let me try casting to f32 for the actual operation. It shouldn't affect performance much, if at all.
Could be a rounding issue on the result - for ggml-vulkan we've had to force RTNE to make a lot of these f16 tests pass.
Ah interesting, if this doesn't fix it I'll look into that. I don't think WebGPU supports the same features though.
Looks like the higher precision worked!
* ggml webgpu: add SOFTPLUS unary operator

  Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
  * Follow Vulkan backend numerical stability pattern
* ggml webgpu: add EXPM1 unary operator

  Implements EXPM1 (exp(x) - 1) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add FLOOR unary operator

  Implements FLOOR (rounds down to nearest integer) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add CEIL unary operator

  Implements CEIL (rounds up to nearest integer) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add ROUND unary operator

  Implements ROUND (rounds to nearest integer) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add TRUNC unary operator

  Implements TRUNC (truncates towards zero) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)
* Updates to webgpu get_memory
* Add argmax
* Add argmax, cumsum, sum, sum_rows
* Add necessary CPY/GET_ROWS operators
* Support for argsort using multi-pass strategy
* Update set_rows for i32 indices, move to pre-wgsl
* Port unary operators to pre-wgsl and support FILL
* Implement PAD
* Add support for top-k
* clean up, scope pipeline init mutex
* fix newline
* Add support for log
* Update LOG for better precision, and ops doc

---------

Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>
The refactor to backend-sampler tests in #18753 broke WebGPU CI: we hadn't implemented the necessary operations yet, and the refactor caused those tests to run automatically on whatever backend is available. I could have simply excluded the WebGPU backend from those tests, but instead I implemented all the operations needed to run the sampler tests via WebGPU.
So, this PR adds the following: