commit b3c6bf4b0450d8d452b934df27a0fb7cb53cd755
Author: Abhijit Ramesh <abhijitramesh2k@gmail.com>
Date: Mon Dec 1 18:29:00 2025 -0800
ggml webgpu: fix xielu parameter passing (#11)
The XIELU operation was incorrectly using static_cast to convert
float parameters to uint32_t, which converted numeric values instead
of preserving IEEE 754 bit patterns. This caused incorrect values
to be interpreted by the GPU shader.
* Use reinterpret_cast to preserve float bit patterns when passing
through uint32_t params buffer
* Update WGSL shader parameter types from u32 to f32
* Re-enable XIELU support (was disabled due to numerical issues)
Fixes NMSE test failures for XIELU operation on WebGPU backend.
commit 5ca9b5e49ea7cddc9ab7c8b43a11a9c76a4dff4a
Author: neha-ha <137219201+neha-ha@users.noreply.github.com>
Date: Tue Nov 18 12:17:00 2025 -0800
Refactored pipelines and workgroup calculations (#10)
* refactored pipelines
* refactored workgroup calculation
* removed commented out block of prior maps
* Clean up ceiling division pattern
---------
Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 29 23:13:06 2025 -0700
formatted embed wgsl and ggml-webgpu.cpp
commit e1f6baea31645e5d96ad53664acae856f74b96f4
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 29 23:08:37 2025 -0700
implemented REPL_Template support and removed bug in unary operators kernel
commit 8c70b8fece445cdc9a8c660dbddbf201e52da2bb
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 15 16:14:20 2025 -0700
responded and dealt with PR comments
commit f9282c660c10dec4487d434549bdb707a9cd9f37
Author: James Contini <jamescontini@gmail.com>
Date: Sun Oct 12 13:41:41 2025 -0700
removed unnecessary checking if node->src[1] exists for unary operators
commit 4cf28d7dec41c29186d66152735b244c5699f9dc
Author: James Contini <jamescontini@gmail.com>
Date: Sun Oct 12 13:32:45 2025 -0700
All operators (including xielu) working
commit 74c6add1761a59d2c2ff60b60e8ad3c8300f6d3e
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 13:16:48 2025 -0700
fixed autoconfig
commit 362749910be4f0120c8ffb21ceddeb7d2c088e51
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 13:10:46 2025 -0700
removed vestigial files
commit cb0858333785757804c5104e59c4981843207c16
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 12:59:32 2025 -0700
abides by editor-config
commit 5360e2852a4b51197d7d67d0a5d42e908b02d7ed
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 12:45:57 2025 -0700
rms_norm double declaration bug atoned
commit 7b09baa4aa53711be5a126043670cc182c78bfcd
Merge: 8a6ec843 74b8fc1
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 11:50:03 2025 -0700
resolving merge conflicts
commit 8a6ec843a50ab82f8cef59b4558eb63f318ba02d
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 8 18:06:47 2025 -0700
unary operators pass ggml tests
commit c3ae38278a2db236adc5912c9140e4f0d63f2c19
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 1 16:22:40 2025 -0700
neg passes backend test
commit aa1c9b2f8877a405470ca56709c42a1fd43713de
Author: James Contini <jamescontini@gmail.com>
Date: Tue Sep 30 23:55:27 2025 -0700
neg f16xf32xip builds and runs; haven't actually run a model that uses the neg kernel yet though
Co-authored-by: James Contini <jamescontini@gmail.com>
Co-authored-by: Neha Abbas <neabbas@ucsc.edu>
Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>
Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* Follow Vulkan backend numerical stability pattern

Implements EXPM1 (exp(x) - 1) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements CEIL (rounds up to nearest integer) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements ROUND (rounds to nearest integer) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

Implements TRUNC (truncates towards zero) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
… TRUNC, EXPM1, SOFTPLUS)
…ation context, device context, and buffer context, and move into backend context
…global state while moving Queue, pipelines, and buffers to per-thread state.
Mirrored from ggml-org/llama.cpp#18976
Right now, the WebGPU backend has a global webgpu_context struct with all the information required to instantiate and run a WebGPU graph. We want to split up the webgpu_context struct as follows:
* Move get_tensor_sharing_buf to global state, along with the mutex
* Move wgpu::Device, wgpu::Adapter, and wgpu::Instance to global state in the registration api
* Make webgpu_context a per-thread struct, with pipelines and buffers as needed to run a graph (i.e. error buffers, timing buffers, debug buffers)