UPSTREAM PR #18976: Split shared state (webgpu_context) into global state and per-thread state by loci-dev · Pull Request #988 · auroralabs-loci/llama.cpp

loci-dev · 2026-01-20T22:38:23Z

Right now, the WebGPU backend has a global webgpu_context struct with all the information required to instantiate and run a WebGPU graph.

We want to split up the webgpu_context struct as follows:

Move get_tensor_sharing_buf to global state, along with the mutex that guards it
Move WebGPU handles like wgpu::Device, wgpu::Adapter, and wgpu::Instance to global state in the registration API
Move WebGPU device capabilities to global state
Make webgpu_context a per-thread struct, with pipelines and buffers as needed to run a graph (i.e. error buffers, timing buffers, debug buffers)
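A minimal sketch of the proposed split. All names here (`device_handles`, `device_caps`, `global_state`, `thread_context`, and the two factory functions) are hypothetical, and plain placeholder fields stand in for `wgpu::Instance`, `wgpu::Adapter`, `wgpu::Device`, and `wgpu::Queue` so the example compiles without the WebGPU C++ headers. It illustrates the ownership layout only, not the backend's actual structs.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for wgpu::Instance, wgpu::Adapter, and wgpu::Device.
struct device_handles {
    int instance = 0;
    int adapter  = 0;
    int device   = 0;
};

// Hypothetical capability fields, queried once when the device is created.
struct device_caps {
    uint32_t max_workgroup_size = 256;
    uint64_t max_buffer_size    = 1ull << 27;
};

// Global state: created once and shared by every thread that runs graphs.
struct global_state {
    device_handles       handles;
    device_caps          caps;
    std::mutex           tensor_sharing_mutex;  // guards tensor_sharing_buf
    std::vector<uint8_t> tensor_sharing_buf;    // stand-in for get_tensor_sharing_buf
};

// Per-thread state: what one thread needs to encode and submit a graph.
struct thread_context {
    std::shared_ptr<global_state> global;     // points at the one shared global_state
    int                           queue = 0;  // stand-in for wgpu::Queue
    std::vector<int>              pipelines;  // stand-in for compiled pipelines
    std::vector<uint8_t>          error_buf;
    std::vector<uint8_t>          timing_buf;
    std::vector<uint8_t>          debug_buf;
};

std::shared_ptr<global_state> make_global_state() {
    return std::make_shared<global_state>();
}

// Each thread builds its own context around the same global state.
thread_context make_thread_context(std::shared_ptr<global_state> g) {
    thread_context ctx;
    ctx.global = std::move(g);
    ctx.error_buf.resize(64);  // sizes are arbitrary for the sketch
    ctx.timing_buf.resize(64);
    ctx.debug_buf.resize(64);
    return ctx;
}
```

In this sketch the only mutable shared resource is the tensor-sharing buffer, and it carries its own mutex; everything a thread writes while running a graph lives in its own thread_context.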

commit b3c6bf4b0450d8d452b934df27a0fb7cb53cd755
Author: Abhijit Ramesh <abhijitramesh2k@gmail.com>
Date: Mon Dec 1 18:29:00 2025 -0800

ggml webgpu: fix xielu parameter passing (#11)

The XIELU operation was incorrectly using static_cast to convert float parameters to uint32_t, which converted numeric values instead of preserving IEEE 754 bit patterns. This caused the GPU shader to interpret incorrect values.

* Use reinterpret_cast to preserve float bit patterns when passing through uint32_t params buffer
* Update WGSL shader parameter types from u32 to f32
* Re-enable XIELU support (was disabled due to numerical issues)

Fixes NMSE test failures for XIELU operation on WebGPU backend.

commit 5ca9b5e49ea7cddc9ab7c8b43a11a9c76a4dff4a
Author: neha-ha <137219201+neha-ha@users.noreply.github.com>
Date: Tue Nov 18 12:17:00 2025 -0800

Refactored pipelines and workgroup calculations (#10)

* refactored pipelines
* refactored workgroup calculation
* removed commented out block of prior maps
* Clean up ceiling division pattern

---------

Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 29 23:13:06 2025 -0700

formatted embed wgsl and ggml-webgpu.cpp

commit e1f6baea31645e5d96ad53664acae856f74b96f4
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 29 23:08:37 2025 -0700

implemented REPL_Template support and removed bug in unary operators kernel

commit 8c70b8fece445cdc9a8c660dbddbf201e52da2bb
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 15 16:14:20 2025 -0700

responded and dealt with PR comments

commit f9282c660c10dec4487d434549bdb707a9cd9f37
Author: James Contini <jamescontini@gmail.com>
Date: Sun Oct 12 13:41:41 2025 -0700

removed unnecessary checking if node->src[1] exists for unary operators

commit 4cf28d7dec41c29186d66152735b244c5699f9dc
Author: James Contini <jamescontini@gmail.com>
Date: Sun Oct 12 13:32:45 2025 -0700

All operators (including xielu) working

commit 74c6add1761a59d2c2ff60b60e8ad3c8300f6d3e
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 13:16:48 2025 -0700

fixed autoconfig

commit 362749910be4f0120c8ffb21ceddeb7d2c088e51
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 13:10:46 2025 -0700

removed vestigial files

commit cb0858333785757804c5104e59c4981843207c16
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 12:59:32 2025 -0700

abides by editor-config

commit 5360e2852a4b51197d7d67d0a5d42e908b02d7ed
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 12:45:57 2025 -0700

rms_norm double declaration bug atoned

commit 7b09baa4aa53711be5a126043670cc182c78bfcd
Merge: 8a6ec843 74b8fc1
Author: James Contini <jamescontini@gmail.com>
Date: Fri Oct 10 11:50:03 2025 -0700

resolving merge conflicts

commit 8a6ec843a50ab82f8cef59b4558eb63f318ba02d
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 8 18:06:47 2025 -0700

unary operators pass ggml tests

commit c3ae38278a2db236adc5912c9140e4f0d63f2c19
Author: James Contini <jamescontini@gmail.com>
Date: Wed Oct 1 16:22:40 2025 -0700

neg passes backend test

commit aa1c9b2f8877a405470ca56709c42a1fd43713de
Author: James Contini <jamescontini@gmail.com>
Date: Tue Sep 30 23:55:27 2025 -0700

neg f16xf32xip builds and runs, haven't actually run a model that uses neg kernel yet though

Co-authored-by: James Contini <jamescontini@gmail.com>
Co-authored-by: Neha Abbas <neabbas@ucsc.edu>
Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>
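The XIELU fix hinges on the difference between converting a float's value and copying its bits. A sketch with hypothetical helper names (`pack_by_value`, `pack_bits`, `unpack_bits`): `std::memcpy` (or C++20 `std::bit_cast`) is the standard-conforming way to reinterpret the 32 bits. The commit message says reinterpret_cast was used; exactly how the backend code spells the bit copy is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Wrong for a WGSL f32 field: converts the numeric value (0.8f becomes 0u),
// discarding the IEEE 754 encoding the shader expects to read.
uint32_t pack_by_value(float f) {
    return static_cast<uint32_t>(f);
}

// Right: copies the raw 32 bits, so a shader field declared f32 reads back the same float.
uint32_t pack_bits(float f) {
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Inverse of pack_bits, mirroring what the GPU does when it reads the word as f32.
float unpack_bits(uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `pack_bits(1.0f)` yields `0x3F800000`, while `pack_by_value(1.0f)` yields `1`, which a shader reading the word as f32 would see as a tiny denormal rather than 1.0.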

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * Follow Vulkan backend numerical stability pattern
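The overflow concern is concrete: the largest finite f16 is 65504, so exp(x) already overflows f16 for x above about 11.09 (ln 65504). A sketch of one common stable formulation evaluated in f32. The cutoff of 20 is an assumption (past it, log(1 + exp(x)) equals x to within f32 precision), and whether it matches the threshold in the Vulkan shader is not confirmed by this text. Using log1p instead of log(1 + ...) is a small refinement for very negative x.

```cpp
#include <cassert>
#include <cmath>

// softplus(x) = log(1 + exp(x)), evaluated in f32 even when tensors are f16.
// `threshold` is a hypothetical cutoff: above it, the exact result rounds to x in f32.
float softplus_f32(float x, float threshold = 20.0f) {
    if (x > threshold) {
        return x;  // skip exp entirely, so large inputs cannot overflow
    }
    return std::log1p(std::exp(x));  // log1p keeps precision when exp(x) is tiny
}
```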


loci-review · 2026-01-20T23:51:02Z

Analysis didn’t complete successfully. Explore Version Insights for details.

loci-review · 2026-01-20T23:51:11Z

Analysis didn’t complete successfully. Explore Version Insights for details.

reeselevine and others added 27 commits December 3, 2025 15:01

Remove extra code and format

893e6af

Add ops documentation (finally)

417fa79

ggml webgpu: add EXPM1 unary operator

ac51c62

Implements EXPM1 (exp(x) - 1) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
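EXPM1 is worth a dedicated operator because exp(x) - 1 loses precision near zero: exp(x) rounds to a value very close to 1, and subtracting 1 cancels most of the correct digits. A host-side comparison using the C++ standard library's `std::expm1`. The helper names are illustrative, and this says nothing about how the WGSL shader evaluates the operation.

```cpp
#include <cassert>
#include <cmath>

// Naive form: in f32, exp(1e-8f) rounds to 1.0f (or its nearest neighbor),
// so the subtraction leaves almost none of the true result's digits.
float expm1_naive(float x) {
    return std::exp(x) - 1.0f;
}

// Library form: accurate for small |x|.
float expm1_accurate(float x) {
    return std::expm1(x);
}
```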

ggml webgpu: add FLOOR unary operator

e2a00cf

Implements FLOOR (rounds down to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support

ggml webgpu: add CEIL unary operator

267d3b4

Implements CEIL (rounds up to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support

resolve merge conflict

5a1b566

ggml webgpu: add ROUND unary operator

0e59487

Implements ROUND (rounds to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
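Halfway cases are where ROUND implementations can quietly disagree. C's roundf (and `std::round`) rounds ties away from zero, while the WGSL built-in round() is specified to round ties to the nearest even integer. Whether this commit's shader calls round() directly, or matches the CPU reference some other way, is not stated here. The sketch below only shows the two behaviors on the host, using `std::nearbyint` under the default rounding mode as a stand-in for round-half-to-even; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Ties go away from zero: 0.5 -> 1, 2.5 -> 3, -2.5 -> -3.
float round_half_away(float x) {
    return std::round(x);
}

// Ties go to the even neighbor (the tie rule the WGSL spec gives for round()):
// 0.5 -> 0, 2.5 -> 2, -2.5 -> -2. Assumes the default FE_TONEAREST rounding mode.
float round_half_even(float x) {
    return std::nearbyint(x);
}
```

A backend test that feeds exact .5 values is the quickest way to see which rule a given implementation follows.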

ggml webgpu: add TRUNC unary operator

4f358f7

Implements TRUNC (truncates towards zero) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support
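FLOOR, CEIL, and TRUNC agree on non-negative inputs in the sense that TRUNC equals FLOOR there, and they only split apart on negative non-integers, which is where test cases tend to catch mixups. A host-side reference using the standard `<cmath>` functions; the `rounding_row` struct and `apply_modes` helper are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cmath>

// Reference results for one input value under the three directed rounding ops.
struct rounding_row {
    float floor_v;  // toward -infinity
    float ceil_v;   // toward +infinity
    float trunc_v;  // toward zero
};

rounding_row apply_modes(float x) {
    return { std::floor(x), std::ceil(x), std::trunc(x) };
}
```

For -1.5 this gives floor -2, ceil -1, trunc -1; for 1.5 it gives floor 1, ceil 2, trunc 1. TRUNC matches CEIL on negatives and FLOOR on positives.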

docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)

c4c4f77

Updates to webgpu get_memory

0ba2cc1

Merge branch 'ggml-org:master' into master

e7a0a59

Merge remote-tracking branch 'abhijit/abhijit/unary'

0db8291

Resolve merge

c74424b

Move shared state (webgpu_context) and device creation out of registration context, device context, and buffer context, and move into backend context

08f7937

Small cleanup

c0373f5

Merge

82b52ee

Move Instance, Device, Adapter, Device creation, and capabilities to global state while moving Queue, pipelines, and buffers to per-thread state.

51b392c

Cleanups

5f64575

More cleanup

2e4c9f8

Move staging_buf mutex to global context
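Once several threads each hold their own webgpu_context, a single shared staging (read-back) buffer is only safe if the whole stage-then-copy sequence runs under one lock, which is presumably why the mutex belongs next to the buffer in global state rather than in any one thread's context. A minimal sketch with std::thread; `shared_staging`, `read_tensor`, `run_concurrent_reads`, and the byte-vector "buffer" are hypothetical stand-ins for the real GPU staging buffer and its copy/map calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

// Global: one staging buffer plus the mutex that serializes its use.
struct shared_staging {
    std::mutex           mtx;
    std::vector<uint8_t> buf = std::vector<uint8_t>(256);
};

// Copy `src` out through the shared staging buffer into `dst`.
// Holding the lock across both steps stops another thread from overwriting
// the staging contents between the stage and the copy-out.
void read_tensor(shared_staging& s, const std::vector<uint8_t>& src, std::vector<uint8_t>& dst) {
    std::lock_guard<std::mutex> lock(s.mtx);
    assert(src.size() <= s.buf.size());
    std::memcpy(s.buf.data(), src.data(), src.size());  // stand-in for GPU -> staging copy
    dst.assign(s.buf.begin(), s.buf.begin() + static_cast<std::ptrdiff_t>(src.size()));
}

// Each worker repeatedly reads its own "tensor" through the shared buffer.
// Returns true if every read came back intact.
bool run_concurrent_reads(int n_threads, int iters) {
    shared_staging staging;
    std::vector<int> ok(n_threads, 1);  // one slot per thread, so no data race on results
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            std::vector<uint8_t> src(128, static_cast<uint8_t>(t + 1));
            std::vector<uint8_t> dst;
            for (int i = 0; i < iters; ++i) {
                read_tensor(staging, src, dst);
                if (dst != src) ok[t] = 0;
            }
        });
    }
    for (auto& w : workers) w.join();
    for (int v : ok) {
        if (!v) return false;
    }
    return true;
}
```

Removing the lock_guard from read_tensor would let one thread's memcpy land between another thread's stage and copy-out, which is the race a per-thread mutex could not prevent.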

1dd5676

Resolve merge

f979670

Resolve merge

07838a1

Resolve merge

af9c613

Resolve merge

c036856

Resolve merge

94101f1

loci-dev temporarily deployed to PROD__AL_DEMO with GitHub Actions on January 20, 2026 23:37 (Inactive)

loci-dev force-pushed the main branch 30 times, most recently from 1c71b76 to 57ead3c on January 29, 2026 22:11

loci-dev wants to merge 28 commits into main from upstream-PR18976-branch_nikhilJain17-nikhilJain17/device-init


4 participants
