ggml-webgpu: Enable NVIDIA self-hosted CI by reeselevine · Pull Request #22976 · ggml-org/llama.cpp

reeselevine · 2026-05-12T14:33:01Z

Overview

Enables the self-hosted NVIDIA CI for the WebGPU backend. To pass CI, the NMSE threshold had to be relaxed to avoid failures in many operations that write to f16 tensors. These include operations like DIV, where even if the calculation is done in f32, casting the result to f16 causes slight drift, and SET_ROWS, where the operation is a straightforward cast. The errors were usually between 2e-7 and 3e-7, just above the default 1e-7 threshold set by test-backend-ops.

Since the WebGPU backend ultimately lowers to Vulkan on this CI host, I compared the SPIR-V generated by the two backends. The instruction for the cast is the same (OpFConvert), but the Vulkan backend additionally specifies Vulkan's round-to-even rounding mode, which matches ggml-cpu's f32-to-f16 conversion. WebGPU, however, does not specify the rounding mode, leaving it implementation-defined, and to my knowledge Dawn does not currently expose rounding-mode control (although, interestingly, rounding mode appears as an example in a hypothetical WGSL extension).

Ultimately, this means the WebGPU backend may need slightly looser tolerances for floating-point operations. While some models on some devices may then produce slightly different results than on other backends, that is already the case today, so I think enabling this CI and making the looser tolerance an explicit decision is worth it for now. If Dawn or WebGPU ever adds support for controlling the rounding mode, we can revisit this.

The other related change in this PR clamps the random input values to [-10, 10] for EXP and EXPM1 on f16 tensors. Another quirk of WebGPU is that some f32 values that should become inf in f16 can instead be cast to the maximum finite f16 value (65504.0), due to the rules for discarding extra significand bits, and the previous input range was exposing this.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: yes, to investigate various rounding methods in ggml

taronaeo · 2026-05-14T07:01:35Z

+        ggml_backend_reg_t reg = ggml_backend_dev_backend_reg(ggml_backend_get_device(backend));
+        if (contains_f16 && strcmp(ggml_backend_reg_name(reg), "WebGPU") == 0) {
+            return std::max(max_nmse_err(), 1e-6);
+        }


You may want to add a comment referencing this PR; otherwise, future maintainers may wonder why WebGPU has a special case.

reeselevine · 2026-05-14

Added. Can I get a re-approval?


github-actions bot added the devops, testing, ggml, and WebGPU labels May 12, 2026

reeselevine added 5 commits May 13, 2026 15:15

Enabel nvidia ci for webgpu · a6711b2
Address precision issues · 6ff207e
fix placement · 0b7d000
Relax more set_rows and div · 27a4993
Try relaxing all f16 · 7112fc3

reeselevine force-pushed the enable-nvidia-ci branch from d90db22 to 7112fc3 Compare May 13, 2026 22:15

formatting and naming · 5761ea2

reeselevine marked this pull request as ready for review May 14, 2026 03:17

reeselevine requested review from a team and ggerganov as code owners May 14, 2026 03:17

reeselevine requested a review from CISC May 14, 2026 03:18

taronaeo approved these changes May 14, 2026


CISC approved these changes May 14, 2026


Add comment explaining max_nmse_err logic · 350ff80

Added comment referencing pull request for clarification.

CISC approved these changes May 14, 2026


reeselevine merged commit 834a243 into ggml-org:master May 14, 2026
40 of 49 checks passed

reeselevine mentioned this pull request May 28, 2026 in ggml-webgpu: add q4_0/q8_0 SET_ROWS #23760 (Merged)


reeselevine merged 7 commits into ggml-org:master from reeselevine:enable-nvidia-ci


3 participants
