ggml-webgpu: Enable NVIDIA self-hosted CI #22976
Merged
taronaeo
approved these changes
May 14, 2026
Comment on lines +1131 to +1134
    ggml_backend_reg_t reg = ggml_backend_dev_backend_reg(ggml_backend_get_device(backend));
    if (contains_f16 && strcmp(ggml_backend_reg_name(reg), "WebGPU") == 0) {
        return std::max(max_nmse_err(), 1e-6);
    }
Member
You may want to reference this PR in a comment; otherwise future maintainers may wonder why WebGPU has a special case.
Contributor
Author
Added. Can I get a reapproval?
CISC
approved these changes
May 14, 2026
Added comment referencing pull request for clarification.
CISC
approved these changes
May 14, 2026
xxmustafacooTR
pushed a commit
to xxPlayground/llama-cpp-turboquant
that referenced
this pull request
May 15, 2026
* Enabel nvidia ci for webgpu
* Address precision issues
* fix placement
* Relax more set_rows and div
* Try relaxing all f16
* formatting and naming
* Add comment explaining max_nmse_err logic

Added comment referencing pull request for clarification.
dandm1
pushed a commit
to dandm1/llama.cpp
that referenced
this pull request
May 16, 2026
rsenthilkumar6
pushed a commit
to rsenthilkumar6/llama.cpp
that referenced
this pull request
May 19, 2026
ArberSephirotheca
pushed a commit
to ArberSephirotheca/llama.cpp
that referenced
this pull request
May 19, 2026
baramofme
pushed a commit
to baramofme/llama-cpp-turboquant
that referenced
this pull request
May 23, 2026
winstonma
pushed a commit
to winstonma/llama.cpp
that referenced
this pull request
May 27, 2026
fewtarius
pushed a commit
to fewtarius/llama.cpp
that referenced
this pull request
May 30, 2026
Overview
Enables the self-hosted NVIDIA CI for the WebGPU backend. To pass the CI, the NMSE threshold had to be relaxed to avoid errors in many operations that write to f16 tensors. This includes operations like DIV, where even if the calculation is done in f32, casting to f16 causes slight drift, and SET_ROWS, where the operation is a straightforward cast. I found that the errors were usually between 2e-7 and 3e-7, just above the default 1e-7 threshold set by test-backend-ops.

Since the WebGPU backend ultimately lowers to Vulkan on this CI host, I investigated the difference in the SPIR-V code between the two, and found that while the instruction for the cast is the same (OpFConvert), the Vulkan backend adds Vulkan's "round-to-even" mode, which matches ggml-cpu's conversion from f32 to f16. However, WebGPU does not specify the rounding mode, leaving it implementation-defined, and to my knowledge Dawn currently does not expose rounding mode control (although, interestingly, rounding mode is an example in a hypothetical extension for WGSL).

Ultimately, this means that the WebGPU backend may need slightly looser tolerances for floating-point operations. While that may mean some models on some devices are slightly off compared to other backends, that is already the case right now, so I think enabling this CI and making it an explicit decision for now is worth it. If Dawn or WebGPU ever adds support for rounding mode, we can revisit this.
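To get a feel for why an unspecified rounding mode alone can push NMSE past 1e-7, here is a minimal standalone sketch. It is not code from this PR or from test-backend-ops: all function names are hypothetical, NMSE is assumed to be the sum of squared differences divided by the sum of squared reference values, and round-toward-zero is used purely as a stand-in for "some rounding mode other than round-to-even" (it is not a claim about what Dawn or the driver actually does). The helpers only handle finite values in the normal f16 range.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: round an f32 to f16 precision (10 explicit mantissa bits)
// using round-to-nearest-even, returning the result as an f32.
// Assumes a finite input in the normal f16 range (no overflow/subnormal handling).
static float round_to_f16_nearest_even(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Adding 0x0FFF plus the lowest kept bit rounds to nearest, ties to even.
    bits += 0x0FFFu + ((bits >> 13) & 1u);
    bits &= ~0x1FFFu;  // clear the 13 mantissa bits that f16 cannot store
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// Hypothetical helper: same precision reduction, but simply discarding the
// extra mantissa bits (round toward zero).
static float round_to_f16_toward_zero(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= ~0x1FFFu;
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// Assumed NMSE definition: sum((out - ref)^2) / sum(ref^2).
static double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    double err = 0.0, norm = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = double(out[i]) - double(ref[i]);
        err  += d * d;
        norm += double(ref[i]) * double(ref[i]);
    }
    return err / norm;
}

// Compare the two rounding modes on an evenly spaced grid in [1, 2).
static double nmse_between_rounding_modes() {
    const int n = 1 << 16;
    std::vector<float> ref(n), alt(n);
    for (int i = 0; i < n; ++i) {
        const float x = 1.0f + float(i) / float(n);  // exactly representable in f32
        ref[i] = round_to_f16_nearest_even(x);       // reference-style rounding
        alt[i] = round_to_f16_toward_zero(x);        // stand-in for a different mode
    }
    return nmse(ref, alt);
}
```

For this input grid, `nmse_between_rounding_modes()` lands above the default 1e-7 threshold and below 1e-6, even though each individual value differs by at most one f16 ulp. Real operations and input distributions will give different numbers, so treat this as an illustration of the order of magnitude rather than a reproduction of the CI failures.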
The other related change in this PR is to clamp random values to the range [-10, 10] for EXP and EXPM1 f16 tensors, since another quirk of WebGPU is that some inf f32 values can be cast to the max f16 value (65504.0), due to the rules on discarding extra significand bits, and the existing range was exposing this.

Requirements