mtmd: fix gemma 4 audio rms norm eps #23815

Merged: ngxson merged 2 commits into master from xsn/fix_gemma4a_esp on May 28, 2026
@ngxson ngxson commented May 28, 2026

Overview

This seems to be a mistake introduced in #21421.

All gemma 4 models (text / vision / audio) use 1e-6 for norm eps:

In the GGUF conversion code, however, it is hard-coded to 1e-5.
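To illustrate why the value matters, here is a minimal sketch of RMS normalization in plain Python. It is not the code from `clip.cpp` or the conversion script: the helper names `rms_norm` and `eps_ratio` and the sample vectors are hypothetical, and the learned weight is omitted for brevity. It only shows that eps is added to the mean square before the square root, so a tenfold change is felt mostly when activations are small.

```python
import math

def rms_norm(x, eps):
    """Scale x by 1 / sqrt(mean(x^2) + eps). No learned weight in this sketch."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def eps_ratio(x):
    """Ratio of the first output element with eps=1e-5 to that with eps=1e-6."""
    wrong = rms_norm(x, 1e-5)  # value hard-coded in the conversion code
    right = rms_norm(x, 1e-6)  # value the PR says Gemma 4 uses
    return wrong[0] / right[0]

# Hypothetical inputs chosen only to show the two regimes.
small = [1e-3, -2e-3, 1.5e-3]  # mean square ~2.4e-6, comparable to eps
large = [1.0, -2.0, 1.5]       # mean square ~2.4, eps is negligible

print(f"small activations: {eps_ratio(small):.3f}")
print(f"large activations: {eps_ratio(large):.6f}")
```

With the small vector, the output under eps = 1e-5 ends up at roughly half the magnitude it has under eps = 1e-6, while the large vector is almost unchanged. How much this matters for the real audio encoder depends on the actual activation scale, which this sketch does not measure.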

I guess that's why we have trust issues with AI-generated code

Requirements

@ngxson ngxson requested review from a team and CISC as code owners May 28, 2026 11:55
@github-actions github-actions Bot added examples python python script changes labels May 28, 2026
Comment thread tools/mtmd/clip.cpp Outdated
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Member

@CISC CISC left a comment

Wow, no \r\n mess this time? 😆

@ngxson
Contributor Author

ngxson commented May 28, 2026

Hmm, did something go wrong with the CI?

@ngxson ngxson merged commit d6be315 into master May 28, 2026
22 of 29 checks passed
adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request May 28, 2026
* mtmd: fix gemma 4 audio rms norm eps

* Update tools/mtmd/clip.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 28, 2026
* origin/master: (32 commits)
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (ggml-org#23835)
mtmd-debug: add color and rainbow mode (ggml-org#23829)
mtmd: fix gemma 4 projector pre_norm (ggml-org#23822)
opencl: move backend info printing into its own function (ggml-org#23702)
ci : run ui publish on ubuntu-slim (ggml-org#23818)
ui: fix audio and video modality detection (ggml-org#23756)
ci : releases use Github-hosted builds for the UI (ggml-org#23823)
app : improve help output (ggml-org#23805)
mtmd: n_head_kv defaults to n_head (ggml-org#23782)
mtmd: fix gemma 4 audio rms norm eps (ggml-org#23815)
ci : change Vulkan builds to Release to reduce ccache (ggml-org#23820)
arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (ggml-org#23167)
test-llama-archs: fix table format [no release] (ggml-org#23810)
ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007)
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (ggml-org#23729)
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ggml-org#23227)
server: minor tweaks to use more cpp features (ggml-org#23785)
hexagon: minor refresh for HMX FA and MM (ggml-org#23796)
vulkan: fast path for walsh-hadamard transform (ggml-org#23687)
chat : add Granite 4.1 chat template (ggml-org#23518)
...
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
@joseph777111

joseph777111 commented Jun 2, 2026

@danielhanchen How would this affect the computation of Gemma-4's imatrices? Would hard-coding norm_eps to 1e-5, instead of the 1e-6 that Gemma-4 expects, cause enough representational loss that the imatrices need to be recomputed? I'm curious whether we should compute a new imatrix for Gemma-4 from freshly converted models. 🤔

This PR is already merged, so you definitely want to use the latest version of llama.cpp (main). But my question still stands: norm_eps seems significant enough that it needs to be configured correctly. With norm_eps at 1e-5 (which is wrong for Gemma-4), we lose accuracy.

Update: I have been playing with a newly converted and quantized version of Gemma-4-E4B-it, and the model seems more capable in its responses (I haven't run into refusals, even with reasoning on and a custom system prompt). This is notable because with a newly converted and imatrix-quantized (Unsloth's imatrix) version of Gemma-4-26B-4B-it I still hit refusals, even though the proper norm_eps (1e-6) seems to make the model behave better. 🤔

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
@ngxson
Contributor Author

ngxson commented Jun 2, 2026

imatrix doesn't cover multimodal input or mmproj, so it's unaffected

Labels

examples python python script changes

5 participants