mtmd: fix gemma 4 audio rms norm eps (#23815)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Hmm, is something going wrong with the CI?
* mtmd: fix gemma 4 audio rms norm eps
* Update tools/mtmd/clip.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@danielhanchen How would this affect the computation of Gemma-4's imatrices? Would having norm_eps hardcoded to 1e-5 instead of 1e-6 (which Gemma-4 expects) cause enough representational loss that Gemma-4's imatrices would need to be recomputed? I'm curious whether we should train another imatrix for Gemma-4 from freshly converted Gemma-4 models. 🤔

This PR is already merged, so you definitely want to use the latest version of llama.cpp (main), but my question still stands. norm_eps seems significant enough that it needs to be configured correctly; with norm_eps at 1e-5 (which is wrong for Gemma-4), we lose accuracy.

Update: I have been playing with a newly converted and quantized version of Gemma-4-E4B-it, and the model seems more capable in its responses (I haven't run into refusals, even with reasoning on and a custom system prompt). This is huge, because even with a newly converted and imatrix-quantized (Unsloth's imatrix) version of Gemma-4-26B-4B-it I still hit refusals, although having the proper norm_eps (1e-6) seems to make the model behave better. 🤔
imatrix doesn't cover multimodal input or mmproj, so it's unaffected.
Overview
This seems to be a mistake from #21421.
All Gemma 4 models (text / vision / audio) use 1e-6 for the norm eps.
In the GGUF conversion code, however, it is hard-coded to 1e-5.
I guess that's why we have trust issues with AI-generated code
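To get a feel for why the eps value matters, here is a minimal pure-Python sketch of RMS norm evaluated with both values. It is not code from llama.cpp or from the conversion script; `rms_norm` and `max_rel_diff` are hypothetical helper names, and the sample vectors are made up for illustration. It says nothing about how large the effect is on the actual Gemma 4 audio encoder.

```python
import math

def rms_norm(x, eps):
    """Scale x by 1 / sqrt(mean(x^2) + eps). No learned weight is applied."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def max_rel_diff(x, eps_a=1e-5, eps_b=1e-6):
    """Largest relative difference between the outputs for the two eps values."""
    a = rms_norm(x, eps_a)
    b = rms_norm(x, eps_b)
    return max(abs(p - q) / abs(q) for p, q in zip(a, b) if q != 0.0)

# Illustrative inputs only (not real activations).
large = [1.0, -2.0, 3.0, -4.0]       # mean square 7.5, far above either eps
small = [1e-3, -2e-3, 3e-3, -4e-3]   # mean square 7.5e-6, comparable to eps

print("large activations:", max_rel_diff(large))
print("small activations:", max_rel_diff(small))
```

When the mean square of the input is much larger than eps, the two settings give nearly identical outputs; when it is of the same order as eps, the outputs diverge noticeably. Whether the audio path sees inputs in that second regime is not something this sketch can tell you.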
Requirements