fix(graph): remove duplicate wo_s scale after build_attn (Qwen3, LLaMA) #22421
Merged
Signed-off-by: Yash Nankani <ynankani@nvidia.com>
danbev approved these changes on Apr 27, 2026
Overview
build_attn in llama-graph.cpp already applies the NVFP4 per-tensor scale (wo_s) to the attention output projection, either through build_lora_mm(wo, cur, wo_s) or through an explicit wo_s multiply.
The qwen3, qwen3moe, and llama model builders then multiplied the result by wo_s again, so the scale was applied twice whenever the companion blk.*.attn_output.scale tensors were present.
That crushed the attention residual (scaling it by roughly wo_s^2 per layer instead of wo_s) and broke NVFP4 GGUFs for LLM_ARCH_QWEN3, LLM_ARCH_QWEN3MOE, and LLM_ARCH_LLAMA / LLM_ARCH_LLAMA_EMBED.
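For illustration, a minimal sketch of the double application, assuming the call shape described above (the helper function and its arguments are illustrative, not the exact llama.cpp code):

```cpp
#include "ggml.h"

// Inside build_attn (llama-graph.cpp), the NVFP4 per-tensor scale is
// already folded into the output projection:
//     cur = build_lora_mm(wo, cur, wo_s);   // wo_s applied once
//
// The qwen3/qwen3moe/llama builders then did the equivalent of this on
// the tensor returned by build_attn (hypothetical helper for clarity):
static ggml_tensor * scale_attn_out_again(
        ggml_context * ctx0,
        ggml_tensor  * cur,    // build_attn output, already scaled by wo_s
        ggml_tensor  * wo_s) { // blk.*.attn_output.scale companion tensor
    if (wo_s) {
        // redundant: combined with build_attn this yields wo_s^2
        cur = ggml_mul(ctx0, cur, wo_s);
    }
    return cur;
}
```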
Remove the redundant ggml_mul blocks from the qwen3, qwen3moe, and llama model builders, leaving the single wo_s application inside build_attn.
Non-NVFP4 GGUFs keep wo_s == nullptr, so behavior is unchanged for them.
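To make the effect concrete, a small standalone check (plain C++; the wo_s value is illustrative, not taken from the model): with the bug, each layer's attention contribution comes out a factor of wo_s smaller than intended.

```cpp
#include <cstdio>

int main() {
    const float wo_s = 0.05f; // illustrative per-tensor scale, not a real model value
    const float x    = 1.0f;  // stand-in for one attention-output activation

    const float once  = x * wo_s;        // intended: scale applied once
    const float twice = x * wo_s * wo_s; // bug: applied again after build_attn

    printf("intended %g, buggy %g (residual shrunk %gx)\n",
           once, twice, once / twice);
    return 0;
}
```

With wo_s = 0.05 that is a 20x attenuation of the attention branch in every layer, which effectively mutes attention and matches the broken NVFP4 outputs described above.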
Additional information
This issue was observed when running inference on a GGUF model converted from https://huggingface.co/nvidia/Qwen3-8B-NVFP4.
Requirements