models : dedup qwen35 graphs by ggerganov · Pull Request #19660 · ggml-org/llama.cpp

ggerganov · 2026-02-16T07:53:28Z

cont #19597

Use the new struct llm_build_delta_net_base to deduplicate the delta net graphs from Qwen35 models.

TODO:

* Before merging, check that this does not break anything once the Qwen 3.5 models are released
* Try to avoid the explicit repeat of Q and K (again, requires the actual models for testing)

JamePeng · 2026-02-16T09:46:45Z

The model is now online.
https://huggingface.co/Qwen/Qwen3.5-397B-A17B

ggerganov · 2026-02-17T09:12:37Z

Unless Qwen releases smaller models, I won't be able to test this. If anyone gives it a try, I would appreciate feedback.

ghost · 2026-02-17T13:54:11Z

I tested this with these two models:
https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF/tree/main/UD-Q4_K_XL
https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF/tree/main/MXFP4_MOE

The model works on afa6bfe (current master).
It crashes on cdd792c (this PR).
It works on cc45f2a (parent of cdd792c).

It crashes at this assert: src\models\delta-net-base.cpp:279: GGML_ASSERT(s->ne[0] == S_v && s->ne[1] == S_v && s->ne[2] == H_v && s->ne[3] == n_seqs) failed
The values in the assert are: s->ne[0] = 128 s->ne[1] = 8192 s->ne[2] = 1 s->ne[3] = 4 S_v = 128 H_v = 64 n_seqs = 4

ggerganov · 2026-02-18T11:29:31Z

@matbrez Thank you for the feedback. Could you try the latest version of this branch again and confirm that it works?

bunnynode · 2026-02-18T11:53:57Z

@ggerganov The model loads successfully with the recent commit, but it only generates repeated 0 tokens.

./llama-server \
  -m /models/unsloth/Qwen3.5-397B-A17B-GGUF/MXFP4_MOE/Qwen3.5-397B-A17B-MXFP4_MOE-00001-of-00006.gguf \
  --jinja \
  --alias "qwen35" \
  --batch-size 2048 \
  --ubatch-size 2048 \
  --cache-type-k f16 \
  --cache-type-v f16 \
  -ngl 999 -fa on --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0 --presence-penalty 1.0 -c 128000 -n 128000 --no-context-shift \
  --host 0.0.0.0 \
  --port 8076 \
  --offline \
  --chat-template-kwargs '{"enable_thinking": false}'

git show --stat

commit 3f6ca46 (HEAD -> pr-19660, origin/gg/qwen35-dedup)
Date: Mon Feb 16 09:51:21 2026 +0200

models : dedup qwen35 graphs

src/models/models.h | 58 +---------------------
src/models/qwen35.cpp | 454 ++++++++++++++++++--------------------------------------------------------------------------------------------------------------------------------------------------------
src/models/qwen35moe.cpp | 455 ++++++++++++++++++---------------------------------------------------------------------------------------------------------------------------------------------------------
3 files changed, 96 insertions(+), 871 deletions(-)

ggerganov · 2026-02-18T12:15:09Z

@bunnynode Just to make sure: master works OK, correct?

ggerganov · 2026-02-18T12:17:00Z

Please try with 1f5a7e4

bunnynode · 2026-02-18T12:27:01Z

@ggerganov It works now with 1f5a7e4. On an M3 Ultra, token generation (tg) improved from 19.8 t/s (master) to 21.7 t/s (this PR).

However, it is still far slower than mlx-lm (34 t/s); both are Q4 quantizations.

CISC

Very nice.

* models : dedup qwen35 graphs
* cont : add missing sigmoid

github-actions bot added the model (Model specific) label Feb 16, 2026

Base automatically changed from gg/qwen3-dedup to master February 16, 2026 12:35

ggerganov force-pushed the gg/qwen35-dedup branch from 4d51055 to cdd792c February 16, 2026 12:52

models : dedup qwen35 graphs

3f6ca46

ggerganov force-pushed the gg/qwen35-dedup branch from cdd792c to 3f6ca46 February 18, 2026 11:29

cont : add missing sigmoid

1f5a7e4

ggerganov marked this pull request as ready for review February 18, 2026 13:44

ggerganov requested a review from CISC as a code owner February 18, 2026 13:44

CISC approved these changes Feb 18, 2026


ggerganov merged commit 27326bf into master Feb 19, 2026
74 of 78 checks passed

ggerganov deleted the gg/qwen35-dedup branch February 19, 2026 06:17

ggerganov mentioned this pull request Feb 19, 2026

models : fix qwen3.5 beta/gate shapes #19730

Merged

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026

models : dedup qwen35 graphs (ggml-org#19660)

efd99ab

* models : dedup qwen35 graphs
* cont : add missing sigmoid

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026

models : dedup qwen35 graphs (ggml-org#19660)

56a2eb3

* models : dedup qwen35 graphs
* cont : add missing sigmoid

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

models : dedup qwen35 graphs (ggml-org#19660)

7005e22

* models : dedup qwen35 graphs
* cont : add missing sigmoid

ggerganov merged 2 commits into master from gg/qwen35-dedup

4 participants
