
models : dedup qwen35 graphs #19660

Merged
ggerganov merged 2 commits into master from gg/qwen35-dedup
Feb 19, 2026

Conversation

@ggerganov
Member

@ggerganov ggerganov commented Feb 16, 2026

cont #19597

Use the new struct llm_build_delta_net_base to deduplicate the delta net graphs from Qwen35 models.
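As a rough illustration of the kind of deduplication described here, the sketch below moves a shared delta net step into one base struct that two model builders reuse. Every name in it (`delta_net_base_sketch`, `dense_sketch`, `moe_sketch`, `build_layer`) is hypothetical, it records step names in a vector instead of building a real compute graph, and the assumption that the dense and MoE variants differ only in the feed-forward step is made for illustration. It is not the actual `llm_build_delta_net_base` interface.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a shared base struct. It only records step
// names; it is NOT the real llm_build_delta_net_base from llama.cpp.
struct delta_net_base_sketch {
    std::vector<std::string> ops;

    // The delta net step lives here once instead of once per model file.
    void build_delta_net(int il) {
        ops.push_back("delta_net[" + std::to_string(il) + "]");
    }
};

// Hypothetical dense variant: reuses the shared step, adds its own FFN.
struct dense_sketch : delta_net_base_sketch {
    void build_layer(int il) {
        build_delta_net(il);
        ops.push_back("ffn[" + std::to_string(il) + "]");
    }
};

// Hypothetical MoE variant: same shared step, different feed-forward part.
struct moe_sketch : delta_net_base_sketch {
    void build_layer(int il) {
        build_delta_net(il);
        ops.push_back("moe_ffn[" + std::to_string(il) + "]");
    }
};
```

With this layout, a fix to the shared step is made in one place and both variants pick it up.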

TODO:

  • Before merging, check that this does not break something after the Qwen 3.5 models are released
  • Try to avoid the explicit repeat of Q and K (again, requires the actual models for testing)

@github-actions github-actions bot added the model Model specific label Feb 16, 2026
@JamePeng

The model is now online.
https://huggingface.co/Qwen/Qwen3.5-397B-A17B

Base automatically changed from gg/qwen3-dedup to master February 16, 2026 12:35
@ggerganov
Member Author

Unless Qwen releases smaller models, I won't be able to test this. So if anyone gives it a try, I would appreciate feedback.

@ghost

ghost commented Feb 17, 2026

I tested this with these two models:
https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF/tree/main/UD-Q4_K_XL
https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF/tree/main/MXFP4_MOE

The model works on afa6bfe (current master).
It crashes on cdd792c (this PR).
It works on cc45f2a (parent of cdd792c).

It crashes at this assert: src\models\delta-net-base.cpp:279: GGML_ASSERT(s->ne[0] == S_v && s->ne[1] == S_v && s->ne[2] == H_v && s->ne[3] == n_seqs) failed
The values in the assert are: s->ne[0] = 128, s->ne[1] = 8192, s->ne[2] = 1, s->ne[3] = 4, S_v = 128, H_v = 64, n_seqs = 4
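The reported numbers can be checked with a little arithmetic: 8192 = 128 × 64, so `s->ne[1]` equals `S_v * H_v`, and the tensor holds the same number of elements as the expected `[S_v, S_v, H_v, n_seqs]` shape. The following is a minimal standalone sketch of that check; it does not use ggml, and the `Shape` alias and helper names are made up for illustration. It only shows that the element counts agree. Which change actually resolved the crash is not stated in this thread.

```cpp
#include <array>
#include <cstdint>

// Shape in ggml order: ne[0] is the fastest-varying dimension.
// "Shape" and the helpers below are illustrative names, not ggml API.
using Shape = std::array<std::int64_t, 4>;

// Total number of elements described by a shape.
constexpr std::int64_t n_elements(const Shape & ne) {
    return ne[0] * ne[1] * ne[2] * ne[3];
}

// Mirrors the condition inside the failing GGML_ASSERT.
constexpr bool state_shape_ok(const Shape & ne, std::int64_t s_v,
                              std::int64_t h_v, std::int64_t n) {
    return ne[0] == s_v && ne[1] == s_v && ne[2] == h_v && ne[3] == n;
}

// Values copied from the crash report above.
constexpr std::int64_t S_v = 128, H_v = 64, n_seqs = 4;
constexpr Shape got      = {128, 8192, 1, 4};
constexpr Shape expected = {S_v, S_v, H_v, n_seqs};

// The reported shape fails the assert condition ...
static_assert(!state_shape_ok(got, S_v, H_v, n_seqs), "reported shape fails");
// ... because the head dimension appears folded into ne[1] ...
static_assert(got[1] == S_v * H_v, "ne[1] == S_v * H_v");
// ... while the total element count matches the expected shape.
static_assert(n_elements(got) == n_elements(expected), "same element count");
```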

@ggerganov
Member Author

@matbrez Thank you for the feedback. Could you try the latest version of this branch again and confirm that it works?

@bunnynode

bunnynode commented Feb 18, 2026

@ggerganov The model loads successfully with the recent commit, but it generates repeated 0 tokens.

image

./llama-server \
  -m /models/unsloth/Qwen3.5-397B-A17B-GGUF/MXFP4_MOE/Qwen3.5-397B-A17B-MXFP4_MOE-00001-of-00006.gguf \
  --jinja \
  --alias "qwen35" \
  --batch-size 2048 \
  --ubatch-size 2048 \
  --cache-type-k f16 \
  --cache-type-v f16 \
  -ngl 999 -fa on --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0 --presence-penalty 1.0 -c 128000 -n 128000 --no-context-shift \
  --host 0.0.0.0 \
  --port 8076 \
  --offline \
  --chat-template-kwargs '{"enable_thinking": false}'

git show --stat

commit 3f6ca46 (HEAD -> pr-19660, origin/gg/qwen35-dedup)
Date: Mon Feb 16 09:51:21 2026 +0200

models : dedup qwen35 graphs

 src/models/models.h      |  58 +-----
 src/models/qwen35.cpp    | 454 ++++----------------------
 src/models/qwen35moe.cpp | 455 ++++----------------------
 3 files changed, 96 insertions(+), 871 deletions(-)

@ggerganov
Member Author

@bunnynode And to make sure - the master works ok, correct?

@ggerganov
Member Author

Please try with 1f5a7e4

@bunnynode

bunnynode commented Feb 18, 2026

@ggerganov It works now with 1f5a7e4. On an M3 Ultra, token generation (tg) improved from 19.8 t/s (master) to 21.7 t/s (this PR).

It is still far slower than mlx-lm (34 t/s); both are Q4.

@ggerganov ggerganov marked this pull request as ready for review February 18, 2026 13:44
@ggerganov ggerganov requested a review from CISC as a code owner February 18, 2026 13:44
Member

@CISC CISC left a comment

Very nice.

@ggerganov ggerganov merged commit 27326bf into master Feb 19, 2026
74 of 78 checks passed
@ggerganov ggerganov deleted the gg/qwen35-dedup branch February 19, 2026 06:17
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
* models : dedup qwen35 graphs

* cont : add missing sigmoid
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
* models : dedup qwen35 graphs

* cont : add missing sigmoid
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
* models : dedup qwen35 graphs

* cont : add missing sigmoid