Graph parallel for dense Qwen-3.5 models#1331
Merged
Thanks to PR #1329, it is now easy to add graph parallel (a.k.a. split mode graph) support for Qwen-3.5 (just the dense models for now). As with graph parallel for Qwen3-Next (#1292), the recurrent attention layers are not split between GPUs. Nevertheless (and unlike Qwen3-Next), we do see a small performance gain compared to split mode layer even at zero context.

Here are some `sweep-bench` results for Qwen-3.5-27B quantized with `Q4_K_S` on a 2x3090 system. We see about 10% better PP at zero context, and 25% at a context of 64k tokens. TG is ~4% better at zero context, and ~12% better at a context of 64k.
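For anyone wanting to reproduce this kind of comparison, a sketch of the two `sweep-bench` runs is below. The flag names (`-sm` for split mode, `-ngl` for offloaded layers) follow mainline llama.cpp conventions and the model filename is a placeholder; check `./llama-sweep-bench --help` for the options actually exposed by this build.

```shell
# Baseline: split mode "layer" across both GPUs (hypothetical model path)
./llama-sweep-bench -m qwen3.5-27b-q4_k_s.gguf -c 65536 -ngl 99 -sm layer

# This PR: split mode "graph"; compare PP/TG at matching context depths
./llama-sweep-bench -m qwen3.5-27b-q4_k_s.gguf -c 65536 -ngl 99 -sm graph
```

Running both to a context of 64k and comparing the PP and TG columns at the same depth should show the gaps quoted above.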