Graph parallel for dense Qwen-3.5 models #1331

Merged: ikawrakow merged 2 commits into main from ik/sm_graph_q35 on Feb 27, 2026
@ikawrakow (Owner)

Thanks to PR #1329, it is now easy to add graph parallel (a.k.a. split mode graph) support for Qwen-3.5 (just the dense models for now).

As with graph parallel for Qwen3-Next (#1292), the recurrent attention layers are not split between GPUs. Nevertheless (and unlike Qwen3-Next), we do see a small performance gain compared to split mode layer, even at zero context.

Here are some sweep-bench results for Qwen-3.5-27B quantized with Q4_K_S on a 2x3090 system. We see about 10% better PP at zero context, and about 25% better at a context of 64k tokens. TG is ~4% better at zero context, and ~12% better at a context of 64k.

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|------|-----|-------|--------|----------|--------|----------|
| 2048 | 128 | 0 | 1.642 | 1247.52 | 2.675 | 47.85 |
| 2048 | 128 | 2048 | 1.587 | 1290.67 | 2.615 | 48.95 |
| 2048 | 128 | 4096 | 1.581 | 1295.41 | 2.634 | 48.60 |
| 2048 | 128 | 6144 | 1.592 | 1286.09 | 2.646 | 48.37 |
| 2048 | 128 | 8192 | 1.605 | 1275.95 | 2.656 | 48.18 |
| 2048 | 128 | 10240 | 1.619 | 1264.84 | 2.671 | 47.92 |
| 2048 | 128 | 12288 | 1.631 | 1256.00 | 2.687 | 47.63 |
| 2048 | 128 | 14336 | 1.637 | 1250.88 | 2.695 | 47.49 |
| 2048 | 128 | 16384 | 1.647 | 1243.57 | 2.730 | 46.88 |
| 2048 | 128 | 18432 | 1.662 | 1232.52 | 2.737 | 46.77 |
| 2048 | 128 | 20480 | 1.675 | 1222.62 | 2.757 | 46.42 |
| 2048 | 128 | 22528 | 1.681 | 1218.03 | 2.759 | 46.39 |
| 2048 | 128 | 24576 | 1.698 | 1206.47 | 2.763 | 46.33 |
| 2048 | 128 | 26624 | 1.709 | 1198.49 | 2.770 | 46.21 |
| 2048 | 128 | 28672 | 1.724 | 1188.23 | 2.775 | 46.13 |
| 2048 | 128 | 30720 | 1.737 | 1179.23 | 2.787 | 45.92 |
| 2048 | 128 | 32768 | 1.748 | 1171.90 | 2.819 | 45.41 |
| 2048 | 128 | 34816 | 1.763 | 1161.74 | 2.824 | 45.32 |
| 2048 | 128 | 36864 | 1.771 | 1156.16 | 2.838 | 45.11 |
| 2048 | 128 | 38912 | 1.781 | 1150.21 | 2.842 | 45.04 |
| 2048 | 128 | 40960 | 1.796 | 1140.12 | 2.847 | 44.96 |
| 2048 | 128 | 43008 | 1.810 | 1131.71 | 2.860 | 44.76 |
| 2048 | 128 | 45056 | 1.819 | 1125.60 | 2.871 | 44.58 |
| 2048 | 128 | 47104 | 1.833 | 1117.12 | 2.878 | 44.48 |
| 2048 | 128 | 49152 | 1.843 | 1111.05 | 2.913 | 43.95 |
| 2048 | 128 | 51200 | 1.860 | 1101.15 | 2.914 | 43.93 |
| 2048 | 128 | 53248 | 1.872 | 1093.75 | 2.931 | 43.68 |
| 2048 | 128 | 55296 | 1.886 | 1085.67 | 2.938 | 43.57 |
| 2048 | 128 | 57344 | 1.901 | 1077.11 | 2.942 | 43.50 |
| 2048 | 128 | 59392 | 1.913 | 1070.39 | 2.951 | 43.38 |
| 2048 | 128 | 61440 | 1.927 | 1062.56 | 2.962 | 43.21 |
| 2048 | 128 | 63488 | 1.941 | 1055.29 | 2.971 | 43.08 |
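
For anyone reading the table above, here is a minimal Python sketch of how the columns relate, assuming S_PP and S_TG are simply the token counts divided by the measured times (PP / T_PP and TG / T_TG). The three rows are copied from the table; the helper name `throughput` and the variables `pp_retained` / `tg_retained` are made up for illustration and are not part of sweep-bench.

```python
# Rows copied from the table above: (N_KV, T_PP s, S_PP t/s, T_TG s, S_TG t/s)
PP_TOKENS = 2048  # prompt tokens processed per measurement (PP column)
TG_TOKENS = 128   # tokens generated per measurement (TG column)

rows = [
    (0,     1.642, 1247.52, 2.675, 47.85),
    (32768, 1.748, 1171.90, 2.819, 45.41),
    (63488, 1.941, 1055.29, 2.971, 43.08),
]


def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second, assuming speed = tokens / elapsed time."""
    return tokens / seconds


for n_kv, t_pp, s_pp, t_tg, s_tg in rows:
    # Recomputed values differ slightly from the reported ones because
    # the times in the table are rounded to three decimals.
    pp = throughput(PP_TOKENS, t_pp)
    tg = throughput(TG_TOKENS, t_tg)
    print(f"N_KV={n_kv:6d}  PP {pp:8.2f} (reported {s_pp})  "
          f"TG {tg:6.2f} (reported {s_tg})")

# Fraction of zero-context speed that is retained at the deepest context.
first, last = rows[0], rows[-1]
pp_retained = last[2] / first[2]
tg_retained = last[4] / first[4]
print(f"PP retained at N_KV={last[0]}: {pp_retained:.1%}")
print(f"TG retained at N_KV={last[0]}: {tg_retained:.1%}")
```

Within this single run, PP at N_KV = 63488 retains roughly 85% of its zero-context speed and TG roughly 90%. Note this only measures degradation with context inside one table; it does not reproduce the comparison against split mode layer quoted above, which needs the second set of measurements.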
