
Simplify delta-net #1335

Merged
ikawrakow merged 3 commits into main from ik/simplify_delta_net on Feb 28, 2026

@ikawrakow
Owner

With PR #1333 merged, and the fused delta-net implementation strictly better than the autoregressive and chunked variants, there is no need to keep those two around. This PR removes them, along with the -fdn | --fused-delta-net command-line argument.

@ubergarm
Contributor

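Did a quick sweep-bench on Qwen3.5-27B (dense) and it is running well, with -sm graph faster than -sm layer at every context depth tested: S_PP 1586 vs 1379 t/s and S_TG 38.7 vs 36.1 t/s at N_KV = 0, and S_PP 1170 vs 881 t/s and S_TG 34.4 vs 29.6 t/s at N_KV = 65536. Note that the -sm layer run also used --merge-qkv.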

[Image: sweep-bench-Qwen3.5-27B]
👈 Details

-sm graph

CUDA_VISIBLE_DEVICES="0,1" \
./build/bin/llama-sweep-bench \
  -n 128 \
  --warmup-batch \
  --model "$model" \
  -c 69632 \
  -ger \
  -sm graph \
  -ngl 99 \
  -ub 4096 -b 4096 \
  --threads 1 \
  --no-mmap

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|------|-----|-------|--------|----------|--------|----------|
| 4096 | 128 | 0 | 2.582 | 1586.35 | 3.310 | 38.67 |
| 4096 | 128 | 4096 | 2.631 | 1556.58 | 3.252 | 39.37 |
| 4096 | 128 | 8192 | 2.678 | 1529.51 | 3.288 | 38.93 |
| 4096 | 128 | 12288 | 2.728 | 1501.24 | 3.322 | 38.53 |
| 4096 | 128 | 16384 | 2.794 | 1465.98 | 3.381 | 37.86 |
| 4096 | 128 | 20480 | 2.846 | 1439.10 | 3.402 | 37.62 |
| 4096 | 128 | 24576 | 2.919 | 1403.19 | 3.425 | 37.38 |
| 4096 | 128 | 28672 | 2.983 | 1373.13 | 3.448 | 37.13 |
| 4096 | 128 | 32768 | 3.031 | 1351.58 | 3.497 | 36.61 |
| 4096 | 128 | 36864 | 3.089 | 1326.05 | 3.517 | 36.39 |
| 4096 | 128 | 40960 | 3.144 | 1302.94 | 3.540 | 36.16 |
| 4096 | 128 | 45056 | 3.208 | 1276.80 | 3.560 | 35.95 |
| 4096 | 128 | 49152 | 3.270 | 1252.78 | 3.607 | 35.49 |
| 4096 | 128 | 53248 | 3.320 | 1233.66 | 3.628 | 35.28 |
| 4096 | 128 | 57344 | 3.376 | 1213.32 | 3.651 | 35.05 |
| 4096 | 128 | 61440 | 3.435 | 1192.31 | 3.670 | 34.88 |
| 4096 | 128 | 65536 | 3.501 | 1169.89 | 3.717 | 34.44 |
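
For readers unfamiliar with the column names, the numbers appear consistent with S_PP and S_TG being plain throughputs, i.e. tokens divided by elapsed seconds (PP / T_PP and TG / T_TG). The sketch below checks that reading against three rows copied from the -sm graph results; the helper names are hypothetical and not part of llama-sweep-bench.

```python
# Rows copied from the -sm graph results:
# (PP, TG, N_KV, T_PP s, S_PP t/s, T_TG s, S_TG t/s)
rows = [
    (4096, 128, 0, 2.582, 1586.35, 3.310, 38.67),
    (4096, 128, 32768, 3.031, 1351.58, 3.497, 36.61),
    (4096, 128, 65536, 3.501, 1169.89, 3.717, 34.44),
]


def throughput(tokens, seconds):
    """Tokens per second, under the assumption S = tokens / time."""
    return tokens / seconds


def max_relative_error(rows):
    """Largest relative gap between recomputed and reported throughput."""
    worst = 0.0
    for pp, tg, _n_kv, t_pp, s_pp, t_tg, s_tg in rows:
        worst = max(
            worst,
            abs(throughput(pp, t_pp) - s_pp) / s_pp,
            abs(throughput(tg, t_tg) - s_tg) / s_tg,
        )
    return worst


# The gap stays small because the reported times are rounded to 3 decimals.
print(f"max relative error: {max_relative_error(rows):.2e}")
```

The small residual is what one would expect from the timings being rounded before they were printed.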

-sm layer

CUDA_VISIBLE_DEVICES="0,1" \
./build/bin/llama-sweep-bench \
  -n 128 \
  --warmup-batch \
  --model "$model" \
  -c 69632 \
  -ger \
  --merge-qkv \
  -sm layer \
  -ngl 99 \
  -ub 4096 -b 4096 \
  --threads 1 \
  --no-mmap

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|------|-----|-------|--------|----------|--------|----------|
| 4096 | 128 | 0 | 2.970 | 1379.08 | 3.551 | 36.05 |
| 4096 | 128 | 4096 | 3.071 | 1333.65 | 3.533 | 36.23 |
| 4096 | 128 | 8192 | 3.161 | 1295.95 | 3.612 | 35.44 |
| 4096 | 128 | 12288 | 3.278 | 1249.61 | 3.646 | 35.11 |
| 4096 | 128 | 16384 | 3.388 | 1209.10 | 3.713 | 34.47 |
| 4096 | 128 | 20480 | 3.496 | 1171.73 | 3.752 | 34.11 |
| 4096 | 128 | 24576 | 3.608 | 1135.25 | 3.821 | 33.50 |
| 4096 | 128 | 28672 | 3.717 | 1102.02 | 3.858 | 33.18 |
| 4096 | 128 | 32768 | 3.822 | 1071.55 | 3.925 | 32.61 |
| 4096 | 128 | 36864 | 3.927 | 1042.93 | 3.959 | 32.33 |
| 4096 | 128 | 40960 | 4.030 | 1016.27 | 4.025 | 31.80 |
| 4096 | 128 | 45056 | 4.136 | 990.22 | 4.064 | 31.50 |
| 4096 | 128 | 49152 | 4.240 | 966.08 | 4.128 | 31.01 |
| 4096 | 128 | 53248 | 4.346 | 942.47 | 4.164 | 30.74 |
| 4096 | 128 | 57344 | 4.463 | 917.86 | 4.231 | 30.25 |
| 4096 | 128 | 61440 | 4.551 | 900.07 | 4.265 | 30.01 |
| 4096 | 128 | 65536 | 4.650 | 880.88 | 4.330 | 29.56 |
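
To put a number on "faster", the following sketch computes the -sm graph over -sm layer throughput ratio at three context depths, using values copied from the two result sets above. It is illustrative only: the function name is hypothetical, and because the -sm layer run also passed --merge-qkv, the two runs are not a strictly like-for-like comparison.

```python
# N_KV -> (S_PP t/s, S_TG t/s), copied from the two sweep-bench runs above.
graph = {0: (1586.35, 38.67), 32768: (1351.58, 36.61), 65536: (1169.89, 34.44)}
layer = {0: (1379.08, 36.05), 32768: (1071.55, 32.61), 65536: (880.88, 29.56)}


def speedups(graph, layer):
    """Return N_KV -> (PP ratio, TG ratio) of graph over layer throughput."""
    out = {}
    for n_kv in sorted(graph):
        g_pp, g_tg = graph[n_kv]
        l_pp, l_tg = layer[n_kv]
        out[n_kv] = (g_pp / l_pp, g_tg / l_tg)
    return out


for n_kv, (pp, tg) in speedups(graph, layer).items():
    print(f"N_KV={n_kv:>6}: PP x{pp:.2f}, TG x{tg:.2f}")
```

On these rows the gap widens with context depth: roughly 1.15x (PP) and 1.07x (TG) at N_KV = 0, rising to about 1.33x and 1.17x at N_KV = 65536.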

ikawrakow merged commit e5fc302 into main on Feb 28, 2026
ikawrakow added a commit that referenced this pull request Feb 28, 2026
ikawrakow added a commit that referenced this pull request Feb 28, 2026
* Revert "Simplify delta-net (#1335)"

This reverts commit e5fc302.

* Revert "Fused delta net 3 (#1333)"

This reverts commit 7b68353.
ikawrakow added a commit that referenced this pull request Feb 28, 2026
* Bring back fused delta net 3

* Remove autoregressive and chunking

2 participants