
vulkan: add GATED_DELTA_NET op support#20333

Closed
ProgenyAlpha wants to merge 1 commit into ggml-org:master from ProgenyAlpha:vulkan-gated-delta-net

ProgenyAlpha (Contributor) commented Mar 10, 2026

Summary

Adds a Vulkan compute shader implementation of the fused GATED_DELTA_NET op (merged in #19504), enabling the fused recurrence for Vulkan users running delta net models (Qwen3.5, Qwen3-Coder-Next, etc.).

  • Fused compute shader (gated_delta_net.comp) — one workgroup per (head, sequence), one thread per state column
  • Specialization constants for head size (32/64/128) and KDA mode — six pipeline variants compiled from one shader source
  • Full support: scalar gate, KDA vector gate, GQA broadcast (v_repeat), multi-token, permuted non-contiguous q/k
  • Matches the CUDA kernel's fused decay + rank-1 update pattern (S[i] = g * S[i] + k[i] * delta)
  • Uses beta strides (sb*) consistent with the merged op interface
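
For readers unfamiliar with the recurrence, the per-column work described in the list above can be sketched in plain Python. This is an illustrative model only, not the shader: the function name `gated_delta_step`, the column-major state layout, and the exact form of `delta` (the textbook gated delta rule with a scalar gate) are assumptions made for the example. Only the final update line, `S[i] = g * S[i] + k[i] * delta`, is taken from the PR text.

```python
def gated_delta_step(S, q, k, v, g, beta):
    """One token of a scalar-gated delta rule for a single head (sketch).

    S is a list of state columns; S[j][i] is row i of column j.
    The outer loop stands in for "one thread per state column".
    """
    n = len(k)
    out = []
    for j, col in enumerate(S):
        # Dot product of the decayed column with the key (assumed form).
        kv = sum(g * col[i] * k[i] for i in range(n))
        # Error between the target value and what the state predicts.
        delta = beta * (v[j] - kv)
        # Fused decay + rank-1 update, as quoted in the PR summary.
        for i in range(n):
            col[i] = g * col[i] + k[i] * delta
        # Read the updated column out against the query.
        out.append(sum(col[i] * q[i] for i in range(n)))
    return out


# Hypothetical usage: a 2x2 state, updated in place over two tokens.
state = [[0.0, 0.0], [0.0, 0.0]]
first = gated_delta_step(state, [1.0, 0.0], [1.0, 0.0], [2.0, 3.0], 0.5, 1.0)
second = gated_delta_step(state, [1.0, 0.0], [1.0, 0.0], [2.0, 3.0], 0.5, 1.0)
```

With `beta = 1` and a unit-norm key, the update stores `v` under `k` exactly, so querying with the same key returns `v` on both steps even though the state decays by `g` in between.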

Status

WIP / Draft — still needs real model benchmarking and multi-vendor driver testing.

  • [x] 13/13 test-backend-ops passed (AMD Radeon 890M, RADV GFX1150)
  • [ ] Real model benchmarking (Qwen3.5 inference)
  • [ ] Testing on NVIDIA and Intel Vulkan drivers
  • [ ] Cache exp() in KDA path (currently computed twice per loop)
  • [ ] Verify local_size_x_id spec constant behavior across drivers
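
The exp() caching item above can be illustrated with a small Python model of a vector-gated step. This is a sketch under stated assumptions, not the shader code: it assumes the KDA gate is a per-row vector stored in log space (which the mention of exp() suggests but the PR does not spell out), and the name `kda_step_cached` is hypothetical. The point is only that each `exp(log_g[i])` is needed twice, once in the dot product and once in the state update, so computing it once and reusing it avoids the duplicate call.

```python
import math


def kda_step_cached(S, q, k, v, log_g, beta):
    """One token of a vector-gated delta rule with exp() hoisted (sketch).

    S is a list of state columns; S[j][i] is row i of column j.
    log_g holds one log-space gate per row (an assumption for this example).
    """
    n = len(k)
    # Evaluate exp() once per row instead of once per use.
    decay = [math.exp(x) for x in log_g]
    out = []
    for j, col in enumerate(S):
        # First use of the cached decay: the dot product with the key.
        kv = sum(decay[i] * col[i] * k[i] for i in range(n))
        delta = beta * (v[j] - kv)
        # Second use of the cached decay: the fused state update.
        for i in range(n):
            col[i] = decay[i] * col[i] + k[i] * delta
        out.append(sum(col[i] * q[i] for i in range(n)))
    return out
```

In a real compute shader the cached values would live in registers or shared memory rather than a Python list, and whether that is a net win depends on register pressure, which is why the item is listed as something to evaluate.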

Mentions #19504, #14909

Implements the fused gated delta net recurrence as a Vulkan compute
shader with full support for scalar gate, KDA vector gate, GQA
broadcast, multi-token sequences, and permuted (non-contiguous) q/k
inputs. Specialization constants select head size (32/64/128) and
KDA mode at pipeline creation time.

Passes all 13 test-backend-ops cases on AMD Radeon 890M (RADV GFX1150).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ProgenyAlpha requested a review from 0cc4m as a code owner on March 10, 2026 08:00
github-actions bot added the labels "Vulkan" (Issues specific to the Vulkan backend) and "ggml" (changes relating to the ggml tensor library for machine learning) on Mar 10, 2026
ProgenyAlpha marked this pull request as draft on March 10, 2026 08:03
