
ggml: add GATED_DELTA_NET op #19504

Merged
am17an merged 5 commits into ggml-org:master from am17an:gated_delta_net on Mar 7, 2026

Conversation

@am17an (Contributor) commented Feb 11, 2026

Add a CPU/CUDA implementation of GATED_DELTA_NET, used in qwen3next and a number of upcoming attention models. This is a basic vector implementation, not the chunked one, although it works for n_tokens > 1 as a reference implementation. I tested it against build_delta_net_autoregressive and the results were good. I plan to add the chunked implementation for CPU and CUDA.
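For reference, the token-by-token recurrence this op computes can be sketched in NumPy as follows (a minimal sketch, not the ggml op's actual signature; the names `q`, `k`, `v`, `g`, `beta` and the shapes are illustrative assumptions):

```python
import numpy as np

def gated_delta_net_ref(q, k, v, g, beta, S):
    """Autoregressive (vector) reference for the gated delta rule.

    q, k: (T, d_k); v: (T, d_v); g, beta: (T,) per-token scalars;
    S: (d_v, d_k) recurrent state, updated in place token by token.
    """
    T = q.shape[0]
    o = np.zeros((T, v.shape[1]))
    for t in range(T):
        S = S * g[t]                            # gated decay of the state
        err = v[t] - S @ k[t]                   # delta-rule prediction error
        S = S + beta[t] * np.outer(err, k[t])   # rank-1 state update
        o[t] = S @ q[t]                         # read-out for this token
    return o, S
```

The chunked variant solves the same recurrence but processes blocks of tokens with matrix products; this loop is the simple form that doubles as a correctness reference.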

master:

| model | size | params | backend | threads | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CPU | 16 | 1 | tg32 | 4.77 ± 0.03 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CPU | 16 | 1 | tg32 @ d1024 | 4.55 ± 0.13 |

sched_reserve: graph nodes = 14990 (with bs=512), 6242 (with bs=1)

`ggml_op_gated_delta_net` added to the qwen3next graph (not added in this PR):

| model | size | params | backend | threads | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe ?B Q4_K - Small | 18.55 GiB | 34.66 B | CPU | 16 | 1 | tg32 | 11.08 ± 0.20 |
| qwen35moe ?B Q4_K - Small | 18.55 GiB | 34.66 B | CPU | 16 | 1 | tg32 @ d1024 | 11.21 ± 0.07 |

sched_reserve: graph nodes = 14990 (with bs=512), 5342 (with bs=1)

@am17an am17an requested a review from ggerganov as a code owner February 11, 2026 07:09
@am17an am17an requested a review from pwilkin February 11, 2026 07:09
@github-actions bot added the `testing` (Everything test related) and `ggml` (changes relating to the ggml tensor library for machine learning) labels Feb 11, 2026
@ggerganov (Member) commented Feb 11, 2026

I think it is too early to implement the dedicated delta net ops. There are still many things to optimize in the existing implementation (you can keep track of my progress in #19375). After that we have to consolidate the KDA version of the delta net (#18792). Btw, the L2 norm should not be part of this op - fixed in my branch. Also, I'm not sure how to handle the 2 variants of this operator (autoregressive and chunked).

So I think we can experiment with a dedicated op in a branch, but merging this in master will likely take time.

@am17an (Contributor, author) commented Feb 11, 2026

@ggerganov I defer to your judgement; my thinking was that qwen3.5 is already a major model series, so even if the op is just for that model, it makes sense.

For KDA, AFAIK the gate is a matrix, so it will just be another dot product instead of a scale. For chunked vs autoregressive, we have the vec FA path for CPU, which now serves as a reference kernel. I was thinking it would be the same here: the autoregressive kernel remains the simple kernel while chunking is the optimization; both solve the same recurrence.
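The difference between the two gate variants can be illustrated with a single-token step (a hypothetical NumPy sketch; `gate`, the shapes, and the broadcasting layout are assumptions for illustration, not the actual KDA interface):

```python
import numpy as np

def delta_step(S, q, k, v, gate, beta):
    """One token of the delta rule. `gate` is either a scalar (GDA-style)
    or a per-channel vector of shape (d_k,) (KDA-style); NumPy broadcasting
    makes the decay a uniform scale in the first case and an elementwise
    per-key-channel product in the second."""
    S = S * gate                        # scalar: scale; vector: per-channel decay
    err = v - S @ k                     # prediction error
    S = S + beta * np.outer(err, k)     # rank-1 update
    return S @ q, S
```

With `gate = g * np.ones(d_k)` the vector path reduces exactly to the scalar path, which is a convenient sanity check when consolidating the two variants into one op.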

@ggerganov (Member)

Ok, let's prototype a branch that also has this op together with the CUDA implementation rebased on #19375. I will then add the Metal version of the kernel and from there we can consider a quicker merge if things are looking good. Also, want to see if having this op will allow the CUDA graphs to be more easily enabled.

@pwilkin (Contributor) commented Feb 11, 2026

So this is basically what the Transformers implementations have as the "recurrent" implementation, right? No chunking, just iterating over tokens.

@am17an (Contributor, author) commented Feb 11, 2026

@pwilkin Yes, just computing the recurrence token by token.

@ggerganov (Member) commented Feb 11, 2026

Btw, should also consider small batch sizes larger than 1 to be handled by this operator too. I'm not sure where the break-even point would be, but I imagine that processing a few tokens auto-regressively (i.e. more than 1 and less than ~16) would be more efficient compared to the chunking path. Also don't forget that dim 3 will handle separate sequences - though from a quick look, this implementation already accounts for that.

@am17an (Contributor, author) commented Feb 11, 2026

> Btw, should also consider small batch sizes larger than 1 to be handled by this operator too. I'm not sure where the break-even point would be, but I imagine that processing a few tokens auto-regressively (i.e. more than 1 and less than ~16) would be more efficient compared to the chunking path.

Yes, for a small number of tokens we can just run a loop even in CUDA. I have not looked into the chunked impl yet, but I will invest some time in finding the break-even point.

> Also don't forget that dim 3 will handle separate sequences - though from a quick look, this implementation already accounts for that.

I think this should be fine - the work is split among dim1 * dim3 (heads * sequences).
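The independence behind that split can be sketched as follows (a hypothetical NumPy layout, not the ggml tensor layout: each (sequence, head) pair carries its own state, so the two outer loops are freely parallelizable and only the token loop is sequential):

```python
import numpy as np

def gdn_per_seq_head(q, k, v, g, beta, S):
    """q, k: (n_seq, n_head, T, d_k); v: (n_seq, n_head, T, d_v);
    g, beta: (n_seq, n_head, T); S: (n_seq, n_head, d_v, d_k).
    The recurrences never mix across (seq, head), so work can be
    distributed among n_head * n_seq independent workers."""
    n_seq, n_head, T = g.shape
    o = np.zeros_like(v)
    for s in range(n_seq):           # dim 3: separate sequences
        for h in range(n_head):      # dim 1: heads
            for t in range(T):       # only this loop is sequential
                St = S[s, h] * g[s, h, t]
                err = v[s, h, t] - St @ k[s, h, t]
                S[s, h] = St + beta[s, h, t] * np.outer(err, k[s, h, t])
                o[s, h, t] = S[s, h] @ q[s, h, t]
    return o, S
```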

@ymcki (Contributor) commented Feb 11, 2026

Great performance gain for inference. Looking forward to seeing your implementation done for the major backends.

If you plan to do the chunked version as well, it would be great if it were based on the block implementation done in fla:

https://github.com/fla-org/flash-linear-attention/blob/main/fla/ops/kda/chunk_intra_token_parallel.py
https://github.com/fla-org/flash-linear-attention/blob/main/fla/ops/kda/chunk_intra.py

@pwilkin (Contributor) left a comment

Looks clean to me. Are you planning on doing the chunked version here as well, or in a separate op / PR?

@ggerganov (Member)

Converted to draft since I am not sure my comment was clear: #19504 (comment). First we will prototype a new branch, and after that we will consider adding the new op.

@pwilkin (Contributor) commented Feb 11, 2026

Should we use this PR or will you create a dedicated branch?

@am17an (Contributor, author) commented Feb 11, 2026

@ggerganov I removed the norm and also added the autoregressive CUDA op in 01eda69; it passes test-backend-ops. I have not yet rebased on #19375.

@github-actions bot added the `model` (Model specific) and `Nvidia GPU` (Issues specific to Nvidia GPUs) labels Feb 11, 2026
@ggerganov (Member) commented Feb 11, 2026

Just a heads up, I will be rebasing the #19375 branch from time to time. Hope it's not a big issue - just always put your commits on top. I'm hoping to merge in a day or two.

@am17an (Contributor, author) commented Feb 11, 2026

I did a quick perf test with this PR + #19375 + replacing the autoregressive path for qwen3next with gated_delta_net, on a 5090.

master:

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 | 83.92 ± 0.39 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d1024 | 84.45 ± 0.36 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d2048 | 84.20 ± 0.61 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d4096 | 83.82 ± 0.56 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d8192 | 83.43 ± 1.73 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d16384 | 83.56 ± 0.47 |

PR:

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 | 105.95 ± 0.36 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d1024 | 105.05 ± 0.91 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d2048 | 105.33 ± 0.42 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d4096 | 105.10 ± 0.50 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d8192 | 98.13 ± 1.79 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d16384 | 97.22 ± 0.49 |

@ggerganov (Member)

For reference, what do you get with CUDA graphs forced enabled:

```diff
diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index f3d8317e1..605cb3ed4 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -2894,7 +2894,7 @@ static bool ggml_cuda_graph_check_compability(ggml_cgraph * cgraph) {
 #endif
         }
 
-        if (node->op == GGML_OP_ADD &&
+        if (false && node->op == GGML_OP_ADD &&
             node->src[1] && node->src[1]->ne[1] > 1 &&
             (node->src[0] ? node->src[0]->name != gemma3n_per_layer_proj_src0_name : true) &&
             (node->src[1] ? node->src[1]->name != gemma3n_per_layer_proj_src1_name : true) &&
```

@am17an (Contributor, author) commented Feb 11, 2026

With CUDA graphs force-enabled:

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 | 111.89 ± 2.48 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d1024 | 135.26 ± 6.93 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d2048 | 135.89 ± 4.95 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d4096 | 134.77 ± 4.67 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d8192 | 123.30 ± 6.07 |
| qwen3next 80B.A3B Q2_K - Medium | 27.12 GiB | 79.67 B | CUDA | 99 | 1 | tg32 @ d16384 | 121.22 ± 5.28 |

@am17an (Contributor, author) commented Mar 7, 2026

I actually see a huge difference in PP on CPU when just using the autoregressive kernel instead of the current one, i.e. using the fused op regardless of n_tokens. But I think I will optimize this later.

@am17an am17an merged commit c5a7788 into ggml-org:master Mar 7, 2026
76 of 78 checks passed
@am17an am17an deleted the gated_delta_net branch March 7, 2026 07:41
@jacekpoplawski (Contributor) commented Mar 7, 2026

Great speedup on tg (Qwen Next and Qwen 3.5)!

```
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
```

@jeffbolznv (Contributor)

Hi @ProgenyAlpha, just wanted to check whether you still plan to submit a PR for the vulkan backend support.

@CISC (Member) commented Mar 8, 2026

@am17an (Contributor, author) commented Mar 8, 2026

@CISC I'm not sure who maintains the MUSA backend, but it seems like a compiler bug.

arkavo-com added a commit to arkavo-ai/llama.cpp that referenced this pull request Mar 8, 2026
Add a fused Metal kernel for the gated delta net recurrence op
(ggml-org#19504), enabling GPU-accelerated inference for DeltaNet-based
models (Qwen3.5, etc.) on Apple Silicon.

Supports both GDA (scalar gate) and KDA (per-row gate) modes
with head_size 64 and 128. Unsupported configurations (head_size
32, non-contiguous tensors) gracefully fall back to CPU.

Performance: Qwen3.5-0.8B Q4_K_M on M4 Max
  tg128: 170 -> 213 t/s (+25%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ggerganov (Member)

@yeahdongcn PTAL at the MUSA issue above.

@am17an In the meantime we can change supports_op to return false for MUSA

@yeahdongcn (Collaborator)

> @yeahdongcn PTAL at the MUSA issue above.
>
> @am17an In the meantime we can change supports_op to return false for MUSA

No problem. I'll try a local build first and see if I should open an internal ticket. Thanks!

@ProgenyAlpha (Contributor)

> Hi @ProgenyAlpha, just wanted to check whether you still plan to submit a PR for the vulkan backend support.

I wasn't sure where the thread was going, so I wanted to let you guys cook and see how things unfolded before jumping back in. I'll rebase and work on that this week if I have time. Thanks for pinging me!

ggerganov pushed a commit that referenced this pull request Mar 10, 2026
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
* ggml: add GATED_DELTA_NET op

* remove the transpose

* add KDA

* add qwen35 dense

* llama : check for fused gated delta net backend support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ggerganov added a commit that referenced this pull request Mar 11, 2026
* metal : add Metal backend for GGML_OP_GATED_DELTA_NET

Add a fused Metal kernel for the gated delta net recurrence op
(#19504), enabling GPU-accelerated inference for DeltaNet-based
models (Qwen3.5, etc.) on Apple Silicon.

Supports both GDA (scalar gate) and KDA (per-row gate) modes
with head_size 64 and 128. Unsupported configurations (head_size
32, non-contiguous tensors) gracefully fall back to CPU.

Performance: Qwen3.5-0.8B Q4_K_M on M4 Max
  tg128: 170 -> 213 t/s (+25%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* metal : validate contiguity of all input tensors in supports_op

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* metal : add algorithm equivalence comment for GDA decay path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* cont : unslop + optimize

* cont : clean-up

---------

Co-authored-by: Paul Flynn <paul@arkavo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ggerganov added a commit that referenced this pull request Mar 11, 2026
* llama : enable chunked fused GDN path

* models : avoid Q and K repeats when using fused GDA

* cont : fix comment

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cont : fix the fix

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cont : fix

* metal : add GDN kernel (#20361)

* metal : add Metal backend for GGML_OP_GATED_DELTA_NET

Add a fused Metal kernel for the gated delta net recurrence op
(#19504), enabling GPU-accelerated inference for DeltaNet-based
models (Qwen3.5, etc.) on Apple Silicon.

Supports both GDA (scalar gate) and KDA (per-row gate) modes
with head_size 64 and 128. Unsupported configurations (head_size
32, non-contiguous tensors) gracefully fall back to CPU.

Performance: Qwen3.5-0.8B Q4_K_M on M4 Max
  tg128: 170 -> 213 t/s (+25%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* metal : validate contiguity of all input tensors in supports_op

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* metal : add algorithm equivalence comment for GDA decay path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* cont : unslop + optimize

* cont : clean-up

---------

Co-authored-by: Paul Flynn <paul@arkavo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* CUDA: AR gated delta net improvements (#20391)

* Add FastDiv to gated_delta_net_cuda

* Shard columns across warps

This reduces register pressure (avoids spill for S_v = 128) and gives
the warp-scheduler more CTAs to schedule (thus hiding data-access
latencies).

* Remove unneded include in gated_delta_net.cu

* Improve comments

* Apply code-formating

* Make sharding HIP-compatible

1. Use ggml_cuda_get_physical_warp_size() to determine warp size flexibly
2. Add test with partial warp to test sum reduction on CUDA

* Remove fastdiv_s64, as we can treat neqk1 and rq3 as uint32_t

* Rename variables

* Enable GDN also for prefill, move TODO for chunked_GDN

* Actually remove the TODO from 2068908

* Get warp size at runtime

warp_size is not known at compile time in hip host code.

* Don't expose ggml_cuda_get_physical_warp_size on host

---------

Co-authored-by: uvos <devnull@uvos.xyz>

* llama : refactor llm_build_delta_net_base API

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Paul Flynn <paul@arkavo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Oliver Simons <osimons@nvidia.com>
Co-authored-by: uvos <devnull@uvos.xyz>
ProgenyAlpha pushed a commit to ProgenyAlpha/llama.cpp that referenced this pull request Mar 12, 2026