model: (qwen3next) correct vectorized key_gdiff calculation#19324
ngxson merged 2 commits into ggml-org:master
Conversation
We've officially arrived at self-improving AI, it looks like ;)
|
My test cases that were failing before are now passing with this change.
|
I updated the compare-logprobs script and reran it. There are still some divergences from vLLM (I suppose due to numerical issues), but it does look better on long context (see tokens past depth 5000):
(screenshots: PR vs master)
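A comparison like the one above can be sketched roughly as follows. This is a minimal illustration, not the actual compare-logprobs script: the data format (per-position dicts of token → logprob) and the function name are assumptions for the example.

```python
import math

def compare_logprobs(run_a, run_b, tol=0.05):
    """Return positions where the two runs either pick a different
    top token or disagree on any shared token's logprob by more than
    `tol`. Each run is a list of dicts mapping token -> logprob."""
    diverging = []
    for pos, (a, b) in enumerate(zip(run_a, run_b)):
        top_a = max(a, key=a.get)
        top_b = max(b, key=b.get)
        # largest absolute logprob difference over tokens present in both
        max_diff = max(abs(a[t] - b[t]) for t in a if t in b)
        if top_a != top_b or max_diff > tol:
            diverging.append((pos, top_a, top_b, max_diff))
    return diverging

# Toy usage: at position 1 the logprobs are nearly tied, and a tiny
# numerical difference flips which token comes out on top.
run_a = [{"x": -0.1, "y": -2.3}, {"x": -0.69, "y": -0.70}]
run_b = [{"x": -0.1, "y": -2.3}, {"x": -0.71, "y": -0.70}]
print(compare_logprobs(run_a, run_b))
```

Even with a small `tol`, near-tied positions like this one get flagged because the argmax token differs, which mirrors the position-5030 deviation discussed below.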
It does still deviate in the token that was picked at position 5030. Shouldn't numerical precision issues still result in the same token?
|
Not always; numerical differences can accumulate enough to change the output logits. But I think it may depend on the quantization that I'm using (q8_0). Will need to do more testing. For now, I think the current fix should already be good enough.
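The accumulation effect is easy to demonstrate in isolation. This toy example (not the model's actual numerics) injects a tiny per-step difference, of the order of a rounding error, and shows that over thousands of steps it grows large enough to swap the order of two near-tied logits:

```python
def accumulate(step, n, bias):
    """Sum `step` n times; `bias` mimics a tiny per-operation
    numerical difference between two implementations."""
    total = 0.0
    for _ in range(n):
        total += step + bias
    return total

logit_a = accumulate(0.001, 5000, 0.0)   # reference accumulation
logit_b = accumulate(0.001, 5000, 1e-7)  # same math, 1e-7 per-step error
other = 5.0002                           # a competing, near-tied logit

# After 5000 steps the 1e-7 difference has grown to ~5e-4, enough to
# flip which logit is larger -- and therefore which token is picked.
print(logit_a > other, logit_b > other)
```

With greedy sampling, that flipped comparison is exactly a different token at one position, even though every individual operation differed only at the last few bits.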
That's the only reason we write buggy code, right? *cough*
hmm ok, I ran editorconfig locally but it didn't catch it earlier; probably I was on the wrong branch. Pushing a fix along with #19331
I 100% bet you they're just vibe-translating LCPP PRs to Go. lollllll |
Do GGUFs need to be regenerated after this change? I was under the impression that it wouldn't be needed, but this message by the Unsloth team suggests that I do: https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/discussions/5
No, there are no conversion changes; no idea why they reconverted the model.
It should only affect I-quants, since imatrix is generated from intermediate activations. Normal quants (Qx_0, Qx_1, Qx_K) should not be affected.
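A conceptual sketch of why that is (this is not the actual imatrix file format or llama.cpp quantization code; the function names and the single-scale search are simplifications): the importance matrix accumulates per-channel activation statistics, and quantization then minimizes activation-weighted error. If a bug fix changes the intermediate activations, the importance weights change too, and so can the chosen quantization.

```python
import random

def importance_from_activations(acts):
    """Per-channel importance as summed squared activations
    (conceptually what an importance matrix accumulates)."""
    n = len(acts[0])
    imp = [0.0] * n
    for row in acts:  # each row: one token's activations
        for j, a in enumerate(row):
            imp[j] += a * a
    return imp

def weighted_error(w, imp, scale):
    """Importance-weighted round-to-grid quantization error."""
    return sum(i * (x - round(x / scale) * scale) ** 2
               for i, x in zip(imp, w))

def best_scale(w, imp, scales):
    return min(scales, key=lambda s: weighted_error(w, imp, s))

random.seed(0)
w = [random.gauss(0, 1) for _ in range(8)]
acts_old = [[random.gauss(0, 1) for _ in range(8)] for _ in range(64)]
# "fixed" activations: same layer, different upstream computation
acts_new = [[a + 0.5 * random.gauss(0, 1) for a in row] for row in acts_old]

scales = [0.05, 0.1, 0.2]
print(best_scale(w, importance_from_activations(acts_old), scales))
print(best_scale(w, importance_from_activations(acts_new), scales))
```

Non-imatrix quants round the weights without these activation-derived importances, which is why they are unaffected by a change in intermediate activations.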
Ah, yes, imatrix would be affected.
Oh yes, Q8_K_XL, Q8_0, BF16, MXFP4_MOE are fine; the rest are imatrix, so they did change a bit.
…#19324)
* model: (qwen3next) correct vectorized key_gdiff calculation
* move transpose to outside of loop
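The second commit hoists a transpose out of a loop. A minimal illustration (plain Python, not the actual ggml code) of why this is safe when the transposed operand is loop-invariant: the results are identical, but the transpose is computed once instead of once per iteration.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

M = [[1.0, 2.0], [3.0, 4.0]]
vs = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]

# Before: transpose recomputed on every iteration
out_loop = [matvec(transpose(M), v) for v in vs]

# After: transpose hoisted -- M does not change inside the loop,
# so computing Mt once gives bit-identical results
Mt = transpose(M)
out_hoisted = [matvec(Mt, v) for v in vs]

assert out_loop == out_hoisted
```

In a compute graph the same reasoning applies: building the transposed view once outside the per-step loop removes redundant work without changing the numerics.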



Testing with the provided prompt from #19305
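The general shape of the fix being tested here is verifying a vectorized computation against a scalar reference. The real key_gdiff math lives in the model's compute graph and is not shown in this thread, so the recurrence below is hypothetical; the point is the testing pattern: a vectorized path built from shifted sequences must reproduce the plain loop exactly.

```python
def gated_diff_ref(keys, gates):
    """Scalar reference: out[i] = gates[i] * (keys[i] - keys[i-1]),
    with out[0] = 0. (Hypothetical recurrence, for illustration.)"""
    out = [0.0]
    for i in range(1, len(keys)):
        out.append(gates[i] * (keys[i] - keys[i - 1]))
    return out

def gated_diff_vec(keys, gates):
    """'Vectorized' form built from a shifted copy of keys;
    must match the reference loop element for element."""
    shifted = [0.0] + keys[:-1]
    out = [g * (k - s) for g, k, s in zip(gates, keys, shifted)]
    out[0] = 0.0
    return out

keys = [0.5, 1.0, -0.25, 2.0]
gates = [0.9, 0.8, 0.7, 0.6]
assert gated_diff_ref(keys, gates) == gated_diff_vec(keys, gates)
```

Running both paths on the same inputs, as this PR's testing does with the prompt from #19305, is what catches the case where the vectorized formulation silently diverges from the reference.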