Deepseek v3.2 dense attention support from @fairydreaming #18849
createthis wants to merge 1 commit into ggml-org:master from […]
Conversation
In its current form this is too hacky. I'm fine with supporting conversion without the indexer tensors, but make it its own class, inheriting from […]. Don't bother with the […].
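A rough sketch of what that suggestion could look like in convert_hf_to_gguf.py, assuming it means subclassing the existing DeepseekV2Model conversion class and dropping the sparse-attention indexer tensors. The architecture string, the `.indexer.` name filter, and the decorator/base-class names are assumptions based on current conventions in that script, not taken from the actual patch:

```python
# Sketch only: a dedicated conversion class for DeepSeek V3.2 that skips the
# indexer tensors instead of patching the shared DeepseekV2Model code path.
# "DeepseekV32ForCausalLM" and the ".indexer." substring are guesses at the
# checkpoint's naming and may not match the real model files.
@ModelBase.register("DeepseekV32ForCausalLM")
class DeepseekV32Model(DeepseekV2Model):
    model_arch = gguf.MODEL_ARCH.DEEPSEEK2

    def modify_tensors(self, data_torch, name, bid):
        # Drop the sparse-attention indexer weights so the dense-only GGUF
        # carries no tensors that inference will never read.
        if ".indexer." in name:
            return []
        return super().modify_tensors(data_torch, name, bid)
```

Keeping the V3.2 handling in its own class would leave the shared DeepSeek-V2/V3 conversion path untouched, which appears to be the point of the "make it its own class" request.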
@createthis By the way, in the most recent version of the patch I decided to modify add_bos_token in tokenizer_settings.json manually instead of overriding the value in the conversion script; the patch then collapses to merely 2 new/changed lines. Details are in the README here: https://huggingface.co/sszymczyk/DeepSeek-V3.2-nolight-GGUF If you go this way, I guess the need for the manual modification should be documented somewhere (or perhaps detected, with an error message, if it hasn't been done).
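If the manual-edit route is taken, the "detect if not done" idea could be a small pre-flight check in the conversion script. A minimal sketch: the first file name follows the comment above (Hugging Face checkpoints usually call it tokenizer_config.json, so both are checked), and the expected value is whatever the linked README prescribes; none of this is from the actual patch:

```python
import json
from pathlib import Path


def check_add_bos_token(dir_model: str, expected) -> None:
    """Fail conversion early if the manual add_bos_token edit was not applied."""
    # Check the name used in the comment above first, then the usual HF name.
    for fname in ("tokenizer_settings.json", "tokenizer_config.json"):
        path = Path(dir_model) / fname
        if path.is_file():
            value = json.loads(path.read_text()).get("add_bos_token")
            if value != expected:
                raise ValueError(
                    f"{fname}: add_bos_token is {value!r}, expected {expected!r}; "
                    "apply the manual edit described in the model README first."
                )
            return
    raise FileNotFoundError(f"no tokenizer settings file found in {dir_model}")
```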
I would prefer my suggestion. Also, I see we need to provide a chat template for it to work; I suggest adding that from […].
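One possible way to bake a chat template into the converted model, sketched under the assumption that the template ships with the checkpoint's tokenizer_config.json; the helper name and hook point are illustrative, and the truncated part of the comment above may well refer to a different source for the template:

```python
import json
from pathlib import Path

import gguf


def write_chat_template(gguf_writer: gguf.GGUFWriter, dir_model: str) -> None:
    """Copy the checkpoint's chat template into the GGUF metadata, if present."""
    cfg_path = Path(dir_model) / "tokenizer_config.json"
    if not cfg_path.is_file():
        return
    chat_template = json.loads(cfg_path.read_text()).get("chat_template")
    if chat_template:
        gguf_writer.add_chat_template(chat_template)
```

Note that the stock conversion path may already pick the template up through its special-vocab handling, in which case no extra code would be needed; this sketch only illustrates where the metadata ends up.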
@createthis gentle ping
@CISC My company's policy excludes me from using my local LLM machine for work purposes, so my motivation has evaporated. Tempted to sell off the RDIMMs given the market value. lol.
This is a bare-minimum implementation of DeepSeek V3.2 using dense attention only. @fairydreaming wrote this code; I just packaged it into a PR.
I've generated GGUFs with this: https://huggingface.co/createthis/DeepSeek-V3.2-dense-GGUF
I then ran inference out to 48,300 tokens of context over several turns, and it seems to work fine.
The major issue is that the sparse attention tensors are left out of the GGUF.
If this is unacceptable, I have another PR from back in October that populates the sparse attention tensors in the GGUF, but still doesn't use them for inference. I abandoned that PR because it fell into degenerate generation at about 45k context. Now that I know this PR works, I can attempt to fix the other PR.
Let me know what you think.