
Deepseek v3.2 dense attention support from @fairydreaming #18849

Open
createthis wants to merge 1 commit into ggml-org:master from createthis:deepseek_v3_2

Conversation

@createthis
Contributor

This is a bare minimum implementation of DeepSeek V3.2 using dense attention only. @fairydreaming wrote this code, I just packaged it into a PR.

I've generated GGUFs with this: https://huggingface.co/createthis/DeepSeek-V3.2-dense-GGUF

I then ran inference with them out to 48,300 tokens of context across several turns. It seems to work fine.

The major issue is that the sparse attention tensors are left out of the GGUF.

If this is unacceptable, I have another PR from back in October that populates the sparse attention tensors in the GGUF, but still doesn't use them for inference. I abandoned that PR because it fell into degenerate generation at about 45k context. Now that I know this PR works, I can attempt to fix the other PR.

Let me know what you think.

@CISC
Collaborator

CISC commented Jan 14, 2026

In its current form this is too hacky. I'm fine with supporting conversion without the indexer tensors, but make it its own class inheriting from DeepseekV2Model, and override set_vocab to set add_bos_token and modify_tensors to filter out the indexer tensors, adding comments to make it clear what you're doing.

Don't bother with the override_tokenizer_settings; just add the new chkhsh.
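A minimal sketch of what that subclass could look like inside convert_hf_to_gguf.py, assuming the usual conventions of that file (ModelBase.register, set_vocab, modify_tensors, and the gguf writer's add_add_bos_token); the architecture string, the ".indexer." tensor naming, and the assumption that the base vocab path has not already written add_bos_token are placeholders/guesses, not verified against V3.2 checkpoints:

```python
# Hypothetical sketch only; lives inside convert_hf_to_gguf.py, where ModelBase,
# DeepseekV2Model and gguf are already in scope. Names marked PLACEHOLDER are assumptions.
@ModelBase.register("DeepseekV32ForCausalLM")  # PLACEHOLDER: actual HF architecture string may differ
class DeepseekV32Model(DeepseekV2Model):
    model_arch = gguf.MODEL_ARCH.DEEPSEEK2  # reuse the V2 graph; dense attention only

    def set_vocab(self):
        super().set_vocab()
        # V3.2 tokenizer configs don't set add_bos_token, so force it here rather than
        # asking users to edit the tokenizer config by hand (assumes the base vocab
        # path did not already write this key).
        self.gguf_writer.add_add_bos_token(True)

    def modify_tensors(self, data_torch, name, bid):
        # Drop the DSA indexer (sparse attention) tensors: this conversion targets
        # dense attention only, so they would be dead weight in the GGUF.
        if ".indexer." in name:  # PLACEHOLDER: exact tensor naming is an assumption
            return []
        return super().modify_tensors(data_torch, name, bid)
```

The new chkhsh itself would still have to be generated (presumably via convert_hf_to_gguf_update.py) and added to get_vocab_base_pre; the hash value is not shown here.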

@github-actions bot added the python (python script changes) label Jan 14, 2026
@fairydreaming
Collaborator

@createthis By the way, in the most recent version of the patch I decided to modify add_bos_token in tokenizer_config.json manually instead of overriding the value in the conversion script. The patch then collapses to merely 2 new/changed lines; details are in the README here: https://huggingface.co/sszymczyk/DeepSeek-V3.2-nolight-GGUF

If you go this way, I guess the need for manual modification should be documented somewhere (or perhaps detected, with an error message, if it hasn't been done).
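If the manual-edit route is taken, a detection guard could be as small as the following hypothetical check (the file and key names are assumed from the comment above; nothing here is from the actual patch):

```python
import json
from pathlib import Path


def check_add_bos_token(model_dir: str) -> None:
    """Hypothetical guard: fail conversion early if the manual edit was skipped."""
    cfg = json.loads((Path(model_dir) / "tokenizer_config.json").read_text(encoding="utf-8"))
    if not cfg.get("add_bos_token", False):
        raise ValueError(
            "DeepSeek V3.2 conversion expects add_bos_token=true in tokenizer_config.json; "
            "edit it manually (or use a converter class that overrides it) before converting."
        )
```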

@CISC
Collaborator

CISC commented Jan 15, 2026

If you go this way, I guess the need for manual modification should be documented somewhere (or perhaps detected, with an error message, if it hasn't been done).

I would prefer my suggestion. Also, I see we need to provide a chat template for it to work; I suggest adding that from models/templates on conversion so that you create a fully functional GGUF.
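A rough sketch of how that could be wired up at conversion time, assuming the gguf-py writer's add_chat_template helper and a suitable Jinja file under models/templates (the filename below is a placeholder, not an existing file):

```python
from pathlib import Path

import gguf  # gguf-py, as used by convert_hf_to_gguf.py


def embed_chat_template(writer: gguf.GGUFWriter, repo_root: Path) -> None:
    """Hypothetical helper: bake a known-good Jinja chat template into the GGUF
    so the converted model works out of the box."""
    # PLACEHOLDER filename: whichever template under models/templates matches DeepSeek V3.2
    template = (repo_root / "models" / "templates" / "deepseek-v3.2.jinja").read_text(encoding="utf-8")
    writer.add_chat_template(template)
```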

@CISC
Collaborator

CISC commented Feb 3, 2026

@createthis gentle ping

@createthis
Contributor Author

@CISC My company's policy prevents me from using my local LLM machine for work purposes, so my motivation has evaporated. I'm tempted to sell off the RDIMMs given the market value. lol
