Deepseek v3.2 dense attention support from @fairydreaming #18849
createthis wants to merge 1 commit into ggml-org:master from […]
Conversation
In its current form this is too hacky. I'm fine with supporting conversion without the indexer tensors, but make it its own class, inheriting from […]. Don't bother with the […].
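A rough sketch of what that suggestion could look like in convert_hf_to_gguf.py, assuming it means subclassing the existing DeepseekV2Model conversion class and dropping the sparse-attention indexer tensors. The architecture string, the `.indexer.` name filter, and the decorator/base-class names are assumptions based on current conventions in that script, not taken from the actual patch:

```python
# Sketch only: a dedicated conversion class for DeepSeek V3.2 that skips the
# indexer tensors instead of patching the shared DeepseekV2Model code path.
# "DeepseekV32ForCausalLM" and the ".indexer." substring are guesses at the
# checkpoint's naming and may not match the real model files.
@ModelBase.register("DeepseekV32ForCausalLM")
class DeepseekV32Model(DeepseekV2Model):
    model_arch = gguf.MODEL_ARCH.DEEPSEEK2

    def modify_tensors(self, data_torch, name, bid):
        # Drop the sparse-attention indexer weights so the dense-only GGUF
        # carries no tensors that inference will never read.
        if ".indexer." in name:
            return []
        return super().modify_tensors(data_torch, name, bid)
```

Keeping the V3.2 handling in its own class would leave the shared DeepSeek-V2/V3 conversion path untouched, which appears to be the point of the "make it its own class" request.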
@createthis By the way, in the most recent version of the patch I decided to modify add_bos_token in tokenizer_settings.json manually instead of overriding the value in the conversion script; the patch then collapses to merely 2 new/changed lines. Details are in the README here: https://huggingface.co/sszymczyk/DeepSeek-V3.2-nolight-GGUF If you go this way, I guess the need for the manual modification should be documented somewhere (or perhaps detected, with an error message, if it hasn't been done).
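If the manual-edit route is taken, the "detect if not done" idea could be a small pre-flight check in the conversion script. A minimal sketch: the first file name follows the comment above (Hugging Face checkpoints usually call it tokenizer_config.json, so both are checked), and the expected value is whatever the linked README prescribes; none of this is from the actual patch:

```python
import json
from pathlib import Path


def check_add_bos_token(dir_model: str, expected) -> None:
    """Fail conversion early if the manual add_bos_token edit was not applied."""
    # Check the name used in the comment above first, then the usual HF name.
    for fname in ("tokenizer_settings.json", "tokenizer_config.json"):
        path = Path(dir_model) / fname
        if path.is_file():
            value = json.loads(path.read_text()).get("add_bos_token")
            if value != expected:
                raise ValueError(
                    f"{fname}: add_bos_token is {value!r}, expected {expected!r}; "
                    "apply the manual edit described in the model README first."
                )
            return
    raise FileNotFoundError(f"no tokenizer settings file found in {dir_model}")
```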
I would prefer my suggestion. Also, I see we need to provide a chat template for it to work; I suggest adding that from […].
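One possible way to bake a chat template into the converted model, sketched under the assumption that the template ships with the checkpoint's tokenizer_config.json; the helper name and hook point are illustrative, and the truncated part of the comment above may well refer to a different source for the template:

```python
import json
from pathlib import Path

import gguf


def write_chat_template(gguf_writer: gguf.GGUFWriter, dir_model: str) -> None:
    """Copy the checkpoint's chat template into the GGUF metadata, if present."""
    cfg_path = Path(dir_model) / "tokenizer_config.json"
    if not cfg_path.is_file():
        return
    chat_template = json.loads(cfg_path.read_text()).get("chat_template")
    if chat_template:
        gguf_writer.add_chat_template(chat_template)
```

Note that the stock conversion path may already pick the template up through its special-vocab handling, in which case no extra code would be needed; this sketch only illustrates where the metadata ends up.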
@createthis gentle ping
@CISC My company's policy excludes me from using my local LLM machine for work purposes, so my motivation has evaporated. Tempted to sell off the RDIMMs given the market value. lol.
This is a bare-minimum implementation of DeepSeek V3.2 using dense attention only. @fairydreaming wrote this code; I just packaged it into a PR.
I've generated GGUFs with this: https://huggingface.co/createthis/DeepSeek-V3.2-dense-GGUF
I then ran inference out to 48,300 tokens of context over several turns, and it seems to work fine.
The major issue is that the sparse attention tensors are left out of the GGUF.
If this is unacceptable, I have another PR from back in October that populates the sparse attention tensors in the GGUF, but still doesn't use them for inference. I abandoned that PR because it fell into degenerate generation at about 45k context. Now that I know this PR works, I can attempt to fix the other PR.
Let me know what you think.