graph : ensure DS32 kq_mask_lid is F32 by CISC · Pull Request #23864 · ggml-org/llama.cpp

CISC · 2026-05-29T10:47:57Z

Overview

Additional information

Since build_attn_inp_kq_mask returns an F16 mask when flash attention is enabled, pass a modified copy of cparams when building kq_mask_lid so that it stays F32.

llama.cpp/src/models/deepseek32.cpp

Lines 341 to 344 in 1f0aa2a

    
	// mask indexer scores
	ggml_tensor * indexer_kq_mask = inp_attn_dsa->get_kq_mask_lid();
	indexer_score = ggml_add(ctx0, indexer_score, indexer_kq_mask);
	cb(indexer_score, "indexer_score", il);

This is a bit hacky; I'm open to better solutions. cc @am17an
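The "modified copy of cparams" idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `cparams_sketch`, `mask_type`, `build_mask_type` and `build_lid_mask_type` are hypothetical stand-ins, and the only behavior assumed is the one stated above (the mask is F16 when flash attention is enabled).

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real llama.cpp types.
enum class mask_type { F16, F32 };

struct cparams_sketch {
    bool flash_attn = false;
};

// Mirrors the behavior described above: F16 mask when flash attention is on.
mask_type build_mask_type(const cparams_sketch & cparams) {
    return cparams.flash_attn ? mask_type::F16 : mask_type::F32;
}

// Build the indexer mask from a modified copy, so it is always F32
// and the caller's parameters are left untouched.
mask_type build_lid_mask_type(const cparams_sketch & cparams) {
    cparams_sketch cparams_lid = cparams; // copy, not a reference
    cparams_lid.flash_attn = false;       // force the F32 path for this mask only
    return build_mask_type(cparams_lid);
}
```

With `flash_attn` set, the regular mask would still come out as F16 while the indexer mask comes out as F32, which is the property the later `ggml_add` on `indexer_score` relies on.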

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: mboten

am17an · 2026-05-29T11:17:07Z

Does this mask need to be f32?

CISC · 2026-05-29T11:21:34Z

> Does this mask need to be f32?

Either that or we have to cast indexer_score to F16.
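The two options mentioned here can be pictured with a small type-tag sketch. Everything in it is hypothetical (`dtype`, `tensor_sketch`, `add_same_type`, `cast_to` are not ggml names), and it assumes, purely for illustration, an add that only accepts operands of the same type.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not ggml types or functions.
enum class dtype { F16, F32 };

struct tensor_sketch {
    dtype type;
};

// Assumed contract for this sketch: both operands must share a type.
tensor_sketch add_same_type(const tensor_sketch & a, const tensor_sketch & b) {
    if (a.type != b.type) {
        throw std::invalid_argument("add: operand types differ");
    }
    return tensor_sketch{a.type};
}

// Illustrative cast: only the type tag changes in this sketch.
tensor_sketch cast_to(const tensor_sketch & t, dtype type) {
    (void) t;
    return tensor_sketch{type};
}
```

Under that assumption, an F32 score plus an F16 mask is rejected, and either fix makes the add valid: keep the mask F32 (the route this PR takes), or cast the scores to F16 before adding.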

fairydreaming · 2026-05-29T16:03:15Z

So... I checked how DeepSeek V3.2 works in master (a couple of hours too late) and ended up here. But this PR helps: the ggml_cuda_op_add error is gone.

* origin/master:
  vocab : support tokenizer for LFM2.5-8B-A1B (ggml-org#23826)
  graph : ensure DS32 kq_mask_lid is F32 (ggml-org#23864)
  server: remove obsolete scripts (ggml-org#23870)
  ci : update macos release to use macos-26 runner (ggml-org#23878)
  download: add option to skip_download (ggml-org#23059)
  mtmd: Add DeepSeekOCR 2 Support (ggml-org#20975)
  CUDA: Check PTX version on host side to guard PDL dispatch (ggml-org#23530)
  server: bump timeout to 3600s (ggml-org#23842)
  model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (ggml-org#23346)
  llama: use f16 mask for FA to save VRAM (ggml-org#23764)
  sync : ggml
  ggml : bump version to 0.13.1 (ggml/1523)
  ngram-mod : Add missing include (ggml-org#23857)
  llama: add llm_graph_input_mtp (ggml-org#23643)
  app : move licences to llama-app (ggml-org#23824)
  cuda : disables launch_fattn PDL enrollment due to compiler bug (ggml-org#23825)
  meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear (ggml-org#23480)

ensure DS32 kq_mask_lid is F32

d19c6cb

CISC requested a review from ggerganov May 29, 2026 10:48

am17an approved these changes May 29, 2026

ggerganov approved these changes May 29, 2026

CISC mentioned this pull request May 29, 2026

Support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation #23346

Merged

CISC merged commit 764f1e6 into master May 29, 2026
27 checks passed

CISC deleted the cisc/graph-ds32-lid-mask-fix branch May 29, 2026 17:55

CISC mentioned this pull request May 29, 2026

test-llama-archs: disable DS 3.2 [no release] #23876

Closed

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

graph : ensure DS32 kq_mask_lid is F32 (ggml-org#23864)

b2135d4

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

graph : ensure DS32 kq_mask_lid is F32 (ggml-org#23864)

5b727d0
