
@createthis (Owner)

No description provided.

@createthis self-assigned this on Oct 27, 2025

- Remove the forced CPU backend assignment of kvaware_indices:
  - src/llama-sparse-topk.cpp: deleted the block that moved the result to backend_cpu; the indices now stay on the backend that produced them.
  - src/llama-model.cpp: removed both instances of ggml_backend_sched_set_tensor_backend(sched, kvaware_indices, backend_cpu), so we no longer bounce the indices to host in the MLA and MHA sparse paths (see the first sketch after this list).
- Gate the debug-only float32 cast of the indices:
  - src/llama-sparse-topk.cpp: only cast to F32 and log the f32 indices when LLAMA_SPARSE_DEBUG is set. This cuts extra graph nodes and copies in normal runs (see the second sketch after this list).
- Increase the default Top-K token tile size:
  - src/llama-sparse-topk.cpp: raised the default TILE_T from 32 to 128, still overridable via LLAMA_SPARSE_TOPK_TILE_T (see the third sketch after this list).

Together these cover both the MLA and MHA branches, so we avoid the extra backend hop to CPU after apply_sparse_attention_kvaware.
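
A minimal sketch of the placement change, assuming ggml's scheduler API: the wrapper function `place_kvaware_indices` and its shape are hypothetical, while `ggml_backend_sched_set_tensor_backend` and the tensor/variable names come from the description above.

```cpp
#include "ggml-backend.h"

// Hypothetical helper illustrating the before/after of the change above.
static void place_kvaware_indices(
        ggml_backend_sched_t  sched,
        struct ggml_tensor  * kvaware_indices,
        ggml_backend_t        backend_cpu) {
    // Before: pin the top-k indices to the CPU backend. This forced the
    // scheduler to insert a device->host copy right after
    // apply_sparse_attention_kvaware on every graph run:
    //
    //     ggml_backend_sched_set_tensor_backend(sched, kvaware_indices, backend_cpu);
    //
    // After: no placement hint at all. The scheduler leaves the indices on
    // whichever backend produced them, so consumers on that backend read
    // them without a round trip through host memory.
    (void) sched; (void) kvaware_indices; (void) backend_cpu;
}
```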
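The debug gate can be pictured like this; a sketch only, where `maybe_debug_indices` and the graph plumbing are assumptions, while the F32 cast and the LLAMA_SPARSE_DEBUG variable are from the description above.

```cpp
#include <cstdlib>

#include "ggml.h"

// Hypothetical helper: only build the F32 debug view of the indices when
// LLAMA_SPARSE_DEBUG is set, so normal runs get no extra nodes or copies.
static void maybe_debug_indices(
        struct ggml_context * ctx,
        struct ggml_cgraph  * gf,
        struct ggml_tensor  * indices) {
    if (std::getenv("LLAMA_SPARSE_DEBUG") == nullptr) {
        return; // normal path: the integer indices go straight to the consumer
    }
    // Debug path: cast to F32 for logging and keep the node in the graph.
    struct ggml_tensor * indices_f32 = ggml_cast(ctx, indices, GGML_TYPE_F32);
    ggml_build_forward_expand(gf, indices_f32);
}
```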
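And the new tile-size default as a sketch; the helper name and parsing details are assumptions, while the 32 → 128 change and the LLAMA_SPARSE_TOPK_TILE_T override are from the description above.

```cpp
#include <cstdlib>

// Hypothetical reader for the Top-K token tile size: the default is now 128
// (previously 32), overridable via LLAMA_SPARSE_TOPK_TILE_T.
static int sparse_topk_tile_t() {
    int tile_t = 128;
    if (const char * env = std::getenv("LLAMA_SPARSE_TOPK_TILE_T")) {
        const int v = std::atoi(env);
        if (v > 0) {
            tile_t = v; // accept only positive overrides
        }
    }
    return tile_t;
}
```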
@createthis merged commit 7780061 into deepseek_v3_2_exp on Oct 28, 2025
33 of 64 checks passed