39 commits
bf94899
init FI topk integration
LopezCastroRoberto Feb 10, 2026
2c4ae45
add adapted topK FI kernel
LopezCastroRoberto Feb 12, 2026
e189d43
moving workkspace buffer allocation
LopezCastroRoberto Feb 12, 2026
3f6159d
add new heuristic
LopezCastroRoberto Feb 13, 2026
6d088b6
refractor
LopezCastroRoberto Feb 13, 2026
f103fbb
refractor
LopezCastroRoberto Feb 13, 2026
dc03510
refractor kernel
LopezCastroRoberto Feb 16, 2026
143da75
improve test set
LopezCastroRoberto Feb 16, 2026
d5ce8a5
rename kernels
LopezCastroRoberto Feb 16, 2026
5024a36
review comments
LopezCastroRoberto Feb 27, 2026
d5507a9
fix cudagraph issue
LopezCastroRoberto Mar 3, 2026
367255a
add one more bucket
LopezCastroRoberto Mar 6, 2026
02bc7c5
add persistent scheduler for topK
LopezCastroRoberto Mar 17, 2026
c1d9d55
Merge branch 'main' into perf/topKperRow-FI
LopezCastroRoberto Mar 17, 2026
41c3f5b
Merge branch 'main' into perf/topKperRow-FI
LopezCastroRoberto Mar 17, 2026
4743674
update persistent scheduler
LopezCastroRoberto Mar 18, 2026
639a068
init persistent scheduler PR
LopezCastroRoberto Mar 18, 2026
37188d2
cleaning and adding tests
LopezCastroRoberto Mar 18, 2026
41e4f56
add removed file
LopezCastroRoberto Mar 18, 2026
580abcd
add missing cpyright comments
LopezCastroRoberto Mar 18, 2026
23e5f80
cleaning
LopezCastroRoberto Mar 18, 2026
1733c5b
cleaning
LopezCastroRoberto Mar 18, 2026
531c275
cleaning
LopezCastroRoberto Mar 18, 2026
4538046
additional optimizations
LopezCastroRoberto Mar 20, 2026
cbd2e8f
fix CG issue
LopezCastroRoberto Mar 24, 2026
43f6a9f
fixed comments and improved performance
LopezCastroRoberto Mar 27, 2026
a6bac3a
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Mar 31, 2026
e856535
use get_simultaneous for topk workspace
LopezCastroRoberto Apr 1, 2026
72752fb
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 1, 2026
30d7b24
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 1, 2026
0f4126d
Merge branch 'main' into perf/topK_persistent_scheduler
LucasWilkinson Apr 2, 2026
32d82f8
fix ci
LopezCastroRoberto Apr 6, 2026
f8c8df3
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 6, 2026
8162e84
fix ci
LopezCastroRoberto Apr 6, 2026
b2f714b
fix ci
LopezCastroRoberto Apr 6, 2026
267c225
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 6, 2026
026cefb
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 6, 2026
1e83a30
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 7, 2026
749aba3
Merge branch 'main' into perf/topK_persistent_scheduler
LopezCastroRoberto Apr 7, 2026
5 changes: 3 additions & 2 deletions .buildkite/test_areas/kernels.yaml
@@ -18,10 +18,9 @@ steps:
  source_file_dependencies:
  - csrc/
  - tests/kernels/core
- - tests/kernels/test_top_k_per_row.py
  - tests/kernels/test_concat_mla_q.py
  commands:
- - pytest -v -s kernels/core kernels/test_top_k_per_row.py kernels/test_concat_mla_q.py
+ - pytest -v -s kernels/core kernels/test_concat_mla_q.py

  - label: Kernels Attention Test %N
  timeout_in_minutes: 35
@@ -107,6 +106,7 @@ steps:
  - vllm/v1/attention/backends/mla/flashinfer_mla.py
  - vllm/v1/attention/selector.py
  - vllm/platforms/cuda.py
+ - tests/kernels/test_top_k_per_row.py
  commands:
  - nvidia-smi
  - python3 examples/basic/offline_inference/chat.py
@@ -117,6 +117,7 @@ steps:
  - pytest -v -s tests/kernels/attention/test_flashinfer_trtllm_attention.py
  - pytest -v -s tests/kernels/attention/test_cutlass_mla_decode.py
  - pytest -v -s tests/kernels/attention/test_flashinfer_mla_decode.py
+ - pytest -v -s tests/kernels/test_top_k_per_row.py
  # Quantization
  - pytest -v -s tests/kernels/quantization/test_cutlass_scaled_mm.py -k 'fp8'
  - pytest -v -s tests/kernels/quantization/test_nvfp4_quant.py
6 changes: 3 additions & 3 deletions csrc/ops.h
@@ -114,9 +114,9 @@ void top_k_per_row_decode(const torch::Tensor& logits, int64_t next_n,
  int64_t numRows, int64_t stride0, int64_t stride1,
  int64_t topK);

- void large_context_topk(const torch::Tensor& score, torch::Tensor& indices,
- const torch::Tensor& lengths,
- std::optional<torch::Tensor> row_starts_opt);
+ void persistent_topk(const torch::Tensor& logits, const torch::Tensor& lengths,
+ torch::Tensor& output, torch::Tensor& workspace, int64_t k,
+ int64_t max_seq_len);

  void rms_norm_static_fp8_quant(torch::Tensor& out, torch::Tensor& input,
  torch::Tensor& weight, torch::Tensor& scale,
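
Reviewer note: as an aid to reading the new `persistent_topk` declaration, here is a rough pure-Python sketch of what a per-row top-k with per-row valid lengths might compute. The semantics are an assumption inferred only from the parameter names (`logits`, `lengths`, `output`, `k`). The helper name `reference_topk_per_row`, the `-1` padding value, and the descending output order are hypothetical and are not taken from this PR. The `workspace` and `max_seq_len` parameters are not modeled.

```python
import heapq
from typing import List, Sequence


def reference_topk_per_row(
    logits: Sequence[Sequence[float]],
    lengths: Sequence[int],
    k: int,
    pad: int = -1,
) -> List[List[int]]:
    """Hypothetical reference, not the kernel's actual contract.

    For each row, return the indices of the k largest values among the
    first lengths[row] entries. Rows with fewer than k valid entries are
    padded with `pad` (an assumption for illustration).
    """
    output = []
    for row, length in zip(logits, lengths):
        # Only the first `length` entries of the row are considered valid.
        valid = range(min(length, len(row)))
        # nlargest returns indices ordered by descending score.
        top = heapq.nlargest(k, valid, key=lambda i: row[i])
        output.append(top + [pad] * (k - len(top)))
    return output


scores = [[0.1, 0.9, 0.5, 0.7], [0.3, 0.2, 0.8, 0.6]]
print(reference_topk_per_row(scores, lengths=[4, 2], k=3))
# → [[1, 3, 2], [0, 1, -1]]
```

A reference of this kind is typically what a kernel test compares against. Whether `tests/kernels/test_top_k_per_row.py` does so in this form is not shown in the diff above.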