[kernel] remove eager kernel due to the new primitive by WindChimeRan · Pull Request #372 · vllm-project/vllm-metal

WindChimeRan · 2026-05-14T06:58:27Z

Clean up the eager kernel now that the primitive works (#225).

The eager kernel is now dead code.

Signed-off-by: ran <hzz5361@psu.edu>

Resolve modify/delete conflict on test_metal_unified_attention.py: upstream removed the eager kernel test (PR vllm-project#372), while we had added prefill test cases. Accept the deletion and migrate the prefill-only test parametrizations to test_primitive_and_donation.py::test_primitive_vs_reference_varlen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## Summary

After #372 removed the eager `paged_attention_*` bindings, `PagedAttentionPrimitive::eval_gpu` is the only caller of `dispatch_paged_attention_v2_online` (and indirectly `dispatch_paged_attention_tiled`), and it always passes `from_primitive=true`. The `from_primitive` parameter, the `!from_primitive` `add_temporary` branches in both dispatchers, and the `add_paged_attn_temporaries` helper that those branches called are all unreachable.

`vllm_metal/metal/paged_ops.cpp`: 37 lines deleted

- `from_primitive` parameter removed from `dispatch_paged_attention_tiled` and `dispatch_paged_attention_v2_online`
- the `if (!from_primitive)` `add_temporary_compat` branches removed in both (including the TurboQuant scale/zero/centroid temporaries)
- `add_paged_attn_temporaries` deleted (only those branches called it)
- call sites in `dispatch_paged_attention_v2_online → dispatch_paged_attention_tiled` and `PagedAttentionPrimitive::eval_gpu → dispatch_paged_attention_v2_online` updated to drop the argument

## What stays

`dispatch_reshape_and_cache` keeps its `from_primitive` parameter: it still has a live eager caller via `reshape_and_cache_impl`, which `vllm_metal/paged_attention_backend/mha.py:47` calls as part of the Metal-shader warm-up.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
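The dead-parameter removal described in the summary can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `FakeEncoder`, `dispatch_before`, and `dispatch_after` are stand-ins invented for illustration and are not the real `paged_ops.cpp` or MLX signatures. The sketch only shows why dropping a flag that every remaining caller sets to `true` leaves behavior unchanged for that caller.

```cpp
// Hypothetical, simplified model of the cleanup. None of these types or
// signatures are the real vllm-metal or MLX ones.
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a GPU command encoder.
struct FakeEncoder {
  std::vector<std::string> temporaries;  // buffers registered as temporaries
  std::vector<std::string> dispatched;   // kernels that were launched
};

// Before: a flag tells the dispatcher whether a primitive called it.
void dispatch_before(FakeEncoder& enc, const std::string& out,
                     bool from_primitive) {
  if (!from_primitive) {
    // Eager-only branch, mirroring the `add_temporary` branches the
    // summary mentions. Unreachable once no eager caller remains.
    enc.temporaries.push_back(out);
  }
  enc.dispatched.push_back("paged_attention");
}

// After: the flag and its eager-only branch are gone.
void dispatch_after(FakeEncoder& enc, const std::string& out) {
  (void)out;  // still part of the call, but no temporary is registered
  enc.dispatched.push_back("paged_attention");
}

// Checks that the only remaining caller (which always passed `true`)
// observes identical encoder state before and after the cleanup.
void check_primitive_path_unchanged() {
  FakeEncoder before, after;
  dispatch_before(before, "out", /*from_primitive=*/true);
  dispatch_after(after, "out");
  assert(before.temporaries == after.temporaries);
  assert(before.dispatched == after.dispatched);
}
```

A function such as `dispatch_reshape_and_cache`, which still has an eager caller, would keep the `before` shape, since its `false` branch is still reachable.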

remove eager kernel

c1d39c9

Signed-off-by: ran <hzz5361@psu.edu>

LxYuan0420 reviewed May 14, 2026


Comment thread tests/test_metal_unified_attention.py

LxYuan0420 approved these changes May 14, 2026


add sliding window tests

fcac421

Signed-off-by: ran <hzz5361@psu.edu>

WindChimeRan merged commit 8753b8c into vllm-project:main May 14, 2026
5 checks passed

scyyh11 mentioned this pull request May 15, 2026

metal: drop dead from_primitive plumbing in v2_online + tiled #376

Merged

WindChimeRan merged 2 commits into vllm-project:main from WindChimeRan:cleanup/remove-eager-gqa-binding
