Test tree flash attn #25511

Draft
TheEpicDolphin wants to merge 3 commits into vllm-project:main from TheEpicDolphin:test_tree_flash_attn

@TheEpicDolphin (Collaborator) commented Sep 23, 2025

Benchmarks comparing standard vs. tree speculative decoding with the new FA2 + mask kernel: https://docs.google.com/spreadsheets/d/1imDQmv-5yPbDZwWRD7FslDUw5KQ794ET8RY7701jfmk/edit?usp=sharing

Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>

mergify bot commented Sep 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @TheEpicDolphin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@foolusion

I had to make a change for the drafter to use the tree, but it didn't seem to improve performance.

out.patch

@TheEpicDolphin force-pushed the test_tree_flash_attn branch 2 times, most recently from 3c94337 to afd134b on September 25, 2025 01:15