[Feature] plan to support medusa? #859

Closed
CSEEduanyu opened this issue Aug 1, 2024 · 4 comments
Motivation

Is there a plan to support Medusa?


@zhyncs (Member) commented on Aug 1, 2024

Speculative decoding support is on our roadmap. FlashInfer has already implemented the corresponding kernel with targeted optimizations; please stay tuned.
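
For readers unfamiliar with the technique, here is a minimal sketch of the greedy speculative-decoding accept/verify loop. This is illustrative only and is not the FlashInfer kernel API; `draft_next_fn` and `target_logits_fn` are hypothetical stand-ins for real model calls.

```python
import torch

def speculative_step(tokens, draft_next_fn, target_logits_fn, k=4):
    """One greedy speculative-decoding step (illustrative sketch).

    tokens           -- 1-D LongTensor holding the sequence so far
    draft_next_fn    -- hypothetical cheap draft model: sequence -> next token id (0-d tensor)
    target_logits_fn -- hypothetical target model: sequence -> (len(sequence), vocab) logits
    """
    # 1. Draft: the small model proposes k candidate tokens autoregressively.
    draft = tokens
    for _ in range(k):
        draft = torch.cat([draft, draft_next_fn(draft).view(1)])

    # 2. Verify: a single target forward pass scores every draft position.
    #    target_preds[i] is the target's greedy choice for position i + 1.
    target_preds = target_logits_fn(draft).argmax(dim=-1)

    # 3. Accept the longest prefix on which draft and target agree; at the
    #    first mismatch keep the target's own token, so every step yields
    #    at least one new token.
    out, n = tokens, tokens.numel()
    for i in range(k):
        target_tok = target_preds[n + i - 1]
        if draft[n + i] == target_tok:
            out = torch.cat([out, draft[n + i].view(1)])
        else:
            out = torch.cat([out, target_tok.view(1)])
            break
    return out
```

The payoff is that step 2 replaces k sequential target-model calls with one batched forward pass; the draft model's cost is small, so accepted tokens come nearly for free.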

zhyncs self-assigned this on Aug 1, 2024
zhyncs mentioned this issue on Aug 1, 2024
zhyncs added the feature label on Aug 1, 2024
@chuangzhidan commented on Sep 30, 2024

> Speculative decoding support is on our roadmap. FlashInfer has already implemented the corresponding kernel with targeted optimizations; please stay tuned.

Really looking forward to fast decoding methods like Medusa, speculative decoding, lookahead decoding, and the like.

@vkc1vk commented on Oct 20, 2024

Hi, I was wondering: will Medusa be supported with full tree attention, or only the Top-1 version currently available in vLLM?

Thanks.
cc: @zhyncs @merrymercy
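
For context on the distinction: Top-1 speculation verifies a single candidate chain per step, while tree attention verifies a whole tree of candidate continuations in one forward pass, using a tree-structured attention mask in which each node attends only to itself and its ancestors. A minimal sketch of such a mask follows; the `parent` array is a hypothetical encoding of the candidate tree, not taken from any particular implementation.

```python
import torch

def tree_attention_mask(parent):
    """Build a tree attention mask (illustrative sketch).

    parent -- list where parent[i] is the index of node i's parent, -1 for the root.
    Returns an (n, n) boolean mask: mask[i, j] is True iff node i may attend to node j.
    """
    n = len(parent)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:        # walk up to the root, unmasking every ancestor
            mask[i, j] = True
            j = parent[j]
    return mask

# A Top-1 chain is the degenerate tree parent = [-1, 0, 1, ...], which
# reduces to the usual causal (lower-triangular) mask. A branching tree,
# e.g. a root with two children that each have two children, verifies all
# seven candidate positions in a single pass:
print(tree_attention_mask([-1, 0, 0, 1, 1, 2, 2]).int())
```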


This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
