[Attention][Spec Decode] FlashMLA spec decode support #26541
benchislett merged 14 commits into vllm-project:main
Code Review
This pull request introduces speculative decoding support for the FlashMLA backend, which is a significant feature enhancement. The changes are well-structured, particularly the introduction of the QueryLenSupport enum to clearly define the capabilities of different backends. The test cases have been updated appropriately to cover speculative decoding scenarios. My main feedback is a minor performance improvement in the test suite by moving a repeated import out of a loop.
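To illustrate the kind of capability declaration the review refers to, here is a minimal sketch of an enum that describes which decode query lengths a backend can handle. The member names and the `decode_path_ok` helper are assumptions made for illustration; the review confirms only that an enum called `QueryLenSupport` exists, not its definition.

```python
from enum import Enum


class QueryLenSupport(Enum):
    """Illustrative capability levels; member names are hypothetical."""

    SINGLE_ONLY = "single_only"  # exactly one query token per request
    UNIFORM = "uniform"          # any length, but identical across the batch
    VARLEN = "varlen"            # lengths may differ from request to request


def decode_path_ok(support: QueryLenSupport, query_lens: list[int]) -> bool:
    """Return True if a batch with these query lengths fits the capability.

    Hypothetical helper, not part of any library API.
    """
    if not query_lens:
        return False
    if support is QueryLenSupport.SINGLE_ONLY:
        return all(n == 1 for n in query_lens)
    if support is QueryLenSupport.UNIFORM:
        return len(set(query_lens)) == 1
    return True  # VARLEN accepts any mix of lengths
```

Under this sketch, a speculative decoding batch in which every request carries the same number of query tokens would fit a backend declaring `UNIFORM`, but not one declaring `SINGLE_ONLY`.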
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
dae6cb0 to b11a56e
LucasWilkinson left a comment
Nice! LGTM, but let's have @benchislett look too.
This pull request has merge conflicts that must be resolved before it can be
benchislett left a comment
Minor issue flagged; otherwise LGTM. Please ping me when resolved.
@benchislett thanks for your review! I've addressed your comments.
Purpose
This PR implements speculative decoding support for the FlashMLA backend.
NOTE: The comment about the intermittent test failure was already true before this PR; I just made a note of it.
cc @LucasWilkinson
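As a rough illustration of what speculative decoding changes for a decode kernel, the sketch below recovers per-request query lengths from cumulative offsets and checks whether they are uniform. With speculative decoding, each decode request typically carries one token plus its draft tokens rather than a single token. The function names and the `query_start_loc` offset layout are assumptions for illustration, not code taken from this PR.

```python
from typing import Optional


def query_lens_from_offsets(query_start_loc: list[int]) -> list[int]:
    """Convert cumulative offsets [0, n0, n0+n1, ...] into per-request lengths."""
    return [end - start for start, end in zip(query_start_loc, query_start_loc[1:])]


def uniform_query_len(query_start_loc: list[int]) -> Optional[int]:
    """Return the shared query length if every request has the same one, else None."""
    lens = query_lens_from_offsets(query_start_loc)
    if lens and all(n == lens[0] for n in lens):
        return lens[0]
    return None


# Three requests, each with 1 verified token + 3 draft tokens (assumed setup).
print(uniform_query_len([0, 4, 8, 12]))  # 4
# Mixed lengths: no single uniform length exists.
print(uniform_query_len([0, 1, 5]))      # None
```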
Test Plan
pytest tests/v1/attention/test_mla_backends.py
Test Result
Passes