[main][refactor] Align spec_decode with vllm#6913
drslark wants to merge 1 commit into vllm-project:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request refactors the speculative decoding mechanism to improve its alignment and compatibility with the main vLLM codebase. It introduces a more structured approach to class inheritance and method organization, particularly for the proposer components. It also improves the handling of rejected tokens within the Triton kernel, which is important for the efficiency and correctness of speculative decoding. The changes additionally include dynamic class-manipulation utilities to ease integration with, and patching of, upstream classes.
Code Review
This pull request refactors the speculative decoding implementation to align it with the upstream vLLM library. The changes are quite extensive, introducing a new SpecDecodeBaseProposer and using dynamic class creation to reuse upstream logic while injecting custom behavior. This alignment also involves adding support for returning the number of rejected tokens from the prepare_inputs_padded kernel and plumbing this information through the model runner.
My review identified a critical bug in vllm_ascend/spec_decode/eagle_proposer.py that would cause an UnboundLocalError under certain conditions. The rest of the changes appear to be consistent with the refactoring goal. Please address the identified issue.
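For context on the "number of rejected tokens" the review mentions, the quantity being plumbed through can be sketched in plain Python. This is a hypothetical illustration only; in the PR the real computation happens inside the `prepare_inputs_padded` Triton kernel, and the function and parameter names below are invented for this sketch:

```python
# Hypothetical sketch: in speculative decoding, each request's rejected-token
# count is the number of draft tokens the target model did not accept.
def count_rejected_tokens(num_draft_tokens, num_accepted_tokens):
    """Per-request rejected-token counts (all names are illustrative)."""
    return [draft - accepted
            for draft, accepted in zip(num_draft_tokens, num_accepted_tokens)]

# Three requests with 4 draft tokens each; 4, 2, and 0 tokens were accepted.
print(count_rejected_tokens([4, 4, 4], [4, 2, 0]))  # prints: [0, 2, 4]
```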
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Force-pushed from 4cbba31 to a7e97ee
Signed-off-by: drslark <slarksblood@qq.com>
This pull request has conflicts; please resolve them before we can evaluate the pull request.
What this PR does / why we need it?
According to #6881, this PR follows vllm-project/vllm#32887.
We want to change the structure of the vllm-ascend code to follow vllm. But unfortunately, there are a lot of `isinstance(proposer, EagleProposer)` checks in the vllm code, and we don't want to patch them. Also, we don't want to copy and paste most of the `EagleProposer` and `DraftModelProposer` code. Based on the above requirements, we decided to do the following:
- Patch `EagleProposer` and `DraftModelProposer` to use `ABCMeta` (no logic is modified!) and rely on Python's virtual subclass mechanism.
- Reuse `EagleProposer` and `DraftModelProposer` without any redundant code, but with vllm-ascend's own base class `SpecDecodeBaseProposer`.
Does this PR introduce any user-facing change?
N/A
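The virtual-subclass approach described above can be sketched in isolation. The class bodies below are hypothetical stand-ins for the real vLLM and vllm-ascend classes (only the names `EagleProposer` and `SpecDecodeBaseProposer` come from the PR); the point is that rebuilding the upstream class under `ABCMeta` and then registering the Ascend proposer lets upstream `isinstance()` checks pass without any code duplication or inheritance from upstream:

```python
from abc import ABCMeta

# Hypothetical stand-in for vLLM's upstream class (a plain, non-abstract class).
class EagleProposer:
    def propose(self):
        return "eagle draft"

# "Patch" the upstream class to use ABCMeta without touching its logic:
# rebuild it with the same name, bases, and namespace under the new metaclass.
EagleProposer = ABCMeta(
    EagleProposer.__name__,
    EagleProposer.__bases__,
    {k: v for k, v in vars(EagleProposer).items()
     if k not in ("__dict__", "__weakref__")},
)

# vllm-ascend's own hierarchy; note it does not inherit from EagleProposer.
class SpecDecodeBaseProposer:
    pass

class AscendEagleProposer(SpecDecodeBaseProposer):
    def propose(self):
        return "ascend eagle draft"

# Register as a virtual subclass so upstream isinstance() checks still pass.
EagleProposer.register(AscendEagleProposer)

print(isinstance(AscendEagleProposer(), EagleProposer))  # prints: True
```

`AscendEagleProposer` never appears in `EagleProposer`'s MRO, so no upstream implementation is inherited; only the `isinstance`/`issubclass` relationship is established.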
How was this patch tested?