[Feature] add Dflash on Ascend #36764
chenaoxuan wants to merge 1 commit into vllm-project:releases/v0.13.0
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. You can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Documentation preview: https://vllm--36764.org.readthedocs.build/en/36764/
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request introduces a new speculative decoding method named "dflash" for Qwen3 models. The changes involve defining a new DFlashQwen3ForCausalLM model, integrating "dflash" into the speculative configuration types, hash computation, method detection, and argument verification. It also adds specific auxiliary hidden state layer configurations for "dflash" within the Qwen3 model and adjusts position tensor initialization in the speculative decoding framework. Review comments indicate a type hint violation in get_eagle3_aux_hidden_state_layers where a list is returned instead of a tuple for the "dflash" method, and a potential side effect in Qwen3Model's __init__ due to an in-place modification of self.config.eagle_config when updating drafter_config.
```diff
-    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
+    def get_eagle3_aux_hidden_state_layers(self, method: str | None = None) -> tuple[int, ...]:
+        if method is not None and method == "dflash":
+            return [1, 9, 17, 25, 33]
```
The function get_eagle3_aux_hidden_state_layers is type-hinted to return a tuple[int, ...], but for the dflash method, it returns a list. This violates the type hint and could lead to unexpected behavior. Please return a tuple instead.
```diff
-            return [1, 9, 17, 25, 33]
+            return (1, 9, 17, 25, 33)
```
```python
drafter_config = getattr(self.config, "eagle_config", {})
drafter_config.update(getattr(self.config, "dflash_config", {}))
```
The update method is called on drafter_config, which might be a direct reference to self.config.eagle_config. This can lead to an unintended in-place modification of self.config.eagle_config, which could have side effects elsewhere. To avoid this, you should create a copy of eagle_config before updating it.
```diff
-        drafter_config = getattr(self.config, "eagle_config", {})
+        drafter_config = getattr(self.config, "eagle_config", {}).copy()
         drafter_config.update(getattr(self.config, "dflash_config", {}))
```
Please leave this as a draft PR until it is functional and ready for review, at which time you should include a PR description and unit tests.
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.