
[Feature] add Dflash on Ascend#36764

Closed
chenaoxuan wants to merge 1 commit into vllm-project:releases/v0.13.0 from chenaoxuan:dflash

Conversation


@chenaoxuan chenaoxuan commented Mar 11, 2026

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only fastcheck CI, a small and essential subset of CI tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify
Contributor

mergify bot commented Mar 11, 2026

Documentation preview: https://vllm--36764.org.readthedocs.build/en/36764/

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 11, 2026
@mergify mergify bot added ci/build llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models qwen Related to Qwen models speculative-decoding v1 labels Mar 11, 2026
@mergify
Contributor

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chenaoxuan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2026
@chenaoxuan chenaoxuan closed this Mar 11, 2026
@chenaoxuan chenaoxuan deleted the dflash branch March 11, 2026 09:31
@chenaoxuan chenaoxuan restored the dflash branch March 11, 2026 09:31
@chenaoxuan chenaoxuan reopened this Mar 11, 2026
@mergify
Contributor

mergify bot commented Mar 11, 2026

Documentation preview: https://vllm--36764.org.readthedocs.build/en/36764/

@mergify
Contributor

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chenaoxuan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@chenaoxuan chenaoxuan changed the base branch from main to releases/v0.13.0 March 11, 2026 09:33
@mergify mergify bot removed the needs-rebase label Mar 11, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new speculative decoding method named "dflash" for Qwen3 models. The changes define a new DFlashQwen3ForCausalLM model; integrate "dflash" into the speculative configuration types, hash computation, method detection, and argument verification; add "dflash"-specific auxiliary hidden state layer configurations within the Qwen3 model; and adjust position tensor initialization in the speculative decoding framework.

Review comments flag a type hint violation in get_eagle3_aux_hidden_state_layers, where a list is returned instead of a tuple for the "dflash" method, and a potential side effect in Qwen3Model's __init__ due to an in-place modification of self.config.eagle_config when updating drafter_config.

def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
def get_eagle3_aux_hidden_state_layers(self, method: str | None = None) -> tuple[int, ...]:
if method is not None and method == "dflash":
return [1, 9, 17, 25, 33]
Contributor


Severity: high

The function get_eagle3_aux_hidden_state_layers is type-hinted to return a tuple[int, ...], but for the dflash method, it returns a list. This violates the type hint and could lead to unexpected behavior. Please return a tuple instead.

Suggested change
return [1, 9, 17, 25, 33]
return (1, 9, 17, 25, 33)

Comment on lines +260 to +261
drafter_config = getattr(self.config, "eagle_config", {})
drafter_config.update(getattr(self.config, "dflash_config", {}))
Contributor


Severity: high

The update method is called on drafter_config, which might be a direct reference to self.config.eagle_config. This can lead to an unintended in-place modification of self.config.eagle_config, which could have side effects elsewhere. To avoid this, you should create a copy of eagle_config before updating it.

Suggested change
drafter_config = getattr(self.config, "eagle_config", {})
drafter_config.update(getattr(self.config, "dflash_config", {}))
drafter_config = getattr(self.config, "eagle_config", {}).copy()
drafter_config.update(getattr(self.config, "dflash_config", {}))
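A minimal standalone sketch of why the copy matters (the Config class and the config keys here are hypothetical, chosen only to demonstrate the aliasing behavior):

```python
class Config:
    """Stand-in for the model config object; attributes are illustrative."""


config = Config()
config.eagle_config = {"num_layers": 2}       # hypothetical key
config.dflash_config = {"method": "dflash"}   # hypothetical key

# Buggy pattern: getattr returns a reference to the same dict,
# so update() mutates config.eagle_config in place.
drafter_config = getattr(config, "eagle_config", {})
drafter_config.update(getattr(config, "dflash_config", {}))
assert "method" in config.eagle_config  # unintended side effect

# Fixed pattern: copy first, leaving config.eagle_config untouched.
config.eagle_config = {"num_layers": 2}
drafter_config = getattr(config, "eagle_config", {}).copy()
drafter_config.update(getattr(config, "dflash_config", {}))
assert "method" not in config.eagle_config  # original config preserved
```

Note that dict.copy() is shallow; if eagle_config ever held nested dicts that update() touched, a deep copy would be needed instead.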

@benchislett
Collaborator

Please leave as a draft PR until it is functional and ready-for-review, at which time you should include a PR description and unit tests.
