
[Feature] add Dflash on Ascend#36764

Closed
chenaoxuan wants to merge 1 commit into vllm-project:releases/v0.13.0 from chenaoxuan:dflash

Conversation


@chenaoxuan chenaoxuan commented Mar 11, 2026

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only fastcheck CI, a small and essential subset of CI tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify
Contributor

mergify bot commented Mar 11, 2026

Documentation preview: https://vllm--36764.org.readthedocs.build/en/36764/

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 11, 2026
@mergify mergify bot added ci/build llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models qwen Related to Qwen models speculative-decoding v1 labels Mar 11, 2026
@mergify
Contributor

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chenaoxuan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2026
@chenaoxuan chenaoxuan closed this Mar 11, 2026
@chenaoxuan chenaoxuan deleted the dflash branch March 11, 2026 09:31
@chenaoxuan chenaoxuan restored the dflash branch March 11, 2026 09:31
@chenaoxuan chenaoxuan reopened this Mar 11, 2026
@mergify
Contributor

mergify bot commented Mar 11, 2026

Documentation preview: https://vllm--36764.org.readthedocs.build/en/36764/

@mergify
Contributor

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chenaoxuan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@chenaoxuan chenaoxuan changed the base branch from main to releases/v0.13.0 March 11, 2026 09:33
@mergify mergify bot removed the needs-rebase label Mar 11, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new speculative decoding method named "dflash" for Qwen3 models. The changes define a new DFlashQwen3ForCausalLM model; integrate "dflash" into the speculative configuration types, hash computation, method detection, and argument verification; add "dflash"-specific auxiliary hidden state layer configurations within the Qwen3 model; and adjust position tensor initialization in the speculative decoding framework.

Review comments flag a type hint violation in get_eagle3_aux_hidden_state_layers, where a list is returned instead of a tuple for the "dflash" method, and a potential side effect in Qwen3Model's __init__ due to an in-place modification of self.config.eagle_config when updating drafter_config.

def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
def get_eagle3_aux_hidden_state_layers(self, method: str | None = None) -> tuple[int, ...]:
if method is not None and method == "dflash":
return [1, 9, 17, 25, 33]
Contributor


Severity: high

The function get_eagle3_aux_hidden_state_layers is type-hinted to return a tuple[int, ...], but for the dflash method, it returns a list. This violates the type hint and could lead to unexpected behavior. Please return a tuple instead.

Suggested change
return [1, 9, 17, 25, 33]
return (1, 9, 17, 25, 33)

Comment on lines +260 to +261
drafter_config = getattr(self.config, "eagle_config", {})
drafter_config.update(getattr(self.config, "dflash_config", {}))
Contributor


Severity: high

The update method is called on drafter_config, which might be a direct reference to self.config.eagle_config. This can lead to an unintended in-place modification of self.config.eagle_config, which could have side effects elsewhere. To avoid this, you should create a copy of eagle_config before updating it.

Suggested change
drafter_config = getattr(self.config, "eagle_config", {})
drafter_config.update(getattr(self.config, "dflash_config", {}))
drafter_config = getattr(self.config, "eagle_config", {}).copy()
drafter_config.update(getattr(self.config, "dflash_config", {}))
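A minimal standalone sketch of why the copy matters (the Config class and the config keys here are hypothetical, chosen only to demonstrate the aliasing behavior):

```python
class Config:
    """Stand-in for the model config object; attributes are illustrative."""


config = Config()
config.eagle_config = {"num_layers": 2}       # hypothetical key
config.dflash_config = {"method": "dflash"}   # hypothetical key

# Buggy pattern: getattr returns a reference to the same dict,
# so update() mutates config.eagle_config in place.
drafter_config = getattr(config, "eagle_config", {})
drafter_config.update(getattr(config, "dflash_config", {}))
assert "method" in config.eagle_config  # unintended side effect

# Fixed pattern: copy first, leaving config.eagle_config untouched.
config.eagle_config = {"num_layers": 2}
drafter_config = getattr(config, "eagle_config", {}).copy()
drafter_config.update(getattr(config, "dflash_config", {}))
assert "method" not in config.eagle_config  # original config preserved
```

Note that dict.copy() is shallow; if eagle_config ever held nested dicts that update() touched, a deep copy would be needed instead.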

@benchislett
Collaborator

Please leave as a draft PR until it is functional and ready-for-review, at which time you should include a PR description and unit tests.
