
[Perf] Deepseekv3 performance optimization for eager mode #598

Merged
wangxiyuan merged 15 commits into vllm-project:main from ganyi1996ppo:ganyi/deepseek_v0_scheduler
Apr 29, 2025

Conversation

@ganyi1996ppo
Collaborator

What this PR does / why we need it?

Deepseek v3 currently adopts vanilla chunked prefill on the MLA part, which is inefficient for computation but necessary for chunked prefill. Since PR #543 brought the v0 scheduler into vllm-ascend, we can now adopt torch_npu._npu_flash_attention inside the MLA backend for a further performance boost. There is also some redundant computation inside the rope, which is removed as well. This PR should bring a performance gain for deepseek eager mode inference.
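
For reference, a minimal sketch of the kind of prefill kernel call this moves to; the argument names mirror other torch_npu._npu_flash_attention call sites in vllm-ascend, and the shapes and mask convention here are assumptions for illustration, not the exact code in this diff:

    import torch
    import torch_npu  # Ascend adapter; requires an NPU device at runtime

    # Toy prefill batch (hypothetical shapes): 2 requests of 4 tokens each,
    # 8 heads, head_dim 128 (the kernel wants head_dim divisible by 128).
    seq_lens = torch.tensor([4, 4], dtype=torch.int32)
    q = torch.randn(8, 8, 128).npu()
    k = torch.randn(8, 8, 128).npu()
    v = torch.randn(8, 8, 128).npu()
    # Causal mask shared across requests, sized for the longest one.
    mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1).npu()
    out = torch.empty_like(q)
    torch_npu._npu_flash_attention(
        query=q, key=k, value=v, mask=mask, seq_len=seq_lens,
        scale_value=128 ** -0.5, num_heads=8, num_kv_heads=8, out=out)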

Does this PR introduce any user-facing change?

How was this patch tested?

@ganyi1996ppo
Collaborator Author

Testing this PR on a [2048-in, 128-out] * 16 scenario with tp=2, ep=2 on deepseekv3-lite in eager mode, we got roughly a 50% perf boost.

@ganyi1996ppo force-pushed the ganyi/deepseek_v0_scheduler branch from 7a49e27 to a61cf81 on April 22, 2025 03:59
@ganyi1996ppo
Collaborator Author

ganyi1996ppo commented Apr 22, 2025

Found that the custom rotary_embedding was not enabled before; after enabling custom rotary_embedding, the performance gain reaches roughly 100% compared to the main branch.

@ganyi1996ppo
Collaborator Author

BTW, this PR plugs the v0 scheduler into engine v1 by default. Maybe we should discuss this @wangxiyuan @wuhuikx.

@ganyi1996ppo requested a review from wangxiyuan on April 22, 2025 05:34
@ganyi1996ppo
Collaborator Author

@wuhuikx @wangxiyuan please help review this PR.

@ganyi1996ppo
Collaborator Author

Disabled default v0 scheduler

@wuhuikx @wangxiyuan please help review this PR.


# TODO: Patch when aclnn ops available
RotaryEmbedding.forward_oot = rope_forward_oot
DeepseekScalingRotaryEmbedding.__init__ = deepseek_rope_init_func
Collaborator

For this kind of change, which overrides vLLM, we should add a note to describe why. L269 is the same.

Contributor

Is DeepseekScalingRotaryEmbedding not a custom op in stock vLLM? Can we file a PR to vLLM to make it a custom op, and after that use forward_oot, right?

Collaborator Author

It is a custom op, but there are a few things that make these changes necessary.

  • the native deepseek rope caches the cos_sin differently compared with the naive implementation in deepseek huggingface, which we found is more Ascend friendly than vllm's impl
  • it just overrides the forward rather than reusing the custom op's interface.

I'll add some comments on those lines to explain the specific reason.
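
A minimal sketch of the override pattern under discussion (the function bodies are placeholders; the real rope_forward_oot and deepseek_rope_init_func live in vllm_ascend and call NPU kernels):

    from vllm.model_executor.layers.rotary_embedding import (
        DeepseekScalingRotaryEmbedding, RotaryEmbedding)

    _native_init = DeepseekScalingRotaryEmbedding.__init__

    def deepseek_rope_init_func(self, *args, **kwargs):
        # Run the native init, then (in the real patch) re-cache cos/sin in
        # the HuggingFace layout, which is more Ascend friendly.
        _native_init(self, *args, **kwargs)

    def rope_forward_oot(self, positions, query, key, offsets=None):
        # Out-of-tree forward: the real patch dispatches to the Ascend rope
        # kernel; falling back to the reference path keeps this sketch valid.
        return self.forward_native(positions, query, key, offsets)

    # Same two assignments as the diff above.
    RotaryEmbedding.forward_oot = rope_forward_oot
    DeepseekScalingRotaryEmbedding.__init__ = deepseek_rope_init_func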

if self._num_prefills > 0:
    reqs_start = self._num_decodes  # prefill_start
    tokens_start = self._num_decode_tokens
    max_query_len = query_lens[tokens_start:].max().item()
Contributor

Is query_lens a device tensor? If so, there are many D2H copies here; is this operation necessary?

Collaborator Author

query_lens is actually a CPU tensor, so no D2H operation will happen here; you can refer to line 220.
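
A tiny illustration of the point (hypothetical values; in the backend the tensor is built on the host):

    import torch

    # Hypothetical batch: 2 decode requests followed by 2 prefill requests.
    query_lens = torch.tensor([1, 1, 512, 384])  # CPU tensor by default
    num_decodes = 2

    # .max().item() on a CPU tensor runs entirely on the host, so no
    # device-to-host copy is triggered here.
    max_query_len = query_lens[num_decodes:].max().item()
    print(max_query_len)  # 512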

# TODO: the padding below should be removed after the kernel is ready
# We found npu_flash_attention only works on a 128-divisible head_dim, so we pad it to the target size here
# and slice the final result to guarantee its functionality.
self.padding_head_dim = (
Contributor

In prefill we use MHA for computation, so head_dim = nope_dim + rope_dim (192), while in decode the absorbed and move_elision strategies are adopted, so head_dim = nope_dim and we don't need padding, am I right?

Collaborator Author

You are definitely right; this padding dim is used in prefill to pad the tensor, not just for v_head_dim vs (qk_rope + qk_nope) but also for the 128-divisible head_dim alignment requirement of _npu_flash_attention.
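
A small sketch of the padding described above (toy shapes; the real code pads q/k/v inside the MLA backend):

    import torch
    import torch.nn.functional as F

    qk_nope_head_dim, qk_rope_head_dim = 128, 64
    head_dim = qk_nope_head_dim + qk_rope_head_dim   # 192 in prefill MHA
    # Round up to the next multiple of 128 for _npu_flash_attention.
    padding_head_dim = -(-head_dim // 128) * 128     # -> 256

    q = torch.randn(16, 8, head_dim)                 # [tokens, heads, dim]
    q_padded = F.pad(q, (0, padding_head_dim - head_dim))
    # ... attention runs on the padded tensors ...
    out = q_padded[..., :head_dim]                   # slice back to 192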

@ganyi1996ppo
Collaborator Author

ganyi1996ppo commented Apr 27, 2025

The above data is measured under the v0 scheduler + v1 engine. If the v0 scheduler is not enabled, vanilla chunked prefill is used to guarantee functionality, which is stitched together from a bunch of small ops. The vanilla path harms performance enormously with lots of additional host and device operations, so we strongly recommend adopting scheduler v0 for perf tests by adding just one single line.

To enable the v0 scheduler in engine v1, pass additional_config with the dict ascend_scheduler_config. For example, in your LLM offline_inference script, pass additional_config with the kwarg ascend_scheduler_config to activate the v0 scheduler in the v1 engine:

    llm = LLM(model="/data/weights/deepseek-ai/deepseekv3-lite-base-latest",
              tensor_parallel_size=2,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024,
              additional_config={"ascend_scheduler_config": {}})

@ganyi1996ppo changed the title from "Deepseekv3 performance optimization for eager mode" to "[Perf] Deepseekv3 performance optimization for eager mode" on Apr 27, 2025
Comment thread: vllm_ascend/platform.py
else:
    CUSTOM_OP_ENABLED = True
except ImportError:
    logging.warning(
Collaborator

Using the logger from vllm is better. And line 39 can be moved to L33. We can fix the nit later.
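
A small sketch of the suggested fix, assuming vllm's init_logger helper; the import target for the custom-op extension is illustrative, not the exact module name in the diff:

    from vllm.logger import init_logger

    logger = init_logger(__name__)

    try:
        # Hypothetical extension module name, for illustration only.
        import vllm_ascend.vllm_ascend_C  # noqa: F401
        CUSTOM_OP_ENABLED = True
    except ImportError:
        CUSTOM_OP_ENABLED = False
        logger.warning(
            "Failed to import custom ops; falling back to the native path.")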

@wangxiyuan merged commit 0329fad into vllm-project:main Apr 29, 2025
14 checks passed
@wangxiyuan
Collaborator

The above data is measured under the v0 scheduler + v1 engine. If the v0 scheduler is not enabled, vanilla chunked prefill is used to guarantee functionality, which is stitched together from a bunch of small ops. The vanilla path harms performance enormously with lots of additional host and device operations, so we strongly recommend adopting scheduler v0 for perf tests by adding just one single line.

To enable the v0 scheduler in engine v1, pass additional_config with the dict ascend_scheduler_config. For example, in your LLM offline_inference script, pass additional_config with the kwarg ascend_scheduler_config to activate the v0 scheduler in the v1 engine:

    llm = LLM(model="/data/weights/deepseek-ai/deepseekv3-lite-base-latest",
              tensor_parallel_size=2,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024,
              additional_config={"ascend_scheduler_config": {}})

This info is very useful. Considering that we plan to write a feature doc about the ascend scheduler, this info can be included there.

Yikun added a commit that referenced this pull request Jun 9, 2025
### What this PR does / why we need it?
As a follow-up to #1070, this patch adds a `Nominating and Removing Maintainers` section (referencing some design from [PyTorch Governance](https://docs.pytorch.org/docs/stable/community/governance.html)).

Below is key info about the existing maintainers:

## @wangxiyuan:
- Super active, high quality code reviewer: [450+ PRs reviewed](https://github.com/vllm-project/vllm-ascend/pulls?q=commenter%3Awangxiyuan).
- One of the top contributors: he has also contributed [50+ commits](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+review%3Aapproved+author%3Awangxiyuan+) with good quality, and he dares to [refactor the code](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Awangxiyuan+is%3Aclosed+refactor), which also shows his deep understanding of vllm and vllm-ascend.
- He led the [[RFC]: Hardware pluggable](vllm-project/vllm#11162) feature, which made the vllm-ascend project possible.
- Active community involvement across the WeChat group, Slack, and GitHub issues: he has participated in [150+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Awangxiyuan) and helped users. He was also a speaker at the vLLM Beijing meetup, helping more users understand vLLM Ascend.
- Release manager of [v0.7.1rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.1rc1), [v0.7.3rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc1), [v0.7.3rc2](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc2), [v0.8.4rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.8.4rc1), [v0.7.3.post1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3.post1).

## @Yikun:
- Highly active code reviewer: [190+ PRs reviewed](https://github.com/vllm-project/vllm-ascend/pulls?q=commenter%3AYikun), especially helping new developers onboard.
- One of the top contributors with sustained contributions: [50+ commits](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+review%3Aapproved+author%3AYikun+) since the first day of vLLM Ascend.
- High quality contributions around the vLLM compatibility guarantee; he also maintains the [CI](#1040) and the [test framework](#730).
- Active community involvement across the local group and GitHub issues: he has participated in [170+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3AYikun). He is also the main organizer of the vLLM Beijing Meetup and a speaker at [PyTorch Day China 2025](https://pytorchdaychina2025.sched.com/event/2401V/poster-session), helping vLLM Ascend grow.
- Release manager of [v0.8.4rc2](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.8.4rc2), [v0.8.5rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.8.5rc1), [v0.7.3](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3).

## @ganyi1996ppo
- Highly active, high quality code reviewer: [90+ PRs reviewed](https://github.com/vllm-project/vllm-ascend/pulls?q=commenter%3Aganyi1996ppo); he has a deep understanding of Ascend operators, can always find key issues, understands the codebase deeply, and shows good code quality and sound judgement.
- Major, high quality contributions: [10+ commits](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+review%3Aapproved+author%3Aganyi1996ppo).
- He is the main contributor of [Custom AscendC op support](#371) and [Deepseekv3 performance optimization](#598).
- Community involvement: he has participated in [11+ issues and helped users](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Aganyi1996ppo), and shared a [custom ops topic](https://www.bilibili.com/video/BV1Z25az3EqS/?share_source=copy_web&vd_source=72ef9c665af5f2f1370abe26ce1f719f&t=1342) at the vLLM Ascend weekly meeting.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025