
[0.9.1][bugfix] fix deepseek memory bug #1551

Merged
ganyi1996ppo merged 1 commit into vllm-project:v0.9.1-dev from zzzzwwjj:v0.9.1-dev
Jul 2, 2025

Conversation

@zzzzwwjj
Collaborator

@zzzzwwjj zzzzwwjj commented Jul 1, 2025

What this PR does / why we need it?

Fix an OOM error that occurs when chunked_prefill_for_mla is enabled and the input is long.

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Review comment on vllm_ascend/worker/model_runner_v1.py (outdated)
Collaborator


Could a compressed mask be used here? If the sequence is very long, a full mask may waste memory.

See #1100 for reference.

Collaborator Author


The RingMLA mask size is equal to the chunk size, so it won't waste memory.
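The reply above turns on how mask memory scales with input length. A rough sketch, assuming the mask is a dense square tensor with one fp16 element per (query, key) pair; the PR does not state the actual shape or dtype, and `seq_len` and `chunk_size` below are hypothetical values. A mask sized by the whole sequence grows quadratically with the prompt, while one sized by the chunk stays fixed.

```python
# Hypothetical back-of-the-envelope helper: bytes needed for a dense square
# mask of side n, assuming one element per (query, key) pair.
def dense_mask_bytes(n: int, bytes_per_elem: int = 2) -> int:
    """Memory for an n x n mask; bytes_per_elem=2 assumes an fp16 mask."""
    return n * n * bytes_per_elem


GiB = 1024 ** 3
MiB = 1024 ** 2

seq_len = 128 * 1024  # hypothetical long prompt: 131072 tokens
chunk_size = 8192     # hypothetical prefill chunk size

full = dense_mask_bytes(seq_len)        # mask sized by the whole sequence
chunked = dense_mask_bytes(chunk_size)  # mask sized by a single chunk

print(f"full-sequence mask: {full / GiB:.0f} GiB")
print(f"chunk-sized mask:   {chunked / MiB:.0f} MiB")
```

With these hypothetical numbers the full-sequence mask needs 32 GiB while the chunk-sized one needs 128 MiB (256 times smaller), and only the former keeps growing as the prompt gets longer.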

@wangxiyuan wangxiyuan changed the title [bugfix] fix deepseek memory bug [0.9.1][bugfix] fix deepseek memory bug Jul 2, 2025
@ganyi1996ppo
Collaborator

Looks good. Please cherry-pick this change back to main.

@ganyi1996ppo ganyi1996ppo merged commit 129a472 into vllm-project:v0.9.1-dev Jul 2, 2025
15 checks passed
@Yikun Yikun added the no-main label Jul 14, 2025
Review comment on vllm_ascend/worker/model_runner_v1.py (outdated)
Collaborator


Should we take into account which scheduler is being used?

self.use_ring_mla = (ascend_config.chunked_prefill_for_mla
                     or not ascend_config.ascend_scheduler_config.enabled)
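The suggested condition can be read as a small truth table. A minimal sketch, assuming hypothetical dataclasses `AscendConfig` and `AscendSchedulerConfig` that only mirror the attribute names used in the comment (they are not vllm-ascend's real config classes), and a hypothetical helper `should_use_ring_mla`: Ring MLA stays off only when `chunked_prefill_for_mla` is false and the Ascend scheduler is enabled.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the config objects named in the suggestion; only
# the attribute names come from the review comment.
@dataclass
class AscendSchedulerConfig:
    enabled: bool = False


@dataclass
class AscendConfig:
    chunked_prefill_for_mla: bool = False
    ascend_scheduler_config: AscendSchedulerConfig = field(
        default_factory=AscendSchedulerConfig)


def should_use_ring_mla(ascend_config: AscendConfig) -> bool:
    # Ring MLA is selected when chunked prefill for MLA is requested
    # explicitly, or when the Ascend scheduler is not enabled.
    return (ascend_config.chunked_prefill_for_mla
            or not ascend_config.ascend_scheduler_config.enabled)


if __name__ == "__main__":
    # Print all four combinations of the two flags.
    for chunked in (False, True):
        for sched_enabled in (False, True):
            cfg = AscendConfig(chunked, AscendSchedulerConfig(sched_enabled))
            print(f"chunked_prefill_for_mla={chunked!s:5} "
                  f"ascend_scheduler.enabled={sched_enabled!s:5} "
                  f"-> use_ring_mla={should_use_ring_mla(cfg)}")
```

Presumably the second clause exists because, with the Ascend scheduler disabled, vLLM's default V1 scheduler may split prefills into chunks anyway; the thread does not confirm this reasoning.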

jianzs pushed a commit to jianzs/vllm-ascend that referenced this pull request Jul 29, 2025
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
jianzs pushed a commit to jianzs/vllm-ascend that referenced this pull request Jul 29, 2025
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
jianzs pushed a commit to jianzs/vllm-ascend that referenced this pull request Aug 6, 2025
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
jianzs pushed a commit to jianzs/vllm-ascend that referenced this pull request Aug 7, 2025
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
jianzs pushed a commit to jianzs/vllm-ascend that referenced this pull request Aug 8, 2025
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
jianzs pushed a commit to jianzs/vllm-ascend that referenced this pull request Aug 11, 2025
fix OOM error when `chunked_prefill_for_mla` is enabled and the input is long.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>


5 participants