[Feature] Optimize forward metadata collection across dp ranks #1857
jianzs wants to merge 1 commit into vllm-project:main
Conversation
Codecov Report

@@ Coverage Diff @@
##             main    #1857   +/-   ##
=======================================
  Coverage   72.55%   72.56%
=======================================
  Files         146      146
  Lines       21710    21706     -4
=======================================
- Hits        15752    15751     -1
+ Misses       5958     5955     -3
Rebase here: f9dfde0
7b605f8 to 2b0c3c1
vllm_ascend/worker/worker_v1.py
Outdated
if runner.dp_size > 1:
    max_num_tokens, with_prefill = runner._get_forward_metadata_across_dp(
        max_num_tokens, with_prefill)
if runner.dp_size <= 1:
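For context, the reduction this helper performs can be sketched as below. The function name and semantics are assumptions based on the surrounding diff, not the PR's actual implementation: all DP ranks must pad to a common token count (max across ranks) and agree on prefill if any rank has a prefill request.

```python
def get_forward_metadata_across_dp(gathered):
    # gathered: one (max_num_tokens, with_prefill) pair per DP rank,
    # as if returned by an all_gather (simulated here with a list).
    # Assumed semantics: ranks align on the largest token count and
    # run the prefill path if any rank needs it.
    max_num_tokens = max(tokens for tokens, _ in gathered)
    with_prefill = any(prefill for _, prefill in gathered)
    return max_num_tokens, with_prefill
```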
Let's do the torchair refactor first, then add the related code there: #1885
What are your plans for the model_runner refactoring?
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Please rebase to fix the merge conflict if this PR is still needed.
ca448e1 to 5792cb0
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
5792cb0 to c05804d
with_prefill = bool(num_tokens_across_dp[-2])
enable_dbo = not bool(num_tokens_across_dp[-1])
num_tokens_across_dp = num_tokens_across_dp[:-2]
local_forward_metadata = torch.tensor(
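The indexing above implies the boolean flags ride in the last two slots of the token-count tensor. A minimal pack/unpack sketch matching that layout follows; the helper names are hypothetical, only the tail indexing mirrors the diff.

```python
import torch


def pack_forward_metadata(num_tokens_per_rank, with_prefill, enable_dbo):
    # Hypothetical helper: append the prefill flag and the negated DBO
    # flag as integers, so one int tensor can travel in a single
    # collective instead of several separate boolean exchanges.
    flags = [int(with_prefill), int(not enable_dbo)]
    return torch.tensor(num_tokens_per_rank + flags, dtype=torch.int32)


def unpack_forward_metadata(packed):
    # Mirrors the unpacking in the diff: read the flags from the tail,
    # then strip them so only per-rank token counts remain.
    with_prefill = bool(packed[-2].item())
    enable_dbo = not bool(packed[-1].item())
    num_tokens_across_dp = packed[:-2]
    return num_tokens_across_dp, with_prefill, enable_dbo
```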
Please provide the data for the memory increase.
Resubmit #1593
What this PR does / why we need it?
This PR introduces two optimizations for cases where the data parallel size is greater than 1:
1. Eliminates DP communication in `set_forward_context`.
2. Implements HCCL-based DP metadata communication, resulting in significant performance improvements for large DP configurations.
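To illustrate why packing metadata helps, here is a minimal end-to-end sketch with the collective simulated in plain Python (the function name, argument names, and the list-based "all_gather" are illustrative, not the PR's actual API): each rank contributes one integer row, and every global decision is derived from the gathered rows in a single exchange.

```python
def gather_metadata_one_collective(per_rank_tokens, per_rank_prefill,
                                   per_rank_dbo):
    # Each DP rank packs its scalars into one integer row so a single
    # all_gather can replace several separate boolean broadcasts.
    packed = [[t, int(p), int(not d)]
              for t, p, d in zip(per_rank_tokens, per_rank_prefill,
                                 per_rank_dbo)]
    # Simulated all_gather: every rank would see the full list.
    num_tokens_across_dp = [row[0] for row in packed]
    with_prefill = any(row[1] for row in packed)
    # DBO stays enabled only if no rank disabled it.
    enable_dbo = not any(row[2] for row in packed)
    return num_tokens_across_dp, with_prefill, enable_dbo
```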
Does this PR introduce any user-facing change?
no
How was this patch tested?
CI passed.