[bugfix] Add ep initialization check and change the return check to is_driver_worker #896
Conversation
```diff
  # NOTE: Get port from envs directly when using torchrun
- port = int(os.environ.get("MASTER_PORT", answer))  # type: ignore
+ port = int(os.environ.get("VLLM_DP_MASTER_PORT", answer))  # type: ignore
```
Using `envs.VLLM_DP_MASTER_PORT` would be better?
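For illustration, a minimal sketch of the two reading styles under discussion. The `_Envs` class below is only a stand-in for vllm's real `envs` module (which centralizes environment-variable parsing), not its actual implementation:

```python
import os

# Make the demo deterministic: ensure the variable is unset here.
os.environ.pop("VLLM_DP_MASTER_PORT", None)

# `answer` stands in for the port computed by the surrounding rendezvous logic.
answer = 29500

# Style 1: read the raw environment variable directly, as the diff does.
port_direct = int(os.environ.get("VLLM_DP_MASTER_PORT", answer))

# Style 2: go through a centralized accessor. `_Envs` is only a stand-in for
# vllm's real `envs` module, which parses every variable in one place.
class _Envs:
    @property
    def VLLM_DP_MASTER_PORT(self) -> int:
        return int(os.environ.get("VLLM_DP_MASTER_PORT", "0"))

envs = _Envs()
port_via_envs = envs.VLLM_DP_MASTER_PORT or answer
```

The second style keeps parsing and defaults in one module instead of scattering `os.environ` reads through the codebase.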
(force-pushed from 9df5c0f to 293eefe)
```python
from torch.distributed.rendezvous import rendezvous
from vllm.config import ParallelConfig

_DP_GROUP = None
```
There is already a process group for DP in vllm now; why do we add this here?
This is used to decide whether to execute the dummy_run of the prefill process. The native stateless process group does not expose a global variable we can query.
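A minimal sketch of the pattern being described: caching the DP group in a module-level variable so later code can test for its presence. Names like `set_dp_group` are illustrative, not the actual vllm-ascend API:

```python
# Module-level cache for the DP process group; None until initialized.
_DP_GROUP = None

def set_dp_group(group):
    """Record the DP group created during distributed init (illustrative)."""
    global _DP_GROUP
    _DP_GROUP = group

def get_dp_group():
    return _DP_GROUP

def dp_initialized():
    # A check like this is what gates the prefill dummy_run logic: the
    # native stateless process group offers no global handle to query.
    return _DP_GROUP is not None
```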
```python
def model_parallel_initialized():
    return (_ETP is not None and _EP is not None)
```
I think we could use EP without ETP, so this check would break that scenario.
No. Even if ETP is not enabled, the communication groups are still created.
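To make the claim concrete, here is a toy model of that initialization path, assuming (as the reply states) that an ETP group object is created even when ETP is effectively disabled with size 1. The dict placeholders are not real process groups:

```python
# Placeholders for the expert-parallel (EP) and expert-tensor-parallel (ETP)
# groups; None until initialize_model_parallel runs.
_EP = None
_ETP = None

def initialize_model_parallel(ep_size, etp_size=1):
    global _EP, _ETP
    _EP = {"world_size": ep_size}
    # Even with ETP disabled (etp_size == 1), a group object is still created,
    # so the conjunction below remains valid.
    _ETP = {"world_size": etp_size}

def model_parallel_initialized():
    return (_ETP is not None and _EP is not None)
```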
@NINGBENZHE Thanks for the details! FYI, vllm-project/vllm#18763 has been merged now. But I think the global DP group should maybe still be kept in
@zzzzwwjj please take a look at this PR; it fixes a bug in graph mode.
```python
    intermediate_tensors=intermediate_tensors,
    inputs_embeds=inputs_embeds)
if self.enable_torchair_graph_mode and attn_state == AscendAttentionState.DecodeOnly:
    attn_metadata = self.attn_metadata_builder.dummy_build(
```
I can't seem to find where this dummy_build is implemented.
```python
        num_reqs=num_tokens, num_actual_tokens=1)
    torch._dynamo.mark_static(input_ids)
    torch._dynamo.mark_static(positions)
    torch._dynamo.mark_static(attn_metadata.decode.block_table)
```
How do you implement this dummy block_table? Did you request the free block ids from the scheduler?
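For readers puzzling over the same question, here is one plausible shape for such a `dummy_build`, sketched with plain Python lists instead of torch tensors. A zero-filled block table only has to satisfy shapes during graph capture and is never dereferenced against real KV-cache blocks, so no free block ids from the scheduler would be needed. All names are assumptions, not the vllm-ascend implementation:

```python
class DummyDecodeMetadata:
    """Illustrative container for the fields a dummy decode step needs."""
    def __init__(self, num_reqs, block_table):
        self.num_reqs = num_reqs
        self.block_table = block_table  # real code would use a torch tensor

def dummy_build(num_reqs, num_actual_tokens, max_blocks_per_req=4):
    # Zero-filled block table: valid shape for graph capture, never used to
    # address real KV-cache memory, so the scheduler is not involved.
    block_table = [[0] * max_blocks_per_req for _ in range(num_reqs)]
    return DummyDecodeMetadata(num_reqs=num_reqs, block_table=block_table)

meta = dummy_build(num_reqs=2, num_actual_tokens=1)
```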
```python
    attn_metadata.attn_state == AscendAttentionState.DecodeOnly)

if self.dp_group:
    while not self.has_prefilled and self.enable_torchair_graph_mode and attn_metadata.attn_state == AscendAttentionState.DecodeOnly:
```
This part is quite complicated. It is aimed at mixed running of decode and prefill across DP ranks, right? Can you extract this part out with a simpler strategy? As it stands, it may trigger unexpected errors when someone tries to update it.
Fixed. It will be removed along with its related calls once the official solution is implemented.
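To clarify what the loop above is doing, here is a single-process simulation of the strategy: decode-only ranks keep issuing dummy steps until every DP rank reports its prefill done, which keeps collective ops aligned across ranks. The flag exchange below stands in for a real `all_reduce`; all names are illustrative, not the PR's code:

```python
def all_ranks_prefilled(flags):
    # Stand-in for torch.distributed.all_reduce(MIN) over the per-rank
    # has_prefilled flags: 1 only once every rank has finished prefill.
    return min(flags) == 1

def sync_prefill(flags, run_dummy_decode):
    """Spin on dummy decode steps until all ranks have prefilled."""
    steps = 0
    while not all_ranks_prefilled(flags):
        run_dummy_decode()            # keeps collective ops aligned
        steps += 1
        flags = [1] * len(flags)      # simulate prefill finishing elsewhere
    return steps
```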
```python
positions = torch.cat([positions, padding])

if self.enable_torchair_graph_mode:
    self.sync_prefill_when_enable_graph(attn_metadata)
```
Can you add more comments here? This prefill/decode sync code piece will be removed in the future in favor of a more robust implementation.
This implementation relies on PR #839 and is more of a workaround for an emergency task. I just discussed this with @NINGBENZHE: some of the main code pieces will be wrapped into functions and removed after a more robust implementation is delivered. This should help with code maintenance and prevent us from diverging too far from the vllm main tree.
@wangxiyuan @Yikun Please be aware of this.
```python
    return (attn_metadata, hidden_states, spec_decode_metadata, positions,
            total_num_scheduled_tokens, sample_indices)

def sync_prefill_when_enable_graph(self, attn_metadata):
```
This part will be removed after chunked MC2 support lands, cc @zzzzwwjj.
DP support will be implemented soon in the next PR, so I think this workaround could temporarily stay out of main?
When will the next PR be merged?
After #839 is merged.
Would it be possible to merge this first, since another feature depends on the current functionality? When you submit the next PR, you could remove the related content then.
If this PR is so urgent, perhaps you could add an option so these changes are disabled by default?
@wangxiyuan can you look at this PR and decide whether it needs to be merged?
The code is written in model_runner_v1 and explanatory remarks have been added.
The DP dummy-run support on torchair will be handled in another PR; this PR will not be merged into vllm-ascend.
(force-pushed from 2a308fa to 552daf0)
```diff
- world_szie = torch.distributed.get_world_size()
- tp_size = world_szie // all_to_all_group_size
+ tp_size = get_etp_group().world_size
```
@ganyi1996ppo please double check this change.
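One way to see why this line matters (the numbers below are illustrative, and this reading of the motivation is an assumption, not stated in the PR): dividing the global world size by the all-to-all group size only yields the right tensor-parallel size when no other parallel dimension contributes to `world_size`, whereas asking the ETP group for its own `world_size` is always correct.

```python
# Illustrative sizes: 8 EP ranks x 2 ETP ranks, no data parallelism.
all_to_all_group_size = 8
etp_group_world_size = 2   # what get_etp_group().world_size would report

world_size = 16
tp_size_derived = world_size // all_to_all_group_size  # 2: happens to agree

# Add DP=2 and the derivation breaks, while the direct query stays correct.
world_size_with_dp = 32
tp_size_derived_dp = world_size_with_dp // all_to_all_group_size  # 4: wrong
tp_size_direct = etp_group_world_size                             # still 2
```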
The DP-related change has been removed. I'm fine with this PR. Thanks for the fix.
Signed-off-by: ningbenzhe1 <ningbenzhe@huawei.com>
Please update the PR title and message to be more meaningful.
Fixed.
### What this PR does / why we need it?
Fixes a bug where graph mode behaved the same for prefill (p) and decode (d), along with some other bugs.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Via the end-to-end tests.