[distributed] add function to create ipc buffers directly #10064
youkaichao merged 4 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
The test failure is because we don't have A100 GPUs in our CI queue.

@youkaichao I can confirm this works on my machine.
world_size = dist.get_world_size(group=group)
rank = dist.get_rank(group=group)
handles = [None] * world_size
dist.all_gather_object(handles, handle, group=group)
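A minimal, self-contained sketch of the pattern in the quoted lines: exchanging opaque per-rank handles (e.g. serialized CUDA IPC handles) with all_gather_object, which pickles arbitrary Python objects and so avoids the tensor all_gather issues discussed below. The function name gather_handles is illustrative, not a vLLM API:

```python
import torch.distributed as dist


def gather_handles(handle: bytes, group=None):
    """Collect every rank's opaque handle (e.g. a CUDA IPC handle
    serialized to bytes) into a list ordered by rank."""
    world_size = dist.get_world_size(group=group)
    handles = [None] * world_size
    # all_gather_object pickles each rank's object, so the handle
    # never has to round-trip through a CUDA tensor.
    dist.all_gather_object(handles, handle, group=group)
    return handles
```

As noted in the review, the group passed here should be a CPU (e.g. gloo) group, so the collective does not touch the GPU at all.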
Why do we need to use broadcast with device=cpu in _gather_ipc_meta but not here?
When you use this function, the group argument should be the cpu_group passed to the custom allreduce object.
See vllm/vllm/distributed/parallel_state.py, lines 231 to 236 at commit 4089985.
Why is all_gather fine here but not in _gather_ipc_meta?
Oh, that is because we hit some issues with all_gather on tensors directly. Here we are using all_gather_object, so it should be fine. See pytorch/pytorch#126032 for the PyTorch issue.
I see. Someone refactored this to return a Tensor
https://github.com/vllm-project/vllm/pull/5047/files#diff-44d9d733ee604800cbce9858a9201db1044aeff2c940fa4a0521d0c9b6541b3eL137
A better way would be to return a string, if the torch bindings don't support int8 directly.
Yeah, a string should be fine. #5047 aims to get rid of pybind11 so that we can release Python-version-agnostic wheels.
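The string idea discussed here can be sketched as a simple hex round-trip: an opaque IPC handle (raw bytes) crosses the binding boundary as a plain string, sidestepping int8-tensor support entirely. encode_handle and decode_handle are hypothetical names for illustration, not vLLM functions:

```python
def encode_handle(raw: bytes) -> str:
    """Serialize an opaque IPC handle to a plain string, which any
    binding layer can pass through without tensor-dtype support."""
    return raw.hex()


def decode_handle(s: str) -> bytes:
    """Recover the original raw handle bytes from the hex string."""
    return bytes.fromhex(s)
```

The round-trip is lossless, so the receiving side can hand the exact original bytes back to the CUDA IPC APIs.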
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com>
PyTorch's IPC handle format can change, and using PyTorch for CUDA IPC means being exposed to those changes. See #9815 for an example.
cc @hanzhi713