[Model] Pipeline parallel support for Mixtral#6516
youkaichao merged 4 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Full CI run is still required to merge this PR so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well. To run full CI, you can do one of these:
f83603e to d74f2e6
Tested locally with PP=8 and it worked.
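For context on what PP=8 means here, a minimal sketch of how pipeline parallelism typically partitions a model's decoder layers across pipeline ranks. The function name and the partitioning scheme are illustrative assumptions, not vLLM's actual API; Mixtral-8x7B has 32 decoder layers, so with PP=8 each stage would hold 4 of them.

```python
# Hypothetical sketch (not vLLM's implementation): split num_layers decoder
# layers as evenly as possible across pp_size pipeline stages.
def partition_layers(num_layers: int, pp_size: int) -> list[range]:
    base, rem = divmod(num_layers, pp_size)
    ranges, start = [], 0
    for rank in range(pp_size):
        # Earlier ranks absorb the remainder when layers don't divide evenly.
        n = base + (1 if rank < rem else 0)
        ranges.append(range(start, start + n))
        start += n
    return ranges

# Mixtral-8x7B: 32 decoder layers, PP=8 -> 4 layers per stage.
print(partition_layers(32, 8))
```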
Can you test the correctness locally, using https://github.com/vllm-project/vllm/blob/main/tests/distributed/test_pipeline_parallel.py?
Passed with the following configuration. Note that I tested it on 8xL4, so I had to use 8 GPUs to host the model. I also fixed some issues in the test file:
# Use the same number or at most 8 GPUs to hold the model.
# In this test we assume the model can fit in 8 GPUs.
str(min(TP_SIZE * PP_SIZE, 8)),
It's not going to work: this will also run in multi-node tests with the mp backend, where we can use at most 2 GPUs.
You can revert this change and keep it only for your local testing.
Reverted with comments.
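To make the disagreement above concrete, a small sketch of the GPU-count logic being discussed. `TP_SIZE` and `PP_SIZE` follow the test file; the helper name and the `cap` parameter are illustrative assumptions, standing in for the number of GPUs the test environment actually provides.

```python
# Illustrative sketch (not vLLM code): world size for tensor x pipeline
# parallelism, capped by the GPUs available to the test.
def required_gpus(tp_size: int, pp_size: int, cap: int) -> int:
    return min(tp_size * pp_size, cap)

print(required_gpus(1, 8, cap=8))  # 8 on an 8xL4 node: the full PP=8 run fits
print(required_gpus(1, 8, cap=2))  # 2 on a multi-node CI worker: too few
                                   # ranks for PP=8, which is why the change
                                   # was reverted
```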
Signed-off-by: Alvant <alvasian@yandex.ru>
Signed-off-by: LeiWang1999 <leiwang1999@outlook.com>
Taken from #6403. Co-authored by @binxuan.
cc @youkaichao