[Model] Pipeline parallel support for Mixtral #6403
binxuan wants to merge 4 commits into vllm-project:main
👋 Hi! Thank you for contributing to the vLLM project. Full CI run is still required to merge this PR so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well. To run full CI, you can do one of these:
Can you fix the format, and also use the latest change from #6406?
Yeah sure, will use the format proposed in this PR.
@binxuan I also need Mixtral PP support. If you don't have bandwidth, I could take over and file another PR with you as a co-author. Please let me know. Thanks.
Sure, feel free to collaborate.
Closing as superseded by #6516.
Add pipeline parallel (PP) support for Mixtral. This is an extension of a previously merged PR.
Below are the tests we have performed:
Environment: two AWS P4D instances, each with 8 A100 GPUs.
Model checkpoint: Mixtral-8x22B-Instruct-v0.1 running with TP=8 and PP=2
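To make the TP=8, PP=2 setup concrete, here is a rough sketch of the arithmetic behind it: the total number of GPU workers is TP × PP, and the decoder layers are split into contiguous ranges, one per pipeline stage. The helper `partition_layers` is hypothetical and is not vLLM's actual partitioning code. The layer count of 56 is an assumption about Mixtral-8x22B and should be checked against the checkpoint's `config.json`.

```python
def partition_layers(num_layers: int, pp_size: int) -> list[tuple[int, int]]:
    """Split layer indices [0, num_layers) into pp_size contiguous (start, end) ranges.

    Hypothetical helper for illustration only; not taken from vLLM.
    """
    base, extra = divmod(num_layers, pp_size)
    ranges = []
    start = 0
    for stage in range(pp_size):
        # The first `extra` stages each take one additional layer.
        end = start + base + (1 if stage < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


TP_SIZE = 8      # tensor parallel size, from the PR description
PP_SIZE = 2      # pipeline parallel size, from the PR description
NUM_LAYERS = 56  # ASSUMPTION: decoder layer count of Mixtral-8x22B

world_size = TP_SIZE * PP_SIZE          # total GPU workers needed
stages = partition_layers(NUM_LAYERS, PP_SIZE)

print(world_size)  # 16, matching two instances with 8 GPUs each
print(stages)      # [(0, 28), (28, 56)]
```

Under these assumptions, each P4D instance would hold one pipeline stage, with tensor parallelism spanning the 8 GPUs inside that instance.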
Case 1:
Input: {"role": "user", "content": "Who won the world series in 2020?"}
Output: {"role":"assistant","content":" The Los Angeles Dodgers won the World Series in 2020. They defeated the Tampa Bay Rays in six games. It was their first championship since 1988. The series was played at Globe Life Field in Arlington, Texas, due to the COVID-19 pandemic.","tool_calls":[]}
Case 2:
Input: {"role": "user", "content": "Who are you?"}
Output: {"role":"assistant","content":" I am an artificial intelligence assistant, designed to help answer questions, provide information, and assist with various tasks. I don't have personal experiences, emotions, or consciousness, but I can process and generate text based on the data I've been trained on.","tool_calls":[]}
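For anyone wanting to replay these cases, the following is a minimal sketch of how the inputs could be wrapped into an OpenAI-style chat request body and how the assistant text could be pulled out of an output like the ones above. The helper names `build_chat_request` and `extract_reply` are hypothetical, the served model name is an assumption, and no request is actually sent.

```python
import json


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat request body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def extract_reply(raw_message: str) -> str:
    """Parse an assistant message serialized as JSON and return its text."""
    message = json.loads(raw_message)
    return message["content"].strip()


# ASSUMPTION: the name under which the checkpoint is served.
MODEL_NAME = "Mixtral-8x22B-Instruct-v0.1"

payload = build_chat_request(MODEL_NAME, "Who won the world series in 2020?")

# Output of Case 1, copied from the PR description.
case_1_output = (
    '{"role":"assistant","content":" The Los Angeles Dodgers won the World '
    'Series in 2020. They defeated the Tampa Bay Rays in six games. It was '
    'their first championship since 1988. The series was played at Globe Life '
    'Field in Arlington, Texas, due to the COVID-19 pandemic.","tool_calls":[]}'
)

print(json.dumps(payload, indent=2))
print(extract_reply(case_1_output))
```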