Integrate quick allreduce and select the best allreduce implementation#18473
lihaoyang-amd wants to merge 9 commits into vllm-project:main
Force-pushed from 5a6a5a8 to e272658
|
What's the relationship between this and #16804? |
Aha, maybe we are in competition. We're from AMD. We recently spent some time integrating quick reduce (qr) into vLLM, because qr is well suited to ROCm. Integrating qr gives the two PRs many similarities, but the PR you mentioned, #16804, seems to support only Q8 and Q4. It has no obvious boundary conditions, its quantization seems to have some problems, and it lacks experimental data. Maybe we can work together to finish the work. |
Force-pushed from 08caa03 to 0989304
|
This pull request has merge conflicts that must be resolved before it can be |
Force-pushed from 0989304 to 84b2ca1
|
Force-pushed from d280d21 to f194cac
|
Force-pushed from f194cac to 50bd787
|
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Force-pushed from 50bd787 to 3458abc
|
@youkaichao hi, kaichao, |
|
I wish qr itself contained the logic for choosing between qr and custom allreduce, since their interfaces are nearly the same. My request is that we don't touch the CUDA code path, so that people reading the code don't need to think about quick reduce. Graph-mode allreduce is necessary for some low-latency workloads where the batch size is small. |
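To make the request above concrete, here is a minimal sketch of what "the selection logic lives with qr" could look like. Everything in it is an assumption for illustration: the `Backend` and `AllReduceSelector` names, the byte thresholds, and the fallback behavior are hypothetical and are not taken from this PR or from vLLM. The point is only that the quick reduce backend is optional, so a CUDA build that never registers it keeps its existing custom allreduce path unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one allreduce implementation."""
    name: str
    min_bytes: int        # smallest message this backend should take
    max_bytes: int        # largest message this backend should take
    graph_capable: bool   # whether it may run under graph capture

    def accepts(self, nbytes: int, graph_mode: bool) -> bool:
        # Reject graph-mode calls on backends that cannot be captured.
        if graph_mode and not self.graph_capable:
            return False
        return self.min_bytes <= nbytes <= self.max_bytes


class AllReduceSelector:
    """Hypothetical selector: prefers quick reduce when it is registered."""

    def __init__(self, custom: Backend, quick: Optional[Backend] = None):
        self.custom = custom
        self.quick = quick  # stays None on platforms without quick reduce

    def select(self, nbytes: int, graph_mode: bool = False) -> Optional[str]:
        if self.quick is not None and self.quick.accepts(nbytes, graph_mode):
            return self.quick.name
        if self.custom.accepts(nbytes, graph_mode):
            return self.custom.name
        return None  # caller falls back to its default collective


# Thresholds below are made-up numbers, chosen only to exercise the logic.
MIB = 1024 * 1024
CUSTOM = Backend("custom_allreduce", 0, 8 * MIB, graph_capable=True)
QUICK = Backend("quick_reduce", 2 * MIB, 64 * MIB, graph_capable=True)

rocm_selector = AllReduceSelector(CUSTOM, QUICK)  # quick reduce registered
cuda_selector = AllReduceSelector(CUSTOM)         # CUDA path: custom only
```

With these illustrative thresholds, `rocm_selector.select(4 * MIB)` picks the quick reduce backend, while a small graph-mode message such as `rocm_selector.select(1024, graph_mode=True)` still goes to custom allreduce, and `cuda_selector` never sees quick reduce at all.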
Hi, @youkaichao |
|
Closing in favor of #19744. |