Load tuned fused_moe_lora shrink and expand kernel configs separately…#21
yugong333 wants to merge 2873 commits into wcwuwc:main from yugong333:json_config_loading
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
…lm-project#26728) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
…dels (vllm-project#26526) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>
…24354) Signed-off-by: Lu Fang <fanglu@fb.com>
…t#26732) Signed-off-by: mgoin <mgoin64@gmail.com>
…cifying compile sizes (vllm-project#26681) Signed-off-by: angelayi <yiangela7@gmail.com>
…NSE (vllm-project#26742) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
…-project#24024) Signed-off-by: n1ck-guo <heng.guo@intel.com> Signed-off-by: Heng Guo <heng.guo@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
…t#26602) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Update to default_act_function and pass as callable
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
…m-project#26723) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
…d. (alternative PR) (vllm-project#26718) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
…#26758) Signed-off-by: Ryan Li <ryanli@ryanli.org>
…-project#26684) Signed-off-by: wangyafeng <wangyafeng@baidu.com>
…26750) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
… ` (vllm-project#20983) Signed-off-by: Max Wittig <max.wittig@siemens.com> Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com> Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…llm-project#27085) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
…m-project#27169) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
enable passing activation func so act_wrapper in lora will be called …
…tural output are enabled (vllm-project#26586) Signed-off-by: southfreebird <yvorott@gmail.com>
…ole` (vllm-project#27166) Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com> Signed-off-by: Yi Zhang <zhangyi970819@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Andy Lo <andy@mistral.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Natan Bagrov <nbagrov@nvidia.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Natan Bagrov <nbagrov@nvidia.com> Co-authored-by: Roger Wang <hey@rogerw.io>
… H100 (FP8/BF16) (vllm-project#26268) Signed-off-by: Shivam <shivampr.dev@gmail.com>
…llm-project#27195) Signed-off-by: NickLucche <nlucches@redhat.com>
vllm-project#23812) Signed-off-by: n1ck-guo <heng.guo@intel.com> Signed-off-by: Heng Guo <heng.guo@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

Purpose
- fused_moe_lora kernel
- shrink and expand kernel configs in the fused_moe_lora function
- num_stages and num_warps parameters in the configs

Note: Based on PR vllm-project#21229 and vllm-project#26319
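For reviewers who want a concrete picture of what loading the shrink and expand configs separately could look like, here is a minimal sketch. It is not the code in this PR: the file names, the `DEFAULT_CONFIG` values, the helper names, and the nearest-batch-size lookup are all assumptions made for illustration. Only the `num_warps` and `num_stages` keys and the shrink/expand split come from the description above.

```python
import json
import os

# Hypothetical fallback values, used when no tuned file is present.
DEFAULT_CONFIG = {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 32,
    "GROUP_SIZE_M": 8,
    "num_warps": 4,
    "num_stages": 3,
}


def load_op_config(config_dir, op_type, m):
    """Return a kernel config for op_type ("shrink" or "expand").

    The JSON file is assumed to map a batch size (as a string key)
    to a dict of kernel parameters. The entry whose key is closest
    to m is merged over DEFAULT_CONFIG.
    """
    # Hypothetical file naming scheme: one file per operation.
    path = os.path.join(config_dir, f"fused_moe_lora_{op_type}.json")
    if not os.path.exists(path):
        return dict(DEFAULT_CONFIG)
    with open(path) as f:
        tuned = {int(k): v for k, v in json.load(f).items()}
    nearest = min(tuned, key=lambda k: abs(k - m))
    return {**DEFAULT_CONFIG, **tuned[nearest]}


def load_fused_moe_lora_configs(config_dir, m):
    """Load the shrink and expand configs independently of each other."""
    shrink = load_op_config(config_dir, "shrink", m)
    expand = load_op_config(config_dir, "expand", m)
    return shrink, expand
```

Keeping the two lookups independent means a tuned file for one operation does not force the other to use the same tile sizes, `num_warps`, or `num_stages`; a missing file simply falls back to the defaults.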
Test Plan
Test Result
Together with vllm-project#26319, this change improves OTPS (output tokens per second) by 80% to 90% on GPT-OSS-120B when concurrency is 1 or 2.
(Optional) Documentation Update