[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply #29066

vllm-bot merged 24 commits into vllm-project:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Looks like we need to land #29067 first? Marking draft until that lands 👍
Yeah, sorry, this should have been a draft to start with. It needs the other PR to land first.
Hi @bnellnm, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
mgoin
left a comment
Great work Bill! Thinking now on the state, the main downside I see of this move is that, since we aren't passing arguments into the function, it is less explicit for each apply method to assert against unsupported modes as we add more. Hopefully we can move to a system like the attention backends for each kernel backend.
Purpose
After making `select_experts` a non-static method (#29067), we can avoid passing most of the arguments to `FusedMoEMethodBase.apply` and get them from the layer directly. This doesn't really decrease flexibility: the MoE quantization methods were already fairly tightly coupled with `FusedMoE`. A further step could be making `FusedMoE` an abstract base class that simply provides the `quant_method` parameters; the current `FusedMoE` class would then become a subclass of that.

Test Plan
CI
Test Result
The Blackwell failure is a known bug fixed by #30336.
cc @varun-sundar-rabindranath
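To make the shape of the change concrete, here is a minimal, hypothetical sketch of the idea described in Purpose: routing parameters that used to be threaded through `apply` as arguments are instead read off the layer object. The class and method names echo the PR's vocabulary, but the signatures and the toy `FusedMoELayer` are illustrative assumptions, not vLLM's real API.

```python
import torch

class FusedMoELayer:
    """Toy stand-in for a FusedMoE layer that owns its routing config."""
    def __init__(self, top_k: int, renormalize: bool):
        self.top_k = top_k
        self.renormalize = renormalize

class MoEMethod:
    def select_experts(self, router_logits, top_k, renormalize):
        # Standard top-k softmax routing.
        weights = torch.softmax(router_logits, dim=-1)
        topk_weights, topk_ids = torch.topk(weights, top_k, dim=-1)
        if renormalize:
            topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
        return topk_weights, topk_ids

    # Before (conceptually): routing arguments passed explicitly to apply().
    def apply_old(self, layer, router_logits, top_k, renormalize):
        return self.select_experts(router_logits, top_k, renormalize)

    # After: apply() pulls top_k / renormalize from the layer directly.
    def apply(self, layer: FusedMoELayer, router_logits: torch.Tensor):
        return self.select_experts(router_logits, layer.top_k, layer.renormalize)

layer = FusedMoELayer(top_k=2, renormalize=True)
method = MoEMethod()
logits = torch.randn(4, 8)  # 4 tokens, 8 experts
w_old, ids_old = method.apply_old(layer, logits, layer.top_k, layer.renormalize)
w_new, ids_new = method.apply(layer, logits)
# Both paths produce identical routing decisions.
assert torch.equal(ids_old, ids_new) and torch.allclose(w_old, w_new)
```

The coupling mgoin notes above is visible here: the slimmer `apply` only works because the layer is guaranteed to carry the routing config, which is exactly the tight `FusedMoE` coupling the PR description acknowledges.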