[torch.compile] Speed up MOE handling in forward_context #33184
Merged
zou3519 merged 1 commit into vllm-project:main on Jan 27, 2026
Conversation
This is a follow-up to the comments on vllm-project#32805. It contains the following two perf optimizations:
- We don't need to recompute all of the MOE layer names on every forward pass. Instead, we can gather all of the layer names once when the model is being initialized.
- Stop popping strings from a list. Instead, maintain a counter.

Signed-off-by: Richard Zou <zou3519@gmail.com>
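A minimal sketch of the counter-based approach described above. The class and attribute names here are illustrative only, not vLLM's actual API: the idea is that the layer-name list is built once at model init, and each forward pass walks it with an index instead of mutating it with `pop`.

```python
# Hypothetical sketch, not vLLM's real ForwardContext.
class ForwardContext:
    def __init__(self, all_moe_layer_names: list[str]):
        # Computed once at model init and reused on every forward
        # pass, rather than being rebuilt each time.
        self.all_moe_layer_names = all_moe_layer_names
        # A counter replaces list.pop(0), which is O(n) per call
        # and destroys the list for subsequent forward passes.
        self._moe_layer_index = 0

    def next_moe_layer_name(self) -> str:
        name = self.all_moe_layer_names[self._moe_layer_index]
        self._moe_layer_index += 1
        return name


ctx = ForwardContext(["model.layers.0.mlp", "model.layers.1.mlp"])
assert ctx.next_moe_layer_name() == "model.layers.0.mlp"
assert ctx.next_moe_layer_name() == "model.layers.1.mlp"
```

Because the list is never mutated, it can be shared across forward passes; only the per-pass counter needs resetting.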
Contributor
Code Review
The pull request implements two performance optimizations for MOE handling: pre-computing MOE layer names during model initialization, and using a counter instead of popping from a list in the ForwardContext. The changes match the stated purpose and appear to be correctly implemented across the affected files, reducing redundant computation and list manipulation during the forward pass. No critical or high-severity issues were found.
ProExpertProg approved these changes on Jan 27, 2026
VedantMadane pushed a commit to VedantMadane/vllm that referenced this pull request on Jan 28, 2026
…t#33184) Signed-off-by: Richard Zou <zou3519@gmail.com> Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
apd10 pushed a commit to apd10/vllm that referenced this pull request on Jan 31, 2026
…t#33184) Signed-off-by: Richard Zou <zou3519@gmail.com>
khluu pushed a commit that referenced this pull request on Feb 3, 2026
Signed-off-by: Richard Zou <zou3519@gmail.com> (cherry picked from commit d9aa39a)
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request on Feb 3, 2026
…t#33184) Signed-off-by: Richard Zou <zou3519@gmail.com> Signed-off-by: Pai <416932041@qq.com>
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request on Feb 5, 2026
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required positional argument: 'is_sequence_parallel'` due to vllm-project/vllm#32567
2. Fix `TypeError: '>' not supported between instances of 'MagicMock' and 'int'` due to vllm-project/vllm#33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with abstract methods forward_mha, forward_mqa` and `AttributeError: 'bool' object has no attribute 'process_weights_after_loading'` due to vllm-project/vllm#33284
4. Fix `'AscendSharedFusedMoE' object has no attribute '_routed_input_transform'` due to vllm-project/vllm#32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument 'num_active_loras'` due to vllm-project/vllm#32005
6. Fix the problem caused by `'tuple' object has no attribute 'job_id'` due to vllm-project/vllm#27492
7. Fix the problem that all_moe_layers is not equal to vllm.moe_forward, vllm.moe_forward_shared due to vllm-project/vllm#33184
8. Add a patch to fix the problem "got multiple values for keyword argument 'add_special_tokens'" due to vllm-project/vllm#32863

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
veeceey added a commit to veeceey/vllm that referenced this pull request on Feb 7, 2026
Avoid hard-coding attention layer name strings into the compiled graph in unified_kv_cache_update. Each layer having a different name prevents Inductor from reusing piecewise graphs across layers, increasing cold start compilation time. Apply the same approach used for MOE layers (vllm-project#32805, vllm-project#33184): store the list of all KV cache update layer names at model init time and resolve them at runtime via a counter in ForwardContext. Fixes vllm-project#33267 Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request on Feb 12, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request on Feb 19, 2026
…t#33184) Signed-off-by: Richard Zou <zou3519@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request on Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request on Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request on Mar 4, 2026
SongyouZhong pushed a commit to SongyouZhong/vllm that referenced this pull request on Mar 6, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request on Mar 7, 2026
Purpose
This is a follow-up to the comments on #32805. It contains the following two perf optimizations:
- We don't need to recompute all of the MOE layer names on every forward pass. Instead, we gather all of the layer names once when the model is being initialized.
- Stop popping strings from a list. Instead, maintain a counter.
Test Plan & Test Result
I tested this locally. Compilation time remains good and models still produce reasonable results. Also waiting for CI.