
[torch.compile] Speed up MOE handling in forward_context#33184

Merged
zou3519 merged 1 commit into vllm-project:main from zou3519:moe_cold_start_cleanup
Jan 27, 2026
Conversation

@zou3519
Collaborator

@zou3519 zou3519 commented Jan 27, 2026

Purpose

This is a follow-up to the comments on #32805.

It contains the following two perf optimizations:

  • We don't need to recompute all of the MOE layer names on every forward pass. Instead, we can collect all of the layer names once when the model is initialized.
  • Stop popping strings from a list. Instead, maintain a counter.
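The two optimizations above can be sketched roughly as follows. This is an illustrative, simplified version, not vLLM's actual `ForwardContext` API; the field and method names (`moe_layer_names`, `next_moe_layer_name`) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ForwardContext:
    # All MOE layer names, collected once at model initialization
    # (optimization 1: no per-forward-pass recomputation).
    moe_layer_names: tuple[str, ...] = ()
    # Index of the next MOE layer to run in this forward pass
    # (optimization 2: an O(1) counter bump instead of list.pop(0),
    # which is O(n) and mutates the list every time).
    _moe_layer_idx: int = 0

    def next_moe_layer_name(self) -> str:
        name = self.moe_layer_names[self._moe_layer_idx]
        self._moe_layer_idx += 1
        return name

# Usage: build once at init, then advance the counter per MOE layer.
ctx = ForwardContext(
    moe_layer_names=("model.layers.0.mlp", "model.layers.1.mlp"))
assert ctx.next_moe_layer_name() == "model.layers.0.mlp"
assert ctx.next_moe_layer_name() == "model.layers.1.mlp"
```

A fresh counter (or a fresh context object) per forward pass keeps the precomputed tuple immutable, which is what makes the repeated-list-mutation cost go away.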

Test Plan & Test Result

I tested this locally. Compilation time remains good while models still produce reasonable results. Also, wait for CI.


Signed-off-by: Richard Zou <zou3519@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully implements two performance optimizations for MOE handling: pre-computing MOE layer names during model initialization and using a counter-based approach instead of popping from a list in the ForwardContext. The changes are well-aligned with the stated purpose and appear to be correctly implemented across the affected files. The refactoring improves efficiency by reducing redundant computations and list manipulations during the forward pass. No critical or high-severity issues were found.

@ProExpertProg ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 27, 2026
@zou3519 zou3519 enabled auto-merge (squash) January 27, 2026 18:43
@zou3519 zou3519 merged commit d9aa39a into vllm-project:main Jan 27, 2026
58 checks passed
VedantMadane pushed a commit to VedantMadane/vllm that referenced this pull request Jan 28, 2026
…t#33184)

Signed-off-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
khluu pushed a commit that referenced this pull request Feb 3, 2026
Signed-off-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit d9aa39a)
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…t#33184)

Signed-off-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Pai <416932041@qq.com>
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request Feb 5, 2026
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required positional argument: 'is_sequence_parallel'` due to vllm-project/vllm#32567
2. Fix `TypeError: '>' not supported between instances of 'MagicMock' and 'int'` due to vllm-project/vllm#33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with abstract methods forward_mha, forward_mqa` and `AttributeError: 'bool' object has no attribute 'process_weights_after_loading'` due to vllm-project/vllm#33284
4. Fix `'AscendSharedFusedMoE' object has no attribute '_routed_input_transform'` due to vllm-project/vllm#32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument 'num_active_loras'` due to vllm-project/vllm#32005
6. Fix the problem caused by `'tuple' object has no attribute 'job_id'` due to vllm-project/vllm#27492
7. Fix the problem that `all_moe_layers` is not equal to `vllm.moe_forward`, `vllm.moe_forward_shared` due to vllm-project/vllm#33184
8. Add a patch to fix the problem "got multiple values for keyword argument 'add_special_tokens'" due to vllm-project/vllm#32863
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
veeceey added a commit to veeceey/vllm that referenced this pull request Feb 7, 2026
Avoid hard-coding attention layer name strings into the compiled graph in unified_kv_cache_update. Because each layer has a different name, Inductor cannot reuse piecewise graphs across layers, which increases cold-start compilation time.

Apply the same approach used for MOE layers (vllm-project#32805, vllm-project#33184): store the list of all KV cache update layer names at model init time and resolve them at runtime via a counter in ForwardContext.

Fixes vllm-project#33267

Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
SongyouZhong pushed a commit to SongyouZhong/vllm that referenced this pull request Mar 6, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
Labels

ready ONLY add when PR is ready to merge/full CI is needed

2 participants