
Conversation

@baonudesifeizhai (Contributor) commented Sep 28, 2025

Key Changes

  • Preserve user-specified splitting_ops: When use_inductor_graph_partition=True, user-provided splitting_ops are now preserved in _user_specified_splitting_ops instead of being cleared (a minimal sketch follows this list)
  • Dynamic partition rules: Implement _setup_dynamic_partition_rules() using PyTorch 2.9+'s register_should_partition_rule API to register custom partition points
  • Robust fallback mechanism: If dynamic rule registration fails for any user-specified operation, the system falls back to traditional splitting behavior with preserved splitting_ops
  • Comprehensive logging: Add detailed debug logging to track partition rule setup, registration success/failure, and fallback behavior
  • PyTorch version compatibility: Update requirements to support PyTorch 2.10.0.dev20250927+cu128 for the new partition rule API
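
The preservation behavior in the first bullet above amounts to keeping the user's list on the config object instead of clearing it. Below is a minimal sketch, not vLLM's exact code: it assumes the method operates on CompilationConfig (hence self) and that _user_specified_splitting_ops is a plain list field.

    def set_splitting_ops_for_inductor_graph_partition(self) -> None:
        # Sketch: preserve rather than clear the user's splitting_ops when
        # use_inductor_graph_partition is enabled.
        assert self.use_inductor_graph_partition
        if self.splitting_ops:
            # Old behavior dropped the user input (self.splitting_ops = []);
            # new behavior remembers it so partition rules can be derived.
            self._user_specified_splitting_ops = list(self.splitting_ops)
        else:
            self._user_specified_splitting_ops = []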

Technical Implementation

  1. Operation mapping: Map user-provided splitting_ops (including aliases like "flash_attention") to torch._ops.OpOverload objects
  2. Rule registration: Use register_should_partition_rule(op_overload, partition_function) for each resolved operation (see the sketch after this list)
  3. Duplicate prevention: Track registered overloads globally to prevent duplicate registrations
  4. Alias resolution: Handle common operation aliases (e.g., "flash_attention" → "vllm.unified_attention")
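
The sketch below ties these four steps together. It is illustrative rather than the exact code in this PR: the import path of register_should_partition_rule and the signature of its callback are assumptions based on recent PyTorch nightlies, the alias table is truncated to the single alias mentioned above, and the bare-name handling is a guess; the real implementation falls back to traditional splitting whenever any of this fails.

    import logging

    import torch

    logger = logging.getLogger(__name__)

    # Illustrative alias table; the real mapping lives in vLLM's config code.
    _OP_ALIASES = {"flash_attention": "vllm.unified_attention"}

    # Overloads already registered, to prevent duplicate registrations.
    _registered_overloads: set = set()


    def _resolve_op(name: str):
        """Best-effort resolution of an op string to a torch._ops.OpOverload."""
        name = _OP_ALIASES.get(name, name)
        parts = name.split(".")
        if len(parts) == 1:
            # Treat a bare name like "addmm" as an aten op (assumption).
            parts = ["aten", parts[0]]
        try:
            packet = getattr(getattr(torch.ops, parts[0]), parts[1])
            # "aten.bmm.default" names an overload; "aten.bmm" means .default.
            return getattr(packet, parts[2]) if len(parts) > 2 else packet.default
        except AttributeError:
            return None


    def setup_dynamic_partition_rules(splitting_ops: list[str]) -> bool:
        """Register a partition rule per user op; False means "fall back"."""
        try:
            # Assumed import path; it may differ across PyTorch versions.
            from torch._inductor.scheduler import register_should_partition_rule
        except ImportError:
            logger.debug("register_should_partition_rule unavailable; falling back")
            return False

        for op_name in splitting_ops:
            overload = _resolve_op(op_name)
            if overload is None:
                logger.debug("could not resolve %s; falling back", op_name)
                return False
            if overload in _registered_overloads:
                continue
            # Always partition at this op (callback signature is assumed).
            register_should_partition_rule(overload, lambda *args, **kwargs: True)
            _registered_overloads.add(overload)
            logger.debug("registered partition rule for %s", op_name)
        return True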

Test Plan

Unit Tests

  • Updated test_splitting_ops_dynamic() to verify splitting_ops preservation behavior (an illustrative check is sketched after this list)
  • Added comprehensive test coverage for dynamic partition rule setup
  • Updated existing compilation tests to reflect new default behavior
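
For reference, here is an illustrative version of the preservation check, not the exact test added in this PR; it assumes a PyTorch build new enough to accept use_inductor_graph_partition=True.

    from vllm.config import CompilationConfig, VllmConfig


    def test_splitting_ops_preserved_with_inductor_partition():
        config = VllmConfig(
            compilation_config=CompilationConfig(
                use_inductor_graph_partition=True,
                splitting_ops=["vllm.unified_attention", "aten.bmm.default"],
            )
        )
        comp = config.compilation_config
        # The user's ops should survive somewhere the partition logic can see
        # them: either splitting_ops itself or _user_specified_splitting_ops.
        preserved = comp.splitting_ops or getattr(
            comp, "_user_specified_splitting_ops", None)
        assert preserved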

Integration Tests

  • Test with various splitting_ops configurations (a client smoke test against this server is sketched after this list):
    python -m vllm.entrypoints.openai.api_server \
      --model Qwen/Qwen2.5-7B-Instruct \
      --compilation-config '{"use_inductor_graph_partition": true, "splitting_ops": ["flash_attention", "addmm", "aten.bmm.default"]}' \
      --host 0.0.0.0 --port 8000
  • Verify fallback behavior when PyTorch version doesn't support dynamic rules
  • Test with mixed resolvable/unresolvable operations
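
A minimal client-side smoke test against the server started above (assumes the server is reachable at localhost:8000 and the openai Python package is installed); the partition-rule registrations and any fallback show up in the server's debug logs, not in the response.

    from openai import OpenAI

    # Point the OpenAI-compatible client at the locally launched vLLM server.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
        max_tokens=32,
    )
    print(resp.choices[0].message.content)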

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

baonudesifeizhai and others added 19 commits September 26, 2025 20:15
- Add _user_specified_splitting_ops field to store user configuration
- Modify set_splitting_ops_for_inductor_graph_partition to respect user settings
- Add debug logging to track splitting_ops handling
- Addresses issue vllm-project#25691 - partial implementation for dynamic partitioning

This change preserves user-specified splitting_ops when use_inductor_graph_partition=True,
laying groundwork for future PyTorch 2.9+ register_should_partition_rule integration.
- Add _setup_dynamic_partition_rules() method
- Implement register_should_partition_rule integration
- Support both attention ops and user-specified splitting_ops
- Add comprehensive debug logging for partition decisions
- Graceful fallback if PyTorch API not available

This completes the implementation for issue vllm-project#25691
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: baonudesifeizhai <[email protected]>
@ProExpertProg ProExpertProg enabled auto-merge (squash) October 10, 2025 12:54
@ProExpertProg ProExpertProg changed the title from "Feature/dynamic inductor partition rules #25691" to "[torch.compile] Make inductor partition rules respect splitting_ops #25691" Oct 10, 2025
@ProExpertProg ProExpertProg merged commit cddce79 into vllm-project:main Oct 10, 2025
49 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in torch.compile integration Oct 10, 2025
huydhn added a commit to pytorch/pytorch that referenced this pull request Oct 11, 2025
huydhn added a commit to huydhn/pytorch that referenced this pull request Oct 12, 2025
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Oct 24, 2025
### What this PR does / why we need it?
This is step 1 of refactoring the code to adapt to vLLM main; this PR is aligned with
vllm-project/vllm@17c540a

1. Refactor DeepSeek to the latest code architecture as of
vllm-project/vllm@17c540a
 
2. A batch of fixes required by vLLM changes:
- Fix `AscendScheduler` `__post_init__`, caused by
vllm-project/vllm#25075
- Fix `AscendScheduler` init got an unexpected arg `block_size`, caused
by vllm-project/vllm#26296
- Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by
vllm-project/vllm#23485
- Fix `MLAAttention` import, caused by
vllm-project/vllm#25103
- Fix `SharedFusedMoE` import, caused by
vllm-project/vllm#26145
- Fix `LazyLoader` import, caused by
vllm-project/vllm#27022
- Fix `vllm.utils.swap_dict_values` import, caused by
vllm-project/vllm#26990
- Fix `Backend` enum import, caused by
vllm-project/vllm#25893
- Fix `CompilationLevel` renaming to `CompilationMode` issue introduced
by vllm-project/vllm#26355
- Fix fused_moe ops, caused by
vllm-project/vllm#24097
- Fix bert model because of `inputs_embeds`, caused by
vllm-project/vllm#25922
- Fix MRope because of `get_input_positions_tensor` to
`get_mrope_input_positions`, caused by
vllm-project/vllm#24172
- Fix `splitting_ops` changes introduced by
vllm-project/vllm#25845
- Fix multi-modality changes introduced by
vllm-project/vllm#16229
- Fix lora bias dropping issue introduced by
vllm-project/vllm#25807
- Fix structured output break introduced by
vllm-project/vllm#26737

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: Icey <[email protected]>
Co-authored-by: Icey <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Comment on lines +163 to +175
# When attn_fusion pass enabled, splitting_ops now default to attention ops.
config = VllmConfig(
    compilation_config=CompilationConfig(
        level=CompilationLevel.PIECEWISE,
        pass_config={"enable_attn_fusion": True, "enable_noop": True},
        custom_ops=["+quant_fp8"],
        cudagraph_mode=CUDAGraphMode.PIECEWISE,
    )
)
assert config.compilation_config.splitting_ops == []
# cudagraph mode also falls back to FULL
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.FULL

# splitting_ops cannot contain attention ops when the attn_fusion
# pass is enabled.
with pytest.raises(AssertionError):
    config = VllmConfig(
        compilation_config=CompilationConfig(
            pass_config={"enable_attn_fusion": True, "enable_noop": True},
            custom_ops=["+quant_fp8"],
            cudagraph_mode=CUDAGraphMode.PIECEWISE,
            # workaround for accessing all attention ops
            splitting_ops=CompilationConfig()._attention_ops,
        )
    )
# With the new simplified logic, attention fusion works with splitting_ops
assert config.compilation_config.splitting_ops_contain_attention()
# cudagraph mode remains PIECEWISE
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.PIECEWISE
Collaborator

I think this should be restored; without inductor partition, the old logic still applies @hmellor

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

llama (Related to Llama models), performance (Performance-related issues), ready (ONLY add when PR is ready to merge/full CI is needed), torch.compile

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature]: Inductor partitioning should decide what ops to partition on dynamically

5 participants