disable graph partition in custom op #26952
Conversation
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Code Review
This pull request introduces a disable_graph_partition context manager to resolve a nested CUDAGraph capture issue that occurs when torch.compile is used on both a model and its custom operations. The approach is sound and correctly implemented using a context manager to temporarily modify Inductor's configuration. The fix is applied to the grouped_topk operation, which aligns with the problem description. My primary concern is the reliance on a private PyTorch API (torch._inductor.config), which poses a maintainability risk for future PyTorch upgrades. I've added a comment to highlight this.
vllm/model_executor/utils.py
Outdated
```python
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val
```
This implementation relies on modifying torch._inductor.config.graph_partition, which is an internal, undocumented API of PyTorch's Inductor backend. While this is a clever solution to the nested CUDAGraph problem, it makes the code brittle and susceptible to breaking with future PyTorch updates. It would be beneficial to add a comment here warning about this dependency to aid future maintenance.
Suggested change:

```python
# NOTE: This relies on an internal PyTorch Inductor API.
# This may break in future PyTorch versions.
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val
```
This config will be kept backward-compatible (BC) and tested in the PyTorch x vLLM CI.
ProExpertProg
left a comment
Could we try to make this a decorator so that people can just add it to the same callsite as the torch.compile call?
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
@ProExpertProg yeah, options should work
Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
This PR fixes a nested cudagraph capture issue.
Example:
We apply torch.compile directly on some ops (e.g., grouped_topk) that are wrapped as custom ops. Inductor graph partition applies cudagraph within the custom op.
At the same time, we compile the model that uses these custom ops. Inductor graph partition also wraps each graph partition with cudagraph. Some partitions may include custom ops to which cudagraph has already been applied. This leads to nested cudagraph capture, which is not supported.
This context manager should be wrapped around torch.compile calls within custom ops to avoid nested cudagraph capture.
Test:
VLLM_USE_STANDALONE_COMPILE=1 python examples/offline_inference/basic/generate.py --model deepseek-ai/DeepSeek-V2-Lite -O.use_inductor_graph_partition=True --max-model-len 1024