disable graph partition in custom op #26952
Conversation
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Code Review
This pull request introduces a disable_graph_partition context manager to resolve a nested CUDAGraph capture issue that occurs when torch.compile is used on both a model and its custom operations. The approach is sound and correctly implemented using a context manager to temporarily modify Inductor's configuration. The fix is applied to the grouped_topk operation, which aligns with the problem description. My primary concern is the reliance on a private PyTorch API (torch._inductor.config), which poses a maintainability risk for future PyTorch upgrades. I've added a comment to highlight this.
vllm/model_executor/utils.py
Outdated
```python
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val
```
This implementation relies on modifying torch._inductor.config.graph_partition, which is an internal, undocumented API of PyTorch's Inductor backend. While this is a clever solution to the nested CUDAGraph problem, it makes the code brittle and susceptible to breaking with future PyTorch updates. It would be beneficial to add a comment here warning about this dependency to aid future maintenance.
Suggested change:

```python
# NOTE: This relies on an internal PyTorch Inductor API.
# This may break in future PyTorch versions.
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val
```
This config will be kept backward-compatible (BC) and tested in the PyTorch x vLLM CI.
ProExpertProg
left a comment
Could we try to make this a decorator so that people can just add it to the same callsite as the torch.compile call?
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
@ProExpertProg yeah, options should work
Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
This PR fixes a nested cudagraph capture issue.
Example:
We apply torch.compile directly on some ops (e.g., grouped_topk) that are wrapped as custom ops. Inductor graph partition applies cudagraph within the custom op.
At the same time, we compile the model that uses these custom ops. Inductor graph partition also wraps each graph partition with cudagraph. Some partitions may include custom ops to which cudagraph has already been applied. This leads to nested cudagraph capture, which is not supported.
This context manager should be wrapped around torch.compile calls within custom ops to avoid nested cudagraph capture.
Test:
VLLM_USE_STANDALONE_COMPILE=1 python examples/offline_inference/basic/generate.py --model deepseek-ai/DeepSeek-V2-Lite -O.use_inductor_graph_partition=True --max-model-len 1024