[Graph Partition] fix graph partition input signature for fallback kernels#165815
BoyuanFeng wants to merge 2 commits into main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165815
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Unrelated Failure as of commit 976a2c7 with merge base de09bab.
NEW FAILURE: one job has failed.
BROKEN TRUNK: one job failed but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge -f "skip unrelated distributed test failure"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot cherry-pick --onto release/2.9 --fixes "Inductor partition introduced in 2.9.0 breaking vllm (vllm-project/vllm#27139)" -c fixnewfeature
Cherry picking #165815: the cherry-pick PR is at #166985 and is linked with issue "Inductor partition introduced in 2.9.0 breaking vllm" (vllm-project/vllm#27139). Details for Dev Infra team: raised by workflow job.
[Graph Partition] fix graph partition input signature for fallback kernels (#166985) (cherry picked from commit 1891239) Co-authored-by: Boyuan Feng <boyuan@meta.com>
The scheduler relies on `node.last_usage` to free buffers.
`last_usage` may contain a buffer that is allocated in a previous graph partition AND not directly accessed in the current graph partition.
Example
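```python
def f(x):
    y = x + 1
    z = torch.ops.aten.view.dtype(y, torch.float8_e4m3fn)
    z_cpu = z.cpu()
    u_cuda = z_cpu.cuda()
    return u_cuda
```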
In the generated code, we have
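```
def partition_0(args):
    ...
    # Topologically Sorted Source Nodes: [y, z], Original ATen: [aten.add, aten.view]
    buf1 = torch.ops.aten.view.dtype(buf0, torch.float8_e4m3fn)  # <---- buf1 is a view of buf0
    buf2 = buf1  # <---- buf2 is buf1
    assert_size_stride(buf2, (8, ), (1, ), 'torch.ops.aten.view.dtype')
    assert_alignment(buf2, 16, 'torch.ops.aten.view.dtype')
    return (buf2, )

def call(self, args):
    ...
    (buf2,) = self.partitions[0](partition0_args)
    ...
    buf3.copy_(buf2, False)
    del buf0
    del buf1
    del buf2  # <---- `del buf2` leads to `del buf0`. BUT `buf0` is not returned from partition_0.
    ...
```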
Note: `view` is treated as a fallback kernel due to its special dtype; see `torch/_inductor/lowering.py`, lines 841 to 843 at de09bab: https://github.com/pytorch/pytorch/blob/de09bab4b66002a8a9a2195f50f96a78868a3d39/torch/_inductor/lowering.py#L841-L843
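For context, a minimal repro sketch of the failing pattern (assumes a CUDA device with float8 support; the `graph_partition` config flag below is an assumption about how partitioning is enabled, not taken from this PR):

```python
# Minimal repro sketch: the .cpu()/.cuda() hop forces the compiled graph to be
# split into partitions, and the float8 view goes through the fallback kernel.
import torch

def f(x):
    y = x + 1
    z = torch.ops.aten.view.dtype(y, torch.float8_e4m3fn)
    z_cpu = z.cpu()       # device hop splits the CUDA graph into partitions
    u_cuda = z_cpu.cuda()
    return u_cuda

torch._inductor.config.graph_partition = True  # assumption: flag enabling graph partitioning
compiled = torch.compile(f)
out = compiled(torch.randn(8, device="cuda"))
```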
Fix
This PR fixes the issue by also returning these buffers from the partition so they can be freed later.
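A sketch of what the generated code could look like after the fix (illustrative, not verbatim Inductor output): the partition's output signature now also carries `buf0`, so the caller owns the buffer and `del buf0` actually frees the storage.

```
def partition_0(args):
    ...
    buf1 = torch.ops.aten.view.dtype(buf0, torch.float8_e4m3fn)
    buf2 = buf1
    return (buf0, buf2, )  # buf0 added to the partition output signature

def call(self, args):
    ...
    (buf0, buf2) = self.partitions[0](partition0_args)
    ...
    buf3.copy_(buf2, False)
    del buf0  # valid now: buf0 was returned from partition_0
    del buf2
    ...
```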
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben