[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation#36093

Merged
zou3519 merged 1 commit into vllm-project:main from zou3519:compile_size_fake
Mar 11, 2026

Conversation

@zou3519
Collaborator

@zou3519 zou3519 commented Mar 5, 2026

create_concrete_args previously allocated real GPU tensors (via torch.empty) just to carry shape/stride/dtype/device metadata into standalone_compile. Switch to FakeTensors created under a FakeTensorMode with a dummy ShapeEnv. (A dummy ShapeEnv, rather than None, is needed to keep AOTAutogradCache happy.)
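A minimal sketch of the idea (not vLLM's exact code): example inputs are created inside a FakeTensorMode that carries a dummy ShapeEnv, so they hold shape/stride/dtype/device metadata without any real storage. The concrete shape and dtype below are illustrative; vLLM would use the model's actual device rather than CPU.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

# A dummy ShapeEnv (rather than shape_env=None) keeps AOTAutogradCache happy.
fake_mode = FakeTensorMode(shape_env=ShapeEnv())

with fake_mode:
    # Under the mode, torch.empty produces a FakeTensor: it records
    # shape/stride/dtype/device metadata but allocates no real memory.
    # (CPU here so the sketch runs anywhere; vLLM would use the GPU device.)
    example_input = torch.empty(8, 4096, dtype=torch.float16)

print(type(example_input).__name__)  # FakeTensor
print(tuple(example_input.shape), example_input.dtype)
```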

standalone_compile("from_example_inputs") creates its own FakeTensorMode internally, which conflicts with our FakeTensors. Work around this by patching FakeTensorMode in standalone_compile so that it reuses our mode. Tracked upstream: pytorch/pytorch#176562
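The patching workaround boils down to a standard monkey-patching pattern: replace the module-level name the internal code constructs from, so it hands back our pre-existing mode instead of a fresh one. The sketch below uses stdlib-only stand-ins (`compiler_mod`, a toy `FakeTensorMode`, and a toy `standalone_compile` are all hypothetical) to show the mechanism, not PyTorch's real internals.

```python
import types
from unittest import mock

# Hypothetical stand-in for the module that owns standalone_compile and
# constructs its own FakeTensorMode internally.
compiler_mod = types.ModuleType("compiler_mod")

class FakeTensorMode:
    """Toy placeholder for torch's FakeTensorMode."""

def standalone_compile():
    # Looks up FakeTensorMode on the module at call time -- this is the
    # behavior we want to override so it reuses an existing mode.
    return compiler_mod.FakeTensorMode()

compiler_mod.FakeTensorMode = FakeTensorMode
compiler_mod.standalone_compile = standalone_compile

our_mode = FakeTensorMode()

# Patch the module-level name so the internal construction returns our mode.
with mock.patch.object(compiler_mod, "FakeTensorMode", lambda: our_mode):
    result = compiler_mod.standalone_compile()

print(result is our_mode)  # True
```

The patch is scoped to the `with` block, so once compilation finishes the module's original `FakeTensorMode` is restored.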

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optimization to the torch.compile integration by using FakeTensors instead of real GPU tensors during single-size compilation. This avoids unnecessary GPU memory allocation. The changes are implemented in two parts: create_concrete_args is updated to generate FakeTensors, and InductorStandaloneAdaptor.compile is patched to handle these tensors correctly by reusing the FakeTensorMode, which also serves as a workaround for an upstream PyTorch issue. The implementation is clean, well-commented, and the logic appears sound. I have no major concerns with this change.

@zou3519 zou3519 force-pushed the compile_size_fake branch from 6f5635a to bcb9e5b on March 5, 2026 05:19
@zou3519
Collaborator Author

zou3519 commented Mar 5, 2026

cc @zhxchen17 @eellison

@mergify
Contributor

mergify Bot commented Mar 6, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zou3519.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 6, 2026
@BoyuanFeng
Collaborator

looks good. please resolve merge conflicts.

@zou3519 zou3519 force-pushed the compile_size_fake branch from bcb9e5b to d19ac4a on March 9, 2026 17:15
@mergify mergify Bot removed the needs-rebase label Mar 9, 2026
@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 10, 2026
@zou3519 zou3519 enabled auto-merge (squash) March 10, 2026 13:37
create_concrete_args previously allocated real GPU tensors (via
torch.empty) just to carry shape/stride/dtype/device metadata into
standalone_compile. Switch to FakeTensors under a FakeTensorMode with
a dummy ShapeEnv. (dummy ShapeEnv instead of None is needed to keep
AOTAutogradCache happy)

standalone_compile("from_example_inputs") creates its own FakeTensorMode
internally, which would conflict with our FakeTensors. Work around this
by patching FakeTensorMode in standalone_compile to reuse our mode.
Tracked upstream: pytorch/pytorch#176562

Signed-off-by: Richard Zou <zou3519@gmail.com>
@zou3519 zou3519 force-pushed the compile_size_fake branch from 17d977c to 7c46bdb on March 11, 2026 13:41
@zou3519 zou3519 merged commit 822e250 into vllm-project:main Mar 11, 2026
53 checks passed
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…e-size compilation (vllm-project#36093)

Signed-off-by: Richard Zou <zou3519@gmail.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…e-size compilation (vllm-project#36093)

Signed-off-by: Richard Zou <zou3519@gmail.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
…e-size compilation (vllm-project#36093)

Signed-off-by: Richard Zou <zou3519@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed
