[torch.compile] Stop assuming 32 bit indexing #33113
Merged
zou3519 merged 1 commit into vllm-project:main on Jan 27, 2026
Conversation
Contributor
Code Review
The pull request correctly addresses the issue of assume_32_bit_indexing causing errors when Tensor.numel() does not fit in 32 bits. Changing the default to False is a good step toward robustness, and the added comment clarifying the PyTorch version requirement for True is helpful. No high or critical severity issues were found in this change.
BoyuanFeng approved these changes Jan 26, 2026
atalman approved these changes Jan 26, 2026
ProExpertProg approved these changes Jan 27, 2026
Collaborator
Should we assert the PyTorch 2.10 version requirement if the flag is used?
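One possible shape for the guard suggested above is a plain version check before honoring the flag. This is a hypothetical sketch, not code from the PR; the function names, error message, and the idea of parsing the version string by hand (to avoid depending on the packaging library) are all assumptions.

```python
# Hypothetical sketch of the version guard discussed in the review thread;
# names and the error message are illustrative, not taken from the PR.

def parse_release(version_str: str) -> tuple:
    """Parse the leading numeric release segment of a version string,
    e.g. '2.9.1+cu121' -> (2, 9, 1)."""
    base = version_str.split("+")[0]
    parts = []
    for piece in base.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_flag_supported(torch_version: str) -> None:
    """Raise if assume_32_bit_indexing=True is requested on PyTorch < 2.10."""
    if parse_release(torch_version) < (2, 10):
        raise RuntimeError(
            "assume_32_bit_indexing=True requires PyTorch >= 2.10; "
            f"found {torch_version}"
        )
```

A check like this would turn a silent miscompilation on older PyTorch into a loud configuration error at startup.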
Contributor
buildkite/ci/pr failed due to a flake that was fixed by #33093. Could you rebase the PR to make the CI happy 😊?
We ran into some errors with this internally. Previously I thought this flag meant that we assume the number of tokens is 32-bit, but it actually asserts that Tensor.numel() fits in 32 bits, which is not always true. We should be able to infer this, but until then, stop assuming Tensor.numel() is 32-bit. Signed-off-by: Richard Zou <zou3519@gmail.com>
Force-pushed from b3e03c1 to 2449fef
apd10 pushed a commit to apd10/vllm that referenced this pull request on Jan 31, 2026
Signed-off-by: Richard Zou <zou3519@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request on Feb 19, 2026
Signed-off-by: Richard Zou <zou3519@gmail.com>
Purpose
In PyTorch 2.10 we added a way for vLLM to assume 32-bit indexing and made it True by default. In PyTorch 2.9 the default was to not assume 32-bit indexing.
We ran into some errors with this flag internally. Previously I thought it meant that we assume the number of tokens is 32-bit, but the flag actually asserts that Tensor.numel() fits in 32 bits, which is not always true.
We should be able to infer this automatically, but until then, stop assuming Tensor.numel() is 32-bit.
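The distinction above can be illustrated with plain arithmetic. The shapes below are hypothetical, chosen only to show that a tensor's total element count can overflow int32 even when every individual dimension, such as the number of tokens, fits comfortably in 32 bits.

```python
# Illustrative sketch: Tensor.numel() can exceed the int32 range even when
# each dimension (e.g. the token count) is 32-bit safe.
# Shapes are hypothetical, not taken from the PR.

INT32_MAX = 2**31 - 1

num_tokens = 131_072         # 2**17, fits easily in 32 bits
hidden_size = 16_384         # 2**14, also fits in 32 bits

# Total element count of a (num_tokens, hidden_size) activation tensor.
numel = num_tokens * hidden_size  # 2**31, one past INT32_MAX

print(num_tokens <= INT32_MAX)   # True: the token count is 32-bit safe
print(numel <= INT32_MAX)        # False: numel overflows int32 indexing
```

This is why an assumption keyed on the token count alone is unsafe: indexing such a tensor with 32-bit offsets in generated kernels would wrap around, which matches the errors described above.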
Test Plan & Test Result
wait for CI