[torch.compile] Stop assuming 32 bit indexing #33113

Merged
zou3519 merged 1 commit into vllm-project:main from zou3519:stop_assume_32bit
Jan 27, 2026

Conversation

Collaborator

@zou3519 zou3519 commented Jan 26, 2026

Purpose

In PyTorch 2.10 we added a way for vLLM to assume 32-bit indexing and made it True by default. In PyTorch 2.9 the default was to not assume 32-bit indexing.

We ran into some errors with this flag internally. Previously I thought it meant we assume the number of tokens fits in 32 bits, but the flag actually asserts that Tensor.numel() fits in 32 bits, which is not always true.

We should actually be able to infer this, but until then, stop assuming Tensor.numel() is 32-bit.
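To illustrate why the assumption can break: numel() is the product of all dimension sizes, so a tensor whose every individual dimension is modest can still exceed the signed 32-bit limit. A minimal pure-Python sketch (the shape below is made up for illustration and is not from this PR):

```python
INT32_MAX = 2**31 - 1  # largest value representable as a signed 32-bit integer

def numel(shape):
    """Product of all dimension sizes, mirroring what torch.Tensor.numel() returns."""
    n = 1
    for dim in shape:
        n *= dim
    return n

# A hypothetical activation tensor: many tokens x a large hidden size.
# Each dimension fits comfortably in 32 bits, but the total element count does not.
shape = (300_000, 8_192)
print(numel(shape))              # 2457600000
print(numel(shape) > INT32_MAX)  # True -> 32-bit indexing would overflow here
```

This is why "the number of tokens fits in 32 bits" and "numel() fits in 32 bits" are different guarantees: the latter multiplies in every other dimension as well.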

Test Plan & Test Result

wait for CI

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request correctly addresses the issue of assume_32_bit_indexing causing errors when Tensor.numel() is not 32-bit. Changing the default to False is a good step to improve robustness, and the added comment clarifying the PyTorch version requirement for True is helpful. No high or critical severity issues were found in this change.

Member

@yewentao256 yewentao256 left a comment


Thanks for the update!

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 26, 2026
@ProExpertProg
Collaborator

Should we assert 2.10 version if the flag is used?
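A version gate like the one suggested could look roughly like this. This is a hypothetical sketch, not code from this PR; it assumes the flag first exists in PyTorch 2.10 (as stated in the PR description) and only inspects the major/minor components of the version string:

```python
def supports_32bit_indexing_flag(torch_version: str) -> bool:
    """Return True if this PyTorch version is new enough to have the
    assume-32-bit-indexing option (added in 2.10, per the PR discussion).

    Accepts version strings like "2.10.0" or "2.9.1+cu121".
    """
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (2, 10)

# A caller could then raise instead of silently ignoring the flag, e.g.:
# if config.assume_32bit_indexing and not supports_32bit_indexing_flag(torch.__version__):
#     raise ValueError("assume_32bit_indexing requires PyTorch >= 2.10")
```

Real version strings can carry local suffixes ("+cu121", dev builds), so a production check might prefer a proper version parser over string splitting.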

@pacoxu
Contributor

pacoxu commented Jan 27, 2026

buildkite/ci/pr failed for a flake which was fixed by #33093.

Could you rebase the PR to make the CI happy😊?

We ran into some errors with this internally. Previously I thought this
meant that we assume that the number of tokens is 32-bit, but this flag
actually means if the Tensor.numel is 32-bit, which is not always True.

We should actually be able to infer this, but until then, stop assuming
the Tensor.numel is 32-bit.

Signed-off-by: Richard Zou <zou3519@gmail.com>
@zou3519 zou3519 enabled auto-merge (squash) January 27, 2026 02:30
@zou3519 zou3519 merged commit 3b8f0fe into vllm-project:main Jan 27, 2026
49 checks passed
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
Signed-off-by: Richard Zou <zou3519@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Richard Zou <zou3519@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed


6 participants