[BUGFIX] Fix degenerate strides in TRTLLM query tensors for FlashInfer backend. Fixes issue #32353 by vadiklyutiy · Pull Request #32417 · vllm-project/vllm

vadiklyutiy · 2026-01-15T15:42:04Z

Summary

This PR fixes issue #32353: degenerate strides in query tensors when using the TRTLLM kernels in the FlashInfer attention backend. Calling .contiguous() alone does not fix degenerate strides when a dimension has size 1, because PyTorch already treats such a tensor as contiguous and returns it unchanged. The leftover stride can then cause issues with kernel execution.

Problem

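Query tensors can have degenerate strides that .contiguous() does not fix. In #32353 the query tensor had the layout below: the leading dimension has size 1 but a stride of 4608, rather than the 4096 (32 × 128) of a densely packed tensor.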

Shape: torch.Size([1, 32, 128])
Stride: (4608, 128, 1)
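
To make the mismatch concrete, here is a minimal pure-Python sketch of two contiguity checks. The helper names (`canonical_strides`, `is_contiguous_relaxed`, `is_strictly_contiguous_sketch`) are hypothetical and are not vLLM or PyTorch APIs. The relaxed check assumes the rule that the stride of a size-1 dimension is ignored, which is how .contiguous() can consider the tensor above already contiguous.

```python
from typing import Sequence, Tuple


def canonical_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides of a densely packed tensor with this shape."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def is_contiguous_relaxed(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Contiguity check that skips size-1 dimensions (assumed relaxed rule)."""
    expected = canonical_strides(shape)
    return all(
        size == 1 or stride == want
        for size, stride, want in zip(shape, strides, expected)
    )


def is_strictly_contiguous_sketch(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Strict check: every stride, including size-1 dims, must be canonical."""
    return tuple(strides) == canonical_strides(shape)


# Layout reported in #32353.
shape = (1, 32, 128)
strides = (4608, 128, 1)

print(canonical_strides(shape))                       # (4096, 128, 1)
print(is_contiguous_relaxed(shape, strides))          # True
print(is_strictly_contiguous_sketch(shape, strides))  # False
```

The tensor passes the relaxed check but fails the strict one, which matches the behaviour described in this PR. The sketch ignores empty tensors (a dimension of size 0) for simplicity.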

Test

vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 --trust-remote-code

The server starts successfully.

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

gemini-code-assist

Code Review

This pull request effectively addresses the issue of degenerate strides in TRTLLM query tensors for the FlashInfer backend. The addition of .reshape(tensor.shape) after .contiguous() correctly forces non-degenerate strides, resolving the reported bug. The accompanying comments clearly explain the rationale behind this change, enhancing code clarity and maintainability.
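
One way to see why re-striding needs no data copy: a size-1 dimension only ever takes index 0, so its stride never contributes to an element's memory offset. The sketch below is pure Python and does not call PyTorch or vLLM. `element_offsets` is a hypothetical helper, and the dense strides (4096, 128, 1) are computed by hand from the shape in #32353.

```python
from itertools import product
from typing import List, Sequence


def element_offsets(shape: Sequence[int], strides: Sequence[int]) -> List[int]:
    """Memory offset (in elements) of every index, in row-major index order."""
    return [
        sum(i * s for i, s in zip(index, strides))
        for index in product(*(range(n) for n in shape))
    ]


shape = (1, 32, 128)
degenerate = (4608, 128, 1)  # strides reported in #32353
dense = (4096, 128, 1)       # strides of a densely packed tensor

# Both stride sets address exactly the same elements in the same order.
same_layout = element_offsets(shape, degenerate) == element_offsets(shape, dense)
print(same_layout)  # True
```

Both layouts describe the same 4096 elements, which is consistent with the fix changing only tensor metadata. A kernel that reads the leading stride directly would still see different values, which is presumably why the strict layout matters for the TRTLLM path.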

vllm/v1/attention/backends/flashinfer.py

vadiklyutiy · 2026-01-17T03:35:19Z

According to ci-health, v1-test-attention-b200 fails at the top of the tree.

vadiklyutiy · 2026-01-17T22:53:51Z

The previously failing test was worked around in another commit. CI now passes.

pavanimajety

Thanks for the fixes @vadiklyutiy

fix 32353

25f37ce

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

vadiklyutiy requested review from mgoin and pavanimajety as code owners January 15, 2026 15:42

mergify bot added the nvidia, v1, and bug labels Jan 15, 2026

github-project-automation bot added this to NVIDIA Jan 15, 2026

gemini-code-assist bot reviewed Jan 15, 2026

mgoin reviewed Jan 15, 2026

vllm/v1/attention/backends/flashinfer.py (resolved)

mgoin added the ready label Jan 15, 2026

Merge branch 'main' into vadim/issue32353

11f479e

pavanimajety approved these changes Jan 19, 2026

github-project-automation bot moved this to Ready in NVIDIA Jan 19, 2026

pavanimajety merged commit 6101a26 into vllm-project:main Jan 19, 2026
53 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Jan 19, 2026

vadiklyutiy mentioned this pull request Jan 20, 2026

[Bug]: Nemotron-3-Nano is broken when using TRTLLM attention on Blackwell #32353

Closed

1 task

vadiklyutiy deleted the vadim/issue32353 branch January 20, 2026 13:54

This was referenced Jan 20, 2026

[BugFix] Fix is_strictly_contiguous assertion for decode_query in TRT… #32453

Closed

[Bug]: is_strictly_contiguous assertion fails in FlashInfer TRTLLM decode path on Blackwell for Scout #32452

Closed

robertgshaw2-redhat mentioned this pull request Feb 2, 2026

[Bug]: GPT-OSS with CPU KV cache offload break with FlashInfer #33572

Closed

1 task

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[BUGFIX] Fix degenerate strides in TRTLLM query tensors for FlashInfer backend. Fixes issue vllm-project#32353 (vllm-project#32417)

31a1482

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

vadiklyutiy mentioned this pull request Mar 12, 2026

[Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout #34158

Merged

pavanimajety merged 2 commits into vllm-project:main from CentML:vadim/issue32353

4 participants
