[MISC] Add strict contiguity check for FlashInfer attention tensors#32008
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Code Review
This pull request introduces an `is_strictly_contiguous` utility function to perform a stricter tensor contiguity check, addressing potential memory access issues in FlashInfer CUDA kernels caused by degenerate strides. The new check is correctly applied to the FlashInfer TRTLLM prefill attention path, and the implementation of the utility is robust. However, a similar vulnerability seems to exist in the TRTLLM decode path, which has not been addressed in this PR; I've added a critical comment to highlight this omission.
```python
else:
    assert self.o_sf_scale is None
    out = output[num_decode_tokens:]
```
Should we also check if out is contiguous?
```python
else:
    assert isinstance(attn_metadata.prefill, TRTLLMPrefill)
    # prefill_query may be non-contiguous
    prefill_query = prefill_query.contiguous()
```
I did some tests, and if a torch tensor's `is_contiguous()` returns True where `is_strictly_contiguous` returns False, `tensor.contiguous()` actually doesn't make it a strictly contiguous tensor. E.g.:
```python
t_base = torch.randn(16, 8, 32)
t2 = t_base.as_strided(size=(16, 1, 8, 32), stride=(256, 1, 32, 1))
t5 = t2.contiguous()
```
here, the result is:

```
t2 -> Shape: torch.Size([16, 1, 8, 32]), Stride: (256, 1, 32, 1)
      torch.is_contiguous(): True
      is_strictly_contiguous: False
      Expected canonical stride: (256, 256, 32, 1)

t5 -> Shape: torch.Size([16, 1, 8, 32]), Stride: (256, 1, 32, 1)
      torch.is_contiguous(): True
      is_strictly_contiguous: False
```
`t2.contiguous()` didn't actually convert it to a strictly contiguous tensor. So while the assertion works in detecting a non-contiguous tensor, the earlier `prefill_query.contiguous()` may be insufficient. Wondering if we need to do anything additional?
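For reference, a strict check along these lines can be sketched as follows. This is an illustration operating on plain shape/stride tuples, not the actual `vllm/utils/torch_utils.py` implementation (which takes a tensor directly and may differ in details):

```python
def is_strictly_contiguous(shape, strides):
    """True only if strides exactly match the canonical row-major
    (C-contiguous) layout, including size-1 dimensions."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if stride != expected:
            return False
        expected *= size
    return True

# For a torch tensor t, this corresponds to
# is_strictly_contiguous(tuple(t.shape), t.stride()).
```

Under this check, the degenerate layout `(256, 1, 32, 1)` for shape `(16, 1, 8, 32)` fails, because the canonical strides are `(256, 256, 32, 1)`.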
we may need to do:

```python
prefill_query_contiguous = torch.empty(
    prefill_query.shape, dtype=prefill_query.dtype, device=prefill_query.device
)
prefill_query_contiguous.copy_(prefill_query)
```
And similarly for the other places where we squeeze / unsqueeze / slice.
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Hi @vadiklyutiy, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
…llm-project#32008) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
…llm-project#32008) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Adds an early check for a potential error, as in #30842. See also #31617 and flashinfer-ai/flashinfer#2232.
Updates the FlashInfer attention path to use a stricter contiguity check, preventing potential CUDA kernel memory access issues.

Introduces an `is_strictly_contiguous()` utility to detect tensors with degenerate strides that PyTorch's `is_contiguous()` reports as contiguous.

Note: strengthens memory layout validation for FlashInfer TRTLLM attention.

- Adds `is_strictly_contiguous(t)` in `vllm/utils/torch_utils.py` to verify canonical contiguous strides and catch degenerate-stride tensors.
- Replaces `is_contiguous()` assertions with `is_strictly_contiguous()` in the `flashinfer.py` TRTLLM (HND) prefill and decode paths for `query`, `kv_cache_permute`, the workspace buffer, `block_tables`, and `seq_lens`.

Written by Cursor Bugbot for commit b1e334e.
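The summary above hinges on the gap between PyTorch's relaxed contiguity check and the strict one. A simplified pure-Python mimic of the relaxed rule (the helper name is illustrative and not part of the PR; empty-tensor edge cases are ignored):

```python
def torch_style_is_contiguous(shape, strides):
    """Simplified mimic of torch.Tensor.is_contiguous(): strides of
    size-1 dimensions are not constrained, which is why a degenerate
    layout like (256, 1, 32, 1) for shape (16, 1, 8, 32) still passes."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue  # torch ignores strides of size-1 dims
        if stride != expected:
            return False
        expected *= size
    return True
```

This is exactly the behavior the review thread demonstrates: the degenerate layout passes the relaxed check, so `.contiguous()` treats the tensor as already contiguous and returns it unchanged.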