
[BugFix] Fix is_strictly_contiguous assertion for decode_query in TRT…#32453

Closed
luccafong wants to merge 2 commits into vllm-project:main from luccafong:fix-strict-contiguous-decode-query

Conversation

Collaborator

@luccafong luccafong commented Jan 16, 2026

Purpose

Fix the strict-contiguity assertion failure for decode_query in the FlashInfer TRTLLM decode path on Blackwell GPUs, reported in issue #32452.

The issue: decode_query may have a non-contiguous memory layout, and calling .contiguous() alone does not always produce the canonical strides required by the TRTLLM kernels.

The fix: use .contiguous().reshape(shape) to ensure a strictly contiguous layout with canonical strides.
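To illustrate the difference between "contiguous" and "strictly contiguous", here is a minimal pure-Python sketch. It is not vLLM's or FlashInfer's code: the helper names and the example shape are hypothetical, and it assumes the mismatch comes from size-1 dimensions, whose strides a relaxed contiguity check (such as PyTorch's `is_contiguous()`) does not compare. Whether that is the exact trigger in this issue is not confirmed here.

```python
def canonical_strides(shape):
    """Row-major strides (in elements) of a densely packed tensor."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= max(size, 1)
    return tuple(reversed(strides))


def is_contiguous_relaxed(shape, strides):
    """Relaxed check: the stride of a size-1 dim is never used, so skip it."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue
        if stride != expected:
            return False
        expected *= size
    return True


def is_strictly_contiguous(shape, strides):
    """Strict check: every stride must equal its canonical value."""
    return tuple(strides) == canonical_strides(shape)


# Hypothetical layout: (tokens, 1, head_dim) with an odd stride on the size-1 dim.
shape = (4, 1, 128)
strides = (128, 1, 1)

relaxed_ok = is_contiguous_relaxed(shape, strides)   # True
strict_ok = is_strictly_contiguous(shape, strides)   # False
fixed_strides = canonical_strides(shape)             # (128, 128, 1)
```

Under this assumption, a tensor that already passes the relaxed check can be returned unchanged by `.contiguous()`, while a following reshape to the same shape is what the PR relies on to obtain the canonical strides.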

Test Plan

Run repro in the issue

Test Result

No errors

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…LLM path

Fix decode_query strict contiguity assertion failure in FlashInfer
TRTLLM decode path on Blackwell GPUs.

The issue: decode_query may have non-contiguous memory layout, and
calling .contiguous() alone doesn't always produce canonical strides
required by TRTLLM kernels.

The fix: Use .contiguous().reshape(shape) to ensure strictly contiguous
layout with canonical strides.

Signed-off-by: Lu Fang <fanglu@fb.com>
@luccafong luccafong requested review from hmellor and removed request for mgoin and pavanimajety January 16, 2026 03:57
@mergify mergify bot added the nvidia, v1, and bug labels Jan 16, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a potential runtime assertion failure by ensuring decode_query is strictly contiguous. The fix, which involves using .contiguous().reshape(), is correct. However, the same potential issue exists for prefill_query in a different part of the code which this PR does not address. I've added a specific comment with a suggestion to apply the same fix for prefill_query to ensure robustness and prevent similar assertion failures.

@luccafong luccafong requested a review from benchislett January 16, 2026 03:59
Signed-off-by: Lu Fang <fanglu@fb.com>
@benchislett
Collaborator

I think this is a duplicate PR

@robertgshaw2-redhat
Collaborator

What causes the non-contiguous layout to suddenly occur?

@mergify

mergify bot commented Jan 19, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @luccafong.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 19, 2026
@mgoin
Member

mgoin commented Jan 20, 2026

Resolved by #32417

@mgoin mgoin closed this Jan 20, 2026
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Jan 20, 2026
