[BugFix] Fix is_strictly_contiguous assertion for decode_query in TRT…#32453
Closed
luccafong wants to merge 2 commits intovllm-project:mainfrom
…LLM path
Fix decode_query strict contiguity assertion failure in FlashInfer TRTLLM decode path on Blackwell GPUs.
The issue: decode_query may have non-contiguous memory layout, and calling .contiguous() alone doesn't always produce canonical strides required by TRTLLM kernels.
The fix: Use .contiguous().reshape(shape) to ensure strictly contiguous layout with canonical strides.
Signed-off-by: Lu Fang <fanglu@fb.com>
Contributor
Code Review
This pull request addresses a potential runtime assertion failure by ensuring decode_query is strictly contiguous. The fix, which involves using .contiguous().reshape(), is correct. However, the same potential issue exists for prefill_query in a different part of the code which this PR does not address. I've added a specific comment with a suggestion to apply the same fix for prefill_query to ensure robustness and prevent similar assertion failures.
Signed-off-by: Lu Fang <fanglu@fb.com>
Collaborator
I think this is a duplicate PR.
Collaborator
What causes the non-contiguous layout to suddenly occur?
This pull request has merge conflicts that must be resolved before it can be |
Member
Resolved by #32417
Purpose
Fix the decode_query strict contiguity assertion failure in the FlashInfer TRTLLM decode path on Blackwell GPUs, reported in issue #32452.
The issue: decode_query may have non-contiguous memory layout, and calling .contiguous() alone doesn't always produce canonical strides required by TRTLLM kernels.
The fix: Use .contiguous().reshape(shape) to ensure strictly contiguous layout with canonical strides.
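To illustrate the distinction this description relies on, here is a minimal pure-Python sketch of how a layout can pass a relaxed contiguity check while still failing a strict one. It assumes the relaxed check ignores the stride of size-1 dimensions (which, as far as I understand, is how the usual tensor contiguity check behaves) and that "strictly contiguous" means the strides equal the canonical row-major strides. The helper names `canonical_strides`, `relaxed_contiguous`, and `strictly_contiguous` are hypothetical and are not the vLLM or FlashInfer implementations.

```python
from typing import Sequence, Tuple


def canonical_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a densely packed tensor."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= max(size, 1)
    return tuple(reversed(strides))


def relaxed_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Assumed relaxed rule: strides of size-1 dims are not checked."""
    if 0 in shape:
        return True  # assumption: empty tensors count as contiguous
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue  # any stride is accepted for a size-1 dimension
        if stride != expected:
            return False
        expected *= size
    return True


def strictly_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Assumed strict rule: strides must match the canonical ones exactly."""
    return tuple(strides) == canonical_strides(shape)


# Hypothetical decode_query-like layout: the middle dim has size 1 and
# carries a leftover stride from an earlier view or slice.
shape = (4, 1, 8)
leftover = (8, 1, 1)

print(canonical_strides(shape))             # (8, 8, 1)
print(relaxed_contiguous(shape, leftover))  # True
print(strictly_contiguous(shape, leftover)) # False
```

Under these assumptions, a layout like `leftover` already counts as contiguous, so a plain contiguity conversion would have nothing to copy and the odd stride would survive; the follow-up reshape to the same shape is what the fix relies on to get canonical strides.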
Test Plan
Run the repro from the issue.
Test Result
No errors
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.