[None][feat] reuse triton slicing kernel for GDN prefill transpose #12737
Conversation
📝 Walkthrough
The changes refactor tensor transposition and slicing logic for prefill tokens by introducing a new utility function.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_qwen3_next.py (1)
701-709: ⚠️ Potential issue | 🟠 Major
Explicitly copy back the decode result to ensure correctness across all backends.

Lines 701–709 assign the decode result to `mixed_qkv_d` without explicitly copying it back to the split view, whereas the prefill result is explicitly copied at line 709 via `mixed_qkv_p.copy_(...)`. This asymmetry creates a latent bug: if `causal_conv1d_update` materializes a new tensor (as the Triton backend does), the parent `mixed_qkv` tensor retains stale decode rows, which corrupts the subsequent Q/K/V split.
Suggested fix with data_ptr guard:

```diff
-    mixed_qkv_d = causal_conv1d_update(
+    mixed_qkv_d_out = causal_conv1d_update(
         mixed_qkv_d,
         conv_states_to_use,
         self.conv1d.weight,
         self.conv1d.bias,
         activation=self.activation,
         conv_state_indices=state_indices_d,
     )
+    if mixed_qkv_d_out.data_ptr() != mixed_qkv_d.data_ptr():
+        mixed_qkv_d.copy_(mixed_qkv_d_out)
     mixed_qkv_p.copy_(mixed_qkv_p_t.transpose(0, 1))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_qwen3_next.py` around lines 701-709: the decode path may materialize a new tensor in causal_conv1d_update, causing mixed_qkv's decode rows to remain stale; mirror the prefill handling by explicitly copying the updated decode data back into the split view. After calling causal_conv1d_update (for mixed_qkv_d), perform an in-place copy into the original decode view (the same way mixed_qkv_p.copy_(...) is used) and optionally guard with a data_ptr comparison between the returned tensor and the destination to avoid redundant copies; update references around mixed_qkv_d, mixed_qkv_p.copy_, causal_conv1d_update and mixed_qkv accordingly.
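The copy-back-with-guard pattern described above can be illustrated outside of torch. Below is a minimal NumPy sketch (toy array names, not the model's actual tensors; `np.shares_memory` stands in for torch's `data_ptr()` comparison):

```python
import numpy as np

# Parent buffer holding decode rows alongside other rows (toy sizes).
mixed_qkv = np.zeros((4, 3))
mixed_qkv_d = mixed_qkv[:2]          # decode rows: a view into the parent

# Simulate a backend that materializes a NEW array instead of writing
# in place (as the review says the Triton causal_conv1d_update does).
result = mixed_qkv_d + 1.0

# Guarded copy-back: only copy when the result is a fresh buffer,
# so the parent tensor sees the updated decode rows.
if not np.shares_memory(result, mixed_qkv_d):
    mixed_qkv_d[...] = result
```

Without the copy-back, `mixed_qkv[:2]` would still hold the stale zeros even though `result` carries the updated values, which is exactly the latent bug the reviewer flags.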
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py`:
- Around line 54-80: The store-side offset computation in the Triton kernel
_extract_transpose_prefill_kernel is done in 32-bit arithmetic and can overflow
when width * num_prefill_tokens exceeds 2,147,483,647; mirror the fix used for
src_offsets by casting the operands to tl.int64 (or performing the
multiplication in tl.int64) before computing dst_offsets and before any
subsequent index arithmetic or calls to tl.store so the write addresses cannot
overflow. Locate the dst_offsets/dst_ptr computation inside
_extract_transpose_prefill_kernel and change the multiplication/additions to use
tl.int64 (e.g., cast row/col/width or the product) and ensure the tl.store uses
the 64-bit offset variable.
---
Outside diff comments:
In `@tensorrt_llm/_torch/models/modeling_qwen3_next.py`:
- Around line 701-709: The decode path may materialize a new tensor in
causal_conv1d_update causing mixed_qkv's decode rows to remain stale; mirror the
prefill handling by explicitly copying the updated decode data back into the
split view. After calling causal_conv1d_update (for mixed_qkv_d), perform an
in-place copy into the original decode view (the same way mixed_qkv_p.copy_(...)
is used) and optionally guard with a data_ptr comparison between the returned
tensor and the destination to avoid redundant copies; update references around
mixed_qkv_d, mixed_qkv_p.copy_, causal_conv1d_update and mixed_qkv accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 124c4bc8-c452-4797-af80-6edb29df5311
📒 Files selected for processing (2)
- tensorrt_llm/_torch/models/modeling_qwen3_next.py
- tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py
Force-pushed: 1f78c5a to d522412
/bot run --add-multi-gpu-test

PR_Github #41693 [ run ] triggered by Bot. Commit:

PR_Github #41693 [ run ] completed with state

/bot run

PR_Github #41775 [ run ] triggered by Bot. Commit:

PR_Github #41775 [ run ] completed with state
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Force-pushed: d522412 to 114834c
/bot run

PR_Github #41867 [ run ] triggered by Bot. Commit:

PR_Github #41867 [ run ] completed with state

/bot run
rosenrodt left a comment
LGTM as long as tests pass
PR_Github #41886 [ run ] triggered by Bot. Commit:

PR_Github #41886 [ run ] completed with state
…VIDIA#12737) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- [ ] PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- [ ] PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- [ ] Test cases are provided for new code paths (see test instructions).
- [ ] Any new dependencies have been scanned for license and vulnerabilities.
- [ ] CODEOWNERS updated if ownership changes.
- [ ] Documentation updated as needed.
- [ ] Update tava architecture diagram if there is a significant design change in PR.
- [ ] The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.