[None][feat] retune causalConv1d fwd dispatch for varlen and short sequences by nv-guomingz · Pull Request #12739 · NVIDIA/TensorRT-LLM

nv-guomingz · 2026-04-03T14:54:17Z

Summary by CodeRabbit

Refactor
- Optimized causal convolution kernel dispatch to dynamically select optimal configurations based on input characteristics, improving performance for various sequence lengths and input patterns.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

nv-guomingz · 2026-04-03T14:57:25Z

/bot run

coderabbitai · 2026-04-03T14:57:58Z

📝 Walkthrough

Walkthrough

A CUDA kernel optimization that dispatches causal convolution forward operations between two thread configurations (64 vs 128 threads) based on input characteristics such as variable-length sequences and sequence length thresholds. Copyright year updated to 2026.

Changes

Cohort / File(s)	Summary
Causal Conv1D Kernel Dispatch Optimization `cpp/tensorrt_llm/kernels/causalConv1d/causalConv1d.cu`	Refactored `causal_conv1d_fwd_cuda` to use new `causal_conv1d_fwd_dispatch` helper that conditionally selects between 64-thread and 128-thread kernel launch configurations based on variable-length input detection and sequence length threshold (`params.seqlen <= 64 * kNElts`). Updated copyright year to 2026.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The PR description is entirely empty—all template sections (Description, Test Coverage, PR Checklist) contain only HTML comments with no actual content provided by the author.	Fill in the Description section explaining the motivation and technical approach; document relevant tests under Test Coverage; and ensure PR Checklist items are properly addressed.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: implementing tuned dispatch logic for causalConv1d kernels based on variable-length and short sequence detection.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

cpp/tensorrt_llm/kernels/causalConv1d/causalConv1d.cu (1)

356-359: Rename new constants to match repository constant naming convention.

Please rename the new constexpr constants to uppercase snakecase with k prefix for consistency with project rules.

Suggested diff

-    constexpr int kNarrowThreads = 64;
-    constexpr int kWideThreads = 128;
-    constexpr int kNElts = sizeof(input_t) == 4 ? 4 : 8;
-    constexpr int kShortSeqThreshold = kNarrowThreads * kNElts;
+    constexpr int kNARROW_THREADS = 64;
+    constexpr int kWIDE_THREADS = 128;
+    constexpr int kNELTS = sizeof(input_t) == 4 ? 4 : 8;
+    constexpr int kSHORT_SEQ_THRESHOLD = kNARROW_THREADS * kNELTS;
@@
-    bool const preferNarrowKernel = isVarlen || params.seqlen <= kShortSeqThreshold;
+    bool const preferNarrowKernel = isVarlen || params.seqlen <= kSHORT_SEQ_THRESHOLD;
@@
-        causal_conv1d_fwd_launch<kNarrowThreads, kWidth, input_t, weight_t>(params, stream);
+        causal_conv1d_fwd_launch<kNARROW_THREADS, kWidth, input_t, weight_t>(params, stream);
@@
-        causal_conv1d_fwd_launch<kWideThreads, kWidth, input_t, weight_t>(params, stream);
+        causal_conv1d_fwd_launch<kWIDE_THREADS, kWidth, input_t, weight_t>(params, stream);

As per coding guidelines, "C++ constants should use uppercase snakecase with prefix 'k': int const kDIGIT_NUM = 10;."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/causalConv1d/causalConv1d.cu` around lines 356 -
359, Rename the four new constexpr names to the project's
uppercase-snakecase-with-k-prefix convention: change kNarrowThreads ->
kNARROW_THREADS, kWideThreads -> kWIDE_THREADS, kNElts -> kN_ELTS (or kN_ELTS if
you prefer to preserve the "N" token), and kShortSeqThreshold ->
kSHORT_SEQ_THRESHOLD; update every usage/reference in the file (e.g., in kernels
or device code) to the new names so compilation succeeds and coding conventions
are followed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cpp/tensorrt_llm/kernels/causalConv1d/causalConv1d.cu`:
- Around line 356-359: Rename the four new constexpr names to the project's
uppercase-snakecase-with-k-prefix convention: change kNarrowThreads ->
kNARROW_THREADS, kWideThreads -> kWIDE_THREADS, kNElts -> kN_ELTS (or kN_ELTS if
you prefer to preserve the "N" token), and kShortSeqThreshold ->
kSHORT_SEQ_THRESHOLD; update every usage/reference in the file (e.g., in kernels
or device code) to the new names so compilation succeeds and coding conventions
are followed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f6384b4b-60ce-41a0-9307-b6679dfa7686

📥 Commits

Reviewing files that changed from the base of the PR and between 1045f38 and 171f3aa.

📒 Files selected for processing (1)

cpp/tensorrt_llm/kernels/causalConv1d/causalConv1d.cu

tensorrt-cicd · 2026-04-03T15:03:15Z

PR_Github #41689 [ run ] triggered by Bot. Commit: 171f3aa Link to invocation

tensorrt-cicd · 2026-04-03T16:28:02Z

PR_Github #41689 [ run ] completed with state SUCCESS. Commit: 171f3aa
/LLM/main/L0_MergeRequest_PR pipeline #32592 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

nv-guomingz · 2026-04-04T01:13:56Z

/bot run

tensorrt-cicd · 2026-04-04T01:19:17Z

PR_Github #41776 [ run ] triggered by Bot. Commit: 171f3aa Link to invocation

tensorrt-cicd · 2026-04-04T02:33:04Z

PR_Github #41776 [ run ] completed with state SUCCESS. Commit: 171f3aa
/LLM/main/L0_MergeRequest_PR pipeline #32671 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

nv-guomingz · 2026-04-04T13:31:53Z

/bot run

tensorrt-cicd · 2026-04-04T13:38:46Z

PR_Github #41827 [ run ] triggered by Bot. Commit: 171f3aa Link to invocation

tensorrt-cicd · 2026-04-04T13:38:47Z

PR_Github #41827 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

nv-guomingz · 2026-04-05T10:46:06Z

/bot run

tensorrt-cicd · 2026-04-05T10:51:54Z

PR_Github #41865 [ run ] triggered by Bot. Commit: 171f3aa Link to invocation

tensorrt-cicd · 2026-04-05T11:56:27Z

PR_Github #41865 [ run ] completed with state SUCCESS. Commit: 171f3aa
/LLM/main/L0_MergeRequest_PR pipeline #32731 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

nv-guomingz · 2026-04-06T04:17:55Z

/bot run

tensorrt-cicd · 2026-04-06T04:23:32Z

PR_Github #41903 [ run ] triggered by Bot. Commit: 171f3aa Link to invocation

tensorrt-cicd · 2026-04-06T05:30:34Z

PR_Github #41903 [ run ] completed with state SUCCESS. Commit: 171f3aa
/LLM/main/L0_MergeRequest_PR pipeline #32762 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

nv-guomingz · 2026-04-06T06:08:34Z

/bot run

tensorrt-cicd · 2026-04-06T06:15:16Z

PR_Github #41913 [ run ] triggered by Bot. Commit: b6e6ae0 Link to invocation

tensorrt-cicd · 2026-04-06T10:26:54Z

PR_Github #41913 [ run ] completed with state SUCCESS. Commit: b6e6ae0
/LLM/main/L0_MergeRequest_PR pipeline #32772 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

nv-guomingz · 2026-04-06T13:39:53Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-06T13:46:12Z

PR_Github #41932 [ run ] triggered by Bot. Commit: b6e6ae0 Link to invocation

tensorrt-cicd · 2026-04-06T17:26:34Z

PR_Github #41932 [ run ] completed with state SUCCESS. Commit: b6e6ae0
/LLM/main/L0_MergeRequest_PR pipeline #32789 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

…quences (NVIDIA#12739) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

github-actions bot assigned nv-guomingz Apr 3, 2026

coderabbitai bot reviewed Apr 3, 2026

View reviewed changes

nv-guomingz changed the title ~~retune causalConv1d fwd dispatch for varlen and short sequences~~ [None][feat] retune causalConv1d fwd dispatch for varlen and short sequences Apr 3, 2026

nv-guomingz mentioned this pull request Apr 3, 2026

[None][feat] Optimize GDN of Qwen3-Next/3.5; adds BF16 TRTLLM MoE #12557

Open

1 task

nv-guomingz requested a review from rosenrodt April 4, 2026 01:13

rosenrodt approved these changes Apr 5, 2026

View reviewed changes

retune causalConv1d fwd dispatch for varlen and short sequences

b6e6ae0

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

nv-guomingz force-pushed the user/guomingz/causalconv1d-varlen-tuning branch from 171f3aa to b6e6ae0 Compare April 6, 2026 06:08

nv-guomingz requested a review from Wanli-Jiang April 7, 2026 02:21

Wanli-Jiang approved these changes Apr 7, 2026

View reviewed changes

nv-guomingz merged commit a03aeb1 into NVIDIA:main Apr 7, 2026
5 checks passed

Conversation

nv-guomingz commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

nv-guomingz commented Apr 3, 2026

Uh oh!

coderabbitai bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented Apr 3, 2026

Uh oh!

tensorrt-cicd commented Apr 3, 2026

Uh oh!

nv-guomingz commented Apr 4, 2026

Uh oh!

tensorrt-cicd commented Apr 4, 2026

Uh oh!

tensorrt-cicd commented Apr 4, 2026

Uh oh!

nv-guomingz commented Apr 4, 2026

Uh oh!

tensorrt-cicd commented Apr 4, 2026

Uh oh!

tensorrt-cicd commented Apr 4, 2026

Uh oh!

nv-guomingz commented Apr 5, 2026

Uh oh!

tensorrt-cicd commented Apr 5, 2026

Uh oh!

tensorrt-cicd commented Apr 5, 2026

Uh oh!

nv-guomingz commented Apr 6, 2026

Uh oh!

tensorrt-cicd commented Apr 6, 2026

Uh oh!

tensorrt-cicd commented Apr 6, 2026

Uh oh!

nv-guomingz commented Apr 6, 2026

Uh oh!

tensorrt-cicd commented Apr 6, 2026

Uh oh!

tensorrt-cicd commented Apr 6, 2026

Uh oh!

nv-guomingz commented Apr 6, 2026

Uh oh!

tensorrt-cicd commented Apr 6, 2026

Uh oh!

tensorrt-cicd commented Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

nv-guomingz commented Apr 3, 2026 •

edited

Loading

coderabbitai bot commented Apr 3, 2026 •

edited

Loading