
Update trtllm FMHA cubins #3317

Merged
jimmyzho merged 1 commit into flashinfer-ai:main from djmmoss:dmoss/fmha-trtllm-public-cubins on May 19, 2026


djmmoss (Collaborator) commented May 14, 2026

📌 Description

Updates the trtllm FMHA artifact path and checksum to the newer cubins. Aligns the FMHA parameter ABI expected by those cubins and uses dense mask selection for MLA decode generation kernels.

🔍 Related Issues

None.

🧪 Tests

  • pre-commit run --all-files
  • pytest tests/attention/test_cute_dsl_mla_decode.py::test_cute_dsl_vs_trtllm_gen[True-128-1] tests/attention/test_trtllm_ragged_kv_stride.py -q -ra --tb=short
  • every fifth collected item from the tracked tests/attention/*.py files, after filtering with -k trtllm: 18314 passed, 22979 skipped, 342803 deselected, 686 warnings
  • H64 BF16 Q + FP8 KV ULP sweep: 7/7 PASS, all mismatches 0
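The PR does not define what counts as a mismatch in the ULP sweep. As a hedged illustration only, assuming a mismatch is an output element whose BF16 value lies more than a chosen number of ULPs (units in the last place) away from a reference value, the counting could look like this. The helper names and the maxUlp tolerance are hypothetical, and the FP32-to-BF16 conversion truncates rather than rounds to keep the sketch short.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: keep the upper 16 bits of an FP32 value (truncation to BF16).
std::uint16_t toBf16Bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return static_cast<std::uint16_t>(u >> 16);
}

// Map sign-magnitude BF16 bits onto a monotonic integer line so that adjacent
// representable values differ by exactly 1 and +0 equals -0.
std::int32_t orderedBf16(std::uint16_t bits) {
  std::int32_t magnitude = bits & 0x7FFF;
  return (bits & 0x8000) ? -magnitude : magnitude;
}

// ULP distance between two BF16 values (NaN handling omitted).
std::int32_t ulpDistance(std::uint16_t a, std::uint16_t b) {
  return std::abs(orderedBf16(a) - orderedBf16(b));
}

// Count output elements whose ULP distance to the reference exceeds maxUlp.
std::size_t countMismatches(const std::uint16_t* out, const std::uint16_t* ref,
                            std::size_t n, std::int32_t maxUlp) {
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (ulpDistance(out[i], ref[i]) > maxUlp) ++mismatches;
  }
  return mismatches;
}
```

Under this reading, "all mismatches 0" would mean countMismatches returned 0 for every configuration in the sweep.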

Reviewer Notes

None.

coderabbitai Bot (Contributor) commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83c82cde-4a6c-4c46-a415-63e89f224208

📥 Commits

Reviewing files that changed from the base of the PR and between 9113c91 and a6b9087.

📒 Files selected for processing (3)
  • csrc/trtllm_fmha_kernel_launcher.cu
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/kernelParams.h
✅ Files skipped from review due to trivial changes (1)
  • flashinfer/artifacts.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • csrc/trtllm_fmha_kernel_launcher.cu
  • include/flashinfer/trtllm/fmha/kernelParams.h

📝 Walkthrough

This PR (1) uses an is_mla_decode flag to gate sparse MLA and, in generation mode, selects the Dense mask for MLA decode (Causal otherwise), (2) adjusts the KernelParams TMA descriptors and ABI layout and adds K/V TMA reshape validation plus a persisted reshape factor, and (3) updates the TRTLLM-FMHA artifact path and checksum.
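Point (1) can be sketched in a few lines of standalone C++. This is a minimal sketch that relies only on what the walkthrough states: the head-dim pairs 576/512 and 320/256 identify MLA decode, sparse MLA additionally requires that flag, and generation mode picks Dense for MLA decode and Causal otherwise. It assumes the first number of each pair is the QK head dim and the second the V head dim; the enum, the sparseRequested flag, and all function names are hypothetical, not the launcher's actual code.

```cpp
#include <cstdint>

// Hypothetical mask enum; the launcher stores its choice in runner_params.mMaskType.
enum class MaskType : std::uint8_t { Dense, Causal };

// Hypothetical helper: MLA decode is recognized purely by the (QK, V) head-dim pair.
constexpr bool isMlaDecode(int headDimQk, int headDimV) {
  return (headDimQk == 576 && headDimV == 512) ||
         (headDimQk == 320 && headDimV == 256);
}

// Sparse MLA is enabled only when requested AND the shapes indicate MLA decode.
constexpr bool useSparseMla(bool sparseRequested, int headDimQk, int headDimV) {
  return sparseRequested && isMlaDecode(headDimQk, headDimV);
}

// Generation-mode mask: Dense for MLA decode, Causal for everything else.
constexpr MaskType generationMaskType(int headDimQk, int headDimV) {
  return isMlaDecode(headDimQk, headDimV) ? MaskType::Dense : MaskType::Causal;
}
```

Computing the flag in one helper and reusing it at both call sites is one way to avoid the duplicated MLA-detection logic that the Gemini review flags.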

Changes

TRTLLM FMHA MLA Decode and TMA Reshape Updates

MLA-decode detection and generation mask selection (csrc/trtllm_fmha_kernel_launcher.cu): Introduced is_mla_decode for the head-dim pairs 576/512 and 320/256. Sparse-MLA gating now requires is_mla_decode, and in generation mode runner_params.mMaskType is Dense when is_mla_decode is set and Causal otherwise.

KernelParams TMA descriptor and ABI layout changes (include/flashinfer/trtllm/fmha/kernelParams.h): Removed the sliding-window KV pool map, added tmaOSf_, added mReservedAttentionWindowState[2], added mReshapeFactorKv, and removed the ptrSparseMlaTopKLens field.

K/V TMA reshape validation and persisted reshape factor (include/flashinfer/trtllm/fmha/kernelParams.h): Added a templated helper, canUseTmaKvReshape(...), that checks the descriptor stride against the physical head dim under element packing. canReshapeTmaKv now requires this check to pass for both K and V, and the computed reshapeFactorKv is stored in params.mReshapeFactorKv.

Artifact path and checksum update (flashinfer/artifacts.py): Updated the artifactory subpath hash in ArtifactPath.TRTLLM_GEN_FMHA and the CheckSumHash.TRTLLM_GEN_FMHA checksum to new values.
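The K/V TMA reshape check described above can be illustrated with a small standalone sketch. It assumes that "element packing" refers to sub-byte types (for example, two FP4 values per byte), that strides are measured in bytes, and that a reshape is only safe when head rows sit back to back, i.e. when the descriptor stride equals the packed head-row size. It also gives K and V their own element widths, as the Gemini review suggests. All names, and the convention that a factor of 1 means "no reshape", are hypothetical; this is not the code in kernelParams.h.

```cpp
#include <cstdint>

// Hypothetical description of one TMA operand (K or V).
struct TmaOperand {
  std::int64_t strideBytes;  // distance between consecutive head rows, in bytes
  int headDim;               // logical elements per head row
  int bitsPerElement;        // 16 for BF16, 8 for FP8, 4 for packed FP4, ...
};

// Bytes one head row occupies after packing, rounded up to whole bytes.
constexpr std::int64_t physicalHeadBytes(const TmaOperand& op) {
  return (static_cast<std::int64_t>(op.headDim) * op.bitsPerElement + 7) / 8;
}

// Hypothetical analogue of canUseTmaKvReshape: rows can only be folded
// together by a reshaped TMA box if there is no padding between them.
constexpr bool canUseTmaReshape(const TmaOperand& op) {
  return op.strideBytes == physicalHeadBytes(op);
}

struct HypotheticalParams {
  int mReshapeFactorKv = 1;  // assumption: 1 means "no reshape"
};

// Require the check for both K and V, then persist the factor computed elsewhere.
void setReshapeFactor(HypotheticalParams& params, bool baseConditions,
                      const TmaOperand& k, const TmaOperand& v,
                      int reshapeFactorKv) {
  const bool canReshapeTmaKv =
      baseConditions && canUseTmaReshape(k) && canUseTmaReshape(v);
  params.mReshapeFactorKv = canReshapeTmaKv ? reshapeFactorKv : 1;
}
```

Checking K and V separately matters once their element widths can differ; a shared data type would accept a padded V layout whenever K happens to be dense.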

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

run-ci, op: attention

Suggested reviewers

  • sricketts
  • aleozlx
  • yzh119
  • cyx-6
  • yongwww

Poem

🐇 A rabbit hops through kernel rows with glee,
It marks the MLA pairs and sets the mask to be,
Reshapes KV safely, stores the factor near,
Updates artifact hashes — the new builds cheer,
Hooray for tiny hops that keep inference clear.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): Docstring coverage is 40.00%, which is below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Title check (✅ Passed): The title accurately summarizes the main change (updating TRTLLM FMHA cubins), which is the primary objective of the PR.
Description check (✅ Passed): The PR description covers the main changes and includes a testing section with specific test results, but the optional pre-commit and unit test checklist items are not explicitly marked.
Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request updates the TRT-LLM FMHA kernel launcher and parameters to align with newer cubin ABIs, including support for output scaling factors and refined TMA box reshaping logic. Key changes involve adding tmaOSf_ and mReshapeFactorKv to the KernelParams struct and implementing conditional mask type selection for MLA decode. Review feedback identifies a critical omission where the new tmaOSf_ descriptor is not initialized in setKernelParams, which could lead to undefined behavior. Additionally, it is recommended to refactor duplicated MLA detection logic for better maintainability and to use specific K/V data types when checking for TMA reshape compatibility to improve robustness.

// TMA descriptor for V.
CUtensorMap tmaV_;
// TMA descriptor for output scaling factor.
CUtensorMap tmaOSf_;

Priority: high

The new tmaOSf_ member is added to the KernelParams struct to align with the newer cubin ABI, but it is not initialized in the setKernelParams function. If the newer cubins expect a valid TMA descriptor for output scaling factors (e.g., when performing FP4 quantization on output), this will lead to undefined behavior or crashes as the descriptor will be all zeros. Please add the necessary logic in setKernelParams to build the TMA descriptor for tmaOSf_ when options.oSfPtr is provided, similar to how tmaKSf_ and tmaVSf_ are handled.
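A minimal sketch of the guard the reviewer describes, using stand-in types so it compiles without cuda.h: the output scaling-factor descriptor is built only when an output SF pointer is present and otherwise stays zeroed, which is the state the review warns a kernel must never read. FakeTensorMap, encodeSfDescriptor, isZeroed, and the options/params structs are hypothetical. Real code would fill the descriptor through a CUDA driver encoder such as cuTensorMapEncodeTiled, with proper shapes, strides, and box sizes.

```cpp
#include <cstdint>

// Stand-in for CUtensorMap, which cuda.h defines as an opaque block of 64-bit words.
struct FakeTensorMap {
  std::uint64_t opaque[16];
};

// Hypothetical subset of the launcher options.
struct HypotheticalOptions {
  const void* oSfPtr = nullptr;  // output scaling factors (e.g. FP4 output); may be absent
};

// Hypothetical subset of KernelParams.
struct HypotheticalKernelParams {
  FakeTensorMap tmaOSf_;
};

// Placeholder "encoder": records only the base address. A real implementation
// would encode shape, strides, box size, swizzle, etc. via the CUDA driver API.
FakeTensorMap encodeSfDescriptor(const void* base) {
  FakeTensorMap map{};
  map.opaque[0] = reinterpret_cast<std::uintptr_t>(base);
  return map;
}

bool isZeroed(const FakeTensorMap& map) {
  for (std::uint64_t word : map.opaque) {
    if (word != 0) return false;
  }
  return true;
}

HypotheticalKernelParams setKernelParamsSketch(const HypotheticalOptions& options) {
  HypotheticalKernelParams params{};  // value-initialized: every descriptor word is zero
  if (options.oSfPtr != nullptr) {
    params.tmaOSf_ = encodeSfDescriptor(options.oSfPtr);
  }
  return params;
}
```

The zero-initialization is still useful as a default, but it is only safe if every kernel that reads tmaOSf_ is selected exclusively when oSfPtr is set.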

PerkzZheng (Contributor) commented May 20, 2026

@djmmoss I am not quite sure why these changes were made. Please add me as a reviewer next time. Thanks!
No worries, though; I will revert them in my MR.

Outdated comment threads: csrc/trtllm_fmha_kernel_launcher.cu, include/flashinfer/trtllm/fmha/kernelParams.h
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/flashinfer/trtllm/fmha/kernelParams.h`:
- Around lines 703-706: The code block defining the canReshapeTmaKv boolean is
misformatted; run clang-format (e.g., pre-commit run clang-format --all-files)
to reformat this declaration so spacing and line breaks follow project style for
the canReshapeTmaKv initializer that references isPagedKv(options.mQkvLayout),
options.mHeadDimQk, swizzleKv, canUseTmaKvReshape(options,
kernelMeta.mDataTypeKv, /*isK*/ true/false), and ensure the file
include/flashinfer/trtllm/fmha/kernelParams.h is committed after formatting.

ℹ️ Review info

Run ID: 12a74a50-4c9b-43b3-90f4-2ca34f9b744f

📥 Commits

Reviewing files that changed from the base of the PR and between f6d6bd4 and ac48b6e.

📒 Files selected for processing (3)
  • csrc/trtllm_fmha_kernel_launcher.cu
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/kernelParams.h

Outdated comment thread: include/flashinfer/trtllm/fmha/kernelParams.h
djmmoss force-pushed the dmoss/fmha-trtllm-public-cubins branch from ac48b6e to 9113c91 on May 14, 2026 00:27
djmmoss changed the title from "Update TRTLLM FMHA public cubins" to "Update trtllm FMHA public cubins" on May 14, 2026
djmmoss changed the title from "Update trtllm FMHA public cubins" to "Update trtllm FMHA cubins" on May 14, 2026
Point FMHA at the newer public trtllm cubin publish, align the FMHA parameter ABI, and use dense mask selection for MLA decode kernels.
djmmoss force-pushed the dmoss/fmha-trtllm-public-cubins branch from 9113c91 to a6b9087 on May 14, 2026 23:55
djmmoss requested a review from dhiraj113 as a code owner on May 14, 2026 23:55
djmmoss (Collaborator, Author) commented May 15, 2026

/bot run

flashinfer-bot (Collaborator)

GitLab MR !677 has been created, and the CI pipeline #51413113 is currently running. I'll report back once the pipeline job completes.

djmmoss (Collaborator, Author) commented May 16, 2026

All tests are green; can we get this one in? @yzh119 @aleozlx

jimmyzho (Contributor) left a comment

LGTM

jimmyzho merged commit 9035311 into flashinfer-ai:main on May 19, 2026 (31 checks passed)

Labels

None yet

Projects

None yet

4 participants