feat: update trtllm-gen bmm cubins #3045

Draft
samuellees wants to merge 2 commits into flashinfer-ai:main from samuellees:feat/update-tg-bmm-cubins

Collaborator

@samuellees samuellees commented Apr 13, 2026

Fix #2671

Thanks a lot to @bkryu, @nekorobov, and David Clark.

📌 Description

Update the TRT-LLM Gen batched GEMM cubins to improve performance.

🔍 Related Issues

#2671

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Chores
    • Updated TRT-LLM batch matrix multiplication artifacts used by the application.
    • Updated corresponding verification checksums for those artifacts to ensure downloads validate correctly.
    • These maintenance updates improve stability and integrity of artifact retrieval during installs and updates.

Contributor

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6d04f934-cdca-4218-b361-442e73fcc04c

📥 Commits

Reviewing files that changed from the base of the PR and between ca4738d and 7b2b535.

📒 Files selected for processing (1)
  • flashinfer/artifacts.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/artifacts.py

📝 Walkthrough

Walkthrough

Updated the remote artifact subdirectory identifier and its SHA256 checksum for the TRTLLM batched_gemm (BMM) artifact constants used to construct download paths and checksum verification.

Changes

Cohort / File(s) | Summary
Artifact constants (flashinfer/artifacts.py) | Replaced ArtifactPath.TRTLLM_GEN_BMM remote subdirectory identifier and updated CheckSumHash.TRTLLM_GEN_BMM SHA256 to match the new batched_gemm artifact set; affects download target path construction and checksum verification.
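
For readers unfamiliar with why the path and the checksum have to change together, here is a minimal sketch of pinned-checksum verification. The helper names (`sha256_hex`, `verify_artifact`) and the stand-in payload are hypothetical and are not taken from flashinfer/artifacts.py; the sketch only assumes that a download is accepted when its SHA256 digest equals the pinned value.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Hypothetical helper: accept the bytes only if they match the pinned digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(data), expected_sha256.strip().lower())


# Stand-in payload and checksum; the real artifact and hash are not reproduced here.
payload = b"stand-in artifact bytes"
pinned = sha256_hex(payload)

print(verify_artifact(payload, pinned))         # unchanged bytes
print(verify_artifact(payload + b"x", pinned))  # bytes that differ from the pinned set
```

Under this assumption, pointing the path at a new artifact set without updating the pinned hash would make every download fail verification, which is why the two constants are updated in one change.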

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

Suggested labels

op: gemm

Suggested reviewers

  • bkryu
  • aleozlx
  • cyx-6
  • yongwww
  • nv-yunzheq
  • kahyunnam
  • yzh119

Poem

🐰 A tiny hop for bytes and bits,
New paths and sums in tidy fits,
The BMM bundle found its way,
Checksums snug — hip-hop hooray! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main change: updating TRT-LLM Gen BMM (Batched GEMM) cubins, which matches the code changes modifying artifact paths and checksums.
Description check | ✅ Passed | The description follows the template with sections for Description, Related Issues, and Pre-commit Checklist. However, the Pre-commit section is incomplete: hooks were not run manually, and test status is missing.
Linked Issues check | ✅ Passed | The PR updates BMM cubins to address the performance regression in issue #2671 for trtllm_fp4_block_scale_moe with mxfp4_bf16, which aligns with the stated objective.
Out of Scope Changes check | ✅ Passed | The PR changes only artifact paths and checksums in flashinfer/artifacts.py, which are directly related to updating BMM cubins per the linked issue requirements.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@samuellees
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !538 has been created, and the CI pipeline #48406843 is currently running. I'll report back once the pipeline job completes.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the artifact path and checksum for TRTLLM_GEN_BMM in flashinfer/artifacts.py. A review comment suggests adding a trailing slash to the updated path to maintain consistency with other artifact definitions in the class.
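
The review frames the trailing slash as a consistency point. As an illustration of one way a missing slash can change a resolved location, here is a small sketch using `urllib.parse.urljoin`. The base URL, subdirectory strings, and file name are made up, and whether FlashInfer joins its paths this way is an assumption that this thread does not confirm.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration only.
BASE = "https://example.invalid/artifacts/"
SUBDIR_NO_SLASH = "abc123/batched_gemm"
SUBDIR_WITH_SLASH = "abc123/batched_gemm/"


def file_url(subdir: str, filename: str) -> str:
    """Resolve filename relative to BASE + subdir using RFC 3986 reference resolution."""
    return urljoin(urljoin(BASE, subdir), filename)


# Without the trailing slash, the last path segment is treated as a file
# and is replaced by the new name.
print(file_url(SUBDIR_NO_SLASH, "checksums.txt"))
# → https://example.invalid/artifacts/abc123/checksums.txt

# With the trailing slash, the name is appended inside the directory.
print(file_url(SUBDIR_WITH_SLASH, "checksums.txt"))
# → https://example.invalid/artifacts/abc123/batched_gemm/checksums.txt
```

Plain string concatenation would fail differently (the two segments would run together), so a uniform trailing-slash convention guards against both cases.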

Comment thread flashinfer/artifacts.py Outdated
@samuellees
Collaborator Author

@flashinfer-bot rerun

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@samuellees samuellees marked this pull request as draft April 13, 2026 15:41

Development

Successfully merging this pull request may close these issues.

[Bug] trtllm_fp4_block_scale_moe mxfp4_bf16 performance regression between 0.6.0 and 0.6.4

2 participants