chore: Update CODEOWNERS by flashinfer-bot · Pull Request #1984 · flashinfer-ai/flashinfer

flashinfer-bot · 2025-10-27T01:11:39Z

Summary

This PR updates the CODEOWNERS file based on git commit history analysis from the last 180 days.

Changes

Updated .github/CODEOWNERS with current code ownership based on:
- Commit frequency
- File coverage
- Commit recency

How to Review

Review the changes to .github/CODEOWNERS
Verify that the assigned owners are appropriate for each module
Make manual adjustments if needed before merging

Notes

This is an automated PR generated weekly
Minimum commits threshold: 1
Analysis period: 180 days
Directory depth: 3 levels
Top N owners per module: 5

🤖 This PR was automatically generated by the update-codeowners workflow

Summary by CodeRabbit

Chores
- Updated code ownership assignments and reorganized related section mappings for internal development processes.

coderabbitai · 2025-10-27T01:11:49Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The .github/CODEOWNERS file was edited to reassign, add/remove, and reorder owners across many path blocks (notably multiple csrc/*, nv_internal/*, csrc/xqa, and several flashinfer/* paths). No code changes beyond ownership mappings.

Changes

Cohort / File(s)	Summary
Primary CODEOWNERS entries `.github/CODEOWNERS`	Reordered and updated owner lists for core native/back-end paths: `csrc/fused_moe/`, `csrc/fused_moe/cutlass_backend/`, `csrc/nv_internal/`, `csrc/nv_internal/tensorrt_llm/`, and `csrc/xqa/*`. Owners added, removed, or re-ordered across those blocks.
FlashInfer subsystem `.github/CODEOWNERS`	Adjusted ownership for `flashinfer/` areas including `flashinfer/jit/`, `flashinfer/jit/attention/`, `flashinfer/comm/`, `flashinfer/gemm/`, `flashinfer/trtllm/`, and `include/include/flashinfer/`. Owner sets re-assigned, with some additions/removals and reordering.
Docs & auxiliary scripts `.github/CODEOWNERS`, `scripts/*`, `docs/...`	Minor ownership reassignments and reorderings for `docs/` paths and script-related entries referenced in CODEOWNERS (ownership shifts between listed members).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Administrative changes concentrated in a single file (.github/CODEOWNERS) but spanning many path blocks.
Review focus: ensure each path block lists the intended owners and no accidental deletions or syntax issues.
Pay extra attention to owner entries for: csrc/fused_moe/*, csrc/nv_internal/*, flashinfer/jit/attention/*.

Possibly related PRs

ci: Create Github Action to Automate CODEOWNER update #1870 — adds automation tooling and scripts related to generating/updating .github/CODEOWNERS (strongly related).
chore: Update CODEOWNERS #1949 — performs similar reassignments/reordering of CODEOWNERS entries across overlapping paths.
chore: update the list of authorized codeowners #1970 — updates scripts/authorized_codeowner.txt and coordinates additions to CODEOWNERS (related to owner list updates).

Suggested reviewers

nvmbreughe
yzh119

Poem

🐰 I hopped through lines of owner names and keys,
I nudged a few badges with careful little ease.
From fused kernels to jit’s bright lane,
New keepers noted, the list rearranged—hooray again! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description provides substantive information about the changes (what was updated, why, and how to review), but it significantly deviates from the required template structure. The template specifies required sections including "📌 Description," "🔍 Related Issues," and "🚀 Pull Request Checklist" with specific pre-commit and testing checkboxes. The provided description completely omits the Related Issues section and the Pull Request Checklist with its required checkboxes, instead presenting a custom format with "Summary," "Changes," "How to Review," and "Notes" sections. While this is an automated PR, the description still fails to follow the repository's prescribed template structure for PR descriptions.	To resolve this, the PR description should be updated to follow the repository template more closely. Include the "## 🔍 Related Issues" section (even if empty/none), and add the "## 🚀 Pull Request Checklist" section with the pre-commit verification and testing checkboxes. The substantive content about the CODEOWNERS update can be incorporated into the "## 📌 Description" section. Alternatively, if automated PRs are intended to bypass this template, the template or automation process should be adjusted accordingly.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "chore: Update CODEOWNERS" is clear, concise, and directly related to the primary change in the pull request. It uses conventional commit formatting with a "chore:" prefix, which is appropriate for administrative updates. The title accurately reflects that the CODEOWNERS file is being updated based on commit history analysis, which matches the core objective described in the PR.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch auto-update-codeowners

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2025-10-27T01:11:50Z

Summary of Changes

Hello @flashinfer-bot, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request automates the maintenance of the CODEOWNERS file, ensuring that the designated owners for various parts of the codebase are accurate and up-to-date. By regularly analyzing commit history, the system identifies active contributors to specific modules, thereby streamlining the review process and ensuring that relevant team members are notified of changes. This helps maintain code quality and efficient collaboration.

Highlights

Automated CODEOWNERS Update: The .github/CODEOWNERS file has been updated automatically to reflect current code ownership based on recent commit history.
Ownership Analysis Criteria: The update process analyzed commit frequency, file coverage, and commit recency over the last 180 days to determine appropriate owners.
Specific Ownership Adjustments: Existing owners for several csrc/ paths were reordered, and @cyx-6 was added as an owner for the flashinfer/jit/attention/ directory.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This PR is an automated update to the CODEOWNERS file based on recent commit history. The changes mostly involve reordering owners and adding a new owner to one path, which is expected from such an automation. My review focuses on improving the conciseness and maintainability of the generated CODEOWNERS file.

I've found a couple of instances of redundant ownership rules where a subdirectory is assigned the exact same owners as its parent directory. Since CODEOWNERS rules are inherited, these specific rules for subdirectories are unnecessary. Removing them would make the file cleaner. I've added specific comments on the lines that can be removed.

For future improvements, it would be beneficial to enhance the generation script (scripts/codeowner_analyzer.py) to automatically detect and consolidate these redundant rules.

gemini-code-assist · 2025-10-27T01:12:52Z

.github/CODEOWNERS

-csrc/fused_moe/cutlass_backend/ @yzh119 @yongwww @djmmoss @wenscarl @cyx-6
-csrc/nv_internal/ @wenscarl @djmmoss @yzh119 @cyx-6 @yongwww
+csrc/fused_moe/ @yzh119 @yongwww @djmmoss @cyx-6 @wenscarl
+csrc/fused_moe/cutlass_backend/ @yzh119 @yongwww @djmmoss @cyx-6 @wenscarl


This line is redundant because the owners for csrc/fused_moe/cutlass_backend/ are identical to the owners of its parent directory csrc/fused_moe/ (defined on line 12). Since GitHub's CODEOWNERS file inherits ownership from parent directories, this line can be removed to make the file more concise without changing the effective owners.

gemini-code-assist · 2025-10-27T01:12:52Z

.github/CODEOWNERS

 csrc/nv_internal/include/ @wenscarl
-csrc/nv_internal/tensorrt_llm/ @wenscarl @djmmoss @yzh119 @cyx-6 @yongwww
-csrc/xqa/ @yzh119 @cyx-6
+csrc/nv_internal/tensorrt_llm/ @wenscarl @djmmoss @cyx-6 @yzh119 @yongwww


This line is also redundant. The owners listed for csrc/nv_internal/tensorrt_llm/ are the same as those for the parent directory csrc/nv_internal/ on line 14. This line can be safely removed to simplify the CODEOWNERS file.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/CODEOWNERS (1)
26-26: Remove duplicate owner entry.

The owner @Anerudhan is listed twice on this line, which appears to be an error in the automated generation script.

Apply this diff to remove the duplicate:
-flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6 @Anerudhan
+flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6

♻️ Duplicate comments (2)

.github/CODEOWNERS (2)

12-13: Redundant subdirectory entry still present.

Line 13 (csrc/fused_moe/cutlass_backend/) is redundant because the owners are identical to the parent directory on line 12. Since GitHub's CODEOWNERS inherits ownership from parent directories, this line can be removed.

14-17: Redundant subdirectory entry still present.

Line 17 (csrc/nv_internal/tensorrt_llm/) is redundant because the owners are identical to the parent directory on line 14. This line can be removed per GitHub's CODEOWNERS inheritance rules.

🧹 Nitpick comments (1)

.github/CODEOWNERS (1)

1-43: Consider improving the automation script quality.

While the ownership updates reflect recent commit activity, the automation script has some quality issues:

It generated a duplicate owner entry (line 26)

It recreated redundant subdirectory entries (lines 13, 17) that were previously flagged as unnecessary

These redundant entries persist despite GitHub's CODEOWNERS inheritance model

Consider enhancing the script to:

Deduplicate owner lists

Detect and skip subdirectories with identical ownership to their parent

Optionally incorporate feedback from previous reviews

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4a3ff4 and 8098d21.

📒 Files selected for processing (1)

.github/CODEOWNERS (2 hunks)

🔇 Additional comments (2)

.github/CODEOWNERS (2)

18-18: LGTM - ownership update appears reasonable.

The updated ownership for csrc/xqa/ to @cyx-6 @yzh119 appears to be a legitimate change based on recent commit activity.

30-30: LGTM - ownership update appears reasonable.

The updated ownership for flashinfer/jit/attention/ to include @cyx-6 alongside the existing owners reflects recent contributions to this module.

Auto-generated CODEOWNERS update based on commit activity over the last 180 days. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

.github/CODEOWNERS (2)

12-13: Consider removing redundant child directory entry.

Lines 12 and 13 have identical owner lists. Since GitHub's CODEOWNERS inherits ownership from parent directories, line 13 (csrc/fused_moe/cutlass_backend/) is redundant and can be removed unless you intend to assign different owners in the future.

Note: This redundancy was previously flagged but the automation regenerated it.

14-17: Consider removing redundant child directory entry.

Lines 14 and 17 have identical owner lists. The entry for csrc/nv_internal/tensorrt_llm/ (line 17) is redundant since it inherits the same owners from its parent directory csrc/nv_internal/ (line 14).

Note: This redundancy was previously flagged but the automation regenerated it.

🧹 Nitpick comments (1)

.github/CODEOWNERS (1)

36-37: Consider removing redundant child directory entry.

Lines 36 and 37 have identical owner lists. The entry for include/flashinfer/ (line 37) is redundant since it inherits the same owners from its parent directory include/ (line 36). Consider removing line 37 unless you plan to assign different owners.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8098d21 and 1915653.

📒 Files selected for processing (1)

.github/CODEOWNERS (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Deploy Docs

coderabbitai · 2025-11-03T01:13:11Z

.github/CODEOWNERS

@@ -26,18 +26,18 @@ flashinfer/comm/ @yzh119 @cyx-6 @nvmbreughe @wenscarl @djmmoss
 flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6 @Anerudhan


⚠️ Potential issue | 🟠 Major

Remove duplicate owner @Anerudhan.

The owner @Anerudhan appears twice in the same line. This indicates a bug in the automated generation script that should be addressed.

Apply this diff:

-flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6 @Anerudhan +flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6 @Anerudhan

flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6

🤖 Prompt for AI Agents

In .github/CODEOWNERS around line 26, remove the duplicate owner entry so the line reads with a single occurrence of @Anerudhan (e.g., "flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6"), ensure owners are space-separated and there are no repeated entries, then save and commit the change.

@IwakuraRein

Update to v0.5.2 and opt cuda graph launch config for MTP situation * fix q len for MTP; * release: Bump version for v0.5.2 release (flashinfer-ai#2057)  ## 📌 Description  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [x] Tests have been added or updated as needed. - [x] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Chores** * Version updated to 0.5.2 ; * [BUG] Fix trtllm-gen fp4 moe renormalize routing (flashinfer-ai#2049)  ## 📌 Description Temporarily disable `routingIndicesBlockKernel` as it's not compatible with the current packing format (topk-id and expert weights are packed into a 32 bit tensor). This solves the issue flashinfer-ai#2032 ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Bug Fixes** * Forced multi-block MoE execution to avoid sporadic single-block selection and improve stability with certain workloads. * **New Features** * Added an alternative packed top‑k routing input path that propagates routing scores when present. * **Tests** * Added a comprehensive parametrized test validating routed fused MoE across token counts, model sizes, expert counts and multiple quantization modes.  --------- Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>; * test: Skip test_fp8_quantize.py on Hopper (flashinfer-ai#2052)  ## 📌 Description The unit test `test_fp8_quantize.py` currently fails on sm90. Root cause: The test file tests the accuracy of `mxfp8_quantize()`. However, in [fp8_quantization.py](https://github.com/flashinfer-ai/flashinfer/blob/adb0e89fdee0a3140a43982bc3bef4e79ce20046/flashinfer/fp8_quantization.py#L7), the `mxfp8_quantize()`'s underlying module only exists for `gen_mxfp8_quantization_sm100_module` with no sm90 support. Current PR changes test file to skip for pre-SM100 SM archs as they are not supported.. Results: * Before current PR on SM90: `72 failed, 40 passed in 2.69s` * After current PR on SM90: `40 passed, 72 skipped in 1.41s` * Before current PR on SM120: `112 passed in 1.59s` * After current PR on SM120: `112 passed in 1.54s` (expected to be the same as before)  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Tests** * Added conditional checks to skip FP8 quantization tests on GPUs that lack required computational capabilities. ; * Add support for topkPacked input in block-level renormalize (flashinfer-ai#2051)  ## 📌 Description Add support for topkPacked input in block-level renormalize ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Performance** * Optimized routing layer efficiency through improved index handling in specialized processing configurations.  Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>; * chore: Update CODEOWNERS (flashinfer-ai#1984) ## Summary This PR updates the CODEOWNERS file based on git commit history analysis from the last 180 days. ## Changes - Updated `.github/CODEOWNERS` with current code ownership based on: - Commit frequency - File coverage - Commit recency ## How to Review 1. Review the changes to `.github/CODEOWNERS` 2. Verify that the assigned owners are appropriate for each module 3. Make manual adjustments if needed before merging ## Notes - This is an automated PR generated weekly - Minimum commits threshold: 1 - Analysis period: 180 days - Directory depth: 3 levels - Top N owners per module: 5 --- 🤖 This PR was automatically generated by the [update-codeowners workflow](.github/workflows/update-codeowners.yml)  ## Summary by CodeRabbit * **Chores** * Updated code ownership assignments and reorganized related section mappings for internal development processes.  Co-authored-by: flashinfer-bot <flashinfer-bot@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>; * Update trtllm-gen fused moe routing kernel and add more kernels (flashinfer-ai#1955)  ## 📌 Description co-work with @IwakuraRein - update the trtllm-gen fused moe headers - add new kernels for trtllm-gen fused moe - for NvFp4, add tile 256 - for MxFp8 x MxFp4, add 128, 256 - for FP8 per-tensor, add 192, 256 - for FP8 block scale, add 128 - update the logics of `computeSelectedTileN` - add `tune_max_num_tokens` to FP8 per-tensor and FP8 block scale - rename `TLLM_GEN_BMM_CUBIN_PATH` to `TLLM_GEN_GEMM_CUBIN_PATH` - add `TLLM_GEN_EXPORT_FLASHINFER` **NOTE: split-k kernels are temporarily disabled as they cause failure in renormalize + expert 256 tests.** ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **New Features** * Expanded MoE tiling (adds 128/192/256), FP8 per‑tensor MoE path, FP8/FP4 autotuner benchmark, and new tune_max_num_tokens tuning parameter. * **Improvements** * Router now supports tile‑based (non‑power‑of‑two) layouts and propagates explicit valid M/N/K for safer sizing; autotuner logs include exception details; added export/compile flags and clearer kernel error messages. * **Bug Fixes** * Relaxed strict padding/power‑of‑two checks and made log2 handling safer. * **Tests** * Extended MoE tests to cover new FP8 block‑scale and routing scenarios.  --------- Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>; * Fix dtype of output scales from mnnvl_moe_alltoallv_prepare_without_allgather (flashinfer-ai#2048)  ## 📌 Description During flashinfer-ai#1641 the dtype of output scales in moePrepare(mnnvl_moe_alltoallv_prepare_without_allgather) was accidently changed from float to int32. This PR fixes that. ## 🔍 Related Issues Fix flashinfer-ai#2040 ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Bug Fixes** * Corrected tensor type validation for mixture-of-experts scale preparation so scales are validated and handled as float32, preventing type mismatches with downstream float operations. * Ensured scale tensors are created on the same device as expert identifiers, keeping tensor placement consistent across distributed processing and avoiding cross-device issues.  --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>; * test: Fix test_sampling.py on Spark (flashinfer-ai#2042)  ## 📌 Description Current PR fixes `test_sampling.py::test_softmax` on Spark by inserting a `torch.cuda.synchronize()` before calling the softmax function. tl; dr why it works: PDL is enabled in these tests. Investigation shows that when PDL is enabled, `logits.view(-1).index_fill_(0, inf_idx, float("-inf"))` that prepares the inputs overlaps with the `probs = flashinfer.sampling.softmax(logits, temperature=temperature_arr)` function itself. Hence, we need to ensure that the input preparation is complete before running the softmax function to get the correct output. #### Observations `test_sampling.py::test_softmax` fails on select cases Spark. Example output ``` # pytest tests/utils/test_sampling.py::test_softmax =================================================================================================================================================== test session starts =================================================================================================================================================== platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 rootdir: /flashinfer configfile: pytest.ini collected 324 items ... ================================================================================================================================================= short test summary info ================================================================================================================================================= FAILED tests/utils/test_sampling.py::test_softmax[True-True-1.0-normal_distribution(std=1)-128256-989] - AssertionError: assert False FAILED tests/utils/test_sampling.py::test_softmax[True-True-1.0-normal_distribution(std=5)-128256-989] - AssertionError: assert False FAILED tests/utils/test_sampling.py::test_softmax[True-True-1.0-gumbel_distribution(beta=0.1)-128256-989] - AssertionError: assert False ======================================================================================================================================== 3 failed, 321 passed, 1 warning in 10.33s ``` Observations from debugging: * When outputs are printed, rows containing all `nan`s are produced in the output of `probs = flashinfer.sampling.softmax(logits)` * Surprisingly, the test passes with `CUDA_LAUNCH_BLOCKING=1 pytest tests/utils/test_sampling.py::test_softmax` * `compute-sanitizer` does not detect any IMAs * Running only a failed test results in a pass: ``` $ pytest tests/utils/test_sampling.py::test_softmax[True-True-1.0-normal_distribution$std=1$-128256-989] ... 1 passed, 1 warning in 0.80s ``` Towards a fix: * I empirically find that the test passes: * when the reference `torch.softmax()` is called before `flashinfer.sampling.softmax()` (currently reference is called after) * when pdl is disabled in [line 67](https://github.com/flashinfer-ai/flashinfer/blob/main/tests/utils/test_sampling.py#L67) with `probs = flashinfer.sampling.softmax(logits, temperature=temperature_arr,enable_pdf=False)` * when `torch.cuda.synchronize()` is inserted in the line 64 as in this PR. ``` if neg_inf_input: # assign random logits to -inf num_inf = torch.randint(0, logits.numel() - 1, (), device=logits.device).item() inf_idx = torch.randperm(logits.numel(), device=logits.device)[:num_inf] logits.view(-1).index_fill_(0, inf_idx, float("-inf")) torch.cuda.synchronize() ## This fixes the issue for some reason! if temperature_arr: temperature_arr = torch.full((batch_size,), temperature, device="cuda:0") probs = flashinfer.sampling.softmax(logits, temperature=temperature_arr) logits_scaled = logits / temperature_arr.unsqueeze(-1) ``` but **does not fix the issue if I place the synchronization any earlier** An nsys profile shows that surprisingly the `logits.view(-1).index_fill_(0, inf_idx, float("-inf"))` and `flashinfer.sampling.softmax(logits, temperature=temperature_arr)` can overlap execution when pdl is enabled. <img width="1243" height="640" alt="Screenshot 2025-11-04 at 5 49 50 PM" src="https://github.com/user-attachments/assets/950ab8ab-0843-49c8-8411-ff81c00c34a6" /> This means that the softmax kernel is launching before inputs are done being prepared when `neg_inf_input=True`. Hence, placing a `torch.cuda.synchronize()` after the fill or disabling pdl can solve the issue. With the current PR, the nsys timeline changes to: <img width="1240" height="643" alt="Screenshot 2025-11-04 at 5 51 32 PM" src="https://github.com/user-attachments/assets/aae63a88-d7cd-4661-8476-6d8c581879b2" /> and the unit test passes.  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit ## Release Notes * **Bug Fixes** * Improved synchronization of concurrent operations to ensure proper execution order and prevent potential timing-related issues. ; * fix: support both pip and uv pip for finding flashinfer-python package (flashinfer-ai#2043) Update getJitIncludeDirs() to try pip first, then fallback to uv pip if pip is not available. This ensures compatibility with both standard pip and uv pip package managers when locating the flashinfer-python installation for JIT compilation include paths. The command now uses shell OR operator (||) to attempt pip first, and only falls back to uv pip if the first command fails. ``` pytest -xs tests/moe/test_trtllm_cutlass_fused_moe.py::test_moe_fp8_block_scaling ============================================================================================================================================================ test session starts ============================================================================================================================================================= platform linux -- Python 3.10.12, pytest-8.4.2, pluggy-1.6.0 rootdir: /home/scratch.dmoss_gpu_1/repos/flashinfer configfile: pytest.ini collected 1 item tests/moe/test_trtllm_cutlass_fused_moe.py [TensorRT-LLM][INFO] Compiling JIT runtime gemm_swapAB_256_128_128_16_128_2_82_8_1_GroupedWithOffset with options: [TensorRT-LLM][INFO] -std=c++17 [TensorRT-LLM][INFO] --gpu-architecture=sm_90a [TensorRT-LLM][INFO] --ptxas-options=-allow-expensive-optimizations=true [TensorRT-LLM][INFO] --ptxas-options=--register-usage-level=10 [TensorRT-LLM][INFO] --diag-suppress=161,174,177,940 [TensorRT-LLM][INFO] -D__FORCE_INCLUDE_CUDA_FP16_HPP_FROM_FP16_H__=1 [TensorRT-LLM][INFO] -D__FORCE_INCLUDE_CUDA_BF16_HPP_FROM_BF16_H__=1 [TensorRT-LLM][INFO] -O3 [TensorRT-LLM][INFO] -cubin [TensorRT-LLM][INFO] --expt-relaxed-constexpr [TensorRT-LLM][INFO] --expt-extended-lambda [TensorRT-LLM][INFO] --compiler-options=-fPIC,-O3,-Wno-deprecated-declarations,-Wno-abi [TensorRT-LLM][INFO] -I/home/scratch.dmoss_gpu_1/repos/flashinfer/flashinfer/data/csrc/nv_internal/tensorrt_llm [TensorRT-LLM][INFO] [TensorRT-LLM][INFO] Generated kernel code: #ifdef __CUDACC_RTC__ #ifndef NVRTC_JIT_COMPILATION #define NVRTC_JIT_COMPILATION #endif #include <deep_gemm/nvrtc_std.cuh> #else #include <string> #include <cuda.h> #endif #include <cuda_bf16.h> #include <cuda_fp8.h> #include <deep_gemm/nvrtc_cutlass.cuh> #include <deep_gemm/fp8_gemm_impl.cuh> using namespace deep_gemm; using SchedulerType = typename SchedulerSelectorSwapAB<GemmType::GroupedWithOffset, 256, 128, 128, 16, 128, 2, 1>::type; __global__ void dummy_kernel() { void *ptr = (void *)&fp8_gemm_kernel_swapAB<256, 128, 128, 16, 128, 2, 8, 128, 128, 1, SchedulerType, GroupedWithOffsetSchedulerInputSwapAB>; } [TensorRT-LLM][INFO] NVCC compilation took 3064 ms [TensorRT-LLM][INFO] Compilation log: [TensorRT-LLM][INFO] Successfully copied kernel files to cache directory: /home/dmoss/.tensorrt_llm/cache/gemm_swapAB_256_128_128_16_128_2_82_8_1_GroupedWithOffset [TensorRT-LLM][INFO] Compiling JIT runtime gemm_swapAB_128_128_128_16_128_2_82_8_1_GroupedWithOffset with options: [TensorRT-LLM][INFO] -std=c++17 [TensorRT-LLM][INFO] --gpu-architecture=sm_90a [TensorRT-LLM][INFO] --ptxas-options=-allow-expensive-optimizations=true [TensorRT-LLM][INFO] --ptxas-options=--register-usage-level=10 [TensorRT-LLM][INFO] --diag-suppress=161,174,177,940 [TensorRT-LLM][INFO] -D__FORCE_INCLUDE_CUDA_FP16_HPP_FROM_FP16_H__=1 [TensorRT-LLM][INFO] -D__FORCE_INCLUDE_CUDA_BF16_HPP_FROM_BF16_H__=1 [TensorRT-LLM][INFO] -O3 [TensorRT-LLM][INFO] -cubin [TensorRT-LLM][INFO] --expt-relaxed-constexpr [TensorRT-LLM][INFO] --expt-extended-lambda [TensorRT-LLM][INFO] --compiler-options=-fPIC,-O3,-Wno-deprecated-declarations,-Wno-abi [TensorRT-LLM][INFO] -I/home/scratch.dmoss_gpu_1/repos/flashinfer/flashinfer/data/csrc/nv_internal/tensorrt_llm [TensorRT-LLM][INFO] [TensorRT-LLM][INFO] Generated kernel code: #ifdef __CUDACC_RTC__ #ifndef NVRTC_JIT_COMPILATION #define NVRTC_JIT_COMPILATION #endif #include <deep_gemm/nvrtc_std.cuh> #else #include <string> #include <cuda.h> #endif #include <cuda_bf16.h> #include <cuda_fp8.h> #include <deep_gemm/nvrtc_cutlass.cuh> #include <deep_gemm/fp8_gemm_impl.cuh> using namespace deep_gemm; using SchedulerType = typename SchedulerSelectorSwapAB<GemmType::GroupedWithOffset, 128, 128, 128, 16, 128, 2, 1>::type; __global__ void dummy_kernel() { void *ptr = (void *)&fp8_gemm_kernel_swapAB<128, 128, 128, 16, 128, 2, 8, 128, 128, 1, SchedulerType, GroupedWithOffsetSchedulerInputSwapAB>; } [TensorRT-LLM][INFO] NVCC compilation took 1479 ms [TensorRT-LLM][INFO] Compilation log: [TensorRT-LLM][INFO] Successfully copied kernel files to cache directory: /home/dmoss/.tensorrt_llm/cache/gemm_swapAB_128_128_128_16_128_2_82_8_1_GroupedWithOffset . ============================================================================================================================================================= 1 passed in 9.02s ============================================================================================================================================================== ```  ## Summary by CodeRabbit * **Bug Fixes** * Improved package detection compatibility for alternative package management tool installations. ; * use scalar for kv_scale in xqa (flashinfer-ai#2033)  ## 📌 Description  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [ ] I have installed the hooks with `pre-commit install`. - [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Breaking Changes** * Public xqa/xqa_mla entry points now accept kv_scale as a plain float (default 1.0) instead of a 1-element tensor. Update call sites accordingly. * **Documentation** * Docstrings updated to reflect kv_scale as float. * **Tests** * Tests updated to pass scalar kv_scale, with added parameterization and conditional skip for FP8 kv-cache scenarios.  --------- Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>; * Support cc common check decorator for empty backends (flashinfer-ai#2015)  ## 📌 Description  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Bug Fixes** * Improved backend/compute-capability validation with clearer errors and correct fallback when backend-specific checks are absent. * **New Features** * Decorated functions expose runtime attributes to query backend availability and choices. * Default-backend behavior: kernels use a default when none is passed. * **Compatibility** * Expanded supported compute-capability set and raised minimum cuDNN package requirements. * **Tests** * Added tests for empty-backend common-checks and default-backend behavior. * **Chores** * Version bumped to 0.5.1. ; * perf: Speed up fp4 quantization for small batch with swizzling for cutlass MoE (flashinfer-ai#2025)  ## 📌 Description Performance optimization for `fp4_quantize()` function. The performance issue was raised in issues flashinfer-ai#1734 and flashinfer-ai#2021 Observed behavior was slow performance when `is_sf_swizzled_layout=True` (as opposed to False). Root cause of the issue was * Excessive Padding Overhead: Swizzled layouts require row padding to tile boundaries where `SWIZZLED_128x4` pads to multiples of 128 rows and `SWIZZLED_8x4` pads to multiples of 8 rows * This means `For batch_size=1` with SWIZZLED_128x4: 127 out of 128 rows are padding (99.2% wasted work) * Sequential Processing: The original grid launch used grid.x = min(m, multiProcessorCount * numBlocksPerSM), so: For batch_size=1: only 1 block launched * This single block iterated sequentially over all 128 padded rows * Each padding row still computed scale factors, checked bounds, and performed conditional logic * No Fast Path: Every row (real or padding) went through the same expensive code path with multiple conditional branches The fix: 1. Kernel-Level Early Exit Fast Path (`quantization.cuh`): Added branch divergence optimization with separate handling for padding vs. data rows - Padding rows now execute ~10× fewer instructions; Eliminates memory loads/stores for input/output data on padding rows; Reduces register pressure and divergence overhead 2. Host-Level Parallel Grid Launch (`quantization.cu`): Modified grid calculation to launch blocks proportional to padded rows instead of actual rows: - For batch_size=1 with SWIZZLED_128x4: launches up to 128 blocks instead of 1; Each block processes 1 row in parallel instead of sequentially; overall tries to achieve full GPU occupancy even with small batch sizes  `fp4_quantize()` performance before fix: ``` $ python3 bench_fp4_quantize.py +------------+---------------------+-------------------------+ | batch size | swizzled_times (us) | non_swizzled_times (us) | +------------+---------------------+-------------------------+ | 1.0 | 71.52 | 3.136 | | 2.0 | 37.152 | 3.168 | | 4.0 | 19.904 | 3.168 | | 8.0 | 11.296 | 3.2 | | 16.0 | 7.103 | 3.296 | | 32.0 | 4.96 | 3.376 | | 64.0 | 4.128 | 3.487 | | 128.0 | 3.808 | 3.648 | | 256.0 | 4.32 | 4.161 | | 512.0 | 5.472 | 5.184 | +------------+---------------------+-------------------------+ ``` After fix in current PR: ``` $ python3 bench_fp4_quantize.py +------------+---------------------+-------------------------+ | batch size | swizzled_times (us) | non_swizzled_times (us) | +------------+---------------------+-------------------------+ | 1.0 | 3.456 | 3.264 | | 2.0 | 3.488 | 3.296 | | 4.0 | 3.536 | 3.296 | | 8.0 | 3.52 | 3.296 | | 16.0 | 3.52 | 3.456 | | 32.0 | 3.696 | 3.488 | | 64.0 | 3.744 | 3.584 | | 128.0 | 3.936 | 3.776 | | 256.0 | 4.384 | 4.288 | | 512.0 | 5.568 | 5.248 | +------------+---------------------+-------------------------+ ``` where the `bench_fp4_quantize.py` script used to benchmark (adopted from flashinfer-ai#1734) : ``` from flashinfer.testing.utils import bench_gpu_time_with_cupti from flashinfer import fp4_quantize import torch import numpy as np import pandas as pd from tabulate import tabulate A_scale = torch.randn(16).cuda().float() bsz = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] swizzled_times = [] for bs in bsz: A = torch.randn(bs, 5120).cuda().to(torch.bfloat16) t = np.median(bench_gpu_time_with_cupti( lambda: fp4_quantize(A, A_scale, is_sf_swizzled_layout=True), dry_run_iters = 10, repeat_iters = 100, ) ) * 1000 swizzled_times.append(t) non_swizzled_times = [] for bs in bsz: A = torch.randn(bs, 5120).cuda().to(torch.bfloat16) t = np.median(bench_gpu_time_with_cupti( lambda: fp4_quantize(A, A_scale, is_sf_swizzled_layout=False), dry_run_iters = 10, repeat_iters = 100, ) ) * 1000 non_swizzled_times.append(t) summary_df = pd.DataFrame({ "batch size": bsz, "swizzled_times (us)": swizzled_times, "non_swizzled_times (us)": non_swizzled_times, }) # Round numeric columns to three decimals before printing summary_df_rounded = summary_df.copy() summary_df_rounded["batch size"] = summary_df_rounded["batch size"].astype(int) summary_df_rounded["swizzled_times (us)"] = summary_df_rounded["swizzled_times (us)"].round(3) summary_df_rounded["non_swizzled_times (us)"] = summary_df_rounded["non_swizzled_times (us)"].round(3) print(tabulate(summary_df_rounded, headers='keys', tablefmt='pretty', showindex=False)) ``` ## 🔍 Related Issues flashinfer-ai#1734 flashinfer-ai#2021  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [x] Tests have been added or updated as needed. - [x] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Bug Fixes** * Improved quantization for swizzled memory layouts by adjusting how effective processing rows are computed to better utilize GPU resources. * Added early-exit handling for padding-only rows so padding outputs are zeroed without processing data. * Ensured consistent zeroing of scale/format outputs for padded columns across all quantization paths. ; * bugfix: fix failed unittest `test_green_ctx` and `test_jit_example` on spark (sm_121) (flashinfer-ai#1951)  ## 📌 Description There are three failed unittests on spark (sm_121): * tests/utils/test_green_ctx.py * tests/utils/test_jit_example.py * tests/utils/test_sampling.py First one is because spark has small number of SMs (48) and we don't have a guard on green context splitting. Second one is an unknown issue (logits don't match with reference) and probably related to barriers on sm_121, xfail now and will fix later. The last one will be fixed by another PR from @bkryu , this PR fixes the first two issues. ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [x] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Tests** * Tests now pre-check GPU resources and auto-skip with informative messages including available and requested SM counts to avoid spurious failures. * Added a conditional xfail for GPUs with compute capability 12.1 to avoid false negatives on that hardware. * Tightened a sampling test by adding a relative tolerance for more robust numerical validation. * **Bug Fixes** * Improved runtime error handling to surface clearer guidance when GPU SM resources are insufficient.  --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>; * Update Docker CI tags to 20251104-d528f0c (flashinfer-ai#2041) This PR updates the Docker CI image tags to the latest version: `20251104-d528f0c` Updated images: - flashinfer/flashinfer-ci-cu126:20251104-d528f0c - flashinfer/flashinfer-ci-cu128:20251104-d528f0c - flashinfer/flashinfer-ci-cu129:20251104-d528f0c - flashinfer/flashinfer-ci-cu130:20251104-d528f0c Auto-generated by [release-ci-docker workflow](https://github.com/flashinfer-ai/flashinfer/actions/runs/19084098717)  ## Summary by CodeRabbit * **Chores** * Updated Docker image tags to latest versions for CUDA 12.6, 12.8, 12.9, and 13.0 distributions.  Co-authored-by: yzh119 <11773619+yzh119@users.noreply.github.com>; * test: Mark test_fp8_prefill.py as xfail on SM90 (flashinfer-ai#2038)  ## 📌 Description `test_fp8_prefill.py` is currently failing on SM90, but consumes too much time to run/fail, causing unit-tests to time out. --Current PR marks it as xfail so that unit tests can progress forward.-- Update: Root cause of failure is because mixed precision attention is not available on `fa3` backend, but the attention prefill wrapper automatically selects `backend='fa3'` on SM90. Fix is to explicitly specify the `backend='fa2'` so that fa2 is always used. Status after fix: ``` $ pytest tests/attention/test_fp8_prefill.py =================================================================================================================================================== test session starts =================================================================================================================================================== ... collected 768 items tests/attention/test_fp8_prefill.py ............................................................................................................................................................................................................................................................................... [ 35%] ................................................................................................................................................................................................................................................................................................................... [ 75%] .............................................................................................................................................................................................. [100%] ======================================================================================================================================= 768 passed, 1 warning in 131.42s (0:02:11) ======================================================================================================================================== ```  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [x] Tests have been added or updated as needed. - [x] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Tests** * Adjusted FP8/FP16 attention test configuration to explicitly select a backend during prefill/decoding, stabilizing test behavior across environments. * **Public API** * Constructors now accept an explicit backend parameter to allow selecting the backend used for KV cache operations. ; * ci: Update cudnn version requirements in CI container (flashinfer-ai#2039)  ## 📌 Description cuDNN versions specified in CI container setup (`docker/install/install_python_packages.sh`) are currently 9.11 and 9.12. In unit testing, this causes issues as `mm_fp4(backend='cudnn')` is not supported on Spark (sm121) for older cuDNN versions in cu130. Failure is due to cuDNN version shipped with container being too old. In the [latest container build pipeline output](https://github.com/flashinfer-ai/flashinfer/actions/runs/18778064727/job/53577233568#step:6:727), cudnn 9.13.0.50 is installed ``` flashinfer-ai#16 207.0 Requirement already satisfied: nvidia-cudnn-cu13>=9.12.0.46 in /opt/conda/envs/py312/lib/python3.12/site-packages (9.13.0.50) flashinfer-ai#16 207.0 Requirement already satisfied: nvidia-cublas in /opt/conda/envs/py312/lib/python3.12/site-packages (from nvidia-cudnn-cu13>=9.12.0.46) (13.0.0.19) ``` Current PR updates the minimum cudnn version for both [cu12](https://pypi.org/project/nvidia-cudnn-cu12/#history) and [cu13](https://pypi.org/project/nvidia-cudnn-cu13/#history) to 9.14.0.64. cudnn 9.13 --> unit test fails with 180 failed, 270 passed, 2790 skipped, 1 warning in 8.97s ``` # pytest tests/gemm/test_mm_fp4.py =================================================================================================================================================== test session starts =================================================================================================================================================== platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 rootdir: /flashinfer configfile: pytest.ini collected 3240 items ... FAILED tests/gemm/test_mm_fp4.py::test_mm_fp4[mxfp4_alpha-False-True-cudnn-res_dtype1-512-512-256] - cudnn._compiled_module.cudnnGraphNotSupportedError: No valid engine configs for Matmul_MUL_ FAILED tests/gemm/test_mm_fp4.py::test_mm_fp4[mxfp4_alpha-False-True-cudnn-res_dtype1-512-512-512] - cudnn._compiled_module.cudnnGraphNotSupportedError: No valid engine configs for Matmul_MUL_ ================================================================================================================================ 180 failed, 270 passed, 2790 skipped, 1 warning in 8.97s ================================================================================================================================= ``` cudnn 9.14 --> unit test passes with 450 passed, 2790 skipped, 1 warning in 5.37s ``` # pytest tests/gemm/test_mm_fp4.py =================================================================================================================================================== test session starts =================================================================================================================================================== platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 rootdir: /flashinfer configfile: pytest.ini collected 3240 items tests/gemm/test_mm_fp4.py ... ====================================================================================================================================== 450 passed, 2790 skipped, 1 warning in 5.37s ======================================================================================================================================= ```  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [x] Tests have been added or updated as needed. - [x] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Chores** * Updated internal dependencies for improved system stability and compatibility. ; * release: Bump version for v0.5.1 release (flashinfer-ai#2031)  ## 📌 Description Update `version.txt`  ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [x] Tests have been added or updated as needed. - [x] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **Chores** * Version updated to 0.5.1 ; * Updated decorator to support unspecified default (flashinfer-ai#2026)  ## 📌 Description Updated decorator to support unspecified default. This was causing issues when calling mm_fp4 without backend specified. Also added SM 110 as a supported backend on the cutlass backend (mm_fp4) ## 🔍 Related Issues  ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [x] All tests are passing (`unittest`, etc.). ## Reviewer Notes   ## Summary by CodeRabbit * **New Features** * FP4 Cutlass GEMM now supports the SM110 GPU compute capability. * **Bug Fixes** * Kernels called without an explicit backend now consistently use the default backend. * **Tests** * Added a unit test to verify default backend selection and correct results when backend is omitted. ; * test: Enable xfailed trtllm decode long seqlen tests and update microbenchmark (flashinfer-ai#2018)  ## 📌 Description [tests/attention/test_trtllm_gen_attention.py](https://github.com/flashinfer-ai/flashinfer/blob/v0.5.0rc2/tests/attention/test_trtllm_gen_attention.py#L1021-L1076) was failing and therefore marked xfail. PR flashinfer-ai#2002 fixed the underlying root cause. Current PR thus removed the `xfail` marker so that these long seqlen cases could be fixed moving forward. Additionally, PR flashinfer-ai#2002 revealed a bug in the microbenchmark script where [trtllm_batch_decode_with_kv_cache](https://github.com/flashinfer-ai/flashinfer/blob/v0.5.0rc2/flashinfer/decode.py#L2082-L2083) explicitly requires the workspace to

## Summary This PR updates the CODEOWNERS file based on git commit history analysis from the last 180 days. ## Changes - Updated `.github/CODEOWNERS` with current code ownership based on: - Commit frequency - File coverage - Commit recency ## How to Review 1. Review the changes to `.github/CODEOWNERS` 2. Verify that the assigned owners are appropriate for each module 3. Make manual adjustments if needed before merging ## Notes - This is an automated PR generated weekly - Minimum commits threshold: 1 - Analysis period: 180 days - Directory depth: 3 levels - Top N owners per module: 5 --- 🤖 This PR was automatically generated by the [update-codeowners workflow](.github/workflows/update-codeowners.yml)  ## Summary by CodeRabbit * **Chores** * Updated code ownership assignments and reorganized related section mappings for internal development processes.  Co-authored-by: flashinfer-bot <flashinfer-bot@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>

flashinfer-bot added automated maintenance labels Oct 27, 2025

gemini-code-assist bot reviewed Oct 27, 2025

View reviewed changes

coderabbitai bot reviewed Oct 27, 2025

View reviewed changes

chore: update CODEOWNERS based on git history

1915653

Auto-generated CODEOWNERS update based on commit activity over the last 180 days. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

flashinfer-bot force-pushed the auto-update-codeowners branch from 8098d21 to 1915653 Compare November 3, 2025 01:10

coderabbitai bot reviewed Nov 3, 2025

View reviewed changes

yzh119 approved these changes Nov 6, 2025

View reviewed changes

yzh119 merged commit 26d587a into main Nov 6, 2025
4 checks passed

yzh119 deleted the auto-update-codeowners branch November 6, 2025 07:29

coderabbitai bot mentioned this pull request Nov 10, 2025

chore: Update CODEOWNERS #2067

Merged

coderabbitai bot mentioned this pull request Nov 24, 2025

chore: Update CODEOWNERS #2135

Merged

coderabbitai bot mentioned this pull request Dec 1, 2025

chore: Update CODEOWNERS #2152

Merged

coderabbitai bot mentioned this pull request Jan 5, 2026

chore: Update CODEOWNERS #2286

Merged

coderabbitai bot mentioned this pull request Feb 28, 2026

Manual codeowner update for qwen3-next/gdn PR reviews #2652

Closed

5 tasks

coderabbitai bot mentioned this pull request Mar 11, 2026

[chore] Add jiahanc to moe related code owner #2748

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Update CODEOWNERS#1984

chore: Update CODEOWNERS#1984
yzh119 merged 1 commit intomainfrom
auto-update-codeowners

flashinfer-bot commented Oct 27, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 27, 2025 •

edited

Loading

Other AI code review bot(s) detected

Uh oh!

gemini-code-assist bot commented Oct 27, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Oct 27, 2025

Uh oh!

gemini-code-assist bot Oct 27, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -26,18 +26,18 @@ flashinfer/comm/ @yzh119 @cyx-6 @nvmbreughe @wenscarl @djmmoss
		flashinfer/cudnn/ @Anerudhan @yzh119 @cyx-6 @Anerudhan

Conversation

flashinfer-bot commented Oct 27, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

How to Review

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

gemini-code-assist bot commented Oct 27, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 27, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Oct 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

flashinfer-bot commented Oct 27, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 27, 2025 •

edited

Loading