
fix: allow fmha_v2_prefill_deepseek on SM121 (DGX Spark)#2559

Merged
jimmyzho merged 1 commit into flashinfer-ai:main from blake-snc:fix/sm12x-fmha-v2-wiring
Feb 19, 2026

Conversation

@blake-snc
Contributor

@blake-snc commented Feb 13, 2026

Summary

  • fmha_v2_prefill_deepseek() only checked is_sm120a_supported() (SM12.0), excluding SM12.1 devices like NVIDIA DGX Spark (GB10)
  • The JIT compilation context already handles SM12.1 via supported_major_versions=[12] and the kernels work correctly
  • This adds is_sm121a_supported() to the guard check, consistent with how other SM12x checks are done in the codebase (e.g., prefill.py:106-107, gemm_base.py:3757)
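The guard change described above can be sketched as follows. The helper names `is_sm120a_supported` and `is_sm121a_supported` come from this PR; the function bodies and the `check_fmha_v2_prefill_deepseek` wrapper below are simplified stand-ins for illustration, not flashinfer's actual implementation.

```python
# Illustrative sketch of the broadened device guard. The real helpers
# inspect CUDA device properties; these stand-ins take the compute
# capability directly.

def is_sm120a_supported(major: int, minor: int) -> bool:
    # Stand-in: exact SM12.0 match (e.g. RTX 5090).
    return (major, minor) == (12, 0)

def is_sm121a_supported(major: int, minor: int) -> bool:
    # Stand-in: SM12.1 covers devices like DGX Spark (GB10).
    return (major, minor) == (12, 1)

def check_fmha_v2_prefill_deepseek(major: int, minor: int) -> None:
    # Before the fix only the SM120A check was consulted; the fix ORs
    # in the SM121A check so SM12.1 devices pass the guard.
    if not (is_sm120a_supported(major, minor) or is_sm121a_supported(major, minor)):
        raise ValueError(
            "fmha_v2_prefill_deepseek is only supported on SM12x GPUs."
        )
```

With this shape, SM12.0 and SM12.1 both pass the guard while older architectures still get the `ValueError`.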

Validation

Tested on DGX Spark (GB10, SM121a, CUDA 13.0, flashinfer 0.6.2):

  • fmha_v2_prefill_deepseek BF16 (batch=2, seq=64, heads=128, qk=192, vo=128): PASS, no NaN
  • fmha_v2_prefill_deepseek FP8 e4m3 (same config): PASS, no NaN
  • Bug reproduction (original code on SM12.1): ValueError as expected

Test plan

  • Existing test_fmha_v2_prefill_deepseek test updated with the same fix
  • No impact on SM12.0 (RTX 5090) — is_sm120a_supported check still in place
  • No impact on non-SM12x GPUs — they still get the existing ValueError
  • Bug reproduced: original code raises ValueError: fmha_v2_prefill_deepseek is only supported on SM120 GPUs. on SM12.1
  • Fix applied to installed package (no monkeypatch): BF16 and FP8 both produce correct output
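The mismatch the test plan exercises is that a major-version-only JIT gate (the `supported_major_versions=[12]` mentioned above) admits both SM12.0 and SM12.1, while a (major, minor)-exact Python guard does not. A hedged sketch, with illustrative helper names rather than flashinfer's actual API:

```python
# Sketch of why the kernels already worked on SM12.1: the JIT gate
# matches on the major compute-capability version only, whereas the
# pre-fix Python guard required an exact SM12.0 match.

def jit_supported(major: int, supported_major_versions=(12,)) -> bool:
    # Major-version match: SM12.0 and SM12.1 both pass.
    return major in supported_major_versions

def exact_guard(major: int, minor: int) -> bool:
    # The pre-fix Python guard: exact SM12.0 only.
    return (major, minor) == (12, 0)

# SM12.1 (DGX Spark / GB10): the JIT would compile the kernels,
# but the old guard rejected the device before the JIT was reached.
assert jit_supported(12) is True
assert exact_guard(12, 1) is False
```

This is why only the frontend guard needed to change: the compilation path was already SM12.1-aware.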

Related: #2555

From Second Nature Computing — tested on NVIDIA DGX Spark.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Expanded GPU support for Deepseek operations to include SM121A GPUs in addition to SM120A, broadening hardware compatibility and enabling execution on a wider range of SM12x GPU architectures.

@gemini-code-assist
Contributor

Summary of Changes

Hello @blake-snc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the flashinfer library by extending the fmha_v2_prefill_deepseek function's compatibility to NVIDIA SM12.1 GPUs. This ensures that users with newer hardware, specifically DGX Spark (GB10), can utilize this optimized prefill operation, improving hardware utilization and performance without impacting existing SM12.0 or other unsupported devices.

Highlights

  • Expanded GPU Support: The fmha_v2_prefill_deepseek() function, previously limited to SM12.0 GPUs, now explicitly supports SM12.1 devices, such as NVIDIA DGX Spark (GB10).
  • Compatibility Check Update: The device guard check for fmha_v2_prefill_deepseek() was updated to include is_sm121a_supported(), aligning with existing SM12x checks in the codebase.
  • JIT Compilation Alignment: The underlying JIT compilation context already handled SM12.1 devices, meaning the kernels were already compatible, and this change primarily updates the frontend guard.


Changelog
  • flashinfer/prefill.py
    • Modified the fmha_v2_prefill_deepseek function's device compatibility check to include SM12.1 alongside SM12.0.
    • Updated the ValueError message to reflect support for "SM12x GPUs".
  • tests/attention/test_fmha_v2_prefill_deepseek.py
    • Imported is_sm121a_supported from flashinfer.utils.
    • Updated the pytest.skip condition in test_fmha_v2_prefill_deepseek to include SM12.1 support, mirroring the change in the main function.
    • Modified the skip message to indicate support for "SM12x GPUs".
Activity
  • The fmha_v2_prefill_deepseek function was tested on a DGX Spark (GB10, SM121a, CUDA 13.0, flashinfer 0.6.2) with BF16 and FP8 configurations, both passing without NaN values.
  • A test plan was outlined to ensure existing tests are updated, no impact on SM12.0, and correct ValueError for non-SM12x GPUs.

@coderabbitai
Contributor

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Walkthrough

The PR relaxes the GPU architecture check for fmha_v2_prefill_deepseek to accept SM12x GPUs (SM120A or SM121A) instead of SM120A-only, and updates the corresponding test skip condition and messages.

Changes

  • DeepSeek FMHA v2 — flashinfer/prefill.py: Broadened the supported-SM check from SM120A-only to SM12x (SM120A or SM121A); updated the error message to reference "SM12x GPUs".
  • Tests (FMHA v2 prefill) — tests/attention/test_fmha_v2_prefill_deepseek.py: Added the is_sm121a_supported import and updated the skip condition/message to treat SM12x (SM120A or SM121A) as supported.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • cyx-6
  • nvmbreughe
  • yzh119
  • Anerudhan


🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Title check: The title clearly and specifically summarizes the main change — adding SM121 GPU support to fmha_v2_prefill_deepseek — with the hardware context (DGX Spark) for clarity.
  • Merge Conflict Detection: No merge conflicts detected when merging into main.
  • Description check: The PR description comprehensively addresses the change: problem statement, solution details, validation evidence, test plan, and related issues. All required template sections are completed with substantial content.



@gemini-code-assist bot left a comment


Code Review

This pull request correctly extends support for fmha_v2_prefill_deepseek to SM12.1 GPUs by updating the hardware capability check. The change is applied to both the function implementation and its corresponding test, ensuring correctness and maintainability. The updated error message and test skip message are also clear and accurate. The implementation is clean and follows the existing pattern in the codebase. Overall, this is a good and necessary fix.


@jimmyzho left a comment


LGTM

The `fmha_v2_prefill_deepseek` function only checked `is_sm120a_supported()`
which requires SM12.0 (RTX 5090). This excluded SM12.1 devices like
NVIDIA DGX Spark (GB10) which support the same fmha_v2 SM120 kernels.

The JIT compilation context already handles SM12.1 correctly via
`supported_major_versions=[12]`, and the kernels compile and run
successfully on SM12.1. Only the Python guard was blocking.

Tested on DGX Spark (SM121a, CUDA 13.0): both BF16 and FP8 prefill
produce correct results with no NaN.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@blake-snc force-pushed the fix/sm12x-fmha-v2-wiring branch from 0df2e31 to db94171 on February 13, 2026 at 19:47
@jimmyzho
Contributor

/bot run

@flashinfer-bot
Collaborator

GitLab MR !317 has been created, and the CI pipeline #43999178 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #43999178: canceled
