
fix: allow fmha_v2_prefill_deepseek on SM121 (DGX Spark)#2559

Merged
jimmyzho merged 1 commit into flashinfer-ai:main from blake-snc:fix/sm12x-fmha-v2-wiring
Feb 19, 2026

Conversation

@blake-snc
Contributor

@blake-snc commented Feb 13, 2026

Summary

  • fmha_v2_prefill_deepseek() only checked is_sm120a_supported() (SM12.0), excluding SM12.1 devices like NVIDIA DGX Spark (GB10)
  • The JIT compilation context already handles SM12.1 via supported_major_versions=[12] and the kernels work correctly
  • This adds is_sm121a_supported() to the guard check, consistent with how other SM12x checks are done in the codebase (e.g., prefill.py:106-107, gemm_base.py:3757)
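The guard change described above can be sketched as follows. The helper names `is_sm120a_supported` and `is_sm121a_supported` come from this PR; the function bodies and the `check_fmha_v2_prefill_deepseek` wrapper below are simplified stand-ins for illustration, not flashinfer's actual implementation.

```python
# Illustrative sketch of the broadened device guard. The real helpers
# inspect CUDA device properties; these stand-ins take the compute
# capability directly.

def is_sm120a_supported(major: int, minor: int) -> bool:
    # Stand-in: exact SM12.0 match (e.g. RTX 5090).
    return (major, minor) == (12, 0)

def is_sm121a_supported(major: int, minor: int) -> bool:
    # Stand-in: SM12.1 covers devices like DGX Spark (GB10).
    return (major, minor) == (12, 1)

def check_fmha_v2_prefill_deepseek(major: int, minor: int) -> None:
    # Before the fix only the SM120A check was consulted; the fix ORs
    # in the SM121A check so SM12.1 devices pass the guard.
    if not (is_sm120a_supported(major, minor) or is_sm121a_supported(major, minor)):
        raise ValueError(
            "fmha_v2_prefill_deepseek is only supported on SM12x GPUs."
        )
```

With this shape, SM12.0 and SM12.1 both pass the guard while older architectures still get the `ValueError`.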

Validation

Tested on DGX Spark (GB10, SM121a, CUDA 13.0, flashinfer 0.6.2):

  • fmha_v2_prefill_deepseek BF16 (batch=2, seq=64, heads=128, qk=192, vo=128): PASS, no NaN
  • fmha_v2_prefill_deepseek FP8 e4m3 (same config): PASS, no NaN
  • Bug reproduction (original code on SM12.1): ValueError as expected

Test plan

  • Existing test_fmha_v2_prefill_deepseek test updated with the same fix
  • No impact on SM12.0 (RTX 5090) — is_sm120a_supported check still in place
  • No impact on non-SM12x GPUs — they still get the existing ValueError
  • Bug reproduced: original code raises ValueError: fmha_v2_prefill_deepseek is only supported on SM120 GPUs. on SM12.1
  • Fix applied to installed package (no monkeypatch): BF16 and FP8 both produce correct output
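The mismatch the test plan exercises is that a major-version-only JIT gate (the `supported_major_versions=[12]` mentioned above) admits both SM12.0 and SM12.1, while a (major, minor)-exact Python guard does not. A hedged sketch, with illustrative helper names rather than flashinfer's actual API:

```python
# Sketch of why the kernels already worked on SM12.1: the JIT gate
# matches on the major compute-capability version only, whereas the
# pre-fix Python guard required an exact SM12.0 match.

def jit_supported(major: int, supported_major_versions=(12,)) -> bool:
    # Major-version match: SM12.0 and SM12.1 both pass.
    return major in supported_major_versions

def exact_guard(major: int, minor: int) -> bool:
    # The pre-fix Python guard: exact SM12.0 only.
    return (major, minor) == (12, 0)

# SM12.1 (DGX Spark / GB10): the JIT would compile the kernels,
# but the old guard rejected the device before the JIT was reached.
assert jit_supported(12) is True
assert exact_guard(12, 1) is False
```

This is why only the frontend guard needed to change: the compilation path was already SM12.1-aware.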

Related: #2555

From Second Nature Computing — tested on NVIDIA DGX Spark.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Expanded GPU support for Deepseek operations to include SM121A GPUs in addition to SM120A, broadening hardware compatibility and enabling execution on a wider range of SM12x GPU architectures.

@gemini-code-assist
Contributor

Summary of Changes

Hello @blake-snc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the flashinfer library by extending the fmha_v2_prefill_deepseek function's compatibility to NVIDIA SM12.1 GPUs. This ensures that users with newer hardware, specifically DGX Spark (GB10), can utilize this optimized prefill operation, improving hardware utilization and performance without impacting existing SM12.0 or other unsupported devices.

Highlights

  • Expanded GPU Support: The fmha_v2_prefill_deepseek() function, previously limited to SM12.0 GPUs, now explicitly supports SM12.1 devices, such as NVIDIA DGX Spark (GB10).
  • Compatibility Check Update: The device guard check for fmha_v2_prefill_deepseek() was updated to include is_sm121a_supported(), aligning with existing SM12x checks in the codebase.
  • JIT Compilation Alignment: The underlying JIT compilation context already handled SM12.1 devices, meaning the kernels were already compatible, and this change primarily updates the frontend guard.


Changelog
  • flashinfer/prefill.py
    • Modified the fmha_v2_prefill_deepseek function's device compatibility check to include SM12.1 alongside SM12.0.
    • Updated the ValueError message to reflect support for "SM12x GPUs".
  • tests/attention/test_fmha_v2_prefill_deepseek.py
    • Imported is_sm121a_supported from flashinfer.utils.
    • Updated the pytest.skip condition in test_fmha_v2_prefill_deepseek to include SM12.1 support, mirroring the change in the main function.
    • Modified the skip message to indicate support for "SM12x GPUs".
Activity
  • The fmha_v2_prefill_deepseek function was tested on a DGX Spark (GB10, SM121a, CUDA 13.0, flashinfer 0.6.2) with BF16 and FP8 configurations, both passing without NaN values.
  • A test plan was outlined to ensure existing tests are updated, no impact on SM12.0, and correct ValueError for non-SM12x GPUs.

@coderabbitai
Contributor

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Walkthrough

The PR relaxes the GPU architecture check for fmha_v2_prefill_deepseek to accept SM12x GPUs (SM120A or SM121A) instead of SM120A-only, and updates the corresponding test skip condition and messages.

Changes

  • DeepSeek FMHA v2 — flashinfer/prefill.py: Broadened the supported-SM check from SM120A-only to SM12x (SM120A or SM121A); updated the error message to reference "SM12x GPUs".
  • Tests (FMHA v2 prefill) — tests/attention/test_fmha_v2_prefill_deepseek.py: Added the is_sm121a_supported import and updated the skip condition/message to treat SM12x (SM120A or SM121A) as supported.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • cyx-6
  • nvmbreughe
  • yzh119
  • Anerudhan


🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Title check: The title clearly and specifically summarizes the main change — adding SM121 GPU support to fmha_v2_prefill_deepseek — with the hardware context (DGX Spark) for clarity.
  • Merge Conflict Detection: No merge conflicts detected when merging into main.
  • Description check: The PR description comprehensively addresses the change: problem statement, solution details, validation evidence, test plan, and related issues. All required template sections are completed with substantial content.



@gemini-code-assist bot left a comment


Code Review

This pull request correctly extends support for fmha_v2_prefill_deepseek to SM12.1 GPUs by updating the hardware capability check. The change is applied to both the function implementation and its corresponding test, ensuring correctness and maintainability. The updated error message and test skip message are also clear and accurate. The implementation is clean and follows the existing pattern in the codebase. Overall, this is a good and necessary fix.


@jimmyzho left a comment


LGTM

The `fmha_v2_prefill_deepseek` function only checked `is_sm120a_supported()`
which requires SM12.0 (RTX 5090). This excluded SM12.1 devices like
NVIDIA DGX Spark (GB10) which support the same fmha_v2 SM120 kernels.

The JIT compilation context already handles SM12.1 correctly via
`supported_major_versions=[12]`, and the kernels compile and run
successfully on SM12.1. Only the Python guard was blocking.

Tested on DGX Spark (SM121a, CUDA 13.0): both BF16 and FP8 prefill
produce correct results with no NaN.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@blake-snc force-pushed the fix/sm12x-fmha-v2-wiring branch from 0df2e31 to db94171 on February 13, 2026 at 19:47
@jimmyzho
Contributor

/bot run

@flashinfer-bot
Collaborator

GitLab MR !317 has been created, and the CI pipeline #43999178 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #43999178: canceled
