Conversation
…g GDC fl…" This reverts commit 4c4013b.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request reverts a previous change that introduced specific CUTLASS GDC compilation flags to several GEMM kernel generation functions. The original change, intended to fix NaN issues under concurrency, inadvertently broke Ahead-Of-Time (AOT) packages. By reverting these flags, the PR aims to restore the stability of AOT packages and unblock the ongoing 0.6.6 release. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughRemoved the -DCUTLASS_ENABLE_GDC_FOR_SM100=1 and -DCUTLASS_ENABLE_GDC_FOR_SM90=1 compile flags from multiple GEMM JIT generator configurations in Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review: Revert of GDC Flags (#2737)Context: This PR reverts #2716 to unblock the v0.6.6 release. The change itself is minimal and mechanically correct — it removes Critical Concern: Restoring a Known Race ConditionThe core trade-off here deserves explicit acknowledgment before merging. PR #2716 fixed a real, verified, production-impacting bug:
This revert restores that race condition for SM90 and SM100/SM103/SM120 users. This is a data-correctness issue, not just a performance one. Missing InformationThe PR description states the flags "seem to have broken AOT packages" with a link to a CI job. Before merging, it would be valuable to understand:
Inconsistency After This RevertNote that RecommendationIf the 0.6.6 release is blocked and this revert is necessary to ship:
Summary
|
There was a problem hiding this comment.
Code Review
This pull request reverts a previous fix for a concurrency issue in GEMM kernels that could lead to NaN results, aiming to unblock the 0.6.6 release by resolving a build failure in AOT packages. However, a security audit identified several high-severity issues in the codebase, including Command Injection (e.g., FLASHINFER_CUDA_ARCH_LIST, nvcc_flags, CXX, FLASHINFER_NVCC_THREADS used without sanitization in flashinfer/jit/gemm/core.py), Path Traversal (artifact download in flashinfer/artifacts.py and flashinfer/jit/cubin_loader.py using unsanitized filenames), and Path Traversal via Workspace Directory (FLASHINFER_CUDA_ARCH_LIST used unsanitized for workspace path in flashinfer/jit/env.py). While the revert is a valid short-term solution, it re-introduces a known correctness bug. It is critical to address the AOT build issue and re-instate GDC flags for GEMM operations. Additionally, strict validation for all environment variables and sanitization of all filenames used in path construction are strongly recommended to mitigate the identified security vulnerabilities.
I am having trouble creating individual review comments. Click here to see my feedback.
flashinfer/jit/gemm/core.py (94-95)
Removing these GDC (Global Data Coherency) flags re-introduces a known concurrency bug that can cause GEMM kernels to produce NaN values. The original fix was to ensure PDL (Producer-Driven Load) synchronization barriers are not compiled as no-ops. While I understand this revert is necessary to unblock the release, please ensure a high-priority task is created to investigate the AOT build failure and re-apply this fix. The correctness of concurrent GEMM operations depends on it. This feedback applies to all similar removals of these flags in this file.
flashinfer/jit/gemm/core.py (519)
The nvcc_flags variable used here is constructed from the FLASHINFER_CUDA_ARCH_LIST environment variable without sufficient sanitization. An attacker who can control this environment variable can inject arbitrary shell commands or malicious compiler flags (e.g., by including shell metacharacters in the version string), leading to potential Remote Code Execution (RCE) when the JIT compilation is triggered. While this usage is not new, it is part of the modified code in this PR. It is recommended to validate the environment variable in flashinfer/compilation_context.py to ensure it only contains valid numeric versions.
|
It's weird to me that this change will break compilation, ideally it should only affect functionalities relevant to PDL. Let me double check. |
Code ReviewThis is a partial revert of #2716, removing only the Critical Concern: Potential NaN Bug Regression for SM120The original PR #2716 explicitly called out a key detail in its root cause analysis:
If this is accurate, removing
…re-enables the race condition for SM120 kernels. These modules still have
Missing Root Cause ExplanationThe PR states that #2716 "broke AOT packages" and links to a CI failure. However, it does not document:
Without this context, it is difficult to assess whether removing the flag is the right fix or just a workaround that silently re-introduces the original correctness bug. Suggested Targeted FixIf the issue is that
This preserves the NaN fix for SM120 while addressing whatever AOT issue the SM90 flag introduced in SM100-only modules. Minor Notes
SummaryThe concern is correctness, not style. Before merging, please confirm whether SM120 CUTLASS FP4 kernels (and
|
This reverts commit 95992b1.
PR Review: Revert of #2716 (GDC Flags Fix)Thank you for the quick action to unblock the 0.6.6 release. However, this revert deserves careful consideration before merging, as it re-introduces a known data correctness bug. Here is a detailed analysis: What This Revert DoesThis PR removes the
Concern 1: Re-introduces a Correctness Bug (High Severity)The original fix (#2716) addressed a real race condition — without Reverting this fix means users with SM90/SM100/SM120 GPUs running concurrent workloads are vulnerable to returning incorrect (NaN/garbage) results from GEMM operations, with no error raised. This is a production correctness regression. Concern 2: Inconsistency —
|
| Issue | Assessment |
|---|---|
| Re-introduces NaN race condition under concurrency | High severity regression |
Inconsistency with tgv_gemm keeping GDC flag |
Medium — creates confusing asymmetry |
| Root cause of AOT failure unexplained | Needs investigation before merging |
| Release urgency | Understandable motivation, but correctness should take priority |
Recommendation: Before merging, please investigate and document why the AOT build fails with the GDC flags. If the AOT issue cannot be quickly resolved, consider disabling PDL rather than removing the synchronization barriers, to avoid shipping a release with a known silent correctness bug on SM90+ hardware.
If this revert is merged anyway, please open a follow-up issue to re-add the fix with an AOT-compatible approach, and document the known risk in the release notes.
|
/bot run |
|
[FAILED] Pipeline #45837015: 7/20 passed |
Re-applies the fix from flashinfer-ai#2716 (reverted in flashinfer-ai#2737) using only -DCUTLASS_ENABLE_GDC_FOR_SM100=1, without -DCUTLASS_ENABLE_GDC_FOR_SM90=1. The SM90 flag was the cause of the AOT build failure: it triggers a direct #ifdef in sm90_gemm_tma_warpspecialized_cooperative.hpp (line 794) that calls scheduler.is_last_tile() — but SM100+/SM120 schedulers (PersistentTileSchedulerSm100StreamK) don't have that method. The SM100 flag alone is sufficient because CUTLASS 4.2.1's grid_dependency_control.h defines CUTLASS_GDC_ENABLED for the entire SM100 family (SM100, SM101, SM103, SM120, SM121) when CUTLASS_ENABLE_GDC_FOR_SM100 is set. All affected GEMM kernels use enablePDL=true, so the device-side GDC barriers (griddepcontrol.wait / griddepcontrol.launch_dependents) must be compiled in — otherwise PDL enables host-side kernel overlap but device-side synchronization is compiled out as no-ops, causing a race condition (NaN/garbage in output tiles under concurrency). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g GDC flags cause PDL synchronization barriers to compile as no-ops" (flashinfer-ai#2737) Proposing to revert flashinfer-ai#2716 in order to unblock 0.6.6 release flashinfer-ai#2716 seems to have broken AOT packages https://github.com/flashinfer-ai/flashinfer/actions/runs/22870567870/job/66353637447?pr=2730 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Removed legacy GPU compilation flags related to GDC enablement for certain GPU tiers during JIT GEMM generation, reducing extra compile flags and build noise; GDC-related flags for the latest GPU tier remain enabled where still applicable. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: yzh119 <zihaoy@nvidia.com>
…g GDC flags cause PDL synchronization barriers to compile as no-ops" (flashinfer-ai#2737) Proposing to revert flashinfer-ai#2716 in order to unblock 0.6.6 release flashinfer-ai#2716 seems to have broken AOT packages https://github.com/flashinfer-ai/flashinfer/actions/runs/22870567870/job/66353637447?pr=2730 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Removed legacy GPU compilation flags related to GDC enablement for certain GPU tiers during JIT GEMM generation, reducing extra compile flags and build noise; GDC-related flags for the latest GPU tier remain enabled where still applicable. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: yzh119 <zihaoy@nvidia.com> Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
## Summary Re-applies #2716 (reverted in #2737) with the fix for the AOT build failure. **Only `-DCUTLASS_ENABLE_GDC_FOR_SM100=1`** is added. The `-DCUTLASS_ENABLE_GDC_FOR_SM90=1` flag that broke AOT builds is intentionally omitted. ## Why the original PR broke AOT `sm90_gemm_tma_warpspecialized_cooperative.hpp:794` has a direct `#ifdef CUTLASS_ENABLE_GDC_FOR_SM90` guard (not `CUTLASS_GDC_ENABLED`) that calls `scheduler.is_last_tile()`. When compiling SM120 kernels with that flag, the SM120 scheduler (`PersistentTileSchedulerSm100StreamK`) doesn't have `is_last_tile()` → compilation error. ## Why SM100 flag alone is sufficient CUTLASS 4.2.1 `grid_dependency_control.h` defines `CUTLASS_GDC_ENABLED` for the entire SM100 family (SM100/101/103/120/121) when `CUTLASS_ENABLE_GDC_FOR_SM100` is set. This enables `griddepcontrol.wait` and `griddepcontrol.launch_dependents` device-side barriers for all affected architectures. ## Why this is needed All affected GEMM kernels hardcode `enablePDL=true`, which enables host-side kernel overlap. Without the GDC compile flag, the device-side synchronization barriers compile as no-ops → race condition → NaN/garbage output tiles under concurrency. ## Affected modules - `fp4_gemm_cutlass` (SM100) - `fp4_gemm_cutlass_sm103` (SM103) - `fp4_gemm_cutlass_sm120` (SM120) - `fp8_gemm_cutlass` (SM100) - `mxfp8_gemm_cutlass` (SM100) - `gemm_sm120` (SM120 FP8 groupwise) (`tgv_gemm` already had the SM100 flag.) ## Test plan - [ ] AOT build with `FLASHINFER_CUDA_ARCH_LIST="12.1a"` (the exact config that broke before) - [ ] AOT build with full arch list `"7.5 8.0 8.9 9.0a 10.0a 12.0a"` - [ ] FP4 GEMM correctness under concurrent streams on SM120 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated CUDA compilation configurations for matrix multiplication kernels across multiple data format variants (FP4, FP8, MXFP8, BF16) supporting additional GPU architectures. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Proposing to revert #2716 in order to unblock 0.6.6 release
#2716 seems to have broken AOT packages
https://github.com/flashinfer-ai/flashinfer/actions/runs/22870567870/job/66353637447?pr=2730
Summary by CodeRabbit