Cutlass dsl 4.5 bump #3246
Conversation
CUTLASS DSL 4.5 adds sm_121a to MmaSM120BlockScaledOp.admissible_archs, fixing FP4 block-scaled MMA on DGX Spark (SM121) without a monkey-patch. Also update the cute-dsl test skip condition to run on SM120/SM121. Co-authored-by: Cursor <cursoragent@cursor.com>
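The arch-gating fix can be illustrated with a toy stand-in. The class below is a sketch, not the real cutlass-dsl API; only the `admissible_archs` attribute name and the arch strings come from the commit message:

```python
# Toy model of the 4.4.2 -> 4.5.0 change described above; this
# MmaSM120BlockScaledOp is a hypothetical stand-in, not the real class.
class MmaSM120BlockScaledOp:
    # cutlass-dsl 4.4.2 listed only sm_120a; 4.5.0 adds sm_121a.
    admissible_archs = ("sm_120a", "sm_121a")

    @classmethod
    def is_admissible(cls, arch: str) -> bool:
        return arch in cls.admissible_archs

# SM121 (DGX Spark) is now accepted without any monkey-patch.
assert MmaSM120BlockScaledOp.is_admissible("sm_121a")
assert not MmaSM120BlockScaledOp.is_admissible("sm_90a")
```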
Update requirements.txt and Docker install script to match the pyproject.toml bump so CI picks up the new version. Co-authored-by: Cursor <cursoragent@cursor.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough
Bumps nvidia-cutlass-dsl to >=4.5.0 in the installer, requirements, and pyproject optionals; upgrades pip/setuptools early in the installer and adds cu13 prep steps; installs branch requirements during test setup; relaxes SM12x test gating to accept SM120/SM121 and adds CUDA-13 detection for relevant tests.

Changes: dependency, packaging, and test gating.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request updates the nvidia-cutlass-dsl dependency to version 4.5.0 across the project and expands the test suite to include SM12.x architectures for the cute-dsl backend. However, the changes to the test file are incomplete because the underlying requirement function in flashinfer/gemm/gemm_base.py still restricts the backend to SM10.x architectures, which will cause tests to be skipped on SM12.x devices despite the updates in this PR.
tests/gemm/test_mm_fp4.py (35-36)
This change to allow testing cute-dsl on SM12.x architectures appears to be incomplete.
The check mm_fp4.is_backend_supported('cute-dsl', ...) at lines 22-25 will return False on SM12.x devices. This is because the underlying requirement function, _cute_dsl_gemm_fp4_requirement in flashinfer/gemm/gemm_base.py, is still restricted to SM10.x architectures via the @supported_compute_capability([100, 103]) decorator.
As a result, the test will be skipped before this new condition is evaluated, rendering this change ineffective for enabling tests on SM12.x.
To complete this change, the @supported_compute_capability decorator for _cute_dsl_gemm_fp4_requirement should be updated to include the new architectures (e.g., [100, 103, 120, 121]).
The CI test runner uses `pip install -e . --no-deps`, which skips dependency resolution. Add an explicit `pip install -r requirements.txt` before the editable install so that dependency bumps on the branch (e.g. nvidia-cutlass-dsl) are picked up without rebuilding the Docker image. Co-authored-by: Cursor <cursoragent@cursor.com>
nvidia-cutlass-dsl 4.5.0 adds sm_121a to MmaSM120BlockScaledOp.admissible_archs, enabling warp-level MMA on SM121 (DGX Spark). Remove the SM121 test skips that were guarding against the missing arch in 4.4.2. Co-authored-by: Cursor <cursoragent@cursor.com>
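The relaxed test gating can be sketched as a simple capability check; the function below is illustrative, not the actual test code:

```python
# Hedged sketch of the SM gating after this change: compute capability
# (major, minor) folded into the usual "SM" number, with SM120/SM121 allowed.
def runs_cute_dsl_fp4(major: int, minor: int) -> bool:
    sm = major * 10 + minor
    return sm in (100, 103, 120, 121)

assert runs_cute_dsl_fp4(12, 1)     # SM121: no longer skipped with 4.5.0
assert runs_cute_dsl_fp4(12, 0)     # SM120
assert not runs_cute_dsl_fp4(8, 9)  # SM89 remains out of scope
```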
/bot run

/bot run
The base Docker image may ship older setuptools that can't properly build the package. This matches the build-system requirement already declared in pyproject.toml. Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed 419ad00 to 4fa728c
/bot run
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pyproject.toml`:
- Around line 23-27: The pyproject.toml is missing a PEP 621-compliant version
declaration in the [project] table which breaks metadata generation; update the
[project] table to include a concrete version entry (e.g., version = "0.0.0") or
add dynamic = ["version", ...] in [project] to match what
[tool.setuptools.dynamic] provides so pip can build metadata; edit the [project]
section in pyproject.toml (and ensure consistency with
[tool.setuptools.dynamic]) to restore the version metadata required for pip
install -e ..
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c3e855bd-bcb1-4655-9509-4c2fe62e9f9c
📒 Files selected for processing (6)
- docker/install/install_python_packages.sh
- pyproject.toml
- requirements.txt
- scripts/test_utils.sh
- tests/gemm/test_mm_fp4.py
- tests/moe/test_b12x_fused_moe.py
💤 Files with no reviewable changes (1)
- tests/moe/test_b12x_fused_moe.py
setuptools 82.x has a regression with dynamic version fields and custom build backends, and torch 2.11.0+cu129 requires setuptools<82. Pin to >=77,<82 in both build-system requires and Docker setup. Co-authored-by: Cursor <cursoragent@cursor.com>
♻️ Duplicate comments (1)
pyproject.toml (1)
15-24: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Restore `[project].dynamic` for `version` and `dependencies` to keep metadata generation valid.

`[tool.setuptools.dynamic]` is configured, but `[project]` does not declare those fields as dynamic. That can break `pip install -e .` / metadata builds.

🔧 Minimal fix

```diff
 [project]
 name = "flashinfer-python"
 description = "FlashInfer: Kernel Library for LLM Serving"
 requires-python = ">=3.10,<4.0"
 authors = [{ name = "FlashInfer team" }]
 license = "Apache-2.0"
 readme = "README.md"
 urls = { Homepage = "https://github.com/flashinfer-ai/flashinfer" }
 license-files = ["LICENSE", "LICENSE*.txt"]
+dynamic = ["version", "dependencies"]
```

Verification script:

```bash
#!/bin/bash
python - <<'PY'
import sys, tomllib

with open("pyproject.toml", "rb") as f:
    data = tomllib.load(f)
project = data.get("project", {})
dynamic = set(project.get("dynamic", []))
missing = []
if "version" not in project and "version" not in dynamic:
    missing.append("version")
if "dependencies" not in project and "dependencies" not in dynamic:
    missing.append("dependencies")
if missing:
    print("FAIL: missing [project].dynamic declarations for:", ", ".join(missing))
    sys.exit(1)
print("PASS: [project] declares required dynamic metadata fields.")
PY
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` around lines 15 - 24, pyproject.toml is missing a [project].dynamic entry for the fields declared in [tool.setuptools.dynamic]; add a dynamic array under the [project] table that includes "version" and "dependencies" (e.g., project.dynamic = ["version","dependencies"]) so metadata generation works with the existing [tool.setuptools.dynamic] configuration; update the [project] table where name/description are defined to include this dynamic declaration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 334df8f7-7ad9-4d17-946b-8122522eecb5
📒 Files selected for processing (2)
- docker/install/install_python_packages.sh
- pyproject.toml
…mpliance setuptools 82.x strictly enforces that dynamically-provided fields must be declared in the [project] dynamic list. Without this, editable installs fail with "project must contain ['version'] properties". This is the proper fix; the setuptools<82 cap is no longer needed. Co-authored-by: Cursor <cursoragent@cursor.com>
/bot run
♻️ Duplicate comments (1)
pyproject.toml (1)
26-28: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Same `>=4.5.0` version-floor concern applies here as in the Docker install script.

Both optional-dependency entries (`cu12` and `cu13`) adopt the `>=4.5.0` floor. If the stable `4.5.0` wheel is not yet live on the public PyPI index, any `pip install flashinfer-python[cu12]` or `pip install flashinfer-python[cu13]` invocation will fail to resolve the dependency. See the corresponding comment on `docker/install/install_python_packages.sh` for the verification steps.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` around lines 26 - 28, The optional-dependencies entries cu12 and cu13 in pyproject.toml pin nvidia-cutlass-dsl with a floor of ">=4.5.0", which will fail if the 4.5.0 wheel isn't available; update those entries (cu12 and cu13) to reference a concrete, available package spec—either a released exact version (e.g., nvidia-cutlass-dsl==<released-version>) or a constrained range that matches a published wheel, or add a documented alternative index/source for fetching the 4.5.0 wheel; ensure the change targets the nvidia-cutlass-dsl spec for both cu12 and cu13.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5727f60e-386a-4908-8b37-ebb925d68a3e
📒 Files selected for processing (2)
- docker/install/install_python_packages.sh
- pyproject.toml
/bot run
🧹 Nitpick comments (1)
scripts/test_utils.sh (1)
168-177: 💤 Low value

LGTM — minor note on the non-cu13 else branch.

The cu13-path install (line 174) is meaningful: `requirements.txt` installs the base package without the `[cu13]` extra, and the explicit call adds it. The else branch (line 176), however, re-installs a constraint (`nvidia-cutlass-dsl>=4.5.0`) that `requirements.txt` already satisfied, which causes a redundant PyPI round-trip on every non-cu13 CI run. Consider dropping the else branch if `requirements.txt` reliably covers the non-cu13 case:

♻️ Optional simplification

```diff
-if [[ "${CUDA_VERSION}" == *"cu13"* ]] || [[ "${CUDA_VERSION}" == "13."* ]]; then
-    pip install --upgrade "nvidia-cutlass-dsl[cu13]>=4.5.0"
-else
-    pip install --upgrade "nvidia-cutlass-dsl>=4.5.0"
-fi
+if [[ "${CUDA_VERSION}" == *"cu13"* ]] || [[ "${CUDA_VERSION}" == "13."* ]]; then
+    pip install --upgrade "nvidia-cutlass-dsl[cu13]>=4.5.0"
+fi
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/test_utils.sh` around lines 168 - 177, The else branch redundantly re-installs "nvidia-cutlass-dsl>=4.5.0" after requirements.txt already satisfied it, causing unnecessary PyPI overhead; modify scripts/test_utils.sh to only run the extra-cu13 pip install when CUDA_VERSION indicates cu13 (i.e., keep the if branch that runs pip install --upgrade "nvidia-cutlass-dsl[cu13]>=4.5.0") and remove the else branch so non-cu13 CI relies on requirements.txt instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b8e4bd56-8bca-4926-b391-672699edf8f5
📒 Files selected for processing (2)
- requirements.txt
- scripts/test_utils.sh
💤 Files with no reviewable changes (1)
- requirements.txt
/bot run
cutlass-dsl 4.5.0 no longer exposes Constexpr parameters (like max_active_clusters) in the compiled kernel's runtime interface — they are baked in at cute.compile() time. Remove the trailing mac argument from static/micro and dynamic kernel launch calls. Co-authored-by: Cursor <cursoragent@cursor.com>
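A toy illustration of the launch-signature change; the class below is a stand-in, and the real `cute.compile()` interface is not reproduced here:

```python
# Toy stand-in: a "compiled kernel" that captures its Constexpr parameter at
# (mock) compile time, so callers no longer pass max_active_clusters at launch.
class CompiledKernel:
    def __init__(self, max_active_clusters: int):
        self._mac = max_active_clusters  # baked in, as in cutlass-dsl 4.5.0

    def __call__(self, *tensors):
        # Launch interface takes only the runtime arguments.
        return f"launch(mac={self._mac}, args={len(tensors)})"

kernel = CompiledKernel(max_active_clusters=8)
# before (4.4.x): kernel(a, b, c, max_active_clusters)
# after  (4.5.0): kernel(a, b, c)
assert kernel(1, 2, 3) == "launch(mac=8, args=3)"
```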
/bot run
📌 Description
Bump cutlass-dsl to 4.5 so sm121 is supported in BlockScaledMmaOp
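A small, hedged helper showing the kind of version-floor check this bump implies; it is not part of the PR, purely illustrative:

```python
# Illustrative check that an installed nvidia-cutlass-dsl satisfies the new
# >=4.5.0 floor, using plain tuple comparison on dotted versions.
def meets_floor(installed: str, floor: str = "4.5.0") -> bool:
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(floor)

assert meets_floor("4.5.0")      # the bumped version
assert not meets_floor("4.4.2")  # previous pin, lacks sm_121a
```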
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed pre-commit by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- Tests have been added or updated as needed (unittest, etc.).

Reviewer Notes
Summary by CodeRabbit
Chores
Tests