
Conversation

@oraluben (Collaborator) commented Dec 29, 2025

This should reduce the confusion caused by TVM's default sm_50, which is used when it fails to detect the arch.

Summary by CodeRabbit

  • Refactor

    • Tightened parsing of NVIDIA compute-version strings for more consistent architecture detection, including handling of trailing modifiers and simplifying 3-character cases.
    • Broadened accepted compute-version input formats for increased flexibility.
  • New Behavior

    • Improved automatic CUDA detection: when available, the device compute capability is now reported as a detailed CUDA target instead of a generic label.


@oraluben oraluben requested a review from LeiWang1999 December 29, 2025 05:13
@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai coderabbitai bot (Contributor) commented Dec 29, 2025

📝 Walkthrough

Walkthrough

get_target_arch now accepts either a compute-version string or a (major, minor) tuple; compute-version parsing for 3-character arch strings is simplified and trailing "a"/"f" suffixes are stripped. determine_target detects CUDA at runtime via torch and can return a structured Target-like object with kind "cuda" and an sm_ arch.
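The dual-input behavior described above can be sketched as follows (a minimal illustration under the review's description; the function name mirrors the real one, but this is not the merged `tilelang/contrib/nvcc.py` code):

```python
def get_target_arch_sketch(compute_version):
    """Accept either a "major.minor" string or a (major, minor) tuple,
    as the walkthrough describes, and build an sm-style arch string."""
    if isinstance(compute_version, str):
        major, minor = (int(p) for p in compute_version.split("."))
    else:
        major, minor = compute_version
    arch = f"{major}{minor}"
    if major >= 9:  # per the review, sm_90 and newer keep the 'a' suffix
        arch += "a"
    return arch

print(get_target_arch_sketch("8.6"))   # -> 86
print(get_target_arch_sketch((9, 0)))  # -> 90a
```

The tuple path is what lets `torch.cuda.get_device_capability()` output feed straight into arch detection without string formatting at the call site.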

Changes

| Cohort / File(s) | Summary |
|---|---|
| **CUDA architecture parsing**<br>`tilelang/contrib/nvcc.py` | `get_target_arch` now accepts `str \| tuple[int, int]`; 3-character arch strings are parsed directly and trailing `"a"`/`"f"` suffixes are stripped before parsing. |
| **Target detection (runtime CUDA)**<br>`tilelang/utils/target.py` | Added module-level torch import. `determine_target` auto-detection: when `torch.cuda.is_available()` and the device capability is obtainable, returns a Target-like object (kind: `"cuda"`, arch: `"sm_<xx>"`) instead of the plain `"cuda"` string. HIP/Metal branches unchanged. |

Sequence Diagram(s)

(omitted — changes are localized parsing and runtime detection without new multi-component interaction flows)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

🐰
I nibble bits of arch and code,
Trailing letters trimmed from the road,
When CUDA wakes I dance and see,
An sm_ reply hops back to me,
Carrots and builds — a tidy spree! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main objective of the changeset: using CUDA capability from torch for more generic architecture detection. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d920eef and 804f0ac.

📒 Files selected for processing (1)
  • tilelang/contrib/nvcc.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • tilelang/contrib/nvcc.py
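The learning above can be sketched as a lookup helper (the helper name and fallback behavior are illustrative assumptions, not the actual `tilelang/env.py` logic):

```python
import os
import site

def find_pypi_nvcc():
    """Look for nvcc under the nvidia/cu{13,12,11}/bin/ layout that the
    nvidia-cuda-nvcc PyPI package installs into site-packages, per the
    learning note. This is a sketch, not tilelang code."""
    for sp in site.getsitepackages():
        for cu in ("cu13", "cu12", "cu11"):  # prefer newest CUDA first
            candidate = os.path.join(sp, "nvidia", cu, "bin", "nvcc")
            if os.path.isfile(candidate):
                return candidate
    return None

print(find_pypi_nvcc())
```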
🪛 Ruff (0.14.10)
tilelang/contrib/nvcc.py

418-418: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (3)
tilelang/contrib/nvcc.py (3)

408-408: LGTM! Robust handling of architecture suffixes.

The rstrip("af") correctly handles NVIDIA's architectural variants (e.g., sm_90a for Hopper, sm_90f variants) by stripping suffix characters before parsing the compute capability. This aligns with the PR objective to make arch detection more generic.


409-418: LGTM! Good forward compatibility for future architectures.

The enhanced parsing logic correctly handles:

  • 2-character arch strings (e.g., "89" → "8.9") for current GPUs
  • 3-character arch strings (e.g., "100" → "10.0") for future compute capability >= 10.0

Also confirmed that the error message fix from the past review comment has been properly addressed.
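The two-branch, length-based parsing described above can be sketched like this (names are illustrative; the reviewed code lives in `tilelang/contrib/nvcc.py`):

```python
def parse_arch_sketch(arch: str) -> str:
    """Sketch of the length-based compute-capability parsing the review
    describes, including the suffix stripping it pairs with."""
    arch = arch.rstrip("af")  # drop variant suffixes, e.g. "90a" -> "90"
    if len(arch) == 2:
        return f"{arch[0]}.{arch[1]}"    # "89"  -> "8.9"
    elif len(arch) == 3:
        return f"{arch[:2]}.{arch[2]}"   # "100" -> "10.0"
    raise ValueError(f"Unsupported arch: {arch}")
```

Note the two branches assume the minor version is always a single digit, which holds for every published NVIDIA compute capability to date.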


452-456: All callers are compatible with the updated signature.

The signature change to accept str | tuple[int, int] correctly handles the existing call patterns:

  • tilelang/utils/target.py:119 passes a tuple from torch.cuda.get_device_capability(0)
  • tilelang/jit/env.py:39, tilelang/jit/adapter/libgen.py:59, and tilelang/engine/lower.py:71 all pass strings from get_target_compute_version(), which returns str

The implementation correctly routes both input types to the appropriate handling logic.



@oraluben oraluben force-pushed the use-cuda-cap-from-torch branch from d2b1020 to d920eef on December 29, 2025 05:17
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 193eff1 and d2b1020.

📒 Files selected for processing (2)
  • tilelang/contrib/nvcc.py
  • tilelang/utils/target.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • tilelang/contrib/nvcc.py
🧬 Code graph analysis (2)
tilelang/utils/target.py (2)
tilelang/language/ast/ir.py (1)
  • target (1677-1707)
tilelang/contrib/nvcc.py (1)
  • get_target_arch (452-460)
tilelang/contrib/nvcc.py (1)
tilelang/contrib/rocm.py (1)
  • parse_compute_version (179-201)
🪛 Ruff (0.14.10)
tilelang/contrib/nvcc.py

418-418: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (4)
tilelang/utils/target.py (2)

2-4: Verify torch dependency is acceptable.

Adding torch as an unconditional import increases the package's dependency footprint. While this aligns with the PR objective to use PyTorch's CUDA capability detection, ensure this is acceptable for all use cases of tilelang.

Consider whether a lazy import or optional dependency pattern would be more appropriate if torch isn't always available in deployment environments.
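One possible lazy-import pattern along the lines suggested here (a sketch under the review's suggestion, not the merged code):

```python
def detect_cuda_capability():
    """Return (major, minor) for CUDA device 0, or None when torch is
    absent or no CUDA device is usable. The lazy import keeps torch an
    optional dependency rather than a hard requirement."""
    try:
        import torch
    except ImportError:
        return None
    try:
        if torch.cuda.is_available() and torch.cuda.device_count() > 0:
            return torch.cuda.get_device_capability(0)
    except Exception:
        pass  # treat any CUDA initialization failure as "not detected"
    return None

print(detect_cuda_capability())
```

A caller can then fall back to the plain `"cuda"` target whenever this returns `None`, which also covers the error-handling concern raised elsewhere in the review.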


89-89: Update the docstring to document the dict[str, str] return type.

The function signature correctly includes dict[str, str] in the return type (when CUDA is auto-detected with capabilities), but the docstring at line 98-99 only documents Union[str, Target] and omits the dict case. This creates a documentation mismatch. All callers already handle this correctly—they either wrap the result in Target() (which accepts dict), use return_object=True (which returns Target), or pass to Target.canon_target(). Update the docstring to reflect the actual return type.

Likely an incorrect or invalid review comment.

tilelang/contrib/nvcc.py (2)

452-460: LGTM! Enhanced flexibility for compute version input.

The function now correctly handles both string and tuple inputs, which enables seamless integration with torch.cuda.get_device_capability() that returns tuples. The logic correctly:

  • Parses string format via existing parse_compute_version()
  • Accepts tuples directly from PyTorch
  • Maintains the 'a' suffix for architectures >= 9.0

408-418: The architecture parsing logic correctly handles both current and future NVIDIA GPU architectures. NVIDIA's upcoming Blackwell architecture uses 3-digit compute capabilities (sm_100 family), confirming that the code's handling of 3-character architectures is sound and forward-looking, not speculative. The rstrip("af") removal of suffixes followed by length-based parsing (2-char for current sm_70/80/90 architectures, 3-char for future sm_100+) is appropriate and requires no changes.
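As an illustrative check of that forward compatibility (this slicing helper is a sketch, not tilelang code), 3-digit Blackwell-style capabilities map to `major.minor` the same way 2-digit ones do:

```python
def arch_to_version(arch: str) -> str:
    """Strip 'a'/'f' variant suffixes, then split the final digit off as
    the minor version; this handles 2- and 3-digit arches uniformly."""
    arch = arch.rstrip("af")
    return f"{arch[:-1]}.{arch[-1]}"

for a in ("70", "90a", "100", "120a"):
    print(a, "->", arch_to_version(a))
```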

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
tilelang/utils/target.py (1)

118-121: Add error handling for torch CUDA device access.

As noted in a previous review, torch.cuda.get_device_capability(0) can raise exceptions if device 0 is not accessible or PyTorch CUDA initialization fails, even when torch.cuda.is_available() returns True. This should be wrapped in a try/except to gracefully fall back to the plain "cuda" target.

🔎 Proposed fix with error handling
         if is_cuda_available:
-            if torch.cuda.is_available() and (cap := torch.cuda.get_device_capability(0)):
-                return_var = Target({"kind": "cuda", "arch": f"sm_{nvcc.get_target_arch(cap)}"})
-            else:
-                return_var = "cuda"
+            try:
+                import torch
+                if torch.cuda.is_available() and torch.cuda.device_count() > 0:
+                    cap = torch.cuda.get_device_capability(0)
+                    if cap:
+                        return_var = Target({"kind": "cuda", "arch": f"sm_{nvcc.get_target_arch(cap)}"})
+                    else:
+                        return_var = "cuda"
+                else:
+                    return_var = "cuda"
+            except Exception:
+                # Fall back to plain "cuda" if torch is unavailable or CUDA detection fails
+                return_var = "cuda"
tilelang/contrib/nvcc.py (1)

418-418: Fix the error message format string.

The error message is missing the f prefix and the variable in the format placeholder. This was noted in a previous review but appears unaddressed.

🔎 Proposed fix
         else:
-            raise ValueError("Unsupported arch: {}")
+            raise ValueError(f"Unsupported arch: {arch}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2b1020 and d920eef.

📒 Files selected for processing (2)
  • tilelang/contrib/nvcc.py
  • tilelang/utils/target.py
🧰 Additional context used
🧬 Code graph analysis (2)
tilelang/utils/target.py (1)
tilelang/contrib/nvcc.py (1)
  • get_target_arch (452-460)
tilelang/contrib/nvcc.py (1)
tilelang/contrib/rocm.py (1)
  • parse_compute_version (179-201)
🪛 Ruff (0.14.10)
tilelang/contrib/nvcc.py

418-418: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (2)
tilelang/contrib/nvcc.py (2)

452-460: LGTM! Signature change to accept tuple is well-integrated.

The updated get_target_arch correctly handles both string compute versions (e.g., "8.6") and tuples from torch.cuda.get_device_capability() (e.g., (8, 6)). The math and suffix logic remain correct:

  • (8, 6) → "86"
  • (9, 0) → "90a"

408-416: Arch parsing simplification looks correct.

The rstrip("af") properly handles architecture suffixes (e.g., "90a" → "90", "120a" → "120"), and the 3-character case correctly parses newer architectures like sm_120 to "12.0".

@LeiWang1999 LeiWang1999 changed the title Use cuda capability from torch to be more generic [Refactor] Use cuda capability from torch to be more generic Dec 29, 2025
@LeiWang1999 LeiWang1999 merged commit d317710 into tile-ai:main Dec 29, 2025
6 checks passed
