
[Compat] Add CUDA version check for __nv_fp8_e8m0 type #1537

Merged
LeiWang1999 merged 1 commit into tile-ai:main from LeiWang1999:fix/cuda-fp8-e8m0-compat
Dec 25, 2025

Conversation

@LeiWang1999 (Member) commented Dec 25, 2025

Summary

  • Add conditional compilation for __nv_fp8_e8m0 type which is only available in CUDA 12.6+
  • Provide a placeholder struct for older CUDA versions to maintain compatibility
  • Define a TL_HAS_FP8_E8M0 macro to allow compile-time feature detection

Test plan

  • Compile with CUDA < 12.6 to verify there is no undefined-identifier error for __nv_fp8_e8m0
  • Compile with CUDA >= 12.6 to verify the native type is used (a minimal check is sketched below)
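
A minimal smoke test along these lines could exercise both legs of the plan. This is a sketch, not part of the PR; the file name, include path, and build line are assumptions about the repository layout.

// check_fp8_e8m0.cu - hypothetical smoke test (build: nvcc -Isrc check_fp8_e8m0.cu)
#include "tl_templates/cuda/cuda_fp8.h"
#include <cstdio>

int main() {
  fp8_e8_t x{};  // must compile on both the native and the placeholder path
  (void)x;
#if TL_HAS_FP8_E8M0
  std::printf("CUDA >= 12.6: fp8_e8_t is the native __nv_fp8_e8m0\n");
#else
  std::printf("CUDA < 12.6: fp8_e8_t is the 1-byte placeholder\n");
#endif
  return 0;
}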

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
  • Added support for the CUDA fp8_e8_t data type, conditionally available on CUDA 12.6+.
  • Introduced a feature-detection macro to identify fp8_e8m0 support.
  • Added backward compatibility for CUDA versions prior to 12.6 via a placeholder fallback.


__nv_fp8_e8m0 is only available in CUDA 12.6+. Add conditional
compilation to provide a placeholder struct for older CUDA versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Dec 25, 2025

📝 Walkthrough

Introduces conditional CUDA fp8_e8_t type support in the cuda_fp8.h header. For CUDA 12.6+, defines fp8_e8_t as an alias to __nv_fp8_e8m0 with a public TL_HAS_FP8_E8M0 macro set to 1. For earlier CUDA versions, provides a placeholder struct and sets the macro to 0.

Changes

Cohort / File(s): CUDA FP8 Type Definitions (src/tl_templates/cuda/cuda_fp8.h)
Summary: Added a conditional typedef for fp8_e8_t aliasing __nv_fp8_e8m0 (CUDA ≥ 12.6), a placeholder struct for earlier versions, and a public TL_HAS_FP8_E8M0 macro (1 or 0 depending on CUDA version support).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A tiny hop in CUDA's land,

fp8_e8 now in our hand,

Version checks, macros stand tall,

New precision for one and all! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Title Check ✅ Passed: the title '[Compat] Add CUDA version check for __nv_fp8_e8m0 type' directly and clearly describes the main change: adding conditional compilation support for a CUDA type based on version compatibility.
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dff10e5 and d46bd11.

📒 Files selected for processing (1)
  • src/tl_templates/cuda/cuda_fp8.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-03T06:24:11.411Z
Learnt from: Rachmanino
Repo: tile-ai/tilelang PR: 1175
File: src/op/math.cc:44-52
Timestamp: 2025-11-03T06:24:11.411Z
Learning: In tilelang's `src/op/math.cc`, the `infinity_op` function uses `std::numeric_limits<float>::infinity()` as a placeholder for all float types (including float64 and bfloat16). The codegen layer (PrintConst:Inf) handles the correct infinity value based on the dtype field of the FloatImm node, so the specific C++ template argument doesn't matter.

Applied to files:

  • src/tl_templates/cuda/cuda_fp8.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)

Comment on lines +10 to +21
// __nv_fp8_e8m0 is only available in CUDA 12.6+
#if __CUDACC_VER_MAJOR__ > 12 ||                                               \
    (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ >= 6)
using fp8_e8_t = __nv_fp8_e8m0;
#define TL_HAS_FP8_E8M0 1
#else
// Placeholder for CUDA < 12.6
struct fp8_e8_t {
  unsigned char data;
};
#define TL_HAS_FP8_E8M0 0
#endif

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify TL_HAS_FP8_E8M0 usage across the codebase

echo "=== Searching for TL_HAS_FP8_E8M0 usage (excluding cuda_fp8.h) ==="
rg -n 'TL_HAS_FP8_E8M0' | grep -v 'src/tl_templates/cuda/cuda_fp8.h'

echo -e "\n=== Searching for fp8_e8_t usage patterns ==="
rg -nP 'fp8_e8_t' --type c --type cpp --type h -C 2

echo -e "\n=== Searching for __nv_fp8_e8m0 usage ==="
rg -nP '__nv_fp8_e8m0' --type c --type cpp --type h -C 2

Repository: tile-ai/tilelang

Length of output: 6603


🏁 Script executed:

#!/bin/bash
# More thorough search for TL_HAS_FP8_E8M0 across all file types
echo "=== All occurrences of TL_HAS_FP8_E8M0 ==="
rg -n 'TL_HAS_FP8_E8M0'

echo -e "\n=== File count containing TL_HAS_FP8_E8M0 ==="
rg -l 'TL_HAS_FP8_E8M0' | wc -l

echo -e "\n=== Check if hip_fp8.h defines its own fp8_e8_t ==="
head -30 src/tl_templates/hip/hip_fp8.h

Repository: tile-ai/tilelang

Length of output: 1427


🏁 Script executed:

#!/bin/bash
# Simpler search for TL_HAS_FP8_E8M0 usage
echo "=== All lines containing TL_HAS_FP8_E8M0 ==="
rg 'TL_HAS_FP8_E8M0'

echo -e "\n=== Check for conditional compilation using this macro ==="
rg '#if.*TL_HAS_FP8_E8M0|#ifdef.*TL_HAS_FP8_E8M0'

echo -e "\n=== Verify placeholder struct size at compile time ==="
echo "Note: unsigned char is guaranteed to be 1 byte by C++ standard"
echo "sizeof(unsigned char) == 1 (C++ standard guaranteed)"

Repository: tile-ai/tilelang

Length of output: 454


Conditional compilation approach is sound, but the TL_HAS_FP8_E8M0 macro is currently dead code.

The version check correctly identifies CUDA 12.6+, and the placeholder struct is properly sized—unsigned char is guaranteed to be 1 byte by the C++ standard, matching __nv_fp8_e8m0. All operations on fp8_e8_t in the codebase are simple assignments and member access, which are compatible with both the native type and the placeholder struct.
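
If one wanted that size guarantee checked rather than assumed, a one-line compile-time assertion next to the placeholder definition would do it (a sketch, not part of the PR):

// Hypothetical guard: keep fp8_e8_t layout-compatible with the 1-byte
// native __nv_fp8_e8m0 on every CUDA version.
static_assert(sizeof(fp8_e8_t) == 1, "fp8_e8_t must be exactly 1 byte");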

However, the TL_HAS_FP8_E8M0 macro is defined but never referenced anywhere in the codebase. Either remove this macro if feature detection is not needed, or integrate it by using #if TL_HAS_FP8_E8M0 to gate any fp8_e8m0-specific operations (conversions, intrinsics, etc.) that should only run on CUDA 12.6+.
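
For illustration, a consumer of the macro might look like the following. This is a hypothetical sketch: the helper name e8m0_scale_to_float is not in the codebase, and the decode follows the OCP MX definition of E8M0 (byte b encodes the scale 2^(b - 127), with 0xFF reserved for NaN).

#include <cmath>
#include <cstring>

// Portable decode: works against both the native type and the placeholder,
// since each is exactly 1 byte wide.
inline float e8m0_scale_to_float(fp8_e8_t v) {
  unsigned char bits;
  std::memcpy(&bits, &v, sizeof(bits));
  if (bits == 0xFF) return NAN;  // 0xFF is the E8M0 NaN encoding
  return std::exp2f(static_cast<float>(bits) - 127.0f);
}

#if TL_HAS_FP8_E8M0
// Anything that genuinely needs __nv_fp8_e8m0 (native conversions,
// intrinsics) would be gated here, so CUDA < 12.6 builds skip it.
#endif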

@LeiWang1999 LeiWang1999 merged commit d219f6c into tile-ai:main Dec 25, 2025
7 checks passed
