[Compat] Add CUDA version check for `__nv_fp8_e8m0` type #1537

LeiWang1999 merged 1 commit into tile-ai:main
Conversation
__nv_fp8_e8m0 is only available in CUDA 12.6+. Add conditional compilation to provide a placeholder struct for older CUDA versions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 Walkthrough

Introduces conditional CUDA `fp8_e8_t` type support in the `cuda_fp8.h` header. For CUDA 12.6+, defines `fp8_e8_t` as an alias to `__nv_fp8_e8m0` with a public `TL_HAS_FP8_E8M0` macro set to 1. For earlier CUDA versions, provides a placeholder struct and sets the macro to 0.
Actionable comments posted: 1
📜 Review details

- Configuration used: defaults
- Review profile: CHILL
- Plan: Pro

📒 Files selected for processing (1)

- `src/tl_templates/cuda/cuda_fp8.h`
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-03T06:24:11.411Z
Learnt from: Rachmanino
Repo: tile-ai/tilelang PR: 1175
File: src/op/math.cc:44-52
Timestamp: 2025-11-03T06:24:11.411Z
Learning: In tilelang's `src/op/math.cc`, the `infinity_op` function uses `std::numeric_limits<float>::infinity()` as a placeholder for all float types (including float64 and bfloat16). The codegen layer (PrintConst:Inf) handles the correct infinity value based on the dtype field of the FloatImm node, so the specific C++ template argument doesn't matter.
Applied to files:
src/tl_templates/cuda/cuda_fp8.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
```cpp
// __nv_fp8_e8m0 is only available in CUDA 12.6+
#if __CUDACC_VER_MAJOR__ > 12 ||                                               \
    (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ >= 6)
using fp8_e8_t = __nv_fp8_e8m0;
#define TL_HAS_FP8_E8M0 1
#else
// Placeholder for CUDA < 12.6
struct fp8_e8_t {
  unsigned char data;
};
#define TL_HAS_FP8_E8M0 0
#endif
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify TL_HAS_FP8_E8M0 usage across the codebase
echo "=== Searching for TL_HAS_FP8_E8M0 usage (excluding cuda_fp8.h) ==="
rg -n 'TL_HAS_FP8_E8M0' --type-list | head -1
rg -n 'TL_HAS_FP8_E8M0' | grep -v 'src/tl_templates/cuda/cuda_fp8.h'
echo -e "\n=== Searching for fp8_e8_t usage patterns ==="
rg -nP 'fp8_e8_t' --type c --type cpp --type h -C 2
echo -e "\n=== Searching for __nv_fp8_e8m0 usage ==="
rg -nP '__nv_fp8_e8m0' --type c --type cpp --type h -C 2
```

Repository: tile-ai/tilelang
Length of output: 6603
🏁 Script executed:

```shell
#!/bin/bash
# More thorough search for TL_HAS_FP8_E8M0 across all file types
echo "=== All occurrences of TL_HAS_FP8_E8M0 ==="
rg -r 'TL_HAS_FP8_E8M0' --type-add 'header:*.h' --type-add 'source:*.cc' --type-add 'cuda:*.cu' --type-add 'cpp:*.cpp'
echo -e "\n=== File count containing TL_HAS_FP8_E8M0 ==="
rg -l 'TL_HAS_FP8_E8M0' | wc -l
echo -e "\n=== Check if hip_fp8.h defines its own fp8_e8_t ==="
head -30 src/tl_templates/hip/hip_fp8.h
```

Repository: tile-ai/tilelang
Length of output: 1427
🏁 Script executed:

```shell
#!/bin/bash
# Simpler search for TL_HAS_FP8_E8M0 usage
echo "=== All lines containing TL_HAS_FP8_E8M0 ==="
rg 'TL_HAS_FP8_E8M0'
echo -e "\n=== Check for conditional compilation using this macro ==="
rg '#if.*TL_HAS_FP8_E8M0|#ifdef.*TL_HAS_FP8_E8M0'
echo -e "\n=== Verify placeholder struct size at compile time ==="
echo "Note: unsigned char is guaranteed to be 1 byte by C++ standard"
echo "sizeof(unsigned char) == 1 (C++ standard guaranteed)"
```

Repository: tile-ai/tilelang
Length of output: 454
Conditional compilation approach is sound, but the TL_HAS_FP8_E8M0 macro is unused dead code.
The version check correctly identifies CUDA 12.6+, and the placeholder struct is properly sized—unsigned char is guaranteed to be 1 byte by the C++ standard, matching __nv_fp8_e8m0. All operations on fp8_e8_t in the codebase are simple assignments and member access, which are compatible with both the native type and the placeholder struct.
However, the TL_HAS_FP8_E8M0 macro is defined but never referenced anywhere in the codebase. Either remove this macro if feature detection is not needed, or integrate it by using #if TL_HAS_FP8_E8M0 to gate any fp8_e8m0-specific operations (conversions, intrinsics, etc.) that should only run on CUDA 12.6+.
Summary

- Add a CUDA version check for the `__nv_fp8_e8m0` type, which is only available in CUDA 12.6+
- Define a `TL_HAS_FP8_E8M0` macro to allow compile-time feature detection

Test plan

- Build with CUDA < 12.6: no longer fails with an `__nv_fp8_e8m0` undefined error

🤖 Generated with Claude Code