[Feature] Support E8M0 related type conversion and vectorized cast #1731
LeiWang1999 merged 3 commits into tile-ai:main
Conversation
📝 Walkthrough
This PR adds vectorized cast support for CUDA's FP8 E8M0 format, enabling efficient conversions between E8M0 and BFloat16, as well as conversions from float and double. The implementation spans code generation, utility checks, FP8 template functions, and corresponding test coverage.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Frontend as TileLang Frontend
    participant Codegen as CodeGenTileLangCUDA
    participant Utils as Vectorization Utils
    participant Templates as CUDA FP8 Templates
    participant GPU as GPU Execution
    Frontend->>Codegen: VisitExpr_(CastNode)
    activate Codegen
    Codegen->>Utils: IsCudaVectorizableCast(src, dst)
    activate Utils
    Utils->>Utils: Check if E8M0↔BFloat16<br/>or float/double→E8M0
    Utils-->>Codegen: vectorizable = true
    deactivate Utils
    alt Vectorized Path
        Codegen->>Codegen: Generate vectorized<br/>call to template function
        Codegen-->>Templates: e.g., __tl_cvt_e8m0x2_to_bfloat162()
        activate Templates
        Templates->>Templates: Reinterpret cast +<br/>NVIDIA intrinsics
        Templates-->>GPU: CUDA kernel code
        deactivate Templates
    else Non-Vectorized Path
        Codegen-->>GPU: Scalar cast code
    end
    deactivate Codegen
    GPU->>GPU: Execute cast conversion
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/target/codegen_cuda.cc`:
- Around lines 1277-1299: The comments reference the NVIDIA `__nv_cvt_*` intrinsics rather than the TileLang wrappers actually used. Change the comment mentioning `__nv_cvt_float2_to_e8m0x2` to `__tl_cvt_float2_to_e8m0x2`, and the one mentioning `__nv_cvt_double2_to_e8m0x2` to `__tl_cvt_double2_to_e8m0x2`, so they match the call sites, which invoke `PrintVectorizedCast("__tl_cvt_float2_to_e8m0x2", "float2", "__nv_fp8x2_storage_t", "", false, true)` and `PrintVectorizedCast("__tl_cvt_double2_to_e8m0x2", "double2", "__nv_fp8x2_storage_t", "", false, true)`.
In `@src/target/utils.cc`:
- Around lines 185-191: The comments above the two conversion checks mention "E4M3/E5M2", but the code actually tests `target_ty.is_float8_e8m0fnu()`. Update both comments to reference the E8M0 format (float8_e8m0fnu) so they are consistent with the conditions built from `from_ty.is_bfloat16()`, `from_ty.is_float()`, and `target_ty.is_float8_e8m0fnu()`.
In `@src/tl_templates/cuda/cuda_fp8.h`:
- Line 319: The guard `#if defined(TL_HAS_FP8_E8M0)` is incorrect because `TL_HAS_FP8_E8M0` is always defined, as either 0 or 1, so the guard is always true. Test the macro's value instead (`#if TL_HAS_FP8_E8M0` or `#if TL_HAS_FP8_E8M0 == 1`) so the E8M0 blocks in `cuda_fp8.h` are compiled only when `TL_HAS_FP8_E8M0` is set to 1 (CUDA >= 12.6).
In `@testing/python/language/test_tilelang_language_vectorized_cast.py`:
- Around lines 114-118: The comment "E8M0 <-> FP16" is inaccurate; the test tuples pair `T.float8_e8m0fnu` with `T.bfloat16` (the entries using `__tl_cvt_e8m0x2_to_bfloat162`, `__tl_cvt_bfloat162_to_e8m0x2`, `__tl_cvt_float2_to_e8m0x2`, and `__tl_cvt_double2_to_e8m0x2`). Change the comment to "E8M0 <-> BF16" or otherwise mention bfloat16.
🧹 Nitpick comments (1)
examples/gemm/example_gemm_autotune.py (1)
222-222: Consider making the kernel source print conditional or removing it. This print statement outputs the full kernel source on every run, which can produce excessive output. If it is intended for debugging, guard it with a verbose flag or remove it for cleaner example output.
♻️ Suggested alternatives

Option 1: Remove if unintended:

```diff
- print(kernel.get_kernel_source())
```

Option 2: Make it conditional:

```diff
+ if os.environ.get("TILELANG_DEBUG"):
+     print(kernel.get_kernel_source())
- print(kernel.get_kernel_source())
```
Addresses #1710
Summary by CodeRabbit
New Features
Tests