
@Antonyvance commented on Oct 23, 2025

Intel Xe Architecture Support for CUTLASS Library Generation

Feature: Add Intel Xe12/Xe20 architecture support with operation generation and Python bindings.

Use Case: Enable kernel generation for the PyTorch Inductor path and other ML frameworks on Intel Arc/PVC GPUs.

Key Changes:

  • Architecture Support: Added Xe12 (PVC) and Xe20 (BMG) with compute capability 12-50
  • Operations: FP16, BF16, FP8 (E4M3/E5M2), INT8 GEMM kernels with multiple tile sizes (256×256, 128×256, etc.)
  • Build Flags: New CMake option -DCUTLASS_LIBRARY_GENERATOR_ARCHS="20" for Intel GPU targets
  • Python Integration: CMake-based shared library (examples/11_xe20_cutlass_library/) plus ctypes bindings
  • Generator: Extended python/cutlass_library/generator.py with GenerateIntelXe() functions
  • Examples: Python test scripts with performance benchmarking
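The benchmarking scripts themselves are not reproduced here. As a minimal sketch, assuming the usual convention that a GEMM of shape M×N×K performs 2·M·N·K floating-point operations (one multiply and one add per inner-product term), a measured kernel time converts to throughput as shown below. The helper names (gemm_flops, tflops, benchmark) are hypothetical and not part of the PR's scripts.

```python
import statistics
import time
from typing import Callable


def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A: MxK and B: KxN (one multiply + one add per term)."""
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert one kernel's elapsed time in seconds into TFLOP/s."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return gemm_flops(m, n, k) / seconds / 1e12


def benchmark(run: Callable[[], None], warmup: int = 3, iters: int = 10) -> float:
    """Median wall-clock time of `run` in seconds, measured after warm-up calls.

    Assumption: `run` blocks until the GPU work completes (for example by
    waiting on the SYCL queue); otherwise this only measures launch overhead.
    """
    for _ in range(warmup):
        run()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

For example, a 4096×4096×4096 GEMM that takes 10 ms corresponds to tflops(4096, 4096, 4096, 0.01), roughly 13.74 TFLOP/s. The median is used instead of the mean so that occasional slow iterations (JIT compilation, clock ramp-up) do not skew the result.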

Testing: ✅ Tested generated BF16 kernels, examples, and documentation
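Checking generated BF16 kernels usually means comparing against a host reference that models BF16 rounding. BF16 keeps the upper 16 bits of an IEEE-754 float32 (1 sign, 8 exponent, 7 mantissa bits). The following is a minimal sketch, assuming round-to-nearest-even conversion. The helpers to_bf16, reference_gemm_bf16 and allclose, as well as the default tolerances, are illustrative and not taken from the PR's tests. Python floats are double precision, so this reference accumulates more precisely than an FP32-accumulating kernel, which is one reason a tolerance is needed.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    x is first narrowed to float32 (it must lie within float32 range, or
    struct raises OverflowError); then the low 16 bits are rounded away.
    NaN and infinity are returned unchanged.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    lsb = (bits >> 16) & 1                        # lowest bit that is kept
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000     # ties round to an even mantissa
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


def reference_gemm_bf16(a, b):
    """C = A @ B with BF16-rounded inputs and double-precision accumulation."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    a16 = [[to_bf16(v) for v in row] for row in a]
    b16 = [[to_bf16(v) for v in row] for row in b]
    return [[sum(a16[i][p] * b16[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def allclose(actual, expected, rtol=1e-2, atol=1e-2):
    """Element-wise |actual - expected| <= atol + rtol * |expected| on equal shapes."""
    if len(actual) != len(expected):
        return False
    if any(len(ra) != len(rb) for ra, rb in zip(actual, expected)):
        return False
    return all(abs(u - v) <= atol + rtol * abs(v)
               for ra, rb in zip(actual, expected)
               for u, v in zip(ra, rb))
```

As an example of the rounding, to_bf16(3.14159) gives 3.140625, because only 7 mantissa bits survive. A kernel's output, copied back to the host as nested lists, would then be compared with allclose(kernel_out, reference_gemm_bf16(a, b)).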

Note: These changes do not use the new APIs (or modified collectives); adopting them is left to a separate feature/refactoring effort.

ToDo:

  • Fix build failures
  • Add benchmark tests for comprehensive performance analysis
  • Test kernels beyond BF16 (FP16, FP8, INT8)
  • Optimize generated kernels by tuning tile sizes
  • Modify CMake to avoid explicitly linking with libsycl.so
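When tuning tile sizes, it helps to know how a given tile covers the output matrix. The sketch below assumes a plain data-parallel mapping, without split-K or stream-K, in which each work-group computes one TM×TN tile of C. It then counts the work-groups needed and the fraction of the covered tile area that corresponds to real output elements. The candidate list uses only the two tile sizes named in this PR, and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

Tile = Tuple[int, int]  # (TM, TN): rows x columns of C computed by one work-group


def tile_grid(m: int, n: int, tile: Tile) -> Tuple[int, int]:
    """Work-groups needed along M and N to cover an M x N output (ceiling division)."""
    tm, tn = tile
    return (m + tm - 1) // tm, (n + tn - 1) // tn


def tile_efficiency(m: int, n: int, tile: Tile) -> float:
    """Fraction of the covered tile area that maps to real output elements."""
    gm, gn = tile_grid(m, n, tile)
    return (m * n) / (gm * tile[0] * gn * tile[1])


def rank_tiles(m: int, n: int, tiles: Iterable[Tile]) -> List[Tile]:
    """Order candidate tiles by padding efficiency, best first."""
    return sorted(tiles, key=lambda t: tile_efficiency(m, n, t), reverse=True)


CANDIDATES = [(256, 256), (128, 256)]  # the tile sizes mentioned in this PR
```

For a 1100×1000 output, rank_tiles(1100, 1000, CANDIDATES) prefers 128×256 (about 93% of the covered area is useful) over 256×256 (about 84%). Padding efficiency is only one factor: larger tiles reuse more data per load, while smaller tiles create more work-groups to occupy the GPU. A measured benchmark should make the final choice.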

Type: Feature | Tested On: Xe20 ✅

@Antonyvance added the enhancement and release urgent labels on Oct 23, 2025
@Antonyvance added this to the 0.6 milestone on Oct 23, 2025
@tdeng5 merged commit 9acfcd5 into intel:main on Oct 29, 2025
5 of 7 checks passed

4 participants