Open
26 commits
ad2695d
[CK] [CK_Tile] Add GroupConv to Kernel Dispatcher (#5168)
vidyasagar-amd Apr 9, 2026
5a8b805
[CK] Adding FMHA functionality.
vidyasagar-amd Mar 9, 2026
b8dc9e7
[CK] Adding FMHA functionality.
vidyasagar-amd Mar 9, 2026
a1cf3b5
[CK] Add further support for bwd kernels.
vidyasagar-amd Mar 10, 2026
0d10d23
[CK] Add parity matrix for fmha against current example folder.
vidyasagar-amd Mar 10, 2026
f4cded7
[CK] Resolve issue with hdims mismatch.
vidyasagar-amd Mar 10, 2026
657e50f
[CK] Relax validation rules to match example.
vidyasagar-amd Mar 11, 2026
dce8dfe
[CK] Add a few more examples for fmha features.
vidyasagar-amd Mar 11, 2026
74f0d6b
[CK] Address review comments.
vidyasagar-amd Mar 11, 2026
6dcd644
[CK] Address further review comments.
vidyasagar-amd Mar 11, 2026
3695a5a
[CK] Tile engine fmha support through dispatcher interface.
vidyasagar-amd Mar 12, 2026
1e18c99
[CK] Fixing readmes and further review comments.
vidyasagar-amd Mar 12, 2026
ea07406
[CK] Code cleanup and another round of review comments.
vidyasagar-amd Mar 12, 2026
6450998
[CK] Addressing another round of review comments.
vidyasagar-amd Mar 12, 2026
342636d
[CK] Add support for bwd kernels.
vidyasagar-amd Mar 13, 2026
eb9c43d
[CK] Add testing matrix.
vidyasagar-amd Mar 16, 2026
281d1ed
[CK] Fix missing instances.
vidyasagar-amd Mar 19, 2026
dd0dd44
[CK] Fix issues with kernel runtime errors.
vidyasagar-amd Mar 20, 2026
f850518
[CK] Fix bug in bwd kernels.
vidyasagar-amd Mar 21, 2026
d1c2518
[CK] Fix minor issues with bwd group kernels.
vidyasagar-amd Mar 22, 2026
f99064a
[CK] Fix filtering rules, improve tile engine parallelism.
vidyasagar-amd Mar 24, 2026
ac94ada
[CK] Fix process parallelism for tile engine generation.
vidyasagar-amd Mar 24, 2026
fb7aa8d
[CK] Further improve benchmarking outputs.
vidyasagar-amd Mar 25, 2026
3566a22
[CK] Batch benchmarking for speed.
vidyasagar-amd Mar 25, 2026
1a98a59
[CK] Further benchmarking efficiency improvements.
vidyasagar-amd Mar 25, 2026
32825ed
Merge branch 'develop' into users/vanantha/ck/dispatcher-fmha
vidyasagar-amd Apr 9, 2026
12 changes: 11 additions & 1 deletion projects/composablekernel/dispatcher/CMakeLists.txt
@@ -21,6 +21,8 @@ endif()
 add_library(ck_tile_dispatcher
 src/registry.cpp
 src/dispatcher.cpp
+src/fmha_registry.cpp
+src/fmha_dispatcher.cpp
 )

 # Enable PIC for Python bindings
@@ -34,13 +36,21 @@ target_include_directories(ck_tile_dispatcher
 $<INSTALL_INTERFACE:include>
 )

-# Link against CK Tile headers (header-only)
+# CK Tile core headers (ck_tile/core, ck_tile/ops, etc.)
 target_include_directories(ck_tile_dispatcher
 PUBLIC
 $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../include>
 $<INSTALL_INTERFACE:include>
 )

+# CK project root -- needed only for FMHA generated wrappers that include
+# "example/ck_tile/01_fmha/fmha_fwd.hpp". PRIVATE to avoid exposing the
+# entire project tree to downstream consumers.
+target_include_directories(ck_tile_dispatcher
+PRIVATE
+$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/..>
+)
+
 # Link against HIP headers if available
 if(hip_FOUND)
 target_link_libraries(ck_tile_dispatcher PUBLIC hip::host)
38 changes: 31 additions & 7 deletions projects/composablekernel/dispatcher/README.md
@@ -394,6 +394,12 @@ python3 examples/grouped_conv/python/03_bwd_data.py # Backward data +
 python3 examples/grouped_conv/python/04_bwd_weight.py # Backward weight + CPU ref
 python3 examples/grouped_conv/python/05_benchmark.py # Multi-problem benchmark
 python3 examples/grouped_conv/python/06_registry_json.py # Heuristic selection + JSON
+
+# FMHA Examples (JIT-compiled on the fly)
+python3 examples/fmha/python/01_basic_fmha.py # Basic forward attention
+python3 examples/fmha/python/12_masks_fmha.py # Causal masks
+python3 examples/fmha/python/18_backward_fmha.py # Backward pass
+python3 examples/fmha/python/16_splitkv_fmha.py # Split-KV for long sequences
 ```

 ### Example Output
@@ -716,7 +722,7 @@ This matrix shows all CK Tile operations with per-data-type, per-layout, and per
 | GEMM | streamk_gemm<br>example: `40_streamk_gemm/` | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | Reduce | multi_reduce2d<br>example: `05_reduce/` | ❌ | | ❌ | | | | | | | | | ❌ | ❌ | ❌ | ❌ |
 | Reduce | reduce2d<br>example: `05_reduce/` | ❌ | | ❌ | | | | | | | | | ❌ | ❌ | ❌ | ❌ |
-| Attention | fmha<br>example: `01_fmha/` | | | | | ❌ | | | | | | | | | | ❌ |
+| Attention | fmha<br>example: `01_fmha/` | | | | | ❌ | | | | | | | | | | ❌ |
 | Attention | sparse_attn<br>example: `50_sparse_attn/` | ❌ | | ❌ | | ❌ | | | | | | | ❌ | ❌ | ❌ | ❌ |
 | Activation | softmax | ❌ | | ❌ | | | | | | | | | ❌ | ❌ | ❌ | ❌ |
 | Activation | topk_softmax<br>example: `09_topk_softmax/` | ❌ | ❌ | ❌ | | | | | | | | | ❌ | ❌ | ❌ | ❌ |
@@ -871,36 +877,52 @@ dispatcher/
 | |---- grouped_conv_problem.hpp # Grouped conv problem (with builder)
 | |---- grouped_conv_kernel_decl.hpp # Grouped conv kernel declarations
 | |---- grouped_conv_registry.hpp # Grouped conv registry (thread-safe)
-| +---- grouped_conv_utils.hpp # Grouped conv utilities
+| |---- grouped_conv_utils.hpp # Grouped conv utilities
+| |---- fmha_types.hpp # FMHA fwd/bwd args and traits structs
+| |---- fmha_problem.hpp # FmhaProblem, FmhaProblemBuilder
+| |---- fmha_kernel_key.hpp # FmhaKernelKey (Signature + Algorithm)
+| |---- fmha_kernel_instance.hpp # FmhaKernelInstance virtual interface
+| |---- fmha_kernel_decl.hpp # Declarative FmhaSignature/FmhaAlgorithm
+| |---- fmha_registry.hpp # FmhaRegistry (thread-safe)
+| +---- fmha_dispatcher.hpp # FmhaDispatcher (plan, select, run)
 |
 |---- src/ # C++ implementation
 |
 |---- codegen/ # Kernel generation
 | |---- codegen_common.py # Shared: TileConfig, TraitConfigBase, type mappings
 | |---- unified_gemm_codegen.py # GEMM kernel generator
 | |---- unified_grouped_conv_codegen.py # Grouped conv kernel generator
+| |---- unified_fmha_codegen.py # FMHA kernel generator
+| |---- fmha_arch_specs.json # FMHA per-arch tile/pipeline specs
+| |---- fmha_rules.py # FMHA validation rules
+| |---- fmha_profiles.py # FMHA named profiles/receipts
 | +---- arch_specs.json # GPU specifications
 |
 |---- python/ # Python utilities
 | |---- dispatcher_common.py # Shared: paths, validation, Colors, phased output
 | |---- ctypes_utils.py # GEMM ctypes utilities
-| +---- grouped_conv_utils.py # Grouped conv utilities
+| |---- grouped_conv_utils.py # Grouped conv utilities
+| +---- fmha_utils.py # FMHA: JIT compile, FmhaRunner, FmhaKernelConfig
 |
 |---- scripts/ # Build scripts
 | |---- compile_gemm_examples.py # GEMM build script
 | +---- compile_grouped_conv_examples.py # Grouped conv build script
 |
 |---- bindings/ctypes/ # Python ctypes interface
 | |---- gemm_ctypes_lib.cpp # GEMM Python library
-| +---- conv_ctypes_lib.cpp # Grouped conv Python library
+| |---- conv_ctypes_lib.cpp # Grouped conv Python library
+| +---- fmha_ctypes_lib.cpp # FMHA Python library
 |
 |---- examples/ # Examples
 | |---- gemm/
 | | |---- cpp/ # C++ GEMM examples (01-07)
 | | +---- python/ # Python GEMM examples (01-11)
-| +---- grouped_conv/
-| |---- cpp/ # C++ Grouped Conv examples (01-07)
-| +---- python/ # Python Grouped Conv examples (01-06)
+| |---- grouped_conv/
+| | |---- cpp/ # C++ Grouped Conv examples (01-07)
+| | +---- python/ # Python Grouped Conv examples (01-06)
+| +---- fmha/
+| |---- cpp/ # C++ FMHA examples (01-35)
+| +---- python/ # Python FMHA examples (01-38)
 |
 +---- tests/ # Unit tests (C++ and Python)
 ```
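Note on the layout above: the tree describes an `FmhaKernelKey` built from a Signature and an Algorithm, a thread-safe `FmhaRegistry`, and an `FmhaDispatcher` that plans, selects, and runs. The following is a minimal Python sketch of that general registry/dispatcher pattern, not the PR's C++ API. Every class name, field (`dtype`, `hdim`, `causal`, `tile_m`, `tile_n`), and the selection heuristic is hypothetical and chosen only for illustration; the "plan" step is omitted.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Signature:
    """What is computed. Field names are hypothetical."""
    dtype: str
    hdim: int
    causal: bool


@dataclass(frozen=True)
class Algorithm:
    """How it is computed. Field names are hypothetical."""
    tile_m: int
    tile_n: int


@dataclass(frozen=True)
class KernelKey:
    """Hashable key combining a signature with an algorithm."""
    signature: Signature
    algorithm: Algorithm


class Registry:
    """Thread-safe map from KernelKey to a callable standing in for a kernel."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._kernels: Dict[KernelKey, Callable[[int], str]] = {}

    def register(self, key: KernelKey, kernel: Callable[[int], str]) -> None:
        with self._lock:
            self._kernels[key] = kernel

    def candidates(self, signature: Signature) -> List[KernelKey]:
        with self._lock:
            return [k for k in self._kernels if k.signature == signature]

    def get(self, key: KernelKey) -> Callable[[int], str]:
        with self._lock:
            return self._kernels[key]


class Dispatcher:
    def __init__(self, registry: Registry) -> None:
        self._registry = registry

    def select(self, signature: Signature, seqlen: int) -> Optional[KernelKey]:
        cands = self._registry.candidates(signature)
        if not cands:
            return None
        # Toy heuristic (an assumption): prefer the largest tile_m that fits seqlen.
        fitting = [k for k in cands if k.algorithm.tile_m <= seqlen]
        if fitting:
            return max(fitting, key=lambda k: k.algorithm.tile_m)
        return min(cands, key=lambda k: k.algorithm.tile_m)

    def run(self, signature: Signature, seqlen: int) -> str:
        key = self.select(signature, seqlen)
        if key is None:
            raise LookupError(f"no kernel registered for {signature}")
        return self._registry.get(key)(seqlen)


registry = Registry()
sig = Signature(dtype="fp16", hdim=128, causal=True)
for tile_m in (64, 128):
    key = KernelKey(sig, Algorithm(tile_m=tile_m, tile_n=64))
    registry.register(key, lambda seqlen, t=tile_m: f"tile_m={t} seqlen={seqlen}")

dispatcher = Dispatcher(registry)
print(dispatcher.run(sig, 4096))
```

Splitting the key into "what" and "how" lets a dispatcher filter by signature first and then rank the remaining algorithms, which is the shape the header names suggest; whether the PR's dispatcher ranks candidates this way is not shown in this diff.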
@@ -913,6 +935,8 @@ dispatcher/
 |-----------|--------|
 | GEMM C++ | [examples/gemm/cpp/README.md](examples/gemm/cpp/README.md) |
 | GEMM Python | [examples/gemm/python/README.md](examples/gemm/python/README.md) |
+| FMHA C++ | examples/fmha/cpp/ (35 examples covering all FMHA variants) |
+| FMHA Python | examples/fmha/python/ (38 examples with JIT compilation) |
 | Codegen | [codegen/README.md](codegen/README.md) |
 | Python Utils | [python/README.md](python/README.md) |
 | C++ Headers | [include/ck_tile/dispatcher/README.md](include/ck_tile/dispatcher/README.md) |
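Note on the FMHA examples listed in this README diff (basic forward attention, causal masks, split-KV for long sequences): the sketch below is a pure-Python reference for what those variants compute, not code from the PR. It assumes single-head attention on small nested lists, a causal mask with equal query and key lengths, and a split-KV merge based on rescaling per-chunk partial sums to a shared maximum. All function names are hypothetical.

```python
import math


def _scores(q, keys, scale):
    """Scaled dot products between one query row and each key row."""
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]


def attention(Q, K, V, causal=False):
    """Reference forward attention: softmax(Q K^T / sqrt(d)) V, row by row."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for i, q in enumerate(Q):
        # Causal mask (assuming seqlen_q == seqlen_k): query i sees keys 0..i.
        n = i + 1 if causal else len(K)
        s = _scores(q, K[:n], scale)
        m = max(s)  # subtract the max for numerical stability
        w = [math.exp(x - m) for x in s]
        denom = sum(w)
        out.append([sum(wi * v[d] for wi, v in zip(w, V[:n])) / denom
                    for d in range(len(V[0]))])
    return out


def attention_split_kv(Q, K, V, num_splits):
    """Same result as attention(), computed over KV chunks and then merged."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    chunk = math.ceil(len(K) / num_splits)
    dim = len(V[0])
    out = []
    for q in Q:
        partials = []  # one (chunk_max, chunk_sum_exp, unnormalized_output) per chunk
        for start in range(0, len(K), chunk):
            ks, vs = K[start:start + chunk], V[start:start + chunk]
            s = _scores(q, ks, scale)
            m = max(s)
            w = [math.exp(x - m) for x in s]
            acc = [sum(wi * v[d] for wi, v in zip(w, vs)) for d in range(dim)]
            partials.append((m, sum(w), acc))
        # Merge: rescale every chunk to a shared max, then normalize once.
        g = max(m for m, _, _ in partials)
        denom = sum(l * math.exp(m - g) for m, l, _ in partials)
        out.append([sum(acc[d] * math.exp(m - g) for m, _, acc in partials) / denom
                    for d in range(dim)])
    return out


Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full = attention(Q, K, V)
split = attention_split_kv(Q, K, V, num_splits=2)
causal = attention(Q, K[:3], V[:3], causal=True)
```

Because each chunk only needs its own maximum, sum of exponentials, and unnormalized output, the chunks can be processed independently and merged at the end, which is why split-KV suits long key/value sequences. With a causal mask the first query attends only to the first key, so its output row equals the first row of `V`.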
1 change: 1 addition & 0 deletions projects/composablekernel/dispatcher/bindings/README.md
@@ -10,6 +10,7 @@ bindings/
 | |---- gemm_ctypes_lib.cpp # GEMM dispatcher C API
 | |---- conv_ctypes_lib.cpp # Grouped conv dispatcher C API (fwd + bwd_data)
 | |---- conv_bwdw_ctypes_lib.cpp # Grouped conv backward weight C API (separate library)
+| |---- fmha_ctypes_lib.cpp # FMHA dispatcher C API (fwd + bwd)
 | |---- gpu_helper.cpp # CLI helper for Python
 | +---- CMakeLists.txt
 +---- README.md
Contributor
Should we mention fmha ctypes?
