[Triton-MLIR][Backend]Add the missing support when MMAv1 as the paren… by goostavz · Pull Request #983 · triton-lang/triton

goostavz · 2022-12-14T12:00:43Z

…t of sliceEncodingAttr

The test cases mentioned in triton-lang#983 have been added to the A770 skip list. Fixes triton-lang#1579.

Summary: This change improves the TLX emission pass to produce cleaner, more readable output by:

1. **Inlining constants at use sites** - Instead of emitting `c32_i32 = 32` and referencing `c32_i32`, constants are now inlined directly as `32` where used. This significantly reduces noise from constant definitions.
2. **Skipping non-meaningful operations** - Operations that don't contribute to TLX understanding are now filtered out:
   - `gpu.barrier` - not needed in TLX
   - `ttg.convert_layout` - internal layout conversion
   - `tt.return` / `tt.reduce.return` - terminators
   - Various warp specialization internals already skipped
3. **Skipping empty async_task blocks** - Partition regions that only contain skipped operations (like a single `tt.return`) are now omitted, eliminating empty `with tlx.async_task():` blocks.
4. **Refactored skip logic** - Replaced individual `if` statements with a `llvm::StringSet<>` lookup for cleaner, more maintainable code.

Pull Request resolved: facebookexperimental/triton#983

Test Plan:

1. Generated fwd.txt output using:
   ```
   TRITON_TLX_OUTPUT_FILE=~/tritonbench/output.py TRITON_TLX_COMPILABLE=1 TRITON_DUMP_TTGIR_TO_TLX=1 TRITON_ALWAYS_COMPILE=1 TRITON_KERNEL_DUMP=1 TRITON_DUMP_DIR=/tmp/triton_tissue030 TRITON_USE_META_WS=1 TRITON_PRINT_AUTOTUNING=1 CUDA_VISIBLE_DEVICES=3 bash ~/fbsource/fbcode/ads_mkl/benchmarks/denoise.sh python run.py --op blackwell_attentions --seq-len 8192 --batch 4 --n-heads 32 --d-head 128 --rep 3000 --sleep 1.0 --metrics tflops --simple-output --only triton_tutorial_flash_persistent_blackwell --force
   ```
2. Verified:
   - Constants are inlined (e.g., `mul(arg2, 32)` instead of `mul(arg2, c32_i32)`)
   - No empty `with tlx.async_task():` blocks at end of output
   - `gpu.barrier`, `ttg.convert_layout`, `tt.return` are not emitted

Before the change [P2205734619](https://www.internalfb.com/phabricator/paste/view/P2205734619)
After the change [P2208628663](https://www.internalfb.com/phabricator/paste/view/P2208628663)

Reviewed By: jma2333
Differential Revision: D94436902
Pulled By: tissue3
fbshipit-source-id: ddaf3e9d939b25573b2d3cac400bccae3516df44
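The emission behavior described in the summary can be sketched in a few lines. This is only an illustration in Python, not the actual pass: the commit message says the real skip logic uses a `llvm::StringSet<>` lookup, and the function names `emit_region` and `emit_async_task`, the `(result, op_name, operands)` tuple layout, and the `arith.muli` to `mul` mapping are all hypothetical choices made for this example.

```python
# Hypothetical sketch of the described behavior; the real pass is not Python.

# Ops listed in the summary as not meaningful for TLX output.
SKIPPED_OPS = frozenset({
    "gpu.barrier",
    "ttg.convert_layout",
    "tt.return",
    "tt.reduce.return",
})

# Assumed mapping from op name to emitted function name (illustrative only).
EMIT_NAMES = {"arith.muli": "mul"}


def emit_region(ops, constants=None):
    """Emit lines for a list of (result, op_name, operands) tuples."""
    constants = dict(constants or {})
    lines = []
    for result, name, operands in ops:
        if name in SKIPPED_OPS:
            # Single set lookup instead of a chain of individual `if` checks.
            continue
        if name == "arith.constant":
            # Record the value and emit nothing for the definition itself.
            constants[result] = operands[0]
            continue
        # Inline known constants at the use site; keep other operands by name.
        args = ", ".join(str(constants.get(o, o)) for o in operands)
        lines.append(f"{result} = {EMIT_NAMES.get(name, name)}({args})")
    return lines


def emit_async_task(ops, constants=None):
    """Emit a `with tlx.async_task():` block, or nothing if the body is empty."""
    body = emit_region(ops, constants)
    if not body:
        return []
    return ["with tlx.async_task():"] + ["    " + line for line in body]


example_ops = [
    ("c32_i32", "arith.constant", [32]),
    ("v0", "arith.muli", ["arg2", "c32_i32"]),
    (None, "gpu.barrier", []),
    (None, "tt.return", []),
]
print("\n".join(emit_async_task(example_ops)))
```

With the example input, the constant definition, the barrier, and the terminator produce no output, and a region containing only `tt.return` yields no `with tlx.async_task():` block at all.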

[Triton-MLIR][Backend]Add the missing support when MMAv1 as the paren…

be6eabd

…t of sliceEncodingAttr

goostavz requested a review from ptillet as a code owner December 14, 2022 12:00

ptillet approved these changes Dec 14, 2022


ptillet merged commit 8025455 into triton-lang:triton-mlir Dec 14, 2022

ZzEeKkAa pushed a commit to ZzEeKkAa/triton that referenced this pull request Aug 5, 2024

Skip list for A770 (triton-lang#1588)

6b57ef7

The test cases mentioned in triton-lang#983 have been added to the A770 skip list. Fixes triton-lang#1579.

ptillet merged 1 commit into triton-lang:triton-mlir from goostavz:goostavz/dev_mma_v1
