Merged
31 commits
d4124de
CI: Operators tunning pipelines
gyohuangxin Oct 11, 2025
6f5f0a6
Updates
gyohuangxin Oct 11, 2025
16437c1
Updates
gyohuangxin Oct 11, 2025
e9cc35d
Updates
gyohuangxin Oct 11, 2025
0d4538e
Updates
gyohuangxin Oct 11, 2025
edcf216
Show computing unints
gyohuangxin Oct 11, 2025
4457885
Updates
gyohuangxin Oct 11, 2025
80a62c5
Updates
gyohuangxin Oct 14, 2025
9238a2b
Add op_tune.sh
gyohuangxin Oct 14, 2025
343c394
Updates
gyohuangxin Oct 14, 2025
81d1e09
Disable a4w4
gyohuangxin Oct 14, 2025
448e492
Updates the error handling
gyohuangxin Oct 15, 2025
7711fd4
Updates the error handling
gyohuangxin Oct 15, 2025
af7d984
Updates
gyohuangxin Oct 15, 2025
6ae4e6b
Updates
gyohuangxin Oct 15, 2025
e95cbbc
Updates
gyohuangxin Oct 15, 2025
b60ccfc
Updates
gyohuangxin Oct 15, 2025
c5c5549
Update .github/scripts/op_tune.sh
gyohuangxin Oct 15, 2025
4da377b
Updates
gyohuangxin Oct 15, 2025
a934c5e
Add uloading tuned CSVs
gyohuangxin Oct 15, 2025
4c773e5
Updates
gyohuangxin Oct 15, 2025
752d201
Add shape name
gyohuangxin Oct 22, 2025
d079a5f
Add shape arg
gyohuangxin Oct 22, 2025
c1c4dc3
Allows users to select the shapes they want to tune and specify the a…
gyohuangxin Oct 23, 2025
e967bcd
Only be triggered when modify the untuned csv files under aiter confi…
gyohuangxin Oct 23, 2025
0dcc6a9
Test
gyohuangxin Oct 23, 2025
09dbe05
Updates
gyohuangxin Oct 23, 2025
488f4c5
Updates
gyohuangxin Oct 27, 2025
8d009e7
Update .github/workflows/operators-tuning.yaml
gyohuangxin Oct 27, 2025
dfd4a30
Update csrc/ck_gemm_a8w8_blockscale_bpreshuffle/README.md
gyohuangxin Oct 27, 2025
723dd13
Update a4w4_blockscale_untuned_gemm.csv
gyohuangxin Oct 27, 2025
83 changes: 83 additions & 0 deletions .github/scripts/op_tune.sh
@@ -0,0 +1,83 @@
#!/bin/bash
set -euo pipefail

if [ $# -lt 1 ] || [ $# -gt 3 ]; then
    echo "Usage: $0 [test|tune] [shape_name (optional)] [tuning_arg (optional)]"
    exit 1
fi

mode="$1"
shape_filter="${2:-}"
tuning_arg="${3:-}"

tuneFailed=false
testFailed=false
tuneFailedCmds=()
testFailedFiles=()

# Each entry: shape_name:source_dir:test_path:tune_command
declare -a tune_jobs=(
    "ck_batched_gemm_a8w8:csrc/ck_batched_gemm_a8w8:op_tests/test_batched_gemm_a8w8.py:python3 csrc/ck_batched_gemm_a8w8/batched_gemm_a8w8_tune.py -i aiter/configs/a8w8_untuned_batched_gemm.csv -o aiter/configs/a8w8_tuned_batched_gemm.csv"
    "ck_batched_gemm_bf16:csrc/ck_batched_gemm_bf16:op_tests/test_batched_gemm_bf16.py:python3 csrc/ck_batched_gemm_bf16/batched_gemm_bf16_tune.py -i aiter/configs/bf16_untuned_batched_gemm.csv -o aiter/configs/bf16_tuned_batched_gemm.csv"
    # Disabled. The shape-name field is included so the entry parses like the others if re-enabled.
    # "ck_gemm_a4w4_blockscale:csrc/ck_gemm_a4w4_blockscale:op_tests/test_gemm_a4w4_blockscale.py:python3 csrc/ck_gemm_a4w4_blockscale/gemm_a4w4_blockscale_tune.py -i aiter/configs/a4w4_blockscale_untuned_gemm.csv -o aiter/configs/a4w4_blockscale_tuned_gemm.csv"
    "ck_gemm_a8w8:csrc/ck_gemm_a8w8:op_tests/test_gemm_a8w8.py:python3 csrc/ck_gemm_a8w8/gemm_a8w8_tune.py -i aiter/configs/a8w8_untuned_gemm.csv -o aiter/configs/a8w8_tuned_gemm.csv"
    "ck_gemm_a8w8_blockscale:csrc/ck_gemm_a8w8_blockscale:op_tests/test_gemm_a8w8_blockscale.py:python3 csrc/ck_gemm_a8w8_blockscale/gemm_a8w8_blockscale_tune.py -i aiter/configs/a8w8_blockscale_untuned_gemm.csv -o aiter/configs/a8w8_blockscale_tuned_gemm.csv"
    "ck_gemm_a8w8_blockscale_bpreshuffle:csrc/ck_gemm_a8w8_blockscale_bpreshuffle:op_tests/test_gemm_a8w8_blockscale.py:python3 csrc/ck_gemm_a8w8_blockscale_bpreshuffle/gemm_a8w8_blockscale_bpreshuffle_tune.py -i aiter/configs/a8w8_blockscale_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv"
    "ck_gemm_a8w8_bpreshuffle:csrc/ck_gemm_a8w8_bpreshuffle:op_tests/test_gemm_a8w8.py:python3 csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py -i aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv"
)

for job in "${tune_jobs[@]}"; do
    IFS=':' read -r shape dir test_path tune_cmd <<< "$job"
    if [ -n "$shape_filter" ] && [ "$shape" != "$shape_filter" ]; then
        continue
    fi
    echo "============================================================"
    echo "🧪 Processing shape: $shape under directory: $dir"
    echo "------------------------------------------------------------"
    if [ "$mode" == "test" ]; then
        echo "Running operator test: python3 $test_path"
        if python3 "$test_path"; then
            echo "✅ Test PASSED: $test_path"
        else
            echo "❌ Test FAILED: $test_path"
            testFailed=true
            testFailedFiles+=("$test_path")
        fi
    elif [ "$mode" == "tune" ]; then
        # Append tuning_arg if provided
        if [ -n "$tuning_arg" ]; then
            full_tune_cmd="$tune_cmd $tuning_arg"
        else
            full_tune_cmd="$tune_cmd"
        fi
        echo "Running tuning script: $full_tune_cmd"
        if eval "$full_tune_cmd"; then
            echo "✅ Tuning PASSED: $full_tune_cmd"
        else
            echo "❌ Tuning FAILED: $full_tune_cmd"
            tuneFailed=true
            tuneFailedCmds+=("$full_tune_cmd")
        fi
    else
        echo "Unknown mode: $mode"
        exit 1
    fi
    echo "==============================================="
    echo
done

if [ "$tuneFailed" = true ]; then
    echo "Failed tune commands:"
    for c in "${tuneFailedCmds[@]}"; do
        echo " $c"
    done
    exit 1
elif [ "$testFailed" = true ]; then
    echo "Failed test files:"
    for f in "${testFailedFiles[@]}"; do
        echo " $f"
    done
    exit 1
else
    echo "All tunes/tests passed."
    exit 0
fi
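
Note on the shape filter: the workflow's `shapes` input is described as "Comma separated shape names", while the loop above compares `$shape` against the whole filter string. A value such as `ck_gemm_a8w8,ck_batched_gemm_bf16` would therefore match no job, and the script would still end with "All tunes/tests passed." The following is a minimal sketch of one way to accept a comma-separated list. The helper name `shape_selected` is hypothetical and not part of this PR.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not in op_tune.sh): succeeds when shape $1 is listed
# in the comma-separated filter $2, or when the filter is empty (run all).
shape_selected() {
    local shape="$1" filter="$2"
    if [ -z "$filter" ]; then
        return 0
    fi
    local -a wanted
    IFS=',' read -r -a wanted <<< "$filter"
    local w
    for w in "${wanted[@]}"; do
        # Trim leading and trailing whitespace so "a, b" behaves like "a,b".
        w="${w#"${w%%[![:space:]]*}"}"
        w="${w%"${w##*[![:space:]]}"}"
        if [ "$w" = "$shape" ]; then
            return 0
        fi
    done
    return 1
}

# Small demonstration with shape names taken from tune_jobs.
for s in ck_gemm_a8w8 ck_gemm_a8w8_blockscale ck_batched_gemm_bf16; do
    if shape_selected "$s" "ck_gemm_a8w8, ck_batched_gemm_bf16"; then
        echo "run  $s"
    else
        echo "skip $s"
    fi
done
```

Under this assumption, the filter check inside the loop could become `if ! shape_selected "$shape" "$shape_filter"; then continue; fi`, which keeps the existing behavior for an empty filter and for a single shape name.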
124 changes: 124 additions & 0 deletions .github/workflows/operators-tuning.yaml
@@ -0,0 +1,124 @@
name: Operators Tuning

on:
  pull_request:
    paths:
      - 'aiter/configs/*untuned*.csv'
  workflow_dispatch:
    inputs:
      shapes:
        description: 'Comma separated shape names to run (leave empty for all)'
        required: false
        default: ''
      arguments:
        description: 'Additional arguments for the tuning script'
        required: false
        default: ''

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  operators_tuning:
    strategy:
      fail-fast: false
      matrix:
        runner: [aiter-1gpu-runner] # TODO: add more runners
    name: Operators tuning
    runs-on: ${{ matrix.runner }}

    steps:
      - name: Checkout aiter repo
        uses: actions/checkout@v4

      - name: Sync submodules
        run: |
          set -e
          git submodule sync
          git submodule update --init --recursive --depth 1 --jobs 4

      - name: Clean up Rocm processes
        run: |
          ./.github/scripts/clean_up_rocm.sh

      - name: Run the container
        run: |
          set -ex
          echo "Starting container: operators_tuning_test"
          docker run -dt \
            --device=/dev/dri \
            --device=/dev/kfd \
            --shm-size=16G \
            --group-add $(getent group render | cut -d: -f3) \
            --group-add $(getent group video | cut -d: -f3) \
            -v "${{ github.workspace }}:/workspace" \
            -w /workspace \
            --name operators_tuning_test \
            rocm/pytorch:latest

      - name: Setup-Triton
        run: |
          set -ex
          echo "Setting up Triton..."
          docker exec \
            -w /workspace \
            operators_tuning_test \
            ./.github/scripts/build_triton.sh

      - name: Show Computing Units
        run: |
          set -ex
          echo "Showing Computing Units..."
          docker exec \
            -w /workspace \
            operators_tuning_test \
            rocminfo | grep -i 'Compute Unit'

      - name: Test Performance before tuning
        run: |
          set -ex
          docker exec \
            -w /workspace \
            operators_tuning_test \
            ./.github/scripts/op_tune.sh test "${{ github.event.inputs.shapes }}"

      - name: Operators tuning Tests
        run: |
          set -ex
          echo "Running Operators tuning Tests..."
          docker exec \
            -w /workspace \
            operators_tuning_test \
            ./.github/scripts/op_tune.sh tune "${{ github.event.inputs.shapes }}" "${{ github.event.inputs.arguments }}"

      - name: Show the difference after tuning
        run: |
          set -ex
          git diff --color-words --word-diff=porcelain

      - name: Test Performance after tuning
        run: |
          set -ex
          docker exec \
            -w /workspace \
            operators_tuning_test \
            ./.github/scripts/op_tune.sh test "${{ github.event.inputs.shapes }}"

      - name: Upload tuned CSVs
        uses: actions/upload-artifact@v4
        with:
          name: tuned-csvs
          path: aiter/configs/*.csv

      - name: Cleanup
        if: always()
        run: |
          set -ex
          echo "Cleaning up..."
          docker rm -f operators_tuning_test || true

      - name: Clean up Rocm processes
        if: always()
        run: |
          ./.github/scripts/clean_up_rocm.sh
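
Note on the `pull_request` trigger: the workflow starts when an untuned CSV under `aiter/configs/` changes, but on that event `github.event.inputs.shapes` is empty, so `op_tune.sh` runs every entry in `tune_jobs`. One possible refinement is to tune only the shapes whose `-i` input CSV was modified. The sketch below is an illustration under stated assumptions: the helper `shapes_for_changed_files` is hypothetical, the two table entries are copied from `op_tune.sh` for brevity, and the base ref `origin/main` in the usage comment is assumed rather than taken from this PR.

```shell
#!/bin/bash
set -euo pipefail

# Two entries copied from op_tune.sh (shape:dir:test_path:tune_cmd).
declare -a tune_jobs=(
    "ck_gemm_a8w8:csrc/ck_gemm_a8w8:op_tests/test_gemm_a8w8.py:python3 csrc/ck_gemm_a8w8/gemm_a8w8_tune.py -i aiter/configs/a8w8_untuned_gemm.csv -o aiter/configs/a8w8_tuned_gemm.csv"
    "ck_batched_gemm_bf16:csrc/ck_batched_gemm_bf16:op_tests/test_batched_gemm_bf16.py:python3 csrc/ck_batched_gemm_bf16/batched_gemm_bf16_tune.py -i aiter/configs/bf16_untuned_batched_gemm.csv -o aiter/configs/bf16_tuned_batched_gemm.csv"
)

# Hypothetical helper: print one shape name per line for every job whose
# tuning command reads ("-i <file>") one of the changed files given as args.
shapes_for_changed_files() {
    local job shape dir test_path tune_cmd changed
    for job in "${tune_jobs[@]}"; do
        IFS=':' read -r shape dir test_path tune_cmd <<< "$job"
        for changed in "$@"; do
            if [[ "$tune_cmd" == *" -i $changed "* ]]; then
                echo "$shape"
                break
            fi
        done
    done
}

# Usage idea (base ref is an assumption):
#   mapfile -t changed < <(git diff --name-only origin/main...HEAD -- 'aiter/configs/*untuned*.csv')
#   shapes_for_changed_files "${changed[@]}"
shapes_for_changed_files aiter/configs/a8w8_untuned_gemm.csv
```

Each printed shape could then be passed to `op_tune.sh tune <shape>` one at a time, which works with the script's existing exact-match filter.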
6 changes: 3 additions & 3 deletions csrc/ck_gemm_a8w8_blockscale_bpreshuffle/README.md
@@ -20,9 +20,9 @@ You can find the results of the tuning in `aiter/configs/a8w8_blockscale_bpreshu
`cu_num` is the number of compute units; it is used to distinguish between GPU models.

4. Build tuned kernels and test:
-Test the performance, modify the test instance in `op_tests/test_gemm_a8w8_blockscale_bpreshuffle.py` and run it, please wait a few minutes as it will build gemm_a8w8_blockscale_bpreshuffle tuned kernels in `aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv` via jit:
-`python3 op_tests/test_gemm_a8w8_blockscale_bpreshuffle.py`
-If you have built gemm_a8w8 kernels brefore tuning new GEMM shapes, please add `AITER_REBUILD=1` before your test cmd, such as `AITER_REBUILD=1 python3 op_tests/test_gemm_a8w8_blockscale_bpreshuffle.py`. It will rebuild kernels from `AITER_CONFIG_GEMM_A8W8_BLOCKSCALE_BPRESHUFFLE` the default one will be `aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv`.
+Test the performance, modify the test instance in `op_tests/test_gemm_a8w8_blockscale.py` and run it, please wait a few minutes as it will build gemm_a8w8_blockscale_bpreshuffle tuned kernels in `aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv` via jit:
+`python3 op_tests/test_gemm_a8w8_blockscale.py`
+If you have built gemm_a8w8 kernels before tuning new GEMM shapes, please add `AITER_REBUILD=1` before your test cmd, such as `AITER_REBUILD=1 python3 op_tests/test_gemm_a8w8_blockscale_bpreshuffle.py`. It will rebuild kernels from `AITER_CONFIG_GEMM_A8W8_BLOCKSCALE_BPRESHUFFLE` the default one will be `aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv`.

## More
If you set the flag `PREBUILD_KERNELS=1` when you install aiter, it builds the gemm a8w8 kernels listed in the tuned gemm csv by default. If you want to use the new results from gemm_a8w8_tune, first remove `build` and `*.so` in `aiter/jit`, then reinstall aiter after tuning finishes. This can take a lot of time and is not recommended.
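
The cleanup described above can be sketched as a small helper. This is an illustration only: the function name `clean_jit_artifacts` is hypothetical, and the reinstall command in the comment (`python3 setup.py develop`) is an assumption, since this README section does not state how aiter is installed.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: remove JIT build outputs under <repo_root>/aiter/jit,
# i.e. the `build` directory and any `*.so` files, leaving other files alone.
clean_jit_artifacts() {
    local repo_root="${1:-.}"
    local jit_dir="$repo_root/aiter/jit"
    rm -rf -- "$jit_dir/build"
    rm -f -- "$jit_dir"/*.so
}

# Usage idea, run from the repository root after tuning has finished:
#   clean_jit_artifacts .
#   PREBUILD_KERNELS=1 python3 setup.py develop   # install command is an assumption
```

For a single test run, the lighter option named in step 4 is to prefix the test command with `AITER_REBUILD=1` instead of reinstalling.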