Merged
67 commits
9ebf406
fix: remap QuantType.No to per_1x32 for fp4x2 MoE weights (W4A6 support)
vecheruk-amd Mar 18, 2026
feb21bc
Fixing two cascading bugs when running the MoE tuner
xaguilar-amd Mar 25, 2026
f1f94cf
Enable split-K for block-scale A8W8 CK and CKTile GEMMs
samremes Mar 30, 2026
ec422c3
Wire splitK from tuning CSV through production blockscale GEMM dispatch
samremes Mar 30, 2026
ab58051
fix: ck_moe_stage1 split-K output buffer overflow from padding scatter
ChuanLi1101 Mar 31, 2026
71e80fd
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 1, 2026
0c5544f
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 1, 2026
b15b5c5
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 2, 2026
2bf04dd
Address PR review feedback: validate splitK, fix hipMemset stride iss…
Copilot Apr 8, 2026
285092d
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 9, 2026
7e319e5
black format
samremes Apr 9, 2026
118099e
fix splitk test dimensions
samremes Apr 10, 2026
3216bce
Add gdn fusions
hellozhuo-amd Apr 10, 2026
9811501
style: fix ruff F841 and black-format Triton PR files
hellozhuo-amd Apr 10, 2026
b26972f
Update fused_rearrange_sigmoid_gdr.py
hellozhuo-amd Apr 13, 2026
8695885
Update op_tests
hellozhuo-amd Apr 13, 2026
b69cb72
Fix BLACK format problem
hellozhuo-amd Apr 13, 2026
c4db40f
Fix black check failure
hellozhuo-amd Apr 13, 2026
ac48df0
Update test_fused_rearrange_sigmoid_gdr.py
hellozhuo-amd Apr 13, 2026
b9f33dd
Merge branch 'origin/main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 13, 2026
56a2b85
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 14, 2026
bc49759
Allow callers to pass pre-allocated moe_buf to avoid output copy
tpopp Apr 9, 2026
60a459c
Add moe_buf pass-through test to existing test_moe_sorting
tpopp Apr 10, 2026
02c26a1
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 14, 2026
60bb24c
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 15, 2026
5580ea9
Merge branch 'main' into samremes/blockscale_splitk
samremes Apr 16, 2026
f214128
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 16, 2026
5084462
Merge branch 'main' into zhuo/qwen3_triton_gdn
juuso-oskari Apr 20, 2026
3d084e2
Merge branch 'main' into zhuo/qwen3_triton_gdn
juuso-oskari Apr 21, 2026
b2ab876
Merge branch 'main' into zhuo/qwen3_triton_gdn
juuso-oskari Apr 21, 2026
bdc9a96
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 22, 2026
7fbd9ad
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 22, 2026
3ffd13c
Replace _fast with _single_token for causal conv1d update kernels for…
hellozhuo-amd Apr 22, 2026
9946258
Fix blck format error
hellozhuo-amd Apr 22, 2026
48eda94
Add tuned a8w8 blockscale GEMM config for Qwen3-Next-80B-A3B on MI355X
nholmber Apr 22, 2026
b8ea372
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 22, 2026
0f41d78
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 23, 2026
2aa2493
refactor(triton): rename gated RMSNorm+FP8 op to fused_rms_gated_fp8_…
hellozhuo-amd Apr 23, 2026
35035ff
Merge branch 'main' into zhuo/qwen3_triton_gdn
juuso-oskari Apr 24, 2026
711c9e9
Merge branch 'main' into zhuo/qwen3_triton_gdn
hellozhuo-amd Apr 24, 2026
f6ca360
Retune blockscale GEMM configs to fix invalid kernelId+splitK combina…
nholmber Apr 25, 2026
275ceff
[Bug] pa_mqa_logits: mask OOB stores on OutLogits_buffer
maeehart Apr 22, 2026
9f1d14e
Merge branch 'pr-2423' into silo/v0.1.13-kernels
sunway513 May 1, 2026
e371596
Merge branch 'pr-2457' into silo/v0.1.13-kernels
sunway513 May 1, 2026
2b27b22
Merge branch 'pr-2464' into silo/v0.1.13-kernels
sunway513 May 1, 2026
6b02368
Merge kernel PR #2541 (auto-resolved)
sunway513 May 1, 2026
088873e
Merge kernel PR #2547 (auto-resolved)
sunway513 May 1, 2026
58bab19
Merge branch 'pr-2687' into silo/v0.1.13-kernels
sunway513 May 1, 2026
8e969c8
Merge branch 'pr-2866' into silo/v0.1.13-kernels
sunway513 May 1, 2026
c47dfbb
Merge branch 'pr-2868' into silo/v0.1.13-kernels
sunway513 May 1, 2026
85e96b0
style: fix Black formatting
sunway513 May 1, 2026
e53aa00
style: fix Black formatting (Python 3.12 compatible)
sunway513 May 1, 2026
1daa01a
ci: replace deprecated zmq package with pyzmq
sunway513 May 3, 2026
00f37f9
ci: increase pip retries and timeout for CI reliability
sunway513 May 3, 2026
9ba8fb3
ci: make pyzmq install non-blocking in triton test setup
sunway513 May 3, 2026
e1aa5eb
ci: retry pip install individually on batch failure
sunway513 May 3, 2026
ea15872
[MLA] Fix nhead=32 non-persistent decode crash on gfx950
frida-andersson Apr 30, 2026
b6fe3d6
revert: remove #2983 (MLA nhead=32 fix) — causes test_mla CI failures
sunway513 May 3, 2026
7faa32e
fix: restore tuple unpack for FlyDSL fused-quant stage1 return
sunway513 May 4, 2026
ac5782c
Merge branch 'main' into silo/v0.1.13-kernels
sunway513 May 4, 2026
f9988ae
Revert leaked changes from excluded PRs #2457/#2547/#2687 in fused_mo…
sunway513 May 4, 2026
9ca055a
fix: restore fp4_utils.moe_mxfp4_sort for new code paths (different o…
sunway513 May 4, 2026
b5a0ce7
style: fix Black formatting for local imports
sunway513 May 4, 2026
5e7d2a9
fix: remove rejected W4A6 QuantType remap from fused_moe_dp_shared_ex…
sunway513 May 4, 2026
c8d2764
fix: restore silently-reverted main features from bad merge resolution
azaidy May 4, 2026
e468206
chore: remove #2464 from bulk merge per author request
azaidy May 4, 2026
f1f5002
Merge remote-tracking branch 'origin/main' into silo/v0.1.13-kernels
azaidy May 4, 2026
10 changes: 9 additions & 1 deletion .github/scripts/build_aiter_triton.sh
@@ -9,7 +9,15 @@ dpkg -l | grep rocm || echo "No ROCm packages found."
 echo
 echo "==== Install dependencies and aiter ===="
 git config --global --add safe.directory /workspace
-pip install --upgrade pandas zmq einops numpy==1.26.2
+pip config set global.retries 15
+pip config set global.timeout 120
+pip install --upgrade pandas pyzmq einops numpy==1.26.2 || {
+echo "WARNING: batch pip install failed, retrying packages individually..."
+pip install --upgrade pandas || true
+pip install --upgrade pyzmq || echo "WARNING: pyzmq unavailable (only needed by aiter.dist.shm_broadcast)"
+pip install --upgrade einops
+pip install --upgrade "numpy==1.26.2"
+}
 pip uninstall -y aiter || true
 pip install --upgrade "pybind11>=3.0.1"
 pip install --upgrade "ninja>=1.11.1"
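The batch-then-individual fallback in build_aiter_triton.sh above can be sketched as a small reusable function. This is only an illustration under stated assumptions, not code from the PR: the names `install_with_fallback` and `INSTALL_CMD` are hypothetical, and the required/optional split mirrors how the script treats pandas and pyzmq as tolerable failures while einops and numpy are not.

```shell
# Sketch only: INSTALL_CMD and install_with_fallback are hypothetical names,
# not part of the PR. INSTALL_CMD defaults to the pip command the script uses
# and can be overridden to exercise the logic without network access.
INSTALL_CMD="${INSTALL_CMD:-pip install --upgrade}"

# Usage: install_with_fallback "REQUIRED PKGS" "OPTIONAL PKGS"
# Each argument is a whitespace-separated list of package specs, so a single
# spec must not contain spaces. Variables are global to stay POSIX-compatible.
install_with_fallback() {
    required=$1
    optional=${2:-}

    # Fast path: one install run for every package (word splitting intended).
    if $INSTALL_CMD $required $optional; then
        return 0
    fi

    echo "WARNING: batch install failed, retrying packages individually..." >&2

    # Optional packages: warn on failure but keep going.
    for pkg in $optional; do
        $INSTALL_CMD "$pkg" || echo "WARNING: optional package $pkg unavailable" >&2
    done

    # Required packages: try each one, and report failure if any is missing.
    status=0
    for pkg in $required; do
        $INSTALL_CMD "$pkg" || status=1
    done
    return "$status"
}
```

Under these assumptions, the script's behaviour would correspond roughly to `install_with_fallback "einops numpy==1.26.2" "pandas pyzmq"`. Unlike the PR's inline block, this sketch attempts every required package before returning a non-zero status.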
6 changes: 3 additions & 3 deletions .github/workflows/aiter-test.yaml
@@ -102,7 +102,7 @@ jobs:
 shopt -s nullglob &&
 rm -rf dist build aiter_meta ./*.egg-info &&
 pip install -r requirements.txt &&
-pip install --upgrade pandas zmq einops numpy==1.26.2 &&
+pip install --upgrade pandas pyzmq einops numpy==1.26.2 &&
 pip install --upgrade "pybind11>=3.0.1" &&
 pip install --upgrade "ninja>=1.11.1" &&
 pip install --upgrade setuptools_scm &&
@@ -372,7 +372,7 @@ jobs:
 bash -lc "
 pip uninstall -y amd-aiter aiter || true
 pip install -r requirements.txt
-pip install --upgrade pandas zmq einops numpy==1.26.2
+pip install --upgrade pandas pyzmq einops numpy==1.26.2
 pip install --upgrade 'pybind11>=3.0.1'
 pip install --upgrade 'ninja>=1.11.1'
 pip install tabulate
@@ -573,7 +573,7 @@ jobs:
 bash -lc "
 pip uninstall -y amd-aiter aiter || true
 pip install -r requirements.txt
-pip install --upgrade pandas zmq einops numpy==1.26.2
+pip install --upgrade pandas pyzmq einops numpy==1.26.2
 pip install --upgrade 'pybind11>=3.0.1'
 pip install --upgrade 'ninja>=1.11.1'
 pip install tabulate
2 changes: 1 addition & 1 deletion .github/workflows/vllm_benchmark.yaml
@@ -99,7 +99,7 @@ jobs:
 pip config set global.retries 10
 pip config set global.index-url https://ausartifactory.amd.com/artifactory/api/pypi/hw-cpe-prod-remote/simple
 pip install -r requirements.txt
-pip install --upgrade pandas zmq einops numpy==1.26.2
+pip install --upgrade pandas pyzmq einops numpy==1.26.2
 pip install --upgrade "pybind11>=3.0.1"
 pip install --upgrade "ninja>=1.11.1"
 pip install --upgrade "setuptools_scm[toml]>=6.2" wheel packaging psutil
1,483 changes: 1,483 additions & 0 deletions aiter/configs/model_configs/a8w8_blockscale_tuned_gemm_qwen3_next_80b_a3b.csv

Large diffs are not rendered by default.
