Merged
108 commits
498a03d
Fix gfx950 triton test failures: invalid JSON config and tight tolera…
azaidy Mar 17, 2026
792c6a5
Fix split-K GEMM producing wrong results for M < BLOCK_SIZE_M
azaidy Mar 17, 2026
7c3cbe4
docs: add Triton upgrade GEMM tuning spec and implementation plan
azaidy Mar 17, 2026
f4f836f
feat(tunning): add 7 new ut_*.py tuning scripts for basic GEMM kernels
azaidy Mar 17, 2026
dca879b
feat(tunning): add orchestration utilities for Triton upgrade pipeline
azaidy Mar 17, 2026
6212271
feat(tunning): add orchestrate.py top-level pipeline driver
azaidy Mar 17, 2026
dd2733c
fix(tunning): parallelize baseline/validation collection and iterate …
azaidy Mar 17, 2026
311574a
fix(tunning): correct kernel name patterns for a16w16 variants
azaidy Mar 17, 2026
d399158
fix(tunning): correct atomic kernel pattern, note agnostic is broken
azaidy Mar 17, 2026
2b2b1e8
fix(tunning): single run_tuning call per kernel with both num_stages
azaidy Mar 17, 2026
8c60cfb
fix(tunning): pass GPU ID directly to screen.py instead of HIP_VISIBL…
azaidy Mar 17, 2026
fe3ff84
perf(configs): add tuned gfx950 A8W8 GEMM configs for Triton 3.6
azaidy Mar 18, 2026
bbe27f1
perf(configs): update gfx950 A8W8 default config for Triton 3.6
azaidy Mar 19, 2026
bf887d1
docs: add tuning learnings and updated per-kernel procedure
azaidy Mar 19, 2026
0a16985
perf(configs): retune gfx950 A16W16 GEMM configs for Triton 3.6
azaidy Mar 20, 2026
e94af5e
docs+perf: update plan with BK=64 learning, commit manual tuning fixes
azaidy Mar 20, 2026
ee877aa
perf(configs): retune gfx950 A8W8_BLOCKSCALE GEMM configs for Triton 3.6
azaidy Mar 23, 2026
395cbd2
Remove redundant GEMM
azaidy Mar 23, 2026
3e13c65
Merge branch 'main' into alizaidy/gfx950-kernel-fixes-cherry-picked
azaidy Mar 23, 2026
55fd679
perf(configs): retune gfx950 AFP4WFP4 GEMM configs for Triton 3.6
azaidy Mar 24, 2026
1624e07
perf(configs): retune gfx950 A8W8_BLOCKSCALE_PRESHUFFLED GEMM configs…
azaidy Mar 24, 2026
8576155
fix(configs): clamp BLOCK_SIZE_M <= M in A8W8_BLOCKSCALE configs
azaidy Mar 24, 2026
dafc9a4
perf(configs): selective BM<=M clamp for 2 a16w16 shapes
azaidy Mar 24, 2026
2ad1315
docs: update tuning plan with all learnings from 3 kernels
azaidy Mar 24, 2026
2d84c01
perf(configs): retune M=8192 N=32768 K=512 blockscale config
azaidy Mar 25, 2026
a74566d
Merge remote-tracking branch 'origin/alizaidy/gfx950-kernel-fixes-che…
azaidy Mar 25, 2026
de3b012
Merge branch 'main' into alizaidy/gfx950-kernel-fixes-cherry-picked
azaidy Mar 25, 2026
1ac8445
perf(configs): manually tune preshuffle regression configs
azaidy Mar 25, 2026
2b3b47e
perf(configs): manually tune blockscale regression configs
azaidy Mar 25, 2026
6a05a64
perf(configs): fix last blockscale regressions with expanded search
azaidy Mar 25, 2026
416d5e4
perf(configs): retune gfx950 A8W8_PER_TOKEN_SCALE GEMM configs for Tr…
azaidy Mar 27, 2026
ae55c9b
perf(configs): manually tune M=128 N=8192 K=32768 per_token_scale
azaidy Mar 27, 2026
d5f14d9
docs: update tuning plan with split-K + stages and nonkdim learnings
azaidy Mar 27, 2026
c08b1a6
Rebase to main (#2496)
azaidy Mar 27, 2026
3f39a84
Merge branch 'alizaidy/gfx950-kernel-fixes-cherry-picked' of https://…
azaidy Mar 27, 2026
07b4f99
Merge branch 'main' into alizaidy/gfx950-kernel-fixes-cherry-picked
azaidy Mar 27, 2026
eb9fbf3
perf(configs): retune gfx950 A16W16-ATOMIC GEMM configs for Triton 3.6
azaidy Mar 27, 2026
505a916
docs: add early verification rule for long-running tasks
azaidy Mar 27, 2026
b1e5cab
perf(configs): retune gfx950 A8WFP4 GEMM configs for Triton 3.6
azaidy Mar 28, 2026
231329c
feat: add SCREEN_MAX_BATCH env var to screen.py for tuning large shapes
azaidy Mar 30, 2026
08c12a0
perf: tune preshuffled AFP4WFP4 GEMM configs for Triton 3.6
azaidy Mar 30, 2026
b2d920c
Remove AOT
azaidy Mar 30, 2026
256de0e
docs: add agentic kernel tuning pipeline design spec
azaidy Mar 30, 2026
0ff8e1e
docs: add Plan 1 (Infrastructure Layer) for agentic tuning pipeline
azaidy Mar 30, 2026
270e9b1
docs: expand spec to cover all GEMM categories (batched, fused, feed_…
azaidy Mar 30, 2026
b53408e
docs: fix fused/ff kernel note — they work on gfx950, just need new c…
azaidy Mar 30, 2026
02de034
docs: add Plans 2-4 for agentic kernel tuning pipeline
azaidy Mar 30, 2026
1d44035
feat(tuning-agent): add shared type definitions
azaidy Mar 30, 2026
0fe8c1a
feat(tuning-agent): add YAML config parsing with validation
azaidy Mar 30, 2026
d0277c7
fix(tuning-agent): fix docker_exec container_id assertion in test
azaidy Mar 30, 2026
2d34a04
feat(tuning-agent): add notification system with approval gates
azaidy Mar 30, 2026
1109a2f
feat(tuning-agent): add machine pool manager with allocation and heal…
azaidy Mar 30, 2026
65fbff9
feat(tuning-agent): add watchdog for timeout and progress monitoring
azaidy Mar 30, 2026
7da09fa
feat(tuning-agent): add artifact manager for results and checkpoints
azaidy Mar 30, 2026
9b983cc
test(tuning-agent): add integration tests for infrastructure layer
azaidy Mar 30, 2026
b09df78
feat(tuning-agent): add BaseSubagent ABC and SubagentResult types
azaidy Mar 30, 2026
d4d6621
feat(tuning-agent): add 6 skeleton subagent modules
azaidy Mar 30, 2026
4185000
feat(tuning-agent): add BaselineAgent with rocprof --stats parsing
azaidy Mar 30, 2026
11e7541
feat(tuning-agent): add TuningAgent with screen.py orchestration
azaidy Mar 30, 2026
c12981a
feat(tuning-agent): add RegressionFixerAgent with never-modify-fallba…
azaidy Mar 30, 2026
a2c8439
feat(tuning-agent): add subagent package exports
azaidy Mar 30, 2026
db07a51
feat(tuning-agent): add KernelSupervisor types and checkpoint logic
azaidy Mar 30, 2026
5df431c
feat(tuning-agent): add subagent dispatch, retry, and Triton switching
azaidy Mar 30, 2026
45935d5
feat(tuning-agent): add phase runners 0-4 (setup through tuning pipel…
azaidy Mar 30, 2026
c2866d9
feat(tuning-agent): add phases 5-6 and main run() loop with checkpoin…
azaidy Mar 30, 2026
2a273a0
feat(tuning-agent): export KernelSupervisor from package init
azaidy Mar 30, 2026
76dd57f
feat(tuning-agent): add kernel discovery across all GEMM categories
azaidy Mar 30, 2026
5caba30
feat(tuning-agent): add terminal dashboard with ANSI color output
azaidy Mar 30, 2026
9325f4f
feat(tuning-agent): add CLI entry point with --dry-run and auto repo …
azaidy Mar 30, 2026
9aece1b
feat(tuning-agent): add Orchestrator with machine scheduling and kern…
azaidy Mar 30, 2026
98b0209
feat(tuning-agent): implement SetupAgent _execute()
azaidy Mar 30, 2026
9d30f80
feat(tuning-agent): implement DiscoveryAgent _execute()
azaidy Mar 30, 2026
f89aa5c
feat(tuning-agent): implement PatternAnalyzerAgent with adaptive sear…
azaidy Mar 30, 2026
cb26692
feat(tuning-agent): implement ConfigGeneratorAgent with view-screen.py
azaidy Mar 31, 2026
da61e58
feat(tuning-agent): implement ValidationAgent with parallel rocprof c…
azaidy Mar 31, 2026
595297f
feat(tuning-agent): implement ScriptCreatorAgent with kernel source a…
azaidy Mar 31, 2026
06a3932
fix(tuning-agent): add results_dir param to Orchestrator, add dry-run…
azaidy Mar 31, 2026
fc68b9c
fix(tuning-agent): fix critical and important issues from code review
azaidy Mar 31, 2026
24fdd7b
fix(tuning-agent): fix test mocks for docker_exec artifact writes
azaidy Mar 31, 2026
b4e786c
fix(tuning-agent): fix remaining e2e blocking issues and test mocks
azaidy Mar 31, 2026
ba4cba3
fix(tuning-agent): fix critical integration wiring between supervisor…
azaidy Mar 31, 2026
c0b5aa5
fix(tuning-agent): fix 5 issues from review round 4 (SetupAgent prefl…
azaidy Mar 31, 2026
dd9d792
fix(tuning-agent): fix remaining issues from comprehensive review
azaidy Mar 31, 2026
3b129a3
fix(tuning-agent): use discovered ut_script instead of hardcoded ut_g…
azaidy Mar 31, 2026
0ce6628
fix(tuning-agent): fix log paths, threshold units, and geomean calcul…
azaidy Mar 31, 2026
bb624c1
fix(tuning-agent): enrich regression dicts with config_file and bucke…
azaidy Mar 31, 2026
fc8733e
docs(tuning-agent): add E2E testing guide for new agents
azaidy Mar 31, 2026
9734935
fix(tunning): add timeout and smarter error handling to screen.py
azaidy Mar 29, 2026
01e7046
perf(configs): retune gfx950 A16W16-gated GEMM configs for Triton 3.6
azaidy Apr 1, 2026
e3366ef
perf(configs): retune gfx950 A16W8_BLOCKSCALE GEMM configs for Triton…
azaidy Apr 2, 2026
dd047f4
docs: add gemm_a16wfp4 tuning design spec
azaidy Apr 2, 2026
09c005f
perf(configs): retune gfx950 AFP4WFP4 GEMM configs for Triton 3.6
azaidy Apr 2, 2026
b2d8351
Merge branch 'alizaidy/gfx950-kernel-fixes-cherry-picked' of https://…
azaidy Apr 2, 2026
33633b7
perf(configs): retune gfx950 AFP4WFP4_PRESHUFFLED GEMM configs for Tr…
azaidy Apr 2, 2026
fb2d639
perf(configs): retune gfx950 A8W8_BLOCKSCALE GEMM configs for Triton 3.6
azaidy Apr 3, 2026
771e9e1
perf(configs): retune gfx950 A16W16-ATOMIC GEMM configs for Triton 3.6
azaidy Apr 3, 2026
6caa85d
Manual tuning for AFP4WFP4-N=32768-K=512
azaidy Apr 3, 2026
194b8a2
Manually tune A16W16-N=128-K=2880
azaidy Apr 3, 2026
003c6d8
Add shapes info
azaidy Apr 3, 2026
495dcbd
Add shapes info
azaidy Apr 3, 2026
c8f15be
Remove unnecessary files
azaidy Apr 3, 2026
41cc69f
Remove unnecessary files
azaidy Apr 3, 2026
f57a89b
Revert tolerance change
azaidy Apr 3, 2026
b8c827f
perf(configs): revert config buckets to match main branch defaults fo…
azaidy Apr 6, 2026
b150fff
Merge branch 'main' into alizaidy/gfx950-kernel-fixes-cherry-picked
azaidy Apr 6, 2026
9fcd5d2
Fix lint
azaidy Apr 6, 2026
a042c3a
Remove unused file
azaidy Apr 6, 2026
7b10b72
Update CK to main
azaidy Apr 6, 2026
6 changes: 0 additions & 6 deletions aiter/ops/triton/configs/gemm/aot/README.md

This file was deleted.

Binary file not shown.

This file was deleted.
