Merged
75 commits
98fbbb8
chip_info: add GFX_CU_NUM_MAP and get_build_targets()
eppaneamd Mar 27, 2026
9ba74fe
aiter/configs: migrate tuned GEMM CSVs to add gfx as first column
eppaneamd Mar 27, 2026
2f59ded
csrc: fix gen_instances.py to filter by (gfx, cu_num) build targets
eppaneamd Mar 27, 2026
b64d576
aiter/ops: add gfx to runtime GEMM dispatch lookup keys
eppaneamd Mar 27, 2026
20dffe5
aiter/utility: add gfx to GemmCommonTuner key and tune result output
eppaneamd Mar 27, 2026
ab6928b
csrc, gradlib: add gfx to all GEMM tuner output keys
eppaneamd Mar 27, 2026
4e87d24
op_tests: fix is_shape_tuned to filter by (gfx, cu_num)
eppaneamd Mar 29, 2026
77589e5
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 1, 2026
220da49
fix(configs): resolve model_configs merge conflicts and add gfx column
eppaneamd Apr 1, 2026
f4cee12
op_tests: add CSV input, output saving, and stable iter counts to a8w…
eppaneamd Apr 1, 2026
6a18cd6
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 7, 2026
c5ff46b
fix(merge): resolve conflict in gemm_op_a4w4.py after main sync
eppaneamd Apr 7, 2026
eabb402
fix(configs): add missing gfx column to dsv3 model_configs overrides
eppaneamd Apr 7, 2026
f9cde71
op_tests: add bpreshuffle-csv entry point and skip_ck flag to test_ge…
eppaneamd Apr 7, 2026
8adf08a
op_tests: add gfx filter unit tests and repro CSVs for both GEMM modules
eppaneamd Apr 7, 2026
dc3a66e
fix(ck_gemm): key C++ dispatch map by (cu_num,M,N,K) to prevent multi…
eppaneamd Apr 7, 2026
202b1e0
op_tests/configs/gemm_codegen_gfx_filter.csv
eppaneamd Apr 7, 2026
847094c
chip_info: split arch constants and env-only build targets into torch…
eppaneamd Apr 7, 2026
cdd506a
op_tests: fix repro CSV gfx942/304 kernels to be valid for M=1 and M=32
eppaneamd Apr 7, 2026
eb99b16
chip_info: use bare import for build_targets to fix build context Mod…
eppaneamd Apr 7, 2026
4065689
docs: add gfx column to tuning CSV examples and update cu_num descrip…
eppaneamd Apr 7, 2026
c4b61eb
lint: apply black formatting and fix ruff violations in modified files
eppaneamd Apr 7, 2026
f91e920
lint: fix black/ruff violations in csrc gen_instances and gradlib
eppaneamd Apr 7, 2026
6ead964
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 7, 2026
309b1d1
fix(gemm_op_a8w8): eliminate StopIteration risk and use AITER_CONFIGS…
eppaneamd Apr 8, 2026
f9ba652
fix(chip_info): guard kernelId/kernelName lookups with .get() to avoi…
eppaneamd Apr 8, 2026
76679e3
fix(base_tuner): add gfx legacy fallback to if branch of get_retune_g…
eppaneamd Apr 8, 2026
adb7669
docs(test_gemm_codegen): fix comment reference for GFX_CU_NUM_MAP loc…
eppaneamd Apr 8, 2026
2bbbce2
fix(gemm_dispatch_utils): check HIP return codes in get_device_cu_num()
eppaneamd Apr 8, 2026
88b6509
fix(chip_info): add get_gfx_runtime() and fix GPU_ARCHS=native in get…
eppaneamd Apr 8, 2026
27e3dfb
chore(configs): sync dsv3/kimik2 bf16 tuned gemm CSVs with main and a…
eppaneamd Apr 8, 2026
d1ea965
fix(op_tests): use get_gfx_runtime() in GEMM test files for correct a…
eppaneamd Apr 8, 2026
2818d33
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 8, 2026
5850f13
fix(core): self-heal CSV dedup without requiring a re-run
eppaneamd Apr 8, 2026
3a33ebd
fix(chip_info): add shape and arch context to kernelId/kernelName ski…
eppaneamd Apr 8, 2026
b48d4c4
fix(chip_info): use logger.warning instead of print for kernel skip w…
eppaneamd Apr 8, 2026
e4fe685
style(chip_info): fix E402 import order after logger initialization
eppaneamd Apr 8, 2026
dfb6987
fix(gemm_dispatch_utils): initialize device to -1 to clarify output-p…
eppaneamd Apr 9, 2026
26b24bd
test(test_gemm_codegen): fix Section 3 runtime dispatch tests to use …
eppaneamd Apr 9, 2026
c90aca7
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 9, 2026
9d51f55
fix(gemm_op_a8w8): remove duplicate get_gfx_runtime import
eppaneamd Apr 9, 2026
5320347
docs(chip_info): fix build_tune_dict docstring for kernels_by_name fa…
eppaneamd Apr 9, 2026
e1b8a38
fix(gemm): extend C++ dispatch key with gfx arch string — (cu_num,M,N…
eppaneamd Apr 9, 2026
380495f
style(chip_info, test_gemm_codegen): apply black/ruff formatting
eppaneamd Apr 9, 2026
588beed
feat: add PRETUNE_MODULES build flag to auto-tune GEMM shapes on live…
eppaneamd Apr 9, 2026
da178d6
feat(pretune): add run_tune_direct() and CLI for standalone retuning …
eppaneamd Apr 9, 2026
b9208cd
refactor(pretune): remove run_tune_direct wrapper, add input validati…
eppaneamd Apr 9, 2026
6c50025
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 9, 2026
2a44d97
fix(pretune): suppress ruff F841/E402 false positives on eval-scope v…
eppaneamd Apr 9, 2026
03103d9
refactor(pretune): extract _parse_module_list, fix silent skip of uns…
eppaneamd Apr 9, 2026
8012d1e
docs(pretune, setup, test_gemm_codegen): fix stale docstrings and add…
eppaneamd Apr 9, 2026
247bd12
fix(pretune): write tuned results to source CSV, not ephemeral /tmp; …
eppaneamd Apr 9, 2026
e272d4a
setup.py: import pretune directly to avoid premature aiter package init
eppaneamd Apr 10, 2026
4cdbc5c
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 10, 2026
f7f748d
pretune: add warmup API — check_tuning_coverage, warn_if_undertuned, …
eppaneamd Apr 10, 2026
c6b65b4
pretune: tune only missing model shapes in warmup(), not full CSV
eppaneamd Apr 10, 2026
a5ffeaa
fix(pretune): remove vLLM-specific env var hint from warmup() warning
eppaneamd Apr 10, 2026
7ac49a5
revert: remove warmup API from pretune.py
eppaneamd Apr 15, 2026
7311b49
Merge remote-tracking branch 'origin/main' into fix/gemm_codegen_gfx_…
eppaneamd Apr 15, 2026
4ab4ffd
fix(tuners): clear module-level CSV caches in _clear_op_caches
eppaneamd Apr 15, 2026
c0b819d
fix(build): add _parse_gpu_archs_env()
eppaneamd Apr 15, 2026
f0bc33f
fix(docs/tests): docstring accuracy, test coverage, and gfx-aware dedup
eppaneamd Apr 15, 2026
2434fb4
fix(tests): route aiter logger to stdout in test_pretune to fix warni…
eppaneamd Apr 15, 2026
41b642e
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 15, 2026
5c4e4a9
fix(gemm_dispatch_utils): cache cu_num and gfx per device ID via Sync…
eppaneamd Apr 15, 2026
06afd00
tuning: use get_gfx_runtime() in tuner imports so live GPU arch is us…
eppaneamd Apr 15, 2026
00ad5f7
fix(configs): add missing gfx column to bf16 model_configs CSVs intro…
eppaneamd Apr 15, 2026
f8c3316
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 15, 2026
5cd0539
raise error when having duplicate shape entries
yzhou103 Apr 16, 2026
5f9c982
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
yzhou103 Apr 16, 2026
916a114
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
gyohuangxin Apr 16, 2026
77391c7
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
gyohuangxin Apr 16, 2026
852f4e4
fix(configs): remove duplicate shape entries from a8w8_blockscale_bpr…
eppaneamd Apr 16, 2026
1382d80
resolve duplicated shapes
yzhou103 Apr 16, 2026
2bd5b8b
Merge branch 'main' into fix/gemm_codegen_gfx_build_targets
eppaneamd Apr 16, 2026
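Several commit titles above describe filtering tuned entries by (gfx, cu_num) build targets. The following is only a minimal sketch of that idea, not the code from this PR: the function name `filter_rows_by_build_targets` and the row/target shapes are assumptions made for illustration.

```python
def filter_rows_by_build_targets(rows, build_targets):
    """Keep only tuned rows whose (gfx, cu_num) pair is a build target.

    rows          : iterable of dicts with string values, as produced by
                    csv.DictReader (hypothetical input shape).
    build_targets : iterable of (gfx, cu_num) tuples, e.g. ("gfx942", 304).
    """
    targets = set(build_targets)
    # Matching on cu_num alone would also pick up rows tuned for a
    # different architecture that happens to share the same CU count.
    return [r for r in rows if (r["gfx"], int(r["cu_num"])) in targets]
```

Called with `[("gfx942", 304)]`, this keeps the gfx942/304 rows and drops rows for gfx950/256 or gfx942/80.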
2,942 changes: 1,471 additions & 1,471 deletions aiter/configs/a4w4_blockscale_tuned_gemm.csv

Large diffs are not rendered by default.

118 changes: 59 additions & 59 deletions aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions aiter/configs/a8w8_blockscale_tuned_gemm.csv
@@ -1,6 +1,6 @@
-cu_num,M,N,K,libtype,kernelId,splitK,us,kernelName,tflops,bw,errRatio
-256,8192,512,7168,ck,0,0,64.1614,a8w8_blockscale_1x128x128_256x128x128x128_16x16_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8_1x1_intrawave_v3,937.16,1103.14,0.0
-256,16384,512,7168,cktile,11,0,98.713,a8w8_blockscale_cktile_192x256x128_4x2x1_16x16x128_intrawave_0x1x0_1,1218.27,1396.85,0.0
-256,20480,512,7168,cktile,27,0,95.1492,a8w8_blockscale_cktile_192x256x128_4x2x1_16x16x128_intrawave_0x1x0_3,1579.88,1801.82,0.0
-256,128,1024,4096,ck,8,0,13.7599,a8w8_blockscale_1x128x128_256x16x64x256_16x16_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4_1x1_intrawave_v1,78.03,361.97,0.0
-256,128,4096,1280,ck,7,0,7.4194,a8w8_blockscale_1x128x128_256x16x128x256_16x16_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8_1x2_intrawave_v1,180.9,870.06,0.0
+gfx,cu_num,M,N,K,libtype,kernelId,splitK,us,kernelName,tflops,bw,errRatio
+gfx950,256,8192,512,7168,ck,0,0,64.1614,a8w8_blockscale_1x128x128_256x128x128x128_16x16_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8_1x1_intrawave_v3,937.16,1103.14,0.0
+gfx950,256,16384,512,7168,cktile,11,0,98.713,a8w8_blockscale_cktile_192x256x128_4x2x1_16x16x128_intrawave_0x1x0_1,1218.27,1396.85,0.0
+gfx950,256,20480,512,7168,cktile,27,0,95.1492,a8w8_blockscale_cktile_192x256x128_4x2x1_16x16x128_intrawave_0x1x0_3,1579.88,1801.82,0.0
+gfx950,256,128,1024,4096,ck,8,0,13.7599,a8w8_blockscale_1x128x128_256x16x64x256_16x16_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4_1x1_intrawave_v1,78.03,361.97,0.0
+gfx950,256,128,4096,1280,ck,7,0,7.4194,a8w8_blockscale_1x128x128_256x16x128x256_16x16_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8_1x2_intrawave_v1,180.9,870.06,0.0
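The hunk above shows the shape of the CSV migration: every row gains a leading `gfx` value and the header gains a leading `gfx` column. A minimal sketch of such a migration follows. It is not the script used in this PR; the helper name `add_gfx_column` is hypothetical, and the `CU_NUM_TO_GFX` mapping is an assumption inferred only from rows visible on this page.

```python
import csv
import io

# Assumed mapping, inferred only from the rows shown in this PR's diffs.
# A real migration would need an authoritative source for each file.
CU_NUM_TO_GFX = {"256": "gfx950", "304": "gfx942", "80": "gfx942"}


def add_gfx_column(legacy_csv_text, cu_num_to_gfx=CU_NUM_TO_GFX):
    """Return the CSV text with `gfx` prepended as the first column."""
    reader = csv.DictReader(io.StringIO(legacy_csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "gfx" in fieldnames:
        # Already migrated: return the input unchanged (idempotent).
        return legacy_csv_text

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["gfx"] + fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # An unknown cu_num raises KeyError rather than guessing an arch.
        row["gfx"] = cu_num_to_gfx[row["cu_num"]]
        writer.writerow(row)
    return out.getvalue()
```

Failing loudly on an unmapped `cu_num` is a deliberate choice in this sketch: a wrong `gfx` value would silently route shapes to kernels tuned for another architecture.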
1,102 changes: 551 additions & 551 deletions aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv

Large diffs are not rendered by default.

54 changes: 27 additions & 27 deletions aiter/configs/a8w8_tuned_batched_gemm.csv
@@ -1,27 +1,27 @@
-cu_num,B,M,N,K,kernelId,splitK,us,kernelName,tflops,bw,errRatio
-304,16,32,1280,8192,28,0,68.9821,a8w8_batched_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3,155.6551,5004.8295,0.0
-304,16,64,1280,8192,21,0,74.9374,a8w8_batched_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3,286.5703,4736.5264,0.0
-304,16,128,1280,8192,41,0,111.2581,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5,386.0364,3364.6236,0.0
-304,16,192,1280,8192,11,0,136.9273,a8w8_batched_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3,470.5016,2875.5426,0.0
-304,16,256,1280,8192,11,0,150.6582,a8w8_batched_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3,570.1604,2742.2267,0.0
-304,16,320,1280,8192,41,0,194.5238,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5,551.9848,2223.5716,0.0
-304,16,512,1280,8192,4,0,235.9793,a8w8_batched_rowwise_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3,728.0244,2079.5619,0.0
-304,16,1024,1280,8192,4,0,457.3867,a8w8_batched_rowwise_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3,751.2186,1412.2029,0.0
-304,16,2048,1280,8192,13,0,831.9798,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,825.9753,1149.4285,0.0
-304,16,4096,1280,8192,39,0,1490.3195,a8w8_batched_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3,922.2113,1058.2015,0.0
-304,16,8192,1280,8192,1,0,2894.8037,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,949.5563,973.6661,0.0
-304,16,16384,1280,8192,1,0,5696.639,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,965.0529,930.6541,0.0
-304,16,1,8192,1024,78,0,37.703,a8w8_batched_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2,7.1197,7127.5593,0.0
-304,16,32,8192,1024,62,0,46.8522,a8w8_batched_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2,183.3411,5930.8344,0.0
-304,16,64,8192,1024,47,0,56.4451,a8w8_batched_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,304.3642,5090.0756,0.0
-304,16,128,8192,1024,13,0,78.8949,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,435.5128,3880.9124,0.0
-304,16,192,8192,1024,39,0,113.2351,a8w8_batched_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3,455.1558,2870.6519,0.0
-304,16,256,8192,1024,13,0,127.391,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,539.4375,2699.8212,0.0
-304,16,320,8192,1024,13,0,172.9103,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,496.7856,2098.2399,0.0
-304,16,512,8192,1024,13,0,229.5169,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,598.8184,1827.4489,0.0
-304,16,1024,8192,1024,13,0,426.5342,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,644.4452,1337.3496,0.0
-304,16,2048,8192,1024,13,0,823.4174,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,667.6514,1059.5055,0.0
-304,16,4096,8192,1024,1,0,1583.6971,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,694.2689,932.2458,0.0
-304,16,8192,8192,1024,13,0,3131.9626,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,702.1231,857.0838,0.0
-304,16,16384,8192,1024,1,0,6094.2926,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,721.6665,836.8935,0.0
-80,16,1,1280,8192,78,0,86.7259,a8w8_batched_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2,3.869,3872.5159,0.0
+gfx,cu_num,B,M,N,K,kernelId,splitK,us,kernelName,tflops,bw,errRatio
+gfx942,304,16,32,1280,8192,28,0,68.9821,a8w8_batched_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3,155.6551,5004.8295,0.0
+gfx942,304,16,64,1280,8192,21,0,74.9374,a8w8_batched_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3,286.5703,4736.5264,0.0
+gfx942,304,16,128,1280,8192,41,0,111.2581,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5,386.0364,3364.6236,0.0
+gfx942,304,16,192,1280,8192,11,0,136.9273,a8w8_batched_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3,470.5016,2875.5426,0.0
+gfx942,304,16,256,1280,8192,11,0,150.6582,a8w8_batched_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3,570.1604,2742.2267,0.0
+gfx942,304,16,320,1280,8192,41,0,194.5238,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5,551.9848,2223.5716,0.0
+gfx942,304,16,512,1280,8192,4,0,235.9793,a8w8_batched_rowwise_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3,728.0244,2079.5619,0.0
+gfx942,304,16,1024,1280,8192,4,0,457.3867,a8w8_batched_rowwise_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3,751.2186,1412.2029,0.0
+gfx942,304,16,2048,1280,8192,13,0,831.9798,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,825.9753,1149.4285,0.0
+gfx942,304,16,4096,1280,8192,39,0,1490.3195,a8w8_batched_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3,922.2113,1058.2015,0.0
+gfx942,304,16,8192,1280,8192,1,0,2894.8037,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,949.5563,973.6661,0.0
+gfx942,304,16,16384,1280,8192,1,0,5696.639,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,965.0529,930.6541,0.0
+gfx942,304,16,1,8192,1024,78,0,37.703,a8w8_batched_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2,7.1197,7127.5593,0.0
+gfx942,304,16,32,8192,1024,62,0,46.8522,a8w8_batched_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2,183.3411,5930.8344,0.0
+gfx942,304,16,64,8192,1024,47,0,56.4451,a8w8_batched_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,304.3642,5090.0756,0.0
+gfx942,304,16,128,8192,1024,13,0,78.8949,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,435.5128,3880.9124,0.0
+gfx942,304,16,192,8192,1024,39,0,113.2351,a8w8_batched_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3,455.1558,2870.6519,0.0
+gfx942,304,16,256,8192,1024,13,0,127.391,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,539.4375,2699.8212,0.0
+gfx942,304,16,320,8192,1024,13,0,172.9103,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,496.7856,2098.2399,0.0
+gfx942,304,16,512,8192,1024,13,0,229.5169,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,598.8184,1827.4489,0.0
+gfx942,304,16,1024,8192,1024,13,0,426.5342,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,644.4452,1337.3496,0.0
+gfx942,304,16,2048,8192,1024,13,0,823.4174,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,667.6514,1059.5055,0.0
+gfx942,304,16,4096,8192,1024,1,0,1583.6971,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,694.2689,932.2458,0.0
+gfx942,304,16,8192,8192,1024,13,0,3131.9626,a8w8_batched_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,702.1231,857.0838,0.0
+gfx942,304,16,16384,8192,1024,1,0,6094.2926,a8w8_batched_rowwise_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3,721.6665,836.8935,0.0
+gfx942,80,16,1,1280,8192,78,0,86.7259,a8w8_batched_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2,3.869,3872.5159,0.0
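With the `gfx` column in place, a tuned-kernel lookup can be keyed on the architecture as well as on `cu_num` and the shape, and duplicate keys can be rejected at load time. The sketch below illustrates that idea for the batched CSV layout shown above. It is an assumption-laden illustration, not the loader in this repository: `build_lookup` and its `shape_cols` parameter are hypothetical names.

```python
import csv
import io


def build_lookup(csv_text, shape_cols=("B", "M", "N", "K")):
    """Map (gfx, cu_num, *shape) -> kernelName, rejecting duplicate keys.

    Column names follow the migrated CSV header; everything else here
    is a hypothetical helper for illustration.
    """
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["gfx"], int(row["cu_num"])) + tuple(
            int(row[col]) for col in shape_cols
        )
        if key in table:
            # Two tuned entries for one key would make dispatch ambiguous.
            raise ValueError(f"duplicate tuned entry for {key}")
        table[key] = row["kernelName"]
    return table
```

For a non-batched file the same helper could be called with `shape_cols=("M", "N", "K")`. A lookup then uses the full tuple, for example `table[("gfx942", 304, 16, 32, 1280, 8192)]`, so a row tuned for another architecture with the same CU count cannot match.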