[rocroller] Fix ASan build when using Ninja instead of Make #1111

Merged
jaopaulolc merged 3 commits into develop from users/jolabega/fix-asan-build on Aug 20, 2025

@jaopaulolc jaopaulolc commented Aug 7, 2025

Short story:

  • Fixes a typo in test/CMakeLists.txt that prevented the ASan build from being configured (cmake --preset asan).
  • Allows ASan builds with Ninja instead of Make (the default) by keeping the typeinfo symbol for rocRoller::Error from ending up undefined because of conflicting definitions in objects compiled without RTTI.

Long story:

After fixing the typo in test/CMakeLists.txt, the ASan build still failed when using Ninja instead of Make (the RR default) because of link errors: the typeinfo symbol for rocRoller::Error was undefined. The symbol was undefined because objects that are compiled with -fno-rtti and use Error.hpp carry a WEAK definition of it, and these objects (e.g. InProcessAssembler.cpp.o and FastDivision.cpp.o) were ordered first in the link command invocation.
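For readers wondering why a typeinfo symbol is involved at all, here is a minimal sketch. It is not rocRoller code: `demo::Error`, `fail`, and `caughtMessage` are hypothetical names used only for illustration. It shows the two places where an exception type's typeinfo is referenced, the `throw` site and the `catch` handler, which is why every object that throws or catches the error type needs that symbol to resolve at link time.

```cpp
#include <stdexcept>
#include <string>

namespace demo
{
    // Hypothetical stand-in for rocRoller::Error (the real class is declared in Error.hpp).
    struct Error : public std::runtime_error
    {
        using std::runtime_error::runtime_error;
    };
}

// The throw expression hands the C++ runtime a pointer to the typeinfo
// object for demo::Error, so this translation unit references that symbol.
void fail(std::string const& msg)
{
    throw demo::Error(msg);
}

// The catch clause is matched against the thrown object by comparing
// typeinfo, so the handler side references the same symbol as well.
std::string caughtMessage()
{
    try
    {
        fail("boom");
    }
    catch(demo::Error const& e)
    {
        return std::string("demo::Error: ") + e.what();
    }
    catch(...)
    {
        return "unknown exception";
    }
    return "nothing thrown";
}
```

Compiling a file like this with and without -fno-rtti and inspecting the resulting object with `nm -C` should show how the `typeinfo for demo::Error` symbol is emitted in each case. The exact output depends on the compiler and its version, so none is reproduced here.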

(It is not clear to me why CMake orders these objects differently for Make and Ninja, but my suspicion is that it happens because, when Make is used, the paths are wrapped in quotes and are therefore ordered after paths that start with [a-zA-Z].)

I verified that Ninja and Make execute exactly the same commands with the same flags, apart from the ordering mentioned above. I also verified that manually changing the order CMake produces for Ninja resolves the linking issues.

Since InProcessAssembler.cpp and FastDivision.cpp no longer use LLVM headers, they don't need to be compiled with RTTI disabled (335a3a7).

Once Comgr is used instead of LLVM in ReadELF.cpp, all the logic for building objects with RTTI disabled can be safely removed, and similar linking issues can be avoided.

@ROCmMathLibrariesBot

Generated Documentation

ROCmMathLibrariesBot commented Aug 7, 2025

Code Coverage Report for gfx942

Summary

| Type | Total | Missed | Master Missed | Missed Change | Coverage | Master Coverage | Coverage Change |
|---|---|---|---|---|---|---|---|
| Lines | 51434 | 7710 | 7710 | 0 | 85.01% | 85.01% | 0% |
| Functions | 4966 | 831 | 831 | 0 | 83.27% | 83.27% | 0% |
| Regions | 32317 | 7913 | 7913 | 0 | 75.51% | 75.51% | 0% |
| Branches | 18096 | 4820 | 4820 | 0 | 73.36% | 73.36% | 0% |

Artifacts

Commit Hashes

@ROCmMathLibrariesBot

CodeQL report

Results Summary

Full table of results
Tool Severity Code Location Line

Links

  • HTML
  • Sarif (for download and usage in conjunction with SARIF viewers)

ROCmMathLibrariesBot commented Aug 7, 2025

Performance Report for gfx950-perf

Results

@@            Significant (p-val <0.05) Performance Diffs            @@
====================================================================================================
+   0.44% | p=2.5635e-58 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: False, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 4, prefetchLDSFactor: 1, prefetchMixMemOps: True, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: True, prefetchScale: True, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, 
prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.16% | p=3.5139e-15 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: False, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 4, prefetchLDSFactor: 1, prefetchMixMemOps: True, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: True, prefetchScale: True, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, 
prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.06% | p=1.7418e-03 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: 0, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: False, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 4, prefetchLDSFactor: 1, prefetchMixMemOps: True, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: True, prefetchScale: True, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, 
prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.06% | p=5.5589e-03 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: 0, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: False, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 4, prefetchLDSFactor: 1, prefetchMixMemOps: True, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: True, prefetchScale: True, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, 
prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.24% | p=1.6058e-34 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': True, 'scaleShuffleTileA': [64, 4, 2], 'scaleShuffleTileB': [64, 4, 2]}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: 0, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: False, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 4, prefetchLDSFactor: 1, prefetchMixMemOps: True, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: True, prefetchScale: True, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': True, 'scaleShuffleTileA': [64, 4, 2], 'scaleShuffleTileB': [64, 4, 2]}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, 
prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.36% | p=1.2777e-45 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0.5, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 2, prefetchMixMemOps: False, loadLDSScale_A: True, loadLDSScale_B: True, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, 
prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.56% | p=1.5045e-134 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0.5, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 1, prefetchMixMemOps: False, loadLDSScale_A: True, loadLDSScale_B: True, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=1, 
prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.25% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: -1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.81% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.88% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.13% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.29% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.43% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.08% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.43% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.48% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.53% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.41% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.71% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.21% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 256, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.35% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.20% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.39% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.13% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.61% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 8448, N: 8448, K: 128, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   3.03% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: comments)| CodeGen() | CodeGen(instCount: 40000, instructions: comments)
+   1.77% | p=1.7451e-03 
	| CodeGen(instCount: 40000, instructions: simple_mi)| CodeGen() | CodeGen(instCount: 40000, instructions: simple_mi)
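
For readers unfamiliar with the `p=` column: the table header below suggests these are p-values from Mood's median test. The report does not show how the tool computes them, so the following is only a minimal sketch under stated assumptions: two independent samples of timings, values equal to the pooled median counted as "not above", and a Yates continuity correction on the resulting 2x2 table (the same defaults `scipy.stats.median_test` uses, as far as I can tell). The function name `moods_median_test` and the sample data are hypothetical.

```python
import math
from statistics import median


def moods_median_test(a, b):
    """Two-sample Mood's median test with Yates continuity correction.

    Returns the p-value for the null hypothesis that both samples
    come from populations with the same median.
    """
    grand = median(list(a) + list(b))

    # 2x2 contingency table: strictly above the pooled median vs. not above.
    above_a = sum(x > grand for x in a)
    above_b = sum(x > grand for x in b)
    below_a = len(a) - above_a
    below_b = len(b) - above_b

    n = len(a) + len(b)
    denom = (above_a + above_b) * (below_a + below_b) * len(a) * len(b)
    if denom == 0:
        return 1.0  # degenerate table: treat as no evidence either way

    # Yates-corrected chi-square statistic for a 2x2 table.
    diff = abs(above_a * below_b - above_b * below_a)
    chi2 = n * max(0.0, diff - n / 2) ** 2 / denom

    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    gen_a = [100.0 + i for i in range(10)]  # hypothetical timings (ns)
    gen_b = [120.0 + i for i in range(10)]  # every run slower than gen_a
    print(f"p = {moods_median_test(gen_a, gen_b):.4e}")
```

Under these assumptions, two fully separated 10-sample runs give roughly 5.7e-05, which is the smallest p-value in the list above. With so few samples the test can only produce a handful of distinct p-values, which would explain why the same values repeat across entries.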
Full table of results
Problem Median Diff % Mood's p-val Gen A (ns) Gen B (ns)
CodeGen(instCount: 40000, instructions: comments) -3.03% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: complex_mi_with_coop) -0.06% 1.0000e+00 0 0
CodeGen(instCount: 40000, instructions: simple_mi) -1.77% 1.7451e-03 0 0
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.12% 1.0000e+00 990,307,481 959,456,876
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.57% 1.0000e+00 996,656,516 963,955,733
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.00% 1.0000e+00 920,010,339 933,291,344
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.14% 6.5472e-01 980,431,985 962,186,733
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.08% 1.0000e+00 974,228,216 960,338,475
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.66% 1.0000e+00 923,396,184 914,901,454
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.60% 1.0000e+00 787,686,921 783,228,200
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 4,056,225,059 4,103,927,899
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.61% 1.0000e+00 793,785,677 783,021,509
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.45% 6.5472e-01 4,110,428,982 4,044,725,759
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.35% 6.5472e-01 1,093,050,174 1,093,100,778
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.77% 6.5472e-01 2,419,724,510 2,396,895,349
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.30% 1.0000e+00 965,531,960 955,102,600
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.85% 1.0000e+00 977,987,644 968,203,095
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 8.84% 6.5472e-01 941,598,714 919,409,545
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 6.68% 6.5472e-01 966,847,277 965,469,158
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.56% 1.0000e+00 1,013,564,688 961,184,484
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.87% 1.0000e+00 924,345,537 920,107,154
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.44% 2.5635e-58 28,816,145,273 29,516,126,848
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.16% 3.5139e-15 7,252,703,317 7,347,129,160
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.06% 1.7418e-03 51,030,122,402 51,652,580,199
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.06% 5.5589e-03 28,119,878,798 28,165,033,868
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': True, 'scaleShuffleTileA': [64, 4, 2], 'scaleShuffleTileB': [64, 4, 2]}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.24% 1.6058e-34 28,134,254,424 28,363,973,230
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': True, 'scaleShuffleTileA': [64, 4, 4], 'scaleShuffleTileB': [64, 4, 4]}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.00% 9.6433e-01 53,515,382,722 53,645,866,934
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.36% 1.2777e-45 5,002,024,574 5,191,736,972
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=1, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.56% 1.5045e-134 5,434,036,095 5,476,880,372
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=0, workgroupMappingValue=-1)GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=1, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.02% 3.7108e-01 26,345,029,350 26,585,651,640
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 559,915,667 561,507,092
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.28% 6.5472e-01 494,097,161 495,623,988
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.25% 2.5347e-02 580,114,534 582,735,759
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.98% 1.7971e-01 511,012,013 506,754,495
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.36% 1.7971e-01 540,358,699 536,243,382
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.08% 1.7971e-01 473,883,445 474,247,005
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.04% 1.0000e+00 560,017,696 563,826,253
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.03% 1.0000e+00 492,893,562 491,244,641
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.81% 2.5347e-02 563,937,560 557,557,765
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.10% 6.5472e-01 483,568,615 481,481,546
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.88% 2.5347e-02 563,472,787 569,380,387
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.13% 2.5347e-02 494,047,433 501,308,370
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.80% 6.5472e-01 539,307,146 542,010,994
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.34% 6.5472e-01 469,442,697 473,017,546
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.08% 1.0000e+00 553,589,001 555,756,426
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.07% 6.5472e-01 495,524,368 485,271,328
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.18% 6.5472e-01 554,599,965 566,560,939
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.82% 6.5472e-01 485,148,229 481,330,642
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.14% 6.5472e-01 563,981,660 561,611,791
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.69% 6.5472e-01 495,443,170 490,367,472
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.07% 6.5472e-01 536,791,878 536,407,771
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.47% 6.5472e-01 474,324,758 467,981,442
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.00% 1.0000e+00 549,092,045 548,215,964
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.23% 1.0000e+00 483,455,108 481,804,475
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.41% 1.7971e-01 561,941,453 564,067,748
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.62% 6.5472e-01 500,206,161 492,165,189
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 580,736,564 584,345,791
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.53% 6.5472e-01 514,684,893 511,398,364
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.28% 6.5472e-01 540,981,328 542,479,058
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.25% 6.5472e-01 480,052,823 471,061,370
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.49% 1.0000e+00 559,396,115 571,386,947
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.49% 6.5472e-01 496,833,647 493,383,248
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.61% 1.0000e+00 564,099,047 554,140,698
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.89% 1.7971e-01 483,714,933 484,179,260
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.51% 1.0000e+00 591,970,047 569,340,026
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.33% 1.0000e+00 497,681,563 492,796,557
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.29% 1.7451e-03 557,145,411 539,064,320
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.22% 6.5472e-01 479,812,237 471,098,938
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.15% 6.5472e-01 565,489,225 552,052,295
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.04% 1.0000e+00 484,903,574 511,269,511
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.32% 6.5472e-01 1,312,406,846 1,321,275,899
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 1,342,984,433 1,315,703,608
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.56% 6.5472e-01 594,210,421 594,426,709
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.53% 6.5472e-01 1,437,188,646 1,416,762,880
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.69% 6.5472e-01 1,328,488,530 1,309,831,679
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.62% 1.0000e+00 1,359,433,281 1,315,398,507
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.03% 1.0000e+00 589,035,778 590,982,890
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.21% 6.5472e-01 1,041,089,319 1,040,403,997
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.10% 1.0000e+00 2,031,798,371 1,989,697,005
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.49% 1.7971e-01 3,988,593,507 4,085,827,174
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 1,437,249,439 1,411,096,446
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.36% 6.5472e-01 1,426,283,212 1,416,957,722
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.28% 1.0000e+00 1,418,040,977 1,418,214,787
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.24% 1.0000e+00 1,331,970,232 1,328,708,542
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.93% 6.5472e-01 651,233,925 639,743,888
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.03% 6.5472e-01 548,845,971 541,225,463
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.17% 1.0000e+00 590,983,977 595,909,631
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.43% 1.7971e-01 1,090,524,492 1,083,270,394
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 2,011,192,743 2,008,954,645
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.00% 1.0000e+00 1,240,370,439 1,220,200,852
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.43% 2.5347e-02 2,367,375,155 2,399,889,862
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.10% 1.7971e-01 1,577,594,952 1,577,447,738
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.01% 1.0000e+00 1,070,284,649 1,049,105,857
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.49% 6.5472e-01 2,007,928,959 2,018,499,003
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 4,009,138,291 4,001,669,149
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.61% 6.5472e-01 1,397,140,321 1,365,433,871
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.29% 6.5472e-01 1,444,146,196 1,403,108,542
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.07% 1.0000e+00 1,428,971,896 1,415,944,642
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.47% 6.5472e-01 1,370,539,829 1,336,210,308
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.86% 6.5472e-01 2,465,134,024 2,387,600,166
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.25% 1.0000e+00 3,922,079,932 3,901,102,372
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.05% 1.0000e+00 667,360,078 663,876,042
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.08% 5.6994e-05 1,285,467,341 1,274,471,778
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.17% 6.5472e-01 764,091,631 727,907,957
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.36% 1.7971e-01 1,430,760,900 1,410,276,445
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.04% 6.5472e-01 880,697,638 880,880,215
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.43% 2.5347e-02 1,758,515,130 1,747,983,014
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.48% 2.5347e-02 684,495,431 680,701,280
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.53% 1.7451e-03 1,315,029,160 1,316,034,166
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.41% 1.7451e-03 752,812,397 759,470,020
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.20% 1.7971e-01 1,459,070,944 1,458,670,470
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.13% 1.7971e-01 897,299,980 902,261,552
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.11% 6.5472e-01 1,854,296,055 1,797,326,810
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.71% 1.7451e-03 1,035,318,221 1,027,648,977
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.63% 1.0000e+00 1,981,901,318 1,954,096,184
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.05% 1.0000e+00 1,190,689,379 1,218,113,453
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.21% 2.5347e-02 2,311,027,021 2,303,084,284
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.47% 1.7971e-01 1,571,995,127 1,573,683,665
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.27% 1.7971e-01 612,877,613 620,897,999
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.35% 1.7451e-03 1,108,491,499 1,081,640,399
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.20% 1.7451e-03 611,042,502 608,304,270
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.39% 5.6994e-05 1,153,077,816 1,146,495,680
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 672,277,815 679,434,248
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.13% 2.5347e-02 1,284,412,412 1,276,306,703
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 10.31% 6.5472e-01 656,678,886 638,532,479
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.61% 2.5347e-02 1,362,608,121 1,366,384,841
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.25% 1.7971e-01 1,362,140,548 1,363,833,741
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.37% 1.7971e-01 1,373,022,825 1,275,461,229

@ROCmMathLibrariesBot
Contributor

ROCmMathLibrariesBot commented Aug 8, 2025

Performance Report for gfx12

Results

@@            Significant (p-val <0.05) Performance Diffs            @@
====================================================================================================
+   0.95% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
+   0.35% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Sequential, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
+   0.45% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
-   0.94% | p=2.5347e-02 
	| CodeGen(instCount: 40000, instructions: comments)| CodeGen() | CodeGen(instCount: 40000, instructions: comments)
+   2.89% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: simple_mi)| CodeGen() | CodeGen(instCount: 40000, instructions: simple_mi)
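For readers who want to sanity-check the p-values in this report, below is a minimal sketch of a two-sample Mood's median test using only the Python standard library. It is not the bot's actual implementation, which is not shown here. It rests on several assumptions: the function name `moods_median_test` is hypothetical, a Yates continuity correction is applied, and values tied with the grand median are counted as "not above". The p-values that recur in the report (5.6994e-05, 1.7451e-03, 2.5347e-02, 1.7971e-01, 6.5472e-01, 1.0000e+00) are consistent with this procedure applied to 10 timings per side, but that sample size is an inference from the numbers, not something the report states.

```python
import math
from statistics import median


def moods_median_test(a, b):
    """Two-sample Mood's median test (hypothetical helper, not the bot's code).

    Builds a 2x2 table of counts above / not above the grand median and
    applies a chi-square test with 1 degree of freedom and Yates correction.
    Returns (chi2, p_value).
    """
    grand = median(list(a) + list(b))

    # Counts strictly above the grand median; ties fall into the "low" cells.
    a_hi = sum(x > grand for x in a)
    b_hi = sum(x > grand for x in b)
    a_lo = len(a) - a_hi
    b_lo = len(b) - b_hi

    n = len(a) + len(b)
    denom = (a_hi + b_hi) * (a_lo + b_lo) * len(a) * len(b)
    if denom == 0:
        # Degenerate table (e.g. all values equal): no evidence of a shift.
        return 0.0, 1.0

    # Yates-corrected chi-square for a 2x2 table, clamped at zero.
    diff = abs(a_hi * b_lo - a_lo * b_hi)
    corrected = max(0.0, diff - n / 2)
    chi2 = n * corrected ** 2 / denom

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up timings in ns: every "after" run is slower than every "before" run.
    before = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
    after = [110, 111, 112, 113, 114, 115, 116, 117, 118, 119]
    chi2, p = moods_median_test(before, after)
    print(f"chi2={chi2:.2f} p={p:.4e}")
```

Under these assumptions, two fully separated groups of 10 give the smallest p-value that can occur, which is why several unrelated rows in the report share exactly the same p-value: with a fixed number of runs the test can only produce a handful of distinct outcomes.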
Full table of results
Problem Median Diff % Mood's p-val Gen A (ns) Gen B (ns)
CodeGen(instCount: 40000, instructions: comments) 0.94% 2.5347e-02 0 0
CodeGen(instCount: 40000, instructions: complex_mi_with_coop) 3.20% 1.7971e-01 0 0
CodeGen(instCount: 40000, instructions: simple_mi) -2.89% 5.6994e-05 0 0
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.01% 1.0000e+00 553,729,688 535,884,229
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 6.5472e-01 560,522,909 564,405,079
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.00% 1.0000e+00 537,814,503 529,703,238
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -1.67% 6.5472e-01 561,865,760 550,199,652
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 1.66% 6.5472e-01 579,670,232 562,759,890
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.65% 6.5472e-01 544,655,355 568,645,113
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.04% 1.0000e+00 532,816,631 547,624,114
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.07% 1.0000e+00 543,221,095 518,814,055
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.09% 6.5472e-01 523,797,982 526,927,120
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.95% 5.6994e-05 564,540,761 541,609,901
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.03% 1.0000e+00 543,176,048 536,601,821
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.35% 2.5347e-02 544,084,975 528,690,388
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 537,927,102 536,235,289
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 558,970,619 561,211,817
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 540,205,970 530,614,441
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.04% 1.0000e+00 589,050,589 603,418,327
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 579,637,423 574,876,289
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.21% 6.5472e-01 564,123,484 584,218,813
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.11% 6.5472e-01 510,679,960 515,187,392
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 504,387,295 495,337,618
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 503,216,976 509,044,342
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 6.5472e-01 552,393,895 557,929,074
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.04% 6.5472e-01 555,400,459 538,412,890
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.22% 1.7971e-01 534,709,387 547,335,945
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.03% 6.5472e-01 544,204,333 541,185,317
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 566,173,283 558,635,514
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.23% 6.5472e-01 527,408,543 547,480,889
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 1.7971e-01 580,946,355 601,543,682
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 598,910,916 571,012,877
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 580,238,060 555,756,940
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.23% 6.5472e-01 517,923,646 492,977,712
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.08% 6.5472e-01 520,860,860 495,134,561
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.04% 6.5472e-01 495,100,675 504,643,140
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.08% 1.0000e+00 544,868,765 561,341,366
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.17% 6.5472e-01 554,386,099 545,433,350
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.00% 1.0000e+00 528,889,315 522,293,498
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 552,075,436 533,356,070
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 6.5472e-01 559,823,915 573,103,958
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.13% 6.5472e-01 536,599,341 546,428,803
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 578,853,071 579,152,949
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.24% 6.5472e-01 591,716,686 569,889,604
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 6.5472e-01 588,818,400 584,454,188
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 521,347,406 522,465,641
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.27% 6.5472e-01 518,234,793 501,957,624
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.14% 1.7971e-01 494,537,185 513,990,185
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 6.5472e-01 552,418,192 562,609,498
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.07% 6.5472e-01 553,546,816 535,952,412
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 531,371,676 548,784,799
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.04% 6.5472e-01 551,722,763 534,888,328
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.09% 1.0000e+00 559,296,440 562,323,271
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 530,304,094 518,602,429
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 6.5472e-01 586,353,664 568,732,585
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.06% 6.5472e-01 583,255,935 568,885,591
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.08% 1.7971e-01 573,344,761 584,342,780
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.09% 6.5472e-01 517,328,669 503,724,548
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.07% 1.0000e+00 520,876,744 491,165,901
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.11% 1.7971e-01 496,945,945 512,855,034
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.09% 6.5472e-01 540,393,529 563,549,695
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.08% 6.5472e-01 551,791,511 530,365,072
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.06% 1.0000e+00 539,627,689 522,768,704
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.16% 6.5472e-01 530,253,438 522,628,073
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 548,641,284 522,575,550
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 1.7971e-01 539,075,994 521,150,269
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -1.28% 6.5472e-01 552,402,488 543,924,189
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 1.68% 6.5472e-01 549,720,059 541,433,151
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 1.16% 6.5472e-01 536,331,782 526,592,590
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 6.5472e-01 520,332,437 530,333,267
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.05% 6.5472e-01 526,367,688 501,624,021
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.13% 1.7971e-01 515,641,791 515,025,065
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.53% 1.7971e-01 529,463,177 551,090,198
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 535,562,866 523,553,248
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.22% 6.5472e-01 519,466,135 538,819,642
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.34% 6.5472e-01 545,460,627 528,564,078
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.13% 1.0000e+00 554,296,555 533,711,993
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.06% 1.0000e+00 520,562,944 545,041,322
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.89% 1.0000e+00 563,513,813 546,060,755
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.88% 6.5472e-01 575,538,636 565,177,183
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -2.89% 6.5472e-01 537,752,132 555,377,048
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.24% 1.7971e-01 515,414,184 529,919,924
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.11% 6.5472e-01 524,905,566 511,375,306
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 513,722,188 521,274,322
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.45% 2.5347e-02 530,486,248 530,440,989
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.32% 1.7971e-01 551,981,687 524,108,164
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.30% 1.7971e-01 524,903,579 525,895,445
Links

@ROCmMathLibrariesBot
Contributor

ROCmMathLibrariesBot commented Aug 8, 2025

Performance Report for gfx942

Results

@@            Significant (p-val <0.05) Performance Diffs            @@
====================================================================================================
+   3.84% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 1024, N: 50304, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   7.23% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 1024, N: 50304, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Sequential, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   4.89% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 4, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   4.18% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 4, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.57% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.27% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   2.27% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.75% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.00% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.11% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.18% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.49% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   7.32% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   3.96% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 8448, N: 8448, K: 128, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A: 1, scaleValue_B: 1, workgroupMappingDim: -1, workgroupMappingValue: -1)| GEMM(mac_m: 128, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, 
prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-  14.46% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: comments)| CodeGen() | CodeGen(instCount: 40000, instructions: comments)
+   3.20% | p=2.5347e-02 
	| CodeGen(instCount: 40000, instructions: complex_mi_with_coop)| CodeGen() | CodeGen(instCount: 40000, instructions: complex_mi_with_coop)
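For readers wondering why the p-values in this report take only a handful of distinct values (5.6994e-05, 1.7451e-03, 2.5347e-02, 6.5472e-01, 1.0000e+00), the sketch below shows one way to compute a Mood's median test p-value using only the Python standard library. It is an illustration under assumptions that the report does not state: 10 timing samples per run, ties counted as "below" the grand median, and Yates' continuity correction on the 2x2 table. The function name `mood_median_p` and the sample data are hypothetical and are not taken from the rocRoller performance scripts.

```python
import math
from statistics import median


def mood_median_p(a, b):
    """Mood's median test p-value for two samples (illustrative sketch).

    Assumptions (not confirmed by the report): values equal to the grand
    median count as "below", and Yates' continuity correction is applied.
    """
    grand = median(list(a) + list(b))
    # 2x2 contingency table: rows are above/below the grand median,
    # columns are sample A / sample B.
    above = [sum(x > grand for x in a), sum(x > grand for x in b)]
    below = [len(a) - above[0], len(b) - above[1]]
    col_sums = [len(a), len(b)]
    total = len(a) + len(b)

    chi2 = 0.0
    for row in (above, below):
        row_sum = sum(row)
        if row_sum == 0:
            raise ValueError("all values fall on one side of the grand median")
        for observed, col_sum in zip(row, col_sums):
            expected = row_sum * col_sum / total
            # Yates' correction: shrink |observed - expected| by 0.5, floor at 0.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += diff * diff / expected

    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical timings: every sample of run B is slower than every sample of run A.
run_a = [100 + i for i in range(10)]
run_b = [200 + i for i in range(10)]
print(f"{mood_median_p(run_a, run_b):.4e}")
```

Under these assumptions, only the number of samples on each side of the grand median enters the statistic, so a run of 10 samples per side can yield only a few p-values. The magnitude of the timing difference does not affect the p-value.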
Full table of results
Problem Median Diff % Mood's p-val Gen A (ns) Gen B (ns)
CodeGen(instCount: 40000, instructions: comments) 14.46% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: complex_mi_with_coop) -3.20% 2.5347e-02 0 0
CodeGen(instCount: 40000, instructions: simple_mi) -5.82% 6.5472e-01 0 0
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.98% 6.5472e-01 1,711,387,243 1,912,149,198
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.66% 6.5472e-01 1,768,797,908 2,080,163,191
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.64% 6.5472e-01 1,659,724,118 1,687,510,271
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -3.84% 2.5347e-02 1,732,510,719 1,574,651,591
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.85% 6.5472e-01 1,769,205,971 1,818,593,348
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 7.23% 1.7451e-03 1,660,263,734 1,693,059,602
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -3.50% 6.5472e-01 1,581,484,505 1,394,553,897
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 7,476,779,650 7,078,521,075
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.10% 1.0000e+00 1,246,449,916 1,498,906,875
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.07% 6.5472e-01 6,273,508,789 7,564,085,681
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.06% 6.5472e-01 1,890,089,233 1,721,487,184
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.07% 1.0000e+00 3,864,056,090 3,757,396,477
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 3.00% 1.0000e+00 1,718,410,791 1,644,770,013
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.74% 1.0000e+00 1,809,229,454 2,044,563,321
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.23% 1.0000e+00 1,552,042,022 1,683,226,845
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.92% 1.0000e+00 1,754,990,610 1,630,392,803
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.46% 1.0000e+00 1,799,137,265 1,637,243,479
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.77% 1.0000e+00 1,671,599,041 1,708,282,423
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.73% 1.0000e+00 2,538,999,089 2,548,704,747
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 4.89% 1.7451e-03 2,669,345,273 2,345,895,345
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.40% 6.5472e-01 960,247,371 1,073,039,354
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 3.05% 6.5472e-01 3,052,133,071 2,545,326,455
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.15% 1.7971e-01 2,294,147,484 2,626,662,082
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.66% 6.5472e-01 2,171,377,744 2,524,997,214
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.96% 1.7971e-01 942,769,475 968,743,741
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.59% 6.5472e-01 1,831,798,227 1,987,600,094
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.26% 1.0000e+00 3,564,081,468 3,414,770,511
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.10% 6.5472e-01 6,774,628,640 6,331,214,080
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -7.34% 1.7971e-01 2,389,220,935 2,534,261,885
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 10.27% 1.7971e-01 2,648,450,626 2,644,232,261
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.13% 1.0000e+00 2,342,051,496 2,896,785,709
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -5.12% 6.5472e-01 2,574,120,982 2,545,059,070
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 4.70% 1.0000e+00 1,020,663,285 1,192,491,917
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.02% 1.7971e-01 970,267,988 998,421,318
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.11% 6.5472e-01 973,680,398 1,144,020,705
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.35% 1.7971e-01 1,778,764,689 1,775,068,884
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.03% 6.5472e-01 3,242,243,964 3,644,911,422
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.06% 1.0000e+00 2,142,609,375 2,141,757,467
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.47% 6.5472e-01 3,841,201,529 4,491,285,992
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.07% 1.0000e+00 2,523,193,096 3,091,937,043
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.52% 6.5472e-01 1,922,595,747 1,690,482,862
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.33% 6.5472e-01 3,134,719,205 4,049,401,097
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.49% 1.7971e-01 6,879,036,120 6,031,670,021
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.86% 6.5472e-01 2,489,115,006 2,251,042,130
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -6.93% 1.7971e-01 2,645,462,892 2,316,217,215
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.96% 1.7971e-01 2,320,897,184 2,615,023,446
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.18% 2.5347e-02 2,265,187,346 2,674,407,153
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.40% 6.5472e-01 3,761,456,719 5,320,291,977
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.47% 6.5472e-01 6,511,829,622 8,200,671,685
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.31% 1.7971e-01 1,140,645,254 1,157,613,951
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.20% 1.7971e-01 1,993,480,544 2,024,895,527
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.04% 1.0000e+00 1,145,422,880 1,182,395,433
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 4.63% 6.5472e-01 2,312,723,998 2,200,228,116
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.12% 6.5472e-01 1,577,470,158 1,458,038,733
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.57% 1.7451e-03 2,839,579,281 3,070,100,126
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.27% 2.5347e-02 1,198,989,196 1,110,520,106
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.27% 5.6994e-05 2,107,350,137 2,536,789,639
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.75% 2.5347e-02 1,163,212,843 1,180,957,804
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.00% 1.7451e-03 2,290,089,020 2,808,663,849
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.05% 1.0000e+00 1,492,196,768 1,421,492,346
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.11% 2.5347e-02 3,048,860,535 3,673,211,722
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.18% 2.5347e-02 1,825,502,331 1,723,076,189
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.06% 1.0000e+00 3,089,720,079 4,058,580,983
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 2,151,677,453 1,912,022,895
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.18% 6.5472e-01 3,633,729,375 4,389,103,006
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.10% 1.7971e-01 3,069,560,065 2,966,284,618
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.48% 1.7971e-01 1,072,148,917 918,131,430
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.49% 1.7451e-03 1,708,366,037 2,162,269,085
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.10% 1.0000e+00 1,100,829,363 1,002,559,303
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.06% 6.5472e-01 1,879,613,304 2,269,205,349
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -7.32% 2.5347e-02 1,232,689,923 1,107,446,928
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.30% 6.5472e-01 2,000,477,173 2,061,791,115
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 1,076,457,842 1,043,979,642
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.28% 6.5472e-01 2,422,176,861 2,520,097,040
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -3.96% 5.6994e-05 2,334,938,512 2,761,367,455
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False, 'scaleShuffleTileA': [], 'scaleShuffleTileB': []}, scaleValue_A=1, scaleValue_B=1, workgroupMappingDim=-1, workgroupMappingValue=-1)GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.34% 1.7971e-01 2,323,816,250 2,351,651,113

Contributor

@ThanHenderson ThanHenderson left a comment

Weird problem with ninja/make. LGTM. Thanks for patching the typo.

@jaopaulolc
Contributor Author

@ROCm/rocm-math-lib-build-infra please review once you have a chance.

@jaopaulolc jaopaulolc merged commit 47f2752 into develop Aug 20, 2025
16 checks passed
@jaopaulolc jaopaulolc deleted the users/jolabega/fix-asan-build branch August 20, 2025 21:53
assistant-librarian Bot pushed a commit to ROCm/rocRoller that referenced this pull request Aug 20, 2025
[rocroller] Fix ASan build when using Ninja instead of Make (#1111)

BrianHarrisonAMD pushed a commit that referenced this pull request Mar 25, 2026
…/sphinx (#5767)

Bumps [pyjwt](https://github.com/jpadilla/pyjwt) from 2.10.1 to 2.12.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jpadilla/pyjwt/releases">pyjwt's
releases</a>.</em></p>
<blockquote>
<h2>2.12.0</h2>
<h2>Security</h2>
<ul>
<li>Validate the crit (Critical) Header Parameter defined in RFC 7515
§4.1.11. by <a
href="https://github.com/dmbs335"><code>@​dmbs335</code></a> in <a
href="https://github.com/jpadilla/pyjwt/security/advisories/GHSA-752w-5fwx-jx9f">GHSA-752w-5fwx-jx9f</a></li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1132">jpadilla/pyjwt#1132</a></li>
<li>chore(docs): fix docs build by <a
href="https://github.com/tamird"><code>@​tamird</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1137">jpadilla/pyjwt#1137</a></li>
<li>Annotate PyJWKSet.keys for pyright by <a
href="https://github.com/tamird"><code>@​tamird</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1134">jpadilla/pyjwt#1134</a></li>
<li>fix: close HTTPError to prevent ResourceWarning on Python 3.14 by <a
href="https://github.com/veeceey"><code>@​veeceey</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1133">jpadilla/pyjwt#1133</a></li>
<li>chore: remove superfluous constants by <a
href="https://github.com/tamird"><code>@​tamird</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1136">jpadilla/pyjwt#1136</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1135">jpadilla/pyjwt#1135</a></li>
<li>chore(tests): enable mypy by <a
href="https://github.com/tamird"><code>@​tamird</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1138">jpadilla/pyjwt#1138</a></li>
<li>Bump actions/download-artifact from 7 to 8 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1142">jpadilla/pyjwt#1142</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1141">jpadilla/pyjwt#1141</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1145">jpadilla/pyjwt#1145</a></li>
<li>fix: do not store reference to algorithms dict on PyJWK by <a
href="https://github.com/akx"><code>@​akx</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1143">jpadilla/pyjwt#1143</a></li>
<li>Use PyJWK algorithm when encoding without explicit algorithm by <a
href="https://github.com/jpadilla"><code>@​jpadilla</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1148">jpadilla/pyjwt#1148</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/tamird"><code>@​tamird</code></a> made
their first contribution in <a
href="https://github.com/jpadilla/pyjwt/pull/1137">jpadilla/pyjwt#1137</a></li>
<li><a href="https://github.com/veeceey"><code>@​veeceey</code></a> made
their first contribution in <a
href="https://github.com/jpadilla/pyjwt/pull/1133">jpadilla/pyjwt#1133</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/jpadilla/pyjwt/compare/2.11.0...2.12.0">https://github.com/jpadilla/pyjwt/compare/2.11.0...2.12.0</a></p>
<h2>2.11.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Fixed type error in comment by <a
href="https://github.com/shuhaib-aot"><code>@​shuhaib-aot</code></a> in
<a
href="https://github.com/jpadilla/pyjwt/pull/1026">jpadilla/pyjwt#1026</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1018">jpadilla/pyjwt#1018</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1033">jpadilla/pyjwt#1033</a></li>
<li>Make note of use of leeway with nbf by <a
href="https://github.com/djw8605"><code>@​djw8605</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1034">jpadilla/pyjwt#1034</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1035">jpadilla/pyjwt#1035</a></li>
<li>Fixes <a
href="https://github.com/jpadilla/pyjwt/issues/964">#964</a>:
Validate key against allowed types for Algorithm family by <a
href="https://github.com/pachewise"><code>@​pachewise</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/985">jpadilla/pyjwt#985</a></li>
<li>Feat <a
href="https://github.com/jpadilla/pyjwt/issues/1024">#1024</a>:
Add iterator for PyJWKSet by <a
href="https://github.com/pachewise"><code>@​pachewise</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1041">jpadilla/pyjwt#1041</a></li>
<li>Fixes <a
href="https://github.com/jpadilla/pyjwt/issues/1039">#1039</a>:
Add iss, issuer type checks by <a
href="https://github.com/pachewise"><code>@​pachewise</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1040">jpadilla/pyjwt#1040</a></li>
<li>Fixes <a
href="https://github.com/jpadilla/pyjwt/issues/660">#660</a>:
Improve typing/logic for <code>options</code> in decode,
decode_complete; Improve docs by <a
href="https://github.com/pachewise"><code>@​pachewise</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1045">jpadilla/pyjwt#1045</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1042">jpadilla/pyjwt#1042</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1052">jpadilla/pyjwt#1052</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1053">jpadilla/pyjwt#1053</a></li>
<li>Fix <a
href="https://github.com/jpadilla/pyjwt/issues/1022">#1022</a>:
Map <code>algorithm=None</code> to &quot;none&quot; by <a
href="https://github.com/qqii"><code>@​qqii</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1056">jpadilla/pyjwt#1056</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1055">jpadilla/pyjwt#1055</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1058">jpadilla/pyjwt#1058</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1060">jpadilla/pyjwt#1060</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1061">jpadilla/pyjwt#1061</a></li>
<li>Fixes <a
href="https://github.com/jpadilla/pyjwt/issues/1047">#1047</a>:
Correct <code>PyJWKClient.get_signing_key_from_jwt</code> annotation by
<a href="https://github.com/khvn26"><code>@​khvn26</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1048">jpadilla/pyjwt#1048</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1062">jpadilla/pyjwt#1062</a></li>
<li>Fixed doc string typo in _validate_jti() function <a
href="https://github.com/jpadilla/pyjwt/issues/1063">#1063</a>
by <a
href="https://github.com/kuldeepkhatke"><code>@​kuldeepkhatke</code></a>
in <a
href="https://github.com/jpadilla/pyjwt/pull/1064">jpadilla/pyjwt#1064</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1065">jpadilla/pyjwt#1065</a></li>
<li>Update SECURITY.md by <a
href="https://github.com/auvipy"><code>@​auvipy</code></a> in <a
href="https://github.com/jpadilla/pyjwt/pull/1057">jpadilla/pyjwt#1057</a></li>
<li>Typing fix: use <code>float</code> instead of <code>int</code> for
<code>lifespan</code> and <code>timeout</code> by <a
href="https://github.com/nikitagashkov"><code>@​nikitagashkov</code></a>
in <a
href="https://github.com/jpadilla/pyjwt/pull/1068">jpadilla/pyjwt#1068</a></li>
<li>[pre-commit.ci] pre-commit autoupdate by <a
href="https://github.com/pre-commit-ci"><code>@​pre-commit-ci</code></a>[bot]
in <a
href="https://github.com/jpadilla/pyjwt/pull/1067">jpadilla/pyjwt#1067</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jpadilla/pyjwt/blob/master/CHANGELOG.rst">pyjwt's
changelog</a>.</em></p>
<blockquote>
<h2><code>v2.12.0
&lt;https://github.com/jpadilla/pyjwt/compare/2.11.0...2.12.0&gt;</code>__</h2>
<p>Fixed</p>
<pre><code>
- Annotate PyJWKSet.keys for pyright by @tamird in
`[#1134](jpadilla/pyjwt#1134)
&lt;https://github.com/jpadilla/pyjwt/pull/1134&gt;`__
- Close ``HTTPError`` response to prevent ``ResourceWarning`` on Python
3.14 by @veeceey in
`[#1133](jpadilla/pyjwt#1133)
&lt;https://github.com/jpadilla/pyjwt/pull/1133&gt;`__
- Do not keep ``algorithms`` dict in PyJWK instances by @akx in
`[#1143](jpadilla/pyjwt#1143)
&lt;https://github.com/jpadilla/pyjwt/pull/1143&gt;`__
- Validate the crit (Critical) Header Parameter defined in RFC 7515
§4.1.11. by @dmbs335 in `GHSA-752w-5fwx-jx9f
&lt;https://github.com/jpadilla/pyjwt/security/advisories/GHSA-752w-5fwx-jx9f&gt;`__
- Use PyJWK algorithm when encoding without explicit algorithm in
`[#1148](jpadilla/pyjwt#1148)
&lt;https://github.com/jpadilla/pyjwt/pull/1148&gt;`__
<p>Added
</code></pre></p>
<ul>
<li>Docs: Add <code>PyJWKClient</code> API reference and document the
two-tier caching system (JWK Set cache and signing key LRU cache).</li>
</ul>
<h2><code>v2.11.0
&lt;https://github.com/jpadilla/pyjwt/compare/2.10.1...2.11.0&gt;</code>__</h2>
<p>Fixed</p>
<pre><code>
- Enforce ECDSA curve validation per RFC 7518 Section 3.4.
- Fix build system warnings by @kurtmckee in
`[#1105](jpadilla/pyjwt#1105)
&lt;https://github.com/jpadilla/pyjwt/pull/1105&gt;`__
- Validate key against allowed types for Algorithm family in
`[#964](jpadilla/pyjwt#964)
&lt;https://github.com/jpadilla/pyjwt/pull/964&gt;`__
- Add iterator for JWKSet in
`[#1041](jpadilla/pyjwt#1041)
&lt;https://github.com/jpadilla/pyjwt/pull/1041&gt;`__
- Validate `iss` claim is a string during encoding and decoding by
@pachewise in `[#1040](jpadilla/pyjwt#1040)
&lt;https://github.com/jpadilla/pyjwt/pull/1040&gt;`__
- Improve typing/logic for `options` in decode, decode_complete by
@pachewise in `[#1045](jpadilla/pyjwt#1045)
&lt;https://github.com/jpadilla/pyjwt/pull/1045&gt;`__
- Declare float supported type for lifespan and timeout by
@nikitagashkov in
`[#1068](jpadilla/pyjwt#1068)
&lt;https://github.com/jpadilla/pyjwt/pull/1068&gt;`__
- Fix ``SyntaxWarning``\s/``DeprecationWarning``\s caused by invalid
escape sequences by @kurtmckee in
`[#1103](jpadilla/pyjwt#1103)
&lt;https://github.com/jpadilla/pyjwt/pull/1103&gt;`__
- Development: Build a shared wheel once to speed up test suite setup
times by @kurtmckee in
`[#1114](jpadilla/pyjwt#1114)
&lt;https://github.com/jpadilla/pyjwt/pull/1114&gt;`__
- Development: Test type annotations across all supported Python
versions,
increase the strictness of the type checking, and remove the mypy
pre-commit hook
by @kurtmckee in `[#1112](jpadilla/pyjwt#1112)
&lt;https://github.com/jpadilla/pyjwt/pull/1112&gt;`__
<p>Added
</code></pre></p>
<ul>
<li>Support Python 3.14, and test against PyPy 3.10 and 3.11 by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in
<code>[#1104](jpadilla/pyjwt#1104)
&lt;https://github.com/jpadilla/pyjwt/pull/1104&gt;</code>__</li>
<li>Development: Migrate to <code>build</code> to test package building
in CI by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in
<code>[#1108](jpadilla/pyjwt#1108)
&lt;https://github.com/jpadilla/pyjwt/pull/1108&gt;</code>__</li>
<li>Development: Improve coverage config and eliminate unused test suite
code by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in
<code>[#1115](jpadilla/pyjwt#1115)
&lt;https://github.com/jpadilla/pyjwt/pull/1115&gt;</code>__</li>
<li>Docs: Standardize CHANGELOG links to PRs by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in
<code>[#1110](jpadilla/pyjwt#1110)
&lt;https://github.com/jpadilla/pyjwt/pull/1110&gt;</code>__</li>
<li>Docs: Fix Read the Docs builds by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in
<code>[#1111](jpadilla/pyjwt#1111)
&lt;https://github.com/jpadilla/pyjwt/pull/1111&gt;</code>__</li>
<li>Docs: Add example of using leeway with nbf by <a
href="https://github.com/djw8605"><code>@​djw8605</code></a> in
<code>[#1034](jpadilla/pyjwt#1034)
&lt;https://github.com/jpadilla/pyjwt/pull/1034&gt;</code>__</li>
<li>Docs: Refactored docs with <code>autodoc</code>; added
<code>PyJWS</code> and <code>jwt.algorithms</code> docs by <a
href="https://github.com/pachewise"><code>@​pachewise</code></a> in
<code>[#1045](jpadilla/pyjwt#1045)
&lt;https://github.com/jpadilla/pyjwt/pull/1045&gt;</code>__</li>
<li>Docs: Documentation improvements for &quot;sub&quot; and
&quot;jti&quot; claims by <a
href="https://github.com/cleder"><code>@​cleder</code></a> in
<code>[#1088](jpadilla/pyjwt#1088)
&lt;https://github.com/jpadilla/pyjwt/pull/1088&gt;</code>__</li>
<li>Development: Add pyupgrade as a pre-commit hook by <a
href="https://github.com/kurtmckee"><code>@​kurtmckee</code></a> in
<code>[#1109](jpadilla/pyjwt#1109)
&lt;https://github.com/jpadilla/pyjwt/pull/1109&gt;</code>__</li>
<li>Add minimum key length validation for HMAC and RSA keys (CWE-326).
Warns by default via <code>InsecureKeyLengthWarning</code> when keys are
below</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/bd9700cca7f9258fadcc429c1034e508025931f2"><code>bd9700c</code></a>
Use PyJWK algorithm when encoding without explicit algorithm (<a
href="https://github.com/jpadilla/pyjwt/issues/1148">#1148</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/051ea341b5573fe3edcd53042f347929b92c2b92"><code>051ea34</code></a>
Merge commit from fork</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/1451d70eca2059bc472703692f0bb0777bc0fe93"><code>1451d70</code></a>
fix: do not store reference to algorithms dict on PyJWK (<a
href="https://github.com/jpadilla/pyjwt/issues/1143">#1143</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/f3ba74c106df9ce10e272dfaad96acb4ab3ef5a5"><code>f3ba74c</code></a>
[pre-commit.ci] pre-commit autoupdate (<a
href="https://github.com/jpadilla/pyjwt/issues/1145">#1145</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/0318ffa7b156b01600376e38952bf961382e0724"><code>0318ffa</code></a>
[pre-commit.ci] pre-commit autoupdate (<a
href="https://github.com/jpadilla/pyjwt/issues/1141">#1141</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/a52753db3c1075ac01337fa8b7cc92b13a19ac09"><code>a52753d</code></a>
Bump actions/download-artifact from 7 to 8 (<a
href="https://github.com/jpadilla/pyjwt/issues/1142">#1142</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/b85050f1d444c6828bb4618ee764443b0a3f5d18"><code>b85050f</code></a>
chore(tests): enable mypy (<a
href="https://github.com/jpadilla/pyjwt/issues/1138">#1138</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/1272b264779717cc481c8341f321a7fc8b3aaba6"><code>1272b26</code></a>
[pre-commit.ci] pre-commit autoupdate (<a
href="https://github.com/jpadilla/pyjwt/issues/1135">#1135</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/99a87287c26cb97c94399084ee4186ee52207a7f"><code>99a8728</code></a>
chore: remove superfluous constants (<a
href="https://github.com/jpadilla/pyjwt/issues/1136">#1136</a>)</li>
<li><a
href="https://github.com/jpadilla/pyjwt/commit/412cb67a93363812ae4029d6a95f5d4d40ab2609"><code>412cb67</code></a>
fix: close HTTPError to prevent ResourceWarning on Python 3.14 (<a
href="https://github.com/jpadilla/pyjwt/issues/1133">#1133</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/jpadilla/pyjwt/compare/2.10.1...2.12.0">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>