[rocRoller] Delete debug code #922

Merged: jaopaulolc merged 3 commits into develop from users/chunxlin/unused-code on Jul 30, 2025
@amd-chunxlin
Contributor

Delete code that was used for debugging during implementation.

@ROCmMathLibrariesBot
Contributor

Generated Documentation

TorreZuk pushed a commit that referenced this pull request Jul 28, 2025
Dependency on hipblas-common-dev

[ROCm/hipBLAS commit: 32ff940]
@ROCmMathLibrariesBot
Contributor

CodeQL report

Results Summary

Full table of results
Tool Severity Code Location Line

Links

  • HTML
  • Sarif (for download and usage in conjunction with SARIF viewers)

@ROCmMathLibrariesBot
Contributor

ROCmMathLibrariesBot commented Jul 29, 2025

Code Coverage Report for gfx942

Summary

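| Type      | Total | Missed | Master Missed | Missed Change | Coverage | Master Coverage | Coverage Change |
|-----------|-------|--------|---------------|---------------|----------|-----------------|-----------------|
| Lines     | 50535 | 7445   | 7920          | -475          | 85.27%   | 84.52%          | 0.75%           |
| Functions | 4874  | 803    | 806           | -3            | 83.52%   | 83.46%          | 0.06%           |
| Regions   | 31700 | 7705   | 5144          | 2561          | 75.69%   | 81.53%          | -5.84%          |
| Branches  | 17765 | 4690   | 4345          | 345           | 73.60%   | 73.05%          | 0.55%           |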

This PR adds/edits 12 newly uncovered lines.
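
For readers checking the summary, the Coverage and Missed Change columns appear to follow from the raw counts as (Total - Missed) / Total and Missed - Master Missed. The sketch below recomputes them under that assumption. `CoverageRow` and `ROWS` are hypothetical names used only for illustration and are not part of the bot or of rocRoller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageRow:
    """One row of the summary table (hypothetical helper, not the bot's code)."""
    kind: str
    total: int
    missed: int
    master_missed: int

    @property
    def coverage_pct(self) -> float:
        # Assumed formula: share of covered items, as a percentage.
        return round(100.0 * (self.total - self.missed) / self.total, 2)

    @property
    def missed_change(self) -> int:
        # Negative means fewer missed items than on the master baseline.
        return self.missed - self.master_missed


# Counts copied from the gfx942 summary in this comment.
ROWS = [
    CoverageRow("Lines", 50535, 7445, 7920),
    CoverageRow("Functions", 4874, 803, 806),
    CoverageRow("Regions", 31700, 7705, 5144),
    CoverageRow("Branches", 17765, 4690, 4345),
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.kind:<10} {row.coverage_pct:6.2f}%  {row.missed_change:+d}")
```

Under these assumptions the recomputed values match the Coverage and Missed Change columns reported for all four rows.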

Artifacts

Commit Hashes

@ROCmMathLibrariesBot
Contributor

ROCmMathLibrariesBot commented Jul 29, 2025

Performance Report for gfx950-perf

Results

@@            Significant (p-val <0.05) Performance Diffs            @@
====================================================================================================
+   0.27% | p=3.1951e-05 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [0, 2])| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: False, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 4, prefetchLDSFactor: 1, prefetchMixMemOps: True, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: True, prefetchScale: True, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[0, 2])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.12% | p=8.1561e-25 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0.5, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 2, prefetchMixMemOps: False, loadLDSScale_A: True, loadLDSScale_B: True, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.40% | p=3.4279e-07 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 32768, alpha: 2, beta: 0.5, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [0, 2])| GEMM(mac_m: 256, mac_n: 256, mac_k: 128, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: True, direct2LDS_B: True, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 1, prefetchMixMemOps: False, loadLDSScale_A: True, loadLDSScale_B: True, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[0, 2])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=1, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.80% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: -1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.45% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   2.19% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   4.26% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.07% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.91% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: -1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.91% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: -1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   4.42% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 128, wave_m: 16, wave_n: 16, wave_k: 128, wave_b: -1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.47% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 4096, N: 4096, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 64, wave_b: 1, workgroup_size_x: 256, workgroup_size_y: 1, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.52% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.89% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, 
streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.59% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.53% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.65% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 64, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.12% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 64, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.16% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.19% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.74% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 128, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.24% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 128, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.79% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.58% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.14% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 256, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.87% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 256, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.11% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.59% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.29% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   0.09% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 32, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.85% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: False, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.54% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 256, streamKTwoTile: True, architecture: {'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, 
architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.59% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: comments)| CodeGen() | CodeGen(instCount: 40000, instructions: comments)
+   0.74% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: simple_mi)| CodeGen() | CodeGen(instCount: 40000, instructions: simple_mi)
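For readers unfamiliar with the statistics in this report, the following is a minimal sketch of how a median difference and a Mood's median test p-value could be computed from two sets of timing samples. It is an illustration only, not the bot's actual implementation. The function names `moods_median_test` and `median_diff_percent`, the use of a Yates continuity correction, the treatment of ties, and the sign convention of the percentage are all assumptions.

```python
import math
from statistics import median


def moods_median_test(a, b):
    """Two-sample Mood's median test (hypothetical helper).

    Assumes a Yates continuity correction and counts values equal to the
    grand median as "not above" it. Returns the p-value.
    """
    grand = median(list(a) + list(b))
    # 2x2 contingency table: samples above / not above the grand median.
    a_hi = sum(x > grand for x in a)
    b_hi = sum(x > grand for x in b)
    a_lo, b_lo = len(a) - a_hi, len(b) - b_hi
    n = len(a) + len(b)
    row_hi, row_lo = a_hi + b_hi, a_lo + b_lo
    if row_hi == 0 or row_lo == 0:
        return 1.0  # degenerate table: no evidence of a difference
    det = abs(a_hi * b_lo - a_lo * b_hi)
    det = max(0.0, det - n / 2)  # Yates continuity correction
    chi2 = n * det ** 2 / (row_hi * row_lo * len(a) * len(b))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def median_diff_percent(a, b):
    """Relative change of median(b) versus median(a), in percent (assumed sign)."""
    med_a, med_b = median(a), median(b)
    return 100.0 * (med_b - med_a) / med_a


# Made-up timing samples (ns) for two runs; not taken from the report.
run_a = [100, 101, 102, 103]
run_b = [200, 201, 202, 203]
p_value = moods_median_test(run_a, run_b)
print(f"{median_diff_percent(run_a, run_b):+.2f}% | p={p_value:.4e}")
print("significant" if p_value < 0.05 else "not significant")
```

Under these assumptions, a row would be listed as significant when its p-value falls below the 0.05 threshold named in the report header.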
Full table of results
Problem Median Diff % Mood's p-val Gen A (ns) Gen B (ns)
CodeGen(instCount: 40000, instructions: comments) -0.59% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: complex_mi_with_coop) -0.09% 6.5472e-01 0 0
CodeGen(instCount: 40000, instructions: simple_mi) -0.74% 5.6994e-05 0 0
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.48% 1.0000e+00 947,622,460 950,353,897
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.30% 6.5472e-01 947,874,844 944,844,013
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.75% 1.0000e+00 906,411,357 905,068,293
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.38% 1.0000e+00 950,602,092 948,191,194
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.51% 1.0000e+00 949,574,108 950,115,985
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.96% 1.0000e+00 912,491,592 905,957,458
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.79% 6.5472e-01 774,369,322 769,975,969
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.16% 1.0000e+00 4,004,139,138 4,001,774,426
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.86% 1.0000e+00 770,040,974 767,479,402
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.61% 6.5472e-01 4,002,599,501 4,007,981,569
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 1,051,945,806 1,058,264,403
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.47% 6.5472e-01 2,319,333,368 2,325,094,989
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.86% 1.0000e+00 951,584,935 951,355,553
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -9.41% 1.7971e-01 954,172,699 952,392,557
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.03% 6.5472e-01 910,201,769 909,174,427
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 13.05% 1.7971e-01 949,061,470 948,428,767
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.21% 1.0000e+00 951,846,971 949,086,498
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.86% 6.5472e-01 910,901,651 911,330,533
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.04% 9.2873e-01 28,206,421,526 28,020,124,627
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.12% 2.2725e-01 7,124,256,282 7,122,930,707
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[0, 2])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.12% 5.9150e-01 47,735,958,610 47,750,515,732
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[0, 2])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=4, prefetchLDSFactor=1, prefetchMixMemOps=True, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=True, prefetchScale=True, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.27% 3.1951e-05 25,812,142,993 25,847,209,989
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.12% 8.1561e-25 4,920,242,910 4,894,868,284
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=1, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.07% 5.9150e-01 5,245,918,373 5,240,662,706
GEMMProblem(M=4096, N=4096, K=32768, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'Separate', 'scaleType_A': 'E8M0', 'scale_B': 'Separate', 'scaleType_B': 'E8M0', 'scaleBlockSize': 32, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[0, 2])GEMMSolution(mac_m=256, mac_n=256, mac_k=128, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=True, direct2LDS_B=True, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=1, prefetchMixMemOps=False, loadLDSScale_A=True, loadLDSScale_B=True, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.40% 3.4279e-07 24,129,224,870 24,141,200,596
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.45% 1.0000e+00 557,245,864 552,658,886
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.50% 6.5472e-01 485,753,262 482,797,473
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.56% 1.0000e+00 579,212,425 578,075,841
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.18% 1.0000e+00 501,885,084 500,507,829
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.57% 1.0000e+00 535,995,448 532,047,873
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.80% 2.5347e-02 467,133,573 468,673,665
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.69% 6.5472e-01 555,812,238 552,424,199
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf6', 'type_B': 'bf6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.95% 1.7971e-01 484,554,320 485,387,266
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.45% 1.7451e-03 548,320,396 543,444,336
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.19% 1.7451e-03 477,555,926 474,232,338
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.15% 1.0000e+00 561,776,487 562,424,715
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 4.26% 2.5347e-02 487,382,231 488,064,678
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.07% 5.6994e-05 535,298,485 531,188,044
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.82% 1.0000e+00 463,433,565 466,026,985
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.67% 6.5472e-01 548,658,008 549,130,935
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.11% 1.0000e+00 476,260,711 477,485,989
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.32% 1.0000e+00 541,301,114 538,509,716
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.20% 1.0000e+00 476,149,601 474,973,590
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.23% 1.7971e-01 559,426,004 556,113,565
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.24% 6.5472e-01 485,441,714 486,229,048
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.91% 2.5347e-02 529,699,417 530,353,741
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.92% 1.7971e-01 466,793,198 462,880,525
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.01% 6.5472e-01 541,538,943 544,244,996
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp4', 'type_B': 'fp4', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 6.98% 1.0000e+00 474,806,673 472,299,575
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.91% 2.5347e-02 557,156,733 555,976,479
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.41% 1.0000e+00 486,548,983 487,225,544
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.83% 6.5472e-01 577,456,825 575,625,750
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.51% 6.5472e-01 504,242,882 502,577,207
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.21% 6.5472e-01 534,657,269 530,969,890
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.73% 6.5472e-01 471,845,218 470,370,610
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.35% 6.5472e-01 557,998,280 551,806,723
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp6', 'type_B': 'fp6', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=-1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.42% 2.5347e-02 487,303,814 482,020,540
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.87% 6.5472e-01 544,259,285 544,951,672
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.22% 6.5472e-01 477,039,481 473,176,093
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.47% 1.7451e-03 562,124,186 561,577,917
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.53% 1.7971e-01 490,925,103 490,828,584
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.13% 1.0000e+00 536,139,125 532,314,160
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.18% 6.5472e-01 467,613,224 467,190,099
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=64, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.10% 6.5472e-01 549,594,897 543,672,478
GEMMProblem(M=4096, N=4096, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=128, wave_m=16, wave_n=16, wave_k=128, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.94% 1.7971e-01 476,516,824 474,084,640
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 1,303,252,922 1,291,750,732
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.64% 1.0000e+00 1,299,686,309 1,307,277,943
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.52% 2.5347e-02 582,945,012 580,585,167
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.39% 6.5472e-01 1,390,551,647 1,389,085,390
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.97% 6.5472e-01 1,299,878,319 1,301,485,085
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.99% 6.5472e-01 1,305,136,032 1,295,462,343
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.24% 1.7971e-01 582,652,320 584,404,709
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.23% 1.0000e+00 1,032,561,104 1,030,641,786
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.21% 1.7971e-01 1,936,414,539 1,933,157,911
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 3,815,603,236 3,792,812,814
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.44% 1.0000e+00 1,400,309,825 1,395,151,702
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.62% 1.0000e+00 1,511,836,022 1,395,451,064
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.58% 1.0000e+00 1,407,969,928 1,399,441,303
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.21% 1.0000e+00 1,302,205,109 1,308,797,837
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.40% 1.0000e+00 631,744,655 631,286,577
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 535,822,569 533,055,127
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.64% 1.7971e-01 586,328,277 581,978,916
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.89% 5.6994e-05 1,052,412,923 1,053,522,223
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.18% 1.7971e-01 1,944,259,583 1,942,714,679
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.20% 6.5472e-01 1,193,748,999 1,198,511,221
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.11% 6.5472e-01 2,289,343,922 2,282,230,436
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.53% 1.7971e-01 1,540,230,915 1,543,529,236
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.70% 6.5472e-01 1,033,060,468 1,032,924,281
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.08% 1.0000e+00 1,944,421,124 1,933,230,602
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.23% 6.5472e-01 3,838,389,968 3,842,114,001
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.16% 1.0000e+00 1,354,305,397 1,350,572,344
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.87% 1.0000e+00 1,401,971,839 1,398,212,680
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.11% 1.0000e+00 1,401,162,218 1,397,851,316
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.24% 1.0000e+00 1,323,081,126 1,324,171,263
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.69% 6.5472e-01 2,348,396,199 2,347,538,308
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.51% 1.0000e+00 3,811,273,688 3,839,880,002
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.59% 1.7451e-03 637,054,153 635,096,505
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.53% 5.6994e-05 1,203,713,784 1,209,627,842
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.65% 5.6994e-05 702,768,116 699,294,942
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.12% 5.6994e-05 1,343,924,491 1,344,676,582
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.26% 1.7971e-01 846,900,629 844,098,143
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.24% 6.5472e-01 1,683,191,628 1,688,907,399
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.16% 1.7451e-03 659,227,461 655,758,619
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.12% 1.0000e+00 1,246,792,363 1,252,745,926
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.19% 5.6994e-05 719,689,025 714,513,724
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.74% 5.6994e-05 1,380,166,757 1,391,136,509
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.37% 1.7971e-01 862,499,668 863,464,329
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.24% 2.5347e-02 1,733,865,036 1,730,771,092
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.79% 5.6994e-05 997,105,360 1,002,676,511
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.58% 5.6994e-05 1,856,757,090 1,869,781,310
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.14% 2.5347e-02 1,151,387,411 1,142,459,770
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.10% 6.5472e-01 2,221,479,659 2,212,180,270
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.87% 1.7451e-03 1,543,617,605 1,541,488,617
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.11% 5.6994e-05 547,142,249 549,842,014
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.59% 5.6994e-05 1,015,647,889 1,012,714,826
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.29% 1.7451e-03 574,745,067 576,061,314
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.09% 2.5347e-02 1,066,386,511 1,066,634,413
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.85% 5.6994e-05 635,550,581 637,110,744
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=256, streamKTwoTile=True, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.54% 5.6994e-05 1,202,814,220 1,198,007,879
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.27% 1.0000e+00 634,391,651 631,696,729
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.17% 6.5472e-01 1,352,991,978 1,349,207,191
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.34% 1.0000e+00 1,349,920,431 1,346,658,642
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx950', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.19% 1.0000e+00 1,255,674,216 1,249,383,339
Links
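
The p-values in the table above take only a few distinct values (5.6994e-05, 1.7451e-03, 2.5347e-02, 6.5472e-01, 1.0000e+00). That pattern is what a two-sample Mood's median test with Yates continuity correction produces when each side contributes 10 timing samples. Both the sample count and the use of the correction are inferred from the printed numbers, not stated in the report, and `mood_p_value` is a hypothetical helper name. A minimal sketch under those assumptions:

```python
import math


def mood_p_value(above_a: int, n_a: int, above_b: int, n_b: int) -> float:
    """p-value of a two-sample Mood's median test (assumed setup, see lead-in).

    above_a / above_b: number of samples in each group that lie above the
    grand median of the pooled samples. n_a / n_b: group sizes.
    Uses the chi-square statistic of the 2x2 table with Yates correction.
    """
    a, b = above_a, above_b
    c, d = n_a - above_a, n_b - above_b
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table: no evidence of a difference in medians.
        return 1.0
    # Yates continuity correction, clamped so it never goes negative.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff * diff / denom
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Splits of 10 samples per side: (above in A, above in B)
    for split in [(10, 0), (9, 1), (8, 2), (6, 4), (5, 5)]:
        p = mood_p_value(split[0], 10, split[1], 10)
        print(split, f"{p:.4e}")
```

With 10 samples per side, a 10/0 split gives approximately 5.6994e-05, 9/1 gives approximately 1.7451e-03, 8/2 approximately 2.5347e-02, 6/4 approximately 6.5472e-01, and 5/5 gives 1.0. Because the test only counts samples above or below the pooled median, a row can show a sizeable median diff and still report p = 1.0.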

@ROCmMathLibrariesBot
Contributor

ROCmMathLibrariesBot commented Jul 29, 2025

Performance Report for gfx12

Results

@@            Significant (p-val <0.05) Performance Diffs            @@
====================================================================================================
+   0.29% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
+   0.16% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Sequential, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
+   0.16% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Sequential, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
+   0.26% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8192, alpha: 2, beta: 0.5, types: {'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 16, wave_n: 16, wave_k: 16, wave_b: 1, workgroup_size_x: 64, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='')
-   4.82% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: comments)| CodeGen() | CodeGen(instCount: 40000, instructions: comments)
-   5.42% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: complex_mi_with_coop)| CodeGen() | CodeGen(instCount: 40000, instructions: complex_mi_with_coop)
-   3.98% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: simple_mi)| CodeGen() | CodeGen(instCount: 40000, instructions: simple_mi)
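
The block above lists only the entries whose p-value is below 0.05, while the full table carries every row. A minimal sketch of recovering that subset from the flattened table rows, assuming each row ends with four whitespace-separated fields (median diff %, p-value, Gen A, Gen B). `ResultRow`, `parse_row`, and `significant` are hypothetical names for illustration, not part of the bot's tooling:

```python
from typing import Iterable, List, NamedTuple


class ResultRow(NamedTuple):
    problem: str            # everything before the four numeric fields
    median_diff_pct: float  # e.g. -0.59 for "-0.59%"
    p_value: float
    gen_a_ns: int
    gen_b_ns: int


def parse_row(line: str) -> ResultRow:
    """Split one flattened table row from the right into its five columns."""
    problem, diff, pval, gen_a, gen_b = line.strip().rsplit(None, 4)
    return ResultRow(
        problem=problem,
        median_diff_pct=float(diff.rstrip("%")),
        p_value=float(pval),
        gen_a_ns=int(gen_a.replace(",", "")),
        gen_b_ns=int(gen_b.replace(",", "")),
    )


def significant(rows: Iterable[ResultRow], alpha: float = 0.05) -> List[ResultRow]:
    """Keep rows whose p-value is strictly below alpha."""
    return [r for r in rows if r.p_value < alpha]


if __name__ == "__main__":
    sample = [
        "CodeGen(instCount: 40000, instructions: comments) 4.82% 5.6994e-05 0 0",
        "GEMMProblem(M=1)GEMMSolution(mac_m=64) 0.12% 1.0000e+00 554,768,251 541,136,708",
    ]
    for row in significant(parse_row(s) for s in sample):
        print(row.problem, row.median_diff_pct, row.p_value)
```

Splitting from the right keeps the problem description intact even though it contains spaces and commas of its own.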
Full table of results
Problem Median Diff % Mood's p-val Gen A (ns) Gen B (ns)
CodeGen(instCount: 40000, instructions: comments) 4.82% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: complex_mi_with_coop) 5.42% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: simple_mi) 3.98% 5.6994e-05 0 0
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.00% 1.0000e+00 552,471,321 550,017,683
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.12% 1.0000e+00 554,768,251 541,136,708
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 1.0000e+00 534,612,912 526,063,384
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.27% 1.0000e+00 563,254,008 557,158,187
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.65% 6.5472e-01 563,725,696 555,778,573
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -2.16% 6.5472e-01 536,222,810 558,543,777
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.16% 6.5472e-01 530,956,223 528,434,390
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 527,834,293 525,694,307
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.00% 1.0000e+00 498,817,389 508,630,841
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 535,225,707 534,735,955
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.07% 6.5472e-01 551,933,918 538,725,432
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf16', 'type_B': 'bf16', 'type_C': 'bf16', 'type_D': 'bf16', 'type_acc': 'bf16', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 515,597,300 526,261,163
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.01% 1.0000e+00 550,422,964 551,729,495
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.07% 6.5472e-01 557,702,848 538,724,274
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.04% 6.5472e-01 510,963,818 528,995,734
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.11% 1.7971e-01 573,522,492 583,736,755
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.07% 1.0000e+00 592,751,014 574,335,941
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.09% 1.7971e-01 554,692,813 576,694,006
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.03% 1.0000e+00 497,953,852 504,307,756
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.12% 6.5472e-01 514,829,611 509,175,748
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.14% 6.5472e-01 483,157,897 494,967,367
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.29% 2.5347e-02 542,715,059 533,956,318
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.20% 6.5472e-01 528,831,493 541,553,277
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.16% 2.5347e-02 518,663,423 534,187,588
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 1.0000e+00 533,963,967 530,686,531
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.19% 6.5472e-01 551,058,941 535,588,295
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.07% 1.7971e-01 519,328,071 531,696,025
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.03% 6.5472e-01 586,520,711 589,882,494
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 577,303,545 584,270,304
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.16% 2.5347e-02 556,115,937 554,515,516
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.29% 1.7971e-01 493,125,754 500,032,941
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.22% 1.7971e-01 505,678,741 490,292,164
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.11% 1.7971e-01 480,239,789 506,632,995
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.26% 2.5347e-02 529,412,829 550,923,355
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.06% 1.7971e-01 543,897,034 535,533,846
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'bf8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.18% 1.7971e-01 524,587,051 543,410,326
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.13% 1.7971e-01 551,952,238 541,059,302
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 554,506,475 538,743,543
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.16% 1.7971e-01 522,003,933 527,015,226
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 585,526,876 586,372,107
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 575,191,385 588,229,352
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.14% 6.5472e-01 553,049,164 562,205,453
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.04% 6.5472e-01 488,840,746 505,775,669
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.12% 6.5472e-01 508,525,890 506,613,925
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.13% 6.5472e-01 484,356,892 491,560,607
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 530,296,346 537,804,476
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 527,012,029 533,765,116
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'bf8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.15% 1.7971e-01 518,465,423 534,336,804
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 6.5472e-01 556,230,200 545,617,316
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.01% 1.0000e+00 541,902,855 542,238,246
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 1.7971e-01 526,702,426 525,563,756
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.09% 6.5472e-01 573,898,482 583,731,507
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.08% 1.0000e+00 572,275,696 571,869,925
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.05% 6.5472e-01 554,850,332 563,268,343
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.08% 1.0000e+00 496,010,521 503,266,478
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.11% 6.5472e-01 508,671,460 498,398,737
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.26% 6.5472e-01 482,875,640 484,681,147
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.06% 1.0000e+00 529,752,854 545,169,671
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.05% 6.5472e-01 537,896,983 540,202,421
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'fp8', 'type_B': 'fp8', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 537,242,972 524,708,101
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.10% 1.0000e+00 522,655,646 535,443,646
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 2.54% 6.5472e-01 526,419,440 539,715,946
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.03% 1.0000e+00 516,309,080 513,149,835
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.75% 6.5472e-01 542,783,897 542,219,690
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.35% 1.0000e+00 555,772,339 551,994,442
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.20% 1.0000e+00 528,742,615 533,096,926
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.01% 6.5472e-01 514,702,969 504,006,699
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.12% 6.5472e-01 520,872,256 506,671,333
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.20% 1.7971e-01 492,642,707 508,119,687
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.09% 6.5472e-01 534,220,515 520,974,836
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.21% 6.5472e-01 540,095,881 518,307,284
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.07% 1.0000e+00 507,492,369 509,614,220
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.02% 1.0000e+00 541,491,263 537,730,304
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.75% 1.7971e-01 539,563,747 531,376,967
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.17% 1.0000e+00 511,015,396 513,000,900
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.06% 1.0000e+00 545,465,178 544,567,936
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.47% 1.0000e+00 567,979,920 563,986,725
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 1.56% 6.5472e-01 526,502,864 540,960,285
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.15% 6.5472e-01 514,515,314 515,872,072
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.17% 1.7971e-01 518,917,688 524,631,417
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') -0.08% 6.5472e-01 506,691,499 499,095,178
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.15% 6.5472e-01 520,586,505 533,416,826
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.09% 6.5472e-01 539,190,241 521,860,938
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'half', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=16, wave_n=16, wave_k=16, wave_b=1, workgroup_size_x=64, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx1201', 'Xnack': False, 'Sramecc': False}, matchMemoryAccess=True, version='') 0.01% 1.0000e+00 512,710,880 528,450,198
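Note on reading the rows above: each row appears to end with four whitespace-separated fields (median diff %, p-value, Gen A time, Gen B time), while the problem description itself contains spaces. A minimal sketch for splitting such a row, assuming that layout holds; `ResultRow` and `parse_row` are hypothetical names for illustration, not part of the reporting tool:

```python
from typing import NamedTuple


class ResultRow(NamedTuple):
    # Field names are illustrative; they mirror the table header.
    problem: str
    median_diff_pct: float
    p_value: float
    gen_a_ns: int
    gen_b_ns: int


def parse_row(line: str) -> ResultRow:
    """Split one flattened results row into its five columns.

    Assumes the last four whitespace-separated tokens are the numeric
    columns and everything before them is the problem description.
    """
    problem, diff, pval, gen_a, gen_b = line.rstrip().rsplit(None, 4)
    return ResultRow(
        problem=problem,
        median_diff_pct=float(diff.rstrip("%")),
        p_value=float(pval),
        gen_a_ns=int(gen_a.replace(",", "")),  # drop thousands separators
        gen_b_ns=int(gen_b.replace(",", "")),
    )


def significant(rows, alpha=0.05):
    """Keep rows whose p-value is below the threshold used in the report."""
    return [r for r in rows if r.p_value < alpha]
```

Splitting from the right avoids having to parse the nested `GEMMProblem(...)GEMMSolution(...)` text at all.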
Links

@ROCmMathLibrariesBot
Contributor

Performance Report for gfx942

Results

@@            Significant (p-val <0.05) Performance Diffs            @@
====================================================================================================
+   2.70% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 128, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, 
architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   1.08% | p=1.7451e-03 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, 
architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   0.24% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, 
architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
+   4.80% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 7680, N: 8448, K: 8448, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 64, mac_n: 64, mac_k: 64, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Priority, prefetch: False, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: True, numWGs: 304, streamKTwoTile: True, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, 
architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.09% | p=2.5347e-02 
	| 3. FloatsGEMM(M: 8448, N: 8448, K: 128, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Cooperative, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, 
streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   1.75% | p=5.6994e-05 
	| 3. FloatsGEMM(M: 8448, N: 8448, K: 128, alpha: 2, beta: 0.5, types: {'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A: 1, scaleValue_B: 1, workgroupMapping: [-1, -1])| GEMM(mac_m: 128, mac_n: 256, mac_k: 16, wave_m: 32, wave_n: 32, wave_k: 8, wave_b: 1, workgroup_size_x: 128, workgroup_size_y: 2, workgroupRemapXCC: False, workgroupRemapXCCValue: -1, unroll_x: 0, unroll_y: 0, loadLDS_A: True, loadLDS_B: True, storeLDS_D: True, betaInFma: True, direct2LDS_A: False, direct2LDS_B: False, scheduler: Sequential, prefetch: True, prefetchInFlight: 2, prefetchLDSFactor: 0, prefetchMixMemOps: False, loadLDSScale_A: False, loadLDSScale_B: False, swizzleScale: False, prefetchScale: False, streamK: False, numWGs: 0, streamKTwoTile: False, architecture: {'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess: True, version: ) | GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, 
architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='')
-   2.23% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: comments)| CodeGen() | CodeGen(instCount: 40000, instructions: comments)
-  14.16% | p=5.6994e-05 
	| CodeGen(instCount: 40000, instructions: complex_mi_with_coop)| CodeGen() | CodeGen(instCount: 40000, instructions: complex_mi_with_coop)
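Note on the p-values above: the "Moods p-val" column presumably refers to Mood's median test. The following is a minimal sketch of a two-sample version with a Yates continuity correction, written from the textbook definition. It is an assumption about how such a value can be computed, not the reporting tool's confirmed implementation; the function name and the tie handling (values equal to the grand median count as "below") are choices made for this illustration.

```python
import math
from statistics import median


def moods_median_test(a, b):
    """Two-sample Mood's median test with Yates continuity correction.

    Returns the p-value for the hypothesis that both samples share a median.
    """
    grand = median(list(a) + list(b))
    # 2x2 contingency table: counts above / not above the grand median.
    a_above = sum(x > grand for x in a)
    b_above = sum(x > grand for x in b)
    a_below = len(a) - a_above
    b_below = len(b) - b_above
    n = len(a) + len(b)
    denom = (a_above + b_above) * (a_below + b_below) * len(a) * len(b)
    if denom == 0:
        # Degenerate table (e.g. all values tied); treated as no evidence.
        return 1.0
    cross = abs(a_above * b_below - a_below * b_above)
    chi2 = n * max(0.0, cross - n / 2) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    baseline = [100 + i for i in range(10)]
    candidate = [200 + i for i in range(10)]
    print(moods_median_test(baseline, candidate))  # roughly 5.7e-05
```

With 10 samples per side and no overlap, this sketch gives roughly 5.7e-05, which appears consistent with the smallest p-value in the report. Because the test only counts samples above and below the grand median, the p-value can take only a few discrete values for a fixed sample size.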
Full table of results
Problem Median Diff % Mood's p-val Gen A (ns) Gen B (ns)
CodeGen(instCount: 40000, instructions: comments) 2.23% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: complex_mi_with_coop) 14.16% 5.6994e-05 0 0
CodeGen(instCount: 40000, instructions: simple_mi) -0.23% 6.5472e-01 0 0
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.98% 6.5472e-01 1,678,313,856 1,545,643,136
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.45% 6.5472e-01 1,811,907,459 1,725,682,888
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.46% 6.5472e-01 1,518,605,635 1,572,487,599
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.07% 1.0000e+00 1,549,079,056 1,534,021,574
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.52% 6.5472e-01 1,524,523,113 1,957,451,681
GEMMProblem(M=1024, N=50304, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.10% 6.5472e-01 1,632,695,651 1,546,650,212
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.79% 6.5472e-01 1,334,505,078 1,265,522,696
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.02% 6.5472e-01 7,351,354,226 6,211,864,694
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.38% 1.0000e+00 1,391,714,888 1,463,997,676
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.12% 1.7971e-01 7,165,388,755 6,588,955,279
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.54% 1.0000e+00 1,721,643,134 1,789,426,958
GEMMProblem(M=3072, N=4096, K=4096, alpha=2, beta=0.5, types={'type_A': 'float', 'type_B': 'float', 'type_C': 'float', 'type_D': 'float', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=2, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.46% 1.7971e-01 3,629,993,045 3,685,482,408
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.64% 1.0000e+00 1,789,734,241 1,534,217,932
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.98% 6.5472e-01 1,622,812,449 1,645,627,931
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=False, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.28% 6.5472e-01 1,559,588,189 1,620,412,782
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.81% 6.5472e-01 1,544,346,276 1,552,935,137
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.83% 1.0000e+00 1,800,374,981 2,012,448,687
GEMMProblem(M=3840, N=4224, K=4224, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.45% 1.0000e+00 1,466,497,740 1,535,837,117
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.50% 1.0000e+00 2,089,543,293 2,240,550,050
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.88% 6.5472e-01 2,091,333,489 2,191,884,894
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.43% 1.7971e-01 996,560,137 1,033,860,550
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.07% 1.0000e+00 2,226,204,216 2,277,064,601
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.27% 6.5472e-01 2,405,633,088 2,333,126,119
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.33% 6.5472e-01 2,511,152,549 3,132,281,210
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.56% 6.5472e-01 1,085,794,194 973,172,482
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.14% 1.0000e+00 1,857,969,545 1,715,135,311
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.50% 1.7971e-01 3,546,293,813 3,081,998,609
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.03% 1.0000e+00 5,785,046,793 6,313,449,863
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.54% 1.0000e+00 2,625,022,513 2,811,654,046
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 3.74% 6.5472e-01 2,326,051,513 2,327,970,994
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 3.21% 1.7971e-01 2,471,248,301 2,317,509,478
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.23% 1.0000e+00 2,410,834,746 2,210,947,677
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.02% 1.0000e+00 1,093,837,128 1,168,271,227
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'N', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.05% 1.7971e-01 916,588,187 1,213,489,417
GEMMProblem(M=7680, N=8448, K=8192, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'T', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.61% 6.5472e-01 953,803,459 1,120,719,224
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.67% 6.5472e-01 1,749,938,800 1,689,389,059
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.14% 6.5472e-01 3,460,775,967 3,583,728,186
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.08% 1.0000e+00 1,939,570,825 1,910,810,734
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -8.07% 1.7971e-01 4,158,013,723 3,845,634,603
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.87% 6.5472e-01 2,543,798,391 2,625,919,402
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=False, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.13% 6.5472e-01 1,843,018,831 1,646,821,363
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.33% 1.0000e+00 3,335,234,425 3,500,791,259
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.08% 1.0000e+00 5,880,189,835 5,685,195,104
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.78% 1.0000e+00 2,222,799,392 2,249,117,008
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=2, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.97% 1.0000e+00 2,293,771,153 2,879,601,952
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=256, workgroup_size_y=1, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 2.46% 1.0000e+00 2,931,758,406 2,487,664,774
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=64, workgroup_size_y=4, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.87% 6.5472e-01 2,292,217,630 3,024,292,399
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.89% 1.7971e-01 3,878,980,748 4,297,063,588
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.46% 6.5472e-01 5,681,891,499 7,395,075,874
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.69% 1.7971e-01 1,097,521,984 1,065,871,362
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.13% 1.0000e+00 2,072,457,135 1,933,336,811
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.08% 1.0000e+00 1,162,229,727 1,188,010,982
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.21% 6.5472e-01 2,158,869,719 2,170,007,275
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.31% 6.5472e-01 1,375,970,987 1,406,059,120
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.82% 1.7971e-01 2,989,906,151 3,131,762,988
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.09% 1.0000e+00 1,030,313,417 1,039,052,472
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -2.70% 5.6994e-05 1,985,569,587 1,986,630,380
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.03% 1.0000e+00 1,162,055,357 1,172,278,309
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.31% 6.5472e-01 2,189,886,152 2,196,039,438
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.55% 6.5472e-01 1,408,540,399 1,562,808,026
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=128, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.12% 6.5472e-01 2,776,674,904 2,956,459,643
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.93% 1.7971e-01 1,850,220,874 1,587,535,976
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.71% 1.7971e-01 2,957,073,909 3,221,644,487
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.01% 1.0000e+00 1,889,958,939 1,936,838,569
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.47% 6.5472e-01 3,667,788,347 3,586,298,390
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=256, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.09% 6.5472e-01 2,522,077,835 2,491,670,958
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -1.08% 1.7451e-03 888,790,890 901,909,904
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.24% 2.5347e-02 1,591,604,859 1,582,910,503
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.22% 1.0000e+00 898,713,269 897,838,414
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=32, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.16% 6.5472e-01 1,687,190,079 1,666,023,609
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -0.34% 6.5472e-01 998,177,805 1,000,218,790
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=False, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=True, numWGs=304, streamKTwoTile=True, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') -4.80% 2.5347e-02 1,883,021,640 1,890,482,935
GEMMProblem(M=7680, N=8448, K=8448, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=64, mac_n=64, mac_k=64, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.23% 6.5472e-01 1,181,796,598 1,072,320,867
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Cooperative', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.09% 2.5347e-02 2,246,540,716 2,220,114,976
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Priority', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 0.47% 6.5472e-01 2,302,880,902 2,236,341,360
GEMMProblem(M=8448, N=8448, K=128, alpha=2, beta=0.5, types={'type_A': 'half', 'type_B': 'half', 'type_C': 'half', 'type_D': 'half', 'type_acc': 'float', 'trans_A': 'N', 'trans_B': 'T', 'scale_A': 'None', 'scaleType_A': 'None', 'scale_B': 'None', 'scaleType_B': 'None', 'scaleBlockSize': -1, 'scaleSkipPermlane': False}, scaleValue_A=1, scaleValue_B=1, workgroupMapping=[-1, -1])GEMMSolution(mac_m=128, mac_n=256, mac_k=16, wave_m=32, wave_n=32, wave_k=8, wave_b=1, workgroup_size_x=128, workgroup_size_y=2, workgroupRemapXCC=False, workgroupRemapXCCValue=-1, unroll_x=0, unroll_y=0, loadLDS_A=True, loadLDS_B=True, storeLDS_D=True, betaInFma=True, direct2LDS_A=False, direct2LDS_B=False, scheduler='Sequential', prefetch=True, prefetchInFlight=2, prefetchLDSFactor=0, prefetchMixMemOps=False, loadLDSScale_A=False, loadLDSScale_B=False, swizzleScale=False, prefetchScale=False, streamK=False, numWGs=0, streamKTwoTile=False, architecture={'ArchString': 'gfx942', 'Xnack': False, 'Sramecc': True}, matchMemoryAccess=True, version='') 1.75% 5.6994e-05 2,269,906,986 2,006,871,204
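Each row above ends with a percent difference, a p-value, and two integer columns. A minimal sketch for flagging the rows that meet the report's p-val < 0.05 threshold might look like the following. The names `ROW_TAIL`, `parse_row_tail`, and `is_significant` are hypothetical helpers and not part of rocRoller or its reporting tooling. The meaning of the two trailing integer columns is not stated in this excerpt, so they are kept as raw values.

```python
import re

# Trailing columns of a report row: percent diff, p-value, then two integer
# columns whose meaning is an open assumption (kept as raw values here).
ROW_TAIL = re.compile(
    r"(?P<diff>[-+]?\d+(?:\.\d+)?)%\s+"
    r"(?P<pval>\d+(?:\.\d+)?[eE][-+]?\d+)\s+"
    r"(?P<col_a>[\d,]+)\s+(?P<col_b>[\d,]+)\s*$"
)


def parse_row_tail(row):
    """Return the four trailing columns of a row as a dict, or None if absent."""
    m = ROW_TAIL.search(row)
    if m is None:
        return None
    return {
        "diff_percent": float(m.group("diff")),
        "p_value": float(m.group("pval")),
        "col_a": int(m.group("col_a").replace(",", "")),
        "col_b": int(m.group("col_b").replace(",", "")),
    }


def is_significant(row, alpha=0.05):
    """True when the row parses and its p-value is below alpha."""
    parsed = parse_row_tail(row)
    return parsed is not None and parsed["p_value"] < alpha
```

Applied to the rows above, this would flag the entries with p-values 1.7451e-03, 2.5347e-02, and 5.6994e-05, and skip those at 6.5472e-01 and 1.0000e+00.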

@ThanHenderson left a comment

Thank you! Been meaning to do this for a while.

@jaopaulolc merged commit 5cc8868 into develop on Jul 30, 2025
18 checks passed
@jaopaulolc deleted the users/chunxlin/unused-code branch on July 30, 2025 19:07
assistant-librarian Bot pushed a commit to ROCm/rocRoller that referenced this pull request Jul 30, 2025

5 participants