Merged
251 commits
148d196
Contraction and DataType rebase
javier-amd Jan 26, 2026
3b41f20
client modification
javier-amd Jan 27, 2026
1bc4230
Parameters and LocalRead rebased
wen-des Jan 27, 2026
32d700f
rocisa support and other changes
javier-amd Jan 27, 2026
81c40d2
Components rebased
wen-des Jan 27, 2026
28f2602
writer related change and yaml
javier-amd Jan 27, 2026
d89ad91
Conversion rebased
wen-des Jan 27, 2026
2822693
Fixed errors in compiling
wen-des Jan 27, 2026
0df3258
Fixed python space issues
wen-des Jan 28, 2026
089cd5a
Fixed bugs in python files and generated kernel successfully
wen-des Jan 28, 2026
74d6373
Disable swap address for mxsa/mxsb
wen-des Jan 28, 2026
e7cb8f6
Committed some missing fixes
wen-des Jan 28, 2026
f56cb78
Fixed mxsa/mxsb address offset
wen-des Jan 29, 2026
0649b1e
Added TODO memo for later consideration
wen-des Jan 29, 2026
b81161d
bpe function fix
javier-amd Jan 29, 2026
ee1714e
Bugfixed for the wrong address offset calculation
wen-des Jan 29, 2026
c89efb4
MX F8 functional tests passed in tensilelite
wen-des Jan 30, 2026
cf1251a
Updated f8 yaml file
wen-des Jan 30, 2026
8478d05
Removed the mx f6 yaml files for mx f6 is not ready by now
wen-des Feb 4, 2026
67b50bd
Updated f4 yaml file for test coverage
wen-des Feb 4, 2026
045f9ec
Standardize kernel names with MX types (#4363)
AlexBrownAMD Feb 6, 2026
9cb5440
Fix some errors breaking non-mx tests on mx branch (#4616)
AlexBrownAMD Feb 18, 2026
de7dee5
Fix for gfx950 mxfp4 DirectToLds (#4644)
nakajee Feb 18, 2026
e0a7991
[hipBLASLt] Enable MX data generation for Tensile host and support ca…
amd-chunxlin Feb 20, 2026
9e0422c
[hipBLASLt] Add block size into predicate for correct solution select…
amd-chunxlin Feb 20, 2026
7afd6fb
[Tensilelite] Add MXFP4 data generator for Tensile (#4597)
archana-ramalingam Feb 21, 2026
e91ecf3
Enable DirectToLds for MXSA/B and re-enable LdsPad for MXFP4 + Direc…
nakajee Feb 21, 2026
e0e6ecc
Fix data initialization (#4827)
bnemanich Feb 23, 2026
dab0b9c
Fix a verification fail with MXFP4 + non DTL (#4715)
nakajee Feb 24, 2026
a3654aa
[hipblaslt] Fixing build issues for gfx_950_mx_rebase (#4465)
NineKa Feb 25, 2026
fd621eb
[TensileLite] Fix MX FP4 scale data overwrite in initializeCPUInputs …
archana-ramalingam Feb 26, 2026
a2ce1ab
Fix stream-k with mx scaling (#4388)
AlexBrownAMD Feb 26, 2026
1c2fe0e
[hipblaslt] Fix fails with dtl.yaml and xfp32.yaml on gfx950_mx_rebas…
nakajee Feb 26, 2026
3b3c84b
Merge commit '4ffdf58b7d36b29ad86806c642e8d7aa930deeaf' into users/ho…
NineKa Feb 27, 2026
613ccdb
add kernel["ProblemType"]["Sparse"] to condition
NineKa Feb 27, 2026
a4a6368
Merge commit '0db944b2e05878e30d441fb1b32421096107ddf5' into users/ho…
nakajee Feb 27, 2026
337dbbe
fix dependency issues for tensilelite clients
NineKa Feb 27, 2026
20e4cb1
Merge commit '70b16b75e53a69200142bf27fa6f90771a0ba0c9' into users/ho…
NineKa Feb 27, 2026
85d98aa
fix computeInputType in tensilelite
NineKa Feb 27, 2026
e1a5bfb
Merge commit '7c3a3e5c044b8abbf77aaf97c2b93f303e763fff' into users/ho…
nakajee Feb 27, 2026
d5b8ff8
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Feb 28, 2026
9c46a42
fix computeInputType issue in ReferenceValidator.cpp
NineKa Feb 28, 2026
84d18f9
[hipblaslt] fix unit tests for gfx950_mx_rebase (#4912)
NineKa Mar 2, 2026
0a645bd
Merge branch 'gfx950_mx_rebase' into users/hongjche/gfx950_mx_rebase_…
NineKa Mar 2, 2026
9a3591d
[hipblaslt] Fix a verification fail with spmm_i8hs.yaml (#5034)
nakajee Mar 3, 2026
0f5c904
initial set of testcase for MXFP4 (#4739)
pdhirajkumarprasad Mar 3, 2026
8724f2f
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 4, 2026
01f29ad
[Tensilelite] Add regression test for MX FP4 scale buffer determinism…
archana-ramalingam Mar 4, 2026
f83ef8e
Merge branch 'gfx950_mx_rebase' into users/hongjche/gfx950_mx_rebase_…
NineKa Mar 5, 2026
24c36e1
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 5, 2026
024ba23
UseF32XEmulation in forceLrvwTile1 for B tensor (#5143)
talumbau Mar 5, 2026
8de6b1a
Merge branch 'users/hongjche/gfx950_mx_rebase_sync' into gfx950_mx_re…
NineKa Mar 5, 2026
ede3a2a
[hipblaslt] Enable StoreSwapAddr for MXFP4, plus add GRVWMXSA/B adust…
nakajee Mar 6, 2026
34bca88
[Tensilelite] Fix UserArgs struct stride mismatch in grouped GEMM (#…
archana-ramalingam Mar 6, 2026
946988a
[hipBLASLt] Disable failed mx f8 problem sizes (#5105)
amd-chunxlin Mar 6, 2026
dcc90b5
[hipblaslt] Scheduling related fixes for MXFP4 (#5169)
nakajee Mar 6, 2026
20b1923
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 6, 2026
e09725d
remove explicit constructor from variable_value class
NineKa Mar 6, 2026
faa7dc7
fix return statement in hipDataType_to_tensile_type and add type chec…
NineKa Mar 6, 2026
ae18131
Merge branch 'users/hongjche/gfx950_mx_rebase_sync' into gfx950_mx_re…
NineKa Mar 6, 2026
2a4b814
[Tensilelite] Shuffle mx scaling data in Tensile (#4864)
archana-ramalingam Mar 9, 2026
b690c88
[hipblaslt] Fix fail with kringshift.yaml (#5228)
nakajee Mar 9, 2026
b967536
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 9, 2026
59349d6
[hipblaslt] Optimize StoreSwapAddr (#5217)
nakajee Mar 9, 2026
945fd17
[hipblaslt] Enable MXFP4 + DtlPlusLdsBuf (#5251)
nakajee Mar 10, 2026
5babee6
Fix gfx12 build error with integer cast
Mar 10, 2026
73b68cf
[hipblaslt] Fix SIA3 issues with MXFP4 (#5245)
nakajee Mar 10, 2026
87181ea
[hipBLASLt] Fix CI failures for gfx942 (#5216)
amd-chunxlin Mar 11, 2026
b6c3d45
Make the usage side’s logic consistent with allocation side (tPackM) …
tomchengchitang Mar 11, 2026
1ee50eb
[hipblaslt] Fix fail with gfx942+dtv/dtl.yaml (#5349)
nakajee Mar 11, 2026
e20a090
[hipblaslt] disable mxDataGenerator for windows builds (#5298)
NineKa Mar 12, 2026
30665e0
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 12, 2026
086a3f2
fix build errors of merge
NineKa Mar 12, 2026
c44133d
Fix: add MacDataTypeA to mock kernel (#5351)
talumbau Mar 12, 2026
9701b31
[hipblaslt] Fix tox test fp8_gfx12 failed when dtva1=1 or dtvb1=1 (#5…
tomchengchitang Mar 12, 2026
b3a0b98
Revert "[hipblaslt] disable mxDataGenerator for windows builds (#5298)"
nakajee Mar 12, 2026
c488af4
[hipBLASLt] Fix failed swizzle tests (#5400)
amd-chunxlin Mar 13, 2026
4339bba
[hipblaslt] Fix tailLoop errors in GLOBAL_OFFSET_{A or B} for fp16_gf…
tomchengchitang Mar 14, 2026
76433a2
[hipblaslt] disable mxDataGenerator for windows builds (#5414)
NineKa Mar 14, 2026
4263c28
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 16, 2026
7c50394
[hipblaslt] Add F4/F6/BF6 to instTypeToDataType (#5457)
nakajee Mar 16, 2026
bf1d16b
[hipblaslt] remove MXFP4 TN logic file (#5487)
nakajee Mar 16, 2026
9a9655a
[hipblaslt] Use64bShadowLimitMX support (#5499)
nakajee Mar 17, 2026
92d332e
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 17, 2026
7b3a5e5
cleanup redundant lines of code
NineKa Mar 17, 2026
e6785bb
[hipBLASLt] Fix failed rocRoller test (#5529)
amd-chunxlin Mar 18, 2026
b96a10a
Revert "cleanup redundant lines of code"
NineKa Mar 18, 2026
f85dc43
[hipsparselt] Fix numSplitMetadata logic (#5608)
tomchengchitang Mar 20, 2026
c377746
[hipblaslt] const GRInc support (#5526)
nakajee Mar 20, 2026
a620792
Merge branch 'develop' into gfx950_mx_rebase
bnemanich Mar 23, 2026
456c3dd
Fix merge
bnemanich Mar 23, 2026
9a1bd95
Fix tensilelite build error due to merge conflict
nakajee Mar 23, 2026
637f05b
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_cleanup
NineKa Mar 24, 2026
52b8654
fix various issue in review
NineKa Mar 24, 2026
03a359b
[hipblaslt] Reject MX + nonDTL + UnrollLoopSwapGlobalReadOrder (#5794)
nakajee Mar 25, 2026
ae8f301
[hipsparselt] Restore to develop logic and fix mistakenly used PackKF…
tomchengchitang Mar 26, 2026
cff72f1
[hipsparselt] Delete spurious rIdx_ loop for hipsparselt failed tests …
tomchengchitang Mar 27, 2026
f0849a4
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 27, 2026
dc3eac5
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Mar 30, 2026
0b5c339
[hipsparselt] Fix metadata vgpr idx calculation (#5920)
tomchengchitang Mar 31, 2026
8a6ed73
[hipblaslt] Add support for MXFP4 + TailLoop (K multiple of 32) (#5692)
nakajee Apr 1, 2026
abce258
Merge branch 'develop' into users/hongjche/gfx950_mx_rebase_sync
NineKa Apr 2, 2026
41a35bc
Merge remote-tracking branch 'origin/develop' into gfx950_mx_rebase
nakajee Apr 3, 2026
15c010e
hipBLASLt Tensile: tighten LRVW/GRVW validation and trim Solution.py …
NineKa Apr 3, 2026
f8f6070
Merge remote-tracking branch 'origin/gfx950_mx_rebase' into gfx950_mx…
nakajee Apr 4, 2026
90fbfa8
Add conversion for MX types for Origami (#6271)
yenong-amd Apr 9, 2026
3dd2873
Merge remote-tracking branch 'origin/gfx950_mx_rebase' into gfx950_mx…
nakajee Apr 16, 2026
a377e5b
Merge remote-tracking branch 'origin/develop' into users/nakajee/gfx9…
nakajee Apr 16, 2026
136cbde
Update Tensile.py
pdhirajkumarprasad Apr 17, 2026
4b32be2
new line formatting fix
pdhirajkumarprasad Apr 17, 2026
ff3bc60
Resolve merge conflicts involving mx-block-a
bnemanich Apr 17, 2026
9a23f56
Merge conflicts in DataType.py
bnemanich Apr 17, 2026
3c3bc42
Fix SoftmaxGenerator.py SolutionLibrary.py ClientProblemFactory.hpp R…
amd-chunxlin Apr 16, 2026
4e8e337
conflict resolution
Apr 17, 2026
0ed0a44
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of https://github…
Apr 17, 2026
4a1054c
Resolve conflicts in client/include/DataInitialization.hpp
vinayakdsci Apr 17, 2026
b2313cc
Resolve conflicts in client/src/Reference.cpp
vinayakdsci Apr 17, 2026
7083d99
Resolve conflicts in client/include/TypedId.hpp
vinayakdsci Apr 17, 2026
a039d38
Resolve conflicts in Tensile/AsmAddressCalculation.py
NineKa Apr 17, 2026
382df2c
Solved merge conflict for KernelWriter.py
nakajee Apr 17, 2026
56166f5
Fix ContractionProblem.hpp
amd-chunxlin Apr 17, 2026
a66f3fa
Fix Serialization/ContractionPredicates.hpp
amd-chunxlin Apr 17, 2026
301467f
Resolve conflicts in Tensile/Components/SIA.py
NineKa Apr 17, 2026
33a4a67
Resolve merge conflicts in GSU.py
bnemanich Apr 17, 2026
9dba4aa
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 17, 2026
d531271
Removed unnecessary change in KernelWriter.py
nakajee Apr 17, 2026
5ec2a35
Resolve conflicts for tensilelite/include/Tensile/DataTypes_BFloat6.hpp
NineKa Apr 17, 2026
949193c
HipUtils.hpp/ContractionSolution.cpp conflicts
Apr 17, 2026
792f16a
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of https://github…
Apr 17, 2026
84f949e
conflict resolution
pdhirajkumarprasad Apr 19, 2026
8a67779
Fix rocisa files conflicts
Apr 20, 2026
66fe871
Resolve conflicts in DataTypes.cpp and enum
Apr 20, 2026
8a708eb
Resolve conflicts in ContractionProblem.cpp and related functions call
Apr 20, 2026
5ab3ff8
Retain both mxfp4/6 types, with and without "_EXT"
archana-ramalingam Apr 20, 2026
4b38bda
fix conflict in computeLoadSrd
pdhirajkumarprasad Apr 20, 2026
4fb414d
Fix conflicts in DataTypes_Float6.hpp
CurtisFu1002 Apr 20, 2026
6df6e78
Remove Float6x16 and Float6x16_Storage
CurtisFu1002 Apr 20, 2026
d69aa7d
Resolve conflicts in include/Tensile/DataTypes.hpp
vinayakdsci Apr 20, 2026
996091c
Resolve conflicts in include/Tensile/Contraction{Solution,ProblemPred…
vinayakdsci Apr 20, 2026
4c56ae9
fix the conflict
pdhirajkumarprasad Apr 20, 2026
2c9cb4c
Resolve conflicts in Tensile/SolutionStructs/Validators/MatrixInstruc…
vinayakdsci Apr 20, 2026
67c7caf
Use single-character keys (e.g., S) when MacDataTypeA equals MacDataT…
pdhirajkumarprasad Apr 20, 2026
412e1b9
Fix TensorDescriptor.hpp
amd-chunxlin Apr 20, 2026
98c723a
Fix merge conflicts in testing_matmul.hpp
bnemanich Apr 20, 2026
13a0005
Resolve conflicts in tensile_host except prob scaleAType part
Apr 20, 2026
cff3f03
conflict res LSU.py
Apr 20, 2026
0ae609e
Fix setUseScaleAB conflict
amd-chunxlin Apr 20, 2026
e3bcdba
Fix rocsparselt/src/tensile_host.cpp
amd-chunxlin Apr 20, 2026
1bea41a
Resolve conflicts for tensilelite/include/Tensile/DataTypes_Float4.hpp
NineKa Apr 20, 2026
f616292
Matching implementations in gfx950_mx_rebase
NineKa Apr 20, 2026
d70475d
Fix conflicts for rocblaslt's tensile_host.cpp
archana-ramalingam Apr 20, 2026
43a0a6d
Replace _EXT with non _EXT data types
archana-ramalingam Apr 20, 2026
9716cf2
Resolve conflicts in projects/hipblaslt/tensilelite/Tensile/KernelWri…
nakajee Apr 21, 2026
7b78a84
Merge remote-tracking branch 'origin/users/nakajee/gfx950_mx_rebase_m…
nakajee Apr 21, 2026
ff741a3
Resolve conflicts in projects/hipblaslt/tensilelite/Tensile/SolutionS…
nakajee Apr 21, 2026
af13b27
Resolve conflicts in projects/hipblaslt/tensilelite/Tensile/Component…
nakajee Apr 21, 2026
0b38012
Fix conflicts for DataInitialization.cpp
CurtisFu1002 Apr 21, 2026
1efc795
Resolve initializeConstantInputs in DataInitialization.cpp
Apr 21, 2026
5114c7a
Resolve macro guards in testing_matmul.hpp
Apr 21, 2026
9fe02d4
Fix bugs in DataTypes.cpp and ContractionProblem.cpp
Apr 21, 2026
4a51b17
Resolve 6x16 <-> 6x32 and redefinition issues
Apr 21, 2026
3cd7f37
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 21, 2026
6991994
Fix merge issues
bnemanich Apr 21, 2026
dd01c69
fix regression due to merge conflict resolution
pdhirajkumarprasad Apr 21, 2026
f05171e
Fix for errors with mx32f8_tn.yaml
nakajee Apr 21, 2026
86c625e
Fix for mx32f8_tn.yaml (#6641)
bnemanich Apr 21, 2026
0a4346f
Fix sk_mx32f4_quick
bnemanich Apr 21, 2026
00a0ed7
Revert "Fix sk_mx32f4_quick"
bnemanich Apr 22, 2026
d54a6b3
Fix unsupported GEMM problem
bnemanich Apr 22, 2026
26b175e
Removed duplicated vgpr allocation code for MX
nakajee Apr 22, 2026
c13b6cc
Fix sk_mx32f4_quick and remove duplicate code
amd-chunxlin Apr 22, 2026
252b894
More fixes for mx32f4_tn.yaml
nakajee Apr 22, 2026
3fa7a7c
More fix for sk_mx32f4_quick.yaml
nakajee Apr 23, 2026
4fdb314
Fix sparse yaml tests (#6657)
tomchengchitang Apr 23, 2026
516aa5f
Merge remote-tracking branch 'origin/develop' into users/nakajee/gfx9…
nakajee Apr 23, 2026
d2e3e21
Resolve merge conflicts
bnemanich Apr 23, 2026
ba8a34d
Resolve conflicts SIA.py
archana-ramalingam Apr 23, 2026
fd41203
Fix merge conflicts in Reference.cpp
bnemanich Apr 23, 2026
934908f
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 23, 2026
a98061b
Resolve merge conflicts for KernelWriterAssembly.py
nakajee Apr 23, 2026
3c91a21
Fix StreamK merge issues
bnemanich Apr 23, 2026
0ab6aff
Resolve merge conflicts with KernelWriter.py
nakajee Apr 23, 2026
17a6b08
Merge remote-tracking branch 'origin/users/nakajee/gfx950_mx_rebase_m…
nakajee Apr 23, 2026
44d0580
Resolve merge conflicts with Solution.py
nakajee Apr 23, 2026
5cf99a6
Add scale type to generateMXInput calls in TensileLite client
bnemanich Apr 23, 2026
42a7b59
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 23, 2026
8406315
Resolve merge conflicts with mfma.hpp
nakajee Apr 23, 2026
7e4ea08
Merge remote-tracking branch 'origin/users/nakajee/gfx950_mx_rebase_m…
nakajee Apr 23, 2026
385deeb
Resolve testing_matmul.hpp
archana-ramalingam Apr 23, 2026
7de1e69
Resolve HIP_R_8F_E5M3_EXT build error
archana-ramalingam Apr 23, 2026
ebb576f
fix the typo in form of missing endif block
pdhirajkumarprasad Apr 23, 2026
1c2e31f
Fix build break when building tensilelite-client in ffm
CurtisFu1002 Apr 23, 2026
c55c690
Change hardcoded dtype for DataTypeMXS{A,B}
vinayakdsci Apr 23, 2026
8599dcc
mark these test as xfail for 1250 as mix type is not supported yet
pdhirajkumarprasad Apr 23, 2026
ddc0023
Fixes for fails with sk_mx32f4_quick and sk_mx32f8_quick
nakajee Apr 23, 2026
4b80342
Remove MXScale
bnemanich Apr 23, 2026
a5c478b
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 23, 2026
671db66
Fix fallthrough issue due to missing semicolon
amd-chunxlin Apr 23, 2026
fccb7bb
Merge origin/develop (stinkytofu MUBUF off / gfx1250 tooling)
bnemanich Apr 23, 2026
d9ff981
Fix fails with sk_mx32f8_quick.yaml
nakajee Apr 23, 2026
8583829
Removed redundant code for MX.
nakajee Apr 23, 2026
52e1131
Fix fails with mxfp4_mxfp4_fp32_tn_act.yaml (Tailloop fix)
nakajee Apr 24, 2026
e801d80
Fix for gfx950 mxfp4 + GSU
nakajee Apr 24, 2026
b0b764a
More fix for gfx950+mx+Tailloop
nakajee Apr 24, 2026
639bb30
Fix namespace errors when compiling using amdclang 23
NineKa Apr 24, 2026
4ebb168
Fix for gfx950 mx + DTL2 or 3
nakajee Apr 24, 2026
57e67ea
Merge remote-tracking branch 'origin/develop' into users/nakajee/gfx9…
nakajee Apr 24, 2026
9472aaf
fix mxf4_gfx1250 + mxf8_gfx1250 yaml files and more (#6767)
tomchengchitang Apr 24, 2026
1e35325
Small change for PR6767
nakajee Apr 24, 2026
c9557bb
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 24, 2026
51695e7
Fix hipsparselt builds
bnemanich Apr 24, 2026
e44c608
Fix Windows bug
bnemanich Apr 24, 2026
a95702c
Fix headers
bnemanich Apr 24, 2026
5a53cfd
Fix issues in hipsparselt builds
bnemanich Apr 24, 2026
28619a2
Fix fail with gfx942 xfp32.yaml
nakajee Apr 24, 2026
f84cfe2
Merge branch 'develop' of github.com:ROCm/rocm-libraries into users/n…
bnemanich Apr 24, 2026
0f4d05e
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 24, 2026
1e0e2b7
Fix for gfx950 mx fail due to previous fix for gfx1250
nakajee Apr 25, 2026
1cf9a9d
Removed duplicated mx code + redundant new line
nakajee Apr 25, 2026
e35aebc
Merge remote-tracking branch 'origin/develop' into users/nakajee/gfx9…
nakajee Apr 25, 2026
2f49f22
More small refactoring
nakajee Apr 25, 2026
c530e75
added skip-1250 for 950 specific yaml
pdhirajkumarprasad Apr 26, 2026
5527082
Fix gfx950 MX code review findings
bnemanich Apr 26, 2026
6df7f01
Merge branch 'develop' into users/nakajee/gfx950_mx_rebase_merge
bnemanich Apr 26, 2026
c76bd54
Fix datatype macro guards
bnemanich Apr 26, 2026
8122284
Fix datatype macro guards
bnemanich Apr 26, 2026
e6f99d4
Revert TENSILE_USE_{FP4,FP6,BF6} guard to fix gfx942 regression
bnemanich Apr 27, 2026
a65b1ee
Address review comments
bnemanich Apr 27, 2026
e4fb0e0
Replace MacDataTypeA with DataType
amd-chunxlin Apr 28, 2026
00759a0
Added missing B8 + F4/F6
nakajee Apr 28, 2026
5fa9df7
Added missing F8/B8 + B6
nakajee Apr 28, 2026
12e62d4
Fix srdShiftLeft MXSA/B and TailLoop MXSB code for gfx1250
nakajee Apr 28, 2026
ef26d97
Fix data initialization for mixed precision
bnemanich Apr 28, 2026
b197127
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 28, 2026
20905d2
Change xfail-gfx1250 to skip-gfx1250 for new gfx950 mx test cases
nakajee Apr 28, 2026
a9d00e2
Removed duplicated code
nakajee Apr 29, 2026
3e7e2a8
Updated year in copyright header
nakajee Apr 29, 2026
ad38d55
Uncomment a valid gemm type
amd-chunxlin Apr 29, 2026
1f85a5c
Address cmake code review requests
bnemanich Apr 29, 2026
fedf5b0
Merge branch 'users/nakajee/gfx950_mx_rebase_merge' of github.com:ROC…
bnemanich Apr 29, 2026
089eafb
Fix rotating buffers for mxfp4
bnemanich Apr 29, 2026
2b2ee72
More cmake changes
bnemanich Apr 29, 2026
4181713
Address cmake review comments
bnemanich Apr 30, 2026
7c78dd5
Merge branch 'develop' of github.com:ROCm/rocm-libraries into users/n…
bnemanich Apr 30, 2026
fce0e0f
Fix build issue
bnemanich Apr 30, 2026
d8b5d3f
Fix init error
bnemanich Apr 30, 2026
3f303bd
Merge branch 'develop' of github.com:ROCm/rocm-libraries into users/n…
bnemanich Apr 30, 2026
1ea7df2
Merge branch 'develop' of github.com:ROCm/rocm-libraries into users/n…
bnemanich May 1, 2026
4e89c91
Reduce mxfp4 test time
bnemanich May 1, 2026
6256768
Merge branch 'develop' into users/nakajee/gfx950_mx_rebase_merge
bnemanich May 1, 2026
46 changes: 46 additions & 0 deletions projects/hipblaslt/CMakeLists.txt
@@ -88,6 +88,10 @@ if(HIPBLASLT_ENABLE_HOST)
option(HIPBLASLT_ENABLE_HIPBLAS_DIRECT "Use the hipblas header directly." OFF)
endif()

# mxDataGenerator is used by hipblaslt clients and tensilelite client; default OFF on Windows
cmake_dependent_option(HIPBLASLT_ENABLE_MXDATAGENERATOR "Use mxDataGenerator for MX format data generation (clients/tests)." ON "NOT WIN32" OFF)
message(STATUS "Enable mxDataGenerator for MX format data generation: ${HIPBLASLT_ENABLE_MXDATAGENERATOR}")

set(CMAKE_SKIP_BUILD_RPATH FALSE CACHE BOOL "Skip build RPATH")
set(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE CACHE BOOL "Build with install RPATH")
set(CMAKE_INSTALL_RPATH "$ORIGIN/../lib:$ORIGIN/../llvm/lib" CACHE STRING "Install RPATH")
@@ -255,6 +259,48 @@ if(NOT ROCM_LIBS_SUPERBUILD)
endif()
endif()

# INTERFACE library that owns the public hipblaslt API headers (the in-tree
# `library/include` subtree plus its build-tree counterpart, where
# `hipblaslt-export.h` and `hipblaslt-version.h` are generated). Routing the
# include directories through a target lets consumers pick them up via the
# build graph rather than via `target_include_directories(<tgt> BEFORE
# PRIVATE .../library/include ...)`. `hip::host` is exposed as an INTERFACE
# link because the in-tree headers `#include <hip/...>`.
add_library(hipblaslt-headers INTERFACE)
add_library(hipblaslt::headers ALIAS hipblaslt-headers)
target_include_directories(hipblaslt-headers
INTERFACE
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/library/include>
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}/library/include>
)
target_link_libraries(hipblaslt-headers INTERFACE hip::host)

# `hipblaslt-version.h` is consumed by `library/include/hipblaslt/hipblaslt.h`
# (only included when HIPBLASLT_ENABLE_HOST is ON). When HOST is ON, the
# canonical `configure_file` call in `library/include/CMakeLists.txt`
# regenerates the same file from the same template; this generation is
# unconditional only because the file is cheap to produce.
configure_file(
"${CMAKE_CURRENT_SOURCE_DIR}/library/include/hipblaslt-version.h.in"
"${CMAKE_CURRENT_BINARY_DIR}/library/include/hipblaslt/hipblaslt-version.h"
)

# `hipblaslt::mxdatagen` lives in clients/common (next to its source), and is
# only built when at least one consumer needs it. tensilelite/client and
# tensilelite/tests reach for it on the same condition, so we drive the
# subdir from here to keep the dependency wiring close to the gate that
# decides whether to build it at all.
set(_hipblaslt_mxdatagen_enabled FALSE)
if(HIPBLASLT_ENABLE_MXDATAGENERATOR
AND (HIPBLASLT_ENABLE_CLIENT
OR TENSILELITE_ENABLE_CLIENT
OR TENSILELITE_BUILD_TESTING))
set(_hipblaslt_mxdatagen_enabled TRUE)
endif()
if(_hipblaslt_mxdatagen_enabled)
add_subdirectory(clients/common)
endif()

add_subdirectory(tensilelite)

if(HIPBLASLT_ENABLE_HOST)
26 changes: 16 additions & 10 deletions projects/hipblaslt/clients/CMakeLists.txt
@@ -29,19 +29,19 @@ target_link_libraries(hipblaslt-clients-common
tensilelite::tensilelite-host
)

if(HIPBLASLT_ENABLE_MXDATAGENERATOR)
# PUBLIC because public headers such as testing_matmul.hpp guard
# `<mxDataGen.hpp>` on `HIPBLASLT_ENABLE_MXDATAGENERATOR`, so downstream
# targets (hipblaslt-test, hipblaslt-bench, ...) need to see the macro,
# the include path, and the C++20 floor all the way through.
target_link_libraries(hipblaslt-clients-common PUBLIC hipblaslt::mxdatagen)
endif()

if(HIPBLASLT_ENABLE_ROCROLLER)
if(NOT ROCM_LIBS_SUPERBUILD)
if(HIPBLASLT_ENABLE_THEROCK)
find_package(mxDataGenerator REQUIRED)
else()
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../../shared/mxdatagenerator" "${CMAKE_CURRENT_BINARY_DIR}/mxdatagenerator")
endif()
endif()
target_compile_definitions(hipblaslt-clients-common PRIVATE HIPBLASLT_USE_ROCROLLER)
target_link_libraries(hipblaslt-clients-common PRIVATE roc::mxDataGenerator)
target_compile_features(hipblaslt-clients-common PRIVATE cxx_std_20)
endif()


if(HIPBLASLT_ENABLE_ASAN)
hipblaslt_target_configure_sanitizers(hipblaslt-clients-common PUBLIC)
endif()
@@ -59,7 +59,13 @@ if(HIPBLASLT_ENABLE_OPENMP)
target_link_libraries(hipblaslt-bench PRIVATE OpenMP::OpenMP_CXX)
endif()

add_subdirectory(common)
# `clients/common/CMakeLists.txt` itself only declares the mxdatagen helper
# (which the project root already added when needed); the `include` and
# `src` subdirectories populate hipblaslt-clients-common's sources, so they
# are added directly here, avoiding a second `add_subdirectory(common)` that
# would clash with the root-level one.
add_subdirectory(common/include)
add_subdirectory(common/src)
add_subdirectory(bench)

add_executable(hipblaslt-api-overhead "${CMAKE_CURRENT_SOURCE_DIR}/bench/src/client_api_overhead.cpp")
10 changes: 5 additions & 5 deletions projects/hipblaslt/clients/bench/src/client.cpp
@@ -2,7 +2,7 @@
*
* MIT License
*
* Copyright (C) 2022-2025 Advanced Micro Devices, Inc.
* Copyright (C) 2022-2026 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -969,16 +969,16 @@ try
if(isBlockScaling(arg.scaleA))
{
if(arg.a_type != HIP_R_8F_E4M3 && arg.a_type != HIP_R_8F_E5M2
&& arg.a_type != HIP_R_4F_E2M1_EXT && arg.a_type != HIP_R_6F_E2M3_EXT
&& arg.a_type != HIP_R_6F_E3M2_EXT)
&& arg.a_type != HIP_R_4F_E2M1 && arg.a_type != HIP_R_6F_E2M3
&& arg.a_type != HIP_R_6F_E3M2)
throw std::invalid_argument("Invalid a_type for block scaling format: "s
+ hip_datatype_to_string(arg.a_type));
}
if(isBlockScaling(arg.scaleB))
{
if(arg.b_type != HIP_R_8F_E4M3 && arg.b_type != HIP_R_8F_E5M2
&& arg.b_type != HIP_R_4F_E2M1_EXT && arg.b_type != HIP_R_6F_E2M3_EXT
&& arg.b_type != HIP_R_6F_E3M2_EXT)
&& arg.b_type != HIP_R_4F_E2M1 && arg.b_type != HIP_R_6F_E2M3
&& arg.b_type != HIP_R_6F_E3M2)
throw std::invalid_argument("Invalid b_type for block scaling format: "s
+ hip_datatype_to_string(arg.b_type));
}
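The two checks above apply the same rule to operands A and B: when block scaling is on, only the FP8, FP6, BF6 and FP4 element types are accepted. A minimal sketch of that rule factored into one helper follows. It assumes a stand-in enum `ElemType` in place of `hipDataType` so that it compiles without HIP headers; `isBlockScalableType` and `checkBlockScaledOperand` are hypothetical names for illustration and are not part of hipBLASLt.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for hipDataType. Enumerator names echo the diff;
// the underlying values are arbitrary.
enum class ElemType { F8_E4M3, F8_E5M2, F4_E2M1, F6_E2M3, F6_E3M2, F16, F32 };

// True for the element types the diff accepts when block scaling is enabled.
constexpr bool isBlockScalableType(ElemType t)
{
    switch(t)
    {
    case ElemType::F8_E4M3:
    case ElemType::F8_E5M2:
    case ElemType::F4_E2M1:
    case ElemType::F6_E2M3:
    case ElemType::F6_E3M2:
        return true;
    default:
        return false;
    }
}

// Throws std::invalid_argument when a block-scaled operand has an
// unsupported element type; does nothing otherwise.
inline void checkBlockScaledOperand(bool blockScaled, ElemType t, const std::string& name)
{
    if(blockScaled && !isBlockScalableType(t))
        throw std::invalid_argument("Invalid " + name + " for block scaling format");
}
```

With such a helper, the A and B checks would each reduce to a single call, for example `checkBlockScaledOperand(scaledA, typeA, "a_type")`, where `scaledA` and `typeA` are placeholder variables.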
80 changes: 78 additions & 2 deletions projects/hipblaslt/clients/common/CMakeLists.txt
@@ -1,2 +1,78 @@
add_subdirectory(include)
add_subdirectory(src)
# Copyright Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

# Owns the `hipblaslt::mxdatagen` STATIC helper. The source and header live
# under this directory, so the target is defined here rather than at the root.
#
# This file is added by the project root via `add_subdirectory(clients/common)`
# only when:
# * HIPBLASLT_ENABLE_MXDATAGENERATOR is ON, and
# * at least one consumer subtree is enabled (HIPBLASLT_ENABLE_CLIENT,
# TENSILELITE_ENABLE_CLIENT, or TENSILELITE_BUILD_TESTING).
#
# Consumers should `target_link_libraries(<tgt> PUBLIC hipblaslt::mxdatagen)`
# (PRIVATE if no further consumer needs the macro). Linking propagates:
# * the include path for `<mxDataGen.hpp>`
# * the link to `roc::mxDataGenerator`
# * the C++20 requirement
# * the `HIPBLASLT_ENABLE_MXDATAGENERATOR` macro
# * `hipblaslt::headers`, which carries the in-tree `<hipblaslt/...>` API
# headers (the source pulls in `<hipblaslt/hipblaslt-export.h>` and
# `<hipblaslt/hipblaslt-types.h>`).

if(NOT ROCM_LIBS_SUPERBUILD)
if(HIPBLASLT_ENABLE_THEROCK)
find_package(mxDataGenerator REQUIRED)
else()
# `${PROJECT_SOURCE_DIR}/../../shared/mxdatagenerator` resolves to the
# sibling `shared/mxdatagenerator` subtree of the project. The binary
# dir is anchored under the hipblaslt project's build tree to match
# the upstream subdirectory's existing layout assumptions.
add_subdirectory(
"${PROJECT_SOURCE_DIR}/../../shared/mxdatagenerator"
"${PROJECT_BINARY_DIR}/mxdatagenerator"
)
endif()
endif()

add_library(hipblaslt-mxdatagen STATIC
"${CMAKE_CURRENT_SOURCE_DIR}/src/mxDataGen.cpp"
)
add_library(hipblaslt::mxdatagen ALIAS hipblaslt-mxdatagen)

target_include_directories(hipblaslt-mxdatagen
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
)

target_link_libraries(hipblaslt-mxdatagen
PUBLIC
hipblaslt::headers
roc::mxDataGenerator
PRIVATE
# `-x hip` (via hip::device's INTERFACE_COMPILE_OPTIONS) makes the
# translation unit compile through hipcc/clang. mxDataGen.cpp itself
# does not launch kernels, but the modern ROCm bf16 header it pulls
# in transitively uses clang-only builtins (e.g. __builtin_elementwise_rint).
hip::device
)

target_compile_features(hipblaslt-mxdatagen PUBLIC cxx_std_20)
target_compile_definitions(hipblaslt-mxdatagen PUBLIC HIPBLASLT_ENABLE_MXDATAGENERATOR)
set_target_properties(hipblaslt-mxdatagen
PROPERTIES POSITION_INDEPENDENT_CODE ON
)

# `<hipblaslt/hipblaslt-export.h>` is generated by
# `library/include/CMakeLists.txt`'s `generate_export_header(hipblaslt ...)`
# call when HIPBLASLT_ENABLE_HOST=ON. In a tensilelite-only build that target
# does not exist, so produce the same file from this helper instead. Both
# code paths emit the file to the same location so `hipblaslt::headers`'s
# `${CMAKE_BINARY_DIR}/library/include` interface dir resolves it either way.
if(NOT HIPBLASLT_ENABLE_HOST)
include(GenerateExportHeader)
generate_export_header(hipblaslt-mxdatagen
BASE_NAME hipblaslt
EXPORT_FILE_NAME "${PROJECT_BINARY_DIR}/library/include/hipblaslt/hipblaslt-export.h"
)
endif()
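The header comment in this file says that linking `hipblaslt::mxdatagen` propagates both the include path for `<mxDataGen.hpp>` and the `HIPBLASLT_ENABLE_MXDATAGENERATOR` macro. A minimal sketch of how a consumer translation unit might guard on that macro is shown below. `kMxDataGenEnabled` and `mxDataGenStatus` are hypothetical names used only for illustration; the sketch compiles with or without the macro defined.

```cpp
#include <cassert>
#include <string>

#ifdef HIPBLASLT_ENABLE_MXDATAGENERATOR
// Only reachable when the target links hipblaslt::mxdatagen, which supplies
// both this macro and the include path for the header.
#include <mxDataGen.hpp>
constexpr bool kMxDataGenEnabled = true;
#else
constexpr bool kMxDataGenEnabled = false;
#endif

// Hypothetical helper: reports which branch of the guard was compiled.
inline std::string mxDataGenStatus()
{
    return kMxDataGenEnabled ? "mxDataGenerator enabled" : "mxDataGenerator disabled";
}
```

Because the macro arrives through the target's compile definitions, code that guards this way needs no extra `-D` flag of its own.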
@@ -2,7 +2,7 @@
*
* MIT License
*
* Copyright (C) 2022-2025 Advanced Micro Devices, Inc.
* Copyright (C) 2022-2026 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -86,8 +86,8 @@ inline std::size_t realDataTypeSize(hipDataType dtype)
{
// These types were not defined in older versions of ROCm, so need to be handled specially here.
auto const dtype_int = static_cast<int>(dtype);
if(dtype_int == HIP_R_4F_E2M1_EXT || dtype_int == HIP_R_6F_E2M3_EXT
|| dtype_int == HIP_R_6F_E3M2_EXT)
if(dtype_int == HIP_R_4F_E2M1 || dtype_int == HIP_R_6F_E2M3
|| dtype_int == HIP_R_6F_E3M2)
{
return 1;
}
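The hunk above compares the type as an `int` and reports a size of 1 for the sub-byte FP4, FP6 and BF6 types. A minimal sketch of that pattern follows. It assumes a stand-in enum `StandInType` whose values are arbitrary and are NOT the real `hipDataType` values; the `switch` for the wider types is also an assumption, since the rest of `realDataTypeSize` is not shown in the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for hipDataType. Values are arbitrary.
enum StandInType : int
{
    SI_R_32F     = 0,
    SI_R_16F     = 1,
    SI_R_8F_E4M3 = 2,
    SI_R_4F_E2M1 = 100,
    SI_R_6F_E2M3 = 101,
    SI_R_6F_E3M2 = 102
};

// Sketch of the size lookup: sub-byte types are checked first through an
// integer comparison and report 1, as in the hunk above.
inline std::size_t realDataTypeSizeSketch(StandInType dtype)
{
    auto const dtype_int = static_cast<int>(dtype);
    if(dtype_int == SI_R_4F_E2M1 || dtype_int == SI_R_6F_E2M3 || dtype_int == SI_R_6F_E3M2)
    {
        return 1;
    }
    // Assumed handling for the remaining types (not taken from the diff).
    switch(dtype)
    {
    case SI_R_32F:
        return 4;
    case SI_R_16F:
        return 2;
    case SI_R_8F_E4M3:
        return 1;
    default:
        return 0;
    }
}
```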
50 changes: 25 additions & 25 deletions projects/hipblaslt/clients/common/include/hipblaslt_init.hpp
@@ -2,7 +2,7 @@
*
* MIT License
*
* Copyright (C) 2022-2025 Advanced Micro Devices, Inc.
* Copyright (C) 2022-2026 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -153,13 +153,13 @@ inline void hipblaslt_init(void* A,
hipblaslt_init<hipblasLtInt8>(
static_cast<hipblasLtInt8*>(A), M, N, lda, stride, batch_count);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init not supports FP4" << std::endl;
break;
default:
@@ -263,13 +263,13 @@ inline void hipblaslt_init_sin(void* A,
hipblaslt_init_sin<hipblasLtInt8>(
static_cast<hipblasLtInt8*>(A), M, N, lda, stride, batch_count);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_sin not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_sin not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_sin not supports FP4" << std::endl;
break;
default:
@@ -353,13 +353,13 @@ inline void hipblaslt_init_alternating_sign(void* A,
hipblaslt_init_alternating_sign<hipblasLtInt8>(
static_cast<hipblasLtInt8*>(A), M, N, lda, stride, batch_count);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_alternating_sign not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_alternating_sign not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_alternating_sign not supports FP4" << std::endl;
break;
default:
@@ -440,13 +440,13 @@ inline void hipblaslt_init_hpl_alternating_sign(void* A,
hipblaslt_init_hpl_alternating_sign<hipblasLtInt8>(
static_cast<hipblasLtInt8*>(A), M, N, lda, stride, batch_count);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_hpl_alternating_sign not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_hpl_alternating_sign not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_hpl_alternating_sign not supports FP4" << std::endl;
break;
default:
@@ -521,13 +521,13 @@ inline void hipblaslt_init_cos(void* A,
hipblaslt_init_cos<hipblasLtInt8>(
static_cast<hipblasLtInt8*>(A), M, N, lda, stride, batch_count);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_cos not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_cos not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_cos not supports FP4" << std::endl;
break;
default:
@@ -608,13 +608,13 @@ inline void hipblaslt_init_hpl(void* A,
hipblaslt_init_hpl<hipblasLtInt8>(
static_cast<hipblasLtInt8*>(A), M, N, lda, stride, batch_count);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_hpl not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_hpl not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_hpl not supports FP4" << std::endl;
break;
default:
@@ -678,13 +678,13 @@ inline void hipblaslt_init_nan(void* A, size_t N, hipDataType type)
case HIP_R_8I:
hipblaslt_init_nan<hipblasLtInt8>(static_cast<hipblasLtInt8*>(A), N);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_nan not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_nan not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_nan not supports FP4" << std::endl;
break;
default:
@@ -733,13 +733,13 @@ inline void hipblaslt_init_nan(void* A, size_t start_offset, size_t end_offset,
case HIP_R_8I:
hipblaslt_init_nan<hipblasLtInt8>(static_cast<hipblasLtInt8*>(A), start_offset, end_offset);
break;
case HIP_R_6F_E2M3_EXT:
case HIP_R_6F_E2M3:
hipblaslt_cerr << "hipblaslt_init_nan not supports FP6" << std::endl;
break;
case HIP_R_6F_E3M2_EXT:
case HIP_R_6F_E3M2:
hipblaslt_cerr << "hipblaslt_init_nan not supports BF6" << std::endl;
break;
case HIP_R_4F_E2M1_EXT:
case HIP_R_4F_E2M1:
hipblaslt_cerr << "hipblaslt_init_nan not supports FP4" << std::endl;
break;
default: