[hipBLASLt] Enable MX data generation for Tensile host and support calling Tensile MX kernel #4599

Merged
amd-chunxlin merged 4 commits into gfx950_mx_rebase from users/chunxlin/mxGen on Feb 20, 2026

amd-chunxlin (Contributor) commented Feb 16, 2026

Motivation

This PR enables using mxDataGenerator when Tensile is the host and supports calling FP4 kernels generated from Tensile.

Technical Details

  • Add an FP4 library (YAML) generated by Tensile under the GridBased category: YAML
  • Remove the macros so that mxDataGenerator is used regardless of which host is selected. The default C++ standard is now C++20, which mxDataGenerator requires.
  • Support calling Tensile FP4 solutions
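
For readers unfamiliar with the data being generated, the following is a minimal sketch of how one MX FP4 element could be decoded, assuming the OCP Microscaling layout implied by the name Block_32_UE8M0: 4-bit E2M1 elements that share one UE8M0 scale per block of 32. The function names are hypothetical and this is not the mxDataGenerator API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// E2M1 (FP4) magnitudes indexed by the low three bits; bit 3 is the sign bit.
constexpr std::array<float, 8> kE2M1Magnitudes
    = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Hypothetical helper: decode one 4-bit E2M1 code to float.
inline float decodeE2M1(std::uint8_t nibble)
{
    const float magnitude = kE2M1Magnitudes[nibble & 0x7];
    return (nibble & 0x8) ? -magnitude : magnitude;
}

// Hypothetical helper: decode a UE8M0 scale, i.e. an unsigned 8-bit exponent
// with bias 127 and no mantissa. The code 0xFF is reserved for NaN.
inline float decodeUE8M0(std::uint8_t exponent)
{
    if(exponent == 0xFF)
        return std::nanf("");
    return std::ldexp(1.0f, static_cast<int>(exponent) - 127);
}

// Value of one element: its E2M1 value times the scale shared by its block.
inline float decodeMxFp4(std::uint8_t nibble, std::uint8_t blockScale)
{
    return decodeE2M1(nibble) * decodeUE8M0(blockScale);
}
```

In this sketch a single scale byte covers 32 consecutive elements along the K dimension, so a generator has to produce the packed 4-bit elements and the scale array together.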

Test Plan

Use a CMake preset build with the rocRoller host off (i.e., Tensile as the host), the GPU target set to gfx950, and -DBUILD_TESTING:BOOL=OFF (this turns off the tensileLite tests, which otherwise error out during the build).

  • Use hipblaslt-test
    ./clients/hipblaslt-test --gtest_filter=*matmul_tensile_fp4*

  • Use hipblaslt-bench
    ./clients/hipblaslt-bench --iters 0 --cold_iters 0 --transA T --transB N --a_type f4_r --b_type f4_r --c_type f32_r --d_type f32_r -m 256 -n 256 -k 256 --alpha 2.1 --beta 0.7 --scaleA 3 --scaleB 3 --scale_type f32_r --verify

Test Result

Submission Checklist

amd-chunxlin marked this pull request as ready for review February 17, 2026 19:56
amd-chunxlin requested review from a team as code owners February 17, 2026 19:56
tensileProblem.setSwizzleTensorA(prob.swizzleA);
tensileProblem.setSwizzleTensorB(prob.swizzleB);

if(prob.scaleAType == RocblasltContractionProblem::ScalingFormat::Block_32_UE8M0)
Contributor

This should also handle Block_32_UE8M0_32_8_EXT

Contributor Author

Added Block_32_UE8M0_32_8_EXT, thanks!
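
The agreed change can be sketched as a single predicate that accepts both block-32 UE8M0 formats. This is only an illustration: the enum below is a hypothetical stand-in for RocblasltContractionProblem::ScalingFormat, the None enumerator and the helper name are assumptions, and only the two Block_32 names come from this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for RocblasltContractionProblem::ScalingFormat.
// Only the two Block_32_* names are taken from the review thread.
enum class ScalingFormat
{
    None, // assumed placeholder for "no block scaling"
    Block_32_UE8M0,
    Block_32_UE8M0_32_8_EXT
};

// Hypothetical helper: true for either block-32 UE8M0 scaling format,
// so call sites do not have to list both enumerators.
constexpr bool isBlock32UE8M0(ScalingFormat format)
{
    return format == ScalingFormat::Block_32_UE8M0
           || format == ScalingFormat::Block_32_UE8M0_32_8_EXT;
}
```

A condition such as the one under review could then read `if(isBlock32UE8M0(prob.scaleAType))`, which keeps the two formats handled in one place if a further variant is added later.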

// NOTE: an assumption here is A & B must be both MX data types or non-MX data types.
// Mixing is not supported.
if(!problemType.useScaleAB.empty() or
(problemType.mxBlockA == 32 && problemType.mxBlockB == 32)) //kernel input data
Contributor

Maybe should check that mxBlockA != 0 instead.

Contributor Author

Changed to !=0 instead of ==32, thanks!
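
A minimal sketch of the revised condition follows, assuming that a block size of 0 means the operand is not an MX type and that both operands were switched to the nonzero test. ProblemTypeSketch and hasScaledInputs are hypothetical names; the field names follow the snippet above, but their types are guesses.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down stand-in for the problem type in the snippet.
struct ProblemTypeSketch
{
    std::string useScaleAB; // assumed: empty when A/B scaling is not requested
    int mxBlockA = 0; // assumed: 0 means A is not an MX data type
    int mxBlockB = 0; // assumed: 0 means B is not an MX data type
};

// True when the kernel takes scaled inputs: either scaleAB is requested, or
// A and B are both MX types (mixing MX and non-MX operands is not supported).
inline bool hasScaledInputs(const ProblemTypeSketch& problemType)
{
    return !problemType.useScaleAB.empty()
           || (problemType.mxBlockA != 0 && problemType.mxBlockB != 0);
}
```

Testing against 0 rather than 32 means the check would keep working if another MX block size were introduced, which appears to be the point of the suggestion.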

amd-chunxlin merged commit e0a7991 into gfx950_mx_rebase Feb 20, 2026
13 of 22 checks passed
amd-chunxlin deleted the users/chunxlin/mxGen branch February 20, 2026 19:36
amd-chunxlin restored the users/chunxlin/mxGen branch March 2, 2026 20:57
2 participants