
optimize 3264x3072x1536xNxN #2596

Merged
jfactory07 merged 6 commits into hipblaslt_common_cms_dev from users/jzhou/3264x3072x1536xNxN-3 on Nov 19, 2025
Conversation

Contributor

@jfactory07 jfactory07 commented Nov 11, 2025

Motivation

optimize 3264x3072x1536xNxN

Technical Details

Change the macro tile (MT) to 192x256x64 so that this size can leverage the current CMS.
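
As a rough sanity check on the tile choice, the sketch below counts how many 192x256 tiles cover the output and how many depth-64 steps the K loop takes. It assumes `3264x3072x1536` is read as M x N x K and that the three MT numbers are the tile extents along M, N and K; `tile_grid` is a hypothetical helper written for this comment, not part of hipBLASLt or Tensile.

```python
def tile_grid(m, n, k, mt_m, mt_n, mt_k):
    """Return (tiles along M, tiles along N, K-loop steps, padded fraction).

    Hypothetical helper for illustration only; it models plain tiling
    arithmetic, not the actual kernel selection or scheduling logic.
    """
    tiles_m = -(-m // mt_m)  # ceiling division
    tiles_n = -(-n // mt_n)
    k_steps = -(-k // mt_k)
    # Fraction of the tiled M x N area that lies outside the real output.
    covered = tiles_m * mt_m * tiles_n * mt_n
    padded = 1.0 - (m * n) / covered
    return tiles_m, tiles_n, k_steps, padded


# Assumption: the problem size is M=3264, N=3072, K=1536.
print(tile_grid(3264, 3072, 1536, 192, 256, 64))  # (17, 12, 24, 0.0)
```

Under these assumptions the tile divides the problem evenly in all three dimensions (17 x 12 output tiles, 24 K steps, no padding). This only illustrates the arithmetic; it does not by itself explain the measured uplift.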

Test Result

Tested on 3264x3072x1536xNxN: 12% uplift.

Submission Checklist

@jfactory07 jfactory07 requested a review from a team as a code owner November 11, 2025 02:48
@jfactory07 jfactory07 added the gfx950 run CI on gfx950 label Nov 11, 2025
@jfactory07 jfactory07 mentioned this pull request Nov 11, 2025
1 task
@jfactory07 jfactory07 mentioned this pull request Nov 11, 2025
1 task

math-ci Bot commented Nov 11, 2025

perfci run on commit 0b26925

math-ci run

@jfactory07 jfactory07 merged commit a3e2446 into hipblaslt_common_cms_dev Nov 19, 2025
6 checks passed
@jfactory07 jfactory07 deleted the users/jzhou/3264x3072x1536xNxN-3 branch November 19, 2025 08:52
minsukim-amd pushed a commit that referenced this pull request Nov 25, 2025
b-shi pushed a commit that referenced this pull request Dec 12, 2025
ammallya pushed a commit that referenced this pull request Feb 3, 2026
* refactor moe_sorting ctests to use gtest framework

* Refactor ctests for smoothquant to gtests

* fix clang format to use version 18

* Print local_eid in MOE sorting gtests

* Remove extra space in smoothquant output

[ROCm/composable_kernel commit: 70dce4e]