pickup specialization microkernels for gfx950 3.8.0rc20250909 by dezhiAmd · Pull Request #2242 · nod-ai/amd-shark-ai

dezhiAmd · 2025-09-12T21:08:54Z

Pick up specialization microkernels for gfx950.
Refer to this IREE commit.

Test results on gfx950 show that passing the following option to iree-compile gives better performance:
--iree-hip-enable-tensor-ukernels

Signed-off-by: dezhliao <dezhi.liao@amd.com>
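The option can be passed to iree-compile from a Python build script. The sketch below assumes a hypothetical helper (`build_iree_compile_cmd`, `compile_module`) and placeholder file names. Only `--iree-hip-enable-tensor-ukernels` comes from this PR. The HIP device and gfx950 target flags are left to `extra_flags` because their exact spelling depends on the IREE release in use.

```python
import shlex
import subprocess
from typing import Sequence

# Flag reported in this PR to give better performance on gfx950.
TENSOR_UKERNEL_FLAG = "--iree-hip-enable-tensor-ukernels"


def build_iree_compile_cmd(
    mlir_path: str,
    vmfb_path: str,
    extra_flags: Sequence[str] = (),
    enable_tensor_ukernels: bool = True,
) -> list[str]:
    """Build an argv list for iree-compile (hypothetical helper).

    Target-selection flags (HIP device, gfx950 chip) are expected in
    extra_flags; they are not hard-coded here.
    """
    cmd = ["iree-compile", mlir_path, "-o", vmfb_path, *extra_flags]
    # Append the ukernel flag once, even if the caller already passed it.
    if enable_tensor_ukernels and TENSOR_UKERNEL_FLAG not in cmd:
        cmd.append(TENSOR_UKERNEL_FLAG)
    return cmd


def compile_module(cmd: list[str]) -> None:
    # Runs iree-compile; raises CalledProcessError on a non-zero exit.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without IREE.
    print(shlex.join(build_iree_compile_cmd("model.mlir", "model.vmfb")))
```

Keeping the flag behind a boolean makes it easy to compare runs with and without tensor ukernels when reproducing the gfx950 measurements.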


codecov-commenter · 2025-09-13T01:28:48Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@0ce9072).
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2242   +/-   ##
=======================================
  Coverage        ?   78.01%           
=======================================
  Files           ?      228           
  Lines           ?    22032           
  Branches        ?        0           
=======================================
  Hits            ?    17188           
  Misses          ?     4844           
  Partials        ?        0



rsuderman · 2025-09-16T18:28:29Z

            ),
        ),
-        False,
+        pytest.param(


We should not be bumping if this perplexity test is failing; this needs more details if it is going to be xfailed.

The related IREE issue is here.
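The review asks for an xfail that carries enough detail to be actionable. A minimal sketch with `pytest.param` and `pytest.mark.xfail`; the parameter name `flag`, the reason text and the stub test body are hypothetical, and the issue number is assumed to be iree-org/iree#22007, which the later reply refers to as the same issue. `strict=True` turns an unexpected pass into a failure, so the marker cannot silently outlive the upstream fix.

```python
import pytest

# Assumed upstream tracker for the perplexity failure (see lead-in).
IREE_ISSUE = "iree-org/iree#22007"

# Hypothetical reason text: says what fails, where it is tracked,
# and when the marker should be removed.
XFAIL_REASON = (
    f"Decode perplexity fails with IREE 3.8.0rc20250909; tracked in {IREE_ISSUE}. "
    "Remove this marker once the upstream fix is picked up."
)

# Only the failing case is wrapped; the other case still runs normally.
DECODE_PERPLEXITY_PARAMS = [
    True,
    pytest.param(
        False,
        marks=pytest.mark.xfail(reason=XFAIL_REASON, strict=True),
    ),
]


@pytest.mark.parametrize("flag", DECODE_PERPLEXITY_PARAMS)
def test_decode_perplexity(flag):
    # Illustrative stub: the real TestToyLlamaIree test body is not shown here.
    pytest.skip("illustrative stub")
```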

rsuderman · 2025-09-16T18:28:55Z

      fail-fast: false
      matrix:
        include:
-          - name: cpu


This shouldn't be removed; the CPU smoke test is still important. Only the batcher tests should be removed.

The same IREE issue, iree-org/iree#22007, breaks the CPU smoke test.

I am curious about the scenarios where compiling MLIR to a VMFB for a CPU target would be beneficial. From my understanding, AMD's strengths lie in GPU hardware, and AI inference workloads are typically GPU-accelerated, so I'm trying to better understand the rationale or use cases for targeting the CPU in this context.


dezhiAmd · 2025-09-17T21:57:58Z

Replacing this PR with #2205.

dezhiAmd added 2 commits September 12, 2025 14:07

test 3.8.0rc20250909

919fa63

Signed-off-by: dezhliao <dezhi.liao@amd.com>

writing a temporary xfail marker for the TestToyLlamaIree::testDecodePerplexity[False] tests

86b60c6

Signed-off-by: dezhliao <dezhi.liao@amd.com>

ROCm 6.2 index only provides up to torch2.5.1, up ROCm to 6.4

218e6f4

Signed-off-by: dezhliao <dezhliao@amd.com>

dezhiAmd mentioned this pull request Sep 16, 2025

Bump IREE requirement pins to 3.8.0rc20250923 #2205

Merged

dezhiAmd and others added 4 commits September 16, 2025 10:23

resolve conflict

a919fdf

Signed-off-by: dezhliao <dezhi.liao@amd.com>

remove smoke_test on cpu, remove direct_to_batcher_test on cpu

b1dfc31

Signed-off-by: dezhliao <dezhi.liao@amd.com>

reformat

768fc33

Signed-off-by: dezhliao <dezhliao@amd.com>

remove added files by accident

8258cf2

Signed-off-by: dezhliao <dezhliao@amd.com>

dezhiAmd changed the title from "test 3.8.0rc20250909" to "pickup specialization microkernels for gfx950 3.8.0rc20250909" Sep 16, 2025

dezhiAmd marked this pull request as ready for review September 16, 2025 17:47

dezhiAmd requested a review from rsuderman September 16, 2025 17:58

dezhiAmd enabled auto-merge (squash) September 16, 2025 17:58

rsuderman requested changes Sep 16, 2025


Add details about xfail

4fa46ae

Signed-off-by: dezhliao <dezhi.liao@amd.com>

dezhiAmd requested a review from rsuderman September 17, 2025 00:08

revert change to pytorch-rocm-requirements.txt

9bdf62d

Signed-off-by: dezhliao <dezhi.liao@amd.com>

dezhiAmd closed this Sep 17, 2025

auto-merge was automatically disabled September 17, 2025 21:57
Pull request was closed

dezhiAmd deleted the gfx950_ukernel branch September 17, 2025 21:58

dezhiAmd wants to merge 9 commits into nod-ai:main from dezhiAmd:gfx950_ukernel