pickup specialization microkernels for gfx950 3.8.0rc20250909 #2242
dezhiAmd wants to merge 9 commits into
Signed-off-by: dezhliao <dezhi.liao@amd.com>
…Perplexity[False] tests Signed-off-by: dezhliao <dezhi.liao@amd.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2242 +/- ##
=======================================
Coverage ? 78.01%
=======================================
Files ? 228
Lines ? 22032
Branches ? 0
=======================================
Hits ? 17188
Misses ? 4844
Partials ? 0
☔ View full report in Codecov by Sentry.
Signed-off-by: dezhliao <dezhliao@amd.com>
Signed-off-by: dezhliao <dezhi.liao@amd.com>
Signed-off-by: dezhliao <dezhi.liao@amd.com>
Signed-off-by: dezhliao <dezhliao@amd.com>
),
),
False,
pytest.param(
We should not be bumping if this perplexity test is failing; this needs more details if it is going to be xfailed.
fail-fast: false
matrix:
  include:
    - name: cpu
This shouldn't be removed; the smoke test on CPU is still important. Only the batcher tests make sense to remove.
The same IREE issue, iree-org/iree#22007, breaks the smoke test on CPU.
I am curious about the scenarios where compiling MLIR to a VMFB for a CPU target would be beneficial. From my understanding, AMD's strengths lie in GPU hardware, and AI inference workloads are typically GPU-accelerated, so I'm trying to better understand the rationale or use cases behind targeting the CPU in this context.
Signed-off-by: dezhliao <dezhi.liao@amd.com>
Signed-off-by: dezhliao <dezhi.liao@amd.com>
Replace this PR with #2205
Pick up specialization microkernels for gfx950.
Refer to this IREE commit.
Test results on gfx950 show that passing the following option to iree-compile gives better performance:
--iree-hip-enable-tensor-ukernels
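For illustration, here is a minimal Python sketch of how a test harness might assemble an iree-compile command line that includes this option. Only `--iree-hip-enable-tensor-ukernels` comes from this PR. The helper name `build_compile_cmd`, the file names, and the `--iree-hal-target-device=hip` and `--iree-hip-target=gfx950` target flags are assumptions made for the example and may not match how this repository actually invokes the compiler.

```python
import shlex


def build_compile_cmd(mlir_path, vmfb_path, enable_tensor_ukernels=True):
    """Return an iree-compile argument list for a gfx950 HIP target.

    Hypothetical helper: the target flags below are assumed for
    illustration, not taken from this PR.
    """
    cmd = [
        "iree-compile",
        mlir_path,
        "--iree-hal-target-device=hip",  # assumed target device flag
        "--iree-hip-target=gfx950",      # assumed GPU architecture flag
        "-o",
        vmfb_path,
    ]
    if enable_tensor_ukernels:
        # The option this PR reports as improving performance on gfx950.
        cmd.append("--iree-hip-enable-tensor-ukernels")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # on machines without IREE installed.
    print(shlex.join(build_compile_cmd("model.mlir", "model.vmfb")))
```

Keeping the option behind a boolean makes it easy to compile the same module with and without tensor ukernels and compare the two builds on gfx950.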