Conversation

@slojosic-amd
Contributor

No description provided.

@github-actions github-actions bot added documentation Improvements or additions to documentation Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Mar 13, 2025
@fjankovi

CC: @powderluv

@slojosic-amd
Contributor Author

@JohannesGaessler Could you please update the labels? I don't have the correct permissions for that:

GraphQL: slojosic-amd does not have the correct permissions to execute AddLabelsToLabelable (addLabelsToLabelable)

      CUBLAS_CHECK(cublasSetStream(ctx.cublas_handle(id), stream));

-     if (GGML_CUDA_CC_IS_CDNA(compute_capability)) {
+     if (GGML_CUDA_CC_IS_CDNA(compute_capability) || GGML_CUDA_CC_IS_RDNA4(compute_capability)) {
Collaborator


If V_WMMA_F32_16X16X16_F16 does better here than V_WMMA_F16_16X16X16_F16 on RDNA4, it stands to reason that it does on RDNA3 too.

Contributor Author


V_WMMA_F32_16X16X16_F16 does better on RDNA4 because hipBLASLt supports it, and hipBLASLt is the default rocBLAS backend for non-batched and strided-batched GEMMs on gfx12. However, Tensile is the default backend for gfx11, and perf numbers are worse with V_WMMA_F32_16X16X16_F16 on gfx11.

Collaborator

@IMbackK IMbackK Mar 26, 2025


Is there a plan to fix this rather arbitrary limitation on gfx11 in rocblas/hipblaslt?

Contributor


Wonder if that's worth an issue report?

@codeliger

Any progress on this PR?

@slojosic-amd
Contributor Author

> Any progress on this PR?

Sorry, I was on sick leave ...

@slojosic-amd slojosic-amd requested a review from IMbackK March 26, 2025 17:58
      CUBLAS_CHECK(cublasSetStream(ctx.cublas_handle(id), stream));

      if (GGML_CUDA_CC_IS_CDNA(cc)) {
+     const int compute_capability = ggml_cuda_info().devices[ctx.device].cc;
Collaborator


Do not repeat a value that is already available in the function.

Contributor Author


Done: 6b46213


@IMbackK IMbackK merged commit bd40678 into ggml-org:master Mar 26, 2025
48 checks passed
@thevishalagarwal
Contributor

Is this supported on Windows? How can I build for gfx1200 on Windows?

@jammm
Contributor

jammm commented Apr 12, 2025

> Is this supported on Windows? How can I build for gfx1200 on Windows?

It should theoretically compile fine on Windows. At least it did for RDNA3 with the HIP SDK a while ago.

