Conversation

@slojosic-amd
Contributor

No description provided.

@github-actions github-actions bot added documentation Improvements or additions to documentation Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Mar 13, 2025
@fjankovi

CC: @powderluv

@slojosic-amd
Contributor Author

@JohannesGaessler Could you please update the labels? I don't have the correct permissions for that:

GraphQL: slojosic-amd does not have the correct permissions to execute AddLabelsToLabelable (addLabelsToLabelable)

      CUBLAS_CHECK(cublasSetStream(ctx.cublas_handle(id), stream));

-     if (GGML_CUDA_CC_IS_CDNA(compute_capability)) {
+     if (GGML_CUDA_CC_IS_CDNA(compute_capability) || GGML_CUDA_CC_IS_RDNA4(compute_capability)) {
Collaborator


If V_WMMA_F32_16X16X16_F16 does better here than V_WMMA_F16_16X16X16_F16 on RDNA4, it stands to reason that it does on RDNA3 too.

Contributor Author


V_WMMA_F32_16X16X16_F16 does better on RDNA4 because hipBLASLt supports it, and hipBLASLt is the default rocBLAS backend for non-batched and strided-batched GEMMs on gfx12. However, Tensile is the default backend for gfx11, and perf numbers are worse with V_WMMA_F32_16X16X16_F16 on gfx11.

Collaborator

@IMbackK IMbackK Mar 26, 2025


Is there a plan to fix this rather arbitrary limitation on gfx11 in rocblas/hipblaslt?

Contributor


Wonder if that's worth an issue report?

@codeliger

Any progress on this PR?

@slojosic-amd
Contributor Author

> Any progress on this PR?

Sorry, I was on sick leave ...

@slojosic-amd slojosic-amd requested a review from IMbackK March 26, 2025 17:58
      CUBLAS_CHECK(cublasSetStream(ctx.cublas_handle(id), stream));

      if (GGML_CUDA_CC_IS_CDNA(cc)) {
+     const int compute_capability = ggml_cuda_info().devices[ctx.device].cc;
Collaborator


Do not repeat a value that is already available in the function.

Contributor Author


Done: 6b46213


@IMbackK IMbackK merged commit bd40678 into ggml-org:master Mar 26, 2025
48 checks passed
@thevishalagarwal
Contributor

Is this supported on Windows? How can I build for gfx1200 on Windows?

@jammm
Contributor

jammm commented Apr 12, 2025

> Is this supported on Windows? How can I build for gfx1200 on Windows?

It should theoretically compile fine on Windows. At least it did for RDNA3 with the HIP SDK a while ago.

