Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] forward GEMM workloads to GEMV when one argument is actually a vector #4708

Closed
wants to merge 1 commit into from

Conversation

martin-frbg
Copy link
Collaborator

fixes #4580 and fixes #528

@martin-frbg
Copy link
Collaborator Author

obviously I don't really intend to kick out the recent Loongson patch here - this first draft was thrown together off-grid in an outdated fork

Copy link

codspeed-hq bot commented May 20, 2024

CodSpeed Performance Report

Merging #4708 will not alter performance

Comparing martin-frbg:issue4580 (c2a9b19) with develop (700ea74)

Summary

✅ 16 untouched benchmarks

Copy link

@lrbison lrbison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for posting this WIP PR. I was able to confirm the failures CI sees:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1       0.00000         -0.252747
 ******* SGEMM  FAILED ON CALL NUMBER:
   3403: SGEMM ('N','N',  1,  1,  0, 0.0, A,  2, B,  1, 0.0, C,  2).

I haven't had time to track it down, but I appreciate the starting point!

Comment on lines +502 to +504
char *NT=(char*)malloc(2*sizeof(char));
if (transb&1)strcpy(NT,"T");
else NT="N";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip the malloc and go straight to the NT="N" or NT="T" assignment yes?
Actually, we don't need null termination so it should just be char NT; NT='N'; ... &NT.

Comment on lines +519 to +521
// fprintf(stderr,"G E M V ! ! ! lda=%d ldb=%d ldc=%d\n",args.lda,args.ldb,args.ldc);
GEMV(NT, &args.m ,&args.k, args.alpha, args.a, &args.lda, args.b, &args.n, args.beta, args.c, &args.n);
//SUBROUTINE SGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this a little more. For cases when n = 1, seems like this should be pretty straightforward, and parameters should just directly translate from GEMM to GEMV.

For cases when m = 1 I think there is a challenge. in gemm (C:=AB) A is the vector, but GEMV (y:=Bx). so I think we need to re-arrange the equation by transposing (or, un-transposing) B.

Copy link

@conradsnicta conradsnicta Jul 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lrbison @martin-frbg As a general word of caution, be careful with introducing any extraneous transposes for gemm variants dealing with complex numbers (cgemm and zgemm).

For complex vectors and matrices, the conjugate transpose is typically used, where the sign of the imaginary components is flipped. As an example, treating a column vector as a row vector (and vice versa) for complex numbers may not be simply a matter of interpreting the dimensions differently.
https://en.wikipedia.org/wiki/Conjugate_transpose

To reduce risk, I'd suggest to completely leave out cgemm and zgemm from the proposed optimisation path (ie. forwarding gemm calls to gemv).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@conradsnicta right, likely the performance gains would be much smaller for the complex cases anyway do to the extra coding necessary. So far I haven't even found time to get back to this issue.

@akote123
Copy link

@martin-frbg ,
Thank you for the PR. Just to be sure on sending data to GEMV: For example, when A is matrix 1xn and B is of nxk, then are we flattening A ( i.e to convert matrix to vector) to make it compatible with GEMV.

@martin-frbg
Copy link
Collaborator Author

@martin-frbg , Thank you for the PR. Just to be sure on sending data to GEMV: For example, when A is matrix 1xn and B is of nxk, then are we flattening A ( i.e to convert matrix to vector) to make it compatible with GEMV.

Yes in principle, but I am not convinced we actually have to transform the storage of A for that. (Note that the rough draft I posted here may not even compile. I need to update it and flesh it out when I have time)

else strcpy(NT,"N");
#endif
// fprintf(stderr,"G E M V ! ! ! lda=%d ldb=%d ldc=%d\n",args.lda,args.ldb,args.ldc);
GEMV(NT, &args.m ,&args.k, args.alpha, args.a, &args.lda, args.b, &args.n, args.beta, args.c, &args.n);
Copy link
Contributor

@ChipKerchner ChipKerchner Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes an error when INTERFACE64=0. The parameters passed by address need to be blasint and not blaslong. Same for the other call to GEMV

@martin-frbg
Copy link
Collaborator Author

Thanks... I have an equally unfinished newer version lying around somewhere but got caught up in other things. Let me get the fixups for the SCAL fallout out of the way... but if anybody beats me to it on this here topic it's fine of course. (probably need to remove this from the 0.3.28 milestone anyway so that the release does not get delayed all summer)

@ChipKerchner
Copy link
Contributor

@martin-frbg This is an important PR since some project cases have a large portion of GEMM in which N=1. In these cases we are spending significant time packing buffer(s) which is not necessarily needed if GEMV was called instead.

@martin-frbg
Copy link
Collaborator Author

I am aware of that, but this has been an important issue for roughly 20 years (i.e. since inception of GotoBLAS), last discussed here sometime in 2015/16 IIRC. We're almost 2 weeks past the tentative release date for 0.3.28, it bundles an excessive number of changes already, and I still need to come up with assembly code fixes for the SCAL issue in a number of architectures (where assembly isn't my strongest skill anyway).

@Mousius
Copy link
Contributor

Mousius commented Jul 23, 2024

@martin-frbg I took a look into this in #4814

@martin-frbg
Copy link
Collaborator Author

closing as superseded by #4814

@martin-frbg martin-frbg closed this Aug 3, 2024
@martin-frbg martin-frbg removed this from the 0.3.28 milestone Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants