-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] forward GEMM workloads to GEMV when one argument is actually a vector #4708
Conversation
obviously I don't really intend to kick out the recent Loongson patch here - this first draft was thrown together off-grid in an outdated fork |
CodSpeed Performance ReportMerging #4708 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for posting this WIP PR. I was able to confirm the failures CI sees:
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 0.00000 -0.252747
******* SGEMM FAILED ON CALL NUMBER:
3403: SGEMM ('N','N', 1, 1, 0, 0.0, A, 2, B, 1, 0.0, C, 2).
I haven't had time to track it down, but I appreciate the starting point!
char *NT=(char*)malloc(2*sizeof(char)); | ||
if (transb&1)strcpy(NT,"T"); | ||
else NT="N"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can skip the malloc and go straight to the NT="N"
or NT="T"
assignment yes?
Actually, we don't need null termination so it should just be char NT; NT='N'; ... &NT
.
// fprintf(stderr,"G E M V ! ! ! lda=%d ldb=%d ldc=%d\n",args.lda,args.ldb,args.ldc); | ||
GEMV(NT, &args.m ,&args.k, args.alpha, args.a, &args.lda, args.b, &args.n, args.beta, args.c, &args.n); | ||
//SUBROUTINE SGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this a little more. For cases when n = 1, seems like this should be pretty straightforward, and parameters should just directly translate from GEMM to GEMV.
For cases when m = 1 I think there is a challenge. in gemm (C:=AB) A is the vector, but GEMV (y:=Bx). so I think we need to re-arrange the equation by transposing (or, un-transposing) B.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lrbison @martin-frbg As a general word of caution, be careful with introducing any extraneous transposes for gemm variants dealing with complex numbers (cgemm
and zgemm
).
For complex vectors and matrices, the conjugate transpose is typically used, where the sign of the imaginary components is flipped. As an example, treating a column vector as a row vector (and vice versa) for complex numbers may not be simply a matter of interpreting the dimensions differently.
https://en.wikipedia.org/wiki/Conjugate_transpose
To reduce risk, I'd suggest to completely leave out cgemm
and zgemm
from the proposed optimisation path (ie. forwarding gemm calls to gemv).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@conradsnicta right, likely the performance gains would be much smaller for the complex cases anyway do to the extra coding necessary. So far I haven't even found time to get back to this issue.
@martin-frbg , |
Yes in principle, but I am not convinced we actually have to transform the storage of A for that. (Note that the rough draft I posted here may not even compile. I need to update it and flesh it out when I have time) |
else strcpy(NT,"N"); | ||
#endif | ||
// fprintf(stderr,"G E M V ! ! ! lda=%d ldb=%d ldc=%d\n",args.lda,args.ldb,args.ldc); | ||
GEMV(NT, &args.m ,&args.k, args.alpha, args.a, &args.lda, args.b, &args.n, args.beta, args.c, &args.n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes an error when INTERFACE64=0. The parameters passed by address need to be blasint
and not blaslong
. Same for the other call to GEMV
Thanks... I have an equally unfinished newer version lying around somewhere but got caught up in other things. Let me get the fixups for the SCAL fallout out of the way... but if anybody beats me to it on this here topic it's fine of course. (probably need to remove this from the 0.3.28 milestone anyway so that the release does not get delayed all summer) |
@martin-frbg This is an important PR since some project cases have a large portion of GEMM in which N=1. In these cases we are spending significant time packing buffer(s) which is not necessarily needed if GEMV was called instead. |
I am aware of that, but this has been an important issue for roughly 20 years (i.e. since inception of GotoBLAS), last discussed here sometime in 2015/16 IIRC. We're almost 2 weeks past the tentative release date for 0.3.28, it bundles an excessive number of changes already, and I still need to come up with assembly code fixes for the SCAL issue in a number of architectures (where assembly isn't my strongest skill anyway). |
@martin-frbg I took a look into this in #4814 |
closing as superseded by #4814 |
fixes #4580 and fixes #528