[WIP] forward GEMM workloads to GEMV when one argument is actually a vector #4708

martin-frbg · 2024-05-20T20:41:43Z

fixes #4580 and fixes #528

martin-frbg · 2024-05-20T20:46:32Z

obviously I don't really intend to kick out the recent Loongson patch here - this first draft was thrown together off-grid in an outdated fork

codspeed-hq · 2024-05-20T21:02:37Z

CodSpeed Performance Report

Merging #4708 will not alter performance

Comparing martin-frbg:issue4580 (c2a9b19) with develop (700ea74)

Summary

✅ 16 untouched benchmarks

lrbison

Thank you for posting this WIP PR. I was able to confirm the failures CI sees:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1       0.00000         -0.252747
 ******* SGEMM  FAILED ON CALL NUMBER:
   3403: SGEMM ('N','N',  1,  1,  0, 0.0, A,  2, B,  1, 0.0, C,  2).

I haven't had time to track it down, but I appreciate the starting point!
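One thing that may be worth checking (it is not confirmed anywhere in this thread) is the k = 0 edge case in the failing call. In reference BLAS, GEMM with K = 0 still applies beta to C unless beta is 1, whereas GEMV returns immediately when either matrix dimension is 0. The sketch below uses toy routines (toy_gemm and toy_gemv are hypothetical names, not OpenBLAS functions) that copy only those quick-return rules, assuming OpenBLAS follows the reference semantics here.

```c
#include <stddef.h>

/* Toy column-major GEMM, C := alpha*A*B + beta*C, no transposes.
 * Quick return as in reference BLAS: k == 0 alone does not skip the
 * beta scaling of C. */
void toy_gemm(int m, int n, int k, float alpha, const float *a, int lda,
              const float *b, int ldb, float beta, float *c, int ldc)
{
    if (m == 0 || n == 0 || ((alpha == 0.0f || k == 0) && beta == 1.0f))
        return;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float *cij = &c[i + (size_t)j * ldc];
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            /* beta == 0 overwrites C instead of multiplying it */
            *cij = alpha * acc + (beta == 0.0f ? 0.0f : beta * *cij);
        }
}

/* Toy GEMV, y := alpha*A*x + beta*y, no transpose.
 * Quick return as in reference BLAS: rows == 0 or cols == 0 leaves y
 * untouched. */
void toy_gemv(int rows, int cols, float alpha, const float *a, int lda,
              const float *x, int incx, float beta, float *y, int incy)
{
    if (rows == 0 || cols == 0 || (alpha == 0.0f && beta == 1.0f))
        return;
    for (int i = 0; i < rows; i++) {
        float *yi = &y[(size_t)i * incy];
        float acc = 0.0f;
        for (int p = 0; p < cols; p++)
            acc += a[i + (size_t)p * lda] * x[(size_t)p * incx];
        *yi = alpha * acc + (beta == 0.0f ? 0.0f : beta * *yi);
    }
}

/* Same shape as the failing call: m = n = 1, k = 0, alpha = beta = 0. */
float c_after_gemm(float c_initial)
{
    float a[2] = {0}, b[1] = {0}, c[2] = {c_initial, 0};
    toy_gemm(1, 1, 0, 0.0f, a, 2, b, 1, 0.0f, c, 2);
    return c[0];
}

/* The same call forwarded naively to GEMV with cols = k = 0. */
float c_after_forwarded_gemv(float c_initial)
{
    float a[2] = {0}, b[1] = {0}, c[2] = {c_initial, 0};
    toy_gemv(1, 0, 0.0f, a, 2, b, 1, 0.0f, c, 1);
    return c[0];
}
```

In this toy model the GEMM path zeroes C while the forwarded GEMV path leaves the stale value in place, which has the same shape as the reported mismatch. If that is what happens in the draft, excluding k = 0 from the forwarding condition would be one way to avoid it.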

lrbison · 2024-05-21T13:44:23Z

interface/gemm.c

+	  char *NT=(char*)malloc(2*sizeof(char));
+	  if (transb&1)strcpy(NT,"T");
+	  else NT="N"; 


I think we can skip the malloc and go straight to the NT="N" or NT="T" assignment, yes?
Actually, we don't need null termination, so it should just be char NT; NT='N'; ... &NT.
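A minimal sketch of that suggestion follows. demo_is_transposed is a hypothetical stand-in for a Fortran-style BLAS entry point, on the assumption that the callee reads only the first character of TRANS. Note also that in the draft, NT="N" after the malloc replaces the pointer, so the allocation can no longer be freed.

```c
/* Hypothetical stand-in for a by-reference BLAS interface: it receives
 * TRANS by address and looks at one character only. */
int demo_is_transposed(const char *trans)
{
    return *trans == 'T' || *trans == 't';
}

/* A single char on the stack is enough: no malloc, no strcpy, and no
 * terminating NUL, because the callee never scans for one. */
int demo_forward(int transb)
{
    char NT = (transb & 1) ? 'T' : 'N';
    return demo_is_transposed(&NT);
}
```

Calling demo_forward(1) takes the transposed branch and demo_forward(0) the non-transposed one, with nothing to free afterwards.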

lrbison · 2024-05-21T18:48:32Z

interface/gemm.c

+//	  fprintf(stderr,"G E M V ! ! ! lda=%d ldb=%d ldc=%d\n",args.lda,args.ldb,args.ldc);
+	  GEMV(NT, &args.m ,&args.k, args.alpha, args.a, &args.lda, args.b, &args.n, args.beta, args.c, &args.n);
+//SUBROUTINE SGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY)


I was thinking about this a little more. For cases when n = 1, this seems pretty straightforward: the parameters should translate directly from GEMM to GEMV.

For cases when m = 1, I think there is a challenge. In GEMM (C := AB), A is the vector, but GEMV computes y := Bx with the vector on the right, so I think we need to rearrange the equation by transposing (or un-transposing) B.
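The two cases can be sketched as follows, assuming column-major storage, real data, and no transposes on the GEMM inputs. For n = 1 the call maps directly (x = B, y = C, both with unit increment). For m = 1, transposing both sides gives C^T = B^T A^T, so GEMV runs on B with TRANS = 'T', x is the row of A read with stride lda, and y is the row of C written with stride ldc. The ref_ and forward_ routines are naive illustrations, not the OpenBLAS kernels.

```c
#include <stddef.h>

/* Naive column-major GEMM: C := alpha*A*B + beta*C (A is m x k, B is k x n). */
void ref_gemm_nn(int m, int n, int k, double alpha, const double *a, int lda,
                 const double *b, int ldb, double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
}

/* Naive GEMV on a rows x cols column-major matrix M:
 * trans == 'N': y := alpha*M*x   + beta*y  (x has cols entries)
 * trans == 'T': y := alpha*M^T*x + beta*y  (x has rows entries) */
void ref_gemv(char trans, int rows, int cols, double alpha, const double *mat,
              int ld, const double *x, int incx, double beta, double *y, int incy)
{
    int ylen = (trans == 'N') ? rows : cols;
    int xlen = (trans == 'N') ? cols : rows;
    for (int i = 0; i < ylen; i++) {
        double acc = 0.0;
        for (int p = 0; p < xlen; p++) {
            double e = (trans == 'N') ? mat[i + (size_t)p * ld]
                                      : mat[p + (size_t)i * ld];
            acc += e * x[(size_t)p * incx];
        }
        y[(size_t)i * incy] = alpha * acc + beta * y[(size_t)i * incy];
    }
}

/* Forward the two vector-shaped cases; returns 0 if neither applies. */
int forward_gemm_nn(int m, int n, int k, double alpha, const double *a, int lda,
                    const double *b, int ldb, double beta, double *c, int ldc)
{
    if (n == 1) {   /* C(m x 1) = A(m x k) * B(k x 1): direct translation */
        ref_gemv('N', m, k, alpha, a, lda, b, 1, beta, c, 1);
        return 1;
    }
    if (m == 1) {   /* C^T(n x 1) = B^T(n x k) * A^T(k x 1) */
        ref_gemv('T', k, n, alpha, b, ldb, a, lda, beta, c, ldc);
        return 1;
    }
    return 0;
}

/* Usage: a 1 x 2 row vector stored with lda = 2 times a 2 x 3 matrix.
 * Returns 1 when the forwarded result matches plain GEMM element for
 * element, including the untouched padding entries. */
int demo_row_vector_case(void)
{
    const double a[4] = {1, 99, 2, 99};            /* A = [1 2], lda = 2 */
    const double b[6] = {1, 2, 3, 4, 5, 6};        /* columns (1,2) (3,4) (5,6) */
    double c_gemm[6] = {10, 77, 20, 77, 30, 77};   /* C = [10 20 30], ldc = 2 */
    double c_gemv[6] = {10, 77, 20, 77, 30, 77};
    ref_gemm_nn(1, 3, 2, 1.0, a, 2, b, 2, 0.5, c_gemm, 2);
    if (!forward_gemm_nn(1, 3, 2, 1.0, a, 2, b, 2, 0.5, c_gemv, 2))
        return 0;
    for (int i = 0; i < 6; i++)
        if (c_gemm[i] != c_gemv[i])
            return 0;
    return 1;
}
```

The m = 1 branch never copies or reshapes A or C; it only reinterprets the leading dimensions as vector increments.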

conradsnicta · 2024-07-06 (edited)

@lrbison @martin-frbg As a general word of caution, be careful about introducing any extraneous transposes for the gemm variants that deal with complex numbers (cgemm and zgemm).

For complex vectors and matrices, the conjugate transpose is typically used, where the sign of the imaginary components is flipped. As an example, treating a column vector as a row vector (and vice versa) for complex numbers may not be simply a matter of interpreting the dimensions differently.
https://en.wikipedia.org/wiki/Conjugate_transpose

To reduce risk, I'd suggest leaving cgemm and zgemm out of the proposed optimisation path entirely (i.e. not forwarding those gemm calls to gemv).
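A small sketch of why the distinction matters, using C99 complex arithmetic. It computes a row-vector-times-matrix product directly and then as a transposed matrix-vector product, once with a plain transpose (TRANS = 'T' in BLAS terms) and once with a conjugate transpose (TRANS = 'C'). The function names are illustrative only.

```c
#include <complex.h>
#include <stddef.h>

/* Direct form: c[j] = sum_p x[p] * B[p,j], with B column-major k x n. */
void row_times_matrix(int k, int n, const double complex *x,
                      const double complex *b, int ldb, double complex *c)
{
    for (int j = 0; j < n; j++) {
        double complex acc = 0.0;
        for (int p = 0; p < k; p++)
            acc += x[p] * b[p + (size_t)j * ldb];
        c[j] = acc;
    }
}

/* Rewritten as y = op(B) * x, where op is the plain transpose when
 * conj_flag == 0 and the conjugate transpose otherwise. */
void gemv_transposed(int k, int n, int conj_flag, const double complex *b,
                     int ldb, const double complex *x, double complex *y)
{
    for (int j = 0; j < n; j++) {
        double complex acc = 0.0;
        for (int p = 0; p < k; p++) {
            double complex e = b[p + (size_t)j * ldb];
            acc += (conj_flag ? conj(e) : e) * x[p];
        }
        y[j] = acc;
    }
}

/* 1 x 1 example with x = 1+2i and B = 3+4i.
 * mode 0: direct product, mode 1: plain transpose, mode 2: conjugate transpose. */
double complex demo_row_vector_product(int mode)
{
    const double complex x[1] = {1.0 + 2.0 * I};
    const double complex b[1] = {3.0 + 4.0 * I};
    double complex out[1];
    if (mode == 0)
        row_times_matrix(1, 1, x, b, 1, out);
    else
        gemv_transposed(1, 1, mode == 2, b, 1, x, out);
    return out[0];
}
```

Modes 0 and 1 both give -5+10i, while mode 2 gives 11+2i, so a forwarding path for cgemm or zgemm would have to pick the transpose flag carefully and never substitute 'C' for 'T'.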

martin-frbg · 2024-07-06

@conradsnicta right, the performance gains would likely be much smaller for the complex cases anyway due to the extra coding necessary. So far I haven't even found time to get back to this issue.

akote123 · 2024-06-12T07:11:25Z

@martin-frbg ,
Thank you for the PR. Just to be sure about how data is sent to GEMV: for example, when A is a 1xn matrix and B is nxk, are we flattening A (i.e. converting the matrix to a vector) to make it compatible with GEMV?

martin-frbg · 2024-06-12T08:41:25Z

> @martin-frbg, Thank you for the PR. Just to be sure about how data is sent to GEMV: for example, when A is a 1xn matrix and B is nxk, are we flattening A (i.e. converting the matrix to a vector) to make it compatible with GEMV?

Yes in principle, but I am not convinced we actually have to transform the storage of A for that. (Note that the rough draft I posted here may not even compile. I need to update it and flesh it out when I have time)
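To illustrate why no change of storage should be needed: in column-major layout, one row of a matrix with leading dimension lda is already a valid strided vector in the BLAS sense (start at the row's first element, step by lda). A minimal sketch with hypothetical helper names, assuming positive increments only:

```c
#include <stddef.h>

/* Dot product over strided vectors: element i of a vector lives at v[i * inc]. */
double strided_dot(int len, const double *x, int incx,
                   const double *y, int incy)
{
    double acc = 0.0;
    for (int i = 0; i < len; i++)
        acc += x[(size_t)i * incx] * y[(size_t)i * incy];
    return acc;
}

/* Row `row` of a column-major matrix is read in place with stride lda;
 * nothing is copied or flattened. */
double row_dot(const double *a, int lda, int row, int ncols, const double *v)
{
    return strided_dot(ncols, a + row, lda, v, 1);
}

/* 2 x 3 column-major matrix [[1 3 5], [2 4 6]] with lda = 2. */
double demo_row_dot(int row)
{
    const double a[6] = {1, 2, 3, 4, 5, 6};
    const double v[3] = {1, 1, 1};
    return row_dot(a, 2, row, 3, v);
}
```

Here demo_row_dot(0) sums 1, 3 and 5, and demo_row_dot(1) sums 2, 4 and 6, both without touching the layout of the matrix.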

ChipKerchner · 2024-07-11T18:24:11Z

interface/gemm.c

+	  else strcpy(NT,"N"); 
+#endif
+//	  fprintf(stderr,"G E M V ! ! ! lda=%d ldb=%d ldc=%d\n",args.lda,args.ldb,args.ldc);
+	  GEMV(NT, &args.m ,&args.k, args.alpha, args.a, &args.lda, args.b, &args.n, args.beta, args.c, &args.n);


This causes an error when INTERFACE64=0. The parameters passed by address need to be blasint and not BLASLONG. The same applies to the other call to GEMV.
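A sketch of one way to address this, using hypothetical stand-in types (demo_blasint and demo_blaslong are not the OpenBLAS typedefs) and assuming the interface integer is 32 bits wide while the internal argument fields are 64 bits wide: copy each dimension into a local of the interface type and pass the address of that local.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two integer types discussed above. */
typedef int     demo_blasint;    /* what the by-reference interface expects */
typedef int64_t demo_blaslong;   /* what the internal argument struct stores */

/* Stand-in for a by-reference BLAS interface: it reads exactly
 * sizeof(demo_blasint) bytes through the pointer. */
demo_blasint demo_interface_reads(const demo_blasint *n)
{
    return *n;
}

/* Narrow into a local of the interface type and pass that local's address,
 * instead of the address of the wider field. Returns -1 if the value does
 * not fit. */
demo_blasint demo_forward_dim(demo_blaslong dim)
{
    if (dim < 0 || dim > INT_MAX)
        return -1;
    demo_blasint narrow = (demo_blasint)dim;
    return demo_interface_reads(&narrow);
}
```

Passing the address of the 64-bit field directly would be a pointer type mismatch, and on a big-endian target the callee would read the wrong half of the value.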

martin-frbg · 2024-07-11T18:52:00Z

Thanks... I have an equally unfinished newer version lying around somewhere but got caught up in other things. Let me get the fixups for the SCAL fallout out of the way... but if anybody beats me to it on this here topic it's fine of course. (probably need to remove this from the 0.3.28 milestone anyway so that the release does not get delayed all summer)

ChipKerchner · 2024-07-11T20:36:38Z

@martin-frbg This is an important PR since some projects have a large portion of GEMM calls in which N=1. In these cases we spend significant time packing buffer(s), which would not be necessary if GEMV were called instead.

martin-frbg · 2024-07-11T20:59:31Z

I am aware of that, but this has been an important issue for roughly 20 years (i.e. since inception of GotoBLAS), last discussed here sometime in 2015/16 IIRC. We're almost 2 weeks past the tentative release date for 0.3.28, it bundles an excessive number of changes already, and I still need to come up with assembly code fixes for the SCAL issue in a number of architectures (where assembly isn't my strongest skill anyway).

Mousius · 2024-07-23T22:28:30Z

@martin-frbg I took a look into this in #4814

martin-frbg · 2024-08-03T13:56:25Z

closing as superseded by #4814

forward to GEMV when one argument is actually a vector

c2a9b19

martin-frbg added this to the 0.3.28 milestone May 20, 2024

martin-frbg mentioned this pull request May 20, 2024

Openblas sgemm is slower for small size matrices in aarch64 #4580

Closed

lrbison reviewed May 21, 2024


ChipKerchner reviewed Jul 11, 2024


Mousius mentioned this pull request Jul 23, 2024

Forward GEMM to GEMV when one argument is actually a vector #4814

Merged

martin-frbg closed this Aug 3, 2024

martin-frbg removed this from the 0.3.28 milestone Aug 8, 2024


conradsnicta commented Jul 6, 2024 (edited)

martin-frbg commented Jul 6, 2024
