
[SYCL] Use batched mul_mat pathway #5591

Merged: 3 commits into ggerganov:master on Mar 1, 2024

Conversation

AidanBeltonS (Collaborator)

This PR enables using the batched mul_mat pathway when appropriate. Previously, the single-GEMM path was taken even for operations it was not suited to, causing segfaults. This PR changes the logic to more closely match the CUDA implementation and use the batched GEMM path.

This change allows many more tests to pass on SYCL devices. There is one limitation with this approach: we cannot use non-default-precision operations, because oneMKL has not yet open sourced gemm_batch for the data types <half, half, float, float> (corresponding to <src0, src1, dst, scaling>). This is something I have raised with the oneMKL team.
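For context, here is a minimal sketch of what the batched path looks like with oneMKL's strided gemm_batch. This is not the PR's actual code: the function name, shapes, and strides are illustrative, and it uses uniform float precision precisely because the mixed <half, half, float, float> combination mentioned above is not yet available in the open-source oneMKL interfaces.

```cpp
#include <cstdint>
#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>

// Illustrative only: computes C[i] = A[i] * B[i] for i in [0, batch) with a
// single strided-batched GEMM call instead of a loop of single GEMMs.
// All matrices are dense, column-major, and the pointers are USM device
// memory accessible from `q`.
void mul_mat_batched_f32(sycl::queue &q,
                         const float *a, const float *b, float *c,
                         std::int64_t m, std::int64_t n, std::int64_t k,
                         std::int64_t batch) {
    const float alpha = 1.0f;
    const float beta  = 0.0f;

    // Leading dimensions assume tightly packed matrices; the per-matrix
    // strides step over one whole matrix per batch element.
    oneapi::mkl::blas::column_major::gemm_batch(
        q,
        oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
        m, n, k,
        alpha,
        a, /*lda=*/m, /*stride_a=*/m * k,
        b, /*ldb=*/k, /*stride_b=*/k * n,
        beta,
        c, /*ldc=*/m, /*stride_c=*/m * n,
        batch).wait();
}
```

Issuing the whole batch in one call lets the library schedule the many small GEMMs together rather than dispatching them one at a time, which is what the batched pathway buys over the single-GEMM path.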

@AidanBeltonS (Collaborator, Author)

@NeoZhangJianyu, @abhilash1910, @Alcpz, feedback would be appreciated

ggml-sycl.cpp: review comment (outdated, resolved)
@abhilash1910 (Collaborator) left a comment:

LGTM. I think we can use this until MKL adds the dtypes for batched GEMM. Pinging @airMeng and @ggerganov for a look when available.
@AidanBeltonS, could you please rebase? That should fix the Android build issue. Thanks.

@abhilash1910 merged commit 38d1521 into ggerganov:master on Mar 1, 2024. 59 checks passed.
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
* Use batched mul_mat pathway

* rm extra line

* Explicitly state scaled data type

---------

Co-authored-by: Abhilash Majumder <[email protected]>
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024 (same commit message as above)
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024 (same commit message as above)