UPSTREAM PR #17030: ggml-cpu: handle 3d tensors in repack mat_mul #94
Access the complete analysis in the LOCI Dashboard.

Performance Analysis Summary: PR #94 - 3D Tensor Support in Matrix Multiplication

Overview: PR #94 introduces 3D tensor support for batched matrix multiplication operations in the GGML CPU backend, specifically targeting models like LFM2 that require […]

Key Findings:
- Performance Impact: […]
- Core Function Impact: […]
- Power Consumption Analysis: […]
- Technical Analysis: […]
- Implementation Details: […]
- Actionable Recommendations: […]

The modifications successfully enable batched matrix operations for advanced model architectures while introducing acceptable performance overhead in specialized code paths.
Mirrored from ggml-org/llama.cpp#17030
While testing #16739, perplexities for LFM2 skyrocketed. @ggerganov pointed out that some matrix shapes were probably not supported.
LFM2 has some layers with two batches, so MAT_MULs were only computed partially, leading to incorrect results. See ggml-org/llama.cpp#16739 (comment).
This patch adds basic support for tensors with ne2 > 1, using very naive chunking based on the non-repack MUL_MAT path.

Perplexities using this patch:
I can provide logs for other models if needed.