ggml-cpu: handle 3d tensors in repack mat_mul#17030
ggml-cpu: handle 3d tensors in repack mat_mul#17030ggerganov merged 6 commits intoggml-org:masterfrom
Conversation
eadb483 to
950671d
Compare
|
@ggerganov This is ready for review now. Thanks for your patience. |
|
|
||
| const char * src0_ptr = (const char *) src0->data + i02 * nb02; | ||
| const char * src1_ptr = (const char *) params->wdata + (i11 + i12 * ne11) * src1_col_stride; | ||
| char * dst_ptr = ((char *) dst->data + (i1 * nb1 + i2 * nb2)); | ||
|
|
There was a problem hiding this comment.
Add GGML_ASSERT here that guarantees we are within bounds of [params->wdata, params->wdata + params->wsize)
There was a problem hiding this comment.
Added one GGML_ASSERT for the upper bound. The lower bounds should always be > params->wdata since as long as ne1 and ne11 are >= 1, i11 and i12 are positive.
5a202b9 to
d1938ad
Compare
|
@ggerganov I've addressed all your comments. Let me know if something else is required. |
|
@Alcpz Here is llama3.2-1B-Q4_0 running with 6 threads with instrumented matmul code. The instrumentation simply counts number of processed chunks and the time per thread That's way too many chunks and we burn a lot of time on syncrhonization. |
|
Mmm. Let's revert this then. I will reopen a PR with the branch as a draft and we can have a better solution. I'd rather not introduce a regression upstream. @ggerganov Mind doing the revert? |
This reverts commit 1c398dc.
* ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries
…" (ggml-org#17233) This reverts commit cad17c7.
* ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries
…" (ggml-org#17233) This reverts commit 1c398dc.
While testing #16739, perplexities for LFM2 skyrocketed. @ggerganov pointed out that some matrix shapes would probably not be supported.
LFM2 has some layers that have two batches, so MAT_MULs were only done partially, leading to incorrect results. See #16739 (comment)
This patch adds basic support for tensors with
ne2 > 1, with very naive chunking based on the non repack MUL MAT.Perplexities using this patch:
I can provide logs for other models if needed.
Repro commands:
Other models: