UPSTREAM PR #17526: ggml-cpu: repack: Fix chunks being too small with small matrix shapes in REPACK forward_mul_mat #337
Mirrored from ggml-org/llama.cpp#17526
For small shapes where the number of columns is small (e.g. 16), the current logic skipped some chunks due to rounding.
The issue was observed with NB_COLS 8 and ne01 16, and could potentially happen with NB_COLS 4 and other combinations of thread count and shape.
This also affected the corner case where chunking is disabled.
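To make the rounding problem concrete, here is a minimal standalone sketch. It is not the code from `forward_mul_mat`: the helpers `round_up`, `split_rows`, `count_empty` and `capped_chunks` are hypothetical names, and the cap shown is only one plausible way to keep chunks from collapsing, assumed for illustration. It splits `nr` rows into `nchunk` ranges, rounds both boundaries up to a multiple of `nb_cols`, and counts the ranges that end up empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: round x up to the next multiple of m.
static int64_t round_up(int64_t x, int64_t m) {
    return (x + m - 1) / m * m;
}

// Hypothetical helper: split nr rows into nchunk [start, end) ranges,
// rounding both boundaries up to a multiple of nb_cols.
static std::vector<std::pair<int64_t, int64_t>> split_rows(int64_t nr, int64_t nchunk, int64_t nb_cols) {
    const int64_t dr = (nr + nchunk - 1) / nchunk; // rows per chunk before alignment
    std::vector<std::pair<int64_t, int64_t>> ranges;
    for (int64_t c = 0; c < nchunk; ++c) {
        const int64_t start = std::min(round_up(dr * c, nb_cols), nr);
        const int64_t end   = std::min(round_up(std::min(dr * c + dr, nr), nb_cols), nr);
        ranges.emplace_back(start, end);
    }
    return ranges;
}

// Count ranges that collapsed to zero rows after alignment.
static int64_t count_empty(const std::vector<std::pair<int64_t, int64_t>> & ranges) {
    int64_t n = 0;
    for (const auto & r : ranges) {
        if (r.first >= r.second) {
            ++n;
        }
    }
    return n;
}

// Assumed mitigation for this sketch: never use more chunks than there are
// nb_cols-wide blocks of rows.
static int64_t capped_chunks(int64_t nr, int64_t nchunk, int64_t nb_cols) {
    return std::min(nchunk, (nr + nb_cols - 1) / nb_cols);
}
```

With `nr = 16`, `nb_cols = 8` and 4 chunks, the unaligned step is 4 rows, so after rounding the ranges are `[0, 8)`, `[8, 8)`, `[8, 16)` and `[16, 16)`: two of the four chunks are empty. Any scheduling logic that treats an empty chunk as "no more work" can then miss a later non-empty one. Capping the chunk count at 2 in this sketch yields `[0, 8)` and `[8, 16)` with no empty ranges.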
@max-krasnyansky I checked the performance here and didn't see any issues. Let me know if you'd like me to run any particular test.
Performance
RPI5
M4 Max