Commit 625854d
CUDA: optimize and refactor MMQ (ggml-org#8416)
* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
1 parent 9b227fd
5 files changed: +844 -665 lines changed
File tree:
- ggml-cuda