ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementations (dotprod) #19356

Open
Alcpz wants to merge 11 commits into ggml-org:master from Alcpz:Alcpz/arm_q5_K_dotprod
Alcpz (Collaborator) commented Feb 5, 2026

This PR extends #18860 for DOTPROD devices.

PR contents:

  • New generics for q5_K_8x4
  • New repack implementations for ARM
  • Templated generic impl
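
As a rough illustration of what a templated generic kernel over an interleaved layout can look like, here is a minimal scalar sketch. It assumes that "8x4" means 8 columns interleaved in groups of 4 values, which lines up with the 4-way int8 dot product that DOTPROD provides. The function name, its parameters, and the plain int8 layout are hypothetical: the sketch ignores q5_K scales, mins, and the 5-bit packing entirely and is not the code from this PR.

```cpp
#include <cstdint>

// Hypothetical sketch, not the PR's kernel.
// Assumed weight layout: N columns interleaved in groups of `blocklen` values:
//   [col0: k0..k3][col1: k0..k3] ... [colN-1: k0..k3][col0: k4..k7] ...
// x holds k activations; out receives one int32 dot product per column.
template <int N, int blocklen>
void gemv_interleaved_sketch(const int8_t * w, const int8_t * x, int k, int32_t * out) {
    for (int j = 0; j < N; ++j) {
        out[j] = 0;
    }
    for (int g = 0; g < k / blocklen; ++g) {          // group of `blocklen` activations
        for (int j = 0; j < N; ++j) {                 // interleaved column
            for (int i = 0; i < blocklen; ++i) {      // one 4-way dot product when blocklen == 4
                out[j] += int32_t(w[(g * N + j) * blocklen + i]) * int32_t(x[g * blocklen + i]);
            }
        }
    }
}
```

With `blocklen == 4` the innermost loop is the part a single dot-product lane would compute, so a templated generic version can serve as a readable reference for the NEON path.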

Testing follows the same methodology: comparing llama-cli output and the outputs of the gemm and gemv kernels, plus perplexity to double-check prompt processing.

Performance

  • Apple M4 Max (-mcpu=cortex-a76+dotprod+noi8mm+nosve)

| Model | Test | Repack OFF (t/s) | Repack ON (t/s) | Speedup |
| --- | --- | ---: | ---: | ---: |
| lfm2 1.2B Q5_K | pp512 | 244.10 | 554.16 | 2.27 |
| lfm2 1.2B Q5_K | tg128 | 165.76 | 190.53 | 1.15 |
| qwen3 8B Q5_K | pp512 | 35.15 | 69.97 | 1.99 |
| qwen3 8B Q5_K | tg128 | 26.44 | 29.07 | 1.10 |

  • Rpi5

| Model | Test | Repack OFF (t/s) | Repack ON (t/s) | Speedup |
| --- | --- | ---: | ---: | ---: |
| lfm2 350M Q5_K | pp512 | 100.89 | 187.35 | 1.86 |
| lfm2 350M Q5_K | tg128 | 44.45 | 43.00 | 0.97 |
| lfm2 700M Q5_K | pp512 | 47.06 | 89.73 | 1.91 |
| lfm2 700M Q5_K | tg128 | 21.57 | 21.33 | 0.99 |
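
For reference, the Speedup column is consistent with dividing the Repack ON throughput by the Repack OFF throughput and rounding to two decimals. A minimal sketch of that arithmetic (the helper name is hypothetical and not part of llama.cpp or its benchmarking tools):

```cpp
#include <cmath>

// Hypothetical helper: speedup as reported in the tables above,
// i.e. tokens/s with repack ON divided by tokens/s with repack OFF,
// rounded to two decimal places.
inline double speedup(double off_tps, double on_tps) {
    return std::round(on_tps / off_tps * 100.0) / 100.0;
}
```

Values below 1.00, as in the Rpi5 tg128 rows, mean the repack path was slightly slower than the non-repacked path in that run.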

Perplexity

| Model | Repack ON | Generic | Repack OFF |
| --- | --- | --- | --- |
| LFM2-1.2B | 16.8328 +/- 0.96474 | 16.8376 +/- 0.96443 | 16.8716 +/- 0.96604 |
| Qwen3-8B | 11.2673 +/- 0.68529 | 11.2921 +/- 0.68827 | 11.2771 +/- 0.68677 |

llama-cli

llama-cli using repack

build : b7852-171ba67cf
model : LFM2-1.2B-Q5_K_M.gguf
modalities : text

available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file

What is the capital of Turkey?

The capital of Turkey is Ankara. It was officially chosen as the capital in 1920, replacing Istanbul, which had been Turkey's capital for centuries. Ankara is located in the central Anatolian region of Turkey and is known for its historical significance, modern architecture, and as the country's political and administrative center.

[ Prompt: 137.8 t/s | Generation: 53.3 t/s ]

Can I visit the Eiffel tower in Ankara?

No, you cannot visit the Eiffel Tower in Ankara. The Eiffel Tower is located in Paris, France.

If you’re interested in visiting Turkey and its landmarks, Ankara has several notable sites! For example, you could explore:

  • Ankara Castle (Ankara Kalesi), a historic fortress with panoramic views.
  • Grand National Assembly of Turkey (Yüksek Çarşısı), the country’s parliament building.
  • Ancak Historic Areas (such as the ancient city of Ancyra), though not as iconic as Istanbul’s landmarks.

For specific Eiffel Tower-related plans (e.g., guided tours or events), check with Turkish tour operators or international travel agencies that may connect Ankara to European destinations. Safe travels! 😊

[ Prompt: 132.8 t/s | Generation: 55.3 t/s ]

llama-cli using generic

build : b7852-171ba67cf
model : LFM2-1.2B-Q5_K_M.gguf
modalities : text

available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file

What is the capital of Turkey?

The capital of Turkey is Ankara. It has been the capital of Turkey since 1923, when the nation's capital was moved from Istanbul following Mustafa Kemal Atatürk's secularization and modernization reforms.

[ Prompt: 2.5 t/s | Generation: 13.1 t/s ]

Can I visit the tower Eiffel in Ankara?

No, you cannot visit the Eiffel Tower in Ankara. Here's why:

  • The Eiffel Tower is located in Paris, France, not Turkey. It is one of Paris’s most iconic landmarks and does not exist in Ankara.
  • There is no Eiffel Tower in Turkey, and no major tourist attraction matching its scale or purpose (a radio-tower and cultural landmark) has been built in Ankara.

If you’re interested in Turkey’s own iconic structures, the city of Istanbul (with its historic Eiffel Tower-inspired landmarks like the Galata Tower) is a better place to visit. For Ankara, focus on its modern administrative, cultural, or natural attractions (e.g., Ankara Atatürk University, the National Museum, or natural parks like the Şehitli Lake area).

Enjoy exploring Turkey’s unique sites instead! 🇹🇷

[ Prompt: 3.2 t/s | Generation: 11.2 t/s ]

Alcpz requested a review from ggerganov as a code owner February 5, 2026 11:19
github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Feb 5, 2026
Alcpz changed the title from "ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (dotprod)" to "ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementations (dotprod)" Feb 5, 2026
Alcpz (Collaborator, Author) commented Feb 5, 2026

@tdakhran

Alcpz force-pushed the Alcpz/arm_q5_K_dotprod branch 2 times, most recently from 7d0ad88 to 9ee2115, February 9, 2026 09:19
Alcpz force-pushed the Alcpz/arm_q5_K_dotprod branch from 5e197d3 to 3bec3af, February 10, 2026 11:48
Alcpz (Collaborator, Author) commented Feb 11, 2026

@ggerganov comments addressed and rebased on top of master. Let me know if something else is preferred.


float sumf[8];
float sum_minf[8];
uint32_t utmp[32];
Contributor

@Alcpz, these should use the template parameter 'N'.
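
A minimal sketch of what this suggestion could look like for the two float accumulators, assuming `N` is the number of interleaved columns handled by the templated generic kernel. The function name and its parameters are hypothetical and the body is a stand-in for the real per-block work. How `utmp` should depend on `N` is not stated in the thread, so it is left out here.

```cpp
// Hypothetical sketch, not the PR's kernel: scratch accumulators are sized by
// the template parameter N instead of a hard-coded 8.
// `scaled` and `mins` each hold n_blocks * N precomputed per-block terms.
template <int N>
void accumulate_columns_sketch(const float * scaled, const float * mins, int n_blocks, float * out) {
    float sumf[N];      // was: float sumf[8];
    float sum_minf[N];  // was: float sum_minf[8];
    for (int j = 0; j < N; ++j) {
        sumf[j]     = 0.0f;
        sum_minf[j] = 0.0f;
    }
    for (int b = 0; b < n_blocks; ++b) {
        for (int j = 0; j < N; ++j) {
            sumf[j]     += scaled[b * N + j];
            sum_minf[j] += mins[b * N + j];
        }
    }
    // K-quant style result: scaled dot-product term minus the min correction.
    for (int j = 0; j < N; ++j) {
        out[j] = sumf[j] - sum_minf[j];
    }
}
```

Sizing the arrays from `N` keeps the generic implementation correct if it is later instantiated for a column count other than 8.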
