Skip to content

feat: zero-pad non-128 heads for full 7-stage WHT (replaces q8_0 fall…

b74119a
Select commit
Loading
Failed to load commit list.
Sign in for the full log view
Closed

feat: CUDA port of TurboQuant3 KV cache — 3.47x compression, 98.5% of F16 decode speed on RTX 5090 #3

feat: zero-pad non-128 heads for full 7-stage WHT (replaces q8_0 fall…
b74119a
Select commit
Loading
Failed to load commit list.
labeler
succeeded Mar 29, 2026 in 14s