The goal isn't to reproduce all emerging methodologies but to extract common components across prevalent, proven paradigms that can be modularized and composed with the `torch` stack as well as other OSS ML frameworks.
- `galore` - fused kernels for memory-efficient pre-training / fine-tuning per the GaLore algorithm (a minimal unfused sketch of the core update follows this list)
  - `galore/kernels` - `triton` kernels that fuse various steps of the `GaLore` algorithm
  - `galore/docs` - implementation notes and discussion of issues faced in kernel design.
- `hqq`, `awq`, `marlin`, `QuaRot`, and other well-researched methodologies for quantized fine-tuning and inference
  - ideally, techniques that are both theoretically sound and have practical hardware-aware implementations
  - `AWQ` and `GPTQ` are good examples (the group-wise quantization sketch below shows the shared primitive these methods build on).
- `cutlass` / `triton` utilities for common quantization ops (numeric conversion, quant / dequant, mixed-type gemm, etc.)
  - goal is to create a set of kernels and components that can expedite implementation & optimization across the spectrum of quantization, fine-tuning, and inference patterns (see the `triton` dequant sketch at the end of this section).
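
To make the `galore` item concrete, here is a minimal, unfused sketch of the core GaLore idea: project the gradient of a 2-D weight into a low-rank subspace, run an Adam-style update there, and project the update back before applying it. The fused kernels collapse these steps to cut memory round trips; everything below (function names, rank handling, the projection refresh) is illustrative rather than this repo's actual API.

```python
# Hypothetical sketch of a GaLore-style update for a single 2-D weight,
# keeping Adam statistics in the rank-r space. Names are illustrative.
import torch

def galore_adam_step(param, grad, proj, exp_avg, exp_avg_sq, step,
                     lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One step: project grad to rank-r space, Adam update there,
    project the update back to full shape before applying it."""
    low_rank_grad = grad @ proj  # (m, n) @ (n, r) -> (m, r)
    # Standard Adam moment updates, but on the small (m, r) tensors
    exp_avg.mul_(betas[0]).add_(low_rank_grad, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(low_rank_grad, low_rank_grad,
                                       value=1 - betas[1])
    bias_c1 = 1 - betas[0] ** step
    bias_c2 = 1 - betas[1] ** step
    update = (exp_avg / bias_c1) / ((exp_avg_sq / bias_c2).sqrt() + eps)
    # Project back to (m, n) and apply
    param.add_(update @ proj.T, alpha=-lr)

def refresh_projection(grad, rank):
    """Projection matrix, refreshed every T steps from the gradient's SVD."""
    _, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    return Vh[:rank].T.contiguous()  # (n, r)
```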
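For the quantization methodologies above, the shared primitive is group-wise affine quantization; AWQ, GPTQ, and HQQ differ mainly in how they choose scales and zero-points (activation-aware scaling, layer-wise error minimization, and zero-point optimization, respectively). A bare-bones sketch, with illustrative names and an assumed group size:

```python
# Minimal group-wise affine int4 quant / dequant; the methods above refine
# how `scale` and `zero` are chosen, not this basic layout.
import torch

def quantize_groupwise(w, group_size=128, bits=4):
    """Quantize a 2-D weight per group of `group_size` input features."""
    qmax = 2 ** bits - 1
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    wmin = wg.amin(dim=-1, keepdim=True)
    wmax = wg.amax(dim=-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero = (-wmin / scale).round()
    q = (wg / scale + zero).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, zero

def dequantize_groupwise(q, scale, zero, shape):
    return ((q.float() - zero) * scale).reshape(shape)

w = torch.randn(256, 512)
q, scale, zero = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, zero, w.shape)
print((w - w_hat).abs().max())  # worst-case quantization error
```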
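And as an example of the kind of `triton` utility kernel meant in the last item, here is a rough sketch of a fused int8 -> fp16 group-wise dequant. The kernel name, symmetric layout (no zero-point), and launch parameters are assumptions for illustration, and a CUDA device is required:

```python
# Sketch of a group-wise dequant kernel: one fp scale per GROUP int8 codes.
import torch
import triton
import triton.language as tl

@triton.jit
def dequant_kernel(q_ptr, scale_ptr, out_ptr, n_elements,
                   GROUP: tl.constexpr, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    q = tl.load(q_ptr + offs, mask=mask)               # int8 codes
    s = tl.load(scale_ptr + offs // GROUP, mask=mask)  # per-group scale
    tl.store(out_ptr + offs, q.to(tl.float16) * s.to(tl.float16), mask=mask)

def dequant(q, scale, group=128):
    """q: int8 codes, scale: one fp scale per `group` elements."""
    out = torch.empty_like(q, dtype=torch.float16)
    n = q.numel()
    grid = (triton.cdiv(n, 1024),)
    dequant_kernel[grid](q, scale, out, n, GROUP=group, BLOCK=1024)
    return out
```

A real mixed-type gemm would fuse this dequant into the matmul's inner loop rather than materializing the fp16 tensor, which is exactly the kind of composition these utilities are meant to expedite.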