diff --git a/docs/source/en/quantization/overview.md b/docs/source/en/quantization/overview.md
index 2c9b2babb078..08a4d719ece8 100644
--- a/docs/source/en/quantization/overview.md
+++ b/docs/source/en/quantization/overview.md
@@ -45,56 +45,50 @@ In short, supporting a wide range of quantization methods allows you to pick the
 
 Use the table below to help you decide which quantization method to use.
 
-| Quantization method | On the fly quantization | CPU | CUDA GPU | ROCm GPU (AMD) | Metal (Apple Silicon) | Intel GPU | torch.compile() | Number of bits | Supports fine-tuning (through PEFT) | Serializable with πŸ€— transformers | πŸ€— transformers support | Link to library |
-|--------------------------------------------|-------------------------|-----------------|----------|-----------------|------------------------------------|-----------------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
-| [AQLM](./aqlm.md) | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1 / 2 | 🟒 | 🟒 | 🟒 | https://github.com/Vahe1994/AQLM |
-| [AWQ](./awq.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | ? | 4 | 🟒 | 🟒 | 🟒 | https://github.com/casper-hansen/AutoAWQ |
-| [bitsandbytes](./bitsandbytes.md) | 🟒 | 🟑 1 | 🟒 | 🟑 1 | πŸ”΄ 2 | 🟑 1 | πŸ”΄ 1 | 4 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/bitsandbytes-foundation/bitsandbytes |
-| [compressed-tensors](./compressed_tensors.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 1 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/neuralmagic/compressed-tensors |
-| [EETQ](./eetq.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | ? | 8 | 🟒 | 🟒 | 🟒 | https://github.com/NetEase-FuXi/EETQ |
-| [GGUF / GGML (llama.cpp)](../gguf.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | 1 / 8 | πŸ”΄ | πŸ”΄ 6 | πŸ”΄ 6 | https://github.com/ggerganov/llama.cpp |
-| [GPTQModel](./gptq.md) | πŸ”΄ | 🟒 3 | 🟒 | 🟒 | 🟒 | 🟒 4 | πŸ”΄ | 2 / 3 / 4 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/ModelCloud/GPTQModel |
-| [AutoGPTQ](./gptq.md) | πŸ”΄ | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 2 / 3 / 4 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/AutoGPTQ/AutoGPTQ |
-| [HIGGS](./higgs.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 2 / 4 | πŸ”΄ | 🟒 | 🟒 | https://github.com/HanGuo97/flute |
-| [HQQ](./hqq.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1 / 8 | 🟒 | πŸ”΄ | 🟒 | https://github.com/mobiusml/hqq/ |
-| [optimum-quanto](./quanto.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | 🟒 | 2 / 4 / 8 | πŸ”΄ | πŸ”΄ | 🟒 | https://github.com/huggingface/optimum-quanto |
-| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | πŸ”΄ | 8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/pytorch/FBGEMM |
-| [torchao](./torchao.md) | 🟒 | | 🟒 | πŸ”΄ | 🟑 5 | πŸ”΄ | | 4 / 8 | | 🟒πŸ”΄ | 🟒 | https://github.com/pytorch/ao |
-| [VPTQ](./vptq.md) | πŸ”΄ | πŸ”΄ | 🟒 | 🟑 | πŸ”΄ | πŸ”΄ | 🟒 | 1 / 8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/microsoft/VPTQ |
+| Quantization Method | Runtime Quantization | CPU | CUDA GPU | ROCm GPU | Metal (Apple Silicon) | Intel GPU | torch.compile() | Bits | PEFT Fine-Tuning | Serializable with πŸ€— Transformers | πŸ€— Transformers Support | Link to library |
+|-----------------------------------------------|----------------------|-----------------|----------|-----------|------------------------------------|-----------------|-----------------|---------------|------------------|-----------------------------|-------------------------|---------------------------------------------|
+| [AQLM](./aqlm.md) | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1 / 2 | 🟒 | 🟒 | 🟒 | https://github.com/Vahe1994/AQLM |
+| [AWQ](./awq.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | ? | 4 | 🟒 | 🟒 | 🟒 | https://github.com/casper-hansen/AutoAWQ |
+| [bitsandbytes](./bitsandbytes.md) | 🟒 | 🟑 1 | 🟒 | 🟑 1 | πŸ”΄ 2 | 🟑 1 | πŸ”΄ 1 | 4 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/bitsandbytes-foundation/bitsandbytes |
+| [compressed-tensors](./compressed_tensors.md) | πŸ”΄ | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 1 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/neuralmagic/compressed-tensors |
+| [EETQ](./eetq.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | ? | 8 | 🟒 | 🟒 | 🟒 | https://github.com/NetEase-FuXi/EETQ |
+| [GGUF / GGML (llama.cpp)](../gguf.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | 1 / 8 | πŸ”΄ | [See Notes](../gguf.md) | [See Notes](../gguf.md) | https://github.com/ggerganov/llama.cpp |
+| [GPTQModel](./gptq.md) | πŸ”΄ | 🟒 3 | 🟒 | 🟒 | 🟒 | 🟒 4 | πŸ”΄ | 2 / 3 / 4 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/ModelCloud/GPTQModel |
+| [AutoGPTQ](./gptq.md) | πŸ”΄ | πŸ”΄ | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 2 / 3 / 4 / 8 | 🟒 | 🟒 | 🟒 | https://github.com/AutoGPTQ/AutoGPTQ |
+| [HIGGS](./higgs.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 2 / 4 | πŸ”΄ | 🟒 | 🟒 | https://github.com/HanGuo97/flute |
+| [HQQ](./hqq.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | 🟒 | 1 / 8 | 🟒 | πŸ”΄ | 🟒 | https://github.com/mobiusml/hqq/ |
+| [optimum-quanto](./quanto.md) | 🟒 | 🟒 | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | 🟒 | 2 / 4 / 8 | πŸ”΄ | πŸ”΄ | 🟒 | https://github.com/huggingface/optimum-quanto |
+| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟒 | πŸ”΄ | 🟒 | πŸ”΄ | πŸ”΄ | πŸ”΄ | πŸ”΄ | 8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/pytorch/FBGEMM |
+| [torchao](./torchao.md) | 🟒 | | 🟒 | πŸ”΄ | 🟑 5 | πŸ”΄ | | 4 / 8 | | 🟒πŸ”΄ | 🟒 | https://github.com/pytorch/ao |
+| [VPTQ](./vptq.md) | πŸ”΄ | πŸ”΄ | 🟒 | 🟑 | πŸ”΄ | πŸ”΄ | 🟒 | 1 / 8 | πŸ”΄ | 🟒 | 🟒 | https://github.com/microsoft/VPTQ |
 
-**1** bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend). Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.
+**1:** bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend). Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.
 
-**2** bitsandbytes is seeking contributors to help develop and lead the Apple Silicon backend. Interested? Contact them directly via their repo. Stipends may be available through sponsorships.
+**2:** bitsandbytes is seeking contributors to help develop and lead the Apple Silicon backend. Interested? Contact them directly via their repo. Stipends may be available through sponsorships.
 
-**3** GPTQModel[CPU] supports full bit range via Torch and 4-bit via IPEX on Intel/AMD.
+**3:** GPTQModel[CPU] supports 4-bit via IPEX on Intel/AMD and full bit range via Torch on Intel/AMD/Apple Silicon.
 
-**4** GPTQModel[Intel GPU] via IPEX only supports 4-bit for Intel Datacenter Max + Arc.
+**4:** GPTQModel[Intel GPU] via IPEX only supports 4-bit for Intel Datacenter Max + Arc.
 
-**5** torchao only supports int4 weight on Metal (Apple Silicon).
+**5:** torchao only supports int4 weight on Metal (Apple Silicon).
-
-
-
-**6** [See GGUF section](../gguf.md)
-
-
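As background for the "Bits" column in the table above, the core idea behind low-bit weight quantization can be sketched in a few lines. This is an illustrative absmax (symmetric) int8 scheme only, not the actual kernel or format used by any of the listed libraries, which layer block-wise scales, weight packing, and optimized kernels on top of this basic idea:

```python
# Illustrative symmetric (absmax) int8 quantization: scale weights so the
# largest magnitude maps to 127, round to integers, and dequantize by
# multiplying back by the scale.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]  # integer codes in [-127, 127]
    return codes, scale

def dequantize_int8(codes, scale):
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.9]
codes, scale = quantize_int8(weights)
recovered = dequantize_int8(codes, scale)
# Rounding bounds the per-weight reconstruction error by half a scale step.
error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, recovered))
print(codes, round(error, 4))
```

The reconstruction error is bounded by half the scale step, which is why the 1-, 2-, and 4-bit entries in the table rely on more elaborate schemes (codebooks, group-wise scales) to stay accurate.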