PyTorch native quantization and sparsity for training and inference
Updated Nov 20, 2024 - Python
A C library for converting between float8 (8-bit minifloat) and float32 (single-precision floating-point) formats.
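To make the float8-to-float32 direction concrete, here is a minimal Python sketch of decoding a single float8 byte. The listing does not say which float8 layout the library uses, so this assumes the E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits, exponent bias 15, IEEE-style infinities and NaNs). The function name `float8_e5m2_to_float` is hypothetical and is not taken from the library.

```python
import math
import struct


def float8_e5m2_to_float(byte: int) -> float:
    """Decode one float8 byte, ASSUMING the E5M2 layout (bias 15)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")

    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 2) & 0x1F  # 5 exponent bits
    mantissa = byte & 0x03         # 2 mantissa bits

    if exponent == 0x1F:
        # All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Zero and subnormals: (mantissa / 4) * 2**(1 - 15) = mantissa * 2**-16.
        return sign * mantissa * 2.0 ** -16
    # Normal numbers: implicit leading 1, biased exponent.
    return sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 15)


def to_float32_bytes(value: float) -> bytes:
    """Pack a decoded value as little-endian float32 (always exact for E5M2 values)."""
    return struct.pack("<f", value)


if __name__ == "__main__":
    for code in (0x00, 0x01, 0x3C, 0x3E, 0xC0, 0x7B, 0x7C):
        print(f"0x{code:02X} -> {float8_e5m2_to_float(code)!r}")
```

Under this assumption an E5M2 byte has the same bit layout as the upper byte of an IEEE half-precision value, so the decoder can be cross-checked against `struct.unpack("<e", bytes([0, code]))`. A different layout such as E4M3 would need a different bias and different special-value handling.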