float8

Here are 3 public repositories matching this topic...

PyTorch native quantization and sparsity for training and inference

Topics: training, sparsity, cuda, inference, optimizer, pytorch, transformer, offloading, llama, quantization, mx, brrr, dtypes, float8

minifloat (8-bit float) in Golang

Topics: golang, minifloat, float8

A library written in C for converting between float8 (8-bit minifloat numbers) and float32 (single-precision floating-point numbers) formats.

Topics: floating-point, minifloat, float32, float8
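
To illustrate what a float8-to-float32 conversion involves, here is a minimal Python sketch. It is not taken from any of the repositories listed here. It assumes an IEEE-754-style layout with 1 sign bit, 4 exponent bits, 3 mantissa bits, and an exponent bias of 7; the listed libraries may well use a different layout (for example, some 8-bit formats have no infinities). The function name `float8_to_float` and the constants are hypothetical.

```python
import math

# Assumed layout: 1 sign bit, 4 exponent bits, 3 mantissa bits.
# These values are illustrative, not taken from any listed repository.
EXP_BITS = 4
MAN_BITS = 3
BIAS = 7  # 2**(EXP_BITS - 1) - 1


def float8_to_float(byte: int) -> float:
    """Decode one byte as an IEEE-754-style 1-4-3 minifloat (assumed layout)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")

    sign = -1.0 if byte >> 7 else 1.0
    exponent = (byte >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = byte & ((1 << MAN_BITS) - 1)

    if exponent == (1 << EXP_BITS) - 1:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - BIAS.
        return sign * mantissa * 2.0 ** (1 - BIAS - MAN_BITS)
    # Normal number: implicit leading 1 in front of the mantissa.
    return sign * (1 + mantissa / (1 << MAN_BITS)) * 2.0 ** (exponent - BIAS)
```

Under these assumptions, `float8_to_float(0x38)` decodes to 1.0 and `float8_to_float(0x3C)` to 1.5. Every value in this layout is exactly representable in float32, so widening in this direction loses no precision; the reverse conversion needs a rounding rule and is not shown.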
