v0.14.0

angeloskath released this 24 May 01:33


Highlights

Small-size build that JIT-compiles kernels and omits the CPU backend, resulting in a binary smaller than 4 MB
- Series of PRs 1, 2, 3, 4, 5
mx.gather_qmm, a quantized equivalent of mx.gather_mm, which speeds up MoE inference by ~2x
- Some numbers
Grouped 2D convolutions
- Some numbers
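
For readers unfamiliar with grouped convolutions, here is a minimal pure-Python reference of what the operation computes. It is a conceptual sketch only, not MLX's kernel. The function name `grouped_conv2d_reference`, the single-image `[H][W][C_in]` input layout, and the `[C_out][kH][kW][C_in // groups]` weight layout are assumptions chosen for illustration, with stride 1 and no padding.

```python
def grouped_conv2d_reference(x, w, groups):
    """Reference grouped 2D convolution (stride 1, no padding).

    x: nested lists shaped [H][W][C_in] (a single image, hypothetical layout)
    w: nested lists shaped [C_out][kH][kW][C_in // groups]
    Each output channel only reads the input channels of its own group.
    """
    height, width, c_in = len(x), len(x[0]), len(x[0][0])
    c_out, kh, kw = len(w), len(w[0]), len(w[0][0])
    assert c_in % groups == 0 and c_out % groups == 0
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    out_h, out_w = height - kh + 1, width - kw + 1

    out = [[[0.0] * c_out for _ in range(out_w)] for _ in range(out_h)]
    for o in range(c_out):
        group = o // out_per_group          # group this output channel belongs to
        c_start = group * in_per_group      # first input channel of that group
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        for c in range(in_per_group):
                            acc += x[i + di][j + dj][c_start + c] * w[o][di][dj][c]
                out[i][j][o] = acc
    return out


# Two input channels, two output channels, two groups, 1x1 kernels:
# output channel 0 sees only input channel 0, output channel 1 only input channel 1.
image = [[[1, 10], [2, 20]], [[3, 30], [4, 40]]]
weights = [[[[2.0]]], [[[0.5]]]]
result = grouped_conv2d_reference(image, weights, groups=2)
```

With `groups=1` the same function reduces to an ordinary convolution. Larger group counts shrink each filter to `C_in // groups` input channels, which is where the savings in parameters and compute come from.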

Core

mx.conjugate
mx.conv3d and nn.Conv3d
List-based indexing
Started mx.distributed, which uses MPI (if installed) for communication across machines
- mx.distributed.init
- mx.distributed.all_gather
- mx.distributed.all_reduce_sum
Support conversion to and from dlpack
mx.linalg.cholesky on CPU
mx.quantized_matmul sped up for vector-matrix products
mx.trace
mx.block_masked_mm now supports floating-point masks!
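
As a rough illustration of what the two new collectives compute, the sketch below simulates them in a single process with plain Python lists. The helpers `simulate_all_reduce_sum` and `simulate_all_gather` are hypothetical names invented for this example and are not part of MLX. In a real run, each MPI process would hold one of the per-rank vectors and call `mx.distributed.all_reduce_sum` or `mx.distributed.all_gather` on it.

```python
def simulate_all_reduce_sum(per_rank):
    """per_rank: one equal-length vector per rank.

    Returns the value every rank holds afterwards: the element-wise sum
    over all ranks, replicated once per rank.
    """
    total = [sum(values) for values in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def simulate_all_gather(per_rank):
    """per_rank: one vector per rank.

    Returns the value every rank holds afterwards: all vectors
    concatenated in rank order, replicated once per rank.
    """
    gathered = [value for chunk in per_rank for value in chunk]
    return [list(gathered) for _ in per_rank]


# Three simulated ranks, each holding a two-element vector.
local_arrays = [[1, 2], [3, 4], [5, 6]]
reduced = simulate_all_reduce_sum(local_arrays)
gathered = simulate_all_gather(local_arrays)
```

After the reduction every rank sees the same summed vector, and after the gather every rank sees the same concatenation, so later computation can proceed identically on all processes.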

Fixes

Error messaging in eval
Add some missing docs
Scatter index bug
The extensions example now compiles and runs
CPU copy bug with many dimensions
