v0.6.0

@ndryden ndryden released this 04 Nov 19:35
· 303 commits to master since this release
87bc552

New features:

  • Support for Send, Recv, and SendRecv in the NCCL backend.
  • Initial support for Gather, Scatter, and Alltoall in the NCCL backend.
  • Initial support for vector collectives in the NCCL and MPI backends: Allgatherv, Alltoallv, Gatherv, Scatterv, and Reduce_scatterv.
  • Added new benchmarks for all supported operations.
  • Improved performance and correctness of the spin-wait kernel used in the host-transfer backend.
  • Improved progress engine binding logic. Related environment variables have been removed. Failing to bind no longer throws an exception.
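
For readers unfamiliar with the "v" (vector) collectives listed above: unlike their fixed-size counterparts, they let each rank contribute a different number of elements, described by per-rank counts and displacements. The sketch below is a single-process C++ model of Allgatherv semantics only. It makes no MPI, NCCL, or Aluminum calls, and the function name `simulate_allgatherv` is hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-process model of Allgatherv semantics (illustrative only; this is
// not Aluminum's API). send[r] is the buffer contributed by rank r, and
// counts[r] is the number of elements rank r contributes. As in MPI, the
// counts are assumed to be known to every rank up front.
// Returns the receive buffer that every rank would hold afterwards.
std::vector<int> simulate_allgatherv(
    const std::vector<std::vector<int>>& send,
    const std::vector<std::size_t>& counts) {
  assert(counts.size() == send.size());
  const std::size_t nranks = send.size();

  // displs[r] is the offset of rank r's data in the receive buffer:
  // the sum of counts[0..r), i.e. contributions are packed back to back.
  std::vector<std::size_t> displs(nranks);
  std::size_t total = 0;
  for (std::size_t r = 0; r < nranks; ++r) {
    assert(send[r].size() == counts[r]);
    displs[r] = total;
    total += counts[r];
  }

  // Copy each rank's contribution to its displacement.
  std::vector<int> recv(total);
  for (std::size_t r = 0; r < nranks; ++r) {
    std::copy(send[r].begin(), send[r].end(), recv.begin() + displs[r]);
  }
  return recv;
}
```

With four ranks contributing 1, 2, 0, and 3 elements, every rank ends up with the same 6-element buffer; a rank with a zero count simply contributes nothing.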

Other changes:

  • Various code cleanups and enhancements.
  • The pairwise-exchange/ring allreduce algorithm has been removed from the MPI backend.
  • An internal CUB memory pool is now used for temporary GPU memory allocations.