v1.4.0

ndryden released this 17 Aug 18:49
3c08739

This release addresses various issues and adds a new MultiSendRecv operation.

  • The default internal stream pool size is now 1. This mitigates issues on ROCm platforms; no performance impact was observed on other platforms.
  • Fixed a compilation error when building on CUDA 12 platforms.
  • On ROCm platforms only: zero-size RCCL Send, Recv, and Sendrecv messages are skipped. This is to work around apparent hangs in RCCL with such messages and will be removed once the issue is fixed upstream.
  • Fixed a memory copy issue in the host-transfer Alltoallv.
  • Updated to cxxopts 3.
  • Added a compile-time traits API for describing what operations, types, etc. are supported by each backend.
  • Added the MultiSendRecv operation, which supports an arbitrary sequence of sends and receives among ranks as a single operation.
  • Various internal reorganizations for the test and benchmark code.
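
To illustrate the idea behind a compile-time traits API, here is a minimal, self-contained C++17 sketch of how support for operations and types per backend could be described and queried at compile time. Every name in it (`Op`, `HostBackend`, `GpuBackend`, `IsOpSupported`, `IsTypeSupported`, `is_supported_v`, `multi_sendrecv_path`) is hypothetical and chosen for illustration only; none of them are taken from the library's actual headers, so consult the source for the real interface.

```cpp
#include <type_traits>

// NOTE: all names below are hypothetical; this is not the library's real API.

// Operations a backend might implement.
enum class Op { Allreduce, SendRecv, MultiSendRecv };

// Tag types standing in for communication backends.
struct HostBackend {};
struct GpuBackend {};

// Primary templates: nothing is supported unless explicitly declared.
template <typename Backend, Op O>
struct IsOpSupported : std::false_type {};

template <typename Backend, typename T>
struct IsTypeSupported : std::false_type {};

// Assumed support matrix, for illustration only.
template <> struct IsOpSupported<HostBackend, Op::Allreduce> : std::true_type {};
template <> struct IsOpSupported<HostBackend, Op::SendRecv> : std::true_type {};
template <> struct IsOpSupported<HostBackend, Op::MultiSendRecv> : std::true_type {};
template <> struct IsOpSupported<GpuBackend, Op::Allreduce> : std::true_type {};

template <> struct IsTypeSupported<HostBackend, float> : std::true_type {};
template <> struct IsTypeSupported<HostBackend, double> : std::true_type {};
template <> struct IsTypeSupported<GpuBackend, float> : std::true_type {};

// True only when both the operation and the element type are supported.
template <typename Backend, Op O, typename T>
inline constexpr bool is_supported_v =
    IsOpSupported<Backend, O>::value && IsTypeSupported<Backend, T>::value;

// Which code path a caller would take for a multi send/recv.
enum class Path { Native, Fallback };

// Select the path at compile time; the untaken branch is discarded.
template <typename Backend, typename T>
constexpr Path multi_sendrecv_path() {
  if constexpr (is_supported_v<Backend, Op::MultiSendRecv, T>) {
    return Path::Native;
  } else {
    return Path::Fallback;
  }
}

// Queries are resolved by the compiler, so unsupported combinations can be
// rejected (or rerouted) before the program ever runs.
static_assert(is_supported_v<HostBackend, Op::Allreduce, float>);
static_assert(!is_supported_v<GpuBackend, Op::MultiSendRecv, float>);
```

With a design along these lines, generic code can branch on `is_supported_v<Backend, Op::MultiSendRecv, T>` inside `if constexpr` or a `static_assert`, so that a missing operation or type surfaces as a compile-time decision rather than a runtime error.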