
Releases: LLNL/Aluminum

v1.0.0

06 Mar 22:03
9de194b

Aluminum is now officially stable.

Changes since v0.7.0:

  • Aluminum communicators have been refactored and now always behave like objects rather than handles. All communicators have a stream interface.
  • Added a Barrier operation to all backends.
  • Added support for vector collectives in the host-transfer backend.
  • Fixed a bug in the NCCL Reduce_scatterv operation (#110).
  • Various other code cleanups and bug fixes.

v0.7.0

03 Feb 19:16
d213b54

The testing and benchmarking infrastructure has been rewritten to be significantly more comprehensive and cleaner. Scripts for plotting benchmark results have also been added.

Numerous bug fixes and other improvements:

  • Aluminum no longer attempts to use bitwise reductions for long double.
  • Fixed a bug in the host-transfer Allreduce on one processor.
  • Fixed in-place bugs in the NCCL Gather, Gatherv, Scatter, and Scatterv operations.
  • Fixed the MPI type used for long int.
  • The throw_al_exception macro now works outside the Al namespace.
  • Added a check for mismatches between the HWLOC version Aluminum was compiled with and the one used at runtime.
  • All internal Aluminum headers are now included with the aluminum/ prefix to avoid conflicts with other projects.

v0.6.0

04 Nov 19:35
87bc552

New features:

  • Support for Send, Recv, and SendRecv in the NCCL backend.
  • Added initial support for Gather, Scatter, and Alltoall to the NCCL backend.
  • Initial support for vector collectives in the NCCL and MPI backends: Allgatherv, Alltoallv, Gatherv, Scatterv, and Reduce_scatterv.
  • Added new benchmarks for all supported operations.
  • Improved performance and correctness of the spin-wait kernel used in the host-transfer backend.
  • Improved the progress engine's binding logic. The related environment variables have been removed, and failing to bind no longer throws an exception.

Other changes:

  • Various code cleanups and enhancements.
  • The pairwise-exchange/ring allreduce algorithm has been removed from the MPI backend.
  • An internal CUB memory pool is now used for temporary GPU memory allocations.

v0.5.0

28 Jul 20:20
6ea0c37

  • Support for the entire Aluminum API in the MPI backend.
  • The MPI-CUDA backend has been renamed to the HostTransfer backend (except for RMA operations).
  • Internal cleanups.

v0.4.0

27 Jul 17:21

  • Fixed an edge case that could cause hangs when not making progress.
  • Support for AMD GPUs using HIP/ROCm/RCCL.

v0.3.3

16 Nov 07:40
9ab4c42

Fixes a build issue with certain GPU backend configurations.

v0.3.2

12 Nov 20:04
5f5d615

  • Fixed ordering-related bugs in the MPI-CUDA backend.
  • Removed vector reduce-scatter.
  • Additional benchmarks and tests.

v0.2.1-1

10 Aug 22:14
b07280e

Provides functionality similar to v0.2.1. Subsequent releases will break backwards compatibility.

v0.2.1

12 Feb 01:26

Fixed the internal version number to properly reflect the release number. This is required for checking API compatibility.

v0.2

30 Jan 19:16
5ddc442

New features/changes:

  • Host-transfer implementations of standard collectives in the MPI-CUDA backend: AllGather, AllToAll, Broadcast, Gather, Reduce, ReduceScatter, and Scatter.
  • The progress engine is now aware of separate compute streams, enabling better scheduling of non-interfering operations.
  • Experimental RMA Put/Get operations.
  • Improved Aluminum algorithm specification.
  • Non-blocking point-to-point operations.
  • Improved testing and benchmarks.
  • Bugfixes and performance improvements.