Releases: LLNL/Aluminum
v1.0.0
Aluminum is now officially stable.
Changes since v0.7.0:
- Aluminum communicators have been refactored and now always behave like objects (as opposed to handles). All communicators have a stream interface.
- Added a `Barrier` operation to all backends.
- Added support for vector collectives in the host-transfer backend.
- Fixed a bug in the NCCL `Reduce_scatterv` operation (#110).
- Various other code cleanups and bug fixes.
v0.7.0
The testing and benchmarking infrastructure has been entirely rewritten to be significantly more comprehensive and cleaner. There are also now scripts for nicely plotting benchmark results.
Numerous bugfixes and similar improvements:
- Aluminum no longer attempts to use bitwise reductions for `long double`.
- Fixed a bug in the host-transfer `Allreduce` on one processor.
- Fixed in-place bugs in the NCCL `Gather`, `Gatherv`, `Scatter`, and `Scatterv` operations.
- Fixed the MPI type for `long int`.
- The `throw_al_exception` macro now works outside of the `Al` namespace.
- Added a check for mismatches between the HWLOC version Aluminum was compiled with and the version used at runtime.
- All internal Aluminum headers are now included with the `aluminum/` prefix to avoid conflicts with other projects.
v0.6.0
New features:
- Support for `Send`, `Recv`, and `SendRecv` in the NCCL backend.
- Added initial support for `Gather`, `Scatter`, and `Alltoall` to the NCCL backend.
- Initial support for vector collectives in the NCCL and MPI backends: `Allgatherv`, `Alltoallv`, `Gatherv`, `Scatterv`, and `Reduce_scatterv`.
- Added new benchmarks for all supported operations.
- Improved performance and correctness of the spin-wait kernel used in the host-transfer backend.
- Improved progress-engine binding logic; the related environment variables have been removed, and failing to bind no longer throws an exception.
Other changes:
- Various code cleanups and enhancements.
- The pairwise-exchange/ring allreduce algorithm has been removed from the MPI backend.
- An internal CUB memory pool is now used for temporary GPU memory allocations.
v0.5.0
v0.4.0
v0.3.3
v0.3.2
v0.2.1-1
v0.2.1
v0.2
New features/changes:
- Host-transfer implementations of standard collectives in the MPI-CUDA backend: AllGather, AllToAll, Broadcast, Gather, Reduce, ReduceScatter, and Scatter.
- The progress engine is now aware of separate compute streams, enabling better scheduling of non-interfering operations.
- Experimental RMA Put/Get operations.
- Improved Aluminum algorithm specification.
- Non-blocking point-to-point operations.
- Improved testing and benchmarks.
- Bugfixes and performance improvements.