Releases · ROCm/rocBLAS

15 Aug 04:26

saadrahim

rocm-3.7.0

9d98138

rocBLAS-2.26.0 for ROCm 3.7.0

New Features

Improvements to User Guide and Design Document
L1 dot function optimized to utilize shuffle instructions (improvements for bf16, f16, and f32 data types)
Added an optimized kernel to the L1 dot function for the x dot x case
Standardization of L1 rocblas-bench to use device pointer mode to focus on GPU memory bandwidth
Adjustments for hipcc (hip-clang) compiler as standard build compiler and CentOS 8 support
Added Fortran interface for all rocBLAS functions
Improvements to rocblas_Xgemm_batched performance for small m, n, k.
Improvements to rocblas_Xgemv_batched and rocblas_Xgemv_strided_batched performance for small m (QMCPACK use).
Improvements to rocblas_Xdot (batched and non-batched) performance when both incx and incy are 1
Improvements to FP32 ONNX BERT performance for MI50
Significant improvements to FP32 Resnext, Inception Convolution performance for gfx908
Slight improvements to FP32 DLRM Terabyte performance for gfx908
Significant improvements to FP32 BDAS performance for gfx908
Significant improvements to FP32 BDAS performance for MI50 and MI60
Added a substitution method for small trsm sizes (m <= 64 and n <= 64), which greatly improves performance for small batched trsm.
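
For readers unfamiliar with the term, the substitution method for a triangular solve can be sketched as below. This is a plain CPU illustration of textbook forward substitution for the left-side, lower-triangular, non-transposed, non-unit-diagonal case. It is not the rocBLAS GPU kernel, and the function name trsm_left_lower_substitution is hypothetical. Column-major storage with leading dimensions lda and ldb is assumed, as in legacy BLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch only: solves A * X = alpha * B in place, overwriting B with X.
// A is m x m, lower triangular, non-unit diagonal, column-major, leading dimension lda.
// B is m x n, column-major, leading dimension ldb.
void trsm_left_lower_substitution(std::size_t m, std::size_t n, double alpha,
                                  const std::vector<double>& A, std::size_t lda,
                                  std::vector<double>& B, std::size_t ldb)
{
    assert(lda >= m && ldb >= m);
    for (std::size_t j = 0; j < n; ++j) // each right-hand-side column is independent
    {
        double* b = B.data() + j * ldb;
        for (std::size_t i = 0; i < m; ++i)
            b[i] *= alpha;
        for (std::size_t k = 0; k < m; ++k)
        {
            b[k] /= A[k + k * lda]; // x_k is now final
            for (std::size_t i = k + 1; i < m; ++i)
                b[i] -= b[k] * A[i + k * lda]; // remove x_k's contribution from later rows
        }
    }
}
```

Because every column of B is solved independently and the inner loops are short when m <= 64, this formulation avoids the overhead of a blocked algorithm, which is one plausible reason it suits small batched problems.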

Known Issues

None

10 Jul 22:50

amdkila

rocm-3.5.0

3ce299f

rocBLAS-2.22.0 for ROCm 3.5.0

Changelist

add geam complex, geam_batched, and geam_strided_batched
add dgmm, dgmm_batched, and dgmm_strided_batched

Optimized performance

ger
- rocblas_sger, rocblas_dger
- rocblas_sger_batched, rocblas_dger_batched
- rocblas_sger_strided_batched, rocblas_dger_strided_batched
geru
- rocblas_cgeru, rocblas_zgeru
- rocblas_cgeru_batched, rocblas_zgeru_batched
- rocblas_cgeru_strided_batched, rocblas_zgeru_strided_batched
gerc
- rocblas_cgerc, rocblas_zgerc
- rocblas_cgerc_batched, rocblas_zgerc_batched
- rocblas_cgerc_strided_batched, rocblas_zgerc_strided_batched
symv
- rocblas_ssymv, rocblas_dsymv, rocblas_csymv, rocblas_zsymv
- rocblas_ssymv_batched, rocblas_dsymv_batched, rocblas_csymv_batched, rocblas_zsymv_batched
- rocblas_ssymv_strided_batched, rocblas_dsymv_strided_batched, rocblas_csymv_strided_batched, rocblas_zsymv_strided_batched
sbmv
- rocblas_ssbmv, rocblas_dsbmv
- rocblas_ssbmv_batched, rocblas_dsbmv_batched
- rocblas_ssbmv_strided_batched, rocblas_dsbmv_strided_batched
spmv
- rocblas_sspmv, rocblas_dspmv
- rocblas_sspmv_batched, rocblas_dspmv_batched
- rocblas_sspmv_strided_batched, rocblas_dspmv_strided_batched
improved documentation
Fix argument checking in functions to match legacy BLAS
Fixed conjugate-transpose version of geam

Known failures

Compilation for GPU Targets
- When using the install.sh script to build for "all" GPU targets (the default), you must first set the environment variable HCC_AMDGPU_TARGET to the list of GPU targets, e.g. HCC_AMDGPU_TARGET=gfx803,gfx900,gfx906,gfx908
- If building for one or more specific architectures with the -a | --architecture flag, also set HCC_AMDGPU_TARGET to match.
- If the environment variable does not match the architectures passed to -a, the resulting build may cause segmentation faults when run on GPUs that were not specified.
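
One way to guard against the mismatch described above is to compare the two target lists before starting a build. The sketch below is a hypothetical helper, not part of install.sh or rocBLAS. The function names are invented for illustration, and accepting both ',' and ';' as separators is an assumption (the notes above only show a comma-separated HCC_AMDGPU_TARGET).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Split a target list such as "gfx900,gfx906" into sorted, unique names.
// Assumption: ',' and ';' are both treated as separators; spaces are ignored.
std::vector<std::string> split_targets(const std::string& list)
{
    std::vector<std::string> out;
    std::string item;
    for (char c : list)
    {
        if (c == ',' || c == ';')
        {
            if (!item.empty()) out.push_back(item);
            item.clear();
        }
        else if (c != ' ')
            item += c;
    }
    if (!item.empty()) out.push_back(item);
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}

// True when both lists name the same set of GPU targets, regardless of order.
bool targets_match(const std::string& env_value, const std::string& arch_flag)
{
    return split_targets(env_value) == split_targets(arch_flag);
}

// Hypothetical pre-build check: compares HCC_AMDGPU_TARGET with the list given to -a.
bool env_matches_arch_flag(const std::string& arch_flag)
{
    assert(!arch_flag.empty()); // caller passes the same list it gives to -a
    const char* env = std::getenv("HCC_AMDGPU_TARGET");
    return env != nullptr && targets_match(env, arch_flag);
}
```

A build wrapper could call env_matches_arch_flag with the architecture list and stop early when it returns false, rather than producing a build that fails at run time.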

11 Jul 00:38

saadrahim

3.6beta

98ecde9

rocBLAS-2.24.0 for ROCm 3.6.0

New Features

Improvements to User Guide and Design Document
L1 dot function optimized to utilize shuffle instructions (improvements for bf16, f16, and f32 data types)
Added an optimized kernel to the L1 dot function for the x dot x case
Standardization of L1 rocblas-bench to use device pointer mode to focus on GPU memory bandwidth
Adjustments for hipcc (hip-clang) compiler as standard build compiler and CentOS 8 support
Added Fortran interface for all rocBLAS functions

Known Issues

None

28 Feb 22:11

amcamd

v2.2.0

90bbc3b

rocBLAS-2.2.0

Changelist:

Fix compilation of TRSV, IAMAX, IAMIN
Add TRSM test sizes
Fix false negative precision failures for f16_r gemm_ex tests
Improvements to documentation and addition of sample for i8_r/i32_r gemm_ex
Tuning for i8_r/i32_r gemm_ex for MIOpen
Add gtest ConfigurableEventListener to reduce Jenkins log file size
Initial refactoring of rocblas-bench
rocblas_dgemm NT tuning

01 Feb 02:27

bragadeesh

v2.1.0

5acec06

rocBLAS-2.1.0

Changelist:

Refactor rocBLAS test framework
Improved performance of i8_r/i32_r rocblas_gemm_ex on gfx906
Addition of simple trsv implementation using trsm
Improved performance of trsm
Tuning improvements for resnet50 problems
Update tuning to use new Tensile solution selection logic
rocblas_gemm_ex performance improvement when ldd == ldc and strideD == strideC
Bug fixes for IAMIN and TRSV
Add Sphinx-based Read the Docs documentation

19 Dec 19:46

amcamd

v2.0.0

76ab780

rocBLAS-2.0.0 for ROCm 2.0

Changelist:

improved performance of fp16/fp32 rocblas_gemm_ex on gfx906
support for i8/i32 rocblas_gemm_ex
update vega-10 resnet50 tuning
refactor testing to be data driven
change gemm-ex API solution index from uint32_t to int32_t
disable gemm and gemm_ex chunking
fix gemv argument checking
add performance script for p1b1 benchmark sizes
refactor gemm code to reduce use of macros
trsm performance regression fix

12 Oct 03:00

amcamd

v14.3.0

d19db5d

rocBLAS-14.3.0 for ROCm1.9

Changelist:

add rocblas_gemm_strided_batched_ex for mixed precision support
tested on ROCm 1.9
fix chunking of A and B matrices
expand testing of rocblas_gemm
sgemm and hgemm tuning on gfx906 for Resnet50 from Tensile V4.6.0

Known failures:

known dgemm failures for m,n < 16

21 Sep 17:44

zaliu

v14.2.5

8490ca9

enable gfx906 support

A small incremental release to enable gfx906 support. To get gfx906 support, ROCm 1.9 or later must be used to build rocBLAS.

12 Sep 15:56

amcamd

v14.1.2

e969647

rocBLAS-14.1.2 for ROCm1.8.2

Changelist:

Add initial rocblas_gemm_ex for mixed precision support and foundation for future capabilities
use Tensile 4.5.0 for bug fixes and performance improvements
separate tests into quick, pre_checkin, and nightly
add sweep tests for gemm

10 Aug 04:02

amcamd

v14.1.1

093ae36

rocBLAS 14.1.1 for ROCm 1.8.2

Changelist:

update hgemm asm_full YAML file for performance; re-train hgemm hip_lite YAML file
new YAML files with PreciseBoundsCheck disabled
update hgemm asm_full YAML file, source and VW=2 for m,n,k <= 32
update hgemm asm_full YAML file, source and VW=1 for m,n,k == 1
add strided_batched tests for hgemm
correct gemm test matrix initialization
change cmake and source files to support hip-clang
change from __fp16 to _Float16
