BF16 replacement kernels #705

Merged
zaliu merged 6 commits into ROCm:master from zaliu:bf16_replacement_kernels
Sep 19, 2019
zaliu (Contributor) commented Sep 18, 2019

Planned ROCm 2.9 Feature: BF16 replacement kernels.

amcamd (Contributor) previously approved these changes Sep 18, 2019 and left a comment:

In the future, testing_gemm_ex.hpp and testing_gemm_ex_strided_batched need to be changed back to use the D matrix.

zaliu merged commit 5471329 into ROCm:master on Sep 19, 2019
zaliu deleted the bf16_replacement_kernels branch on September 19, 2019 15:25
zaliu added a commit to zaliu/rocBLAS that referenced this pull request Sep 24, 2019
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels
zaliu added a commit that referenced this pull request Sep 24, 2019
* BF16 replacement kernels (#705)

* Revert "Switch to using separate D for gemm_ex benchmark calls (#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (#707)

* restore UseBeta=1 logic for arcturus BF16 TN
wbgilmartin pushed a commit to wbgilmartin/rocBLAS that referenced this pull request Oct 9, 2019
* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)
wbgilmartin pushed a commit to wbgilmartin/rocBLAS that referenced this pull request Oct 14, 2019
* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Fix GEMM for half type
wbgilmartin pushed a commit to wbgilmartin/rocBLAS that referenced this pull request Oct 16, 2019
* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Fix GEMM for half type

* Refactoring classes to be simpler

* Fix rocblas_half
wbgilmartin pushed a commit to wbgilmartin/rocBLAS that referenced this pull request Oct 17, 2019
* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Changing Gemv and Ger stride type (ROCm#747)

* Handle spaces and newline (ROCm#748)

* Fix GEMM for half type

* Tuned Shakespeare kernels (ROCm#749)

* Refactoring classes to be simpler

* Fix rocblas_half
wbgilmartin pushed a commit to wbgilmartin/rocBLAS that referenced this pull request Oct 21, 2019
* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Changing Gemv and Ger stride type (ROCm#747)

* Handle spaces and newline (ROCm#748)

* Fix GEMM for half type

* Tuned Shakespeare kernels (ROCm#749)

* Refactoring classes to be simpler

* Fix rocblas_half

* Cleanup source
wbgilmartin added a commit that referenced this pull request Oct 21, 2019
* integration of the new tensile client

* New tensile client (#1)

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* Map values to value categories currently represented as double

* minor changes

* merge Bill & Lee's changes

* Merge Bill & Lee's changes

* Fix build errors

* Merge 2.10 develop into new tensile client (#2)

* Changed timeout from hours to minutes (#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (#701)

* SLES support (#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (#695)

* Fixing Timeout

* BF16 replacement kernels (#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (#708)

* Batched syr (#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (#719)

* Refactor Ger and Gemv (#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (#737)

* SWDEV 203994 (#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (#742)

* fix for changes of FreeIndices change in Tensile

* New tensile client (#3)

* Changed timeout from hours to minutes (#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (#701)

* SLES support (#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (#695)

* Fixing Timeout

* BF16 replacement kernels (#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (#708)

* Batched syr (#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (#719)

* Refactor Ger and Gemv (#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (#737)

* SWDEV 203994 (#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (#742)

* Fix GEMM for half type

* updates to get rocblas-test and half sizes to work

* partial fix for NaN test failures

* New tensile client (#4)

* Changed timeout from hours to minutes (#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (#701)

* SLES support (#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (#695)

* Fixing Timeout

* BF16 replacement kernels (#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (#708)

* Batched syr (#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (#719)

* Refactor Ger and Gemv (#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (#737)

* SWDEV 203994 (#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (#742)

* Fix GEMM for half type

* Refactoring classes to be simpler

* Fix rocblas_half

* fix negative workgroup mapping error

* fix WorkGroupMapping issue for files in asm_lite

* more fixes for workgroupmapping issue

* wgm issue for asm_miopen

* New tensile client (#5)

* Changed timeout from hours to minutes (#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (#701)

* SLES support (#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (#695)

* Fixing Timeout

* BF16 replacement kernels (#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (#708)

* Batched syr (#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (#719)

* Refactor Ger and Gemv (#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (#737)

* SWDEV 203994 (#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (#742)

* Changing Gemv and Ger stride type (#747)

* Handle spaces and newline (#748)

* Fix GEMM for half type

* Tuned Shakespeare kernels (#749)

* Refactoring classes to be simpler

* Fix rocblas_half

* fix argument validation in gemm calls

* fix complex strided batch implementation

* fix validateArgs redefinition

* New tensile client (#6)

* Changed timeout from hours to minutes (#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (#701)

* SLES support (#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (#695)

* Fixing Timeout

* BF16 replacement kernels (#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (#708)

* Batched syr (#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gfx908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (#719)

* Refactor Ger and Gemv (#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (#737)

* SWDEV 203994 (#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (#742)

* Changing Gemv and Ger stride type (#747)

* Handle spaces and newline (#748)

* Fix GEMM for half type

* Tuned Shakespeare kernels (#749)

* Refactoring classes to be simpler

* Fix rocblas_half

* Cleanup source
mlse-lib-jenkins pushed a commit that referenced this pull request Jun 1, 2021
* fix until hip is revised to not require static attribute