New tensile client #1

Merged
wbgilmartin merged 5 commits into wbgilmartin:new_client_integration_2 from leekillough:NewTensileClient on Oct 8, 2019
@leekillough

  1. Fix CMakeLists.txt build error
  2. Temporarily work around the complex type not being supported (this needs to be fixed before release) by converting beta to real before calling ContractionProblem
  3. Run clang-format on many files; remove clang-format off in some places (clang-format off was scattered throughout the old Tensile client code, which made its formatting inconsistent with the rest of rocBLAS)
  4. Use the auto placeholder for the return types of many functions
  5. Simplify the exception-handling try/catch blocks, for example by merging all of those in runContractionProblem into a single function try/catch block
  6. Remove personal comments containing wbgilmartin's name
  7. Remove rocblas_create_host_handle and the rocblas-bench --lib argument, and change rocblas_create_handle to provide the same functionality through an environment variable; the new Tensile client code must work without changes to existing rocBLAS user code, including rocblas-test and rocblas-bench
  8. Change the mapping from rocBLAS types to Tensile types to use variable templates (C++14) instead of function templates
  9. Move tensile_host.hpp from library/include to library/src/include: the public headers in library/include must be compilable by a C compiler and must not use C++ unless guarded with __cplusplus; we do not want to expose internal C++ implementation headers among the public library/include headers
  10. Add an overriding defaulted destructor to TensileHostImpl to ensure that the base class has a virtual destructor, because we use dynamic_cast
  11. Simplify some of the path-handling code
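
For reviewers less familiar with the C++14 idioms named in items 4, 5, 8 and 10, here is a minimal sketch of each. Everything except the name TensileHostImpl is made up for illustration (tensile_type, tensile_type_of, TensileHostBase, TensileHostImplSketch, make_host, status, run_problem); this is not the code in the PR.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the Tensile-side type tags.
enum class tensile_type { Float, Double, Unknown };

// Item 8: a variable template (C++14) maps a type to a constant directly,
// where a function template would have needed a call to return it.
template <typename T>
constexpr tensile_type tensile_type_of = tensile_type::Unknown;
template <>
constexpr tensile_type tensile_type_of<float> = tensile_type::Float;
template <>
constexpr tensile_type tensile_type_of<double> = tensile_type::Double;

// Item 10: dynamic_cast needs a polymorphic base class.
struct TensileHostBase
{
    virtual ~TensileHostBase() = default;
};

struct TensileHostImplSketch : TensileHostBase
{
    // "override" makes this fail to compile if the base destructor
    // ever stops being virtual.
    ~TensileHostImplSketch() override = default;
};

// Item 4: the return type is deduced from the return statement.
auto make_host()
{
    return std::unique_ptr<TensileHostBase>(new TensileHostImplSketch);
}

// Item 5: one function try block covers the whole function body,
// replacing several nested try/catch blocks.
enum class status { success, internal_error };

status run_problem(bool fail)
try
{
    if(fail)
        throw std::runtime_error("simulated failure");
    return status::success;
}
catch(...)
{
    return status::internal_error;
}
```

In this sketch, dynamic_cast<TensileHostImplSketch*>(make_host().get()) only compiles because TensileHostBase is polymorphic, which is the property item 10 is protecting.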

Notes:

Need to decide what environment variable name to use for the host path, and what the default path should be if the environment variable is not set.
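
Whatever name we pick, the lookup itself could be as small as the sketch below. The helper name path_from_env, the variable name EXAMPLE_TENSILE_HOST_PATH and the fallback path in the usage note are placeholders, not proposals; only std::getenv is a real API here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the value of the environment variable `name`, or `fallback`
// when the variable is unset or set to an empty string.
inline std::string path_from_env(const char* name, const std::string& fallback)
{
    const char* value = std::getenv(name);
    return (value && *value) ? std::string(value) : fallback;
}
```

A call such as path_from_env("EXAMPLE_TENSILE_HOST_PATH", "/opt/example/tensile") would then give existing users the default without any change to their code.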

alpha and beta may be on either the host or the device, depending on handle->pointer_mode. We cannot dereference *alpha or *beta on the host if they are located on the device. Ideally, when device pointer mode is in effect, we should test the *beta == 0 or *alpha == 0 early exits in the device code rather than in the host code; alternatively, we may use hipMemcpy to copy the values from device to host, but that synchronizes, so it is not ideal long-term.
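
To make the copy-based fallback concrete, here is a rough sketch of reading a scalar according to the pointer mode. The enum pointer_mode and the functions copy_to_host, load_scalar and is_zero_scalar are hypothetical names; copy_to_host is a plain memcpy standing in for a device-to-host copy such as hipMemcpy, so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for handle->pointer_mode.
enum class pointer_mode { host, device };

// Stand-in for a blocking device-to-host copy (for example hipMemcpy).
// A real device pointer cannot be read with memcpy; this is illustration only.
inline void copy_to_host(void* dst, const void* src, std::size_t bytes)
{
    std::memcpy(dst, src, bytes);
}

// Reads a scalar without dereferencing it on the host in device mode.
template <typename T>
T load_scalar(const T* ptr, pointer_mode mode)
{
    if(mode == pointer_mode::host)
        return *ptr; // safe: the value lives in host memory
    T host_copy;
    copy_to_host(&host_copy, ptr, sizeof(T)); // synchronizing in a real build
    return host_copy;
}

// Early-exit test that works in either pointer mode.
template <typename T>
bool is_zero_scalar(const T* ptr, pointer_mode mode)
{
    return load_scalar(ptr, mode) == T(0);
}
```

The cost of this approach is the synchronization inside copy_to_host, which is why doing the test in device code is the better long-term option.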

Logging needs to work like the original code did; logging occurs whenever handle != nullptr and handle->layer_mode has certain bitmasks set; another PR ROCm#722 is being worked on to change how logging handles nullptr alpha and beta.
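
As a reminder of the shape of that check, a minimal sketch follows. The flag names in layer_bits, the struct handle_sketch and the function should_log are hypothetical and do not claim to match the rocBLAS definitions; only the handle != nullptr and bitmask logic comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit flags standing in for the layer_mode bitmasks.
enum layer_bits : std::uint32_t
{
    layer_none        = 0,
    layer_log_trace   = 1u << 0,
    layer_log_bench   = 1u << 1,
    layer_log_profile = 1u << 2,
};

// Hypothetical stand-in for the rocBLAS handle.
struct handle_sketch
{
    std::uint32_t layer_mode = layer_none;
};

// Logging happens only for a non-null handle with a requested bit set.
inline bool should_log(const handle_sketch* handle, std::uint32_t mask)
{
    return handle != nullptr && (handle->layer_mode & mask) != 0;
}
```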

The transpose arguments can be rocblas_operation_none, rocblas_operation_transpose or rocblas_operation_conjugate_transpose. For real types, rocblas_operation_transpose and rocblas_operation_conjugate_transpose must behave exactly the same, so to detect a transpose we normally test whether transA != rocblas_operation_none rather than comparing against rocblas_operation_transpose or rocblas_operation_conjugate_transpose; i.e., if the operation is not "none", it represents a transpose regardless of whether it is conjugated.
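
In code, the convention looks roughly like this. The enum operation is a local stand-in for rocblas_operation, and is_transposed is a hypothetical helper name.

```cpp
#include <cassert>

// Local stand-in for rocblas_operation; the real enum lives in the rocBLAS headers.
enum class operation { none, transpose, conjugate_transpose };

// For real types, any value other than "none" means "transposed",
// whether or not the operation is also a conjugation.
constexpr bool is_transposed(operation op)
{
    return op != operation::none;
}
```

Testing against "none" keeps the conjugate-transpose case from silently falling into the non-transposed path, which is what a comparison against the plain transpose value alone would do.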

For complex types, conjugate transpose and regular transpose must be differentiated in certain places. Right now complex types cause linker errors, and std::real(*beta) is used instead of *beta to allow a function overload to work. Complex needs to be added to the overloads, and beta must not be dereferenced if handle->pointer_mode is device pointer mode.
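
For context, the workaround has roughly the following shape. The functions scale and scale_with_real_beta are hypothetical stand-ins for the overloaded callee and its caller; std::real is the standard library function, which accepts both real and complex arguments.

```cpp
#include <cassert>
#include <complex>

// Hypothetical callee that has overloads for real scalars only.
inline float scale(float beta, float c)
{
    return beta * c;
}
inline double scale(double beta, double c)
{
    return beta * c;
}

// std::real(*beta) yields a real value for both real and complex T,
// so overload resolution succeeds. The imaginary part is dropped,
// which is why this can only be a temporary workaround.
// Assumes beta points to host memory.
template <typename T, typename R>
auto scale_with_real_beta(const T* beta, R c)
{
    return scale(std::real(*beta), c);
}
```

With beta = (2, 3) as std::complex<float> and c = 4, this sketch returns 8 and silently ignores the imaginary 3, which shows why complex overloads are still needed.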

I will be out the rest of the weekend, reading email on my phone occasionally.

@wbgilmartin @zaliu @sdquiring @amcamd @bensander

wbgilmartin merged commit b2a5ec3 into wbgilmartin:new_client_integration_2 on Oct 8, 2019
wbgilmartin added a commit that referenced this pull request Oct 25, 2019
* integration of the new tensile client

* New tensile client (#1)

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* Map values to value categories currently represented as double

* minor changes

* merge Bill & Lee's changes

* Merge Bill & Lee's changes

* Fix build errors

* Merge 2.10 develop into new tensile client (#2)

* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gf908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* fix for changes of FreeIndices change in Tensile

* New tensile client (#3)

* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gf908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Fix GEMM for half type

* updates to get rocblas-test and half sizes to work

* partial fix for NaN test failures

* New tensile client (#4)

* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gf908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Fix GEMM for half type

* Refactoring classes to be simpler

* Fix rocblas_half

* fix negative workgroup mapping error

* fix WorkGroupMapping issue for files in asm_lite

* more fixes for workgroupmapping issue

* wgm issue for asm_miopen

* New tensile client (#5)

* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gf908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Changing Gemv and Ger stride type (ROCm#747)

* Handle spaces and newline (ROCm#748)

* Fix GEMM for half type

* Tuned Shakespeare kernels (ROCm#749)

* Refactoring classes to be simpler

* Fix rocblas_half

* fix argument validation in gemm calls

* fix complex strided batch implementation

* fix validateArgs redefinition

* New tensile client (#6)

* Changed timeout from hours to minutes (ROCm#699)

* set clang include directory, fix for centos build error

* hot fix to restore loading of DGEMM replacement kernels (ROCm#701)

* SLES support (ROCm#704)

* Merging master with SLES commit

* Specifying GPU architecture for ubuntu and sles (ROCm#695)

* Fixing Timeout

* BF16 replacement kernels (ROCm#705)

* hot fix to restore loading of DGEMM replacement kernels
* Revert "Switch to using separate D for gemm_ex benchmark calls (ROCm#667)"
This reverts commit 402d231.
* bf16 kernels for gfx908
* use bf16 UseBeta=0 replacement kernels
* update tensile_tag to use bf16 UseBeta=0 replacement kernels

* Restore usebeta1 logic (ROCm#707)

* restore UseBeta=1 logic for arcturus BF16 TN

* Supporting clang10 for SLES (ROCm#708)

* Batched syr (ROCm#727)

* adding syr batched and strided batched

* rocblas_stride, reusable template pattern work

* WIP

* fixes testing and format

* restore dependency

* adds minimal batches  & bad arg

* fix bad arg testing

* constify ptrs, spelling

* add alpha vector support, PR feedback

* more BF16 TN sizes

* Refactoring

* Move tensile_host.hpp from public C API to private C++ implementation

* Work around missing complex; fix formatting

* Remove --lib option and argument; make use of legacy handle API instead of introducing a new host handle API

* gf908 BF16 TN 512x512x512 known issue

* Enable SLES packaging (ROCm#719)

* Refactor Ger and Gemv (ROCm#735)

* Map values to value categories currently represented as double

* Rot(m)(g) batched and strided_batched (ROCm#737)

* SWDEV 203994 (ROCm#743)

* Refactor alpha, beta logging; do not return invalid pointer errors in GEMM, GEMV when other arguments allow early exit

* Fixes

* update

* version for master branch release

* version for develop branch release

* update Tensile package number

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests (ROCm#741)

* Fixing SLES tests LD_LIBRARY_PATH and refactoring tests

* Missing braces/spelling

* spelling

* Fixing packaging

* New Winograd kernels added (ROCm#742)

* Changing Gemv and Ger stride type (ROCm#747)

* Handle spaces and newline (ROCm#748)

* Fix GEMM for half type

* Tuned Shakespeare kernels (ROCm#749)

* Refactoring classes to be simpler

* Fix rocblas_half

* Cleanup source
wbgilmartin pushed a commit that referenced this pull request Apr 17, 2020
wbgilmartin pushed a commit that referenced this pull request Apr 29, 2020
* Simplify TensileHost initialization
* cleanup path handling
wbgilmartin pushed a commit that referenced this pull request Oct 26, 2020
wbgilmartin pushed a commit that referenced this pull request Oct 26, 2020