Bulk PR for Tile Programming #1156
Closed
asroy wants to merge 45 commits into
Tile Program init bulk PR --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
* adding gemm+softmax+gemm
* removing program server * specify launch bound per kernel instance
* make it simple * batched gemm+softmax+gemm
* adding in-thread shuffle * update softmax example * refactor grid gemm * refactor gemm: layouts * bug fix * clean * clean
* add tensor slicing API * remove redundant ck namespace * better gemm_gemm interface * modify gemm_gemm * add slice_tile api * fix merge bug * update to 3d padding, since we no longer need that much LDS size * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean --------- Co-authored-by: Chao Liu <chao.liu2@amd.com>
* slice kv, and use 3d padding LDS layout * add missing sync * put sync to another place * move sync place * revert to normal
* refactor * refactor * change load_tile, update block gemm * debug * clean * clean * experiment lod * workaround spilling issue * clean
…aticDistribution<> (#12) * Extract store_tile() logic as method * Extract load_tile() logic as method * Rename type alias * Extract common logic as traits * Remove unnecessary access specifier * Add ComputeMode for TileWindowWithStaticDistribution * Put field check into Traits * More definition of Traits types * Use clearer static_assert() message * Enable pre-compute coordinates in store_tile() * Reformat static assert * Undo changes to the wrong method * Enable pre-compute coords for store_tile() * Remove static_vector usage * Add method to move non-member coordinates * Force using pre-computed coordinates in Store() * Fix wrong access for SFC_Ys * Change comment * Allow users to hint # access per coord * Add comment for noting remove data members later * Unify FIXME comments * Replace FIXME comments by TODO * Let user specify HintNumCoords * clean * clean * clean * clean * refactor load/store for window * clean * clean * bug fix for window; clean --------- Co-authored-by: Chao Liu <chao.liu2@amd.com>
* Revert "Extract gemm0 prefetch0 out from loop" This reverts commit d3b56f39f9fd12edb476b24ae9cf480841d311e4. * add fmha fwd pipeline * Extract gemm0 prefetch0 out from loop * move blockSize to another place ; fix a missing header in tile_window_impl_static_distribution.hpp * remove KArgs from tile modules --------- Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>
* refactor gemm+softmax+gemm using block-gemm * reorg files * clean
* support batch & nhead * support scale * tile scheduler * rename tile-scheduler to tile-partitioner * add some exp2 math * fix a bug when changing tile size
* unify q persistent in register * add refactor warp_gemm dispatcher
* support hdim=64/128 in same example code * support v transpose * revert gemm.cpp, not intended to be modified * remove useless code * fix a bug for swizzle C encoding, no perf change * optimize LDS encoding * update LDS layout * clean up code
* fix build for old ck examples * fix build for old ck
also fix regression
* Add include/ck/config.h to support xformers c++ extension building * Disable exp() and log() overloading for half_t to support xformers C++ extension building * config.h.default --------- Co-authored-by: Chao Liu <chao.liu2@amd.com>
This PR adds LDS double buffering to GEMM kernels that use direct load instructions. Direct loads now use inline asm instead of intrinsics: with intrinsics, the compiler inserts additional waitcnt instructions, which breaks the load/compute overlap that double buffering relies on. Inline asm in turn requires sched_barrier, because the compiler does not know the data dependencies between the global->LDS and LDS->registers transfers and could otherwise reschedule instructions incorrectly.
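The ping-pong buffering pattern described above can be sketched in plain host-side C++. This is only an illustration of the control flow (prologue load, then issue the next tile's load before consuming the current one); it is not CK code and uses no inline asm, waitcnt, or sched_barrier. The names `load_tile`, `consume_tile`, `sum_double_buffered`, and `kTile` are hypothetical stand-ins chosen for this sketch, and the two `Tile` slots merely play the role of the two LDS buffers.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical tile size for this sketch; real kernels choose it per instance.
constexpr std::size_t kTile = 4;
using Tile = std::array<int, kTile>;

// Stand-in for a global->LDS direct load of one tile.
void load_tile(const std::vector<int>& src, std::size_t tile_idx, Tile& dst) {
    for (std::size_t i = 0; i < kTile; ++i) {
        dst[i] = src[tile_idx * kTile + i];
    }
}

// Stand-in for the compute stage (LDS->registers plus math): sums one tile.
long consume_tile(const Tile& t) {
    long s = 0;
    for (int v : t) {
        s += v;
    }
    return s;
}

// Sums all complete tiles of src using two alternating buffers.
// Elements past the last complete tile are ignored in this sketch.
long sum_double_buffered(const std::vector<int>& src) {
    const std::size_t num_tiles = src.size() / kTile;
    if (num_tiles == 0) {
        return 0;
    }

    std::array<Tile, 2> buf{};  // the two "LDS" buffers
    long acc = 0;

    // Prologue: fill buffer 0 before the main loop starts.
    load_tile(src, 0, buf[0]);

    for (std::size_t t = 0; t < num_tiles; ++t) {
        const std::size_t cur = t % 2;
        const std::size_t nxt = 1 - cur;

        // Issue the load for the next tile into the other buffer first.
        // On a GPU this load would be in flight while the compute below runs.
        if (t + 1 < num_tiles) {
            load_tile(src, t + 1, buf[nxt]);
        }

        // Consume the tile that was loaded in the previous iteration.
        acc += consume_tile(buf[cur]);
    }
    return acc;
}
```

On the host the two stages simply run back to back; the point is the ordering. Because the next load is issued before the current tile is consumed, any wait inserted between them would serialize the two stages, which is the overlap the description says extra waitcnt instructions destroy.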
* add daily build with mainline compiler * fix the compiler paths for ci * remove the -flto flag * build with clang by default
* Introduce wrapper library * Update cmake files * Revert "Update cmake files" This reverts commit c27f88b. * Fix comments
* turn on -O3 compiler flag explicitly * change cmake syntax for CI * modify cmake line breaks in jenkinsfile
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.1. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v0.29.0...v0.30.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* switch from ROCmSoftwarePlatform to ROCm org * replace ROCmSoftwarePlatform with ROCm in few more places
* Support broadcast for bias in grouped conv fwd * Fix comment * Comment fixes * Remove GK layout
CK Tile Programming Interface and some examples