Bulk PR for Tile Programming#1156

Closed
asroy wants to merge 45 commits into develop from ck_tile/main

Conversation

@asroy
Contributor

@asroy asroy commented Feb 2, 2024

CK Tile Programming Interface and some examples

asroy and others added 30 commits September 5, 2023 13:10
Tile Program init bulk PR

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
* adding gemm+softmax+gemm
* removing program server

* specify launch bound per kernel instance
* make it simple

* batched gemm+softmax+gemm
* adding in-thread shuffle

* update softmax example

* refactor grid gemm

* refactor gemm: layouts

* bug fix

* clean

* clean
* add tensor slicing API

* remove redundant ck namespace

* better gemm_gemm interface

* modify gemm_gemm

* add slice_tile api

* fix merge bug

* update to 3d padding, since we no longer need that much LDS size

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

* clean

---------

Co-authored-by: Chao Liu <chao.liu2@amd.com>
* slice kv, and use 3d padding LDS layout

* add missing sync

* put sync to another place

* move sync place

* revert to normal
* refactor

* refactor

* change load_tile, update block gemm

* debug

* clean

* clean

* experiment lod

* workaround spilling issue

* clean
…aticDistribution<> (#12)

* Extract store_tile() logic as method

* Extract load_tile() logic as method

* Rename type alias

* Extract common logic as traits

* Remove unnecessary access specifier

* Add ComputeMode for TileWindowWithStaticDistribution

* Put field check into Traits

* More definition of Traits types

* Use clearer static_assert() message

* Enable pre-compute coordinates in store_tile()

* Re-format static assert

* Undo changes to the wrong method

* Enable pre-compute coords for store_tile()

* Remove static_vector usage

* Add method to move non-member coordinates

* Force using pre-computed coordinates in Store()

* Fix wrong access for SFC_Ys

* Change comment

* Allow users to hint # access per coord

* Add comment noting that data members should be removed later

* Unify FIXME comments

* Replace FIXME comments by TODO

* Let user specify HintNumCoords

* clean

* clean

* clean

* clean

* refactor load/store for window

* clean

* clean

* bug fix for window; clean

---------

Co-authored-by: Chao Liu <chao.liu2@amd.com>
* Revert "Extract gemm0 prefetch0 out from loop"

This reverts commit d3b56f39f9fd12edb476b24ae9cf480841d311e4.

* add fmha fwd pipeline

* Extract gemm0 prefetch0 out from loop

* move blockSize to another place; fix a missing header in tile_window_impl_static_distribution.hpp

* remove KArgs from tile modules

---------

Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* refactor gemm+softmax+gemm using block-gemm

* reorg files

* clean
* support batch & nhead

* support scale

* tile scheduler

* rename tile-scheduler to tile-partitioner

* add some exp2 math

* fix a bug when changing tile size
* unify q persistent in register

* add refactor warp_gemm dispatcher
* support hdim=64/128 in same example code

* support v transpose

* revert gemm.cpp; it was not intended to be modified

* remove useless code

* fix a bug for swizzle C encoding, no perf change

* optimize LDS encoding

* update LDS layout

* clean up code
* fix build for old ck examples

* fix build for old ck
* Add include/ck/config.h to support xformers c++ extension building

* Disable exp() and log() overloading for half_t to support xformers C++ extension building

* config.h.default

---------

Co-authored-by: Chao Liu <chao.liu2@amd.com>
asroy and others added 15 commits December 8, 2023 22:50
This PR adds LDS double buffering to GEMM kernels that use direct load instructions.

Direct loads now use inline asm instead of intrinsics. With intrinsics, the compiler inserts additional waitcnt instructions, which breaks the load/compute overlap that double buffering depends on.

Inline asm in turn requires sched_barrier: the compiler does not know the data dependencies between the global->LDS and LDS->register transfers, so without a barrier it could reschedule the instructions incorrectly.
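The buffer rotation behind double buffering can be sketched in plain host-side C++. This is only an illustration of the ping-pong indexing, not the kernel code from this PR: `kTile`, `load_tile`, `consume_tile`, and `sum_double_buffered` are hypothetical names, the copies stand in for global->LDS direct loads, and the loop runs sequentially, so it shows the ordering but none of the actual overlap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size, chosen only for illustration.
constexpr std::size_t kTile = 4;

// Stand-in for a global->LDS direct load: copy one tile into a staging buffer.
void load_tile(const std::vector<int>& global, std::size_t tile_idx,
               std::array<int, kTile>& lds_buf)
{
    for (std::size_t i = 0; i < kTile; ++i)
        lds_buf[i] = global[tile_idx * kTile + i];
}

// Stand-in for the compute stage that reads the staged tile (LDS->registers).
long consume_tile(const std::array<int, kTile>& lds_buf)
{
    long acc = 0;
    for (int v : lds_buf)
        acc += v;
    return acc;
}

// Sums all elements tile by tile, alternating between two staging buffers.
long sum_double_buffered(const std::vector<int>& global)
{
    assert(global.size() % kTile == 0); // sketch handles whole tiles only
    const std::size_t num_tiles = global.size() / kTile;
    if (num_tiles == 0)
        return 0;

    std::array<std::array<int, kTile>, 2> lds{}; // the two "LDS" buffers
    long total = 0;

    load_tile(global, 0, lds[0]); // prologue: prefetch the first tile

    for (std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t cur = t % 2;       // buffer holding the current tile
        const std::size_t nxt = (t + 1) % 2; // buffer to fill for the next step

        // Issue the load for the next tile before computing on the current one.
        if (t + 1 < num_tiles)
            load_tile(global, t + 1, lds[nxt]);

        // On a GPU, the wait and scheduling barrier would sit around this
        // point so that the compute only reads a buffer whose load completed.
        total += consume_tile(lds[cur]);
    }
    return total;
}
```

The point of the rotation is that the load for tile t+1 targets a different buffer than the one being consumed for tile t, which is what lets a real kernel keep the load in flight during the compute.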
* add daily build with mainline compiler

* fix the compiler paths for ci

* remove the -flto flag

* build with clang by default
* Introduce wrapper library

* Update cmake files

* Revert "Update cmake files"

This reverts commit c27f88b.

* Fix comments
* turn on -O3 compiler flag explicitly

* change cmake syntax for CI

* modify cmake line breaks in jenkinsfile
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.29.0...v0.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* switch from ROCmSoftwarePlatform to ROCm org

* replace ROCmSoftwarePlatform with ROCm in a few more places
* Support broadcast for bias in grouped conv fwd

* Fix comment

* Comment fixes

* Remove GK layout
@asroy asroy added the WIP label Feb 2, 2024
@asroy asroy requested review from carlushuang and zjing14 February 2, 2024 22:20
@carlushuang carlushuang deleted the ck_tile/main branch September 26, 2024 02:37