[A8W8 GEMM] Optimized weight-preshuffled implementation & add quantization datatype for CK TILE rms_norm #1862
Merged
Commits
123 commits
4f65f7b
tempsave
aska-0096 9a99c84
temp save
aska-0096 1ca98e7
tempsave
aska-0096 cbf14ee
tempsave, epilogue optimization for universal gemm done. TODO: mulitp…
aska-0096 5d9c964
temp save
aska-0096 7c8e92f
tempsave
aska-0096 4885c38
Merge branch 'transpose_opt' of https://github.com/ROCm/composable_ke…
aska-0096 6df9170
temp save
aska-0096 99475cf
update bf16 instance list
aska-0096 41fcfbc
clang format
aska-0096 cc404d1
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 7da19c8
bug fix
aska-0096 dbfcb38
temp save
aska-0096 81fb545
tempsave
aska-0096 0b3a409
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 9a89790
revert exp changes.
aska-0096 9c715f6
add blank line
aska-0096 7fb0b32
add int8 gemm multiply multiply a8w8
junhaha666 09852d3
uncomment
junhaha666 1670bba
clang-format-12
junhaha666 f38eb5c
Add example_gemm_multiply_multiply_xdl_int8
junhaha666 d21003a
Remove shell scripts
junhaha666 47294b4
Merge branch 'develop' into gemm_multiply_multiply_int8a8w8
aska-0096 e8c1953
update preprocess number for mi308; bring back printout in ckprofiler
aska-0096 b3e5048
tempsave
aska-0096 b97c687
update ck_a8w8 library, update flush cache timing api
aska-0096 f20e48f
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 7a0ad60
remove the change in ckprofiler src
aska-0096 55cb3bd
clean the flush_cache api
aska-0096 925c071
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 2b840f5
reduce prefetch stage in blockwisepipev4
aska-0096 f3bbfe3
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 ec6b000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 4132643
update tile size for fp8 rowwise
aska-0096 c99e3d5
Merge branch 'mem_gemm_opt' of https://github.com/ROCm/composable_ker…
aska-0096 98fc138
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 ea90b01
fix bug in enable f8 gemm inside ckProfiler
aska-0096 26d5174
update instance and lds layout strategy
aska-0096 0dcd489
delete useless files
aska-0096 2e9901b
fix cmake bug
aska-0096 e8ca3da
update instances
aska-0096 c8c016d
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 fc559be
add configs to fix tuning cases
coderfeli 3f50b99
port tiles from a8w8
coderfeli 9ba219c
rm debug used files
coderfeli 5e5e1a5
add instances
coderfeli 19b7c13
remove all non gemm in cmake
coderfeli f82c9ae
fix build
coderfeli 1a089f6
sanity bug fix
aska-0096 4a1ec81
add bypass logic and build
coderfeli 3784329
can run
coderfeli e6f5a78
add double buffer scratch
coderfeli 7cec63a
remove agpr usage when vgpr usage <256
aska-0096 1d074e3
add configs to fix tuning cases
coderfeli 04f09f0
fix build
coderfeli 400cac2
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
coderfeli 031ddf3
fix performance regression on blockgemm v3 pipe
coderfeli c8d9660
using develop branch timer
coderfeli e2127d7
impl fp16 in ckprofiler
coderfeli 842d910
Merge branch 'update_cka8w8' into update_cka8w8_uc
coderfeli 174b46b
add cpu shuffle
coderfeli 2c05662
fix tail
coderfeli 7efafa1
use empty hipstream in ckprofiler
aska-0096 e92395d
Merge remote-tracking branch 'origin/cka8w8_devtimer' into update_cka…
coderfeli fda5f8c
fix missed files and fix clang format
coderfeli 1137424
fix fp16 build
coderfeli c263bbe
fix cmake rm compile options
coderfeli 54f44e6
fix brepeat, kloop and lds two buffer; works ok now
coderfeli 3f9dbca
use new pipeline for b preshuffle, run ok; revert olds to fix ckprofiler
coderfeli 5765ba5
auto calculate hard code params
coderfeli db84352
fix warnings and revert cmake and fix clang format
coderfeli 74ef502
tempsave
aska-0096 482ca68
Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/compo…
aska-0096 f60f9d5
sanity pass, most tile size enabled. TODO: NWave!=4
aska-0096 6f24c2d
disable N, K Padding, splitk enabled
aska-0096 bbbedc1
add fp16 instances
aska-0096 5bbff07
use bpreshuffle as independent example
aska-0096 72c1dda
Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/R…
aska-0096 0dbe537
refine weight preshuffle format.
aska-0096 9dd74e0
tempsave
aska-0096 22fe522
optimize software pipeline
aska-0096 487a05d
refine blockgemm pipeline version as base struct.
aska-0096 35ba086
fp8 add_rmsnorm_dynamic_dequant
aska-0096 b755f37
add save_x=true instance
aska-0096 cee23c4
tempsave
aska-0096 d47461d
Add compute-friendly pipeline for bpreshuffle case; remove enable-pos…
aska-0096 115c750
fix Odd Mrepeat number pipelinev3; Add v3 instances to ckProfiler
aska-0096 add0b22
clean the code
aska-0096 af30d6b
Merge pull request #1838 from ROCm/cka8w8_uc_newpipe
aska-0096 800cf89
Merge from internal (#1857)
illsilin 1b61699
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 c1a0a8c
clang format
aska-0096 5be42bb
fix errors
aska-0096 df9868c
fix errors
aska-0096 e46d163
remove compile flags in example
aska-0096 ef86614
fix error
aska-0096 2bef550
restore cron trigger (#1863)
illsilin 1c7f994
recover enable-post-misched=0 for sanity issue
aska-0096 5bb041b
add vectorloads on non-k dim for memory pipelines (#1856)
jakpiase feb656d
Support for dtypes (fp8, bf8, bf16 and fp16) for the ck_tile/03_gemm …
kylasa d64030e
revert blockwisegemm modification
aska-0096 730b98e
revert blkgemm pipe v2 changes.
aska-0096 b5d201d
CK Tile - small fix to hotloop scheduler & KPack value. (#1867)
aosewski ae4243d
Add a host mx gemm reference kernel (#1864)
geyyer f49de49
External CI: enable amd-develop branch trigger (#1859)
danielsu-amd 4106dfa
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
8ce4103
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 a9df418
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
e0eabf0
Apply suggestions from code review
aska-0096 d6e3e83
Merge branch 'develop' into update_cka8w8_uc
aska-0096 ef2b53a
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
4658f2f
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
0172488
hotfix for ckprofiler operator
aska-0096 4599ee0
Merge branch 'update_cka8w8_uc' of https://github.com/ROCm/composable…
aska-0096 f2c1fa7
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
f18cfec
Merge branch 'develop' into update_cka8w8_uc
aska-0096 3d1701c
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
7450250
Merge branch 'develop' into update_cka8w8_uc
4b98b4f
add the 16x16 mfma instances
f01c41a
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
6e6ce35
Add more instances for 16x16 instructions
81c4a3a
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
4070b0b
Merge branch 'develop' into update_cka8w8_uc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,7 @@ | ||
| add_example_executable(example_gemm_multiply_multiply_xdl_fp8 gemm_multiply_multiply_xdl_fp8.cpp) | ||
| # target_compile_options(example_gemm_multiply_multiply_xdl_fp8 PRIVATE -mllvm -greedy-reverse-local-assignment=1 -save-temps=$PWD -Wno-gnu-line-marker) | ||
| add_example_executable(example_gemm_multiply_multiply_xdl_fp8_ab_scale gemm_multiply_multiply_xdl_fp8_ab_scale.cpp) | ||
| add_example_executable(example_gemm_multiply_multiply_xdl_fp8_bpreshuffle gemm_multiply_multiply_xdl_fp8_bpreshuffle.cpp) | ||
| # target_compile_options(example_gemm_multiply_multiply_xdl_fp8_bpreshuffle PRIVATE -mllvm -greedy-reverse-local-assignment=1 -save-temps=$PWD -Wno-gnu-line-marker) | ||
| | ||
| add_example_executable(example_gemm_add_add_xdl_fp16 gemm_add_add_xdl_fp16.cpp) | ||
| add_example_executable(example_gemm_multiply_multiply_xdl_int8 gemm_multiply_multiply_xdl_int8.cpp) | ||
| add_example_executable(example_gemm_multiply_multiply_xdl_int8 gemm_multiply_multiply_xdl_int8.cpp) | ||