Merged
123 commits
4f65f7b
tempsave
aska-0096 Aug 22, 2024
9a99c84
temp save
aska-0096 Aug 26, 2024
1ca98e7
tempsave
aska-0096 Aug 26, 2024
cbf14ee
tempsave, epilogue optimization for universal gemm done. TODO: mulitp…
aska-0096 Sep 1, 2024
5d9c964
temp save
aska-0096 Sep 2, 2024
7c8e92f
tempsave
aska-0096 Sep 3, 2024
4885c38
Merge branch 'transpose_opt' of https://github.com/ROCm/composable_ke…
aska-0096 Sep 3, 2024
6df9170
temp save
aska-0096 Sep 4, 2024
99475cf
update bf16 instance list
aska-0096 Sep 4, 2024
41fcfbc
clang format
aska-0096 Sep 4, 2024
cc404d1
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Sep 4, 2024
7da19c8
bug fix
aska-0096 Sep 4, 2024
dbfcb38
temp save
aska-0096 Sep 5, 2024
81fb545
tempsave
aska-0096 Sep 5, 2024
0b3a409
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Sep 6, 2024
9a89790
revert exp changes.
aska-0096 Sep 6, 2024
9c715f6
add blank line
aska-0096 Sep 6, 2024
7fb0b32
add int8 gemm multiply multiply a8w8
junhaha666 Oct 21, 2024
09852d3
uncomment
junhaha666 Oct 21, 2024
1670bba
clang-format-12
junhaha666 Oct 21, 2024
f38eb5c
Add example_gemm_multiply_multiply_xdl_int8
junhaha666 Oct 21, 2024
d21003a
Remove shell scripts
junhaha666 Oct 22, 2024
47294b4
Merge branch 'develop' into gemm_multiply_multiply_int8a8w8
aska-0096 Oct 23, 2024
e8c1953
update preprocess number for mi308; bring back printout in ckprofiler
aska-0096 Oct 25, 2024
b3e5048
tempsave
aska-0096 Oct 30, 2024
b97c687
update ck_a8w8 library, update flush cache timing api
aska-0096 Nov 5, 2024
f20e48f
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Nov 5, 2024
7a0ad60
remove the change in ckprofiler src
aska-0096 Nov 5, 2024
55cb3bd
clean the flush_cache api
aska-0096 Nov 5, 2024
925c071
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Nov 5, 2024
2b840f5
reduce prefetch stage in blockwisepipev4
aska-0096 Nov 18, 2024
f3bbfe3
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Nov 18, 2024
ec6b000
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Nov 19, 2024
4132643
update tile size for fp8 rowwise
aska-0096 Nov 19, 2024
c99e3d5
Merge branch 'mem_gemm_opt' of https://github.com/ROCm/composable_ker…
aska-0096 Nov 20, 2024
98fc138
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Nov 20, 2024
ea90b01
fix bug in enable f8 gemm inside ckProfiler
aska-0096 Nov 20, 2024
26d5174
update instance and lds layout strategy
aska-0096 Nov 26, 2024
0dcd489
delete use less files
aska-0096 Nov 26, 2024
2e9901b
fix cmake bug
aska-0096 Nov 27, 2024
e8ca3da
update instances
aska-0096 Dec 13, 2024
c8c016d
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Dec 13, 2024
fc559be
add configs to fix tunning cases
coderfeli Dec 23, 2024
3f50b99
port tiles from a8w8
coderfeli Dec 23, 2024
9ba219c
rm debug used files
coderfeli Dec 24, 2024
5e5e1a5
add instances
coderfeli Dec 24, 2024
19b7c13
remove all non gemm in cmake
coderfeli Dec 24, 2024
f82c9ae
fix build
coderfeli Dec 25, 2024
1a089f6
sanity bug fix
aska-0096 Dec 26, 2024
4a1ec81
add bypass logic and build
coderfeli Dec 26, 2024
3784329
can run
coderfeli Dec 26, 2024
e6f5a78
add double buffer scratch
coderfeli Dec 26, 2024
7cec63a
remove agpr usage when vgpr usage <256
aska-0096 Dec 27, 2024
1d074e3
add configs to fix tunning cases
coderfeli Dec 23, 2024
04f09f0
fix build
coderfeli Dec 25, 2024
400cac2
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
coderfeli Dec 27, 2024
031ddf3
fix performance regression on blockgemm v3 pipe
coderfeli Dec 27, 2024
c8d9660
using develop branch timer
coderfeli Dec 27, 2024
e2127d7
impl fp16 in ckprofiler
coderfeli Dec 27, 2024
842d910
Merge branch 'update_cka8w8' into update_cka8w8_uc
coderfeli Dec 27, 2024
174b46b
add cpu shuffle
coderfeli Dec 27, 2024
2c05662
fix tail
coderfeli Dec 27, 2024
7efafa1
use empty hipstream in ckprofiler
aska-0096 Dec 27, 2024
e92395d
Merge remote-tracking branch 'origin/cka8w8_devtimer' into update_cka…
coderfeli Dec 27, 2024
fda5f8c
fix missed files and fix clang format
coderfeli Dec 27, 2024
1137424
fix fp16 build
coderfeli Dec 27, 2024
c263bbe
fix cmake rm compile options
coderfeli Dec 27, 2024
54f44e6
fix brepeat, kloop and lds two buffer; works ok now
coderfeli Dec 30, 2024
3f9dbca
use new pipeline for b preshuffle, run ok; revert olds to fix ckprofiler
coderfeli Dec 30, 2024
5765ba5
auto calculate hard code params
coderfeli Dec 30, 2024
db84352
fix warnings and revert cmake and fix clang format
coderfeli Dec 30, 2024
74ef502
tempsave
aska-0096 Dec 30, 2024
482ca68
Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/compo…
aska-0096 Dec 30, 2024
f60f9d5
sanity pass, most tile size enabled. TODO: NWave!=4
aska-0096 Dec 30, 2024
6f24c2d
disable N, K Padding, splitk enabled
aska-0096 Dec 31, 2024
bbbedc1
add fp16 instances
aska-0096 Dec 31, 2024
5bbff07
use bpreshuffle as independent example
aska-0096 Dec 31, 2024
72c1dda
Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/R…
aska-0096 Dec 31, 2024
0dbe537
refine weight preshuffle format.
aska-0096 Jan 2, 2025
9dd74e0
tempsave
aska-0096 Jan 3, 2025
22fe522
optimize software pipeline
aska-0096 Jan 8, 2025
487a05d
refine blockgemm pipeline version as base struct.
aska-0096 Jan 8, 2025
35ba086
fp8 add_rmsnorm_dynamic_dequant
aska-0096 Jan 10, 2025
b755f37
add save_x=true instance
aska-0096 Jan 13, 2025
cee23c4
tempsave
aska-0096 Jan 17, 2025
d47461d
Add compute-friendly pipeline for bpreshuffle case; remove enable-pos…
aska-0096 Jan 22, 2025
115c750
fix Odd Mrepeat number pipelinev3; Add v3 instances to ckProfiler
aska-0096 Jan 23, 2025
add0b22
clean the code
aska-0096 Jan 23, 2025
af30d6b
Merge pull request #1838 from ROCm/cka8w8_uc_newpipe
aska-0096 Jan 23, 2025
800cf89
Merge from internal (#1857)
illsilin Feb 4, 2025
1b61699
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Feb 5, 2025
c1a0a8c
clang format
aska-0096 Feb 5, 2025
5be42bb
fix errors
aska-0096 Feb 5, 2025
df9868c
fix errors
aska-0096 Feb 5, 2025
e46d163
remove compile flags in example
aska-0096 Feb 5, 2025
ef86614
fix error
aska-0096 Feb 5, 2025
2bef550
restore cron trigger (#1863)
illsilin Feb 5, 2025
1c7f994
recover enable-post-misched=0 for sanity issue
aska-0096 Feb 6, 2025
5bb041b
add vectorloads on non-k dim for memory pipelines (#1856)
jakpiase Feb 6, 2025
feb656d
Support for dtypes (fp8, bf8, bf16 and fp16) for the ck_tile/03_gemm …
kylasa Feb 6, 2025
d64030e
revert blockwisegemm modification
aska-0096 Feb 7, 2025
730b98e
revert blkgemm pipe v2 changes.
aska-0096 Feb 7, 2025
b5d201d
CK Tile - small fix to hotloop scheduler & KPack value. (#1867)
aosewski Feb 7, 2025
ae4243d
Add a host mx gemm reference kernel (#1864)
geyyer Feb 7, 2025
f49de49
External CI: enable amd-develop branch trigger (#1859)
danielsu-amd Feb 7, 2025
4106dfa
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 7, 2025
8ce4103
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
aska-0096 Feb 8, 2025
a9df418
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 10, 2025
e0eabf0
Apply suggestions from code review
aska-0096 Feb 11, 2025
d6e3e83
Merge branch 'develop' into update_cka8w8_uc
aska-0096 Feb 11, 2025
ef2b53a
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 12, 2025
4658f2f
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 12, 2025
0172488
hotfix for ckprofiler operator
aska-0096 Feb 13, 2025
4599ee0
Merge branch 'update_cka8w8_uc' of https://github.com/ROCm/composable…
aska-0096 Feb 13, 2025
f2c1fa7
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 13, 2025
f18cfec
Merge branch 'develop' into update_cka8w8_uc
aska-0096 Feb 14, 2025
3d1701c
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 14, 2025
7450250
Merge branch 'develop' into update_cka8w8_uc
Feb 15, 2025
4b98b4f
add the 16x16 mfma instances
Feb 17, 2025
f01c41a
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 20, 2025
6e6ce35
Add more instances for 16x16 instructions
Feb 20, 2025
81c4a3a
Merge branch 'develop' of https://github.com/ROCm/composable_kernel i…
Feb 20, 2025
4070b0b
Merge branch 'develop' into update_cka8w8_uc
Feb 20, 2025
3 changes: 2 additions & 1 deletion example/65_gemm_multiply_multiply/CMakeLists.txt
@@ -1,4 +1,5 @@
add_example_executable(example_gemm_multiply_multiply_xdl_fp8 gemm_multiply_multiply_xdl_fp8.cpp)
add_example_executable(example_gemm_multiply_multiply_xdl_fp8_ab_scale gemm_multiply_multiply_xdl_fp8_ab_scale.cpp)
+add_example_executable(example_gemm_multiply_multiply_xdl_fp8_bpreshuffle gemm_multiply_multiply_xdl_fp8_bpreshuffle.cpp)
add_example_executable(example_gemm_add_add_xdl_fp16 gemm_add_add_xdl_fp16.cpp)
-add_example_executable(example_gemm_multiply_multiply_xdl_int8 gemm_multiply_multiply_xdl_int8.cpp)
+add_example_executable(example_gemm_multiply_multiply_xdl_int8 gemm_multiply_multiply_xdl_int8.cpp)
@@ -69,18 +69,21 @@ using AElementOp = PassThrough;
using BElementOp = PassThrough;
using CDEElementOp = MultiplyMultiply;

-static constexpr auto GemmSpec = ck::tensor_operation::device::GemmSpecialization::MNPadding;
+static constexpr auto GemmSpec = ck::tensor_operation::device::GemmSpecialization::Default;

using DeviceOpInstance = ck::tensor_operation::device::DeviceGemmMultiD_Xdl_CShuffle_V3
// clang-format off
///######| ALayout| BLayout| DsLayout| ELayout| AData| BData| DsData| EData| AccData| CShuffle| A| B| CDE| GEMM| Block| MPer| NPer| KPer| AK1| BK1| MPer| NPer| MXdl| NXdl| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockLds| BBlockTransfer| BBlockTransfer| BBlockTransfer| BlockTransfer| BBlockTransfer| BBlockTransfer| BBlockLds| CShuffle| CShuffle| CBlockTransferClusterLengths| CBlockTransfer|
///######| | | | | Type| Type| Type| Type| Type| DataType| Elementwise| Elementwise| Elementwise| Spacialization| Size| Block| Block| Block| | | XDL| XDL| Per| Per| ThreadCluster| ThreadCluster| SrcAccessOrder| SrcVectorDim| SrcScalar| DstScalar| AddExtraM| ThreadCluster| ThreadCluster| SrcAccessOrder| SrcVectorDim| SrcScalar| DstScalar| AddExtraN| MXdlPerWave| NXdlPerWave| _MBlock_MWaveMPerXdl| ScalarPerVector|
///######| | | | | | | | | | | Operation| Operation| Operation| | | | | | | | | | Wave| Wave| Lengths_K0_M_K1| ArrangeOrder| | | PerVector| PerVector_K1| | Lengths_K0_N_K1| ArrangeOrder| | | PerVector| PerVector_K1| | PerShuffle| PerShuffle| _NBlock_NWaveNPerXdl| _NWaveNPerXdl|
///######| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | S<C, D0, D1>|
///###### RRR
///< Row, Row, DsLayout, ELayout, A0DataType, B0DataType, DsDataType, EDataType, AccDataType, CShuffleDataType, AElementOp, BElementOp, CDEElementOp, GemmSpec, 256, 256, 128, 64, 16, 4, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, 0, S<16, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 8, 4, 0, 1, 1, S<1, 32, 1, 8>, S<8, 8, 1>, ck::BlockGemmPipelineScheduler::Interwave, ck::BlockGemmPipelineVersion::v1, FP8>;
///###### RCR
-< Row, Col, DsLayout, ELayout, A0DataType, B0DataType, DsDataType, EDataType, AccDataType, CShuffleDataType, AElementOp, BElementOp, CDEElementOp, GemmSpec, 256, 256, 128, 64, 16, 16, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, 0, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, 0, 1, 1, S<1, 32, 1, 8>, S<8, 8, 1>, ck::BlockGemmPipelineScheduler::Interwave, ck::BlockGemmPipelineVersion::v1, FP8>;
+<Row, Col, DsLayout, ELayout,
+A0DataType, B0DataType, DsDataType, EDataType, AccDataType, CShuffleDataType,
+AElementOp, BElementOp, CDEElementOp, GemmSpec, 256,
+128, 128, 128,
+16, 16,
+16, 16,
+4, 4,
+S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, 0,
+S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, 0,
+1, 2, S<1, 32, 1, 8>, S<8, 8, 1>,
+ck::BlockGemmPipelineScheduler::Intrawave, ck::BlockGemmPipelineVersion::v3, FP8>;
// clang-format on

int main(int argc, char* argv[])
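
Note on the tile parameters in this hunk: the instance moves from a 256x128x64 block tile with 32x32 XDL instructions to a 128x128x128 tile with 16x16 instructions, while keeping a block size of 256. The sketch below checks that both configurations cover the block tile with exactly the number of waves the block size provides. It is an illustration only: `TileConfig`, `waves_along_m`, `waves_along_n`, `is_consistent` and the 64-lane wavefront size are assumptions introduced here, not names from composable_kernel, and the relation `waves = PerBlock / (XdlPerWave * PerXdl)` is the usual reading of these template columns rather than something this diff states.

```cpp
#include <cstddef>

// Hypothetical helper type for illustration; not part of composable_kernel.
struct TileConfig
{
    int block_size;     // threads per workgroup ("Block Size" column)
    int m_per_block;    // rows of the output tile per workgroup
    int n_per_block;    // columns of the output tile per workgroup
    int m_per_xdl;      // rows covered by one XDL instruction
    int n_per_xdl;      // columns covered by one XDL instruction
    int m_xdl_per_wave; // XDL repeats per wave along M
    int n_xdl_per_wave; // XDL repeats per wave along N
};

// Assumption: 64-lane wavefronts.
constexpr int kWaveSize = 64;

// Number of waves needed along M to cover the block tile.
constexpr int waves_along_m(TileConfig c)
{
    return c.m_per_block / (c.m_xdl_per_wave * c.m_per_xdl);
}

// Number of waves needed along N to cover the block tile.
constexpr int waves_along_n(TileConfig c)
{
    return c.n_per_block / (c.n_xdl_per_wave * c.n_per_xdl);
}

// True when the tile divides evenly and the wave grid matches the block size.
constexpr bool is_consistent(TileConfig c)
{
    return c.m_per_block % (c.m_xdl_per_wave * c.m_per_xdl) == 0 &&
           c.n_per_block % (c.n_xdl_per_wave * c.n_per_xdl) == 0 &&
           waves_along_m(c) * waves_along_n(c) * kWaveSize == c.block_size;
}

// Values copied from the removed and added instances in the hunk above.
constexpr TileConfig kOldConfig{256, 256, 128, 32, 32, 4, 2};
constexpr TileConfig kNewConfig{256, 128, 128, 16, 16, 4, 4};

static_assert(is_consistent(kOldConfig), "old tile does not match block size");
static_assert(is_consistent(kNewConfig), "new tile does not match block size");
```

Under these assumptions both instances use a 2x2 wave grid; the new one reaches it with smaller instructions and more repeats per wave along N.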
@@ -229,7 +232,7 @@ int main(int argc, char* argv[])
"not support this GEMM problem");
}

-float ave_time = invoker.Run(argument, StreamConfig{nullptr, time_kernel, 20, 50});
+float ave_time = invoker.Run(argument, StreamConfig{nullptr, time_kernel, 0, 20, 50, true, 50});

std::size_t flop = std::size_t(2) * M * N * K;
std::size_t num_btype =
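
Note on the timing figures in this hunk: the `flop` count shown above is `2 * M * N * K`, and the measured `ave_time` is what turns it into a throughput number. The sketch below shows that conversion as a minimal example. It assumes `ave_time` is reported in milliseconds; `gemm_flop` and `tflops` are hypothetical helper names introduced here, not functions from the example file, and the truncated `num_btype` expression is deliberately left out.

```cpp
#include <cstddef>

// Floating-point operation count for an M x N x K GEMM
// (one multiply and one add per inner-product term).
constexpr std::size_t gemm_flop(std::size_t M, std::size_t N, std::size_t K)
{
    return std::size_t(2) * M * N * K;
}

// Convert an operation count and an average kernel time into TFLOP/s.
// Assumption: ave_time_ms is in milliseconds, so
//   flop / (ms * 1e-3 s) / 1e12 == flop / 1e9 / ms.
constexpr double tflops(std::size_t flop, double ave_time_ms)
{
    return static_cast<double>(flop) / 1.0e9 / ave_time_ms;
}
```

For example, `tflops(gemm_flop(M, N, K), ave_time)` gives the figure to compare before and after the tile and pipeline changes, provided both runs use the same warm-up and repeat settings.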