Backward weight v4r4r2 with xdlops #18

Merged: asroy merged 25 commits into develop from backward_weight_v4r4r2_xdlops on Aug 31, 2021

ltqin (Collaborator) commented Aug 20, 2021

This PR adds backward weight with xdlops; the data layout is NCHW.

@ltqin ltqin requested review from asleepzzz, asroy and zjing14 August 20, 2021 06:47
asroy (Contributor) commented Aug 23, 2021

@ltqin There is no atomic-add yet. Do you want to add it in this PR or in a future PR?

ltqin (Collaborator, Author) commented Aug 24, 2021

> @ltqin There is no atomic-add yet. Do you want to add it in this PR or in a future PR?

That will be another PR.

zjing14 (Contributor) commented Aug 24, 2021

@ltqin Did you test both fp32 and fp16?

zjing14 (Contributor) commented Aug 24, 2021

@ltqin You may want to merge the latest develop branch and test again, since a couple of changes there affect the hacks.

Comment thread host/driver_offline/src/conv_wrw_driver_offline.cpp Outdated
Comment thread host/driver_offline/src/conv_wrw_driver_offline.cpp Outdated
Comment thread host/host_tensor/include/host_conv_wrw.hpp
ltqin (Collaborator, Author) commented Aug 25, 2021

> @ltqin There is no atomic-add yet. Do you want to add it in this PR or in a future PR?

I want to add it in a future PR.

ltqin (Collaborator, Author) commented Aug 25, 2021

> @ltqin Did you test both fp32 and fp16?

Yes, using the following code:
#if 0
using in_data_t = float;
using acc_data_t = float;
using out_data_t = float;
#elif 1
using in_data_t = half_t;
using acc_data_t = float;
using out_data_t = half_t;
#elif 1
using in_data_t = int8_t;
using acc_data_t = int32_t;
using out_data_t = int8_t;
#endif

ltqin (Collaborator, Author) commented Aug 25, 2021

> @ltqin You may want to merge the latest develop branch and test again, since a couple of changes there affect the hacks.

Done.

Comment thread host/driver_offline/src/conv_wrw_driver_offline.cpp Outdated
asroy (Contributor) commented Aug 28, 2021

I tested this PR and found a failure, using commit b2ea9aa:

./host/driver_offline/conv_wrw_driver_offline  0 0 1 4 0 1   128  256  128 3 3  14   14     1 1       1 1      1 1       1 1

layout: 0
in: dim 4, lengths {128, 128, 14, 14}, strides {25088, 196, 14, 1}
wei: dim 4, lengths {256, 128, 3, 3}, strides {1152, 9, 3, 1}
out: dim 4, lengths {128, 256, 14, 14}, strides {50176, 196, 14, 1}
InLeftPads size 2, {1, 1, }
InRightPads size 2, {1, 1, }
ConvStrides size 2, {1, 1, }
ConvDilations size 2, {1, 1, }
device_convolution_backward_weight_implicit_gemm_v4r4r2_xdlops_nchw_kcyx_nkhw
a_k0_m_k1_grid_desc{3136, 256, 8}
b_k0_n_k1_grid_desc{3136, 1152, 8}
c_m_n_grid_desc{ 256, 1152}
launch_and_time_kernel: grid_dim {18, 1, 1}, block_dim {256, 1, 1}
Warm up
Start running 1 times...
Average time : 3.41418 ms, 4.33413 TFlop/s
a_k0_m_k1_grid_desc{3136, 256, 8}
b_k0_n_k1_grid_desc{3136, 1152, 8}
c_m_n_grid_desc{ 256, 1152}
Average time : 3.39738 ms, 4.35556 TFlop/s
error: 2.8372e+07
max_diff: 843, 7786, 6943

@ltqin
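
The descriptor sizes in the log above follow from the convolution problem sizes. Below is a minimal sketch of that arithmetic, assuming the usual implicit-GEMM mapping for backward weight (GemmM = K, GemmN = C * Y * X, GemmK = N * Ho * Wo, split as K0 * K1 with K1 = 8). The struct and function names are hypothetical and are not taken from the PR; only the numbers come from the log.

```cpp
#include <cstdint>

// Hypothetical helper types; the names are illustrative, not from the PR.
struct ConvProblem
{
    std::int64_t N, K, C, Y, X, Ho, Wo;
};

struct GemmDims
{
    std::int64_t M, N, K0, K1;
};

// Assumed mapping for backward weight: the reduction runs over N * Ho * Wo,
// which is then split into K0 * K1.
constexpr GemmDims to_gemm_dims(const ConvProblem& p, std::int64_t k1)
{
    return GemmDims{p.K, p.C * p.Y * p.X, (p.N * p.Ho * p.Wo) / k1, k1};
}

// Two flops (multiply and add) per reduction element of each weight entry.
constexpr std::int64_t flop_count(const ConvProblem& p)
{
    return 2 * p.N * p.K * p.C * p.Y * p.X * p.Ho * p.Wo;
}

// Sizes read from the log: in {128, 128, 14, 14}, wei {256, 128, 3, 3},
// out {128, 256, 14, 14}.
constexpr ConvProblem problem{128, 256, 128, 3, 3, 14, 14};
constexpr GemmDims gemm = to_gemm_dims(problem, 8);

// These agree with a_k0_m_k1_grid_desc{3136, 256, 8},
// b_k0_n_k1_grid_desc{3136, 1152, 8} and c_m_n_grid_desc{256, 1152}.
static_assert(gemm.M == 256, "GemmM should equal K");
static_assert(gemm.N == 1152, "GemmN should equal C * Y * X");
static_assert(gemm.K0 == 3136, "K0 should equal N * Ho * Wo / K1");
static_assert(flop_count(problem) == 14797504512LL, "total flop count");
```

Dividing the flop count by the reported 3.41418 ms gives roughly 4.334 TFlop/s, which is consistent with the figure in the log.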

asroy (Contributor) commented Aug 28, 2021

@ltqin I addressed some review comments (adding comments, using Number<> instead of index_t, etc.)

zjing14 previously approved these changes Aug 30, 2021
ltqin (Collaborator, Author) commented Aug 31, 2021

> I tested this PR and found a failure, using commit b2ea9aa:
>
> ./host/driver_offline/conv_wrw_driver_offline  0 0 1 4 0 1   128  256  128 3 3  14   14     1 1       1 1      1 1       1 1
> ...
> @ltqin

@asroy I changed the vector load to 4, so the config's ho*wo must be a multiple of 4.
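
The constraint mentioned above can be checked before launching a kernel. Below is a minimal sketch, assuming that in the NCHW/NKHW layout the ho*wo extent is the contiguous one that the vector loads run along. The function name is hypothetical and is not part of the PR.

```cpp
#include <cstdint>

// Hypothetical check: true when ho * wo splits into whole vectors of the
// given width.
constexpr bool is_vector_load_valid(std::int64_t ho, std::int64_t wo, std::int64_t vector_load)
{
    return vector_load > 0 && (ho * wo) % vector_load == 0;
}

// 14 * 14 = 196 = 49 * 4, so the reported configuration satisfies the
// multiple-of-4 constraint.
static_assert(is_vector_load_valid(14, 14, 4), "196 is a multiple of 4");
// A 7 x 7 output (49 elements) would not.
static_assert(!is_vector_load_valid(7, 7, 4), "49 is not a multiple of 4");
```

A driver could call such a check on each problem size and skip or report configurations that do not meet it.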

@asroy asroy self-requested a review August 31, 2021 03:46
@asroy asroy merged commit 627d8ef into develop Aug 31, 2021
asroy added a commit that referenced this pull request Dec 1, 2023
@illsilin illsilin deleted the backward_weight_v4r4r2_xdlops branch December 8, 2023 16:02
carlushuang pushed a commit that referenced this pull request Mar 26, 2024
hyoon1 pushed a commit to hyoon1/composable_kernel that referenced this pull request Mar 19, 2026
Slightly improve installation process