Backward weight v4r4r2 with xdlops #18
@ltqin There is no atomic-add yet. Do you want to add it in this PR or in a future PR?
That will be another PR.
@ltqin Did you test both fp32 and fp16?
@ltqin You may want to merge the latest develop branch and test again, since a couple of changes there affect the hacks.
I want to add it in a future PR.
Yes, using the following code:
done |
done |
I tested this PR and found failure, using commit |
|
@ltqin I fixed some review comments (adding comments, using Number<> instead of index_t, etc.).
@asroy The vector load is changed to 4, so the config's ho*wo must be a multiple of 4.
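For anyone picking test configs, the ho*wo constraint above can be checked up front. The following is only a minimal sketch, assuming the standard convolution output-size formula; `conv_output_length` and `is_vector_load_compatible` are hypothetical helper names for illustration and are not functions from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): output length of one spatial
// dimension, using the standard convolution output-size formula.
inline std::int64_t conv_output_length(std::int64_t in,
                                       std::int64_t filter,
                                       std::int64_t stride,
                                       std::int64_t dilation,
                                       std::int64_t left_pad,
                                       std::int64_t right_pad)
{
    assert(stride > 0);
    return (in + left_pad + right_pad - dilation * (filter - 1) - 1) / stride + 1;
}

// Hypothetical helper (not from the PR): true when ho * wo is a multiple of
// the vector load width, which this thread states is 4.
inline bool is_vector_load_compatible(std::int64_t ho,
                                      std::int64_t wo,
                                      std::int64_t vector_load = 4)
{
    assert(vector_load > 0);
    return (ho * wo) % vector_load == 0;
}
```

For example, a 14x14 input with a 3x3 filter, stride 1, dilation 1 and padding 1 gives ho = wo = 14, so ho*wo = 196 passes the check, while a 7x7 output (ho*wo = 49) does not.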
Slightly improve installation process
This PR adds backward weight with xdlops; the data layout is NCHW.