Depth2space#35

Closed
j4yan wants to merge 11 commits into ROCm:develop from j4yan:depth2space

@j4yan
Contributor

@j4yan j4yan commented Sep 24, 2021

  • Add one more template parameter, TransformToConvOutput, to device_convolution_forward_implicit_gemm. This allows it either to perform the convolution alone or to fuse the convolution with certain other operations. No changes to the kernels.
  • For convolution + depth2space, TransformToConvOutput = TransformDepth2SpaceToConvolution_nhwc.
  • Verification.
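
For readers unfamiliar with the idea, here is a minimal host-side sketch of how an output-transform template parameter can select between a plain convolution store and a fused depth2space store. It is not the code from this PR: the names `PassThroughNhwc`, `Depth2SpaceNhwc`, and `store_conv_output` are hypothetical, and the channel ordering (block row, then block column, then output channel, as in TensorFlow's NHWC `depth_to_space`) is an assumption, since the PR description does not state which ordering it uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy: store the N x H x W x K conv output unchanged.
struct PassThroughNhwc
{
    static std::size_t offset(std::size_t n, std::size_t h, std::size_t w, std::size_t k,
                              std::size_t H, std::size_t W, std::size_t K, std::size_t /*block*/)
    {
        return ((n * H + h) * W + w) * K + k;
    }
};

// Hypothetical policy: scatter the conv output into an
// N x (H*block) x (W*block) x (K / block^2) tensor.
// Assumed ordering: k = (bh * block + bw) * C + c.
struct Depth2SpaceNhwc
{
    static std::size_t offset(std::size_t n, std::size_t h, std::size_t w, std::size_t k,
                              std::size_t H, std::size_t W, std::size_t K, std::size_t block)
    {
        assert(block > 0 && K % (block * block) == 0);
        const std::size_t C  = K / (block * block); // output channels
        const std::size_t c  = k % C;               // output channel index
        const std::size_t bw = (k / C) % block;     // column inside the block
        const std::size_t bh = k / (C * block);     // row inside the block
        const std::size_t ho = h * block + bh;
        const std::size_t wo = w * block + bw;
        return ((n * (H * block) + ho) * (W * block) + wo) * C + c;
    }
};

// The "convolution" result is taken as given; only the final store differs.
template <typename OutputTransform>
std::vector<float> store_conv_output(const std::vector<float>& conv_out,
                                     std::size_t N, std::size_t H, std::size_t W,
                                     std::size_t K, std::size_t block)
{
    std::vector<float> dst(conv_out.size());
    for(std::size_t n = 0; n < N; ++n)
        for(std::size_t h = 0; h < H; ++h)
            for(std::size_t w = 0; w < W; ++w)
                for(std::size_t k = 0; k < K; ++k)
                {
                    const std::size_t src = ((n * H + h) * W + w) * K + k;
                    dst[OutputTransform::offset(n, h, w, k, H, W, K, block)] = conv_out[src];
                }
    return dst;
}
```

Because the transform only changes where each element is written, the compute loop is untouched, which is consistent with the "no changes to the kernels" note above. Calling `store_conv_output<PassThroughNhwc>(...)` gives the convolution-only path, and `store_conv_output<Depth2SpaceNhwc>(...)` gives the fused one.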

@j4yan j4yan requested review from asroy and zjing14 September 24, 2021 00:24
asroy pushed a commit that referenced this pull request Oct 6, 2021
* create files for xdlops

* working on blockwise_gemm_xdlops

* add KReduction

* add m/n repeats

* add 2x2 pipeline

* added 128x128 wavegemm

* use StaticBuffer of vector_type

* break vector type to blk_size

* add kpack into xdlops_gemm and blockwise_gemm

* abroadcast only

* add fp32 mfma instructions

* adding fp16 mfma

* pack half4_t

* rename kperwave to kpack

* add 32x32x8fp16

* add fp16 mfma

* clean code

* clean code

* V4r4 xdlops kpack (#35)

* add kpack with incorrect results

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* add 1x1 kernel

* add gridwise_gemm_v2 - single_buffer

* enabled dwordx4 for fp16

Co-authored-by: Chao Liu <chao.liu2@amd.com>

* refactor fwd-v4r4-xdlops

* add v4r4-nhwc-xdlop

* improve some perf of nhwc and nchw by tuning parameters, and change scheduling in gridwise-gemm loop

* tweak scheduling in gridwise gemm

* add v4r3 with a single output copy

* init commit: output with slice win

* adding sliceWin

* add multiple repeats pattern

* starting adding bwd-v4r1-xdlops

* use tuple as SrcBuffer

* adding bwd-data v4r1 nhwc xdlops

* fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2()

* fix bug in host bwd-data conv

* initial implementation of bwd-data v4r1 nhwc xdlops

* add launch bound flags

* enable launch bound

* add m/nrepeat=4

* tweak bwd-data v4r1 nhwc xdlops

* added bwd-data v4r1 nhwc xdlops with output A and weight B

* add fwd-v4r4 nhwc xdlops, A input, B weight, C output

Co-authored-by: Chao Liu <chao.liu2@amd.com>
@asroy
Contributor

asroy commented Nov 15, 2021

@j4yan Please:

  • merge the latest develop branch
  • Add depth2space as an example (instead of a driver) under example/ directory
    cc @zjing14

@asroy
Contributor

asroy commented Feb 23, 2022

no longer needed

@asroy asroy closed this Feb 23, 2022

2 participants