[MIOpen Downstream] Fix Reduction Kernel by qianfengz · Pull Request #34 · ROCm/composable_kernel

qianfengz · 2021-09-22T07:55:06Z

This PR brings over the entire kernel layer of MIOpen's dynamic generic reduction implementation:

* Kernel wrappers
* Grid-wise generic reduction kernels for four methods (Direct_ThreadWise, Direct_WarpWise, BlockWise, MultiBlock)
* Reusable reduction functions (ThreadWise, WarpWise, BlockWise)
* Some additions and changes to the CK utilities
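For readers unfamiliar with how the thread-wise and block-wise methods differ, here is a minimal host-side C++ sketch. It is not the code from this PR: `AddOp`, `MaxOp`, `reduce_threadwise` and `reduce_blockwise` are hypothetical names, and `GetReductionZeroVal()` only borrows its name from the commit titles; its signature here is an assumption. The sketch emulates GPU threads with plain loops and assumes the block size is a power of two.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reduction operators: each pairs a binary operation with its
// identity element (the "zero value" that leaves any operand unchanged).
template <typename T>
struct AddOp
{
    static constexpr T GetReductionZeroVal() { return T(0); }
    T operator()(T a, T b) const { return a + b; }
};

template <typename T>
struct MaxOp
{
    static constexpr T GetReductionZeroVal() { return std::numeric_limits<T>::lowest(); }
    T operator()(T a, T b) const { return std::max(a, b); }
};

// Thread-wise style: a single "thread" folds its whole slice sequentially,
// starting from the operator's identity value.
template <typename Op, typename T>
T reduce_threadwise(const T* data, std::size_t n)
{
    Op op;
    T acc = Op::GetReductionZeroVal();
    for(std::size_t i = 0; i < n; ++i)
        acc = op(acc, data[i]);
    return acc;
}

// Block-wise style, emulated on the host: each of block_size "threads" folds
// a strided subset into a partial result, then the partials are combined by
// a binary tree. block_size is assumed to be a power of two.
template <typename Op, typename T>
T reduce_blockwise(const std::vector<T>& data, std::size_t block_size)
{
    Op op;
    std::vector<T> partial(block_size, Op::GetReductionZeroVal());

    for(std::size_t tid = 0; tid < block_size; ++tid)
        for(std::size_t i = tid; i < data.size(); i += block_size)
            partial[tid] = op(partial[tid], data[i]);

    for(std::size_t stride = block_size / 2; stride > 0; stride /= 2)
        for(std::size_t tid = 0; tid < stride; ++tid)
            partial[tid] = op(partial[tid], partial[tid + stride]);

    return partial[0];
}
```

Because every partial result starts from the identity value, "threads" that receive no elements do not disturb the final result, which is why the zero value has to be defined per operator rather than hard-coded as 0.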

asroy · 2021-09-29T01:33:24Z

This PR syncs with MIOpen PR ROCm/MIOpen#1156.

qianfengz · 2021-09-29T07:10:30Z

Just pushed two commits from MIOpen/reduction_fix_generic. This should be the final synchronization from reduction_fix_generic.

* create files for xdlops
* working on blockwise_gemm_xdlops
* add KReduction
* add m/n repeats
* add 2x2 pipeline
* added 128x128 wavegemm
* use StaticBuffer of vector_type
* break vector type to blk_size
* add kpack into xldops_gemm and blockwise_gemm
* abroadcast only
* add fp32 mfma instructions
* adding fp16 mfma
* pack half4_t
* rename kperwave to kpack
* add 32x32x8fp16
* add fp16 mfma
* clean code
* clean code
* V4r4 xdlops kpack (#35)
* add kpack with incorrect results
* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2
* add 1x1 kernel
* add gridwise_gemm_v2 - single_buffer
* enabled dwordx4 for fp16

Co-authored-by: Chao Liu <chao.liu2@amd.com>

* refactor fwd-v4r4-xdlops
* add v4r4-nhwc-xdlop
* improve some perf of nhwc and nchw by tuning parameters, and change scheuduling in gridwise-gemm loop
* tweak scheduling in gridwise gemm
* add v4r3 with a single output copy
* init commit: output with slice win
* adding sliceWin
* add multiple repeats pattern
* starting adding bwd-v4r1-xdlops
* use tuple as SrcBuffer
* adding bwd-data v4r1 nhwc xdlops
* fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2()
* fix bug in host bwd-data conv
* initial implementation of bwd-data v4r1 nhwc xdlops
* add launch bound flags
* enable launch bound
* add m/nrepeat=4
* tweak bwd-data v4r1 nhwc xdlops
* added bwd-data v4r1 nhwc xlops with output A and weight B
* add fwd-v4r4 nhwc xdlops, A input, B weight, C output

Co-authored-by: Chao Liu <chao.liu2@amd.com>

qianfengz added 11 commits September 15, 2021 08:49

Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel

a18e648

Fix with regard to implementing GetZeroVal() in both kernel and host

5a9f630

Avoid convert to compType from dstDataType before writting the output value

eac1753

Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator

f0019df

Add CONSTANT decorator for descriptor read buffer

92e1588

Use get_thread_local_1d_id() for thread local Id

52ae56f

Rename GetZeroVal() to GetReductionZeroVal() in the kernels

4fea425

Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp

7a7497f

Occasional tiny simplification and update in the kernel files

7218a2b

Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers

2bc1ce0

Update to remove OpenCL tidy checking failures

e030a28

asroy changed the title ~~Miopen downstream reduction fix generic~~ [MIOpen Downstream] Fix Reduction Sep 22, 2021

asroy changed the title ~~[MIOpen Downstream] Fix Reduction~~ [MIOpen Downstream] Fix Reduction Kernel Sep 22, 2021

qianfengz requested a review from asroy September 23, 2021 06:20

qianfengz added 2 commits September 24, 2021 07:37

Update for better readability

1b9c39b

Remove unused codes and not-needed template parameters in the kernel wrappers

c36084f

Merge remote-tracking branch 'origin/develop' into miopen_downstream-reduction_fix_generic

f49d774

asroy approved these changes Oct 6, 2021

asroy merged commit b2dc55f into develop Oct 6, 2021

qianfengz deleted the miopen_downstream-reduction_fix_generic branch June 13, 2022 09:04

illsilin mentioned this pull request Oct 15, 2025

use branch develop to test hipTensor #3034

Merged

asroy merged 14 commits into develop from miopen_downstream-reduction_fix_generic
