[Triton-MLIR] Keren/code gen for extract slice and alloc tensor by Jokeren · Pull Request #692 · triton-lang/triton

Jokeren · 2022-09-22T06:35:15Z

No description provided.

Jokeren · 2022-09-22T18:06:09Z

@goostavz @Superjomn Please let me know if you have any comments.

I'll be working on insert_slice_async after this.

Superjomn

LGTM

goostavz · 2022-09-23T03:20:05Z

LGTM, no further comments

Jokeren · 2022-09-23T19:21:04Z

Please hold off on merging until #701 is merged into both master and triton-mlir.

Co-authored-by: gzhu <goostavz@outlook.com>

* Move preamble code into tikzplot.tex
* Rename kpack to kWidth and allow kWidth = 32
* [API change] Take user input to set dim names
  API change:
  - For blocked layout, use -tensorShape, which only takes two dims as dim0,dim1
  - For dot layout, use -dotShape, which takes three dims as M,N,K
* Re-structure files
  Separate each layout's code into their own files
* Extend dotLayout plot to support kWidth=32
  - When kWidth is large, use a smaller elemSize horizontally to save space
  - Improve the labels, such as
    - change vec to kWidth for operands
    - change opA/opB to inA/inB and include operand dims
    - remove group dims in the operands so that they don't overlap with operand block dims
  - Better alignment: dot op and mfma zoomed-in pics are bottom aligned
* [API change] Add support for kGroup
  kGroup is defined as total elements per thread / kWidth for one mfma instruction. We need kGroup = 2 only for the newly added mfma_f32_16x16x128_f8f6f4 and mfma_f32_32x32x64_f8f6f4 with f8 input type on MI350.
* [API change] Add support for data types of both operands
  And print mfma instruction name accordingly. For now, mixed precision mfma between 8-bit and 4- or 6-bit is not supported.
* Support mixed mfma with bf8/fp8 and fp6/bf6/f4
* [API change] Add support for scale
* [NFC] Fix format
* [API change] Refactor tensor and LDS layout
  - Support data types
  - Support both 32 and 64 banks
  - Still working on LDS accesses
* [LDS layout] Add support for ds_read access pattern for TN config
  - Fixed the issue with maxPhase computation. Need to submit a PR to fix it in the triton compiler
  - For ds_read_b64 with 64 banks, there are bank conflicts. We need to figure out a different swizzling pattern to avoid bank conflicts.
* [LDS layout] Add support for ds_write access pattern
  Assumed a basic global access pattern
* [LDS layout] Support access pattern for MN-contig without using mfma_transpose_load instructions
  - Elements along the M/N dim are contiguous in both global memory and LDS. Note that this is not the in-thread transpose case.
  - Swizzling is disabled
* [LDS layout] Support access pattern for MN-contig with mfma_trans_load instructions
* Clean up the code
* [lds layout] support padding
* Reduce tex package required
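The kGroup definition in the commit message above (total elements per thread / kWidth for one mfma instruction) can be checked with a little arithmetic. The following is a minimal sketch, not code from the referenced commit: the helper name `k_group`, the wavefront size of 64, and the kWidth values passed in are all assumptions made for illustration.

```python
def k_group(non_k_dim: int, k_dim: int, k_width: int, wave_size: int = 64) -> int:
    """Return kGroup = (operand elements per thread) / kWidth.

    Hypothetical helper. Assumes one operand tile of shape
    non_k_dim x k_dim is spread evenly over `wave_size` threads.
    """
    total_elems = non_k_dim * k_dim
    if total_elems % wave_size != 0:
        raise ValueError("operand tile does not divide evenly across the wave")
    elems_per_thread = total_elems // wave_size
    if elems_per_thread % k_width != 0:
        raise ValueError("kWidth does not divide the per-thread element count")
    return elems_per_thread // k_width


if __name__ == "__main__":
    # Operand tile shapes taken from the instruction names 16x16x128 and 32x32x64.
    # kWidth = 16 and kWidth = 32 are assumed values, not taken from the commit.
    print(k_group(16, 128, k_width=16))
    print(k_group(32, 64, k_width=16))
    print(k_group(16, 128, k_width=32))
```

Under these assumptions each thread holds 32 operand elements for either instruction shape, so kGroup depends only on the chosen kWidth.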

Summary: Updates the implementations to all have a causal variant.
Pull Request resolved: facebookexperimental/triton#692
Reviewed By: htyu
Differential Revision: D87807126
Pulled By: njriasan
fbshipit-source-id: 0c9aa6ea90e992581a7aa009c26f75d4b4797602

Jokeren added 2 commits September 21, 2022 12:59

Init extract slice

d8797da

Update

b9adcb5

Jokeren changed the title ~~[WIP] Keren/gen extract slice~~ [WIP] Keren/code gen for extract slice and alloc tensor Sep 22, 2022

Jokeren changed the title ~~[WIP] Keren/code gen for extract slice and alloc tensor~~ [Triton-MLIR] Keren/code gen for extract slice and alloc tensor Sep 22, 2022

Jokeren marked this pull request as ready for review September 22, 2022 18:05

Superjomn approved these changes Sep 23, 2022

goostavz reviewed Sep 23, 2022

Comment thread lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp Outdated

goostavz and others added 5 commits September 22, 2022 23:48

[Triton-MLIR][Backend] Revesmem allocation for non-scratch convert_la…

6c9f096

…yout

Merge branch 'triton-mlir' into keren/gen-extract-slice

c896e93

Temporary commit

171a7fe

Fix test-allocation

9f0df9c

Finish merge

64f9db2

Jokeren mentioned this pull request Sep 23, 2022

[Triton-MLIR][Backend] Revert a bug of smem allocation for non-scratch cvt_layout #697

Closed

Merge branch 'triton-mlir' into keren/gen-extract-slice

26c469d

ptillet enabled auto-merge (squash) September 23, 2022 19:36

ptillet merged commit ecd1bc3 into triton-lang:triton-mlir Sep 23, 2022

ptillet pushed a commit that referenced this pull request Apr 1, 2024

[Triton-MLIR] Keren/code gen for extract slice and alloc tensor (#692)

e444e18

Co-authored-by: gzhu <goostavz@outlook.com>


ptillet merged 8 commits into triton-lang:triton-mlir from Jokeren:keren/gen-extract-slice

4 participants
