
[Triton-MLIR] Keren/code gen for extract slice and alloc tensor#692

Merged
ptillet merged 8 commits into triton-lang:triton-mlir from Jokeren:keren/gen-extract-slice
Sep 23, 2022

Conversation

Jokeren (Contributor) commented Sep 22, 2022

No description provided.

@Jokeren Jokeren changed the title [WIP] Keren/gen extract slice [WIP] Keren/code gen for extract slice and alloc tensor Sep 22, 2022
@Jokeren Jokeren changed the title [WIP] Keren/code gen for extract slice and alloc tensor [Triton-MLIR] Keren/code gen for extract slice and alloc tensor Sep 22, 2022
@Jokeren Jokeren marked this pull request as ready for review September 22, 2022 18:05
Jokeren (Contributor, Author) commented Sep 22, 2022

@goostavz @Superjomn Please let me know if you have any comments.

I'll be working on insert_slice_async after this.

Superjomn (Collaborator) left a comment

LGTM

Comment thread: lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp (outdated)
goostavz (Collaborator)

LGTM, no further comments

Jokeren (Contributor, Author) commented Sep 23, 2022

Please hold off on merging until #701 has been merged into master and triton-mlir.

@ptillet ptillet enabled auto-merge (squash) September 23, 2022 19:36
@ptillet ptillet merged commit ecd1bc3 into triton-lang:triton-mlir Sep 23, 2022
ptillet pushed a commit that referenced this pull request Apr 1, 2024
brunomazzottiamd pushed a commit to brunomazzottiamd/triton that referenced this pull request Jan 29, 2025
* Move preamble code into tikzplot.tex

* Rename kpack to kWidth and allow kWidth = 32

* [API change] Take user input to set dim names

API change:
- For blocked layout, use -tensorShape, which only takes two dims as dim0,dim1
- For dot layout, use -dotShape, which takes three dims as M,N,K

* Re-structure files

Separate each layout's code into their own files

* Extend dotLayout plot to support kWidth=32

- When kWidth is large, use a smaller elemSize horizontally to save
space
- Improve the labels, such as
  - change vec to kWidth for operands
  - change opA/opB to inA/inB and include operand dims
  - remove group dims in the operands so that they don't overlap with
  operand block dims
- Better alignment: dot op and mfma zoomed-in pics are bottom aligned

* [API change] Add support for kGroup

kGroup is defined as the total number of elements per thread divided by
kWidth for one mfma instruction.
We need kGroup = 2 only for the newly added mfma_f32_16x16x128_f8f6f4
and mfma_f32_32x32x64_f8f6f4 with f8 input type on MI350.
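As a worked illustration of the kGroup definition, here is a minimal Python sketch. It assumes a 64-lane wavefront that splits one M×K operand tile evenly across lanes, and it assumes kWidth = 16 for the f8 case; the helper names are hypothetical and not taken from the plotting script.

```python
# Hypothetical helpers illustrating kGroup = (elements per thread) / kWidth.
WAVE_SIZE = 64  # assumption: one AMD wavefront of 64 lanes


def elems_per_thread(mn: int, k: int, wave_size: int = WAVE_SIZE) -> int:
    """Elements of an (mn x k) operand tile held by each lane, assuming an even split."""
    total = mn * k
    if total % wave_size:
        raise ValueError("operand tile does not divide evenly across the wavefront")
    return total // wave_size


def k_group(mn: int, k: int, k_width: int) -> int:
    """Number of kWidth-sized groups each lane holds for one mfma operand."""
    per_thread = elems_per_thread(mn, k)
    if per_thread % k_width:
        raise ValueError("kWidth must divide the per-thread element count")
    return per_thread // k_width


# mfma_f32_16x16x128_f8f6f4: 16 * 128 / 64 = 32 elements per lane.
print(k_group(16, 128, k_width=16))  # 2 (assumed kWidth = 16 for f8)
print(k_group(16, 128, k_width=32))  # 1
# mfma_f32_32x32x64_f8f6f4: 32 * 64 / 64 = 32 elements per lane.
print(k_group(32, 64, k_width=16))   # 2
```

Under these assumptions, kGroup > 1 only appears when the per-lane element count for one instruction exceeds kWidth, which matches the note that kGroup = 2 is needed only for these two instructions.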

* [API change] Add support for data types of both operands

Also print the mfma instruction name accordingly.
For now, mixed-precision mfma between 8-bit and 4- or 6-bit types is not
supported.

* Support mixed mfma with bf8/fp8 and fp6/bf6/f4

* [API change] Add support for scale

* [NFC] Fix format

* [API change] Refactor tensor and LDS layout

- Support data types
- Support both 32 and 64 banks
- Still working on LDS accesses

* [LDS layout] Add support for ds_read access pattern for TN config

- Fixed the issue with the maxPhase computation. A PR is still needed to
fix it in the Triton compiler.
- For ds_read_b64 with 64 banks, there are bank conflicts. We need to
figure out a different swizzling pattern to avoid bank conflicts.
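To make the bank-conflict discussion concrete, here is a simplified Python model. It assumes 4-byte banks (bank = word index mod number of banks), a hypothetical group of 16 lanes each issuing one 8-byte read, and 256-byte rows. It ignores how the hardware splits wide ds_read instructions into phases, so it illustrates the idea rather than exact MI350 behavior. The XOR swizzle is a generic technique, not necessarily the pattern Triton uses.

```python
from collections import defaultdict

ROW_BYTES = 256  # hypothetical row-major tile: 256 bytes per row
VEC_BYTES = 8    # one ds_read_b64-sized access per lane
LANES = 16       # hypothetical group of lanes accessing LDS together


def conflict_degree(byte_addrs, access_bytes, num_banks):
    """Max number of distinct 4-byte words mapped to one bank (1 = conflict-free)."""
    words_per_bank = defaultdict(set)
    for addr in byte_addrs:
        for off in range(0, access_bytes, 4):
            word = (addr + off) // 4
            words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


def unswizzled(row, chunk):
    """Byte address of 8-byte chunk `chunk` in row `row`, with no swizzling."""
    return row * ROW_BYTES + chunk * VEC_BYTES


def xor_swizzled(row, chunk):
    """Generic XOR swizzle: permute the 8-byte chunks of a row by the row index."""
    chunks_per_row = ROW_BYTES // VEC_BYTES
    return row * ROW_BYTES + ((chunk ^ row) % chunks_per_row) * VEC_BYTES


# Each lane reads chunk 0 of its own row, i.e. a column-wise access.
for banks in (32, 64):
    plain = [unswizzled(r, 0) for r in range(LANES)]
    swz = [xor_swizzled(r, 0) for r in range(LANES)]
    print(banks, conflict_degree(plain, VEC_BYTES, banks),
          conflict_degree(swz, VEC_BYTES, banks))
# prints "32 16 1" and then "64 16 1"
```

Because every 256-byte row starts at a multiple of both 32 and 64 banks, the unswizzled column access puts all 16 lanes on the same pair of banks; XOR-ing the chunk index with the row index spreads them out.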

* [LDS layout] Add support for ds_write access pattern

Assumed a basic global access pattern

* [LDS layout] Support access pattern for MN-contig without using
mfma_transpose_load instructions

- Elements along the M/N dim are contiguous in both global memory and
LDS. Note that this is not the in-thread transpose case.
- Swizzling is disabled

* [LDS layout] Support access pattern for MN-contig with mfma_trans_load instructions

* Clean up the code

* [LDS layout] Support padding
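A minimal sketch of why padding helps, assuming 4-byte banks and a hypothetical row-major tile with 256-byte rows: padding each row shifts where the next row starts, so column-wise accesses land in different banks. The function names are illustrative only.

```python
def row_start_bank(row, row_bytes, pad_bytes, num_banks):
    """Bank holding the first 4-byte word of `row` when each row is padded by pad_bytes."""
    stride = row_bytes + pad_bytes
    return (row * stride // 4) % num_banks


def distinct_start_banks(rows, row_bytes, pad_bytes, num_banks):
    """How many different banks the first word of each row falls into."""
    return len({row_start_bank(r, row_bytes, pad_bytes, num_banks) for r in range(rows)})


# 256-byte rows, 64 banks: without padding every row starts in bank 0.
print(distinct_start_banks(16, 256, 0, 64))  # 1
# Padding each row by 8 bytes (two banks) gives 16 distinct starting banks.
print(distinct_start_banks(16, 256, 8, 64))  # 16
print(distinct_start_banks(16, 256, 8, 32))  # 16
```

The trade-off is LDS capacity: 8 bytes of padding per 256-byte row costs about 3% extra LDS, whereas swizzling avoids that overhead at the price of more complex address arithmetic.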

* Reduce the number of required TeX packages
scxiao pushed a commit to scxiao/triton that referenced this pull request Apr 2, 2026
Summary:
Updates the implementations to all have a causal variant.

Pull Request resolved: facebookexperimental/triton#692

Reviewed By: htyu

Differential Revision: D87807126

Pulled By: njriasan

fbshipit-source-id: 0c9aa6ea90e992581a7aa009c26f75d4b4797602

Labels

None yet

Projects

None yet

4 participants