Merge OpenAI Triton commit 100e2aa #1227
Merged
whitneywhtsang merged 13 commits into llvm-target on Jun 2, 2024
Conversation
The `tl.dot` docs have a couple of typos and seem to be missing mention of lower precision dtypes
Co-authored-by: Tori <vwbaker@google.com>
- Correct some indexing errors.
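The `tl.dot` fix above mentions lower-precision dtypes. As a hedged illustration (CPU-side NumPy only, with made-up shapes and values, not code from the PR): `tl.dot` accepts lower-precision inputs such as float16 and, to my understanding, accumulates in float32 by default via its `out_dtype` argument. The sketch below mirrors that accumulation behavior:

```python
import numpy as np

# float16 inputs, as tl.dot would accept on the GPU (values are arbitrary).
a = np.full((16, 16), 0.1, dtype=np.float16)
b = np.full((16, 16), 0.1, dtype=np.float16)

# Accumulate in float32, mirroring tl.dot's default out_dtype behavior.
acc = a.astype(np.float32) @ b.astype(np.float32)

# For comparison: staying in float16 throughout loses precision.
acc_fp16 = (a @ b).astype(np.float32)
print(acc.dtype, float(acc[0, 0]), float(acc_fp16[0, 0]))
```

This is why the docs distinguishing input dtype from accumulator dtype matters: the result dtype is wider than the inputs.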
When processing loops we were incorrectly setting all local defs as live-in. Since part of the code assumes that only live-in variables can be loop-carried dependencies, this was causing a mismatch in the logic. Change it to enforce that a local def that is not live-in cannot be loop-carried.
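The invariant in the commit above can be sketched in a toy model (function and variable names are invented for illustration, not taken from the pass): a value used in the loop body but not defined there is live-in, and only live-in values may be loop-carried.

```python
def compute_live_in(uses, local_defs):
    # Simplified liveness: a value used in the loop body is live-in
    # only if it is not defined inside the body.
    return uses - local_defs

def check_loop_carried(carried, uses, local_defs):
    # The invariant the fix enforces: only live-in values may be
    # loop-carried; a local def that is not live-in cannot be.
    live = compute_live_in(uses, local_defs)
    for v in carried:
        assert v in live, f"{v} is a local def, not live-in; cannot be loop-carried"
    return True

# Example: "t" is defined inside the body, so it is not live-in and
# must not appear in the loop-carried set.
uses, local_defs = {"a", "b", "t"}, {"t"}
print(check_loop_carried({"a"}, uses, local_defs))
```

Marking every local def as live-in (the old behavior) would let `"t"` pass this check, which is exactly the mismatch the commit describes.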
The TritonGPUPipeline pass has unused pass options, and the TritonGPUAccelerateMatmul pass option could instead be read from the module attributes, where the data already exists. The goal is to reduce redundancy. --------- Signed-off-by: Finlay Marno <finlay.marno@codeplay.com>
… it is not needed (#3790) This PR:
- moves the shortcut check earlier, so the scratch buffer shape is not computed when it is not needed
- raises the priority of AMD-specific conversions over common ones, to eliminate uncertainty about which pattern to apply
- adds a regression test for the MFMA to Dot Op shortcut
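Moving the shortcut check earlier follows a common fast-path pattern: test the cheap condition before doing any expensive setup. A generic Python sketch (all names are invented for illustration, not from the PR):

```python
calls = []

def compute_scratch_shape():
    # Stands in for the expensive work: only run when actually needed.
    calls.append("scratch")
    return (128, 128)

def convert(shortcut_applies):
    # Fast path checked first, before any scratch-shape computation.
    if shortcut_applies:
        return "shortcut"
    return f"general{compute_scratch_shape()}"

print(convert(True), convert(False), calls)
```

After the reorder, taking the shortcut never touches `compute_scratch_shape`, which is the point of the change.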
This is a follow-up PR of #3832. `wave` has been replaced with `warp` for consistency between GPUs. Unfortunately there are still remaining uses of `wave` in the code, as listed below, although I've tried to minimize them.

## Referencing AMD features (HIP API or AMDGPU)
third_party/amd/backend/include/hip/, third_party/amd/backend/include/roctracer/, third_party/amd/backend/include/hsa/*:
- Cannot completely replace `wave` here because the definitions come from outside, e.g. __AMDGCN_WAVEFRONT_SIZE, hsa_wavefront_info_t.
- Mixing `warp` and `wave` together in the same place could be even worse.

## Using an amdgpu compiler option
third_party/amd/backend/compiler.py, python/tutorials/03-matrix-multiplication.py, python/tutorials/06-fused-attention.py:
- `waves_per_eu` is supposed to map to the CLANG attribute `amdgpu-waves-per-eu`.
- It is an AMD-only option, so it makes better sense to keep it.
This PR enables support of 3d dot for RDNA GPUs.
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
18d691e...100e2aa
pbchekin approved these changes on Jun 2, 2024
wdziurdz pushed a commit that referenced this pull request on Apr 7, 2026
Reverts intel-tools/intel-xpu-backend-for-triton#1130. This change causes problems in main-js CI when it is merged into main-js. Also, with #1179 we will have a different way of running llama_kernels on a schedule with profiling.
This PR changes the Triton base from 021fb72 to 100e2aa (May 28).
Pass rate: 98.64%
Please do not squash and merge this PR.