Closed
582 commits
830fa08
[frontend] Improve error messages about the support of fp8 types (#4604)
lezcano Sep 4, 2024
cf31278
[Frontend][Backend] Add device-side tma descriptor update API (#4633)
peterbell10 Sep 4, 2024
26c1004
[Pipeliner] Implement dynamic loop peeling (#4534)
sjw36 Sep 4, 2024
ca3fb5f
Use `device` fixture for `runtime/test_cache.py` and `runtime/test_la…
anmyachev Sep 4, 2024
5e3d855
[PROTON] Improve user experience on the CUPTI backend (#4647)
Jokeren Sep 5, 2024
7783738
Pass the target machine to the LLVM pass builder (#4655)
giuseros Sep 5, 2024
9112cfa
[BACKEND] Reduce shared memory usage when pipelining multiple TMA sto…
ThomasRaoux Sep 5, 2024
2f6cec9
[BACKEND] Relax layout supported by SplitOp (#4653)
ThomasRaoux Sep 5, 2024
fc68eae
[NFC] Remove dead code for `hex==True` branch in `getFormatSubstr` fu…
anmyachev Sep 5, 2024
4a56c0e
[NFC] Use `const auto&` instead of `auto` in `bin/triton-tensor-layou…
anmyachev Sep 5, 2024
c902145
[TEST] Use `device` fixture for `assert_helper.py` and `print_helper.…
anmyachev Sep 5, 2024
9a76cfb
[Tutorial] Add device side tensormap update to persistent matmul tuto…
peterbell10 Sep 5, 2024
96f091d
[Pipeliner] Properly fail instead of assert if cannot predicate op (#…
sjw36 Sep 5, 2024
759275d
[frontend] Warn on usage of fp8e4b15 on Hopper rather than error (#4660)
lezcano Sep 5, 2024
4f143d5
[DOC] Workaround for MLIR_ENABLE_DUMP being ignored (#4661)
kapilsh Sep 5, 2024
a4a7043
[NFC] Use `const auto&` instead of `auto` in `ConvertLayoutOpToLLVM.c…
anmyachev Sep 5, 2024
ab417c8
Fix test_reduce1d test with numpy-2.0 (#4649)
Retribution98 Sep 5, 2024
79174e7
[AMD] Enable dynamic peeling for stream pipeliner (#4650)
sjw36 Sep 5, 2024
9414165
[BACKEND] Update llvm to llvm/llvm-project@c08c6a71cfc5 (#4639)
htyu Sep 6, 2024
b933f0f
[Frontend] Warn on implicit casting in the condition in tl.where (#4646)
lezcano Sep 6, 2024
e14ee2d
[Frontend] [BC breaking] Implement PyTorch/JAX/NumPy 2.0 typecast sem…
lezcano Sep 6, 2024
3104056
Revert "[AMD] Disable block merging to avoid block argument explosion…
joviliast Sep 6, 2024
0557336
[Frontend] Improve the error when doing tensor[uint32] * -3 (#4667)
lezcano Sep 7, 2024
63c7d4c
Don't set target machine when using LLVM IR level plugins (#4669)
CRobeck Sep 8, 2024
8a3fb7e
[BACKEND] Allow backend to specify special rules for membar insertion…
ThomasRaoux Sep 9, 2024
8e2d90a
[DOC] Move sections in triton-semantics.rst from ==== to ---- (#4677)
lezcano Sep 9, 2024
4b34a20
[TMA] Increment refcount for Py_None in TMA descriptor runtime helper…
htyu Sep 9, 2024
25324a7
[AMD] Fix uniform offset computation (#4678)
giuseros Sep 9, 2024
a63a477
[BACKEND] Fix potential bug in membar TMA rules (#4681)
ThomasRaoux Sep 9, 2024
e192dba
[AMD] Hoist Q out of the loop for FA optimization (#4666)
oplavsic Sep 9, 2024
7df871d
[BACKEND] Add a loop unroller pass (#4645)
htyu Sep 9, 2024
2df33bb
[Frontend] Add TRITON_DEV_MODE for easier debugging of frontend error…
peterbell10 Sep 10, 2024
58eccfc
[BACKEND] Turn off thread locality optimization based on assumptions …
ThomasRaoux Sep 10, 2024
a0c1bc9
[BE] Accumulator init optimization (#4680)
pawelszczerbuk Sep 10, 2024
944f634
[BACKEND] Support integers in MMA-operand layouts (#4695)
davidberard98 Sep 10, 2024
845d75a
[AMD] Predicate tt.dot to override select in pipeline epilogue (#4694)
sjw36 Sep 10, 2024
6519cfb
[BE] Accumulator Init Optimization - disable optimization if max_num_…
pawelszczerbuk Sep 11, 2024
24f8dbd
[BACKEND] Don't use MMAV3 when K dim is smaller than native size (#4700)
ThomasRaoux Sep 11, 2024
8b33bcf
[Frontend] Fix scf argument order (#4692)
binarman Sep 11, 2024
f03acdc
[FRONTEND] Always use argument names as the key for `constants` and `…
Jokeren Sep 11, 2024
ec4ca60
[AMD] Refactor unsupported conversion decomposition pass (#4262)
binarman Sep 11, 2024
709e3fd
[AMD] Preserve order of load instructions when reordering (#4707)
joviliast Sep 12, 2024
4bb928e
[Backend] Fix arith.index_cast lowering to LLVM. (#4704)
chsigg Sep 12, 2024
0546d03
[python] Fix moving a temp object (#4715)
chsigg Sep 12, 2024
4942d71
[BACKEND][AMD] NFC: Fix initialization order. (#4714)
chsigg Sep 12, 2024
eb58b22
[NFC]Remove dead code in Coalesce pass. (#4687)
linuxlonelyeagle Sep 12, 2024
368c864
[BACKEND] Update LLVM version to llvm/llvm-project@de88b2c (#4703)
chsigg Sep 12, 2024
c238af8
[AMD] Enable masked load and pointer canonicalization pass (#4638)
giuseros Sep 12, 2024
df26ec6
[BACKEND] Improve perf of tensormap_fenceproxy_acquire (#4720)
peterbell10 Sep 12, 2024
84fe9da
Use fast math function for tl.math.log as exp (#4723)
ThomasRaoux Sep 13, 2024
5000e32
[AMD] Add check to fix test_store_cache_modifier for MI300 (#4726)
CRobeck Sep 13, 2024
1f5dc71
[BACKEND] Optimize code style in rewrite-tensor-pointer and add more …
tfruan2000 Sep 13, 2024
c99c214
[BACKEND] Update LLVM version to llvm/llvm-project@36adf8e (#4725)
chsigg Sep 14, 2024
09675e5
[AMD] Disallow reorder tt.load over gpu.barrier (#4735)
sjw36 Sep 16, 2024
a26848c
[Tutorial] Use per-SM descriptors in matmul tutorial (#4682)
peterbell10 Sep 16, 2024
f4c48a9
[BACKEND] Emit an error when the tensor shape is smaller than the sha…
Jokeren Sep 16, 2024
f808819
[AMD] Add load CV cache modifier (#4746)
joviliast Sep 18, 2024
9df25c5
Fix the unstable behavior of 'test_where_warning' (#4747)
Retribution98 Sep 18, 2024
15734f6
[AMD] Add support for gpu.barrier in pipelining epilogue (#4740)
sjw36 Sep 18, 2024
de4a929
Fix ylabel for tutorial 06-fused-attention (#4750)
YarShev Sep 18, 2024
d968a64
[CI][AMD] Clean up ~/.triton after a failed build (#4718)
yiqian1 Sep 18, 2024
5083988
Align resulting computation of GBs, TFLOPs in tutorials (#4752)
YarShev Sep 18, 2024
fad49b2
[Pipeliner] Fixed the epilogue predicate (#4754)
sjw36 Sep 18, 2024
ad0cdfb
Attach the datalayout before optimizing the LLVM module (#4761)
giuseros Sep 19, 2024
3ae95a8
[AMD][CanonicalizePtr] Add a series of fixes for the new pipeliner (#…
giuseros Sep 19, 2024
d6a11a4
[AMD] Add TritonAMDGPU dialect scaffolding (#4685)
oplavsic Sep 20, 2024
0f14676
Allow h.asm["sass"] to work (#4772)
apgoucher Sep 20, 2024
35a8a00
[FRONTEND] Adding unroll loops count to tl.range for scf for (#4662)
plotfi Sep 21, 2024
1e06252
[AMD] Enable the loop unroller (#4773)
htyu Sep 21, 2024
20f4b41
Turn off fast log (#4777)
ThomasRaoux Sep 21, 2024
93c2027
[PROTON] Fix option conflict with subcommands (#4775)
Jokeren Sep 21, 2024
3a647f0
Revert "Use fast math function for tl.math.log as exp (#4723)" (#4779)
ThomasRaoux Sep 21, 2024
576426b
[BACKEND] Switch back to use llvm.load for shared memory load (#4776)
ThomasRaoux Sep 21, 2024
4e73c32
[AMD] Support mfma layout in the prefetch pass (#4771)
jungpark-mlir Sep 24, 2024
6152840
[AMD][NFC] Encapsulate scheduling/pipelining details in StreamPipelin…
sjw36 Sep 24, 2024
4d711bd
[Pipeliner] Fix loop iteration calculation for negative step (#4786)
sjw36 Sep 24, 2024
615bae8
[CI] Fix dependency checks (#4793)
Jokeren Sep 24, 2024
185ad59
Get file, line and col from MLIR instead of inspecting python frames.…
arakhmati-openai Sep 24, 2024
493f991
[TEST] Add a test case to ensure that no errors occur when enabling p…
Jokeren Sep 24, 2024
c120c4c
[BACKEND] Update to llvm/llvm-project@df0864e76110 (#4791)
Moerafaat Sep 25, 2024
eb7925d
[NFC] Simplify getSharedEncIfAllUsersAreDotEnc impl (#4792)
linuxlonelyeagle Sep 25, 2024
694719a
[backend] Fix improper mma->dot shortcut when `warpsPerCTA[1] > 1` (#…
chsigg Sep 25, 2024
59305d7
[Pipeliner] Fixed signedness in predicate logic (#4805)
sjw36 Sep 25, 2024
16c5b26
[AMD] Fix computeBasePtr type specification (#4783)
jungpark-mlir Sep 25, 2024
4348109
[AMD] Add basic instruction scheduling control (#4770)
ravil-mobile Sep 25, 2024
a70d585
[AMD] Turn stream pipeline v2 as the default (#4665)
sjw36 Sep 26, 2024
1b0f9ea
[Backend] Fix device assert inside reduction/scan region (#4811)
peterbell10 Sep 26, 2024
184fb53
[AUTOTUNER] Fix issue in autotuner which may use the wrong value as t…
chengjunlu Sep 26, 2024
e65dd81
[TUTORIAL] Multiple improvements to the tutorials, especially to `09-…
Jokeren Sep 27, 2024
c210764
[SWP] split loads to handle incompatible shared encoding (#4784)
manman-ren Sep 27, 2024
2ef33c6
[SWP] When num_stages = 2, do not pipeline indirect loads (#4721)
manman-ren Sep 27, 2024
e7ec3fe
[SWP] attempt to remove a workaround for a triton llvm codegen bug (#…
manman-ren Sep 27, 2024
0b4feb7
[testing] moved `di = torch._dynamo.device_interface` into backend (#…
ptillet Sep 27, 2024
fe47f98
09-persistent-matmul.py bugfix (#4820)
embg Sep 27, 2024
6af74b2
[FRONTEND] Support passing dtype as constexpr for tma load (#4821)
htyu Sep 28, 2024
755077c
[AMD] Always swap operands of mfma and use mfma.transposed layout (#4…
zhanglx13 Sep 30, 2024
1e88441
[BUILD] Avoid using lld as the linker on macOS (#4827)
antiagainst Sep 30, 2024
80947a2
[tools/triton-tensor-layout] Allow parsing ttgir files with triton_nv…
bertmaher Sep 30, 2024
1df64d1
[AMD] Add alignment information to maskedLoad/maskedStore (#4816)
giuseros Sep 30, 2024
256ef34
[CI][macOS] Pin LLVM version and install lld (#4831)
antiagainst Sep 30, 2024
a6ecc75
[AMD] StreamPipeline V1: fix depArg return mapping (#4832)
davidberard98 Oct 1, 2024
80a5cfb
[PROTON][Experimental] Initialize instruction sampling support for NV…
Jokeren Oct 1, 2024
49266aa
[BACKEND] Linear Layout with stmatrix part 2: support stmatrix for `l…
Jokeren Oct 1, 2024
6c3e953
Add git commit to the version as a suffix (#4812)
antiagainst Oct 1, 2024
7d23ec4
Fix hardcoding of shared address space in target independent code (#4…
kishore-ganesh Oct 1, 2024
6c3e3ae
Bump tj-actions/changed-files from 44 to 45 (#4580)
dependabot[bot] Oct 1, 2024
762a7d1
[AMD][CanonicalizePointers] Propagate the attributes during the rewri…
giuseros Oct 1, 2024
ff02a46
[AMD] Implement dotOperandMfma to linear layout conversion (#4817)
oplavsic Oct 2, 2024
057a9d3
[PROTON] Fix build with clang-17 (#4838)
aobolensk Oct 2, 2024
cd1cc2d
Refactor compiler specializations to consider backend (#4734)
giuseros Oct 2, 2024
b24fa65
Fix assert loc for cases where assert is in an inlined func (#4840)
ThomasRaoux Oct 3, 2024
112b88d
[BACKEND] Support `convert_layout` with `num_ctas > 1` Using Linear L…
Jokeren Oct 3, 2024
819338d
[PROTON] Raise an exception if we try to activate/deactivate a sessio…
Jokeren Oct 3, 2024
5f77e8c
[triton][tool] Add support for printing shared memory layouts in the …
SamGinzburg Oct 3, 2024
33c0c1c
[AMD] Fix shared layout order for batch dimension in pipeline passes …
binarman Oct 3, 2024
1495116
[AMD] Add missing i16 for wmma and disable some tests (#4843)
AlexAUT Oct 3, 2024
b8d8ce9
[Backend] Bypass conversion for suitable blocked to dotOperand layout…
binarman Oct 3, 2024
219c177
[PROTON] Add metric percentage features (#4836)
CRobeck Oct 4, 2024
5f9bb95
[Backend] Copy attributes to new loop in RewriteTensorPointer (#4848)
sjw36 Oct 4, 2024
fdac594
Update llvm/llvm-project@61f8a7f61890 (#4847)
antiagainst Oct 4, 2024
3a44459
[Backend] Fix access to empty std::optional in LinearLayout debug cod…
AlexAUT Oct 4, 2024
1c729d0
[Interpreter] [Tests] Fix condition in test_dot3d (#4852)
binarman Oct 4, 2024
41006e9
[AMD][CanonicalizePtr] Fix the scalar pointer canonicalization (#4851)
giuseros Oct 4, 2024
3a9ddea
[Backend] Use symbol table to lookup smem base (#4853)
peterbell10 Oct 4, 2024
518b26e
[Frontend] Separate tensor and ir value into two different concepts (…
peterbell10 Oct 4, 2024
4ff1fd6
[IR] Add convenience builder function for program id ops (#4855)
peterbell10 Oct 4, 2024
2c498ee
[AMD] Add back "Hint compiler to preload kernel args" (#4830)
zhanglx13 Oct 5, 2024
c54f988
[frontend] added overflow checks in `debug` mode (#4589)
ptillet Oct 6, 2024
2cc227d
[TESTING] Remove the `fast_flush` parameter from `do_bench` (#4485)
int3 Oct 7, 2024
ab07e54
[AUTOTUNER] Make autotuner take `do_bench` as a parameter (#4496)
int3 Oct 7, 2024
53166ef
Allow third-party backends to add submodules to `triton.language.extr…
Alfie-Edwards Oct 7, 2024
ca70f08
Revert https://github.com/triton-lang/triton/pull/4784 (#4865)
ThomasRaoux Oct 8, 2024
9e72047
revert https://github.com/triton-lang/triton/pull/4774 (#4873)
ThomasRaoux Oct 8, 2024
e66f5ab
[BUILD] Add language extras to `.gitignore` (#4872)
Jokeren Oct 8, 2024
82fae4e
[ANALYSIS] Don't consider descending sequences as contiguous in AxisI…
ienkovich Oct 9, 2024
68aa962
Fix type hint on Python 3.8 (#4862)
aobolensk Oct 9, 2024
88f1b53
[PROTON] Make ratio metrics optional (#4874)
Jokeren Oct 9, 2024
8974d06
[frontend] Parse nv_tma_desc attribute when compiling from ttgir (#4875)
bertmaher Oct 10, 2024
e8d7957
[build] Remove LLVM_ABI_BREAKING_CHECKS=FORCE_OFF.
karupayun Oct 10, 2024
cc0cf2d
[frontend] Pretty-print `ptxas` command on failure (#4882)
bertmaher Oct 10, 2024
9c87056
[Frontend] Fix codegen when top level control flow occurs after an un…
peterbell10 Oct 10, 2024
dc233fb
[SWP][Tests] Add one test that triggers SWP error and improve test lo…
sfzhu93 Oct 11, 2024
a8adf9b
Auto import backends in `triton.language.extra` (#4889)
kbumsik Oct 11, 2024
020a1a0
[BACKEND] Avoid undefined behavior in `std::clamp` when `shapePerCTA[…
Jokeren Oct 11, 2024
ddb7098
Fix type hint in setup.py (#4894)
kbumsik Oct 11, 2024
d39ee1f
[IR] Add poison value to triton IR and use in frontend in place of un…
peterbell10 Oct 11, 2024
4daa467
Implement scaled_dot(mxfp8, fp8) via mma (#4795)
lezcano Oct 12, 2024
8966e5c
Fix typo: Correct 'piepling' to 'pipelining' in kernel comments for c…
yuWeiCute Oct 12, 2024
e87f877
[AMD] Count llvm instruction during conversion for scheduling hints (…
ravil-mobile Oct 13, 2024
d6dd04a
[Backend] Update scf.if result uses in RewriteTensorPointer pass (#4893)
yiqian1 Oct 14, 2024
fb90385
[Instrumentation] Move instrumentation lib test to stand alone lib di…
CRobeck Oct 14, 2024
037728b
[AMD] Fix "keep Q tensor in VGPRS" optimization (#4901)
oplavsic Oct 14, 2024
664ac51
[AMD] Sink the 2nd tt.load after local_load's (#4823)
zhanglx13 Oct 14, 2024
fa229d1
Re-enable NumPy 2.0 semantics for add, sub, mul. (#4905)
lezcano Oct 14, 2024
f9688ab
[BACKEND] Small fixes for dot operand properties (#4895)
Jokeren Oct 15, 2024
a60fa8c
[AMD] Fix gfx12 warp size and fix wmma in maybeDeduplicate (#4912)
AlexAUT Oct 15, 2024
ec0bd4a
[Linear Layouts] Implement LL conversion for DotOperand(version=2) (#…
lezcano Oct 15, 2024
55c9576
[CI] Run CI on all PRs (#4917)
peterbell10 Oct 15, 2024
53c2965
[Frontend] Factor out block shape validation function (#4915)
peterbell10 Oct 15, 2024
79ace62
[Pipeliner] Fix epilogue peeling for num_stages=3+ (#4890)
sjw36 Oct 15, 2024
d997364
Update pre-commit config (#4913)
anmyachev Oct 16, 2024
93de426
[AMD] revert optimizations (#4919)
ptillet Oct 16, 2024
9e90089
[Backend] Implement `scaled_dot(mxfp4, fp8)` (#4904)
lezcano Oct 16, 2024
8fb7342
[AMD] unrevert #4901; revert #4823 (#4920)
ptillet Oct 16, 2024
fc8add9
Add the predicate to the instrRepr before returning it when onlyAttac…
arakhmati-openai Oct 16, 2024
6af4f88
Fix 3xTF32 precision issues (#4934)
alexsamardzic Oct 16, 2024
d207894
[BUILD] Add instrumentation libs to `.gitignore` (#4936)
Jokeren Oct 16, 2024
185299e
[BACKEND] Update to llvm/llvm-project@b5cc222d7429 (#4927)
ravil-mobile Oct 16, 2024
d195832
[NFC] Add `#include <string>` into `TritonToTritonGPUPass.h` (#4943)
anmyachev Oct 17, 2024
1883703
[NFC] Make some tests platform independent (#4946)
anmyachev Oct 17, 2024
692143c
[AMD] Add a tt.pointer_range_32 specialization (#4910)
giuseros Oct 17, 2024
538c237
[TEST] Use device fixture for unit tests (#4948)
Retribution98 Oct 18, 2024
bce48c8
[NFC] Make cuda links parameterizable by `system` parameter (#4945)
anmyachev Oct 18, 2024
d4e5a78
[Triton] Use `UnitAttr` in `tt.reshape` definition (#4947)
Oct 18, 2024
76ed94d
[AMD] Remove stream pipeliner v1 (#4845)
sjw36 Oct 18, 2024
4ddebd2
[NFC] Remove duplicated call to function `mlir_check_link_libraries` …
vguerra Oct 20, 2024
b6c4829
[Triton] Verify all `tt.reduce` operands have the same shape (#4957)
Oct 20, 2024
9d424e0
win: __builtin_ctz* for MSVC (#4953)
wkpark Oct 20, 2024
a19f324
Replace open-coded indirect load elimination loop (#4952)
saagarjha Oct 20, 2024
ff306da
[AMD] Introduce amdgpu.buffer_load and amdgpu.buffer_store (#4903)
giuseros Oct 21, 2024
45f7344
[NFC] Removing unused MLIR context (#4964)
vguerra Oct 22, 2024
50080ef
Create dev_conference_2024.md (#4963)
kshama-msft Oct 22, 2024
6a4be78
Pipeline scale_dot (#4950)
lezcano Oct 22, 2024
9357902
[TEST] Reenable mixed precision dot tests (#4965)
Jokeren Oct 22, 2024
ed39cb0
Fix coverity issues (#4967)
anmyachev Oct 22, 2024
1064b59
[BACKEND] Propagate mma layout to following elementwise operations. (…
htyu Oct 22, 2024
c9a40b2
[Build] Remove unnecessary `NVGPUIR` from `TritonGPUToLLVM` (#4977)
makslevental Oct 23, 2024
a20ce64
[AMD] Add MFMA dot operand to LinearLayout conversion (#4961)
binarman Oct 23, 2024
a1aa58b
[BACKEND] Use vectorized atomics on Hopper (#4971)
davidberard98 Oct 23, 2024
6ad95ee
[AUTOTUNER] A quick follow-up for more device-independent do_bench (#…
minjang Oct 23, 2024
4a54311
[BACKEND] Fix when trying to convert an mma<!tt.ptr<f32>> into blocke…
lezcano Oct 23, 2024
3c13f09
[AMD] NFC: Refactor AccelerateAMDMatmul patterns (#4985)
antiagainst Oct 24, 2024
3613bf4
[BACKEND] Fix the register accessing order of dot operands of mmav2 (…
Jokeren Oct 24, 2024
13594bb
Add LL::quotient and remove uses of divideRight and sublayoutIsIdenti…
lezcano Oct 24, 2024
656f60b
[AMD] Add fast_expf to libdevice (#4937)
knwng Oct 24, 2024
17baf40
[AUTOTUNER] Return `num_warmups`, `num_reps` and `use_cuda_graph` fie…
anmyachev Oct 24, 2024
9ead0e0
[BACKEND][NVIDIA] pass ptx-version to ttgir->llir conversion pass and…
davidberard98 Oct 24, 2024
e938e90
[AMD] Rewrite transpose ops in pipeliner to mutable memory (#4969)
AlexAUT Oct 24, 2024
9719dbf
[PROTON] Emit an error for the roctracer backend if `HIP_VISIBLE_DEVI…
Jokeren Oct 24, 2024
258a5bc
[AMD] Add pass to convert tt.load/tt.store to buffer operations (#4966)
giuseros Oct 24, 2024
152ef2d
[AMD] Enable shared->MFMA dot operand conversion through LinearLayout…
binarman Oct 24, 2024
819b371
Add string representation for AttrsDescriptor (#4888)
alexbaden Oct 29, 2024
d47fa58
Update version to 3.2.0
bertmaher Oct 30, 2024
97f6cd0
[Backend] Fix predicates for device assert inside reduction/scan regi…
davidberard98 Nov 5, 2024
9c689f5
[BACKEND] Fix asserts in 2d scan and add assert mode to layout tests …
peterbell10 Nov 5, 2024
b5ceca3
Add back barrier after asserts (#5043)
ThomasRaoux Nov 2, 2024
90d44e4
[FRONTEND] Fix handling of `from m import x as y` in CodeGenerator (#…
davidberard98 Nov 6, 2024
c2143cb
[BACKEND] Make ExternElementwise op implement ConditionallySpeculatab…
davidberard98 Nov 6, 2024
601aec2
[RUNTIME] Pass full kwargs to Autotuner hooks instead of positional a…
aakhundov Nov 7, 2024
24c0fe4
[BACKEND] Fix ProgramPoint passing in AxisInfoAnalysis (#5181)
aakhundov Nov 18, 2024
3e00b0e
[RUNTIME] Add flags for detecting user-defined Autotuner hooks (#5092)
aakhundov Nov 9, 2024
35c6c7c
Revert "[BACKEND] Optimize code generation for load with other arg (#…
davidberard98 Nov 20, 2024
dbc771e
[Release Only Changes] Remove git commit hash in wheel name (#5405)
atalman Dec 11, 2024
8af9311
[release-only] Add manylinux2014_x86_64 for PyTorch release 2.6 (#5414)
atalman Dec 13, 2024
e74f027
[BACKEND] Fix and document logic for creating warp shapes in MMAv3 (#…
bertmaher Dec 17, 2024
aba0fbf
[release/3.2.x] [CHERRY PICK] Add gfx950 target definition (#5452)
jataylo Dec 19, 2024
64b80f0
[release/3.2.x] [CHERRY PICK] [AMD] Fix issue with rank=1 in tryFitCv…
jataylo Dec 23, 2024
ebffad5
Automatic Warp Specialization Optimization (#5622)
htyu Jan 15, 2025
6771065
[cherrypick release/3.2][BACKEND] Fix accumulator init optimization f…
bertmaher Jan 15, 2025
e101b08
[3.2 cherry pick] Ensure device context before launching kernel (#373…
bertmaher Jan 21, 2025
9641643
Release Only - Enable pypi promotion for 3.2.x release (#5618)
atalman Jan 22, 2025
9336065
[INIT] Init flagtree (#1)
zhzhcookie Mar 25, 2025
0f74943
[CI/CD] Init flagtree workflow (#2)
zhzhcookie Mar 26, 2025
f41876b
[DOC] [BUILD] Update build deps download addr (#3)
zhzhcookie Mar 26, 2025
3548338
[DOC] Update readme and release notes (#4)
zhzhcookie Mar 26, 2025
23d2857
[BUILD] Fix ext_sourcedir in editable_wheel mode, and update require…
zhzhcookie Mar 26, 2025
b382dd8
[CI] Fix build and test workflow (#6)
zhzhcookie Mar 26, 2025
a27316f
Revert "AMD requested cherry-picks for release/3.1.x (#4794)"
zhzhcookie Mar 31, 2025
a69557d
Revert "[RELEASE] Cherry-pick use of device-agnostic `DeviceInterface…
zhzhcookie Mar 31, 2025
fc9e223
Revert "Cherry Pick of #4247 to release/3.1.x branch (#4706)"
zhzhcookie Mar 31, 2025
7a9e367
Revert "Cherry Pick of #4311 to release/3.1.x branch (#4705)"
zhzhcookie Mar 31, 2025
1c3ee97
Revert "[Release 3.1.0] Bump version to 3.1.0 and remove temporary ch…
zhzhcookie Mar 31, 2025
af32f8a
Revert "[AMD] Cherry-pick commits from mainline to support Flex atten…
zhzhcookie Mar 31, 2025
7b5fffc
Revert "Repack wheels with build-number 1 (#4354)"
zhzhcookie Mar 31, 2025
6ea9e5f
Revert "[Runtime] Dynamically load cuTensorMapEncodeTiled (#4330) (#4…
zhzhcookie Mar 31, 2025
13b1932
Revert "[VERSION] Bumped to 3.0.0-post1 (#4340)"
zhzhcookie Mar 31, 2025
13a139b
Revert "[RELEASE][BACKEND] Fix getThreadsPerWarp for MD sliced encodi…
zhzhcookie Mar 31, 2025
d7e6ce2
Revert "[RELEASE] [AMD] Additional AMD cherry-picks (#4175)"
zhzhcookie Mar 31, 2025
e0cd80d
Revert "[BACKEND] Prevent for/yield argument number drift (#4097) (#4…
zhzhcookie Mar 31, 2025
a0ac7d8
Revert "[AMD] Add more math functions in libdevice (#4086) (#4163)"
zhzhcookie Mar 31, 2025
efea94b
Revert "Remove redundant options from passes (#4015) (#4162)"
zhzhcookie Mar 31, 2025
d5c6092
Revert "[AMD] Move MFMA shortcut check to not compute scratch buffer …
zhzhcookie Mar 31, 2025
3e3afef
Revert "Use dictionary unpacking instead of merge operator for python…
zhzhcookie Mar 31, 2025
e693ac5
Revert "[BUILD] Change hatchet as an extra dependency (#4138) (#4158)"
zhzhcookie Mar 31, 2025
b65638a
Merge remote-tracking branch 'upstream/release/3.2.x' into triton_v3.2.x
zhzhcookie May 15, 2025
72053b3
[BUILD] Fix merge Triton 3.2.x, disable triton_shared temporarily
zhzhcookie May 15, 2025
05be61b
[BACKEND] Add ascend backend (#8)
zhzhcookie Jun 4, 2025
3a72726
[BUILD] Update build ascend backend (#10)
zhzhcookie Jun 6, 2025
97e653a
[DOC] Update readme (#14)
zhzhcookie Jun 6, 2025
ae40993
[init refactor] refactor setup
StrongSpoon Jun 12, 2025
48 changes: 48 additions & 0 deletions .github/ISSUE_TEMPLATE/bug.yml
@@ -0,0 +1,48 @@
name: Report a bug
description: Report flagtree failing to compile a kernel, or giving incorrect results
labels: ["bug"]

body:
- type: markdown
  attributes:
    value: |
      #### Disclaimer
      The core flagtree team is small and has very limited capacity. We may not have time to look into your report.
      For the best results, please:
      - Avoid submitting duplicates. Search first to see if it's been reported previously.
      - Check if the issue persists with a build from the latest source.
      - Provide all relevant information in the initial report, to prevent unnecessary back and forth discussion.
      - If you can, try to diagnose and/or fix the issue yourself. We welcome high quality contributions.
- type: textarea
  attributes:
    label: Describe the bug
    description: |
      Please provide a clear and concise description of what the bug is.

      If relevant, add a [minimal complete example](https://stackoverflow.com/help/minimal-reproducible-example) that reproduces the bug. It is very important for the snippet to be as simple as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did, so include both the kernel and launching code as well as any relevant imports.

      If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.

      Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
    placeholder: |
      A clear and concise description of what the bug is.

      ```python
      # Sample code to reproduce the problem
      ```

      ```
      The error message you got, with the full traceback.
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: Environment details
    description: |
      Please include any relevant context about how you're running the reproducer e.g. which version of triton, and what GPU you are using.
    placeholder: |
      Triton: ...
      GPU: ...
  validations:
    required: true
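The "Environment details" field above asks for a Triton version and a GPU name. As a rough sketch only (not part of this PR), a reporter could generate those two lines with the standard library. The helper name `environment_report` and the default distribution name `"triton"` are assumptions made for illustration; a FlagTree build may be installed under a different name, and the GPU string is supplied by hand.

```python
from importlib import metadata


def environment_report(gpu_name: str, package: str = "triton") -> str:
    """Format the two fields shown in the template's placeholder."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution name is an assumption; say so instead of guessing.
        version = f"not installed (looked for '{package}')"
    return f"Triton: {version}\nGPU: {gpu_name}"


if __name__ == "__main__":
    # Hypothetical value: replace with the device you are actually using.
    print(environment_report("replace with your GPU model"))
```

The output can be pasted directly into the field. Passing the package name as a parameter keeps the sketch usable if the wheel is published under another name.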
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: false
44 changes: 44 additions & 0 deletions .github/ISSUE_TEMPLATE/performance.yml
@@ -0,0 +1,44 @@
name: Report a performance issue
description: Report cases where triton is generating sub-optimal (but functionally correct) PTX/LLVM IR
labels: ["performance"]

body:
- type: markdown
  attributes:
    value: |
      #### Disclaimer
      The core flagtree team is small and has very limited capacity. We may not have time to look into your report.
      For the best results, please:
      - Avoid submitting duplicates. Search first to see if it's been reported previously.
      - Check if the issue persists with a build from the latest source.
      - Provide all relevant information in the initial report, to prevent unnecessary back and forth discussion.
      - If you can, try to diagnose and/or fix the issue yourself. We welcome high quality contributions.
- type: textarea
  attributes:
    label: Describe the issue
    description: |
      Please provide a clear and concise description of the issue.

      Include a [minimal complete example](https://stackoverflow.com/help/minimal-reproducible-example) that reproduces the issue. It is very important for the snippet to be as simple as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did.

      A reproducer could be a python program that runs a triton kernel and prints out the relevant suboptimal IR, or an IR file with an accompanying triton-opt command.

      If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.
    placeholder: |
      A clear and concise description of the issue.

      ```python
      # Sample code to reproduce the problem
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: Environment details
    description: |
      Please include any relevant context about how you're running the reproducer e.g. which version of triton, and what GPU you are using.
    placeholder: |
      Triton: ...
      GPU: ...
  validations:
    required: true
16 changes: 16 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,16 @@
<!-- PR Title
[component] brief description
component options:
- BUILD
- CI/CD
- DOC
- FRONTEND
- BACKEND
- AUTOTUNER
- CACHE
- LAYOUTS
- PIPELINE
- PROTON
- TEST
- OTHER
-->
32 changes: 32 additions & 0 deletions .github/workflows/ascend-build-and-test.yml
@@ -0,0 +1,32 @@
name: Ascend-Build-And-Test

on:
  push:
    branches: [ "triton_v3.2.x" ]
  pull_request:
    branches: [ "triton_v3.2.x" ]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  ascend-build-and-test:
    runs-on: ascend
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: FlagTree Build on Ascend
        shell: bash
        run: |
          export FLAGTREE_BACKEND=ascend
          source ~/env.sh
          cd python
          MAX_JOBS=32 python3.9 -m pip install . --no-build-isolation

      - name: FlagTree Test on Ascend
        shell: bash
        run: |
          source /usr/local/Ascend/ascend-toolkit/set_env.sh
          python3.9 third_party/ascend/python/tutorials/01-vector-add.py
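Each workflow in this PR uses the same `concurrency.group` expression. The sketch below models how that key is likely to resolve, assuming the usual GitHub Actions behaviour that `a || b` yields the first truthy operand and that `github.event.pull_request.number` is empty on `push` events. The function name and the sample refs are illustrative, not taken from the PR.

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Model of `${{ github.workflow }}-${{ pr.number || github.ref }}`."""
    # On pull_request events the PR number is set; on push events it is
    # empty, so the expression falls back to the pushed ref.
    suffix = pr_number if pr_number else ref
    return f"{workflow}-{suffix}"


# Two pushes to the same PR share a group, so the older run is cancelled.
print(concurrency_group("Ascend-Build-And-Test", 12, "refs/pull/12/merge"))
# A direct push to the branch is grouped by its ref instead.
print(concurrency_group("Ascend-Build-And-Test", None, "refs/heads/triton_v3.2.x"))
```

With `cancel-in-progress: true`, a new run whose key matches a running one replaces it; runs for different PRs or branches get different keys and do not cancel each other.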
26 changes: 26 additions & 0 deletions .github/workflows/cambricon-build-and-test.yml
@@ -0,0 +1,26 @@
name: Cambricon-Build-And-Test

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  cambricon-build-and-test:
    runs-on: cambricon
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: FlagTree Build on Cambricon
        shell: bash
        run: |
          export FLAGTREE_BACKEND=cambricon
          source ~/env.sh
          cd python
          MAX_JOBS=8 pip3 install . --no-build-isolation
21 changes: 21 additions & 0 deletions .github/workflows/code-format-check.yml
@@ -0,0 +1,21 @@
name: Code-Format-Check

on:
  push:
    branches: [ "main", "triton_v3.2.x" ]
  pull_request:
    branches: [ "main", "triton_v3.2.x" ]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: pre-commit/action@v3.0.1
57 changes: 0 additions & 57 deletions .github/workflows/documentation.yml

This file was deleted.

59 changes: 59 additions & 0 deletions .github/workflows/iluvatar-build-and-test.yml
@@ -0,0 +1,59 @@
name: Iluvatar-Build-And-Test

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  iluvatar-build-and-test:
    runs-on: iluvatar
    steps:
      - name: Checkout code (attempt 1)
        id: checkout1
        uses: actions/checkout@v4
        continue-on-error: true

      - name: Sleep before checkout2
        if: steps.checkout1.outcome == 'failure'
        run: |
          echo "First checkout attempt failed. Sleeping for 120 seconds before retry..."
          sleep 120

      - name: Checkout code (attempt 2)
        id: checkout2
        if: steps.checkout1.outcome == 'failure'
        uses: actions/checkout@v4
        continue-on-error: true

      - name: Sleep before final checkout
        if: steps.checkout1.outcome == 'failure' && steps.checkout2.outcome == 'failure'
        run: |
          echo "Second checkout attempt failed. Sleeping for 180 seconds before final retry..."
          sleep 180

      - name: Checkout code (final attempt)
        if: steps.checkout1.outcome == 'failure' && steps.checkout2.outcome == 'failure'
        uses: actions/checkout@v4

      - name: Verify checkout success
        if: success()
        run: echo "Checkout completed successfully"

      - name: FlagTree Build on Iluvatar
        shell: bash
        run: |
          export FLAGTREE_BACKEND=iluvatar
          source ~/env.sh
          cd python
          MAX_JOBS=20 pip3 install . --no-build-isolation

      - name: FlagTree Test on Iluvatar
        shell: bash
        run: |
          CUDA_VISIBLE_DEVICES=15 pytest -s third_party/iluvatar/python/test/unit
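The Iluvatar workflow above unrolls a three-attempt checkout by hand: try, sleep 120 seconds, try again, sleep 180 seconds, then a final attempt whose failure is not masked by `continue-on-error`. The same control flow can be sketched as a small retry helper. This is an illustration of the logic only, not something the workflow runs; `run_with_retries` and its parameters are hypothetical names, and the delays are copied from the `sleep` steps.

```python
import time
from typing import Callable, Sequence


def run_with_retries(
    action: Callable[[], None],
    delays: Sequence[float] = (120, 180),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `action`, retrying after each delay; return the attempt that succeeded."""
    for attempt, delay in enumerate(delays, start=1):
        try:
            action()
            return attempt
        except Exception as exc:
            # Mirrors the continue-on-error checkout steps followed by a sleep step.
            print(f"Attempt {attempt} failed ({exc}); sleeping {delay}s before retry")
            sleep(delay)
    # Final attempt: an exception propagates, like the last checkout step,
    # which has no continue-on-error and therefore fails the job.
    action()
    return len(delays) + 1
```

Injecting `sleep` as a parameter lets the helper be exercised without waiting; in the workflow the equivalent role is played by the conditional `if: steps.checkoutN.outcome == 'failure'` guards.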