Merged
631 commits
716dbef
[Example] Add GQA decoding kernel with varlen page table (#1265)
tzj-fxz Nov 16, 2025
041d4a0
[Refactor] add support for numpy dtype conversion (#1255)
kurisu6912 Nov 17, 2025
a2a2781
[EXAMPLE] In the flash attention example keep the max of all blocks s…
vpj Nov 17, 2025
b3d6f03
[Docs] Improve Installation Guide (#1270)
SiriusNEO Nov 17, 2025
3ab93cd
[Enhancement] Keep max score attention across blocks in FlashAttentio…
Rachmanino Nov 17, 2025
220c323
[Bugfix] Fix multiple cg definition when using T.sync_grid (#1272)
chengyupku Nov 17, 2025
b192251
[Minor] Remove from __future__ import annotations for python 3.8 (#1273)
oraluben Nov 18, 2025
e805f8e
[BugFix] Adding extra parameters into autotune hashkey (#1274)
SiriusNEO Nov 18, 2025
49c8571
Fix various issues under `int64_t` static and dynamic shape. (#1218)
Elevator14B Nov 18, 2025
0f980f1
Bug fix for Gated Delta Net benchmark script (#1267)
learning-chip Nov 18, 2025
1b0efb6
[Bugfix] Minor fix for some cases (#1278)
LeiWang1999 Nov 18, 2025
921b96a
[Language] Add shape check in `T.view/reshape` (#1277)
SiriusNEO Nov 18, 2025
74da369
[FFI] Use tvm ffi as the default execution backend (#1259)
LeiWang1999 Nov 18, 2025
4c8b9ad
[Bugfix] Supply missing `T.print` for bool type (#1279)
LeiWang1999 Nov 19, 2025
cd681e6
[Fix] Fix memory leak bug (#1281)
kurisu6912 Nov 19, 2025
551ac60
[Enhancement] Enhance CUDA compilation by integrating pass context co…
LeiWang1999 Nov 19, 2025
49f3539
Fix the bug in issue #1266 (#1284)
sea-with-sakura Nov 19, 2025
9e67b86
[Language][UX] Nested loop checker in pre-lowering stage (#1288)
SiriusNEO Nov 19, 2025
bef7e52
[Compatibility] Support CUDA 11.3 (#1290)
LeiWang1999 Nov 20, 2025
bccb648
[Feat] Add support for using `T.Tensor(n * 2 + 1)` in function annota…
kurisu6912 Nov 20, 2025
dd7fdb8
[Feat] add support for passing reference in T.Var annotation (#1291)
kurisu6912 Nov 20, 2025
d4b6d09
[Enhancement] Shared Memory Size Can be Dynamic (#1294)
LeiWang1999 Nov 20, 2025
2426090
[Fix] Remove unused let_bindings_ in CodeGenC to fix #1300 (#1305)
kurisu6912 Nov 21, 2025
17bbc0c
[Bugfix] Fallback to the old AtomicAdd implementation for legacy arch…
LeiWang1999 Nov 21, 2025
bf90a5f
[Fix] Fix frame scope error in T.macro (#1308)
kurisu6912 Nov 21, 2025
0d101c1
[WIP] support more dtypes for tcgen05 (#1229)
PannenetsF Nov 21, 2025
470eb74
Improve memory access safety and `T.assume` handling (#1292)
LJC00118 Nov 22, 2025
721baed
[Bugfix] Fix autotune cache (#1315)
LeiWang1999 Nov 22, 2025
9f7bac4
[Refactor] Backup Analyzer to get the appropriate arith informations …
LeiWang1999 Nov 23, 2025
ca98cc3
Revert "[WIP] support more dtypes for tcgen05 (#1229)" (#1323)
LeiWang1999 Nov 24, 2025
fddcbbd
[CI]: Bump actions/checkout from 5 to 6 (#1319)
dependabot[bot] Nov 24, 2025
2a70fd3
[CI]: Bump pypa/cibuildwheel from 3.2 to 3.3 (#1318)
dependabot[bot] Nov 24, 2025
01d207f
[Installation] Fix building using customized TVM path (#1326)
SiriusNEO Nov 24, 2025
6c2162a
[Release] Allow developer with write permission to trigger wheel rele…
oraluben Nov 24, 2025
caa6dd3
[Feat] Support warp reduce (#1316)
Rachmanino Nov 24, 2025
c30df2a
[Enhancement] Support more dtype in `T.print` (#1329)
xwhzz Nov 24, 2025
9dda774
[BugFix] Use BufferRegion in tl.cumsum to infer buffer shape (#1321)
SiriusNEO Nov 24, 2025
b020685
[Fix] fix wrong uint narrowing bug in tvm in #1310 (#1320)
kurisu6912 Nov 25, 2025
71b73e1
[Refactor] Disable strided buffer load inside tvm (#1301) (#1332)
kurisu6912 Nov 25, 2025
2f34840
[Refactor] Moving `NormalizeToBufferRegion` and `MakeAccessPtrFromReg…
LeiWang1999 Nov 25, 2025
2ae4f1b
[Fix] Fix bug copying from or to local buffer (#1304) (#1324)
kurisu6912 Nov 25, 2025
e2b10c5
[Language][UX] Semantic check for parallel fragment access (#1338)
SiriusNEO Nov 25, 2025
f810f97
Add unit tests for T.assume (#1341)
LJC00118 Nov 26, 2025
fac0400
[Feat] Extend LegalizeNegativeIndex to support buffer store stmts (#1…
ConvolutedDog Nov 26, 2025
f5d9da4
[Refactor] Phaseout vmap for Tile Operators (#1334)
LeiWang1999 Nov 26, 2025
f0c721a
[Enhancement] add more dtype and fix mma.ws for fp16 for tcgen05 (#1327)
PannenetsF Nov 26, 2025
17718be
[Refactor] Enhance CopyNode's IterVar Creation and Range Handling (#1…
LeiWang1999 Nov 26, 2025
4f84400
[Fix] Fix missing `not` rewrite in frontend (#1348)
kurisu6912 Nov 26, 2025
6bae64f
[Enhancement] Add support for k_pack in gemm_mfma (#1344)
Gongen-Ali Nov 26, 2025
b8240b7
Add sparse fine-tuning kernel for deepseek sparse attention to exampl…
hyx1999 Nov 27, 2025
1e92d11
[Refactor] Improve assertion handling in CodeGenCHost and ArgBinder (…
LeiWang1999 Nov 27, 2025
36a2b2f
[Refactor] Simplify index sign state handling in LegalizeNegativeInde…
LeiWang1999 Nov 28, 2025
17cfeb7
[Enhancement] Improve error handling and assertion messages across ru…
LeiWang1999 Nov 28, 2025
a4ea7da
[Bugfix] Disable floordiv optimization due to integer overflow risk (…
LJC00118 Nov 28, 2025
c6a19fb
[Bugfix] Fix the jit_kernel issue (#1357)
gfvvz Nov 30, 2025
1b42c87
[Refactor] Update Fragment Indexing in ParallelOpNode's InferLayout M…
LeiWang1999 Dec 1, 2025
b10ef75
[Analysis] Enhance NestedLoopChecker with tile op cases (#1358)
SiriusNEO Dec 1, 2025
283a9a0
[Language] support `T.gemm_sp_v2` on sm80 and sm89 (#1056)
botbw Dec 1, 2025
e547d24
[Bugfix] Update TIR registration for GemmSPPy to use tile operation (…
LeiWang1999 Dec 1, 2025
388ee7e
[Enhancement] Implement dynamic unroll factor in CUDA code generation…
LeiWang1999 Dec 1, 2025
e37f2ea
[CI] [pre-commit.ci] autoupdate (#1362)
pre-commit-ci[bot] Dec 2, 2025
f951b92
[Bugfix] Remove debug print in PyStmtFunctionVisitor (#1363)
LeiWang1999 Dec 2, 2025
d88594a
[Debug] Always include line info in NVCC command for improved profili…
LeiWang1999 Dec 2, 2025
6501bd0
[Refactor] Update condition for benchmarking in example_gemv.py and s…
LeiWang1999 Dec 2, 2025
422fb12
[Enhancement] Add DISABLE_CACHE environment variables (#1368)
SiriusNEO Dec 2, 2025
1da3deb
[Refactor]: Remove useless include in atomicadd_vectorize.h (#1371)
yyttt6 Dec 3, 2025
92121fc
[Refactor] Generalize fp8 process (#1372)
LeiWang1999 Dec 3, 2025
6654064
[Layout] Enhance Free Layout Inference (#1375)
LeiWang1999 Dec 5, 2025
f8e7fef
[Enhancement] Introduce buffer var lca analysis for pass plan buffer …
LeiWang1999 Dec 6, 2025
924225e
[Tool] Provide layout visualization tool (#1353)
Cunxiao2002 Dec 6, 2025
8d019eb
[Release] Relax constraint of tvm-ffi to compatible version (#1373)
oraluben Dec 6, 2025
0921328
[Language] Tilelang LazyJIT Experimental Version (#1337)
kurisu6912 Dec 6, 2025
3f8e6b5
[Builder] Enhance variable name binding and scope management (#1378)
LeiWang1999 Dec 6, 2025
a407c4a
[Bugfix] make cuda driver api compat with cuda12/13, along with tests…
PannenetsF Dec 6, 2025
8f50c12
[Fix] typo in cuda attr (#1380)
PannenetsF Dec 6, 2025
6021f86
[Language V2] Minor fix for complex annotations (#1381)
LeiWang1999 Dec 6, 2025
ce16e47
[Release] Bump Version into 0.1.7 (#1377)
LeiWang1999 Dec 7, 2025
305c854
[Typing] Enhance compatibility for advanced typing features in Python…
LeiWang1999 Dec 7, 2025
d933d65
[Bugfix][Build] Update CMake configuration to remove project root inj…
LeiWang1999 Dec 8, 2025
242b43b
[BugFix] Fix split kernel layout bug of GQA decode (#1386)
tzj-fxz Dec 8, 2025
e7e4e65
[Enhancement] Add debug output methods for Layout and Fragment classe…
kurisu6912 Dec 10, 2025
bc084aa
[Doc] Update logging docs (#1395)
SiriusNEO Dec 10, 2025
f2858fa
[Enhancement] Refactor inflight computing to support dynamic pipeline…
LeiWang1999 Dec 10, 2025
d19142f
[AMD] Fix 3 bugs when build docker on amd mi3x gpu (#1401)
danielhua23 Dec 10, 2025
79d381d
[Typo] Fix tilelang link in README.md (#1402)
senlyu163 Dec 11, 2025
0eb33f2
[Dependency] Update apache-tvm-ffi version to >=0.1.2 (#1400)
LeiWang1999 Dec 11, 2025
53be59d
[AMD] Enable FA2 fwd on AMD MI300X (#1406)
danielhua23 Dec 11, 2025
ede9eaa
[TypoFix] fix typo for SM120 (#1408)
Cunxiao2002 Dec 11, 2025
08262bc
[Doc] Minor documentation update (#1410)
LeiWang1999 Dec 11, 2025
ba2c185
[Dependency] Add torch-c-dlpack-ext to project requirements (#1403)
LeiWang1999 Dec 12, 2025
34632a1
[Dependency] Update TVM subproject to latest commit 2b1ead1a (#1412)
LeiWang1999 Dec 12, 2025
6f67da8
[Enhancement] Introduce `T.__ldg` (#1414)
LeiWang1999 Dec 12, 2025
e84b24b
[Enhancement] Improve vectorization invariant check (#1398)
LJC00118 Dec 12, 2025
2905143
[Lint] Phaseout Yapf format and embrace ruff format (#1417)
LeiWang1999 Dec 12, 2025
3546e2e
[Atomic] Use ptr for atomicAdd dst instead of reference (#1425)
LeiWang1999 Dec 13, 2025
00dd738
[CUDA] Add read-only parameter annotation for CUDA codegen (#1416)
LeiWang1999 Dec 13, 2025
89521e6
[Refactor] Phase out the primitives folder since its design has been …
LeiWang1999 Dec 15, 2025
3aa6938
[CI]: Bump actions/upload-artifact from 5 to 6 (#1431)
dependabot[bot] Dec 15, 2025
87e9e17
[CI]: Bump actions/download-artifact from 6 to 7 (#1432)
dependabot[bot] Dec 15, 2025
fba12a5
[Bugfix] Convey `compile_flags` to ffi compilation path with pass_co…
LeiWang1999 Dec 15, 2025
0788feb
[Enhancement] Improve buffer usage tracking in MakePackedAPI (#1435)
LeiWang1999 Dec 15, 2025
2feaa41
[Enhancement] Improve InjectAssumes logic and make assumes work after…
SiriusNEO Dec 15, 2025
b8003a2
[Enhancement] Include PrimFunc name in memory cache logs for better d…
LeiWang1999 Dec 15, 2025
4dbc910
[CI] Update lint dependencies and fix lint on trunk (#1433)
XuehaiPan Dec 15, 2025
e387102
[Enhancement] Refactor vectorization checks in loop_vectorize (#1440)
LeiWang1999 Dec 15, 2025
bcae814
Enhance vectorized conversion support (#1438)
LJC00118 Dec 15, 2025
869f021
[Feature] Support region as input of T.cumsum (#1426)
Dayuxiaoshui Dec 15, 2025
81b8c1b
[Fix] Fix analyzer bind conflicting (#1446)
kurisu6912 Dec 16, 2025
dda4512
[Refactor] Reduce direct dependency on PyTorch due to its limited typ…
LeiWang1999 Dec 16, 2025
0b6336b
[Refactor] Use `pytest.mark.parametrize` to speedup parallel testing…
kurisu6912 Dec 16, 2025
899f7bd
[Docs] Improve installation instructions for developers (#1450)
SiriusNEO Dec 16, 2025
9c21586
[Feat] Integrate Z3 in TVM Arith Analyzer (#1367)
kurisu6912 Dec 17, 2025
f4f87f4
[Bugfix] Improve autotune from elementwise_add function in examples (…
senlyu163 Dec 17, 2025
0814b17
[Language] Introduce `T.annotate_restrict_buffers` (#1428)
LeiWang1999 Dec 17, 2025
f914f2d
[Analyzer] Require loop extent > 0 when entering loop (#1451)
kurisu6912 Dec 17, 2025
0c25c4f
Update ROCm CI to Nightly-ROCm-7.1 (#1449)
Gongen-Ali Dec 17, 2025
c750fb8
[Enhancement] Update examples and tests for improved type handling fu…
LeiWang1999 Dec 17, 2025
aa19342
[Issue Template] Enable blank issues in GitHub issue template (#1453)
LeiWang1999 Dec 17, 2025
6aaf3c7
[CI] Moved the clang-tidy step to after pip install (#1456)
LeiWang1999 Dec 17, 2025
3ee0939
[Bug] Fix tvm build script when patchelf is not found (#1459)
kurisu6912 Dec 17, 2025
91cf796
[Analyzer] Fix floordiv & floormod bug in z3 prover (#1458)
kurisu6912 Dec 17, 2025
48e70e6
[Cache] Rename sparse compress cache directory (#1460)
LeiWang1999 Dec 17, 2025
cae06ed
[Language] Adds a random number generation capability through curand_k…
silentCoder-dev Dec 18, 2025
a6f59f3
remove unused duplicated type check (#1462)
sgjzfzzf Dec 18, 2025
7248a81
feat(cutedsl): add CuTeDSL backend (#1421)
lucifer1004 Dec 18, 2025
f067260
[Refactor] Rename test for curand & add triton baseline in `test_tile…
silentCoder-dev Dec 19, 2025
f6db201
[ArgBinder] Enhance shape variable handling and assertions (#1467)
LeiWang1999 Dec 19, 2025
1a3a64f
[Language] Make TL scripts friendly to Python syntax highlights (#1466)
SiriusNEO Dec 19, 2025
95e3b5a
[Refactor] Remove triton dependence in testing & move triton baseline…
silentCoder-dev Dec 19, 2025
3516f1e
[Language] Enhance T.dtype.as_torch conversion for compatibility (#1473)
LeiWang1999 Dec 19, 2025
2217eb7
[News] update with latest news (#1475)
LeiWang1999 Dec 19, 2025
168aec7
[Enhancement] Use static Z3 context (#1482)
LeiWang1999 Dec 19, 2025
7e8d1f8
[Enhancement] Enhance let binding handling in layout inference and wa…
LeiWang1999 Dec 20, 2025
a874e4e
[Refactor] Phaseout PassConfig `kDisableDynamicTailSplit` and `kDynam…
LeiWang1999 Dec 21, 2025
a431797
[Enhancement] Optimize the time cost of critical path for IntervalSet…
LeiWang1999 Dec 22, 2025
ba23181
[CI] Add performance regression test script (#1489)
xwhzz Dec 22, 2025
718e398
Pin nvidia-cutlass-dsl to 4.3.3 (#1497)
lucifer1004 Dec 22, 2025
5acaab7
[Language] Remove ConstIf Frame for better meta programming (#1496)
kurisu6912 Dec 22, 2025
6e0982d
[CI] Fix concurrency bug in regression test workflow (#1500)
xwhzz Dec 22, 2025
1d9a2ea
[Refactor] Phaseout legacy `alloc_local` statement in examples and in…
LeiWang1999 Dec 22, 2025
2d8bf3e
[Enhancement] Optimize MHA varlen fwd and support autotune (#1499)
Rachmanino Dec 22, 2025
174fbe1
[Enhancement] Refactor CUDA vectorized cast generation and remove uns…
LJC00118 Dec 22, 2025
3593a73
[Dependency] Update apache-tvm-ffi to >=0.1.6 for memory safety when …
LeiWang1999 Dec 23, 2025
74aef5b
Update cutedsl docs and version check (#1503)
lucifer1004 Dec 23, 2025
4d8e609
[Misc] configure pymarkdown (#1505)
lucifer1004 Dec 23, 2025
e79bbcc
[Language] Fix gemm syntax highlight (#1476)
SiriusNEO Dec 23, 2025
783694f
[Fix] Fix TL_ENABLE_PTXAS_VERBOSE_OUTPUT has no effect in tvm-ffi (#1…
kurisu6912 Dec 23, 2025
11f122e
[Refactor] Phaseout execution_backend `ctypes` (#1510)
LeiWang1999 Dec 23, 2025
c7e8cab
[Testing] Add Memory Leak Test (#1516)
kurisu6912 Dec 24, 2025
09385e7
[Refactor] Support auto swizzling for tma store and phaseout related …
LeiWang1999 Dec 24, 2025
41603f8
[CuTeDSL][Fix] thread safety + context safety (#1513)
lucifer1004 Dec 24, 2025
feb106b
[BugFix] Phaseout unused tests for gqa decode kernels and add the ker…
tzj-fxz Dec 24, 2025
42697c0
[Cleanup] Remove unnecessary macros in tilelang examples (#1514)
Rachmanino Dec 24, 2025
98bc297
Fix ramp_lanes calculation in CUDA codegen (#1518)
LJC00118 Dec 24, 2025
0006621
[Misc] add env for default target/backend/verbose (#1512)
lucifer1004 Dec 24, 2025
bea40bd
[Dtype] Improve host codegen handling for subtype (#1517)
LeiWang1999 Dec 24, 2025
cfccd63
[Bugfix] Fallback to a Linear Layout instead of raising errors (#1521)
LeiWang1999 Dec 24, 2025
2ca5e39
Use `TargetIsCuda` for all cuda target (#1522)
oraluben Dec 24, 2025
d0bcc69
Fix fp4 pointer arithmetic in CUDA codegen (#1524)
LJC00118 Dec 24, 2025
d7e264f
[Enhancement] Improve GitHub Actions permissions check and refine per…
xwhzz Dec 24, 2025
3c11823
[Release] Bump version into 0.1.7.post1 (#1506)
LeiWang1999 Dec 24, 2025
d140415
[Pipeline] Refactor buffer allocation in Inject Pipeline Pass (#1525)
LeiWang1999 Dec 24, 2025
0c3d913
[Dev] Fix when build local version with isolated build (#1487)
oraluben Dec 25, 2025
2b79a76
[Bugfix] Skip stride check for subtype (#1531)
LeiWang1999 Dec 25, 2025
3ce8ac9
[Lint] Enable whitespace and permission bit hooks (#1439)
XuehaiPan Dec 25, 2025
14067c3
[Enhancement][Tool] Tree-style pretty ASTPrinter (#1468)
SiriusNEO Dec 25, 2025
d5d959e
[Fix] Add support for non-var complement arithmetic computation (#137…
kurisu6912 Dec 25, 2025
dff10e5
[BugFix] Complete vectorized loading for common dtypes (#1536)
SiriusNEO Dec 25, 2025
d219f6c
[Compat] Add CUDA version check for __nv_fp8_e8m0 type (#1537)
LeiWang1999 Dec 25, 2025
2e82f37
[Bug] Fix bugs of varlen attention forward examples caused by `S_q !=…
hukongyi Dec 26, 2025
a9d65d9
[Bug] Fix hanging from reduction on sm120 (#1540)
PannenetsF Dec 26, 2025
5bba4df
[example] use T.dynamic instead of tvm.te.var (#1538)
botbw Dec 26, 2025
9ff7c52
[Enhancement] Refactor KernelCache to use inheritance-based design (#…
sgjzfzzf Dec 26, 2025
9b58ed0
[Bugfix] Avoid considering `local.var` buffer as `local` (#1541)
LeiWang1999 Dec 26, 2025
875b42f
[Bugfix] Fix of `T.Fill` for local.var (#1543)
LeiWang1999 Dec 26, 2025
c9371a5
[Z3] Change z3 timeout to rlimit for deterministic prove behavior (#1542)
kurisu6912 Dec 27, 2025
72ce848
[Feat] Adapt gemm v2 for cutedsl backend (#1544)
lucifer1004 Dec 27, 2025
d70cf36
[Enhancement] Support larger `H` in deepseek sparse mla backward via …
Rachmanino Dec 27, 2025
23ede42
[Bugfix] Fix regression test to use installed package instead of sour…
xwhzz Dec 28, 2025
b6ace13
[Refactor] Introduce layout annotations for `ParallelOPNode` and `Cop…
LeiWang1999 Dec 28, 2025
f57956d
[Script] Provide regression test script to help benchmark regression …
LeiWang1999 Dec 28, 2025
470d8b2
[Typing] Update Kernel signature and add type hints for buffer operat…
clouds56 Dec 29, 2025
193eff1
[CI]: Bump actions/upload-artifact from 4 to 6 (#1555)
dependabot[bot] Dec 29, 2025
d317710
Use cuda capability from torch to be more generic (#1557)
oraluben Dec 29, 2025
9f998e3
[CI]: Bump actions/github-script from 7 to 8 (#1556)
dependabot[bot] Dec 29, 2025
27db71f
[Host] Provide post process to customize host code and enhance nullab…
LeiWang1999 Dec 29, 2025
e64961f
[Release] Build tilelang against CUDA 13.1 in CI (#1532)
oraluben Dec 29, 2025
b702299
[LazyJIT] Move Type Annotations to Function Body (#1480)
kurisu6912 Dec 29, 2025
124583b
[bugfix] fix missing logic for clear_accum (#1563)
botbw Dec 29, 2025
f4ad7d3
[Misc] Remove unused `tl_pipeline_sync`. (#1566)
c8ef Dec 29, 2025
cca8b6f
[Refactor] Improve scalarization handling in vectorization logic (#1565)
LeiWang1999 Dec 29, 2025
e23bce7
[Refactor] Simplify do_bench calls by using default warmup and rep pa…
LeiWang1999 Dec 29, 2025
8c9101e
[CI] Refactor PR regression test job conditions (#1569)
xwhzz Dec 29, 2025
0f9bbd7
[Parallel][Infer] Free-mode chooses minimal replication between buffe…
LeiWang1999 Dec 30, 2025
b6a2513
[Refactor] Enhance deterministic ordering in shared memory allocation…
LeiWang1999 Dec 30, 2025
0fa16b4
[Enhancement] Improve equality checks in layout nodes and fragment va…
LeiWang1999 Dec 30, 2025
e1138ad
[Feature] add kUseCooperativeLaunch tag for tvm_ffi (#1572)
silentCoder-dev Dec 31, 2025
7cf1f26
[Refactor] Remove unnecessary logging configuration in Analyzer.py (#…
LeiWang1999 Dec 31, 2025
53ea96c
[Release] Bump version to 0.1.7.post2 (#1575)
LeiWang1999 Dec 31, 2025
15c457f
[BugFix] Change default rounding mode for fp4 conversions (#1580)
LJC00118 Dec 31, 2025
3b7ebc0
[CI] Add CUDA-aware pytest scheduler + auto workers (#1584)
LeiWang1999 Dec 31, 2025
dcacc5a
[Enhancement] Improve performance regression output with timing and s…
xwhzz Dec 31, 2025
0643349
[Bugfix] Add kernel_global_source property to TVMFFIKernelAdapter (#1…
haok1402 Jan 1, 2026
e1f76d1
Add PrimExpr substitution support for AttrStmt nodes (#1583)
LJC00118 Jan 1, 2026
d6eb5d3
[BugFix] fix tcgen5mma example (#1577)
Rachmanino Jan 1, 2026
1cd95ce
Merge mainstream TileLang with TileScale distributed features
uv-xiao Jan 1, 2026
5e368a8
[Doc] Rename docs/merge_tilescale to docs/sync_with_tilelang and add …
uv-xiao Jan 15, 2026
9894b5e
fix a typo
Rachmanino Jan 16, 2026
ecdd597
Remove symbols created by Claude's hallucination
Rachmanino Jan 16, 2026
1dec815
fix include logic in cuda codegen
Rachmanino Jan 16, 2026
de0692b
fix ldst.h
Rachmanino Jan 16, 2026
e2b9ea0
fix more files
Rachmanino Jan 16, 2026
4cbcfb6
migrate from `TIR_REGISTER_TL_OP` to `TIR_REGISTER_TL_TILE_OP`
Rachmanino Jan 16, 2026
c10bf02
let all distributed examples pass
Rachmanino Jan 19, 2026
4344f79
fix deepep regression via applying vectorization
Rachmanino Jan 19, 2026
9124334
fix lint and remove Claude's merge doc
Rachmanino Jan 19, 2026
2173854
fix sdist
Rachmanino Jan 19, 2026
f6c13e9
disable arm and macos
Rachmanino Jan 19, 2026
874e050
fix `dist.yml`
Rachmanino Jan 19, 2026
6ea6d96
disable ci for arm and metal
Rachmanino Jan 19, 2026
de8d36d
fix ts_ext
Rachmanino Jan 19, 2026
141e4d4
use sdist for ci
Rachmanino Jan 19, 2026
109e67a
use tilelang's new ci
Rachmanino Jan 19, 2026
af1906e
use cmake rather than pyproject dependency for tilescale extension
Rachmanino Jan 19, 2026
b51575d
install torch before ts_ext
Rachmanino Jan 19, 2026
78fc8fe
fix torch lib link bug
Rachmanino Jan 28, 2026
add3089
add missing codegen
Rachmanino Jan 28, 2026
b241d88
disable ci test for deepep
Rachmanino Jan 28, 2026
4ce401d
fix gitignore bug
Rachmanino Jan 28, 2026
86b49ec
disable ib for nccl
Rachmanino Jan 28, 2026
71c6800
switch to new ci runner
Rachmanino Jan 29, 2026
ed2e798
lint
Rachmanino Jan 29, 2026
93f8dd0
set num_procs to 2
Rachmanino Jan 29, 2026
6f1fb5c
fix typo
Rachmanino Jan 29, 2026
1b7b053
using tsinghua src for pip
Rachmanino Jan 29, 2026
60450af
refactor CI workflow to remove SDist download step, simplifying artif…
Rachmanino Jan 29, 2026
611d4a0
[BugFix] Add device_ids attribute to BaseAllocator for improved devic…
chengyupku Jan 29, 2026
2629772
[Doc] Update Installation Guide for TileScale: Simplify installation …
chengyupku Jan 29, 2026
9eb91e4
[Feature] Support tvm-ffi for TileScale
Rachmanino Jan 30, 2026
c250854
update DeepEP installation script and
Rachmanino Feb 2, 2026
af40faf
draft for supporting tvm-ffi in deepep
Rachmanino Feb 2, 2026
c3b5392
[Refactor] Update memory management to use constant memory for meta_d…
chengyupku Feb 2, 2026
df240f2
lint fix
chengyupku Feb 2, 2026
dca507d
[Bugfix]ci: add missing - for uv run --script stdin input
rqfeng930 Feb 5, 2026
848a491
[Bugfix]dist: exclude nvshmem and nccl libs from auditwheel repair
rqfeng930 Feb 5, 2026
b1737ad
[Bugfix]dist: disable abi3audit --strict to allow nvshmem builds
rqfeng930 Feb 5, 2026
7e157eb
dist: skip CUDA wheel tests on GitHub-hosted runners
rqfeng930 Feb 5, 2026
40ec322
fix missing loop_break import
chengyupku Feb 6, 2026
93f33f1
[CI] Enhance CI workflow and testing framework for distributed tests
chengyupku Feb 6, 2026
26a07cb
[CI] Refactor distributed test marker to a decorator
chengyupku Feb 6, 2026

6 changes: 4 additions & 2 deletions .clang-tidy
@@ -1,10 +1,12 @@
 ---
 InheritParentConfig: true
-ExtraArgs: ['-v']
+ExtraArgs: []
 FormatStyle: file
 UseColor: true
 WarningsAsErrors: '*'
-ExcludeHeaderFilterRegex: '^(3rdparty|tvm)/.*$'
+# FIXME: Use `ExcludeHeaderFilterRegex` instead when all maintainers upgraded their `clang-tidy`
+HeaderFilterRegex: '^(?!.*(?:/|^)(3rdparty|tvm)/).*'
+# ExcludeHeaderFilterRegex: '^(3rdparty|tvm)/.*$'

 # NOTE: there must be no spaces before the '-', so put the comma last.
 Checks: >-
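To make the effect of the new `HeaderFilterRegex` concrete, the sketch below evaluates it, together with the old `ExcludeHeaderFilterRegex` pattern, against a few example header paths. It uses Python's `re` module as a stand-in: clang-tidy matches with LLVM's own regex engine, so whether lookahead and path forms (relative vs. absolute) behave identically there is an assumption, not something this diff establishes. The function names and paths are hypothetical.

```python
import re

# Patterns copied verbatim from the .clang-tidy diff.
# ASSUMPTION: Python's `re` is used as a stand-in for clang-tidy's LLVM
# regex engine; the two are not guaranteed to behave identically.
NEW_HEADER_FILTER = re.compile(r"^(?!.*(?:/|^)(3rdparty|tvm)/).*")
OLD_EXCLUDE_FILTER = re.compile(r"^(3rdparty|tvm)/.*$")


def passes_new_header_filter(path: str) -> bool:
    """True if `path` matches HeaderFilterRegex, i.e. its diagnostics are kept.

    The negative lookahead rejects any path that contains a whole
    `3rdparty/` or `tvm/` directory segment, at the start or after a '/'.
    """
    return NEW_HEADER_FILTER.match(path) is not None


def excluded_by_old_rule(path: str) -> bool:
    """True if `path` matches the old ExcludeHeaderFilterRegex.

    This pattern is anchored at the start, so it only fires for paths
    that begin with `3rdparty/` or `tvm/`.
    """
    return OLD_EXCLUDE_FILTER.match(path) is not None


if __name__ == "__main__":
    # Hypothetical header paths, chosen only to exercise the patterns.
    for p in [
        "src/op/gemm.h",
        "src/mytvm/helpers.h",
        "3rdparty/cutlass/include/cute/tensor.hpp",
        "/home/user/tilelang/3rdparty/tvm/include/tvm/ir/expr.h",
    ]:
        print(
            f"{p}: new filter keeps={passes_new_header_filter(p)}, "
            f"old rule excludes={excluded_by_old_rule(p)}"
        )
```

Under these semantics the lookahead form also filters absolute paths such as `/home/user/tilelang/3rdparty/...`, which the start-anchored exclude pattern would not match, while a directory that merely contains `tvm` in its name (`mytvm/`) is still checked.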
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -1 +1 @@
-blank_issues_enabled: false
+blank_issues_enabled: true
63 changes: 63 additions & 0 deletions .github/ISSUE_TEMPLATE/release-plan.yml
@@ -0,0 +1,63 @@
+name: "Release Plan"
+description: "Plan the next release"
+title: "[Release Plan] vX.Y.Z"
+labels:
+  - release-plan
+  - tracking
+assignees: []
+body:
+  - type: input
+    id: version
+    attributes:
+      label: "Version"
+      placeholder: "v0.2.0"
+    validations:
+      required: true
+
+  - type: input
+    id: milestone
+    attributes:
+      label: "Milestone"
+      description: "Link or name of the milestone for this release"
+      placeholder: "https://github.com/tile-ai/tilelang/milestone/XX"
+
+  - type: textarea
+    id: scope
+    attributes:
+      label: "Scope"
+      description: "Goals and non-goals (brief)"
+      placeholder: |
+        - Goals: ...
+        - Non-goals: ...
+
+  - type: textarea
+    id: tasks
+    attributes:
+      label: "Tasks"
+      description: "Task list; link issues/PRs"
+      value: |
+        - [ ] Features
+        - [ ] Fixes
+        - [ ] Docs
+        - [ ] API/Breaking changes
+        - [ ] Benchmarks
+        - [ ] Release notes
+
+  - type: checkboxes
+    id: readiness
+    attributes:
+      label: "Readiness"
+      options:
+        - label: "All planned issues closed or deferred"
+        - label: "Docs updated"
+        - label: "CI green; artifacts verified"
+        - label: "Release notes drafted"
+
+  - type: textarea
+    id: notes
+    attributes:
+      label: "Notes"
+      description: "Risks or communications (optional)"
+      placeholder: |
+        - Risk: ...
+        - Communication: ...
8 changes: 4 additions & 4 deletions .github/workflows/amd_ci.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: [self-hosted, amd, gpu]

permissions:
-contents: write
+contents: write

steps:
- name: Checkout repository
@@ -56,7 +56,7 @@ jobs:
echo "------------------------------------"
exit 1
fi

- name: Commit and Push Changes
uses: stefanzweifel/git-auto-commit-action@v5
with:
@@ -86,7 +86,7 @@ jobs:
set -e
REQS_HASH=$(sha256sum requirements-rocm.txt | cut -d ' ' -f 1)
MARKER="${{ runner.tool_cache }}/.venv_marker_${{ env.PYTHON_VERSION }}_${REQS_HASH:0:8}"

echo "Installing requirements"
if [[ -f "$MARKER" ]] && [[ -f "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate" ]]; then
echo "venv exists and hash matches – reuse it"
@@ -117,4 +117,4 @@ jobs:
source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
cd testing/python/amd
unset PYTHONPATH
-python -m pytest -v test_tilelang_test_amd.py
+python -m pytest -v test_tilelang_test_amd.py
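The venv-reuse step in `amd_ci.yml` keys its cache marker on the first 8 hex digits of the SHA-256 of `requirements-rocm.txt`, and only reuses the venv when both the marker file and the venv's `bin/activate` exist. The sketch below reproduces that naming and check in Python for readers reasoning about when the cached venv is picked up. The function names, the `3.12` version string and the `.venv` directory name are hypothetical; in the workflow the real values come from `env.PYTHON_VERSION`, `env.VENV_DIR` and `runner.tool_cache`.

```python
import hashlib
from pathlib import Path


def venv_marker_name(requirements: Path, python_version: str) -> str:
    """Mirror `.venv_marker_${PYTHON_VERSION}_${REQS_HASH:0:8}` from amd_ci.yml.

    `sha256sum` prints a lowercase hex digest, which is also what
    hashlib's hexdigest() returns, so the first 8 characters line up.
    """
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return f".venv_marker_{python_version}_{digest[:8]}"


def can_reuse_venv(tool_cache: Path, venv_dir: str, marker_name: str) -> bool:
    """Mirror the workflow's reuse condition: `[[ -f "$MARKER" ]]` and
    `[[ -f <tool_cache>/<venv_dir>/bin/activate ]]` must both hold."""
    marker = tool_cache / marker_name
    activate = tool_cache / venv_dir / "bin" / "activate"
    return marker.is_file() and activate.is_file()
```

Because the marker name embeds the requirements hash, editing `requirements-rocm.txt` yields a new marker name, so under this scheme the reuse branch is not taken until a marker with the new hash exists.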