[Analysis] Enhance NestedLoopChecker with tile op cases#1358
[Analysis] Enhance NestedLoopChecker with tile op cases#1358LeiWang1999 merged 2 commits intotile-ai:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
WalkthroughAdds a semantic check that forbids TiledOp calls inside Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tilelang/analysis/nested_loop_checker.py (1)
22-26: PotentialIndexErrororAttributeErrorifop.argsis empty or first arg is not a string literal.The function assumes
op.args[0]exists and has a.valueattribute. Iftir.call_externis somehow invoked with no arguments or a non-string first argument, this will raise an unhandled exception. Consider adding a guard:def is_atomic_op(op: Call) -> bool: """Check if a call is an atomic operation.""" - - return op.op == Op.get("tir.call_extern") and op.args[0].value.startswith("Atomic") + if op.op != Op.get("tir.call_extern"): + return False + if len(op.args) == 0: + return False + first_arg = op.args[0] + return hasattr(first_arg, "value") and isinstance(first_arg.value, str) and first_arg.value.startswith("Atomic")testing/python/analysis/test_tilelang_nested_loop_checker.py (1)
604-678: Consider extracting shared test setup to reduce duplication.This function duplicates significant logic from
run_gemm_mixed_pp(lines 472-546), including the reference program and compilation setup. Consider extracting shared setup into a helper function.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
testing/python/analysis/test_tilelang_nested_loop_checker.py(1 hunks)tilelang/analysis/ast_printer.py(1 hunks)tilelang/analysis/nested_loop_checker.py(4 hunks)tilelang/engine/phase.py(0 hunks)
💤 Files with no reviewable changes (1)
- tilelang/engine/phase.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
Repo: tile-ai/tilelang PR: 794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
tilelang/analysis/nested_loop_checker.py
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
testing/python/analysis/test_tilelang_nested_loop_checker.py
🪛 Ruff (0.14.6)
tilelang/analysis/nested_loop_checker.py
63-66: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (7)
tilelang/analysis/ast_printer.py (1)
17-17: LGTM! Enhanced debug output.The additional statement representation improves debugging visibility during AST traversal.
tilelang/analysis/nested_loop_checker.py (2)
50-52: LGTM! Correct use ofsuper().visit_for_(op).Using the base class traversal ensures child nodes (including Call nodes) are visited, which is necessary for the new
visit_call_hook to function correctly. This follows TVM's PyStmtExprVisitor patterns.
61-66: Verify whether TIR built-in operations likeT.maxreachvisit_call_and whether they are correctly classified byis_atomic_op.The
visit_call_method rejects all non-atomic calls inside a parallel context by checkingis_atomic_op(op). However, the testtir_op_with_parallelreportedly usesT.maxinside a parallel loop. IfT.maxproduces aCallnode andis_atomic_opreturnsFalsefor it, this would raise an exception. The fact that the test passes suggests either TIR operations bypass this visitor, use different node types, oris_atomic_ophandles them differently than the current logic implies.Additionally, the error message states "Only elementwise operations are allowed" but the check only permits atomic operations—clarify whether these terms are intentionally distinct or if the message should be updated.
testing/python/analysis/test_tilelang_nested_loop_checker.py (4)
597-598: Good test case for the new restriction.This correctly tests the scenario where a tile operation (
T.gemm) is placed inside aT.Parallel(1)block, which should trigger the new semantic check.
681-694: LGTM! Good coverage for TIR operators inside parallel.This test correctly verifies that TIR built-in operations like
T.maxare permitted inside parallel loops, distinguishing them from tile operations.
697-711: LGTM! Good coverage for atomic operations inside parallel.This test correctly verifies that atomic operations (
T.atomic_add) are permitted inside parallel loops, as expected by the new checker logic.
714-723: Comprehensive test coverage for the new tile op restriction.The test function exercises all three key scenarios: rejection of tile operations in parallel, acceptance of TIR built-ins, and acceptance of atomic operations.
| def is_atomic_op(op: Call) -> bool: | ||
| """Check if a call is an atomic operation.""" | ||
|
|
||
| return op.op == Op.get("tir.call_extern") and op.args[0].value.startswith("Atomic") |
There was a problem hiding this comment.
I think this is a bit tricky, we can collect the definition of tile intrins, let me take a look
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/language/reduce.py (1)
245-257: Updatecumsumdocstring to match the newtl.tileop.cumsumintrinsicThe implementation now emits
tir.op.Op.get("tl.tileop.cumsum"), but the docstring still says it emits thetl.cumsumintrinsic, which is misleading.Suggest updating the docstring to reference the new intrinsic name:
- Negative `dim` indices are normalized (Python-style). If `dst` is None, the operation is performed in-place into `src`. Raises ValueError when `dim` is out of bounds for `src.shape`. When `src.scope() == "local.fragment"`, this delegates to `cumsum_fragment`; otherwise it emits the `tl.cumsum` intrinsic. + Negative `dim` indices are normalized (Python-style). If `dst` is None, the operation is performed in-place into `src`. Raises ValueError when `dim` is out of bounds for `src.shape`. When `src.scope() == "local.fragment"`, this delegates to `cumsum_fragment`; otherwise it emits the `tl.tileop.cumsum` intrinsic.Also applies to: 260-310
🧹 Nitpick comments (1)
tilelang/analysis/nested_loop_checker.py (1)
60-65: Clarify the error message to accurately reflect allowed operations.The error message states "Only elementwise operations are allowed inside a parallel loop," but according to the PR summary, atomic operations should also be permitted within T.Parallel. If atomic operations are indeed allowed, the message should be updated to reflect this.
Consider updating the error message to be more specific:
def visit_call_(self, op: Call) -> None: if self.in_parallel_context and is_tile_op(op): raise ValueError("[Tilelang Semantic Check] " - "Only elementwise operations are allowed inside a parallel loop. " \ + "Tile operations (except atomic operations) are not allowed inside a parallel loop. " \ f"Got a tile-op \"{op.op}\"." )Note: This suggestion assumes atomic operations should be excluded from the check. Please verify based on the outcome of the verification script in the previous comment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
src/op/atomic_add.cc(1 hunks)src/op/copy.cc(2 hunks)src/op/fill.cc(1 hunks)src/op/finalize_reducer.cc(1 hunks)src/op/gemm.cc(1 hunks)src/op/gemm_py.cc(1 hunks)src/op/gemm_sp.cc(1 hunks)src/op/operator.h(1 hunks)src/op/reduce.cc(2 hunks)src/op/region.cc(1 hunks)tilelang/analysis/nested_loop_checker.py(4 hunks)tilelang/language/atomic.py(1 hunks)tilelang/language/copy.py(2 hunks)tilelang/language/experimental/gemm_sp.py(1 hunks)tilelang/language/fill.py(1 hunks)tilelang/language/gemm.py(2 hunks)tilelang/language/reduce.py(9 hunks)tilelang/language/utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
src/op/gemm_py.cc
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
Repo: tile-ai/tilelang PR: 794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
tilelang/analysis/nested_loop_checker.py
🧬 Code graph analysis (11)
tilelang/language/utils.py (1)
tilelang/language/tir/op.py (1)
call_intrin(120-145)
tilelang/language/copy.py (1)
tilelang/language/tir/op.py (1)
call_intrin(120-145)
src/op/finalize_reducer.cc (2)
tilelang/ir.py (1)
FinalizeReducerOp(53-54)tilelang/language/reduce.py (1)
finalize_reducer(313-330)
src/op/gemm_sp.cc (2)
tilelang/ir.py (1)
GemmSP(48-49)tilelang/language/experimental/gemm_sp.py (1)
gemm_sp(9-87)
src/op/copy.cc (2)
tilelang/ir.py (2)
Copy(20-21)Conv2DIm2ColOp(25-26)tilelang/language/copy.py (2)
copy(13-94)c2d_im2col(97-128)
src/op/fill.cc (3)
src/op/fill.h (1)
Fill(45-50)tilelang/ir.py (1)
Fill(10-11)tilelang/language/fill.py (1)
fill(8-36)
src/op/region.cc (1)
tilelang/language/utils.py (1)
region(7-10)
tilelang/language/fill.py (1)
tilelang/language/tir/op.py (1)
call_intrin(120-145)
src/op/gemm_py.cc (1)
tilelang/tileop/gemm/__init__.py (1)
GemmPy(56-192)
tilelang/language/atomic.py (1)
tilelang/language/tir/op.py (1)
call_intrin(120-145)
src/op/atomic_add.cc (1)
tilelang/ir.py (1)
AtomicAdd(15-16)
🪛 Ruff (0.14.6)
tilelang/analysis/nested_loop_checker.py
62-65: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (25)
tilelang/analysis/nested_loop_checker.py (3)
4-4: LGTM!The
Callimport is correctly added to support the newis_tile_opfunction andvisit_call_method.
34-58: LGTM!The switch from manual traversal (
self.visit_stmt(child)andself.visit_stmt(op.body)) tosuper().visit_for_(op)is a good improvement. This ensures all nested statements are properly visited, making the nested loop checking more thorough and less prone to missing edge cases.
21-24: Verify that atomic operations are not incorrectly flagged as tile-ops.The PR summary states that "only atomic operations (such as T.atomic_add) and TIR operators" are permitted within T.Parallel. Since atomic operations now use the
tl.tileop.*namespace (e.g.,tl.tileop.atomicadd), they may have the "TLOpBuilder" attribute set and be incorrectly blocked by this check. This requires verification by examining the atomic operation registrations to confirm whether they set the "TLOpBuilder" attribute, and if so, whetheris_tile_op()should explicitly exclude them.tilelang/language/utils.py (1)
10-10: LGTM!The namespace migration from
tl.regiontotl.tileop.regionaligns with the broader refactoring across the codebase to standardize tile operations under thetl.tileop.*namespace.tilelang/language/gemm.py (2)
119-119: LGTM!Namespace migration from
tl.gemmtotl.tileop.gemmis consistent with the PR's objective to standardize tile operations.
148-148: LGTM!Namespace migration from
tl.gemm_pytotl.tileop.gemm_pyis consistent with the PR's objective to standardize tile operations.tilelang/language/experimental/gemm_sp.py (1)
73-73: LGTM!Namespace migration from
tl.gemm_sptotl.tileop.gemm_spis consistent with the broader refactoring to standardize tile operations.src/op/reduce.cc (2)
481-484: LGTM!The registration macro change from
TIR_REGISTER_TL_OPtoTIR_REGISTER_TL_TILE_OPaligns with the PR's migration to tile-based operator registrations. The inputs and attributes remain unchanged.
566-569: LGTM!The registration macro change from
TIR_REGISTER_TL_OPtoTIR_REGISTER_TL_TILE_OPfor CumSumOp is consistent with the broader migration to tile-based operator registrations.tilelang/language/fill.py (1)
35-36: LGTM!Namespace migration from
tl.filltotl.tileop.fillaligns with the PR's standardization of tile operations under thetl.tileop.*namespace.src/op/fill.cc (1)
212-215: LGTM!The registration macro change from
TIR_REGISTER_TL_OPtoTIR_REGISTER_TL_TILE_OPis consistent with the broader migration to tile-based operator registrations across the codebase.tilelang/language/atomic.py (1)
214-218: LGTM, but note potential interaction with nested loop checker.The namespace migration from
tl.atomicaddtotl.tileop.atomicaddis consistent with the broader refactoring. However, this change is relevant to the concern raised in tilelang/analysis/nested_loop_checker.py: if tile-region-based atomic operations now have the "TLOpBuilder" attribute (due to being registered as tile ops), they may be incorrectly blocked inside T.Parallel loops, despite the PR summary stating that atomic operations should be permitted.This approval is contingent on verifying that the nested loop checker properly handles atomic operations (see verification request in nested_loop_checker.py review).
src/op/gemm_sp.cc (1)
305-308: LGTM! Operator registration macro updated consistently.The registration macro change from
TIR_REGISTER_TL_OPtoTIR_REGISTER_TL_TILE_OPcorrectly aligns with the namespace migration totl.tileop.*. The operator's inputs and attributes remain unchanged, maintaining backward compatibility at the functional level.src/op/gemm.cc (1)
829-832: LGTM! Registration macro updated consistently.The operator registration correctly uses the new
TIR_REGISTER_TL_TILE_OPmacro, migrating fromtl.gemmtotl.tileop.gemmnamespace while preserving all operator attributes.src/op/gemm_py.cc (1)
321-324: LGTM! GemmPy registration updated correctly.The registration macro change maintains consistency with the broader tile-op namespace migration. All operator attributes remain unchanged.
src/op/region.cc (1)
79-82: LGTM! RegionOp registration migrated to tile-op namespace.The macro change correctly registers RegionOp under
tl.tileop.region, consistent with the project-wide namespace migration. The operator's configuration (num_inputs=-1, TCallEffectKind=kPure) remains intact.src/op/finalize_reducer.cc (1)
162-165: LGTM! FinalizeReducerOp registration updated.The operator registration correctly migrates to the tile-op namespace while preserving all attributes (num_inputs=1, TCallEffectKind=kOpaque).
src/op/copy.cc (2)
2040-2043: LGTM! Copy operator registration updated.Registration correctly uses the new tile-op macro and namespace.
2065-2068: LGTM! Conv2DIm2ColOp registration updated.Consistent macro migration for the im2col operator, maintaining all attributes.
tilelang/language/copy.py (2)
93-94: LGTM! Python namespace updated to match C++ registration.The
copyintrinsic call correctly referencestl.tileop.copy, aligning with the C++ operator registration namespace.
127-128: LGTM! Conv2D im2col namespace updated.The
c2d_im2colintrinsic call correctly uses the newtl.tileop.c2d_im2colnamespace, maintaining consistency with the C++ backend.src/op/operator.h (1)
80-88: Key infrastructure change: macro renamed and namespace migrated.This macro definition change is the foundation for the tile-op categorization:
- Macro renamed:
TIR_REGISTER_TL_OP→TIR_REGISTER_TL_TILE_OP- Namespace migrated:
"tl."→"tl.tileop."The change enables the NestedLoopChecker and other analysis passes to distinguish tile operations from other TL operations.
Verify that all references to the old
"tl.{op}"namespace have been migrated to"tl.tileop.{op}"throughout the codebase:
- Search for remaining
TIR_REGISTER_TL_OPmacro usages (old macro name)- Search for
Op::Get("tl.[^t]...)patterns in C++ (old namespace without "tileop")- Search for old namespace references in Python files (Op.get and tir.op.Op.get calls)
tilelang/language/reduce.py (2)
16-16: Centralizing the reduce intrinsic key via_REDUCE_OP_KEYlooks goodUsing a single
_REDUCE_OP_KEY = "tl.tileop.reduce"and wiring alltir.call_intrinsites throughtir.op.Op.get(_REDUCE_OP_KEY)is consistent and reduces the risk of string mismatches across the various reduce buffer-scope combinations. No functional issues spotted.Also applies to: 56-62, 71-77, 84-90, 95-101
313-330: Finalize-reducer intrinsic rename is consistent and self‑documentingThe docstring and implementation both now target
tl.tileop.finalize_reducerviatir.op.Op.get("tl.tileop.finalize_reducer"), aligning with the rest of the tileop-prefixed intrinsics and keeping the Python API unchanged. This looks correct.src/op/atomic_add.cc (1)
542-545: Registration macro migration toTIR_REGISTER_TL_TILE_OPis consistentSwitching
AtomicAddtoTIR_REGISTER_TL_TILE_OP(AtomicAdd, atomicadd)while keepingset_num_inputs(2)and the opaque call effect aligns this op with the newtl.tileop.*registration scheme without altering semantics. No issues from this change alone.
* [Analysis] Enhance NestedLoopChecker with tile op cases * fix tileop issue
This PR explicitly adds check for cases like:
For now we only let atomic operations like
T.atomic_addand TIR operators present in T.Parallel.Summary by CodeRabbit
Tests
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.