[Refactor] Relocate layout transformation of ptx_stmatrix #1689

LeiWang1999 merged 13 commits into tile-ai:main from ptx_stmatrix

Conversation

👋 Hi! Thank you for contributing to the TileLang project. Please remember to run … We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
📝 Walkthrough

Walkthrough

Added full-range checks and early fallback in LDSM copy lowering; removed shared-buffer remapping in copy lowering; implemented layout-aware rewrites for `tvm_access_ptr`/`address_of`/`ptx_*` calls.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
participant IR as IR Call
participant Lower as LowerTileOpPass
participant BMap as buffer_map_/var_remap_
participant Layout as Layout API
participant Rewriter as Call Rewriter
IR->>Lower: visit `tvm_access_ptr`/`address_of`/`ptx_*` call
Lower->>BMap: resolve handle-key or data-key → original buffer/param
BMap-->>Lower: original buffer (+ layout?)
alt layout exists
Lower->>Layout: linear offset → multi-dim indices
Layout-->>Lower: Forward(indices) → remapped indices
Lower->>Layout: compute new total_offset & new_buffer.data
Lower->>Rewriter: build rewritten call (new data ptr, new offset)
Rewriter-->>Lower: rewritten call
Lower-->>IR: replace original call with rewritten call
else no layout
Lower-->>IR: leave call unchanged
end
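To make the offset math in the diagram concrete, a worked example with hypothetical numbers (not values from the PR): for a 64×64 shared buffer, a linear elem_offset of 130 decomposes to multi-dimensional indices (2, 2), since 130 = 2·64 + 2; `Forward` then maps those indices through the swizzled layout, and the result is re-linearized against the new buffer's shape to produce the rewritten offset.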
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/transform/lower_tile_op.cc`:
- Around line 346-379: The new shape may include a prepended replicate dimension
created by makeBufferWithLayout, so before indexing forward_indices in
lower_tile_op (where forward_indices = layout->Forward(multi_dim_indices) and
new_shape is used to compute new_offset) compute the replicate index as
elem_offset divided by the layout extent (use the product of
layout->OutputShape() or a provided layout_extent) (simplify with
analyzer_->Simplify if needed) and prepend that replicate index to
forward_indices (or assert that forward_indices.size() == new_shape.size());
this ensures forward_indices aligns with new_shape and prevents out-of-range
access when the output rank increases due to shared-layout replication.
src/transform/lower_tile_op.cc
Outdated
// Get original and new buffer shapes
Array<PrimExpr> old_shape = original_buffer->shape;
Array<PrimExpr> new_shape = new_buffer->shape;
// Convert linear offset to multi-dimensional indices
Array<PrimExpr> multi_dim_indices;
PrimExpr remaining_offset = elem_offset;
for (int i = static_cast<int>(old_shape.size()) - 1; i >= 0; --i) {
  multi_dim_indices.insert(
      multi_dim_indices.begin(),
      analyzer_->Simplify(floormod(remaining_offset, old_shape[i])));
  remaining_offset = floordiv(remaining_offset, old_shape[i]);
}
// Apply layout transformation
auto forward_indices = layout->Forward(multi_dim_indices);

PrimExpr new_offset = 0;
PrimExpr stride_offset = 1;
for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
  new_offset += forward_indices[i] * stride_offset;
  stride_offset *= new_shape[i];
}
new_offset = analyzer_->Simplify(new_offset);
Array<PrimExpr> new_indices;
for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
  new_indices.insert(new_indices.begin(),
                     floormod(new_offset, new_shape[i]));
  new_offset = floordiv(new_offset, new_shape[i]);
}
PrimExpr total_offset = 0;
PrimExpr new_stride_offset = 1;
for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
  total_offset += new_indices[i] * new_stride_offset;
  new_stride_offset *= new_shape[i];
}
Handle shared-layout replication when output rank increases.
makeBufferWithLayout can prepend a replicate dimension for shared buffers. When that happens, new_shape.size() becomes layout->OutputShape().size() + 1, but the loop computing new_offset indexes forward_indices with the larger rank, causing out-of-range access or incorrect offsets. Please prepend the replicate index (derived from elem_offset / layout_extent) or assert rank consistency before using forward_indices.
🛠️ Proposed fix
- // Apply layout transformation
- auto forward_indices = layout->Forward(multi_dim_indices);
-
- PrimExpr new_offset = 0;
- PrimExpr stride_offset = 1;
- for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
- new_offset += forward_indices[i] * stride_offset;
- stride_offset *= new_shape[i];
- }
+ // Apply layout transformation
+ auto forward_indices = layout->Forward(multi_dim_indices);
+ // If makeBufferWithLayout prepended a replicate dim, prepend it here too.
+ if (new_shape.size() == forward_indices.size() + 1) {
+ int64_t layout_extent = 1;
+ for (const auto& s : layout->OutputShape()) {
+ const auto* imm = s.as<IntImmNode>();
+ ICHECK(imm) << "Layout output shape must be constant integer";
+ layout_extent *= imm->value;
+ }
+ PrimExpr replicate_idx =
+ floordiv(elem_offset, IntImm(elem_offset->dtype, layout_extent));
+ forward_indices.insert(forward_indices.begin(), replicate_idx);
+ } else {
+ ICHECK_EQ(new_shape.size(), forward_indices.size())
+ << "Layout output rank mismatch for remapped buffer";
+ }
+
+ PrimExpr new_offset = 0;
+ PrimExpr stride_offset = 1;
+ for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
+ new_offset += forward_indices[i] * stride_offset;
+ stride_offset *= new_shape[i];
+ }
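A quick arithmetic check of the replicate-index derivation above, with hypothetical numbers: if the layout's output shape is 32×64, `layout_extent` is 2048, so `elem_offset` = 5000 yields a replicate index of ⌊5000/2048⌋ = 2, leaving 5000 − 2·2048 = 904 as the intra-tile offset covered by the Forward-mapped indices.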
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/transform/lower_tile_op.cc (1)
443-449: Same replicate dimension issue as tvm_access_ptr branch. This loop also iterates over `new_shape.size()` but indexes into `forward_indices`, which will be one element smaller when a replicate dimension is prepended. Apply the same fix as recommended for the tvm_access_ptr branch.
♻️ Duplicate comments (1)
src/transform/lower_tile_op.cc (1)
354-361: Handle shared-layout replication when output rank increases. The same issue from the previous review remains: `makeBufferWithLayout` can prepend a replicate dimension for shared buffers. When that happens, `new_shape.size()` becomes `forward_indices.size() + 1`, but the loop indexes `forward_indices` with the larger rank, causing out-of-range access.
🧹 Nitpick comments (2)
src/transform/lower_tile_op.cc (2)
363-374: Consider simplifying redundant index decomposition. After computing `new_offset` (line 362), the code decomposes it into `new_indices` and then recomputes `total_offset`. This round-trip is mathematically equivalent to using `new_offset` directly when indices are in range. If this is intentional (e.g., for normalization), a brief comment would clarify the intent.

♻️ Suggested simplification

  new_offset = analyzer_->Simplify(new_offset);
- Array<PrimExpr> new_indices;
- for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
-   new_indices.insert(new_indices.begin(),
-                      floormod(new_offset, new_shape[i]));
-   new_offset = floordiv(new_offset, new_shape[i]);
- }
- PrimExpr total_offset = 0;
- PrimExpr new_stride_offset = 1;
- for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
-   total_offset += new_indices[i] * new_stride_offset;
-   new_stride_offset *= new_shape[i];
- }
  layout_remap_.Set(new_buffer, layout);
  // Build new tvm_access_ptr call with new buffer and offset
  Array<PrimExpr> new_args = access_ptr_call->args;
  new_args.Set(1, new_buffer->data); // Replace data var
- new_args.Set(2, total_offset);     // Replace offset
+ new_args.Set(2, new_offset);       // Replace offset
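Why the round-trip is a no-op for in-range offsets: a tiny standalone C++ illustration (hypothetical shape and offset, not code from this PR) showing that mixed-radix decomposition followed by recomposition reproduces the original linear offset.

#include <cassert>
#include <vector>

int main() {
  std::vector<int> shape = {4, 8};  // assumed 2-D tile, row-major
  int offset = 13;                  // any value in [0, 4*8)
  // Decompose: 13 -> (1, 5), peeling the innermost dimension first
  std::vector<int> idx(shape.size());
  int rem = offset;
  for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i) {
    idx[i] = rem % shape[i];
    rem /= shape[i];
  }
  // Recompose: (1, 5) -> 1*8 + 5 = 13
  int total = 0, stride = 1;
  for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i) {
    total += idx[i] * stride;
    stride *= shape[i];
  }
  assert(total == offset);  // the round-trip is the identity
  return 0;
}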
428-431: Remove or use dead code. `buffer_row_size` is computed but explicitly discarded with `(void)`. If the `CheckAndGetBufferRowSize` call is needed for its assertion side effect, add a comment. Otherwise, remove the unused computation.

♻️ Suggested fix (remove if not needed)

  auto buffer_map_iter = buffer_map_.find(Downcast<Var>(remap_key->data));
-
- int buffer_row_size = CheckAndGetBufferRowSize(buffer_map_iter->second);
- (void)buffer_row_size;
+ // Validate buffer has at least 2 dimensions (row-major assumption)
+ CheckAndGetBufferRowSize(buffer_map_iter->second);
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/transform/lower_tile_op.cc (1)
435-441: Address same rank mismatch issue and missing cross-tile offset handling. This `address_of` path has the same rank mismatch issue as the `tvm_access_ptr` path when shared buffers have a replicate dimension. Additionally, unlike the `tvm_access_ptr` path (line 364), this path doesn't add `remaining_offset * stride_offset` to handle accesses beyond one tile.

🛠️ Suggested fix

  auto forward_indices = layout.value()->Forward(multi_dim_indices);
+ // Handle prepended replicate dimension for shared buffers
+ if (new_shape.size() == forward_indices.size() + 1) {
+   forward_indices.insert(forward_indices.begin(), remaining_offset);
+   remaining_offset = IntImm(remaining_offset->dtype, 0);
+ }
  PrimExpr new_offset = 0;
  PrimExpr stride_offset = 1;
  for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
    new_offset += forward_indices[i] * stride_offset;
    stride_offset *= new_shape[i];
  }
+ // Add remaining offset for accesses beyond one tile
+ new_offset += remaining_offset * stride_offset;
  new_offset = analyzer_->Simplify(new_offset);
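Note the asymmetry with the `tvm_access_ptr` fix: here `elem_offset` has already been divided through every dimension of `old_shape`, so the leftover `remaining_offset` is exactly the cross-tile quotient, which is why it can serve directly as the replicate index (and is zeroed afterward to avoid double-counting).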
♻️ Duplicate comments (1)
src/transform/lower_tile_op.cc (1)
354-365: Handle shared-layout replication when output rank increases. When `makeBufferWithLayout` prepends a replicate dimension for shared buffers (when `replicate_extent > 1`), `new_shape.size()` becomes `layout->OutputShape().size() + 1`. The loop at line 358 iterates over `new_shape.size()` and accesses `forward_indices[i]`, but `forward_indices` only has `layout->OutputShape().size()` elements, causing an out-of-range access.

The `remaining_offset` handling at line 364 partially addresses replication, but the loop bounds still don't account for the prepended dimension.
🧹 Nitpick comments (1)
src/transform/lower_tile_op.cc (1)
420-423: Remove dead code: `buffer_row_size` is computed but unused. `CheckAndGetBufferRowSize` is called and the result is immediately discarded with `(void)`. This is likely leftover from a previous implementation or incomplete logic. Either use this value or remove the computation.

- int buffer_row_size = CheckAndGetBufferRowSize(buffer_map_iter->second);
- (void)buffer_row_size;
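If the assertion side effect should be kept while making the intent explicit, C++17's `[[maybe_unused]]` attribute is an alternative to the `(void)` cast (a general C++ note, not an existing project convention): `[[maybe_unused]] int buffer_row_size = CheckAndGetBufferRowSize(buffer_map_iter->second);`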
…el source prints; update LowerLDSMCopy to handle non-full range cases and clean up unused code in copy.cc; add debug prints in OptimizeForTarget for shared memory allocation.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/op/copy.cc`:
- Around line 776-789: The full-range check uses the wrong region vector:
local_region is set to src_range when is_ldmatrix is true but it should
correspond to local_tensor (dst for LDSM, src for STSM). Change the assignment
of local_region to use dst_range when is_ldmatrix and src_range otherwise
(local_region = is_ldmatrix ? dst_range : src_range), then leave the existing
analyzer->CanProveEqual loop and fallback to LowerNormalCopy(T, analyzer) as-is.
In `@tilelang/engine/phase.py`:
- Around line 275-279: Replace the unconditional prints around the
MergeSharedMemoryAllocations pass with gated debug output: detect the existing
debug/pass-config flag (the same flag used by the C++ pass) or use a module
logger and only dump the IRModule (mod) when that flag/logger is enabled; apply
this change to the block calling
tilelang.transform.MergeSharedMemoryAllocations(enable_aggressive_merge=enable_aggressive_merge)
so the "Before/After MergeSharedMemoryAllocations" messages and IR dumps are
emitted only when debugging is enabled.
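To make the first prompt concrete, here is a minimal sketch of the corrected full-range check, assuming the names referenced in the review (`is_ldmatrix`, `src_range`/`dst_range`, `local_tensor`, `LowerNormalCopy`); the real signatures in `copy.cc` may differ:

// The "local" tensor is the destination of ldmatrix but the source of
// stmatrix, so the region to check must be chosen accordingly.
const Array<Range>& local_region = is_ldmatrix ? dst_range : src_range;
bool full_range = true;
for (size_t i = 0; i < local_region.size(); ++i) {
  // Every dimension must start at 0 and cover the whole buffer extent.
  if (!analyzer->CanProveEqual(local_region[i]->min, 0) ||
      !analyzer->CanProveEqual(local_region[i]->extent,
                               local_tensor->shape[i])) {
    full_range = false;
    break;
  }
}
if (!full_range) {
  // Only a sub-tile is accessed: fall back to the generic copy lowering.
  return LowerNormalCopy(T, analyzer);
}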
tilelang/engine/phase.py
Outdated
| print("Before MergeSharedMemoryAllocations") | ||
| print(mod) | ||
| mod = tilelang.transform.MergeSharedMemoryAllocations(enable_aggressive_merge=enable_aggressive_merge)(mod) | ||
| print("After MergeSharedMemoryAllocations") | ||
| print(mod) |
Gate IR dumps behind a debug flag.
Unconditional `print(mod)` will spam logs and can be expensive for large IRModules. Please guard this behind a pass-config (ideally the same flag used by the C++ pass) or a debug logger.
💡 Suggested change (guarded debug output)
- print("Before MergeSharedMemoryAllocations")
- print(mod)
+ debug_merge = bool(pass_ctx.config.get("tl.debug_merge_shared_memory_allocations", False))
+ if debug_merge:
+ print("Before MergeSharedMemoryAllocations")
+ print(mod)
mod = tilelang.transform.MergeSharedMemoryAllocations(enable_aggressive_merge=enable_aggressive_merge)(mod)
- print("After MergeSharedMemoryAllocations")
- print(mod)
+ if debug_merge:
+ print("After MergeSharedMemoryAllocations")
+ print(mod)🤖 Prompt for AI Agents
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/transform/lower_tile_op.cc (1)
466-472: Same rank mismatch issue in address_of path. The `address_of` handling has the same potential out-of-range access when `makeBufferWithLayout` prepends a replicate dimension. Apply the same fix as suggested for the `tvm_access_ptr` path.

🛠️ Proposed fix

  auto forward_indices = layout.value()->Forward(multi_dim_indices);
+ // Handle replicate dimension if present
+ if (new_shape.size() == forward_indices.size() + 1) {
+   int64_t layout_extent = 1;
+   for (const auto& s : layout.value()->OutputShape()) {
+     const auto* imm = s.as<IntImmNode>();
+     ICHECK(imm) << "Layout output shape must be constant integer";
+     layout_extent *= imm->value;
+   }
+   PrimExpr replicate_idx =
+       floordiv(smem_offset, IntImm(smem_offset->dtype, layout_extent));
+   forward_indices.insert(forward_indices.begin(), replicate_idx);
+ } else {
+   ICHECK_EQ(new_shape.size(), forward_indices.size())
+       << "Layout output rank mismatch for remapped buffer";
+ }
+
  PrimExpr new_offset = 0;
  PrimExpr stride_offset = 1;
  for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
    new_offset += forward_indices[i] * stride_offset;
    stride_offset *= new_shape[i];
  }
♻️ Duplicate comments (1)
src/transform/lower_tile_op.cc (1)
369-376: Handle shared-layout replication when output rank increases.
`makeBufferWithLayout` can prepend a replicate dimension for shared buffers (see lines 74-76). When that happens, `new_shape.size()` becomes `forward_indices.size() + 1`, but the loop indexes `forward_indices` with the larger rank, causing out-of-range access or incorrect offsets.

🛠️ Proposed fix

  // Apply layout transformation
  auto forward_indices = layout->Forward(multi_dim_indices);
+ // If makeBufferWithLayout prepended a replicate dim, prepend it here too.
+ if (new_shape.size() == forward_indices.size() + 1) {
+   int64_t layout_extent = 1;
+   for (const auto& s : layout->OutputShape()) {
+     const auto* imm = s.as<IntImmNode>();
+     ICHECK(imm) << "Layout output shape must be constant integer";
+     layout_extent *= imm->value;
+   }
+   PrimExpr replicate_idx =
+       floordiv(elem_offset, IntImm(elem_offset->dtype, layout_extent));
+   forward_indices.insert(forward_indices.begin(), replicate_idx);
+ } else {
+   ICHECK_EQ(new_shape.size(), forward_indices.size())
+       << "Layout output rank mismatch for remapped buffer";
+ }
+
  PrimExpr new_offset = 0;
  PrimExpr stride_offset = 1;
  for (int i = static_cast<int>(new_shape.size()) - 1; i >= 0; --i) {
    new_offset += forward_indices[i] * stride_offset;
    stride_offset *= new_shape[i];
  }
🧹 Nitpick comments (2)
src/transform/lower_tile_op.cc (2)
593-600: Clarify the intent with explicit reassignment after mutation. After calling `CopyOnWrite()` and modifying `call_node->args`, the code reads back from `call->args[5]`. While this works because `CopyOnWrite()` modifies the object in place when uniquely referenced, the pattern is subtle and could be clearer. Consider explicitly re-assigning for readability:

♻️ Suggested improvement for clarity

  if (!load_expr.same_as(access_ptr_call->args[0])) {
    auto call_node = call.CopyOnWrite();
-   call_node->args.Set(
-       5, Call(access_ptr_call->dtype, access_ptr_call->op, {load_expr},
-               access_ptr_call->annotations, access_ptr_call->span));
-   access_ptr_call = Downcast<Call>(call->args[5]);
-   access_ptr = call->args[5];
+   PrimExpr new_access_call =
+       Call(access_ptr_call->dtype, access_ptr_call->op, {load_expr},
+            access_ptr_call->annotations, access_ptr_call->span);
+   call_node->args.Set(5, new_access_call);
+   access_ptr = new_access_call;
+   access_ptr_call = Downcast<Call>(new_access_call);
  }
561-563: Consider clarifying the is_ptx_ recursion guard intent. The early return when `is_ptx_` is true acts as a recursion guard during child visitation. This prevents double-processing but assumes PTX intrinsics won't contain nested PTX calls that need independent transformation. While this is likely safe in practice, a brief comment explaining this invariant would help maintainability.

📝 Suggested comment

  if (is_ptx_) {
+   // Recursion guard: when visiting children of a PTX intrinsic, skip
+   // re-processing any nested PTX calls (not expected in practice).
    return Downcast<Call>(op);
  }
as title.
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.