[Enhancement] Support Copy for Buffer Load with scalar indices #946

LeiWang1999 · 2025-10-07T05:02:44Z

for i, j in T.Parallel(block_M, block_N):
    T.copy(A[by * block_M + i, bx * block_N + j], B[by * block_M + i, bx * block_N + j])

will be lowered into

for i, j in T.Parallel(block_M, block_N):
    B[by * block_M + i, bx * block_N + j] = A[by * block_M + i, bx * block_N + j]
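
The equivalence between the two forms can be checked outside TileLang with a plain NumPy model of the tiled loop nest. This is only a sketch: `scalar_copy_reference`, the explicit loops over `by` and `bx`, and the assumption that the shape is a multiple of the block sizes are illustrative and not taken from the PR.

```python
import numpy as np


def scalar_copy_reference(A, block_M, block_N):
    """Emulate the lowered loop nest: one scalar store per (i, j) in each tile."""
    M, N = A.shape
    # Assumption for this sketch: the shape tiles evenly.
    assert M % block_M == 0 and N % block_N == 0
    B = np.empty_like(A)
    for by in range(M // block_M):        # stands in for the block index along M
        for bx in range(N // block_N):    # stands in for the block index along N
            # T.Parallel(block_M, block_N), run sequentially here
            for i in range(block_M):
                for j in range(block_N):
                    B[by * block_M + i, bx * block_N + j] = A[by * block_M + i, bx * block_N + j]
    return B


A = np.arange(8 * 12, dtype=np.float32).reshape(8, 12)
B = scalar_copy_reference(A, block_M=4, block_N=6)
```

Because every element is written exactly once, `B` should end up as an exact copy of `A`, which is the behavior the lowered store is expected to have on the device.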

Summary by CodeRabbit

Bug Fixes
- Resolved a copy operation edge case with unknown extents by safely lowering to a direct store, improving robustness and preventing assertion failures in certain kernels.
Tests
- Added tests for buffer-load copy and parallel copy patterns, including tiled execution and GPU runs, with correctness checks against reference outputs.
- Introduced test helpers to compile and run copy kernels across data types and shapes, enhancing coverage of compilation and runtime behavior.

…n tilelang
- Introduced new functions for buffer load copy with stride and parallel execution.
- Enhanced the copy logic in `copy.py` to simplify nested if statements for BufferLoad nodes.
- Added corresponding test cases for the new buffer load functionalities.

github-actions · 2025-10-07T05:02:55Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `bash format.sh` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

LeiWang1999 · 2025-10-07T05:02:55Z

Also fixes issue #837.

coderabbitai · 2025-10-07T05:03:18Z

Walkthrough

Adds early-lowering logic in tilelang copy to emit a BufferStore when both operands are BufferLoad with undetermined extents, altering control flow. Introduces new tests validating buffer load copy, including a parallel grid-tiling variant, with compilation and runtime checks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core copy lowering `tilelang/language/copy.py` | Adds an early return path: when both operands are BufferLoad and extents are None, lower directly to BufferStore instead of proceeding through the generic assertion/extent deduction path. No API changes. |
| Tests for copy and parallel copy `testing/python/language/test_tilelang_language_copy.py` | Adds test helpers and cases: single BufferLoad copy build; parallel A→B copy via tiled kernel; CUDA compile/run; output equivalence assertions; default configs and dtype handling. |
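
The early return described above can be modeled in a few lines of plain Python. This is a hedged sketch, not the code in `tilelang/language/copy.py`: `BufferLoad`, `BufferStore`, `Tile`, `get_extent`, and `copy_sketch` are hypothetical stand-ins so the example runs without TVM, and the string returned on the generic path is only a placeholder. The assertion message is borrowed from the title of issue #837.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class BufferLoad:
    """Hypothetical stand-in for tir.BufferLoad: a buffer name plus scalar indices."""
    buffer: str
    indices: Tuple[str, ...]


@dataclass(frozen=True)
class BufferStore:
    """Hypothetical stand-in for tir.BufferStore(buffer, value, indices)."""
    buffer: str
    value: BufferLoad
    indices: Tuple[str, ...]


@dataclass(frozen=True)
class Tile:
    """Hypothetical operand that does carry a shape (for example a whole buffer)."""
    shape: Tuple[int, ...]


def get_extent(operand) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: a scalar element access has no extent to report."""
    return getattr(operand, "shape", None)


def copy_sketch(src, dst) -> Union[BufferStore, str]:
    src_extent = get_extent(src)
    dst_extent = get_extent(dst)
    # Early return: both operands are element accesses with no extent,
    # so the copy collapses to a single store dst[indices] = src.
    if (src_extent is None and dst_extent is None
            and isinstance(src, BufferLoad) and isinstance(dst, BufferLoad)):
        return BufferStore(dst.buffer, src, dst.indices)
    # Generic path: at least one extent is needed to deduce the copy region.
    assert src_extent is not None or dst_extent is not None, \
        "Can't deduce copy extents from args"
    return "generic-copy"  # placeholder for the call built by the real lowering


store = copy_sketch(BufferLoad("A", ("i", "j")), BufferLoad("B", ("i", "j")))
generic = copy_sketch(Tile((4, 4)), Tile((4, 4)))
```

Placing the scalar check before the assertion is the point of the change: without it, two scalar operands reach the generic path with no extent on either side and fail there.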

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Copy as copy(...)
  participant Lower as Lowering
  participant Store as BufferStore
  participant Generic as GenericPath

  Caller->>Copy: invoke copy(src, dst)
  Copy->>Lower: analyze operands and extents
  alt both operands are BufferLoad AND extents are None
    note over Lower: New early-lowering path
    Lower-->>Store: emit BufferStore(dst, src)
    Store-->>Caller: return
  else
    Lower-->>Generic: proceed with generic flow (assertions/extent deduction)
    Generic-->>Caller: continue existing lowering
  end

sequenceDiagram
  autonumber
  participant Test as test_tilelang_copy_buffer_load_with_parallel
  participant Build as T.Kernel build
  participant CUDA as CUDA runtime
  participant Run as Kernel launch
  participant Assert as NumPy/Torch assert

  Test->>Build: construct parallel tiled kernel (M,N, block_M, block_N)
  Build-->>CUDA: compile for CUDA
  Test->>Run: execute with random A
  Run-->>Test: produce B
  Test->>Assert: compare B ≈ A within tol
  Assert-->>Test: pass/fail

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit taps code with a gentle thrum,
Buffers now store where loads had come.
Early we hop, no hedging delay,
Tiles align, threads dance in array.
CUDA winds hum a parallel tune—
Copy made swift, beneath the moon. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly summarizes the main change, which is adding support for copy lowering of buffer load operations with scalar indices. It accurately reflects the PR objectives and is specific to the core functionality without extraneous details. |


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tilelang/language/copy.py (1)
48-54: Scalar copy lowering looks correct.

The early return for BufferLoad-to-BufferLoad with undetermined extents correctly generates a direct BufferStore, implementing the scalar copy optimization described in the PR.

A few observations:

- The logic correctly checks that both extents are None and both operands are BufferLoad before returning.
- The BufferStore signature (dst.buffer, src, dst.indices) is correct.
- The coalesced_width, disable_tma, and eviction_policy parameters are bypassed in this path, which is appropriate for scalar operations where bulk-transfer semantics don't apply.

However, note that the function's docstring states Returns: tir.Call, but this new path returns tir.BufferStore. Consider updating the docstring to reflect this conditional return behavior.

Apply this diff to update the docstring:
     Returns:
-        tir.Call: A handle to the copy operation
+        Union[tir.Call, tir.BufferStore]: A handle to the copy operation (tir.BufferStore for scalar copies, tir.Call otherwise)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91d5ef5 and f36286b.

📒 Files selected for processing (2)

testing/python/language/test_tilelang_language_copy.py (1 hunks)
tilelang/language/copy.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

testing/python/language/test_tilelang_language_copy.py (4)

tilelang/language/allocate.py (1)

alloc_local (39-50)

tilelang/language/copy.py (1)

copy (10-86)

tilelang/jit/__init__.py (1)

compile (33-86)

tilelang/transform/pass_config.py (1)

PassConfigKey (6-101)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-test-amd
GitHub Check: format-check

🔇 Additional comments (2)

testing/python/language/test_tilelang_language_copy.py (2)
89-113: Compilation-only test for indirection pattern.

This test validates compilation for the scalar copy case with indirection (indices[pid] → idx[0]), but doesn't execute or verify runtime behavior.

Is runtime validation intentionally skipped? If the indirection pattern is challenging to test end-to-end, consider documenting why (e.g., requires special setup, known limitation, etc.).

120-155: LGTM! Parallel copy test matches PR example.

The test correctly validates the scalar copy lowering inside a parallel loop, matching the exact pattern shown in the PR description:
T.copy(A[by * block_M + i, bx * block_N + j], B[by * block_M + i, bx * block_N + j])
The runtime assertion confirms output matches input within tolerances.
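
A tolerance-based check of this kind can be sketched with NumPy alone. The helper below is an assumption-laden illustration, not the test from the PR: `run_copy_check`, `identity_kernel`, the default shape, and the tolerance values are all hypothetical, and the kernel is modeled as any callable that maps an input array to an output array.

```python
import numpy as np


def run_copy_check(kernel, M=1024, N=1024, dtype=np.float16, rtol=1e-2, atol=1e-2):
    """Feed random input through `kernel` and compare the result to the input."""
    rng = np.random.default_rng(0)  # fixed seed keeps the check reproducible
    a = rng.standard_normal((M, N)).astype(dtype)
    b = kernel(a)
    # A copy should reproduce its input; tolerances only absorb dtype rounding.
    np.testing.assert_allclose(b, a, rtol=rtol, atol=atol)
    return b


def identity_kernel(a):
    """Hypothetical stand-in for a compiled copy kernel."""
    return a.copy()


out = run_copy_check(identity_kernel, M=64, N=64)
```

For a pure copy an exact comparison would also work; keeping tolerances makes the same helper reusable for kernels that convert between data types.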

…n tilelang (tile-ai#946)
- Introduced new functions for buffer load copy with stride and parallel execution.
- Enhanced the copy logic in `copy.py` to simplify nested if statements for BufferLoad nodes.
- Added corresponding test cases for the new buffer load functionalities.

coderabbitai bot reviewed Oct 7, 2025


LeiWang1999 merged commit c61971e into tile-ai:main Oct 7, 2025
6 of 7 checks passed

LeiWang1999 mentioned this pull request Oct 7, 2025

[Bug] Can't deduce copy extents from args #837

Closed

This was referenced Oct 24, 2025

[Language] Initial version of tilelang frontend v2 #1120

Merged

[BugFix] alloc_var init failed to handle complex expression #1144

Merged

This was referenced Nov 3, 2025

[Fix] fix type imcompatible error in #1115 #1180

Merged

[Feat] Add swap like grammar in tuple assignment #1185

Merged

[Fix] Remove unsupported type params #1186

Merged

[Feat] Add support for T.serial with step and negative step #1188

Merged

coderabbitai bot mentioned this pull request Nov 4, 2025

[Feature] Enhance fill operation to support various buffer types #1189

Merged

This was referenced Nov 5, 2025

[Feat] Add A Pass to Handle Negative Index #1192

Merged

[Fix] Fix buffer re-import typo in tilelang.languge #1214

Merged

[Fix] Fix a type that make wrong T.macro backtrace #1234

Merged

kurisu6912 mentioned this pull request Nov 12, 2025

[Language] Add type stubs for tir op #1239

Merged

coderabbitai bot mentioned this pull request Nov 13, 2025

[Refactor] Update buffer handling in copy and atomic operations #1247

Merged

This was referenced Nov 21, 2025

[Feat] Add missing support for uint32x2, add unsigned implicit cast in bitwise op, add T.Ref as macro annotation #1302

Closed

[Fix] Remove unused let_bindings_ in CodeGenC to fix #1300 #1305

Merged

[Fix] Fix frame scope error in T.macro #1308

Merged

coderabbitai bot mentioned this pull request Nov 26, 2025

[Refactor] Enhance CopyNode's IterVar Creation and Range Handling #1346

Merged
