@LeiWang1999
Member

@LeiWang1999 LeiWang1999 commented Oct 7, 2025

for i, j in T.Parallel(block_M, block_N):
    T.copy(A[by * block_M + i, bx * block_N + j], B[by * block_M + i, bx * block_N + j])

will be lowered into

for i, j in T.Parallel(block_M, block_N):
    B[by * block_M + i, bx * block_N + j] = A[by * block_M + i, bx * block_N + j]
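
In reference terms, the lowered loop is one scalar store per (i, j) position in each tile. Below is a minimal pure-Python sketch of those semantics. It is not TileLang code: `tiled_parallel_copy` and the nested-list buffers are illustrative stand-ins, and the sketch assumes M and N are exact multiples of the block sizes.

```python
def tiled_parallel_copy(A, block_M, block_N):
    """Emulate the lowered loop: one scalar store per (i, j) in each tile.

    Assumption: len(A) is a multiple of block_M and len(A[0]) is a
    multiple of block_N, so the tiles cover the matrix exactly.
    """
    M, N = len(A), len(A[0])
    B = [[None] * N for _ in range(M)]
    for by in range(M // block_M):        # block index along rows
        for bx in range(N // block_N):    # block index along columns
            # The stores are independent of each other, so a sequential
            # loop gives the same result as a parallel one.
            for i in range(block_M):
                for j in range(block_N):
                    row, col = by * block_M + i, bx * block_N + j
                    B[row][col] = A[row][col]
    return B


A = [[r * 8 + c for c in range(8)] for r in range(4)]
B = tiled_parallel_copy(A, block_M=2, block_N=4)
print(B == A)
```

Every output element is written exactly once, which is why the scalar form needs no extent information.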

Summary by CodeRabbit

  • Bug Fixes

    • Resolved a copy operation edge case with unknown extents by safely lowering to a direct store, improving robustness and preventing assertion failures in certain kernels.
  • Tests

    • Added tests for buffer-load copy and parallel copy patterns, including tiled execution and GPU runs, with correctness checks against reference outputs.
    • Introduced test helpers to compile and run copy kernels across data types and shapes, enhancing coverage of compilation and runtime behavior.

…n tilelang

- Introduced new functions for buffer load copy with stride and parallel execution.
- Enhanced the copy logic in `copy.py` to simplify nested if statements for BufferLoad nodes.
- Added corresponding test cases for the new buffer load functionalities.
@github-actions

github-actions bot commented Oct 7, 2025

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `bash format.sh` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@LeiWang1999
Member Author

Also fixes issue #837.

@coderabbitai
Contributor

coderabbitai bot commented Oct 7, 2025

Walkthrough

Adds early-lowering logic in tilelang copy to emit a BufferStore when both operands are BufferLoad with undetermined extents, altering control flow. Introduces new tests validating buffer load copy, including a parallel grid-tiling variant, with compilation and runtime checks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core copy lowering: tilelang/language/copy.py | Adds an early return path: when both operands are BufferLoad and extents are None, lower directly to BufferStore instead of proceeding through the generic assertion/extent deduction path. No API changes. |
| Tests for copy and parallel copy: testing/python/language/test_tilelang_language_copy.py | Adds test helpers and cases: single bufferload copy build; parallel A→B copy via tiled kernel; CUDA compile/run; output equivalence assertions; default configs and dtype handling. |
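
The early return path summarized above can be pictured with a small dispatch sketch. It uses plain dataclasses as stand-ins for the TVM node types, so `BufferLoad`, `BufferStore`, `Tile`, `get_extent`, and `lower_copy` here are all hypothetical names for illustration, not the code in `copy.py`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class BufferLoad:
    """Stand-in for a scalar read such as A[i, j] (not the TVM class)."""
    buffer: str
    indices: Tuple[object, ...]


@dataclass(frozen=True)
class BufferStore:
    """Stand-in for a scalar write: buffer[indices] = value."""
    buffer: str
    value: object
    indices: Tuple[object, ...]


@dataclass(frozen=True)
class Tile:
    """Stand-in for an operand whose shape (extent) is known."""
    shape: Tuple[int, ...]


def get_extent(operand) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: a scalar BufferLoad carries no tile extent."""
    return operand.shape if isinstance(operand, Tile) else None


def lower_copy(src, dst) -> Union[BufferStore, str]:
    src_extent, dst_extent = get_extent(src), get_extent(dst)
    # Early path: both sides are scalar loads with undetermined extents,
    # so emit a direct store before the extent assertion is reached.
    if (src_extent is None and dst_extent is None
            and isinstance(src, BufferLoad) and isinstance(dst, BufferLoad)):
        return BufferStore(dst.buffer, src, dst.indices)
    # Generic path: at least one extent must be known.
    assert src_extent is not None or dst_extent is not None, \
        "cannot deduce copy extents"  # illustrative message
    return "generic-copy"  # placeholder for the tile-level copy call


store = lower_copy(BufferLoad("A", ("i", "j")), BufferLoad("B", ("i", "j")))
print(store)
print(lower_copy(Tile((4, 4)), Tile((4, 4))))
```

Placing the scalar check before the assertion is what keeps the two-`BufferLoad` case from tripping the extent check.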

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Copy as copy(...)
  participant Lower as Lowering
  participant Store as BufferStore
  participant Generic as GenericPath

  Caller->>Copy: invoke copy(src, dst)
  Copy->>Lower: analyze operands and extents
  alt both operands are BufferLoad AND extents are None
    note over Lower: New early-lowering path
    Lower-->>Store: emit BufferStore(dst, src)
    Store-->>Caller: return
  else
    Lower-->>Generic: proceed with generic flow (assertions/extent deduction)
    Generic-->>Caller: continue existing lowering
  end
sequenceDiagram
  autonumber
  participant Test as test_tilelang_copy_buffer_load_with_parallel
  participant Build as T.Kernel build
  participant CUDA as CUDA runtime
  participant Run as Kernel launch
  participant Assert as NumPy/Torch assert

  Test->>Build: construct parallel tiled kernel (M,N, block_M, block_N)
  Build-->>CUDA: compile for CUDA
  Test->>Run: execute with random A
  Run-->>Test: produce B
  Test->>Assert: compare B ≈ A within tol
  Assert-->>Test: pass/fail
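
The final comparison step in the diagram above ("B ≈ A within tol") can be sketched in pure Python. The check below uses the common convention |actual − expected| ≤ atol + rtol · |expected|. The helper name `assert_close` and the tolerance values are assumptions for illustration, not taken from the test file.

```python
import random


def assert_close(actual, expected, rtol=1e-2, atol=1e-2):
    """Raise AssertionError unless every element satisfies
    |actual - expected| <= atol + rtol * |expected|."""
    assert len(actual) == len(expected), "length mismatch"
    for idx, (a, e) in enumerate(zip(actual, expected)):
        if abs(a - e) > atol + rtol * abs(e):
            raise AssertionError(f"mismatch at index {idx}: {a} vs {e}")


random.seed(0)
A = [random.uniform(-1.0, 1.0) for _ in range(16)]  # random input
B = list(A)                                         # stand-in for the kernel output
assert_close(B, A)
print("copy matches reference")
```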

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit taps code with a gentle thrum,
Buffers now store where loads had come.
Early we hop, no hedging delay,
Tiles align, threads dance in array.
CUDA winds hum a parallel tune—
Copy made swift, beneath the moon. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly summarizes the main change, which is adding support for copy lowering of buffer load operations with scalar indices. It accurately reflects the PR objectives and is specific to the core functionality without extraneous details. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tilelang/language/copy.py (1)

48-54: Scalar copy lowering looks correct.

The early return for BufferLoad-to-BufferLoad with undetermined extents correctly generates a direct BufferStore, implementing the scalar copy optimization described in the PR.

A few observations:

  • The logic correctly checks both extents are None and both operands are BufferLoad before returning
  • The BufferStore signature (dst.buffer, src, dst.indices) is correct
  • The coalesced_width, disable_tma, and eviction_policy parameters are bypassed in this path, which is appropriate for scalar operations where bulk-transfer semantics don't apply

However, note that the function's docstring states Returns: tir.Call, but this new path returns tir.BufferStore. Consider updating the docstring to reflect this conditional return behavior.

Apply this diff to update the docstring:

     Returns:
-        tir.Call: A handle to the copy operation
+        Union[tir.Call, tir.BufferStore]: A handle to the copy operation (tir.BufferStore for scalar copies, tir.Call otherwise)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91d5ef5 and f36286b.

📒 Files selected for processing (2)
  • testing/python/language/test_tilelang_language_copy.py (1 hunks)
  • tilelang/language/copy.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/language/test_tilelang_language_copy.py (4)
tilelang/language/allocate.py (1)
  • alloc_local (39-50)
tilelang/language/copy.py (1)
  • copy (10-86)
tilelang/jit/__init__.py (1)
  • compile (33-86)
tilelang/transform/pass_config.py (1)
  • PassConfigKey (6-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-amd
  • GitHub Check: format-check
🔇 Additional comments (2)
testing/python/language/test_tilelang_language_copy.py (2)

89-113: Compilation-only test for indirection pattern.

This test validates compilation for the scalar copy case with indirection (indices[pid]idx[0]), but doesn't execute or verify runtime behavior.

Is runtime validation intentionally skipped? If the indirection pattern is challenging to test end-to-end, consider documenting why (e.g., requires special setup, known limitation, etc.).


120-155: LGTM! Parallel copy test matches PR example.

The test correctly validates the scalar copy lowering inside a parallel loop, matching the exact pattern shown in the PR description:

T.copy(A[by * block_M + i, bx * block_N + j], B[by * block_M + i, bx * block_N + j])

The runtime assertion confirms output matches input within tolerances.

@LeiWang1999 LeiWang1999 merged commit c61971e into tile-ai:main Oct 7, 2025
6 of 7 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…n tilelang (tile-ai#946)
