Skip to content

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Nov 4, 2025

  • Added support for BufferLoad in the fill function to handle different buffer types.
  • Updated Fill class to process region descriptors and buffer regions, improving flexibility in buffer handling.
  • Introduced checks for static bounds in region definitions to ensure safety during operations.
  • Refactored loop induction variable handling in FillNode to accommodate sliced regions.

Summary by CodeRabbit

  • New Features

    • Fill operation now accepts additional input forms and normalizes region-based inputs for broader use.
    • Region-based handling unified so fill always targets an explicit region descriptor.
  • Bug Fixes

    • Improved region bound checks with safer static validation and conditioned upper-bound checks.
    • Corrected SIMT loop indexing to respect sliced regions.
    • Hardened lowering behavior for unsupported destination scopes.
  • Tests

    • Added tests covering static and dynamic region fill behavior on CUDA.

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.
@github-actions
Copy link

github-actions bot commented Nov 4, 2025

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 4, 2025

Walkthrough

Fill now accepts BufferLoad in addition to Buffer and BufferRegion; language-level fill normalizes inputs into region descriptors and emits a region-based tl.fill. C++ lowering (src/op/fill.cc) implements multi-case region extraction, static-aware bounds checks, SIMT loop index offsets for sliced regions, and conservative failure handling. New tests exercise static and dynamic region fills on CUDA.

Changes

Cohort / File(s) Summary
Core Fill Implementation
src/op/fill.cc
Multi-case Fill constructor handling: region-descriptor calls, explicit BufferRegion, BufferLoad extraction, and access-pointer full-buffer case; use IntImmNode for mins/extents and perform static bounds checks only when minima/extents are statically known; condition upper-bound checks on destination shape knowledge; offset SIMT loop indices by region minima; fatal log + return-default on unsupported destination scope.
Language Binding
tilelang/language/fill.py
Signature updated to accept tir.BufferLoad; normalize tir.Var to let values; unify Buffer/BufferRegion/BufferLoad into region descriptors (with per-dimension extents fallback) and always emit tl.fill(region_call, value).
Test Coverage
testing/python/issue/test_tilelang_issue_1008.py
New tests: test_fill_with_static_region_kernel and test_fill_with_dynamic_region_kernel — create 256-element CUDA int64 tensor, call tilelang.jit kernels (128 threads) that fill with zeros; tests disable Warp specialization and TMA lowering to exercise new lowering paths.

Sequence Diagram(s)

sequenceDiagram
  participant Py as Python test / tilelang API
  participant TL as tilelang.fill (language)
  participant IR as tl.fill intrinsic (IR)
  participant LL as C++ lowering (src/op/fill.cc)
  participant GPU as Generated kernel / SIMT

  Py->>TL: call fill(buffer|buffer_region|buffer_load, value)
  TL-->>IR: normalize input -> emit tl.fill(region_call, value)
  IR->>LL: lowering of tl.fill(region_call)
  note right of LL `#DDEBF7`: Region extraction branches
  LL->>LL: detect case (RegionOp / BufferRegion / BufferLoad / AccessPtr)
  LL->>LL: static-aware bounds checks (IntImm checks)
  LL->>LL: MakeSIMTLoop (offset indices by region.min)
  LL-->>GPU: emit kernel code (or fatal + default on unsupported scope)
  GPU-->>Py: kernel executes, fills region
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay extra attention to src/op/fill.cc bounds-checking conditions and correct use of IntImmNode.
  • Verify SIMT index offset logic in MakeSIMTLoop for all dimensionalities and edge cases.
  • Validate tilelang/language/fill.py BufferLoad fallback path and correctness of region extents derivation.
  • Confirm tests properly disable passes and exercise both static and dynamic region paths.

Possibly related PRs

Poem

🐇 I hop through regions, buffers in tow,
I turn loads to ranges where values flow;
SIMT indices nudged by each slice I see,
Bounds checked and tidy — a filled tapestry. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main enhancement: adding support for various buffer types to the fill operation, which is the core focus of changes across all three modified files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fab31a and 5329a12.

📒 Files selected for processing (1)
  • testing/python/issue/test_tilelang_issue_1008.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • testing/python/issue/test_tilelang_issue_1008.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@LeiWang1999
Copy link
Member Author

for issue #1008

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/op/fill.cc (1)

93-99: Avoid null-dereferencing Ramp stride.

ramp->stride.as<IntImmNode>() can return nullptr; dereferencing it before the check will crash when the stride isn’t a compile-time constant. Grab the pointer into a temp, verify it’s non-null, then check the value.

-        CHECK(ramp->stride.as<IntImmNode>()->value == 1)
-            << "Only stride 1 ramps are supported";
+        const auto* stride_imm = ramp->stride.as<IntImmNode>();
+        CHECK(stride_imm && stride_imm->value == 1)
+            << "Only stride 1 ramps are supported";
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1768cbe and 3fab31a.

📒 Files selected for processing (3)
  • src/op/fill.cc (7 hunks)
  • testing/python/issue/test_tilelang_issue_1008.py (1 hunks)
  • tilelang/language/fill.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/op/fill.cc (3)
src/transform/pipeline_planning.cc (4)
  • region (470-477)
  • region (470-470)
  • region (480-487)
  • region (480-480)
tilelang/ir.py (1)
  • RegionOp (73-74)
tilelang/language/tir/op.py (1)
  • tvm_access_ptr (651-676)
tilelang/language/fill.py (3)
tilelang/language/utils.py (4)
  • buffer_to_tile_region (30-42)
  • buffer_region_to_tile_region (71-88)
  • buffer_load_to_tile_region (45-68)
  • region (8-27)
tilelang/language/frame.py (2)
  • has_let_value (189-198)
  • get_let_value (201-210)
tilelang/utils/language.py (1)
  • get_buffer_region_from_load (137-159)
testing/python/issue/test_tilelang_issue_1008.py (3)
tilelang/transform/pass_config.py (1)
  • PassConfigKey (6-144)
tilelang/language/symbolics.py (1)
  • symbolic (25-27)
tilelang/language/fill.py (1)
  • fill (14-47)
🪛 Ruff (0.14.3)
testing/python/issue/test_tilelang_issue_1008.py

17-17: Undefined name int64

(F821)


18-18: Local variable pid is assigned to but never used

Remove assignment to unused variable pid

(F841)


33-33: Undefined name int64

(F821)


34-34: Local variable pid is assigned to but never used

Remove assignment to unused variable pid

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint

Comment on lines +35 to +37
a, b = T.alloc_var('int'), T.alloc_var('int')
T.fill(x[a:b], 0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Initialize dynamic slice bounds before calling T.fill.

T.alloc_var only gives you unconstrained scalars; unless you bind them immediately, they carry garbage when the kernel runs. Feeding x[a:b] into tl.fill right after allocation leaves the start/end of the region undefined, so the generated kernel can write outside x. Please derive the bounds from known expressions (e.g., reuse num_tokens) instead of uninitialized vars.

-            a, b = T.alloc_var('int'), T.alloc_var('int')
-            T.fill(x[a:b], 0)
+            start = T.max(num_tokens - 128, 0)
+            T.fill(x[start:num_tokens], 0)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In testing/python/issue/test_tilelang_issue_1008.py around lines 35 to 37, the
dynamic slice bounds a and b are allocated with T.alloc_var but never
initialized before calling T.fill(x[a:b], 0), leaving start/end undefined and
risking out-of-bounds writes; initialize those alloc_var scalars (e.g., set
a.value = 0 and b.value = num_tokens or assign them from an existing expression)
or replace x[a:b] with a slice derived directly from a known expression like
x[0:num_tokens] so the kernel receives concrete bounds before calling T.fill.

@LeiWang1999 LeiWang1999 merged commit a03df60 into tile-ai:main Nov 4, 2025
4 of 6 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…e-ai#1189)

* [Feature] Enhance fill operation to support various buffer types

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.

* lint fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant