[Feature] Enhance fill operation to support various buffer types #1189

LeiWang1999 · 2025-11-04T06:27:33Z

Added support for BufferLoad in the fill function to handle different buffer types.
Updated Fill class to process region descriptors and buffer regions, improving flexibility in buffer handling.
Introduced checks for static bounds in region definitions to ensure safety during operations.
Refactored loop induction variable handling in FillNode to accommodate sliced regions.

Summary by CodeRabbit

New Features
- Fill operation now accepts additional input forms and normalizes region-based inputs for broader use.
- Region-based handling unified so fill always targets an explicit region descriptor.
Bug Fixes
- Improved region bound checks with safer static validation and conditioned upper-bound checks.
- Corrected SIMT loop indexing to respect sliced regions.
- Hardened lowering behavior for unsupported destination scopes.
Tests
- Added tests covering static and dynamic region fill behavior on CUDA.


github-actions · 2025-11-04T06:27:43Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2025-11-04T06:27:43Z

Walkthrough

Fill now accepts BufferLoad in addition to Buffer and BufferRegion; language-level fill normalizes inputs into region descriptors and emits a region-based tl.fill. C++ lowering (src/op/fill.cc) implements multi-case region extraction, static-aware bounds checks, SIMT loop index offsets for sliced regions, and conservative failure handling. New tests exercise static and dynamic region fills on CUDA.
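
To make the normalization step in the walkthrough concrete, here is a minimal pure-Python sketch. It does not use TVM or TileLang: the `Buffer`, `BufferRegion`, `BufferLoad`, and `Region` classes and the `to_region` helper are hypothetical stand-ins, and treating a scalar index as an extent of 1 is an assumption rather than the confirmed behavior of `tilelang/language/fill.py`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Buffer:
    """Hypothetical stand-in for a buffer: a name and a static shape."""
    name: str
    shape: Tuple[int, ...]


@dataclass
class BufferRegion:
    """A buffer plus one (min, extent) pair per dimension."""
    buffer: Buffer
    ranges: Tuple[Tuple[int, int], ...]


@dataclass
class BufferLoad:
    """A buffer indexed by scalars and/or slices, e.g. x[128:256]."""
    buffer: Buffer
    indices: Tuple[Union[int, slice], ...]


@dataclass
class Region:
    """Normalized region descriptor: per-dimension minima and extents."""
    buffer: Buffer
    mins: Tuple[int, ...]
    extents: Tuple[int, ...]


def to_region(dst: Union[Buffer, BufferRegion, BufferLoad]) -> Region:
    """Normalize any supported destination form into a Region."""
    if isinstance(dst, Buffer):
        # Whole buffer: start at 0 and cover the full shape.
        return Region(dst, tuple(0 for _ in dst.shape), tuple(dst.shape))
    if isinstance(dst, BufferRegion):
        mins = tuple(lo for lo, _ in dst.ranges)
        extents = tuple(ext for _, ext in dst.ranges)
        return Region(dst.buffer, mins, extents)
    if isinstance(dst, BufferLoad):
        mins, extents = [], []
        for idx, dim in zip(dst.indices, dst.buffer.shape):
            if isinstance(idx, slice):
                start = 0 if idx.start is None else idx.start
                stop = dim if idx.stop is None else idx.stop
                mins.append(start)
                extents.append(stop - start)
            else:
                # Assumption: a scalar index selects a single element.
                mins.append(idx)
                extents.append(1)
        return Region(dst.buffer, tuple(mins), tuple(extents))
    raise TypeError(f"unsupported fill destination: {type(dst).__name__}")
```

With a single normalized form, the later stages only need to reason about `mins` and `extents`, whichever input form the caller used.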

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Core Fill Implementation** `src/op/fill.cc` | Multi-case Fill constructor handling: region-descriptor calls, explicit `BufferRegion`, `BufferLoad` extraction, and access-pointer full-buffer case; use `IntImmNode` for mins/extents and perform static bounds checks only when minima/extents are statically known; condition upper-bound checks on destination shape knowledge; offset SIMT loop indices by region minima; fatal log + return-default on unsupported destination scope. |
| **Language Binding** `tilelang/language/fill.py` | Signature updated to accept `tir.BufferLoad`; normalize `tir.Var` to let values; unify Buffer/BufferRegion/BufferLoad into region descriptors (with per-dimension extents fallback) and always emit `tl.fill(region_call, value)`. |
| **Test Coverage** `testing/python/issue/test_tilelang_issue_1008.py` | New tests `test_fill_with_static_region_kernel` and `test_fill_with_dynamic_region_kernel` create a 256-element CUDA int64 tensor and call tilelang.jit kernels (128 threads) that fill with zeros; the tests disable warp specialization and TMA lowering to exercise the new lowering paths. |
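
The "static-aware" bounds checking described for `src/op/fill.cc` can be illustrated with a small Python sketch. This is not the C++ implementation: `as_const` and `check_region` are hypothetical names, integers stand in for `IntImmNode` constants, and strings stand in for symbolic expressions whose value is only known at run time.

```python
from typing import Optional, Sequence, Union

# int = statically known constant; str = symbolic (dynamic) expression.
Expr = Union[int, str]


def as_const(expr: Expr) -> Optional[int]:
    """Return the constant value, or None if the expression is symbolic."""
    return expr if isinstance(expr, int) else None


def check_region(mins: Sequence[Expr], extents: Sequence[Expr],
                 shape: Sequence[Expr]) -> None:
    """Validate only what is statically known; skip everything else."""
    for dim, (lo, ext, size) in enumerate(zip(mins, extents, shape)):
        lo_c, ext_c, size_c = as_const(lo), as_const(ext), as_const(size)
        if lo_c is not None and lo_c < 0:
            raise ValueError(f"dim {dim}: region min {lo_c} is negative")
        if ext_c is not None and ext_c < 0:
            raise ValueError(f"dim {dim}: region extent {ext_c} is negative")
        # The upper-bound check needs min, extent, and shape all to be static.
        if lo_c is not None and ext_c is not None and size_c is not None:
            if lo_c + ext_c > size_c:
                raise ValueError(
                    f"dim {dim}: region [{lo_c}, {lo_c + ext_c}) exceeds size {size_c}")
```

A fully static region such as `check_region([128], [256], [256])` is rejected, while a region with symbolic bounds such as `check_region(["a"], ["b - a"], [256])` passes because nothing about it can be verified ahead of time.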

Sequence Diagram(s)

sequenceDiagram
  participant Py as Python test / tilelang API
  participant TL as tilelang.fill (language)
  participant IR as tl.fill intrinsic (IR)
  participant LL as C++ lowering (src/op/fill.cc)
  participant GPU as Generated kernel / SIMT

  Py->>TL: call fill(buffer|buffer_region|buffer_load, value)
  TL-->>IR: normalize input -> emit tl.fill(region_call, value)
  IR->>LL: lowering of tl.fill(region_call)
  note right of LL `#DDEBF7`: Region extraction branches
  LL->>LL: detect case (RegionOp / BufferRegion / BufferLoad / AccessPtr)
  LL->>LL: static-aware bounds checks (IntImm checks)
  LL->>LL: MakeSIMTLoop (offset indices by region.min)
  LL-->>GPU: emit kernel code (or fatal + default on unsupported scope)
  GPU-->>Py: kernel executes, fills region
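
The `MakeSIMTLoop` step in the diagram offsets each loop index by the region minimum. A minimal Python sketch of that idea follows, assuming a flat row-major buffer; `fill_region` is a hypothetical helper, and a sequential loop stands in for the parallel SIMT loop.

```python
from itertools import product
from typing import List, Sequence


def fill_region(data: List[int], shape: Sequence[int], mins: Sequence[int],
                extents: Sequence[int], value: int) -> None:
    """Fill the sliced region of a flat row-major buffer in place."""
    # Row-major strides: the last dimension is contiguous.
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    strides.reverse()

    # Loop variables run over [0, extent); the buffer index adds the region min.
    for loop_vars in product(*(range(ext) for ext in extents)):
        index = [lo + v for lo, v in zip(mins, loop_vars)]
        flat = sum(i * s for i, s in zip(index, strides))
        data[flat] = value
```

Without the `lo + v` offset the loop would always write to the start of the buffer, which is the kind of indexing error the summary says was corrected for sliced regions.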

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Pay extra attention to src/op/fill.cc bounds-checking conditions and correct use of IntImmNode.
- Verify SIMT index offset logic in MakeSIMTLoop for all dimensionalities and edge cases.
- Validate tilelang/language/fill.py BufferLoad fallback path and correctness of region extents derivation.
- Confirm tests properly disable passes and exercise both static and dynamic region paths.

Possibly related PRs

Allow fill global buffer #774 — Related Fill lowering changes for global destination-scope handling; overlaps in destination-scope validation and lowering behavior.
[Enhancement] Support Copy for Buffer Load witih scalar indices #946 — Introduces BufferLoad-specialized lowering paths in copy ops; conceptually related to adding BufferLoad support to fill.

Poem

🐇 I hop through regions, buffers in tow,
I turn loads to ranges where values flow;
SIMT indices nudged by each slice I see,
Bounds checked and tidy — a filled tapestry. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main enhancement: adding support for various buffer types to the fill operation, which is the core focus of changes across all three modified files. |


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fab31a and 5329a12.

📒 Files selected for processing (1)

testing/python/issue/test_tilelang_issue_1008.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

testing/python/issue/test_tilelang_issue_1008.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Quick Lint


LeiWang1999 · 2025-11-04T06:27:44Z

for issue #1008

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/op/fill.cc (1)
93-99: Avoid null-dereferencing Ramp stride.

ramp->stride.as<IntImmNode>() can return nullptr; dereferencing it before the check will crash when the stride isn’t a compile-time constant. Grab the pointer into a temp, verify it’s non-null, then check the value.
-        CHECK(ramp->stride.as<IntImmNode>()->value == 1)
-            << "Only stride 1 ramps are supported";
+        const auto* stride_imm = ramp->stride.as<IntImmNode>();
+        CHECK(stride_imm && stride_imm->value == 1)
+            << "Only stride 1 ramps are supported";
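
The same check-before-dereference pattern can be sketched in Python, where a failed downcast is modeled as `None`. The `IntImm`, `Var`, and `Ramp` classes and the `as_int_imm` helper are hypothetical stand-ins for the TVM node types, not the real API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class IntImm:
    """Stand-in for a compile-time integer constant."""
    value: int


@dataclass
class Var:
    """Stand-in for a symbolic variable with no static value."""
    name: str


@dataclass
class Ramp:
    base: int
    stride: Union[IntImm, Var]
    lanes: int


def as_int_imm(expr: Union[IntImm, Var]) -> Optional[IntImm]:
    """Mimic a checked downcast: None when expr is not a constant."""
    return expr if isinstance(expr, IntImm) else None


def check_unit_stride(ramp: Ramp) -> None:
    stride_imm = as_int_imm(ramp.stride)
    # Test for None first, then inspect the value.
    if stride_imm is None or stride_imm.value != 1:
        raise ValueError("Only stride 1 ramps are supported")
```

A symbolic stride then produces the intended diagnostic instead of an attribute error on `None`, which is the Python analogue of the crash described above.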

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1768cbe and 3fab31a.

📒 Files selected for processing (3)

src/op/fill.cc (7 hunks)
testing/python/issue/test_tilelang_issue_1008.py (1 hunks)
tilelang/language/fill.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

src/op/fill.cc (3)
- src/transform/pipeline_planning.cc (4): region (470-477), region (470-470), region (480-487), region (480-480)
- tilelang/ir.py (1): RegionOp (73-74)
- tilelang/language/tir/op.py (1): tvm_access_ptr (651-676)

tilelang/language/fill.py (3)
- tilelang/language/utils.py (4): buffer_to_tile_region (30-42), buffer_region_to_tile_region (71-88), buffer_load_to_tile_region (45-68), region (8-27)
- tilelang/language/frame.py (2): has_let_value (189-198), get_let_value (201-210)
- tilelang/utils/language.py (1): get_buffer_region_from_load (137-159)

testing/python/issue/test_tilelang_issue_1008.py (3)
- tilelang/transform/pass_config.py (1): PassConfigKey (6-144)
- tilelang/language/symbolics.py (1): symbolic (25-27)
- tilelang/language/fill.py (1): fill (14-47)

🪛 Ruff (0.14.3)

testing/python/issue/test_tilelang_issue_1008.py
- 17-17: Undefined name int64 (F821)
- 18-18: Local variable pid is assigned to but never used. Remove assignment to unused variable pid (F841)
- 33-33: Undefined name int64 (F821)
- 34-34: Local variable pid is assigned to but never used. Remove assignment to unused variable pid (F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Quick Lint

coderabbitai · 2025-11-04T06:38:12Z

testing/python/issue/test_tilelang_issue_1008.py

+            a, b = T.alloc_var('int'), T.alloc_var('int')
+            T.fill(x[a:b], 0)
+


⚠️ Potential issue | 🔴 Critical

Initialize dynamic slice bounds before calling T.fill.

T.alloc_var only gives you unconstrained scalars; unless you bind them immediately, they carry garbage when the kernel runs. Feeding x[a:b] into tl.fill right after allocation leaves the start/end of the region undefined, so the generated kernel can write outside x. Please derive the bounds from known expressions (e.g., reuse num_tokens) instead of uninitialized vars.

-            a, b = T.alloc_var('int'), T.alloc_var('int')
-            T.fill(x[a:b], 0)
+            start = T.max(num_tokens - 128, 0)
+            T.fill(x[start:num_tokens], 0)
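
The arithmetic behind the suggested bounds can be checked with plain integers. The sketch below is only an illustration of the clamping: `tail_window` is a hypothetical helper, and the default window of 128 mirrors the constant in the suggestion.

```python
from typing import Tuple


def tail_window(num_tokens: int, window: int = 128) -> Tuple[int, int]:
    """Return (start, stop) for the last `window` elements, clamped at 0."""
    start = max(num_tokens - window, 0)
    return start, num_tokens
```

For a 256-element tensor this gives the slice `[128:256]`. For fewer than 128 tokens the start clamps to 0, so the slice never begins before the buffer.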

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In testing/python/issue/test_tilelang_issue_1008.py around lines 35 to 37, the dynamic slice bounds a and b are allocated with T.alloc_var but never initialized before calling T.fill(x[a:b], 0), leaving start/end undefined and risking out-of-bounds writes; initialize those alloc_var scalars (e.g., set a.value = 0 and b.value = num_tokens or assign them from an existing expression) or replace x[a:b] with a slice derived directly from a known expression like x[0:num_tokens] so the kernel receives concrete bounds before calling T.fill.


coderabbitai bot reviewed Nov 4, 2025


lint fix

5329a12

LeiWang1999 merged commit a03df60 into tile-ai:main Nov 4, 2025
4 of 6 checks passed

coderabbitai bot mentioned this pull request Nov 4, 2025

[Refactor] Improve Python3.9 compatibility for ParamSpec and Self #1190

Merged

This was referenced Nov 5, 2025

[Feat] Add A Pass to Handle Negative Index #1192

Merged

[Fix] Fix buffer re-import typo in tilelang.languge #1214

Merged

[Fix] Fix a type that make wrong T.macro backtrace #1234

Merged

kurisu6912 mentioned this pull request Nov 12, 2025

[Language] Add type stubs for tir op #1239

Merged

coderabbitai bot mentioned this pull request Nov 13, 2025

[Refactor] Update buffer handling in copy and atomic operations #1247

Merged

This was referenced Nov 21, 2025

[Feat] Add missing support for uint32x2, add unsigned implicit cast in bitwise op, add T.Ref as macro annotation #1302

Closed

[Fix] Remove unused let_bindings_ in CodeGenC to fix #1300 #1305

Merged

[Fix] Fix frame scope error in T.macro #1308

Merged

coderabbitai bot mentioned this pull request Nov 25, 2025

[Refactor] Phaseout vmap for Tile Operators #1334

Merged
