-
Notifications
You must be signed in to change notification settings - Fork 332
[Feature] Enhance fill operation to support various buffer types #1189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Added support for `BufferLoad` in the `fill` function to handle different buffer types. - Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling. - Introduced checks for static bounds in region definitions to ensure safety during operations. - Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
WalkthroughFill now accepts BufferLoad in addition to Buffer and BufferRegion; language-level fill normalizes inputs into region descriptors and emits a region-based tl.fill. C++ lowering (src/op/fill.cc) implements multi-case region extraction, static-aware bounds checks, SIMT loop index offsets for sliced regions, and conservative failure handling. New tests exercise static and dynamic region fills on CUDA. Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python test / tilelang API
participant TL as tilelang.fill (language)
participant IR as tl.fill intrinsic (IR)
participant LL as C++ lowering (src/op/fill.cc)
participant GPU as Generated kernel / SIMT
Py->>TL: call fill(buffer|buffer_region|buffer_load, value)
TL-->>IR: normalize input -> emit tl.fill(region_call, value)
IR->>LL: lowering of tl.fill(region_call)
note right of LL `#DDEBF7`: Region extraction branches
LL->>LL: detect case (RegionOp / BufferRegion / BufferLoad / AccessPtr)
LL->>LL: static-aware bounds checks (IntImm checks)
LL->>LL: MakeSIMTLoop (offset indices by region.min)
LL-->>GPU: emit kernel code (or fatal + default on unsupported scope)
GPU-->>Py: kernel executes, fills region
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
for issue #1008 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/op/fill.cc (1)
93-99: Avoid null-dereferencingRampstride.
ramp->stride.as<IntImmNode>()can returnnullptr; dereferencing it before the check will crash when the stride isn’t a compile-time constant. Grab the pointer into a temp, verify it’s non-null, then check the value.- CHECK(ramp->stride.as<IntImmNode>()->value == 1) - << "Only stride 1 ramps are supported"; + const auto* stride_imm = ramp->stride.as<IntImmNode>(); + CHECK(stride_imm && stride_imm->value == 1) + << "Only stride 1 ramps are supported";
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/op/fill.cc(7 hunks)testing/python/issue/test_tilelang_issue_1008.py(1 hunks)tilelang/language/fill.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/op/fill.cc (3)
src/transform/pipeline_planning.cc (4)
region(470-477)region(470-470)region(480-487)region(480-480)tilelang/ir.py (1)
RegionOp(73-74)tilelang/language/tir/op.py (1)
tvm_access_ptr(651-676)
tilelang/language/fill.py (3)
tilelang/language/utils.py (4)
buffer_to_tile_region(30-42)buffer_region_to_tile_region(71-88)buffer_load_to_tile_region(45-68)region(8-27)tilelang/language/frame.py (2)
has_let_value(189-198)get_let_value(201-210)tilelang/utils/language.py (1)
get_buffer_region_from_load(137-159)
testing/python/issue/test_tilelang_issue_1008.py (3)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-144)tilelang/language/symbolics.py (1)
symbolic(25-27)tilelang/language/fill.py (1)
fill(14-47)
🪛 Ruff (0.14.3)
testing/python/issue/test_tilelang_issue_1008.py
17-17: Undefined name int64
(F821)
18-18: Local variable pid is assigned to but never used
Remove assignment to unused variable pid
(F841)
33-33: Undefined name int64
(F821)
34-34: Local variable pid is assigned to but never used
Remove assignment to unused variable pid
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
| a, b = T.alloc_var('int'), T.alloc_var('int') | ||
| T.fill(x[a:b], 0) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize dynamic slice bounds before calling T.fill.
T.alloc_var only gives you unconstrained scalars; unless you bind them immediately, they carry garbage when the kernel runs. Feeding x[a:b] into tl.fill right after allocation leaves the start/end of the region undefined, so the generated kernel can write outside x. Please derive the bounds from known expressions (e.g., reuse num_tokens) instead of uninitialized vars.
- a, b = T.alloc_var('int'), T.alloc_var('int')
- T.fill(x[a:b], 0)
+ start = T.max(num_tokens - 128, 0)
+ T.fill(x[start:num_tokens], 0)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In testing/python/issue/test_tilelang_issue_1008.py around lines 35 to 37, the
dynamic slice bounds a and b are allocated with T.alloc_var but never
initialized before calling T.fill(x[a:b], 0), leaving start/end undefined and
risking out-of-bounds writes; initialize those alloc_var scalars (e.g., set
a.value = 0 and b.value = num_tokens or assign them from an existing expression)
or replace x[a:b] with a slice derived directly from a known expression like
x[0:num_tokens] so the kernel receives concrete bounds before calling T.fill.
…e-ai#1189) * [Feature] Enhance fill operation to support various buffer types - Added support for `BufferLoad` in the `fill` function to handle different buffer types. - Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling. - Introduced checks for static bounds in region definitions to ensure safety during operations. - Refactored loop induction variable handling in `FillNode` to accommodate sliced regions. * lint fix
BufferLoadin thefillfunction to handle different buffer types.Fillclass to process region descriptors and buffer regions, improving flexibility in buffer handling.FillNodeto accommodate sliced regions.Summary by CodeRabbit
New Features
Bug Fixes
Tests