-
Notifications
You must be signed in to change notification settings - Fork 332
[Refactor] Improve Python3.9 compatibility for ParamSpec and Self #1190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Added support for `BufferLoad` in the `fill` function to handle different buffer types. - Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling. - Introduced checks for static bounds in region definitions to ensure safety during operations. - Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.
- Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively. - Updated type annotations across multiple files to ensure consistent usage of typing features.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
WalkthroughAdds multi-form region parsing to Fill (tl.region, BufferRegion, BufferLoad, access-pointer), tightens bounds checks, adjusts SIMT lowering to honor region minima, fixes Lower return paths, expands TileLang fill input types and tests, and applies typing/import compatibility updates across modules. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant FillCtor as Fill Constructor
participant Parser as Region Parser
participant Verifier as Bounds Verifier
participant Lower as Lowering/SIMT
Caller->>FillCtor: call Fill(buffer, value)
alt descriptor is tl.region or tvm_access_ptr
FillCtor->>Parser: parse descriptor call -> dst, per-dim region
else descriptor is BufferRegion (legacy)
FillCtor->>Parser: read BufferRegion -> dst, region
else descriptor is BufferLoad
FillCtor->>Parser: BufferLoad -> derive dst and per-dim region entries
else fallback (access pointer)
FillCtor->>Parser: derive full-buffer region from buffer shape
end
FillCtor->>Verifier: check mins/extents (IntImmNode guards)
Verifier-->>FillCtor: verification result (skip extent upper-bound if symbolic)
FillCtor->>Lower: emit SIMT loops (indices = region.min + loop_var)
Lower-->>FillCtor: lowered Stmt or empty on unsupported scope
FillCtor-->>Caller: return Stmt
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/op/fill.cc (1)
92-101: Prevent null dereference when checking ramp strideLine 93 dereferences
ramp->stride.as<IntImmNode>()before verifying the cast succeeded. Ifstrideis symbolic or otherwise not anIntImmNode, this will segfault instead of raising a clean check failure. Guard the pointer first.- if (const auto *ramp = index.as<RampNode>()) { - CHECK(ramp->stride.as<IntImmNode>()->value == 1) - << "Only stride 1 ramps are supported"; + if (const auto *ramp = index.as<RampNode>()) { + const auto *stride_imm = ramp->stride.as<IntImmNode>(); + CHECK(stride_imm && stride_imm->value == 1) + << "Only stride 1 ramps are supported"; const auto *lanes = ramp->lanes.as<IntImmNode>(); CHECK(lanes) << "Scalable vectors not supported in BufferRegion conversion"; node->region.push_back(Range::FromMinExtent(ramp->base, ramp->lanes));
🧹 Nitpick comments (1)
testing/python/issue/test_tilelang_issue_1008.py (1)
1-53: Tests compile kernels but don't verify results.The tests successfully exercise static and dynamic region fills, but they don't assert that the fill operations produce correct results. Consider adding assertions to verify the buffer contents after filling.
Example enhancement for
test_fill_with_static_region_kernel:def test_fill_with_static_region_kernel(): kernel = _fill_with_static_region_kernel() x = torch.zeros((256,), dtype=torch.int64, device='cuda') kernel(x) + # Verify that x[0:128] was filled with 0 (already 0, but tests the path) + assert torch.all(x == 0), "Buffer should remain zero after fill"For the dynamic region test, you could set
aandbto specific values and verify the filled region.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/op/fill.cc(7 hunks)testing/python/issue/test_tilelang_issue_1008.py(1 hunks)tilelang/autotuner/tuner.py(1 hunks)tilelang/jit/__init__.py(1 hunks)tilelang/jit/kernel.py(1 hunks)tilelang/language/fill.py(2 hunks)tilelang/language/v2/ast.py(1 hunks)tilelang/language/v2/builder.py(2 hunks)tilelang/language/v2/dtypes.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
testing/python/issue/test_tilelang_issue_1008.py (3)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-144)tilelang/language/symbolics.py (1)
symbolic(25-27)tilelang/language/fill.py (1)
fill(14-47)
src/op/fill.cc (3)
src/transform/pipeline_planning.cc (4)
region(470-477)region(470-470)region(480-487)region(480-480)tilelang/ir.py (1)
RegionOp(73-74)tilelang/language/tir/op.py (1)
tvm_access_ptr(651-676)
tilelang/language/fill.py (4)
tilelang/language/utils.py (4)
buffer_to_tile_region(30-42)buffer_region_to_tile_region(71-88)buffer_load_to_tile_region(45-68)region(8-27)tilelang/language/frame.py (2)
has_let_value(189-198)get_let_value(201-210)tilelang/utils/language.py (1)
get_buffer_region_from_load(137-159)tilelang/language/tir/op.py (1)
call_intrin(120-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
🔇 Additional comments (5)
tilelang/language/v2/ast.py (1)
4-9: LGTM! Clean Python 3.9 compatibility implementation.The try/except pattern correctly handles ParamSpec availability, which was introduced in Python 3.10. The fallback to
typing_extensionsensures compatibility with Python 3.9.tilelang/language/v2/dtypes.py (2)
5-5: LGTM! Union import added for Python 3.9 compatibility.
10-11: LGTM! Correct migration from PEP 604 to Union syntax.The change from
ir.Type | str | type | torch.dtype | dtypetoUnion[ir.Type, str, type, torch.dtype, dtype]ensures compatibility with Python 3.9, where the|operator for type unions is not yet available.tilelang/language/v2/builder.py (2)
15-20: LGTM! Comprehensive typing compatibility imports.The compatibility handling for both
ParamSpec(Python < 3.10) andSelf(Python < 3.11) follows the standard pattern and aligns with changes in other modules.
103-106: LGTM! Correct distinction between runtime and typing constructs.The approach is well-documented:
ContinueOrBreakas a tuple supportsisinstance()checks (used on line 175)AnyFrameusingUnion[...]maintains type annotation compatibility with Python 3.9
- Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`. - Removed references to Python 3.8 in classifiers. - Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility. - Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files.
|
@codex rewiew |
|
To use Codex here, create an environment for this repo. |
- Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity. - Updated the caching decorator from `functools.lru_cache` to `functools.cache` for better performance in the C++ compiler retrieval function. - Adjusted import statements in the language proxy file to maintain consistency in type annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/language/v2/builder.py (1)
183-187: Handle optionalframebefore calling__enter__.
The new annotation advertises thatwith_frameacceptsNone, but we still unconditionally invokeself.enter_frame(frame), which will try to runNone.__enter__()and crash. Either drop the| Nonefrom the signature or guard the call so we skip the enter/exit path whenframeisNone.Please apply this guard:
def with_frame(self, frame: AbstractContextManager[Any] | None): pop_idx = len(self.frames) - yield self.enter_frame(frame) + if frame is None: + yield None + else: + yield self.enter_frame(frame) while len(self.frames) > pop_idx: self.frames.pop().__exit__(None, None, None)
🧹 Nitpick comments (1)
src/transform/layout_reducer.cc (1)
279-307: Approve the region-based Fill handling with a suggested refactor to reduce duplication.The implementation correctly adds support for region-based Fill calls alongside the existing
tvm_access_ptrpath. The bounds checking (lines 283, 297) and optional handling are solid.However, lines 285-292 and 299-306 contain nearly identical logic for checking and recording reducer buffers. Consider extracting this common pattern:
PrimExpr VisitExpr_(const CallNode *op_) final { auto op_ref = IRMutatorWithAnalyzer::VisitExpr_(op_).as<Call>().value(); auto op = op_ref.CopyOnWrite(); if (op->op.same_as(Fill::Get())) { ICHECK(!op->args.empty()); + std::optional<Var> extracted_var; + if (auto arg0_call = op->args[0].as<Call>()) { // Case 1: tl.region(...) — extract buffer var from its first arg if (arg0_call.value()->op.same_as(RegionOp::Get())) { ICHECK(!arg0_call.value()->args.empty()); if (auto bl = arg0_call.value()->args[0].as<BufferLoadNode>()) { - Var var = bl->buffer->data; - if (reducer_info_map_.count(var)) { - ICHECK(inside_reducer_range_.count(var) == 0) - << "T.fill on reducer must be enclosed with a " - "T.finalize_reducer " - "before next."; - inside_reducer_range_.Set(var, - reducer_info_map_.Get(var).value()); - } + extracted_var = bl->buffer->data; } } // Case 2: builtin.tvm_access_ptr(...) — existing path else if (arg0_call.value()->op.same_as(builtin::tvm_access_ptr())) { ICHECK(arg0_call.value()->args.size() > 1); - if (auto var = arg0_call.value()->args[1].as<Var>(); - var && reducer_info_map_.count(var.value())) { - ICHECK(inside_reducer_range_.count(var.value()) == 0) - << "T.fill on reducer must be enclosed with a " - "T.finalize_reducer " - "before next."; - inside_reducer_range_.Set( - var.value(), reducer_info_map_.Get(var.value()).value()); - } + extracted_var = arg0_call.value()->args[1].as<Var>(); } } + + if (extracted_var && reducer_info_map_.count(extracted_var.value())) { + ICHECK(inside_reducer_range_.count(extracted_var.value()) == 0) + << "T.fill on reducer must be enclosed with a " + "T.finalize_reducer before next."; + inside_reducer_range_.Set( + extracted_var.value(), + reducer_info_map_.Get(extracted_var.value()).value()); + } } else if (op->op.same_as(FinalizeReducerOp::Get())) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
examples/attention_sink/benchmark_gqa_sink_fwd.py(3 hunks)examples/attention_sink/benchmark_mha_sink_fwd.py(3 hunks)examples/attention_sink/example_gqa_sink_bwd_bhsd.py(1 hunks)examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py(1 hunks)examples/attention_sink/example_mha_sink_bwd_bhsd.py(1 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd.py(1 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py(1 hunks)pyproject.toml(2 hunks)src/transform/layout_reducer.cc(2 hunks)tilelang/jit/__init__.py(1 hunks)tilelang/language/v2/ast.py(2 hunks)tilelang/language/v2/builder.py(5 hunks)
✅ Files skipped from review due to trivial changes (1)
- tilelang/jit/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tilelang/language/v2/ast.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/transform/layout_reducer.cc (2)
src/op/parallel.cc (4)
op(97-107)op(97-97)op(109-117)op(109-109)tilelang/language/tir/op.py (1)
tvm_access_ptr(651-676)
examples/attention_sink/benchmark_mha_sink_fwd.py (1)
examples/attention_sink/benchmark_gqa_sink_fwd.py (1)
triton_program(98-124)
examples/attention_sink/benchmark_gqa_sink_fwd.py (1)
examples/attention_sink/benchmark_mha_sink_fwd.py (1)
triton_program(97-121)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.1
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (11)
examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (1)
266-266: LGTM! Correct Python 3.9 compatibility fix.The type annotation change from
int | NonetoOptional[int]is correct, as the union operator|for types was introduced in Python 3.10 (PEP 604). This change ensures compatibility with Python 3.9.examples/attention_sink/example_mha_sink_bwd_bhsd.py (1)
443-443: LGTM! Correct Python 3.9 compatibility fix.The type annotation correctly uses
Optional[int]instead ofint | Noneto ensure compatibility with Python 3.9, where the union operator|for types is not available.pyproject.toml (2)
5-5: LGTM! Minimum Python version correctly updated.The Python requirement change from
>=3.8to>=3.9aligns with the PR's objective to improve Python 3.9 compatibility. This is necessary since the codebase now uses type annotations that require Python 3.9+.
120-120: LGTM! Ruff target version correctly updated.The ruff target-version update from
py38topy39correctly aligns with the new minimum Python version requirement, ensuring that ruff's linting rules are appropriate for Python 3.9+.examples/attention_sink/example_mha_sink_fwd_bhsd.py (1)
256-256: LGTM! Correct Python 3.9 compatibility fix.The type annotation correctly uses
Optional[int]to maintain compatibility with Python 3.9. The change is consistent with similar updates across other attention sink examples.examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
447-447: LGTM! Correct Python 3.9 compatibility fix.The type annotation change from
int | NonetoOptional[int]correctly ensures Python 3.9 compatibility. The change is consistent with the broader refactoring across all attention sink examples.examples/attention_sink/benchmark_mha_sink_fwd.py (1)
8-8: LGTM! Comprehensive Python 3.9 compatibility fixes.The changes correctly update both function signatures to use
Optional[int]instead ofint | None, and theOptionalimport is properly added. This ensures compatibility with Python 3.9 where the union operator|for types is not available.Also applies to: 97-97, 129-129
examples/attention_sink/benchmark_gqa_sink_fwd.py (1)
8-8: LGTM! Comprehensive Python 3.9 compatibility fixes.The changes correctly update both function signatures to use
Optional[int]instead ofint | None, with the properOptionalimport added. This ensures compatibility with Python 3.9 and is consistent with similar updates across other benchmark files.Also applies to: 98-98, 134-134
examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (1)
275-275: LGTM! Correct Python 3.9 compatibility fix.The type annotation correctly uses
Optional[int]to ensure compatibility with Python 3.9. This change is consistent with the broader refactoring effort across all attention sink examples in this PR.src/transform/layout_reducer.cc (1)
17-17: LGTM! Necessary include for region operation support.The include is correctly added to support
RegionOp::Get()usage in the updated Fill handling logic.tilelang/language/v2/builder.py (1)
15-20: Good fallback for ParamSpec/Self imports.
This keeps the builder usable on Python 3.9+ by gracefully leaning ontyping_extensionswhen the stdlib lacks the symbols.
|
Local Test Can Pass, Merged. |
…le-ai#1190) * [Feature] Enhance fill operation to support various buffer types - Added support for `BufferLoad` in the `fill` function to handle different buffer types. - Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling. - Introduced checks for static bounds in region definitions to ensure safety during operations. - Refactored loop induction variable handling in `FillNode` to accommodate sliced regions. * lint fix * [Refactor] Improve Python compatibility for ParamSpec and Self - Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively. - Updated type annotations across multiple files to ensure consistent usage of typing features. * [Update] Require Python 3.9 and enhance type annotations - Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`. - Removed references to Python 3.8 in classifiers. - Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility. - Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files. * [Refactor] Update import statements to enhance type annotations - Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity. - Updated the caching decorator from `functools.lru_cache` to `functools.cache` for better performance in the C++ compiler retrieval function. - Adjusted import statements in the language proxy file to maintain consistency in type annotations. * disable rocm rs nt test. * lint fix
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores