
@LeiWang1999
Member

@LeiWang1999 LeiWang1999 commented Nov 4, 2025

  • Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively.
  • Updated type annotations across multiple files to ensure consistent usage of typing features.

Summary by CodeRabbit

  • New Features

    • Fill now accepts additional buffer-like inputs for more flexible region-based fills.
  • Bug Fixes

    • Improved region bounds validation and corrected loop indexing for sliced regions.
    • Safer lowering behavior when encountering unsupported scopes.
    • Wider Python typing compatibility across the codebase.
  • Tests

    • Added tests for static and dynamic region fills; disabled a few AMD/ROCm-specific tests.
  • Chores

    • Project Python requirement raised to ≥ 3.9; various typing/import modernizations.

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.
- Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively.
- Updated type annotations across multiple files to ensure consistent usage of typing features.
@github-actions

github-actions bot commented Nov 4, 2025

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Nov 4, 2025

Walkthrough

Adds multi-form region parsing to Fill (tl.region, BufferRegion, BufferLoad, access-pointer), tightens bounds checks, adjusts SIMT lowering to honor region minima, fixes Lower return paths, expands TileLang fill input types and tests, and applies typing/import compatibility updates across modules.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Fill operator core logic | `src/op/fill.cc` | Implements multi-case region parsing (tl.region/tvm_access_ptr, BufferRegion, BufferLoad, access-pointer fallback); replaces IntImm checks with IntImmNode guards; skips extent upper-bound checks when dst shape is symbolic; offsets SIMT loop indices by region.min; returns empty Stmt() on unsupported scope; adds region.h include. |
| TileLang fill API & tests | `tilelang/language/fill.py`, `testing/python/issue/test_tilelang_issue_1008.py` | Extends fill() to accept tir.BufferLoad; normalizes let-bound TVM vars; converts Buffer/BufferRegion/BufferLoad to tile regions via new helpers; routes calls via region_call. Adds JIT tests for static (x[0:128]) and dynamic (x[a:b]) region fills. |
| Layout reducer integration | `src/transform/layout_reducer.cc` | Adds detection for region-based tl.region(...) Fill calls to extract buffer Var from BufferLoadNode and mark inside-reducer ranges alongside existing tvm_access_ptr path. |
| Typing compatibility & annotations | `tilelang/autotuner/tuner.py`, `tilelang/jit/__init__.py`, `tilelang/jit/kernel.py`, `tilelang/language/v2/ast.py`, `tilelang/language/v2/builder.py`, `tilelang/language/v2/dtypes.py` | Adds guarded imports for ParamSpec/Self (typing → typing_extensions fallback); moves some typing imports to collections.abc/contextlib; replaces PEP 604 unions with typing.Union[...]; introduces AbstractContextManager[Any] and updates related type aliases/signatures. |
| Examples: Optional type hints | `examples/attention_sink/...` (multiple files) | Replaces `int \| None` type hints with `Optional[int]`. |
| Project metadata & tooling | `pyproject.toml` | Raises Python requirement from >=3.8 to >=3.9, removes 3.8 classifier, and updates Ruff target-version to py39. |
| Misc. small changes | `tilelang/contrib/cc.py`, `tilelang/language/proxy.py`, `tilelang/carver/roller/policy/default.py`, `tilelang/carver/roller/shape_inference/tir.py` | Minor compatibility/import updates: switch some typing imports to collections.abc, replace functools.lru_cache(maxsize=None) with functools.cache, and other import relocations. |
| Tests disabled (ROCm) | `testing/python/amd/test_tilelang_test_amd.py` | Three ROCm-specific tests commented out / disabled. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant FillCtor as Fill Constructor
    participant Parser as Region Parser
    participant Verifier as Bounds Verifier
    participant Lower as Lowering/SIMT

    Caller->>FillCtor: call Fill(buffer, value)

    alt descriptor is tl.region or tvm_access_ptr
        FillCtor->>Parser: parse descriptor call -> dst, per-dim region
    else descriptor is BufferRegion (legacy)
        FillCtor->>Parser: read BufferRegion -> dst, region
    else descriptor is BufferLoad
        FillCtor->>Parser: BufferLoad -> derive dst and per-dim region entries
    else fallback (access pointer)
        FillCtor->>Parser: derive full-buffer region from buffer shape
    end

    FillCtor->>Verifier: check mins/extents (IntImmNode guards)
    Verifier-->>FillCtor: verification result (skip extent upper-bound if symbolic)

    FillCtor->>Lower: emit SIMT loops (indices = region.min + loop_var)
    Lower-->>FillCtor: lowered Stmt or empty on unsupported scope

    FillCtor-->>Caller: return Stmt
```
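The verification step in the diagram can be modeled in a few lines of Python. This is a hypothetical sketch and not the C++ in `src/op/fill.cc`. `Range` and `check_region_bounds` are illustrative names. A value counts as static only when it is a plain `int`, which stands in for an expression that is an `IntImmNode`, and a `str` stands in for a symbolic expression. The exact checks in fill.cc may differ.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical stand-in for a TIR expression: int = static, str = symbolic.
Expr = Union[int, str]


@dataclass
class Range:
    min: Expr
    extent: Expr


def check_region_bounds(shape: List[Expr], region: List[Range]) -> None:
    """Raise ValueError when a statically known region falls outside the buffer."""
    if len(shape) != len(region):
        raise ValueError("region rank must match buffer rank")
    for dim, (size, r) in enumerate(zip(shape, region)):
        # Lower bound: only checkable when min is static.
        if isinstance(r.min, int) and r.min < 0:
            raise ValueError(f"dim {dim}: min {r.min} is negative")
        # Upper bound: skipped unless min, extent and the buffer size are all static.
        if all(isinstance(v, int) for v in (r.min, r.extent, size)):
            if r.min + r.extent > size:
                raise ValueError(f"dim {dim}: [{r.min}, {r.min + r.extent}) exceeds size {size}")
```

A symbolic dimension (for example `shape=["n"]`) makes the upper-bound test a no-op rather than an error. That matches the "skip extent upper-bound if symbolic" branch of the diagram.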

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Areas needing extra attention:
    • src/op/fill.cc: correctness across all four parsing paths, IntImmNode usage, and extent-check logic when shapes are symbolic.
    • SIMT lowering: ensure per-dimension offset (region.min + var) is consistently applied and preserves indexing.
    • tilelang/language/fill.py: BufferLoad conversion, let-bound TVM var normalization, and correct propagation to region_call.
    • New tests: verify JIT pass config, disabled specializations, and kernel semantics cover intended scenarios.
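The `region.min + loop_var` indexing in the list above can be illustrated with a plain-Python reference model. This is a hypothetical sketch on a row-major flat list and does not reproduce TileLang's lowering. `lower_region_fill` and its arguments are illustrative names, and the `itertools.product` loop stands in for the SIMT loop nest.

```python
import itertools
from typing import List, Sequence, Tuple


def lower_region_fill(data: List[float], shape: Sequence[int],
                      region: Sequence[Tuple[int, int]], value: float) -> None:
    """Fill data (row-major, given shape) over region = [(min, extent), ...] per dim."""
    # Row-major strides: the last dimension is contiguous.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    # Loop variables run over [0, extent); the buffer index is min + loop var.
    for loop_vars in itertools.product(*(range(ext) for _, ext in region)):
        offset = sum((mn + v) * s for (mn, _), v, s in zip(region, loop_vars, strides))
        data[offset] = value
```

Suppose the `mn +` term were dropped. A slice such as `x[128:256]` would then be written at `x[0:128]`. The per-dimension offset exists to prevent that mistake for sliced regions.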

Suggested reviewers

  • oraluben

Poem

🐇 I nibbled through regions, four ways to fill,

Mins baked in loops — a tiny precision thrill.
Types and tests hopped in, tidy and neat,
A rabbit's small poem for code made complete. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main refactoring effort: improving Python 3.9 compatibility for ParamSpec and Self typing features across the codebase, which aligns with the comprehensive changes documented in the pull request summary. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6386185 and 4e3dc2c.

📒 Files selected for processing (1)
  • testing/python/amd/test_tilelang_test_amd.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.1
🔇 Additional comments (1)
testing/python/amd/test_tilelang_test_amd.py (1)

226-245: This change is unrelated to the PR objectives and lacks explanation.

The PR is about improving Python 3.9 compatibility for type annotations (ParamSpec and Self), but this file disables three ROCm-specific GEMM tests without any apparent connection to type compatibility. Key concerns:

  1. Inconsistent with PR scope: No type annotation changes, imports, or compatibility shims are present in this file.
  2. Unexplained test removal: There's no documentation explaining why these specific test_gemm_rs_* variants need to be disabled while similar test_gemm_* tests (lines 99-126) remain active.
  3. Test coverage loss: Disabling these tests removes validation for the matmul_rs code path, which includes the register-spilling variant with A_local fragment copying.
  4. Commented code: Leaving commented-out test code is generally discouraged; if tests are temporarily disabled, add a TODO/FIXME comment explaining why, or remove them entirely.

Please clarify:

  • Is this change intentional, or was it accidentally included from a different branch?
  • If intentional, why are these specific tests being disabled? Is there a known issue with the matmul_rs variant on ROCm?
  • Should this be in a separate PR with appropriate context?

If these tests are permanently removed, apply this approach instead:

```diff
-# @tilelang.testing.requires_rocm
-# def test_gemm_rs_f16f32f32_nt():
-#     run_gemm_rs(1024, 1024, 1024, False, False, "float16", "float32", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, False, True, "float16", "float32", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, True, True, "float16", "float32", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, True, False, "float16", "float32", "float32", 128, 128, 32)
-
-# @tilelang.testing.requires_rocm
-# def test_gemm_rs_bf16f32f32_nt():
-#     run_gemm_rs(1024, 1024, 1024, False, False, "bfloat16", "float32", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, False, True, "bfloat16", "float32", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, True, True, "bfloat16", "float32", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, True, False, "bfloat16", "float32", "float32", 128, 128, 32)
-
-# @tilelang.testing.requires_rocm
-# def test_gemm_rs_bf16bf16f32_nt():
-#     run_gemm_rs(1024, 1024, 1024, False, False, "bfloat16", "bfloat16", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, False, True, "bfloat16", "bfloat16", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, True, True, "bfloat16", "bfloat16", "float32", 128, 128, 32)
-#     run_gemm_rs(1024, 1024, 1024, True, False, "bfloat16", "bfloat16", "float32", 128, 128, 32)
+# TODO: Re-enable these tests once [specific issue] is resolved
+# Temporarily disabled due to [reason: e.g., ROCm compatibility issue, known bug in matmul_rs]
```

Or simply remove them entirely if they're no longer needed.
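Commenting tests out hides them. An explicit skip marker keeps a test collected and reports its reason on every run. Below is a minimal sketch that uses the standard library's `unittest.skip`. pytest's `pytest.mark.skip(reason=...)` is the pytest-native equivalent. `run_gemm_rs` here is a hypothetical stub for the real helper in the test file, and the reason string is a placeholder. In the real file the decorator would sit on top of the existing `@tilelang.testing.requires_rocm`.

```python
import unittest


def run_gemm_rs(*args):
    # Hypothetical stub standing in for the real helper in test_tilelang_test_amd.py.
    raise RuntimeError("should not run while the test is skipped")


# TODO: replace the placeholder reason with the actual ROCm issue.
@unittest.skip("matmul_rs path temporarily disabled on ROCm (reason TBD)")
def test_gemm_rs_f16f32f32_nt():
    run_gemm_rs(1024, 1024, 1024, False, False, "float16", "float32", "float32", 128, 128, 32)
```

Calling the decorated function raises `unittest.SkipTest` instead of running the body. The disabled tests therefore stay visible in test reports and are not silently lost.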



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/op/fill.cc (1)

92-101: Prevent null dereference when checking ramp stride

Line 93 dereferences ramp->stride.as<IntImmNode>() before verifying the cast succeeded. If stride is symbolic or otherwise not an IntImmNode, this will segfault instead of raising a clean check failure. Guard the pointer first.

```diff
-      if (const auto *ramp = index.as<RampNode>()) {
-        CHECK(ramp->stride.as<IntImmNode>()->value == 1)
-            << "Only stride 1 ramps are supported";
+      if (const auto *ramp = index.as<RampNode>()) {
+        const auto *stride_imm = ramp->stride.as<IntImmNode>();
+        CHECK(stride_imm && stride_imm->value == 1)
+            << "Only stride 1 ramps are supported";
         const auto *lanes = ramp->lanes.as<IntImmNode>();
         CHECK(lanes)
             << "Scalable vectors not supported in BufferRegion conversion";
         node->region.push_back(Range::FromMinExtent(ramp->base, ramp->lanes));
```
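The same guard-before-use pattern applies whenever a vector index is turned into a region. The Python sketch below is only an analogy for the C++ fix. `Ramp` is a hypothetical dataclass whose `stride` and `lanes` are either an `int` (static, like an `IntImmNode`) or a `str` (symbolic), and `ramp_to_range` is an illustrative name. Each field's type is checked before its value is used, so a symbolic stride produces a clean error instead of a crash.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Expr = Union[int, str]  # int = static immediate, str = symbolic expression


@dataclass
class Ramp:
    base: Expr
    stride: Expr
    lanes: Expr


def ramp_to_range(ramp: Ramp) -> Tuple[Expr, int]:
    """Convert a unit-stride ramp into (min, extent), rejecting anything else."""
    # Test the type first: the analogue of checking the as<IntImmNode>() result for null.
    if not (isinstance(ramp.stride, int) and ramp.stride == 1):
        raise ValueError("Only stride 1 ramps are supported")
    if not isinstance(ramp.lanes, int):
        raise ValueError("Scalable vectors not supported in BufferRegion conversion")
    return ramp.base, ramp.lanes
```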
🧹 Nitpick comments (1)
testing/python/issue/test_tilelang_issue_1008.py (1)

1-53: Tests compile kernels but don't verify results.

The tests successfully exercise static and dynamic region fills, but they don't assert that the fill operations produce correct results. Consider adding assertions to verify the buffer contents after filling.

Example enhancement for test_fill_with_static_region_kernel:

```diff
 def test_fill_with_static_region_kernel():
     kernel = _fill_with_static_region_kernel()
     x = torch.zeros((256,), dtype=torch.int64, device='cuda')
     kernel(x)
+    # Verify that x[0:128] was filled with 0 (already 0, but tests the path)
+    assert torch.all(x == 0), "Buffer should remain zero after fill"
```

For the dynamic region test, you could set a and b to specific values and verify the filled region.
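To make the dynamic-region check meaningful, start from a non-zero buffer, fill a known slice, and compare the result against a reference computed on the host. The sketch below covers only that host-side reference, written with plain Python lists. `reference_region_fill` is a hypothetical helper. The kernel call itself is omitted because how `a` and `b` are passed depends on how the test declares them.

```python
from typing import List


def reference_region_fill(buf: List[int], a: int, b: int, value: int) -> List[int]:
    """Return a copy of buf with buf[a:b] set to value (the expected kernel result)."""
    if not 0 <= a <= b <= len(buf):
        raise ValueError("region [a, b) must lie inside the buffer")
    out = list(buf)
    out[a:b] = [value] * (b - a)
    return out
```

Initializing the buffer with a sentinel such as `-1` rather than zeros lets the comparison catch a fill that wrote nothing or wrote the wrong slice. With zeros and a fill value of 0, both of those bugs would pass unnoticed.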

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1768cbe and 6de664e.

📒 Files selected for processing (9)
  • src/op/fill.cc (7 hunks)
  • testing/python/issue/test_tilelang_issue_1008.py (1 hunks)
  • tilelang/autotuner/tuner.py (1 hunks)
  • tilelang/jit/__init__.py (1 hunks)
  • tilelang/jit/kernel.py (1 hunks)
  • tilelang/language/fill.py (2 hunks)
  • tilelang/language/v2/ast.py (1 hunks)
  • tilelang/language/v2/builder.py (2 hunks)
  • tilelang/language/v2/dtypes.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
testing/python/issue/test_tilelang_issue_1008.py (3)
tilelang/transform/pass_config.py (1)
  • PassConfigKey (6-144)
tilelang/language/symbolics.py (1)
  • symbolic (25-27)
tilelang/language/fill.py (1)
  • fill (14-47)
src/op/fill.cc (3)
src/transform/pipeline_planning.cc (4)
  • region (470-477)
  • region (470-470)
  • region (480-487)
  • region (480-480)
tilelang/ir.py (1)
  • RegionOp (73-74)
tilelang/language/tir/op.py (1)
  • tvm_access_ptr (651-676)
tilelang/language/fill.py (4)
tilelang/language/utils.py (4)
  • buffer_to_tile_region (30-42)
  • buffer_region_to_tile_region (71-88)
  • buffer_load_to_tile_region (45-68)
  • region (8-27)
tilelang/language/frame.py (2)
  • has_let_value (189-198)
  • get_let_value (201-210)
tilelang/utils/language.py (1)
  • get_buffer_region_from_load (137-159)
tilelang/language/tir/op.py (1)
  • call_intrin (120-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
🔇 Additional comments (5)
tilelang/language/v2/ast.py (1)

4-9: LGTM! Clean Python 3.9 compatibility implementation.

The try/except pattern correctly handles ParamSpec availability, which was introduced in Python 3.10. The fallback to typing_extensions ensures compatibility with Python 3.9.
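Here is a minimal sketch of the pattern. It assumes `typing_extensions` is installed on interpreters older than 3.10. The decorator `log_calls` is a hypothetical example rather than TileLang code. It shows what `ParamSpec` buys: the wrapper keeps the wrapped function's parameter types for type checkers.

```python
import functools
from typing import Callable, List, TypeVar

try:  # ParamSpec is in typing from Python 3.10
    from typing import ParamSpec
except ImportError:  # Python 3.9: fall back to the backport
    from typing_extensions import ParamSpec

P = ParamSpec("P")
R = TypeVar("R")

calls: List[str] = []


def log_calls(fn: Callable[P, R]) -> Callable[P, R]:
    """Record each call's function name; P preserves fn's signature for type checkers."""
    @functools.wraps(fn)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        calls.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


@log_calls
def scale(x: int, factor: int = 2) -> int:
    return x * factor
```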

tilelang/language/v2/dtypes.py (2)

5-5: LGTM! Union import added for Python 3.9 compatibility.


10-11: LGTM! Correct migration from PEP 604 to Union syntax.

The change from ir.Type | str | type | torch.dtype | dtype to Union[ir.Type, str, type, torch.dtype, dtype] ensures compatibility with Python 3.9, where the | operator for type unions is not yet available.
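Here is why the spelling matters. Without `from __future__ import annotations`, annotations are evaluated when the function is defined, so `int | None` raises `TypeError` on Python 3.9. `Union[...]` and `Optional[...]` work on every supported version. `DTypeLike` and `describe` are hypothetical names, not the aliases in dtypes.py.

```python
from typing import Optional, Union, get_args

# Evaluates fine on Python 3.9: subscription needs no `|` support on types.
DTypeLike = Union[str, type]


def describe(dtype: DTypeLike, bits: Optional[int] = None) -> str:
    name = dtype if isinstance(dtype, str) else dtype.__name__
    return name if bits is None else f"{name}{bits}"


# Optional[X] is shorthand for Union[X, None].
assert Optional[int] == Union[int, None]
assert get_args(Optional[int]) == (int, type(None))
```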

tilelang/language/v2/builder.py (2)

15-20: LGTM! Comprehensive typing compatibility imports.

The compatibility handling for both ParamSpec (Python < 3.10) and Self (Python < 3.11) follows the standard pattern and aligns with changes in other modules.
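The `Self` half of the pattern has the same shape. This sketch assumes `typing_extensions` is available before Python 3.11. `FrameBuilder` is a hypothetical class, not the builder in builder.py. Annotating a chaining method with `Self` means a subclass's calls keep the subclass type.

```python
from typing import List

try:  # Self is in typing from Python 3.11
    from typing import Self
except ImportError:  # Python 3.9/3.10: use the backport
    from typing_extensions import Self


class FrameBuilder:
    def __init__(self) -> None:
        self.frames: List[str] = []

    def push(self, name: str) -> Self:
        """Append a frame name and return self so calls can be chained."""
        self.frames.append(name)
        return self


class LoopBuilder(FrameBuilder):
    def loop(self, var: str) -> Self:
        return self.push(f"for {var}")
```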


103-106: LGTM! Correct distinction between runtime and typing constructs.

The approach is well-documented:

  • ContinueOrBreak as a tuple supports isinstance() checks (used on line 175)
  • AnyFrame using Union[...] maintains type annotation compatibility with Python 3.9
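The split shows up in a few lines of code. A tuple of classes is what `isinstance()` accepts on every supported version. A `Union[...]` alias is the annotation-friendly spelling. `Continue`, `Break` and `handle` are hypothetical stand-ins, not the classes in builder.py.

```python
from typing import Union


class Continue:
    """Hypothetical marker for a `continue` statement."""


class Break:
    """Hypothetical marker for a `break` statement."""


# Runtime check: isinstance() accepts a tuple of classes on all Python versions.
ContinueOrBreak = (Continue, Break)
# Static typing: a Union alias for annotations (valid on Python 3.9).
AnyJump = Union[Continue, Break]


def handle(node: object) -> str:
    if isinstance(node, ContinueOrBreak):
        jump: AnyJump = node  # the annotation only informs type checkers
        return type(jump).__name__.lower()
    return "other"
```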

- Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`.
- Removed references to Python 3.8 in classifiers.
- Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility.
- Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files.
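Since Python 3.9 (PEP 585), the `collections.abc` ABCs and `contextlib.AbstractContextManager` can be subscripted directly in annotations. That is what makes these imports usable once the 3.8 floor is dropped. The sketch below uses hypothetical names (`Frame`, `enter_all`).

```python
from collections.abc import Iterable
from contextlib import AbstractContextManager, ExitStack
from typing import Any, List


class Frame(AbstractContextManager):
    """Hypothetical frame that records enter/exit events into a shared log."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    def __enter__(self) -> "Frame":
        self.log.append(f"enter {self.name}")
        return self

    def __exit__(self, *exc: Any) -> None:
        self.log.append(f"exit {self.name}")


def enter_all(frames: Iterable[AbstractContextManager[Any]]) -> List[Any]:
    """Enter every frame in order; ExitStack exits them in reverse when the block ends."""
    with ExitStack() as stack:
        return [stack.enter_context(f) for f in frames]
```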
@LeiWang1999
Member Author

@codex review

@chatgpt-codex-connector

To use Codex here, create an environment for this repo.

- Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity.
- Updated the caching decorator from `functools.lru_cache(maxsize=None)` to its Python 3.9+ equivalent, `functools.cache`, in the C++ compiler retrieval function.
- Adjusted import statements in the language proxy file to maintain consistency in type annotations.
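`functools.cache` (added in Python 3.9) is documented as equivalent to `lru_cache(maxsize=None)`: an unbounded memo keyed on the call arguments. The sketch applies the pattern to a compiler-lookup helper. `find_cxx_compiler` and its candidate list are hypothetical, not the function in `tilelang/contrib/cc.py`.

```python
import functools
import shutil
from typing import Optional, Sequence


@functools.cache  # same behavior as @functools.lru_cache(maxsize=None)
def find_cxx_compiler(candidates: Sequence[str] = ("g++", "clang++")) -> Optional[str]:
    """Return the path of the first compiler found on PATH, or None; cached per argument."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

Arguments must be hashable because they form the cache key. The tuple default works, but passing a list would raise `TypeError`.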
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/language/v2/builder.py (1)

183-187: Handle optional frame before calling __enter__.
The new annotation advertises that with_frame accepts None, but we still unconditionally invoke self.enter_frame(frame), which will try to run None.__enter__() and crash. Either drop the | None from the signature or guard the call so we skip the enter/exit path when frame is None.

Please apply this guard:

```diff
     def with_frame(self, frame: AbstractContextManager[Any] | None):
         pop_idx = len(self.frames)
-        yield self.enter_frame(frame)
+        if frame is None:
+            yield None
+        else:
+            yield self.enter_frame(frame)
         while len(self.frames) > pop_idx:
             self.frames.pop().__exit__(None, None, None)
```
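The guard can be exercised on its own with a stripped-down builder. This is a hypothetical sketch. It assumes that `with_frame` is a generator wrapped by `contextlib.contextmanager` and that `enter_frame` pushes the frame and calls `__enter__`. The snippet above implies both but does not show them. The sketch spells the parameter as `Optional[...]` so that it also evaluates on Python 3.9.

```python
from contextlib import AbstractContextManager, contextmanager
from typing import Any, Iterator, List, Optional


class Builder:
    """Hypothetical minimal builder mirroring the frame-stack logic above."""

    def __init__(self) -> None:
        self.frames: List[AbstractContextManager[Any]] = []

    def enter_frame(self, frame: AbstractContextManager[Any]) -> Any:
        self.frames.append(frame)
        return frame.__enter__()

    @contextmanager
    def with_frame(self, frame: Optional[AbstractContextManager[Any]]) -> Iterator[Any]:
        pop_idx = len(self.frames)
        if frame is None:
            yield None  # nothing to enter, so nothing is pushed
        else:
            yield self.enter_frame(frame)
        # Exit every frame pushed since entry, innermost first.
        while len(self.frames) > pop_idx:
            self.frames.pop().__exit__(None, None, None)
```

As in the snippet above, frames are popped only on a normal exit. Wrapping the `yield` in `try`/`finally` would also pop them when the body raises.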
🧹 Nitpick comments (1)
src/transform/layout_reducer.cc (1)

279-307: Approve the region-based Fill handling with a suggested refactor to reduce duplication.

The implementation correctly adds support for region-based Fill calls alongside the existing tvm_access_ptr path. The bounds checking (lines 283, 297) and optional handling are solid.

However, lines 285-292 and 299-306 contain nearly identical logic for checking and recording reducer buffers. Consider extracting this common pattern:

```diff
   PrimExpr VisitExpr_(const CallNode *op_) final {
     auto op_ref = IRMutatorWithAnalyzer::VisitExpr_(op_).as<Call>().value();
     auto op = op_ref.CopyOnWrite();
     if (op->op.same_as(Fill::Get())) {
       ICHECK(!op->args.empty());
+      std::optional<Var> extracted_var;
+
       if (auto arg0_call = op->args[0].as<Call>()) {
         // Case 1: tl.region(...) — extract buffer var from its first arg
         if (arg0_call.value()->op.same_as(RegionOp::Get())) {
           ICHECK(!arg0_call.value()->args.empty());
           if (auto bl = arg0_call.value()->args[0].as<BufferLoadNode>()) {
-            Var var = bl->buffer->data;
-            if (reducer_info_map_.count(var)) {
-              ICHECK(inside_reducer_range_.count(var) == 0)
-                  << "T.fill on reducer must be enclosed with a "
-                     "T.finalize_reducer "
-                     "before next.";
-              inside_reducer_range_.Set(var,
-                                        reducer_info_map_.Get(var).value());
-            }
+            extracted_var = bl->buffer->data;
           }
         }
         // Case 2: builtin.tvm_access_ptr(...) — existing path
         else if (arg0_call.value()->op.same_as(builtin::tvm_access_ptr())) {
           ICHECK(arg0_call.value()->args.size() > 1);
-          if (auto var = arg0_call.value()->args[1].as<Var>();
-              var && reducer_info_map_.count(var.value())) {
-            ICHECK(inside_reducer_range_.count(var.value()) == 0)
-                << "T.fill on reducer must be enclosed with a "
-                   "T.finalize_reducer "
-                   "before next.";
-            inside_reducer_range_.Set(
-                var.value(), reducer_info_map_.Get(var.value()).value());
-          }
+          if (auto var = arg0_call.value()->args[1].as<Var>()) {
+            extracted_var = var.value();
+          }
         }
       }
+
+      if (extracted_var && reducer_info_map_.count(extracted_var.value())) {
+        ICHECK(inside_reducer_range_.count(extracted_var.value()) == 0)
+            << "T.fill on reducer must be enclosed with a "
+               "T.finalize_reducer before next.";
+        inside_reducer_range_.Set(
+            extracted_var.value(),
+            reducer_info_map_.Get(extracted_var.value()).value());
+      }
     } else if (op->op.same_as(FinalizeReducerOp::Get())) {
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6de664e and 4f4dae7.

📒 Files selected for processing (12)
  • examples/attention_sink/benchmark_gqa_sink_fwd.py (3 hunks)
  • examples/attention_sink/benchmark_mha_sink_fwd.py (3 hunks)
  • examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1 hunks)
  • examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (1 hunks)
  • examples/attention_sink/example_mha_sink_bwd_bhsd.py (1 hunks)
  • examples/attention_sink/example_mha_sink_fwd_bhsd.py (1 hunks)
  • examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (1 hunks)
  • pyproject.toml (2 hunks)
  • src/transform/layout_reducer.cc (2 hunks)
  • tilelang/jit/__init__.py (1 hunks)
  • tilelang/language/v2/ast.py (2 hunks)
  • tilelang/language/v2/builder.py (5 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tilelang/jit/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tilelang/language/v2/ast.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/transform/layout_reducer.cc (2)
src/op/parallel.cc (4)
  • op (97-107)
  • op (97-97)
  • op (109-117)
  • op (109-109)
tilelang/language/tir/op.py (1)
  • tvm_access_ptr (651-676)
examples/attention_sink/benchmark_mha_sink_fwd.py (1)
examples/attention_sink/benchmark_gqa_sink_fwd.py (1)
  • triton_program (98-124)
examples/attention_sink/benchmark_gqa_sink_fwd.py (1)
examples/attention_sink/benchmark_mha_sink_fwd.py (1)
  • triton_program (97-121)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.1
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (11)
examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (1)

266-266: LGTM! Correct Python 3.9 compatibility fix.

The type annotation change from int | None to Optional[int] is correct, as the union operator | for types was introduced in Python 3.10 (PEP 604). This change ensures compatibility with Python 3.9.

examples/attention_sink/example_mha_sink_bwd_bhsd.py (1)

443-443: LGTM! Correct Python 3.9 compatibility fix.

The type annotation correctly uses Optional[int] instead of int | None to ensure compatibility with Python 3.9, where the union operator | for types is not available.

pyproject.toml (2)

5-5: LGTM! Minimum Python version correctly updated.

The Python requirement change from >=3.8 to >=3.9 aligns with the PR's objective to improve Python 3.9 compatibility. This is necessary since the codebase now uses type annotations that require Python 3.9+.


120-120: LGTM! Ruff target version correctly updated.

The ruff target-version update from py38 to py39 correctly aligns with the new minimum Python version requirement, ensuring that ruff's linting rules are appropriate for Python 3.9+.

examples/attention_sink/example_mha_sink_fwd_bhsd.py (1)

256-256: LGTM! Correct Python 3.9 compatibility fix.

The type annotation correctly uses Optional[int] to maintain compatibility with Python 3.9. The change is consistent with similar updates across other attention sink examples.

examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)

447-447: LGTM! Correct Python 3.9 compatibility fix.

The type annotation change from int | None to Optional[int] correctly ensures Python 3.9 compatibility. The change is consistent with the broader refactoring across all attention sink examples.

examples/attention_sink/benchmark_mha_sink_fwd.py (1)

8-8: LGTM! Comprehensive Python 3.9 compatibility fixes.

The changes correctly update both function signatures to use Optional[int] instead of int | None, and the Optional import is properly added. This ensures compatibility with Python 3.9 where the union operator | for types is not available.

Also applies to: 97-97, 129-129

examples/attention_sink/benchmark_gqa_sink_fwd.py (1)

8-8: LGTM! Comprehensive Python 3.9 compatibility fixes.

The changes correctly update both function signatures to use Optional[int] instead of int | None, with the proper Optional import added. This ensures compatibility with Python 3.9 and is consistent with similar updates across other benchmark files.

Also applies to: 98-98, 134-134

examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (1)

275-275: LGTM! Correct Python 3.9 compatibility fix.

The type annotation correctly uses Optional[int] to ensure compatibility with Python 3.9. This change is consistent with the broader refactoring effort across all attention sink examples in this PR.

src/transform/layout_reducer.cc (1)

17-17: LGTM! Necessary include for region operation support.

The include is correctly added to support RegionOp::Get() usage in the updated Fill handling logic.

tilelang/language/v2/builder.py (1)

15-20: Good fallback for ParamSpec/Self imports.
This keeps the builder usable on Python 3.9+ by gracefully leaning on typing_extensions when the stdlib lacks the symbols.

@LeiWang1999
Member Author

Local tests pass; merged.

@LeiWang1999 LeiWang1999 merged commit 7d96189 into tile-ai:main Nov 4, 2025
10 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…le-ai#1190)

* [Feature] Enhance fill operation to support various buffer types

- Added support for `BufferLoad` in the `fill` function to handle different buffer types.
- Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
- Introduced checks for static bounds in region definitions to ensure safety during operations.
- Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.

* lint fix

* [Refactor] Improve Python compatibility for ParamSpec and Self

- Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively.
- Updated type annotations across multiple files to ensure consistent usage of typing features.

* [Update] Require Python 3.9 and enhance type annotations

- Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`.
- Removed references to Python 3.8 in classifiers.
- Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility.
- Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files.

* [Refactor] Update import statements to enhance type annotations

- Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity.
- Updated the caching decorator from `functools.lru_cache(maxsize=None)` to its Python 3.9+ equivalent, `functools.cache`, in the C++ compiler retrieval function.
- Adjusted import statements in the language proxy file to maintain consistency in type annotations.

* disable rocm rs nt test.

* lint fix