@chengyupku
Contributor

@chengyupku chengyupku commented Sep 14, 2025

  • Updated the TVM subproject to commit 87b845fa0e14c2029bbf5799fbbbb9d490db4f20.
  • Refactored the handling of BlockNode and BlockRealizeNode in warp_specialized_rewriter.cc to utilize the FilterByRole function, improving code clarity and maintainability.

Summary by CodeRabbit

  • New Features

    • Added a public attribute to mark and annotate warp-specialized functions for downstream passes.
  • Bug Fixes

    • Register-allocation hints are no longer applied when warp specialization is detected, preventing unintended overrides.
    • Injection now proceeds only when valid register hints are collected.
  • Refactor

    • Warp-specialization handling moved to an annotation-driven flow for more consistent cross-pass behavior.
  • Style

    • Minor trailing newline formatting fix.

@coderabbitai
Contributor

coderabbitai bot commented Sep 14, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Annotates warp-specialized functions with a new AttrStmt using kCustomWarpSpecialization, adds the public attribute key, and makes register-allocation collection/injection detect and short-circuit when that attribute is present.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Warp specialization annotation (src/transform/warp_specialized_rewriter.cc) | When warp specialization is detected, wrap the function body in an AttrStmt with attribute kCustomWarpSpecialization = 1 and return the annotated function; previously returned the function unchanged. |
| Public attribute key (src/op/builtin.h) | Adds tvm::tl::attr::kCustomWarpSpecialization constant; no other API/signature changes. |
| Register-allocation annotation aware of warp specialization (src/transform/annotate_warp_group_reg_alloc.cc) | SetMaxNRegCollector now visits AttrStmtNode, sets warp_specialized_ when kCustomWarpSpecialization is found, and returns an empty Array<IntImm> in that case. SetMaxNRegInjector no longer uses an external detector and skips injection when collected nreg_ is empty. Adds warp_specialized_ member. |
| Minor formatting (tilelang/language/builtin.py) | Trailing newline added; no semantic changes. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Rewriter as WarpSpecializedRewriter
  participant Func as PrimFunc
  participant Attr as AttrStmt(kCustomWarpSpecialization)

  Note over Rewriter,Func: Warp specialization annotation
  Rewriter->>Func: Analyze function for warp specialization
  alt warp-specialized
    Rewriter->>Attr: Create AttrStmt(key="kCustomWarpSpecialization", value=1)
    Attr->>Func: Wrap function body with AttrStmt
    Rewriter-->>Func: Return annotated function
  else not warp-specialized
    Rewriter->>Func: Return Substitute(... ) (unchanged path)
  end
sequenceDiagram
  autonumber
  participant Injector as SetMaxNRegInjector
  participant Collector as SetMaxNRegCollector
  participant Func as PrimFunc

  Note over Injector,Collector: Register-allocation collection & injection
  Injector->>Collector: Collect(Func)
  Collector->>Func: Visit body (including AttrStmt nodes)
  alt AttrStmt(kCustomWarpSpecialization) detected
    Func-->>Collector: Attr present -> Collector sets warp_specialized_ = true
    Collector-->>Injector: Return empty nreg_ array
    Injector->>Injector: Skip injection (nreg_ empty)
  else No annotation
    Collector-->>Injector: Return collected nreg_ values
    alt nreg_ non-empty
      Injector->>Func: Inject set_maxnreg hints
    else nreg_ empty
      Injector->>Injector: Skip injection
    end
  end
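The two diagrams above can be read as one annotate, collect, inject pipeline. The following is a minimal, self-contained sketch of that flow, not the TileLang implementation: `Stmt`, `MakeStmt`, `AnnotateWarpSpecialized`, `RegHintCollector`, and `ShouldInject` are hypothetical stand-ins for the TVM/TileLang classes, and treating each `kSetMaxNReg` node as one register hint is an assumption made for illustration. Only the attribute key string is taken from the diff in this conversation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Stmt;
using StmtPtr = std::shared_ptr<Stmt>;

// Hypothetical, simplified statement node (not tvm::tir::Stmt).
struct Stmt {
  enum class Kind { kSeq, kAttr, kSetMaxNReg };
  Kind kind = Kind::kSeq;
  std::string attr_key;           // meaningful only for kAttr
  int value = 0;                  // attr value or register count
  std::vector<StmtPtr> children;  // attr body or sequence elements
};

// Same spelling as the key added in src/op/builtin.h.
static constexpr const char *kCustomWarpSpecialization =
    "kCustomWarpSpecialization";

StmtPtr MakeStmt(Stmt::Kind kind, std::string key, int value,
                 std::vector<StmtPtr> children) {
  auto s = std::make_shared<Stmt>();
  s->kind = kind;
  s->attr_key = std::move(key);
  s->value = value;
  s->children = std::move(children);
  return s;
}

// Rewriter side: wrap the body in an attribute node with value 1.
StmtPtr AnnotateWarpSpecialized(StmtPtr body) {
  return MakeStmt(Stmt::Kind::kAttr, kCustomWarpSpecialization, 1,
                  {std::move(body)});
}

// Collector side: gather hints, but report none if the marker is present.
struct RegHintCollector {
  bool warp_specialized = false;
  std::vector<int> nreg;

  void Visit(const StmtPtr &s) {
    if (s->kind == Stmt::Kind::kAttr &&
        s->attr_key == kCustomWarpSpecialization) {
      warp_specialized = true;
    }
    if (s->kind == Stmt::Kind::kSetMaxNReg) nreg.push_back(s->value);
    for (const auto &child : s->children) Visit(child);  // keep recursing
  }

  static std::vector<int> Collect(const StmtPtr &body) {
    RegHintCollector c;
    c.Visit(body);
    if (c.warp_specialized) return {};  // short-circuit: no hints
    return c.nreg;
  }
};

// Injector side: proceed only when hints were collected.
bool ShouldInject(const StmtPtr &body) {
  std::vector<int> nreg = RegHintCollector::Collect(body);
  if (nreg.empty()) return false;
  assert(nreg.size() == 2 && "expected two register hints or none");
  return true;
}
```

In this mock, a body with two hints is eligible for injection, while the same body wrapped by `AnnotateWarpSpecialized` yields an empty hint list and is skipped. The design relies on pass order: the annotating pass has to run before the collecting pass.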

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hop and tag the warp with care,
A little flag flutters in the air.
Collectors yield and injectors rest,
Registers safe beneath my vest.
A rabbit's wink — a tidy quest. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and accurately summarizes the primary changes in the PR (updating the TVM subproject and refactoring BlockNode handling in warp_specialized_rewriter.cc), which matches the provided PR objectives and file-level summaries. It is specific, concise, and informative without using vague terms or unnecessary noise. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76d6f47 and e3692ef.

📒 Files selected for processing (1)
  • src/transform/annotate_warp_group_reg_alloc.cc (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/transform/annotate_warp_group_reg_alloc.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: bot-task
  • GitHub Check: format-check


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @chengyupku, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on updating the TVM subproject to its latest commit and enhancing the "warp_specialized_rewriter.cc" file. The refactoring in the C++ file streamlines the processing of "BlockNode" and "BlockRealizeNode" by adopting a unified "FilterByRole" mechanism, thereby improving the robustness and readability of the code.

Highlights

  • TVM Subproject Update: The TVM subproject has been updated to commit "87b845fa0e14c2029bbf5799fbbbb9d490db4f20".
  • Code Refactoring: The handling of "BlockNode" and "BlockRealizeNode" in "src/transform/warp_specialized_rewriter.cc" has been refactored to consistently use the "FilterByRole" function, improving code clarity and maintainability by removing previously unreachable "ICHECK(0)" statements.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the TVM subproject and refactors warp_specialized_rewriter.cc. The refactoring to handle BlockNode and BlockRealizeNode is a good step towards making the code more robust. However, I have a significant concern about the implementation. The current approach of overriding the VisitStmt_ methods with a direct call to FilterByRole breaks the recursive nature of StmtMutator. This could lead to statements inside these blocks being silently ignored, causing correctness issues. My review comment provides more details on this potential bug and suggests a path for correction.

Comment on lines 892 to 893
Contributor


high

Overriding VisitStmt_ for BlockNode and BlockRealizeNode to call FilterByRole(op) directly prevents the default recursive behavior of StmtMutator. These nodes are containers, and a StmtMutator should typically recurse into their body (e.g., by calling this->VisitStmt(op->body)).

By not performing this recursion, any statements within these blocks will be skipped by WSCodeEmitter, potentially leading to incorrect code generation. The same potential issue exists for the LetStmtNode visitor in this class.

The previous ICHECK(0) may have been a safeguard against this exact scenario, indicating that a simple filtering approach is insufficient. Please refactor this to ensure that the bodies of these container nodes are visited recursively, for example by manually implementing the recursion before applying filtering logic.
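The traversal concern can be illustrated with a small mock. This is a sketch under stated assumptions, not the `WSCodeEmitter` code: `Node`, `Make`, and `Emitter` are hypothetical, and the filtering rule (recurse for a both-roles node, return a matching node unchanged, replace a non-matching node with a no-op) is a simplified model inferred from this review thread rather than copied from the source file. The `always_traverse` flag models the optional "still descend into matching containers" behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Role { kProducer, kConsumer, kBoth };

struct Node;
using NodePtr = std::shared_ptr<Node>;

// Hypothetical statement node; a non-empty body marks a container.
struct Node {
  std::string name;  // "noop" marks an elided statement
  Role role = Role::kBoth;
  std::vector<NodePtr> body;
};

NodePtr Make(std::string name, Role role, std::vector<NodePtr> body = {}) {
  auto n = std::make_shared<Node>();
  n->name = std::move(name);
  n->role = role;
  n->body = std::move(body);
  return n;
}

class Emitter {
 public:
  Emitter(Role emit, bool always_traverse)
      : emit_(emit), always_traverse_(always_traverse) {}

  NodePtr Visit(const NodePtr &n) {
    if (n->body.empty()) {  // leaf statement
      ++visited_leaves;
      return Filter(n);
    }
    // Container: descend for mixed roles, or when traversal is forced.
    const bool matches = (n->role == emit_);
    if (n->role == Role::kBoth || (matches && always_traverse_)) {
      return Recurse(n);
    }
    return Filter(n);  // children are never visited on this path
  }

  int visited_leaves = 0;  // how many leaves the emitter actually saw

 private:
  // Keep matching nodes as-is; replace everything else with a no-op.
  NodePtr Filter(const NodePtr &n) const {
    if (n->role == Role::kBoth || n->role == emit_) return n;
    return Make("noop", Role::kBoth);
  }

  // Rebuild the container from its visited children.
  NodePtr Recurse(const NodePtr &n) {
    std::vector<NodePtr> body;
    for (const auto &child : n->body) body.push_back(Visit(child));
    return Make(n->name, n->role, std::move(body));
  }

  Role emit_;
  bool always_traverse_;
};
```

With `always_traverse` set to false, a producer-only block visited by a producer emitter comes back untouched and `visited_leaves` stays at zero, so any rewrite that should fire on a nested statement would not run. Setting the flag to true visits every leaf while still eliding non-matching statements.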

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/transform/warp_specialized_rewriter.cc (1)

892-893: Optional: always traverse container nodes (Block/BlockRealize) even when roles match.

Currently, when the role matches the emitter, FilterByRole returns the node unchanged and skips traversal. For container nodes, you may still want to descend to enable nested rewrites (e.g., inner For with tl_pipeline_* annotations) even if the entire block is single‑role.

Apply this tweak if you want guaranteed traversal for containers while preserving role-based elision:

-  Stmt VisitStmt_(const BlockNode *op) final { return FilterByRole(op); }
+  Stmt VisitStmt_(const BlockNode *op) final {
+    Stmt out = FilterByRole(op);
+    if (out.same_as(GetRef<Stmt>(op)) && marker_.GetRole(op) != Role::kBoth && !mbarrier_only_) {
+      // Container: still traverse to allow nested transformations.
+      return StmtMutator::VisitStmt_(op);
+    }
+    return out;
+  }
-  Stmt VisitStmt_(const BlockRealizeNode *op) final { return FilterByRole(op); }
+  Stmt VisitStmt_(const BlockRealizeNode *op) final {
+    Stmt out = FilterByRole(op);
+    if (out.same_as(GetRef<Stmt>(op)) && marker_.GetRole(op) != Role::kBoth && !mbarrier_only_) {
+      // Container: still traverse to allow nested transformations.
+      return StmtMutator::VisitStmt_(op);
+    }
+    return out;
+  }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae9b706 and 43fc6e72f3ed6313d09e9b1c7b66675cdf1ffa6f.

📒 Files selected for processing (2)
  • 3rdparty/tvm (1 hunks)
  • src/transform/warp_specialized_rewriter.cc (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
PR: tile-ai/tilelang#794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.

Applied to files:

  • src/transform/warp_specialized_rewriter.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-amd
🔇 Additional comments (2)
3rdparty/tvm (1)

1-1: TVM submodule bump to 87b845f — small TIR refactor; verify downstream compatibility, CI, and licensing

Compare eddefbd65acb7b1ea51dd18068b4049754c4fa7a -> 87b845fa0e14c2029bbf5799fbbbb9d490db4f20 (1 commit: "Refactor BlockReadWriteDetector analysis on BlockRealizeNode") — changed file: src/tir/analysis/block_access_region_detector.cc.

  • Confirm warp_specialized_rewriter.cc still builds and FilterByRole() behavior is unchanged; run unit/e2e tests covering Block and BlockRealize paths.
  • Check for any local TVM patches that may have been dropped; reapply or document removals.
  • Ensure CI performs deterministic submodule updates (git submodule sync + git submodule update --init --depth …) and caches builds to avoid flakiness.
  • Verify license/NOTICE remains correct (TVM = Apache‑2.0) and no new third‑party code requires additional notices.
src/transform/warp_specialized_rewriter.cc (1)

892-893: Adopt FilterByRole for Block/BlockRealize — good cleanup and consistent with role marking.

This removes the previous “unreachable” stance and lets nested blocks flow through the same producer/consumer filtering as other nodes. Given WarpSpecializedRoleMarker assigns roles to both BlockNode and BlockRealizeNode, this is coherent.

Please confirm via tests that eliding a Block/BlockRealize with Evaluate(0) in non-selected roles does not violate any downstream TIR invariants (e.g., contexts expecting a BlockRealize as an immediate child). A small unit covering nested blocks with mixed roles would be ideal.

…warp group register allocation

- Added a new attribute `kCustomWarpSpecialization` to support custom warp specialization in the TileLang framework.
- Updated the `Collect` method in `SetMaxNRegCollector` to handle cases where warp specialization is detected, returning an empty array accordingly.
- Enhanced the `SetMaxNRegInjector` to skip processing when no registers are needed, improving efficiency.
- Modified the `WarpSpecialized` pass to include the new attribute in the function body when warp specialization is enabled, ensuring proper handling in transformations.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/op/builtin.h (1)

28-28: Name/key consistency for warp specialization attribute

You now have multiple warp‑specialization keys in the codebase (e.g., string literal "warp_specialize" in warp_specialized_rewriter.h vs this new kCustomWarpSpecialization). Consider centralizing on tl::attr constants for both to avoid drift and make usages grep‑able. If the intent is a function‑level marker, a brief comment here would help clarify semantics.

Option A (add alias for existing literal to de‑duplicate string usage across files):

 namespace attr {
 static constexpr const char *kPaddingMap = "padding_map";
 static constexpr const char *kWarpSpecializationScope =
     "kWarpSpecializationScope";
+// Function-level marker set by WarpSpecialized pass to indicate warp specialization is active.
+static constexpr const char *kWarpSpecialize = "warp_specialize";
 static constexpr const char *kCustomWarpSpecialization = "kCustomWarpSpecialization";
 }
src/transform/annotate_warp_group_reg_alloc.cc (2)

49-55: Also check value == 1 and keep backward compatibility with "warp_specialize"

Right now any presence of the key flips the flag. Guard on IntImm==1, and (optionally) accept the legacy "warp_specialize" key to be robust to mixed modules.

Apply:

-  void VisitStmt_(const AttrStmtNode *op) final {
-    if (op->attr_key == attr::kCustomWarpSpecialization) {
-      warp_specialized_ = true;
-    }
-    StmtExprVisitor::VisitStmt_(op);
-  }
+  void VisitStmt_(const AttrStmtNode *op) final {
+    if (op->attr_key == attr::kCustomWarpSpecialization ||
+        op->attr_key == "warp_specialize") {
+      if (const auto* imm = op->value.as<IntImmNode>()) {
+        warp_specialized_ = warp_specialized_ || (imm->value == 1);
+      }
+    }
+    StmtExprVisitor::VisitStmt_(op);
+  }
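The suggested guard reduces to a predicate over the attribute key and its value. A minimal sketch, assuming the attribute value has already been extracted as an optional integer (empty when the value is not an integer immediate); `IsWarpSpecializedMarker` and `kLegacyWarpSpecialize` are hypothetical names, while the two key strings come from this review:

```cpp
#include <cassert>
#include <optional>
#include <string>

static constexpr const char *kCustomWarpSpecialization =
    "kCustomWarpSpecialization";
// Hypothetical constant name for the legacy string literal.
static constexpr const char *kLegacyWarpSpecialize = "warp_specialize";

// True only for a known key whose value is the integer 1.
bool IsWarpSpecializedMarker(const std::string &key,
                             const std::optional<long long> &value) {
  const bool key_matches =
      key == kCustomWarpSpecialization || key == kLegacyWarpSpecialize;
  return key_matches && value.has_value() && *value == 1;
}
```

Compared with checking the key alone, this rejects a marker whose value is 0 or is not an integer, at the cost of coupling the detector to the value the annotating pass writes.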

67-69: Defensive check on nreg_ size before indexing later

Collect guarantees size 0 or 2, but an explicit assertion prevents accidental OOB if invariants change.

Apply:

   T.nreg_ = SetMaxNRegCollector::Collect(f);
   if (T.nreg_.size() == 0) {
     return f;
   }
+  ICHECK_EQ(T.nreg_.size(), 2) << "Expected 2 register hints or empty; got " << T.nreg_.size();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43fc6e72f3ed6313d09e9b1c7b66675cdf1ffa6f and ee8aca1.

📒 Files selected for processing (3)
  • src/op/builtin.h (1 hunks)
  • src/transform/annotate_warp_group_reg_alloc.cc (2 hunks)
  • src/transform/warp_specialized_rewriter.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/transform/warp_specialized_rewriter.cc
🧰 Additional context used
🧬 Code graph analysis (1)
src/transform/annotate_warp_group_reg_alloc.cc (2)
src/transform/warp_specialized_rewriter.cc (18)
  • op (38-43)
  • op (38-38)
  • op (73-83)
  • op (73-73)
  • op (85-93)
  • op (85-85)
  • op (95-100)
  • op (95-95)
  • op (102-110)
  • op (102-102)
  • op (134-146)
  • op (134-134)
  • op (148-177)
  • op (148-148)
  • op (179-189)
  • op (179-179)
  • f (1121-1140)
  • f (1121-1122)
src/transform/warp_specialized_rewriter.h (1)
  • VisitStmt_ (77-90)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: format-check
  • GitHub Check: bot-task
🔇 Additional comments (1)
src/transform/annotate_warp_group_reg_alloc.cc (1)

20-22: Early exit sentinel for warp‑specialized functions — pass order confirmed

WarpSpecialized injects attr::kCustomWarpSpecialization; AnnotateWarpGroupRegAlloc detects that attr and returns early; the engine pipeline invokes WarpSpecialized before AnnotateWarpGroupRegAlloc.

Locations: src/transform/warp_specialized_rewriter.cc (AttrStmt(..., attr::kCustomWarpSpecialization) at ~1288), src/transform/annotate_warp_group_reg_alloc.cc (VisitStmt_ checks attr::kCustomWarpSpecialization at ~48–52), tilelang/engine/phase.py (WarpSpecialized() then AnnotateWarpGroupRegAlloc() at ~128–130).

@LeiWang1999 LeiWang1999 merged commit 8b00522 into tile-ai:main Sep 15, 2025
5 of 8 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…arp_specialized_rewriter.cc (tile-ai#812)

* [Feature] Introduce custom warp specialization attribute and enhance warp group register allocation

- Added a new attribute `kCustomWarpSpecialization` to support custom warp specialization in the TileLang framework.
- Updated the `Collect` method in `SetMaxNRegCollector` to handle cases where warp specialization is detected, returning an empty array accordingly.
- Enhanced the `SetMaxNRegInjector` to skip processing when no registers are needed, improving efficiency.
- Modified the `WarpSpecialized` pass to include the new attribute in the function body when warp specialization is enabled, ensuring proper handling in transformations.

* lint

* lint