Skip to content

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Nov 21, 2025

This pull request introduces several improvements and refactorings to the loop vectorization infrastructure, with a focus on making vectorization analysis more robust and analyzer-aware. The most significant changes are the extension of vectorization-related functions to accept an arith::Analyzer, updates to the vectorization planning logic, and corresponding changes throughout the codebase to utilize the enhanced APIs.

Key changes include:

Vectorization API and Infrastructure Improvements

  • Added overloads for GetVectorizeSize and VectorizeLoop to accept an arith::Analyzer*, enabling more accurate and context-aware vectorization analysis. (src/transform/loop_vectorize.h, src/transform/loop_vectorize.cc) [1] [2] [3]
  • Refactored VectorizePlanner and related classes to use IRMutatorWithAnalyzer instead of IRVisitorWithAnalyzer, allowing for mutating analysis and better integration with analyzers. (src/transform/loop_vectorize.cc) [1] [2] [3] [4]

Propagation of Analyzer Context

  • Updated all call sites of VectorizeLoop and GetVectorizeSize to pass the appropriate analyzer argument, ensuring that vectorization decisions are made with full context. This includes changes in copy, fill, and parallel op lowering, as well as in layout inference and legalization passes. (src/op/copy.cc, src/op/fill.cc, src/op/parallel.cc, src/transform/layout_inference.cc, src/transform/legalize_vectorized_loop.cc) [1] [2] [3] [4] [5] [6] [7]

Layout Inference Enhancements

  • Modified BufferUseDefCollector in layout inference to maintain a vector of analyzers (analyzer_vec_), ensuring that each inference step uses the correct analyzer context. This includes proper cloning, size checks, and passing analyzers to InferLayout. (src/transform/layout_inference.cc) [1] [2] [3] [4] [5] [6]

Vectorization Analysis Logic

  • Improved the logic for determining if indices can be vectorized by simplifying expressions before divisibility checks, leading to more robust vectorization decisions. (src/transform/loop_vectorize.cc) [1] [2]

Dependency Updates

  • Updated the TVM submodule to a newer commit, pulling in upstream improvements and fixes. (3rdparty/tvm)

Summary by CodeRabbit

  • Refactor

    • Loop vectorization and planning are now analyzer-aware, improving vector-size decisions, nested-loop handling, and consistency of transformations across inference runs.
    • Layout inference now maintains per-inference analyzer state to preserve analysis context during inference and transformations.
    • Vectorization APIs extended with analyzer-aware variants to enable finer-grained optimization control.
  • Chores

    • Updated third-party submodule reference.

✏️ Tip: You can customize this high-level summary in your review settings.

- Modified `VectorizeLoop` and related functions to accept an `arith::Analyzer` parameter, enhancing their capability to perform analysis during vectorization.
- Updated multiple instances in `copy.cc`, `fill.cc`, `parallel.cc`, and layout inference files to utilize the new analyzer parameter for improved performance and correctness.
- Ensured consistency across vectorization logic by integrating the analyzer into existing workflows, facilitating better optimization opportunities.
@LeiWang1999 LeiWang1999 requested a review from Copilot November 21, 2025 13:10
@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Per-inference arith::Analyzer instances are added and threaded through layout inference and lowering; vectorization APIs and planner were refactored to accept analyzer pointers; multiple call sites and the TVM submodule pointer were updated. (≤50 words)

Changes

Cohort / File(s) Summary
Submodule update
3rdparty/tvm
Submodule pointer advanced from bc31e7ad9f9fafd7659dfabafe359fd55a0ffc1e to cd2b2b6013d155b5822300b0a0740fa65320dd9e.
Lowering call-site updates
src/op/copy.cc, src/op/fill.cc, src/transform/legalize_vectorized_loop.cc
Updated VectorizeLoop call sites to pass an additional arith::Analyzer* argument.
Parallel/vectorize size usage
src/op/parallel.cc
GetVectorizeSize call sites changed to pass T.analyzer (threaded analyzer into vector-size computation).
Layout inference analyzer management
src/transform/layout_inference.cc
Added per-infer analyzer_vec_ (vector of unique_ptr<arith::Analyzer>), clone/push during visitation, thread cur_analyzer into InferLayout/RunInferStep, and added runtime size validation.
Vectorization refactor & API
src/transform/loop_vectorize.cc, src/transform/loop_vectorize.h
Refactored planner/visitors to analyzer-aware mutator/visitor, added overloads GetVectorizeSize(const For&, arith::Analyzer*) and VectorizeLoop(const For&, arith::Analyzer*, int), adjusted visitor/mutator return types, and added analyzer ownership/constructors.

Sequence Diagram

sequenceDiagram
    autonumber
    participant LI as LayoutInference
    participant AV as analyzer_vec_
    participant RI as RunInferStep
    participant VV as VectorizePlanner / VectorizeLoop

    LI->>AV: clone current analyzer and push per-infer
    LI->>RI: RunInferStep(cur_infer_id)
    RI->>AV: retrieve analyzer_vec_[cur_infer_id]
    AV-->>RI: cur_analyzer
    RI->>VV: VectorizeLoop(for_node, cur_analyzer)
    VV->>VV: use analyzer for GetVectorizeSize & expr simplification
    VV-->>RI: return transformed loop (vectorized or original)

    note right of AV: assert analyzer_vec_.size() == infer_list_.size()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to:
    • src/transform/loop_vectorize.cc / .h (visitor → mutator refactor, new overloads, analyzer ownership)
    • src/transform/layout_inference.cc (analyzer lifecycle, cloning, indexing and assertion)
    • All updated call sites (src/op/copy.cc, src/op/fill.cc, src/op/parallel.cc, src/transform/legalize_vectorized_loop.cc) for consistent analyzer passing

Possibly related PRs

Suggested reviewers

  • chengyupku
  • xysmlx

Poem

🐰 I hopped through loops and cloned a mind,

Per-infer analyzers now I find.
Mutators hum where visitors played,
Vector lanes carved with care and braid.
Hop, simplify, and ship — carrot-lined! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references backing up/propagating analyzer for arithmetic information, which directly aligns with the main refactoring objective of making vectorization analysis analyzer-aware.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab44b74 and 6221326.

📒 Files selected for processing (1)
  • src/transform/loop_vectorize.cc (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
Repo: tile-ai/tilelang PR: 794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.

Applied to files:

  • src/transform/loop_vectorize.cc
🧬 Code graph analysis (1)
src/transform/loop_vectorize.cc (3)
src/transform/common/loop_fusion_utils.h (2)
  • VisitStmt_ (61-67)
  • VisitStmt_ (104-222)
src/transform/loop_partition.cc (8)
  • node (41-48)
  • node (41-41)
  • node (49-56)
  • node (49-49)
  • node (171-184)
  • node (171-171)
  • node (234-242)
  • node (234-234)
src/transform/atomicadd_vectorize.cc (2)
  • UpdateVectorSize (111-151)
  • UpdateVectorSize (111-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (3)
src/transform/loop_vectorize.cc (3)

100-116: Nested loop detection logic is now correct.

The refactored nested loop detection visits only node->body (line 100), which correctly excludes the current ForNode itself. The conditional GCD adjustment (lines 106-116) will now properly run for true innermost loops.


247-254: Analyzer ownership pattern is safe and well-designed.

The two overloads provide flexibility: one creates a local stack-allocated analyzer for convenience (lines 247-250), while the other accepts an external analyzer for context-aware analysis (lines 252-254). The stack allocation ensures the analyzer outlives the planner usage without leaks.


325-347: VectorizeLoop overloads follow consistent pattern.

Both overloads correctly handle analyzer ownership and provide appropriate defaults when vectorize_hint <= 0. The implementation is consistent with the GetVectorizeSize overload pattern.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors the loop vectorization infrastructure to make vectorization analysis more robust and analyzer-aware. The changes enable vectorization functions to accept an arith::Analyzer parameter, allowing for more accurate context-aware vectorization decisions based on arithmetic information available outside the vectorization pass.

  • Extends GetVectorizeSize and VectorizeLoop to accept an optional arith::Analyzer* parameter for context-aware analysis
  • Refactors VectorizePlanner to use IRMutatorWithAnalyzer instead of IRVisitorWithAnalyzer to support mutating analysis
  • Propagates analyzer context throughout the codebase to all vectorization call sites

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/transform/loop_vectorize.h Adds overloaded function declarations for GetVectorizeSize and VectorizeLoop that accept an arith::Analyzer* parameter
src/transform/loop_vectorize.cc Refactors VectorizePlanner to inherit from IRMutatorWithAnalyzer and implements analyzer-aware vectorization logic with improved expression simplification
src/transform/legalize_vectorized_loop.cc Updates VectorizeLoop call to pass the analyzer context
src/transform/layout_inference.cc Maintains a vector of analyzers and propagates appropriate analyzer to InferLayout and VectorizeLoop calls
src/op/parallel.cc Updates GetVectorizeSize call to pass the analyzer from layout inference arguments
src/op/fill.cc Updates all VectorizeLoop calls to pass the analyzer parameter
src/op/copy.cc Updates VectorizeLoop calls to pass the analyzer parameter
3rdparty/tvm Updates TVM submodule to newer commit

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

bool contains_nested_for = false;
// Must analysis vectorization on the innermost loop
PostOrderVisit(Downcast<Stmt>(node), [&](const ObjectRef &obj) {
if (obj.as<ForNode>()) {
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for detecting nested loops incorrectly includes the current ForNode itself. PostOrderVisit will visit node and set contains_nested_for = true even when there are no nested loops. This causes the vectorization analysis to be skipped for all loops. Consider checking if the visited ForNode is different from node before setting the flag.

Suggested change
if (obj.as<ForNode>()) {
const ForNode* for_node = obj.as<ForNode>();
if (for_node && for_node != node) {

Copilot uses AI. Check for mistakes.
if (!analyzer->CanProveEqual(
FloorMod(Substitute(expr, {{var, zero}}), target_size_for_expr), 0)) {
if (!analyzer->CanProveEqual(FloorMod(simplified_expr, target_size_for_expr),
zero)) {
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing FloorMod(simplified_expr, target_size_for_expr) against zero is incorrect. The result of FloorMod has the same dtype as simplified_expr (i.e., expr.dtype()), but zero has type var.dtype(). These types may differ, causing the comparison to fail. Use make_const(expr.dtype(), 0) instead of zero.

Suggested change
zero)) {
make_const(expr.dtype(), 0))) {

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/op/parallel.cc (1)

453-491: Free-mode vector-size planning now analyzer-aware

Switching to GetVectorizeSize(maybe_remapped_root_, T.analyzer) lets ParallelOpNode::InferLayout reuse the per-infer analyzer state, aligning vectorization planning with layout inference. Assuming LayoutInferArgs::analyzer is always populated (as done in BufferUseDefCollector::RunInferStep), this is a sound change. You might optionally add an ICHECK(T.analyzer) near the top of InferLayout for robustness.

src/transform/loop_vectorize.cc (1)

170-199: Analyzer-aware helpers and improved base-offset check in vectorization are solid

  • UpdateVectorSize now correctly passes analyzer_ into CanProveIndependent and IndiceCanVectorize, matching the new signatures.
  • In IndiceCanVectorize, simplifying Substitute(expr, {{var, zero}}) before checking FloorMod(..., target_size_for_expr) is a good robustness improvement: it tests divisibility of the base offset rather than the full symbolic expression, which is often unprovable.
  • The analyzer-aware overloads of GetVectorizeSize and VectorizeLoop are thin wrappers that cleanly reuse VectorizePlanner(analyzer) and the existing VectorizeRewriter, preserving behavior while allowing callers to thread in their own analyzers.

Once the planner’s analyzer ownership and innermost-loop detection are fixed, this analyzer plumbing should make vectorization decisions more stable and context-aware.

Also applies to: 272-321, 248-252, 334-344

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf90a5f and d9560ea.

📒 Files selected for processing (8)
  • 3rdparty/tvm (1 hunks)
  • src/op/copy.cc (2 hunks)
  • src/op/fill.cc (2 hunks)
  • src/op/parallel.cc (1 hunks)
  • src/transform/layout_inference.cc (8 hunks)
  • src/transform/legalize_vectorized_loop.cc (1 hunks)
  • src/transform/loop_vectorize.cc (8 hunks)
  • src/transform/loop_vectorize.h (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
src/transform/legalize_vectorized_loop.cc (2)
src/transform/loop_vectorize.cc (4)
  • VectorizeLoop (323-332)
  • VectorizeLoop (323-323)
  • VectorizeLoop (334-344)
  • VectorizeLoop (334-335)
src/transform/vectorize_loop.cc (2)
  • VectorizeLoop (867-879)
  • VectorizeLoop (867-867)
src/op/parallel.cc (1)
src/transform/loop_vectorize.cc (4)
  • GetVectorizeSize (248-248)
  • GetVectorizeSize (248-248)
  • GetVectorizeSize (250-252)
  • GetVectorizeSize (250-250)
src/op/fill.cc (1)
src/transform/loop_vectorize.cc (4)
  • VectorizeLoop (323-332)
  • VectorizeLoop (323-323)
  • VectorizeLoop (334-344)
  • VectorizeLoop (334-335)
src/op/copy.cc (1)
src/transform/loop_vectorize.cc (4)
  • VectorizeLoop (323-332)
  • VectorizeLoop (323-323)
  • VectorizeLoop (334-344)
  • VectorizeLoop (334-335)
src/transform/loop_vectorize.h (1)
src/transform/loop_vectorize.cc (8)
  • GetVectorizeSize (248-248)
  • GetVectorizeSize (248-248)
  • GetVectorizeSize (250-252)
  • GetVectorizeSize (250-250)
  • VectorizeLoop (323-332)
  • VectorizeLoop (323-323)
  • VectorizeLoop (334-344)
  • VectorizeLoop (334-335)
src/transform/layout_inference.cc (1)
src/transform/loop_vectorize.cc (4)
  • VectorizeLoop (323-332)
  • VectorizeLoop (323-323)
  • VectorizeLoop (334-344)
  • VectorizeLoop (334-335)
src/transform/loop_vectorize.cc (2)
src/transform/common/loop_fusion_utils.h (2)
  • VisitStmt_ (61-67)
  • VisitStmt_ (104-222)
src/transform/loop_partition.cc (8)
  • node (41-48)
  • node (41-41)
  • node (49-56)
  • node (49-49)
  • node (171-184)
  • node (171-171)
  • node (234-242)
  • node (234-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (7)
src/transform/legalize_vectorized_loop.cc (1)

68-77: Analyzer-aware vectorization hookup looks correct

Passing analyzer_ into VectorizeLoop keeps this legalizer aligned with the new analyzer-aware overload and reuses the same analysis context used during mutation. No additional issues here.

src/op/fill.cc (1)

203-237: Consistent use of analyzer-aware VectorizeLoop

All three updated call sites (local.fragment, local, shared/global) correctly pass the existing arith::Analyzer* into VectorizeLoop, reusing the same context already used for PartitionLoop and layout inference. This is aligned with the new API.

src/transform/loop_vectorize.h (1)

36-44: Analyzer-aware vectorization API is well-shaped

The added overloads for GetVectorizeSize and VectorizeLoop cleanly extend the API to take an arith::Analyzer* while keeping existing signatures intact, which should preserve compatibility for older call sites.

src/op/copy.cc (1)

842-869: Vectorization now correctly reuses the lowering analyzer

Both CPU and GPU branches now pass the existing arith::Analyzer* into VectorizeLoop, so vector-size planning can leverage the same simplification and bounds information used for MakeSIMTLoop, fusion, layout inference, and PartitionLoop. This looks consistent and correct.

src/transform/layout_inference.cc (2)

15-17: Per-infer analyzer propagation in layout inference is consistent

The new analyzer_vec_ machinery (clone on each operator/parallel-for, size checks in Run(), and retrieval in RunInferStep) cleanly associates a dedicated arith::Analyzer instance with each infer_list_ entry. Using cur_analyzer in LayoutInferArgs ensures operators get a stable, context-appropriate analyzer during InferLayout. The use of std::unique_ptr plus the explicit size invariants makes indexing safe. This design looks sound.

Also applies to: 89-116, 262-276, 434-461, 531-553, 682-695


1034-1036: Layout-time vectorization now uses the pass analyzer

Changing VectorizeLoop(for_node) to VectorizeLoop(for_node, analyzer_) in LayoutInferencer::VisitStmt_(const ForNode*) ensures that vectorization planning during layout inference reuses the same arithmetic analyzer as the mutator itself, which is aligned with the rest of the analyzer-aware API.

src/transform/loop_vectorize.cc (1)

48-71: Visitor→mutator refactor for access detection and planning is reasonable

Switching VectorizeFindGlobalAccess to StmtExprVisitor and adapting the various VisitStmt_/VisitExpr_ overrides in VectorizePlanner to call IRMutatorWithAnalyzer::Visit* is consistent with the mutator-based planner design. The behavior of global/shared access detection and the special handling of extern calls and casts is preserved while simplifying dependencies on analyzer-aware visitors.

Also applies to: 121-158, 164-168

@@ -1 +1 @@
Subproject commit bc31e7ad9f9fafd7659dfabafe359fd55a0ffc1e
Subproject commit cd2b2b6013d155b5822300b0a0740fa65320dd9e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify TVM submodule commit validity, stability, and compatibility.

This change updates the TVM submodule pointer to a new commit. Since this PR makes vectorization API changes (adding arith::Analyzer* parameters), we need to ensure:

  1. The commit cd2b2b6013d155b5822300b0a0740fa65320dd9e is valid, reachable, and stable (not a WIP branch).
  2. The TVM version at this commit is compatible with all the vectorization-related changes in this PR.
  3. There are no breaking API changes in TVM that would affect the modified call sites across src/op/copy.cc, src/op/fill.cc, src/op/parallel.cc, and src/transform/*.cc.
  4. The specific commit choice is documented (e.g., in the PR description or a commit message).

Consider whether pinning to a stable TVM release tag (instead of a commit SHA) would improve maintainability and reproducibility.

Run the following script to verify the TVM submodule commit:


🏁 Script executed:

#!/bin/bash
# Description: Verify TVM submodule commit validity and fetch commit details.

# Navigate to TVM submodule and fetch commit information
cd 3rdparty/tvm 2>/dev/null || { echo "TVM submodule not found"; exit 1; }

# Check if commit exists and is reachable
git log --oneline -1 cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null && echo "✓ Commit found" || echo "✗ Commit not found"

# Get commit message and author
echo -e "\nCommit details:"
git log -1 --format="%H%n%an%n%ai%n%s" cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null || echo "Unable to retrieve commit details"

# Check if commit is on main or master branch
echo -e "\nBranch status:"
git branch -r --contains cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null | head -5 || echo "Unable to determine branch"

# Get the diff summary between old and new commits
echo -e "\nChanges between commits (if both are reachable):"
git diff --stat bc31e7ad9f9fafd7659dfabafe359fd55a0ffc1e cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null | head -20 || echo "Unable to compute diff"

Length of output: 135


I need to verify whether the TVM commit exists in the public repository. The local submodule clone may not have fetched it. Let me search for information about this commit.

Let me verify the submodule reference itself and check for additional context:

Critical: TVM submodule points to non-existent commit

The TVM submodule pointer cd2b2b6013d155b5822300b0a0740fa65320dd9e cannot be found in the TVM repository. Verification shows:

  • Local submodule clone reports commit not found
  • Public TVM commit history (recent main branch) contains no such commit
  • The commit is not reachable from any TVM branch

This invalid reference will break reproducibility and cause build failures. Before merging, either:

  1. Verify the correct TVM commit hash and update the reference
  2. Confirm whether a stable TVM release tag should be used instead
  3. Document the rationale for this specific commit choice if intentional

Without a valid, publicly-accessible commit, this PR cannot be built or reproduced.

🤖 Prompt for AI Agents
In 3rdparty/tvm around lines 1 to 1, the submodule is pointing to a non-existent
commit (cd2b2b6013d155b5822300b0a0740fa65320dd9e) which will break builds;
verify the correct TVM commit or tag by checking the upstream TVM repo (git
fetch --all and git ls-remote or search for the intended commit/tag), then
update the submodule pointer to a publicly-accessible commit or stable release
(checkout the correct commit/tag in the submodule, git add it, and commit the
updated submodule SHA and any .gitmodules changes), or if the unreachable commit
is intentional, replace it with an explicit stable tag and add a short note in
the PR or repository docs explaining why that exact ref is required.

- Updated the PostOrderVisit function to analyze the body of the loop node instead of the node itself, ensuring proper handling of nested loops during vectorization analysis.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/transform/loop_vectorize.cc (3)

73-78: Heap-allocated analyzer leaks memory.

The default constructor allocates an arith::Analyzer on the heap with new, but IRMutatorWithAnalyzer does not take ownership of this pointer, resulting in a memory leak. Consider making the planner own a stack-allocated arith::Analyzer member and passing its address to the base class in the default constructor.

Based on learnings


97-119: Nested loop detection incorrectly includes the current node.

The PostOrderVisit traversal includes node itself, so contains_nested_for is always set to true even for innermost loops. This prevents the extent-based ZeroAwareGCD adjustment from ever running, which can lead to ICHECK failures when extent % vector_size_ != 0.

Apply this diff to exclude the current node:

-    PostOrderVisit(Downcast<Stmt>(node->body), [&](const ObjectRef &obj) {
-      if (obj.as<ForNode>()) {
+    PostOrderVisit(Downcast<Stmt>(node->body), [&](const ObjectRef &obj) {
+      if (auto *inner = obj.as<ForNode>()) {
+        if (inner != node) {
-        contains_nested_for = true;
+          contains_nested_for = true;
+        }
       }
     });

291-294: Type mismatch in divisibility check.

The constant zero has type var.dtype() (line 286), but FloorMod(simplified_expr, target_size_for_expr) has type expr.dtype(). This type mismatch can cause the comparison to fail when the types differ.

Apply this diff:

-                               zero)) {
+                               make_const(expr.dtype(), 0))) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9560ea and 7484f05.

📒 Files selected for processing (1)
  • src/transform/loop_vectorize.cc (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/transform/loop_vectorize.cc (3)
src/op/parallel.cc (4)
  • VisitStmt_ (131-147)
  • VisitStmt_ (131-131)
  • VisitStmt_ (149-161)
  • VisitStmt_ (149-149)
src/transform/common/loop_fusion_utils.h (2)
  • VisitStmt_ (61-67)
  • VisitStmt_ (104-222)
src/transform/loop_partition.cc (8)
  • node (41-48)
  • node (41-41)
  • node (49-56)
  • node (49-49)
  • node (171-184)
  • node (171-171)
  • node (234-242)
  • node (234-234)
🔇 Additional comments (7)
src/transform/loop_vectorize.cc (7)

121-135: LGTM!

The refactor to return PrimExpr and use IRMutatorWithAnalyzer is correct, and the method properly delegates to the base class.


137-143: LGTM!

The refactor to return Stmt and use IRMutatorWithAnalyzer is correct.


145-148: LGTM!

The refactor is correct.


150-158: LGTM!

The refactor is correct.


164-168: LGTM!

The refactor is correct.


250-252: LGTM!

The new analyzer-aware overload correctly propagates the analyzer to the planner.


334-344: LGTM!

The new analyzer-aware overload correctly propagates the analyzer to the planner and maintains the same vectorization logic as the original function.

@LeiWang1999
Copy link
Member Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@LeiWang1999 LeiWang1999 merged commit 9f7bac4 into tile-ai:main Nov 23, 2025
6 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…ile-ai#1311)

* [Refactor] Update Vectorization Functions to Accept Analyzer Parameter

- Modified `VectorizeLoop` and related functions to accept an `arith::Analyzer` parameter, enhancing their capability to perform analysis during vectorization.
- Updated multiple instances in `copy.cc`, `fill.cc`, `parallel.cc`, and layout inference files to utilize the new analyzer parameter for improved performance and correctness.
- Ensured consistency across vectorization logic by integrating the analyzer into existing workflows, facilitating better optimization opportunities.

* [Fix] Corrected PostOrderVisit call in loop_vectorize.cc

- Updated the PostOrderVisit function to analyze the body of the loop node instead of the node itself, ensuring proper handling of nested loops during vectorization analysis.

* fix

* lint fix

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant