Skip to content

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Dec 29, 2025

This pull request updates the loop vectorization logic in src/transform/vectorize_loop.cc to improve how scalarization is handled during vectorization. The main change is to defer scalarization to the top-level Vectorize function, ensuring that the entire original loop body is scalarized only when necessary, and preventing redundant or partial scalarization during traversal.

Improvements to vectorization and scalarization logic:

  • The Vectorize function now checks if scalarization is needed after attempting vectorization, and if so, applies scalarization to the entire original loop body instead of partially scalarizing during traversal.
  • The VisitStmt method is updated to return the original statement unchanged if scalarization is needed, delegating the scalarization responsibility to the top-level function and avoiding duplicate scalarization.

Summary by CodeRabbit

  • Refactor
    • Optimized vectorization logic to improve efficiency in handling loop transformations. Updated control flow handling to defer certain operations to higher-level execution paths, reducing redundant per-statement processing.

✏️ Tip: You can customize this high-level summary in your review settings.

* Enhanced the Vectorize function to retain the original body for scalarization if needed.
* Updated VisitStmt and VisitStmt_ methods to return the original statement when scalarization is required, ensuring proper handling of statements during vectorization.
* Added checks after visiting expressions and statements to maintain the integrity of the original structure when scalarization is triggered.

This refactor aims to streamline the vectorization process and improve the handling of scalarization scenarios.
@LeiWang1999
Copy link
Member Author

@regression-perf

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 29, 2025

📝 Walkthrough

Walkthrough

The change modifies vectorization logic in src/transform/vectorize_loop.cc to defer scalarization to the top-level path. When scalarization is needed, the entire original body is scalarized rather than the vectorized result. Per-statement mutation is bypassed when scalarization is flagged.

Changes

Cohort / File(s) Summary
Vectorization and scalarization control flow
src/transform/vectorize_loop.cc
Modified Vectorize entry to preserve original body and defer scalarization; statement-visitor mutator now returns original statement unchanged when scalarization is required, bypassing per-statement mutation

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • kurisu6912

Poem

🐰 A loop once split in scattered ways,
Now scalarizes through deferred days,
The vector and scalar dance as one,
No mutation per-statement done!
The control flow flows so clean,
Best vectorized flow I've seen! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[Refactor] Improve scalarization handling in Pass VectorizeLoop' accurately reflects the main change: improving how scalarization is handled in the loop vectorization logic by deferring it to the top-level function.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/transform/vectorize_loop.cc (2)

217-224: Well-structured deferred scalarization approach.

The change correctly preserves the original body and conditionally scalarizes it only when needed, avoiding partial or redundant scalarization during traversal. This is cleaner than the per-statement scalarization pattern.

Optional: Consider adding a brief comment

A comment explaining the deferred scalarization strategy could help future maintainers understand why we preserve original_body and scalarize it instead of the partially vectorized result:

  static Stmt Vectorize(const Var &var, const PrimExpr &var_lanes, Stmt body) {
    TLVectorizer vec{var, var_lanes};
+   // Preserve original body to enable deferred scalarization:
+   // if vectorization cannot proceed, we scalarize the entire original body
+   // rather than partially vectorized fragments.
    Stmt original_body = body;
    auto vec_stmt = vec(std::move(body));
    // If scalarization is needed, scalarize the entire original body
    if (vec.need_scalarize_) {
      return vec.Scalarize(original_body);
    }
    return vec_stmt;
  }

231-238: Correct implementation of deferred scalarization short-circuit.

This diverges from the pattern in loop_vectorization_utils.h (which uses ICHECK(!need_scalarize_) before visiting, then scalarizes per-statement and resets the flag). The new approach intentionally allows need_scalarize_ to persist and short-circuits further statement visitation, deferring all scalarization to the top-level Vectorize function. This architectural change aligns with the PR objectives.

Optional: Consider clarifying the architectural difference

Since this differs from the utils pattern where scalarization is done per-statement with flag resets, a comment could help explain the persistent-flag-with-deferred-scalarization approach:

  Stmt VisitStmt(const Stmt &stmt) final {
-   // If scalarization is already needed, return original stmt unchanged
-   // to let the top-level Vectorize handle it
+   // Deferred scalarization: if need_scalarize_ is already set (e.g., by a
+   // prior expression or statement), return the original stmt unchanged to
+   // avoid further mutation. The top-level Vectorize will scalarize the
+   // entire original body as a whole.
    if (need_scalarize_) {
      return stmt;
    }
    return StmtMutator::VisitStmt(stmt);
  }

Based on learnings, in TVM's mutation patterns, returning the original node when the parent container doesn't need transformation is a common idiom. This pattern extends that by returning original statements once scalarization is flagged, delegating the transformation to a higher level.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 124583b and cc1dde6.

📒 Files selected for processing (1)
  • src/transform/vectorize_loop.cc
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
Repo: tile-ai/tilelang PR: 794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.

Applied to files:

  • src/transform/vectorize_loop.cc
🧬 Code graph analysis (1)
src/transform/vectorize_loop.cc (1)
src/transform/common/loop_vectorization_utils.h (1)
  • VisitStmt (185-194)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)

@LeiWang1999
Copy link
Member Author

LTCP, and regression has been verified locally, merged.

@LeiWang1999 LeiWang1999 merged commit cca8b6f into tile-ai:main Dec 29, 2025
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant