-
Notifications
You must be signed in to change notification settings - Fork 331
[Refactor] Backup Analyzer to get the appropriate arith informations #1311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Modified `VectorizeLoop` and related functions to accept an `arith::Analyzer` parameter, enhancing their capability to perform analysis during vectorization. - Updated multiple instances in `copy.cc`, `fill.cc`, `parallel.cc`, and layout inference files to utilize the new analyzer parameter for improved performance and correctness. - Ensured consistency across vectorization logic by integrating the analyzer into existing workflows, facilitating better optimization opportunities.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughPer-inference arith::Analyzer instances are added and threaded through layout inference and lowering; vectorization APIs and planner were refactored to accept analyzer pointers; multiple call sites and the TVM submodule pointer were updated. (≤50 words) Changes
Sequence DiagramsequenceDiagram
autonumber
participant LI as LayoutInference
participant AV as analyzer_vec_
participant RI as RunInferStep
participant VV as VectorizePlanner / VectorizeLoop
LI->>AV: clone current analyzer and push per-infer
LI->>RI: RunInferStep(cur_infer_id)
RI->>AV: retrieve analyzer_vec_[cur_infer_id]
AV-->>RI: cur_analyzer
RI->>VV: VectorizeLoop(for_node, cur_analyzer)
VV->>VV: use analyzer for GetVectorizeSize & expr simplification
VV-->>RI: return transformed loop (vectorized or original)
note right of AV: assert analyzer_vec_.size() == infer_list_.size()
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-09-12T09:47:46.474ZApplied to files:
🧬 Code graph analysis (1)src/transform/loop_vectorize.cc (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refactors the loop vectorization infrastructure to make vectorization analysis more robust and analyzer-aware. The changes enable vectorization functions to accept an arith::Analyzer parameter, allowing for more accurate context-aware vectorization decisions based on arithmetic information available outside the vectorization pass.
- Extends
GetVectorizeSizeandVectorizeLoopto accept an optionalarith::Analyzer*parameter for context-aware analysis - Refactors
VectorizePlannerto useIRMutatorWithAnalyzerinstead ofIRVisitorWithAnalyzerto support mutating analysis - Propagates analyzer context throughout the codebase to all vectorization call sites
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/transform/loop_vectorize.h | Adds overloaded function declarations for GetVectorizeSize and VectorizeLoop that accept an arith::Analyzer* parameter |
| src/transform/loop_vectorize.cc | Refactors VectorizePlanner to inherit from IRMutatorWithAnalyzer and implements analyzer-aware vectorization logic with improved expression simplification |
| src/transform/legalize_vectorized_loop.cc | Updates VectorizeLoop call to pass the analyzer context |
| src/transform/layout_inference.cc | Maintains a vector of analyzers and propagates appropriate analyzer to InferLayout and VectorizeLoop calls |
| src/op/parallel.cc | Updates GetVectorizeSize call to pass the analyzer from layout inference arguments |
| src/op/fill.cc | Updates all VectorizeLoop calls to pass the analyzer parameter |
| src/op/copy.cc | Updates VectorizeLoop calls to pass the analyzer parameter |
| 3rdparty/tvm | Updates TVM submodule to newer commit |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bool contains_nested_for = false; | ||
| // Must analysis vectorization on the innermost loop | ||
| PostOrderVisit(Downcast<Stmt>(node), [&](const ObjectRef &obj) { | ||
| if (obj.as<ForNode>()) { |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for detecting nested loops incorrectly includes the current ForNode itself. PostOrderVisit will visit node and set contains_nested_for = true even when there are no nested loops. This causes the vectorization analysis to be skipped for all loops. Consider checking if the visited ForNode is different from node before setting the flag.
| if (obj.as<ForNode>()) { | |
| const ForNode* for_node = obj.as<ForNode>(); | |
| if (for_node && for_node != node) { |
| if (!analyzer->CanProveEqual( | ||
| FloorMod(Substitute(expr, {{var, zero}}), target_size_for_expr), 0)) { | ||
| if (!analyzer->CanProveEqual(FloorMod(simplified_expr, target_size_for_expr), | ||
| zero)) { |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing FloorMod(simplified_expr, target_size_for_expr) against zero is incorrect. The result of FloorMod has the same dtype as simplified_expr (i.e., expr.dtype()), but zero has type var.dtype(). These types may differ, causing the comparison to fail. Use make_const(expr.dtype(), 0) instead of zero.
| zero)) { | |
| make_const(expr.dtype(), 0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/op/parallel.cc (1)
453-491: Free-mode vector-size planning now analyzer-awareSwitching to
GetVectorizeSize(maybe_remapped_root_, T.analyzer)letsParallelOpNode::InferLayoutreuse the per-infer analyzer state, aligning vectorization planning with layout inference. AssumingLayoutInferArgs::analyzeris always populated (as done inBufferUseDefCollector::RunInferStep), this is a sound change. You might optionally add anICHECK(T.analyzer)near the top ofInferLayoutfor robustness.src/transform/loop_vectorize.cc (1)
170-199: Analyzer-aware helpers and improved base-offset check in vectorization are solid
UpdateVectorSizenow correctly passesanalyzer_intoCanProveIndependentandIndiceCanVectorize, matching the new signatures.- In
IndiceCanVectorize, simplifyingSubstitute(expr, {{var, zero}})before checkingFloorMod(..., target_size_for_expr)is a good robustness improvement: it tests divisibility of the base offset rather than the full symbolic expression, which is often unprovable.- The analyzer-aware overloads of
GetVectorizeSizeandVectorizeLoopare thin wrappers that cleanly reuseVectorizePlanner(analyzer)and the existingVectorizeRewriter, preserving behavior while allowing callers to thread in their own analyzers.Once the planner’s analyzer ownership and innermost-loop detection are fixed, this analyzer plumbing should make vectorization decisions more stable and context-aware.
Also applies to: 272-321, 248-252, 334-344
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
3rdparty/tvm(1 hunks)src/op/copy.cc(2 hunks)src/op/fill.cc(2 hunks)src/op/parallel.cc(1 hunks)src/transform/layout_inference.cc(8 hunks)src/transform/legalize_vectorized_loop.cc(1 hunks)src/transform/loop_vectorize.cc(8 hunks)src/transform/loop_vectorize.h(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
src/transform/legalize_vectorized_loop.cc (2)
src/transform/loop_vectorize.cc (4)
VectorizeLoop(323-332)VectorizeLoop(323-323)VectorizeLoop(334-344)VectorizeLoop(334-335)src/transform/vectorize_loop.cc (2)
VectorizeLoop(867-879)VectorizeLoop(867-867)
src/op/parallel.cc (1)
src/transform/loop_vectorize.cc (4)
GetVectorizeSize(248-248)GetVectorizeSize(248-248)GetVectorizeSize(250-252)GetVectorizeSize(250-250)
src/op/fill.cc (1)
src/transform/loop_vectorize.cc (4)
VectorizeLoop(323-332)VectorizeLoop(323-323)VectorizeLoop(334-344)VectorizeLoop(334-335)
src/op/copy.cc (1)
src/transform/loop_vectorize.cc (4)
VectorizeLoop(323-332)VectorizeLoop(323-323)VectorizeLoop(334-344)VectorizeLoop(334-335)
src/transform/loop_vectorize.h (1)
src/transform/loop_vectorize.cc (8)
GetVectorizeSize(248-248)GetVectorizeSize(248-248)GetVectorizeSize(250-252)GetVectorizeSize(250-250)VectorizeLoop(323-332)VectorizeLoop(323-323)VectorizeLoop(334-344)VectorizeLoop(334-335)
src/transform/layout_inference.cc (1)
src/transform/loop_vectorize.cc (4)
VectorizeLoop(323-332)VectorizeLoop(323-323)VectorizeLoop(334-344)VectorizeLoop(334-335)
src/transform/loop_vectorize.cc (2)
src/transform/common/loop_fusion_utils.h (2)
VisitStmt_(61-67)VisitStmt_(104-222)src/transform/loop_partition.cc (8)
node(41-48)node(41-41)node(49-56)node(49-49)node(171-184)node(171-171)node(234-242)node(234-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (7)
src/transform/legalize_vectorized_loop.cc (1)
68-77: Analyzer-aware vectorization hookup looks correctPassing
analyzer_intoVectorizeLoopkeeps this legalizer aligned with the new analyzer-aware overload and reuses the same analysis context used during mutation. No additional issues here.src/op/fill.cc (1)
203-237: Consistent use of analyzer-awareVectorizeLoopAll three updated call sites (
local.fragment,local,shared/global) correctly pass the existingarith::Analyzer*intoVectorizeLoop, reusing the same context already used forPartitionLoopand layout inference. This is aligned with the new API.src/transform/loop_vectorize.h (1)
36-44: Analyzer-aware vectorization API is well-shapedThe added overloads for
GetVectorizeSizeandVectorizeLoopcleanly extend the API to take anarith::Analyzer*while keeping existing signatures intact, which should preserve compatibility for older call sites.src/op/copy.cc (1)
842-869: Vectorization now correctly reuses the lowering analyzerBoth CPU and GPU branches now pass the existing
arith::Analyzer*intoVectorizeLoop, so vector-size planning can leverage the same simplification and bounds information used forMakeSIMTLoop, fusion, layout inference, andPartitionLoop. This looks consistent and correct.src/transform/layout_inference.cc (2)
15-17: Per-infer analyzer propagation in layout inference is consistentThe new
analyzer_vec_machinery (clone on each operator/parallel-for, size checks inRun(), and retrieval inRunInferStep) cleanly associates a dedicatedarith::Analyzerinstance with eachinfer_list_entry. Usingcur_analyzerinLayoutInferArgsensures operators get a stable, context-appropriate analyzer duringInferLayout. The use ofstd::unique_ptrplus the explicit size invariants makes indexing safe. This design looks sound.Also applies to: 89-116, 262-276, 434-461, 531-553, 682-695
1034-1036: Layout-time vectorization now uses the pass analyzerChanging
VectorizeLoop(for_node)toVectorizeLoop(for_node, analyzer_)inLayoutInferencer::VisitStmt_(const ForNode*)ensures that vectorization planning during layout inference reuses the same arithmetic analyzer as the mutator itself, which is aligned with the rest of the analyzer-aware API.src/transform/loop_vectorize.cc (1)
48-71: Visitor→mutator refactor for access detection and planning is reasonableSwitching
VectorizeFindGlobalAccesstoStmtExprVisitorand adapting the variousVisitStmt_/VisitExpr_overrides inVectorizePlannerto callIRMutatorWithAnalyzer::Visit*is consistent with the mutator-based planner design. The behavior of global/shared access detection and the special handling of extern calls and casts is preserved while simplifying dependencies on analyzer-aware visitors.Also applies to: 121-158, 164-168
| @@ -1 +1 @@ | |||
| Subproject commit bc31e7ad9f9fafd7659dfabafe359fd55a0ffc1e | |||
| Subproject commit cd2b2b6013d155b5822300b0a0740fa65320dd9e | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify TVM submodule commit validity, stability, and compatibility.
This change updates the TVM submodule pointer to a new commit. Since this PR makes vectorization API changes (adding arith::Analyzer* parameters), we need to ensure:
- The commit
cd2b2b6013d155b5822300b0a0740fa65320dd9eis valid, reachable, and stable (not a WIP branch). - The TVM version at this commit is compatible with all the vectorization-related changes in this PR.
- There are no breaking API changes in TVM that would affect the modified call sites across
src/op/copy.cc,src/op/fill.cc,src/op/parallel.cc, andsrc/transform/*.cc. - The specific commit choice is documented (e.g., in the PR description or a commit message).
Consider whether pinning to a stable TVM release tag (instead of a commit SHA) would improve maintainability and reproducibility.
Run the following script to verify the TVM submodule commit:
🏁 Script executed:
#!/bin/bash
# Description: Verify TVM submodule commit validity and fetch commit details.
# Navigate to TVM submodule and fetch commit information
cd 3rdparty/tvm 2>/dev/null || { echo "TVM submodule not found"; exit 1; }
# Check if commit exists and is reachable
git log --oneline -1 cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null && echo "✓ Commit found" || echo "✗ Commit not found"
# Get commit message and author
echo -e "\nCommit details:"
git log -1 --format="%H%n%an%n%ai%n%s" cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null || echo "Unable to retrieve commit details"
# Check if commit is on main or master branch
echo -e "\nBranch status:"
git branch -r --contains cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null | head -5 || echo "Unable to determine branch"
# Get the diff summary between old and new commits
echo -e "\nChanges between commits (if both are reachable):"
git diff --stat bc31e7ad9f9fafd7659dfabafe359fd55a0ffc1e cd2b2b6013d155b5822300b0a0740fa65320dd9e 2>/dev/null | head -20 || echo "Unable to compute diff"Length of output: 135
I need to verify whether the TVM commit exists in the public repository. The local submodule clone may not have fetched it. Let me search for information about this commit.
Let me verify the submodule reference itself and check for additional context:
Critical: TVM submodule points to non-existent commit
The TVM submodule pointer cd2b2b6013d155b5822300b0a0740fa65320dd9e cannot be found in the TVM repository. Verification shows:
- Local submodule clone reports commit not found
- Public TVM commit history (recent main branch) contains no such commit
- The commit is not reachable from any TVM branch
This invalid reference will break reproducibility and cause build failures. Before merging, either:
- Verify the correct TVM commit hash and update the reference
- Confirm whether a stable TVM release tag should be used instead
- Document the rationale for this specific commit choice if intentional
Without a valid, publicly-accessible commit, this PR cannot be built or reproduced.
🤖 Prompt for AI Agents
In 3rdparty/tvm around lines 1 to 1, the submodule is pointing to a non-existent
commit (cd2b2b6013d155b5822300b0a0740fa65320dd9e) which will break builds;
verify the correct TVM commit or tag by checking the upstream TVM repo (git
fetch --all and git ls-remote or search for the intended commit/tag), then
update the submodule pointer to a publicly-accessible commit or stable release
(checkout the correct commit/tag in the submodule, git add it, and commit the
updated submodule SHA and any .gitmodules changes), or if the unreachable commit
is intentional, replace it with an explicit stable tag and add a short note in
the PR or repository docs explaining why that exact ref is required.
- Updated the PostOrderVisit function to analyze the body of the loop node instead of the node itself, ensuring proper handling of nested loops during vectorization analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/transform/loop_vectorize.cc (3)
73-78: Heap-allocated analyzer leaks memory.The default constructor allocates an
arith::Analyzeron the heap withnew, butIRMutatorWithAnalyzerdoes not take ownership of this pointer, resulting in a memory leak. Consider making the planner own a stack-allocatedarith::Analyzermember and passing its address to the base class in the default constructor.Based on learnings
97-119: Nested loop detection incorrectly includes the current node.The
PostOrderVisittraversal includesnodeitself, socontains_nested_foris always set totrueeven for innermost loops. This prevents the extent-basedZeroAwareGCDadjustment from ever running, which can lead to ICHECK failures whenextent % vector_size_ != 0.Apply this diff to exclude the current node:
- PostOrderVisit(Downcast<Stmt>(node->body), [&](const ObjectRef &obj) { - if (obj.as<ForNode>()) { + PostOrderVisit(Downcast<Stmt>(node->body), [&](const ObjectRef &obj) { + if (auto *inner = obj.as<ForNode>()) { + if (inner != node) { - contains_nested_for = true; + contains_nested_for = true; + } } });
291-294: Type mismatch in divisibility check.The constant
zerohas typevar.dtype()(line 286), butFloorMod(simplified_expr, target_size_for_expr)has typeexpr.dtype(). This type mismatch can cause the comparison to fail when the types differ.Apply this diff:
- zero)) { + make_const(expr.dtype(), 0))) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/transform/loop_vectorize.cc(8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/transform/loop_vectorize.cc (3)
src/op/parallel.cc (4)
VisitStmt_(131-147)VisitStmt_(131-131)VisitStmt_(149-161)VisitStmt_(149-149)src/transform/common/loop_fusion_utils.h (2)
VisitStmt_(61-67)VisitStmt_(104-222)src/transform/loop_partition.cc (8)
node(41-48)node(41-41)node(49-56)node(49-49)node(171-184)node(171-171)node(234-242)node(234-234)
🔇 Additional comments (7)
src/transform/loop_vectorize.cc (7)
121-135: LGTM!The refactor to return
PrimExprand useIRMutatorWithAnalyzeris correct, and the method properly delegates to the base class.
137-143: LGTM!The refactor to return
Stmtand useIRMutatorWithAnalyzeris correct.
145-148: LGTM!The refactor is correct.
150-158: LGTM!The refactor is correct.
164-168: LGTM!The refactor is correct.
250-252: LGTM!The new analyzer-aware overload correctly propagates the analyzer to the planner.
334-344: LGTM!The new analyzer-aware overload correctly propagates the analyzer to the planner and maintains the same vectorization logic as the original function.
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ile-ai#1311) * [Refactor] Update Vectorization Functions to Accept Analyzer Parameter - Modified `VectorizeLoop` and related functions to accept an `arith::Analyzer` parameter, enhancing their capability to perform analysis during vectorization. - Updated multiple instances in `copy.cc`, `fill.cc`, `parallel.cc`, and layout inference files to utilize the new analyzer parameter for improved performance and correctness. - Ensured consistency across vectorization logic by integrating the analyzer into existing workflows, facilitating better optimization opportunities. * [Fix] Corrected PostOrderVisit call in loop_vectorize.cc - Updated the PostOrderVisit function to analyze the body of the loop node instead of the node itself, ensuring proper handling of nested loops during vectorization analysis. * fix * lint fix * fix
This pull request introduces several improvements and refactorings to the loop vectorization infrastructure, with a focus on making vectorization analysis more robust and analyzer-aware. The most significant changes are the extension of vectorization-related functions to accept an
arith::Analyzer, updates to the vectorization planning logic, and corresponding changes throughout the codebase to utilize the enhanced APIs.Key changes include:
Vectorization API and Infrastructure Improvements
GetVectorizeSizeandVectorizeLoopto accept anarith::Analyzer*, enabling more accurate and context-aware vectorization analysis. (src/transform/loop_vectorize.h,src/transform/loop_vectorize.cc) [1] [2] [3]VectorizePlannerand related classes to useIRMutatorWithAnalyzerinstead ofIRVisitorWithAnalyzer, allowing for mutating analysis and better integration with analyzers. (src/transform/loop_vectorize.cc) [1] [2] [3] [4]Propagation of Analyzer Context
VectorizeLoopandGetVectorizeSizeto pass the appropriateanalyzerargument, ensuring that vectorization decisions are made with full context. This includes changes in copy, fill, and parallel op lowering, as well as in layout inference and legalization passes. (src/op/copy.cc,src/op/fill.cc,src/op/parallel.cc,src/transform/layout_inference.cc,src/transform/legalize_vectorized_loop.cc) [1] [2] [3] [4] [5] [6] [7]Layout Inference Enhancements
BufferUseDefCollectorin layout inference to maintain a vector of analyzers (analyzer_vec_), ensuring that each inference step uses the correct analyzer context. This includes proper cloning, size checks, and passing analyzers toInferLayout. (src/transform/layout_inference.cc) [1] [2] [3] [4] [5] [6]Vectorization Analysis Logic
src/transform/loop_vectorize.cc) [1] [2]Dependency Updates
3rdparty/tvm)Summary by CodeRabbit
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.