@LeiWang1999 LeiWang1999 commented Feb 9, 2026

TileLang uses ForKind::kParallel as a frontend "SIMT loop" marker in a few places.
When vectorization resolves to size=1 (a no-op) or when VectorizeRewriter splits
loops, leaving loops as kParallel can block downstream serial-only transforms
(e.g. pragma-unroll rewriting).
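
To make the blocking effect concrete, here is a minimal sketch on a toy loop record. It does not use the TVM/TileLang classes: ToyKind, ToyFor, and AnnotateUnroll are hypothetical names, and the rule that the unroll pass only matches serial loops is an assumption taken from the description above, not from the actual pass source.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; these are NOT the TVM/TileLang types.
enum class ToyKind { kSerial, kParallel };

struct ToyFor {
  ToyKind kind;        // loop kind marker
  int extent;          // trip count
  bool pragma_unroll;  // set when the toy pass annotates the loop
};

// Hypothetical serial-only transform: it annotates serial loops and
// returns every other loop kind unchanged.
ToyFor AnnotateUnroll(ToyFor loop) {
  if (loop.kind == ToyKind::kSerial) {
    loop.pragma_unroll = true;
  }
  return loop;
}
```

Under these assumptions, a loop that is still marked kParallel passes through AnnotateUnroll without being annotated, while the same loop marked kSerial is picked up.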

This change:

  • Rewrites kParallel -> kSerial when VectorizeLoop resolves to vector size 1.
  • Ensures VectorizeRewriter output also downgrades kParallel loops to kSerial
    (including the generated outer loop after splitting).
  • Renames LoopPragmaUnroll to PragmaUnrollLoop and applies unrolling
    consistently after partition/vectorize in loop lowering and ops.
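
As a rough illustration of the first bullet, the downgrade can be sketched on a toy loop nest. This is not the ParallelToSerial rewriter from this PR and does not use TVM's StmtExprMutator: ToyKind, ToyFor, ParallelToSerialToy, and VectorizeToy are hypothetical names, and the real splitting/vectorizing path is deliberately left out.

```cpp
#include <cassert>
#include <memory>

// Toy stand-ins for illustration only; these are NOT the TVM/TileLang types.
enum class ToyKind { kSerial, kParallel };

struct ToyFor {
  ToyKind kind;
  int extent;
  std::shared_ptr<ToyFor> body;  // nested loop, or nullptr for a leaf
};

// Returns a copy of the loop nest in which every kParallel loop is
// rewritten to kSerial. Extents and nesting are left unchanged.
std::shared_ptr<ToyFor> ParallelToSerialToy(const std::shared_ptr<ToyFor>& loop) {
  if (!loop) return nullptr;
  auto out = std::make_shared<ToyFor>(*loop);
  if (out->kind == ToyKind::kParallel) out->kind = ToyKind::kSerial;
  out->body = ParallelToSerialToy(loop->body);
  return out;
}

// Hypothetical entry point: with vector_size == 1 vectorization is a no-op,
// so only the loop kind is downgraded.
std::shared_ptr<ToyFor> VectorizeToy(const std::shared_ptr<ToyFor>& loop,
                                     int vector_size) {
  if (vector_size == 1) return ParallelToSerialToy(loop);
  // Splitting and vectorizing are out of scope for this sketch, so the
  // input is returned unchanged here.
  return loop;
}
```

The sketch copies the nest instead of mutating it in place, which loosely mirrors the functional style of an IR mutator; that choice is an assumption for illustration.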

Validated with:

  • debug/0209_issues/local_parallel.py
  • TILELANG_CACHE_DIR=/tmp/tilelang-cache python testing/python/cpu/test_tilelang_cpu_gemm.py

When vectorization resolves to size=1, keep semantics but rewrite ForKind::kParallel to kSerial so downstream transforms (e.g. pragma-unroll) can apply.

Also downgrade kParallel in VectorizeRewriter outputs, and rename/apply PragmaUnrollLoop consistently after partition/vectorize.

github-actions bot commented Feb 9, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

This PR refactors loop unrolling pragma application across operator lowering and loop transformation passes. It renames the public unrolling function from LoopPragmaUnroll to PragmaUnrollLoop, updates all call sites, and introduces a new ParallelToSerial helper to serialize parallel loops when vectorization is disabled (vector_size == 1).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Operator Lowering: src/op/copy.cc, src/op/fill.cc, src/op/reduce.cc | Updated call sites to use renamed PragmaUnrollLoop function. In fill.cc, vectorization is now consistently followed by pragma-unroll application across all scope branches instead of returning vectorized loops directly. |
| Loop Partition API: src/transform/loop_partition.h, src/transform/loop_partition.cc | Renamed public function LoopPragmaUnroll to PragmaUnrollLoop. Refactored implementation to defer unrolling via the new wrapper function, applying it post-vectorization in the lower-phase path. |
| Loop Vectorization: src/transform/loop_vectorize.cc | Introduced new ParallelToSerial rewriter (internal) to downgrade ForKind::kParallel loops to ForKind::kSerial when vectorization is disabled. Applied to innermost and non-innermost vectorization paths to preserve semantics for subsequent pragma-unroll passes. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Through loops we hop, unroll and roll,
Parallel threads now take control,
Vectorized dreams with pragmas true,
Serial safety shines anew! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix: treat kParallel as serial when vectorizing' accurately summarizes the main objective of the changeset, which is to rewrite ForKind::kParallel loops to kSerial during vectorization to unblock downstream transforms. |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
src/transform/loop_partition.cc (1)

265-269: Typo in internal class name: LoopPramaUnroller → LoopPragmaUnroller.

Pre-existing, but since you're touching the public API name (LoopPragmaUnroll → PragmaUnrollLoop), this would be a good time to fix the internal class name too.

🔤 Fix the typo
-class LoopPramaUnroller : public StmtExprMutator {
+class LoopPragmaUnroller : public StmtExprMutator {
 For PragmaUnrollLoop(For stmt) {
-  LoopPramaUnroller unroller;
+  LoopPragmaUnroller unroller;
   For unrolled = Downcast<For>(unroller(std::move(stmt)));
   return unrolled;
 }

@LeiWang1999 LeiWang1999 merged commit d25f53c into tile-ai:main Feb 9, 2026
10 of 11 checks passed