
Conversation


@LeiWang1999 LeiWang1999 commented Feb 4, 2026

This pull request enhances the vectorization planning logic in loop_vectorize.cc, improving how buffer indices are validated for vectorization at different vector sizes. The planner becomes more robust by recognizing additional cases where indices can be safely vectorized: not only when they are invariant within the vector boundary, but also when they satisfy the vectorizability check.

Improvements to vectorization boundary checks:

  • The loop that checks buffer indices for invariance at the new vector size now also considers whether indices are vectorizable using the IndiceCanVectorize method, expanding the set of cases where vectorization is allowed.
  • Added a clarifying comment to indicate that the validation checks for both invariance and vectorizability at the new vector size boundary.
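The broadened check can be sketched as follows. This is a simplified standalone model, not the TVM-based implementation in loop_vectorize.cc: the helper names mirror the PR (IsInvariantInVectorBoundary, IndicesCanVectorize), but the affine index model (index = coeff * loop_var + base) is purely illustrative.

```cpp
#include <cstdint>

// Illustrative index model: index = coeff * loop_var + base.
struct AffineIndex {
  int64_t coeff;
  int64_t base;
};

// Invariant within one vector of `vec` lanes: the index does not depend
// on the loop variable at all, so every lane reads the same element.
bool IsInvariantInVectorBoundary(const AffineIndex& idx, int64_t vec) {
  (void)vec;
  return idx.coeff == 0;
}

// Vectorizable at width `vec` (toy criterion): lanes within a vector are
// contiguous (coeff == 1) and each vector starts at an aligned offset.
bool IndicesCanVectorize(const AffineIndex& idx, int64_t vec) {
  return idx.coeff == 1 && idx.base % vec == 0;
}

// The PR widens the old "invariant only" test to "invariant OR vectorizable".
bool OkAtVectorSize(const AffineIndex& idx, int64_t vec) {
  return IsInvariantInVectorBoundary(idx, vec) ||
         IndicesCanVectorize(idx, vec);
}
```

With this widened predicate, a broadcast-style index (coeff 0) and a contiguous aligned index (coeff 1, base divisible by the vector width) both pass, whereas the old check accepted only the former.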

Code cleanup:

  • Removed unnecessary blank lines in the buffer stride and offset calculation logic for better readability.

Summary by CodeRabbit

  • Refactor
    • Improved loop-vectorization revalidation so buffer vector-size feasibility is assessed with a broader feasibility check, enabling more loops to be safely vectorized.
  • Bug Fixes
    • Fixed edge cases where vector-size adjustments were overly strict; the planner now safely reduces vector sizes when needed and logs adjustments to preserve correctness.
  • Chores
    • Renamed an internal helper used by the vectorization planner (internal-only).


github-actions bot commented Feb 4, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀


coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

Renamed IndiceCanVectorize to IndicesCanVectorize and replaced strict invariance checks with IndicesCanVectorize(...) calls in memory-buffer revalidation paths; when the check fails, the code recomputes vector_size_ via GCD with the buffer's vector size and logs the adjustment.

Changes

  • Loop vectorize implementation (src/transform/loop_vectorize.cc): Replaced strict invariance prerequisites with IndicesCanVectorize(...) in multiple memory-buffer revalidation branches and the buffer refinement loop; when infeasible, recompute vector_size_ using GCD with the buffer vector size and log the change.
  • Public header (src/transform/loop_vectorize.h): Renamed exported function IndiceCanVectorize(...) to IndicesCanVectorize(...) without signature changes.
  • Copy node usage (src/op/copy.cc): Updated two call sites in CopyNode::LowerLDSMCopy to call IndicesCanVectorize(...) instead of IndiceCanVectorize(...) (no other logic changes).
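The GCD fallback mentioned in the summary can be sketched as below. This is a minimal illustration, not the planner's actual API: when a buffer cannot be vectorized at the currently planned width, the plan is shrunk to the GCD of the planned width and the buffer's own feasible width, so the result divides both.

```cpp
#include <iostream>
#include <numeric>  // std::gcd (C++17)

// Illustrative fallback: shrink the planned vector size so it divides
// both the current plan and the buffer's feasible vector width, logging
// the adjustment (mirroring the "logs the adjustment" behavior above).
int AdjustVectorSize(int planned, int buffer_vector_size) {
  int adjusted = std::gcd(planned, buffer_vector_size);
  if (adjusted != planned) {
    std::cerr << "vector_size adjusted " << planned << " -> " << adjusted
              << " to fit buffer (feasible width " << buffer_vector_size
              << ")\n";
  }
  return adjusted;
}
```

For example, a planned width of 8 against a buffer that only supports 4 yields 4, while a planned 8 against a buffer width of 6 yields 2, the largest width that divides both.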

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • kurisu6912

Poem

🐰 I nibble code in moonlit rows,
I hop through loops where vector grows,
I check the indices, twist and scan,
If bounds don’t match, I GCD my plan,
Now buffers stride in tidy rows!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 14.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title '[Enhancement] Enhance register vectorize inference' is directly related to the main change: improving vectorization planning logic by enhancing buffer index validation across vector sizes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

In `src/transform/loop_vectorize.cc`:
- Around lines 293-298: the check passes loop_extent_vector_size_ (a GCD) to IndiceCanVectorize, which can under-report the loop extent and yield false positives. Pass the full loop extent used by ComputeBufferVectorSize instead, so IndiceCanVectorize validates all vector blocks; update the conditional around IsExprInvariantInVectorBoundary and IndiceCanVectorize (for elem_offset, inner_for_->loop_var, vector_size_) to use the real extent value, consistent with ComputeBufferVectorSize.
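The reviewer's concern can be illustrated with a toy example (not the planner's real index analysis): validating index contiguity over a reduced extent can accept a pattern that only breaks in a later vector block. The index function below is contiguous on the first block but has a gap inside the second, so checking a reduced extent of 4 passes while checking the full extent of 8 correctly fails.

```cpp
#include <cstdint>

// Toy index mapping: contiguous for i < 6, then a gap of 2 elements.
int64_t Index(int64_t i) { return i < 6 ? i : i + 2; }

// True if every vector block of `vec` lanes within `extent` iterations
// maps to addresses contiguous with the block's first lane.
bool BlocksAreVectorizable(int64_t extent, int64_t vec) {
  for (int64_t block = 0; block < extent; block += vec) {
    for (int64_t lane = 1; lane < vec; ++lane) {
      if (Index(block + lane) != Index(block) + lane) return false;
    }
  }
  return true;
}
```

Checking only extent 4 (e.g. a GCD of the true extent 8) sees one contiguous block and accepts; checking the full extent 8 inspects the second block (indices 4, 5, 8, 9) and rejects, which is why the review asks for the full loop extent used by ComputeBufferVectorSize.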

@SiriusNEO
Collaborator

@regression-perf


github-actions bot commented Feb 4, 2026

Performance Regression Test Report

Triggered by: @SiriusNEO
Workflow run: https://github.com/tile-ai/tilelang/actions/runs/21666714087

Results

File Original Latency Current Latency Speedup
example_dequant_groupedgemm_bf16_mxfp4_hopper 3.4452 3.51106 0.981241
example_gqa_decode 0.047969 0.048513 0.988787
example_warp_specialize_gemm_softpipe_stage2 0.038817 0.039105 0.992635
example_topk 0.010784 0.010816 0.997041
example_mha_sink_fwd_bhsd_wgmma_pipelined 0.0154061 0.0154489 0.997228
sparse_mla_fwd_pipelined 0.0955275 0.095627 0.998959
example_gqa_bwd 0.0497783 0.0498294 0.998973
example_dequant_gemm_w4a8 5.39271 5.39823 0.998978
example_gqa_sink_fwd_bhsd_wgmma_pipelined 0.0144383 0.0144505 0.999157
example_gqa_sink_fwd_bhsd_wgmma_pipelined_sliding_window 0.0145684 0.0145792 0.999263
example_dynamic 0.656651 0.657037 0.999413
example_tilelang_sparse_gqa_decode_varlen_indice 0.0170519 0.0170618 0.999424
example_dequant_gemm_bf16_mxfp4_hopper 0.508197 0.508462 0.999479
example_gqa_sink_bwd_bhsd 0.041637 0.0416563 0.999537
example_tilelang_gemm_splitk 1.42298 1.4236 0.999562
example_tilelang_sparse_gqa_decode_varlen_mask 0.0234171 0.0234273 0.999566
example_mha_sink_fwd_bhsd_wgmma_pipelined_sliding_window 0.015483 0.015489 0.999611
example_tilelang_nsa_decode 0.00734897 0.00735157 0.999646
example_tilelang_gemm_splitk_vectorize_atomicadd 1.4223 1.42262 0.999771
example_linear_attn_fwd 0.0368961 0.0369039 0.999791
example_mha_bwd_bhsd 0.0406662 0.0406743 0.999802
example_elementwise_add 0.295876 0.29593 0.999819
block_sparse_attn_tilelang 0.0102512 0.0102529 0.999832
example_vertical_slash_sparse_attn 0.237236 0.237273 0.999845
example_mla_decode 0.461257 0.461287 0.999935
example_gemm_schedule 0.0325868 0.0325886 0.999943
example_gemm_intrinsics 0.035072 0.035073 0.999971
example_mha_bwd_bshd 0.0412493 0.0412501 0.99998
example_tilelang_block_sparse_attn 0.0101599 0.0101589 1.0001
example_gqa_bwd_tma_reduce_varlen 0.0521638 0.0521577 1.00012
example_gemv 0.284859 0.284821 1.00013
example_linear_attn_bwd 0.153149 0.153115 1.00023
topk_selector 0.0535497 0.0535362 1.00025
example_group_per_split_token_cast_to_fp8 0.010356 0.0103531 1.00029
example_gqa_sink_bwd_bhsd_sliding_window 0.0255729 0.0255654 1.00029
fp8_lighting_indexer 0.0357631 0.0357491 1.00039
sparse_mla_bwd 0.383107 0.382953 1.0004
example_tilelang_gemm_fp8_intrinsic 0.933258 0.932883 1.0004
tilelang_example_sparse_tensorcore 0.0150401 0.0150339 1.00041
example_mha_sink_fwd_bhsd 0.0158734 0.0158658 1.00048
example_gqa_bwd_wgmma_pipelined 0.0696336 0.0695949 1.00056
example_mha_fwd_varlen 0.0454677 0.0454417 1.00057
example_blocksparse_gemm 0.0226856 0.0226717 1.00061
example_mha_sink_bwd_bhsd 0.0624088 0.0623702 1.00062
example_dequant_gemv_fp16xint4 0.0284425 0.0284243 1.00064
example_tilelang_nsa_fwd 0.00686032 0.00685593 1.00064
example_mha_sink_fwd_bhsd_sliding_window 0.015721 0.0157105 1.00067
example_per_token_cast_to_fp8 0.00740982 0.00740455 1.00071
example_mha_bwd_bshd_wgmma_pipelined 0.0256794 0.0256606 1.00073
example_convolution 1.33427 1.33299 1.00096
sparse_mla_fwd 0.130791 0.130647 1.0011
example_convolution_autotune 0.995691 0.993111 1.0026
example_tilelang_gemm_fp8 0.322486 0.32162 1.00269
example_mha_sink_bwd_bhsd_sliding_window 0.0448627 0.0446871 1.00393
example_gemm_autotune 0.022432 0.022336 1.0043
example_dequant_gemm_fp4_hopper 1.06159 1.05646 1.00486
example_warp_specialize_gemm_copy_1_gemm_0 0.038081 0.037856 1.00594
example_mha_inference 0.079328 0.078817 1.00648
example_dequant_gemm_bf16_fp4_hopper 0.577068 0.573161 1.00682
example_tilelang_gemm_fp8_2xAcc 0.189843 0.188131 1.0091
example_warp_specialize_gemm_barrierpipe_stage2 0.039009 0.038593 1.01078
example_gemm 0.023009 0.022561 1.01986
example_warp_specialize_gemm_copy_0_gemm_1 0.039424 0.038049 1.03614

Artifacts

  • regression_result.png (speedup plot) is attached as a workflow artifact. Download it from the workflow run page above.

@SiriusNEO SiriusNEO merged commit df87c56 into tile-ai:main Feb 4, 2026
6 checks passed