
[BugFix] Corrected when proving loop layout contains a fragment buffer layout (#1708)

Merged
LeiWang1999 merged 1 commit into tile-ai:main from LeiWang1999:fix_0121
Jan 21, 2026
Conversation

LeiWang1999 (Member) commented on Jan 21, 2026

Forward-index checking is only needed when proving that one loop layout contains another loop layout, not when proving that a loop layout contains a buffer fragment layout.

Summary by CodeRabbit

  • Refactor
    • Updated validation logic to use default forward-index checking behavior during candidate validation and selection, resulting in improved control flow handling.


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot (Contributor) commented on Jan 21, 2026

📝 Walkthrough

The change removes explicit check_forward_index parameter passing from two ProveFragmentContains function calls in parallel operation candidate validation and selection logic, relying on default parameter behavior instead. The modification affects containment validation flow and error detection in ParallelOpNode.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Forward-Index Validation Simplification (`src/op/parallel.cc`) | Removed explicit `check_forward_index` parameter from `ProveFragmentContains` calls in `ValidateCandidateAgainstFragments` and the `contains` lambda within `ChooseBestCandidate`, delegating to default forward-index checking behavior |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A parameter departed with grace,
Default behavior takes its place,
Validation flows with less to say,
Simpler paths lead the way! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: removing the explicit `check_forward_index` parameter from `ProveFragmentContains` calls to correct fragment layout containment validation behavior. |



@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1


Comment on lines 649 to 652:

    // check_forward_index=true: when validating loop layout against buffer
    // fragment, we need to ensure physical indices match for correct code gen.
    if (!ProveFragmentContains(candidate, fragment, vars, indice_map_[buffer],
-                              analyzer_, /*check_forward_index=*/true)) {
+                              analyzer_)) {

⚠️ Potential issue | 🟡 Minor

Stale comment contradicts updated code behavior.

The comment at lines 649-650 states check_forward_index=true is needed when validating loop layout against buffer fragment, but the code now omits this parameter (defaulting to false). Per the PR description ("Only prove loop contain loop we need index checking"), this appears intentional—forward index checking should only apply when comparing two loop layouts, not loop-vs-buffer.

Please update or remove the comment to reflect the corrected logic.

Suggested fix
     auto fragment = T.layout_map[buffer].as<Fragment>().value();
-    // check_forward_index=true: when validating loop layout against buffer
-    // fragment, we need to ensure physical indices match for correct code gen.
+    // check_forward_index defaults to false: forward-index equality is only
+    // required when comparing two loop layouts, not loop-vs-buffer fragment.
     if (!ProveFragmentContains(candidate, fragment, vars, indice_map_[buffer],
                                analyzer_)) {

LeiWang1999 (Member, Author) commented:

@regression-perf

@github-actions

Performance Regression Test Report

Triggered by: @LeiWang1999
Workflow run: https://github.com/tile-ai/tilelang/actions/runs/21215667446

Results

File Original Latency Current Latency Speedup
example_warp_specialize_gemm_barrierpipe_stage2 0.038913 0.039745 0.979067
example_warp_specialize_gemm_softpipe_stage2 0.038848 0.039616 0.980614
example_mha_fwd_varlen 0.0454889 0.0458049 0.993101
example_mha_bwd_bshd_wgmma_pipelined 0.0262829 0.026463 0.993193
example_warp_specialize_gemm_copy_1_gemm_0 0.039104 0.039328 0.994304
example_gemm_autotune 0.022304 0.022368 0.997139
example_dequant_groupedgemm_bf16_mxfp4_hopper 3.50502 3.51288 0.997762
example_gqa_bwd_wgmma_pipelined 0.069753 0.0699044 0.997835
example_mha_bwd_bhsd 0.0407006 0.0407725 0.998239
example_gemm 0.022784 0.022817 0.998554
example_elementwise_add 0.297581 0.297931 0.998825
example_tilelang_gemm_splitk 1.42209 1.42358 0.998954
example_gemm_intrinsics 0.035072 0.035104 0.999088
block_sparse_attn_tilelang 0.0102884 0.0102976 0.999106
example_linear_attn_bwd 0.153622 0.15375 0.999165
example_mha_inference 0.081155 0.081217 0.999237
example_per_token_cast_to_fp8 0.0074654 0.00746935 0.999471
example_vertical_slash_sparse_attn 0.237336 0.23745 0.999521
tilelang_example_sparse_tensorcore 0.0150891 0.015094 0.999672
example_dynamic 0.656425 0.65649 0.999901
example_gqa_bwd_tma_reduce_varlen 0.0523199 0.052319 1.00002
example_tilelang_gemm_splitk_vectorize_atomicadd 1.42296 1.4224 1.00039
example_gqa_decode 0.048769 0.048737 1.00066
example_gqa_bwd 0.0498053 0.0497723 1.00066
example_linear_attn_fwd 0.037022 0.036983 1.00106
example_dequant_gemv_fp16xint4 0.0285811 0.0285406 1.00142
example_mha_bwd_bshd 0.0412908 0.0411898 1.00245
example_topk 0.010976 0.010944 1.00292
example_gemm_schedule 0.0325826 0.0324529 1.00399
example_gemv 0.28905 0.287707 1.00467
example_convolution_autotune 1.00065 0.995532 1.00514
example_warp_specialize_gemm_copy_0_gemm_1 0.039488 0.039074 1.0106
example_tilelang_nsa_fwd 0.00712779 0.00702128 1.01517
sparse_mla_fwd_pipelined 0.0971063 0.0955768 1.016
example_tilelang_nsa_decode 0.00750101 0.00738175 1.01616
example_dequant_gemm_w4a8 5.51353 5.39829 1.02135
example_dequant_gemm_fp4_hopper 1.08338 1.06033 1.02174
topk_selector 0.0554005 0.054102 1.024
example_tilelang_block_sparse_attn 0.0104986 0.0101735 1.03195
sparse_mla_fwd 0.13648 0.132156 1.03272
example_tilelang_sparse_gqa_decode_varlen_indice 0.0177937 0.017219 1.03338
sparse_mla_bwd 0.399511 0.385703 1.0358
example_dequant_gemm_bf16_fp4_hopper 0.600715 0.579627 1.03638
example_mha_sink_fwd_bhsd_wgmma_pipelined_sliding_window 0.0161235 0.015538 1.03768
example_tilelang_sparse_gqa_decode_varlen_mask 0.0248859 0.0239705 1.03819
example_mha_sink_fwd_bhsd_wgmma_pipelined 0.0160656 0.0154732 1.03828
fp8_lighting_indexer 0.0374487 0.0360614 1.03847
example_blocksparse_gemm 0.0236703 0.0227848 1.03886
example_dequant_gemm_bf16_mxfp4_hopper 0.529002 0.508459 1.0404
example_mha_sink_fwd_bhsd_sliding_window 0.0163796 0.0157415 1.04054
example_mha_sink_fwd_bhsd 0.016567 0.0159088 1.04137
example_gqa_sink_fwd_bhsd_wgmma_pipelined 0.0150825 0.0144752 1.04195
example_gqa_sink_fwd_bhsd_wgmma_pipelined_sliding_window 0.0152375 0.0146223 1.04207
example_mha_sink_bwd_bhsd_sliding_window 0.0469623 0.0449966 1.04368
example_gqa_sink_bwd_bhsd_sliding_window 0.0268799 0.0256366 1.0485
example_gqa_sink_bwd_bhsd 0.0438531 0.0416927 1.05182
example_group_per_split_token_cast_to_fp8 0.0109323 0.0103842 1.05278
example_mha_sink_bwd_bhsd 0.0658454 0.062417 1.05493
example_mla_decode 0.487754 0.461351 1.05723
example_convolution 1.41462 1.33305 1.06119

Artifacts

  • regression_result.png (speedup plot) is attached as a workflow artifact. Download it from the workflow run page above.
