
[BugFix] Corrected when proving loop layout contains a fragment buffer layout (#1708)

Merged
LeiWang1999 merged 1 commit into tile-ai:main from LeiWang1999:fix_0121
Jan 21, 2026
Conversation

LeiWang1999 (Member) commented on Jan 21, 2026

Forward-index checking is only needed when proving that one loop layout contains another loop layout, not when proving that a loop layout contains a buffer fragment layout.

Summary by CodeRabbit

  • Refactor
    • Updated validation logic to use default forward-index checking behavior during candidate validation and selection, resulting in improved control flow handling.


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot (Contributor) commented on Jan 21, 2026

📝 Walkthrough

The change removes explicit check_forward_index parameter passing from two ProveFragmentContains function calls in parallel operation candidate validation and selection logic, relying on default parameter behavior instead. The modification affects containment validation flow and error detection in ParallelOpNode.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Forward-Index Validation Simplification (`src/op/parallel.cc`) | Removed explicit `check_forward_index` parameter from `ProveFragmentContains` calls in `ValidateCandidateAgainstFragments` and the `contains` lambda within `ChooseBestCandidate`, delegating to default forward-index checking behavior |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A parameter departed with grace,
Default behavior takes its place,
Validation flows with less to say,
Simpler paths lead the way! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: removing the explicit `check_forward_index` parameter from `ProveFragmentContains` calls to correct fragment layout containment validation behavior. |



@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1


Comment on lines 649 to 652:

    // check_forward_index=true: when validating loop layout against buffer
    // fragment, we need to ensure physical indices match for correct code gen.
    if (!ProveFragmentContains(candidate, fragment, vars, indice_map_[buffer],
-                              analyzer_, /*check_forward_index=*/true)) {
+                              analyzer_)) {

⚠️ Potential issue | 🟡 Minor

Stale comment contradicts updated code behavior.

The comment at lines 649-650 states check_forward_index=true is needed when validating loop layout against buffer fragment, but the code now omits this parameter (defaulting to false). Per the PR description ("Only prove loop contain loop we need index checking"), this appears intentional—forward index checking should only apply when comparing two loop layouts, not loop-vs-buffer.

Please update or remove the comment to reflect the corrected logic.

Suggested fix
     auto fragment = T.layout_map[buffer].as<Fragment>().value();
-    // check_forward_index=true: when validating loop layout against buffer
-    // fragment, we need to ensure physical indices match for correct code gen.
+    // check_forward_index defaults to false: forward-index equality is only
+    // required when comparing two loop layouts, not loop-vs-buffer fragment.
     if (!ProveFragmentContains(candidate, fragment, vars, indice_map_[buffer],
                                analyzer_)) {

LeiWang1999 (Member, Author) commented:

@regression-perf

@github-actions

Performance Regression Test Report

Triggered by: @LeiWang1999
Workflow run: https://github.com/tile-ai/tilelang/actions/runs/21215667446

Results

File Original Latency Current Latency Speedup
example_warp_specialize_gemm_barrierpipe_stage2 0.038913 0.039745 0.979067
example_warp_specialize_gemm_softpipe_stage2 0.038848 0.039616 0.980614
example_mha_fwd_varlen 0.0454889 0.0458049 0.993101
example_mha_bwd_bshd_wgmma_pipelined 0.0262829 0.026463 0.993193
example_warp_specialize_gemm_copy_1_gemm_0 0.039104 0.039328 0.994304
example_gemm_autotune 0.022304 0.022368 0.997139
example_dequant_groupedgemm_bf16_mxfp4_hopper 3.50502 3.51288 0.997762
example_gqa_bwd_wgmma_pipelined 0.069753 0.0699044 0.997835
example_mha_bwd_bhsd 0.0407006 0.0407725 0.998239
example_gemm 0.022784 0.022817 0.998554
example_elementwise_add 0.297581 0.297931 0.998825
example_tilelang_gemm_splitk 1.42209 1.42358 0.998954
example_gemm_intrinsics 0.035072 0.035104 0.999088
block_sparse_attn_tilelang 0.0102884 0.0102976 0.999106
example_linear_attn_bwd 0.153622 0.15375 0.999165
example_mha_inference 0.081155 0.081217 0.999237
example_per_token_cast_to_fp8 0.0074654 0.00746935 0.999471
example_vertical_slash_sparse_attn 0.237336 0.23745 0.999521
tilelang_example_sparse_tensorcore 0.0150891 0.015094 0.999672
example_dynamic 0.656425 0.65649 0.999901
example_gqa_bwd_tma_reduce_varlen 0.0523199 0.052319 1.00002
example_tilelang_gemm_splitk_vectorize_atomicadd 1.42296 1.4224 1.00039
example_gqa_decode 0.048769 0.048737 1.00066
example_gqa_bwd 0.0498053 0.0497723 1.00066
example_linear_attn_fwd 0.037022 0.036983 1.00106
example_dequant_gemv_fp16xint4 0.0285811 0.0285406 1.00142
example_mha_bwd_bshd 0.0412908 0.0411898 1.00245
example_topk 0.010976 0.010944 1.00292
example_gemm_schedule 0.0325826 0.0324529 1.00399
example_gemv 0.28905 0.287707 1.00467
example_convolution_autotune 1.00065 0.995532 1.00514
example_warp_specialize_gemm_copy_0_gemm_1 0.039488 0.039074 1.0106
example_tilelang_nsa_fwd 0.00712779 0.00702128 1.01517
sparse_mla_fwd_pipelined 0.0971063 0.0955768 1.016
example_tilelang_nsa_decode 0.00750101 0.00738175 1.01616
example_dequant_gemm_w4a8 5.51353 5.39829 1.02135
example_dequant_gemm_fp4_hopper 1.08338 1.06033 1.02174
topk_selector 0.0554005 0.054102 1.024
example_tilelang_block_sparse_attn 0.0104986 0.0101735 1.03195
sparse_mla_fwd 0.13648 0.132156 1.03272
example_tilelang_sparse_gqa_decode_varlen_indice 0.0177937 0.017219 1.03338
sparse_mla_bwd 0.399511 0.385703 1.0358
example_dequant_gemm_bf16_fp4_hopper 0.600715 0.579627 1.03638
example_mha_sink_fwd_bhsd_wgmma_pipelined_sliding_window 0.0161235 0.015538 1.03768
example_tilelang_sparse_gqa_decode_varlen_mask 0.0248859 0.0239705 1.03819
example_mha_sink_fwd_bhsd_wgmma_pipelined 0.0160656 0.0154732 1.03828
fp8_lighting_indexer 0.0374487 0.0360614 1.03847
example_blocksparse_gemm 0.0236703 0.0227848 1.03886
example_dequant_gemm_bf16_mxfp4_hopper 0.529002 0.508459 1.0404
example_mha_sink_fwd_bhsd_sliding_window 0.0163796 0.0157415 1.04054
example_mha_sink_fwd_bhsd 0.016567 0.0159088 1.04137
example_gqa_sink_fwd_bhsd_wgmma_pipelined 0.0150825 0.0144752 1.04195
example_gqa_sink_fwd_bhsd_wgmma_pipelined_sliding_window 0.0152375 0.0146223 1.04207
example_mha_sink_bwd_bhsd_sliding_window 0.0469623 0.0449966 1.04368
example_gqa_sink_bwd_bhsd_sliding_window 0.0268799 0.0256366 1.0485
example_gqa_sink_bwd_bhsd 0.0438531 0.0416927 1.05182
example_group_per_split_token_cast_to_fp8 0.0109323 0.0103842 1.05278
example_mha_sink_bwd_bhsd 0.0658454 0.062417 1.05493
example_mla_decode 0.487754 0.461351 1.05723
example_convolution 1.41462 1.33305 1.06119

Artifacts

  • regression_result.png (speedup plot) is attached as a workflow artifact. Download it from the workflow run page above.
