@SiriusNEO
Collaborator

@SiriusNEO SiriusNEO commented Jan 19, 2026

In some thread-level (non-tile) programs, out-of-bounds (OOB) accesses can also occur on non-global buffers. To make such kernels easier to develop, the compiler now emits additional warnings when it detects these cases.

Summary by CodeRabbit

  • Chores
    • Extended memory access safety validation from global buffers to all buffers; possible out-of-bounds accesses to non-global buffers are now reported as warnings.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Jan 19, 2026

Walkthrough

The primary visitor class in legalize_safe_memory_access.cc is renamed from GlobalMemChecker to SafeMemChecker. The CheckBufferIndices method signature is extended with a throw_warning parameter to split behavior: logging warnings for non-global buffers versus collecting bound violations for rewrite. Buffer validation is expanded from global-only to all buffers with conditional logic based on buffer type.
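
The split described above can be illustrated with a minimal, self-contained sketch. It is not the code from legalize_safe_memory_access.cc: the `Buffer`, `IndexRange`, and `CheckResult` types are hypothetical stand-ins for the TVM/TileLang IR types, index ranges are assumed to be already known rather than derived by an arithmetic analyzer, and the lower-bound check is an assumption. Only the names `CheckBufferIndices`, `IsGlobalBuffer`, and `throw_warning` come from the walkthrough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR buffer: name, storage scope, static shape.
struct Buffer {
  std::string name;
  std::string scope;  // e.g. "global", "shared", "local"
  std::vector<int64_t> shape;
};

// Inclusive range an index may take (assumed to be known from prior analysis).
struct IndexRange {
  int64_t min;
  int64_t max;
};

// Hypothetical result holder: conditions feed the rewrite, warnings are only
// reported to the user.
struct CheckResult {
  std::vector<std::string> conditions;
  std::vector<std::string> warnings;
};

bool IsGlobalBuffer(const Buffer &buffer) { return buffer.scope == "global"; }

// With throw_warning set, a possible violation is only reported; otherwise the
// bound condition is collected so a later rewrite can guard the access.
void CheckBufferIndices(const Buffer &buffer,
                        const std::vector<IndexRange> &indices, bool is_load,
                        bool throw_warning, CheckResult *out) {
  assert(indices.size() == buffer.shape.size());
  const std::string access = is_load ? "load from " : "store to ";
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const bool may_underflow = indices[i].min < 0;
    const bool may_overflow = indices[i].max >= buffer.shape[i];
    if (!may_underflow && !may_overflow) continue;  // provably in bounds
    const std::string dim = std::to_string(i);
    if (throw_warning) {
      out->warnings.push_back("possible out-of-bounds " + access +
                              buffer.name + " in dimension " + dim);
      continue;
    }
    if (may_underflow) out->conditions.push_back("index[" + dim + "] >= 0");
    if (may_overflow)
      out->conditions.push_back("index[" + dim + "] < " +
                                std::to_string(buffer.shape[i]));
  }
}
```

A caller would pass `!IsGlobalBuffer(buffer)` as `throw_warning`, matching the call shape shown in the review's suggested fix, so that global buffers yield conditions and shared/local buffers yield warnings.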

Changes

| Cohort / File(s) | Summary |
|---|---|
| Memory Access Legalization Refactor: src/transform/legalize_safe_memory_access.cc | Class renamed from GlobalMemChecker to SafeMemChecker across all instantiation sites. CheckBufferIndices signature extended with throw_warning parameter to conditionally log warnings (non-global buffers) or collect conditions (global buffers). New IsGlobalBuffer helper method added to determine buffer scope. Buffer validation expanded from global-only to all buffers in both Load and Store operations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A checker's quest to guard the code,
From buffers global, local load,
Now warnings leap and conditions flow,
Safe memory paths the rabbits know! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding compiler warnings for out-of-bounds accesses to non-global buffers, which aligns with the refactoring that extends buffer checking to all buffers and conditionally logs warnings based on whether buffers are global. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/transform/legalize_safe_memory_access.cc (2)

51-58: Stale comment.

The comment "Check if the buffer is in global scope" is outdated. The code now checks all buffers, with behavior varying based on whether the buffer is global. Consider updating it to match the corresponding comment in VisitExpr_ for consistency.

Suggested fix
   void VisitStmt_(const BufferStoreNode *op) final {
-    // Check if the buffer is in global scope
+    // If the buffer is in global scope, we will check its indices and add
+    // corresponding bound checks.
+    // If the buffer is in shared/local, we only log warnings if there
+    // are possible out-of-bounds.
     CheckBufferIndices(op->buffer, op->indices, /*is_load=*/false,
                        !IsGlobalBuffer(op->buffer));

60-69: Use existing utility from src/op/utils.h instead of duplicating.

An IsGlobalBuffer helper already exists in src/op/utils.h with identical logic plus a defensive buffer.defined() check that this implementation lacks. The lengthy comment is also outdated since the helper it mentions already exists.

Suggested fix

Include the header and remove the local definition:

 #include "../op/builtin.h"
 #include "../op/parallel.h"
+#include "../op/utils.h"
 #include "arith/ir_mutator_with_analyzer.h"

Then remove the local IsGlobalBuffer method (lines 60-69) and use the utility directly. The calls at lines 45 and 54 will work unchanged since the function signature is compatible.
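
To illustrate why the defensive defined() check matters, here is a minimal sketch. It does not reproduce the helper in src/op/utils.h: the `Buffer` and `BufferNode` classes below are hypothetical stand-ins for a nullable, reference-counted IR handle, and comparing the scope string against "global" is an assumption about how the scope is tested.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload of a buffer handle.
struct BufferNode {
  std::string scope;
};

// Hypothetical nullable handle: a default-constructed Buffer is "undefined".
class Buffer {
 public:
  Buffer() = default;
  explicit Buffer(std::string scope)
      : node_(std::make_shared<BufferNode>(BufferNode{std::move(scope)})) {}

  bool defined() const { return node_ != nullptr; }

  // Dereferences the node, so it is only valid on a defined handle.
  const std::string &scope() const {
    assert(defined());
    return node_->scope;
  }

 private:
  std::shared_ptr<BufferNode> node_;
};

// The defined() test short-circuits before scope() touches a null node.
inline bool IsGlobalBuffer(const Buffer &buffer) {
  return buffer.defined() && buffer.scope() == "global";
}
```

Without the guard, calling the helper on an undefined handle would dereference a null node; with it, the helper simply returns false.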

@SiriusNEO
Collaborator Author

@regression-perf

@github-actions

Performance Regression Test Report

Triggered by: @SiriusNEO
Workflow run: https://github.com/tile-ai/tilelang/actions/runs/21129167533

Results

| File | Original Latency | Current Latency | Speedup |
|---|---|---|---|
| example_warp_specialize_gemm_softpipe_stage2 | 0.037344 | 0.037985 | 0.983125 |
| example_tilelang_gemm_fp8_2xAcc | 0.186531 | 0.188628 | 0.988883 |
| example_warp_specialize_gemm_barrierpipe_stage2 | 0.039744 | 0.040161 | 0.989617 |
| example_mha_inference | 0.078817 | 0.079426 | 0.992332 |
| example_gemm_autotune | 0.0224 | 0.022496 | 0.995733 |
| example_warp_specialize_gemm_copy_1_gemm_0 | 0.03824 | 0.038368 | 0.996664 |
| example_tilelang_gemm_fp8 | 0.321524 | 0.322566 | 0.996771 |
| example_gqa_bwd_wgmma_pipelined | 0.0698203 | 0.0699543 | 0.998085 |
| example_gqa_bwd | 0.0498154 | 0.0498968 | 0.998369 |
| example_tilelang_gemm_fp8_intrinsic | 0.466907 | 0.467253 | 0.99926 |
| example_mha_bwd_bshd_wgmma_pipelined | 0.0262726 | 0.0262868 | 0.999461 |
| example_tilelang_gemm_splitk | 1.4221 | 1.42241 | 0.999782 |
| example_mha_fwd_varlen | 0.0455011 | 0.0455054 | 0.999906 |
| example_vertical_slash_sparse_attn | 0.237488 | 0.237496 | 0.999963 |
| example_tilelang_gemm_splitk_vectorize_atomicadd | 1.4236 | 1.4236 | 0.999999 |
| example_elementwise_add | 0.297514 | 0.297514 | 1 |
| example_linear_attn_fwd | 0.037014 | 0.0370112 | 1.00008 |
| example_linear_attn_bwd | 0.153596 | 0.153575 | 1.00014 |
| tilelang_example_sparse_tensorcore | 0.0150872 | 0.0150823 | 1.00032 |
| example_gemv | 0.289077 | 0.288981 | 1.00033 |
| example_mha_bwd_bhsd | 0.040705 | 0.0406896 | 1.00038 |
| example_gemm_schedule | 0.0326375 | 0.0326251 | 1.00038 |
| block_sparse_attn_tilelang | 0.0103021 | 0.0102979 | 1.0004 |
| example_dequant_groupedgemm_bf16_mxfp4_hopper | 3.52274 | 3.52125 | 1.00042 |
| example_mha_bwd_bshd | 0.0412918 | 0.0412701 | 1.00053 |
| example_gqa_bwd_tma_reduce_varlen | 0.0523562 | 0.0523157 | 1.00077 |
| example_dequant_gemv_fp16xint4 | 0.0284779 | 0.0284358 | 1.00148 |
| example_dynamic | 0.657515 | 0.65608 | 1.00219 |
| example_mha_sink_fwd_bhsd_sliding_window | 0.0157356 | 0.015695 | 1.00259 |
| example_mha_sink_bwd_bhsd_sliding_window | 0.0450539 | 0.0449291 | 1.00278 |
| example_per_token_cast_to_fp8 | 0.00739417 | 0.00737235 | 1.00296 |
| example_gemm_intrinsics | 0.035136 | 0.035008 | 1.00366 |
| example_tilelang_nsa_fwd | 0.00704099 | 0.00701414 | 1.00383 |
| example_group_per_split_token_cast_to_fp8 | 0.010441 | 0.0103925 | 1.00467 |
| example_mha_sink_fwd_bhsd_wgmma_pipelined | 0.015556 | 0.0154759 | 1.00518 |
| example_warp_specialize_gemm_copy_0_gemm_1 | 0.039905 | 0.03968 | 1.00567 |
| example_topk | 0.011168 | 0.011104 | 1.00576 |
| example_tilelang_nsa_decode | 0.00742707 | 0.00738411 | 1.00582 |
| example_mha_sink_fwd_bhsd | 0.0158063 | 0.015707 | 1.00632 |
| fp8_lighting_indexer | 0.0362201 | 0.0359827 | 1.0066 |
| example_blocksparse_gemm | 0.0228624 | 0.0227048 | 1.00694 |
| example_gqa_decode | 0.049025 | 0.048673 | 1.00723 |
| example_gqa_sink_fwd_bhsd_wgmma_pipelined | 0.0145715 | 0.0144668 | 1.00724 |
| example_gqa_sink_fwd_bhsd_wgmma_pipelined_sliding_window | 0.0147135 | 0.0146051 | 1.00743 |
| example_mha_sink_fwd_bhsd_wgmma_pipelined_sliding_window | 0.0156038 | 0.0154845 | 1.0077 |
| example_tilelang_block_sparse_attn | 0.0102435 | 0.0101648 | 1.00775 |
| example_tilelang_sparse_gqa_decode_varlen_indice | 0.0173402 | 0.0172053 | 1.00784 |
| sparse_mla_fwd_pipelined | 0.0954335 | 0.0946726 | 1.00804 |
| example_dequant_gemm_w4a8 | 5.43957 | 5.39553 | 1.00816 |
| example_tilelang_sparse_gqa_decode_varlen_mask | 0.0242647 | 0.0240599 | 1.00851 |
| example_mha_sink_bwd_bhsd | 0.0629439 | 0.0624104 | 1.00855 |
| topk_selector | 0.0544383 | 0.053973 | 1.00862 |
| example_convolution_autotune | 0.999379 | 0.990828 | 1.00863 |
| example_dequant_gemm_bf16_fp4_hopper | 0.580685 | 0.57566 | 1.00873 |
| sparse_mla_bwd | 0.387856 | 0.38437 | 1.00907 |
| sparse_mla_fwd | 0.132295 | 0.131076 | 1.0093 |
| example_dequant_gemm_bf16_mxfp4_hopper | 0.514795 | 0.509728 | 1.00994 |
| example_gqa_sink_bwd_bhsd_sliding_window | 0.0258855 | 0.0256301 | 1.00997 |
| example_convolution | 1.34591 | 1.33148 | 1.01084 |
| example_gqa_sink_bwd_bhsd | 0.0421948 | 0.0417009 | 1.01184 |
| example_mla_decode | 0.467592 | 0.461353 | 1.01352 |
| example_gemm | 0.023009 | 0.022688 | 1.01415 |
| example_dequant_gemm_fp4_hopper | 1.09236 | 1.06338 | 1.02726 |
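
The report does not state how the Speedup column is computed, but the rows are consistent with original latency divided by current latency, so values below 1 indicate a slowdown and values above 1 an improvement. A minimal sketch under that assumption (the function name `Speedup` is hypothetical and not part of the regression tooling):

```cpp
#include <cassert>

// Assumed definition of the Speedup column: original latency over current
// latency. A result above 1 means the current build is faster.
double Speedup(double original_latency, double current_latency) {
  assert(current_latency > 0.0);
  return original_latency / current_latency;
}
```

For the first row, `Speedup(0.037344, 0.037985)` evaluates to roughly 0.9831, matching the reported 0.983125. Read this way, the results range from about 1.7% slower (example_warp_specialize_gemm_softpipe_stage2) to about 2.7% faster (example_dequant_gemm_fp4_hopper).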

Artifacts

  • regression_result.png (speedup plot) is attached as a workflow artifact. Download it from the workflow run page above.

@LeiWang1999 LeiWang1999 merged commit a19206a into tile-ai:main Jan 19, 2026
6 of 7 checks passed