[Enhancement] Add thread count validation for ReduceOp fragment layout inference #1225

Merged
LeiWang1999 merged 2 commits into tile-ai:main from LeiWang1999:reduce_1111 on Nov 11, 2025

Conversation

LeiWang1999 (Member) commented Nov 11, 2025

This pull request improves the robustness of layout inference for reduce operations and updates the corresponding test coverage. The most significant change is the addition of a check to ensure that the thread count is divisible by the replicate extent during layout inference, which prevents invalid fragment-to-fragment mappings. Additionally, some test cases that would violate this new constraint have been removed.

Reduce operation layout inference improvements:

  • Added a check in ReduceOpNode::InferLayout (src/op/reduce.cc) to ensure that the thread count (num_threads) is divisible by the replicate extent (replicate_extent). If this condition is not met, an error is raised with a detailed message, preventing invalid fragment layout inference and guiding users on how to resolve the issue.
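The guard described above can be sketched with plain integers. This is a minimal illustration, not the code in src/op/reduce.cc: the function name `CheckThreadsDivisible` is hypothetical, and ordinary integer arithmetic and a C++ exception stand in for the symbolic expressions and the error mechanism the real implementation uses.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>

// Hypothetical stand-in for the guard described in the PR text. Plain
// integers replace the symbolic extents used during layout inference.
void CheckThreadsDivisible(std::int64_t num_threads,
                           std::int64_t replicate_extent) {
  if (replicate_extent <= 0) {
    throw std::invalid_argument("replicate_extent must be positive");
  }
  if (num_threads % replicate_extent != 0) {
    // Report both values so the user can pick a compatible block size.
    std::ostringstream msg;
    msg << "ReduceOp fragment layout inference failed: "
        << "num_threads % replicate_extent != 0. "
        << "Details: num_threads=" << num_threads
        << ", replicate_extent=" << replicate_extent;
    throw std::runtime_error(msg.str());
  }
}
```

Under these assumptions, `CheckThreadsDivisible(128, 32)` returns normally, while `CheckThreadsDivisible(128, 96)` throws with both values in the message.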

Testing updates:

  • Removed test cases in test_reduce_sum_shared and test_reduce_max_shared (testing/python/language/test_tilelang_language_reduce.py) that used thread and replicate extent combinations which are now disallowed by the new divisibility check.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved error handling and validation for thread layout configuration in reduce operations with enhanced error messaging.

…t inference

* Introduced a check to ensure that the thread count is divisible by the replicate extent during layout inference in ReduceOpNode. This validation prevents layout inference failures and provides detailed error messages to guide users in resolving issues related to thread block sizes and fragment layouts.
* Updated tests to remove unsupported configurations that could lead to layout inference errors, ensuring more robust testing scenarios.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot (Contributor) commented Nov 11, 2025

Walkthrough

The changes add a divisibility guard to reduce-operation layout inference: an arithmetic analyzer checks that the thread count is divisible by the replicate extent, and inference aborts if this cannot be proven. Two related test cases are removed from the reduce test suite.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Layout divisibility guard: src/op/reduce.cc | Added runtime check in ReduceOpNode::InferLayout using arith::Analyzer to verify thread count divisibility by replicate extent; triggers ICHECK(false) with detailed error message if divisibility cannot be proven |
| Test case cleanup: testing/python/language/test_tilelang_language_reduce.py | Removed two test invocations: run_reduce_sum(32, 96, mode="ss") from test_reduce_sum_shared and run_shared_reduce(..., 96, 48, "float32") from test_reduce_max_shared |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Review the divisibility check logic and analyzer usage in reduce.cc to confirm correctness
  • Verify that removed test cases are obsolete or no longer applicable given the new guard behavior
  • Ensure error message provides sufficient guidance for users encountering the check failure

Poem

🐰 A guard stands firm with math so true,
Ensuring threads divide just right through—
Old tests retire, their work now done,
Layout safety shines, bright as sun! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly and clearly summarizes the main change: adding thread count validation for ReduceOp layout inference, which matches the primary modification in src/op/reduce.cc. |
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47039f0 and 2938111.

📒 Files selected for processing (2)
  • src/op/reduce.cc (1 hunks)
  • testing/python/language/test_tilelang_language_reduce.py (0 hunks)
💤 Files with no reviewable changes (1)
  • testing/python/language/test_tilelang_language_reduce.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/op/reduce.cc (2)
tilelang/language/kernel.py (1)
  • num_threads (222-226)
src/tl_templates/cuda/reduce.h (1)
  • T (208-280)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (1)
src/op/reduce.cc (1)

393-419: Divisibility validation logic is sound.

The bidirectional check correctly ensures that either num_threads is divisible by dest_buffer_rep_extent or vice versa, preventing invalid fragment-to-fragment mappings. The use of arith::Analyzer::CanProve and ICHECK(false) for compile-time validation is appropriate.
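A small model may help show what "bidirectional" and "cannot be proven" mean here. The sketch below is an assumption-laden illustration, not TVM's `arith::Analyzer` API: `Extent`, `Known`, `Unknown`, `CanProveDivides`, and `LayoutGuardPasses` are hypothetical names, and an extent is modelled as either a known constant or an unknown value for which nothing can be proven.

```cpp
#include <cstdint>

// Hypothetical model: an extent is a known constant or an unknown value.
struct Extent {
  bool known;
  std::int64_t value;
};

inline Extent Known(std::int64_t v) { return Extent{true, v}; }
inline Extent Unknown() { return Extent{false, 0}; }

// Returns true only when divisibility is certain; unknown extents are
// never "proven", which mirrors a conservative prover.
inline bool CanProveDivides(const Extent& value, const Extent& divisor) {
  if (!value.known || !divisor.known || divisor.value <= 0) return false;
  return value.value % divisor.value == 0;
}

// Bidirectional guard: pass when either extent provably divides the other.
inline bool LayoutGuardPasses(const Extent& num_threads,
                              const Extent& rep_extent) {
  return CanProveDivides(num_threads, rep_extent) ||
         CanProveDivides(rep_extent, num_threads);
}
```

In this model, 128 threads with a replicate extent of 32 pass, as do 32 threads with an extent of 128; 128 threads with an extent of 96 fail, and so does any pair involving an unknown extent, because failure to prove is treated the same as a counterexample.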

Comment on lines +406 to +418
            ICHECK(false) << "ReduceOp fragment layout inference failed: "
                             "num_threads % replicate_extent != 0. "
                          << "This mapping requires the block's thread count to be "
                             "divisible by the "
                          << "replicate extent. "
                          << "Try one of: (1) choose a thread block size divisible "
                             "by replicate_extent; "
                          << "(2) pick a different reduce dimension or adjust the "
                             "source fragment layout; "
                          << "Details: num_threads=" << num_threads
                          << ", replicate_extent=" << indice_rep_extent
                          << ", src=" << src << ", dst=" << dst;
          }

⚠️ Potential issue | 🟡 Minor

Error message should print the actual value being checked.

The error message on line 416 prints indice_rep_extent, but the divisibility check validates dest_buffer_rep_extent (which equals indice_rep_extent * src_rep_extent from line 378). This discrepancy could confuse users debugging the error.
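To make the discrepancy concrete, the following sketch compares the details string before and after the suggested fix. The helper names (`GuardReport`, `FormatDetails`, `ExplainGuard`) and the sample values are illustrative assumptions; only the relationship dest_buffer_rep_extent = indice_rep_extent * src_rep_extent and the bidirectional test come from the review text.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative only: field names follow the review comment.
struct GuardReport {
  bool passes;            // result of the bidirectional divisibility test
  std::string reported;   // details printed with indice_rep_extent
  std::string suggested;  // details printed with dest_buffer_rep_extent
};

std::string FormatDetails(std::int64_t num_threads, std::int64_t extent) {
  std::ostringstream out;
  out << "num_threads=" << num_threads << ", replicate_extent=" << extent;
  return out.str();
}

// Assumes all three inputs are positive.
GuardReport ExplainGuard(std::int64_t num_threads,
                         std::int64_t indice_rep_extent,
                         std::int64_t src_rep_extent) {
  const std::int64_t dest_buffer_rep_extent =
      indice_rep_extent * src_rep_extent;
  const bool passes = num_threads % dest_buffer_rep_extent == 0 ||
                      dest_buffer_rep_extent % num_threads == 0;
  return GuardReport{passes,
                     FormatDetails(num_threads, indice_rep_extent),
                     FormatDetails(num_threads, dest_buffer_rep_extent)};
}
```

With the assumed inputs `ExplainGuard(128, 32, 3)`, the checked extent is 96 and the guard fails, yet the original message would report `replicate_extent=32`, which divides 128 evenly and so appears to contradict the error. Printing 96 instead matches the condition that actually failed.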

Apply this diff to improve error message accuracy:

                      << "Details: num_threads=" << num_threads
-                     << ", replicate_extent=" << indice_rep_extent
+                     << ", replicate_extent=" << dest_buffer_rep_extent
                      << ", src=" << src << ", dst=" << dst;

Suggested change (replacement for lines +406 to +418)
            ICHECK(false) << "ReduceOp fragment layout inference failed: "
                             "num_threads % replicate_extent != 0. "
                          << "This mapping requires the block's thread count to be "
                             "divisible by the "
                          << "replicate extent. "
                          << "Try one of: (1) choose a thread block size divisible "
                             "by replicate_extent; "
                          << "(2) pick a different reduce dimension or adjust the "
                             "source fragment layout; "
                          << "Details: num_threads=" << num_threads
                          << ", replicate_extent=" << dest_buffer_rep_extent
                          << ", src=" << src << ", dst=" << dst;
          }
🤖 Prompt for AI Agents
In src/op/reduce.cc around lines 406 to 418, the error message prints
indice_rep_extent but the divisibility check is actually on
dest_buffer_rep_extent (which equals indice_rep_extent * src_rep_extent); update
the ICHECK message to print dest_buffer_rep_extent (or include both
dest_buffer_rep_extent and indice_rep_extent) so the logged values match the
condition being checked and give accurate debugging info.

@LeiWang1999 LeiWang1999 merged commit 67cc861 into tile-ai:main Nov 11, 2025
7 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…t inference (tile-ai#1225)

* [Enhancement] Add thread count validation for ReduceOp fragment layout inference

* Introduced a check to ensure that the thread count is divisible by the replicate extent during layout inference in ReduceOpNode. This validation prevents layout inference failures and provides detailed error messages to guide users in resolving issues related to thread block sizes and fragment layouts.
* Updated tests to remove unsupported configurations that could lead to layout inference errors, ensuring more robust testing scenarios.

* lint fix